unitpackage.entry
A Data Package describing tabulated data for which the units of the column names (pandas) or fields (frictionless) are known and the resource has additional metadata describing the underlying data.
A description of such datapackages can be found in the documentation in Unitpackage Structure.
Datapackages are the individual elements of a Collection and are denoted as entry.
EXAMPLES:
Metadata included in an entry's resource is accessible as an attribute:
>>> entry = Entry.create_examples()[0]
>>> entry.source
{'citation key': 'alves_2011_electrochemistry_6010',
'url': 'https://doi.org/10.1039/C0CP01001D',
'figure': '1a',
'curve': 'solid',
'bibdata': '@article{alves_2011_electrochemistry_6010,...}
The data of the resource can be accessed as a pandas dataframe:
>>> entry = Entry.create_examples()[0]
>>> entry.df
t E j
0 0.000000 -0.103158 -0.998277
1 0.020000 -0.102158 -0.981762
...
Data Packages containing published data also contain information on the source of the data:
>>> from unitpackage.collection import Collection
>>> db = Collection.create_example()
>>> entry = db['alves_2011_electrochemistry_6010_f1a_solid']
>>> entry.bibliography
Entry('article',
fields=[
('title', 'Electrochemistry at Ru(0001) in a flowing CO-saturated electrolyte—reactive and inert adlayer phases'),
('journal', 'Physical Chemistry Chemical Physics'),
('volume', '13'),
('number', '13'),
('pages', '6010--6021'),
('year', '2011'),
('publisher', 'Royal Society of Chemistry'),
('abstract', 'We investigated ...')],
persons=OrderedCaseInsensitiveDict([('author', [Person('Alves, Otavio B'), Person('Hoster, Harry E'), Person('Behm, Rolf J{\\"u}rgen')])]))
- class unitpackage.entry.Entry(package)
A frictionless data package describing tabulated data.
EXAMPLES:
Entries can be directly created:
>>> from unitpackage.local import collect_datapackage
>>> from unitpackage.entry import Entry
>>> entry = Entry(collect_datapackage('./examples/no_bibliography/no_bibliography.json'))
>>> entry
Entry('no_bibliography')
or more simply:
>>> from unitpackage.entry import Entry
>>> entry = Entry.from_local('./examples/no_bibliography/no_bibliography.json')
>>> entry
Entry('no_bibliography')
Entries can also be created by other means, such as from a CSV with Entry.from_csv or from a pandas dataframe with Entry.from_df.
Normally, entries are obtained by opening a Collection of entries:
>>> from unitpackage.collection import Collection
>>> collection = Collection.create_example()
>>> entry = next(iter(collection))
- property bibliography
Return a pybtex bibliography object.
EXAMPLES:
>>> entry = Entry.create_examples()[0]
>>> entry.bibliography
Entry('article',
fields=[
('title', ...
...
>>> entry_no_bib = Entry.create_examples(name="no_bibliography")[0]
>>> entry_no_bib.bibliography
''
- citation(backend='text')
Return a formatted reference for the entry’s bibliography such as:
Doe, et al., Journal Name, volume (YEAR) page, “Title”
The default rendering backend is plain text ('text'). It can be changed to any format supported by pybtex, such as Markdown ('md'), 'latex' or 'html'.
EXAMPLES:
>>> entry = Entry.create_examples()[0]
>>> entry.citation(backend='text')
'O. B. Alves et al. Electrochemistry at Ru(0001) in a flowing CO-saturated electrolyte—reactive and inert adlayer phases. Physical Chemistry Chemical Physics, 13(13):6010–6021, 2011.'
>>> print(entry.citation(backend='md'))
O\. B\. Alves *et al\.* *Electrochemistry at Ru\(0001\) in a flowing CO\-saturated electrolyte—reactive and inert adlayer phases*\. *Physical Chemistry Chemical Physics*, 13\(13\):6010–6021, 2011\.
- classmethod create_examples(name='')
Return some example entries for use in automated tests.
The examples are created from datapackages in the unitpackage’s examples directory. These are only available from the development environment.
EXAMPLES:
>>> Entry.create_examples()
[Entry('alves_2011_electrochemistry_6010_f1a_solid'), Entry('engstfeld_2018_polycrystalline_17743_f4b_1'), Entry('no_bibliography')]
An entry without an associated BIB file:
>>> Entry.create_examples(name="no_bibliography")
[Entry('no_bibliography')]
- property df
Return the data of this entry as a data frame.
EXAMPLES:
>>> entry = Entry.create_examples()[0]
>>> entry.df
t E j
0 0.000000 -0.103158 -0.998277
1 0.020000 -0.102158 -0.981762
...
The units and descriptions of the axes in the data frame can be recovered:
>>> entry.package.get_resource('echemdb').schema.fields
[{'name': 't', 'type': 'number', 'unit': 's'}, {'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'}, {'name': 'j', 'type': 'number', 'unit': 'A / m2'}]
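As a rough illustration of how the field descriptions shown above can be used, the following sketch collects the units into a dict keyed by field name. It works on plain dicts copied from the output above, which stand in for the frictionless field objects. The helper name units_by_field is hypothetical and not part of unitpackage.

```python
def units_by_field(fields):
    """Map each field name to its unit, skipping fields without a unit."""
    return {field["name"]: field["unit"] for field in fields if "unit" in field}


# Field descriptions copied from the example output above (plain dicts,
# used here as a stand-in for the objects returned by the schema).
fields = [
    {"name": "t", "type": "number", "unit": "s"},
    {"name": "E", "type": "number", "unit": "V", "reference": "RHE"},
    {"name": "j", "type": "number", "unit": "A / m2"},
]

units = units_by_field(fields)
print(units)
```

For a single field, Entry.field_unit provides the same information directly.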
- field_unit(field_name)
Return the unit of the field_name of the echemdb resource.
EXAMPLES:
>>> entry = Entry.create_examples()[0]
>>> entry.field_unit('E')
'V'
- classmethod from_csv(csvname, metadata=None, fields=None)
Returns an entry constructed from a CSV with a single header line.
EXAMPLES:
Units describing the fields can be provided:
>>> import os
>>> fields = [{'name':'E', 'unit': 'mV'}, {'name':'I', 'unit': 'A'}]
>>> entry = Entry.from_csv(csvname='examples/from_csv/from_csv.csv', fields=fields)
>>> entry
Entry('from_csv')
>>> entry.package
{'resources': [{'name': ...
Metadata can be appended:
>>> import os
>>> fields = [{'name':'E', 'unit': 'mV'}, {'name':'I', 'unit': 'A'}]
>>> metadata = {'user':'Max Doe'}
>>> entry = Entry.from_csv(csvname='examples/from_csv/from_csv.csv', metadata=metadata, fields=fields)
>>> entry.user
'Max Doe'
- classmethod from_df(df, metadata=None, fields=None, outdir=None, *, basename)
Returns an entry constructed from a pandas dataframe.
EXAMPLES:
>>> import pandas as pd
>>> from unitpackage.entry import Entry
>>> df = pd.DataFrame({'x':[1,2,3], 'y':[2,3,4]})
>>> entry = Entry.from_df(df=df, basename='test_df')
>>> entry
Entry('test_df')
Metadata and field descriptions can be added:
>>> import os
>>> fields = [{'name':'x', 'unit': 'm'}, {'name':'P', 'unit': 'um'}, {'name':'E', 'unit': 'V'}]
>>> metadata = {'user':'Max Doe'}
>>> entry = Entry.from_df(df=df, basename='test_df', metadata=metadata, fields=fields)
>>> entry.user
'Max Doe'
Save the entry:
>>> entry.save(outdir='./test/generated/from_df')
TESTS:
Verify that all fields are properly created even when they are not specified as fields:
>>> import os
>>> fields = [{'name':'x', 'unit': 'm'}, {'name':'P', 'unit': 'um'}, {'name':'E', 'unit': 'V'}]
>>> metadata = {'user':'Max Doe'}
>>> entry = Entry.from_df(df=df, basename='test_df', metadata=metadata, fields=fields)
>>> entry.package.get_resource('echemdb').schema.fields
[{'name': 'x', 'type': 'integer', 'unit': 'm'}, {'name': 'y', 'type': 'integer'}]
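The behaviour checked above (columns without a description are still created, and descriptions for names that are not columns are dropped) can be sketched with plain Python. This is only an illustration under assumptions, not the implementation used by unitpackage: the helper describe_columns is hypothetical, and the inference of the 'type' key is left out.

```python
def describe_columns(columns, fields):
    """Attach user-supplied descriptions to the columns that actually exist."""
    by_name = {field["name"]: field for field in fields}
    # A column without a description keeps only its name. Descriptions for
    # names that are not columns (here 'P' and 'E') never appear in the result.
    return [dict(by_name.get(column, {"name": column})) for column in columns]


columns = ["x", "y"]  # column names of the dataframe in the example above
fields = [
    {"name": "x", "unit": "m"},
    {"name": "P", "unit": "um"},
    {"name": "E", "unit": "V"},
]

described = describe_columns(columns, fields)
print(described)
```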
- classmethod from_local(filename)
Return an entry created from the local datapackage (JSON) file filename.
EXAMPLES:
>>> from unitpackage.entry import Entry
>>> entry = Entry.from_local('./examples/no_bibliography/no_bibliography.json')
>>> entry
Entry('no_bibliography')
- property identifier
Return a unique identifier for this entry, i.e., its basename.
EXAMPLES:
>>> entry = Entry.create_examples()[0]
>>> entry.identifier
'alves_2011_electrochemistry_6010_f1a_solid'
- plot(x_label=None, y_label=None, name=None)
Return a 2D plot of this entry.
The default plot is constructed from the first two columns of the dataframe.
EXAMPLES:
>>> entry = Entry.create_examples()[0]
>>> entry.plot()
Figure(...)
The 2D plot can also be created with other fields available in the resource on its axes:
>>> entry.plot(x_label='j', y_label='E')
Figure(...)
- rename_fields(field_names, keep_original_name_as=None)
Returns an Entry with updated field names and dataframe column names. Provide a dict, where the key is the previous field name and the value the new name, such as {'t':'t_rel', 'E':'E_we'}. The original field names can be kept in a new key.
EXAMPLES:
>>> from unitpackage.entry import Entry
>>> entry = Entry.create_examples()[0]
>>> renamed_entry = entry.rename_fields({'t': 't_rel'}, keep_original_name_as='originalName')
>>> renamed_entry.df
t_rel E j
0 0.000000 -0.103158 -0.998277
1 0.020000 -0.102158 -0.981762
...
>>> renamed_entry.package.get_resource('echemdb').schema.fields
[{'name': 't_rel', 'type': 'number', 'unit': 's', 'originalName': 't'}, {'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'}, {'name': 'j', 'type': 'number', 'unit': 'A / m2'}]
TESTS:
New names provided for non-existing fields are ignored:
>>> renamed_entry = entry.rename_fields({'t': 't_rel', 'x':'y'}, keep_original_name_as='originalName')
>>> renamed_entry.package.get_resource('echemdb').schema.fields
[{'name': 't_rel', 'type': 'number', 'unit': 's', 'originalName': 't'}, {'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'}, {'name': 'j', 'type': 'number', 'unit': 'A / m2'}]
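The renaming rules illustrated above can be sketched on plain dicts. This is a minimal illustration under assumptions and not the code used by unitpackage: the helper rename_field_descriptions is hypothetical and only handles the field descriptions, not the dataframe columns.

```python
def rename_field_descriptions(fields, field_names, keep_original_name_as=None):
    """Return copies of the field descriptions with updated names."""
    renamed = []
    for field in fields:
        field = dict(field)  # copy, so the input descriptions stay untouched
        old_name = field["name"]
        if old_name in field_names:
            field["name"] = field_names[old_name]
            if keep_original_name_as is not None:
                # Remember the previous name under the requested key.
                field[keep_original_name_as] = old_name
        renamed.append(field)
    # Keys of field_names that match no field (such as 'x') have no effect.
    return renamed


fields = [
    {"name": "t", "type": "number", "unit": "s"},
    {"name": "E", "type": "number", "unit": "V", "reference": "RHE"},
    {"name": "j", "type": "number", "unit": "A / m2"},
]

renamed = rename_field_descriptions(
    fields, {"t": "t_rel", "x": "y"}, keep_original_name_as="originalName"
)
print(renamed)
```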
- rescale(units)
Returns a rescaled Entry with axes in the specified units. Provide a dict, where the key is the axis name and the value the new unit, such as {'j': 'uA / cm2', 't': 'h'}.
EXAMPLES:
The units without any rescaling:
>>> entry = Entry.create_examples()[0]
>>> entry.package.get_resource('echemdb').schema.fields
[{'name': 't', 'type': 'number', 'unit': 's'}, {'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'}, {'name': 'j', 'type': 'number', 'unit': 'A / m2'}]
A rescaled entry using different units:
>>> rescaled_entry = entry.rescale({'j':'uA / cm2', 't':'h'})
>>> rescaled_entry.package.get_resource('echemdb').schema.fields
[{'name': 't', 'type': 'number', 'unit': 'h'}, {'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'}, {'name': 'j', 'type': 'number', 'unit': 'uA / cm2'}]
The values in the data frame are scaled to match the new units:
>>> rescaled_entry.df
t E j
0 0.000000 -0.103158 -99.827664
1 0.000006 -0.102158 -98.176205
...
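The scaled values above can be checked by hand. The sketch below writes the two conversion factors out explicitly instead of using a units library, so it only illustrates the arithmetic and makes no claim about how rescale computes them. The input values are the rounded numbers shown in the unscaled data frame.

```python
# Conversion factors written out by hand from the SI prefixes.
A_PER_M2_TO_UA_PER_CM2 = 1e6 / 1e4  # 1 A = 1e6 uA and 1 m2 = 1e4 cm2
S_TO_H = 1 / 3600                   # 1 h = 3600 s

# Rounded values of j (first row) and t (second row) before rescaling.
j_rescaled = -0.998277 * A_PER_M2_TO_UA_PER_CM2
t_rescaled = 0.020000 * S_TO_H

print(round(j_rescaled, 4))
print(round(t_rescaled, 6))
```

A current density in A / m2 is therefore multiplied by 100 to obtain uA / cm2, and a time in seconds is divided by 3600 to obtain hours, in line with the rescaled data frame.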
- save(*, outdir, basename=None)
Create a unitpackage, i.e., a CSV file and a JSON file, in the directory outdir.
EXAMPLES:
The output files are named identifier.csv and identifier.json using the identifier of the original resource:
>>> import os
>>> entry = Entry.create_examples()[0]
>>> entry.save(outdir='./test/generated')
>>> basename = entry.identifier
>>> os.path.exists(f'test/generated/{basename}.json') and os.path.exists(f'test/generated/{basename}.csv')
True
When a basename is set, the files are named basename.csv and basename.json. Note that for a valid frictionless package this base name MUST be lower-case and contain only alphanumeric characters along with ".", "_" or "-" characters:
>>> import os
>>> entry = Entry.create_examples()[0]
>>> basename = 'save_basename'
>>> entry.save(basename=basename, outdir='./test/generated')
>>> os.path.exists(f'test/generated/{basename}.json') and os.path.exists(f'test/generated/{basename}.csv')
True
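The naming rule stated above can be checked before saving. The following sketch encodes that sentence as a regular expression. Both the pattern and the helper is_valid_basename are assumptions made for illustration and are not part of unitpackage or frictionless.

```python
import re

# Assumed pattern, derived from the rule above: lower-case letters,
# digits, ".", "_" and "-" only, with at least one character.
VALID_BASENAME = re.compile(r"[a-z0-9._-]+")


def is_valid_basename(basename):
    """Return True if the whole basename matches the assumed pattern."""
    return VALID_BASENAME.fullmatch(basename) is not None


print(is_valid_basename("save_basename"))
print(is_valid_basename("Save Basename"))
```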
TESTS:
Save an entry with metadata containing a datetime object, which is not natively supported by JSON:
>>> import os
>>> from datetime import datetime
>>> import pandas as pd
>>> from unitpackage.entry import Entry
>>> df = pd.DataFrame({'x':[1,2,3], 'y':[2,3,4]})
>>> basename = 'save_datetime'
>>> entry = Entry.from_df(df=df, basename=basename, metadata={'current time':datetime.now()})
>>> entry.save(outdir='./test/generated')
>>> os.path.exists(f'test/generated/{basename}.json') and os.path.exists(f'test/generated/{basename}.csv')
True
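To see why datetime values need special treatment, the following sketch serializes such metadata with the standard library only. Converting unsupported objects with default=str is one common approach and is an assumption here. How unitpackage handles this internally is not shown in this documentation.

```python
import json
from datetime import datetime

# A fixed timestamp is used instead of datetime.now() to keep the output stable.
metadata = {"current time": datetime(2024, 1, 1, 12, 0, 0)}

try:
    json.dumps(metadata)
    plain_dump_failed = False
except TypeError:
    # The json module cannot serialize datetime objects on its own.
    plain_dump_failed = True

# default=str converts every object that json cannot handle to a string.
serialized = json.dumps(metadata, default=str)
print(serialized)
```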