unitpackage.entry
A frictionless tabular Resource describing tabulated data for which the units of the columns (pandas) or fields (frictionless) are known, accompanied by additional metadata describing the underlying data.
A description of such resources can be found in the documentation in Unitpackage Structure.
Resources are the individual elements of a Collection and are denoted as entry.
EXAMPLES:
Metadata included in an entry is accessible as an attribute:
>>> entry = Entry.create_examples()[0]
>>> entry.source
{'citation key': 'alves_2011_electrochemistry_6010',
'url': 'https://doi.org/10.1039/C0CP01001D',
'figure': '1a',
'curve': 'solid',
'bibdata': '@article{alves_2011_electrochemistry_6010,...}
The data of the entry can be accessed as a pandas dataframe:
>>> entry = Entry.create_examples()[0]
>>> entry.df
          t         E         j
0  0.000000 -0.103158 -0.998277
1  0.020000 -0.102158 -0.981762
...
Entries containing published data also contain information on the source of the data:
>>> from unitpackage.collection import Collection
>>> db = Collection.create_example()
>>> entry = db['alves_2011_electrochemistry_6010_f1a_solid']
>>> entry.bibliography
Entry('article',
fields=[
('title', 'Electrochemistry at Ru(0001) in a flowing CO-saturated electrolyte—reactive and inert adlayer phases'),
('journal', 'Physical Chemistry Chemical Physics'),
('volume', '13'),
('number', '13'),
('pages', '6010--6021'),
('year', '2011'),
('publisher', 'Royal Society of Chemistry'),
('abstract', 'We investigated ...')],
persons=OrderedCaseInsensitiveDict([('author', [Person('Alves, Otavio B'), Person('Hoster, Harry E'), Person('Behm, Rolf J{\\"u}rgen')])]))
- class unitpackage.entry.Entry(resource)
A frictionless Resource describing tabulated data.
EXAMPLES:
Entries can be directly created from a frictionless Data Package containing a single frictionless Resource:
>>> from unitpackage.entry import Entry
>>> entry = Entry.from_local('./examples/local/no_bibliography/no_bibliography.json')
>>> entry
Entry('no_bibliography')
or directly from a frictionless Resource:
>>> from unitpackage.entry import Entry
>>> from frictionless import Resource
>>> entry = Entry(Resource('./examples/local/no_bibliography/no_bibliography.json'))
>>> entry
Entry('no_bibliography')
Entries can also be created by other means, such as from a CSV with Entry.from_csv or from a pandas dataframe with Entry.from_df.
Normally, entries are obtained by opening a Collection of entries:
>>> from unitpackage.collection import Collection
>>> collection = Collection.create_example()
>>> entry = next(iter(collection))
- add_columns(df, new_fields)
Adds a column to the dataframe with specified field properties and returns an updated entry.
EXAMPLES:
>>> entry = Entry.create_examples()[0]
>>> entry.df
          t         E         j
0  0.000000 -0.103158 -0.998277
1  0.020000 -0.102158 -0.981762
...
A column computed from existing columns can be added along with a description of its unit:
>>> import pandas as pd
>>> import astropy.units as u
>>> df = pd.DataFrame()
>>> df['P/A'] = entry.df['E'] * entry.df['j']
>>> new_field_unit = u.Unit(entry.field_unit('E')) * u.Unit(entry.field_unit('j'))
>>> new_entry = entry.add_columns(df['P/A'], new_fields=[{'name':'P/A', 'unit': new_field_unit}])
>>> new_entry.df
          t         E         j       P/A
0  0.000000 -0.103158 -0.998277  0.102981
1  0.020000 -0.102158 -0.981762  0.100295
...
>>> new_entry.field_unit('P/A')
Unit("A V / m2")
- property bibliography
Return a pybtex bibliography object associated with this entry.
EXAMPLES:
>>> entry = Entry.create_examples()[0]
>>> entry.bibliography
Entry('article',
fields=[
('title', ...
...
>>> entry_no_bib = Entry.create_examples(name="no_bibliography")[0]
>>> entry_no_bib.bibliography
''
- citation(backend='text')
Return a formatted reference for the entry’s bibliography such as:
Doe, et al., Journal Name, volume (YEAR) page, “Title”
The default rendering backend is plain text ('text'). It can be changed to any format supported by pybtex, such as markdown ('md'), 'latex' or 'html'.
EXAMPLES:
>>> entry = Entry.create_examples()[0]
>>> entry.citation(backend='text')
'O. B. Alves et al. Electrochemistry at Ru(0001) in a flowing CO-saturated electrolyte—reactive and inert adlayer phases. Physical Chemistry Chemical Physics, 13(13):6010–6021, 2011.'
>>> print(entry.citation(backend='md'))
O\. B\. Alves *et al\.* *Electrochemistry at Ru\(0001\) in a flowing CO\-saturated electrolyte—reactive and inert adlayer phases*\. *Physical Chemistry Chemical Physics*, 13\(13\):6010–6021, 2011\.
- classmethod create_examples(name='')
Return some example entries for use in automated tests.
The examples are created from Data Packages in the examples directory of unitpackage. These are only available in the development environment.
EXAMPLES:
>>> Entry.create_examples()
[Entry('alves_2011_electrochemistry_6010_f1a_solid'), Entry('engstfeld_2018_polycrystalline_17743_f4b_1'), Entry('no_bibliography')]
An entry without an associated BIB file:
>>> Entry.create_examples(name="no_bibliography")
[Entry('no_bibliography')]
- property df
Return the data of this entry’s “MutableResource” as a data frame.
EXAMPLES:
>>> entry = Entry.create_examples()[0]
>>> entry.df
          t         E         j
0  0.000000 -0.103158 -0.998277
1  0.020000 -0.102158 -0.981762
...
The units and descriptions of the axes in the data frame can be recovered:
>>> entry.mutable_resource.schema.fields
[{'name': 't', 'type': 'number', 'unit': 's'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]
- field_unit(field_name)
Return the unit of the field_name of the MutableResource resource.
EXAMPLES:
>>> entry = Entry.create_examples()[0]
>>> entry.field_unit('E')
'V'
- classmethod from_csv(csvname, metadata=None, fields=None)
Returns an entry constructed from a CSV with a single header line.
EXAMPLES:
Units describing the fields can be provided:
>>> import os
>>> fields = [{'name':'E', 'unit': 'mV'}, {'name':'I', 'unit': 'A'}]
>>> entry = Entry.from_csv(csvname='examples/from_csv/from_csv.csv', fields=fields)
>>> entry
Entry('from_csv')
>>> entry.resource
{'name': 'from_csv',
...
Metadata can be appended:
>>> import os
>>> fields = [{'name':'E', 'unit': 'mV'}, {'name':'I', 'unit': 'A'}]
>>> metadata = {'user':'Max Doe'}
>>> entry = Entry.from_csv(csvname='examples/from_csv/from_csv.csv', metadata=metadata, fields=fields)
>>> entry.user
'Max Doe'
- classmethod from_df(df, metadata=None, fields=None, outdir=None, *, basename)
Returns an entry constructed from a pandas dataframe.
EXAMPLES:
>>> import pandas as pd
>>> from unitpackage.entry import Entry
>>> df = pd.DataFrame({'x':[1,2,3], 'y':[2,3,4]})
>>> entry = Entry.from_df(df=df, basename='test_df')
>>> entry
Entry('test_df')
Metadata and field descriptions can be added:
>>> import os
>>> fields = [{'name':'x', 'unit': 'm'}, {'name':'P', 'unit': 'um'}, {'name':'E', 'unit': 'V'}]
>>> metadata = {'user':'Max Doe'}
>>> entry = Entry.from_df(df=df, basename='test_df', metadata=metadata, fields=fields)
>>> entry.user
'Max Doe'
Save the entry:
>>> entry.save(outdir='./test/generated/from_df')
TESTS:
Verify that all fields are properly created even when they are not specified as fields:
>>> import os
>>> fields = [{'name':'x', 'unit': 'm'}, {'name':'P', 'unit': 'um'}, {'name':'E', 'unit': 'V'}]
>>> metadata = {'user':'Max Doe'}
>>> entry = Entry.from_df(df=df, basename='test_df', metadata=metadata, fields=fields)
>>> entry.resource.schema.fields
[{'name': 'x', 'type': 'integer', 'unit': 'm'},
{'name': 'y', 'type': 'integer'}]
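The outcome of the test above can be pictured with a plain-Python sketch. The helper merge_field_descriptions is hypothetical and not part of unitpackage; it only mimics the observed result, in which descriptions for columns absent from the dataframe (here P and E) are dropped and columns without a description (here y) still receive a field. The field types, which the real schema also reports, are left out of this sketch.

```python
def merge_field_descriptions(columns, fields):
    """Hypothetical helper: pair each column with its description, if any."""
    described = {field["name"]: field for field in fields}
    merged = []
    for column in columns:
        # Columns without a description keep only their name;
        # descriptions for columns that do not exist are ignored.
        merged.append(dict(described.get(column, {"name": column})))
    return merged


columns = ["x", "y"]
fields = [
    {"name": "x", "unit": "m"},
    {"name": "P", "unit": "um"},
    {"name": "E", "unit": "V"},
]
merged = merge_field_descriptions(columns, fields)
print(merged)  # [{'name': 'x', 'unit': 'm'}, {'name': 'y'}]
```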
- classmethod from_local(filename)
Return an entry from a file filename containing a frictionless Data Package. The Data Package must contain a single resource.
Otherwise, use collection.from_local_file to create a collection from all resources within.
EXAMPLES:
>>> from unitpackage.entry import Entry
>>> entry = Entry.from_local('./examples/local/no_bibliography/no_bibliography.json')
>>> entry
Entry('no_bibliography')
- property identifier
Return a unique identifier for this entry, i.e., its basename.
EXAMPLES:
>>> entry = Entry.create_examples()[0]
>>> entry.identifier
'alves_2011_electrochemistry_6010_f1a_solid'
- property mutable_resource
Return this entry’s “MutableResource”, the in-memory resource in pandas format whose schema describes the fields of the data frame.
EXAMPLES:
>>> entry = Entry.create_examples()[0]
>>> entry.mutable_resource
{'name': 'memory',
'type': 'table',
'data': [],
'format': 'pandas',
'mediatype': 'application/pandas',
'schema': {'fields': [{'name': 't', 'type': 'number', 'unit': 's'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]}}
- plot(x_label=None, y_label=None, name=None)
Return a 2D plot of this entry.
The default plot is constructed from the first two columns of the dataframe.
EXAMPLES:
>>> entry = Entry.create_examples()[0]
>>> entry.plot()
Figure(...)
The 2D plot can also be returned with custom axes chosen from the fields available in the resource:
>>> entry.plot(x_label='j', y_label='E')
Figure(...)
- rename_fields(field_names, keep_original_name_as=None)
Returns an Entry with updated field names and dataframe column names. Provide a dict, where the key is the previous field name and the value the new name, such as {'t':'t_rel', 'E':'E_we'}. The original field names can be kept in a new key.
EXAMPLES:
The original dataframe:
>>> from unitpackage.entry import Entry
>>> entry = Entry.create_examples()[0]
>>> entry.df
          t         E         j
0  0.000000 -0.103158 -0.998277
1  0.020000 -0.102158 -0.981762
...
Dataframe with modified column names:
>>> renamed_entry = entry.rename_fields({'t': 't_rel', 'E': 'E_we'}, keep_original_name_as='originalName')
>>> renamed_entry.df
      t_rel      E_we         j
0  0.000000 -0.103158 -0.998277
1  0.020000 -0.102158 -0.981762
...
Updated fields of the “MutableResource”:
>>> renamed_entry.mutable_resource.schema.fields
[{'name': 't_rel', 'type': 'number', 'unit': 's', 'originalName': 't'},
{'name': 'E_we', 'type': 'number', 'unit': 'V', 'reference': 'RHE', 'originalName': 'E'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]
TESTS:
New names provided for fields that do not exist are ignored:
>>> renamed_entry = entry.rename_fields({'t': 't_rel', 'x':'y'}, keep_original_name_as='originalName')
>>> renamed_entry.mutable_resource.schema.fields
[{'name': 't_rel', 'type': 'number', 'unit': 's', 'originalName': 't'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]
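The renaming rules shown in these examples can be sketched on a plain list of field dictionaries. The function rename_schema_fields below is a hypothetical illustration, not the implementation used by unitpackage; it assumes only the behaviour visible in the outputs above: matching fields are renamed, the previous name is optionally stored under a new key, and mapping entries for unknown fields have no effect.

```python
def rename_schema_fields(fields, field_names, keep_original_name_as=None):
    """Hypothetical helper mirroring the documented renaming rules."""
    renamed = []
    for field in fields:
        new_field = dict(field)  # copy so the input schema is left untouched
        old_name = field["name"]
        if old_name in field_names:
            new_field["name"] = field_names[old_name]
            if keep_original_name_as is not None:
                # Remember the previous name under the requested key.
                new_field[keep_original_name_as] = old_name
        renamed.append(new_field)
    return renamed


fields = [
    {"name": "t", "type": "number", "unit": "s"},
    {"name": "E", "type": "number", "unit": "V", "reference": "RHE"},
    {"name": "j", "type": "number", "unit": "A / m2"},
]
result = rename_schema_fields(
    fields, {"t": "t_rel", "x": "y"}, keep_original_name_as="originalName"
)
print(result[0])  # {'name': 't_rel', 'type': 'number', 'unit': 's', 'originalName': 't'}
```

Copying each field before modifying it matches the documented behaviour of returning a new Entry instead of changing the original one.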
- rescale(units)
Returns a rescaled Entry with axes in the specified units. Provide a dict, where the key is the axis name and the value the new unit, such as {'j': 'uA / cm2', 't': 'h'}.
EXAMPLES:
The units without any rescaling:
>>> entry = Entry.create_examples()[0]
>>> entry.resource.schema.fields
[{'name': 't', 'type': 'number', 'unit': 's'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]
A rescaled entry using different units:
>>> rescaled_entry = entry.rescale({'j':'uA / cm2', 't':'h'})
>>> rescaled_entry.mutable_resource.schema.fields
[{'name': 't', 'type': 'number', 'unit': 'h'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'uA / cm2'}]
The values in the data frame are scaled to match the new units:
>>> rescaled_entry.df
          t         E          j
0  0.000000 -0.103158 -99.827664
1  0.000006 -0.102158 -98.176205
...
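The scaled values can be checked by hand. The following sketch uses only plain arithmetic, with constant names chosen for illustration, to derive the factors behind the conversion from A / m2 to uA / cm2 and from s to h. It does not show how unitpackage performs the conversion internally.

```python
# Conversion constants (names are illustrative, not part of unitpackage).
MICROAMPERE_PER_AMPERE = 1e6
SQUARE_CM_PER_SQUARE_M = 1e4
SECONDS_PER_HOUR = 3600

# 1 A / m2 = 1e6 uA / 1e4 cm2 = 100 uA / cm2
j_factor = MICROAMPERE_PER_AMPERE / SQUARE_CM_PER_SQUARE_M
# 1 s = 1/3600 h
t_factor = 1 / SECONDS_PER_HOUR

# Values from the second and first rows of the unscaled data frame.
print(f"{0.02 * t_factor:.6f}")       # 0.000006
print(f"{-0.998277 * j_factor:.4f}")  # -99.8277
```

The current density differs from the tabulated -99.827664 only in the trailing digits, because the printed data frame shows rounded values.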
- save(*, outdir, basename=None)
Create a Data Package, i.e., a CSV file and a JSON file, in the directory outdir.
EXAMPLES:
The output files are named identifier.csv and identifier.json using the identifier of the original resource:
>>> import os
>>> entry = Entry.create_examples()[0]
>>> entry.save(outdir='./test/generated')
>>> basename = entry.identifier
>>> os.path.exists(f'test/generated/{basename}.json') and os.path.exists(f'test/generated/{basename}.csv')
True
When a basename is set, the files are named basename.csv and basename.json. Note that for a valid frictionless Data Package this base name MUST be lower-case and contain only alphanumeric characters along with “.”, “_” or “-” characters:
>>> import os
>>> entry = Entry.create_examples()[0]
>>> basename = 'save_basename'
>>> entry.save(basename=basename, outdir='./test/generated')
>>> os.path.exists(f'test/generated/{basename}.json') and os.path.exists(f'test/generated/{basename}.csv')
True
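The naming rule for the base name can be checked before saving. The sketch below encodes the rule as stated here (lower-case alphanumeric characters plus ".", "_" and "-") in a regular expression. The function is_valid_basename is a hypothetical helper, and unitpackage or frictionless may validate names differently.

```python
import re

# Lower-case letters, digits, ".", "_" and "-" only, as stated in the text.
VALID_BASENAME = re.compile(r"[a-z0-9._-]+")


def is_valid_basename(basename):
    """Hypothetical check for the documented naming rule."""
    return VALID_BASENAME.fullmatch(basename) is not None


print(is_valid_basename("save_basename"))  # True
print(is_valid_basename("Save Basename"))  # False
```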
TESTS:
Save the entry as a Data Package with metadata containing a datetime object, which is not natively supported by JSON:
>>> import os
>>> from datetime import datetime
>>> import pandas as pd
>>> from unitpackage.entry import Entry
>>> df = pd.DataFrame({'x':[1,2,3], 'y':[2,3,4]})
>>> basename = 'save_datetime'
>>> entry = Entry.from_df(df=df, basename=basename, metadata={'current time':datetime.now()})
>>> entry.save(outdir='./test/generated')
>>> os.path.exists(f'test/generated/{basename}.json') and os.path.exists(f'test/generated/{basename}.csv')
True
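The reason this case needs a test is that the standard json module rejects datetime objects. The sketch below shows the failure and one common workaround, a fallback serializer passed through the default argument. It uses only the standard library and a fixed timestamp chosen for illustration; whether unitpackage handles datetimes this way is an assumption.

```python
import json
from datetime import datetime

metadata = {"current time": datetime(2024, 1, 2, 3, 4, 5)}

# Without a fallback, json.dumps raises TypeError for datetime objects.
try:
    json.dumps(metadata)
    raised = False
except TypeError:
    raised = True

# default=str converts any otherwise unsupported object with str().
serialized = json.dumps(metadata, default=str)
print(raised)      # True
print(serialized)  # {"current time": "2024-01-02 03:04:05"}
```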