unitpackage.entry
A frictionless tabular Resource describing tabulated data for which the units of the columns (pandas) or fields (frictionless) are known, and which carries additional metadata describing the underlying data.
A description of such resources can be found in the documentation under Unitpackage Structure.
Resources are the individual elements of a Collection and are referred to as entries.
EXAMPLES:
Metadata included in an entry is accessible as an attribute:
>>> from unitpackage.entry import Entry
>>> entry = Entry.create_example()
>>> entry.echemdb.source
{'citationKey': 'alves_2011_electrochemistry_6010',
'url': 'https://doi.org/10.1039/C0CP01001D',
'figure': '1a',
'curve': 'solid',
'bibdata': '@article{alves_2011_electrochemistry_6010,...}
The data of the entry can be accessed as a pandas dataframe:
>>> entry = Entry.create_example()
>>> entry.df
t E j
0 0.000000 -0.103158 -0.998277
1 0.020000 -0.102158 -0.981762
...
Entries can be created from various sources, such as CSV files or pandas dataframes:
>>> entry = Entry.from_csv(csvname='examples/from_csv/from_csv.csv')
>>> entry
Entry('from_csv')
Information on the fields such as units can be updated:
>>> fields = [{'name':'E', 'unit': 'mV'}, {'name':'I', 'unit': 'A'}]
>>> entry = entry.update_fields(fields=fields)
>>> entry.fields
[{'name': 'E', 'type': 'integer', 'unit': 'mV'},
{'name': 'I', 'type': 'integer', 'unit': 'A'}]
Metadata of the resource can be updated in-place:
>>> metadata = {'echemdb': {'source': {'citationKey': 'new_key'}}}
>>> entry.metadata.from_dict(metadata)
>>> entry.metadata
{'echemdb': {'source': {'citationKey': 'new_key'}}}
- class unitpackage.entry.Entry(resource)
A frictionless Resource describing tabulated data.
EXAMPLES:
Entries can be created directly from a frictionless Data Package containing a single frictionless Resource:
>>> from unitpackage.entry import Entry
>>> entry = Entry.from_local('./examples/local/no_bibliography/no_bibliography.json')
>>> entry
Entry('no_bibliography')
or directly from a frictionless Resource:
>>> from unitpackage.entry import Entry
>>> from frictionless import Resource
>>> entry = Entry(Resource('./examples/local/no_bibliography/no_bibliography.json'))
>>> entry
Entry('no_bibliography')
Entries can also be created by other means, such as from a CSV file with Entry.from_csv or from a pandas dataframe with Entry.from_df.
Normally, entries are obtained by opening a Collection of entries:
>>> from unitpackage.collection import Collection
>>> collection = Collection.create_example()
>>> entry = next(iter(collection))
- add_columns(df, new_fields)
Adds a column to the dataframe with specified field properties and returns an updated entry.
EXAMPLES:
>>> entry = Entry.create_example()
>>> entry.df
t E j
0 0.000000 -0.103158 -0.998277
1 0.020000 -0.102158 -0.981762
...
A new column can be computed from existing columns and added together with its unit:
>>> import pandas as pd
>>> import astropy.units as u
>>> df = pd.DataFrame()
>>> df['P/A'] = entry.df['E'] * entry.df['j']
>>> new_field_unit = u.Unit(entry.field_unit('E')) * u.Unit(entry.field_unit('j'))
>>> new_entry = entry.add_columns(df['P/A'], new_fields=[{'name':'P/A', 'unit': new_field_unit}])
>>> new_entry.df
t E j P/A
0 0.000000 -0.103158 -0.998277 0.102981
1 0.020000 -0.102158 -0.981762 0.100295
...
>>> new_entry.field_unit('P/A')
Unit("A V / m2")
TESTS:
Validate that the identifier is preserved:
>>> new_entry.identifier
'alves_2011_electrochemistry_6010_f1a_solid'
- add_offset(field_name=None, offset=None, unit='')
Return an entry with an offset (with specified units) added to a specified field of the entry. The offset properties are stored in the field's metadata.
If offsets are applied consecutively, the stored value is updated to the cumulative offset.
EXAMPLES:
>>> from unitpackage.entry import Entry
>>> entry = Entry.create_example()
>>> entry.df.head()
t E j
0 0.00 -0.103158 -0.998277
1 0.02 -0.102158 -0.981762
...
>>> new_entry = entry.add_offset('E', 0.1, 'V')
>>> new_entry.df.head()
t E j
0 0.00 -0.003158 -0.998277
1 0.02 -0.002158 -0.981762
...
>>> new_entry.resource.schema.get_field('E')
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE', 'offset': {'value': 0.1, 'unit': 'V'}}
An offset with a different unit than that of the field:
>>> new_entry = entry.add_offset('E', 250, 'mV')
>>> new_entry.df.head()
t E j
0 0.00 0.146842 -0.998277
1 0.02 0.147842 -0.981762
...
A consecutively added offset:
>>> new_entry_1 = new_entry.add_offset('E', 0.150, 'V')
>>> new_entry_1.df.head()
t E j
0 0.00 0.296842 -0.998277
1 0.02 0.297842 -0.981762
...
>>> new_entry_1.resource.schema.get_field('E')
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE', 'offset': {'value': 0.4, 'unit': 'V'}}
If no unit is provided, the field unit is used instead:
>>> new_entry_2 = new_entry.add_offset('E', 0.150)
>>> new_entry_2.df.head()
t E j
0 0.00 0.296842 -0.998277
1 0.02 0.297842 -0.981762
...
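The bookkeeping behind consecutive offsets can be illustrated without the library. The following is a minimal sketch and not unitpackage's implementation: the helper add_offset_sketch and the conversion table TO_VOLT are hypothetical, and only a few volt-based units are handled. Each offset is converted to the field unit, added to the values, and accumulated in the field descriptor, mirroring the 250 mV followed by 0.150 V case above.

```python
# Hypothetical stand-in for real unit handling (illustration only).
TO_VOLT = {"V": 1.0, "mV": 1e-3, "uV": 1e-6}


def add_offset_sketch(values, field, offset, unit=""):
    """Return shifted values and an updated copy of the field descriptor."""
    # Fall back to the field unit when no unit is given.
    unit = unit or field["unit"]
    # Express the offset in the unit of the field.
    offset_in_field_unit = offset * TO_VOLT[unit] / TO_VOLT[field["unit"]]
    # Accumulate on top of any offset stored previously.
    previous = field.get("offset", {}).get("value", 0.0)
    new_field = dict(field)
    new_field["offset"] = {
        "value": previous + offset_in_field_unit,
        "unit": field["unit"],
    }
    return [v + offset_in_field_unit for v in values], new_field


field = {"name": "E", "type": "number", "unit": "V", "reference": "RHE"}
values = [-0.103158, -0.102158]

values, field = add_offset_sketch(values, field, 250, "mV")
values, field = add_offset_sketch(values, field, 0.150)
print(field["offset"])
```

The input descriptor is copied rather than modified, in line with the documented behaviour that add_offset returns an entry instead of changing the existing one.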
- apply_scaling_factor(field_name=None, scaling_factor=None)
Return an entry with a scaling_factor applied to a specified field of the entry. The scaling factor is stored in the field's metadata.
If scaling factors are applied consecutively, the value is updated (i.e., the cumulative scaling factor is the product of the individual factors).
EXAMPLES:
>>> from unitpackage.entry import Entry
>>> entry = Entry.create_example()
>>> entry.df.head()
t E j
0 0.00 -0.103158 -0.998277
1 0.02 -0.102158 -0.981762
...
>>> new_entry = entry.apply_scaling_factor('j', 2)
>>> new_entry.df.head()
t E j
0 0.00 -0.103158 -1.996553
1 0.02 -0.102158 -1.963524
...
>>> new_entry.resource.schema.get_field('j')
{'name': 'j', 'type': 'number', 'unit': 'A / m2', 'scalingFactor': {'value': 2.0}}
A consecutively applied scaling factor:
>>> new_entry_1 = new_entry.apply_scaling_factor('j', 3)
>>> new_entry_1.df.head()
t E j
0 0.00 -0.103158 -5.989660
1 0.02 -0.102158 -5.890572
...
>>> new_entry_1.resource.schema.get_field('j')
{'name': 'j', 'type': 'number', 'unit': 'A / m2', 'scalingFactor': {'value': 6.0}}
Scaling by a float:
>>> new_entry_2 = entry.apply_scaling_factor('E', 1e3)
>>> new_entry_2.df.head()
t E j
0 0.00 -103.158422 -0.998277
1 0.02 -102.158422 -0.981762
...
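The cumulative behaviour described above, where the stored factor is the product of all applied factors, can be sketched with plain dictionaries. The helper apply_scaling_factor_sketch is hypothetical and only illustrates the arithmetic; it is not how unitpackage is implemented.

```python
def apply_scaling_factor_sketch(values, field, scaling_factor):
    """Return scaled values and a field descriptor with the cumulative factor."""
    # A field without a stored factor behaves like a factor of 1.
    previous = field.get("scalingFactor", {}).get("value", 1.0)
    new_field = dict(field)
    new_field["scalingFactor"] = {"value": previous * scaling_factor}
    return [v * scaling_factor for v in values], new_field


field = {"name": "j", "type": "number", "unit": "A / m2"}
values = [-0.998277, -0.981762]

values, field = apply_scaling_factor_sketch(values, field, 2)
values, field = apply_scaling_factor_sketch(values, field, 3)
print(field["scalingFactor"])  # {'value': 6.0}
```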
- classmethod create_example(name=None)
Return an example entry for use in automated tests.
The examples are created from Data Packages in the unitpackage’s examples directory. These are only available from the development environment.
EXAMPLES:
>>> Entry.create_example()
Entry('alves_2011_electrochemistry_6010_f1a_solid')
>>> Entry.create_example(name="no_bibliography")
Entry('no_bibliography')
- default_metadata_key = ''
Default metadata key to use when accessing the descriptor. If empty string, the entire metadata dict is used. Subclasses can override this.
- property df
Return the data of this entry’s resource as a data frame.
EXAMPLES:
>>> entry = Entry.create_example()
>>> entry.df
t E j
0 0.000000 -0.103158 -0.998277
1 0.020000 -0.102158 -0.981762
...
The units and descriptions of the axes in the data frame can be recovered:
>>> entry.fields
[{'name': 't', 'type': 'number', 'unit': 's'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]
TESTS:
>>> import pandas as pd
>>> from unitpackage.entry import Entry
>>> df = pd.DataFrame({'x':[1,2,3], 'y':[2,3,4]})
>>> entry = Entry.from_df(df=df, basename='test_df')
>>> entry.df
x y
0 1 2
1 2 3
2 3 4
- field_unit(field_name)
Return the unit of the field field_name of the resource.
EXAMPLES:
>>> entry = Entry.create_example()
>>> entry.field_unit('E')
'V'
- property fields
Return the fields of the resource’s schema.
This is a convenience property that returns self.resource.schema.fields.
EXAMPLES:
>>> entry = Entry.create_example()
>>> entry.fields
[{'name': 't', 'type': 'number', 'unit': 's'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]
- classmethod from_csv(csvname, encoding=None, header_lines=None, column_header_lines=None, decimal=None, delimiters=None, device=None)
Returns an entry constructed from a CSV.
The file is always parsed through a loader which captures the file’s structure (delimiter, decimal separator, header, column headers) in the entry’s metadata under dsvDescription.
A device can be specified to select a device-specific loader (e.g., 'eclab' or 'gamry').
EXAMPLES:
>>> from unitpackage.entry import Entry
>>> entry = Entry.from_csv(csvname='examples/from_csv/from_csv.csv')
>>> entry
Entry('from_csv')
The loader’s file structure information is stored in the metadata:
>>> entry.metadata['dsvDescription']['loader']
'BaseLoader'
>>> entry.metadata['dsvDescription']['delimiter']
','
Important
Upper case filenames are converted to lower case entry identifiers!
A filename containing upper case characters:
>>> entry = Entry.from_csv(csvname='examples/from_csv/UpperCase.csv')
>>> entry
Entry('uppercase')
Entries can also be constructed from CSV files with a more complex structure, such as multiple header lines:
>>> entry = Entry.from_csv(csvname='examples/from_csv/from_csv_multiple_headers.csv', column_header_lines=2)
>>> entry.fields
[{'name': 'E / V', 'type': 'integer'},
{'name': 'j / A / cm2', 'type': 'integer'}]
A device-specific loader can be used to parse instrument files:
>>> entry = Entry.from_csv(csvname='test/loader_data/eclab_cv.mpt', device='eclab')
>>> entry
Entry('eclab_cv')
>>> entry.df
mode ox/red error ... (Q-Qo)/C I Range P/W
0 2 1 0 ... 0.000000e+00 41 0.000001
1 2 0 0 ... -3.622761e-08 41 -0.000003
...
>>> entry.metadata['dsvDescription']['loader']
'ECLabLoader'
>>> entry.metadata['dsvDescription']['delimiter']
'\t'
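To give an idea of what capturing a file's structure can involve, the sketch below detects the delimiter of a small tab-separated sample and records it next to the column headers. It uses csv.Sniffer from the Python standard library as a swapped-in technique; unitpackage's loaders are not claimed to work this way. The sample text and the columnHeaders key are assumptions made for illustration.

```python
import csv
import io

# Made-up sample resembling a tab-separated instrument file.
sample = "E / V\tj / A / cm2\n0.1\t2.5\n0.2\t3.5\n"

# Restrict the candidate delimiters to make the detection unambiguous.
dialect = csv.Sniffer().sniff(sample, delimiters=",;\t")
rows = list(csv.reader(io.StringIO(sample), dialect))

# Record the detected structure in a plain dict (key names are illustrative).
description = {
    "delimiter": dialect.delimiter,
    "columnHeaders": rows[0],
}
print(description)
```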
- classmethod from_df(df, *, basename)
Returns an entry constructed from a pandas dataframe. A basename for the entry must be provided. The name must be lower-case and contain only alphanumeric characters along with ., _ or - characters. (Upper-case characters are converted to lower case.)
EXAMPLES:
>>> import pandas as pd
>>> from unitpackage.entry import Entry
>>> df = pd.DataFrame({'x':[1,2,3], 'y':[2,3,4]})
>>> entry = Entry.from_df(df=df, basename='test_df')
>>> entry
Entry('test_df')
Metadata and field descriptions can be added:
>>> import os
>>> fields = [{'name':'x', 'unit': 'm'}, {'name':'P', 'unit': 'um'}, {'name':'E', 'unit': 'V'}]
>>> metadata = {'user':'Max Doe'}
>>> entry = Entry.from_df(df=df, basename='test_df').update_fields(fields=fields)
>>> entry.metadata.from_dict(metadata)
>>> entry.metadata
{'user': 'Max Doe'}
Save the entry:
>>> entry.save(outdir='./test/generated/from_df')
Important
Basenames with upper case characters are stored with lower case characters! To separate words use underscores.
The basename is always converted to a lower-case entry identifier:
>>> import pandas as pd
>>> from unitpackage.entry import Entry
>>> df = pd.DataFrame({'x':[1,2,3], 'y':[2,3,4]})
>>> entry = Entry.from_df(df=df, basename='TEST_DF')
>>> entry
Entry('test_df')
TESTS:
Verify that all fields are properly created even when they are not specified as fields:
>>> fields = [{'name':'x', 'unit': 'm'}, {'name':'P', 'unit': 'um'}, {'name':'E', 'unit': 'V'}]
>>> entry = Entry.from_df(df=df, basename='test_df').update_fields(fields=fields)
>>> entry.fields
[{'name': 'x', 'type': 'integer', 'unit': 'm'},
{'name': 'y', 'type': 'integer'}]
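The naming rule for basenames can be checked up front. The following sketch lower-cases a candidate name and tests it against a regular expression derived from the rule stated above. The helper normalize_basename and the pattern are assumptions for illustration; how unitpackage treats names with other characters is not covered here.

```python
import re

# Pattern assumed from the stated rule: lower-case alphanumerics plus ".", "_" and "-".
VALID_NAME = re.compile(r"^[a-z0-9._-]+$")


def normalize_basename(basename):
    """Lower-case a basename and check it against the naming rule."""
    name = basename.lower()
    if not VALID_NAME.match(name):
        raise ValueError(f"invalid basename: {basename!r}")
    return name


print(normalize_basename("TEST_DF"))  # test_df
```

A name containing a space, such as 'my file', raises ValueError in this sketch.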
- classmethod from_local(filename)
Return an entry from the file filename containing a frictionless Data Package. The Data Package must contain a single resource.
Otherwise, use collection.from_local_file to create a collection from all resources within.
EXAMPLES:
>>> from unitpackage.entry import Entry
>>> entry = Entry.from_local('./examples/local/no_bibliography/no_bibliography.json')
>>> entry
Entry('no_bibliography')
- property identifier
Return a unique identifier for this entry, i.e., its basename.
EXAMPLES:
>>> entry = Entry.create_example()
>>> entry.identifier
'alves_2011_electrochemistry_6010_f1a_solid'
- load_metadata(filename, file_format=None, key=None)
Load metadata from a file and return self for method chaining.
The file format is auto-detected from the extension if not specified. Supported formats are ‘yaml’ and ‘json’.
EXAMPLES:
Load metadata from a YAML file:
>>> import os
>>> import tempfile
>>> import yaml
>>> entry = Entry.create_example()
>>> with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as f:
...     yaml.dump({'source': {'citationKey': 'chain_test'}}, f)
...     temp_path = f.name
>>> entry.load_metadata(temp_path, key='echemdb').metadata.echemdb.source.citationKey
'chain_test'
>>> os.unlink(temp_path)
Load metadata from a JSON file with auto-detection:
>>> import os
>>> import json
>>> import tempfile
>>> entry = Entry.create_example()
>>> with tempfile.NamedTemporaryFile(mode='w', suffix='.json', delete=False) as f:
...     json.dump({'custom': {'data': 'value'}}, f)
...     temp_path = f.name
>>> entry.load_metadata(temp_path).metadata.custom.data
'value'
>>> os.unlink(temp_path)
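Format auto-detection and the effect of the key argument can be pictured with two small helpers. This is a sketch under assumptions: detect_format and nest_under_key are hypothetical names, and accepting the .yml extension is a guess that is not confirmed above.

```python
import os


def detect_format(filename, file_format=None):
    """Return 'yaml' or 'json', inferring it from the extension if needed."""
    if file_format is not None:
        return file_format
    extension = os.path.splitext(filename)[1].lower()
    # The '.yml' alias is an assumption of this sketch.
    formats = {".yaml": "yaml", ".yml": "yaml", ".json": "json"}
    if extension not in formats:
        raise ValueError(f"cannot detect format of {filename!r}")
    return formats[extension]


def nest_under_key(metadata, key=None):
    """Place loaded metadata under key, or return it unchanged without a key."""
    return {key: metadata} if key else metadata


print(detect_format("metadata.yaml"))  # yaml
print(nest_under_key({"source": {"citationKey": "chain_test"}}, key="echemdb"))
```

Nesting under the key matches the YAML example above, where content loaded with key='echemdb' is read back as metadata.echemdb.source.citationKey.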
- property metadata
Access and manage entry metadata.
Returns a MetadataDescriptor that supports both dict and attribute-style access. Allows loading metadata from various sources. Modifications are applied in-place.
The descriptor is cached for efficiency, but still reflects metadata changes since it delegates to the underlying resource.
EXAMPLES:
>>> entry = Entry.create_example()
>>> entry.metadata['echemdb']['source']['citationKey']
'alves_2011_electrochemistry_6010'
>>> entry.metadata.echemdb['source']['citationKey']
'alves_2011_electrochemistry_6010'
Load metadata from a dict:
>>> new_entry = Entry.create_example()
>>> new_entry.metadata.from_dict({'echemdb': {'test': 'data'}})
>>> new_entry.metadata['echemdb']['test']
'data'
The descriptor is cached but still sees metadata updates:
>>> entry = Entry.create_example()
>>> descriptor1 = entry.metadata
>>> entry.metadata.from_dict({'custom': {'key': 'value'}})
>>> descriptor2 = entry.metadata
>>> descriptor1 is descriptor2
True
>>> descriptor1['custom']['key']
'value'
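Why a cached descriptor can still reflect later changes is easiest to see with a wrapper that holds a reference to the underlying dict rather than a copy. The class AttrDict below is a hypothetical, read-only illustration of combined dict and attribute-style access; it is not the MetadataDescriptor class.

```python
class AttrDict:
    """Read-only view giving dict and attribute-style access to nested dicts."""

    def __init__(self, data):
        # Keep a reference, not a copy, so later changes stay visible.
        self._data = data

    def __getitem__(self, key):
        value = self._data[key]
        return AttrDict(value) if isinstance(value, dict) else value

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


store = {"echemdb": {"source": {"citationKey": "alves_2011_electrochemistry_6010"}}}
view = AttrDict(store)
store["custom"] = {"key": "value"}  # modify the underlying dict after wrapping

print(view.echemdb.source.citationKey)
print(view["custom"]["key"])
```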
- plot(x_label=None, y_label=None, name=None)
Return a 2D plot of this entry.
The default plot is constructed from the first two columns of the dataframe.
EXAMPLES:
>>> entry = Entry.create_example()
>>> entry.plot()
Figure(...)
The 2D plot can also be returned with custom axes chosen from the fields available in the resource:
>>> entry.plot(x_label='j', y_label='E')
Figure(...)
A plot from data without units:
>>> import pandas as pd
>>> data = {'t': [0, 1, 2], 'E': [0.0, 1.0, 2.0]}
>>> df = pd.DataFrame(data)
>>> entry = Entry.from_df(df=df, basename='test_df')
>>> entry.plot()
Figure(...)
- remove_column(field_name)
Removes a single column from the dataframe and returns an updated entry.
EXAMPLES:
>>> entry = Entry.create_example()
>>> entry.df
t E j
0 0.000000 -0.103158 -0.998277
1 0.020000 -0.102158 -0.981762
...
>>> new_entry = entry.remove_column('E')
>>> new_entry.df
t j
0 0.000000 -0.998277
1 0.020000 -0.981762
...
>>> 'E' in new_entry.df.columns
False
- remove_columns(*field_names)
Removes specified columns from the dataframe and returns an updated entry.
EXAMPLES:
>>> entry = Entry.create_example()
>>> entry.df
t E j
0 0.000000 -0.103158 -0.998277
1 0.020000 -0.102158 -0.981762
...
>>> new_entry = entry.remove_columns('E', 'j')
>>> new_entry.df
t
0 0.000000
1 0.020000
...
>>> 'E' in new_entry.df.columns
False
- rename_field(field_name, new_name, keep_original_name_as=None)
Returns an Entry with a single renamed field and corresponding dataframe column name.
The original field name can optionally be kept in a new key.
EXAMPLES:
The original dataframe:
>>> from unitpackage.entry import Entry
>>> entry = Entry.create_example()
>>> entry.df
t E j
0 0.000000 -0.103158 -0.998277
1 0.020000 -0.102158 -0.981762
...
Dataframe with a single modified column name:
>>> renamed_entry = entry.rename_field('t', 't_rel', keep_original_name_as='originalName')
>>> renamed_entry.df
t_rel E j
0 0.000000 -0.103158 -0.998277
1 0.020000 -0.102158 -0.981762
...
Updated fields of the resource:
>>> renamed_entry.fields
[{'name': 't_rel', 'type': 'number', 'unit': 's', 'originalName': 't'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]
TESTS:
Renaming a non-existing field has no effect:
>>> renamed_entry = entry.rename_field('x', 'y', keep_original_name_as='originalName')
>>> renamed_entry.fields
[{'name': 't', 'type': 'number', 'unit': 's'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]
- rename_fields(field_names, keep_original_name_as=None)
Returns an Entry with updated field names and dataframe column names. Provide a dict, where the key is the previous field name and the value the new name, such as {'t':'t_rel', 'E':'E_we'}. The original field names can be kept in a new key.
EXAMPLES:
The original dataframe:
>>> from unitpackage.entry import Entry
>>> entry = Entry.create_example()
>>> entry.df
t E j
0 0.000000 -0.103158 -0.998277
1 0.020000 -0.102158 -0.981762
...
Dataframe with modified column names:
>>> renamed_entry = entry.rename_fields({'t': 't_rel', 'E': 'E_we'}, keep_original_name_as='originalName')
>>> renamed_entry.df
t_rel E_we j
0 0.000000 -0.103158 -0.998277
1 0.020000 -0.102158 -0.981762
...
Updated fields of the resource:
>>> renamed_entry.fields
[{'name': 't_rel', 'type': 'number', 'unit': 's', 'originalName': 't'},
{'name': 'E_we', 'type': 'number', 'unit': 'V', 'reference': 'RHE', 'originalName': 'E'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]
TESTS:
Names of non-existing fields in the dict are ignored:
>>> renamed_entry = entry.rename_fields({'t': 't_rel', 'x':'y'}, keep_original_name_as='originalName')
>>> renamed_entry.fields
[{'name': 't_rel', 'type': 'number', 'unit': 's', 'originalName': 't'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]
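The renaming logic on the field descriptors can be sketched with plain dictionaries. The helper rename_fields_sketch is hypothetical and only covers the schema side; renaming the dataframe columns is left out. It reproduces the behaviour shown in the test above, where the unknown field 'x' is skipped.

```python
def rename_fields_sketch(fields, field_names, keep_original_name_as=None):
    """Return new field descriptors renamed according to field_names."""
    renamed = []
    for field in fields:
        new_field = dict(field)  # copy, so the input stays unchanged
        old_name = field["name"]
        if old_name in field_names:
            new_field["name"] = field_names[old_name]
            if keep_original_name_as:
                # Remember the previous name under the requested key.
                new_field[keep_original_name_as] = old_name
        renamed.append(new_field)
    return renamed


fields = [
    {"name": "t", "type": "number", "unit": "s"},
    {"name": "E", "type": "number", "unit": "V", "reference": "RHE"},
    {"name": "j", "type": "number", "unit": "A / m2"},
]
renamed = rename_fields_sketch(
    fields, {"t": "t_rel", "x": "y"}, keep_original_name_as="originalName"
)
print(renamed[0])
```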
- rescale(units)
Returns a rescaled Entry with axes in the specified units. Provide a dict, where the key is the axis name and the value the new unit, such as {'j': 'uA / cm2', 't': 'h'}.
EXAMPLES:
The units without any rescaling:
>>> entry = Entry.create_example()
>>> entry.fields
[{'name': 't', 'type': 'number', 'unit': 's'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]
A rescaled entry using different units:
>>> rescaled_entry = entry.rescale({'j':'uA / cm2', 't':'h'})
>>> rescaled_entry.fields
[{'name': 't', 'type': 'number', 'unit': 'h'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'uA / cm2'}]
The values in the data frame are scaled to match the new units:
>>> rescaled_entry.df
t E j
0 0.000000 -0.103158 -99.827664
1 0.000006 -0.102158 -98.176205
...
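The numbers in the rescaled data frame follow from simple conversion factors: 1 A / m2 equals 100 uA / cm2, and 1 s equals 1/3600 h. The sketch below redoes this arithmetic with a hand-written table. TO_BASE and rescale_values are hypothetical names; a real implementation would rely on a units library rather than such a table.

```python
# Hypothetical factors converting each unit to a common base unit.
TO_BASE = {
    "s": 1.0,
    "h": 3600.0,
    "A / m2": 1.0,
    "uA / cm2": 1e-6 / 1e-4,  # 1 uA / cm2 expressed in A / m2
}


def rescale_values(values, old_unit, new_unit):
    """Convert values from old_unit to new_unit using the table above."""
    factor = TO_BASE[old_unit] / TO_BASE[new_unit]
    return [v * factor for v in values]


j = rescale_values([-0.998277, -0.981762], "A / m2", "uA / cm2")
t = rescale_values([0.0, 0.02], "s", "h")
print(j)
print(t)
```

The first current density comes out close to -99.83 uA / cm2 and the second time stamp close to 0.000006 h, in agreement with the table above up to the rounding of the displayed input values.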
- save(*, outdir, basename=None)
Create a Data Package, i.e., a CSV file and a JSON file, in the directory outdir.
EXAMPLES:
The output files are named identifier.csv and identifier.json using the identifier of the original resource:
>>> import os
>>> entry = Entry.create_example()
>>> entry.save(outdir='./test/generated')
>>> basename = entry.identifier
>>> os.path.exists(f'test/generated/{basename}.json') and os.path.exists(f'test/generated/{basename}.csv')
True
When a basename is set, the files are named basename.csv and basename.json.
Note
For a valid frictionless Data Package the basename MUST be lower-case and contain only alphanumeric characters along with ., _ or - characters.
A valid basename:
>>> import os
>>> entry = Entry.create_example()
>>> basename = 'save_basename'
>>> entry.save(basename=basename, outdir='./test/generated')
>>> os.path.exists(f'test/generated/{basename}.json') and os.path.exists(f'test/generated/{basename}.csv')
True
Upper case characters are saved lower case:
>>> import os
>>> import pandas as pd
>>> from unitpackage.entry import Entry
>>> df = pd.DataFrame({'x':[1,2,3], 'y':[2,3,4]})
>>> basename = 'Upper_Case_Save'
>>> entry = Entry.from_df(df=df, basename=basename)
>>> entry.save(outdir='./test/generated')
>>> os.path.exists(f'test/generated/{basename.lower()}.json') and os.path.exists(f'test/generated/{basename.lower()}.csv')
True
>>> new_entry = Entry.from_local(f'test/generated/{basename.lower()}.json')
>>> new_entry.resource
{'name': 'upper_case_save', 'type': 'table', 'path': 'upper_case_save.csv', ...
TESTS:
Save the entry as a Data Package with metadata containing a datetime, which is not natively supported by JSON:
>>> import os
>>> from datetime import datetime
>>> import pandas as pd
>>> from unitpackage.entry import Entry
>>> df = pd.DataFrame({'x':[1,2,3], 'y':[2,3,4]})
>>> basename = 'save_datetime'
>>> entry = Entry.from_df(df=df, basename=basename)
>>> entry.metadata.from_dict({'currentTime':datetime.now()})
>>> entry.save(outdir='./test/generated')
>>> os.path.exists(f'test/generated/{basename}.json') and os.path.exists(f'test/generated/{basename}.csv')
True
- update_fields(fields)
Return a new entry with updated fields in the resource.
The fields list must be structured such as [{'name':'E', 'unit': 'mV'}, {'name':'T', 'unit': 'K'}].
EXAMPLES:
>>> from unitpackage.entry import Entry
>>> entry = Entry.create_example()
>>> entry.fields
[{'name': 't', 'type': 'number', 'unit': 's'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]
Updating the fields returns a new entry with updated field metadata:
>>> fields = [{'name':'E', 'unit': 'mV'},
... {'name':'j', 'unit': 'uA / cm2'},
... {'name':'x', 'unit': 'm'}]
>>> new_entry = entry.update_fields(fields)
>>> new_entry.fields
[{'name': 't', 'type': 'number', 'unit': 's'},
{'name': 'E', 'type': 'number', 'unit': 'mV', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'uA / cm2'}]
The original entry remains unchanged:
>>> entry.fields
[{'name': 't', 'type': 'number', 'unit': 's'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]
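The merge semantics visible in this example can be sketched with plain dictionaries: updates are matched to existing fields by name, existing keys such as reference are preserved, updates for unknown names such as 'x' are dropped, and the input is left untouched. The helper update_fields_sketch is hypothetical and not the library's implementation.

```python
def update_fields_sketch(fields, updates):
    """Merge updates into field descriptors by name without modifying the input."""
    updates_by_name = {update["name"]: update for update in updates}
    # Build new dicts so the original descriptors are not modified.
    return [{**field, **updates_by_name.get(field["name"], {})} for field in fields]


fields = [
    {"name": "t", "type": "number", "unit": "s"},
    {"name": "E", "type": "number", "unit": "V", "reference": "RHE"},
    {"name": "j", "type": "number", "unit": "A / m2"},
]
updates = [
    {"name": "E", "unit": "mV"},
    {"name": "j", "unit": "uA / cm2"},
    {"name": "x", "unit": "m"},
]
new_fields = update_fields_sketch(fields, updates)
print(new_fields[1])
```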