unitpackage.collection

A collection of frictionless Resources that can be accessed and stored as a [frictionless Data Package](https://github.com/frictionlessdata/datapackage-py).

EXAMPLES:

Create a collection from frictionless Resources stored within local frictionless Data Packages in the data/ directory:

>>> collection = Collection.from_local('data/')

Create a collection from the Data Packages published in the echemdb data repository, which are displayed on the echemdb website:

>>> collection = Collection.from_remote()

Search the collection for entries, for example those from a single publication, by providing its DOI:

>>> collection.filter(lambda entry: entry.echemdb.source.url == 'https://doi.org/10.1039/C0CP01001D')
[Entry('alves_2011_electrochemistry_6010_f1a_solid'), ...
class unitpackage.collection.Collection(package=None)

A collection of frictionless Resources that can be accessed and stored as a [frictionless Data Package](https://github.com/frictionlessdata/datapackage-py).

EXAMPLES:

An empty collection:

>>> collection = Collection([])
>>> collection
[]

An example collection (only available from the development environment):

>>> collection = Collection.create_example()
>>> collection.package.resource_names
['alves_2011_electrochemistry_6010_f1a_solid',
'engstfeld_2018_polycrystalline_17743_f4b_1',
'no_bibliography']

Collections must contain Resources with unique identifiers:

>>> db = Collection.from_local("./examples/duplicates")
Traceback (most recent call last):
...
ValueError: Collection contains duplicate entries: ['duplicate']
class Entry(resource)

A frictionless Resource describing tabulated data.

EXAMPLES:

Entries can be directly created from a frictionless Data Package containing a single frictionless Resource:

>>> from unitpackage.entry import Entry
>>> entry = Entry.from_local('./examples/local/no_bibliography/no_bibliography.json')
>>> entry
Entry('no_bibliography')

or directly from a frictionless Resource:

>>> from unitpackage.entry import Entry
>>> from frictionless import Resource
>>> entry = Entry(Resource('./examples/local/no_bibliography/no_bibliography.json'))
>>> entry
Entry('no_bibliography')

Entries can also be created by other means, such as from a CSV with Entry.from_csv or from a pandas dataframe with Entry.from_df.

Normally, entries are obtained by opening a Collection of entries:

>>> from unitpackage.collection import Collection
>>> collection = Collection.create_example()
>>> entry = next(iter(collection))
add_columns(df, new_fields)

Adds a column to the dataframe with specified field properties and returns an updated entry.

EXAMPLES:

>>> entry = Entry.create_example()
>>> entry.df
              t         E         j
0      0.000000 -0.103158 -0.998277
1      0.020000 -0.102158 -0.981762
...

A new column can be derived from existing columns, with its unit computed from the units of the original fields:

>>> import pandas as pd
>>> import astropy.units as u
>>> df = pd.DataFrame()
>>> df['P/A'] = entry.df['E'] * entry.df['j']
>>> new_field_unit = u.Unit(entry.field_unit('E')) * u.Unit(entry.field_unit('j'))
>>> new_entry = entry.add_columns(df['P/A'], new_fields=[{'name':'P/A', 'unit': new_field_unit}])
>>> new_entry.df
              t         E         j       P/A
0      0.000000 -0.103158 -0.998277  0.102981
1      0.020000 -0.102158 -0.981762  0.100295
...

>>> new_entry.field_unit('P/A')
Unit("A V / m2")

TESTS:

Validate that the identifier is preserved:

>>> new_entry.identifier
'alves_2011_electrochemistry_6010_f1a_solid'
add_offset(field_name=None, offset=None, unit='')

Return an entry with an offset (in the specified unit) applied to a specified field of the entry. The offset properties are stored in the field's metadata.

If offsets are applied consecutively, the stored value is updated (i.e., the offsets are summed).

EXAMPLES:

>>> from unitpackage.entry import Entry
>>> entry = Entry.create_example()
>>> entry.df.head()
      t         E         j
0  0.00 -0.103158 -0.998277
1  0.02 -0.102158 -0.981762
...

>>> new_entry = entry.add_offset('E', 0.1, 'V')
>>> new_entry.df.head()
      t         E         j
0  0.00 -0.003158 -0.998277
1  0.02 -0.002158 -0.981762
...

>>> new_entry.resource.schema.get_field('E')
{'name': 'E',
'type': 'number',
'unit': 'V',
'reference': 'RHE',
'offset': {'value': 0.1, 'unit': 'V'}}

An offset with a different unit than that of the field:

>>> new_entry = entry.add_offset('E', 250, 'mV')
>>> new_entry.df.head()
      t         E         j
0  0.00  0.146842 -0.998277
1  0.02  0.147842 -0.981762
...

A consecutively added offset:

>>> new_entry_1 = new_entry.add_offset('E', 0.150, 'V')
>>> new_entry_1.df.head()
      t         E         j
0  0.00  0.296842 -0.998277
1  0.02  0.297842 -0.981762
...

>>> new_entry_1.resource.schema.get_field('E')
{'name': 'E',
'type': 'number',
'unit': 'V',
'reference': 'RHE',
'offset': {'value': 0.4, 'unit': 'V'}}
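
The accumulated offset can be verified by hand from the values above (plain Python, independent of the library):

```python
# 250 mV converted to the field unit V, plus a second offset of 0.150 V:
first_offset = 250 * 1e-3   # 0.25 V
second_offset = 0.150       # V
total_offset = first_offset + second_offset
assert abs(total_offset - 0.4) < 1e-12  # matches the stored offset value
# Applied to the first potential value of the original dataframe:
assert abs(round(-0.103158 + total_offset, 6) - 0.296842) < 1e-9
```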

If no unit is provided, the field unit is used instead:

>>> new_entry_2 = new_entry.add_offset('E', 0.150)
>>> new_entry_2.df.head()
      t         E         j
0  0.00  0.296842 -0.998277
1  0.02  0.297842 -0.981762
...
apply_scaling_factor(field_name=None, scaling_factor=None)

Return an entry with a scaling_factor applied to a specified field of the entry. The scaling factor is stored in the field's metadata.

If scaling factors are applied consecutively, the value is updated (i.e., the cumulative scaling factor is the product of the individual factors).

EXAMPLES:

>>> from unitpackage.entry import Entry
>>> entry = Entry.create_example()
>>> entry.df.head()
      t         E         j
0  0.00 -0.103158 -0.998277
1  0.02 -0.102158 -0.981762
...

>>> new_entry = entry.apply_scaling_factor('j', 2)
>>> new_entry.df.head()
      t         E         j
0  0.00 -0.103158 -1.996553
1  0.02 -0.102158 -1.963524
...

>>> new_entry.resource.schema.get_field('j')
{'name': 'j',
'type': 'number',
'unit': 'A / m2',
'scalingFactor': {'value': 2.0}}

A consecutively applied scaling factor:

>>> new_entry_1 = new_entry.apply_scaling_factor('j', 3)
>>> new_entry_1.df.head()
      t         E         j
0  0.00 -0.103158 -5.989660
1  0.02 -0.102158 -5.890572
...

>>> new_entry_1.resource.schema.get_field('j')
{'name': 'j',
'type': 'number',
'unit': 'A / m2',
'scalingFactor': {'value': 6.0}}
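
The cumulative factor is simply the product of the individual factors, as can be checked by hand (plain Python, independent of the library; the displayed dataframe values are truncated):

```python
# Consecutive scaling factors multiply: 2 followed by 3 gives 6.
factors = [2.0, 3.0]
cumulative = 1.0
for factor in factors:
    cumulative *= factor
assert cumulative == 6.0  # the stored scalingFactor value
# Applied to the first current density shown above:
assert abs(round(-0.998277 * cumulative, 5) - (-5.98966)) < 1e-9
```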

Scaling by a float:

>>> new_entry_2 = entry.apply_scaling_factor('E', 1e3)
>>> new_entry_2.df.head()
      t           E         j
0  0.00 -103.158422 -0.998277
1  0.02 -102.158422 -0.981762
...
classmethod create_example(name=None)

Return an example entry for use in automated tests.

The examples are created from Data Packages in the unitpackage’s examples directory. These are only available from the development environment.

EXAMPLES:

>>> Entry.create_example()
Entry('alves_2011_electrochemistry_6010_f1a_solid')

>>> Entry.create_example(name="no_bibliography")
Entry('no_bibliography')
default_metadata_key = ''

Default metadata key to use when accessing the descriptor. If it is an empty string, the entire metadata dict is used. Subclasses can override this.
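
The lookup described above amounts to something like the following (a hypothetical sketch of the described behavior, not the library's actual code; resolve_metadata is an illustrative helper):

```python
def resolve_metadata(metadata, default_metadata_key=""):
    """Return the entire metadata dict for an empty key, else the sub-dict under the key."""
    if not default_metadata_key:
        return metadata
    return metadata[default_metadata_key]

meta = {"echemdb": {"source": {"citationKey": "alves_2011_electrochemistry_6010"}}}
assert resolve_metadata(meta) is meta  # empty key: the entire dict
assert resolve_metadata(meta, "echemdb")["source"]["citationKey"] == "alves_2011_electrochemistry_6010"
```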

property df

Return the data of this entry’s resource as a data frame.

EXAMPLES:

>>> entry = Entry.create_example()
>>> entry.df
              t         E         j
0      0.000000 -0.103158 -0.998277
1      0.020000 -0.102158 -0.981762
...

The units and descriptions of the axes in the data frame can be recovered:

>>> entry.fields
[{'name': 't', 'type': 'number', 'unit': 's'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]

TESTS:

>>> import pandas as pd
>>> from unitpackage.entry import Entry
>>> df = pd.DataFrame({'x':[1,2,3], 'y':[2,3,4]})
>>> entry = Entry.from_df(df=df, basename='test_df')
>>> entry.df
   x  y
0  1  2
1  2  3
2  3  4
field_unit(field_name)

Return the unit of the field field_name of the resource.

EXAMPLES:

>>> entry = Entry.create_example()
>>> entry.field_unit('E')
'V'
property fields

Return the fields of the resource’s schema.

This is a convenience property that returns self.resource.schema.fields.

EXAMPLES:

>>> entry = Entry.create_example()
>>> entry.fields
[{'name': 't', 'type': 'number', 'unit': 's'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]
classmethod from_csv(csvname, encoding=None, header_lines=None, column_header_lines=None, decimal=None, delimiters=None, device=None)

Returns an entry constructed from a CSV.

The file is always parsed through a loader, which captures the file’s structure (delimiter, decimal separator, header, column headers) in the entry’s metadata under dsvDescription.

A device can be specified to select a device-specific loader (e.g., 'eclab' or 'gamry').

EXAMPLES:

>>> from unitpackage.entry import Entry
>>> entry = Entry.from_csv(csvname='examples/from_csv/from_csv.csv')
>>> entry
Entry('from_csv')

The loader’s file structure information is stored in the metadata:

>>> entry.metadata['dsvDescription']['loader']
'BaseLoader'
>>> entry.metadata['dsvDescription']['delimiter']
','

Important

Upper case filenames are converted to lower case entry identifiers!

A filename containing upper case characters:

>>> entry = Entry.from_csv(csvname='examples/from_csv/UpperCase.csv')
>>> entry
Entry('uppercase')

Entries can also be constructed from CSVs with a more complex structure, such as multiple header lines:

>>> entry = Entry.from_csv(csvname='examples/from_csv/from_csv_multiple_headers.csv', column_header_lines=2)
>>> entry.fields
[{'name': 'E / V', 'type': 'integer'},
{'name': 'j / A / cm2', 'type': 'integer'}]

A device-specific loader can be used to parse instrument files:

>>> entry = Entry.from_csv(csvname='test/loader_data/eclab_cv.mpt', device='eclab')
>>> entry
Entry('eclab_cv')

>>> entry.df
    mode  ox/red  error  ...      (Q-Qo)/C  I Range       P/W
0      2       1      0  ...  0.000000e+00       41  0.000001
1      2       0      0  ... -3.622761e-08       41 -0.000003
...

>>> entry.metadata['dsvDescription']['loader']
'ECLabLoader'
>>> entry.metadata['dsvDescription']['delimiter']
'\t'
classmethod from_df(df, *, basename)

Returns an entry constructed from a pandas dataframe. A basename for the entry must be provided. The name must be lower case and contain only alphanumeric characters along with ., _ or - characters. (Upper case characters are converted to lower case.)

EXAMPLES:

>>> import pandas as pd
>>> from unitpackage.entry import Entry
>>> df = pd.DataFrame({'x':[1,2,3], 'y':[2,3,4]})
>>> entry = Entry.from_df(df=df, basename='test_df')
>>> entry
Entry('test_df')

Metadata and field descriptions can be added:

>>> import os
>>> fields = [{'name':'x', 'unit': 'm'}, {'name':'P', 'unit': 'um'}, {'name':'E', 'unit': 'V'}]
>>> metadata = {'user':'Max Doe'}
>>> entry = Entry.from_df(df=df, basename='test_df').update_fields(fields=fields)
>>> entry.metadata.from_dict(metadata)
>>> entry.metadata
{'user': 'Max Doe'}

Save the entry:

>>> entry.save(outdir='./test/generated/from_df')

Important

Basenames with upper case characters are stored with lower case characters! To separate words use underscores.

The basename will always be converted to lowercase entry identifiers:

>>> import pandas as pd
>>> from unitpackage.entry import Entry
>>> df = pd.DataFrame({'x':[1,2,3], 'y':[2,3,4]})
>>> entry = Entry.from_df(df=df, basename='TEST_DF')
>>> entry
Entry('test_df')

TESTS:

Verify that fields are created for all dataframe columns even when they are not included in the provided fields, and that provided fields without a matching column are ignored:

>>> fields = [{'name':'x', 'unit': 'm'}, {'name':'P', 'unit': 'um'}, {'name':'E', 'unit': 'V'}]
>>> entry = Entry.from_df(df=df, basename='test_df').update_fields(fields=fields)
>>> entry.fields
[{'name': 'x', 'type': 'integer', 'unit': 'm'}, {'name': 'y', 'type': 'integer'}]
classmethod from_local(filename)

Return an entry from a file filename containing a frictionless Data Package. The Data Package must contain a single resource.

Otherwise, use Collection.from_local_file to create a collection from all resources within.

EXAMPLES:

>>> from unitpackage.entry import Entry
>>> entry = Entry.from_local('./examples/local/no_bibliography/no_bibliography.json')
>>> entry
Entry('no_bibliography')
property identifier

Return a unique identifier for this entry, i.e., its basename.

EXAMPLES:

>>> entry = Entry.create_example()
>>> entry.identifier
'alves_2011_electrochemistry_6010_f1a_solid'
load_metadata(filename, file_format=None, key=None)

Load metadata from a file and return self for method chaining.

The file format is auto-detected from the extension if not specified. Supported formats are ‘yaml’ and ‘json’.

EXAMPLES:

Load metadata from a YAML file:

>>> import os
>>> import tempfile
>>> import yaml
>>> entry = Entry.create_example()
>>> with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as f:
...     yaml.dump({'source': {'citationKey': 'chain_test'}}, f)
...     temp_path = f.name
>>> entry.load_metadata(temp_path, key='echemdb').metadata.echemdb.source.citationKey
'chain_test'
>>> os.unlink(temp_path)

Load metadata from a JSON file with auto-detection:

>>> import os
>>> import json
>>> import tempfile
>>> entry = Entry.create_example()
>>> with tempfile.NamedTemporaryFile(mode='w', suffix='.json', delete=False) as f:
...     json.dump({'custom': {'data': 'value'}}, f)
...     temp_path = f.name
>>> entry.load_metadata(temp_path).metadata.custom.data
'value'
>>> os.unlink(temp_path)
property metadata

Access and manage entry metadata.

Returns a MetadataDescriptor that supports both dict and attribute-style access. Allows loading metadata from various sources. Modifications are applied in-place.

The descriptor is cached for efficiency, but still reflects metadata changes since it delegates to the underlying resource.

EXAMPLES:

>>> entry = Entry.create_example()
>>> entry.metadata['echemdb']['source']['citationKey']
'alves_2011_electrochemistry_6010'

>>> entry.metadata.echemdb['source']['citationKey']
'alves_2011_electrochemistry_6010'

Load metadata from a dict:

>>> new_entry = Entry.create_example()
>>> new_entry.metadata.from_dict({'echemdb': {'test': 'data'}})
>>> new_entry.metadata['echemdb']['test']
'data'

The descriptor is cached but still sees metadata updates:

>>> entry = Entry.create_example()
>>> descriptor1 = entry.metadata
>>> entry.metadata.from_dict({'custom': {'key': 'value'}})
>>> descriptor2 = entry.metadata
>>> descriptor1 is descriptor2
True
>>> descriptor1['custom']['key']
'value'
plot(x_label=None, y_label=None, name=None)

Return a 2D plot of this entry.

The default plot is constructed from the first two columns of the dataframe.

EXAMPLES:

>>> entry = Entry.create_example()
>>> entry.plot()
Figure(...)

The axes of the 2D plot can be chosen from the fields available in the resource:

>>> entry.plot(x_label='j', y_label='E')
Figure(...)

A plot from data without units:

>>> import pandas as pd
>>> data = {'t': [0, 1, 2], 'E': [0.0, 1.0, 2.0]}
>>> df = pd.DataFrame(data)
>>> entry = Entry.from_df(df=df, basename='test_df')
>>> entry.plot()
Figure(...)
remove_column(field_name)

Removes a single column from the dataframe and returns an updated entry.

EXAMPLES:

>>> entry = Entry.create_example()
>>> entry.df
              t         E         j
0      0.000000 -0.103158 -0.998277
1      0.020000 -0.102158 -0.981762
...

>>> new_entry = entry.remove_column('E')
>>> new_entry.df
              t         j
0      0.000000 -0.998277
1      0.020000 -0.981762
...

>>> 'E' in new_entry.df.columns
False
remove_columns(*field_names)

Removes specified columns from the dataframe and returns an updated entry.

EXAMPLES:

>>> entry = Entry.create_example()
>>> entry.df
              t         E         j
0      0.000000 -0.103158 -0.998277
1      0.020000 -0.102158 -0.981762
...

>>> new_entry = entry.remove_columns('E', 'j')
>>> new_entry.df
              t
0      0.000000
1      0.020000
...

>>> 'E' in new_entry.df.columns
False
rename_field(field_name, new_name, keep_original_name_as=None)

Returns an Entry with a single renamed field and corresponding dataframe column name.

The original field name can optionally be kept in a new key.

EXAMPLES:

The original dataframe:

>>> from unitpackage.entry import Entry
>>> entry = Entry.create_example()
>>> entry.df
              t         E         j
0      0.000000 -0.103158 -0.998277
1      0.020000 -0.102158 -0.981762
...

Dataframe with a single modified column name:

>>> renamed_entry = entry.rename_field('t', 't_rel', keep_original_name_as='originalName')
>>> renamed_entry.df
          t_rel         E         j
0      0.000000 -0.103158 -0.998277
1      0.020000 -0.102158 -0.981762
...

Updated fields of the resource:

>>> renamed_entry.fields
[{'name': 't_rel', 'type': 'number', 'unit': 's', 'originalName': 't'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]

TESTS:

Renaming a non-existing field has no effect:

>>> renamed_entry = entry.rename_field('x', 'y', keep_original_name_as='originalName')
>>> renamed_entry.fields
[{'name': 't', 'type': 'number', 'unit': 's'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]
rename_fields(field_names, keep_original_name_as=None)

Returns an Entry with updated field names and dataframe column names. Provide a dict, where the key is the previous field name and the value the new name, such as {'t': 't_rel', 'E': 'E_we'}. The original field names can be kept in a new key.

EXAMPLES:

The original dataframe:

>>> from unitpackage.entry import Entry
>>> entry = Entry.create_example()
>>> entry.df
              t         E         j
0      0.000000 -0.103158 -0.998277
1      0.020000 -0.102158 -0.981762
...

Dataframe with modified column names:

>>> renamed_entry = entry.rename_fields({'t': 't_rel', 'E': 'E_we'}, keep_original_name_as='originalName')
>>> renamed_entry.df
          t_rel      E_we         j
0      0.000000 -0.103158 -0.998277
1      0.020000 -0.102158 -0.981762
...

Updated fields of the resource:

>>> renamed_entry.fields
[{'name': 't_rel', 'type': 'number', 'unit': 's', 'originalName': 't'},
{'name': 'E_we', 'type': 'number', 'unit': 'V', 'reference': 'RHE', 'originalName': 'E'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]

TESTS:

Names provided for non-existing fields are ignored:

>>> renamed_entry = entry.rename_fields({'t': 't_rel', 'x':'y'}, keep_original_name_as='originalName')
>>> renamed_entry.fields
[{'name': 't_rel', 'type': 'number', 'unit': 's', 'originalName': 't'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]
rescale(units)

Returns a rescaled Entry with axes in the specified units. Provide a dict, where the key is the axis name and the value the new unit, such as {'j': 'uA / cm2', 't': 'h'}.

EXAMPLES:

The units without any rescaling:

>>> entry = Entry.create_example()
>>> entry.fields
[{'name': 't', 'type': 'number', 'unit': 's'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]

A rescaled entry using different units:

>>> rescaled_entry = entry.rescale({'j':'uA / cm2', 't':'h'})
>>> rescaled_entry.fields
[{'name': 't', 'type': 'number', 'unit': 'h'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'uA / cm2'}]

The values in the data frame are scaled to match the new units:

>>> rescaled_entry.df
             t         E          j
0     0.000000 -0.103158 -99.827664
1     0.000006 -0.102158 -98.176205
...
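
The conversion factors behind these numbers can be checked by hand (plain Python, independent of the library): 1 A / m2 corresponds to 100 uA / cm2, and seconds are divided by 3600 to give hours.

```python
# 1 A / m2 = 1e6 uA / 1e4 cm2 = 100 uA / cm2
j_factor = 1e6 / 1e4
assert j_factor == 100.0
assert abs(-0.998277 * j_factor - (-99.8277)) < 1e-6
# 0.02 s expressed in hours matches the second row of the rescaled t column:
assert abs(round(0.02 / 3600, 6) - 6e-06) < 1e-12
```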
save(*, outdir, basename=None)

Create a Data Package, i.e., a CSV file and a JSON file, in the directory outdir.

EXAMPLES:

The output files are named identifier.csv and identifier.json using the identifier of the original resource:

>>> import os
>>> entry = Entry.create_example()
>>> entry.save(outdir='./test/generated')
>>> basename = entry.identifier
>>> os.path.exists(f'test/generated/{basename}.json') and os.path.exists(f'test/generated/{basename}.csv')
True

When a basename is set, the files are named basename.csv and basename.json.

Note

For a valid frictionless Data Package the basename MUST be lower case and contain only alphanumeric characters along with ., _ or - characters.

A valid basename:

>>> import os
>>> entry = Entry.create_example()
>>> basename = 'save_basename'
>>> entry.save(basename=basename, outdir='./test/generated')
>>> os.path.exists(f'test/generated/{basename}.json') and os.path.exists(f'test/generated/{basename}.csv')
True

Upper case characters are saved lower case:

>>> import os
>>> import pandas as pd
>>> from unitpackage.entry import Entry
>>> df = pd.DataFrame({'x':[1,2,3], 'y':[2,3,4]})
>>> basename = 'Upper_Case_Save'
>>> entry = Entry.from_df(df=df, basename=basename)
>>> entry.save(outdir='./test/generated')
>>> os.path.exists(f'test/generated/{basename.lower()}.json') and os.path.exists(f'test/generated/{basename.lower()}.csv')
True

>>> new_entry = Entry.from_local(f'test/generated/{basename.lower()}.json')
>>> new_entry.resource
{'name': 'upper_case_save',
'type': 'table',
'path': 'upper_case_save.csv',
...

TESTS:

Save the entry as a Data Package with metadata containing a datetime, which is not natively supported by JSON:

>>> import os
>>> from datetime import datetime
>>> import pandas as pd
>>> from unitpackage.entry import Entry
>>> df = pd.DataFrame({'x':[1,2,3], 'y':[2,3,4]})
>>> basename = 'save_datetime'
>>> entry = Entry.from_df(df=df, basename=basename)
>>> entry.metadata.from_dict({'currentTime':datetime.now()})
>>> entry.save(outdir='./test/generated')
>>> os.path.exists(f'test/generated/{basename}.json') and os.path.exists(f'test/generated/{basename}.csv')
True
update_fields(fields)

Return a new entry with updated fields in the resource.

The fields list must be structured such as [{'name': 'E', 'unit': 'mV'}, {'name': 'T', 'unit': 'K'}].

EXAMPLES:

>>> from unitpackage.entry import Entry
>>> entry = Entry.create_example()
>>> entry.fields
[{'name': 't', 'type': 'number', 'unit': 's'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]

Updating the fields returns a new entry with updated field metadata:

>>> fields = [{'name':'E', 'unit': 'mV'},
... {'name':'j', 'unit': 'uA / cm2'},
... {'name':'x', 'unit': 'm'}]
>>> new_entry = entry.update_fields(fields)
>>> new_entry.fields
[{'name': 't', 'type': 'number', 'unit': 's'},
{'name': 'E', 'type': 'number', 'unit': 'mV', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'uA / cm2'}]

The original entry remains unchanged:

>>> entry.fields
[{'name': 't', 'type': 'number', 'unit': 's'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]
classmethod create_example()

Return a sample collection for use in automated tests (only accessible from the development environment).

EXAMPLES:

>>> Collection.create_example()
[Entry('alves_2011_electrochemistry_6010_f1a_solid'),
Entry('engstfeld_2018_polycrystalline_17743_f4b_1'),
Entry('no_bibliography')]
filter(predicate)

Return the subset of the collection that satisfies predicate.

EXAMPLES:

>>> collection = Collection.create_example()
>>> collection.filter(lambda entry: entry.echemdb.source.url == 'https://doi.org/10.1039/C0CP01001D')
[Entry('alves_2011_electrochemistry_6010_f1a_solid')]

The filter predicate can use properties that are not present on all entries in the collection. If a property is missing, the entry is excluded from the result:

>>> collection.filter(lambda entry: entry.non.existing.property)
[]
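
One way such exclusion could work is to drop entries whose predicate fails (a hedged sketch only; safe_filter is an illustrative helper, not the library's implementation):

```python
def safe_filter(entries, predicate):
    """Keep entries for which the predicate is truthy; drop those where it fails."""
    selected = []
    for entry in entries:
        try:
            if predicate(entry):
                selected.append(entry)
        except (AttributeError, KeyError):
            continue  # the queried property does not exist on this entry
    return selected

entries = [{"source": "https://doi.org/10.1039/C0CP01001D"}, {}]
result = safe_filter(entries, lambda e: e["source"].endswith("C0CP01001D"))
assert result == [{"source": "https://doi.org/10.1039/C0CP01001D"}]
```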
classmethod from_local(datadir)

Create a collection from local Data Packages.

EXAMPLES:

>>> from unitpackage.collection import Collection
>>> collection = Collection.from_local('./examples/local/')
>>> collection
[Entry('alves_2011_electrochemistry_6010_f1a_solid'),
Entry('engstfeld_2018_polycrystalline_17743_f4b_1'),
Entry('no_bibliography')]
classmethod from_local_file(filename)

Create a collection from a local Data Package.

EXAMPLES:

>>> from unitpackage.collection import Collection
>>> collection = Collection.from_local_file('./examples/local/engstfeld_2018_polycrystalline_17743/engstfeld_2018_polycrystalline_17743_f4b_1.json')
>>> collection
[Entry('engstfeld_2018_polycrystalline_17743_f4b_1')]
classmethod from_remote(url=None, data=None, outdir=None)

Create a collection from a URL pointing to a zip file.

When no URL is provided, a collection is created from the Data Packages published in the echemdb data repository, which are displayed on the echemdb website.

EXAMPLES:

>>> from unitpackage.collection import Collection
>>> collection = Collection.from_remote()
>>> collection.filter(lambda entry: entry.echemdb.source.url == 'https://doi.org/10.1039/C0CP01001D')
[Entry('alves_2011_electrochemistry_6010_f1a_solid'), Entry('alves_2011_electrochemistry_6010_f2_red')]

The folder within the zip containing the data can be specified with the data parameter. An output directory for the extracted data can be specified with the outdir parameter.
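
Conceptually, the zip handling can be pictured as follows (a hedged stdlib sketch, not the library's code; the archive layout and the helper name are assumptions for illustration):

```python
import io
import zipfile

def list_data_members(zip_bytes, data="data"):
    """Return the archive members below the given data folder."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [name for name in archive.namelist() if name.startswith(f"{data}/")]

# Build a small archive in memory for illustration:
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data/entry.json", "{}")
    archive.writestr("README.md", "ignored")
assert list_data_members(buffer.getvalue()) == ["data/entry.json"]
```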

property identifiers

Return a list of identifiers of the collection, i.e., the names of the resources in the Data Package.

This property is essentially equivalent to package.resource_names.

EXAMPLES:

>>> collection = Collection.create_example()
>>> len(collection.identifiers)
3
rescale(units)

Return a rescaled collection with all entries rescaled to the specified units.

Reuses the interface of the entry's rescale(). Provide a dict, where the key is the field name and the value the new unit, such as {'j': 'uA / cm2', 't': 'h'}.

Fields that are not present in an entry are silently ignored for that entry.

EXAMPLES:

The units without any rescaling:

>>> collection = Collection.create_example()
>>> collection[0].fields
[{'name': 't', 'type': 'number', 'unit': 's'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]

A rescaled collection:

>>> rescaled = collection.rescale({'j': 'uA / cm2', 't': 'h'})
>>> rescaled[0].fields
[{'name': 't', 'type': 'number', 'unit': 'h'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'uA / cm2'}]

The number of entries in the collection is preserved:

>>> len(rescaled) == len(collection)
True
save_entries(outdir=None)

Save the entries of this collection as Data Packages (CSV and JSON) to the output directory outdir.

EXAMPLES:

>>> db = Collection.create_example()
>>> db.save_entries(outdir='./test/generated/saved_collection')
>>> import glob
>>> glob.glob('test/generated/saved_collection/**.json')
['test...