Usage

The unitpackage module allows interacting with collections and entries from specifically designed frictionless datapackages.

Collection

A collection can be generated from a remote or a local source.

To illustrate the usage of unitpackage, we create a collection from the entries on echemdb.org:

from unitpackage.collection import Collection
db = Collection.from_remote()

Type db to list the entries within the collection, or show the number of entries in the collection with len(db).

len(db)
205

You can iterate over these entries.

next(iter(db))
Entry('alves_2011_electrochemistry_6010_f1a_solid')

The collection can be filtered for specific descriptors, whereby a new collection is created.

filtered_db = db.filter(lambda entry: entry.experimental.tags == ['BCV','HER'])
len(filtered_db)
3
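The lambda above matches only entries whose tag list equals ['BCV', 'HER'] exactly, including the order. A minimal sketch of an order-independent predicate follows. The helper has_tags is hypothetical and not part of unitpackage, and the stand-in entries built with SimpleNamespace are an assumption that only mimics the entry.experimental.tags layout used in the example.

```python
from types import SimpleNamespace


def has_tags(*required):
    """Build a predicate that is true when an entry carries all required tags."""
    required_set = set(required)

    def predicate(entry):
        # Subset test: tag order and additional tags do not matter.
        return required_set.issubset(entry.experimental.tags)

    return predicate


def make_entry(tags):
    """Stand-in for an entry (assumption): only mimics entry.experimental.tags."""
    return SimpleNamespace(experimental=SimpleNamespace(tags=tags))


entries = [
    make_entry(["BCV", "HER"]),
    make_entry(["HER", "BCV", "COOR"]),
    make_entry(["BCV"]),
]

matches = [e for e in entries if has_tags("BCV", "HER")(e)]
print(len(matches))  # prints 2
```

Since db.filter takes a callable, as the lambda shows, such a predicate could presumably be passed as db.filter(has_tags('BCV', 'HER')).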

Entry

Each entry consists of descriptors describing the data in the resource of the datapackage. Packages describing literature data can also contain a bibliography reference (see Bibliography). The entry also has additional methods for descriptor representation, data manipulation and data visualization.

Entries can be selected from a collection by their identifier. For our example database, these identifiers can be inferred directly from echemdb.org for each entry.

entry = db['engstfeld_2018_polycrystalline_17743_f4b_1']
entry
Entry('engstfeld_2018_polycrystalline_17743_f4b_1')

Other approaches to create entries directly from CSV files or pandas dataframes are described elsewhere in the documentation.

Resource Metadata

The metadata associated with the resource is located in db.package.get_resource('echemdb').custom['metadata']. From an entry, a descriptor can be retrieved with entry['name'], where name is the name of the respective descriptor in the metadata. Alternatively, you can write entry.name, with all spaces in the descriptor name replaced by underscores.

entry = db['engstfeld_2018_polycrystalline_17743_f4b_1']
entry['source']['citation key']
'engstfeld_2018_polycrystalline_17743'
entry.source.citation_key
'engstfeld_2018_polycrystalline_17743'
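The two access styles can be illustrated with a small stand-alone sketch. The Descriptor class below is hypothetical and is not unitpackage's implementation; it only shows, under that assumption, how attribute names with underscores could be mapped onto metadata keys that contain spaces.

```python
class Descriptor:
    """Hypothetical wrapper (not unitpackage's class) around nested metadata."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, key):
        value = self._data[key]
        # Wrap nested dictionaries so that lookups can be chained.
        return Descriptor(value) if isinstance(value, dict) else value

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        for key in self._data:
            # Match 'citation_key' against the stored key 'citation key'.
            if key.replace(" ", "_") == name:
                return self[key]
        raise AttributeError(name)


metadata = {"source": {"citation key": "engstfeld_2018_polycrystalline_17743"}}
entry_like = Descriptor(metadata)

print(entry_like["source"]["citation key"])
print(entry_like.source.citation_key)
```

Both print statements return the same string, mirroring the behaviour of entry['source']['citation key'] and entry.source.citation_key shown above.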

entry.package provides a full list of available descriptors.

Units and values

Descriptors containing both a unit and a value are returned as astropy units or quantities.

entry.figure_description.scan_rate
50.0 mV / s

The unit and value can be accessed separately.

entry.figure_description.scan_rate.value
50.0
entry.figure_description.scan_rate.unit
'mV / s'

Data

The datapackage consists of two resources.

  • One resource is named according to the entry’s identifier. It describes the data in the CSV.

  • One resource is named “echemdb”. It contains the data as a pandas dataframe used by the unitpackage module (see Unitpackage Structure for more details).

Note

The content of the CSV never changes unless it is explicitly overwritten. Changes to the data with the unitpackage module are only applied to the echemdb resource.

entry.package.resource_names
['engstfeld_2018_polycrystalline_17743_f4b_1', 'echemdb']

The data can be returned as a pandas dataframe.

entry.df.head()
          t         E         j
0  0.000000 -0.196962  0.043009
1  0.011368 -0.196393  0.051408
2  0.030365 -0.195443  0.058212
3  0.050365 -0.194443  0.062875
4  0.055176 -0.194203  0.063810

The description of the fields (column names), including units and/or other information, is included in the resource schema.

entry.package.get_resource('echemdb').schema
{'fields': [{'name': 't', 'type': 'number', 'unit': 's'},
            {'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
            {'name': 'j', 'type': 'number', 'unit': 'A / m2'}]}

The units of the dataframe can be rescaled.

rescaled_entry = entry.rescale({'t' : 'h', 'E': 'mV', 'j' : 'uA / cm2'})
rescaled_entry.df.head()
          t           E         j
0  0.000000 -196.961730  4.300884
1  0.000003 -196.393321  5.140820
2  0.000008 -195.443463  5.821203
3  0.000014 -194.443463  6.287469
4  0.000015 -194.202944  6.381011
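The rescaled numbers can be checked by hand. The sketch below applies plain conversion factors to the second row of the original dataframe. It uses only arithmetic, not unitpackage or astropy, and the constant names are chosen for illustration.

```python
# Conversion factors derived from SI prefixes (plain arithmetic, no unit library).
S_TO_H = 1 / 3600            # 1 h = 3600 s
V_TO_MV = 1e3                # 1 V = 1000 mV
A_M2_TO_UA_CM2 = 1e6 / 1e4   # 1 A = 1e6 uA and 1 m2 = 1e4 cm2

# Second row of the original dataframe shown earlier: t / s, E / V, j / (A / m2)
t, E, j = 0.011368, -0.196393, 0.051408

rescaled = (t * S_TO_H, E * V_TO_MV, j * A_M2_TO_UA_CM2)
print(rescaled)
```

Up to the rounding of the displayed input values, the result agrees with the second row of the rescaled dataframe.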

The units are updated in the package schema of the ‘echemdb’ resource.

rescaled_entry.package.get_resource('echemdb').schema
{'fields': [{'name': 't', 'type': 'number', 'unit': 'h'},
            {'name': 'E', 'type': 'number', 'unit': 'mV', 'reference': 'RHE'},
            {'name': 'j', 'type': 'number', 'unit': 'uA / cm2'}]}

The unit of a specific field can be retrieved.

rescaled_entry.field_unit('E')
'mV'
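Conceptually, this amounts to looking up the field by name in the schema shown above. The following sketch works on a plain dictionary copy of that schema. The helper lookup_unit is hypothetical and makes no claim about how field_unit is implemented in unitpackage.

```python
# Schema copied from the rescaled 'echemdb' resource shown above.
schema = {
    "fields": [
        {"name": "t", "type": "number", "unit": "h"},
        {"name": "E", "type": "number", "unit": "mV", "reference": "RHE"},
        {"name": "j", "type": "number", "unit": "uA / cm2"},
    ]
}


def lookup_unit(schema, field_name):
    """Hypothetical helper: return the unit of the named field in a schema dict."""
    for field in schema["fields"]:
        if field["name"] == field_name:
            return field["unit"]
    raise KeyError(f"no field named {field_name!r}")


print(lookup_unit(schema, "E"))  # prints mV
```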

Plotting

The data can be visualized in a plotly figure. If the dimensions for the x and y axes are not specified, the first two columns are plotted against each other. Alternatively, you can specify the dimensions explicitly.

entry.plot()