Usage

The unitpackage module allows interacting with collections and entries from specifically designed frictionless datapackages.

Collection

A collection can be generated from a remote or a local source.

To illustrate the usage of unitpackage, we create a collection from the entries on echemdb.org:

from unitpackage.collection import Collection
db = Collection.from_remote()

Type db to display the entries within the collection, or show the number of entries in the collection with len.

len(db)
205

You can iterate over these entries:

next(iter(db))
Entry('alves_2011_electrochemistry_6010_f1a_solid')

The collection can be filtered for specific descriptors, whereby a new collection is created.

filtered_db = db.filter(lambda entry: entry.experimental.tags == ['BCV','HER'])
len(filtered_db)
3
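
Note that the predicate above compares lists with ==, so it only matches entries whose tags are exactly ['BCV', 'HER'], in that order. The following sketch illustrates the difference to a subset test. It uses plain stand-in objects (SimpleNamespace instances with made-up identifiers and tags, not real unitpackage entries), and the helper names exact_match and contains_all are hypothetical.

```python
from types import SimpleNamespace

# Stand-in entries with made-up identifiers and tags (not real echemdb data).
entries = [
    SimpleNamespace(identifier="a", tags=["BCV", "HER"]),
    SimpleNamespace(identifier="b", tags=["HER", "BCV"]),
    SimpleNamespace(identifier="c", tags=["BCV", "HER", "COOR"]),
    SimpleNamespace(identifier="d", tags=["BCV"]),
]


def exact_match(entry):
    # List equality is sensitive to both order and length.
    return entry.tags == ["BCV", "HER"]


def contains_all(entry):
    # Subset test: order-independent and tolerant of additional tags.
    return {"BCV", "HER"} <= set(entry.tags)


exact = [e.identifier for e in entries if exact_match(e)]
subset = [e.identifier for e in entries if contains_all(e)]
print(exact)
print(subset)
```

With a collection, the same idea would presumably be written as db.filter(lambda entry: {'BCV', 'HER'} <= set(entry.experimental.tags)), assuming every entry provides experimental.tags as a list.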

Entry

Each entry consists of descriptors describing the data in the resource of the datapackage. Packages describing literature data can also contain a bibliography reference (see Bibliography). The entry also has additional methods for descriptor representation, data manipulation and data visualization.

Entries can be selected by their identifier from a collection. For our example database, the identifier of each entry can be looked up on echemdb.org.

entry = db['engstfeld_2018_polycrystalline_17743_f4b_1']
entry
Entry('engstfeld_2018_polycrystalline_17743_f4b_1')

Other approaches to create entries from CSV or pandas dataframes directly are described here.

Resource Metadata

The metadata associated with the resource is located in db.package.get_resource('echemdb').custom['metadata']. From an entry, a descriptor can be retrieved with entry['name'], where name is the name of the respective descriptor in the metadata. Alternatively, you can write entry.name, where all spaces in the descriptor name are replaced by underscores.

entry = db['engstfeld_2018_polycrystalline_17743_f4b_1']
entry['source']['citation key']
'engstfeld_2018_polycrystalline_17743'
entry.source.citation_key
'engstfeld_2018_polycrystalline_17743'
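
To make the two access styles more concrete, here is a minimal sketch of how attribute access with underscores could be mapped onto keys containing spaces. The class Descriptor is hypothetical and written only for illustration; it is not the actual unitpackage implementation.

```python
class Descriptor:
    """Hypothetical wrapper; not the actual unitpackage implementation."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, name):
        value = self._data[name]
        # Wrap nested dictionaries so that lookups can be chained.
        return Descriptor(value) if isinstance(value, dict) else value

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        # Prefer a literal match, otherwise translate underscores to spaces.
        key = name if name in self._data else name.replace("_", " ")
        try:
            return self[key]
        except KeyError:
            raise AttributeError(name) from None


metadata = Descriptor(
    {"source": {"citation key": "engstfeld_2018_polycrystalline_17743"}}
)
by_key = metadata["source"]["citation key"]
by_attribute = metadata.source.citation_key
print(by_key == by_attribute)
```

Both lookups resolve to the same stored value, which mirrors the behaviour shown for entry['source']['citation key'] and entry.source.citation_key.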

entry.package provides a full list of available descriptors.

Units and values

Descriptors containing both a unit and a value are returned as astropy units or quantities.

entry.figure_description.scan_rate
50.0 mV / s

The unit and value can be accessed separately:

entry.figure_description.scan_rate.value
50.0
entry.figure_description.scan_rate.unit
'mV / s'

Data

The datapackage consists of two resources.

  • One resource is named according to the entry’s identifier. It describes the data in the CSV.

  • One resource is named “echemdb”. It contains the data as a pandas dataframe used by the unitpackage module (see Unitpackage Structure for more details).

Note

The content of the CSV never changes unless it is explicitly overwritten. Changes to the data with the unitpackage module are only applied to the echemdb resource.

entry.package.resource_names
['engstfeld_2018_polycrystalline_17743_f4b_1', 'echemdb']

The data can be returned as a pandas dataframe.

entry.df.head()
t E j
0 0.000000 -0.196962 0.043009
1 0.011368 -0.196393 0.051408
2 0.030365 -0.195443 0.058212
3 0.050365 -0.194443 0.062875
4 0.055176 -0.194203 0.063810

The description of the fields (column names), including units and other information, is included in the resource schema.

entry.package.get_resource('echemdb').schema
{'fields': [{'name': 't', 'type': 'number', 'unit': 's'},
            {'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
            {'name': 'j', 'type': 'number', 'unit': 'A / m2'}]}
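
Since the schema is a list of field descriptions, the unit of each column can be collected with a simple lookup. The sketch below works on a plain dictionary with the same content as the output above; it assumes the schema is available in that form, and the helper unit_of is hypothetical rather than part of unitpackage.

```python
# Schema content copied from the output above, written as a plain dictionary.
schema = {
    "fields": [
        {"name": "t", "type": "number", "unit": "s"},
        {"name": "E", "type": "number", "unit": "V", "reference": "RHE"},
        {"name": "j", "type": "number", "unit": "A / m2"},
    ]
}


def unit_of(schema, field_name):
    """Return the unit of a named field (hypothetical helper)."""
    for field in schema["fields"]:
        if field["name"] == field_name:
            return field["unit"]
    raise KeyError(field_name)


# Map every column name to its unit.
units = {field["name"]: field["unit"] for field in schema["fields"]}
print(units)
print(unit_of(schema, "j"))
```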

The units of the dataframe can be rescaled.

rescaled_entry = entry.rescale({'t' : 'h', 'E': 'mV', 'j' : 'uA / cm2'})
rescaled_entry.df.head()
t E j
0 0.000000 -196.961730 4.300884
1 0.000003 -196.393321 5.140820
2 0.000008 -195.443463 5.821203
3 0.000014 -194.443463 6.287469
4 0.000015 -194.202944 6.381011
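
The rescaled values can be checked by hand: 1 h = 3600 s, 1 V = 1000 mV, and 1 A / m2 = 10^6 uA / 10^4 cm2 = 100 uA / cm2. The following sketch applies these factors in plain Python to the second row of the original dataframe. It uses the rounded values as printed, so it reproduces the rescaled output only up to rounding, and it is independent of how unitpackage performs the conversion internally.

```python
# Conversion factors from the original units to the target units.
S_TO_H = 1 / 3600            # 1 h = 3600 s
V_TO_MV = 1e3                # 1 V = 1000 mV
A_M2_TO_UA_CM2 = 1e6 / 1e4   # 1 A = 1e6 uA, 1 m2 = 1e4 cm2

# Second row of entry.df.head() as printed earlier (rounded values).
t, E, j = 0.011368, -0.196393, 0.051408

rescaled = (t * S_TO_H, E * V_TO_MV, j * A_M2_TO_UA_CM2)
print(rescaled)
```

The results agree with the second row of the rescaled dataframe to the displayed precision.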

The units are updated in the package schema of the ‘echemdb’ resource.

rescaled_entry.package.get_resource('echemdb').schema
{'fields': [{'name': 't', 'type': 'number', 'unit': 'h'},
            {'name': 'E', 'type': 'number', 'unit': 'mV', 'reference': 'RHE'},
            {'name': 'j', 'type': 'number', 'unit': 'uA / cm2'}]}

The units of a specific field can be retrieved.

rescaled_entry.field_unit('E')
'mV'

Plotting

The data can be visualized in a plotly figure. If the x and y labels are not specified, the first two columns are plotted against each other.

entry.plot()

The dimensions of the axes can be specified explicitly.

entry.plot(x_label='t', y_label='j')

A plot with rescaled axes is obtained by rescaling the entry first.

entry.rescale({'E':'mV', 'j':'uA / cm2'}).plot(x_label='t', y_label='j')

Bibliography

An entry can be associated with bibliography data. The bibliography must be provided as a bibtex string nested in source.bib_data. The bibliography of all entries is stored as a pybtex database, db.bibliography, which contains the bibtex entries.

len(db.bibliography.entries)
44

Each entry in the echemdb database can be cited.

entry.citation(backend='text') # other available backends: 'latex' or 'markdown'. 'text' is default.
'A. K. Engstfeld et al. Polycrystalline and single-crystal Cu electrodes: influence of experimental conditions on the electrochemical properties in alkaline media. Chem.-Eur. J, 24(67):17743–17755, 2018.'

Individual db.bibliography entries can be accessed with the citation key associated with a unitpackage entry.

bibtex_key = entry.source.citation_key
bibtex_key
'engstfeld_2018_polycrystalline_17743'
citation_entry = db.bibliography.entries[bibtex_key]
citation_entry
Entry('article',
  fields=[
    ('title', 'Polycrystalline and single-crystal Cu electrodes: influence of experimental conditions on the electrochemical properties in alkaline media'), 
    ('journal', 'Chem.-Eur. J'), 
    ('volume', '24'), 
    ('number', '67'), 
    ('pages', '17743--17755'), 
    ('year', '2018'), 
    ('abstract', 'Single and polycrystalline Cu electrodes serve as model systems for the study of the electroreduction of CO2, CO and nitrate, or for corrosion studies; even so, there are very few reports combining electrochemical measurements with structural characterization. Herein both the electrochemical properties of polycrystalline Cu and single crystal Cu(1 0 0) electrodes in alkaline solutions (0.1 m KOH and 0.1 m NaOH) are investigated. It is demonstrated that the pre-treatment of the electrodes plays a crucial role in determining their electrochemical properties. Scanning tunneling microscopy, X-ray photoelectron spectroscopy and cyclic voltammetry are performed on Cu(1 0 0) electrodes prepared under UHV conditions; it is shown that the electrochemical properties of these atomically well-defined electrodes are distinct from electrodes prepared by other methods. Also highlighted is the significant role of residual oxygen and electrolyte convection in influencing the electrochemical properties.')],
  persons=OrderedCaseInsensitiveDict([('author', [Person('Engstfeld, Albert K'), Person('Maagaard, Thomas'), Person('Horch, Sebastian'), Person('Chorkendorff, Ib'), Person('Stephens, Ifan EL')])]))

Individual fields are accessible, such as year or title.

citation_entry.fields['year']
'2018'
citation_entry.fields['title']
'Polycrystalline and single-crystal Cu electrodes: influence of experimental conditions on the electrochemical properties in alkaline media'

The authors are accessible via persons. Read more in the pybtex documentation.

citation_entry.persons['author']
[Person('Engstfeld, Albert K'),
 Person('Maagaard, Thomas'),
 Person('Horch, Sebastian'),
 Person('Chorkendorff, Ib'),
 Person('Stephens, Ifan EL')]
citation_entry.persons['author'][0]
Person('Engstfeld, Albert K')
print(citation_entry.persons['author'][0])
Engstfeld, Albert K