Usage
The unitpackage module allows interacting with collections and entries from specifically designed frictionless datapackages.
Collection
A collection can be generated from a remote or a local source.
To illustrate the usage of unitpackage, we create a collection from the entries on echemdb.org:
from unitpackage.collection import Collection
db = Collection.from_remote()
Type db to list the entries within the collection, or show the number of entries with len:
len(db)
205
You can iterate over these entries:
next(iter(db))
Entry('alves_2011_electrochemistry_6010_f1a_solid')
The collection can be filtered for specific descriptors, whereby a new collection is created.
filtered_db = db.filter(lambda entry: entry.experimental.tags == ['BCV','HER'])
len(filtered_db)
3
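The predicate above matches only entries whose tag list is exactly ['BCV', 'HER'], in that order. A minimal sketch of a more tolerant predicate, which only requires that all listed tags are present, is shown below. The entries are stand-in objects built with SimpleNamespace purely for illustration (their identifiers and tags are made up), and has_tags is a hypothetical helper, not part of unitpackage. With a real collection the resulting function could presumably be passed to db.filter, assuming every entry exposes experimental.tags as a list.

```python
from types import SimpleNamespace


def has_tags(*required):
    """Build a predicate that is true when an entry carries all required tags."""
    required = set(required)

    def predicate(entry):
        # Subset test: order and additional tags do not matter.
        return required <= set(entry.experimental.tags)

    return predicate


# Stand-in entries (hypothetical identifiers and tags, not real echemdb data).
entries = [
    SimpleNamespace(identifier="a", experimental=SimpleNamespace(tags=["BCV", "HER"])),
    SimpleNamespace(identifier="b", experimental=SimpleNamespace(tags=["HER", "BCV", "COOR"])),
    SimpleNamespace(identifier="c", experimental=SimpleNamespace(tags=["BCV"])),
]

# Exact list comparison, as in the filter shown above.
exact = [e.identifier for e in entries if e.experimental.tags == ["BCV", "HER"]]

# Tolerant comparison: both tags present, in any order.
tolerant = [e.identifier for e in entries if has_tags("BCV", "HER")(e)]

print(exact)
print(tolerant)
```

Which variant is appropriate depends on whether entries with additional tags should be included in the filtered collection.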
Entry
Each entry consists of descriptors describing the data in the resource of the datapackage. Packages describing literature data can also contain a bibliography reference (see Bibliography). The entry also has additional methods for descriptor representation, data manipulation and data visualization.
Entries can be selected by their identifier from a collection. For our example database such identifiers can directly be inferred from echemdb.org for each entry.
entry = db['engstfeld_2018_polycrystalline_17743_f4b_1']
entry
Entry('engstfeld_2018_polycrystalline_17743_f4b_1')
Other approaches to create entries from CSV or pandas dataframes directly are described here.
Resource Metadata
The metadata associated with the resource is located in db.package.get_resource('echemdb').custom['metadata'].
From an entry, such information can be retrieved by entry['name'], where name is the respective descriptor in the metadata. Alternatively, you can write entry.name, where all spaces in the descriptor name should be replaced by underscores.
entry = db['engstfeld_2018_polycrystalline_17743_f4b_1']
entry['source']['citation key']
'engstfeld_2018_polycrystalline_17743'
entry.source.citation_key
'engstfeld_2018_polycrystalline_17743'
entry.package provides a full list of available descriptors.
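The equivalence between entry['source']['citation key'] and entry.source.citation_key can be pictured with a small wrapper around a nested dictionary. The class below, DescriptorSketch, is a hypothetical illustration written for this page and is not unitpackage's actual implementation. It only shows how attribute names with underscores might be mapped onto keys that contain spaces.

```python
class DescriptorSketch:
    """Hypothetical wrapper giving dict-style and attribute-style access."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, key):
        value = self._data[key]
        # Wrap nested dictionaries so that access can be chained.
        return DescriptorSketch(value) if isinstance(value, dict) else value

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            # Never resolve private names through the data (avoids recursion).
            raise AttributeError(name)
        # Try the name as written first, then with underscores read as spaces.
        for key in (name, name.replace("_", " ")):
            if key in self._data:
                return self[key]
        raise AttributeError(name)


metadata = {"source": {"citation key": "engstfeld_2018_polycrystalline_17743"}}
sketch = DescriptorSketch(metadata)

print(sketch["source"]["citation key"])
print(sketch.source.citation_key)
```

Both print statements return the same citation key, mirroring the two access styles shown above.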
Units and values
Descriptors containing both a unit and a value are returned as astropy units or quantities.
entry.figure_description.scan_rate
50.0 mV / s
The unit and value can be accessed separately:
entry.figure_description.scan_rate.value
50.0
entry.figure_description.scan_rate.unit
'mV / s'
Data
The datapackage consists of two resources:
One resource is named according to the entry’s identifier. It describes the data in the CSV.
One resource is named “echemdb”. It contains the data as a pandas dataframe used by the unitpackage module (see Unitpackage Structure for more details).
Note
The content of the CSV never changes unless it is explicitly overwritten.
Changes to the data with the unitpackage module are only applied to the echemdb resource.
entry.package.resource_names
['engstfeld_2018_polycrystalline_17743_f4b_1', 'echemdb']
The data can be returned as a pandas dataframe.
entry.df.head()
 | t | E | j |
---|---|---|---|
0 | 0.000000 | -0.196962 | 0.043009 |
1 | 0.011368 | -0.196393 | 0.051408 |
2 | 0.030365 | -0.195443 | 0.058212 |
3 | 0.050365 | -0.194443 | 0.062875 |
4 | 0.055176 | -0.194203 | 0.063810 |
The descriptions of the fields (column names), including units and other information, are included in the resource schema.
entry.package.get_resource('echemdb').schema
{'fields': [{'name': 't', 'type': 'number', 'unit': 's'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]}
The units of the dataframe can be rescaled.
rescaled_entry = entry.rescale({'t' : 'h', 'E': 'mV', 'j' : 'uA / cm2'})
rescaled_entry.df.head()
 | t | E | j |
---|---|---|---|
0 | 0.000000 | -196.961730 | 4.300884 |
1 | 0.000003 | -196.393321 | 5.140820 |
2 | 0.000008 | -195.443463 | 5.821203 |
3 | 0.000014 | -194.443463 | 6.287469 |
4 | 0.000015 | -194.202944 | 6.381011 |
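The rescaled values follow from plain conversion factors, which can be checked by hand against the two tables. The sketch below hard-codes the three factors for this particular rescaling (s to h, V to mV, A / m2 to uA / cm2) instead of using a units library. It is an arithmetic check only, not a description of how unitpackage performs the conversion.

```python
# Conversion factors for the rescaling shown above (new value = old value * factor).
FACTORS = {
    "t": 1 / 3600,   # s -> h
    "E": 1e3,        # V -> mV
    "j": 1e6 / 1e4,  # A / m2 -> uA / cm2 (1 A = 1e6 uA, 1 m2 = 1e4 cm2)
}

# Row 1 of the original dataframe, as printed above (rounded to six decimals).
row = {"t": 0.011368, "E": -0.196393, "j": 0.051408}

# Apply the matching factor to each column value.
rescaled = {name: value * FACTORS[name] for name, value in row.items()}

for name, value in rescaled.items():
    print(f"{name}: {value:.6f}")
```

The results agree with row 1 of the rescaled table up to the rounding of the printed input values.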
The units are updated in the package schema of the ‘echemdb’ resource.
rescaled_entry.package.get_resource('echemdb').schema
{'fields': [{'name': 't', 'type': 'number', 'unit': 'h'},
{'name': 'E', 'type': 'number', 'unit': 'mV', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'uA / cm2'}]}
The units of a specific field can be retrieved.
rescaled_entry.field_unit('E')
'mV'
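Since the schema is a plain mapping with a list of fields, the unit lookup can be pictured as a search through that list. The helper below, lookup_unit, is a hypothetical stand-in that operates on a dictionary copied from the schema output above. It is not the implementation of field_unit.

```python
# Schema of the rescaled 'echemdb' resource, copied from the output above.
schema = {
    "fields": [
        {"name": "t", "type": "number", "unit": "h"},
        {"name": "E", "type": "number", "unit": "mV", "reference": "RHE"},
        {"name": "j", "type": "number", "unit": "uA / cm2"},
    ]
}


def lookup_unit(schema, field_name):
    """Return the unit of the named field, or raise KeyError if it is absent."""
    for field in schema["fields"]:
        if field["name"] == field_name:
            return field["unit"]
    raise KeyError(f"no field named {field_name!r}")


print(lookup_unit(schema, "E"))
```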
Plotting
The data can be visualized in a plotly figure. If the x and y labels are not specified, the first two columns are plotted against each other.
entry.plot()
The dimensions of the axes can be specified explicitly.
entry.plot(x_label='t', y_label='j')
A plot with rescaled axes is obtained by rescaling the entry first.
entry.rescale({'E':'mV', 'j':'uA / cm2'}).plot(x_label='t', y_label='j')
Bibliography
An entry can be associated with bibliography data. The bibliography must be provided as a bibtex string nested in source.bib_data. The bibliographies of all entries are stored as a pybtex database, db.bibliography, which contains the bibtex entries.
len(db.bibliography.entries)
44
Each entry in the echemdb database can be cited.
entry.citation(backend='text') # other available backends: 'latex' or 'markdown'. 'text' is default.
'A. K. Engstfeld et al. Polycrystalline and single-crystal Cu electrodes: influence of experimental conditions on the electrochemical properties in alkaline media. Chem.-Eur. J, 24(67):17743–17755, 2018.'
Individual db.bibliography entries can be accessed with the citation key associated with a unitpackage entry.
bibtex_key = entry.source.citation_key
bibtex_key
'engstfeld_2018_polycrystalline_17743'
citation_entry = db.bibliography.entries[bibtex_key]
citation_entry
Entry('article',
fields=[
('title', 'Polycrystalline and single-crystal Cu electrodes: influence of experimental conditions on the electrochemical properties in alkaline media'),
('journal', 'Chem.-Eur. J'),
('volume', '24'),
('number', '67'),
('pages', '17743--17755'),
('year', '2018'),
('abstract', 'Single and polycrystalline Cu electrodes serve as model systems for the study of the electroreduction of CO2, CO and nitrate, or for corrosion studies; even so, there are very few reports combining electrochemical measurements with structural characterization. Herein both the electrochemical properties of polycrystalline Cu and single crystal Cu(1 0 0) electrodes in alkaline solutions (0.1 m KOH and 0.1 m NaOH) are investigated. It is demonstrated that the pre-treatment of the electrodes plays a crucial role in determining their electrochemical properties. Scanning tunneling microscopy, X-ray photoelectron spectroscopy and cyclic voltammetry are performed on Cu(1 0 0) electrodes prepared under UHV conditions; it is shown that the electrochemical properties of these atomically well-defined electrodes are distinct from electrodes prepared by other methods. Also highlighted is the significant role of residual oxygen and electrolyte convection in influencing the electrochemical properties.')],
persons=OrderedCaseInsensitiveDict([('author', [Person('Engstfeld, Albert K'), Person('Maagaard, Thomas'), Person('Horch, Sebastian'), Person('Chorkendorff, Ib'), Person('Stephens, Ifan EL')])]))
Individual fields are accessible, such as year or title.
citation_entry.fields['year']
'2018'
citation_entry.fields['title']
'Polycrystalline and single-crystal Cu electrodes: influence of experimental conditions on the electrochemical properties in alkaline media'
The authors are accessible via persons. Read more in the pybtex documentation.
citation_entry.persons['author']
[Person('Engstfeld, Albert K'),
Person('Maagaard, Thomas'),
Person('Horch, Sebastian'),
Person('Chorkendorff, Ib'),
Person('Stephens, Ifan EL')]
citation_entry.persons['author'][0]
Person('Engstfeld, Albert K')
print(citation_entry.persons['author'][0])
Engstfeld, Albert K