Usage
The unitpackage module allows interacting with collections and entries from specifically designed frictionless datapackages.
Collection
A collection can be generated from a remote or a local source.
To illustrate the usage of unitpackage, we create a collection from the entries on echemdb.org:
from unitpackage.collection import Collection
db = Collection.from_remote()
Type db to list the entries within the collection, or use len to show the number of entries in the collection:
len(db)
205
You can iterate over these entries:
next(iter(db))
Entry('alves_2011_electrochemistry_6010_f1a_solid')
The collection can be filtered for specific descriptors, whereby a new collection is created.
filtered_db = db.filter(lambda entry: entry.experimental.tags == ['BCV','HER'])
len(filtered_db)
3
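The filter above compares the tag list by exact equality, so it depends on the order of the tags. As a minimal sketch, assuming that filter accepts any callable that takes an entry and returns a boolean (as the lambda above suggests), an order-independent predicate could look like the following. The helper has_tags and the stand-in entries are hypothetical and not part of unitpackage.

```python
from types import SimpleNamespace


def has_tags(*required):
    """Return a predicate that is True when an entry carries all required tags.

    Hypothetical helper, not part of unitpackage.
    """
    required = set(required)

    def predicate(entry):
        # Entries without an `experimental.tags` descriptor simply do not match.
        experimental = getattr(entry, "experimental", None)
        tags = getattr(experimental, "tags", None) or []
        return required.issubset(tags)

    return predicate


# Stand-in objects that mimic `entry.experimental.tags` for demonstration only.
entries = [
    SimpleNamespace(experimental=SimpleNamespace(tags=["BCV", "HER"])),
    SimpleNamespace(experimental=SimpleNamespace(tags=["HER", "BCV", "COOR"])),
    SimpleNamespace(experimental=SimpleNamespace(tags=["BCV"])),
    SimpleNamespace(),  # no experimental descriptor at all
]

matching = [entry for entry in entries if has_tags("BCV", "HER")(entry)]
```

With a real collection this would presumably be used as db.filter(has_tags('BCV', 'HER')). Unlike the exact comparison, this predicate also matches entries that carry additional tags.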
Entry
Each entry consists of descriptors describing the data in the resource of the datapackage. Packages describing literature data can also contain a bibliography reference (see Bibliography). The entry also has additional methods for descriptor representation, data manipulation and data visualization.
Entries can be selected by their identifier from a collection. For our example database such identifiers can directly be inferred from echemdb.org for each entry.
entry = db['engstfeld_2018_polycrystalline_17743_f4b_1']
entry
Entry('engstfeld_2018_polycrystalline_17743_f4b_1')
Other approaches to create entries from CSV or pandas dataframes directly are described here.
Resource Metadata
The metadata associated with the resource is located in db.package.get_resource('echemdb').custom['metadata'].
From an entry, such information can be retrieved by entry['name'], where name is the respective descriptor in the metadata. Alternatively, you can write entry.name, where all spaces in the descriptor name should be replaced by underscores.
entry = db['engstfeld_2018_polycrystalline_17743_f4b_1']
entry['source']['citation key']
'engstfeld_2018_polycrystalline_17743'
entry.source.citation_key
'engstfeld_2018_polycrystalline_17743'
entry.package provides a full list of available descriptors.
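To illustrate the two access styles, the following sketch wraps a nested dictionary so that keys containing spaces can also be reached as attributes with underscores. The class DescriptorView is hypothetical and only mimics the behavior described above; it is not meant to reflect how unitpackage is implemented.

```python
class DescriptorView:
    """Hypothetical wrapper giving key and attribute access to nested metadata."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, key):
        value = self._data[key]
        # Wrap nested dictionaries so that access can be chained.
        return DescriptorView(value) if isinstance(value, dict) else value

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            # Avoid infinite recursion if `_data` has not been set yet.
            raise AttributeError(name)
        # Try the name as written first, then with underscores read as spaces.
        for key in (name, name.replace("_", " ")):
            if key in self._data:
                return self[key]
        raise AttributeError(name)


metadata = DescriptorView(
    {"source": {"citation key": "engstfeld_2018_polycrystalline_17743"}}
)

by_key = metadata["source"]["citation key"]
by_attribute = metadata.source.citation_key
```

Both by_key and by_attribute refer to the same value, mirroring the two calls shown above.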
Units and values
Metadata descriptors containing both a unit and a value are returned as astropy units or quantities.
entry.figure_description.scan_rate
50.0 mV / s
The unit and value can be accessed separately:
entry.figure_description.scan_rate.value
50.0
entry.figure_description.scan_rate.unit
'mV / s'
Data
The datapackage consists of two resources.
One resource is named according to the entry’s identifier. It describes the data in the CSV.
One resource is named “echemdb”. It contains the data as a pandas dataframe used by the unitpackage module (see Unitpackage Structure for more details).
Note
The content of the CSV never changes unless it is explicitly overwritten.
Changes to the data with the unitpackage module are only applied to the echemdb resource.
entry.package.resource_names
['engstfeld_2018_polycrystalline_17743_f4b_1', 'echemdb']
The data can be returned as a pandas dataframe.
entry.df.head()
| | t | E | j |
|---|---|---|---|
| 0 | 0.000000 | -0.196962 | 0.043009 |
| 1 | 0.011368 | -0.196393 | 0.051408 |
| 2 | 0.030365 | -0.195443 | 0.058212 |
| 3 | 0.050365 | -0.194443 | 0.062875 |
| 4 | 0.055176 | -0.194203 | 0.063810 |
The description of the fields (column names), including units and other information, is included in the resource schema.
entry.package.get_resource('echemdb').schema
{'fields': [{'name': 't', 'type': 'number', 'unit': 's'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]}
The units of the dataframe can be rescaled.
rescaled_entry = entry.rescale({'t' : 'h', 'E': 'mV', 'j' : 'uA / cm2'})
rescaled_entry.df.head()
| | t | E | j |
|---|---|---|---|
| 0 | 0.000000 | -196.961730 | 4.300884 |
| 1 | 0.000003 | -196.393321 | 5.140820 |
| 2 | 0.000008 | -195.443463 | 5.821203 |
| 3 | 0.000014 | -194.443463 | 6.287469 |
| 4 | 0.000015 | -194.202944 | 6.381011 |
The units are updated in the package schema of the ‘echemdb’ resource.
rescaled_entry.package.get_resource('echemdb').schema
{'fields': [{'name': 't', 'type': 'number', 'unit': 'h'},
{'name': 'E', 'type': 'number', 'unit': 'mV', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'uA / cm2'}]}
The units of a specific field can be retrieved.
rescaled_entry.field_unit('E')
'mV'
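The rescaled values in the second dataframe can be checked by hand. The sketch below applies the plain conversion factors (1 h = 3600 s, 1 V = 1000 mV, 1 A / m2 = 100 uA / cm2) to the row with index 1 of the original dataframe. It uses no unitpackage functionality, and all names are illustrative.

```python
# Conversion factors from the original units to the rescaled units.
SECONDS_PER_HOUR = 3600.0
MILLIVOLT_PER_VOLT = 1000.0
# 1 A / m2 = 1e6 uA / 1e4 cm2 = 100 uA / cm2
MICROAMP_PER_CM2_PER_AMP_PER_M2 = 1e6 / 1e4

# Row with index 1 of the original dataframe: t in s, E in V, j in A / m2.
t, E, j = 0.011368, -0.196393, 0.051408

t_hours = t / SECONDS_PER_HOUR
E_millivolt = E * MILLIVOLT_PER_VOLT
j_microamp_per_cm2 = j * MICROAMP_PER_CM2_PER_AMP_PER_M2

print(f"{t_hours:.6f} {E_millivolt:.3f} {j_microamp_per_cm2:.4f}")
```

Because the inputs are the rounded values displayed in the first dataframe, the results agree with the rescaled dataframe only up to that rounding.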
Plotting
The data can be visualized in a plotly figure. If the dimensions of the x and y axes are not specified, the first two columns are plotted against each other; alternatively, you can specify the dimensions explicitly.
entry.plot()