Welcome to unitpackage’s documentation!

Annotating scientific data is a crucial part of research data management workflows that aim to store data according to the FAIR principles. A plain CSV file recorded during an experiment usually provides no information on the units of its values, on the system that was investigated, or on who performed the experiment. Such information can be stored in frictionless datapackages, which consist of a CSV (data) file annotated with a JSON file. The unitpackage module provides a Python library to interact with datapackages that follow a specific structure. An example of a collection of datapackages used together with the unitpackage Python library can be found on echemdb.org. The website shows a collection of electrochemical data stored according to echemdb’s metadata schema in the electrochemistry-data repository.
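
The idea of a CSV file annotated by a JSON file can be sketched with the Python standard library alone. The snippet below is only an illustration: the file names, the descriptor layout, the experimentalist entry and the unit of the t column are assumptions made for this example, and it does not reproduce the exact structure that unitpackage or echemdb expect.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical file names and descriptor layout, for illustration only.
workdir = Path(tempfile.mkdtemp())
csv_path = workdir / "measurement.csv"
json_path = workdir / "measurement.json"

# The bare CSV: column names and numbers, but no units or context.
rows = [
    ("t", "E", "j"),
    (0.0, -0.196962, 0.043009),
    (0.011368, -0.196393, 0.051408),
]
with csv_path.open("w", newline="") as f:
    csv.writer(f).writerows(rows)

# The JSON descriptor annotates the CSV with units and provenance.
# The unit of "t" and the experimentalist are placeholders (assumptions).
descriptor = {
    "resources": [
        {
            "name": "measurement",
            "path": csv_path.name,
            "schema": {
                "fields": [
                    {"name": "t", "type": "number", "unit": "s"},
                    {"name": "E", "type": "number", "unit": "V"},
                    {"name": "j", "type": "number", "unit": "A / m2"},
                ]
            },
            "metadata": {"experimentalist": "Jane Doe (placeholder)"},
        }
    ]
}
json_path.write_text(json.dumps(descriptor, indent=2))

# Reading both files back recovers the values together with their units.
loaded = json.loads(json_path.read_text())
fields = loaded["resources"][0]["schema"]["fields"]
units = {field["name"]: field["unit"] for field in fields}
with csv_path.open(newline="") as f:
    records = list(csv.DictReader(f))

print(units)
print(records[0])
```

Keeping the units next to the data in a machine-readable descriptor is what allows a library to look up or convert the unit of a column without the user having to remember it.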

Examples

A collection of datapackages can be generated from local files or from a remote repository, such as echemdb.org. To illustrate the usage of unitpackage, we collect the data shown on echemdb.org from its data repository, which is downloaded by default when the method from_remote() does not receive a url argument.

Note

For simplicity, we denote the collection as db (database), even though it is not a database in the strict sense.

from unitpackage.collection import Collection
db = Collection.from_remote()

A single entry can be retrieved with an identifier available in the database.

entry = db['engstfeld_2018_polycrystalline_17743_f4b_1']

The metadata of the datapackage is available from entry.package.

The data related to an entry can be returned as a pandas dataframe.

entry.df.head()
t E j
0 0.000000 -0.196962 0.043009
1 0.011368 -0.196393 0.051408
2 0.030365 -0.195443 0.058212
3 0.050365 -0.194443 0.062875
4 0.055176 -0.194203 0.063810

The units of the columns can be retrieved.

entry.field_unit('j')
'A / m2'

The values in the dataframe can be changed to other compatible units.

rescaled_entry = entry.rescale({'E' : 'mV', 'j' : 'uA / m2'})
rescaled_entry.df.head()
t E j
0 0.000000 -196.961730 43008.842162
1 0.011368 -196.393321 51408.199892
2 0.030365 -195.443463 58212.028842
3 0.050365 -194.443463 62874.687137
4 0.055176 -194.202944 63810.108398
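
Numerically, rescaling to a compatible unit amounts to multiplying each column by a constant factor. The following minimal sketch reproduces the arithmetic behind the two conversions above in plain Python. The FACTORS table and the rescale_column helper are hypothetical names introduced for this illustration; they are not part of unitpackage and do not show how the library performs the conversion internally.

```python
# Hypothetical conversion table: multiplicative factors between unit pairs.
FACTORS = {
    ("V", "mV"): 1e3,
    ("A / m2", "uA / m2"): 1e6,
}


def rescale_column(values, from_unit, to_unit):
    """Return a new list with values converted from from_unit to to_unit."""
    if from_unit == to_unit:
        return list(values)
    # A KeyError here means the pair is not in the illustrative table.
    factor = FACTORS[(from_unit, to_unit)]
    return [value * factor for value in values]


# First two rows of the E and j columns shown above (rounded as displayed).
E_volts = [-0.196962, -0.196393]
j_si = [0.043009, 0.051408]

E_mV = rescale_column(E_volts, "V", "mV")
j_uA = rescale_column(j_si, "A / m2", "uA / m2")

print(E_mV)
print(j_uA)
```

Because the inputs are the rounded values displayed by head(), the results agree with the rescaled dataframe only to the displayed precision.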

The data can be visualized in a plotly figure:

entry.plot('E', 'j')