Welcome to unitpackage’s documentation!
Annotating scientific data plays a crucial role in research data management workflows, ensuring that data is stored according to the FAIR principles. A simple CSV file recorded during an experiment usually provides no information on the units of its values, on the system that was investigated, or on who performed the experiment. Such information can be stored in frictionless datapackages, each consisting of a CSV (data) file annotated with a JSON file.
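As a minimal sketch of this pairing, the following writes a small CSV file and a JSON descriptor next to it, using only the Python standard library. The file names, the metadata keys (`unit`, `curator`) and all values are illustrative assumptions; they are not the schema that unitpackage or echemdb prescribe.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical measurement: time in s, potential in V.
rows = [(0.0, -0.197), (0.011, -0.196)]

outdir = Path(tempfile.mkdtemp())

# The data file: plain CSV, which carries no units on its own.
csv_path = outdir / "measurement.csv"
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["t", "E"])
    writer.writerows(rows)

# The descriptor: a JSON file annotating the CSV.
# The "unit" and "curator" keys are assumptions made for illustration.
descriptor = {
    "resources": [
        {
            "name": "measurement",
            "path": csv_path.name,
            "schema": {
                "fields": [
                    {"name": "t", "type": "number", "unit": "s"},
                    {"name": "E", "type": "number", "unit": "V"},
                ]
            },
            "metadata": {"curator": "Jane Doe"},
        }
    ]
}
json_path = outdir / "measurement.json"
json_path.write_text(json.dumps(descriptor, indent=2))

# Reading the descriptor back recovers the unit of each column.
loaded = json.loads(json_path.read_text())
units = {
    field["name"]: field["unit"]
    for field in loaded["resources"][0]["schema"]["fields"]
}
print(units)
```

The CSV stays readable by any tool, while the JSON file carries everything the CSV cannot express on its own.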
The unitpackage module provides a Python library to interact with such datapackages, which have a very specific structure.
An example demonstrating the usage of a collection of datapackages along with the unitpackage Python library can be found on echemdb.org. The website shows a collection of electrochemical data, stored following echemdb’s metadata schema in the electrochemistry-data repository.
Examples
A collection of datapackages can be generated from local files or from a remote repository, such as echemdb.org. To illustrate the usage of unitpackage, we collect the data shown on echemdb.org from the data repository, which is downloaded by default when the method from_remote() does not receive a url argument.
Note
For simplicity we denote the collection as db (database), even though it is not a database in the strict sense.
from unitpackage.collection import Collection
db = Collection.from_remote()
A single entry can be retrieved with an identifier available in the database
entry = db['engstfeld_2018_polycrystalline_17743_f4b_1']
The metadata of the datapackage is available from entry.package.
The data related to an entry can be returned as a pandas dataframe.
entry.df.head()
| | t | E | j |
|---|---|---|---|
| 0 | 0.000000 | -0.196962 | 0.043009 |
| 1 | 0.011368 | -0.196393 | 0.051408 |
| 2 | 0.030365 | -0.195443 | 0.058212 |
| 3 | 0.050365 | -0.194443 | 0.062875 |
| 4 | 0.055176 | -0.194203 | 0.063810 |
The units of the columns can be retrieved.
entry.field_unit('j')
'A / m2'
The values in the dataframe can be changed to other compatible units.
rescaled_entry = entry.rescale({'E' : 'mV', 'j' : 'uA / m2'})
rescaled_entry.df.head()
| | t | E | j |
|---|---|---|---|
| 0 | 0.000000 | -196.961730 | 43008.842162 |
| 1 | 0.011368 | -196.393321 | 51408.199892 |
| 2 | 0.030365 | -195.443463 | 58212.028842 |
| 3 | 0.050365 | -194.443463 | 62874.687137 |
| 4 | 0.055176 | -194.202944 | 63810.108398 |
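The rescaled values follow from the SI prefixes alone: 1 V = 1000 mV and 1 A = 1,000,000 uA. The sketch below is plain Python and does not use the unitpackage API. It only checks the first row of the two tables against each other; since the displayed values are rounded, the comparison uses a tolerance.

```python
import math

# Conversion factors implied by the SI prefixes.
V_TO_MV = 1e3  # 1 V = 1000 mV
A_TO_UA = 1e6  # 1 A = 1_000_000 uA (the 1 / m2 part is unchanged)

# First row of the original table (E in V, j in A / m2), as displayed.
E_V, j_A_m2 = -0.196962, 0.043009

E_mV = E_V * V_TO_MV
j_uA_m2 = j_A_m2 * A_TO_UA

# Compare with the first row of the rescaled table.
print(math.isclose(E_mV, -196.961730, abs_tol=1e-3))
print(math.isclose(j_uA_m2, 43008.842162, abs_tol=1.0))
```

Both comparisons hold within the stated tolerances, which reflect the six decimal places shown in the first table.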
The data can be visualized in a plotly figure:
entry.plot('E', 'j')
Specific Collections
For certain datasets, unitpackage can be extended by additional modules. One such module provides the CVCollection class, which loads a collection of packages containing cyclic voltammograms stored according to the echemdb metadata schema. Such data is commonly found in the field of electrochemistry, as illustrated on echemdb.org.
from unitpackage.cv.cv_collection import CVCollection
db = CVCollection.from_remote()
db.describe()
{'number of references': 44,
'number of entries': 205,
'materials': {'Ag', 'Au', 'Cu', 'Pt', 'Ru'}}
Filtering the collection for entries having specific properties, e.g., containing Pt as working electrode material, returns a new collection.
db_filtered = db.filter(lambda entry: entry.get_electrode('WE').material == 'Pt')
db_filtered.describe()
{'number of references': 19, 'number of entries': 130, 'materials': {'Pt'}}
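Conceptually, the predicate passed to filter is evaluated once per entry, and only entries for which it returns a truthy value end up in the new collection. The following stand-in sketches that behaviour. It is not the unitpackage implementation, and the classes `ToyEntry` and `ToyCollection` as well as the identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ToyEntry:
    """Hypothetical stand-in for a collection entry."""

    identifier: str
    material: str


class ToyCollection:
    """Hypothetical stand-in; not the unitpackage Collection class."""

    def __init__(self, entries):
        self.entries = list(entries)

    def filter(self, predicate):
        # Return a new collection; the original is left untouched.
        return ToyCollection(e for e in self.entries if predicate(e))

    def __len__(self):
        return len(self.entries)


db_toy = ToyCollection(
    [
        ToyEntry("a_2018_f1", "Pt"),
        ToyEntry("b_2019_f2", "Au"),
        ToyEntry("c_2020_f3", "Pt"),
    ]
)
pt_only = db_toy.filter(lambda entry: entry.material == "Pt")
print(len(pt_only), len(db_toy))
```

Because a new collection is returned, filters can be chained while the unfiltered collection remains available for comparison.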
Note
The filtering method is also available in the base class Collection.
Further Usage
Frictionless datapackages or unitpackages are machine readable, making the underlying data and metadata reusable in many ways.
The unitpackage API can be used to filter collections of similar data for certain properties, thus allowing for simple comparison of different data sets. For example, you could compare local files recorded in the laboratory with data published in a repository. The content of datapackages can be included in other applications or used to generate a website. The latter has been demonstrated for electrochemical data on echemdb.org. The datapackages could also be published with the frictionless Livemark data presentation framework.
You can cite this project as described on our Zenodo page.
Installation
This package is available on PyPI and can be installed with pip:
pip install unitpackage
The package is also available on conda-forge and can be installed with conda
conda install -c conda-forge unitpackage
or mamba
mamba install -c conda-forge unitpackage
See the installation instructions for further details.
License
The contents of this repository are licensed under the GNU General Public License v3.0 or, at your option, any later version.