Welcome to RawToFigure

This documentation describes echemdb's approach to research data management (RDM), from raw data to publishable figures. We aim to provide a lightweight way to annotate research data with metadata at the moment the raw data is created, and to create frictionless-based datapackages (unitpackages) for further use. These unitpackages can be used to browse your data locally based on descriptors provided in the metadata, to compare it with published data, to integrate it into data processing workflows, or to create entries in electronic lab notebooks (ELN).

Example

Suppose you record the following data in a file named data.csv.

   t    U
0  0  101
1  1  102
2  2  105

The data alone reveals neither the units of the values nor which voltage U was measured, or whether it was applied to something. Such information is stored automatically as additional metadata alongside the CSV using autotag-metadata, a tool that watches a folder for file changes and writes the metadata from a template. For the CSV above, the YAML could look as follows.

experimentalist: Max Doe
supervisor: John Mustermann
research question: Resistance of a resistor connected in series to a power supply.
figure description:
    fields:
      - name: t
        unit: s
      - name: U
        unit: mV
        description: Voltage across resistor 1.

There is no limit on the amount of metadata stored along with your data, as illustrated by echemdb's metadata schema for electrochemical data.
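As a minimal sketch of how the two files fit together, the snippet below pairs each CSV column with the unit declared in the metadata. It uses only the Python standard library. The metadata is written as the dictionary a YAML parser would return for the file above, the CSV is assumed to be comma separated, and the helper name `units_by_field` is hypothetical and not part of any echemdb package.

```python
import csv
import io

# Contents of data.csv, inlined so the sketch is self-contained.
# Assumption: the file holds the two columns t and U, comma separated.
CSV_TEXT = "t,U\n0,101\n1,102\n2,105\n"

# The dictionary a YAML parser would return for the metadata above
# (only the part needed here).
METADATA = {
    "figure description": {
        "fields": [
            {"name": "t", "unit": "s"},
            {"name": "U", "unit": "mV", "description": "Voltage across resistor 1."},
        ]
    }
}


def units_by_field(metadata):
    """Map each field name to its unit (hypothetical helper)."""
    fields = metadata["figure description"]["fields"]
    return {field["name"]: field["unit"] for field in fields}


units = units_by_field(METADATA)
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# Print every value together with the unit taken from the metadata.
for row in rows:
    print(", ".join(f"{name} = {value} {units[name]}" for name, value in row.items()))
```

The numbers in the CSV only become meaningful once the lookup in `units` is applied, which is the role the YAML file plays.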

The CSV and YAML can be used to create a unitpackage, a file standard based on frictionless datapackages. For our purpose we create unitpackages with echemdbconverters, which provides a simple command line interface (CLI).

!echemdbconverters csv files/data/data.csv --metadata files/data/data.csv.meta.yaml --outdir files/data/generated
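For orientation, a frictionless datapackage is a data file accompanied by a JSON descriptor that lists the resources and the schema of their fields. The sketch below assembles such a descriptor with a unit attached to each field. It is an illustration only: the exact layout written by echemdbconverters may differ, and the function `build_descriptor` is hypothetical.

```python
import json


def build_descriptor(name, path, fields):
    """Return a datapackage-style descriptor as a dict (hypothetical helper)."""
    return {
        "resources": [
            {
                "name": name,
                "path": path,
                "format": "csv",
                "schema": {"fields": fields},
            }
        ]
    }


# "unit" is a custom property added on top of the standard field keys
# ("name", "type", "description"); the values mirror the YAML above.
fields = [
    {"name": "t", "type": "number", "unit": "s"},
    {"name": "U", "type": "number", "unit": "mV",
     "description": "Voltage across resistor 1."},
]

descriptor = build_descriptor("data", "data.csv", fields)
print(json.dumps(descriptor, indent=2))
```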

A collection of such datapackages can be loaded with the unitpackage API to browse, explore, modify or visualize the entries. Here we plot the data from the CSV above in different units.

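from unitpackage.collection import Collection

# Load all unitpackages found in the output directory of the CLI call above.
db = Collection.from_local('files/data/generated')
# Entries are addressed by their identifier, here the name of the CSV.
entry = db['data']
# Convert t from s to ms and U from mV to V, then plot U against t.
entry.rescale({'t':'ms', 'U':'V'}).plot('t', 'U')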
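The rescaling itself is handled by unitpackage. To make the arithmetic explicit, here is a library-free sketch of the same conversion for the three rows above (s to ms, mV to V). The exponent table and the function `rescale_value` are assumptions for illustration and cover only the units needed here.

```python
# Power-of-ten exponent of each unit relative to its base unit.
# Only the four units used in this example are listed.
UNIT_EXPONENT = {"s": 0, "ms": -3, "V": 0, "mV": -3}


def rescale_value(value, old_unit, new_unit):
    """Convert value from old_unit to new_unit (hypothetical helper).

    No dimension check is done: the caller must not mix time and
    voltage units.
    """
    shift = UNIT_EXPONENT[old_unit] - UNIT_EXPONENT[new_unit]
    if shift >= 0:
        return value * 10**shift
    return value / 10**(-shift)


# The rows of data.csv as (t in s, U in mV).
rows = [(0, 101), (1, 102), (2, 105)]

# Rescale t to ms and U to V, as in the rescale call above.
rescaled = [
    (rescale_value(t, "s", "ms"), rescale_value(U, "mV", "V"))
    for t, U in rows
]
print(rescaled)
```

Going to a smaller unit multiplies the numbers and going to a larger unit divides them, so t grows by a factor of 1000 while U shrinks by the same factor.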

The metadata from the YAML is also directly accessible.

entry.research_question
'Resistance of a resistor connected in series to a power supply.'

Further usage

Standardized unitpackages allow research data to be integrated into different projects. For example, a collection of electrochemical data extracted from the literature is shown on the echemdb website and is directly accessible with the API introduced above. In principle, this allows a direct comparison between published and raw data.

We suggest that locally stored unitpackages are also useful for automatically generating entries in ELNs, which play an important role in RDM workflows.