Welcome to echemdb-converters’s documentation!

echemdbconverters provides a modular API for loading non-standardized DSV (delimiter-separated values) or CSV (comma-separated values) files, which are commonly created by the software used to operate laboratory equipment. Typical issues with these files are lengthy header lines containing metadata relevant to the recording software, the use of a comma (,) as the decimal separator in some regions of the world, or multiple data tables in a single file.
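To illustrate the decimal-separator issue, the following sketch reads a semicolon-separated table that uses a comma as the decimal separator with plain pandas. It only demonstrates the problem itself, not how echemdbconverters handles it internally, and the sample data is made up.

```python
from io import StringIO

import pandas as pd

# Made-up sample: semicolon-delimited, with "," as the decimal separator,
# as produced by software running with, e.g., German locale settings.
text = "t;E\n0,5;0,1\n1,0;0,2\n"

# Without decimal=",", pandas would keep "0,5" as the string "0,5".
df = pd.read_csv(StringIO(text), sep=";", decimal=",")
print(df.dtypes)
print(df)
```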

echemdbconverters provides a means to load data directly as a pandas DataFrame and to convert data via a CLI into frictionless Data Packages (or unitpackages, which support units) for seamless integration into existing workflows.

Our approach aims to provide a single interface for loading data into a common format, independent of the data source. File types supported and tested by echemdbconverters are:

Manufacturer  Device type   Software                     File suffix  Loader       device
Biologic      Potentiostat  EClab                        mpt          EClabLoader  eclab
Gamry         Potentiostat  Gamry Instruments Framework  DTA          GamryLoader  gamry

Examples

Consider the following DSV file. It consists of three parts:

  • the header, which usually contains metadata relevant to the software and user-defined settings

  • column header lines containing acronyms (dimensions) and often units for the data, in one or more rows

  • the data block, where each column consists of identical data types
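Because the three parts are defined by line counts, splitting them is straightforward. The sketch below uses a hypothetical helper, split_dsv (not part of echemdbconverters), whose arguments mirror the header_lines and column_header_lines arguments of BaseLoader; it only illustrates the idea and makes no claim about the library's implementation.

```python
from io import StringIO


def split_dsv(text, header_lines, column_header_lines):
    """Hypothetical helper: split DSV text into header, column headers and data.

    Returns three StringIO objects, similar in spirit to the header,
    column_headers and data attributes used in the examples.
    """
    lines = text.splitlines(keepends=True)
    header = lines[:header_lines]
    column_headers = lines[header_lines:header_lines + column_header_lines]
    data = lines[header_lines + column_header_lines:]
    return (StringIO("".join(header)),
            StringIO("".join(column_headers)),
            StringIO("".join(data)))


# A made-up file with 2 header lines and 2 column header lines.
text = "# metadata\nkey = 1\nt\tE\ns\tV\n0\t0\n1\t1\n"
header, column_headers, data = split_dsv(text, header_lines=2, column_header_lines=2)
print(data.read())
```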

from io import StringIO

from echemdbconverters.baseloader import BaseLoader

# A tab-separated file with 6 header lines and 2 column header lines.
file = StringIO('''# I am messy data
Random stuff
maybe metadata : 3
in different formats = abc123
hopefully, some information
on where the data block starts!
t\tE\tj
s\tV\tA/cm2
0\t0\t0
1\t1\t1
2\t2\t2
''')

csv = BaseLoader(file, header_lines=6, column_header_lines=2)
file.seek(0)  # rewind the StringIO to its start
print(csv.file.read())
# I am messy data
Random stuff
maybe metadata : 3
in different formats = abc123
hopefully, some information
on where the data block starts!
t	E	j
s	V	A/cm2
0	0	0
1	1	1
2	2	2

A pandas DataFrame can be created with minimal input. Unless specified explicitly, the delimiter of the data block is detected with the clevercsv module. Multiple column header lines are flattened into a single header row, such as t / s.
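The two steps described above, detecting the delimiter and flattening the column headers, can be sketched with the standard library and pandas. Note that this sketch swaps in the standard library's csv.Sniffer for clevercsv, and that joining the header rows with " / " is an assumption modeled on the DataFrame output below, not the library's actual code.

```python
import csv
from io import StringIO

import pandas as pd

column_headers = "t\tE\tj\ns\tV\tA/cm2\n"
data = "0\t0\t0\n1\t1\t1\n2\t2\t2\n"

# Detect the delimiter from the data block. csv.Sniffer stands in here
# for clevercsv, which echemdbconverters uses.
delimiter = csv.Sniffer().sniff(data, delimiters="\t,;").delimiter

# Flatten the two column header lines into "name / unit" labels.
rows = [line.split(delimiter) for line in column_headers.splitlines()]
names = [" / ".join(parts) for parts in zip(*rows)]

df = pd.read_csv(StringIO(data), sep=delimiter, header=None, names=names)
print(df)
```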

from echemdbconverters.baseloader import BaseLoader
csv = BaseLoader(file, header_lines=6,
                 column_header_lines=2,
                 delimiters=None,
                 decimal=None)
csv.df
   t / s  E / V  j / A/cm2
0      0      0          0
1      1      1          1
2      2      2          2

All parts of the file are accessible through the API for further use, for example, to extract metadata from the header.
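As an example of such further use, the header text (here copied from the output of csv.header.read()) could be parsed into a dictionary. The parsing rule below, which treats lines of the form "key : value" or "key = value" as metadata and ignores everything else, is a hypothetical choice for this example; echemdbconverters does not necessarily provide or use it.

```python
import re

header = """# I am messy data
Random stuff
maybe metadata : 3
in different formats = abc123
hopefully, some information
on where the data block starts!
"""

# Hypothetical rule: a line is metadata if it reads "key : value" or
# "key = value"; all other lines are ignored.
pattern = re.compile(r"^\s*(?P<key>[^:=]+?)\s*[:=]\s*(?P<value>.+?)\s*$")

metadata = {}
for line in header.splitlines():
    match = pattern.match(line)
    if match:
        metadata[match["key"]] = match["value"]

print(metadata)
```

Real headers from measurement software mix many formats, so a dedicated parser per file type is usually more robust than one generic rule.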

print(csv.header.read())
# I am messy data
Random stuff
maybe metadata : 3
in different formats = abc123
hopefully, some information
on where the data block starts!
print(csv.column_headers.read())
t	E	j
s	V	A/cm2
print(csv.data.read())
0	0	0
1	1	1
2	2	2

The data can also be converted into frictionless Data Packages using the CLI.

Note

The input files for the following commands, and the files they generate, can be found in the test folder of the repository.

The CLI only supports standard CSV files without a header and with a single column header line, as well as the specific file types listed in the table above.

A “standard” CSV file:

!echemdbconverters csv ../test/data/default.csv --outdir ../test/generated

A specific file type, with additional metadata provided as a YAML file:

!echemdbconverters csv ../test/data/eclab_cv.mpt --device eclab --metadata ../test/data/eclab_cv.mpt.metadata --outdir ../test/generated
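For orientation, a frictionless Data Package is described by a JSON descriptor (conventionally datapackage.json) that lists the resources and their Table Schema. The sketch below builds such a descriptor for the example data with the standard library only. It is an illustration under assumptions, not the exact output of the echemdbconverters CLI: the resource name, the file path and the "unit" key on each field are hypothetical.

```python
import json

# Column names and units of the example file, split by hand.
columns = [("t", "s"), ("E", "V"), ("j", "A/cm2")]

descriptor = {
    "resources": [
        {
            "name": "example",      # hypothetical resource name
            "path": "example.csv",  # hypothetical data file
            "format": "csv",
            "mediatype": "text/csv",
            "schema": {
                "fields": [
                    # Standard Table Schema keys (name, type) plus a
                    # "unit" key assumed here for illustration only.
                    {"name": name, "type": "number", "unit": unit}
                    for name, unit in columns
                ]
            },
        }
    ]
}

print(json.dumps(descriptor, indent=2))
```

Comparing this sketch with the descriptors written to ../test/generated shows how the actual output is organized.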

Further usage

Use echemdb’s unitpackage to browse, modify, and visualize the Data Packages.

from unitpackage.collection import Collection
db = Collection.from_local('../test/generated')
entry = db['eclab_cv']
entry
Entry('eclab_cv')

Installation

This package is available on PyPI and can be installed with pip:

pip install echemdbconverters

See the installation instructions for further details.

License

The contents of this repository are licensed under the GNU General Public License v3.0 or, at your option, any later version.