Loaders
unitpackage.loaders provides modules for loading non-standardized DSV (delimiter-separated value) files, e.g. TSV (tab-separated) or CSV (comma-separated), commonly created from software used to operate laboratory equipment.
Key issues of these files are, for example, lengthy header lines containing various metadata relevant to the recording software,
the use of a comma (,) as a decimal separator in some world regions,
or files containing multiple data tables, etc.
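The decimal-comma issue can be illustrated with a minimal sketch in plain Python. The helper name `parse_decimal_comma` and the semicolon-separated sample row are assumptions made for illustration; they are not part of unitpackage.loaders.

```python
def parse_decimal_comma(value: str) -> float:
    """Convert a number written with a decimal comma, e.g. '1,25', to a float."""
    # Hypothetical helper: replaces the decimal comma before calling float().
    return float(value.strip().replace(",", "."))


# A semicolon-separated row, as written by software that uses a decimal comma.
row = "0,5;1,25;-3,0"
values = [parse_decimal_comma(cell) for cell in row.split(";")]
print(values)  # [0.5, 1.25, -3.0]
```

The sketch does not handle thousands separators, and it only works because the column delimiter differs from the decimal separator.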
unitpackage.loaders provides modules to parse such non-standard files and load the data directly as a pandas DataFrame, which unitpackage uses to create frictionless Data Packages (or unitpackages, which support the use of units).
The CLI allows conversion of data directly into such Data Packages for seamless integration in existing workflows.
Our approach aims at providing a single interface to load data into a certain format independent of the data source.
File types supported and tested by unitpackage.loaders are:
| Manufacturer | Device type | Software | Filesuffix | Loader | device |
|---|---|---|---|---|---|
| Biologic | Potentiostat | EClab | mpt | EClabLoader | eclab |
| Gamry | Potentiostat | Gamry Instruments Framework | DTA | GamryLoader | gamry |
Todo
Improve table, such as including links.
Examples
Consider the following DSV file, which consists of three parts:
the header, usually containing metadata relevant to the software and user predefined settings,
column header lines, containing acronyms (dimensions) and often units for the data in one or more rows,
the data part, where each column consists of identical data types.
# I am messy data
Random stuff
maybe metadata : 3
in different formats = abc123
hopefully, some information
on where the data part starts!
t E j
s V A/cm2
0 0 0
1 1 1
2 2 2
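The three parts described above can be separated purely by line count. The following is a minimal sketch in plain Python, not the unitpackage implementation: the function `split_dsv` is hypothetical, and the example file is assumed to be tab-separated.

```python
from io import StringIO


def split_dsv(text: str, header_lines: int, column_header_lines: int):
    """Split raw file content into header, column header, and data parts."""
    # Hypothetical helper for illustration only.
    lines = text.splitlines(keepends=True)
    header_end = header_lines
    columns_end = header_lines + column_header_lines
    return (
        StringIO("".join(lines[:header_end])),
        StringIO("".join(lines[header_end:columns_end])),
        StringIO("".join(lines[columns_end:])),
    )


# The example file from above, assuming tabs as delimiters.
raw = (
    "# I am messy data\n"
    "Random stuff\n"
    "maybe metadata : 3\n"
    "in different formats = abc123\n"
    "hopefully, some information\n"
    "on where the data part starts!\n"
    "t\tE\tj\n"
    "s\tV\tA/cm2\n"
    "0\t0\t0\n"
    "1\t1\t1\n"
    "2\t2\t2\n"
)

header, column_headers, data = split_dsv(raw, header_lines=6, column_header_lines=2)
print(column_headers.read())
```

Returning file-like objects keeps each part readable on its own, so the data part can be handed to any CSV parser.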
A pandas DataFrame can be created from such a file with only a few arguments: the number of header lines and the number of column header lines.
The delimiter of the data part is evaluated using the clevercsv module (unless specified as an argument).
Multiple column headers will be flattened.
from unitpackage.loaders.baseloader import BaseLoader
# `file` is a file-like object holding the content shown above.
csv = BaseLoader(file, header_lines=6,
                 column_header_lines=2,
                 delimiters=None,
                 decimal=None)
csv.df
| | t / s | E / V | j / A/cm2 |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | 1 |
| 2 | 2 | 2 | 2 |
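How delimiter detection and header flattening might work can be sketched with the standard library alone. This is an illustration under stated assumptions, not the unitpackage code: `csv.Sniffer` stands in for clevercsv, the example file is assumed to be tab-separated, and joining header cells with " / " is inferred from the column names in the table above.

```python
import csv

# Column header lines and data part of the example file (assumed tab-separated).
column_header_lines = ["t\tE\tj", "s\tV\tA/cm2"]
data_part = "0\t0\t0\n1\t1\t1\n2\t2\t2\n"

# Guess the delimiter from the data part, restricted to a few common candidates.
# csv.Sniffer is a stand-in here; the loaders use clevercsv for this step.
dialect = csv.Sniffer().sniff(data_part, delimiters="\t,;")
delimiter = dialect.delimiter

# Split every header line into cells, then join the cells column by column.
rows = [line.split(delimiter) for line in column_header_lines]
flat_headers = [" / ".join(cells) for cells in zip(*rows)]
print(flat_headers)
```

Sniffing the data part rather than the whole file avoids being misled by punctuation in the free-form header.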
All parts of the file are accessible from the API for further use, for example to extract metadata from the header.
print(csv.header.read())
# I am messy data
Random stuff
maybe metadata : 3
in different formats = abc123
hopefully, some information
on where the data part starts!
print(csv.column_headers.read())
t E j
s V A/cm2
print(csv.data.read())
0 0 0
1 1 1
2 2 2
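As one possible use of the header part, key-value pairs can be pulled out with a regular expression. The following is a minimal sketch that assumes metadata lines use either ":" or "=" as a separator, as in the example header; the pattern and the variable names are illustrative and not part of unitpackage.

```python
import re

# Header text as returned by reading the header part of the example file.
header_text = (
    "# I am messy data\n"
    "Random stuff\n"
    "maybe metadata : 3\n"
    "in different formats = abc123\n"
    "hopefully, some information\n"
    "on where the data part starts!\n"
)

# Match 'key : value' or 'key = value'; lines without a separator are skipped.
pattern = re.compile(r"^\s*(?P<key>[^:=]+?)\s*[:=]\s*(?P<value>.+?)\s*$")

metadata = {}
for line in header_text.splitlines():
    match = pattern.match(line)
    if match:
        metadata[match.group("key")] = match.group("value")

print(metadata)
# {'maybe metadata': '3', 'in different formats': 'abc123'}
```

All values are kept as strings; converting them to numbers or units is left to the caller, since header formats differ between vendors.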
The data can also be converted into frictionless Data Packages using the CLI.
Note
The input and output files of the following commands can be found in the test folder of the repository.
The CLI only works for standard CSV files without a header and with a single column header line, and for the specific file types summarized in the table above.
A “standard” CSV
!unitpackage csv ../../test/loader_data/default.csv --outdir ../../test/generated/loader_data
A specific file type, including additional YAML metadata.
!unitpackage csv ../../test/loader_data/eclab_cv.mpt --device eclab --metadata ../../test/loader_data/eclab_cv.mpt.metadata --outdir ../../test/generated/loader_data/
Further usage
Use echemdb's unitpackage to browse, modify, and visualize the Data Packages.
from unitpackage.collection import Collection
db = Collection.from_local('../../test/generated/loader_data/')
entry = db['eclab_cv']
entry
Entry('eclab_cv')