Loaders

unitpackage.loaders provides modules for loading non-standardized DSV (delimiter-separated values) files, e.g., TSV (tab-separated) or CSV (comma-separated), as commonly created by the software used to operate laboratory equipment. Typical issues with these files include lengthy header lines containing metadata relevant to the recording software, the use of a comma (,) as the decimal separator in some world regions, and files containing multiple data tables.

These modules parse such non-standard files and load the data directly as a pandas DataFrame, which unitpackage uses to create frictionless Data Packages (or unitpackages, which support the use of units). The CLI converts data directly into such Data Packages for seamless integration into existing workflows.

Our approach aims to provide a single interface for loading data into a common format, independent of the data source. The file types supported and tested by unitpackage.loaders are:

Manufacturer	Device type	Software	Filesuffix	Loader	device
Biologic	Potentiostat	EClab	mpt	EClabLoader	eclab
Gamry	Potentiostat	Gamry Instruments Framework	DTA	GamryLoader	gamry

Todo

Improve table, such as including links.

Examples

Consider the following DSV file, which consists of three parts:

the header, usually containing metadata relevant to the software and user predefined settings,
column header lines, containing acronyms (dimensions) and often units for the data in one or more rows,
the data part, where each column consists of identical data types.

# I am messy data
Random stuff
maybe metadata : 3
in different formats = abc123
hopefully, some information
on where the data part starts!
t	E	j
s	V	A/cm2
0	0	0
1	1	1
2	2	2

A pandas DataFrame can be created from this file with only a few arguments. The delimiter of the data part is detected automatically (unless specified as an argument). Multiple column header lines are flattened into a single one.

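from unitpackage.loaders.baseloader import BaseLoader

# `file` is assumed to be a file-like object (e.g. an open file or io.StringIO)
# holding the messy data shown above.
csv = BaseLoader(file, header_lines=6,
                 column_header_lines=2,
                 delimiters=None,  # None: detect the delimiter automatically
                 decimal=None)
csv.df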

	t / s	E / V	j / A/cm2
0	0	0	0
1	1	1	1
2	2	2	2
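The delimiter detection and header flattening described above can be approximated with the standard library alone. The following is a minimal sketch under stated assumptions, not the implementation used by unitpackage: the helper names guess_delimiter and load_table are hypothetical, and the delimiter is guessed with a simple "same count in every data line" heuristic.

```python
RAW = """# I am messy data
Random stuff
maybe metadata : 3
in different formats = abc123
hopefully, some information
on where the data part starts!
t\tE\tj
s\tV\tA/cm2
0\t0\t0
1\t1\t1
2\t2\t2
"""


def guess_delimiter(lines, candidates=("\t", ";", ",")):
    """Return the first candidate that occurs equally often (at least once) in every line."""
    for candidate in candidates:
        counts = {line.count(candidate) for line in lines}
        if len(counts) == 1 and counts.pop() > 0:
            return candidate
    raise ValueError("no consistent delimiter found")


def load_table(text, header_lines, column_header_lines):
    """Split the text into its parts and return flattened column names and numeric rows."""
    lines = text.splitlines()
    start = header_lines + column_header_lines
    column_part = lines[header_lines:start]
    data_part = [line for line in lines[start:] if line.strip()]

    # Only the data part is inspected, since the header may contain stray commas.
    delimiter = guess_delimiter(data_part)

    # Flatten the column header lines column by column: ("t", "s") becomes "t / s".
    header_rows = [line.split(delimiter) for line in column_part]
    columns = [" / ".join(parts) for parts in zip(*header_rows)]

    rows = [[float(cell) for cell in line.split(delimiter)] for line in data_part]
    return columns, rows


columns, rows = load_table(RAW, header_lines=6, column_header_lines=2)
print(columns)  # ['t / s', 'E / V', 'j / A/cm2']
```

The resulting columns and rows could then be handed to pandas, for instance as pandas.DataFrame(rows, columns=columns).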

All parts of the file are accessible from the API for further use, for example to extract metadata from the header.

print(csv.header.read())

# I am messy data
Random stuff
maybe metadata : 3
in different formats = abc123
hopefully, some information
on where the data part starts!
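Once the header is available as text, simple key-value metadata can be pulled out of it. The sketch below is one possible approach, assuming that metadata lines use either ":" or "=" as a separator; the function parse_header and the pattern are hypothetical and not part of unitpackage. The header is hard-coded here, but it could equally be the string returned by csv.header.read().

```python
import re

HEADER = """# I am messy data
Random stuff
maybe metadata : 3
in different formats = abc123
hopefully, some information
on where the data part starts!
"""

# A key, then ":" or "=" as separator, then a value; surrounding whitespace is dropped.
PATTERN = re.compile(r"^\s*(?P<key>[^:=]+?)\s*[:=]\s*(?P<value>.+?)\s*$")


def parse_header(text):
    """Collect all 'key : value' and 'key = value' lines into a dictionary of strings."""
    metadata = {}
    for line in text.splitlines():
        match = PATTERN.match(line)
        if match:
            metadata[match.group("key")] = match.group("value")
    return metadata


metadata = parse_header(HEADER)
print(metadata)  # {'maybe metadata': '3', 'in different formats': 'abc123'}
```

Lines without a separator are skipped, so free-text header lines do not end up in the metadata. Values are kept as strings; converting them to numbers is left to the caller.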

print(csv.column_headers.read())

t	E	j
s	V	A/cm2

print(csv.data.read())

0	0	0
1	1	1
2	2	2

CLI

The data can also be converted into frictionless Data Packages using the CLI.

Note

The input and output files of the following commands can be found in the test folder of the repository.

The CLI only works for standard CSV files without a header and with a single column header line, and for the specific loaders listed in the table above.

A “standard” CSV

!unitpackage csv ../../test/loader_data/default.csv --outdir ../../test/generated/loader_data