echemdbconverters.baseloader

Loader for CSV files (https://datatracker.ietf.org/doc/html/rfc4180), which consist of a single header line containing the column (field) names followed by rows of comma-separated values.

TODO:: Include module imports in each doctest.

TODO:: Reword.

In pandas, the names of the columns are referred to as column_names, whereas in a frictionless datapackage the column names are called fields. The datapackage contains additional information such as the type of the data, a title, and a set of descriptors.
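To make the two vocabularies concrete, the following minimal sketch maps a list of column names to a `fields` list as used in a Table Schema style descriptor. It relies on plain dictionaries only. The helper name `fields_from_column_names` and the sample title are hypothetical and are not part of echemdbconverters or frictionless.

```python
def fields_from_column_names(column_names, titles=None):
    """Build a minimal 'fields' list with one descriptor per column name."""
    titles = titles or {}
    fields = []
    for name in column_names:
        field = {"name": name}
        # A title is an optional, human-readable descriptor of the field.
        if name in titles:
            field["title"] = titles[name]
        fields.append(field)
    return fields


# Hypothetical column names and title, chosen for illustration only.
schema = {"fields": fields_from_column_names(["t", "E"], titles={"E": "Potential"})}
print(schema)
```

Further descriptors, such as the type of the data, could be added to each field dictionary in the same way.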

TODO:: Add examples for the following properties.

The CSV object has the following properties:

  • a DataFrame
  • the column names
  • the header contents
  • the number of header lines

Loaders for non-standard CSV files can be requested by device name via BaseLoader.create().

TODO:: Add example

class echemdbconverters.baseloader.BaseLoader(file, header_lines=None, column_header_lines=None, decimal=None, delimiters=None)

Loads a CSV whose first line contains the column (field) names and whose remaining lines contain comma-separated values.

EXAMPLES:

>>> from io import StringIO
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> csv.df
   a  b
0  0  0
1  1  1

A list of column names:

>>> csv.column_header_names
['a', 'b']

TODO: Link to device list in the documentation.

More specific loaders can be selected:

>>> from io import StringIO
>>> file = StringIO('''EC-Lab ASCII FILE
... Nb header lines : 6
...
... Device metadata : some metadata
...
... mode\ttime/s\tEwe/V\t<I>/mA\tcontrol/V
... 2\t0\t0.1\t0\t0
... 2\t1\t1.4\t5\t1
... ''')
>>> from echemdbconverters.baseloader import BaseLoader
>>> csv = BaseLoader.create('eclab')(file)
>>> csv.df
   mode  time/s  Ewe/V  <I>/mA  control/V
0     2       0    0.1       0          0
1     2       1    1.4       5          1

property column_header_lines

The number of lines containing the descriptive information of the data for each column.

EXAMPLES:

A file with a single column header line:

>>> from io import StringIO
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> csv.column_header_lines
1

A file with two column header lines:

>>> from io import StringIO
>>> file = StringIO(r'''a,b
... x,y
... 0,0
... 1,1''')
>>> csv = BaseLoader(file, column_header_lines=2)
>>> csv.column_header_lines
2

property column_header_names

A list of column header names constructed from the lines containing the column header names.

EXAMPLES:

A file with a single column header line:

>>> from io import StringIO
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> csv.column_header_names
['a', 'b']

For a file containing two or more column header lines, a single name is created for each column by joining the entries of all column header lines with " / ":

>>> from io import StringIO
>>> file = StringIO(r'''T,v
... K,m/s
... 0,0
... 1,1''')
>>> csv = BaseLoader(file, column_header_lines=2)
>>> csv.column_header_names
['T / K', 'v / m/s']
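The joining of several column header lines can be sketched with the standard library alone. This is an illustration of the idea, not the implementation used by BaseLoader; the function name `combine_column_headers` is hypothetical.

```python
import csv
from io import StringIO


def combine_column_headers(column_headers, delimiter=","):
    """Join the entries of all column header lines, column by column."""
    rows = list(csv.reader(column_headers, delimiter=delimiter))
    # zip(*rows) regroups the header rows into one tuple per column.
    return [" / ".join(parts) for parts in zip(*rows)]


names = combine_column_headers(StringIO("T,v\nK,m/s\n"))
print(names)
```

With a single column header line the names are returned unchanged, since each column then consists of one entry only.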

property column_headers

The lines in the file containing the descriptive information of the data for each column.

EXAMPLES:

A file with a single column header line:

>>> from io import StringIO
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> csv.column_headers.readlines()
['a,b\n']

A file with two column header lines, where the second line is sometimes used, for example, to store the units of the values:

>>> from io import StringIO
>>> file = StringIO(r'''T,v
... K,m/s
... 0,0
... 1,1''')
>>> csv = BaseLoader(file, column_header_lines=2)
>>> csv.column_headers.readlines()
['T,v\n', 'K,m/s\n']

static create(device=None)

Returns the specific loader class for a given device. The returned class is then called with the file.

EXAMPLES:

>>> from io import StringIO
>>> file = StringIO('''EC-Lab ASCII FILE
... Nb header lines : 6
...
... Device metadata : some metadata
...
... mode\ttime/s\tEwe/V\t<I>/mA\tcontrol/V
... 2\t0\t0.1\t0\t0
... 2\t1\t1.4\t5\t1
... ''')
>>> csv = BaseLoader.create('eclab')(file)
>>> csv.df
   mode  time/s  Ewe/V  <I>/mA  control/V
0     2       0    0.1       0          0
1     2       1    1.4       5          1
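The selection of a loader by device name follows a factory pattern, which can be sketched as below. The classes `SketchLoader` and `SketchECLabLoader` and the registry `_LOADERS` are hypothetical stand-ins; how echemdbconverters registers its device loaders is not shown in this documentation.

```python
class SketchLoader:
    """Hypothetical stand-in for a base loader."""

    def __init__(self, file):
        self.file = file

    @staticmethod
    def create(device=None):
        """Return the loader class registered for ``device``."""
        if device is None:
            return SketchLoader
        try:
            return _LOADERS[device]
        except KeyError:
            raise KeyError(f"Device '{device}' is unknown to the loader.") from None


class SketchECLabLoader(SketchLoader):
    """Hypothetical stand-in for a device specific loader."""


# Registry mapping device names to loader classes (an assumption of this sketch).
_LOADERS = {"eclab": SketchECLabLoader}

# create() returns a class, which is then instantiated with the file.
loader = SketchLoader.create("eclab")("some file")
print(type(loader).__name__)
```

Returning the class rather than an instance keeps the call signature of the loaders independent of the factory, which matches the usage `BaseLoader.create('eclab')(file)` shown above.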

property data

A file-like object with the data of the CSV without header lines.

EXAMPLES:

>>> from io import StringIO
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> type(csv.data)
<class '_io.StringIO'>

>>> from io import StringIO
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> csv.data.readlines()
['0,0\n', '1,1']
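How a file can be divided into header, column headers, and data may be easier to see in a small sketch. The function `split_sections` is hypothetical and only assumes that the number of header lines and column header lines is known; it is not the implementation used by BaseLoader.

```python
from io import StringIO


def split_sections(file, header_lines=0, column_header_lines=1):
    """Return (header, column_headers, data) as separate file-like objects."""
    file.seek(0)
    lines = file.readlines()
    data_start = header_lines + column_header_lines
    header = StringIO("".join(lines[:header_lines]))
    column_headers = StringIO("".join(lines[header_lines:data_start]))
    data = StringIO("".join(lines[data_start:]))
    return header, column_headers, data


file = StringIO("T,v\nK,m/s\n0,0\n1,1")
# Read each section once and keep the lines for inspection.
sections = [section.readlines() for section in split_sections(file, column_header_lines=2)]
print(sections)
```

Each section is returned as its own StringIO, so reading one section does not move the position in the others or in the original file.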

property decimal

The decimal separator in the floats in the CSV data.

EXAMPLES:

A standard CSV containing floats with a single header line:

>>> from io import StringIO
>>> file = StringIO('''a,b
... 0.0,0.0
... 1.0,1.0''')
>>> csv = BaseLoader(file)
>>> csv.decimal
'.'

For a CSV containing only integers, no decimal separator occurs in the data and '.' is returned:

>>> from io import StringIO
>>> file = StringIO('''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> csv.decimal
'.'

A standard CSV containing integers and floats with a single header line:

>>> from io import StringIO
>>> file = StringIO('''a,b
... 0,0.0
... 1,1.0''')
>>> csv = BaseLoader(file)
>>> csv.decimal
'.'

A TSV containing floats with a single header line:

>>> from io import StringIO
>>> file = StringIO('''a\tb
... 0\t0.0
... 1\t1.0''')
>>> csv = BaseLoader(file)
>>> csv.decimal
'.'

A TSV containing integers and floats using , as decimal separator with a single header line:

>>> from io import StringIO
>>> file = StringIO('''a\tb
... 0\t0,0
... 1\t1,0''')
>>> csv = BaseLoader(file)
>>> csv.decimal
','

Data rows containing ‘.’ in the numeric values and ‘,’ only in the text values:

>>> from io import StringIO
>>> file = StringIO('''a\tb\ttext
... 0.0\t0.0\ta,b
... 1.0\t1.0\tc,d''')
>>> csv = BaseLoader(file)
>>> csv.decimal
'.'

Data rows containing both ‘.’ and ‘,’ in the numeric values, for which the decimal separator cannot be determined:

>>> from io import StringIO
>>> file = StringIO('''a\tb\ttext
... 0.1\t0,0\ta,b
... 1.1\t1,0\tc,d''')
>>> csv = BaseLoader(file)
>>> csv.decimal
Traceback (most recent call last):
...
ValueError: Decimal separator could not be determined. Found both ',' and '.' in numeric values in a single data line.

Implementation in a specific device loader:

>>> from io import StringIO
>>> file = StringIO('''EC-Lab ASCII FILE
... Nb header lines : 6
...
... Device metadata : some metadata
...
... mode\ttime/s\tEwe/V\t<I>/mA\tcontrol/V
... 2\t0\t0,1\t0\t0
... 2\t1\t1,4\t5\t1
... ''')
>>> csv = BaseLoader.create('eclab')(file)
>>> csv.decimal
','
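The behavior illustrated by the examples above can be approximated by inspecting the numeric values of a single data line. The following is a rough sketch under stated assumptions, not the implementation used by BaseLoader: the function name `guess_decimal` and the regular expression are hypothetical, and '.' is assumed as the fallback when no numeric value contains a separator.

```python
import re

# Optional sign, digits, optional separator with digits, optional exponent.
NUMBER = re.compile(r"^[+-]?\d+(?:([.,])\d+)?(?:[eE][+-]?\d+)?$")


def guess_decimal(line, delimiter):
    """Guess the decimal separator from the numeric values of one data line."""
    found = set()
    for value in line.rstrip("\n").split(delimiter):
        match = NUMBER.match(value.strip())
        # Text values such as 'a,b' do not match and are ignored.
        if match and match.group(1):
            found.add(match.group(1))
    if len(found) > 1:
        raise ValueError(
            "Decimal separator could not be determined. "
            "Found both ',' and '.' in numeric values in a single data line."
        )
    return found.pop() if found else "."


print(guess_decimal("0\t0,0", "\t"))
print(guess_decimal("0.0\t0.0\ta,b", "\t"))
```

The delimiter has to be known before the decimal separator can be guessed: with ',' as the delimiter, the line `0,0` consists of two integers, whereas with a tab delimiter it is a single float.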

property delimiter

The delimiter in the CSV, which is extracted from the first two lines of the CSV data.

EXAMPLES:

A CSV containing integers:

>>> from io import StringIO
>>> file = StringIO('''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> csv.delimiter
','

A CSV containing floats:

>>> from io import StringIO
>>> file = StringIO('''a,b
... 0.0,0.0
... 1.0,1.0''')
>>> csv = BaseLoader(file)
>>> csv.delimiter
','

A TSV containing floats with a single header line:

>>> from io import StringIO
>>> file = StringIO('''a\tb
... 0\t0.0
... 1\t1.0''')
>>> csv = BaseLoader(file)
>>> csv.delimiter
'\t'

A TSV with three columns containing floats using , as decimal separator with a single header line:

>>> from io import StringIO
>>> file = StringIO('''a\tb\tc
... 0,0\t0,0\t0,0
... 1,1\t1,0\t0,0''')
>>> csv = BaseLoader(file)
>>> csv.delimiter
'\t'

A TSV with two columns containing floats using , as decimal separator with a single header line:

>>> from io import StringIO
>>> file = StringIO('''a\tb
... 0,0\t0,0
... 1,1\t1,0''')
>>> csv = BaseLoader(file)
>>> csv.delimiter
'\t'

A TSV containing integers and floats using , as decimal separator with a single header line:

>>> from io import StringIO
>>> file = StringIO('''a\tb
... 0\t0,0
... 1\t1,0
... ''')
>>> csv = BaseLoader(file)
>>> csv.delimiter
'\t'

A rather messy file:

>>> from io import StringIO
>>> file = StringIO('''# I am messy data
... Random stuff
... maybe metadata : 3
... in different formats = abc123
... hopefully, some information
... on where the data block starts!
... t\tE\tj
... s\tV\tA/cm2
... 0\t0\t0
... 1\t1\t1
... 2\t2\t2
... ''')
>>> csv = BaseLoader(file)
>>> csv.delimiter
'\t'
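One simple way to extract a delimiter from the first two data lines is to look for a candidate character that occurs equally often in both. The sketch below illustrates this idea only; the function name `guess_delimiter`, the list of candidates, and their order are assumptions and do not describe the algorithm used by BaseLoader.

```python
def guess_delimiter(data_lines, candidates=("\t", ";", ",")):
    """Guess the delimiter from the first two lines of the data block."""
    first, second = data_lines[0], data_lines[1]
    # ',' is tested last because it may also serve as decimal separator.
    for candidate in candidates:
        count = first.count(candidate)
        if count > 0 and count == second.count(candidate):
            return candidate
    raise ValueError("Delimiter could not be determined.")


print(repr(guess_delimiter(["0,0\n", "1,1"])))
print(repr(guess_delimiter(["0,0\t0,0\t0,0\n", "1,1\t1,0\t0,0"])))
```

In the second call both the tab and the comma occur equally often in each line, so the order of the candidates decides the result.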

property df

A pandas DataFrame containing the data of the CSV.

EXAMPLES:

>>> from io import StringIO
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> csv.df
   a  b
0  0  0
1  1  1

property file

A file-like object of the loaded file.

EXAMPLES:

>>> from io import StringIO
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> type(csv.file)
<class '_io.StringIO'>

property header

The header of the CSV (excluding column names).

EXAMPLES:

>>> from io import StringIO
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> type(csv.header)
<class '_io.StringIO'>

Files for the base loader do not have a header, so the header is empty:

>>> from io import StringIO
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> csv.header.readlines()
[]

property header_lines

The number of header lines in a CSV excluding the line with the column names.

EXAMPLES:

Files for the base loader do not have a header:

>>> from io import StringIO
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> csv.header_lines
0

Implementation in a specific device loader:

>>> file = StringIO('''EC-Lab ASCII FILE
... Nb header lines : 6
...
... Device metadata : some metadata
...
... mode\ttime/s\tEwe/V\t<I>/mA\tcontrol/V
... 2\t0\t0,1\t0\t0
... 2\t1\t1,4\t5\t1
... ''')
>>> csv = BaseLoader.create('eclab')(file)
>>> csv.header_lines
5
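In the example above the file declares 6 header lines while header_lines is 5, which suggests that the declared number includes the line with the column names. Under that assumption, the count could be derived as in the following sketch. The function name `eclab_header_lines` is hypothetical, and the parsing shown here is not taken from the eclab loader.

```python
import re
from io import StringIO


def eclab_header_lines(file, column_header_lines=1):
    """Read the declared header line count and exclude the column header lines."""
    file.seek(0)
    for line in file:
        match = re.match(r"Nb header lines\s*:\s*(\d+)", line)
        if match:
            # Assumption: the declared number includes the column names line.
            return int(match.group(1)) - column_header_lines
    raise ValueError("No 'Nb header lines' entry found.")


file = StringIO(
    "EC-Lab ASCII FILE\n"
    "Nb header lines : 6\n"
    "\n"
    "Device metadata : some metadata\n"
    "\n"
    "mode\ttime/s\tEwe/V\t<I>/mA\tcontrol/V\n"
    "2\t0\t0,1\t0\t0\n"
    "2\t1\t1,4\t5\t1\n"
)
print(eclab_header_lines(file))
```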

property metadata

A dict containing the metadata of the file found in its header.

EXAMPLES:

>>> from io import StringIO
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> csv.metadata
Traceback (most recent call last):
...
NotImplementedError