`unitpackage.loaders.baseloader`

Loader for CSV and other delimiter-separated value files.

The BaseLoader reads files consisting of an optional header, one or more column-name lines, and data rows with a consistent delimiter. Delimiters and decimal separators are auto-detected when not specified explicitly.

Device-specific loaders (e.g. EC-Lab, Gamry) can be selected via BaseLoader.create() to handle non-standard header layouts. See BaseLoader.known_loaders() for supported devices.

class unitpackage.loaders.baseloader.BaseLoader(file, header_lines=None, column_header_lines=None, decimal=None, delimiter=None, candidate_delimiters=None)

Loads a CSV file in which the first line contains the column (field) names and the following lines contain comma-separated values.

EXAMPLES:

>>> from io import StringIO
>>> from unitpackage.loaders.baseloader import BaseLoader
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> csv.df
   a  b
0  0  0
1  1  1

A list of column names:

>>> csv.column_header_names
['a', 'b']

More specific loaders can be selected via create() (see known_loaders() for supported devices):

>>> from io import StringIO
>>> file = StringIO('''EC-Lab ASCII FILE
... Nb header lines : 6
...
... Device metadata : some metadata
...
... mode\ttime/s\tEwe/V\t<I>/mA\tcontrol/V
... 2\t0\t0.1\t0\t0
... 2\t1\t1.4\t5\t1
... ''')
>>> csv = BaseLoader.create('eclab')(file)
>>> csv.df
   mode  time/s  Ewe/V  <I>/mA  control/V
0     2       0    0.1       0          0
1     2       1    1.4       5          1

Candidate delimiters for autodetection can be provided explicitly:

>>> from io import StringIO
>>> file = StringIO('''a\tb
... 0\t0
... 1\t1''')
>>> csv = BaseLoader(file, candidate_delimiters=[';', '\t'])
>>> csv.delimiter
'\t'

property column_header_lines

The number of lines containing the descriptive information of the data for each column.

EXAMPLES:

A file with a single column header line:

>>> from io import StringIO
>>> from unitpackage.loaders.baseloader import BaseLoader
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> csv.column_header_lines
1

A file with two column header lines:

>>> from io import StringIO
>>> file = StringIO(r'''a,b
... x,y
... 0,0
... 1,1''')
>>> csv = BaseLoader(file, column_header_lines=2)
>>> csv.column_header_lines
2

property column_header_names

A list of column header names constructed from the lines containing the column header names.

EXAMPLES:

A file with a single column header line:

>>> from io import StringIO
>>> from unitpackage.loaders.baseloader import BaseLoader
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> csv.column_header_names
['a', 'b']

For a file containing two or more column header lines, a single name is created for each column by joining the entries from all header lines, separated by " / ":

>>> from io import StringIO
>>> file = StringIO(r'''T,v
... K,m/s
... 0,0
... 1,1''')
>>> csv = BaseLoader(file, column_header_lines=2)
>>> csv.column_header_names
['T / K', 'v / m/s']
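The joining rule shown above can be sketched in plain Python. This is only an illustration of the idea, not the BaseLoader implementation; the function name `merge_header_lines` and its parameters are hypothetical.

```python
def merge_header_lines(lines, delimiter=","):
    """Combine several column header lines into one name per column.

    Hypothetical helper for illustration; not part of unitpackage.
    """
    # Split every header line into its fields.
    rows = [line.rstrip("\r\n").split(delimiter) for line in lines]
    width = max(len(row) for row in rows)
    names = []
    for i in range(width):
        # Collect the non-empty entries of column i, top to bottom.
        parts = [row[i].strip() for row in rows if i < len(row) and row[i].strip()]
        names.append(" / ".join(parts))
    return names


print(merge_header_lines(["T,v\n", "K,m/s\n"]))  # ['T / K', 'v / m/s']
print(merge_header_lines(["a,b\n"]))             # ['a', 'b']
```

A column whose entries are all blank yields an empty string in this sketch; labeling such columns is a separate step.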

A file where header and data lines have a leading delimiter. The leading empty field is preserved and auto-labeled:

>>> from io import StringIO
>>> file = StringIO(''',a,b
... ,0,0
... ,1,1''')
>>> csv = BaseLoader(file, delimiter=',')
>>> csv.column_header_names
['unknown 1', 'a', 'b']

property column_headers

The lines in the file containing the descriptive information of the data for each column.

EXAMPLES:

A file with a single column header line:

>>> from io import StringIO
>>> from unitpackage.loaders.baseloader import BaseLoader
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> csv.column_headers.readlines()
['a,b\n']

A file with two column header lines, where the second line is used, for example, to store the units of the values:

>>> from io import StringIO
>>> file = StringIO(r'''T,v
... K,m/s
... 0,0
... 1,1''')
>>> csv = BaseLoader(file, column_header_lines=2)
>>> csv.column_headers.readlines()
['T,v\n', 'K,m/s\n']

classmethod create(device=None)

Calls a specific loader based on a given device.

EXAMPLES:

>>> from io import StringIO
>>> from unitpackage.loaders.baseloader import BaseLoader
>>> file = StringIO('''EC-Lab ASCII FILE
... Nb header lines : 6
...
... Device metadata : some metadata
...
... mode\ttime/s\tEwe/V\t<I>/mA\tcontrol/V
... 2\t0\t0.1\t0\t0
... 2\t1\t1.4\t5\t1
... ''')
>>> csv = BaseLoader.create('eclab')(file)
>>> csv.df
   mode  time/s  Ewe/V  <I>/mA  control/V
0     2       0    0.1       0          0
1     2       1    1.4       5          1

Requesting an unknown device raises a KeyError that lists the supported loaders:

>>> BaseLoader.create('unknown_device')
Traceback (most recent call last):
...
KeyError: "Device wth name 'unknown_device' is not in the list of supported Loaders (['eclab', 'gamry'])'."
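The dispatch pattern behind create() can be sketched as a small registry lookup. The classes below are hypothetical stand-ins, and returning the base class when no device is given is an assumption; the actual unitpackage code may be organized differently.

```python
class Loader:
    """Hypothetical stand-in for a base loader; not the unitpackage class."""

    def __init__(self, file):
        self.file = file

    @classmethod
    def _registry(cls):
        # Device name -> loader class. Resolved at call time, so the
        # subclasses defined further down are already available.
        return {"eclab": ECLabLikeLoader, "gamry": GamryLikeLoader}

    @classmethod
    def known_loaders(cls):
        # Sorted for a stable listing.
        return sorted(cls._registry())

    @classmethod
    def create(cls, device=None):
        # Assumption: without a device name the generic loader is returned.
        if device is None:
            return cls
        try:
            return cls._registry()[device]
        except KeyError:
            raise KeyError(
                f"Device with name '{device}' is not in the list of "
                f"supported loaders ({cls.known_loaders()})."
            ) from None


class ECLabLikeLoader(Loader):
    """Placeholder for a device-specific loader."""


class GamryLikeLoader(Loader):
    """Placeholder for a device-specific loader."""


print(Loader.known_loaders())          # ['eclab', 'gamry']
print(Loader.create("eclab").__name__)  # ECLabLikeLoader
```

Note that create() returns a class, not an instance, which is why the examples above call `BaseLoader.create('eclab')(file)`.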

property data

A file-like object with the data of the CSV without header lines.

EXAMPLES:

>>> from io import StringIO
>>> from unitpackage.loaders.baseloader import BaseLoader
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> type(csv.data)
<class '_io.StringIO'>

>>> from io import StringIO
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> csv.data.readlines()
['0,0\n', '1,1']
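How the header, the column headers, and the data relate to the two line counts can be sketched with plain slicing. This is an assumption about the layout, based on the examples in this section, and not the loader's own code; `split_sections` is a hypothetical helper.

```python
from io import StringIO


def split_sections(file, header_lines=0, column_header_lines=1):
    """Return (header, column_headers, data) as StringIO objects.

    Hypothetical helper for illustration; not part of unitpackage.
    """
    file.seek(0)
    lines = file.readlines()
    first_data_line = header_lines + column_header_lines
    header = StringIO("".join(lines[:header_lines]))
    column_headers = StringIO("".join(lines[header_lines:first_data_line]))
    data = StringIO("".join(lines[first_data_line:]))
    return header, column_headers, data


file = StringIO("a,b\n0,0\n1,1")
header, column_headers, data = split_sections(file)
print(header.readlines())          # []
print(column_headers.readlines())  # ['a,b\n']
print(data.readlines())            # ['0,0\n', '1,1']
```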

property decimal

The decimal separator in the floats in the CSV data.

EXAMPLES:

A standard CSV containing floats with a single header line:

>>> from io import StringIO
>>> from unitpackage.loaders.baseloader import BaseLoader
>>> file = StringIO('''a,b
... 0.0,0.0
... 1.0,1.0''')
>>> csv = BaseLoader(file)
>>> csv.decimal
'.'

For a CSV containing only integers, no decimal separator can be found in the data and the default '.' is returned:

>>> from io import StringIO
>>> file = StringIO('''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> csv.decimal
'.'

A standard CSV containing integers with a single header line:

>>> from io import StringIO
>>> file = StringIO('''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> csv.decimal
'.'

A standard CSV containing integers and floats with a single header line:

>>> from io import StringIO
>>> file = StringIO('''a,b
... 0,0.0
... 1,1.0''')
>>> csv = BaseLoader(file)
>>> csv.decimal
'.'

A TSV containing floats with a single header line:

>>> from io import StringIO
>>> file = StringIO('''a\tb
... 0\t0.0
... 1\t1.0''')
>>> csv = BaseLoader(file)
>>> csv.decimal
'.'

A TSV containing integers and floats using , as decimal separator with a single header line:

>>> from io import StringIO
>>> file = StringIO('''a\tb
... 0\t0,0
... 1\t1,0''')
>>> csv = BaseLoader(file)
>>> csv.decimal
','

Data rows containing both ‘.’ and ‘,’, where ‘,’ appears only in non-numeric text fields:

>>> from io import StringIO
>>> file = StringIO('''a\tb\ttext
... 0.0\t0.0\ta,b
... 1.0\t1.0\tc,d''')
>>> csv = BaseLoader(file)
>>> csv.decimal
'.'

Data rows containing both ‘.’ and ‘,’ in numeric values raise an error:

>>> from io import StringIO
>>> file = StringIO('''a\tb\ttext
... 0.1\t0,0\ta,b
... 1.1\t1,0\tc,d''')
>>> csv = BaseLoader(file)
>>> csv.decimal
Traceback (most recent call last):
...
ValueError: Decimal separator could not be determined. Found both ',' and '.' in numeric values in a single data line.

Implementation in a specific device loader:

>>> from io import StringIO
>>> file = StringIO('''EC-Lab ASCII FILE
... Nb header lines : 6
...
... Device metadata : some metadata
...
... mode\ttime/s\tEwe/V\t<I>/mA\tcontrol/V
... 2\t0\t0,1\t0\t0
... 2\t1\t1,4\t5\t1
... ''')
>>> csv = BaseLoader.create('eclab')(file)
>>> csv.decimal
','
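The behavior in these examples can be approximated by inspecting only fields that look numeric. The sketch below is a guess at the logic, not the library's implementation; `detect_decimal` and the `NUMERIC` pattern are hypothetical, and it is slightly stricter than the documented error because it also rejects files where the two separators occur on different lines.

```python
import re

# A field counts as numeric if it consists of digits with at most one
# '.' or ',' inside, optionally signed and with an exponent.
NUMERIC = re.compile(r"^[+-]?\d+(?:[.,]\d+)?(?:[eE][+-]?\d+)?$")


def detect_decimal(data_lines, delimiter):
    """Guess the decimal separator from the numeric fields of the data rows.

    Hypothetical helper for illustration; not part of unitpackage.
    """
    found = set()
    for line in data_lines:
        for field in line.rstrip("\r\n").split(delimiter):
            field = field.strip()
            # Text fields such as 'a,b' do not match and are ignored.
            if NUMERIC.match(field):
                found.update(c for c in ".," if c in field)
        if len(found) > 1:
            raise ValueError(
                "Decimal separator could not be determined. "
                "Found both ',' and '.' in numeric values."
            )
    # Integer-only data gives no hint, so fall back to '.'.
    return found.pop() if found else "."


print(detect_decimal(["0\t0,0", "1\t1,0"], "\t"))            # ,
print(detect_decimal(["0.0\t0.0\ta,b", "1.0\t1.0\tc,d"], "\t"))  # .
print(detect_decimal(["0,0", "1,1"], ","))                   # .
```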

property delimiter

The delimiter in the CSV, which is extracted from the first two lines of the CSV data.

A CSV containing integers:

>>> from io import StringIO
>>> from unitpackage.loaders.baseloader import BaseLoader
>>> file = StringIO('''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> csv.delimiter
','

A CSV containing floats:

>>> from io import StringIO
>>> file = StringIO('''a,b
... 0.0,0.0
... 1.0,1.0''')
>>> csv = BaseLoader(file)
>>> csv.delimiter
','

A TSV containing floats with a single header line:

>>> from io import StringIO
>>> file = StringIO('''a\tb
... 0\t0.0
... 1\t1.0''')
>>> csv = BaseLoader(file)
>>> csv.delimiter
'\t'

A TSV with three columns containing floats using , as decimal separator with a single header line:

>>> from io import StringIO
>>> file = StringIO('''a\tb\tc
... 0,0\t0,0\t0,0
... 1,1\t1,0\t0,0''')
>>> csv = BaseLoader(file)
>>> csv.delimiter
'\t'

A TSV with two columns containing floats using , as decimal separator with a single header line:

>>> from io import StringIO
>>> file = StringIO('''a\tb
... 0,0\t0,0
... 1,1\t1,0''')
>>> csv = BaseLoader(file)
>>> csv.delimiter
'\t'

A TSV containing integers and floats using , as decimal separator with a single header line:

>>> from io import StringIO
>>> file = StringIO('''a\tb
... 0\t0,0
... 1\t1,0
... ''')
>>> csv = BaseLoader(file)
>>> csv.delimiter
'\t'

A rather messy file:

>>> from io import StringIO
>>> file = StringIO(('''t\tE\tj
... s\tV\tA/cm2
... 0\t0\t0
... 1\t1\t1
... 2\t2\t2
... '''))
>>> csv = BaseLoader(file)
>>> csv.delimiter
'\t'

Candidate delimiters are considered for sniffing even if the correct delimiter is not the first candidate:

>>> from io import StringIO
>>> file = StringIO('''a\tb\n0\t0\n1\t1''')
>>> csv = BaseLoader(file, candidate_delimiters=[';', '\t', ','])
>>> csv.delimiter
'\t'
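One simple way to pick a delimiter from a list of candidates is to require that it occurs equally often, and at least once, in every inspected line. The sketch below illustrates that heuristic only; it is not the detection used by BaseLoader, and `detect_delimiter` and its default candidates are hypothetical.

```python
def detect_delimiter(lines, candidates=("\t", ";", ",")):
    """Return the first candidate that splits all lines into the same number of fields.

    Hypothetical helper for illustration; not part of unitpackage.
    """
    # Ignore blank lines and strip line endings before counting.
    lines = [line.rstrip("\r\n") for line in lines if line.strip()]
    for candidate in candidates:
        counts = {line.count(candidate) for line in lines}
        # Consistent means: the same non-zero count on every line.
        if len(counts) == 1 and counts.pop() > 0:
            return candidate
    raise ValueError("No candidate delimiter splits all lines consistently.")


# The first two lines of a TSV that uses ',' as decimal separator.
print(repr(detect_delimiter(["a\tb\n", "0,0\t0,0\n"])))  # '\t'
# The correct delimiter does not have to be the first candidate.
print(repr(detect_delimiter(["a\tb\n", "0\t0\n"], candidates=[";", "\t", ","])))  # '\t'
print(repr(detect_delimiter(["a,b\n", "0.0,0.0\n"])))  # ','
```

In the TSV case the comma is rejected because it appears in the data line but not in the header line, so a decimal comma is not mistaken for a delimiter.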

Inconsistent field counts between column headers and data rows are logged as warnings. Blank column names are auto-labeled so that extra data fields remain accessible in the dataframe.

property df

A pandas dataframe of the data in the CSV.

EXAMPLES:

>>> from io import StringIO
>>> from unitpackage.loaders.baseloader import BaseLoader
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> csv.df
   a  b
0  0  0
1  1  1

A file with two column header lines, where the second line is used, for example, to store the units of the values:

>>> from io import StringIO
>>> file = StringIO(r'''a,b
... m,s
... 0,0
... 1,1''')
>>> csv = BaseLoader(file, column_header_lines=2)
>>> csv.df
   a / m  b / s
0      0      0
1      1      1

When the header has more fields than data rows, missing values are represented as NaN in the dataframe and trailing blank names are auto-labeled:

>>> import logging
>>> logging.getLogger("loader").setLevel(logging.ERROR)
>>> from io import StringIO
>>> file = StringIO('''a,b,
... 1,2
... 3,4''')
>>> csv = BaseLoader(file, delimiter=',')
>>> csv.column_header_names
['a', 'b', 'unknown 1']
>>> csv.df
   a  b  unknown 1
0  1  2        NaN
1  3  4        NaN

When data rows have more fields than the header, column names are auto-labeled so the extra data column stays available:

>>> from io import StringIO
>>> file = StringIO('''a,b
... 1,2,3
... 4,5,6''')
>>> csv = BaseLoader(file, delimiter=',')
>>> csv.column_header_names
['a', 'b', 'unknown 1']
>>> csv.df
   a  b  unknown 1
0  1  2          3
1  4  5          6

The header can contain trailing blank fields while the data rows also have more fields than there are named columns:

>>> from io import StringIO
>>> file = StringIO('''a,b,,
... 1,2,3
... 4,5,6''')
>>> csv = BaseLoader(file, delimiter=',')
>>> csv.column_header_names
['a', 'b', 'unknown 1', 'unknown 2']
>>> csv.df
   a  b  unknown 1  unknown 2
0  1  2          3        NaN
1  4  5          6        NaN

A file where header and data have a leading delimiter, but some data rows have a value in that leading position:

>>> import logging
>>> logging.getLogger("loader").setLevel(logging.ERROR)
>>> from io import StringIO
>>> file = StringIO(''',a,b
... ,0,0
... X,1,1''')
>>> csv = BaseLoader(file, delimiter=',')
>>> csv.column_header_names
['unknown 1', 'a', 'b']
>>> csv.df
  unknown 1  a  b
0       NaN  0  0
1         X  1  1
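The reconciliation of mismatched field counts shown in these examples can be sketched without pandas: pad names and rows to a common width, then label blank names. This is an illustration under assumptions, not the loader's code; `align_table` is hypothetical and uses None where the dataframe shows NaN.

```python
def align_table(names, rows):
    """Pad header names and data rows to a common width and label blank names.

    Hypothetical helper for illustration; not part of unitpackage.
    """
    width = max([len(names)] + [len(row) for row in rows])
    padded_names = list(names) + [""] * (width - len(names))
    labeled = []
    unknown = 0
    for name in padded_names:
        if name.strip():
            labeled.append(name)
        else:
            # Blank names are numbered in order of appearance.
            unknown += 1
            labeled.append(f"unknown {unknown}")
    # Short rows are padded with None, the stand-in here for NaN.
    padded_rows = [list(row) + [None] * (width - len(row)) for row in rows]
    return labeled, padded_rows


names, rows = align_table(["a", "b", "", ""], [["1", "2", "3"], ["4", "5", "6"]])
print(names)  # ['a', 'b', 'unknown 1', 'unknown 2']
print(rows)   # [['1', '2', '3', None], ['4', '5', '6', None]]
```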

property file

A file-like object of the loaded file.

EXAMPLES:

>>> from io import StringIO
>>> from unitpackage.loaders.baseloader import BaseLoader
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> type(csv.file)
<class '_io.StringIO'>

property header

The header of the CSV (excluding column names).

EXAMPLES:

>>> from io import StringIO
>>> from unitpackage.loaders.baseloader import BaseLoader
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> type(csv.header)
<class '_io.StringIO'>

For a file without header lines, the header is empty:

>>> from io import StringIO
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> csv.header.readlines()
[]

property header_lines

The number of header lines in a CSV excluding the line with the column names.

EXAMPLES:

Files for the base loader do not have a header:

>>> from io import StringIO
>>> from unitpackage.loaders.baseloader import BaseLoader
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> csv.header_lines
0

Implementation in a specific device loader:

>>> file = StringIO('''EC-Lab ASCII FILE
... Nb header lines : 6
...
... Device metadata : some metadata
...
... mode\ttime/s\tEwe/V\t<I>/mA\tcontrol/V
... 2\t0\t0,1\t0\t0
... 2\t1\t1,4\t5\t1
... ''')
>>> csv = BaseLoader.create('eclab')(file)
>>> csv.header_lines
5
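In the example above the file declares "Nb header lines : 6" and header_lines is 5, which is consistent with the declared count including the single line of column names. The sketch below illustrates that reading; it is an assumption drawn from this one example, not the EC-Lab loader's actual logic, and `eclab_like_header_lines` is a hypothetical helper.

```python
import re
from io import StringIO


def eclab_like_header_lines(file, column_header_lines=1):
    """Derive the number of header lines from an 'Nb header lines' declaration.

    Hypothetical helper for illustration; not part of unitpackage.
    """
    file.seek(0)
    for line in file:
        match = re.match(r"\s*Nb header lines\s*:\s*(\d+)", line)
        if match:
            # Assumption: the declared count includes the column name line(s).
            return int(match.group(1)) - column_header_lines
    raise ValueError("No 'Nb header lines' declaration found.")


file = StringIO(
    "EC-Lab ASCII FILE\n"
    "Nb header lines : 6\n"
    "\n"
    "Device metadata : some metadata\n"
    "\n"
    "mode\ttime/s\tEwe/V\t<I>/mA\tcontrol/V\n"
    "2\t0\t0,1\t0\t0\n"
    "2\t1\t1,4\t5\t1\n"
)
print(eclab_like_header_lines(file))  # 5
```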

classmethod known_loaders()

A list of known loaders. Refer to the documentation for details on supported file types for the individual Loaders.

EXAMPLES:

>>> from unitpackage.loaders.baseloader import BaseLoader
>>> BaseLoader.known_loaders()
['eclab', 'gamry']

property metadata

A dict describing the structure of the loaded DSV file, including the dialect (delimiter, decimal separator), header content, and column header names.

EXAMPLES:

>>> from io import StringIO
>>> from unitpackage.loaders.baseloader import BaseLoader
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> csv.metadata
{'loader': 'BaseLoader',
'delimiter': ',',
'decimal': '.',
'headerLines': 0,
'columnHeaderLines': 1,
'header': '',
'columnHeaders': 'a,b\n',
'columnHeaderNames': ['a', 'b']}
