unitpackage.loaders.baseloader
Loader for CSV and other delimiter-separated value files.
The BaseLoader reads files consisting of an optional header,
one or more column-name lines, and data rows with a consistent delimiter.
Delimiters and decimal separators are auto-detected when not specified
explicitly.
Device-specific loaders (e.g. EC-Lab, Gamry) can be selected via
BaseLoader.create() to handle non-standard header layouts.
See BaseLoader.known_loaders() for supported devices.
- class unitpackage.loaders.baseloader.BaseLoader(file, header_lines=None, column_header_lines=None, decimal=None, delimiter=None, candidate_delimiters=None)
Loads a CSV-like file. By default, the first line must contain the column (field) names and the following lines contain the delimiter-separated values.
EXAMPLES:
>>> from io import StringIO
>>> from unitpackage.loaders.baseloader import BaseLoader
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> csv.df
   a  b
0  0  0
1  1  1
A list of column names:
>>> csv.column_header_names
['a', 'b']
More specific loaders can be selected via create() (see known_loaders() for supported devices):
>>> from io import StringIO
>>> file = StringIO('''EC-Lab ASCII FILE
... Nb header lines : 6
...
... Device metadata : some metadata
...
... mode\ttime/s\tEwe/V\t<I>/mA\tcontrol/V
... 2\t0\t0.1\t0\t0
... 2\t1\t1.4\t5\t1
... ''')
>>> csv = BaseLoader.create('eclab')(file)
>>> csv.df
   mode  time/s  Ewe/V  <I>/mA  control/V
0     2       0    0.1       0          0
1     2       1    1.4       5          1
Candidate delimiters can be provided explicitly for autodetection:
>>> from io import StringIO
>>> file = StringIO('''a\tb
... 0\t0
... 1\t1''')
>>> csv = BaseLoader(file, candidate_delimiters=[';', '\t'])
>>> csv.delimiter
'\t'
- property column_header_lines
The number of lines containing the descriptive information of the data for each column.
EXAMPLES:
A file with a single column header line:
>>> from io import StringIO
>>> from unitpackage.loaders.baseloader import BaseLoader
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> csv.column_header_lines
1
A file with two column header lines:
>>> from io import StringIO
>>> file = StringIO(r'''a,b
... x,y
... 0,0
... 1,1''')
>>> csv = BaseLoader(file, column_header_lines=2)
>>> csv.column_header_lines
2
- property column_header_names
A list of column header names constructed from the column header lines.
EXAMPLES:
A file with a single column header line:
>>> from io import StringIO
>>> from unitpackage.loaders.baseloader import BaseLoader
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> csv.column_header_names
['a', 'b']
For a file containing two or more column header lines, a single name is created for each column by combining the fields of all column header lines and separating them with ' / ':
>>> from io import StringIO
>>> file = StringIO(r'''T,v
... K,m/s
... 0,0
... 1,1''')
>>> csv = BaseLoader(file, column_header_lines=2)
>>> csv.column_header_names
['T / K', 'v / m/s']
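The naming rule above can be approximated in plain Python. The sketch below is not unitpackage code: join_column_header_lines is a hypothetical helper, and it assumes unquoted fields and skips empty fields when joining, which may differ from how BaseLoader treats them.

```python
from itertools import zip_longest


def join_column_header_lines(lines, delimiter=","):
    """Combine several column header lines into one name per column.

    Hypothetical helper: fields at the same position are joined with " / ",
    empty fields are skipped, and quoting is not supported.
    """
    rows = [line.rstrip("\r\n").split(delimiter) for line in lines]
    names = []
    # zip_longest pads shorter header lines with "" instead of truncating.
    for parts in zip_longest(*rows, fillvalue=""):
        names.append(" / ".join(part.strip() for part in parts if part.strip()))
    return names


print(join_column_header_lines(["T,v\n", "K,m/s\n"]))  # ['T / K', 'v / m/s']
```

A column whose fields are all empty yields an empty name, which a later step can replace with an automatic label.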
A file where header and data lines have a leading delimiter. The leading empty field is preserved and auto-labeled:
>>> from io import StringIO
>>> file = StringIO(''',a,b
... ,0,0
... ,1,1''')
>>> csv = BaseLoader(file, delimiter=',')
>>> csv.column_header_names
['unknown 1', 'a', 'b']
- property column_headers
The lines in the file containing the descriptive information of the data for each column.
EXAMPLES:
A file with a single column header line:
>>> from io import StringIO
>>> from unitpackage.loaders.baseloader import BaseLoader
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> csv.column_headers.readlines()
['a,b\n']
A file with two column header lines, a layout sometimes used, for example, to store the units of the values:
>>> from io import StringIO
>>> file = StringIO(r'''T,v
... K,m/s
... 0,0
... 1,1''')
>>> csv = BaseLoader(file, column_header_lines=2)
>>> csv.column_headers.readlines()
['T,v\n', 'K,m/s\n']
- classmethod create(device=None)
Returns the loader for a given device, which is then called with the file to be loaded.
EXAMPLES:
>>> from io import StringIO
>>> from unitpackage.loaders.baseloader import BaseLoader
>>> file = StringIO('''EC-Lab ASCII FILE
... Nb header lines : 6
...
... Device metadata : some metadata
...
... mode\ttime/s\tEwe/V\t<I>/mA\tcontrol/V
... 2\t0\t0.1\t0\t0
... 2\t1\t1.4\t5\t1
... ''')
>>> csv = BaseLoader.create('eclab')(file)
>>> csv.df
   mode  time/s  Ewe/V  <I>/mA  control/V
0     2       0    0.1       0          0
1     2       1    1.4       5          1
Requesting an unknown device raises a KeyError that lists the supported loaders:
>>> BaseLoader.create('unknown_device')
Traceback (most recent call last):
...
KeyError: "Device wth name 'unknown_device' is not in the list of supported Loaders (['eclab', 'gamry'])'."
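The examples call the result of create() with the file, as in BaseLoader.create('eclab')(file), so create() acts as a lookup from a device name to a loader. A minimal sketch of such a name-to-class dispatch follows. All class names and the registry are hypothetical stand-ins, and treating device=None as "use the calling class itself" is an assumption based only on the default value in the signature.

```python
class HypotheticalBaseLoader:
    """Stand-in for BaseLoader; only the device dispatch is sketched."""

    def __init__(self, file):
        self.file = file

    @staticmethod
    def loaders():
        # Assumed registry mapping device names to loader classes.
        return {"eclab": EcLabLoader, "gamry": GamryLoader}

    @classmethod
    def known_loaders(cls):
        return list(cls.loaders())

    @classmethod
    def create(cls, device=None):
        # Assumption: no device selects the calling class itself.
        if device is None:
            return cls
        loaders = cls.loaders()
        if device not in loaders:
            raise KeyError(
                f"Device with name '{device}' is not in the list of "
                f"supported Loaders ({list(loaders)})."
            )
        return loaders[device]


class EcLabLoader(HypotheticalBaseLoader):
    pass


class GamryLoader(HypotheticalBaseLoader):
    pass


print(HypotheticalBaseLoader.known_loaders())  # ['eclab', 'gamry']
```

Returning a class rather than an instance keeps create() independent of the file, so the same lookup can be reused for several files.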
- property data
A file-like object containing the data of the CSV without the header and column header lines.
EXAMPLES:
>>> from io import StringIO
>>> from unitpackage.loaders.baseloader import BaseLoader
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> type(csv.data)
<class '_io.StringIO'>
>>> from io import StringIO
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> csv.data.readlines()
['0,0\n', '1,1']
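Splitting a file into header, column header and data sections by line counts can be sketched with readline(). The helper split_sections is hypothetical and only illustrates the idea of returning each section as its own file-like object, as the header, column_headers and data properties do; it is not the library's implementation.

```python
from io import StringIO


def split_sections(file, header_lines, column_header_lines):
    """Split a text file-like object into header, column headers and data.

    Hypothetical helper: each section is returned as a fresh StringIO.
    """
    file.seek(0)
    # readline() consumes exactly one line per call, so sections do not overlap.
    header = StringIO("".join(file.readline() for _ in range(header_lines)))
    column_headers = StringIO(
        "".join(file.readline() for _ in range(column_header_lines))
    )
    # Everything that remains is data.
    data = StringIO(file.read())
    return header, column_headers, data


header, column_headers, data = split_sections(StringIO("a,b\n0,0\n1,1"), 0, 1)
print(header.readlines(), column_headers.readlines(), data.readlines())
# [] ['a,b\n'] ['0,0\n', '1,1']
```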
- property decimal
The decimal separator in the floats in the CSV data.
EXAMPLES:
A standard CSV containing floats with a single header line:
>>> from io import StringIO
>>> from unitpackage.loaders.baseloader import BaseLoader
>>> file = StringIO('''a,b
... 0.0,0.0
... 1.0,1.0''')
>>> csv = BaseLoader(file)
>>> csv.decimal
'.'
A standard CSV containing only integers with a single header line. Since the data contains no decimal separator, '.' is returned:
>>> from io import StringIO
>>> file = StringIO('''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> csv.decimal
'.'
A standard CSV containing integers and floats with a single header line:
>>> from io import StringIO
>>> file = StringIO('''a,b
... 0,0.0
... 1,1.0''')
>>> csv = BaseLoader(file)
>>> csv.decimal
'.'
A TSV containing floats with a single header line:
>>> from io import StringIO
>>> file = StringIO('''a\tb
... 0\t0.0
... 1\t1.0''')
>>> csv = BaseLoader(file)
>>> csv.decimal
'.'
A TSV containing integers and floats using , as decimal separator with a single header line:
>>> from io import StringIO
>>> file = StringIO('''a\tb
... 0\t0,0
... 1\t1,0''')
>>> csv = BaseLoader(file)
>>> csv.decimal
','
Data rows containing '.' in numeric values and ',' only in text values:
>>> from io import StringIO
>>> file = StringIO('''a\tb\ttext
... 0.0\t0.0\ta,b
... 1.0\t1.0\tc,d''')
>>> csv = BaseLoader(file)
>>> csv.decimal
'.'
Data rows containing both '.' and ',' in the numeric values of a single line raise an error:
>>> from io import StringIO
>>> file = StringIO('''a\tb\ttext
... 0.1\t0,0\ta,b
... 1.1\t1,0\tc,d''')
>>> csv = BaseLoader(file)
>>> csv.decimal
Traceback (most recent call last):
...
ValueError: Decimal separator could not be determined. Found both ',' and '.' in numeric values in a single data line.
Implementation in a specific device loader:
>>> from io import StringIO
>>> file = StringIO('''EC-Lab ASCII FILE
... Nb header lines : 6
...
... Device metadata : some metadata
...
... mode\ttime/s\tEwe/V\t<I>/mA\tcontrol/V
... 2\t0\t0,1\t0\t0
... 2\t1\t1,4\t5\t1
... ''')
>>> csv = BaseLoader.create('eclab')(file)
>>> csv.decimal
','
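The decimal examples suggest a per-line rule: only fields that look numeric are inspected, text such as 'a,b' is ignored, a line mixing both separators in numeric values is an error, and integer-only data falls back to '.'. The sketch below implements that rule under stated assumptions: detect_decimal and the regular expression defining a "numeric" field are hypothetical, thousands separators such as '1,000.5' are not handled, and the check across different lines is an extra assumption.

```python
import re

# Assumed notion of a numeric field: optional sign, digits, an optional
# '.' or ',' fraction and an optional exponent.
NUMBER = re.compile(r"[+-]?\d+(?:[.,]\d+)?(?:[eE][+-]?\d+)?")


def detect_decimal(data_lines, delimiter):
    """Guess the decimal separator from numeric fields; default to '.'."""
    found = set()
    for line in data_lines:
        separators = set()
        for field in line.rstrip("\r\n").split(delimiter):
            field = field.strip()
            # Text fields such as "a,b" do not match and are ignored.
            if NUMBER.fullmatch(field):
                separators.update(sep for sep in ".," if sep in field)
        if len(separators) > 1:
            raise ValueError(
                "Found both ',' and '.' in numeric values in a single data line."
            )
        found |= separators
    if len(found) > 1:
        raise ValueError("Different data lines use different decimal separators.")
    # Integer-only data carries no separator, so fall back to '.'.
    return found.pop() if found else "."


print(repr(detect_decimal(["0\t0,0", "1\t1,0"], "\t")))  # ','
```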
- property delimiter
The delimiter in the CSV, which is detected from the first two lines of the CSV data.
EXAMPLES:
A CSV containing integers:
>>> from io import StringIO
>>> from unitpackage.loaders.baseloader import BaseLoader
>>> file = StringIO('''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> csv.delimiter
','
A CSV containing floats:
>>> from io import StringIO
>>> file = StringIO('''a,b
... 0.0,0.0
... 1.0,1.0''')
>>> csv = BaseLoader(file)
>>> csv.delimiter
','
A TSV containing floats with a single header line:
>>> from io import StringIO
>>> file = StringIO('''a\tb
... 0\t0.0
... 1\t1.0''')
>>> csv = BaseLoader(file)
>>> csv.delimiter
'\t'
A TSV with three columns containing floats using , as decimal separator with a single header line:
>>> from io import StringIO
>>> file = StringIO('''a\tb\tc
... 0,0\t0,0\t0,0
... 1,1\t1,0\t0,0''')
>>> csv = BaseLoader(file)
>>> csv.delimiter
'\t'
A TSV with two columns containing floats using , as decimal separator with a single header line:
>>> from io import StringIO
>>> file = StringIO('''a\tb
... 0,0\t0,0
... 1,1\t1,0''')
>>> csv = BaseLoader(file)
>>> csv.delimiter
'\t'
A TSV containing integers and floats using , as decimal separator with a single header line:
>>> from io import StringIO
>>> file = StringIO('''a\tb
... 0\t0,0
... 1\t1,0
... ''')
>>> csv = BaseLoader(file)
>>> csv.delimiter
'\t'
A file whose second line contains units rather than numeric values:
>>> from io import StringIO
>>> file = StringIO('''t\tE\tj
... s\tV\tA/cm2
... 0\t0\t0
... 1\t1\t1
... 2\t2\t2
... ''')
>>> csv = BaseLoader(file)
>>> csv.delimiter
'\t'
Candidate delimiters are considered for sniffing even if the correct delimiter is not the first candidate:
>>> from io import StringIO
>>> file = StringIO('''a\tb\n0\t0\n1\t1''')
>>> csv = BaseLoader(file, candidate_delimiters=[';', '\t', ','])
>>> csv.delimiter
'\t'
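How BaseLoader weighs its candidates is not spelled out here, so the following is only a simple consistency heuristic showing how an ordered list of candidate delimiters can be tested against sample lines. The function detect_delimiter and its default candidates are hypothetical; quoting is ignored, so a delimiter inside a quoted field would be miscounted. Python's standard csv.Sniffer, whose sniff() method also accepts a delimiters argument, is a more complete alternative.

```python
def detect_delimiter(lines, candidates=(",", "\t", ";")):
    """Return the first candidate that occurs equally often, and at least
    once, in every non-blank sample line.

    Hypothetical heuristic, not BaseLoader's algorithm; quoting is ignored.
    """
    lines = [line.rstrip("\r\n") for line in lines if line.strip()]
    for candidate in candidates:
        counts = {line.count(candidate) for line in lines}
        # A single, non-zero count means every line splits into the same
        # number of fields.
        if len(counts) == 1 and min(counts) > 0:
            return candidate
    raise ValueError("Delimiter could not be determined.")


print(repr(detect_delimiter(["a\tb", "0\t0", "1\t1"], [";", "\t", ","])))  # '\t'
```

Because ',' is rejected whenever its count differs between lines, a decimal comma inside tab-separated data does not win over the tab.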
Inconsistent field counts between column headers and data rows are logged as warnings. Blank column names are auto-labeled so that extra data fields remain accessible in the dataframe.
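The automatic labels that appear in the examples of this class ('unknown 1', 'unknown 2', ...) can be reproduced with a small helper. label_columns is hypothetical; the numbering scheme (blank names counted left to right, including a leading blank) is inferred from the examples rather than taken from the library source.

```python
def label_columns(names, field_count):
    """Pad header names to field_count entries and label blank names.

    Hypothetical helper: blank names become 'unknown 1', 'unknown 2', ...
    in order of appearance.
    """
    # Pad with blanks when data rows have more fields than the header;
    # a negative repeat count yields an empty list, so nothing is removed.
    padded = list(names) + [""] * (field_count - len(names))
    labeled = []
    unknown = 0
    for name in padded:
        if name.strip():
            labeled.append(name)
        else:
            unknown += 1
            labeled.append(f"unknown {unknown}")
    return labeled


print(label_columns(["a", "b"], 3))  # ['a', 'b', 'unknown 1']
```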
- property df
A pandas dataframe of the data in the CSV.
EXAMPLES:
>>> from io import StringIO
>>> from unitpackage.loaders.baseloader import BaseLoader
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> csv.df
   a  b
0  0  0
1  1  1
A file with two column header lines, a layout sometimes used, for example, to store the units of the values:
>>> from io import StringIO
>>> file = StringIO(r'''a,b
... m,s
... 0,0
... 1,1''')
>>> csv = BaseLoader(file, column_header_lines=2)
>>> csv.df
   a / m  b / s
0      0      0
1      1      1
When the header has more fields than the data rows, missing values are represented as NaN in the dataframe and trailing blank names are auto-labeled:
>>> import logging
>>> logging.getLogger("loader").setLevel(logging.ERROR)
>>> from io import StringIO
>>> file = StringIO('''a,b,
... 1,2
... 3,4''')
>>> csv = BaseLoader(file, delimiter=',')
>>> csv.column_header_names
['a', 'b', 'unknown 1']
>>> csv.df
   a  b  unknown 1
0  1  2        NaN
1  3  4        NaN
When data rows have more fields than the header, column names are auto-labeled so the extra data column stays available:
>>> from io import StringIO
>>> file = StringIO('''a,b
... 1,2,3
... 4,5,6''')
>>> csv = BaseLoader(file, delimiter=',')
>>> csv.column_header_names
['a', 'b', 'unknown 1']
>>> csv.df
   a  b  unknown 1
0  1  2          3
1  4  5          6
The header can contain several trailing blank fields of which the data rows fill only some; every blank field is auto-labeled and unfilled columns contain NaN:
>>> from io import StringIO
>>> file = StringIO('''a,b,,
... 1,2,3
... 4,5,6''')
>>> csv = BaseLoader(file, delimiter=',')
>>> csv.column_header_names
['a', 'b', 'unknown 1', 'unknown 2']
>>> csv.df
   a  b  unknown 1  unknown 2
0  1  2          3        NaN
1  4  5          6        NaN
A file where header and data have a leading delimiter, but some data rows have a value in that leading position:
>>> import logging
>>> logging.getLogger("loader").setLevel(logging.ERROR)
>>> from io import StringIO
>>> file = StringIO(''',a,b
... ,0,0
... X,1,1''')
>>> csv = BaseLoader(file, delimiter=',')
>>> csv.column_header_names
['unknown 1', 'a', 'b']
>>> csv.df
   unknown 1  a  b
0        NaN  0  0
1          X  1  1
- property file
A file-like object of the loaded file.
EXAMPLES:
>>> from io import StringIO
>>> from unitpackage.loaders.baseloader import BaseLoader
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> type(csv.file)
<class '_io.StringIO'>
- property header
The header of the CSV (excluding column names).
EXAMPLES:
>>> from io import StringIO
>>> from unitpackage.loaders.baseloader import BaseLoader
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> type(csv.header)
<class '_io.StringIO'>
>>> from io import StringIO
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> csv.header.readlines()
[]
- property header_lines
The number of header lines in a CSV, excluding the column header lines.
EXAMPLES:
Files for the base loader do not have a header:
>>> from io import StringIO
>>> from unitpackage.loaders.baseloader import BaseLoader
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> csv.header_lines
0
Implementation in a specific device loader:
>>> file = StringIO('''EC-Lab ASCII FILE
... Nb header lines : 6
...
... Device metadata : some metadata
...
... mode\ttime/s\tEwe/V\t<I>/mA\tcontrol/V
... 2\t0\t0,1\t0\t0
... 2\t1\t1,4\t5\t1
... ''')
>>> csv = BaseLoader.create('eclab')(file)
>>> csv.header_lines
5
- classmethod known_loaders()
A list of known loaders. Refer to the documentation for details on supported file types for the individual Loaders.
EXAMPLES:
>>> from unitpackage.loaders.baseloader import BaseLoader
>>> BaseLoader.known_loaders()
['eclab', 'gamry']
- property metadata
A dict describing the structure of the loaded DSV file, including the dialect (delimiter, decimal separator), header content, and column header names.
EXAMPLES:
>>> from io import StringIO
>>> from unitpackage.loaders.baseloader import BaseLoader
>>> file = StringIO(r'''a,b
... 0,0
... 1,1''')
>>> csv = BaseLoader(file)
>>> csv.metadata
{'loader': 'BaseLoader', 'delimiter': ',', 'decimal': '.', 'headerLines': 0, 'columnHeaderLines': 1, 'header': '', 'columnHeaders': 'a,b\n', 'columnHeaderNames': ['a', 'b']}