Load and Save

unitpackage entries and collections can be loaded from different sources and saved as datapackages (CSV and JSON) in a specified output directory.

Load collections

From local files

A collection can be created from local files by recursively collecting all datapackages stored below a given folder in the file system.

from unitpackage.collection import Collection

db = Collection.from_local("../files")
db
[Entry('demo_package'), Entry('demo_package_cv'), Entry('demo_package_metadata')]
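
As a rough illustration of what recursive collection means, the following standard-library sketch searches a folder tree for JSON files. It is not unitpackage's implementation; the helper `find_descriptors` and the file names are made up for this example.

```python
import json
import tempfile
from pathlib import Path


def find_descriptors(root):
    """Return all JSON files below root, sorted by path (hypothetical helper)."""
    return sorted(Path(root).rglob("*.json"))


# Build a small nested folder to search; the file names are made up.
with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "a" / "b"
    nested.mkdir(parents=True)
    (Path(tmp) / "top.json").write_text(json.dumps({"name": "top"}))
    (nested / "deep.json").write_text(json.dumps({"name": "deep"}))
    (nested / "deep.csv").write_text("x,y\n1,2\n")

    # Both descriptors are found, regardless of how deeply they are nested.
    found_names = [path.stem for path in find_descriptors(tmp)]
```

The CSV file is ignored by the search pattern, so only the two JSON descriptors end up in `found_names`.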

From URL

A collection can be created by recursively collecting datapackages from a ZIP file at a given URL. The archive is extracted to a temporary directory.

Note

If the argument url is not passed to the from_remote method, the data shown on echemdb.org is downloaded from the echemdb data repository and loaded as a collection.

from unitpackage.collection import Collection

db = Collection.from_remote()
db
[Entry('alves_2011_electrochemistry_6010_f1a_solid'), Entry('alves_2011_electrochemistry_6010_f2_red'), Entry('atkin_2009_afm_13266_f4a_solid'), Entry('atkin_2009_afm_13266_f4b_solid'), Entry('berger_2017_lithium_261_f1_black'), Entry('berger_2017_lithium_261_f1_inset'), Entry('berger_2017_lithium_261_f1_pink'), Entry('briega-martos_2021_cation_48_f1cs_black'), Entry('briega-martos_2021_cation_48_f1cs_blue'), Entry('briega-martos_2021_cation_48_f1cs_green'), Entry('briega-martos_2021_cation_48_f1cs_pink'), Entry('briega-martos_2021_cation_48_f1cs_red'), Entry('briega-martos_2021_cation_48_f1li_black'), Entry('briega-martos_2021_cation_48_f1li_blue'), Entry('briega-martos_2021_cation_48_f1li_green'), Entry('briega-martos_2021_cation_48_f1li_pink'), Entry('briega-martos_2021_cation_48_f1li_red'), Entry('briega-martos_2021_cation_48_f1na_black'), Entry('briega-martos_2021_cation_48_f1na_blue'), Entry('briega-martos_2021_cation_48_f1na_green'), Entry('briega-martos_2021_cation_48_f1na_pink'), Entry('briega-martos_2021_cation_48_f1na_red'), Entry('briega_martos_2018_understanding_j3045_f1a_black'), Entry('briega_martos_2018_understanding_j3045_f1a_blue'), Entry('briega_martos_2018_understanding_j3045_f1a_green'), Entry('briega_martos_2018_understanding_j3045_f1a_red'), Entry('briega_martos_2018_understanding_j3045_f1b_black'), Entry('briega_martos_2018_understanding_j3045_f1b_blue'), Entry('briega_martos_2018_understanding_j3045_f1b_green'), Entry('briega_martos_2018_understanding_j3045_f1b_red'), Entry('briega_martos_2018_understanding_j3045_f1c_black'), Entry('briega_martos_2018_understanding_j3045_f1c_blue'), Entry('briega_martos_2018_understanding_j3045_f1c_green'), Entry('briega_martos_2018_understanding_j3045_f1c_red'), Entry('briega_martos_2018_understanding_j3045_f1d_black'), Entry('briega_martos_2018_understanding_j3045_f1d_blue'), Entry('briega_martos_2018_understanding_j3045_f1d_green'), Entry('briega_martos_2018_understanding_j3045_f1d_red'), 
Entry('briega_martos_2018_understanding_j3045_f1e_black'), Entry('briega_martos_2018_understanding_j3045_f1e_blue'), Entry('briega_martos_2018_understanding_j3045_f1e_green'), Entry('briega_martos_2018_understanding_j3045_f1e_red'), Entry('briega_martos_2018_understanding_j3045_f1f_black'), Entry('briega_martos_2018_understanding_j3045_f1f_blue'), Entry('briega_martos_2018_understanding_j3045_f1f_green'), Entry('briega_martos_2018_understanding_j3045_f1f_red'), Entry('briega_martos_2018_understanding_j3045_f1g_black'), Entry('briega_martos_2018_understanding_j3045_f1g_blue'), Entry('briega_martos_2018_understanding_j3045_f1g_green'), Entry('briega_martos_2018_understanding_j3045_f1g_red'), Entry('briega_martos_2018_understanding_j3045_f1h_black'), Entry('briega_martos_2018_understanding_j3045_f1h_blue'), Entry('briega_martos_2018_understanding_j3045_f1h_green'), Entry('briega_martos_2018_understanding_j3045_f1h_red'), Entry('clavilier_1980_preparation_205_f2_solid'), Entry('clavilier_1990_insitu_1_f1_solid'), Entry('clavilier_1990_insitu_1_f5_10-10-9'), Entry('clavilier_1990_insitu_1_f5_110'), Entry('clavilier_1990_insitu_1_f5_554'), Entry('clavilier_1990_insitu_1_f5_775'), Entry('domke_2003_determination_113_f2_dash-dot'), Entry('domke_2003_determination_113_f2_dash-dot-dot'), Entry('domke_2003_determination_113_f2_dashed'), Entry('domke_2003_determination_113_f2_dotted'), Entry('domke_2003_determination_113_f2_solid'), Entry('domke_2003_determination_113_f3b_dotted'), Entry('durand_1992_insitu_1977_f1a_solid'), Entry('endo_1999_in-situ_19_f1_a'), Entry('endo_1999_in-situ_19_f1_b'), Entry('engstfeld_2018_polycrystalline_17743_f4b_1'), Entry('garcia_2011_enthalpic_501_f1a_black'), Entry('garcia_2011_enthalpic_501_f1a_green'), Entry('garcia_2011_enthalpic_501_f1a_red'), Entry('gasparotto_2009_in_11140_f1_dashed'), Entry('gomez-marin_2012_surface_558_f1_black'), Entry('gomez-marin_2012_surface_558_f1_blue'), Entry('gomez-marin_2012_surface_558_f1_red'), 
Entry('gomez-marin_2012_surface_558_f2_blue'), Entry('gomez-marin_2012_surface_558_f2_pink'), Entry('gomez-marin_2012_surface_558_f2_red'), Entry('gomez_1993_electrochemical_189_f1_solid'), Entry('gomez_2003_effect_228_f1_dotted'), Entry('gomez_2003_effect_228_f2_dotted'), Entry('gomez_2003_effect_228_f3_dotted'), Entry('hamad_2003_electrosorption_211_f1a_dotted'), Entry('hamad_2003_electrosorption_211_f1a_solid'), Entry('horswell_2004_a_10970_f1a_dashed'), Entry('horswell_2004_a_10970_f1a_dotted'), Entry('horswell_2004_a_10970_f1a_solid'), Entry('jerkiewicz_2009_effect_12309_f1a_solid'), Entry('jovic_1999_cyclic_247_f1_dashed'), Entry('jovic_1999_cyclic_247_f1_solid'), Entry('jovic_1999_cyclic_247_f4_solid'), Entry('jovic_1999_cyclic_247_f6_dashed'), Entry('jovic_1999_cyclic_247_f6_solid'), Entry('jovic_1999_cyclic_247_f7_dashed'), Entry('jovic_1999_cyclic_247_f7_solid'), Entry('kerner_2002_measurement_2055_f2a_solid'), Entry('kerner_2002_measurement_2055_f3a_thick'), Entry('kerner_2002_measurement_2055_f3a_thin'), Entry('kerner_2002_measurement_2055_f4a_thick'), Entry('kerner_2002_measurement_2055_f4a_thin'), Entry('kerner_2002_measurement_2055_f5a_thick'), Entry('kerner_2002_measurement_2055_f5a_thin'), Entry('kerner_2002_measurement_2055_f6a_1'), Entry('kerner_2002_measurement_2055_f6a_2'), Entry('kibler_2000_in-situ_73_f4a_black'), Entry('kibler_2000_in-situ_73_f5a_black'), Entry('kibler_2000_in-situ_73_f7a_black'), Entry('kibler_2000_in-situ_73_f8a_black'), Entry('li_2000_chronocoulometric_95_f1_dash-dotted'), Entry('li_2000_chronocoulometric_95_f1_dashed'), Entry('li_2000_chronocoulometric_95_f1_dotted'), Entry('li_2000_chronocoulometric_95_f1_solid'), Entry('lipkowski_1998_ionic_2875_f1a_br'), Entry('lipkowski_1998_ionic_2875_f1a_cl'), Entry('lipkowski_1998_ionic_2875_f1a_i'), Entry('lipkowski_1998_ionic_2875_f1a_so4'), Entry('mello_2018_bromide_18562_f1a_black'), Entry('mello_2018_bromide_18562_f1a_red'), Entry('mello_2018_bromide_18562_f1b_black'), 
Entry('mello_2018_bromide_18562_f1b_red'), Entry('mello_2018_bromide_18562_f1c_black'), Entry('mello_2018_bromide_18562_f1c_red'), Entry('mello_2018_bromide_18562_f1d_black'), Entry('mello_2018_bromide_18562_f1d_red'), Entry('mello_2018_bromide_18562_f1e_black'), Entry('mello_2018_bromide_18562_f1e_red'), Entry('mello_2018_bromide_18562_f1f_black'), Entry('mello_2018_bromide_18562_f1f_red'), Entry('mello_2018_bromide_18562_f1g_black'), Entry('mello_2018_bromide_18562_f1g_red'), Entry('mello_2018_bromide_18562_f1h_black'), Entry('mello_2018_bromide_18562_f1h_red'), Entry('mello_2018_bromide_18562_f1i_black'), Entry('mello_2018_bromide_18562_f1i_red'), Entry('nakamura_2011_structure_165433_f1a_blue'), Entry('nakamura_2011_structure_165433_f1a_green'), Entry('nakamura_2011_structure_165433_f1a_red'), Entry('nakamura_2011_structure_165433_f1b_blue'), Entry('nakamura_2011_structure_165433_f1b_red'), Entry('nakamura_2014_structural_22136_f1a_inset'), Entry('nishihara_1994_underpotential_75_f1a_solid'), Entry('nishihara_1994_underpotential_75_f1b_solid'), Entry('nishihara_1994_underpotential_75_f2a_solid'), Entry('nishihara_1994_underpotential_75_f2b_solid'), Entry('nishihara_1994_underpotential_75_f3a_solid'), Entry('nishihara_1994_underpotential_75_f3b_solid'), Entry('nishihara_1994_underpotential_75_f4a_solid'), Entry('nishihara_1994_underpotential_75_f4b_solid'), Entry('nishihara_1994_underpotential_75_f5a_solid'), Entry('nishihara_1994_underpotential_75_f5b_solid'), Entry('nishihara_1994_underpotential_75_f6a_solid'), Entry('nishihara_1994_underpotential_75_f6b_solid'), Entry('nishihara_1994_underpotential_75_f7a_solid'), Entry('nishihara_1994_underpotential_75_f7b_solid'), Entry('ocko_1997_halide_55_f6a_solid'), Entry('pajkossy_1996_impedance_209_f1a_dotted'), Entry('pajkossy_1996_impedance_209_f1a_solid'), Entry('pajkossy_1996_impedance_209_f1b_dotted'), Entry('pajkossy_1996_impedance_209_f1b_solid'), Entry('pajkossy_1996_impedance_209_f3a_a'), 
Entry('pajkossy_1996_impedance_209_f3a_b'), Entry('pajkossy_1996_impedance_209_f3a_dashed'), Entry('pajkossy_1996_impedance_209_f8a_a'), Entry('pajkossy_1996_impedance_209_f8a_b'), Entry('pajkossy_1996_impedance_209_f8a_dashed'), Entry('pajkossy_2001_double_3063_f2_inset'), Entry('pajkossy_2001_double_3063_f5a_solid'), Entry('pajkossy_2001_double_3063_f6a_solid'), Entry('rehim_1998_electrochemical_1103_f1_solid'), Entry('rudnev_2020_structural_501_f2c_1'), Entry('sandbeck_2019_dissolution_2997_f1a_solid_red'), Entry('sandbeck_2019_dissolution_2997_f1b_solid_blue'), Entry('sandbeck_2019_dissolution_2997_f1c_solid_green'), Entry('sato_2006_effect_725_f4a_red'), Entry('sato_2006_effect_725_f4b_red'), Entry('sato_2006_effect_725_f4c_red'), Entry('sato_2006_effect_725_f4d_red'), Entry('schnaidt_2017_a_4141_f2_solid'), Entry('schuett_2021_electrodeposition_20461_fs1_blue'), Entry('shi_1996_chloride_225_f1_dotted'), Entry('shi_1996_chloride_225_f1_solid'), Entry('shi_1996_chloride_225_f1a_dashed'), Entry('shi_1996_chloride_225_f1a_solid'), Entry('taguchi_2007_electrochemical_6023_f2a_solid'), Entry('taguchi_2007_electrochemical_6023_f2b_solid'), Entry('taguchi_2007_electrochemical_6023_f2c_solid'), Entry('taguchi_2007_electrochemical_6023_f2d_solid'), Entry('taguchi_2007_electrochemical_6023_f2e_solid'), Entry('taguchi_2007_electrochemical_6023_f2f_solid'), Entry('taguchi_2007_electrochemical_6023_f2g_solid'), Entry('taguchi_2007_electrochemical_6023_f2h_solid'), Entry('taguchi_2007_electrochemical_6023_f2i_solid'), Entry('wandlowski_1996_structural_10277_f1a_dashed'), Entry('wandlowski_1996_structural_10277_f1a_solid'), Entry('wang_1996_ordered_6672_f1_solid'), Entry('wang_1996_ordered_6672_f2a_solid'), Entry('wang_1996_ordered_6672_f2b_solid'), Entry('wang_1997_lateral_1_f1a_solid'), Entry('wen_2015_potential-dependent_6062_f1_black'), Entry('wen_2015_potential-dependent_6062_f1_blueinset'), Entry('wen_2015_potential-dependent_6062_fs1_black'), 
Entry('zei_1991_the_295_f3a_solid'), Entry('zei_1991_the_295_f3b_solid')]

Passing an output directory via the parameter outdir saves the packages to that directory. The parameter data specifies the folder within the ZIP archive that contains the datapackages.

from unitpackage.collection import Collection

db = Collection.from_remote(data='data', outdir='generated/from_url')
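
The roles of data and outdir can be pictured with a standard-library sketch that extracts an archive and then searches only one folder inside it. This is an illustration under assumptions, not the code behind from_remote: the helper `collect_from_zip` is hypothetical, and the archive is built in memory rather than downloaded.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def collect_from_zip(zip_bytes, outdir, data=""):
    """Extract a ZIP archive into outdir and list JSON files below `data`.

    Hypothetical helper that mirrors the roles of the `data` and `outdir`
    arguments described above.
    """
    target = Path(outdir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        archive.extractall(target)
    # Only the folder named by `data` is searched, recursively.
    return sorted((target / data).rglob("*.json"))


# Build an archive in memory instead of downloading one (made-up contents).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data/pkg_a/pkg_a.json", "{}")
    archive.writestr("data/pkg_a/pkg_a.csv", "x,y\n1,2\n")
    archive.writestr("docs/readme.json", "{}")

with tempfile.TemporaryDirectory() as tmp:
    collected = collect_from_zip(buffer.getvalue(), outdir=tmp, data="data")
    collected_names = [path.name for path in collected]
```

Because the search starts at the `data` folder, the JSON file under `docs` is extracted but not collected.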

Load entries

An individual entry can be loaded from a local datapackage.

from unitpackage.entry import Entry

entry = Entry.from_local("../files/demo_package.json")
entry
Entry('demo_package')

Alternatively, entries can be created from CSV files or pandas DataFrames. Metadata and field descriptors can be added after creation using update_fields() and metadata.from_dict().

From a CSV file:

from unitpackage.entry import Entry

csv_entry = Entry.from_csv(csvname="../files/demo_package.csv")
csv_entry
Entry('demo_package')

For CSV files with more complex structures, additional arguments can be provided:

  • header_lines — number of header lines to skip before the data

  • column_header_lines — number of lines containing column headers (multiple lines are flattened and separated by /)

  • decimal — decimal separator (e.g., ',' for European-style numbers)

  • delimiters — column delimiter (auto-detected if not specified)

  • encoding — file encoding

For example, a CSV with multiple header lines:

csv_entry = Entry.from_csv(csvname='../../examples/from_csv/from_csv_multiple_headers.csv', column_header_lines=2)
csv_entry.fields
[{'name': 'E / V', 'type': 'integer'},
 {'name': 'j / A / cm2', 'type': 'integer'}]
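
To make the parsing options more concrete, here is a standard-library sketch of how such a file could be read by hand. It only imitates the behavior described above and is not how Entry.from_csv is implemented. The function `read_delimited`, its singular `delimiter` argument, and the sample text `RAW` are assumptions made for this example.

```python
import csv

# Made-up file content: one comment line, two column header lines,
# semicolon-delimited values with decimal commas.
RAW = (
    "# measured on a hypothetical instrument\n"
    "E;j\n"
    "V;A / cm2\n"
    "0,1;1,5\n"
    "0,2;2,5\n"
)


def read_delimited(text, header_lines=0, column_header_lines=1,
                   delimiter=",", decimal="."):
    """Parse delimited text into flattened column names and float rows."""
    # Skip the leading header lines that carry no tabular data.
    lines = text.splitlines()[header_lines:]
    rows = list(csv.reader(lines, delimiter=delimiter))
    # Flatten multiple column header lines column by column, joined by " / ".
    header_rows = rows[:column_header_lines]
    names = [" / ".join(parts) for parts in zip(*header_rows)]
    # Normalize the decimal separator before converting to float.
    data = [[float(cell.replace(decimal, ".")) for cell in row]
            for row in rows[column_header_lines:]]
    return names, data


names, data = read_delimited(RAW, header_lines=1, column_header_lines=2,
                             delimiter=";", decimal=",")
```

With the sample above, `names` holds `'E / V'` and `'j / A / cm2'`, matching the flattened field names in the output shown earlier.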

For even more complex file formats from laboratory equipment, see the Loaders section.

From specific device file formats

Files from laboratory equipment (devices) often have complex structures with lengthy headers, non-standard delimiters, and instrument-specific column names. Entry.from_csv supports a device parameter that selects the appropriate loader for the file format.

For example, loading a BioLogic EC-Lab MPT file:

from unitpackage.entry import Entry

entry = Entry.from_csv(csvname='../../test/loader_data/eclab_cv.mpt', device='eclab')
entry
Entry('eclab_cv')

The loader automatically detects headers and delimiters. The resulting entry contains the raw column names from the instrument:

entry.fields
[{'name': 'mode', 'type': 'integer'},
 {'name': 'ox/red', 'type': 'integer'},
 {'name': 'error', 'type': 'integer'},
 {'name': 'control changes', 'type': 'integer'},
 {'name': 'counter inc.', 'type': 'integer'},
 {'name': 'time/s', 'type': 'number'},
 {'name': 'control/V', 'type': 'number'},
 {'name': 'Ewe/V', 'type': 'number'},
 {'name': '<I>/mA', 'type': 'number'},
 {'name': 'cycle number', 'type': 'number'},
 {'name': '(Q-Qo)/C', 'type': 'number'},
 {'name': 'I Range', 'type': 'integer'},
 {'name': 'P/W', 'type': 'number'}]

Information on the file structure is stored in the entry’s metadata under dsvDescription:

entry.metadata['dsvDescription']['loader']
'ECLabLoader'

Domain-specific loading

For submodules such as echemdb, convenience methods provide additional processing. EchemdbEntry.from_mpt loads an MPT file and then updates the fields with units, renames them to short standardized names, and keeps only the most relevant columns for electrochemistry:

from unitpackage.database.echemdb_entry import EchemdbEntry

entry = EchemdbEntry.from_mpt('../../test/loader_data/eclab_cv.mpt')
entry.df.head()
Fields with names ['Ri/Ohm', '-Im(Z)/Ohm', '-Im(Zce)/Ohm', '-Im(Zwe-ce)/Ohm', '(Q-Qo)/mA.h', '<Ece>/V', '<Ewe>/V', '|Ece|/V', '|Energy|/W.h', '|Ewe|/V', '|I|/A', '|Y|/Ohm-1', '|Z|/Ohm', '|Zce|/Ohm', '|Zwe-ce|/Ohm', 'Analog IN 1/V', 'Analog IN 2/V', 'Analog IN 3/V', 'Capacitance charge/µF', 'Capacitance discharge/µF', 'Capacity/mA.h', 'charge time/s', 'Conductivity/S.cm-1', 'control/mA', 'control/V/mA', 'Cp-2/µF-2', 'Cp/µF', 'Cs-2/µF-2', 'Cs/µF', 'cycle time/s', 'd(Q-Qo)/dE/mA.h/V', 'dI/dt/mA/s', 'discharge time/s', 'dQ/C', 'dq/mA.h', 'dQ/mA.h', 'Ece/V', 'Ecell/V', 'Efficiency/%', 'Energy charge/W.h', 'Energy discharge/W.h', 'Energy/W.h', 'Ewe-Ece/V', 'freq/Hz', 'half cycle', 'I/mA', 'Im(Y)/Ohm-1', 'Ns changes', 'Ns', 'NSD Ewe/%', 'NSD I/%', 'NSR Ewe/%', 'NSR I/%', 'Phase(Y)/deg', 'Phase(Z)/deg', 'Phase(Zce)/deg', 'Phase(Zwe-ce)/deg', 'Q charge/discharge/mA.h', 'Q charge/mA.h', 'Q charge/mA.h/g', 'Q discharge/mA.h', 'Q discharge/mA.h/g', 'R/Ohm', 'Rcmp/Ohm', 'Re(Y)/Ohm-1', 'Re(Z)/Ohm', 'Re(Zce)/Ohm', 'Re(Zwe-ce)/Ohm', 'step time/s', 'THD Ewe/%', 'THD I/%', 'x', 'z cycle'] were provided but do not appear in the field names of tabular resource ['mode', 'ox/red', 'error', 'control changes', 'counter inc.', 'time/s', 'control/V', 'Ewe/V', '<I>/mA', 'cycle number', '(Q-Qo)/C', 'I Range', 'P/W'].
           t         E         I  cycle
0  86.761598  0.849737  0.001722    1.0
1  86.772598  0.849149 -0.003851    1.0
2  86.792598  0.848137 -0.004390    1.0
3  86.812598  0.847135 -0.004741    1.0
4  86.832598  0.846144 -0.004914    1.0

The fields now have units, short names, and a reference to the original BioLogic column name:

entry.fields
[{'name': 't',
  'type': 'number',
  'description': 'Time.',
  'unit': 's',
  'dimension': 't',
  'originalName': 'time/s'},
 {'name': 'E',
  'type': 'number',
  'description': 'WE potential versus REF.',
  'unit': 'V',
  'dimension': 'E',
  'originalName': 'Ewe/V'},
 {'name': 'I',
  'type': 'number',
  'description': 'Average current over the potential step (calculated from I = '
                 'dQ/dt).',
  'unit': 'mA',
  'dimension': 'I',
  'originalName': '<I>/mA'},
 {'name': 'cycle',
  'type': 'number',
  'description': 'Cycle number.',
  'originalName': 'cycle number'}]
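
The renaming step can be pictured as a lookup from instrument column names to short names. The sketch below uses plain dictionaries to illustrate the idea and is not the code used by EchemdbEntry.from_mpt. The mapping `SHORT_NAMES`, the helper `standardize`, and the rule of reading the unit from the text after the last slash are simplifying assumptions.

```python
# Hypothetical mapping from instrument column names to short names.
SHORT_NAMES = {
    "time/s": "t",
    "Ewe/V": "E",
    "<I>/mA": "I",
    "cycle number": "cycle",
}


def standardize(fields, short_names):
    """Rename known fields, attach units, and drop all other fields."""
    result = []
    for field in fields:
        original = field["name"]
        if original not in short_names:
            continue  # keep only columns that have a short name
        renamed = {**field, "name": short_names[original]}
        if "/" in original:
            # Naive assumption: the unit is the text after the last slash.
            renamed["unit"] = original.rpartition("/")[2]
        renamed["originalName"] = original
        result.append(renamed)
    return result


raw_fields = [
    {"name": "mode", "type": "integer"},
    {"name": "time/s", "type": "number"},
    {"name": "Ewe/V", "type": "number"},
    {"name": "<I>/mA", "type": "number"},
    {"name": "cycle number", "type": "number"},
]
short_fields = standardize(raw_fields, SHORT_NAMES)
```

Here `mode` is dropped because it has no short name, and `cycle` receives no unit because its original name contains no slash.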

From a pandas DataFrame:

import pandas as pd
from unitpackage.entry import Entry

data = {'x': [1, 2, 3], 'v': [1, 3, 2]}
df = pd.DataFrame(data)

df_entry = Entry.from_df(df, basename='df_data')
df_entry
Entry('df_data')

For more details on adding metadata and field descriptions, see Creating Unitpackages.

Save entries

Entries can be saved as JSON and CSV in a specified folder either directly from a collection

from unitpackage.collection import Collection

db = Collection.from_local("../files")
db.save_entries(outdir="../generated/files")

or from a single entry.

from unitpackage.entry import Entry

entry = Entry.from_local("../files/demo_package.json")
entry.save(outdir="../generated/files/saved_entry")

The basename of the saved files can be changed with the basename argument.

entry.save(basename=entry.identifier + "_r", outdir="../generated/files/saved_entry")
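
The effect of basename and outdir on the files written can be sketched with the standard library. The helper `save_pair` is hypothetical and writes a strongly simplified descriptor, so it only illustrates the file layout (one CSV and one JSON sharing a basename), not the full output of entry.save.

```python
import csv
import json
import tempfile
from pathlib import Path


def save_pair(rows, fieldnames, basename, outdir):
    """Write <basename>.csv and a minimal <basename>.json into outdir."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    csv_path = outdir / f"{basename}.csv"
    json_path = outdir / f"{basename}.json"
    with csv_path.open("w", newline="") as stream:
        writer = csv.writer(stream)
        writer.writerow(fieldnames)
        writer.writerows(rows)
    # Simplified descriptor: a single resource that points at the CSV file.
    descriptor = {
        "resources": [{
            "name": basename,
            "path": csv_path.name,
            "schema": {"fields": [{"name": name} for name in fieldnames]},
        }]
    }
    json_path.write_text(json.dumps(descriptor, indent=2))
    return csv_path, json_path


with tempfile.TemporaryDirectory() as tmp:
    paths = save_pair([[1, 1], [2, 3]], ["x", "v"],
                      basename="df_data_r", outdir=Path(tmp) / "saved_entry")
    saved_names = sorted(path.name for path in paths)
    reloaded = json.loads(paths[1].read_text())
```

Both files share the chosen basename, and the descriptor refers to the CSV by its file name, so the pair can be moved together.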