unitpackage.collection

A collection of datapackages with units.

EXAMPLES:

Create a collection from local frictionless data packages in the data/ directory:

>>> collection = Collection.from_local('data/')

Create a collection from the data packages published on echemdb:

>>> collection = Collection.from_remote()  

Search the collection for entries from a single publication:

>>> collection.filter(lambda entry: entry.source.url == 'https://doi.org/10.1039/C0CP01001D')  
[Entry('alves_2011_electrochemistry_6010_f1a_solid'), ...
class unitpackage.collection.Collection(data_packages=None)

A collection of [frictionless data packages](https://github.com/frictionlessdata/datapackage-py).

EXAMPLES:

An empty collection:

>>> collection = Collection([])
>>> len(collection)
0
class Entry(package)

A frictionless data package describing tabulated data.

EXAMPLES:

Entries can be directly created:

>>> from unitpackage.local import collect_datapackage
>>> from unitpackage.entry import Entry
>>> entry = Entry(collect_datapackage('./examples/no_bibliography/no_bibliography.json'))
>>> entry
Entry('no_bibliography')

or more simply:

>>> from unitpackage.entry import Entry
>>> entry = Entry.from_local('./examples/no_bibliography/no_bibliography.json')
>>> entry
Entry('no_bibliography')

Entries can also be created by other means, such as from a CSV file with Entry.from_csv or from a pandas dataframe with Entry.from_df.

Normally, entries are obtained by opening a Collection of entries:

>>> from unitpackage.collection import Collection
>>> collection = Collection.create_example()
>>> entry = next(iter(collection))
property bibliography

Return a pybtex bibliography object.

EXAMPLES:

>>> entry = Entry.create_examples()[0]
>>> entry.bibliography 
Entry('article',
fields=[
    ('title', ...
    ...

>>> entry_no_bib = Entry.create_examples(name="no_bibliography")[0]
>>> entry_no_bib.bibliography
''
citation(backend='text')

Return a formatted reference for the entry’s bibliography such as:

  1. Doe, et al., Journal Name, volume (YEAR) page, “Title”

The default rendering is plain text ('text'). Any other format supported by pybtex can be selected instead, such as Markdown ('md'), 'latex' or 'html'.

EXAMPLES:

>>> entry = Entry.create_examples()[0]
>>> entry.citation(backend='text')
'O. B. Alves et al. Electrochemistry at Ru(0001) in a flowing CO-saturated electrolyte—reactive and inert adlayer phases. Physical Chemistry Chemical Physics, 13(13):6010–6021, 2011.'
>>> print(entry.citation(backend='md'))
O\. B\. Alves *et al\.*
*Electrochemistry at Ru\(0001\) in a flowing CO\-saturated electrolyte—reactive and inert adlayer phases*\.
*Physical Chemistry Chemical Physics*, 13\(13\):6010–6021, 2011\.
classmethod create_examples(name='')

Return some example entries for use in automated tests.

The examples are created from datapackages in unitpackage's examples directory. These are only available in the development environment.

EXAMPLES:

>>> Entry.create_examples()
[Entry('alves_2011_electrochemistry_6010_f1a_solid'), Entry('engstfeld_2018_polycrystalline_17743_f4b_1'), Entry('no_bibliography')]

An entry without an associated BIB file:

>>> Entry.create_examples(name="no_bibliography")
[Entry('no_bibliography')]
property df

Return the data of this entry as a data frame.

EXAMPLES:

>>> entry = Entry.create_examples()[0]
>>> entry.df
              t         E         j
0      0.000000 -0.103158 -0.998277
1      0.020000 -0.102158 -0.981762
...

The units and descriptions of the axes in the data frame can be recovered:

>>> entry.package.get_resource('echemdb').schema.fields 
[{'name': 't', 'type': 'number', 'unit': 's'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]
field_unit(field_name)

Return the unit of the field named field_name in the echemdb resource.

EXAMPLES:

>>> entry = Entry.create_examples()[0]
>>> entry.field_unit('E')
'V'
classmethod from_csv(csvname, metadata=None, fields=None)

Returns an entry constructed from a CSV with a single header line.

EXAMPLES:

Units describing the fields can be provided:

>>> import os
>>> fields = [{'name':'E', 'unit': 'mV'}, {'name':'I', 'unit': 'A'}]
>>> entry = Entry.from_csv(csvname='examples/from_csv/from_csv.csv', fields=fields)
>>> entry
Entry('from_csv')

>>> entry.package 
{'resources': [{'name':
...

Metadata can be appended:

>>> import os
>>> fields = [{'name':'E', 'unit': 'mV'}, {'name':'I', 'unit': 'A'}]
>>> metadata = {'user':'Max Doe'}
>>> entry = Entry.from_csv(csvname='examples/from_csv/from_csv.csv', metadata=metadata, fields=fields)
>>> entry.user
'Max Doe'
classmethod from_df(df, metadata=None, fields=None, outdir=None, *, basename)

Returns an entry constructed from a pandas dataframe.

EXAMPLES:

>>> import pandas as pd
>>> from unitpackage.entry import Entry
>>> df = pd.DataFrame({'x':[1,2,3], 'y':[2,3,4]})
>>> entry = Entry.from_df(df=df, basename='test_df')
>>> entry
Entry('test_df')

Metadata and field descriptions can be added:

>>> import os
>>> fields = [{'name':'x', 'unit': 'm'}, {'name':'P', 'unit': 'um'}, {'name':'E', 'unit': 'V'}]
>>> metadata = {'user':'Max Doe'}
>>> entry = Entry.from_df(df=df, basename='test_df', metadata=metadata, fields=fields)
>>> entry.user
'Max Doe'

Save the entry:

>>> entry.save(outdir='./test/generated/from_df')

TESTS

Verify that all fields are properly created even when they are not specified as fields:

>>> import os
>>> fields = [{'name':'x', 'unit': 'm'}, {'name':'P', 'unit': 'um'}, {'name':'E', 'unit': 'V'}]
>>> metadata = {'user':'Max Doe'}
>>> entry = Entry.from_df(df=df, basename='test_df', metadata=metadata, fields=fields)
>>> entry.package.get_resource('echemdb').schema.fields
[{'name': 'x', 'type': 'integer', 'unit': 'm'}, {'name': 'y', 'type': 'integer'}]
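
The result above suggests that field descriptions are matched to data frame columns by name: descriptions for columns that do not exist ('P', 'E') are dropped, and columns without a description ('y') still get a field. A minimal sketch of such a merge in plain Python follows. `merge_field_descriptions` is a hypothetical helper, not part of unitpackage, and the sketch leaves out the type inference ('integer') that appears in the real output.

```python
def merge_field_descriptions(column_names, fields):
    """Describe every column; ignore descriptions of unknown columns.

    Hypothetical helper for illustration only.
    """
    described = {field["name"]: field for field in fields}
    # Columns without a description fall back to a bare {'name': ...} entry.
    return [dict(described.get(name, {"name": name})) for name in column_names]


columns = ["x", "y"]
fields = [
    {"name": "x", "unit": "m"},
    {"name": "P", "unit": "um"},  # no such column, so it is dropped
    {"name": "E", "unit": "V"},   # no such column, so it is dropped
]
merged = merge_field_descriptions(columns, fields)
print(merged)
```

The order of the result follows the columns rather than the order of the supplied descriptions, which matches the output shown in the test above.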
classmethod from_local(filename)

Return an entry created from the local data package file filename.

EXAMPLES:

>>> from unitpackage.entry import Entry
>>> entry = Entry.from_local('./examples/no_bibliography/no_bibliography.json')
>>> entry
Entry('no_bibliography')
property identifier

Return a unique identifier for this entry, i.e., its basename.

EXAMPLES:

>>> entry = Entry.create_examples()[0]
>>> entry.identifier
'alves_2011_electrochemistry_6010_f1a_solid'
plot(x_label=None, y_label=None, name=None)

Return a 2D plot of this entry.

The default plot is constructed from the first two columns of the dataframe.

EXAMPLES:

>>> entry = Entry.create_examples()[0]
>>> entry.plot()
Figure(...)

The 2D plot can also be returned with custom axes, chosen from the fields available in the resource:

>>> entry.plot(x_label='j', y_label='E')
Figure(...)
rename_fields(field_names, keep_original_name_as=None)

Returns an Entry with updated field names and dataframe column names. Provide a dict in which each key is a current field name and each value is the new name, such as {'t':'t_rel', 'E':'E_we'}. The original field names can be kept under a new key, given by keep_original_name_as.

EXAMPLES:

>>> from unitpackage.entry import Entry
>>> entry = Entry.create_examples()[0]
>>> renamed_entry = entry.rename_fields({'t': 't_rel'}, keep_original_name_as='originalName')
>>> renamed_entry.df
          t_rel         E         j
0      0.000000 -0.103158 -0.998277
1      0.020000 -0.102158 -0.981762
...

>>> renamed_entry.package.get_resource('echemdb').schema.fields
[{'name': 't_rel', 'type': 'number', 'unit': 's', 'originalName': 't'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]

TESTS:

Mappings for non-existing fields are ignored:

>>> renamed_entry = entry.rename_fields({'t': 't_rel', 'x':'y'}, keep_original_name_as='originalName')
>>> renamed_entry.package.get_resource('echemdb').schema.fields
[{'name': 't_rel', 'type': 'number', 'unit': 's', 'originalName': 't'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]
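
The renaming behaviour shown above, including the silent handling of unknown names, can be sketched on plain field dictionaries. This is an illustration under assumptions, not unitpackage's implementation: `rename_schema_fields` is a hypothetical function, and it only handles the schema, not the data frame columns.

```python
def rename_schema_fields(fields, field_names, keep_original_name_as=None):
    """Return new field descriptions with names replaced per field_names.

    Names in field_names that match no field are ignored.
    Hypothetical helper for illustration only.
    """
    renamed = []
    for field in fields:
        field = dict(field)  # copy, so the input is left untouched
        old_name = field["name"]
        if old_name in field_names:
            field["name"] = field_names[old_name]
            if keep_original_name_as is not None:
                # Remember the previous name under the requested key.
                field[keep_original_name_as] = old_name
        renamed.append(field)
    return renamed


fields = [
    {"name": "t", "type": "number", "unit": "s"},
    {"name": "E", "type": "number", "unit": "V", "reference": "RHE"},
]
new_fields = rename_schema_fields(
    fields, {"t": "t_rel", "x": "y"}, keep_original_name_as="originalName"
)
print(new_fields)
```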
rescale(units)

Returns a rescaled Entry with axes in the specified units. Provide a dict, where the key is the axis name and the value the new unit, such as {'j': 'uA / cm2', 't': 'h'}.

EXAMPLES:

The units without any rescaling:

>>> entry = Entry.create_examples()[0]
>>> entry.package.get_resource('echemdb').schema.fields
[{'name': 't', 'type': 'number', 'unit': 's'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]

A rescaled entry using different units:

>>> rescaled_entry = entry.rescale({'j':'uA / cm2', 't':'h'})
>>> rescaled_entry.package.get_resource('echemdb').schema.fields
[{'name': 't', 'type': 'number', 'unit': 'h'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'uA / cm2'}]

The values in the data frame are scaled to match the new units:

>>> rescaled_entry.df
             t         E          j
0     0.000000 -0.103158 -99.827664
1     0.000006 -0.102158 -98.176205
...
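
The arithmetic behind the rescaled values can be checked by hand. The sketch below hard-codes the conversion factors for the four units used in this example. It is not how unitpackage performs the conversion, and `TO_SI` and `rescale_value` are names made up for this illustration.

```python
# Conversion factors to SI base units, hard-coded for this illustration only.
TO_SI = {
    "s": 1.0,
    "h": 3600.0,
    "A / m2": 1.0,
    "uA / cm2": 1e-6 / 1e-4,  # 1 uA = 1e-6 A, 1 cm2 = 1e-4 m2
}


def rescale_value(value, old_unit, new_unit):
    """Convert value from old_unit to new_unit via the SI base unit."""
    return value * TO_SI[old_unit] / TO_SI[new_unit]


j = rescale_value(-0.998277, "A / m2", "uA / cm2")  # factor of 100
t = rescale_value(0.02, "s", "h")                   # divide by 3600
print(j, t)
```

A current density in A / m2 is multiplied by 100 to obtain uA / cm2, and a time in seconds is divided by 3600 to obtain hours, which is consistent with the rescaled data frame above.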
save(*, outdir, basename=None)

Create a unitpackage, i.e., a CSV file and a JSON file, in the directory outdir.

EXAMPLES:

The output files are named identifier.csv and identifier.json using the identifier of the original resource:

>>> import os
>>> entry = Entry.create_examples()[0]
>>> entry.save(outdir='./test/generated')
>>> basename = entry.identifier
>>> os.path.exists(f'test/generated/{basename}.json') and os.path.exists(f'test/generated/{basename}.csv')
True

When a basename is set, the files are named basename.csv and basename.json. Note that for a valid frictionless package this base name MUST be lower-case and contain only alphanumeric characters along with ".", "_" or "-" characters:

>>> import os
>>> entry = Entry.create_examples()[0]
>>> basename = 'save_basename'
>>> entry.save(basename=basename, outdir='./test/generated')
>>> os.path.exists(f'test/generated/{basename}.json') and os.path.exists(f'test/generated/{basename}.csv')
True
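
The naming rule stated above can be checked before saving. The following is a minimal sketch using a regular expression. `is_valid_basename` is a hypothetical helper, not a unitpackage function, and it encodes only the rule as worded here (lower-case alphanumerics plus ".", "_" and "-").

```python
import re

# Lower-case letters, digits, ".", "_" and "-" only, per the rule stated above.
VALID_BASENAME = re.compile(r"[a-z0-9._-]+")


def is_valid_basename(basename):
    """Return True if basename uses only the permitted characters."""
    return VALID_BASENAME.fullmatch(basename) is not None


print(is_valid_basename("save_basename"))
print(is_valid_basename("Save Basename"))
```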

TESTS:

Save an entry whose metadata contains a datetime object, a type that JSON does not support natively:

>>> import os
>>> from datetime import datetime
>>> import pandas as pd
>>> from unitpackage.entry import Entry
>>> df = pd.DataFrame({'x':[1,2,3], 'y':[2,3,4]})
>>> basename = 'save_datetime'
>>> entry = Entry.from_df(df=df, basename=basename, metadata={'current time':datetime.now()})
>>> entry.save(outdir='./test/generated')
>>> os.path.exists(f'test/generated/{basename}.json') and os.path.exists(f'test/generated/{basename}.csv')
True
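
For context on the test above: Python's json module raises a TypeError for datetime objects unless a fallback is supplied. One common way to handle this with the standard library is sketched below. How unitpackage serializes such metadata internally is not shown in this documentation, so `to_json_value` is purely an illustrative assumption.

```python
import json
from datetime import datetime

# A fixed timestamp keeps the example reproducible.
metadata = {"user": "Max Doe", "current time": datetime(2024, 1, 31, 12, 0, 0)}


def to_json_value(obj):
    """Fallback for json.dumps: render datetimes as ISO 8601 strings."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


serialized = json.dumps(metadata, default=to_json_value)
print(serialized)
```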
property bibliography

Return a pybtex database of all bibtex bibliography files.

EXAMPLES:

>>> collection = Collection.create_example()
>>> collection.bibliography
BibliographyData(
  entries=OrderedCaseInsensitiveDict([
    ('alves_2011_electrochemistry_6010', Entry('article',
    ...
    ('engstfeld_2018_polycrystalline_17743', Entry('article',
    ...

A collection whose entries have no bibliography:

>>> collection = Collection.create_example()["no_bibliography"]
>>> collection.bibliography
''
classmethod create_example()

Return a sample collection for use in automated tests.

EXAMPLES:

>>> Collection.create_example()  
[Entry('alves_2011_electrochemistry_6010_f1a_solid'),
Entry('engstfeld_2018_polycrystalline_17743_f4b_1'),
Entry('no_bibliography')]
filter(predicate)

Return the subset of the collection that satisfies predicate.

EXAMPLES:

>>> collection = Collection.create_example()
>>> collection.filter(lambda entry: entry.source.url == 'https://doi.org/10.1039/C0CP01001D')
[Entry('alves_2011_electrochemistry_6010_f1a_solid')]

The filter predicate can use properties that are not present on all entries in the collection. If a property is missing on an entry, that entry is left out of the result:

>>> collection.filter(lambda entry: entry.non.existing.property)
[]
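
The tolerant behaviour described above can be sketched in plain Python. This is only an illustration, not unitpackage's implementation: `filter_entries` is a hypothetical helper, the entries are stand-in objects, and the sketch assumes that a missing property surfaces as an `AttributeError` when the predicate runs.

```python
from types import SimpleNamespace


def filter_entries(entries, predicate):
    """Keep the entries for which predicate is truthy.

    Assumption: a missing property raises AttributeError, which is
    treated as "does not match" rather than as an error.
    """
    kept = []
    for entry in entries:
        try:
            if predicate(entry):
                kept.append(entry)
        except AttributeError:
            # The entry lacks a property used by the predicate; drop it.
            continue
    return kept


# Hypothetical stand-ins for entries; only the first one has a source URL.
doi = "https://doi.org/10.1039/C0CP01001D"
entries = [
    SimpleNamespace(identifier="with_source", source=SimpleNamespace(url=doi)),
    SimpleNamespace(identifier="without_source"),
]

matches = filter_entries(entries, lambda entry: entry.source.url == doi)
print([entry.identifier for entry in matches])
```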
classmethod from_local(datadir)

Create a collection from local datapackages.

EXAMPLES:

>>> from unitpackage.collection import Collection
>>> collection = Collection.from_local('./examples')
>>> collection  
[Entry('alves_2011_electrochemistry_6010_f1a_solid'),
Entry('engstfeld_2018_polycrystalline_17743_f4b_1'),
Entry('no_bibliography')]
classmethod from_remote(url=None, data=None, outdir=None)

Create a collection from a URL pointing to a zip archive.

When no URL is provided, a collection is created from the data packages published on echemdb.

EXAMPLES:

>>> from unitpackage.collection import Collection
>>> collection = Collection.from_remote()  

The folder containing the data in the zip can be specified with the data parameter. An output directory for the extracted data can be specified with the outdir parameter.

save_entries(outdir=None)

Save the entries of this collection as datapackages (CSV and JSON) to the output directory outdir.

EXAMPLES:

>>> db = Collection.create_example()
>>> db.save_entries(outdir='test/generated/saved_collection')
>>> import glob
>>> glob.glob('test/generated/saved_collection/**.json') 
['test...