`unitpackage.collection`

A collection of frictionless Resources that can be accessed and stored as a [frictionless Data Package](https://github.com/frictionlessdata/datapackage-py).

EXAMPLES:

Create a collection from frictionless Resources stored within local frictionless Data Packages in the data/ directory:

>>> collection = Collection.from_local('data/')

Create a collection from the Data Packages published in the echemdb data repository and displayed on the echemdb website:

>>> collection = Collection.from_remote()

Search the collection for entries, for example those from a single publication identified by its DOI:

>>> collection.filter(lambda entry: entry.source.url == 'https://doi.org/10.1039/C0CP01001D')
[Entry('alves_2011_electrochemistry_6010_f1a_solid'), ...

class unitpackage.collection.Collection(package=None)

A collection of frictionless Resources that can be accessed and stored as a [frictionless Data Package](https://github.com/frictionlessdata/datapackage-py).

EXAMPLES:

An empty collection:

>>> collection = Collection([])
>>> collection
[]

An example collection (only available from the development environment):

>>> collection = Collection.create_example()
>>> collection.package.resource_names
['alves_2011_electrochemistry_6010_f1a_solid',
'engstfeld_2018_polycrystalline_17743_f4b_1',
'no_bibliography']

Collections must contain Resources with unique identifiers:

>>> db = Collection.from_local("./examples/duplicates")
Traceback (most recent call last):
...
ValueError: Collection contains duplicate entries: ['duplicate']
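
The uniqueness requirement can be pictured with a small standard-library sketch. This is not unitpackage's implementation; `find_duplicates` is a hypothetical helper and the names in the list are made up for illustration.

```python
from collections import Counter


def find_duplicates(names):
    """Return the sorted names that occur more than once (hypothetical helper)."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


# Illustrative resource names, not taken from a real Data Package.
names = ["duplicate", "unique_entry", "duplicate"]
duplicates = find_duplicates(names)
if duplicates:
    print(f"Collection contains duplicate entries: {duplicates}")
```

A check of this kind only needs the resource names, so it can run before any data is loaded.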

class Entry(resource)

A frictionless Resource describing tabulated data.

EXAMPLES:

Entries can be directly created from a frictionless Data Package containing a single frictionless Resource:

>>> from unitpackage.entry import Entry
>>> entry = Entry.from_local('./examples/local/no_bibliography/no_bibliography.json')
>>> entry
Entry('no_bibliography')

or directly from a frictionless Resource:

>>> from unitpackage.entry import Entry
>>> from frictionless import Resource
>>> entry = Entry(Resource('./examples/local/no_bibliography/no_bibliography.json'))
>>> entry
Entry('no_bibliography')

Entries can also be created by other means, such as from a CSV file with Entry.from_csv or from a pandas dataframe with Entry.from_df.

Normally, entries are obtained by opening a Collection of entries:

>>> from unitpackage.collection import Collection
>>> collection = Collection.create_example()
>>> entry = next(iter(collection))

add_columns(df, new_fields)

Adds a column to the dataframe with specified field properties and returns an updated entry.

EXAMPLES:

>>> entry = Entry.create_examples()[0]
>>> entry.df
              t         E         j
0      0.000000 -0.103158 -0.998277
1      0.020000 -0.102158 -0.981762
...

A new column can be computed from existing fields and added together with its unit:

>>> import pandas as pd
>>> import astropy.units as u
>>> df = pd.DataFrame()
>>> df['P/A'] = entry.df['E'] * entry.df['j']
>>> new_field_unit = u.Unit(entry.field_unit('E')) * u.Unit(entry.field_unit('j'))
>>> new_entry = entry.add_columns(df['P/A'], new_fields=[{'name':'P/A', 'unit': new_field_unit}])
>>> new_entry.df
              t         E         j       P/A
0      0.000000 -0.103158 -0.998277  0.102981
1      0.020000 -0.102158 -0.981762  0.100295
...

>>> new_entry.field_unit('P/A')
Unit("A V / m2")

property bibliography

Return a pybtex bibliography object associated with this entry.

EXAMPLES:

>>> entry = Entry.create_examples()[0]
>>> entry.bibliography
Entry('article',
fields=[
    ('title', ...
    ...

>>> entry_no_bib = Entry.create_examples(name="no_bibliography")[0]
>>> entry_no_bib.bibliography
''

citation(backend='text')

Return a formatted reference for the entry’s bibliography such as:

Doe, et al., Journal Name, volume (YEAR) page, “Title”

Plain text (‘text’) is the default rendering, but any format supported by pybtex can be used, such as markdown (‘md’), ‘latex’ or ‘html’.

EXAMPLES:

>>> entry = Entry.create_examples()[0]
>>> entry.citation(backend='text')
'O. B. Alves et al. Electrochemistry at Ru(0001) in a flowing CO-saturated electrolyte—reactive and inert adlayer phases. Physical Chemistry Chemical Physics, 13(13):6010–6021, 2011.'
>>> print(entry.citation(backend='md'))
O\. B\. Alves *et al\.*
*Electrochemistry at Ru\(0001\) in a flowing CO\-saturated electrolyte—reactive and inert adlayer phases*\.
*Physical Chemistry Chemical Physics*, 13\(13\):6010–6021, 2011\.

classmethod create_examples(name='')

Return some example entries for use in automated tests.

The examples are created from Data Packages in unitpackage’s examples directory. These are only available from the development environment.

EXAMPLES:

>>> Entry.create_examples()
[Entry('alves_2011_electrochemistry_6010_f1a_solid'), Entry('engstfeld_2018_polycrystalline_17743_f4b_1'), Entry('no_bibliography')]

An entry without an associated BIB file:

>>> Entry.create_examples(name="no_bibliography")
[Entry('no_bibliography')]

property df

Return the data of this entry’s “MutableResource” as a data frame.

EXAMPLES:

>>> entry = Entry.create_examples()[0]
>>> entry.df
              t         E         j
0      0.000000 -0.103158 -0.998277
1      0.020000 -0.102158 -0.981762
...

The units and descriptions of the axes in the data frame can be recovered:

>>> entry.mutable_resource.schema.fields
[{'name': 't', 'type': 'number', 'unit': 's'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]

field_unit(field_name)

Return the unit of the field named field_name in this entry’s “MutableResource”.

EXAMPLES:

>>> entry = Entry.create_examples()[0]
>>> entry.field_unit('E')
'V'

classmethod from_csv(csvname, metadata=None, fields=None)

Returns an entry constructed from a CSV with a single header line.

EXAMPLES:

Units describing the fields can be provided:

>>> import os
>>> fields = [{'name':'E', 'unit': 'mV'}, {'name':'I', 'unit': 'A'}]
>>> entry = Entry.from_csv(csvname='examples/from_csv/from_csv.csv', fields=fields)
>>> entry
Entry('from_csv')

>>> entry.resource
{'name': 'from_csv',
...

Metadata can be appended:

>>> import os
>>> fields = [{'name':'E', 'unit': 'mV'}, {'name':'I', 'unit': 'A'}]
>>> metadata = {'user':'Max Doe'}
>>> entry = Entry.from_csv(csvname='examples/from_csv/from_csv.csv', metadata=metadata, fields=fields)
>>> entry.user
'Max Doe'

Important

Upper case filenames are converted to lower case entry identifiers!

A filename containing upper case characters:

>>> import os
>>> fields = [{'name':'E', 'unit': 'mV'}, {'name':'I', 'unit': 'A'}]
>>> entry = Entry.from_csv(csvname='examples/from_csv/UpperCase.csv', fields=fields)
>>> entry
Entry('uppercase')

Casing in the filename is preserved in the metadata:

>>> entry.resource
{'name': 'uppercase',
'type': 'table',
'path': 'UpperCase.csv',
...

classmethod from_df(df, metadata=None, fields=None, outdir=None, *, basename)

Returns an entry constructed from a pandas dataframe.

EXAMPLES:

>>> import pandas as pd
>>> from unitpackage.entry import Entry
>>> df = pd.DataFrame({'x':[1,2,3], 'y':[2,3,4]})
>>> entry = Entry.from_df(df=df, basename='test_df')
>>> entry
Entry('test_df')

Metadata and field descriptions can be added:

>>> import os
>>> fields = [{'name':'x', 'unit': 'm'}, {'name':'P', 'unit': 'um'}, {'name':'E', 'unit': 'V'}]
>>> metadata = {'user':'Max Doe'}
>>> entry = Entry.from_df(df=df, basename='test_df', metadata=metadata, fields=fields)
>>> entry.user
'Max Doe'

Save the entry:

>>> entry.save(outdir='./test/generated/from_df')

Important

Basenames with upper case characters are stored with lower case characters! To separate words use underscores.

The basename will always be converted to lowercase entry identifiers:

>>> import pandas as pd
>>> from unitpackage.entry import Entry
>>> df = pd.DataFrame({'x':[1,2,3], 'y':[2,3,4]})
>>> entry = Entry.from_df(df=df, basename='TEST_DF')
>>> entry
Entry('test_df')

TESTS:

Verify that all fields are properly created even when they are not specified as fields:

>>> fields = [{'name':'x', 'unit': 'm'}, {'name':'P', 'unit': 'um'}, {'name':'E', 'unit': 'V'}]
>>> metadata = {'user':'Max Doe'}
>>> entry = Entry.from_df(df=df, basename='test_df', metadata=metadata, fields=fields)
>>> entry.resource.schema.fields
[{'name': 'x', 'type': 'integer', 'unit': 'm'}, {'name': 'y', 'type': 'integer'}]
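
The behavior shown in this test, where descriptions for fields that are not columns of the dataframe ('P' and 'E') are dropped and undescribed columns ('y') still get a field, can be sketched in plain Python. `describe_fields` is a hypothetical helper written for illustration, not the library's code, and it omits the 'type' key that frictionless infers from the data.

```python
def describe_fields(column_names, fields):
    """Attach provided descriptions to existing columns only (hypothetical helper)."""
    provided = {field["name"]: field for field in fields}
    described = []
    for name in column_names:
        descriptor = {"name": name}
        # Copy extra keys such as "unit" when a description was provided.
        descriptor.update(provided.get(name, {}))
        described.append(descriptor)
    return described


columns = ["x", "y"]
fields = [{"name": "x", "unit": "m"}, {"name": "P", "unit": "um"}, {"name": "E", "unit": "V"}]
described = describe_fields(columns, fields)
print(described)
```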

classmethod from_local(filename)

Return an entry from filename, a file containing a frictionless Data Package. The Data Package must contain a single resource.

Otherwise use Collection.from_local_file to create a collection from all resources within.

EXAMPLES:

>>> from unitpackage.entry import Entry
>>> entry = Entry.from_local('./examples/local/no_bibliography/no_bibliography.json')
>>> entry
Entry('no_bibliography')

property identifier

Return a unique identifier for this entry, i.e., its basename.

EXAMPLES:

>>> entry = Entry.create_examples()[0]
>>> entry.identifier
'alves_2011_electrochemistry_6010_f1a_solid'

property mutable_resource

Return this entry’s “MutableResource”, an in-memory frictionless Resource in pandas format that carries the schema of the entry’s data.

EXAMPLES:

>>> entry = Entry.create_examples()[0]
>>> entry.mutable_resource
{'name': 'memory',
'type': 'table',
'data': [],
'format': 'pandas',
'mediatype': 'application/pandas',
'schema': {'fields': [{'name': 't', 'type': 'number', 'unit': 's'},
                    {'name': 'E',
                        'type': 'number',
                        'unit': 'V',
                        'reference': 'RHE'},
                    {'name': 'j', 'type': 'number', 'unit': 'A / m2'}]}}

plot(x_label=None, y_label=None, name=None)

Return a 2D plot of this entry.

The default plot is constructed from the first two columns of the dataframe.

EXAMPLES:

>>> entry = Entry.create_examples()[0]
>>> entry.plot()
Figure(...)

The 2D plot can also be returned with custom axes, chosen from the fields available in the resource:

>>> entry.plot(x_label='j', y_label='E')
Figure(...)

rename_fields(field_names, keep_original_name_as=None)

Returns an Entry with updated field names and dataframe column names. Provide a dict, where the key is the previous field name and the value the new name, such as {'t':'t_rel', 'E':'E_we'}. The original field names can be kept in a new key.

EXAMPLES:

The original dataframe:

>>> from unitpackage.entry import Entry
>>> entry = Entry.create_examples()[0]
>>> entry.df
              t         E         j
0      0.000000 -0.103158 -0.998277
1      0.020000 -0.102158 -0.981762
...

Dataframe with modified column names:

>>> renamed_entry = entry.rename_fields({'t': 't_rel', 'E': 'E_we'}, keep_original_name_as='originalName')
>>> renamed_entry.df
          t_rel      E_we         j
0      0.000000 -0.103158 -0.998277
1      0.020000 -0.102158 -0.981762
...

Updated fields of the “MutableResource”:

>>> renamed_entry.mutable_resource.schema.fields
[{'name': 't_rel', 'type': 'number', 'unit': 's', 'originalName': 't'},
{'name': 'E_we', 'type': 'number', 'unit': 'V', 'reference': 'RHE', 'originalName': 'E'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]

TESTS:

Mappings for non-existing fields are ignored:

>>> renamed_entry = entry.rename_fields({'t': 't_rel', 'x':'y'}, keep_original_name_as='originalName')
>>> renamed_entry.mutable_resource.schema.fields
[{'name': 't_rel', 'type': 'number', 'unit': 's', 'originalName': 't'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]
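
The effect on the field descriptors can be mimicked with plain dictionaries. The sketch below is an illustration only; `rename_field_descriptors` is a hypothetical helper, and it does not touch a dataframe or a frictionless schema as the real method does.

```python
def rename_field_descriptors(fields, field_names, keep_original_name_as=None):
    """Return renamed copies of field descriptors (hypothetical helper)."""
    renamed = []
    for field in fields:
        field = dict(field)  # copy so the input list stays unchanged
        old_name = field["name"]
        if old_name in field_names:
            field["name"] = field_names[old_name]
            if keep_original_name_as is not None:
                field[keep_original_name_as] = old_name
        renamed.append(field)
    return renamed


fields = [
    {"name": "t", "type": "number", "unit": "s"},
    {"name": "E", "type": "number", "unit": "V", "reference": "RHE"},
    {"name": "j", "type": "number", "unit": "A / m2"},
]

# "x" is not an existing field, so that mapping has no effect.
result = rename_field_descriptors(
    fields, {"t": "t_rel", "x": "y"}, keep_original_name_as="originalName"
)
print(result)
```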

rescale(units)

Returns a rescaled Entry with axes in the specified units. Provide a dict, where the key is the axis name and the value the new unit, such as {'j': 'uA / cm2', 't': 'h'}.

EXAMPLES:

The units without any rescaling:

>>> entry = Entry.create_examples()[0]
>>> entry.resource.schema.fields
[{'name': 't', 'type': 'number', 'unit': 's'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]

A rescaled entry using different units:

>>> rescaled_entry = entry.rescale({'j':'uA / cm2', 't':'h'})
>>> rescaled_entry.mutable_resource.schema.fields
[{'name': 't', 'type': 'number', 'unit': 'h'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'uA / cm2'}]

The values in the data frame are scaled to match the new units:

>>> rescaled_entry.df
             t         E          j
0     0.000000 -0.103158 -99.827664
1     0.000006 -0.102158 -98.176205
...
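
The rescaled numbers can be checked by hand. The following sketch spells out the two conversion factors with plain arithmetic instead of a unit library; the input values are the rounded ones displayed above, so the result for j agrees with the table only to the displayed precision of the input.

```python
# Conversion factors written out by hand (no unit library involved).
MICRO_PER_UNIT = 1e6      # 1 A = 1e6 uA
CM2_PER_M2 = 1e4          # 1 m2 = 1e4 cm2
SECONDS_PER_HOUR = 3600.0

# A / m2 -> uA / cm2
j_factor = MICRO_PER_UNIT / CM2_PER_M2
# s -> h
t_factor = 1.0 / SECONDS_PER_HOUR

j_original = -0.998277  # A / m2, first row as displayed above
t_original = 0.02       # s, second row as displayed above

j_rescaled = j_original * j_factor
t_rescaled = t_original * t_factor

print(f"{j_rescaled:.4f} uA / cm2")
print(f"{t_rescaled:.6f} h")
```

One A / m2 corresponds to 100 uA / cm2, which is why the j column grows by two orders of magnitude while t shrinks by a factor of 3600.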

save(*, outdir, basename=None)

Create a Data Package, i.e., a CSV file and a JSON file, in the directory outdir.

EXAMPLES:

The output files are named identifier.csv and identifier.json using the identifier of the original resource:

>>> import os
>>> entry = Entry.create_examples()[0]
>>> entry.save(outdir='./test/generated')
>>> basename = entry.identifier
>>> os.path.exists(f'test/generated/{basename}.json') and os.path.exists(f'test/generated/{basename}.csv')
True

When a basename is set, the files are named basename.csv and basename.json.

Note

For a valid frictionless Data Package the basename MUST be lower-case and contain only alphanumeric characters along with ., _ or - characters.
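
The naming rule can be tested before saving with a short regular expression. The pattern below is an assumption derived from the wording of the note, and `is_valid_basename` is a hypothetical helper that is not part of unitpackage.

```python
import re

# Pattern assumed from the rule above: lower-case alphanumerics plus ".", "_" and "-".
VALID_NAME = re.compile(r"[a-z0-9._-]+")


def is_valid_basename(basename):
    """Return True if basename satisfies the naming rule (hypothetical helper)."""
    return VALID_NAME.fullmatch(basename) is not None


for candidate in ["save_basename", "Upper_Case_Save", "upper_case_save", "with space"]:
    print(candidate, is_valid_basename(candidate))
```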

A valid basename:

>>> import os
>>> entry = Entry.create_examples()[0]
>>> basename = 'save_basename'
>>> entry.save(basename=basename, outdir='./test/generated')
>>> os.path.exists(f'test/generated/{basename}.json') and os.path.exists(f'test/generated/{basename}.csv')
True

Upper case characters are saved lower case:

>>> import os
>>> import pandas as pd
>>> from unitpackage.entry import Entry
>>> df = pd.DataFrame({'x':[1,2,3], 'y':[2,3,4]})
>>> basename = 'Upper_Case_Save'
>>> entry = Entry.from_df(df=df, basename=basename)
>>> entry.save(outdir='./test/generated')
>>> os.path.exists(f'test/generated/{basename.lower()}.json') and os.path.exists(f'test/generated/{basename.lower()}.csv')
True

>>> new_entry = Entry.from_local(f'test/generated/{basename.lower()}.json')
>>> new_entry.resource
{'name': 'upper_case_save',
'type': 'table',
'path': 'upper_case_save.csv',
...

TESTS:

Save the entry as a Data Package with metadata containing a datetime, a type that is not natively supported by JSON:

>>> import os
>>> from datetime import datetime
>>> import pandas as pd
>>> from unitpackage.entry import Entry
>>> df = pd.DataFrame({'x':[1,2,3], 'y':[2,3,4]})
>>> basename = 'save_datetime'
>>> entry = Entry.from_df(df=df, basename=basename, metadata={'currentTime':datetime.now()})
>>> entry.save(outdir='./test/generated')
>>> os.path.exists(f'test/generated/{basename}.json') and os.path.exists(f'test/generated/{basename}.csv')
True
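
Why datetime metadata needs special care can be seen with the standard json module alone. The sketch below shows one common workaround, a `default=str` fallback; how unitpackage itself serializes such values is not stated here, so treat this as an illustration rather than the library's approach.

```python
import json
from datetime import datetime

# A fixed timestamp keeps the example reproducible.
metadata = {"currentTime": datetime(2024, 1, 1, 12, 0, 0)}

try:
    json.dumps(metadata)
except TypeError as error:
    print(f"Plain json.dumps fails: {error}")

# Fall back to str() for objects the JSON encoder does not know.
serialized = json.dumps(metadata, default=str)
print(serialized)
```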

property bibliography

Return a pybtex database of all BibTeX bibliography files associated with the entries.

EXAMPLES:

>>> collection = Collection.create_example()
>>> collection.bibliography
BibliographyData(
  entries=OrderedCaseInsensitiveDict([
    ('alves_2011_electrochemistry_6010', Entry('article',
    ...
    ('engstfeld_2018_polycrystalline_17743', Entry('article',
    ...

A derived collection includes only the bibliographic entries of the remaining entries:

>>> collection.filter(lambda entry: entry.source.citationKey != 'alves_2011_electrochemistry_6010').bibliography
BibliographyData(
  entries=OrderedCaseInsensitiveDict([
    ('engstfeld_2018_polycrystalline_17743', Entry('article',
    ...

A collection with entries without bibliography:

>>> collection = Collection.create_example()["no_bibliography"]
>>> collection.bibliography
''

classmethod create_example()

Return a sample collection for use in automated tests (only accessible from the development environment).

EXAMPLES:

>>> Collection.create_example()
[Entry('alves_2011_electrochemistry_6010_f1a_solid'),
Entry('engstfeld_2018_polycrystalline_17743_f4b_1'),
Entry('no_bibliography')]

filter(predicate)

Return the subset of the collection that satisfies predicate.

EXAMPLES:

>>> collection = Collection.create_example()
>>> collection.filter(lambda entry: entry.source.url == 'https://doi.org/10.1039/C0CP01001D')
[Entry('alves_2011_electrochemistry_6010_f1a_solid')]

The filter predicate can use properties that are not present on all entries in the collection. If a property is missing the element is removed from the collection:

>>> collection.filter(lambda entry: entry.non.existing.property)
[]
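
The tolerant behavior described above can be sketched with stand-in objects. This is not the library's code: `filter_entries` is a hypothetical helper, the entries are SimpleNamespace objects instead of Entry instances, and catching only AttributeError is an assumption about which errors count as a missing property.

```python
from types import SimpleNamespace


def filter_entries(entries, predicate):
    """Keep entries for which predicate is truthy; skip entries lacking a property."""
    kept = []
    for entry in entries:
        try:
            if predicate(entry):
                kept.append(entry)
        except AttributeError:
            # A missing property excludes the entry instead of raising.
            continue
    return kept


# Stand-in objects with made-up identifiers and URLs.
entries = [
    SimpleNamespace(identifier="with_source", source=SimpleNamespace(url="https://example.org/a")),
    SimpleNamespace(identifier="without_source"),
]

selected = filter_entries(entries, lambda entry: entry.source.url == "https://example.org/a")
print([entry.identifier for entry in selected])
```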

classmethod from_local(datadir)

Create a collection from local Data Packages.

EXAMPLES:

>>> from unitpackage.collection import Collection
>>> collection = Collection.from_local('./examples/local/')
>>> collection
[Entry('alves_2011_electrochemistry_6010_f1a_solid'),
Entry('engstfeld_2018_polycrystalline_17743_f4b_1'),
Entry('no_bibliography')]

classmethod from_local_file(filename)

Create a collection from a local Data Package.

EXAMPLES:

>>> from unitpackage.collection import Collection
>>> collection = Collection.from_local_file('./examples/local/engstfeld_2018_polycrystalline_17743/engstfeld_2018_polycrystalline_17743_f4b_1.json')
>>> collection
[Entry('engstfeld_2018_polycrystalline_17743_f4b_1')]

classmethod from_remote(url=None, data=None, outdir=None)

Create a collection from a URL pointing to a ZIP file.

When no URL is provided, a collection is created from the Data Packages published in the echemdb data repository and displayed on the echemdb website.

EXAMPLES:

>>> from unitpackage.collection import Collection
>>> collection = Collection.from_remote()
>>> collection.filter(lambda entry: entry.source.url == 'https://doi.org/10.1039/C0CP01001D')
[Entry('alves_2011_electrochemistry_6010_f1a_solid'), Entry('alves_2011_electrochemistry_6010_f2_red')]

The folder containing the data in the ZIP file can be specified with the data parameter. An output directory for the extracted data can be specified with the outdir parameter.

property identifiers

Return a list of identifiers of the collection, i.e., the names of the resources in the datapackage.

This property is essentially equivalent to package.resource_names.

EXAMPLES:

>>> collection = Collection.create_example()
>>> len(collection.identifiers)
3

save_entries(outdir=None)

Save the entries of this collection as Data Packages (CSV and JSON) to the output directory outdir.

EXAMPLES:

>>> db = Collection.create_example()
>>> db.save_entries(outdir='./test/generated/saved_collection')
>>> import glob
>>> glob.glob('test/generated/saved_collection/**.json')
['test...
