unitpackage.entry

A Data Package describing tabulated data for which the units of the columns (pandas) or fields (frictionless) are known, and whose resource carries additional metadata describing the underlying data.

A description of such data packages can be found in the Unitpackage Structure section of the documentation.

Data packages are the individual elements of a Collection and are referred to as entries.
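To picture that structure, the descriptor of an entry can be written down as a plain Python dict. This is a hedged sketch rather than the exact layout unitpackage writes: the resource name `echemdb` and the field entries mirror the examples on this page, while the `path` value and the placement of the custom metadata under a `metadata` key are assumptions for illustration.

```python
# Hypothetical descriptor of one entry, i.e., a frictionless Data Package.
# The "path" value and the "metadata" key are assumptions for illustration.
descriptor = {
    "resources": [
        {
            "name": "echemdb",
            "path": "alves_2011_electrochemistry_6010_f1a_solid.csv",
            "schema": {
                "fields": [
                    {"name": "t", "type": "number", "unit": "s"},
                    {"name": "E", "type": "number", "unit": "V", "reference": "RHE"},
                    {"name": "j", "type": "number", "unit": "A / m2"},
                ]
            },
            "metadata": {"source": {"figure": "1a", "curve": "solid"}},
        }
    ]
}

# The units of the columns are stored per field in the resource's schema.
fields = descriptor["resources"][0]["schema"]["fields"]
units = {field["name"]: field["unit"] for field in fields}
print(units)  # {'t': 's', 'E': 'V', 'j': 'A / m2'}
```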

EXAMPLES:

Metadata included in an entry's resource is accessible as an attribute:

>>> entry = Entry.create_examples()[0]
>>> entry.source 
{'citation key': 'alves_2011_electrochemistry_6010',
'url': 'https://doi.org/10.1039/C0CP01001D',
'figure': '1a',
'curve': 'solid',
'bibdata': '@article{alves_2011_electrochemistry_6010,...'}
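The attribute access shown above can be pictured with Python's `__getattr__` hook. The class below is a hypothetical stand-in, not unitpackage's implementation; it only assumes that top-level metadata keys become attributes, so nested values such as the figure of the source are then reached with ordinary dictionary indexing.

```python
class MetadataView:
    """Hypothetical stand-in that exposes top-level metadata keys as attributes."""

    def __init__(self, metadata):
        self._metadata = metadata

    def __getattr__(self, name):
        # Python calls __getattr__ only when normal lookup fails, so reading
        # self._metadata here finds the instance attribute without recursion.
        try:
            return self._metadata[name]
        except KeyError:
            raise AttributeError(name) from None


view = MetadataView({"user": "Max Doe", "source": {"figure": "1a", "curve": "solid"}})
print(view.user)              # Max Doe
print(view.source["figure"])  # 1a
```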

The data of the resource is available as a pandas data frame:

>>> entry = Entry.create_examples()[0]
>>> entry.df
              t         E         j
0      0.000000 -0.103158 -0.998277
1      0.020000 -0.102158 -0.981762
...

Data Packages containing published data also contain information on the source of the data:

>>> from unitpackage.collection import Collection
>>> db = Collection.create_example()
>>> entry = db['alves_2011_electrochemistry_6010_f1a_solid']
>>> entry.bibliography  
Entry('article',
  fields=[
    ('title', 'Electrochemistry at Ru(0001) in a flowing CO-saturated electrolyte—reactive and inert adlayer phases'),
    ('journal', 'Physical Chemistry Chemical Physics'),
    ('volume', '13'),
    ('number', '13'),
    ('pages', '6010--6021'),
    ('year', '2011'),
    ('publisher', 'Royal Society of Chemistry'),
    ('abstract', 'We investigated ...')],
  persons=OrderedCaseInsensitiveDict([('author', [Person('Alves, Otavio B'), Person('Hoster, Harry E'), Person('Behm, Rolf J{\\"u}rgen')])]))
class unitpackage.entry.Entry(package)

A frictionless data package describing tabulated data.

EXAMPLES:

Entries can be created directly:

>>> from unitpackage.local import collect_datapackage
>>> from unitpackage.entry import Entry
>>> entry = Entry(collect_datapackage('./examples/no_bibliography/no_bibliography.json'))
>>> entry
Entry('no_bibliography')

or more simply:

>>> from unitpackage.entry import Entry
>>> entry = Entry.from_local('./examples/no_bibliography/no_bibliography.json')
>>> entry
Entry('no_bibliography')

Entries can also be created by other means, such as from a CSV file with Entry.from_csv or from a pandas data frame with Entry.from_df.

Normally, entries are obtained by opening a Collection of entries:

>>> from unitpackage.collection import Collection
>>> collection = Collection.create_example()
>>> entry = next(iter(collection))
property bibliography

Return a pybtex bibliography object.

EXAMPLES:

>>> entry = Entry.create_examples()[0]
>>> entry.bibliography 
Entry('article',
fields=[
    ('title', ...
    ...

>>> entry_no_bib = Entry.create_examples(name="no_bibliography")[0]
>>> entry_no_bib.bibliography
''
citation(backend='text')

Return a formatted reference for the entry’s bibliography such as:

  1. Doe, et al., Journal Name, volume (YEAR) page, “Title”

The default rendering is plain text (‘text’), but it can be changed to any backend supported by pybtex, such as Markdown (‘md’), LaTeX (‘latex’) or HTML (‘html’).

EXAMPLES:

>>> entry = Entry.create_examples()[0]
>>> entry.citation(backend='text')
'O. B. Alves et al. Electrochemistry at Ru(0001) in a flowing CO-saturated electrolyte—reactive and inert adlayer phases. Physical Chemistry Chemical Physics, 13(13):6010–6021, 2011.'
>>> print(entry.citation(backend='md'))
O\. B\. Alves *et al\.*
*Electrochemistry at Ru\(0001\) in a flowing CO\-saturated electrolyte—reactive and inert adlayer phases*\.
*Physical Chemistry Chemical Physics*, 13\(13\):6010–6021, 2011\.
classmethod create_examples(name='')

Return some example entries for use in automated tests.

The examples are created from the datapackages in unitpackage’s examples directory. These are only available in the development environment.

EXAMPLES:

>>> Entry.create_examples()
[Entry('alves_2011_electrochemistry_6010_f1a_solid'), Entry('engstfeld_2018_polycrystalline_17743_f4b_1'), Entry('no_bibliography')]

An entry without an associated BIB file:

>>> Entry.create_examples(name="no_bibliography")
[Entry('no_bibliography')]
property df

Return the data of this entry as a data frame.

EXAMPLES:

>>> entry = Entry.create_examples()[0]
>>> entry.df
              t         E         j
0      0.000000 -0.103158 -0.998277
1      0.020000 -0.102158 -0.981762
...

The units and descriptions of the axes in the data frame can be recovered:

>>> entry.package.get_resource('echemdb').schema.fields 
[{'name': 't', 'type': 'number', 'unit': 's'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]
field_unit(field_name)

Return the unit of the field field_name in the echemdb resource.

EXAMPLES:

>>> entry = Entry.create_examples()[0]
>>> entry.field_unit('E')
'V'
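A minimal sketch of such a lookup over the schema's list of field dicts, using the fields from the examples on this page. The helper name `field_unit` and its list-based signature are hypothetical; the method above operates on the entry itself.

```python
def field_unit(fields, field_name):
    """Return the unit of field_name from a list of frictionless field dicts.

    Hypothetical helper: raises KeyError if the field is missing or has no unit.
    """
    for field in fields:
        if field["name"] == field_name:
            return field["unit"]
    raise KeyError(field_name)


fields = [
    {"name": "t", "type": "number", "unit": "s"},
    {"name": "E", "type": "number", "unit": "V", "reference": "RHE"},
    {"name": "j", "type": "number", "unit": "A / m2"},
]
print(field_unit(fields, "E"))  # V
```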
classmethod from_csv(csvname, metadata=None, fields=None)

Return an entry constructed from a CSV file with a single header line.

EXAMPLES:

Units describing the fields can be provided:

>>> import os
>>> fields = [{'name':'E', 'unit': 'mV'}, {'name':'I', 'unit': 'A'}]
>>> entry = Entry.from_csv(csvname='examples/from_csv/from_csv.csv', fields=fields)
>>> entry
Entry('from_csv')

>>> entry.package 
{'resources': [{'name':
...

Metadata can be appended:

>>> import os
>>> fields = [{'name':'E', 'unit': 'mV'}, {'name':'I', 'unit': 'A'}]
>>> metadata = {'user':'Max Doe'}
>>> entry = Entry.from_csv(csvname='examples/from_csv/from_csv.csv', metadata=metadata, fields=fields)
>>> entry.user
'Max Doe'
classmethod from_df(df, metadata=None, fields=None, outdir=None, *, basename)

Return an entry constructed from a pandas data frame.

EXAMPLES:

>>> import pandas as pd
>>> from unitpackage.entry import Entry
>>> df = pd.DataFrame({'x':[1,2,3], 'y':[2,3,4]})
>>> entry = Entry.from_df(df=df, basename='test_df')
>>> entry
Entry('test_df')

Metadata and field descriptions can be added:

>>> import os
>>> fields = [{'name':'x', 'unit': 'm'}, {'name':'P', 'unit': 'um'}, {'name':'E', 'unit': 'V'}]
>>> metadata = {'user':'Max Doe'}
>>> entry = Entry.from_df(df=df, basename='test_df', metadata=metadata, fields=fields)
>>> entry.user
'Max Doe'

Save the entry:

>>> entry.save(outdir='./test/generated/from_df')

TESTS:

Verify that every column of the data frame becomes a field even when it is not described in fields, and that descriptions of columns missing from the data frame are ignored:

>>> import os
>>> fields = [{'name':'x', 'unit': 'm'}, {'name':'P', 'unit': 'um'}, {'name':'E', 'unit': 'V'}]
>>> metadata = {'user':'Max Doe'}
>>> entry = Entry.from_df(df=df, basename='test_df', metadata=metadata, fields=fields)
>>> entry.package.get_resource('echemdb').schema.fields
[{'name': 'x', 'type': 'integer', 'unit': 'm'}, {'name': 'y', 'type': 'integer'}]
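The test output above suggests a simple merge rule: every data frame column becomes a field, matching descriptions contribute extra keys such as `unit`, and descriptions of columns that do not exist (here `P` and `E`) are dropped. A hypothetical sketch of that rule on plain lists, leaving out the type inference that produces `'integer'`:

```python
def merge_fields(columns, descriptions):
    """Hypothetical sketch: build one field per column from optional descriptions.

    Descriptions whose name matches no column are silently dropped.
    Type inference ("integer", "number", ...) is not part of this sketch.
    """
    by_name = {description["name"]: description for description in descriptions}
    merged = []
    for column in columns:
        field = {"name": column}
        # Copy extra keys such as "unit" from a matching description, if any.
        field.update(by_name.get(column, {}))
        merged.append(field)
    return merged


descriptions = [{"name": "x", "unit": "m"}, {"name": "P", "unit": "um"}, {"name": "E", "unit": "V"}]
print(merge_fields(["x", "y"], descriptions))
# [{'name': 'x', 'unit': 'm'}, {'name': 'y'}]
```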
classmethod from_local(filename)

Return an entry from the datapackage JSON file filename.

EXAMPLES:

>>> from unitpackage.entry import Entry
>>> entry = Entry.from_local('./examples/no_bibliography/no_bibliography.json')
>>> entry
Entry('no_bibliography')
property identifier

Return a unique identifier for this entry, i.e., its basename.

EXAMPLES:

>>> entry = Entry.create_examples()[0]
>>> entry.identifier
'alves_2011_electrochemistry_6010_f1a_solid'
plot(x_label=None, y_label=None, name=None)

Return a 2D plot of this entry.

The default plot is constructed from the first two columns of the data frame.

EXAMPLES:

>>> entry = Entry.create_examples()[0]
>>> entry.plot()
Figure(...)

The 2D plot can also be created from other fields available in the resource by passing their names as x_label and y_label:

>>> entry.plot(x_label='j', y_label='E')
Figure(...)
rename_fields(field_names, keep_original_name_as=None)

Return an Entry with updated field names and data frame column names. Provide a dict whose keys are the previous field names and whose values are the new names, such as {'t':'t_rel', 'E':'E_we'}. The original field names can be kept in each renamed field under the key given as keep_original_name_as.

EXAMPLES:

>>> from unitpackage.entry import Entry
>>> entry = Entry.create_examples()[0]
>>> renamed_entry = entry.rename_fields({'t': 't_rel'}, keep_original_name_as='originalName')
>>> renamed_entry.df
          t_rel         E         j
0      0.000000 -0.103158 -0.998277
1      0.020000 -0.102158 -0.981762
...

>>> renamed_entry.package.get_resource('echemdb').schema.fields
[{'name': 't_rel', 'type': 'number', 'unit': 's', 'originalName': 't'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]

TESTS:

Renaming instructions for fields that do not exist are ignored:

>>> renamed_entry = entry.rename_fields({'t': 't_rel', 'x':'y'}, keep_original_name_as='originalName')
>>> renamed_entry.package.get_resource('echemdb').schema.fields
[{'name': 't_rel', 'type': 'number', 'unit': 's', 'originalName': 't'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]
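The renaming behaviour on the schema can be sketched on plain field dicts. The function below is hypothetical and only reproduces the outputs shown above, including that keys without a matching field (such as `x`) are ignored; for the data itself, the same mapping could be passed to pandas `DataFrame.rename(columns=...)`, which also ignores unknown keys by default.

```python
def rename_fields(fields, field_names, keep_original_name_as=None):
    """Hypothetical sketch: return renamed copies of frictionless field dicts.

    Keys of field_names that match no field are silently ignored.
    """
    renamed = []
    for field in fields:
        new_field = dict(field)  # copy so the input list is left untouched
        old_name = new_field["name"]
        if old_name in field_names:
            new_field["name"] = field_names[old_name]
            if keep_original_name_as:
                # Record the previous name under the requested key.
                new_field[keep_original_name_as] = old_name
        renamed.append(new_field)
    return renamed


fields = [
    {"name": "t", "type": "number", "unit": "s"},
    {"name": "E", "type": "number", "unit": "V", "reference": "RHE"},
]
renamed_fields = rename_fields(fields, {"t": "t_rel", "x": "y"}, keep_original_name_as="originalName")
print(renamed_fields)
```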
rescale(units)

Return a rescaled Entry with axes in the specified units. Provide a dict whose keys are the axis names and whose values are the new units, such as {'j': 'uA / cm2', 't': 'h'}.

EXAMPLES:

The units without any rescaling:

>>> entry = Entry.create_examples()[0]
>>> entry.package.get_resource('echemdb').schema.fields
[{'name': 't', 'type': 'number', 'unit': 's'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]

A rescaled entry using different units:

>>> rescaled_entry = entry.rescale({'j':'uA / cm2', 't':'h'})
>>> rescaled_entry.package.get_resource('echemdb').schema.fields
[{'name': 't', 'type': 'number', 'unit': 'h'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'uA / cm2'}]

The values in the data frame are scaled to match the new units:

>>> rescaled_entry.df
             t         E          j
0     0.000000 -0.103158 -99.827664
1     0.000006 -0.102158 -98.176205
...
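The rescaled values follow from multiplying by a conversion factor, since 1 A / m2 = 100 uA / cm2 and 1 h = 3600 s. A minimal sketch with a hand-written factor table; the table and the `rescale` helper are assumptions for illustration and do not reproduce unitpackage's unit handling.

```python
# Hypothetical conversion factors into SI units, written out by hand.
TO_SI = {
    "s": 1.0,
    "h": 3600.0,
    "A / m2": 1.0,
    "uA / cm2": 1e-2,  # 1 uA / cm2 = 1e-6 A / 1e-4 m2 = 1e-2 A / m2
}


def rescale(values, old_unit, new_unit):
    """Convert values from old_unit to new_unit via the SI factors above."""
    factor = TO_SI[old_unit] / TO_SI[new_unit]
    return [value * factor for value in values]


print(rescale([-0.998277], "A / m2", "uA / cm2"))  # approximately [-99.8277]
print(rescale([0.02], "s", "h"))                   # approximately [5.56e-06]
```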
save(*, outdir, basename=None)

Create a unitpackage, i.e., a CSV file and a JSON file, in the directory outdir.

EXAMPLES:

The output files are named identifier.csv and identifier.json using the identifier of the original resource:

>>> import os
>>> entry = Entry.create_examples()[0]
>>> entry.save(outdir='./test/generated')
>>> basename = entry.identifier
>>> os.path.exists(f'test/generated/{basename}.json') and os.path.exists(f'test/generated/{basename}.csv')
True

When a basename is set, the files are named basename.csv and basename.json. Note that for a valid frictionless package this basename MUST be lower-case and contain only alphanumeric characters along with “.”, “_” or “-” characters:

>>> import os
>>> entry = Entry.create_examples()[0]
>>> basename = 'save_basename'
>>> entry.save(basename=basename, outdir='./test/generated')
>>> os.path.exists(f'test/generated/{basename}.json') and os.path.exists(f'test/generated/{basename}.csv')
True
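The naming rule stated above can be checked with a regular expression. This is a sketch of that rule only; the helper `is_valid_basename` is hypothetical, and the full frictionless name pattern may differ in detail.

```python
import re

# Lower-case alphanumerics plus ".", "_" and "-", per the rule stated above.
VALID_BASENAME = re.compile(r"[a-z0-9._-]+")


def is_valid_basename(basename):
    """Return True if the whole basename satisfies the naming rule."""
    return VALID_BASENAME.fullmatch(basename) is not None


print(is_valid_basename("save_basename"))  # True
print(is_valid_basename("Save Basename"))  # False
```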

TESTS:

Save an entry whose metadata contains a datetime object, which is not natively supported by JSON:

>>> import os
>>> from datetime import datetime
>>> import pandas as pd
>>> from unitpackage.entry import Entry
>>> df = pd.DataFrame({'x':[1,2,3], 'y':[2,3,4]})
>>> basename = 'save_datetime'
>>> entry = Entry.from_df(df=df, basename=basename, metadata={'current time':datetime.now()})
>>> entry.save(outdir='./test/generated')
>>> os.path.exists(f'test/generated/{basename}.json') and os.path.exists(f'test/generated/{basename}.csv')
True
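One common way to make such metadata serializable is to pass a `default` hook to `json.dumps` that writes datetime objects as ISO 8601 strings. Whether unitpackage uses this exact representation is an assumption; the sketch only shows the technique.

```python
import json
from datetime import datetime


def to_json(metadata):
    """Serialize metadata, writing datetime objects as ISO 8601 strings."""

    def default(obj):
        # json.dumps calls this hook only for objects it cannot serialize itself.
        if isinstance(obj, datetime):
            return obj.isoformat()
        raise TypeError(f"{type(obj).__name__} is not JSON serializable")

    return json.dumps(metadata, default=default)


print(to_json({"current time": datetime(2024, 1, 2, 3, 4, 5)}))
# {"current time": "2024-01-02T03:04:05"}
```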