`unitpackage.entry`

A frictionless tabular Resource describing tabulated data. The units of the column names (pandas) or fields (frictionless) are known, and the resource carries additional metadata describing the underlying data.

A description of such resources can be found in the Unitpackage Structure section of the documentation.

Resources are the individual elements of a Collection and are referred to as entries.

EXAMPLES:

Metadata included in an entry is accessible as an attribute:

>>> entry = Entry.create_examples()[0]
>>> entry.source
{'citationKey': 'alves_2011_electrochemistry_6010',
'url': 'https://doi.org/10.1039/C0CP01001D',
'figure': '1a',
'curve': 'solid',
'bibdata': '@article{alves_2011_electrochemistry_6010,...}
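
The attribute-style access to metadata shown above can be pictured with a small sketch. This is only an illustration under stated assumptions, not unitpackage's actual implementation: the class `MetadataView` and the metadata values are hypothetical.

```python
class MetadataView:
    """Hypothetical stand-in that exposes dictionary keys as attributes."""

    def __init__(self, metadata):
        self._metadata = dict(metadata)

    def __getattr__(self, name):
        # __getattr__ is only called when normal attribute lookup fails,
        # so regular attributes such as _metadata are not affected.
        if name.startswith("_"):
            # Avoid infinite recursion if _metadata is not set yet.
            raise AttributeError(name)
        try:
            return self._metadata[name]
        except KeyError:
            raise AttributeError(name) from None


# Illustrative metadata only; real entries carry more keys.
view = MetadataView({"source": {"figure": "1a", "curve": "solid"}, "user": "Max Doe"})
print(view.source["figure"])  # 1a
print(view.user)              # Max Doe
```

Keys that are missing from the dictionary raise an `AttributeError` in this sketch, which keeps `hasattr` and `getattr` with a default working as usual.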

The data of the entry can be accessed as a pandas dataframe:

>>> entry = Entry.create_examples()[0]
>>> entry.df
              t         E         j
0      0.000000 -0.103158 -0.998277
1      0.020000 -0.102158 -0.981762
...

Entries containing published data also contain information on the source of the data:

>>> from unitpackage.collection import Collection
>>> db = Collection.create_example()
>>> entry = db['alves_2011_electrochemistry_6010_f1a_solid']
>>> entry.bibliography
Entry('article',
  fields=[
    ('title', 'Electrochemistry at Ru(0001) in a flowing CO-saturated electrolyte—reactive and inert adlayer phases'),
    ('journal', 'Physical Chemistry Chemical Physics'),
    ('volume', '13'),
    ('number', '13'),
    ('pages', '6010--6021'),
    ('year', '2011'),
    ('publisher', 'Royal Society of Chemistry'),
    ('abstract', 'We investigated ...')],
  persons={'author': [Person('Alves, Otavio B'), Person('Hoster, Harry E'), Person('Behm, Rolf J{\\"u}rgen')]})

class unitpackage.entry.Entry(resource)

A frictionless Resource describing tabulated data.

EXAMPLES:

Entries can be directly created from a frictionless Data Package containing a single frictionless Resource:

>>> from unitpackage.entry import Entry
>>> entry = Entry.from_local('./examples/local/no_bibliography/no_bibliography.json')
>>> entry
Entry('no_bibliography')

or directly from a frictionless Resource:

>>> from unitpackage.entry import Entry
>>> from frictionless import Resource
>>> entry = Entry(Resource('./examples/local/no_bibliography/no_bibliography.json'))
>>> entry
Entry('no_bibliography')

Entries can also be created by other means, such as from a CSV with Entry.from_csv or from a pandas dataframe with Entry.from_df.

Normally, entries are obtained by opening a Collection of entries:

>>> from unitpackage.collection import Collection
>>> collection = Collection.create_example()
>>> entry = next(iter(collection))

add_columns(df, new_fields)

Adds a column to the dataframe with specified field properties and returns an updated entry.

EXAMPLES:

>>> entry = Entry.create_examples()[0]
>>> entry.df
              t         E         j
0      0.000000 -0.103158 -0.998277
1      0.020000 -0.102158 -0.981762
...

A new column can be computed from the existing columns and added to the entry together with its unit:

>>> import pandas as pd
>>> import astropy.units as u
>>> df = pd.DataFrame()
>>> df['P/A'] = entry.df['E'] * entry.df['j']
>>> new_field_unit = u.Unit(entry.field_unit('E')) * u.Unit(entry.field_unit('j'))
>>> new_entry = entry.add_columns(df['P/A'], new_fields=[{'name':'P/A', 'unit': new_field_unit}])
>>> new_entry.df
              t         E         j       P/A
0      0.000000 -0.103158 -0.998277  0.102981
1      0.020000 -0.102158 -0.981762  0.100295
...

>>> new_entry.field_unit('P/A')
Unit("A V / m2")

property bibliography

Return a pybtex bibliography object associated with this entry.

EXAMPLES:

>>> entry = Entry.create_examples()[0]
>>> entry.bibliography
Entry('article',
fields=[
    ('title', ...
    ...

>>> entry_no_bib = Entry.create_examples(name="no_bibliography")[0]
>>> entry_no_bib.bibliography
''

citation(backend='text')

Return a formatted reference for the entry’s bibliography such as:

Doe, et al., Journal Name, volume (YEAR) page, “Title”

The default rendering is plain text (‘text’), but it can be changed to any format supported by pybtex, such as Markdown (‘md’), ‘latex’ or ‘html’.

EXAMPLES:

>>> entry = Entry.create_examples()[0]
>>> entry.citation(backend='text')
'O. B. Alves et al. Electrochemistry at Ru(0001) in a flowing CO-saturated electrolyte—reactive and inert adlayer phases. Physical Chemistry Chemical Physics, 13(13):6010–6021, 2011.'
>>> print(entry.citation(backend='md'))
O\. B\. Alves *et al\.*
*Electrochemistry at Ru\(0001\) in a flowing CO\-saturated electrolyte—reactive and inert adlayer phases*\.
*Physical Chemistry Chemical Physics*, 13\(13\):6010–6021, 2011\.

classmethod create_examples(name='')

Return some example entries for use in automated tests.

The examples are created from Data Packages in unitpackage’s examples directory. These are only available from the development environment.

EXAMPLES:

>>> Entry.create_examples()
[Entry('alves_2011_electrochemistry_6010_f1a_solid'), Entry('engstfeld_2018_polycrystalline_17743_f4b_1'), Entry('no_bibliography')]

An entry without an associated BIB file:

>>> Entry.create_examples(name="no_bibliography")
[Entry('no_bibliography')]

property df

Return the data of this entry’s “MutableResource” as a data frame.

EXAMPLES:

>>> entry = Entry.create_examples()[0]
>>> entry.df
              t         E         j
0      0.000000 -0.103158 -0.998277
1      0.020000 -0.102158 -0.981762
...

The units and descriptions of the axes in the data frame can be recovered:

>>> entry.mutable_resource.schema.fields
[{'name': 't', 'type': 'number', 'unit': 's'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]

field_unit(field_name)

Return the unit of the field named field_name in the MutableResource.

EXAMPLES:

>>> entry = Entry.create_examples()[0]
>>> entry.field_unit('E')
'V'
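
Conceptually, the lookup amounts to finding the field descriptor with a matching name and reading its unit. The following is a minimal sketch on a plain list of dictionaries; the helper `lookup_unit` is hypothetical, and how the real method treats unknown fields or fields without a unit is not covered here.

```python
def lookup_unit(fields, field_name):
    """Return the 'unit' of the descriptor whose 'name' equals field_name."""
    for field in fields:
        if field["name"] == field_name:
            return field["unit"]
    # Assumption of this sketch: unknown field names raise a KeyError.
    raise KeyError(field_name)


# Field descriptors as shown for the example entry in this documentation.
fields = [
    {"name": "t", "type": "number", "unit": "s"},
    {"name": "E", "type": "number", "unit": "V", "reference": "RHE"},
    {"name": "j", "type": "number", "unit": "A / m2"},
]
print(lookup_unit(fields, "E"))  # V
print(lookup_unit(fields, "j"))  # A / m2
```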

classmethod from_csv(csvname, metadata=None, fields=None)

Returns an entry constructed from a CSV with a single header line.

EXAMPLES:

Units describing the fields can be provided:

>>> import os
>>> fields = [{'name':'E', 'unit': 'mV'}, {'name':'I', 'unit': 'A'}]
>>> entry = Entry.from_csv(csvname='examples/from_csv/from_csv.csv', fields=fields)
>>> entry
Entry('from_csv')

>>> entry.resource
{'name': 'from_csv',
...

Metadata can be appended:

>>> import os
>>> fields = [{'name':'E', 'unit': 'mV'}, {'name':'I', 'unit': 'A'}]
>>> metadata = {'user':'Max Doe'}
>>> entry = Entry.from_csv(csvname='examples/from_csv/from_csv.csv', metadata=metadata, fields=fields)
>>> entry.user
'Max Doe'

Important

Upper case filenames are converted to lower case entry identifiers!

A filename containing upper case characters:

>>> import os
>>> fields = [{'name':'E', 'unit': 'mV'}, {'name':'I', 'unit': 'A'}]
>>> entry = Entry.from_csv(csvname='examples/from_csv/UpperCase.csv', fields=fields)
>>> entry
Entry('uppercase')

Casing in the filename is preserved in the metadata:

>>> entry.resource
{'name': 'uppercase',
'type': 'table',
'path': 'UpperCase.csv',
...
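
The relation between filename and identifier shown above can be reproduced with the standard library. This sketch only mirrors the observed behavior; the helper `identifier_from_filename` is hypothetical and not part of unitpackage.

```python
from pathlib import PurePosixPath


def identifier_from_filename(csvname):
    """Strip directory and extension, then convert to lower case."""
    return PurePosixPath(csvname).stem.lower()


csvname = "examples/from_csv/UpperCase.csv"
print(identifier_from_filename(csvname))  # uppercase
# The file name itself keeps its original casing.
print(PurePosixPath(csvname).name)        # UpperCase.csv
```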

classmethod from_df(df, metadata=None, fields=None, outdir=None, *, basename)

Returns an entry constructed from a pandas dataframe.

EXAMPLES:

>>> import pandas as pd
>>> from unitpackage.entry import Entry
>>> df = pd.DataFrame({'x':[1,2,3], 'y':[2,3,4]})
>>> entry = Entry.from_df(df=df, basename='test_df')
>>> entry
Entry('test_df')

Metadata and field descriptions can be added:

>>> import os
>>> fields = [{'name':'x', 'unit': 'm'}, {'name':'P', 'unit': 'um'}, {'name':'E', 'unit': 'V'}]
>>> metadata = {'user':'Max Doe'}
>>> entry = Entry.from_df(df=df, basename='test_df', metadata=metadata, fields=fields)
>>> entry.user
'Max Doe'

Save the entry:

>>> entry.save(outdir='./test/generated/from_df')

Important

Basenames with upper case characters are stored with lower case characters! To separate words use underscores.

The basename is always converted to a lower case entry identifier:

>>> import pandas as pd
>>> from unitpackage.entry import Entry
>>> df = pd.DataFrame({'x':[1,2,3], 'y':[2,3,4]})
>>> entry = Entry.from_df(df=df, basename='TEST_DF')
>>> entry
Entry('test_df')

TESTS:

Verify that all fields are properly created even when they are not specified as fields:

>>> fields = [{'name':'x', 'unit': 'm'}, {'name':'P', 'unit': 'um'}, {'name':'E', 'unit': 'V'}]
>>> metadata = {'user':'Max Doe'}
>>> entry = Entry.from_df(df=df, basename='test_df', metadata=metadata, fields=fields)
>>> entry.resource.schema.fields
[{'name': 'x', 'type': 'integer', 'unit': 'm'}, {'name': 'y', 'type': 'integer'}]
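
The result above suggests that every dataframe column receives a field, that descriptions for matching names are merged in, and that descriptions for names without a column (here P and E) are dropped. A minimal sketch of such a merge follows; the helper `merge_fields` and the `columns` mapping of column names to inferred types are hypothetical and only reproduce the output shown.

```python
def merge_fields(columns, fields):
    """Build one descriptor per column, enriched with user-provided keys."""
    described = {field["name"]: field for field in fields}
    merged = []
    for name, inferred_type in columns.items():
        descriptor = {"name": name, "type": inferred_type}
        # Copy extra keys such as 'unit' from the user description, if any.
        for key, value in described.get(name, {}).items():
            if key not in descriptor:
                descriptor[key] = value
        merged.append(descriptor)
    return merged


columns = {"x": "integer", "y": "integer"}
fields = [{"name": "x", "unit": "m"}, {"name": "P", "unit": "um"}, {"name": "E", "unit": "V"}]
print(merge_fields(columns, fields))
# [{'name': 'x', 'type': 'integer', 'unit': 'm'}, {'name': 'y', 'type': 'integer'}]
```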

classmethod from_local(filename)

Return an entry from the file filename containing a frictionless Data Package. The Data Package must contain a single resource.

Otherwise, use collection.from_local_file to create a collection from all resources within.

EXAMPLES:

>>> from unitpackage.entry import Entry
>>> entry = Entry.from_local('./examples/local/no_bibliography/no_bibliography.json')
>>> entry
Entry('no_bibliography')

property identifier

Return a unique identifier for this entry, i.e., its basename.

EXAMPLES:

>>> entry = Entry.create_examples()[0]
>>> entry.identifier
'alves_2011_electrochemistry_6010_f1a_solid'

property mutable_resource

Return this entry’s “MutableResource”, i.e., the resource whose data is held as a pandas data frame.

EXAMPLES:

>>> entry = Entry.create_examples()[0]
>>> entry.mutable_resource
{'name': 'memory',
'type': 'table',
'data': [],
'format': 'pandas',
'mediatype': 'application/pandas',
'schema': {'fields': [{'name': 't', 'type': 'number', 'unit': 's'},
                    {'name': 'E',
                        'type': 'number',
                        'unit': 'V',
                        'reference': 'RHE'},
                    {'name': 'j', 'type': 'number', 'unit': 'A / m2'}]}}

plot(x_label=None, y_label=None, name=None)

Return a 2D plot of this entry.

The default plot is constructed from the first two columns of the dataframe.

EXAMPLES:

>>> entry = Entry.create_examples()[0]
>>> entry.plot()
Figure(...)

The 2D plot can also be created with custom axes, chosen from the fields available in the resource:

>>> entry.plot(x_label='j', y_label='E')
Figure(...)

rename_fields(field_names, keep_original_name_as=None)

Returns an Entry with updated field names and dataframe column names. Provide a dict, where the key is the previous field name and the value the new name, such as {'t':'t_rel', 'E':'E_we'}. The original field names can be kept in a new key.

EXAMPLES:

The original dataframe:

>>> from unitpackage.entry import Entry
>>> entry = Entry.create_examples()[0]
>>> entry.df
              t         E         j
0      0.000000 -0.103158 -0.998277
1      0.020000 -0.102158 -0.981762
...

Dataframe with modified column names:

>>> renamed_entry = entry.rename_fields({'t': 't_rel', 'E': 'E_we'}, keep_original_name_as='originalName')
>>> renamed_entry.df
          t_rel      E_we         j
0      0.000000 -0.103158 -0.998277
1      0.020000 -0.102158 -0.981762
...

Updated fields of the “MutableResource”:

>>> renamed_entry.mutable_resource.schema.fields
[{'name': 't_rel', 'type': 'number', 'unit': 's', 'originalName': 't'},
{'name': 'E_we', 'type': 'number', 'unit': 'V', 'reference': 'RHE', 'originalName': 'E'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]

TESTS:

New names provided for non-existing fields are ignored:

>>> renamed_entry = entry.rename_fields({'t': 't_rel', 'x':'y'}, keep_original_name_as='originalName')
>>> renamed_entry.mutable_resource.schema.fields
[{'name': 't_rel', 'type': 'number', 'unit': 's', 'originalName': 't'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]
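
The renaming behavior shown in the examples, including the handling of names that do not exist, can be sketched on a plain list of field descriptors. The function `rename_field_descriptors` is hypothetical; it reproduces the output above but is not unitpackage's implementation and does not touch a dataframe.

```python
def rename_field_descriptors(fields, field_names, keep_original_name_as=None):
    """Return copies of the descriptors with names replaced per field_names."""
    renamed = []
    for field in fields:
        new_field = dict(field)  # copy, so the input is left unchanged
        old_name = field["name"]
        if old_name in field_names:
            new_field["name"] = field_names[old_name]
            if keep_original_name_as is not None:
                new_field[keep_original_name_as] = old_name
        renamed.append(new_field)
    return renamed


fields = [
    {"name": "t", "type": "number", "unit": "s"},
    {"name": "E", "type": "number", "unit": "V", "reference": "RHE"},
    {"name": "j", "type": "number", "unit": "A / m2"},
]
# 'x' does not exist among the fields, so that mapping has no effect.
result = rename_field_descriptors(fields, {"t": "t_rel", "x": "y"}, keep_original_name_as="originalName")
print(result[0])
# {'name': 't_rel', 'type': 'number', 'unit': 's', 'originalName': 't'}
```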

rescale(units)

Returns a rescaled Entry with axes in the specified units. Provide a dict, where the key is the axis name and the value the new unit, such as {'j': 'uA / cm2', 't': 'h'}.

EXAMPLES:

The units without any rescaling:

>>> entry = Entry.create_examples()[0]
>>> entry.resource.schema.fields
[{'name': 't', 'type': 'number', 'unit': 's'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]

A rescaled entry using different units:

>>> rescaled_entry = entry.rescale({'j':'uA / cm2', 't':'h'})
>>> rescaled_entry.mutable_resource.schema.fields
[{'name': 't', 'type': 'number', 'unit': 'h'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'uA / cm2'}]

The values in the data frame are scaled to match the new units:

>>> rescaled_entry.df
             t         E          j
0     0.000000 -0.103158 -99.827664
1     0.000006 -0.102158 -98.176205
...
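
The scaled values follow from plain unit arithmetic: 1 A equals 1e6 uA and 1 m2 equals 1e4 cm2, so 1 A / m2 corresponds to 100 uA / cm2, and 1 s corresponds to 1/3600 h. The sketch below works this out by hand for the first two rows. The constant names are hypothetical, and the hand-written factors are an assumption of this sketch rather than the way unitpackage performs the conversion.

```python
# Conversion factors worked out by hand.
A_PER_M2_TO_UA_PER_CM2 = 1e6 / 1e4  # 1 A = 1e6 uA, 1 m2 = 1e4 cm2
S_TO_H = 1 / 3600                   # 1 h = 3600 s

# First two rows of the example entry as printed in this documentation.
j = [-0.998277, -0.981762]  # in A / m2
t = [0.0, 0.02]             # in s

j_rescaled = [value * A_PER_M2_TO_UA_PER_CM2 for value in j]
t_rescaled = [value * S_TO_H for value in t]

print(j_rescaled)  # approximately -99.8277 and -98.1762 (uA / cm2)
print(t_rescaled)  # 0.0 and approximately 5.6e-06 (h)
```

The small differences from the data frame above in the last digits arise because the printed input values are already rounded.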

save(*, outdir, basename=None)

Create a Data Package, i.e., a CSV file and a JSON file, in the directory outdir.

EXAMPLES:

The output files are named identifier.csv and identifier.json using the identifier of the original resource:

>>> import os
>>> entry = Entry.create_examples()[0]
>>> entry.save(outdir='./test/generated')
>>> basename = entry.identifier
>>> os.path.exists(f'test/generated/{basename}.json') and os.path.exists(f'test/generated/{basename}.csv')
True

When a basename is set, the files are named basename.csv and basename.json.

Note

For a valid frictionless Data Package the basename MUST be lower-case and contain only alphanumeric characters along with ., _ or - characters.

A valid basename:

>>> import os
>>> entry = Entry.create_examples()[0]
>>> basename = 'save_basename'
>>> entry.save(basename=basename, outdir='./test/generated')
>>> os.path.exists(f'test/generated/{basename}.json') and os.path.exists(f'test/generated/{basename}.csv')
True
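
The naming rule from the note above can be checked before saving. The pattern below is derived from the wording of the note only; the exact rule enforced by frictionless may differ, and the helper `is_valid_basename` is hypothetical.

```python
import re

# Lower case letters, digits, '.', '_' and '-' only (per the note above).
VALID_BASENAME = re.compile(r"[a-z0-9._-]+")


def is_valid_basename(basename):
    """Return True if the whole basename matches the allowed characters."""
    return VALID_BASENAME.fullmatch(basename) is not None


print(is_valid_basename("save_basename"))    # True
print(is_valid_basename("Upper_Case_Save"))  # False
print(is_valid_basename("has space"))        # False
```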

Upper case characters are saved as lower case:

>>> import os
>>> import pandas as pd
>>> from unitpackage.entry import Entry
>>> df = pd.DataFrame({'x':[1,2,3], 'y':[2,3,4]})
>>> basename = 'Upper_Case_Save'
>>> entry = Entry.from_df(df=df, basename=basename)
>>> entry.save(outdir='./test/generated')
>>> os.path.exists(f'test/generated/{basename.lower()}.json') and os.path.exists(f'test/generated/{basename.lower()}.csv')
True

>>> new_entry = Entry.from_local(f'test/generated/{basename.lower()}.json')
>>> new_entry.resource
{'name': 'upper_case_save',
'type': 'table',
'path': 'upper_case_save.csv',
...

TESTS:

Save the entry as a Data Package with metadata containing a datetime object, which is not natively supported by JSON:

>>> import os
>>> from datetime import datetime
>>> import pandas as pd
>>> from unitpackage.entry import Entry
>>> df = pd.DataFrame({'x':[1,2,3], 'y':[2,3,4]})
>>> basename = 'save_datetime'
>>> entry = Entry.from_df(df=df, basename=basename, metadata={'currentTime':datetime.now()})
>>> entry.save(outdir='./test/generated')
>>> os.path.exists(f'test/generated/{basename}.json') and os.path.exists(f'test/generated/{basename}.csv')
True
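
Why datetime objects need special treatment can be seen with the standard library alone. The sketch below shows the failure and one common workaround, passing `default=str` to `json.dumps`. This is an assumption for illustration; the conversion unitpackage applies when saving may differ.

```python
import json
from datetime import datetime

# A fixed timestamp keeps the example reproducible.
metadata = {"currentTime": datetime(2024, 1, 2, 3, 4, 5)}

try:
    json.dumps(metadata)
except TypeError as error:
    # The json module has no built-in encoding for datetime objects.
    print(f"not serializable: {error}")

# Fall back to str() for any object the encoder cannot handle.
serialized = json.dumps(metadata, default=str)
print(serialized)  # {"currentTime": "2024-01-02 03:04:05"}
```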
