unitpackage.collection
A collection of datapackages with units.
EXAMPLES:
Create a collection from local frictionless data packages in the data/ directory:
>>> collection = Collection.from_local('data/')
Create a collection from the data packages published on echemdb:
>>> collection = Collection.from_remote()
Search the collection for entries from a single publication:
>>> collection.filter(lambda entry: entry.source.url == 'https://doi.org/10.1039/C0CP01001D')
[Entry('alves_2011_electrochemistry_6010_f1a_solid'), ...
- class unitpackage.collection.Collection(data_packages=None)
A collection of [frictionless data packages](https://github.com/frictionlessdata/datapackage-py).
EXAMPLES:
An empty collection:
>>> collection = Collection([])
>>> len(collection)
0
- class Entry(package)
A frictionless data package describing tabulated data.
EXAMPLES:
Entries can be directly created:
>>> from unitpackage.local import collect_datapackage
>>> from unitpackage.entry import Entry
>>> entry = Entry(collect_datapackage('./examples/no_bibliography/no_bibliography.json'))
>>> entry
Entry('no_bibliography')
or more simply:
>>> from unitpackage.entry import Entry
>>> entry = Entry.from_local('./examples/no_bibliography/no_bibliography.json')
>>> entry
Entry('no_bibliography')
Entries can also be created by other means, such as from a CSV file with Entry.from_csv or from a pandas dataframe with Entry.from_df.
Normally, entries are obtained by opening a Collection of entries:
>>> from unitpackage.collection import Collection
>>> collection = Collection.create_example()
>>> entry = next(iter(collection))
- property bibliography
Return a pybtex bibliography object.
EXAMPLES:
>>> entry = Entry.create_examples()[0]
>>> entry.bibliography
Entry('article',
  fields=[
    ('title', ...
...
>>> entry_no_bib = Entry.create_examples(name="no_bibliography")[0]
>>> entry_no_bib.bibliography
''
- citation(backend='text')
Return a formatted reference for the entry’s bibliography such as:
Doe, et al., Journal Name, volume (YEAR) page, “Title”
The default rendering backend is plain text ('text'). It can be changed to any format supported by pybtex, such as Markdown ('md'), 'latex' or 'html'.
EXAMPLES:
>>> entry = Entry.create_examples()[0]
>>> entry.citation(backend='text')
'O. B. Alves et al. Electrochemistry at Ru(0001) in a flowing CO-saturated electrolyte—reactive and inert adlayer phases. Physical Chemistry Chemical Physics, 13(13):6010–6021, 2011.'
>>> print(entry.citation(backend='md'))
O\. B\. Alves *et al\.* *Electrochemistry at Ru\(0001\) in a flowing CO\-saturated electrolyte—reactive and inert adlayer phases*\. *Physical Chemistry Chemical Physics*, 13\(13\):6010–6021, 2011\.
- classmethod create_examples(name='')
Return some example entries for use in automated tests.
The examples are created from the datapackages in unitpackage's examples directory. They are only available in the development environment.
EXAMPLES:
>>> Entry.create_examples()
[Entry('alves_2011_electrochemistry_6010_f1a_solid'), Entry('engstfeld_2018_polycrystalline_17743_f4b_1'), Entry('no_bibliography')]
An entry without an associated BIB file:
>>> Entry.create_examples(name="no_bibliography")
[Entry('no_bibliography')]
- property df
Return the data of this entry as a data frame.
EXAMPLES:
>>> entry = Entry.create_examples()[0]
>>> entry.df
          t         E         j
0  0.000000 -0.103158 -0.998277
1  0.020000 -0.102158 -0.981762
...
The units and descriptions of the axes in the data frame can be recovered:
>>> entry.package.get_resource('echemdb').schema.fields
[{'name': 't', 'type': 'number', 'unit': 's'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]
- field_unit(field_name)
Return the unit of the field_name of the echemdb resource.
EXAMPLES:
>>> entry = Entry.create_examples()[0]
>>> entry.field_unit('E')
'V'
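The lookup can be pictured on the plain field descriptors of a schema. The following is a minimal sketch, not unitpackage's implementation: the helper name lookup_unit is hypothetical, and the list simply mirrors the field descriptors printed in the entry.df example.

```python
# Field descriptors as printed by the schema in the entry.df example.
FIELDS = [
    {"name": "t", "type": "number", "unit": "s"},
    {"name": "E", "type": "number", "unit": "V", "reference": "RHE"},
    {"name": "j", "type": "number", "unit": "A / m2"},
]


def lookup_unit(fields, field_name):
    """Return the unit of the descriptor named field_name (hypothetical helper)."""
    for field in fields:
        if field["name"] == field_name:
            # Raises KeyError if the descriptor carries no unit.
            return field["unit"]
    raise KeyError(field_name)


print(lookup_unit(FIELDS, "E"))
```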
- classmethod from_csv(csvname, metadata=None, fields=None)
Returns an entry constructed from a CSV with a single header line.
EXAMPLES:
Units describing the fields can be provided:
>>> import os
>>> fields = [{'name':'E', 'unit': 'mV'}, {'name':'I', 'unit': 'A'}]
>>> entry = Entry.from_csv(csvname='examples/from_csv/from_csv.csv', fields=fields)
>>> entry
Entry('from_csv')
>>> entry.package
{'resources': [{'name': ...
Metadata can be appended:
>>> import os
>>> fields = [{'name':'E', 'unit': 'mV'}, {'name':'I', 'unit': 'A'}]
>>> metadata = {'user':'Max Doe'}
>>> entry = Entry.from_csv(csvname='examples/from_csv/from_csv.csv', metadata=metadata, fields=fields)
>>> entry.user
'Max Doe'
- classmethod from_df(df, metadata=None, fields=None, outdir=None, *, basename)
Returns an entry constructed from a pandas dataframe.
EXAMPLES:
>>> import pandas as pd
>>> from unitpackage.entry import Entry
>>> df = pd.DataFrame({'x':[1,2,3], 'y':[2,3,4]})
>>> entry = Entry.from_df(df=df, basename='test_df')
>>> entry
Entry('test_df')
Metadata and field descriptions can be added:
>>> import os
>>> fields = [{'name':'x', 'unit': 'm'}, {'name':'P', 'unit': 'um'}, {'name':'E', 'unit': 'V'}]
>>> metadata = {'user':'Max Doe'}
>>> entry = Entry.from_df(df=df, basename='test_df', metadata=metadata, fields=fields)
>>> entry.user
'Max Doe'
Save the entry:
>>> entry.save(outdir='./test/generated/from_df')
TESTS:
Verify that all fields are properly created even when they are not specified as fields:
>>> import os
>>> fields = [{'name':'x', 'unit': 'm'}, {'name':'P', 'unit': 'um'}, {'name':'E', 'unit': 'V'}]
>>> metadata = {'user':'Max Doe'}
>>> entry = Entry.from_df(df=df, basename='test_df', metadata=metadata, fields=fields)
>>> entry.package.get_resource('echemdb').schema.fields
[{'name': 'x', 'type': 'integer', 'unit': 'm'},
{'name': 'y', 'type': 'integer'}]
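The test output suggests that every dataframe column receives a field descriptor, while descriptions for names that are not columns (here P and E) are dropped. A minimal sketch of that matching on plain dicts follows. It is an illustration under that assumption, not unitpackage's code: merge_field_descriptions is a hypothetical name, and the type inference that yields 'integer' above is left out.

```python
def merge_field_descriptions(columns, fields):
    """Return one descriptor per column, using a provided description if any.

    Descriptions whose name matches no column are ignored (hypothetical helper).
    """
    described = {field["name"]: field for field in fields}
    return [dict(described.get(name, {"name": name})) for name in columns]


columns = ["x", "y"]
fields = [
    {"name": "x", "unit": "m"},
    {"name": "P", "unit": "um"},
    {"name": "E", "unit": "V"},
]
merged = merge_field_descriptions(columns, fields)
print(merged)
```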
- classmethod from_local(filename)
Return an entry created from the local data package file filename.
EXAMPLES:
>>> from unitpackage.entry import Entry
>>> entry = Entry.from_local('./examples/no_bibliography/no_bibliography.json')
>>> entry
Entry('no_bibliography')
- property identifier
Return a unique identifier for this entry, i.e., its basename.
EXAMPLES:
>>> entry = Entry.create_examples()[0]
>>> entry.identifier
'alves_2011_electrochemistry_6010_f1a_solid'
- plot(x_label=None, y_label=None, name=None)
Return a 2D plot of this entry.
The default plot is constructed from the first two columns of the dataframe.
EXAMPLES:
>>> entry = Entry.create_examples()[0]
>>> entry.plot()
Figure(...)
The 2D plot can also be returned with custom axes, selected from the fields available in the resource:
>>> entry.plot(x_label='j', y_label='E')
Figure(...)
- rename_fields(field_names, keep_original_name_as=None)
Return an Entry with updated field names and dataframe column names. Provide a dict, where the key is the previous field name and the value the new name, such as {'t':'t_rel', 'E':'E_we'}. The original field names can be kept in a new key.
EXAMPLES:
>>> from unitpackage.entry import Entry
>>> entry = Entry.create_examples()[0]
>>> renamed_entry = entry.rename_fields({'t': 't_rel'}, keep_original_name_as='originalName')
>>> renamed_entry.df
      t_rel         E         j
0  0.000000 -0.103158 -0.998277
1  0.020000 -0.102158 -0.981762
...
>>> renamed_entry.package.get_resource('echemdb').schema.fields
[{'name': 't_rel', 'type': 'number', 'unit': 's', 'originalName': 't'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]
TESTS:
Names for non-existing fields can be provided; they are ignored:
>>> renamed_entry = entry.rename_fields({'t': 't_rel', 'x':'y'}, keep_original_name_as='originalName')
>>> renamed_entry.package.get_resource('echemdb').schema.fields
[{'name': 't_rel', 'type': 'number', 'unit': 's', 'originalName': 't'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]
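The effect on the schema can be reproduced on plain field descriptors. The sketch below is a hedged illustration only: rename_schema_fields is a hypothetical helper, it does not touch a dataframe, and it assumes that the original descriptors are left unmodified and that mappings for unknown fields are skipped, as the test output indicates.

```python
import copy


def rename_schema_fields(fields, field_names, keep_original_name_as=None):
    """Return renamed copies of the field descriptors (hypothetical helper)."""
    renamed = copy.deepcopy(fields)  # leave the input descriptors untouched
    for field in renamed:
        old_name = field["name"]
        if old_name in field_names:
            field["name"] = field_names[old_name]
            if keep_original_name_as is not None:
                # Remember the previous name under the requested key.
                field[keep_original_name_as] = old_name
    return renamed


fields = [
    {"name": "t", "type": "number", "unit": "s"},
    {"name": "E", "type": "number", "unit": "V", "reference": "RHE"},
    {"name": "j", "type": "number", "unit": "A / m2"},
]
# The mapping for 'x' matches no field and therefore has no effect.
renamed = rename_schema_fields(
    fields, {"t": "t_rel", "x": "y"}, keep_original_name_as="originalName"
)
print(renamed[0])
```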
- rescale(units)
Return a rescaled Entry with axes in the specified units. Provide a dict, where the key is the axis name and the value the new unit, such as {'j': 'uA / cm2', 't': 'h'}.
EXAMPLES:
The units without any rescaling:
>>> entry = Entry.create_examples()[0]
>>> entry.package.get_resource('echemdb').schema.fields
[{'name': 't', 'type': 'number', 'unit': 's'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'A / m2'}]
A rescaled entry using different units:
>>> rescaled_entry = entry.rescale({'j':'uA / cm2', 't':'h'})
>>> rescaled_entry.package.get_resource('echemdb').schema.fields
[{'name': 't', 'type': 'number', 'unit': 'h'},
{'name': 'E', 'type': 'number', 'unit': 'V', 'reference': 'RHE'},
{'name': 'j', 'type': 'number', 'unit': 'uA / cm2'}]
The values in the data frame are scaled to match the new units:
>>> rescaled_entry.df
          t         E          j
0  0.000000 -0.103158 -99.827664
1  0.000006 -0.102158 -98.176205
...
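The scaled values can be checked by hand: 1 A / m2 equals 100 uA / cm2 (1e6 uA per A, 1e4 cm2 per m2), and 1 s equals 1/3600 h. The sketch below redoes this arithmetic with hard-coded conversion tables. The tables and the convert helper are assumptions made for this illustration and say nothing about how unitpackage converts units internally.

```python
# Factors that convert a value in the given unit to SI base units.
CURRENT_DENSITY_TO_SI = {
    "A / m2": 1.0,
    "uA / cm2": 1e-6 / 1e-4,  # 1 uA / cm2 = 0.01 A / m2
}
TIME_TO_SI = {"s": 1.0, "h": 3600.0}


def convert(value, old_unit, new_unit, table):
    """Convert value from old_unit to new_unit via the SI factors in table."""
    return value * table[old_unit] / table[new_unit]


# Second row (index 1) of the unscaled data frame: t = 0.02 s, j = -0.981762 A / m2.
j = convert(-0.981762, "A / m2", "uA / cm2", CURRENT_DENSITY_TO_SI)
t = convert(0.02, "s", "h", TIME_TO_SI)
print(j, t)
```

Up to the digits displayed, the results agree with the second row of the rescaled data frame.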
- save(*, outdir, basename=None)
Create a unitpackage, i.e., a CSV file and a JSON file, in the directory outdir.
EXAMPLES:
The output files are named identifier.csv and identifier.json using the identifier of the original resource:
>>> import os
>>> entry = Entry.create_examples()[0]
>>> entry.save(outdir='./test/generated')
>>> basename = entry.identifier
>>> os.path.exists(f'test/generated/{basename}.json') and os.path.exists(f'test/generated/{basename}.csv')
True
When a basename is set, the files are named basename.csv and basename.json. Note that for a valid frictionless package this base name MUST be lower-case and contain only alphanumeric characters along with “.”, “_” or “-” characters:
>>> import os
>>> entry = Entry.create_examples()[0]
>>> basename = 'save_basename'
>>> entry.save(basename=basename, outdir='./test/generated')
>>> os.path.exists(f'test/generated/{basename}.json') and os.path.exists(f'test/generated/{basename}.csv')
True
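A base name can be checked against this rule before saving. The following sketch encodes the rule exactly as worded here in a regular expression. Both the pattern and the helper is_valid_basename are assumptions for illustration; the frictionless specification remains the authoritative definition.

```python
import re

# Lower-case letters, digits, ".", "_" and "-", as stated in the rule above.
VALID_BASENAME = re.compile(r"[a-z0-9._-]+")


def is_valid_basename(basename):
    """Return True if basename only uses the permitted characters."""
    return VALID_BASENAME.fullmatch(basename) is not None


for candidate in ("save_basename", "Save Basename", "alves_2011-f1a.solid"):
    print(candidate, is_valid_basename(candidate))
```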
TESTS:
Save an entry whose metadata contains a datetime object, a type that JSON does not support natively:
>>> import os
>>> from datetime import datetime
>>> import pandas as pd
>>> from unitpackage.entry import Entry
>>> df = pd.DataFrame({'x':[1,2,3], 'y':[2,3,4]})
>>> basename = 'save_datetime'
>>> entry = Entry.from_df(df=df, basename=basename, metadata={'current time':datetime.now()})
>>> entry.save(outdir='./test/generated')
>>> os.path.exists(f'test/generated/{basename}.json') and os.path.exists(f'test/generated/{basename}.csv')
True
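For background, json.dumps raises a TypeError for datetime objects unless a fallback is supplied. One common approach, sketched below, passes a default function that renders datetimes as ISO 8601 strings. How unitpackage serializes such metadata is not shown in this section, so the helper to_json_compatible is an assumption and not a description of the library.

```python
import json
from datetime import datetime


def to_json_compatible(value):
    """Fallback for json.dumps: render datetime objects as ISO 8601 strings."""
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


# A fixed timestamp keeps the sketch reproducible.
metadata = {"current time": datetime(2024, 1, 2, 3, 4, 5)}
serialized = json.dumps(metadata, default=to_json_compatible)
print(serialized)
```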
- property bibliography
Return a pybtex database of all bibtex bibliography files.
EXAMPLES:
>>> collection = Collection.create_example()
>>> collection.bibliography
BibliographyData(
  entries=OrderedCaseInsensitiveDict([
    ('alves_2011_electrochemistry_6010', Entry('article',
    ...
    ('engstfeld_2018_polycrystalline_17743', Entry('article',
    ...
A collection with entries without bibliography:
>>> collection = Collection.create_example()["no_bibliography"]
>>> collection.bibliography
''
- classmethod create_example()
Return a sample collection for use in automated tests.
EXAMPLES:
>>> Collection.create_example()
[Entry('alves_2011_electrochemistry_6010_f1a_solid'), Entry('engstfeld_2018_polycrystalline_17743_f4b_1'), Entry('no_bibliography')]
- filter(predicate)
Return the subset of the collection that satisfies predicate.
EXAMPLES:
>>> collection = Collection.create_example()
>>> collection.filter(lambda entry: entry.source.url == 'https://doi.org/10.1039/C0CP01001D')
[Entry('alves_2011_electrochemistry_6010_f1a_solid')]
The filter predicate can use properties that are not present on all entries in the collection. If a property is missing, the entry is excluded from the result:
>>> collection.filter(lambda entry: entry.non.existing.property)
[]
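This tolerant behavior can be sketched with stand-in objects. The example assumes that a missing property surfaces as an AttributeError and is treated as a non-match. Both this assumption and the helper filter_entries are illustrative; they are not taken from unitpackage's source.

```python
from types import SimpleNamespace


def filter_entries(entries, predicate):
    """Keep the entries for which predicate is true (hypothetical helper)."""
    kept = []
    for entry in entries:
        try:
            if predicate(entry):
                kept.append(entry)
        except AttributeError:
            # A property missing on this entry counts as "does not match".
            continue
    return kept


# Stand-ins for entries: only the first one carries a source.
entries = [
    SimpleNamespace(
        identifier="with_source",
        source=SimpleNamespace(url="https://doi.org/10.1039/C0CP01001D"),
    ),
    SimpleNamespace(identifier="without_source"),
]
matches = filter_entries(
    entries, lambda entry: entry.source.url == "https://doi.org/10.1039/C0CP01001D"
)
print([entry.identifier for entry in matches])
```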
- classmethod from_local(datadir)
Create a collection from local datapackages.
EXAMPLES:
>>> from unitpackage.collection import Collection
>>> collection = Collection.from_local('./examples')
>>> collection
[Entry('alves_2011_electrochemistry_6010_f1a_solid'), Entry('engstfeld_2018_polycrystalline_17743_f4b_1'), Entry('no_bibliography')]
- classmethod from_remote(url=None, data=None, outdir=None)
Create a collection from a URL that points to a zip archive.
When no url is provided, a collection is created from the data packages published on echemdb.
EXAMPLES:
>>> from unitpackage.collection import Collection
>>> collection = Collection.from_remote()
The folder containing the data in the zip can be specified with the data parameter. An output directory for the extracted data can be specified with the outdir parameter.
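The roles of data and outdir can be illustrated with the standard zipfile module. The sketch below builds a small archive in memory instead of downloading one, then extracts only the members below a given folder into an output directory. The helper extract_folder and the archive layout are hypothetical; this is not how from_remote is necessarily implemented.

```python
import io
import os
import tempfile
import zipfile


def extract_folder(zip_bytes, data, outdir):
    """Extract only the members below the folder data into outdir."""
    prefix = data.rstrip("/") + "/"
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        members = [name for name in archive.namelist() if name.startswith(prefix)]
        archive.extractall(outdir, members=members)
    return members


# Build a small archive in memory so that the sketch runs without network access.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("repo/data/entry.json", "{}")
    archive.writestr("repo/data/entry.csv", "t,E\n0,1\n")
    archive.writestr("repo/README.md", "not a data package")

outdir = tempfile.mkdtemp()
extracted = extract_folder(buffer.getvalue(), data="repo/data", outdir=outdir)
print(sorted(extracted))
```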
- save_entries(outdir=None)
Save the entries of this collection as datapackages (CSV and JSON) to the output directory outdir.
EXAMPLES:
>>> db = Collection.create_example()
>>> db.save_entries(outdir='test/generated/saved_collection')
>>> import glob
>>> glob.glob('test/generated/saved_collection/**.json')
['test...