unitpackage.local

Utilities for working with local frictionless Data Packages, such as collecting Data Packages and creating unitpackages.

unitpackage.local.collect_datapackages(data)

Return a list of data packages defined in the directory data and its subdirectories.

EXAMPLES:

>>> packages = collect_datapackages("./examples/local")
>>> packages[0]
{'resources': [{'name':
...
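
The recursive search described above can be pictured with the standard library alone. The following is a minimal sketch, not the implementation used by unitpackage: the helper name find_datapackage_descriptors is hypothetical, and it assumes that a Data Package is any JSON file whose top-level object has a "resources" key. It returns plain dicts rather than frictionless Package objects.

```python
import json
import tempfile
from pathlib import Path


def find_datapackage_descriptors(data):
    """Return parsed JSON descriptors found in ``data`` and its subdirectories.

    Hypothetical stand-in: a file counts as a Data Package here if it is a
    JSON object with a "resources" key.
    """
    descriptors = []
    for path in sorted(Path(data).rglob("*.json")):
        with open(path, encoding="utf-8") as stream:
            content = json.load(stream)
        if isinstance(content, dict) and "resources" in content:
            descriptors.append(content)
    return descriptors


# Build a small directory tree to search (all file names are made up).
with tempfile.TemporaryDirectory() as root:
    nested = Path(root) / "sub" / "deeper"
    nested.mkdir(parents=True)
    (Path(root) / "a.json").write_text(json.dumps({"resources": [{"name": "a"}]}))
    (nested / "b.json").write_text(json.dumps({"resources": [{"name": "b"}]}))
    (nested / "other.json").write_text(json.dumps({"unrelated": True}))

    found = find_datapackage_descriptors(root)
    names = [package["resources"][0]["name"] for package in found]
```

Files without a "resources" key, such as other.json in this sketch, are skipped, so only the two descriptors are collected.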
unitpackage.local.collect_resources(datapackages)

Return a list of resources from a list of Data Packages.

EXAMPLES:

>>> packages = collect_datapackages("./examples/local")
>>> resources = collect_resources(packages)
>>> [resource.name for resource in resources]
['alves_2011_electrochemistry_6010_f1a_solid',
'engstfeld_2018_polycrystalline_17743_f4b_1',
'no_bibliography']
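
Conceptually, collecting resources amounts to flattening the resources of every package into a single list. The sketch below illustrates this with plain dicts standing in for frictionless Data Packages; the helper flatten_resources and the resource names are hypothetical and not part of unitpackage.

```python
def flatten_resources(packages):
    """Return all resources of all packages as one flat list."""
    return [resource for package in packages for resource in package["resources"]]


# Plain dicts stand in for frictionless Data Packages in this sketch.
packages = [
    {"resources": [{"name": "first"}]},
    {"resources": [{"name": "second"}, {"name": "third"}]},
]
resource_names = [resource["name"] for resource in flatten_resources(packages)]
print(resource_names)
```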
unitpackage.local.create_df_resource_from_csv(csvname, encoding=None, header_lines=None, column_header_lines=None, decimal=None, delimiters=None)

Create a pandas dataframe resource from a CSV file.

EXAMPLES:

>>> from unitpackage.local import create_df_resource_from_csv
>>> filename = 'examples/from_csv/from_csv_multiple_headers.csv'
>>> resource = create_df_resource_from_csv(csvname=filename, column_header_lines=2)
>>> resource
{'name': 'memory',
'type': 'table',
'data': [],
'format': 'pandas',
'mediatype': 'application/pandas',
'schema': {'fields': [{'name': 'E / V', 'type': 'integer'},
                      {'name': 'j / A / cm2', 'type': 'integer'}]}}
unitpackage.local.create_df_resource_from_df(df)

Return a pandas dataframe resource for a pandas DataFrame.

EXAMPLES:

>>> data = {'x': [1, 2, 3], 'y': [4, 5, 6]}
>>> import pandas as pd
>>> df = pd.DataFrame(data)
>>> from unitpackage.local import create_df_resource_from_df
>>> resource = create_df_resource_from_df(df)
>>> resource
{'name': 'memory',
'type': 'table',
'data': [],
'format': 'pandas', ...

>>> resource.data
   x  y
0  1  4
1  2  5
2  3  6

>>> resource.format
'pandas'
unitpackage.local.create_df_resource_from_tabular_resource(resource)

Return a pandas dataframe resource for a frictionless Tabular Resource.

EXAMPLES:

>>> from frictionless import Package
>>> from unitpackage.local import create_df_resource_from_tabular_resource
>>> tabular_resource = Package("./examples/local/no_bibliography/no_bibliography.json").resources[0]
>>> df_resource = create_df_resource_from_tabular_resource(tabular_resource)
>>> df_resource
{'name': 'memory',
...
'format': 'pandas',
...

>>> df_resource.data
              t         E         j
...

TESTS:

>>> data = {'x': [1, 2, 3], 'y': [4, 5, 6]}
>>> import pandas as pd
>>> df = pd.DataFrame(data)
>>> from unitpackage.entry import Entry
>>> entry = Entry.from_df(df, basename='test_parent_directory')
>>> entry.save(outdir=".")
>>> entry_ = Entry.from_local('test_parent_directory.json')
>>> entry_.df
   x  y
0  1  4
1  2  5
2  3  6
unitpackage.local.create_tabular_resource_from_csv(csvname, encoding=None, header_lines=None, column_header_lines=None, decimal=None, delimiters=None)

Return a resource built from a provided CSV.

EXAMPLES:

For standard CSV files (a single header line followed by lines with data, using . as the decimal separator), a tabular data resource is created:

>>> filename = './examples/from_csv/from_csv.csv'
>>> resource = create_tabular_resource_from_csv(filename)
>>> resource
{'name': 'from_csv',
'type': 'table',
'path': 'from_csv.csv',
'scheme': 'file',
'format': 'csv',
'mediatype': 'text/csv', ...

For CSV files with a more complex structure (header lines, multiple column header lines, or other separators), a pandas dataframe resource is created instead:

>>> filename = 'examples/from_csv/from_csv_multiple_headers.csv'
>>> resource = create_tabular_resource_from_csv(csvname=filename, column_header_lines=2)
>>> resource
{'name': 'memory',
'type': 'table',
'data': [],
'format': 'pandas',
'mediatype': 'application/pandas',
'schema': {'fields': [{'name': 'E / V', 'type': 'integer'},
                      {'name': 'j / A / cm2', 'type': 'integer'}]}}
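
The field names in the output above ('E / V', 'j / A / cm2') suggest that multiple column header lines are combined per column. The following standard-library sketch illustrates that idea under stated assumptions: the helper read_csv_with_headers, the file content, and the " / " join are inferred from the displayed output and are not taken from the unitpackage source.

```python
import csv
import io


def read_csv_with_headers(text, header_lines=0, column_header_lines=1, delimiter=","):
    """Return (column names, data rows) for CSV text with leading header lines.

    The first ``header_lines`` lines are skipped. The next
    ``column_header_lines`` rows are joined per column with " / ".
    """
    lines = text.splitlines()[header_lines:]
    rows = list(csv.reader(io.StringIO("\n".join(lines)), delimiter=delimiter))
    header_rows = rows[:column_header_lines]
    names = [" / ".join(parts) for parts in zip(*header_rows)]
    return names, rows[column_header_lines:]


# Hypothetical file content: one comment line, then two column header lines.
text = "# some comment\nE,j\nV,A / cm2\n1,2\n3,4\n"
names, data = read_csv_with_headers(text, header_lines=1, column_header_lines=2)
```

In this sketch the values stay strings; type inference, as reflected in the 'integer' field types above, is left out.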
unitpackage.local.create_unitpackage(resource, metadata=None, fields=None)

Return a Data Package built from a metadata dict and the tabular data in resource, a frictionless.Resource.

The fields list must be structured like [{'name': 'E', 'unit': 'mV'}, {'name': 'T', 'unit': 'K'}].

EXAMPLES:

>>> from unitpackage.local import create_tabular_resource_from_csv, create_unitpackage
>>> resource = create_tabular_resource_from_csv("./examples/from_csv/from_csv.csv")
>>> new_fields = [{'name':'E', 'unit': 'mV'}, {'name':'I', 'unit': 'A'}]
>>> package = create_unitpackage(resource=resource, fields=new_fields)
>>> package
{'resources': [{'name':
...
unitpackage.local.update_fields(original_fields, new_fields)

Return a new list of fields in which the original fields have been updated with the entries of a new list of fields.

The lists original_fields and new_fields must be structured like [{'name': 'E', 'unit': 'mV'}, {'name': 'T', 'unit': 'K'}], and each entry must contain a key name corresponding to a field name in the original fields.

EXAMPLES:

>>> from unitpackage.local import update_fields, create_tabular_resource_from_csv
>>> schema = create_tabular_resource_from_csv("./examples/from_csv/from_csv.csv").schema
>>> original_fields = schema.to_dict()['fields']
>>> original_fields
[{'name': 'E', 'type': 'integer'},
{'name': 'I', 'type': 'integer'}]

>>> new_fields = [{'name':'E', 'unit': 'mV'}, {'name':'I', 'unit': 'A'}, {'name':'x', 'unit': 'm'}]
>>> updated_fields = update_fields(original_fields, new_fields)
>>> updated_fields
[{'name': 'E', 'type': 'integer', 'unit': 'mV'},
{'name': 'I', 'type': 'integer', 'unit': 'A'}]

TESTS:

Invalid fields:

>>> fields = 'not a list'
>>> updated_fields = update_fields(original_fields, fields)
Traceback (most recent call last):
...
ValueError: 'fields' must be a list such as
[{'name': '<fieldname>', 'unit':'<field unit>'}]`,
e.g., `[{'name':'E', 'unit': 'mV}, {'name':'T', 'unit': 'K}]`

More fields than required:

>>> fields = [{'name':'E', 'unit': 'mV'}, {'name':'I', 'unit': 'A'}, {'name':'x', 'unit': 'm'}]
>>> updated_fields = update_fields(original_fields, fields)
>>> updated_fields
[{'name': 'E', 'type': 'integer', 'unit': 'mV'},
{'name': 'I', 'type': 'integer', 'unit': 'A'}]

Part of the fields specified:

>>> fields = [{'name':'E', 'unit': 'mV'}]
>>> updated_fields = update_fields(original_fields, fields)
>>> updated_fields
[{'name': 'E', 'type': 'integer', 'unit': 'mV'},
{'name': 'I', 'type': 'integer'}]
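
The behavior shown in these examples (matching fields are updated, unknown names are skipped, unmentioned fields stay unchanged) can be reproduced with a short merge by name. This is an illustrative sketch only: merge_field_updates is a hypothetical helper, and its error message is shortened compared to the one raised by update_fields.

```python
def merge_field_updates(original_fields, new_fields):
    """Return a copy of original_fields updated with matching new_fields."""
    if not isinstance(new_fields, list):
        raise ValueError("'fields' must be a list of dicts with a 'name' key")

    # Index the updates by field name for direct lookup.
    updates = {field["name"]: field for field in new_fields}
    merged = []
    for field in original_fields:
        # Copy so the caller's dicts are not modified.
        updated = dict(field)
        updated.update(updates.get(field["name"], {}))
        merged.append(updated)
    return merged


original = [{"name": "E", "type": "integer"}, {"name": "I", "type": "integer"}]
new = [{"name": "E", "unit": "mV"}, {"name": "x", "unit": "m"}]
merged = merge_field_updates(original, new)
```

Because the loop runs over the original fields, an entry such as 'x' that has no counterpart there never reaches the result.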
unitpackage.local.write_metadata(out, metadata)

Write metadata to the out stream in JSON format.
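
Writing a dict to a stream as JSON can be sketched with the standard json module. The function write_metadata_sketch below is a hypothetical illustration, not the code of write_metadata; it assumes a text stream and metadata that contains only JSON-serializable values, and the indentation is an arbitrary choice.

```python
import io
import json


def write_metadata_sketch(out, metadata):
    """Write the metadata dict to the text stream ``out`` as JSON."""
    json.dump(metadata, out, indent=2, ensure_ascii=False)
    out.write("\n")


# An in-memory stream stands in for an open file.
buffer = io.StringIO()
write_metadata_sketch(buffer, {"name": "example", "fields": [{"name": "E", "unit": "mV"}]})
restored = json.loads(buffer.getvalue())
```

Values that json cannot serialize directly, such as dates, would need a custom default function in this sketch.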