unitpackage.local
Utilities for working with local frictionless Data Packages, such as collecting Data Packages and creating unitpackages.
- unitpackage.local.collect_datapackages(data)
Return a list of data packages defined in the directory data and its subdirectories.
EXAMPLES:
>>> packages = collect_datapackages("./examples/local")
>>> packages[0]
{'resources': [{'name': ...
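The recursive search can be pictured with the standard library alone. This is a hypothetical sketch, not the unitpackage implementation: collect_descriptors is an invented name, it assumes every Data Package is described by a .json descriptor file (as with no_bibliography.json in a later example), and it returns plain dicts rather than frictionless objects.

```python
import json
from pathlib import Path


def collect_descriptors(data):
    """Hypothetical sketch: load every JSON descriptor below the directory `data`.

    Assumes each Data Package is stored as a `.json` file; the real
    collect_datapackages returns frictionless objects instead of dicts.
    """
    descriptors = []
    # rglob visits `data` and all of its subdirectories; sorting keeps the order stable.
    for path in sorted(Path(data).rglob("*.json")):
        with open(path, encoding="utf-8") as file:
            descriptors.append(json.load(file))
    return descriptors
```

Non-JSON files such as the CSV data files are skipped because only the *.json pattern is matched.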
- unitpackage.local.collect_resources(datapackages)
Return a list of resources from a list of Data Packages.
EXAMPLES:
>>> packages = collect_datapackages("./examples/local")
>>> resources = collect_resources(packages)
>>> [resource.name for resource in resources]
['alves_2011_electrochemistry_6010_f1a_solid', 'engstfeld_2018_polycrystalline_17743_f4b_1', 'no_bibliography']
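Collecting resources amounts to flattening the resource lists of all packages, in package order. The sketch below is hypothetical: collect_resource_dicts is an invented name and works on plain descriptor dicts, assuming each descriptor stores its resources under an optional "resources" key.

```python
def collect_resource_dicts(descriptors):
    """Hypothetical sketch: flatten the resources of several package descriptors.

    Each descriptor is assumed to be a dict with an optional "resources" list.
    """
    return [
        resource
        for descriptor in descriptors
        for resource in descriptor.get("resources", [])
    ]


packages = [
    {"resources": [{"name": "first"}, {"name": "second"}]},
    {"resources": [{"name": "third"}]},
]
print([resource["name"] for resource in collect_resource_dicts(packages)])
# ['first', 'second', 'third']
```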
- unitpackage.local.create_df_resource_from_csv(csvname, encoding=None, header_lines=None, column_header_lines=None, decimal=None, delimiters=None)
Create a pandas dataframe resource from a CSV file.
EXAMPLES:
>>> from unitpackage.local import create_df_resource_from_csv
>>> filename = 'examples/from_csv/from_csv_multiple_headers.csv'
>>> resource = create_df_resource_from_csv(csvname=filename, column_header_lines=2)
>>> resource
{'name': 'memory', 'type': 'table', 'data': [], 'format': 'pandas', 'mediatype': 'application/pandas', 'schema': {'fields': [{'name': 'E / V', 'type': 'integer'}, {'name': 'j / A / cm2', 'type': 'integer'}]}}
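How the optional parameters might interact can be sketched with the csv module. This is a hypothetical illustration, not the unitpackage implementation: read_structured_csv is an invented name, it takes a single delimiter rather than delimiters, and joining stacked column header cells with " / " is an assumption chosen so that header rows E;j and V;A / cm2 produce names like those in the example above.

```python
import csv


def read_structured_csv(text, header_lines=0, column_header_lines=1, decimal=".", delimiter=","):
    """Hypothetical sketch: split CSV text into column names and numeric rows.

    header_lines: free-form lines to skip before the column headers.
    column_header_lines: number of stacked column header rows.
    decimal: decimal separator used in the data cells.
    """
    lines = text.splitlines()[header_lines:]
    reader = csv.reader(lines, delimiter=delimiter)
    header_rows = [next(reader) for _ in range(column_header_lines)]
    # Assumption: the stacked header cells of one column are joined with " / ".
    names = [
        " / ".join(cell.strip() for cell in column if cell.strip())
        for column in zip(*header_rows)
    ]
    rows = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        # Normalize the decimal separator before converting to float.
        rows.append([float(cell.replace(decimal, ".")) for cell in row])
    return names, rows


text = "free-form header\nE;j\nV;A / cm2\n0,5;1,5\n1,0;2,0\n"
names, rows = read_structured_csv(text, header_lines=1, column_header_lines=2, decimal=",", delimiter=";")
print(names)  # ['E / V', 'j / A / cm2']
print(rows)   # [[0.5, 1.5], [1.0, 2.0]]
```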
- unitpackage.local.create_df_resource_from_df(df)
Return a pandas dataframe resource for a pandas DataFrame.
EXAMPLES:
>>> data = {'x': [1, 2, 3], 'y': [4, 5, 6]}
>>> import pandas as pd
>>> df = pd.DataFrame(data)
>>> from unitpackage.local import create_df_resource_from_df
>>> resource = create_df_resource_from_df(df)
>>> resource
{'name': 'memory', 'type': 'table', 'data': [], 'format': 'pandas', ...
>>> resource.data
   x  y
0  1  4
1  2  5
2  3  6
>>> resource.format
'pandas'
- unitpackage.local.create_df_resource_from_tabular_resource(resource)
Return a pandas dataframe resource for a frictionless Tabular Resource.
EXAMPLES:
>>> from frictionless import Package
>>> from unitpackage.local import create_df_resource_from_tabular_resource
>>> tabular_resource = Package("./examples/local/no_bibliography/no_bibliography.json").resources[0]
>>> df_resource = create_df_resource_from_tabular_resource(tabular_resource)
>>> df_resource
{'name': 'memory', ... 'format': 'pandas', ...
>>> df_resource.data
t E j
...
TESTS:
>>> import pandas as pd
>>> data = {'x': [1, 2, 3], 'y': [4, 5, 6]}
>>> df = pd.DataFrame(data)
>>> from unitpackage.entry import Entry
>>> entry = Entry.from_df(df, basename='test_parent_directory')
>>> entry.save(outdir=".")
>>> entry_ = Entry.from_local('test_parent_directory.json')
>>> entry_.df
   x  y
0  1  4
1  2  5
2  3  6
- unitpackage.local.create_tabular_resource_from_csv(csvname, encoding=None, header_lines=None, column_header_lines=None, decimal=None, delimiters=None)
Return a resource built from the CSV file csvname.
EXAMPLES:
For standard CSV files (a single header line followed by lines of data, using . as the decimal separator), a tabular data resource is created:
>>> filename = './examples/from_csv/from_csv.csv'
>>> resource = create_tabular_resource_from_csv(filename)
>>> resource
{'name': 'from_csv', 'type': 'table', 'path': 'from_csv.csv', 'scheme': 'file', 'format': 'csv', 'mediatype': 'text/csv', ...
For CSV files with a more complex structure (a header, multiple column header lines, or other separators), a pandas dataframe resource is created instead:
>>> filename = 'examples/from_csv/from_csv_multiple_headers.csv'
>>> resource = create_tabular_resource_from_csv(csvname=filename, column_header_lines=2)
>>> resource
{'name': 'memory', 'type': 'table', 'data': [], 'format': 'pandas', 'mediatype': 'application/pandas', 'schema': {'fields': [{'name': 'E / V', 'type': 'integer'}, {'name': 'j / A / cm2', 'type': 'integer'}]}}
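The choice between the two kinds of resource can be expressed as a single predicate over the keyword arguments. This is a hypothetical sketch: needs_dataframe_resource is an invented name, and treating any explicitly passed delimiters as non-standard is an assumption; the actual decision logic in unitpackage may differ.

```python
def needs_dataframe_resource(header_lines=None, column_header_lines=None, decimal=None, delimiters=None):
    """Hypothetical sketch: decide whether a CSV deviates from the standard layout.

    Standard layout: no extra header, a single column header line,
    "." as the decimal separator, and default delimiters.
    """
    has_header = bool(header_lines)
    has_multiple_column_headers = column_header_lines not in (None, 1)
    has_other_decimal = decimal not in (None, ".")
    # Assumption: any explicitly passed delimiters count as non-standard.
    has_other_delimiters = delimiters is not None
    return has_header or has_multiple_column_headers or has_other_decimal or has_other_delimiters


print(needs_dataframe_resource())                       # False
print(needs_dataframe_resource(column_header_lines=2))  # True
```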
- unitpackage.local.create_unitpackage(resource, metadata=None, fields=None)
Return a Data Package built from the metadata dict metadata and the tabular data in resource, a frictionless.Resource.
The fields list must be structured as [{'name': 'E', 'unit': 'mV'}, {'name': 'T', 'unit': 'K'}].
EXAMPLES:
>>> from unitpackage.local import create_tabular_resource_from_csv, create_unitpackage
>>> resource = create_tabular_resource_from_csv("./examples/from_csv/from_csv.csv")
>>> new_fields = [{'name':'E', 'unit': 'mV'}, {'name':'I', 'unit': 'A'}]
>>> package = create_unitpackage(resource=resource, fields=new_fields)
>>> package
{'resources': [{'name': ...
- unitpackage.local.update_fields(original_fields, new_fields)
Return a new list of fields in which the entries of original_fields are updated with the properties given in new_fields.
Both original_fields and new_fields must be lists of dicts structured like [{'name': 'E', 'unit': 'mV'}, {'name': 'T', 'unit': 'K'}]. Each entry in new_fields must contain the key name; entries whose name does not match a field in original_fields are ignored, as the tests below show.
EXAMPLES:
>>> from unitpackage.local import update_fields, create_tabular_resource_from_csv
>>> schema = create_tabular_resource_from_csv("./examples/from_csv/from_csv.csv").schema
>>> original_fields = schema.to_dict()['fields']
>>> original_fields
[{'name': 'E', 'type': 'integer'}, {'name': 'I', 'type': 'integer'}]
>>> new_fields = [{'name':'E', 'unit': 'mV'}, {'name':'I', 'unit': 'A'}, {'name':'x', 'unit': 'm'}]
>>> updated_fields = update_fields(original_fields, new_fields)
>>> updated_fields
[{'name': 'E', 'type': 'integer', 'unit': 'mV'}, {'name': 'I', 'type': 'integer', 'unit': 'A'}]
TESTS:
Invalid fields:
>>> fields = 'not a list'
>>> updated_fields = update_fields(original_fields, fields)
Traceback (most recent call last):
...
ValueError: 'fields' must be a list such as [{'name': '<fieldname>', 'unit':'<field unit>'}]`, e.g., `[{'name':'E', 'unit': 'mV}, {'name':'T', 'unit': 'K}]`
More fields than required:
>>> fields = [{'name':'E', 'unit': 'mV'}, {'name':'I', 'unit': 'A'}, {'name':'x', 'unit': 'm'}]
>>> updated_fields = update_fields(original_fields, fields)
>>> updated_fields
[{'name': 'E', 'type': 'integer', 'unit': 'mV'}, {'name': 'I', 'type': 'integer', 'unit': 'A'}]
Only some of the fields specified:
>>> fields = [{'name':'E', 'unit': 'mV'}]
>>> updated_fields = update_fields(original_fields, fields)
>>> updated_fields
[{'name': 'E', 'type': 'integer', 'unit': 'mV'}, {'name': 'I', 'type': 'integer'}]
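The merge behaviour exercised by these examples and tests fits in a few lines of plain Python. The sketch is hypothetical (merge_fields is an invented name and its error message is shortened): fields are matched by name, keys of the matching new field are copied onto a copy of the original field, and new fields without a counterpart are ignored.

```python
def merge_fields(original_fields, new_fields):
    """Hypothetical sketch of the update_fields semantics shown in the examples."""
    if not isinstance(new_fields, list):
        raise ValueError("'fields' must be a list such as [{'name': '<fieldname>', 'unit': '<field unit>'}]")
    # Index the new fields by name; names without an original field are never looked up.
    updates = {field["name"]: field for field in new_fields}
    merged = []
    for field in original_fields:
        updated = dict(field)  # copy, so original_fields is left unchanged
        updated.update(updates.get(field["name"], {}))
        merged.append(updated)
    return merged


original_fields = [{'name': 'E', 'type': 'integer'}, {'name': 'I', 'type': 'integer'}]
print(merge_fields(original_fields, [{'name': 'E', 'unit': 'mV'}]))
# [{'name': 'E', 'type': 'integer', 'unit': 'mV'}, {'name': 'I', 'type': 'integer'}]
```

Copying each field with dict() keeps the input list intact, which matches the description of update_fields as returning a new list.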
- unitpackage.local.write_metadata(out, metadata)
Write metadata to the out stream in JSON format.
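Writing JSON to a stream maps directly onto json.dump, which accepts any object with a write method. The sketch below is hypothetical: write_metadata_sketch is an invented name, and indent=4 is an assumed formatting choice; the actual write_metadata may format its output differently or handle additional value types.

```python
import io
import json


def write_metadata_sketch(out, metadata):
    """Hypothetical sketch: serialize `metadata` as JSON into the text stream `out`."""
    # indent=4 is an assumption for readability; the real function may differ.
    json.dump(metadata, out, indent=4)


buffer = io.StringIO()
write_metadata_sketch(buffer, {"description": "example"})
print(json.loads(buffer.getvalue()))  # {'description': 'example'}
```

Using an io.StringIO buffer in place of an open file makes the output easy to inspect; an open text file works the same way.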