Metadata
Metadata class for handling nested dictionary/YAML metadata structures.
- class mdstools.metadata.metadata.Metadata(metadata: dict)
Wrapper for nested dictionary metadata structures.
Provides methods to load from and save to YAML, and to convert to a flattened tabular format.
EXAMPLES:
>>> from mdstools.metadata.metadata import Metadata
>>> data = {'name': 'test', 'value': 42}
>>> metadata = Metadata(data)
>>> isinstance(metadata.data, dict)
True
>>> metadata.data['name']
'test'
- property data: dict
Get the underlying metadata dictionary.
EXAMPLES:
>>> from mdstools.metadata.metadata import Metadata
>>> m = Metadata({'experiment': {'value': 42}})
>>> m.data
{'experiment': {'value': 42}}
- flatten()
Convert to flattened tabular representation.
- Returns:
FlattenedMetadata instance
EXAMPLES:
Nested dictionaries:
>>> from mdstools.metadata.metadata import Metadata
>>> data = {'experiment': {'value': 42, 'units': 'mV'}}
>>> metadata = Metadata(data)
>>> flattened = metadata.flatten()
>>> flattened.rows  # doctest: +NORMALIZE_WHITESPACE
[['1', 'experiment', '<nested>'], ['1.1', 'value', 42], ['1.2', 'units', 'mV']]

Lists of dictionaries:
>>> data = {'measurements': [{'A': 1, 'B': 2}, {'A': 3, 'B': 4}]}
>>> metadata = Metadata(data)
>>> flattened = metadata.flatten()
>>> len(flattened.rows)
5
>>> flattened.rows[0]
['1', 'measurements', '<nested>']
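The numbering scheme in these examples (dotted positions for dictionary keys, an `iN` segment for the N-th list item, and a `<nested>` placeholder row for every container) can be reproduced with a short recursive function. This is a minimal sketch inferred from the documented rows, not the mdstools implementation; `flatten` and `_flatten_children` are hypothetical names, and a list nested directly inside another list is simply written out as a single value.

```python
from typing import Any, List

NESTED = "<nested>"


def flatten(data: dict, prefix: str = "") -> List[list]:
    """Flatten a nested dict into [number, key, value] rows (illustrative sketch)."""
    rows: List[list] = []
    for index, (key, value) in enumerate(data.items(), start=1):
        number = f"{prefix}.{index}" if prefix else str(index)
        if isinstance(value, (dict, list)):
            # Containers get a placeholder row; their children follow it.
            rows.append([number, key, NESTED])
            rows.extend(_flatten_children(value, number))
        else:
            rows.append([number, key, value])
    return rows


def _flatten_children(value: Any, number: str) -> List[list]:
    """Hypothetical helper: number the children of a dict or list."""
    if isinstance(value, dict):
        return flatten(value, number)
    rows: List[list] = []
    for item_index, item in enumerate(value, start=1):
        item_number = f"{number}.i{item_index}"
        if isinstance(item, dict):
            # Dict items get no row of their own; their keys are numbered under 'iN'.
            rows.extend(flatten(item, item_number))
        else:
            # Scalar items (and, in this sketch, inner lists) get an empty key.
            rows.append([item_number, "", item])
    return rows


print(flatten({"experiment": {"value": 42, "units": "mV"}}))
# [['1', 'experiment', '<nested>'], ['1.1', 'value', 42], ['1.2', 'units', 'mV']]
```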
- classmethod from_yaml(filepath: str)
Load metadata from a YAML file.
- Parameters:
filepath – Path to YAML file
- Returns:
Metadata instance
EXAMPLES:
Load from a YAML file:
>>> from mdstools.metadata.metadata import Metadata
>>> metadata = Metadata.from_yaml('tests/simple_test.yaml')
>>> isinstance(metadata, Metadata)
True
>>> 'curation' in metadata.data
True
>>> metadata.data['system']['type']
'electrochemical'
- to_yaml(filepath: str)
Save metadata to a YAML file.
- Parameters:
filepath – Path to save YAML file
EXAMPLES:
Basic save:
>>> from mdstools.metadata.metadata import Metadata
>>> import os
>>> data = {'name': 'test', 'value': 42}
>>> metadata = Metadata(data)
>>> metadata.to_yaml('tests/generated/docstrings/test_metadata.yaml')
>>> os.path.exists('tests/generated/docstrings/test_metadata.yaml')
True

Test roundtrip (dict → YAML → dict):
>>> data = {'experiment': {'value': 42, 'units': 'mV'}, 'author': 'test'}
>>> metadata = Metadata(data)
>>> metadata.to_yaml('tests/generated/docstrings/roundtrip.yaml')
>>> loaded = Metadata.from_yaml('tests/generated/docstrings/roundtrip.yaml')
>>> loaded.data == data
True
FlattenedMetadata class for handling tabular representations of metadata.
- class mdstools.metadata.flattened_metadata.FlattenedMetadata(rows: List[List])
Wrapper for flattened tabular metadata structures.
Handles metadata in tabular format as a list of [number, key, value] rows. Provides methods to load from and save to CSV and Excel formats.
EXAMPLES:
>>> from mdstools.metadata.flattened_metadata import FlattenedMetadata
>>> rows = [['1', 'name', 'test'], ['2', 'value', 42]]
>>> flattened = FlattenedMetadata(rows)
>>> len(flattened.rows)
2
>>> flattened.rows[0]
['1', 'name', 'test']
- classmethod from_csv(filepath: str, **_kwargs)
Load flattened metadata from a CSV file.
- Parameters:
filepath – Path to CSV file
_kwargs – Additional arguments (currently unused, for future compatibility)
- Returns:
FlattenedMetadata instance
EXAMPLES:
Basic CSV loading with type preservation:
>>> from mdstools.metadata.flattened_metadata import FlattenedMetadata
>>> flattened = FlattenedMetadata.from_csv('tests/from_csv_example.csv')
>>> len(flattened.rows)
5
>>> flattened.rows[0][1]  # First row, key column
'name'
>>> # Values are converted to appropriate types (int, float, string)
>>> isinstance(flattened.rows[1][2], int)  # the 'value' field holds the int 42
True

Reconstruct nested structure:
>>> metadata = flattened.unflatten()
>>> metadata.data
{'name': 'test', 'value': 42, 'details': {'author': 'John Doe', 'year': 2024}}
TESTS:
Test roundtrip conversion with strings containing commas:
>>> from mdstools.metadata.metadata import Metadata
>>> import os
>>> # Create data with comma in string value
>>> original_data = {'description': 'test, with comma', 'value': 42, 'title': 'A, B, C'}
>>> metadata = Metadata(original_data)
>>> flattened = metadata.flatten()
>>> # Save to CSV
>>> flattened.to_csv('tests/generated/docstrings/test_comma.csv')
>>> # Load back from CSV
>>> loaded = FlattenedMetadata.from_csv('tests/generated/docstrings/test_comma.csv')
>>> # Verify commas in strings are preserved
>>> loaded.unflatten().data == original_data
True
>>> loaded.rows[0][2]  # First value should contain comma
'test, with comma'
- classmethod from_excel(filepath, **kwargs)
Load flattened metadata from an Excel file.
- Parameters:
filepath – Path to Excel file
kwargs – Additional arguments passed to pandas.read_excel
- Returns:
FlattenedMetadata instance
NOTE: The Excel file must include columns named Number, Key, and Value.
EXAMPLES:
Test roundtrip (flattened → Excel → flattened):
>>> from mdstools.metadata.flattened_metadata import FlattenedMetadata
>>> import os
>>> original_rows = [['1', 'experiment', '<nested>'],
...                  ['1.i1.1', 'A', '<nested>'],
...                  ['1.i1.1.1', 'value', 1],
...                  ['1.i1.1.2', 'units', 'mV'],
...                  ['1.i1.2', 'B', 2],
...                  ['1.i2.1', 'A', 3],
...                  ['1.i2.2', 'B', 4]]
>>> flattened = FlattenedMetadata(original_rows)
>>> flattened.to_excel('tests/generated/docstrings/test_flattened.xlsx')
>>> loaded = FlattenedMetadata.from_excel('tests/generated/docstrings/test_flattened.xlsx')
>>> loaded.unflatten().data == flattened.unflatten().data
True

Multi-sheet roundtrip (save with separate_sheets=True; loading merges the sheets automatically):
>>> # Save as multi-sheet Excel
>>> rows = [['1', 'experiment', '<nested>'], ['1.1', 'value', 1],
...         ['2', 'source', '<nested>'], ['2.1', 'author', 'test']]
>>> flattened_multi = FlattenedMetadata(rows)
>>> flattened_multi.to_excel('tests/generated/docstrings/multi_roundtrip.xlsx', separate_sheets=True)
>>> # Load back - automatically handles multiple sheets
>>> loaded_multi = FlattenedMetadata.from_excel('tests/generated/docstrings/multi_roundtrip.xlsx')
>>> len(loaded_multi.rows) == len(rows)
True
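Merging a multi-sheet workbook amounts to checking each sheet for the required Number, Key, and Value columns and concatenating the sheets in order. With pandas, `pandas.read_excel(path, sheet_name=None)` returns a dict of DataFrames keyed by sheet name; the sketch below uses plain lists in that shape so it runs without pandas or an Excel engine. `merge_sheets` is a hypothetical name, and the assumption that from_excel concatenates sheets in workbook order is not confirmed by this page. Locating columns by name also means extra columns (such as Example and Description in enriched files) are ignored.

```python
from typing import Dict, List

REQUIRED = ("Number", "Key", "Value")


def merge_sheets(sheets: Dict[str, List[list]]) -> List[list]:
    """Concatenate per-sheet tables (header row first) into [number, key, value] rows."""
    merged: List[list] = []
    for name, table in sheets.items():  # dicts preserve sheet order
        header, *body = table
        missing = [column for column in REQUIRED if column not in header]
        if missing:
            raise ValueError(f"sheet {name!r} lacks columns: {missing}")
        # Find columns by name so their order and any extra columns do not matter.
        positions = [header.index(column) for column in REQUIRED]
        merged.extend([row[i] for i in positions] for row in body)
    return merged


sheets = {
    "experiment": [["Number", "Key", "Value"], ["1", "experiment", "<nested>"], ["1.1", "value", 1]],
    "source": [["Key", "Number", "Value"], ["source", "2", "<nested>"], ["author", "2.1", "test"]],
}
print(len(merge_sheets(sheets)))  # 4
```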
- property rows: List[List]
Get the underlying flattened data rows.
EXAMPLES:
>>> from mdstools.metadata.flattened_metadata import FlattenedMetadata
>>> fm = FlattenedMetadata([['1', 'a', 1], ['2', 'b', 2]])
>>> fm.rows
[['1', 'a', 1], ['2', 'b', 2]]
- to_csv(filepath: str, **kwargs)
Save flattened metadata to a CSV file.
- Parameters:
filepath – Path to save CSV file
kwargs – Additional arguments passed to pandas.DataFrame.to_csv
EXAMPLES:
>>> from mdstools.metadata.flattened_metadata import FlattenedMetadata
>>> import os
>>> rows = [['1', 'experiment', '<nested>'], ['1.1', 'value', 1]]
>>> flattened = FlattenedMetadata(rows)
>>> flattened.to_csv('tests/generated/docstrings/test_flat.csv')
>>> os.path.exists('tests/generated/docstrings/test_flat.csv')
True
- to_excel(filepath, separate_sheets=False, **kwargs)
Save flattened metadata to an Excel file.
- Parameters:
filepath – Path to save Excel file
separate_sheets – If True, create separate sheets for each top-level key
kwargs – Additional arguments passed to pandas.DataFrame.to_excel
When separate_sheets=True, the Excel file will have one sheet per top-level key in the nested structure, making it easier to navigate large metadata files.
EXAMPLES:
Single sheet export:
>>> from mdstools.metadata.flattened_metadata import FlattenedMetadata
>>> import os
>>> rows = [['1', 'experiment', '<nested>'], ['1.1', 'value', 1]]
>>> flattened = FlattenedMetadata(rows)
>>> flattened.to_excel('tests/generated/docstrings/test_flat.xlsx')
>>> os.path.exists('tests/generated/docstrings/test_flat.xlsx')
True

Multi-sheet export:
>>> rows = [['1', 'experiment', '<nested>'], ['1.1', 'value', 1],
...         ['2', 'source', '<nested>'], ['2.1', 'author', 'test']]
>>> flattened = FlattenedMetadata(rows)
>>> flattened.to_excel('tests/generated/docstrings/test_flat_multi.xlsx', separate_sheets=True)
>>> os.path.exists('tests/generated/docstrings/test_flat_multi.xlsx')
True
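With separate_sheets=True, each top-level key gets its own sheet. Because every row's number starts with its top-level position ('2', '2.1', '2.i1', ...), the rows can be grouped by the first segment of the number. The sketch below does that and derives a sheet name from the top-level key, stripping the characters Excel forbids in worksheet names and truncating to Excel's 31-character limit. `split_by_top_level` and `_sheet_name` are hypothetical; how mdstools actually names its sheets is an assumption here.

```python
from typing import Dict, List

# Characters Excel does not allow in worksheet names.
INVALID_SHEET_CHARS = set("[]:*?/\\")
MAX_SHEET_NAME = 31  # Excel's worksheet-name length limit


def _sheet_name(key) -> str:
    cleaned = "".join(ch for ch in str(key) if ch not in INVALID_SHEET_CHARS)
    return cleaned[:MAX_SHEET_NAME]


def split_by_top_level(rows: List[list]) -> Dict[str, List[list]]:
    """Group rows by their top-level number, one bucket per future sheet."""
    names: Dict[str, str] = {}
    sheets: Dict[str, List[list]] = {}
    for row in rows:
        top = row[0].split(".")[0]  # '2.1' and '2.i1' both belong to '2'
        if top not in names:
            # Assumes the first row seen for a prefix is its top-level row.
            names[top] = _sheet_name(row[1])
        sheets.setdefault(names[top], []).append(row)
    return sheets


rows = [["1", "experiment", "<nested>"], ["1.1", "value", 1],
        ["2", "source", "<nested>"], ["2.1", "author", "test"]]
print(list(split_by_top_level(rows)))  # ['experiment', 'source']
```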
- to_latex(filepath=None, **kwargs) → str
Convert to LaTeX table format.
- Parameters:
filepath – Optional path to save the LaTeX file
kwargs – Additional arguments passed to pandas.DataFrame.to_latex
- Returns:
LaTeX-formatted string
EXAMPLES:
Simple example:
>>> from mdstools.metadata.flattened_metadata import FlattenedMetadata
>>> rows = [['1', 'name', 'test'], ['2', 'value', 42]]
>>> flattened = FlattenedMetadata(rows)
>>> latex = flattened.to_latex()
>>> 'tabular' in latex and 'name' in latex
True

With nested structures and lists:
>>> rows = [['1', 'name', 'test'], ['2', 'foo', '<nested>'],
...         ['2.i1', '', 'a'], ['2.i2', '', 'b'], ['2.i3', '', 'c']]
>>> flattened = FlattenedMetadata(rows)
>>> latex = flattened.to_latex()
>>> 'Number' in latex and 'Key' in latex and 'Value' in latex
True

Save to file:
>>> import os
>>> flattened.to_latex('tests/generated/docstrings/test_flat.tex')
'...'
>>> os.path.exists('tests/generated/docstrings/test_flat.tex')
True
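Metadata keys and values often contain characters that are special in LaTeX (underscores in particular). Whether pandas escapes them depends on the `escape` argument passed through kwargs. For reference, a dependency-free sketch of a three-column tabular with escaping is shown below; `latex_escape` and `rows_to_latex` are hypothetical helpers, not the method's implementation. Also note that the `<nested>` marker contains `<` and `>`, which render as other glyphs under LaTeX's default OT1 font encoding unless the document loads `\usepackage[T1]{fontenc}`.

```python
from typing import List

# Per-character replacements; mapping one character at a time avoids
# re-escaping the backslashes introduced by earlier replacements.
LATEX_SPECIAL = {
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}", "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def latex_escape(value) -> str:
    return "".join(LATEX_SPECIAL.get(ch, ch) for ch in str(value))


def rows_to_latex(rows: List[list], columns=("Number", "Key", "Value")) -> str:
    """Render rows as a plain LaTeX tabular (illustrative sketch)."""
    lines = [r"\begin{tabular}{" + "l" * len(columns) + "}", r"\hline"]
    lines.append(" & ".join(columns) + r" \\")
    lines.append(r"\hline")
    for row in rows:
        lines.append(" & ".join(latex_escape(cell) for cell in row) + r" \\")
    lines.append(r"\hline")
    lines.append(r"\end{tabular}")
    return "\n".join(lines)


print(rows_to_latex([["1", "sample_id", "A&B"], ["2", "value", 42]]))
```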
- to_markdown(filepath=None, **kwargs) → str
Convert to Markdown table format.
- Parameters:
filepath – Optional path to save the Markdown file
kwargs – Additional arguments passed to pandas.DataFrame.to_markdown
- Returns:
Markdown-formatted string
EXAMPLES:
Simple example:
>>> from mdstools.metadata.flattened_metadata import FlattenedMetadata
>>> rows = [['1', 'name', 'test'], ['2', 'value', 42]]
>>> flattened = FlattenedMetadata(rows)
>>> print(flattened.to_markdown())  # doctest: +NORMALIZE_WHITESPACE
| Number | Key | Value |
|---------:|:------|:--------|
| 1 | name | test |
| 2 | value | 42 |

With nested structures and lists:
>>> rows = [['1', 'name', 'test'], ['2', 'foo', '<nested>'],
...         ['2.i1', '', 'a'], ['2.i2', '', 'b'], ['2.i3', '', 'c']]
>>> flattened = FlattenedMetadata(rows)
>>> markdown = flattened.to_markdown()
>>> 'Number' in markdown and 'Key' in markdown and 'Value' in markdown
True
>>> '<nested>' in markdown
True

Save to file:
>>> import os
>>> flattened.to_markdown('tests/generated/docstrings/test_flat.md')
'...'
>>> os.path.exists('tests/generated/docstrings/test_flat.md')
True
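`pandas.DataFrame.to_markdown` delegates to the `tabulate` package, which pads and aligns columns as in the output above. The underlying pipe-table format is simple enough to sketch directly, which also shows two pitfalls: a literal `|` in a value ends the cell early unless escaped, and CommonMark-style renderers may parse `<nested>` as an inline HTML tag and hide it. The `escape_html` option below is a hypothetical addition (off by default, matching the documented output that keeps `<nested>` verbatim); `rows_to_markdown` is not an mdstools function.

```python
from typing import List


def rows_to_markdown(rows: List[list], columns=("Number", "Key", "Value"),
                     escape_html: bool = False) -> str:
    """Render rows as an unpadded Markdown pipe table (illustrative sketch)."""
    def cell(value) -> str:
        text = str(value).replace("|", r"\|")  # keep '|' from splitting the cell
        if escape_html:
            text = text.replace("<", "&lt;")  # stop '<nested>' parsing as HTML
        return text

    lines = ["| " + " | ".join(columns) + " |",
             "|" + "|".join("---" for _ in columns) + "|"]
    for row in rows:
        lines.append("| " + " | ".join(cell(v) for v in row) + " |")
    return "\n".join(lines)


print(rows_to_markdown([["1", "name", "test"], ["2", "value", 42]]))
# | Number | Key | Value |
# |---|---|---|
# | 1 | name | test |
# | 2 | value | 42 |
```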
- to_pandas() → DataFrame
Convert to pandas DataFrame.
- Returns:
DataFrame with columns [‘Number’, ‘Key’, ‘Value’]
EXAMPLES:
>>> from mdstools.metadata.flattened_metadata import FlattenedMetadata
>>> rows = [['1', 'name', 'test'], ['2', 'value', 42]]
>>> flattened = FlattenedMetadata(rows)
>>> df = flattened.to_pandas()
>>> df.columns.tolist()
['Number', 'Key', 'Value']
>>> df['Key'].tolist()
['name', 'value']
- unflatten(schema_path: str | None = None)
Convert back to nested metadata structure.
- Parameters:
schema_path – Optional JSON schema file path for validation
- Returns:
Metadata instance
EXAMPLES:
>>> from mdstools.metadata.flattened_metadata import FlattenedMetadata
>>> rows = [['1', 'experiment', '<nested>'],
...         ['1.1', 'value', 1],
...         ['1.2', 'units', 'mV']]
>>> flattened = FlattenedMetadata(rows)
>>> metadata = flattened.unflatten()
>>> metadata.data
{'experiment': {'value': 1, 'units': 'mV'}}
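Reversing the flattening only needs the Number column: the prefix of a number identifies the parent container, and an `iN` segment marks a list position. A `<nested>` row becomes a list if any other row's number continues it with `.i`, otherwise a dict. The following is a minimal sketch under those assumptions (inferred from the documented rows, not taken from mdstools); `unflatten` and `_container_for` are hypothetical names, schema validation is omitted, and the list check is quadratic in the number of rows.

```python
from typing import Dict, List, Union

NESTED = "<nested>"
Container = Union[dict, list]


def unflatten(rows: List[list]) -> dict:
    """Rebuild a nested dict from [number, key, value] rows (illustrative sketch)."""
    numbers = [row[0] for row in rows]
    root: dict = {}
    containers: Dict[str, Container] = {"": root}
    for number, key, value in rows:
        parts = number.split(".")
        parent = _container_for(containers, ".".join(parts[:-1]))
        if parts[-1].startswith("i"):
            # Scalar list item such as '2.i1': append to the parent list.
            parent.append(value)
        elif value == NESTED:
            # A container is a list if any row number continues it with '.iN'.
            is_list = any(n.startswith(number + ".i") for n in numbers)
            child: Container = [] if is_list else {}
            parent[key] = child
            containers[number] = child
        else:
            parent[key] = value
    return root


def _container_for(containers: Dict[str, Container], number: str) -> Container:
    """Hypothetical helper: find (or create) the container a number points to."""
    if number in containers:
        return containers[number]
    # An unseen prefix like '1.i2' names the 2nd dict element of the list at '1'.
    list_number, item = number.rsplit(".", 1)
    items = containers[list_number]
    index = int(item[1:]) - 1
    while len(items) <= index:
        items.append({})
    containers[number] = items[index]
    return items[index]


print(unflatten([["1", "experiment", "<nested>"], ["1.1", "value", 1], ["1.2", "units", "mV"]]))
# {'experiment': {'value': 1, 'units': 'mV'}}
```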
EnrichedFlattenedMetadata class for handling schema-enriched tabular metadata.
- class mdstools.metadata.enriched_metadata.EnrichedFlattenedMetadata(rows: List[List], schema_dir: str)
Schema-enriched wrapper for flattened tabular metadata structures.
Extends FlattenedMetadata by adding Example and Description columns from JSON Schema files. This provides documentation and reference values alongside the actual metadata.
EXAMPLES:
Load from a dictionary and enrich with schema information:
>>> from mdstools.metadata.enriched_metadata import EnrichedFlattenedMetadata
>>> import os
>>> # Start with nested metadata
>>> data = {'curation': {'process': [{'role': 'curator', 'name': 'John Doe'}]}}
>>> # Create enriched metadata (flattens and adds schema info)
>>> enriched = EnrichedFlattenedMetadata.from_dict(data, schema_dir='schemas')
>>> # Base rows have 3 columns: [Number, Key, Value]
>>> enriched.base_rows[0]  # Top level
['1', 'curation', '<nested>']
>>> enriched.base_rows[2]  # Leaf value
['1.1.i1.1', 'role', 'curator']
>>> # Enriched rows have 5 columns: [Number, Key, Value, Example, Description]
>>> enriched.rows
[['1', 'curation', '<nested>', '', ''], ['1.1', 'process', '<nested>', '', 'List of people involved in creating, recording, or curating this data.'], ['1.1.i1.1', 'role', 'curator', 'experimentalist', 'Role of a person in the data curation process.'], ['1.1.i1.2', 'name', 'John Doe', '', 'Full name of the person.']]
>>> enriched.rows[2][3]  # Example for 'role' field
'experimentalist'
>>> 'person' in enriched.rows[2][4].lower()  # Description contains 'person'
True
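One way to obtain the Example and Description columns is to turn each row's number into a key path (for instance '1.1.i1.1' becomes curation, process, list item, role) and walk a JSON Schema along it, following `properties` for keys and `items` for list positions, then reading the standard `description` and `examples` keywords. This sketch assumes a single, already-loaded schema dict; it does not resolve `$ref` or combine the files in schema_dir, and mdstools' actual lookup may work differently. `enrich` and `_schema_node` are hypothetical names.

```python
from typing import List, Optional


def _schema_node(schema: dict, path: List[Optional[str]]) -> dict:
    """Follow 'properties' for keys and 'items' for list positions (None)."""
    node = schema
    for step in path:
        if step is None:
            node = node.get("items", {})
        else:
            node = node.get("properties", {}).get(step, {})
    return node


def enrich(rows: List[list], schema: dict) -> List[list]:
    """Append [example, description] to each [number, key, value] row."""
    keys = {row[0]: row[1] for row in rows}
    enriched = []
    for number, key, value in rows:
        parts = number.split(".")
        # Rebuild the key path, e.g. '1.1.i1.1' -> ['curation', 'process', None, 'role'].
        path = [None if part.startswith("i") else keys.get(".".join(parts[: depth + 1]), "")
                for depth, part in enumerate(parts)]
        node = _schema_node(schema, path)
        examples = node.get("examples") or [""]
        enriched.append([number, key, value, examples[0], node.get("description", "")])
    return enriched


schema = {"properties": {"system": {"properties": {"type": {
    "description": "Type of system.", "examples": ["electrochemical"]}}}}}
print(enrich([["1", "system", "<nested>"], ["1.1", "type", "electrochemical"]], schema))
```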
- property base_rows: List[List]
Get the base 3-column rows without enrichment.
EXAMPLES:
>>> from mdstools.metadata.enriched_metadata import EnrichedFlattenedMetadata
>>> rows = [['1', 'system', '<nested>'], ['1.1', 'type', 'electrochemical']]
>>> enriched = EnrichedFlattenedMetadata(rows, schema_dir='schemas')
>>> enriched.base_rows
[['1', 'system', '<nested>'], ['1.1', 'type', 'electrochemical']]
- classmethod from_csv(filepath, schema_dir: str, **kwargs)
Load flattened metadata from a CSV file and enrich with schema information.
- Parameters:
filepath – Path to CSV file
schema_dir – Path to directory containing JSON Schema files
kwargs – Additional arguments passed to FlattenedMetadata.from_csv
- Returns:
EnrichedFlattenedMetadata instance
EXAMPLES:
>>> from mdstools.metadata.enriched_metadata import EnrichedFlattenedMetadata
>>> from mdstools.metadata.flattened_metadata import FlattenedMetadata
>>> import os
>>> # Create a test CSV with system metadata
>>> import csv
>>> os.makedirs('tests/generated/docstrings', exist_ok=True)
>>> with open('tests/generated/docstrings/system_example.csv', 'w', newline='') as f:
...     writer = csv.writer(f)
...     _ = writer.writerow(['Number', 'Key', 'Value'])
...     _ = writer.writerow(['1', 'system', '<nested>'])
...     _ = writer.writerow(['1.1', 'type', 'electrochemical'])
>>> enriched = EnrichedFlattenedMetadata.from_csv('tests/generated/docstrings/system_example.csv',
...                                               schema_dir='schemas')
>>> len(enriched.rows)
2
>>> len(enriched.rows[0])  # Has 5 columns
5
- classmethod from_dict(data: dict, schema_dir: str)
Create EnrichedFlattenedMetadata from a nested dictionary.
- Parameters:
data – Nested dictionary of metadata
schema_dir – Path to directory containing JSON Schema files
- Returns:
EnrichedFlattenedMetadata instance
EXAMPLES:
>>> from mdstools.metadata.enriched_metadata import EnrichedFlattenedMetadata
>>> data = {'system': {'type': 'electrochemical'}}
>>> enriched = EnrichedFlattenedMetadata.from_dict(data, schema_dir='schemas')
>>> enriched.base_rows[0]
['1', 'system', '<nested>']
>>> enriched.base_rows[1]
['1.1', 'type', 'electrochemical']
>>> len(enriched.rows[1])  # Enriched row has 5 columns
5
- classmethod from_excel(filepath, schema_dir: str, **kwargs)
Load flattened metadata from an Excel file and enrich with schema information.
- Parameters:
filepath – Path to Excel file
schema_dir – Path to directory containing JSON Schema files
kwargs – Additional arguments passed to FlattenedMetadata.from_excel
- Returns:
EnrichedFlattenedMetadata instance
EXAMPLES:
>>> from mdstools.metadata.enriched_metadata import EnrichedFlattenedMetadata
>>> from mdstools.metadata.flattened_metadata import FlattenedMetadata
>>> import os
>>> # Create test data
>>> rows = [['1', 'system', '<nested>'], ['1.1', 'type', 'electrochemical']]
>>> flattened = FlattenedMetadata(rows)
>>> flattened.to_excel('tests/generated/docstrings/system_excel_example.xlsx')
>>> # Load with enrichment
>>> enriched = EnrichedFlattenedMetadata.from_excel('tests/generated/docstrings/system_excel_example.xlsx',
...                                                 schema_dir='schemas')
>>> len(enriched.rows[0])  # Has 5 columns
5

Load multi-sheet enriched Excel file:
>>> # Can load enriched files saved with separate_sheets=True
>>> rows = [['1', 'system', '<nested>'], ['1.1', 'type', 'electrochemical'],
...         ['2', 'curation', '<nested>'], ['2.1', 'process', '<nested>']]
>>> enriched_multi = EnrichedFlattenedMetadata(rows, schema_dir='schemas')
>>> enriched_multi.to_excel('tests/generated/docstrings/enriched_multi_ex.xlsx', separate_sheets=True)
>>> loaded = EnrichedFlattenedMetadata.from_excel('tests/generated/docstrings/enriched_multi_ex.xlsx',
...                                               schema_dir='schemas')
>>> len(loaded.base_rows)
4
- property rows: List[List]
Get the enriched data rows with Example and Description columns.
Each row is [number, key, value, example, description].
EXAMPLES:
>>> from mdstools.metadata.enriched_metadata import EnrichedFlattenedMetadata
>>> rows = [['1', 'system', '<nested>'], ['1.1', 'type', 'electrochemical']]
>>> enriched = EnrichedFlattenedMetadata(rows, schema_dir='schemas')
>>> enriched.rows[0][:3]
['1', 'system', '<nested>']
>>> len(enriched.rows[0])
5
- to_csv(filepath, **kwargs)
Save enriched metadata to a CSV file.
- Parameters:
filepath – Path to save CSV file
kwargs – Additional arguments passed to pandas.DataFrame.to_csv
EXAMPLES:
>>> from mdstools.metadata.enriched_metadata import EnrichedFlattenedMetadata
>>> import os
>>> rows = [['1', 'system', '<nested>'], ['1.1', 'type', 'electrochemical']]
>>> enriched = EnrichedFlattenedMetadata(rows, schema_dir='schemas')
>>> enriched.to_csv('tests/generated/docstrings/enriched_test.csv')
>>> os.path.exists('tests/generated/docstrings/enriched_test.csv')
True
- to_excel(filepath, separate_sheets=False, **kwargs)
Save enriched metadata to an Excel file.
- Parameters:
filepath – Path to save Excel file
separate_sheets – If True, create separate sheets for each top-level key
kwargs – Additional arguments passed to pandas.DataFrame.to_excel
When separate_sheets=True, the Excel file will have one sheet per top-level key in the nested structure, making it easier to navigate large metadata files.
EXAMPLES:
Single sheet export:
>>> from mdstools.metadata.enriched_metadata import EnrichedFlattenedMetadata
>>> import os
>>> rows = [['1', 'system', '<nested>'], ['1.1', 'type', 'electrochemical']]
>>> enriched = EnrichedFlattenedMetadata(rows, schema_dir='schemas')
>>> enriched.to_excel('tests/generated/docstrings/enriched_test.xlsx')
>>> os.path.exists('tests/generated/docstrings/enriched_test.xlsx')
True

Multi-sheet export:
>>> rows = [['1', 'system', '<nested>'], ['1.1', 'type', 'electrochemical'],
...         ['2', 'source', '<nested>'], ['2.1', 'author', 'test']]
>>> enriched = EnrichedFlattenedMetadata(rows, schema_dir='schemas')
>>> enriched.to_excel('tests/generated/docstrings/enriched_multi.xlsx', separate_sheets=True)
>>> os.path.exists('tests/generated/docstrings/enriched_multi.xlsx')
True
- to_latex(filepath=None, **kwargs) → str
Convert to LaTeX table format with enrichment columns.
- Parameters:
filepath – Optional path to save the LaTeX file
kwargs – Additional arguments passed to pandas.DataFrame.to_latex
- Returns:
LaTeX-formatted string
EXAMPLES:
>>> from mdstools.metadata.enriched_metadata import EnrichedFlattenedMetadata
>>> rows = [['1', 'system', '<nested>'], ['1.1', 'type', 'electrochemical']]
>>> enriched = EnrichedFlattenedMetadata(rows, schema_dir='schemas')
>>> latex = enriched.to_latex()
>>> 'tabular' in latex and 'Example' in latex and 'Description' in latex
True

Save to file:
>>> import os
>>> enriched.to_latex('tests/generated/docstrings/enriched_test.tex')
'...'
>>> os.path.exists('tests/generated/docstrings/enriched_test.tex')
True
- to_markdown(filepath=None, **kwargs) → str
Convert to Markdown table format with enrichment columns.
- Parameters:
filepath – Optional path to save the Markdown file
kwargs – Additional arguments passed to pandas.DataFrame.to_markdown
- Returns:
Markdown-formatted string
EXAMPLES:
>>> from mdstools.metadata.enriched_metadata import EnrichedFlattenedMetadata
>>> rows = [['1', 'system', '<nested>'], ['1.1', 'type', 'electrochemical']]
>>> enriched = EnrichedFlattenedMetadata(rows, schema_dir='schemas')
>>> markdown = enriched.to_markdown()
>>> 'Example' in markdown and 'Description' in markdown
True

Save to file:
>>> import os
>>> enriched.to_markdown('tests/generated/docstrings/enriched_test.md')
'...'
>>> os.path.exists('tests/generated/docstrings/enriched_test.md')
True
- to_pandas() → DataFrame
Convert to pandas DataFrame with enrichment columns.
- Returns:
DataFrame with columns [‘Number’, ‘Key’, ‘Value’, ‘Example’, ‘Description’]
EXAMPLES:
>>> from mdstools.metadata.enriched_metadata import EnrichedFlattenedMetadata
>>> rows = [['1', 'system', '<nested>'], ['1.1', 'type', 'electrochemical']]
>>> enriched = EnrichedFlattenedMetadata(rows, schema_dir='schemas')
>>> df = enriched.to_pandas()
>>> df.columns.tolist()
['Number', 'Key', 'Value', 'Example', 'Description']
>>> '<nested>' in df['Value'].tolist()
True
- unflatten(schema_path: str | None = None)
Convert back to nested metadata structure (ignores enrichment columns).
- Parameters:
schema_path – Optional JSON schema file path for validation
- Returns:
Metadata instance
EXAMPLES:
>>> from mdstools.metadata.enriched_metadata import EnrichedFlattenedMetadata
>>> rows = [['1', 'experiment', '<nested>'],
...         ['1.1', 'value', 1]]
>>> enriched = EnrichedFlattenedMetadata(rows, schema_dir='schemas')
>>> metadata = enriched.unflatten()
>>> metadata.data
{'experiment': {'value': 1}}