Schema Enricher
Schema enricher for adding descriptions and examples from JSON Schema to flattened data.
- class mdstools.schema.enricher.SchemaEnricher(schema_dir: str)
Enriches flattened YAML data with descriptions and examples from JSON Schema files.
This class loads JSON Schema files and uses them to add metadata (descriptions and examples) to flattened data structures. It automatically prefers resolved schemas where all $ref references have been inlined.
EXAMPLES:
Basic usage with schemas:
>>> import os
>>> os.makedirs('tests/generated', exist_ok=True)
>>> from mdstools.schema.enricher import SchemaEnricher
>>> enricher = SchemaEnricher('schemas')
>>> 'curation' in enricher.schema_cache
True
Enrich a single field path:
>>> desc, example = enricher.enrich_row('curation.process.role', 'curator')
>>> 'person' in desc.lower()
True
>>> example in ['experimentalist', 'curator', 'reviewer', 'supervisor']
True
Enrich with nested objects:
>>> desc, example = enricher.enrich_row('system.type', 'electrochemical')
>>> 'system' in desc.lower()
True
>>> example == 'electrochemical'
True
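The class description mentions resolved schemas in which all $ref references have been inlined. As a rough illustration of what such inlining means, here is a minimal sketch. The function name `inline_refs` and the toy schema are hypothetical and are not part of mdstools. It assumes only local `#/...` references, no reference cycles, and no JSON Pointer escape sequences.

```python
from typing import Any


def inline_refs(node: Any, root: dict) -> Any:
    """Return a copy of `node` with local '#/...' $ref pointers replaced by their targets.

    Hypothetical helper for illustration; assumes no cyclic references.
    """
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith("#/"):
            # Follow the pointer segment by segment from the schema root.
            target: Any = root
            for part in ref[2:].split("/"):
                target = target[part]
            # Keywords written next to $ref override those of the target.
            siblings = {k: v for k, v in node.items() if k != "$ref"}
            return inline_refs({**target, **siblings}, root)
        return {k: inline_refs(v, root) for k, v in node.items()}
    if isinstance(node, list):
        return [inline_refs(item, root) for item in node]
    return node


# Toy schema (invented for this sketch).
schema = {
    "definitions": {
        "Role": {
            "type": "string",
            "description": "Role of a person.",
            "examples": ["curator"],
        },
    },
    "properties": {"role": {"$ref": "#/definitions/Role"}},
}

resolved = inline_refs(schema, schema)
print(resolved["properties"]["role"]["description"])
```

With references inlined like this, a lookup can read `description` and `examples` directly at each property without following pointers.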
- enrich_flattened_data(flattened_rows: list) → list
Enrich flattened YAML rows with description and example columns.
Takes a list of [number, key, value] rows and returns a list of [number, key, value, example, description] rows by looking up each field in the loaded JSON schemas.
- Parameters:
flattened_rows – List of [level, key, value] rows
- Returns:
List of [level, key, value, example, description] rows
EXAMPLES:
Enriching curation metadata:
>>> from mdstools.schema.enricher import SchemaEnricher
>>> enricher = SchemaEnricher('schemas')
>>> rows = [
...     ['1', 'curation', '<nested>'],
...     ['1.1', 'process', '<nested>'],
...     ['1.1.i1.1', 'role', 'curator'],
...     ['1.1.i1.2', 'name', 'Jane Doe'],
... ]
>>> enriched = enricher.enrich_flattened_data(rows)
Each enriched row has 5 elements: [number, key, value, example, description]:
>>> len(enriched[0])
5
Leaf fields get descriptions and examples from the schema:
>>> enriched[2]  # 'role' field
['1.1.i1.1', 'role', 'curator', 'experimentalist', 'Role of a person in the data curation process.']
Non-leaf <nested> rows may get descriptions too:
>>> enriched[1][4]  # 'process' description
'List of people involved in creating, recording, or curating this data.'
Fields without schema information get empty strings:
>>> rows_unknown = [['1', 'unknown_field', 'value']]
>>> enriched_unknown = enricher.enrich_flattened_data(rows_unknown)
>>> enriched_unknown[0][3:5]
['', '']
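To look up each row in a schema, the hierarchical row numbers have to be turned into dot-separated key paths such as 'curation.process.role'. The sketch below shows one way this could be done. It is not the library's implementation. The function name `key_paths` is hypothetical, and the sketch assumes that number components starting with `i` (as in `i1`) mark list items and do not add a key level.

```python
from typing import List


def key_paths(rows: List[list]) -> List[str]:
    """Derive a dot-separated key path for each [number, key, value] row.

    Hypothetical helper; assumes 'i<n>' components denote list items.
    """
    stack: List[str] = []  # keys of the ancestors of the current row
    paths: List[str] = []
    for number, key, _value in rows:
        # Nesting depth = number of components that are not list-item markers.
        depth = sum(1 for part in number.split(".") if not part.startswith("i"))
        # Drop keys from deeper or sibling rows, then push this row's key.
        del stack[depth - 1:]
        stack.append(key)
        paths.append(".".join(stack))
    return paths


rows = [
    ["1", "curation", "<nested>"],
    ["1.1", "process", "<nested>"],
    ["1.1.i1.1", "role", "curator"],
    ["1.1.i1.2", "name", "Jane Doe"],
]
for path in key_paths(rows):
    print(path)
```

Under these assumptions, the third row maps to 'curation.process.role', the same path used in the enrich_row examples.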
- enrich_row(key_path: str, current_value: Any) → Tuple[str | None, str | None]
Get description and example for a specific field path.
Splits the dot-separated path, locates the matching top-level schema and walks down the definition tree to find the leaf metadata.
- Parameters:
key_path – Dot-separated path of keys (e.g., “curation.process.role”)
current_value – The current value in the YAML
- Returns:
Tuple of (description, example) or (None, None)
EXAMPLES:
>>> from mdstools.schema.enricher import SchemaEnricher
>>> enricher = SchemaEnricher('schemas')
Look up a deeply nested field:
>>> desc, example = enricher.enrich_row('curation.process.role', 'curator')
>>> 'person' in desc.lower()
True
>>> example in ['experimentalist', 'curator', 'reviewer', 'supervisor']
True
Top-level key without remaining path returns (None, None):
>>> enricher.enrich_row('curation', '<nested>')
(None, None)
Unknown field returns (None, None):
>>> enricher.enrich_row('nonexistent.field', 'value')
(None, None)
Empty or missing path:
>>> enricher.enrich_row('', 'value')
(None, None)
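The lookup described for enrich_row (split the path, find the top-level schema, walk down the definition tree) can be illustrated with a small standalone sketch. This is an approximation under stated assumptions, not the actual method body. The function `lookup`, the `schemas` dictionary and its contents are hypothetical. The sketch assumes resolved schemas that nest fields under `properties`, with arrays described by a single `items` object, and it ignores the current value.

```python
from typing import Any, Optional, Tuple


def lookup(schemas: dict, key_path: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (description, example) for a dot-separated path, or (None, None).

    Hypothetical helper for illustration only.
    """
    parts = [p for p in key_path.split(".") if p]
    if len(parts) < 2:
        # Empty path, or a top-level key with no remaining path.
        return None, None
    node: Any = schemas.get(parts[0])  # matching top-level schema, if any
    for part in parts[1:]:
        if not isinstance(node, dict):
            return None, None
        # Arrays describe their elements under "items"; step into them.
        if node.get("type") == "array" and isinstance(node.get("items"), dict):
            node = node["items"]
        node = node.get("properties", {}).get(part)
    if not isinstance(node, dict):
        return None, None
    examples = node.get("examples") or [None]
    example = examples[0]
    return node.get("description"), None if example is None else str(example)


# Toy schema cache (invented for this sketch).
schemas = {
    "curation": {
        "type": "object",
        "properties": {
            "process": {
                "type": "array",
                "description": "People involved.",
                "items": {
                    "type": "object",
                    "properties": {
                        "role": {
                            "type": "string",
                            "description": "Role of a person.",
                            "examples": ["curator"],
                        },
                    },
                },
            },
        },
    },
}

print(lookup(schemas, "curation.process.role"))
print(lookup(schemas, "nonexistent.field"))
```

Returning (None, None) at every failed step keeps the caller simple: it can substitute empty strings for missing metadata without handling exceptions.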