Validator
Schema validation utilities for checking metadata against JSON Schema files or Pydantic models generated from LinkML.
Three validation approaches are available:
- Remote validation via validate() – fetches the JSON Schema from the metadata-schema repository on GitHub and validates a dict or file against it. Convenience wrappers such as validate_svgdigitizer() are provided for each schema.
- Local JSON Schema validation via validate_metadata() – uses the jsonschema library against a local JSON Schema file.
- Pydantic validation via validate_with_pydantic() – uses auto-generated Pydantic models from LinkML. Provides richer error messages and type coercion.
- mdstools.schema.validator.validate(data: Any, schema: str = 'echemdb_package', version: str = None) -> None
Validate metadata against a schema from the metadata-schema repository.
Fetches the JSON schema from https://raw.githubusercontent.com/echemdb/metadata-schema/<version>/schemas/ and validates data against it.
- Parameters:
data – Metadata dict, or path to a YAML/JSON file.
schema – Schema name: one of 'autotag', 'minimum_echemdb', 'source_data', 'svgdigitizer', 'echemdb_package', 'svgdigitizer_package'.
version – Git tag or branch name. Defaults to the installed package version (i.e. the matching release tag). Pass 'main' to validate against the latest development schemas.
- Raises:
FileNotFoundError – If data is a path that does not exist.
ValueError – If validation fails or schema is unknown.
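The URL pattern and the schema names above suggest how a request URL could be assembled. The following is a minimal sketch, not the library's implementation: the helper build_schema_url is hypothetical, and the '<schema>.json' file name is an assumption inferred from the local schema paths used elsewhere on this page.

```python
# Hypothetical helper for illustration; not part of mdstools.
BASE_URL = "https://raw.githubusercontent.com/echemdb/metadata-schema"

# Schema names as listed in the parameter description above.
KNOWN_SCHEMAS = (
    "autotag",
    "minimum_echemdb",
    "source_data",
    "svgdigitizer",
    "echemdb_package",
    "svgdigitizer_package",
)


def build_schema_url(schema: str, version: str) -> str:
    """Return the raw GitHub URL for a schema at a given tag or branch."""
    if schema not in KNOWN_SCHEMAS:
        raise ValueError(f"Unknown schema: {schema!r}")
    # Assumption: each schema is stored as '<schema>.json' under 'schemas/'.
    return f"{BASE_URL}/{version}/schemas/{schema}.json"


print(build_schema_url("autotag", "main"))
```

Rejecting unknown names before any request is made mirrors the documented ValueError for an unknown schema.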
EXAMPLES:
Validate a local YAML file against the remote autotag schema:
>>> from mdstools.schema.validator import validate
>>> validate('examples/file_schemas/autotag.yaml', schema='autotag')
Validate a dict:
>>> import yaml
>>> with open('examples/file_schemas/minimum_echemdb.yaml') as f:
...     data = yaml.safe_load(f)
>>> validate(data, schema='minimum_echemdb')
Validate against a specific version (git tag):
>>> validate('examples/file_schemas/autotag.yaml', schema='autotag', version='0.5.1')
Invalid data raises ValueError:
>>> try:
...     validate({'curation': 'not a dict'}, schema='autotag')
... except ValueError as e:
...     'validation failed' in str(e).lower()
True
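Because validate() reports problems through exceptions, checking several inputs in one run means catching FileNotFoundError and ValueError per input. A minimal sketch of that pattern follows. The names collect_failures and fake_validate are hypothetical, and the stand-in validator exists only so the snippet runs without network access; in practice validate would be passed instead.

```python
from typing import Any, Callable, Dict, Mapping


def collect_failures(
    named_items: Mapping[str, Any], validate_fn: Callable[[Any], None]
) -> Dict[str, str]:
    """Run validate_fn on every item and map failing names to their error text."""
    failures: Dict[str, str] = {}
    for name, item in named_items.items():
        try:
            validate_fn(item)
        except (FileNotFoundError, ValueError) as exc:
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


def fake_validate(data: Any) -> None:
    """Stand-in for validate(): treats strings as missing files, checks one key."""
    if isinstance(data, str):
        raise FileNotFoundError(f"No such file: {data}")
    if not isinstance(data.get("curation"), dict):
        raise ValueError("Validation failed: 'curation' must be a mapping")


failures = collect_failures(
    {
        "good": {"curation": {}},
        "bad": {"curation": "not a dict"},
        "missing": "does/not/exist.yaml",
    },
    fake_validate,
)
print(failures)
```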
- mdstools.schema.validator.validate_autotag(data: Any, version: str = None) -> None
Validate metadata against the autotag schema.
See validate() for parameter details.
EXAMPLES:
>>> from mdstools.schema.validator import validate_autotag
>>> validate_autotag('examples/file_schemas/autotag.yaml')
Local validation (always in sync with the working tree):
>>> import yaml
>>> from mdstools.schema.validator import validate_metadata
>>> validate_metadata(
...     yaml.safe_load(open('examples/file_schemas/autotag.yaml')),
...     'schemas/autotag.json')
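The per-schema wrappers share the signature of validate() minus the schema argument, which suggests they simply fix that argument. The sketch below illustrates this with functools.partial. It is an assumption about the design, not the actual source, and the validate function shown is a stand-in that records its arguments instead of validating.

```python
from functools import partial
from typing import Any, List, Optional, Tuple

# Records (schema, version) pairs so the delegation can be inspected.
calls: List[Tuple[str, Optional[str]]] = []


def validate(
    data: Any, schema: str = "echemdb_package", version: Optional[str] = None
) -> None:
    """Stand-in with the documented signature; only records the call."""
    calls.append((schema, version))


# Assumption: a wrapper is validate() with the schema name fixed.
validate_autotag = partial(validate, schema="autotag")

validate_autotag({"curation": {}}, version="main")
print(calls)
```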
- mdstools.schema.validator.validate_echemdb_package(data: Any, version: str = None) -> None
Validate metadata against the echemdb_package schema.
See validate() for parameter details.
EXAMPLES:
>>> from mdstools.schema.validator import validate_echemdb_package
>>> validate_echemdb_package('examples/file_schemas/echemdb_package.json')
Local validation (always in sync with the working tree):
>>> import json, contextlib, io
>>> from pathlib import Path
>>> from mdstools.schema.validate_examples import validate_data, build_package_registry
>>> schema = json.load(open('schemas/echemdb_package.json'))
>>> data = json.load(open('examples/file_schemas/echemdb_package.json'))
>>> with contextlib.redirect_stdout(io.StringIO()):
...     registry = build_package_registry(Path('schemas'))
>>> errors = validate_data(data, schema, registry=registry)
>>> len(errors)
0
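The local example wraps build_package_registry() in contextlib.redirect_stdout, presumably because the function prints progress output that would otherwise appear in the doctest. A standalone illustration of that pattern, using a hypothetical noisy_build function in place of the real one:

```python
import contextlib
import io


def noisy_build() -> dict:
    """Hypothetical stand-in for a function that prints while it works."""
    print("loading schemas ...")
    return {"loaded": 2}


buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    result = noisy_build()

# The return value is unaffected; the printed text ends up in the buffer.
print(result)
print(repr(buffer.getvalue()))
```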
- mdstools.schema.validator.validate_metadata(data: Any, schema_path: str) -> None
Validate metadata against a JSON Schema.
Loads the schema at schema_path, resolves $ref references relative to the file, and validates data against it. Raises on the first batch of errors (up to 10 are reported).
- Parameters:
data – Metadata object to validate
schema_path – Path to JSON Schema file
- Raises:
FileNotFoundError – If the schema file does not exist
ValueError – If validation fails
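Resolving $ref references "relative to the file" amounts to treating the schema file's location as the base URI. The sketch below shows that idea with the standard library only. The helper resolve_ref and the referenced file name curation.json are hypothetical, and the real function may delegate this step to the jsonschema library.

```python
from pathlib import Path
from urllib.parse import urljoin


def resolve_ref(schema_path: str, ref: str) -> str:
    """Resolve a relative $ref against the location of the schema file."""
    # The absolute file URI of the schema serves as the base for relative refs.
    base_uri = Path(schema_path).resolve().as_uri()
    return urljoin(base_uri, ref)


# 'curation.json' is a made-up sibling file used only for illustration.
resolved = resolve_ref("schemas/autotag.json", "curation.json#/definitions/Curation")
print(resolved)
```

A reference to a sibling file therefore resolves inside the same schemas directory, regardless of the current working directory of the caller.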
EXAMPLES:
Validating correct metadata passes silently:
>>> from mdstools.schema.validator import validate_metadata
>>> from mdstools.metadata.metadata import Metadata
>>> data = Metadata.from_yaml('examples/file_schemas/autotag.yaml').data
>>> validate_metadata(data, 'schemas/autotag.json')
Validation errors raise ValueError with details:
>>> invalid_data = {'curation': 'not a dict'}
>>> try:
...     validate_metadata(invalid_data, 'schemas/autotag.json')
... except ValueError as e:
...     'validation failed' in str(e).lower()
True
Missing schema file raises FileNotFoundError:
>>> try:
...     validate_metadata({}, 'nonexistent_schema.json')
... except FileNotFoundError:
...     print('Schema file not found')
Schema file not found
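The description above mentions that up to 10 errors are reported. One way such a cap could be applied when building the ValueError message is sketched below. The helper raise_on_errors and the exact message layout are assumptions; only the "validation failed" wording is taken from the doctests on this page.

```python
from typing import List


def raise_on_errors(errors: List[str], limit: int = 10) -> None:
    """Raise ValueError summarising at most `limit` of the collected messages."""
    if not errors:
        return
    lines = [f"Validation failed with {len(errors)} error(s):"]
    lines += [f"  - {message}" for message in errors[:limit]]
    if len(errors) > limit:
        # Mention how many messages were left out of the report.
        lines.append(f"  ... and {len(errors) - limit} more")
    raise ValueError("\n".join(lines))


try:
    raise_on_errors([f"field_{i}: invalid" for i in range(12)])
except ValueError as exc:
    report = str(exc)
    print(report)
```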
- mdstools.schema.validator.validate_minimum_echemdb(data: Any, version: str = None) -> None
Validate metadata against the minimum_echemdb schema.
See validate() for parameter details.
EXAMPLES:
>>> from mdstools.schema.validator import validate_minimum_echemdb
>>> validate_minimum_echemdb('examples/file_schemas/minimum_echemdb.yaml')
Local validation (always in sync with the working tree):
>>> import yaml
>>> from mdstools.schema.validator import validate_metadata
>>> validate_metadata(
...     yaml.safe_load(open('examples/file_schemas/minimum_echemdb.yaml')),
...     'schemas/minimum_echemdb.json')
- mdstools.schema.validator.validate_source_data(data: Any, version: str = None) -> None
Validate metadata against the source_data schema.
See validate() for parameter details.
EXAMPLES:
>>> from mdstools.schema.validator import validate_source_data
>>> validate_source_data('examples/file_schemas/source_data.yaml', version='main')
Local validation (always in sync with the working tree):
>>> import yaml
>>> from mdstools.schema.validator import validate_metadata
>>> validate_metadata(
...     yaml.safe_load(open('examples/file_schemas/source_data.yaml')),
...     'schemas/source_data.json')
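The example above passes version='main' explicitly. When version is omitted, validate() is documented to fall back to the installed package version. A rough sketch of how such a default could be resolved is shown below. The helper resolve_version is hypothetical, and the distribution name 'mdstools' is an assumption based on the import name.

```python
from importlib.metadata import PackageNotFoundError, version as installed_version
from typing import Optional


def resolve_version(version: Optional[str], dist_name: str = "mdstools") -> str:
    """Return the explicit tag or branch, or fall back to the installed release."""
    if version is not None:
        return version
    # Assumption: the distribution name matches the import name.
    return installed_version(dist_name)


print(resolve_version("main"))

try:
    resolve_version(None, dist_name="surely-not-installed-package")
except PackageNotFoundError:
    print("distribution not installed")
```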
- mdstools.schema.validator.validate_svgdigitizer(data: Any, version: str = None) -> None
Validate metadata against the svgdigitizer schema.
See validate() for parameter details.
EXAMPLES:
>>> from mdstools.schema.validator import validate_svgdigitizer
>>> validate_svgdigitizer('examples/file_schemas/svgdigitizer.yaml')
Local validation (always in sync with the working tree):
>>> import yaml
>>> from mdstools.schema.validator import validate_metadata
>>> validate_metadata(
...     yaml.safe_load(open('examples/file_schemas/svgdigitizer.yaml')),
...     'schemas/svgdigitizer.json')
- mdstools.schema.validator.validate_svgdigitizer_package(data: Any, version: str = None) -> None
Validate metadata against the svgdigitizer_package schema.
See validate() for parameter details.
EXAMPLES:
>>> from mdstools.schema.validator import validate_svgdigitizer_package
>>> validate_svgdigitizer_package('examples/file_schemas/svgdigitizer_package.json')
Local validation (always in sync with the working tree):
>>> import json, contextlib, io
>>> from pathlib import Path
>>> from mdstools.schema.validate_examples import validate_data, build_package_registry
>>> schema = json.load(open('schemas/svgdigitizer_package.json'))
>>> data = json.load(open('examples/file_schemas/svgdigitizer_package.json'))
>>> with contextlib.redirect_stdout(io.StringIO()):
...     registry = build_package_registry(Path('schemas'))
>>> errors = validate_data(data, schema, registry=registry)
>>> len(errors)
0
- mdstools.schema.validator.validate_with_pydantic(data: Any, schema_name: str) -> Any
Validate metadata using auto-generated Pydantic models from LinkML.
Returns the validated Pydantic model instance on success. Raises ValueError on validation failure with detailed error messages.
- Parameters:
data – Metadata dict to validate
schema_name – Schema name (e.g. 'minimum_echemdb', 'autotag')
- Returns:
Validated Pydantic model instance
- Raises:
ValueError – If validation fails or schema_name is unknown
EXAMPLES:
Validating correct metadata returns a Pydantic model:
>>> import yaml
>>> from mdstools.schema.validator import validate_with_pydantic
>>> with open('examples/file_schemas/minimum_echemdb.yaml') as f:
...     data = yaml.safe_load(f)
>>> model = validate_with_pydantic(data, 'minimum_echemdb')
>>> model.source.citationKey
'engstfeld_2018_polycrystalline_17743'
Validation errors raise ValueError:
>>> try:
...     validate_with_pydantic({'curation': 'not a dict'}, 'minimum_echemdb')
... except ValueError as e:
...     'validation failed' in str(e).lower()
True
Unknown schema name raises ValueError:
>>> try:
...     validate_with_pydantic({}, 'nonexistent')
... except ValueError as e:
...     print('Unknown schema')
Unknown schema
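Both failure modes shown above surface as ValueError: an unknown schema name and data that does not fit the model. The sketch below illustrates that dispatch-and-wrap pattern with the standard library only. The registry MODELS, the Source class and validate_with_model are hypothetical stand-ins, and a plain dataclass is used instead of a Pydantic model, so no type coercion or field-level validation takes place here.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Source:
    """Hypothetical stand-in for a generated model class."""

    citationKey: str


# Hypothetical registry mapping schema names to model classes.
MODELS = {"source": Source}


def validate_with_model(data: Dict[str, Any], schema_name: str) -> Any:
    """Instantiate the model registered under schema_name, wrapping failures."""
    try:
        model_cls = MODELS[schema_name]
    except KeyError:
        raise ValueError(f"Unknown schema: {schema_name!r}") from None
    try:
        # Missing or unexpected keys raise TypeError in a dataclass constructor.
        return model_cls(**data)
    except TypeError as exc:
        raise ValueError(f"Validation failed: {exc}") from exc


model = validate_with_model(
    {"citationKey": "engstfeld_2018_polycrystalline_17743"}, "source"
)
print(model.citationKey)
```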