Validator

Schema validation utilities for checking metadata against JSON Schema files or Pydantic models generated from LinkML.

Three validation approaches are available:

  1. Remote validation via validate() – fetches the JSON Schema from the metadata-schema repository on GitHub and validates a dict or file against it. Convenience wrappers such as validate_svgdigitizer() are provided for each schema.

  2. Local JSON Schema validation via validate_metadata() – uses the jsonschema library to validate against a local JSON Schema file.

  3. Pydantic validation via validate_with_pydantic() – uses auto-generated Pydantic models from LinkML. Provides richer error messages and type coercion.

mdstools.schema.validator.validate(data: Any, schema: str = 'echemdb_package', version: str = None) → None

Validate metadata against a schema from the metadata-schema repository.

Fetches the JSON schema from https://raw.githubusercontent.com/echemdb/metadata-schema/<version>/schemas/ and validates data against it.
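The URL lookup described above can be sketched as follows. This is only an illustration, not the library's actual code: the helper name schema_url, the KNOWN_SCHEMAS set and the assumption that each schema is stored as <schema>.json under schemas/ are hypothetical, and the version must be passed explicitly here rather than defaulting to the installed package version.

```python
# Base of the raw-content URL quoted in the documentation above.
BASE_URL = "https://raw.githubusercontent.com/echemdb/metadata-schema"

# Schema names accepted by validate() according to its parameter list.
KNOWN_SCHEMAS = {
    "autotag",
    "minimum_echemdb",
    "source_data",
    "svgdigitizer",
    "echemdb_package",
    "svgdigitizer_package",
}


def schema_url(schema: str, version: str) -> str:
    """Build the remote URL for a schema (hypothetical helper).

    Assumes the file is named ``<schema>.json``; the real layout of the
    repository may differ.
    """
    if schema not in KNOWN_SCHEMAS:
        raise ValueError(f"Unknown schema: {schema!r}")
    return f"{BASE_URL}/{version}/schemas/{schema}.json"


print(schema_url("autotag", "main"))
```

Pinning version to a release tag rather than 'main' keeps the result reproducible, since a branch can change between runs.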

Parameters:
  • data – Metadata dict, or path to a YAML/JSON file.

  • schema – Schema name: one of 'autotag', 'minimum_echemdb', 'source_data', 'svgdigitizer', 'echemdb_package', 'svgdigitizer_package'.

  • version – Git tag or branch name. Defaults to the installed package version (i.e. the matching release tag). Pass 'main' to validate against the latest development schemas.

Raises:
  • FileNotFoundError – If data is a path that does not exist.

  • ValueError – If validation fails or schema is unknown.
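Since validate() reports problems only through the two exceptions listed above, checking many files usually means catching both and carrying on. The sketch below shows one way to do that. The helper collect_failures is hypothetical and not part of mdstools; it takes the validator as a parameter so that the block runs without network access, and a stub stands in for validate().

```python
from typing import Any, Callable, Dict, Iterable


def collect_failures(
    paths: Iterable[str],
    validator: Callable[[Any], None],
) -> Dict[str, str]:
    """Run ``validator`` on every path and map failing paths to messages.

    Hypothetical helper: only the two exception types documented for
    validate() are caught; anything else propagates to the caller.
    """
    failures: Dict[str, str] = {}
    for path in paths:
        try:
            validator(path)
        except (FileNotFoundError, ValueError) as exc:
            failures[path] = str(exc)
    return failures


def stub_validator(path: str) -> None:
    """Stand-in for validate(); fails for two made-up file names."""
    if path == "missing.yaml":
        raise FileNotFoundError("no such file: missing.yaml")
    if path == "bad.yaml":
        raise ValueError("Validation failed: 'curation' is not an object")


report = collect_failures(["ok.yaml", "missing.yaml", "bad.yaml"], stub_validator)
for name, message in report.items():
    print(f"{name}: {message}")
```

In real use the stub would be replaced by something like lambda p: validate(p, schema='autotag').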

EXAMPLES:

Validate a local YAML file against the remote autotag schema:

>>> from mdstools.schema.validator import validate
>>> validate('examples/file_schemas/autotag.yaml', schema='autotag')

Validate a dict:

>>> import yaml
>>> with open('examples/file_schemas/minimum_echemdb.yaml') as f:
...     data = yaml.safe_load(f)
>>> validate(data, schema='minimum_echemdb')

Validate against a specific version (git tag):

>>> validate('examples/file_schemas/autotag.yaml', schema='autotag', version='0.5.1')

Invalid data raises ValueError:

>>> try:
...     validate({'curation': 'not a dict'}, schema='autotag')
... except ValueError as e:
...     'validation failed' in str(e).lower()
True

mdstools.schema.validator.validate_autotag(data: Any, version: str = None) → None

Validate metadata against the autotag schema.

See validate() for parameter details.

EXAMPLES:

>>> from mdstools.schema.validator import validate_autotag
>>> validate_autotag('examples/file_schemas/autotag.yaml')

Local validation (always in sync with the working tree)::

    >>> import yaml
    >>> from mdstools.schema.validator import validate_metadata
    >>> validate_metadata(
    ...     yaml.safe_load(open('examples/file_schemas/autotag.yaml')),
    ...     'schemas/autotag.json')

mdstools.schema.validator.validate_echemdb_package(data: Any, version: str = None) → None

Validate metadata against the echemdb_package schema.

See validate() for parameter details.

EXAMPLES:

>>> from mdstools.schema.validator import validate_echemdb_package
>>> validate_echemdb_package('examples/file_schemas/echemdb_package.json')

Local validation (always in sync with the working tree)::

    >>> import json, contextlib, io
    >>> from pathlib import Path
    >>> from mdstools.schema.validate_examples import validate_data, build_package_registry
    >>> schema = json.load(open('schemas/echemdb_package.json'))
    >>> data = json.load(open('examples/file_schemas/echemdb_package.json'))
    >>> with contextlib.redirect_stdout(io.StringIO()):
    ...     registry = build_package_registry(Path('schemas'))
    >>> errors = validate_data(data, schema, registry=registry)
    >>> len(errors)
    0

mdstools.schema.validator.validate_metadata(data: Any, schema_path: str) → None

Validate metadata against a JSON Schema.

Loads the schema at schema_path, resolves $ref references relative to that file, and validates data against it. If validation fails, a ValueError is raised that reports up to 10 of the errors found.

Parameters:
  • data – Metadata object to validate

  • schema_path – Path to JSON Schema file

Raises:
  • FileNotFoundError – If the schema file does not exist

  • ValueError – If validation fails
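The capped error report mentioned above can be pictured with the following sketch. It is a minimal illustration under stated assumptions, not the library's implementation: the function raise_if_errors, the constant MAX_REPORTED and the message layout are all hypothetical, and plain strings stand in for the error objects a JSON Schema validator would produce.

```python
from typing import Iterable

# Cap taken from the description above ("up to 10 are reported").
MAX_REPORTED = 10


def raise_if_errors(errors: Iterable[str], limit: int = MAX_REPORTED) -> None:
    """Raise ValueError summarising at most ``limit`` error messages.

    Hypothetical helper; the real message format is not documented here.
    """
    collected = list(errors)
    if not collected:
        return  # valid data passes silently
    shown = collected[:limit]
    lines = [f"  - {message}" for message in shown]
    hidden = len(collected) - len(shown)
    if hidden > 0:
        lines.append(f"  ... and {hidden} more")
    raise ValueError("Validation failed:\n" + "\n".join(lines))


try:
    raise_if_errors(f"error {i}" for i in range(12))
except ValueError as exc:
    print(exc)
```

Capping the report keeps the exception readable when a single structural mistake triggers many individual errors.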

EXAMPLES:

Validating correct metadata passes silently:

>>> from mdstools.schema.validator import validate_metadata
>>> from mdstools.metadata.metadata import Metadata
>>> data = Metadata.from_yaml('examples/file_schemas/autotag.yaml').data
>>> validate_metadata(data, 'schemas/autotag.json')

Validation errors raise ValueError with details:

>>> invalid_data = {'curation': 'not a dict'}
>>> try:
...     validate_metadata(invalid_data, 'schemas/autotag.json')
... except ValueError as e:
...     'validation failed' in str(e).lower()
True

Missing schema file raises FileNotFoundError:

>>> try:
...     validate_metadata({}, 'nonexistent_schema.json')
... except FileNotFoundError:
...     print('Schema file not found')
Schema file not found

mdstools.schema.validator.validate_minimum_echemdb(data: Any, version: str = None) → None

Validate metadata against the minimum_echemdb schema.

See validate() for parameter details.

EXAMPLES:

>>> from mdstools.schema.validator import validate_minimum_echemdb
>>> validate_minimum_echemdb('examples/file_schemas/minimum_echemdb.yaml')

Local validation (always in sync with the working tree)::

    >>> import yaml
    >>> from mdstools.schema.validator import validate_metadata
    >>> validate_metadata(
    ...     yaml.safe_load(open('examples/file_schemas/minimum_echemdb.yaml')),
    ...     'schemas/minimum_echemdb.json')

mdstools.schema.validator.validate_source_data(data: Any, version: str = None) → None

Validate metadata against the source_data schema.

See validate() for parameter details.

EXAMPLES:

>>> from mdstools.schema.validator import validate_source_data
>>> validate_source_data('examples/file_schemas/source_data.yaml', version='main')

Local validation (always in sync with the working tree)::

    >>> import yaml
    >>> from mdstools.schema.validator import validate_metadata
    >>> validate_metadata(
    ...     yaml.safe_load(open('examples/file_schemas/source_data.yaml')),
    ...     'schemas/source_data.json')

mdstools.schema.validator.validate_svgdigitizer(data: Any, version: str = None) → None

Validate metadata against the svgdigitizer schema.

See validate() for parameter details.

EXAMPLES:

>>> from mdstools.schema.validator import validate_svgdigitizer
>>> validate_svgdigitizer('examples/file_schemas/svgdigitizer.yaml')

Local validation (always in sync with the working tree)::

    >>> import yaml
    >>> from mdstools.schema.validator import validate_metadata
    >>> validate_metadata(
    ...     yaml.safe_load(open('examples/file_schemas/svgdigitizer.yaml')),
    ...     'schemas/svgdigitizer.json')

mdstools.schema.validator.validate_svgdigitizer_package(data: Any, version: str = None) → None

Validate metadata against the svgdigitizer_package schema.

See validate() for parameter details.

EXAMPLES:

>>> from mdstools.schema.validator import validate_svgdigitizer_package
>>> validate_svgdigitizer_package('examples/file_schemas/svgdigitizer_package.json')

Local validation (always in sync with the working tree)::

    >>> import json, contextlib, io
    >>> from pathlib import Path
    >>> from mdstools.schema.validate_examples import validate_data, build_package_registry
    >>> schema = json.load(open('schemas/svgdigitizer_package.json'))
    >>> data = json.load(open('examples/file_schemas/svgdigitizer_package.json'))
    >>> with contextlib.redirect_stdout(io.StringIO()):
    ...     registry = build_package_registry(Path('schemas'))
    >>> errors = validate_data(data, schema, registry=registry)
    >>> len(errors)
    0

mdstools.schema.validator.validate_with_pydantic(data: Any, schema_name: str) → Any

Validate metadata using auto-generated Pydantic models from LinkML.

Returns the validated Pydantic model instance on success. Raises ValueError on validation failure with detailed error messages.

Parameters:
  • data – Metadata dict to validate

  • schema_name – Schema name (e.g. 'minimum_echemdb', 'autotag')

Returns:

Validated Pydantic model instance

Raises:
  • ValueError – If validation fails or schema_name is unknown

EXAMPLES:

Validating correct metadata returns a Pydantic model:

>>> import yaml
>>> from mdstools.schema.validator import validate_with_pydantic
>>> with open('examples/file_schemas/minimum_echemdb.yaml') as f:
...     data = yaml.safe_load(f)
>>> model = validate_with_pydantic(data, 'minimum_echemdb')
>>> model.source.citationKey
'engstfeld_2018_polycrystalline_17743'

Validation errors raise ValueError:

>>> try:
...     validate_with_pydantic({'curation': 'not a dict'}, 'minimum_echemdb')
... except ValueError as e:
...     'validation failed' in str(e).lower()
True

Unknown schema name raises ValueError:

>>> try:
...     validate_with_pydantic({}, 'nonexistent')
... except ValueError as e:
...     print('Unknown schema')
Unknown schema