Validator

Schema validation utilities for checking metadata against JSON Schema files or Pydantic models generated from LinkML.

Three validation approaches are available:

  1. Remote validation via validate() – fetches the JSON Schema from the metadata-schema repository on GitHub and validates a dict or file against it. Convenience wrappers such as validate_svgdigitizer() are provided for each schema.

  2. Local JSON Schema validation via validate_metadata() – uses the jsonschema library against a local JSON Schema file.

  3. Pydantic validation via validate_with_pydantic() – uses auto-generated Pydantic models from LinkML. Provides richer error messages and type coercion.

mdstools.schema.validator.load_yaml_metadata(stream: Any) → Any

Load YAML metadata with dates and timestamps as plain strings.

A drop-in replacement for yaml.safe_load for metadata documents: unquoted YAML dates must reach the validator as strings rather than datetime.date objects, which fail validation of string-typed schema fields.

EXAMPLES:

>>> from mdstools.schema.validator import load_yaml_metadata
>>> load_yaml_metadata("date: 2021-07-09")
{'date': '2021-07-09'}

>>> import yaml
>>> yaml.safe_load("date: 2021-07-09")
{'date': datetime.date(2021, 7, 9)}
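
For a document that has already been loaded with yaml.safe_load, a similar effect can be approximated afterwards by converting date objects back to strings. The following is a minimal sketch of that alternative, not the way load_yaml_metadata works (it keeps the strings at load time); stringify_dates is a hypothetical helper and not part of mdstools. Note that a full timestamp is rendered with isoformat(), which may not reproduce the text originally written in the YAML file.

```python
import datetime
from typing import Any


def stringify_dates(obj: Any) -> Any:
    """Recursively replace date/datetime objects with ISO 8601 strings.

    Hypothetical helper for illustration; not part of mdstools.
    """
    # datetime.datetime is a subclass of datetime.date, so one check covers both.
    if isinstance(obj, datetime.date):
        return obj.isoformat()
    if isinstance(obj, dict):
        return {key: stringify_dates(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [stringify_dates(item) for item in obj]
    return obj


loaded = {"date": datetime.date(2021, 7, 9)}  # what yaml.safe_load returns above
print(stringify_dates(loaded))  # {'date': '2021-07-09'}
```

Converting after the fact is lossy for timestamps, which is one reason to prefer a loader that never builds the date objects in the first place.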
mdstools.schema.validator.validate(data: Any, schema: str = 'echemdb_package', version: str = None) → None

Validate metadata against a schema from the metadata-schema repository.

Fetches the JSON schema from https://raw.githubusercontent.com/echemdb/metadata-schema/<version>/schemas/ and validates data against it.

Parameters:
  • data – Metadata dict, or path to a YAML/JSON file.

  • schema – Schema name — one of 'autotag', 'minimum_echemdb', 'source_data', 'svgdigitizer', 'echemdb_package', 'svgdigitizer_package'.

  • version – Git tag or branch name. Defaults to the installed package version (i.e. the matching release tag). Pass 'main' to validate against the latest development schemas.

Raises:
  • FileNotFoundError – If data is a path that does not exist.

  • ValueError – If validation fails or schema is unknown.

EXAMPLES:

Validate a local YAML file against the remote autotag schema:

>>> from mdstools.schema.validator import validate
>>> validate('examples/file_schemas/autotag.yaml', schema='autotag')

Validate a dict:

>>> import yaml
>>> with open('examples/file_schemas/minimum_echemdb.yaml') as f:
...     data = yaml.safe_load(f)
>>> validate(data, schema='minimum_echemdb')

Validate against a specific version (git tag). The current example uses fields added after 0.5.1 (e.g. operationParameters), so it does not validate against that older schema:

>>> try:
...     validate('examples/file_schemas/autotag.yaml', schema='autotag', version='0.5.1')
... except ValueError as e:
...     'validation failed' in str(e).lower()
True

Invalid data raises ValueError:

>>> try:
...     validate({'curation': 'not a dict'}, schema='autotag')
... except ValueError as e:
...     'validation failed' in str(e).lower()
True
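
The way schema and version combine into the fetched URL can be pictured with a small sketch. It rests on assumptions: schema_url is a hypothetical helper, the file name <schema>.json is inferred from the local schemas/*.json files used elsewhere on this page, and the installed package version is passed in explicitly instead of being looked up.

```python
from typing import Optional

# Schema names listed for the schema parameter of validate().
KNOWN_SCHEMAS = (
    "autotag",
    "minimum_echemdb",
    "source_data",
    "svgdigitizer",
    "echemdb_package",
    "svgdigitizer_package",
)
BASE_URL = "https://raw.githubusercontent.com/echemdb/metadata-schema"


def schema_url(schema: str, version: Optional[str], installed_version: str) -> str:
    """Build the raw GitHub URL for a schema (hypothetical helper)."""
    if schema not in KNOWN_SCHEMAS:
        raise ValueError(f"Unknown schema: {schema!r}")
    # version=None falls back to the release tag matching the installed package.
    ref = version if version is not None else installed_version
    # Assumption: the remote file is named <schema>.json.
    return f"{BASE_URL}/{ref}/schemas/{schema}.json"


print(schema_url("autotag", "main", installed_version="0.5.1"))
```

Pinning version to a tag keeps validation results reproducible, while 'main' follows the development schemas and may change between runs.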
mdstools.schema.validator.validate_autotag(data: Any, version: str = None) → None

Validate metadata against the autotag schema.

See validate() for parameter details.

EXAMPLES:

>>> from mdstools.schema.validator import validate_autotag
>>> validate_autotag('examples/file_schemas/autotag.yaml')

Local validation (always in sync with the working tree)::

    >>> import yaml
    >>> from mdstools.schema.validator import validate_metadata
    >>> validate_metadata(
    ...     yaml.safe_load(open('examples/file_schemas/autotag.yaml')),
    ...     'schemas/autotag.json')
mdstools.schema.validator.validate_echemdb_package(data: Any, version: str = None) → None

Validate metadata against the echemdb_package schema.

See validate() for parameter details.

EXAMPLES:

>>> from mdstools.schema.validator import validate_echemdb_package
>>> validate_echemdb_package('examples/file_schemas/echemdb_package.json')

Local validation (always in sync with the working tree)::

    >>> import json, contextlib, io
    >>> from pathlib import Path
    >>> from mdstools.schema.validate_examples import validate_data, build_package_registry
    >>> schema = json.load(open('schemas/echemdb_package.json'))
    >>> data = json.load(open('examples/file_schemas/echemdb_package.json'))
    >>> with contextlib.redirect_stdout(io.StringIO()):
    ...     registry = build_package_registry(Path('schemas'))
    >>> errors = validate_data(data, schema, registry=registry)
    >>> len(errors)
    0
mdstools.schema.validator.validate_instrument_references(data: Any) → list

Check that every control.instrument resolves to an instrument name.

Within each experimental block, every operationParameters control block references an instrument by name; that name must match the name of an entry in the same block’s instrumentation list. JSON Schema cannot express this cross-reference, so it is checked here.

Parameters:

data – Metadata dict (object, file-schema, or package document).

Returns:

List of human-readable error messages (empty if all resolve).

EXAMPLES:

References that resolve pass silently:

>>> from mdstools.schema.validator import validate_instrument_references
>>> data = {"experimental": {
...     "instrumentation": [{"name": "Rotator1"}],
...     "operationParameters": {"massTransport": {"rotation": {
...         "rate": {"value": 1600, "unit": "1 / min"},
...         "control": {"instrument": "Rotator1"}}}}}}
>>> validate_instrument_references(data)
[]

A dangling reference is reported:

>>> bad = {"experimental": {
...     "instrumentation": [{"name": "Rotator1"}],
...     "operationParameters": {"temperature": {
...         "value": 298, "unit": "K",
...         "control": {"instrument": "Thermostat9"}}}}}
>>> validate_instrument_references(bad)
["control.instrument 'Thermostat9' does not match any instrumentation name (available: ['Rotator1'])"]
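
The cross-reference check described above can be sketched in plain Python. This is an illustration under assumptions, not the function's actual implementation: check_block and iter_control_instruments are hypothetical names, and the sketch handles a single experimental block only (how blocks are located inside file-schema or package documents is not shown).

```python
from typing import Any, Iterator, List


def iter_control_instruments(node: Any) -> Iterator[str]:
    """Yield every control.instrument value found at any depth below node."""
    if isinstance(node, dict):
        control = node.get("control")
        if isinstance(control, dict) and "instrument" in control:
            yield control["instrument"]
        for value in node.values():
            yield from iter_control_instruments(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_control_instruments(item)


def check_block(experimental: dict) -> List[str]:
    """Return one message per control.instrument without a matching name."""
    names = [entry.get("name") for entry in experimental.get("instrumentation", [])]
    errors = []
    for ref in iter_control_instruments(experimental.get("operationParameters", {})):
        if ref not in names:
            errors.append(
                f"control.instrument {ref!r} does not match any "
                f"instrumentation name (available: {names})"
            )
    return errors


block = {
    "instrumentation": [{"name": "Rotator1"}],
    "operationParameters": {"temperature": {
        "value": 298, "unit": "K",
        "control": {"instrument": "Thermostat9"}}},
}
print(check_block(block))
```

Walking the operationParameters tree recursively means the check does not depend on how deeply a control block is nested.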
mdstools.schema.validator.validate_metadata(data: Any, schema_path: str) → None

Validate metadata against a JSON Schema.

Loads the schema at schema_path, resolves $ref references relative to the file, and validates data against it. If validation fails, raises ValueError reporting up to 10 of the errors found.

Parameters:
  • data – Metadata object to validate

  • schema_path – Path to JSON Schema file

Raises:
  • FileNotFoundError – If the schema file does not exist

  • ValueError – If validation fails

EXAMPLES:

Validating correct metadata passes silently:

>>> from mdstools.schema.validator import validate_metadata
>>> from mdstools.metadata.metadata import Metadata
>>> data = Metadata.from_yaml('examples/file_schemas/autotag.yaml').data
>>> validate_metadata(data, 'schemas/autotag.json')

Validation errors raise ValueError with details:

>>> invalid_data = {'curation': 'not a dict'}
>>> try:
...     validate_metadata(invalid_data, 'schemas/autotag.json')
... except ValueError as e:
...     'validation failed' in str(e).lower()
True

Missing schema file raises FileNotFoundError:

>>> try:
...     validate_metadata({}, 'nonexistent_schema.json')
... except FileNotFoundError:
...     print('Schema file not found')
Schema file not found
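
The "report up to 10 errors" behaviour can be sketched independently of the schema library. The helper below is hypothetical and only shows the capping pattern; the exact message format used by validate_metadata is an assumption. In practice the messages might come from a validator's error iterator (for example jsonschema's iter_errors), but here they are plain strings.

```python
from itertools import islice
from typing import Iterable

MAX_REPORTED = 10  # cap mentioned in the description above


def raise_if_errors(messages: Iterable[str], limit: int = MAX_REPORTED) -> None:
    """Raise ValueError listing at most `limit` messages; return None if there are none."""
    # islice stops consuming the iterable once `limit` messages were taken,
    # so a lazy error iterator is not exhausted unnecessarily.
    batch = list(islice(messages, limit))
    if batch:
        details = "\n".join(f"  - {message}" for message in batch)
        raise ValueError(f"Validation failed:\n{details}")


raise_if_errors([])  # no errors: passes silently
```

Capping the batch keeps the exception readable when a badly malformed document produces a very large number of errors.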
mdstools.schema.validator.validate_minimum_echemdb(data: Any, version: str = None) → None

Validate metadata against the minimum_echemdb schema.

See validate() for parameter details.

EXAMPLES:

>>> from mdstools.schema.validator import validate_minimum_echemdb
>>> validate_minimum_echemdb('examples/file_schemas/minimum_echemdb.yaml')

Local validation (always in sync with the working tree)::

    >>> import yaml
    >>> from mdstools.schema.validator import validate_metadata
    >>> validate_metadata(
    ...     yaml.safe_load(open('examples/file_schemas/minimum_echemdb.yaml')),
    ...     'schemas/minimum_echemdb.json')
mdstools.schema.validator.validate_source_data(data: Any, version: str = None) → None

Validate metadata against the source_data schema.

See validate() for parameter details.

EXAMPLES:

>>> from mdstools.schema.validator import validate_source_data
>>> validate_source_data('examples/file_schemas/source_data.yaml', version='main')

Local validation (always in sync with the working tree)::

    >>> import yaml
    >>> from mdstools.schema.validator import validate_metadata
    >>> validate_metadata(
    ...     yaml.safe_load(open('examples/file_schemas/source_data.yaml')),
    ...     'schemas/source_data.json')
mdstools.schema.validator.validate_svgdigitizer(data: Any, version: str = None) → None

Validate metadata against the svgdigitizer schema.

See validate() for parameter details.

EXAMPLES:

>>> from mdstools.schema.validator import validate_svgdigitizer
>>> validate_svgdigitizer('examples/file_schemas/svgdigitizer.yaml')

Local validation (always in sync with the working tree)::

    >>> import yaml
    >>> from mdstools.schema.validator import validate_metadata
    >>> validate_metadata(
    ...     yaml.safe_load(open('examples/file_schemas/svgdigitizer.yaml')),
    ...     'schemas/svgdigitizer.json')
mdstools.schema.validator.validate_svgdigitizer_package(data: Any, version: str = None) → None

Validate metadata against the svgdigitizer_package schema.

See validate() for parameter details.

EXAMPLES:

>>> from mdstools.schema.validator import validate_svgdigitizer_package
>>> validate_svgdigitizer_package('examples/file_schemas/svgdigitizer_package.json')

Local validation (always in sync with the working tree)::

    >>> import json, contextlib, io
    >>> from pathlib import Path
    >>> from mdstools.schema.validate_examples import validate_data, build_package_registry
    >>> schema = json.load(open('schemas/svgdigitizer_package.json'))
    >>> data = json.load(open('examples/file_schemas/svgdigitizer_package.json'))
    >>> with contextlib.redirect_stdout(io.StringIO()):
    ...     registry = build_package_registry(Path('schemas'))
    >>> errors = validate_data(data, schema, registry=registry)
    >>> len(errors)
    0
mdstools.schema.validator.validate_with_pydantic(data: Any, schema_name: str) → Any

Validate metadata using auto-generated Pydantic models from LinkML.

Returns the validated Pydantic model instance on success. Raises ValueError on validation failure with detailed error messages.

Parameters:
  • data – Metadata dict to validate

  • schema_name – Schema name (e.g. 'minimum_echemdb', 'autotag')

Returns:

Validated Pydantic model instance

Raises:

ValueError – If validation fails or schema_name is unknown

EXAMPLES:

Validating correct metadata returns a Pydantic model:

>>> import yaml
>>> from mdstools.schema.validator import validate_with_pydantic
>>> with open('examples/file_schemas/minimum_echemdb.yaml') as f:
...     data = yaml.safe_load(f)
>>> model = validate_with_pydantic(data, 'minimum_echemdb')
>>> model.source.citationKey
'engstfeld_2018_polycrystalline_17743'

Validation errors raise ValueError:

>>> try:
...     validate_with_pydantic({'curation': 'not a dict'}, 'minimum_echemdb')
... except ValueError as e:
...     'validation failed' in str(e).lower()
True

Unknown schema name raises ValueError:

>>> try:
...     validate_with_pydantic({}, 'nonexistent')
... except ValueError as e:
...     print('Unknown schema')
Unknown schema