Converters

Functions for flattening nested YAML/dict structures into tabular format.

mdstools.converters.flatten.flatten(d, prefix='', parent_key=None)

Flatten a nested YAML structure into a tabular format with numbered rows.

Parameters:

d – The data structure to flatten (dict or list)
prefix – The current numbering prefix (default: '')
parent_key – The parent key name (default: None)

Returns:

A list of rows, each containing [number, key, value]

EXAMPLES:

Simple key-value pairs:

>>> from mdstools.converters.flatten import flatten
>>> data = {
...     'experiment': 'abc'}
>>> flatten(data)
[['1', 'experiment', 'abc']]

>>> data = {
...     'experiment': 'abc',
...     'details': 'foo'}
>>> flatten(data)
[['1', 'experiment', 'abc'],
 ['2', 'details', 'foo']]

Nested dictionaries:

>>> data = {
...     'experiment': {'value': 42, 'units': 'mV'}}
>>> flatten(data)
[['1', 'experiment', '<nested>'],
 ['1.1', 'value', 42],
 ['1.2', 'units', 'mV']]

Lists of dictionaries:

>>> data = {
...     'experiment': [{'A': 1, 'B': 2}, {'A': 3, 'B': 4}]}
>>> flatten(data)
[['1', 'experiment', '<nested>'],
 ['1.i1.1', 'A', 1],
 ['1.i1.2', 'B', 2],
 ['1.i2.1', 'A', 3],
 ['1.i2.2', 'B', 4]]

Primitive lists:

>>> data = {
...     'measurements': ['A', 'B', 'C']}
>>> flatten(data)
[['1', 'measurements', '<nested>'],
 ['1.i1', '', 'A'],
 ['1.i2', '', 'B'],
 ['1.i3', '', 'C']]

Mixed nested structures:

>>> data = {
...     'experiment': [{'A': {'value': 1, 'units': 'mV'}, 'B': 2}, {'A': 3, 'B': 4}]}
>>> flatten(data)
[['1', 'experiment', '<nested>'],
 ['1.i1.1', 'A', '<nested>'],
 ['1.i1.1.1', 'value', 1],
 ['1.i1.1.2', 'units', 'mV'],
 ['1.i1.2', 'B', 2],
 ['1.i2.1', 'A', 3],
 ['1.i2.2', 'B', 4]]
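
The numbering scheme shown in the examples above can be reproduced with a short recursive function. The following is a minimal sketch, not the mdstools implementation: flatten_sketch and NESTED are hypothetical names, the parent_key parameter is left out, and the numbering of a top-level list ('i1', 'i2', ...) is an assumption that the examples do not cover.

```python
NESTED = "<nested>"  # marker value used in the documented examples


def flatten_sketch(d, prefix=""):
    """Return [number, key, value] rows for a nested dict or list."""
    rows = []
    if isinstance(d, dict):
        # Dict entries are numbered 1, 2, 3, ... below the current prefix.
        for i, (key, value) in enumerate(d.items(), start=1):
            number = f"{prefix}.{i}" if prefix else str(i)
            if isinstance(value, (dict, list)):
                rows.append([number, key, NESTED])
                rows.extend(flatten_sketch(value, number))
            else:
                rows.append([number, key, value])
    elif isinstance(d, list):
        # List items are numbered i1, i2, ... below the current prefix.
        for i, item in enumerate(d, start=1):
            # Assumption: a top-level list (empty prefix) starts at 'i1'.
            number = f"{prefix}.i{i}" if prefix else f"i{i}"
            if isinstance(item, dict):
                # A dict item gets no row of its own; its keys hang
                # directly under the item number (e.g. '1.i1.1').
                rows.extend(flatten_sketch(item, number))
            else:
                # Primitive item: the key column stays empty.
                # Lists nested directly inside lists are not covered by
                # the documented examples and are not treated specially.
                rows.append([number, "", item])
    return rows


rows = flatten_sketch({"experiment": [{"A": 1, "B": 2}, {"A": 3, "B": 4}]})
```

Here rows holds the same five rows as the "Lists of dictionaries" example. The item prefix i<n> keeps list positions distinct from dict positions, so '1.i2.1' reads as "first key of the second item of entry 1".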

mdstools.converters.flatten.is_primitive_list(lst)

Check if a list contains only primitive values (no dicts or lists).

Returns True if every element is a simple scalar (str, int, float, bool, None), and False if any element is a dict or list.

EXAMPLES:

Primitive lists:

>>> from mdstools.converters.flatten import is_primitive_list
>>> is_primitive_list(['a', 'b', 'c'])
True
>>> is_primitive_list([1, 2, 3])
True
>>> is_primitive_list([1, 'mixed', 3.14, True, None])
True

Non-primitive lists:

>>> is_primitive_list([{'key': 'value'}])
False
>>> is_primitive_list([[1, 2], [3, 4]])
False
>>> is_primitive_list([1, {'nested': True}])
False

Edge case - empty list:

>>> is_primitive_list([])
True
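
The documented behaviour, including the empty-list case, is consistent with a single all() check. This is a sketch under that assumption, not the library source; is_primitive_list_sketch is a hypothetical name.

```python
def is_primitive_list_sketch(lst):
    """Return True if no element of lst is a dict or a list."""
    # all() over an empty iterable is True, which matches the
    # documented result for an empty list.
    return all(not isinstance(item, (dict, list)) for item in lst)


mixed = is_primitive_list_sketch([1, "mixed", 3.14, True, None])
nested = is_primitive_list_sketch([1, {"nested": True}])
empty = is_primitive_list_sketch([])
```

With this definition, mixed and empty are True and nested is False, in line with the examples above.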

Functions for unflattening tabular data back into nested structures.

mdstools.converters.unflatten.unflatten(rows)

Reconstruct a nested dictionary from flattened rows.

Takes a list of [number, key, value] rows produced by flatten() and rebuilds the original nested dictionary. Header rows (['number', 'key', 'value']) are automatically detected and skipped.

Parameters:

rows – List of [number, key, value] rows

Returns:

Reconstructed nested dictionary

EXAMPLES:

Simple key-value pairs:

>>> from mdstools.converters.unflatten import unflatten
>>> rows = [['1', 'name', 'test'], ['2', 'value', 42]]
>>> unflatten(rows)
{'name': 'test', 'value': 42}

Nested dictionaries:

>>> rows = [['1', 'experiment', '<nested>'],
...         ['1.1', 'value', 42],
...         ['1.2', 'units', 'mV']]
>>> unflatten(rows)
{'experiment': {'value': 42, 'units': 'mV'}}

Lists of dictionaries (item-indexed with i<n>):

>>> rows = [['1', 'people', '<nested>'],
...         ['1.i1.1', 'name', 'Alice'],
...         ['1.i1.2', 'role', 'curator'],
...         ['1.i2.1', 'name', 'Bob'],
...         ['1.i2.2', 'role', 'reviewer']]
>>> unflatten(rows)
{'people': [{'name': 'Alice', 'role': 'curator'}, {'name': 'Bob', 'role': 'reviewer'}]}

Primitive lists:

>>> rows = [['1', 'tags', '<nested>'],
...         ['1.i1', '', 'alpha'],
...         ['1.i2', '', 'beta'],
...         ['1.i3', '', 'gamma']]
>>> unflatten(rows)
{'tags': ['alpha', 'beta', 'gamma']}

Mixed nested structures (dicts inside lists inside dicts):

>>> rows = [['1', 'experiment', '<nested>'],
...         ['1.i1.1', 'A', '<nested>'],
...         ['1.i1.1.1', 'value', 1],
...         ['1.i1.1.2', 'units', 'mV'],
...         ['1.i1.2', 'B', 2],
...         ['1.i2.1', 'A', 3],
...         ['1.i2.2', 'B', 4]]
>>> unflatten(rows)
{'experiment': [{'A': {'value': 1, 'units': 'mV'}, 'B': 2}, {'A': 3, 'B': 4}]}

Header rows are automatically skipped:

>>> rows = [['number', 'key', 'value'],
...         ['1', 'name', 'test']]
>>> unflatten(rows)
{'name': 'test'}
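
A header row typically appears when the flattened rows have been stored as a table. The sketch below assumes CSV storage, which this section does not prescribe, and uses only the standard library csv module; it shows the shape of the rows that come back from such a file.

```python
import csv
import io

rows = [["1", "experiment", "<nested>"],
        ["1.1", "value", 42],
        ["1.2", "units", "mV"]]

# Write the rows with a header line to an in-memory CSV file.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["number", "key", "value"])
writer.writerows(rows)

# Read everything back; the first row is the header.
buffer.seek(0)
loaded = list(csv.reader(buffer))
```

After reading, loaded[0] is ['number', 'key', 'value'], the header shape described above. Note that csv.reader returns every cell as a string, so the 42 written above comes back as '42'; any type restoration would have to happen separately.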

Empty input returns an empty dict:

>>> unflatten([])
{}

Roundtrip with flatten():

>>> from mdstools.converters.flatten import flatten
>>> original = {'curation': {'process': [{'role': 'curator', 'name': 'Jane'}]}}
>>> unflatten(flatten(original)) == original
True
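
One way to rebuild the structure from the row numbers is sketched below. It is an illustration under stated assumptions, not the mdstools implementation: unflatten_sketch, NESTED and HEADER are hypothetical names, rows are assumed to arrive in the order flatten() produces them, and a '<nested>' row without children is assumed to stand for an empty dict.

```python
NESTED = "<nested>"
HEADER = ["number", "key", "value"]


def unflatten_sketch(rows):
    """Rebuild a nested dict from [number, key, value] rows."""
    rows = [list(row) for row in rows if list(row) != HEADER]
    root = {}
    containers = {"": root}  # row number -> dict or list stored there
    for index, (number, key, value) in enumerate(rows):
        parent_number = number.rpartition(".")[0]
        parent = containers.get(parent_number)
        if parent is None:
            # The parent is a dict item of a list (e.g. '1.i1'); such
            # items have no row of their own, so create one on demand.
            list_number = parent_number.rpartition(".")[0]
            parent = {}
            containers[list_number].append(parent)
            containers[parent_number] = parent
        if value == NESTED:
            # The next row tells whether this container is a list
            # (children numbered '<number>.i<n>') or a dict.
            following = rows[index + 1][0] if index + 1 < len(rows) else ""
            value = [] if following.startswith(number + ".i") else {}
            containers[number] = value
        if isinstance(parent, list):
            parent.append(value)  # primitive list item, key is ''
        else:
            parent[key] = value
    return root


result = unflatten_sketch([
    ["number", "key", "value"],
    ["1", "curation", "<nested>"],
    ["1.1", "process", "<nested>"],
    ["1.1.i1.1", "role", "curator"],
    ["1.1.i1.2", "name", "Jane"],
])
```

Here result equals {'curation': {'process': [{'role': 'curator', 'name': 'Jane'}]}}, the structure used in the roundtrip example. Looking one row ahead keeps the sketch to a single pass, at the cost of depending on row order.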