Command Line Interface

The command line interface (CLI) creates SVG files from PDFs and digitizes the processed SVG files. Dedicated commands exist for certain plot types, which differ in the metadata they recover. All commands and options are listed by running svgdigitizer without arguments.

Note

The leading ! in the following examples is used to evaluate bash commands in Jupyter notebooks. Remove the ! to evaluate the command in the shell.

!svgdigitizer
Usage: svgdigitizer [OPTIONS] COMMAND [ARGS]...

  The svgdigitizer suite.

Options:
  --help  Show this message and exit.

Commands:
  cv        Digitize a cylic voltammogram and create a frictionless...
  digitize  Digitize a 2D plot.
  figure    Digitize a figure with units on the axis and create a...
  paginate  Render PDF pages as individual SVG files with linked PNG images.
  plot      Display a plot of the data traced in an SVG.
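Outside a notebook, the same commands can also be launched from a Python script. The following is a minimal sketch using only the standard library. The helper name svgdigitizer_argv is hypothetical and not part of svgdigitizer, and the call is skipped when the executable is not found on the PATH.

```python
import shutil
import subprocess


def svgdigitizer_argv(command, *args):
    """Build the argument list for one svgdigitizer subcommand (hypothetical helper)."""
    return ["svgdigitizer", command, *args]


argv = svgdigitizer_argv("paginate", "--help")

if shutil.which("svgdigitizer"):
    # Equivalent to typing `svgdigitizer paginate --help` in the shell.
    subprocess.run(argv, check=False)
else:
    print("svgdigitizer not found on PATH; would run:", " ".join(argv))
```

Passing the arguments as a list avoids shell quoting issues with file names that contain spaces.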

Note

Example files for use with svgdigitizer can be found in the repository.

paginate

Create SVG and PNG files from a PDF with

!svgdigitizer paginate --help
Usage: svgdigitizer paginate [OPTIONS] PDF

  Render PDF pages as individual SVG files with linked PNG images.

  The SVG and PNG files are written to the PDF's directory.

Options:
  --onlypng           Only produce png files.
  --outdir DIRECTORY  Write output files to this directory.
  --help              Show this message and exit.

Example PDFs for testing purposes are available in the svgdigitizer repository.

Examples

!svgdigitizer paginate ./files/others/example_plot_paginate.pdf

Download the resulting SVG (example_plot_paginate_p0.svg).
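To locate the generated files from a script, the output names can be derived from the PDF name. This is a sketch under an assumption: the `_p<N>` suffix with zero-based page numbers is inferred only from the example name example_plot_paginate_p0.svg above, and the helper expected_page_files is hypothetical.

```python
from pathlib import Path


def expected_page_files(pdf, page, outdir=None):
    """Guess the SVG and PNG paths for one page (assumed `_p<N>` suffix, zero-based)."""
    pdf = Path(pdf)
    # Without --outdir the files are written to the PDF's directory.
    directory = Path(outdir) if outdir is not None else pdf.parent
    stem = f"{pdf.stem}_p{page}"
    return directory / f"{stem}.svg", directory / f"{stem}.png"


svg, png = expected_page_files("./files/others/example_plot_paginate.pdf", 0)
print(svg.name)  # example_plot_paginate_p0.svg
print(png.name)  # example_plot_paginate_p0.png
```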

digitize

Produces a CSV from the curve traced in the SVG.

!svgdigitizer digitize --help
Usage: svgdigitizer digitize [OPTIONS] SVG

  Digitize a 2D plot.

  Produces a CSV from the curve traced in the SVG.

Options:
  --sampling-interval FLOAT  Sampling interval on the x-axis with respect to
                             the x-axis values.
  --outdir DIRECTORY         Write output files to this directory.
  --skewed                   Detect non-orthogonal skewed axes going through
                             the markers instead of assuming that axes are
                             perfectly horizontal and vertical.
  --help                     Show this message and exit.

Examples

Consider the following skewed plot.

_images/example_plot_p0.png

An unskewed digitized CSV can be created with

!svgdigitizer digitize ./files/others/example_plot_p0_demo_digitize.svg --skewed

The CSV can, for example, be imported as a pandas dataframe to create a plot.

import pandas as pd

df = pd.read_csv('./files/others/example_plot_p0_demo_digitize.csv')
df.plot(x='U', y='v', ylabel='v')
<Axes: xlabel='U', ylabel='v'>
_images/d16593ad114f94053ed0ded813ba1e7aa8577f8a559b8e1753b068280aed5dcd.png

The resulting plot indicates that only the nodes of the spline were connected. To improve the tracing, use the --sampling-interval option.

!svgdigitizer digitize ./files/others/example_plot_p0_demo_digitize.svg --skewed --sampling-interval 0.01

The result looks as follows

import pandas as pd

df = pd.read_csv('./files/others/example_plot_p0_demo_digitize.csv')
df.plot(x='U', y='v', ylabel='v')
<Axes: xlabel='U', ylabel='v'>
_images/b03085ec7af9eb1beb54dac49f35d134266eb3152ddaf1d21e15ec8092ac206d.png
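The difference between the two plots can be illustrated with a single cubic Bézier segment. The sketch below is not svgdigitizer's implementation. It assumes that x increases monotonically along the segment and uses bisection to find the curve parameter for evenly spaced x values. The function names and control points are made up for illustration.

```python
def bezier(p0, p1, p2, p3, t):
    """Evaluate one coordinate of a cubic Bezier segment at parameter t."""
    s = 1.0 - t
    return s**3 * p0 + 3 * s**2 * t * p1 + 3 * s * t**2 * p2 + t**3 * p3


def sample_segment(xs, ys, interval):
    """Sample a segment at evenly spaced x values (x(t) assumed increasing)."""
    n = max(1, round((xs[3] - xs[0]) / interval))
    points = []
    for i in range(n + 1):
        target = xs[0] + (xs[3] - xs[0]) * i / n
        lo, hi = 0.0, 1.0
        for _ in range(60):  # bisection on t until x(t) matches the target
            mid = (lo + hi) / 2
            if bezier(*xs, mid) < target:
                lo = mid
            else:
                hi = mid
        t = (lo + hi) / 2
        points.append((bezier(*xs, t), bezier(*ys, t)))
    return points


# Control point coordinates of one segment (illustrative values).
xs = (0.0, 0.3, 0.7, 1.0)
ys = (0.0, 1.0, 1.0, 0.0)

nodes_only = [(xs[0], ys[0]), (xs[3], ys[3])]  # what the first plot connects
sampled = sample_segment(xs, ys, 0.25)         # what a sampling interval adds
print(len(nodes_only), len(sampled))  # 2 5
```

Connecting only the two nodes gives a straight line at y = 0, whereas the sampled points follow the arc of the curve.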

Note

The use of svgdigitizer digitize is discouraged when your axis labels contain units, because the output CSV does not contain this information. Use svgdigitizer figure instead, which creates a frictionless datapackage (CSV + JSON).

plot

Displays a plot of the data traced in an SVG.

!svgdigitizer plot --help
Usage: svgdigitizer plot [OPTIONS] SVG

  Display a plot of the data traced in an SVG.

Options:
  --sampling-interval FLOAT  Sampling interval on the x-axis with respect to
                             the x-axis values.
  --skewed                   Detect non-orthogonal skewed axes going through
                             the markers instead of assuming that axes are
                             perfectly horizontal and vertical.
  --help                     Show this message and exit.

Note

The plot is only displayed when your shell is configured accordingly.

Examples

A plot of an annotated example SVG (with skewed axes) with a specific sampling interval can be created with

!svgdigitizer plot ./files/others/example_plot_p0_demo_digitize.svg --skewed --sampling-interval 0.01

figure

The figure command produces a CSV and a JSON file with additional metadata, which contains, for example, information on the axis units. In addition, it reconstructs a time axis when a text label in the SVG specifies the rate at which the x-axis values change, such as scan rate: 30 m/s. The unit of this rate must be equivalent to the x-axis unit divided by a unit of time.

!svgdigitizer figure --help
Usage: svgdigitizer figure [OPTIONS] SVG

  Digitize a figure with units on the axis and create a frictionless
  datapackage.

  The resulting CVS contains a time axis, when text label with a scan rate is
  given in the SVG whose units must be of type `x-axis unit / time unit`, such
  as `scan rate: 50 K / s`.

Options:
  --sampling-interval FLOAT  Sampling interval on the x-axis with respect to
                             the x-axis values.
  --outdir DIRECTORY         Write output files to this directory.
  --metadata FILENAME        yaml file with metadata
  --si-units                 Convert units of the plot and CSV to SI (only if
                             they are compatible with astropy units).
  --bibliography             Adds bibliography data from a bibfile as
                             descriptor to the datapackage.
  --skewed                   Detect non-orthogonal skewed axes going through
                             the markers instead of assuming that axes are
                             perfectly horizontal and vertical.
  --help                     Show this message and exit.

Note

The flags --bibliography, --metadata and --si-units are covered in the Advanced flags section below.

Examples

Consider the following figure where the annotated SVG (looping_scan_rate.svg) contains a scan rate.

_images/looping_scan_rate_annotated.png

Digitize the figure with

!svgdigitizer figure ./files/others/looping_scan_rate.svg --sampling-interval 0.01
No text with `figure` containing a label such as `figure: 1a` found in the SVG.

Resulting in

import pandas as pd

df = pd.read_csv('./files/others/looping_scan_rate.csv')
df.plot(x='d', y='height', xlabel='distance [m]', ylabel='height [m]', legend=False)
<Axes: xlabel='distance [m]', ylabel='height [m]'>
_images/2b757ed95ea9c37e5c26a87228b3b4510be1f7bcf156ccbb5709e66b74fa3887.png
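The time axis reconstruction performed by the figure command can be sketched as follows. This is an illustration and not svgdigitizer's actual code: it assumes that the x-axis is traversed at a constant absolute rate, so that every step in x adds |Δx| / rate to the elapsed time, also when the curve runs backwards. The function name reconstruct_time and the sample values are hypothetical.

```python
def reconstruct_time(x_values, rate):
    """Return the elapsed time at each point, assuming a constant |rate| along x."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    if not x_values:
        return []
    times = [0.0]
    for previous, current in zip(x_values, x_values[1:]):
        # Moving backwards along x still takes time, hence the absolute value.
        times.append(times[-1] + abs(current - previous) / rate)
    return times


distance = [0.0, 15.0, 30.0, 15.0]  # x values in m (illustrative)
print(reconstruct_time(distance, 30.0))  # [0.0, 0.5, 1.0, 1.5]
```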

cv

The cv command is designed specifically to digitize cyclic voltammograms (CVs). Overall, the cv command has the same functionality as the figure command. The differences are as follows.

  • Certain keys in the output metadata are directly related to cyclic voltammetry measurements.

  • The units on the x-axis must be equivalent to volts (voltage U in units of V), and those on the y-axis must be equivalent to amperes (current I in units of A) or amperes per square meter (current density j in units of A / m2).

  • The voltage unit can be given vs. a reference, such as V vs. RHE. In that case, the dimension should be E instead of U.

  • The --sampling-interval should be provided in units of mV.

These standardized CV data are, for example, used in the echemdb database.
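Because the sampling interval of the cv command is given in mV while voltages are often reported in V, a small conversion can help avoid mistakes. The helper sampling_interval_mv below is hypothetical and only illustrates the unit handling.

```python
def sampling_interval_mv(interval_volts):
    """Convert an interval in V to the number in mV expected by `svgdigitizer cv`."""
    return interval_volts * 1000.0


value = sampling_interval_mv(0.001)  # a resolution of 1 mV
flag = ["--sampling-interval", f"{value:g}"]
print(flag)  # ['--sampling-interval', '1']
```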

!svgdigitizer cv --help
Usage: svgdigitizer cv [OPTIONS] SVG

  Digitize a cylic voltammogram and create a frictionless datapackage.

  The sampling interval should be provided in mV.

  For inclusion in www.echemdb.org.

Options:
  --sampling-interval FLOAT  Sampling interval on the x-axis with respect to
                             the x-axis values.
  --outdir DIRECTORY         Write output files to this directory.
  --metadata FILENAME        yaml file with metadata
  --bibliography             Adds bibliography data from a bibfile as
                             descriptor to the datapackage.
  --si-units                 Convert units of the plot and CSV to SI (only if
                             they are compatible with astropy units).
  --skewed                   Detect non-orthogonal skewed axes going through
                             the markers instead of assuming that axes are
                             perfectly horizontal and vertical.
  --help                     Show this message and exit.

Note

The flags --bibliography, --metadata and --si-units are covered in the Advanced flags section below.

Examples

An annotated example SVG is shown in the following figure.

_images/mustermann_2021_svgdigitizer_1_f2a_blue.png

which can be digitized via

!svgdigitizer cv ./files/mustermann_2021_svgdigitizer_1/mustermann_2021_svgdigitizer_1_f2a_blue.svg --sampling-interval 0.01

Advanced flags

--si-units

The flag --si-units is used by the figure command and commands that inherit from figure, such as the cv command. The units are converted to SI units if they are compatible with the astropy unit package. The values in the CSV are scaled accordingly and the new units are provided in the output JSON files.

Warning

In some cases conversion to SI units might not result in the desired output. For example, even though V is considered as an SI unit, astropy might convert the unit to W / A or A Ohm.
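What the scaling means for the values in the CSV can be shown with a plain Python sketch. svgdigitizer relies on astropy for the conversion. The small lookup table TO_SI and the function to_si below are hypothetical stand-ins that only demonstrate the arithmetic.

```python
# Hypothetical conversion table: unit -> (scale factor, SI unit).
TO_SI = {
    "mV": (1e-3, "V"),
    "mA / cm2": (10.0, "A / m2"),  # 1e-3 A divided by 1e-4 m2
    "uA / cm2": (1e-2, "A / m2"),  # 1e-6 A divided by 1e-4 m2
}


def to_si(values, unit):
    """Scale the values of one column and return them with the new unit."""
    factor, si_unit = TO_SI[unit]
    return [value * factor for value in values], si_unit


values, unit = to_si([1.0, 2.5], "mA / cm2")
print(values, unit)  # [10.0, 25.0] A / m2
```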

--metadata

The flag --metadata allows adding metadata to the resource of the datapackage from a yaml file. It is used by the figure command and commands that inherit from figure, such as the cv command.

Consider the following figure, which shows the annotated SVG (looping_scan_rate.svg).

_images/looping_scan_rate_annotated.png

We collect additional metadata in a YAML file (looping_scan_rate_yaml.yaml) describing the underlying “experiment”.

!svgdigitizer figure ./files/others/looping_scan_rate_yaml.svg --metadata ./files/others/looping_scan_rate_yaml.yaml --sampling-interval 0.01
No text with `figure` containing a label such as `figure: 1a` found in the SVG.

The metadata from the YAML is included in the JSON of the resulting datapackage and is accessible with a JSON loader

import json
with open('./files/others/looping_scan_rate_yaml.json', 'r') as f:
    metadata = json.load(f)
metadata
{'resources': [{'name': 'looping_scan_rate_yaml',
   'type': 'table',
   'path': 'looping_scan_rate_yaml.csv',
   'scheme': 'file',
   'format': 'csv',
   'mediatype': 'text/csv',
   'encoding': 'utf-8',
   'schema': {'fields': [{'name': 't', 'type': 'number', 'unit': 's'},
     {'name': 'd', 'type': 'number', 'unit': 'm'},
     {'name': 'height', 'type': 'number', 'unit': 'm'}]},
   'metadata': {'echemdb': {'cyclist': 'John Doe',
     'title': 'Cyclist driving through a looping.',
     'description': 'The cyclist rides at a constant speed of 5 m/s along a track including a looping.',
     'experimental': {'tags': []},
     'source': {'figure': '', 'curve': 'blue'},
     'figure description': {'version': 1,
      'type': 'digitized',
      'simultaneous measurements': [],
      'measurement type': 'custom',
      'fields': [{'name': 'd',
        'type': 'number',
        'unit': 'm',
        'orientation': 'x'},
       {'name': 'height', 'type': 'number', 'unit': 'm', 'orientation': 'y'}],
      'comment': '',
      'scan rate': {'value': 30.0, 'unit': 'm / s'}},
     'data description': {'version': 1,
      'type': 'digitized',
      'measurement type': 'custom'}}}}]}

or directly with the frictionless interface

from frictionless import Package
package = Package('./files/others/looping_scan_rate_yaml.json')
package
{'resources': [{'name': 'looping_scan_rate_yaml',
                'type': 'table',
                'path': 'looping_scan_rate_yaml.csv',
                'scheme': 'file',
                'format': 'csv',
                'mediatype': 'text/csv',
                'encoding': 'utf-8',
                'schema': {'fields': [{'name': 't',
                                       'type': 'number',
                                       'unit': 's'},
                                      {'name': 'd',
                                       'type': 'number',
                                       'unit': 'm'},
                                      {'name': 'height',
                                       'type': 'number',
                                       'unit': 'm'}]},
                'metadata': {'echemdb': {'cyclist': 'John Doe',
                                         'title': 'Cyclist driving through a '
                                                  'looping.',
                                         'description': 'The cyclist rides at '
                                                        'a constant speed of 5 '
                                                        'm/s along a track '
                                                        'including a looping.',
                                         'experimental': {'tags': []},
                                         'source': {'figure': '',
                                                    'curve': 'blue'},
                                         'figure description': {'version': 1,
                                                                'type': 'digitized',
                                                                'simultaneous measurements': [],
                                                                'measurement type': 'custom',
                                                                'fields': [{'name': 'd',
                                                                            'type': 'number',
                                                                            'unit': 'm',
                                                                            'orientation': 'x'},
                                                                           {'name': 'height',
                                                                            'type': 'number',
                                                                            'unit': 'm',
                                                                            'orientation': 'y'}],
                                                                'comment': '',
                                                                'scan rate': {'value': 30.0,
                                                                              'unit': 'm '
                                                                                      '/ '
                                                                                      's'}},
                                         'data description': {'version': 1,
                                                              'type': 'digitized',
                                                              'measurement type': 'custom'}}}}]}
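Individual entries can be read by walking the nested dictionaries. The sketch below uses a shortened copy of the output shown above, so that it runs without the example files. With the real file, the dictionary would come from json.load as in the earlier cell.

```python
# Shortened excerpt of the datapackage JSON shown above.
metadata = {
    "resources": [{
        "name": "looping_scan_rate_yaml",
        "schema": {"fields": [
            {"name": "t", "type": "number", "unit": "s"},
            {"name": "d", "type": "number", "unit": "m"},
            {"name": "height", "type": "number", "unit": "m"},
        ]},
        "metadata": {"echemdb": {
            "cyclist": "John Doe",
            "figure description": {
                "scan rate": {"value": 30.0, "unit": "m / s"},
            },
        }},
    }]
}

resource = metadata["resources"][0]
echemdb = resource["metadata"]["echemdb"]

# Units of the CSV columns, keyed by column name.
units = {field["name"]: field["unit"] for field in resource["schema"]["fields"]}
scan_rate = echemdb["figure description"]["scan rate"]

print(units)  # {'t': 's', 'd': 'm', 'height': 'm'}
print(echemdb["cyclist"], scan_rate["value"], scan_rate["unit"])  # John Doe 30.0 m / s
```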

For electrochemical data an example YAML can be found here.

--bibliography

The flag --bibliography adds a bibtex bibliography entry to the JSON of the produced datapackage. It is used by the figure command and commands that inherit from figure such as the cv command.

Requirements:

  • a file in BibTeX format should exist in the same folder as the SVG (otherwise an empty string is returned)

  • a YAML file must exist, which is passed with the --metadata option

  • the YAML file must contain a reference to the bib file, such as

source:
  citation key: BIB_FILENAME  # without file extension

!svgdigitizer figure ./files/others/looping_scan_rate_bib.svg --bibliography --metadata ./files/others/looping_scan_rate_bib.yaml --sampling-interval 0.01
No text with `figure` containing a label such as `figure: 1a` found in the SVG.
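The requirements listed above can be checked before running the command. The following sketch is not part of svgdigitizer. The function bibliography_problems is hypothetical, the citation key is passed in by hand instead of being read from the YAML file, and the .bib file extension is an assumption.

```python
from pathlib import Path


def bibliography_problems(svg, metadata_yaml, citation_key):
    """List unmet requirements for --bibliography (sketch, `.bib` extension assumed)."""
    svg = Path(svg)
    problems = []
    if metadata_yaml is None or not Path(metadata_yaml).is_file():
        problems.append("YAML file for --metadata not found")
    # The bib file is expected in the same folder as the SVG.
    bibfile = svg.parent / f"{citation_key}.bib"
    if not bibfile.is_file():
        problems.append(f"{bibfile.name} not found next to the SVG")
    return problems


print(bibliography_problems(
    "./files/others/looping_scan_rate_bib.svg",
    "./files/others/looping_scan_rate_bib.yaml",
    "cyclist2023",
))
```

An empty list means that both files were found. The check does not verify that the YAML file actually contains the citation key.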

The bib file content is included in the resulting JSON of the datapackage

from frictionless import Package
package = Package('./files/others/looping_scan_rate_bib.json')
package
{'resources': [{'name': 'looping_scan_rate_bib',
                'type': 'table',
                'path': 'looping_scan_rate_bib.csv',
                'scheme': 'file',
                'format': 'csv',
                'mediatype': 'text/csv',
                'encoding': 'utf-8',
                'schema': {'fields': [{'name': 't',
                                       'type': 'number',
                                       'unit': 's'},
                                      {'name': 'd',
                                       'type': 'number',
                                       'unit': 'm'},
                                      {'name': 'height',
                                       'type': 'number',
                                       'unit': 'm'}]},
                'metadata': {'echemdb': {'cyclist': 'John Doe',
                                         'title': 'Cyclist driving through a '
                                                  'looping.',
                                         'description': 'The cyclist rides at '
                                                        'a constant speed of 5 '
                                                        'm/s along a track '
                                                        'including a looping.',
                                         'source': {'citation key': 'cyclist2023',
                                                    'figure': '',
                                                    'curve': 'blue',
                                                    'bibdata': '@article{cyclist2023,\n'
                                                               '    author = '
                                                               '"Doe, John",\n'
                                                               '    title = '
                                                               '"Cycling a '
                                                               'Looping",\n'
                                                               '    journal = '
                                                               '"New Open '
                                                               'Access '
                                                               'Journal",\n'
                                                               '    volume = '
                                                               '"1",\n'
                                                               '    number = '
                                                               '"1",\n'
                                                               '    pages = '
                                                               '"1--4",\n'
                                                               '    year = '
                                                               '"2023",\n'
                                                               '    publisher '
                                                               '= "Some '
                                                               'publisher"\n'
                                                               '}\n'},
                                         'experimental': {'tags': []},
                                         'figure description': {'version': 1,
                                                                'type': 'digitized',
                                                                'simultaneous measurements': [],
                                                                'measurement type': 'custom',
                                                                'fields': [{'name': 'd',
                                                                            'type': 'number',
                                                                            'unit': 'm',
                                                                            'orientation': 'x'},
                                                                           {'name': 'height',
                                                                            'type': 'number',
                                                                            'unit': 'm',
                                                                            'orientation': 'y'}],
                                                                'comment': '',
                                                                'scan rate': {'value': 30.0,
                                                                              'unit': 'm '
                                                                                      '/ '
                                                                                      's'}},
                                         'data description': {'version': 1,
                                                              'type': 'digitized',
                                                              'measurement type': 'custom'}}}}]}
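The bibdata entry is a plain string, so it can be printed or inspected directly. The sketch below copies the source entry from the output above and extracts the quoted fields with a regular expression. This is sufficient for this simple entry but is not a general BibTeX parser.

```python
import re

# The 'source' entry copied from the output above.
source = {
    "citation key": "cyclist2023",
    "bibdata": (
        '@article{cyclist2023,\n'
        '    author = "Doe, John",\n'
        '    title = "Cycling a Looping",\n'
        '    journal = "New Open Access Journal",\n'
        '    volume = "1",\n'
        '    number = "1",\n'
        '    pages = "1--4",\n'
        '    year = "2023",\n'
        '    publisher = "Some publisher"\n'
        '}\n'
    ),
}

# Collect all `name = "value"` pairs of the entry.
fields = dict(re.findall(r'(\w+) = "([^"]*)"', source["bibdata"]))

print(fields["author"], fields["year"])  # Doe, John 2023
print(fields["title"])  # Cycling a Looping
```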