Command Line Interface
The command line interface (CLI) allows creating SVG files from PDFs, which in turn allows digitizing the processed SVG files. Certain plot types have specific commands to recover different kinds of plots with different metadata. All commands and options are revealed with
Note
The preceding ! in the following examples is used to evaluate bash commands in Jupyter notebooks. Remove the ! to evaluate the command in the shell.
!svgdigitizer
Usage: svgdigitizer [OPTIONS] COMMAND [ARGS]...
The svgdigitizer suite.
Options:
--help Show this message and exit.
Commands:
cv Digitize a cylic voltammogram and create a frictionless...
digitize Digitize a 2D plot.
figure Digitize a figure with units on the axis and create a...
paginate Render PDF pages as individual SVG files with linked PNG images.
plot Display a plot of the data traced in an SVG.
Note
Example files for use with svgdigitizer can be found in the repository.
paginate
Create SVG and PNG files from a PDF with
!svgdigitizer paginate --help
Usage: svgdigitizer paginate [OPTIONS] PDF
Render PDF pages as individual SVG files with linked PNG images.
The SVG and PNG files are written to the PDF's directory.
Options:
--onlypng Only produce png files.
--outdir DIRECTORY Write output files to this directory.
--help Show this message and exit.
Example PDFs for testing purposes are available in the svgdigitizer repository.
Examples
!svgdigitizer paginate ./files/others/example_plot_paginate.pdf
Download the resulting SVG (example_plot_paginate_p0.svg).
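When paginate is called from a script instead of a notebook cell, it can help to assemble the argument list in one place. The following is a minimal sketch that only uses the options listed in the help text above (--onlypng and --outdir); the helper name build_paginate_command and the ./svg output folder are assumptions made for illustration and are not part of svgdigitizer.

```python
import shlex


def build_paginate_command(pdf, outdir=None, onlypng=False):
    """Return the argument list for an `svgdigitizer paginate` call."""
    command = ["svgdigitizer", "paginate"]
    if onlypng:
        command.append("--onlypng")
    if outdir is not None:
        command += ["--outdir", str(outdir)]
    # The PDF is the single positional argument.
    command.append(str(pdf))
    return command


# Hypothetical output folder "./svg"; the PDF path is the example used above.
command = build_paginate_command(
    "./files/others/example_plot_paginate.pdf", outdir="./svg"
)
print(shlex.join(command))
```

The resulting list can be handed to subprocess.run(command, check=True) once svgdigitizer is installed in the active environment.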
digitize
Produces a CSV from the curve traced in the SVG.
!svgdigitizer digitize --help
Usage: svgdigitizer digitize [OPTIONS] SVG
Digitize a 2D plot.
Produces a CSV from the curve traced in the SVG.
Options:
--sampling-interval FLOAT Sampling interval on the x-axis with respect to
the x-axis values.
--outdir DIRECTORY Write output files to this directory.
--skewed Detect non-orthogonal skewed axes going through
the markers instead of assuming that axes are
perfectly horizontal and vertical.
--help Show this message and exit.
Examples
Consider the following skewed plot.
An unskewed digitized CSV can be created with
!svgdigitizer digitize ./files/others/example_plot_p0_demo_digitize.svg --skewed
The CSV can, for example, be imported as a pandas dataframe to create a plot.
import pandas as pd
df = pd.read_csv('./files/others/example_plot_p0_demo_digitize.csv')
df.plot(x='U', y='v', ylabel='v')
<Axes: xlabel='U', ylabel='v'>
The resulting plot indicates that only the nodes of the spline were connected. To improve the tracing, use the --sampling-interval option.
!svgdigitizer digitize ./files/others/example_plot_p0_demo_digitize.svg --skewed --sampling-interval 0.01
The result looks as follows
import pandas as pd
df = pd.read_csv('./files/others/example_plot_p0_demo_digitize.csv')
df.plot(x='U', y='v', ylabel='v')
<Axes: xlabel='U', ylabel='v'>
Note
The use of svgdigitizer digitize is discouraged when your axis labels contain units, because the output CSV does not contain this information. Use svgdigitizer figure instead, which creates a frictionless datapackage (CSV + JSON).
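A quick way to see the effect of --sampling-interval is to look at the spacing of the x values in the resulting CSV. The following sketch uses only the Python standard library and made-up data with the same U and v columns as the CSV plotted above; the function name median_x_spacing is hypothetical. Whether the spacing in a real output file matches the requested interval exactly is not stated here, so treat this as a sanity check rather than a guarantee.

```python
import csv
import io
from statistics import median


def median_x_spacing(csv_text, x="U"):
    """Median absolute difference between consecutive x values of a CSV."""
    rows = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[x]) for row in rows]
    steps = [abs(b - a) for a, b in zip(values, values[1:])]
    return median(steps)


# Made-up sample data; replace with the text of a digitized CSV file.
sample = "U,v\n0.0,1.0\n0.5,1.2\n1.0,1.5\n1.5,1.9\n"
print(median_x_spacing(sample))
```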
plot
Displays a plot of the data traced in an SVG.
!svgdigitizer plot --help
Usage: svgdigitizer plot [OPTIONS] SVG
Display a plot of the data traced in an SVG.
Options:
--sampling-interval FLOAT Sampling interval on the x-axis with respect to
the x-axis values.
--skewed Detect non-orthogonal skewed axes going through
the markers instead of assuming that axes are
perfectly horizontal and vertical.
--help Show this message and exit.
Note
The plot is only displayed when your shell is configured accordingly.
Examples
The plot of an annotated example SVG (with skewed axes) with a specific sampling interval can be created with
!svgdigitizer plot ./files/others/example_plot_p0_demo_digitize.svg --skewed --sampling-interval 0.01
figure
The figure command produces a CSV and a JSON with additional metadata, which contains, for example, information on the axis units. In addition, it reconstructs a time axis when the rate at which the x-axis values change is given in a text label in the SVG, such as scan rate: 30 m/s. The unit of this rate must be equivalent to the unit on the x-axis divided by a time unit.
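The relation between the x-axis, the rate and the time axis can be illustrated with a short sketch. It assumes that the time is accumulated from the absolute change along the x-axis divided by the rate, so that a curve running back and forth along x still advances in time. This is an illustration of the unit relation described above, not necessarily the exact procedure used by svgdigitizer; the function name reconstruct_time is hypothetical.

```python
def reconstruct_time(x_values, rate):
    """Cumulative time for x positions traversed at a constant rate.

    The rate must have units of `x-axis unit / time unit`, e.g. m / s.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if not x_values:
        return []
    times = [0.0]
    for previous, current in zip(x_values, x_values[1:]):
        # Moving backwards along x still takes time, hence abs().
        times.append(times[-1] + abs(current - previous) / rate)
    return times


# Distances in m, traversed at 30 m / s as in the label example above.
distance = [0.0, 15.0, 30.0, 15.0]
print(reconstruct_time(distance, rate=30.0))  # [0.0, 0.5, 1.0, 1.5]
```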
!svgdigitizer figure --help
Usage: svgdigitizer figure [OPTIONS] SVG
Digitize a figure with units on the axis and create a frictionless
datapackage.
The resulting CVS contains a time axis, when text label with a scan rate is
given in the SVG whose units must be of type `x-axis unit / time unit`, such
as `scan rate: 50 K / s`.
Options:
--sampling-interval FLOAT Sampling interval on the x-axis with respect to
the x-axis values.
--outdir DIRECTORY Write output files to this directory.
--metadata FILENAME yaml file with metadata
--si-units Convert units of the plot and CSV to SI (only if
they are compatible with astropy units).
--bibliography Adds bibliography data from a bibfile as
descriptor to the datapackage.
--skewed Detect non-orthogonal skewed axes going through
the markers instead of assuming that axes are
perfectly horizontal and vertical.
--help Show this message and exit.
Note
The flags --bibliography, --metadata and --si-units are covered in the Advanced flags section below.
Examples
Consider the following figure where the annotated SVG (looping_scan_rate.svg) contains a scan rate.
Digitize the figure with
!svgdigitizer figure ./files/others/looping_scan_rate.svg --sampling-interval 0.01
No text with `figure` containing a label such as `figure: 1a` found in the SVG.
Resulting in
import pandas as pd
df = pd.read_csv('./files/others/looping_scan_rate.csv')
df.plot(x='d', y='height', xlabel='distance [m]', ylabel='height [m]', legend=False)
<Axes: xlabel='distance [m]', ylabel='height [m]'>
cv
The cv command is designed specifically to digitize cyclic voltammograms (CVs). Overall, the cv command has the same functionality as the figure command. The differences are as follows.
Certain keys in the output metadata are directly related to cyclic voltammetry measurements.
The units on the x-axis must be equivalent to volt (U given in units of V) and those on the y-axis equivalent to current (I in units of A) or current density (j in units of A / m2).
The voltage unit can be given vs. a reference, such as V vs. RHE. In that case, the dimension should be E instead of U.
The --sampling-interval should be provided in units of mV.
These standardized CV data are, for example, used in the echemdb database.
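The naming rule for the x-axis dimension described above (U for a plain voltage, E when a reference is given) can be written down as a small check. The sketch below is not the parser used by svgdigitizer; the helper names split_reference and x_dimension are hypothetical, and it assumes that the reference is separated from the unit by the literal text " vs. ".

```python
def split_reference(unit_label):
    """Split a label such as 'V vs. RHE' into (unit, reference or None)."""
    unit, _, reference = unit_label.partition(" vs. ")
    return unit.strip(), (reference.strip() or None)


def x_dimension(unit_label):
    """Return 'E' if the voltage is given vs. a reference, else 'U'."""
    _, reference = split_reference(unit_label)
    return "E" if reference else "U"


for label in ("V", "V vs. RHE"):
    print(label, "->", x_dimension(label))
```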
!svgdigitizer cv --help
Usage: svgdigitizer cv [OPTIONS] SVG
Digitize a cylic voltammogram and create a frictionless datapackage.
The sampling interval should be provided in mV.
For inclusion in www.echemdb.org.
Options:
--sampling-interval FLOAT Sampling interval on the x-axis with respect to
the x-axis values.
--outdir DIRECTORY Write output files to this directory.
--metadata FILENAME yaml file with metadata
--bibliography Adds bibliography data from a bibfile as
descriptor to the datapackage.
--si-units Convert units of the plot and CSV to SI (only if
they are compatible with astropy units).
--skewed Detect non-orthogonal skewed axes going through
the markers instead of assuming that axes are
perfectly horizontal and vertical.
--help Show this message and exit.
Note
The flags --bibliography, --metadata and --si-units are covered in the Advanced flags section below.
Examples
An annotated example SVG is shown in the following figure, which can be digitized via
!svgdigitizer cv ./files/mustermann_2021_svgdigitizer_1/mustermann_2021_svgdigitizer_1_f2a_blue.svg --sampling-interval 0.01
Advanced flags
--si-units
The flag --si-units is used by the figure command and commands that inherit from figure, such as the cv command. The units are converted to SI units if they are compatible with the astropy unit package. The values in the CSV are scaled accordingly and the new units are provided in the output JSON files.
Warning
In some cases, conversion to SI units might not result in the desired output. For example, even though V is considered an SI unit, astropy might convert the unit to W / A or A Ohm.
--metadata
The flag --metadata allows adding metadata to the resource of the datapackage from a YAML file. It is used by the figure command and commands that inherit from figure, such as the cv command.
Consider the following figure and its annotated SVG (looping_scan_rate.svg).
We collect additional metadata in a YAML file (looping_scan_rate_yaml.yaml) describing the underlying “experiment”.
!svgdigitizer figure ./files/others/looping_scan_rate_yaml.svg --metadata ./files/others/looping_scan_rate_yaml.yaml --sampling-interval 0.01
No text with `figure` containing a label such as `figure: 1a` found in the SVG.
The metadata from the YAML is included in the JSON of the resulting datapackage and is accessible with a JSON loader
import json
with open('./files/others/looping_scan_rate_yaml.json', 'r') as f:
    metadata = json.load(f)
metadata
{'resources': [{'name': 'looping_scan_rate_yaml',
'type': 'table',
'path': 'looping_scan_rate_yaml.csv',
'scheme': 'file',
'format': 'csv',
'mediatype': 'text/csv',
'encoding': 'utf-8',
'schema': {'fields': [{'name': 't', 'type': 'number', 'unit': 's'},
{'name': 'd', 'type': 'number', 'unit': 'm'},
{'name': 'height', 'type': 'number', 'unit': 'm'}]},
'metadata': {'echemdb': {'cyclist': 'John Doe',
'title': 'Cyclist driving through a looping.',
'description': 'The cyclist rides at a constant speed of 5 m/s along a track including a looping.',
'experimental': {'tags': []},
'source': {'figure': '', 'curve': 'blue'},
'figure description': {'version': 1,
'type': 'digitized',
'simultaneous measurements': [],
'measurement type': 'custom',
'fields': [{'name': 'd',
'type': 'number',
'unit': 'm',
'orientation': 'x'},
{'name': 'height', 'type': 'number', 'unit': 'm', 'orientation': 'y'}],
'comment': '',
'scan rate': {'value': 30.0, 'unit': 'm / s'}},
'data description': {'version': 1,
'type': 'digitized',
'measurement type': 'custom'}}}}]}
or directly with the frictionless interface
from frictionless import Package
package = Package('./files/others/looping_scan_rate_yaml.json')
package
{'resources': [{'name': 'looping_scan_rate_yaml',
'type': 'table',
'path': 'looping_scan_rate_yaml.csv',
'scheme': 'file',
'format': 'csv',
'mediatype': 'text/csv',
'encoding': 'utf-8',
'schema': {'fields': [{'name': 't',
'type': 'number',
'unit': 's'},
{'name': 'd',
'type': 'number',
'unit': 'm'},
{'name': 'height',
'type': 'number',
'unit': 'm'}]},
'metadata': {'echemdb': {'cyclist': 'John Doe',
'title': 'Cyclist driving through a '
'looping.',
'description': 'The cyclist rides at '
'a constant speed of 5 '
'm/s along a track '
'including a looping.',
'experimental': {'tags': []},
'source': {'figure': '',
'curve': 'blue'},
'figure description': {'version': 1,
'type': 'digitized',
'simultaneous measurements': [],
'measurement type': 'custom',
'fields': [{'name': 'd',
'type': 'number',
'unit': 'm',
'orientation': 'x'},
{'name': 'height',
'type': 'number',
'unit': 'm',
'orientation': 'y'}],
'comment': '',
'scan rate': {'value': 30.0,
'unit': 'm '
'/ '
's'}},
'data description': {'version': 1,
'type': 'digitized',
'measurement type': 'custom'}}}}]}
For electrochemical data an example YAML can be found here.
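Individual entries of the nested descriptor shown above can be reached with plain dictionary access. The following sketch works on a trimmed, hand-copied subset of the JSON output above, so it runs without the example files; the helper names field_units and scan_rate are hypothetical. With a real datapackage, the dictionary returned by json.load can be passed in instead.

```python
# Trimmed copy of the datapackage descriptor shown above.
descriptor = {
    "resources": [{
        "name": "looping_scan_rate_yaml",
        "schema": {"fields": [
            {"name": "t", "type": "number", "unit": "s"},
            {"name": "d", "type": "number", "unit": "m"},
            {"name": "height", "type": "number", "unit": "m"},
        ]},
        "metadata": {"echemdb": {
            "cyclist": "John Doe",
            "figure description": {
                "scan rate": {"value": 30.0, "unit": "m / s"},
            },
        }},
    }]
}


def field_units(descriptor):
    """Map each CSV column name to its unit."""
    fields = descriptor["resources"][0]["schema"]["fields"]
    return {field["name"]: field["unit"] for field in fields}


def scan_rate(descriptor):
    """Return the scan rate as a (value, unit) tuple."""
    echemdb = descriptor["resources"][0]["metadata"]["echemdb"]
    rate = echemdb["figure description"]["scan rate"]
    return rate["value"], rate["unit"]


print(field_units(descriptor))
print(scan_rate(descriptor))
```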
--bibliography
The flag --bibliography adds a BibTeX bibliography entry to the JSON of the produced datapackage. It is used by the figure command and commands that inherit from figure, such as the cv command.
Requirements:
A file in the BibTeX format must exist in the same folder as the SVG (otherwise an empty string is returned).
A YAML file must exist and be passed with the --metadata option.
The YAML file must contain a reference to the bib file, such as
source:
  citation key: BIB_FILENAME # without file extension
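Since a missing bib file only leads to an empty string in the output, it can be useful to check the first requirement before running the command. The sketch below derives the expected location of the bib file from the SVG path and the citation key. The helper name expected_bib_path and the .bib file extension are assumptions; the citation key cyclist2023 is the one used in the example of this section.

```python
from pathlib import Path


def expected_bib_path(svg_path, citation_key, suffix=".bib"):
    """Path of the bib file expected next to the SVG (assumed suffix .bib)."""
    return Path(svg_path).parent / f"{citation_key}{suffix}"


bib = expected_bib_path("./files/others/looping_scan_rate_bib.svg", "cyclist2023")
print(bib, "exists" if bib.exists() else "is missing")
```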
!svgdigitizer figure ./files/others/looping_scan_rate_bib.svg --bibliography --metadata ./files/others/looping_scan_rate_bib.yaml --sampling-interval 0.01
No text with `figure` containing a label such as `figure: 1a` found in the SVG.
The bib file content is included in the resulting JSON of the datapackage
from frictionless import Package
package = Package('./files/others/looping_scan_rate_bib.json')
package
{'resources': [{'name': 'looping_scan_rate_bib',
'type': 'table',
'path': 'looping_scan_rate_bib.csv',
'scheme': 'file',
'format': 'csv',
'mediatype': 'text/csv',
'encoding': 'utf-8',
'schema': {'fields': [{'name': 't',
'type': 'number',
'unit': 's'},
{'name': 'd',
'type': 'number',
'unit': 'm'},
{'name': 'height',
'type': 'number',
'unit': 'm'}]},
'metadata': {'echemdb': {'cyclist': 'John Doe',
'title': 'Cyclist driving through a '
'looping.',
'description': 'The cyclist rides at '
'a constant speed of 5 '
'm/s along a track '
'including a looping.',
'source': {'citation key': 'cyclist2023',
'figure': '',
'curve': 'blue',
'bibdata': '@article{cyclist2023,\n'
' author = '
'"Doe, John",\n'
' title = '
'"Cycling a '
'Looping",\n'
' journal = '
'"New Open '
'Access '
'Journal",\n'
' volume = '
'"1",\n'
' number = '
'"1",\n'
' pages = '
'"1--4",\n'
' year = '
'"2023",\n'
' publisher '
'= "Some '
'publisher"\n'
'}\n'},
'experimental': {'tags': []},
'figure description': {'version': 1,
'type': 'digitized',
'simultaneous measurements': [],
'measurement type': 'custom',
'fields': [{'name': 'd',
'type': 'number',
'unit': 'm',
'orientation': 'x'},
{'name': 'height',
'type': 'number',
'unit': 'm',
'orientation': 'y'}],
'comment': '',
'scan rate': {'value': 30.0,
'unit': 'm '
'/ '
's'}},
'data description': {'version': 1,
'type': 'digitized',
'measurement type': 'custom'}}}}]}