Welcome to svgdigitizer’s documentation!

The svgdigitizer recovers data from a curve plotted in a 2D coordinate system in a figure. The data is extracted from an SVG file with a Python API or a command line interface, either of which can create figures or frictionless datapackages (CSV and JSON). The svgdigitizer supports units, scale bars, scaling factors, and more.

files/logo/logo.png

Example Plot

Such plots are often found in scientific publications, where in many cases, especially for old publications, the source data is not accessible anymore. In some cases, the axes of the plot can be skewed, e.g., in scanned documents. An extreme case for such a plot is depicted in the following figure.

files/images/example_plot_p0.png

To recover the source data, the plot is first imported into a vector graphics program, such as Inkscape, to create an SVG file. The curve is traced with a regular Bézier path and is grouped with a text label containing a unique identifier. The coordinate system is defined by groups of two points and text labels for each axis. A scan rate can be given as a text label, where the unit of the rate must be equivalent to the unit on the x-axis divided by time, such as 50 mV / s. This allows reconstructing a time axis for inclusion in the CSV file. Additional labels describing the data can be provided anywhere in the SVG file. For the above figure the SVG looks as follows.

files/images/example_plot_p0_demo.png
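The two labeled points per axis are enough to fix an affine map from SVG coordinates to data coordinates, even when the axes are skewed. The following is a minimal sketch of that idea and not svgdigitizer's own implementation. It assumes that each axis value varies linearly with position and stays constant along the line through the two marks of the other axis. The pixel positions are invented for illustration; the values 10 V, 80 V, 10 m / s, and 60 m / s are those of the labels in the example.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, rhs):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(m)
    if d == 0:
        raise ValueError("reference points are degenerate")
    coeffs = []
    for col in range(3):
        # Replace one column of the matrix by the right-hand side.
        replaced = [row[:col] + [rhs[k]] + row[col + 1:] for k, row in enumerate(m)]
        coeffs.append(det3(replaced) / d)
    return coeffs


def axis_map(mark1, mark2, other1, other2):
    """Return value(px, py) = a*px + b*py + c for one axis.

    mark1, mark2: ((px, py), value) for the two marks of this axis.
    other1, other2: pixel positions of the two marks of the other axis.
    Assumption: the value does not change along the line other1 -> other2.
    """
    (p1, v1), (p2, v2) = mark1, mark2
    (ox1, oy1), (ox2, oy2) = other1, other2
    m = [
        [p1[0], p1[1], 1.0],
        [p2[0], p2[1], 1.0],
        [ox2 - ox1, oy2 - oy1, 0.0],
    ]
    a, b, c = solve3(m, [v1, v2, 0.0])
    return lambda px, py: a * px + b * py + c


# Hypothetical pixel positions of the marks on a skewed plot.
U1, U2 = ((100.0, 400.0), 10.0), ((400.0, 430.0), 80.0)
v1, v2 = ((80.0, 380.0), 10.0), ((50.0, 80.0), 60.0)

to_U = axis_map(U1, U2, v1[0], v2[0])
to_v = axis_map(v1, v2, U1[0], U2[0])

print(to_U(250.0, 300.0), to_v(250.0, 300.0))
```

Each map reproduces the labeled values at its own marks, which is a quick way to check a calibration before trusting the digitized data.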

Command line interface

This SVG can be digitized from the command line interface, which creates a frictionless datapackage including a CSV of the x and y data (here U and v) and a JSON file with metadata. The sampling of the Bézier paths is set with --sampling-interval, which specifies the sampling interval in x units. Because the axes in this example are skewed, the --skewed flag is passed as well.

svgdigitizer figure example_plot_p0_demo.svg --sampling-interval 0.01 --skewed
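To illustrate what sampling a Bézier path at a fixed interval in x means, here is a minimal sketch for a single cubic segment. It is not the code used by svgdigitizer: it works directly in data coordinates, assumes that x increases monotonically along the segment, and finds the curve parameter for each x by bisection. The function names and control points are hypothetical.

```python
def bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier segment at parameter t in [0, 1]."""
    s = 1.0 - t
    return tuple(
        s**3 * a + 3 * s**2 * t * b + 3 * s * t**2 * c + t**3 * d
        for a, b, c, d in zip(p0, p1, p2, p3)
    )


def sample_by_x(p0, p1, p2, p3, interval):
    """Sample the segment at fixed x spacing.

    Assumption: x(t) increases monotonically from p0 to p3.
    """
    points = []
    n = 0
    while p0[0] + n * interval <= p3[0] + 1e-12:
        x = p0[0] + n * interval
        lo, hi = 0.0, 1.0
        # Bisection on t until x(t) matches the requested x.
        for _ in range(60):
            mid = (lo + hi) / 2
            if bezier(p0, p1, p2, p3, mid)[0] < x:
                lo = mid
            else:
                hi = mid
        points.append(bezier(p0, p1, p2, p3, hi))
        n += 1
    return points


# Hypothetical control points of one arch-shaped segment.
points = sample_by_x((0.0, 0.0), (1.0, 2.0), (2.0, 2.0), (3.0, 0.0), 0.5)
for x, y in points:
    print(f"{x:.3f} {y:.3f}")
```

A smaller interval yields more rows in the resulting table; it does not change the traced curve itself.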

API

With the Python API, the SVG can also be used to create an SVGPlot instance to extract basic information.

from svgdigitizer.svg import SVG
from svgdigitizer.svgplot import SVGPlot

plot = SVGPlot(SVG(open('./files/others/example_plot_p0_demo.svg', 'rb')), sampling_interval=0.01, algorithm='mark-aligned')

Now axis labels or any other text label in the SVG can be queried.

plot.axis_labels
{'U': 'V', 'v': 'm / s'}
plot.svg.get_texts()
[<text>curve: blue</text>,
 <text>U2: 80 V</text>,
 <text>U1: 10 V</text>,
 <text>v1: 10 m / s</text>,
 <text>v2: 60 m / s</text>,
 <text>comment: random data</text>,
 <text>operator: Mr. X</text>,
 <text>scan rate: 50 mV / s</text>]
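The labels follow a simple `key: value` pattern, where the value is either free text or a number followed by a unit. As a rough sketch of how such label strings can be split, the following uses two regular expressions from the standard library. This is an illustration only; the function `parse_label` is hypothetical and is not part of the svgdigitizer API, which handles labels and units internally.

```python
import re

# "key: value" with everything before the first colon as the key.
LABEL = re.compile(r"^(?P<key>[^:]+):\s*(?P<value>.+)$")
# A number, optionally followed by a unit such as "mV / s".
QUANTITY = re.compile(
    r"^(?P<number>[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*(?P<unit>.*)$"
)


def parse_label(text):
    """Return (key, value, unit); unit is None for free-text values."""
    match = LABEL.match(text.strip())
    if match is None:
        return None
    key = match["key"].strip()
    value = match["value"].strip()
    quantity = QUANTITY.match(value)
    if quantity:
        return key, float(quantity["number"]), quantity["unit"].strip()
    return key, value, None


for label in ["U2: 80 V", "scan rate: 50 mV / s", "operator: Mr. X"]:
    print(parse_label(label))
```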

The sampled data can be extracted as a pandas dataframe, where the values are given in the units of the original plot:

plot.df.head()
U v
0 9.881301 12.035193
1 9.891301 12.033743
2 9.901301 12.032435
3 9.911301 12.031220
4 9.921301 12.030082

The SVGPlot instance can be used to create an SVGFigure instance, which, for example, reconstructs the time axis based on the scan rate.

from svgdigitizer.svgfigure import SVGFigure

figure = SVGFigure(plot)
figure.scan_rate # Returns an astropy quantity
\[50 \; \mathrm{\frac{mV}{s}}\]

And the corresponding data:

figure.df.head()
t U v
0 0.0 9.881301 12.035193
1 0.2 9.891301 12.033743
2 0.4 9.901301 12.032435
3 0.6 9.911301 12.031220
4 0.8 9.921301 12.030082
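The time column is consistent with a simple relation: each step in U of 0.01 V at a scan rate of 50 mV / s (0.05 V / s) takes 0.2 s. The sketch below reproduces the numbers under the assumption that the time is accumulated as |ΔU| divided by the scan rate; the actual implementation in svgdigitizer may differ in detail, and `reconstruct_time` is a hypothetical helper.

```python
def reconstruct_time(x_values, scan_rate):
    """Accumulate time from successive x steps.

    x_values: x-axis values in the axis unit (here V).
    scan_rate: rate in axis unit per second (here V / s).
    Assumption: the absolute step is used, so sweeps in either
    direction move forward in time.
    """
    t = [0.0]
    for previous, current in zip(x_values, x_values[1:]):
        t.append(t[-1] + abs(current - previous) / scan_rate)
    return t


U = [9.881301, 9.891301, 9.901301, 9.911301, 9.921301]  # V, from the table above
scan_rate = 50e-3  # 50 mV / s expressed in V / s

t = reconstruct_time(U, scan_rate)
print([round(value, 6) for value in t])
```

Note that the scan rate has to be converted to the unit of the x-axis first, which is why its unit must be equivalent to the x-axis unit divided by time.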

A plot can be created via

figure.plot() # Or plot.plot() for an svgplot instance.
_images/b76c30f3e64eb929f2f9b5eb13b3aa3352b98d00428092d2a36261e03ca3615d.png

Advantages

The svgdigitizer has some advantages compared to other plot digitizers:

  • usage of splines allows for very precise retracing of distinct features

  • supports multiple y (x) values per x (y) value

  • supports scale bars

  • supports scaling factors

  • supports plots with skewed axes

  • splines can be digitized with specific sampling intervals

  • extracts units from axis labels

  • extracts metadata associated with the plot in the SVG

  • reconstructs time series with a given scan rate

  • saves data as frictionless datapackage (CSV + JSON) allowing for FAIR data usage

  • inclusion of metadata in the datapackage

  • Python API to interact with the retraced data

Installation

This package is available on PyPI and can be installed with pip:

pip install svgdigitizer

The package is also available on conda-forge and can be installed with conda

conda install -c conda-forge svgdigitizer

or mamba

mamba install -c conda-forge svgdigitizer

See the installation instructions for further details.

Further information

The svgdigitizer can be enhanced with other modules for specific datasets.

Currently the following datasets are supported:

  • cyclic voltammograms (I vs. E — current vs. potential curves or j vs. E — current density vs. potential curves) commonly found in electrochemistry. For further details and requirements refer to the specific instructions of the cv module itself or the detailed description on how to digitize cyclic voltammograms for the echemdb.

If you have used this project in the preparation of a publication, please cite it as described on our Zenodo page.

Datapackage interaction

Datapackages created with svgplot (or modules inheriting from svgplot such as cv) can be loaded with the unitpackage module to create a database of the digitized data. If your own data has the same datapackage structure, the digitized data can easily be compared with it.
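To give an idea of what such a CSV + JSON pair can look like, the following sketch writes one with the Python standard library only. The `resources`, `path`, and `schema.fields` keys follow the Frictionless Data Package layout; the `unit` key on each field, the file names, and the helper `write_datapackage` are assumptions made for illustration and may not match the exact descriptor that svgdigitizer or unitpackage use.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_datapackage(directory, name, fields, rows):
    """Write <name>.csv and a <name>.json descriptor next to it."""
    directory = Path(directory)
    csv_path = directory / f"{name}.csv"
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([field["name"] for field in fields])
        writer.writerows(rows)

    descriptor = {
        "resources": [
            {
                "name": name,
                "path": csv_path.name,
                "format": "csv",
                "schema": {"fields": fields},
            }
        ]
    }
    json_path = directory / f"{name}.json"
    json_path.write_text(json.dumps(descriptor, indent=2))
    return csv_path, json_path


# Column descriptions; the "unit" key is an assumed extension.
fields = [
    {"name": "t", "type": "number", "unit": "s"},
    {"name": "U", "type": "number", "unit": "V"},
    {"name": "v", "type": "number", "unit": "m / s"},
]
rows = [(0.0, 9.881301, 12.035193), (0.2, 9.891301, 12.033743)]

with tempfile.TemporaryDirectory() as tmp:
    csv_path, json_path = write_datapackage(tmp, "example_plot", fields, rows)
    descriptor = json.loads(json_path.read_text())
    csv_text = csv_path.read_text()

print(csv_text)
```

Keeping the units in the descriptor rather than in the column names leaves the CSV header clean while the metadata stays machine-readable.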