
Springer’s text- and data-mining policy

Springer grants text- and data-mining rights to subscribed content, provided the purpose is non-commercial research.

Demand for TDM has been low so far, but it is steadily increasing. As the volume of scientific publications grows and TDM software tools improve, Springer acknowledges the need for a more formalized process to enable TDM, and strives to make this as simple as possible for researchers.

Springer grants text- and data-mining rights to subscribed content to researchers via their institutions, provided the purpose is non-commercial research. Researchers can select and refine the set of articles they need with existing search tools such as PubMed, Web of Science and Springer’s Metadata API.

Individual researchers are encouraged to download subscription and open access content for TDM purposes directly from the SpringerLink platform. No registration or API key is required. Full-text content can be accessed easily and programmatically at friendly URLs based on the content’s Digital Object Identifier (DOI).

Implementation by academic and government institutions

For subscribers at academic and government institutions, these rights will be included in all new and renewed SpringerLink subscription agreements as an additional TDM clause. Existing subscribers may also add the TDM clause before their agreement is up for renewal.

Use of text and data mining results and research output

Publications or analyses resulting from TDM of subscribed content may include quotations from the original text of up to 200 characters, or 20 words, or 1 complete sentence. They should cite the original Springer content in the form of a DOI link. Permission to reproduce images may be granted on a case-by-case basis.

For Open Access (OA) publications from Springer, BioMed Central and SpringerOpen, TDM is usually allowed without restrictions since the majority of our OA content is licensed under CC-BY.

Technical guide to downloading content

For TDM researchers interested in cross-publisher automated downloading, the CrossRef TDM initiative may be useful. Springer is actively collaborating with CrossRef on this project and we expect Springer content to be fully supported soon.

How to perform Text and Data Mining of Springer content

The following guidelines illustrate how to download Springer or BioMed Central (BMC) content for TDM purposes.

  • We assume that the TDM researcher begins with a list of digital object identifiers (DOIs) for articles and chapters of interest, to be downloaded for TDM purposes. If you need to make such a list, a few useful search tools are listed in the appendix below.
  • Individual researchers are encouraged to download subscribed and open access content for TDM purposes directly from the SpringerLink platform. Full-text content can be accessed via friendly URLs, using a DOI with the following convention:

PDF: http://link.springer.com/[DOI].pdf

HTML (when available): http://link.springer.com/[DOI].html

  • Content can be downloaded using a web browser, or with an HTTP GET request from any convenient scripting tool such as curl, wget or Python’s urllib. The tool must be configured to follow HTTP 301, 302 and 303 redirects. See example below.
  • No API key or other authentication is required. TDM researchers are requested to be considerate and limit their downloading speed to a reasonable rate.
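The URL convention and the request for a considerate download rate can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names fulltext_url and download_all, the injected fetch callable and the one-second default pause are hypothetical choices, not part of Springer's guidance, which does not specify a numeric rate limit.

```python
import time
from typing import Callable, Dict, Iterable

# Host taken from the URL convention described above.
BASE_URL = "http://link.springer.com"


def fulltext_url(doi: str, fmt: str = "pdf") -> str:
    """Build the friendly full-text URL for a DOI; fmt is 'pdf' or 'html'."""
    if fmt not in ("pdf", "html"):
        raise ValueError("fmt must be 'pdf' or 'html'")
    return f"{BASE_URL}/{doi.strip()}.{fmt}"


def download_all(
    dois: Iterable[str],
    fetch: Callable[[str], object],
    delay_seconds: float = 1.0,  # assumed pause; the policy only asks for a "reasonable rate"
    fmt: str = "pdf",
) -> Dict[str, object]:
    """Call fetch(url) once per DOI, sleeping between consecutive requests."""
    results: Dict[str, object] = {}
    for index, doi in enumerate(dois):
        if index > 0:
            time.sleep(delay_seconds)  # throttle every request after the first
        results[doi] = fetch(fulltext_url(doi, fmt))
    return results
```

Passing the fetch step in as a callable keeps the throttling logic independent of the HTTP tool used, so the same loop works with urllib, a wrapper around curl, or a stub in tests.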

Example:

The TDM researcher wants to download the Open Access article “Properties of gold nanostructures sputtered on glass,” Nanoscale Research Letters, 2011, 6:96, DOI 10.1186/1556-276X-6-96.


The full-text article may thus be downloaded via the following URLs:
• PDF: http://link.springer.com/10.1186/1556-276X-6-96.pdf
• HTML: http://link.springer.com/10.1186/1556-276X-6-96.html


For example, the following curl command would download the PDF version into a file named “article.pdf”:

> curl -L -o article.pdf http://link.springer.com/10.1186/1556-276X-6-96.pdf
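A rough Python equivalent of the curl command above, using only the standard library, might look like the sketch below. The helper name download and the 60-second timeout are illustrative assumptions. The default opener in urllib.request follows 301, 302 and 303 redirects, so no extra configuration is needed for that requirement.

```python
import shutil
import urllib.request


def download(url: str, dest_path: str, timeout: float = 60.0) -> str:
    """Fetch url with an HTTP GET and stream the response body to dest_path."""
    # urlopen uses the default opener, which follows HTTP redirects automatically.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        with open(dest_path, "wb") as out_file:
            # Copy in chunks so large PDFs are not held in memory all at once.
            shutil.copyfileobj(response, out_file)
    return dest_path


# Usage (performs a network request):
# download("http://link.springer.com/10.1186/1556-276X-6-96.pdf", "article.pdf")
```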

Search tools for compiling a list of relevant DOIs

Many online tools support rich advanced or Boolean searches of academic literature, which can be used to generate lists of DOIs for text and data mining.

Two such tools are PubMed (free metadata search for biomedical literature) and the Web of Science (subscription only, covering multiple fields). PubMed includes APIs for searching articles and listing article DOIs, while Web of Science supports exporting machine-readable citation information that includes DOIs.

For searching within Springer content, Springer provides the free Springer Metadata API. The API provides rich searching for the vast majority of Springer, BioMed Central and SpringerOpen documents, including all journal content, book chapters and protocols. The Springer Book Archives will soon be searchable through this API as well.

The Metadata API returns basic document metadata (in XML or JSON format), including content DOIs and direct URLs to the PDF and HTML full-text files on SpringerLink.
To use the Metadata API, simply register on Springer's API portal, sign the Terms of Service, and request a key for the Metadata API. The key will then be automatically generated. Some documentation for the API is available on the website to get you started.
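Once a JSON response from the Metadata API is in hand, compiling the DOI list could be sketched as below. The response shape is an assumption made for illustration: a top-level "records" list whose entries carry a "doi" string. Consult the API documentation for the actual field names. The function name extract_dois is hypothetical as well.

```python
import json
from typing import List


def extract_dois(response_text: str) -> List[str]:
    """Return the unique DOIs found in a JSON response, in order of appearance.

    Assumes a top-level "records" list whose items may carry a "doi" string;
    this shape is an illustrative assumption, not a documented contract.
    """
    payload = json.loads(response_text)
    dois: List[str] = []
    for record in payload.get("records", []):
        doi = record.get("doi")
        if doi and doi not in dois:  # skip records without a DOI and duplicates
            dois.append(doi)
    return dois
```

The resulting list can then be fed to whichever download routine is used for the friendly URLs described earlier.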
