Grobid Service Manual

GROBID manual
1. Overview
2. Build and set up environment for local deployment
3. Build and set up environment for remote deployment
4. Use of grobid-service console
5. grobid-service REST API
6. Examples with curl
Authors: Damien Ridereau, Patrice Lopez
Contact: patrice.lopez@inria.fr

1. Overview

The grobid-service project is a RESTful service implementation for accessing the grobid system.
grobid-service is an open source project under the Apache License 2.0. It is packaged as a WAR file
for deployment in a web container such as Tomcat. The project also contains the grobid-core
libraries, which do the extraction work.

2. Build and set up environment for local deployment

To build grobid for local deployment, go to the root of the project and run the following command:
mvn clean install
Then deploy the generated WAR to the server. The artifact is in:
grobid-service/target/grobid-service-.war
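
A deployment script may prefer to locate the generated WAR by pattern instead of hard-coding its full name. The following is a minimal Python sketch of that idea, assuming the build was run from the project root; the helper name find_war is hypothetical and not part of grobid.

```python
from pathlib import Path


def find_war(project_root="."):
    """Return the grobid-service WAR files found in grobid-service/target."""
    target = Path(project_root) / "grobid-service" / "target"
    if not target.is_dir():
        return []  # nothing has been built yet
    # Assumption: whatever follows "grobid-service-" in the file name is
    # matched by the wildcard, so the exact name need not be known.
    return sorted(target.glob("grobid-service-*.war"))
```

An empty list means the build has not produced a WAR yet; more than one entry means several builds are present in the target directory and one has to be picked explicitly.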

3. Build and set up environment for remote deployment

3.1. Logs

Grobid uses Apache log4j as its logging library. By default, the logs are written to a file named
grobid.log in the directory from which the application is launched. This is not suitable for a
production deployment. To set the path and filename for logging, edit the file
grobid/grobid-core/src/resources/log4j.xml and change the line that sets the log file according to
your production logging policy. You can point it to any log path you like, for instance a path
suited to Tomcat.

Make sure that Tomcat or JBoss has write permission on the chosen log path.

3.2. Parameters set up

In grobid-service-.war, the file web.xml has 3 parameters to set before starting the server:
org.grobid.property: path to grobid.properties
org.grobid.property.service: path to grobid_service.properties
org.grobid.home: path to grobid_home
These parameters are filled from the following variables: _GROBID_PROPERTY,
_GROBID_SERVICE_PROPERTY and _GROBID_HOME, so that a script can fill in the values for a given
environment. It is also possible to set these variables manually before building the WAR artifact.
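
The substitution script mentioned above could look like the following Python sketch. It only shows the text replacement step: the function name fill_placeholders and the example paths are hypothetical, and the way the resulting text is written back into web.xml is left to the deployment environment.

```python
import re


def fill_placeholders(text, values):
    """Replace every placeholder name found in text by its configured value."""
    if not values:
        return text
    # One pass with a single pattern, longest names first, so that a
    # replaced value is never scanned again for other placeholders.
    names = sorted(values, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))
    return pattern.sub(lambda match: values[match.group(0)], text)


# Hypothetical paths, to be adapted to the target server.
example_values = {
    "_GROBID_HOME": "/opt/grobid/grobid-home",
    "_GROBID_PROPERTY": "/opt/grobid/grobid-home/config/grobid.properties",
    "_GROBID_SERVICE_PROPERTY": "/opt/grobid/grobid-home/config/grobid_service.properties",
}
print(fill_placeholders("org.grobid.home = _GROBID_HOME", example_values))
```

Doing the replacement in one pass keeps the result independent of the order in which the variables are listed.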

3.3. Build

To build grobid for remote deployment, go to the root of the project and run the following command:
mvn clean install -PgenericBuild
This generates 2 artifacts, one in grobid-home and one in grobid-service:
grobid-home/target/grobid-home-.zip
grobid-service/target/grobid-service-.war
Copy these 2 artifacts to your remote server.
grobid-home-.zip contains the required native libraries, the models, the lexicons and a config
directory that holds 2 properties files, grobid.properties and grobid_service.properties.
Unzip grobid-home wherever you want on your server:
unzip grobid-home-.zip
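
Where the unzip command is not available, the extraction can be scripted. The sketch below is one possible Python equivalent that also reports where the two properties files ended up, which is useful when filling in the web.xml parameters of section 3.2. The function name unzip_grobid_home is hypothetical, and the layout inside the archive is not assumed: the files are searched for recursively.

```python
import zipfile
from pathlib import Path

# File names taken from this manual.
EXPECTED = ("grobid.properties", "grobid_service.properties")


def unzip_grobid_home(archive, destination):
    """Extract the archive and return the properties files that were found."""
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(destination)
    found = {}
    for name in EXPECTED:
        matches = sorted(destination.rglob(name))
        if matches:
            found[name] = matches[0]  # keep the first match only
    return found
```

Python's zipfile module does not restore Unix permission bits, so the unzip command remains the safer choice if the archive holds files that must stay executable.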

4. Use of grobid-service console

The welcome page is available at http:/// (e.g. for a local
Tomcat, http://localhost:8080/). From there you can open the
About page (Fig 4.1), run conversions from the "Test Rest Interface" page (Fig 4.2)
and access the administration parameters contained in grobid.properties and
grobid_service.properties (Fig 4.3):

Fig 4.1: About

Fig 4.2: Test Rest Interface

Fig 4.3: Service administration

The web page "Test Rest Interface" (Fig. 4.2) lets you test the different REST requests quickly
and easily. On the code side, the GrobidRestService class is the entry point for every REST
service of Grobid.

5. grobid-service REST API

The table below lists the resources provided by grobid-service and the HTTP verbs they accept. All URLs described below are relative paths; the root URL is http:///.

| Type of request | URL | Parameter name | Requesting type | Request input MIME type | Response output type | Description |
|---|---|---|---|---|---|---|
| Administration | /admin (POST), /admin?sha1= (GET) | sha1 | POST, GET | application/x-www-form-urlencoded (POST), String (GET) | text/html | Request to get the parameters of grobid.properties and grobid_service.properties formatted as an HTML table. |
| Administration | /sha1 (POST), /sha1?sha1= (GET) | sha1 | POST, GET | application/x-www-form-urlencoded (POST), String (GET) | text/html | Request to get an input string hashed using sha1. |
| Administration | /allProperties (POST), /allProperties?sha1= (GET) | sha1 | POST, GET | application/x-www-form-urlencoded (POST), String (GET) | text/xml | Request to get all properties as XML. The returned XML lists, for each property, its key, value and type. |
| Administration | /changePropertyValue (POST), /changePropertyValue?xml= (GET) | xml | POST, GET | application/x-www-form-urlencoded (POST), String (GET) | text/xml | Change the value of the property whose key is passed in the XML input. The XML input carries the password (pwd) and the key, value and type of the property. |
| General | /grobid | N/A | GET | N/A | text/html | Gives a very brief description of grobid. |
| PDF to tei.xml conversion | /processHeaderDocument | input, consolidate | POST, PUT | multipart/form-data | application/xml | Extract the header of the input PDF document, normalize it and convert it into TEI format. Consolidate is a string of value 0 (no consolidation) or 1 (consolidate). |
| PDF to tei.xml conversion | /processFulltextDocument | input, consolidate | POST, PUT | multipart/form-data | application/xml | Convert the complete input document into tei.xml format (header, body and bibliographical section). Consolidate is a string of value 0 (no consolidation) or 1 (consolidate). |
| PDF to tei.xml conversion | /processFulltextAssetDocument | input, consolidate | POST, PUT | multipart/form-data | application/zip | Convert the complete input document into tei.xml format (header, body and bibliographical section). Consolidate is a string of value 0 (no consolidation) or 1 (consolidate). The result is a ZIP archive containing the TEI fulltext and the embedded images (the document assets) converted to PNG. |
| PDF to tei.xml conversion | /processReferences | input, consolidate | POST, PUT | multipart/form-data | application/xml | Extract all the references present in the input document and convert them into tei.xml format. Consolidate is a string of value 0 (no consolidation) or 1 (consolidate). |
| Parse/normalize data | /processDate | date | POST, PUT | application/x-www-form-urlencoded | application/xml | Parse a raw date and return the corresponding normalized date in ISO 8601 embedded in a TEI fragment. |
| Parse/normalize data | /processHeaderNames | names | POST, PUT | application/x-www-form-urlencoded | application/xml | Parse a raw sequence of names from a header section and return the corresponding normalized authors in TEI format. |
| Parse/normalize data | /processCitationNames | names | POST, PUT | application/x-www-form-urlencoded | application/xml | Parse a raw sequence of names from a bibliographical reference and return the corresponding normalized authors in TEI format. |
| Parse/normalize data | /processAffiliations | affiliations | POST, PUT | application/x-www-form-urlencoded | application/xml | Parse a raw sequence of affiliations and return the corresponding normalized affiliations with address in TEI format. |
| Parse/normalize data | /processCitations | citations, consolidate | POST, PUT | application/x-www-form-urlencoded | application/xml | Parse a raw citation and return the corresponding normalized citation in TEI format. Consolidate is a string of value 0 (no consolidation) or 1 (consolidate). |
| Citation extraction and normalization from patents | /processCitationPatentTEI | input, consolidate | POST, PUT | multipart/form-data | application/xml | Extract and parse the patent and non-patent citations in the description of a patent encoded in TEI. Results are added to the original document as TEI stand-off annotations. Consolidate is a string of value 0 (no consolidation) or 1 (consolidate). |
| Citation extraction and normalization from patents | /processCitationPatentST36 | input, consolidate | POST, PUT | multipart/form-data | application/xml | Extract and parse the patent and non-patent citations in the description of a patent encoded in ST.36. Results are returned as a list of TEI citations. Consolidate is a string of value 0 (no consolidation) or 1 (consolidate). |
| Citation extraction and normalization from patents | /processCitationPatentTXT | text, consolidate | POST, PUT | application/x-www-form-urlencoded | application/xml | Extract and parse the patent and non-patent citations in the description of a patent sent as UTF-8 text. Results are returned as a list of TEI citations. Consolidate is a string of value 0 (no consolidation) or 1 (consolidate). |
| Citation extraction and normalization from patents | /processCitationPatentPDF | input, consolidate | POST, PUT | multipart/form-data | application/xml | Extract and parse the patent and non-patent citations in the description of a patent sent as PDF. Results are returned as a list of TEI citations. Consolidate is a string of value 0 (no consolidation) or 1 (consolidate). |
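
For the resources that take application/x-www-form-urlencoded input, a client only has to url-encode the parameter listed in the table and POST it. The following Python sketch builds such a request with the standard library without sending it. The helper name build_form_request and the sample date are illustrative assumptions; the root URL is the local Tomcat address used elsewhere in this manual.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_form_request(root_url, path, fields):
    """Build (without sending) a POST request with url-encoded form fields."""
    body = urlencode(fields).encode("utf-8")
    return Request(
        root_url.rstrip("/") + path,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Parse a raw date string with /processDate (parameter name: date).
request = build_form_request(
    "http://localhost:8080", "/processDate", {"date": "21 July 2016"}
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen sends it to a running server; the response body should then be the XML described in the table.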

6. Examples with curl

Here are example command lines that call the Grobid service with curl. The server instance
is localhost, listening on port 8080.

• header extraction of a PDF file in the current directory:

> curl -v --form input=@./thefile.pdf localhost:8080/processHeaderDocument

• fulltext extraction of a PDF file in the current directory with consolidation of the header:

> curl -v --form consolidate=1 --form input=@./thefile.pdf localhost:8080/processFulltextDocument

• parsing of a raw reference string in isolation without consolidation (default value):

> curl -X POST -d "citations=Graff, Expert. Opin. Ther. Targets (2002) 6(1): 103-113" localhost:8080/processCitation

• extraction and parsing of all references in a PDF without consolidation (default value):

> curl -v --form input=@./thefile.pdf localhost:8080/processReferences
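
The --form calls above send a multipart/form-data body. For clients that cannot shell out to curl, the sketch below shows roughly what such a request contains, built with the Python standard library only. The function name build_multipart_request is hypothetical, the file name is inserted without escaping, and the request is built but not sent.

```python
import uuid
from urllib.request import Request


def build_multipart_request(url, pdf_name, pdf_bytes, consolidate="0"):
    """Build (without sending) a multipart/form-data POST carrying a PDF."""
    if consolidate not in ("0", "1"):
        raise ValueError("consolidate must be '0' or '1'")
    boundary = uuid.uuid4().hex
    # First part: the consolidate flag. Second part: the PDF, under the
    # field name "input" used in the curl examples.
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="consolidate"\r\n'
        "\r\n"
        f"{consolidate}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="input"; filename="{pdf_name}"\r\n'
        "Content-Type: application/pdf\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return Request(
        url,
        data=head + pdf_bytes + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


# Usage against a running server (not executed here):
#   from urllib.request import urlopen
#   with open("thefile.pdf", "rb") as f:
#       request = build_multipart_request(
#           "http://localhost:8080/processHeaderDocument", "thefile.pdf", f.read())
#   tei_xml = urlopen(request).read()
```

Unlike the first curl example, this sketch always sends the consolidate field; its default of 0 corresponds to the "without consolidation" default mentioned above.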


