Grobid Service Manual
GROBID manual

Contents
1. Overview
2. Build and set up environment for local deployment
3. Build and set up environment for remote deployment
4. Use of grobid-service console
5. grobid-service REST API
6. Examples with curl

Authors: Damien Ridereau, Patrice Lopez
Contact: patrice.lopez@inria.fr

1. Overview

The grobid-service project is a RESTful service implementation for accessing the grobid system. grobid-service is an open source project under the Apache License 2.0. It is packaged as a war file for deployment in a web container, e.g. Tomcat. The project also contains the grobid-core libraries, which do the extraction work.

2. Build and set up environment for local deployment

To build grobid for local deployment, go to the root of the project and run the following command:

    mvn clean install

Then deploy the generated war to the server. The artifact is in:

    grobid-service/target/grobid-service-.war

3. Build and set up environment for remote deployment

3.1. Logs

Grobid uses Apache log4j as its logging library. By default, the logs are written to a file grobid.log in the directory from which the application is launched. This is not suitable for a production deployment.
To set the path and filename for logging, edit the file grobid/grobid-core/src/resources/log4j.xml and change the line that defines the log file according to your production logging policy. You can indicate the desired log path there, for instance one suited to a Tomcat installation. Make sure that Tomcat or JBoss has write permission on the indicated log path.

3.2. Parameters set up

In grobid-service- .war, the file web.xml has 3 parameters to set before starting the server:

    org.grobid.property: path to grobid.properties
    org.grobid.property.service: path to grobid_service.properties
    org.grobid.home: path to grobid_home

These properties are filled by the variables _GROBID_PROPERTY, _GROBID_SERVICE_PROPERTY and _GROBID_HOME, so that a script can fill in the values for a given environment. It is also possible to set these variables manually before building the war artifact.

3.3. Build

To build grobid for remote deployment, go to the root of the project and run the following command:

    mvn clean install -PgenericBuild

It generates 2 artifacts, one in grobid-home and one in grobid-service:

    grobid-home/target/grobid-home- .zip
    grobid-service/target/grobid-service- .war

Copy these 2 artifacts to your remote server. grobid-home- .zip contains the required native libraries, the models, the lexicons and a config directory that contains 2 properties files, grobid.properties and grobid_service.properties. Unzip grobid-home wherever you want on your server:

    unzip grobid-home- .zip

4. Use of grobid-service console

The welcome page is available at http:// / (e.g. for a local Tomcat, http://localhost:8080/ ). From there you can open the About page (Fig 4.1), run conversions from the "Test Rest Interface" page (Fig 4.2) and access the administration parameters contained in grobid.properties and grobid_service.properties (Fig 4.3).

Fig 4.1: About
Fig 4.2: Test Rest Interface
Fig 4.3: Service administration

The web page "Test Rest Interface" (Fig.
4.2) allows you to test the different REST requests quickly and easily. For a look at the code, the GrobidRestService class is the entry point for each REST service of Grobid.

5. grobid-service REST API

The listing below shows the resources provided by grobid-service and the corresponding HTTP verbs. All URLs described below are relative paths; the root URL is http:// / .

Where a resource takes a consolidate parameter, it is a string of value 0 (no consolidation) or 1 (consolidate).

Type of request: Administration

  URL: /admin
    Request: POST, application/x-www-form-urlencoded, parameter sha1; or GET /admin?sha1= (String)
    Response type: text/html
    Description: Request to get the parameters of grobid.properties and grobid_service.properties formatted as an HTML table.

  URL: /sha1
    Request: POST, application/x-www-form-urlencoded, parameter sha1; or GET /sha1?sha1= (String)
    Response type: text/html
    Description: Request to get an input string hashed using SHA-1.

  URL: /allProperties
    Request: POST, application/x-www-form-urlencoded, parameter sha1; or GET /allProperties?sha1= (String)
    Response type: text/xml
    Description: Request to get all properties as XML. For each property, the returned XML gives its key, value and type.

  URL: /changePropertyValue
    Request: POST, application/x-www-form-urlencoded, parameter xml; or GET /changePropertyValue?xml= (String)
    Response type: text/xml
    Description: Change the value of the property whose key is passed in the XML input. The XML input carries the fields pwd, key, value and type.

Type of request: General

  URL: /grobid
    Request: GET, no parameter (N/A), no request input type (N/A)
    Response type: text/html
    Description: Gives a very brief description of grobid.

Type of request: PDF to tei.xml conversion

  URL: /processHeaderDocument
    Request: POST or PUT, multipart/form-data, parameters input, consolidate
    Response type: application/xml
    Description: Extract the header of the input PDF document, normalize it and convert it into TEI format.

  URL: /processFulltextDocument
    Request: POST or PUT, multipart/form-data, parameters input, consolidate
    Response type: application/xml
    Description: Convert the complete input document into tei.xml format (header, body and bibliographical section).

  URL: /processFulltextAssetDocument
    Request: POST or PUT, multipart/form-data, parameters input, consolidate
    Response type: application/zip
    Description: Convert the complete input document into tei.xml format (header, body and bibliographical section). The result is a ZIP archive containing the TEI fulltext and the embedded images (the document assets) converted to PNG.

  URL: /processReferences
    Request: POST or PUT, multipart/form-data, parameters input, consolidate
    Response type: application/xml
    Description: Extract all the references present in the input document and convert them into tei.xml format.

Type of request: Parse/normalize data

  URL: /processDate
    Request: POST or PUT, application/x-www-form-urlencoded, parameter date
    Response type: application/xml
    Description: Parse a raw date and return the corresponding normalized date in ISO 8601, embedded in a TEI fragment.

  URL: /processHeaderNames
    Request: POST or PUT, application/x-www-form-urlencoded, parameter names
    Response type: application/xml
    Description: Parse a raw sequence of names from a header section and return the corresponding normalized authors in TEI format.

  URL: /processCitationNames
    Request: POST or PUT, application/x-www-form-urlencoded, parameter names
    Response type: application/xml
    Description: Parse a raw sequence of names from a bibliographical reference and return the corresponding normalized authors in TEI format.

  URL: /processAffiliations
    Request: POST or PUT, application/x-www-form-urlencoded, parameter affiliations
    Response type: application/xml
    Description: Parse a raw sequence of affiliations and return the corresponding normalized affiliations with address in TEI format.

  URL: /processCitations
    Request: POST or PUT, application/x-www-form-urlencoded, parameters citations, consolidate
    Response type: application/xml
    Description: Parse a raw citation and return the corresponding normalized citation in TEI format.

Type of request: Citation extraction and normalization from patents

  URL: /processCitationPatentTEI
    Request: POST or PUT, multipart/form-data, parameters input, consolidate
    Response type: application/xml
    Description: Extract and parse the patent and non-patent citations in the description of a patent encoded in TEI. Results are added to the original document as TEI stand-off annotations.

  URL: /processCitationPatentST36
    Request: POST or PUT, multipart/form-data, parameters input, consolidate
    Response type: application/xml
    Description: Extract and parse the patent and non-patent citations in the description of a patent encoded in ST.36. Results are returned as a list of TEI citations.

  URL: /processCitationPatentTXT
    Request: POST or PUT, application/x-www-form-urlencoded, parameters text, consolidate
    Response type: application/xml
    Description: Extract and parse the patent and non-patent citations in the description of a patent sent as UTF-8 text. Results are returned as a list of TEI citations.

  URL: /processCitationPatentPDF
    Request: POST or PUT, multipart/form-data, parameters input, consolidate
    Response type: application/xml
    Description: Extract and parse the patent and non-patent citations in the description of a patent sent as PDF. Results are returned as a list of TEI citations.

6. Examples with curl

Here are example command lines calling the Grobid service with curl. The server instance here is localhost, on port 8080.

• header extraction of a PDF file in the current directory:
> curl -v --form input=@./thefile.pdf localhost:8080/processHeaderDocument

• fulltext extraction of a PDF file in the current directory with consolidation of the header:
> curl -v --form consolidate=1 --form input=@./thefile.pdf localhost:8080/processFulltextDocument

• parsing of a raw reference string in isolation without consolidation (default value):
> curl -X POST -d "citations=Graff, Expert. Opin. Ther. Targets (2002) 6(1): 103-113" localhost:8080/processCitation

• extraction and parsing of all references in a PDF without consolidation (default value):
> curl -v --form input=@./thefile.pdf localhost:8080/processReferences
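The curl calls above can also be issued from a script. The following is a minimal sketch in Python, using only the standard library and assuming a grobid-service instance at http://localhost:8080 as in the curl examples. The helper names build_form_request, build_multipart_request and send are hypothetical and not part of Grobid. The sketch only shows how the two request input types of the REST API (application/x-www-form-urlencoded and multipart/form-data) can be assembled.

```python
import uuid
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: same host and port as in the curl examples.
GROBID_ROOT = "http://localhost:8080"


def build_form_request(path, fields, root=GROBID_ROOT):
    """POST with an application/x-www-form-urlencoded body (text resources)."""
    return Request(
        root.rstrip("/") + path,
        data=urlencode(fields).encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def build_multipart_request(path, pdf_bytes, filename="thefile.pdf",
                            consolidate="0", root=GROBID_ROOT):
    """POST with a multipart/form-data body carrying 'consolidate' and 'input'."""
    # A random boundary makes a clash with the file content very unlikely.
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    file_header = (
        'Content-Disposition: form-data; name="input"; filename="%s"' % filename
    )
    lines = [
        delimiter,
        b'Content-Disposition: form-data; name="consolidate"',
        b"",
        consolidate.encode("ascii"),
        delimiter,
        file_header.encode("utf-8"),
        b"Content-Type: application/pdf",
        b"",
        pdf_bytes,
        delimiter + b"--",  # closing delimiter
        b"",
    ]
    return Request(
        root.rstrip("/") + path,
        data=b"\r\n".join(lines),
        headers={"Content-Type": "multipart/form-data; boundary=" + boundary},
        method="POST",
    )


def send(request, timeout=120):
    """Send a prepared request and return the raw response bytes."""
    with urlopen(request, timeout=timeout) as response:
        return response.read()
```

With a running service, the header extraction example would then correspond to reading thefile.pdf as bytes and calling send(build_multipart_request("/processHeaderDocument", pdf_bytes)), and a date could be parsed with send(build_form_request("/processDate", {"date": "July 21, 2016"})). Building the request separately from sending it keeps the body and headers easy to inspect without a server.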
Document metadata: title grobid-service; created 2016-07-21 15:47 (+02:00) with LibreOffice 4.2 Writer; PDF version 1.4; 9 pages; language en-US.