OCR4all Setup Guide
Preparation

• You have to prepare the following folder structure (or simply download it from https://github.com/OCR4all/getting_started):

    ocr4all     (main folder)
      data      (folder for the documents you want to recognize)
      models    (folder for the OCR models)

  This structure can provisionally be created or downloaded anywhere on your system. However, depending on your system (Linux, Windows, macOS), it may be advisable to move it later; see below.

Choosing the correct Docker Version

• You will need the Community Edition of Docker for the installation.
• If you are able to choose between several operating systems, or are willing to set one up just for OCR4all, we strongly recommend using Linux over Windows.
• Linux: https://docs.docker.com/install/ (choose your distribution in the left menu and follow the installation instructions)
• Windows:
  o There are two ways of using Docker on Windows: Docker for Windows (recommended) and the Docker Toolbox.
  o Docker for Windows: requires Windows 10, 64 bit: Pro, Enterprise or Education (can be looked up under System Information). https://docs.docker.com/docker-for-windows/release-notes/ (do not choose “Download Docker for Windows” right away; instead use “Download” under the “Stable Releases” section below to skip registration)
  o Docker Toolbox for other (older) versions of Windows: https://docs.docker.com/toolbox/toolbox_install_windows/
• macOS:
  o Similar to Windows, there is Docker for Mac and the Docker Toolbox.
  o Docker for Mac: https://docs.docker.com/docker-for-mac/ (do not choose “Download Docker for Mac” right away; instead use “Download” under the “Stable Releases” section below to skip registration)
  o Docker Toolbox: https://docs.docker.com/docker-for-mac/docker-toolbox/ for models older than 2010 (< macOS Sierra 10.12). This variant is not covered in this guide.

Below you will find three separate guides: one for Linux and macOS, one for Docker for Windows, and one for the Docker Toolbox (on Windows).
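The folder structure from the Preparation section can also be created from a terminal. The following is a minimal sketch for a Bash-compatible shell (Linux, macOS, or a Bash terminal on Windows). The variable name OCR4ALL_HOME and its default location are assumptions chosen for illustration and are not part of OCR4all.

```shell
#!/usr/bin/env bash
# Sketch: create the empty OCR4all folder structure described in "Preparation".
# OCR4ALL_HOME is a hypothetical variable; point it at any location you like.
set -eu

OCR4ALL_HOME="${OCR4ALL_HOME:-$PWD/ocr4all}"

# data   : folder for the documents you want to recognize
# models : folder for the OCR models
mkdir -p "$OCR4ALL_HOME/data" "$OCR4ALL_HOME/models"

echo "Created OCR4all folders under: $OCR4ALL_HOME"
ls -1 "$OCR4ALL_HOME"
```

Note that folders created this way are empty. The demo books mentioned in the guides presumably come with the download from the getting_started repository, so the download is likely the better choice for a first setup.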
You can copy the terminal commands without line breaks from the accompanying file calls.txt. If you have any questions or remarks, or run into any problems, please do not hesitate to contact us by mail (christian.reul@uni-wuerzburg.de, maximilan.wehner@uni-wuerzburg.de) or to open an issue on GitHub!

Linux and macOS

Docker Setup

• Follow the instructions under https://docs.docker.com/install/ …
• … and appreciate that everything works without further adjustments!

OCR4all Setup

• The OCR4all folder structure detailed above can be located anywhere you want.
• Open a terminal inside the OCR4all folder and pull the OCR4all image using the following command (this will take a few minutes and requires a stable internet connection):

    sudo docker pull ls6uniwue/ocr4all

• Create the OCR4all container using the following command:

    sudo docker run -p 1476:8080 -p 5000:5000 \
        -u `id -u root`:`id -g $USER` --name ocr4all \
        -v $PWD/data:/var/ocr4all/data \
        -v $PWD/models:/var/ocr4all/models/custom \
        -it ls6uniwue/ocr4all

  (Once again, this may take a while.)

Browser Access and further use

• OCR4all is optimized for Chrome/Chromium.
• Browser access: http://localhost:1476/OCR4all_Web/
• In the browser tool, check Project Overview → Project selection: if you can find the two demo books called “Cirurgia” and “GNM”, the mapping (-v $PWD/data:/…) is working properly. Otherwise, it's likely that there was a typo in the “docker run” command and you have to create the container again. First, delete the container you just created: stop the process in the terminal using CTRL+C, then type:

    sudo docker rm ocr4all

  Check and correct your command, especially the “-v $PWD/data:/…” parts, then run it again.
• If everything is set up properly, you can (and should!) start OCR4all in the future by using:

    sudo docker start -ia ocr4all

Docker for Windows

Docker Setup

• Follow the installation guide under https://docs.docker.com/docker-for-windows/release-notes/.
  Make sure to grant all required permissions, install all additional drivers, etc.
• Start Docker.
• Adjust the Docker settings (right-click the Docker symbol in the hidden bottom-right toolbar, then choose Settings):
  o Shared Drives: choose a drive (or partition). You will need at least one. Our recommendation is to simply use C. Click Apply. (Attention: this requires a valid, non-empty Windows password. Changing or removing the password later results in a silent removal of your Docker privileges!)
  o Advanced: adjust CPUs (max) and Memory (2GB+) if you want to.

OCR4all Setup

• Move the OCR4all folder structure detailed above to a shared drive (or partition). In the following example, we use “C:\Users\Public\ocr4all\...”. We strongly recommend using the same location for the first setup.
• Inside the OCR4all folder, open PowerShell (Shift+right-click inside the OCR4all folder → Open PowerShell window here) and pull the OCR4all image using the following command (this will take a few minutes and requires a stable internet connection):

    docker pull ls6uniwue/ocr4all

• Create the OCR4all container using the following command (note: this works only for the recommended setup, i.e. when the ocr4all folder is located in “C:\Users\Public\...”):

    docker run -p 1476:8080 -p 5000:5000 --name ocr4all -v C:\Users\Public\ocr4all\data:/var/ocr4all/data -v C:\Users\Public\ocr4all\models:/var/ocr4all/models/custom -it ls6uniwue/ocr4all

  Alternatively, you will have to adjust the two host paths (the “C:\Users\Public\ocr4all\...” parts of the -v options, printed in bold in the original guide).
  o Use absolute paths and autocompletion!
  o It is recommended not to use the print working directory (PWD) in this case.

Browser Access and further use

• OCR4all is optimized for Chrome/Chromium.
• Browser access: http://localhost:1476/OCR4all_Web/
• In the browser, check Project Overview → Project selection: if you can find the two preloaded books called “Cirurgia” and “GNM”, the mapping (-v C:\Users\...) is working properly.
  Otherwise, there might be a typo in the “docker run” command and you have to create the container again. First, delete the container you just created: stop the process in PowerShell using CTRL+C, then type:

    docker rm ocr4all

  Check and correct your command (as with most terminals, you can step through your previous commands using the arrow keys), especially the two “-v C:\Users\...” parts, then run it again.
• If everything is set up properly, you can (and should!) start OCR4all in the future by using:

    docker start -ia ocr4all

Docker Toolbox

Docker Setup

• Follow the installation guide under https://docs.docker.com/toolbox/toolbox_install_windows/. Make sure to grant all required permissions, install all additional drivers, etc.
• Start the Docker Quickstart Terminal and wait for all processes to finish (grant the required permissions; this requires a stable internet connection).
• Close the Docker Quickstart Terminal.
• Open Oracle VM VirtualBox.
  o Right-click on “default” → Close → Turn Off.
  o Click on “default” → Change → System → adjust CPUs (almost max) and memory (2GB+) if you want to → OK.
  o It is possible to share additional drives (partitions); however, this is quite complicated and is neither recommended nor explained further here.
• Restart the Docker Quickstart Terminal.

OCR4all Setup

• Move the OCR4all folder structure detailed above into a folder under C:\Users. In the following example, we use “C:\Users\Public\ocr4all\...”. We strongly recommend using the same location.
• Inside the OCR4all folder, open PowerShell (Shift+right-click inside the OCR4all folder → Open PowerShell window here) and pull the OCR4all image using the following command (this will take a few minutes and requires a stable internet connection):

    docker pull ls6uniwue/ocr4all

• Open the Docker Quickstart Terminal again and create the OCR4all container using the following command (note: this only works for the recommended setup, i.e. if the ocr4all folder is located in C:\Users\Public\...):
    docker run -p 1476:8080 -p 5000:5000 --name ocr4all -v /c/Users/Public/ocr4all/data:/var/ocr4all/data -v /c/Users/Public/ocr4all/models:/var/ocr4all/models/custom -it ls6uniwue/ocr4all

  Alternatively, you have to adjust the two host paths (the “/c/Users/Public/ocr4all/...” parts of the -v options, printed in bold in the original guide).
  o Use absolute paths and autocompletion!
  o It is recommended not to use the print working directory (PWD) in this case.

Browser Access and further use

• OCR4all is optimized for Chrome/Chromium.
• Browser access: http://192.168.99.100:1476/OCR4all_Web/
• In the browser, check Project Overview → Project selection: if you can find the two preloaded books called “Cirurgia” and “GNM”, the mapping (-v …) is working properly. Otherwise, it's likely that there was a typo in the “docker run” command, so you will have to create the container again. First, delete the container you just created: stop the process in the Docker Quickstart Terminal using CTRL+C, then type:

    docker rm ocr4all

  Check and correct your command, especially the two “-v …” parts, then run it again.
• If everything is set up properly, you can (and should!) restart OCR4all in the future by using:

    docker start -ia ocr4all
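Since typos in the two -v paths are the most likely source of errors, it can help to generate the “docker run” command instead of typing it. The following is a minimal sketch for a Bash-compatible shell such as the Docker Quickstart Terminal. The helper name build_run_cmd is hypothetical and not part of Docker or OCR4all; the sketch mirrors the Docker Toolbox command above (without the -u option of the Linux variant) and assumes a folder path without spaces.

```shell
#!/usr/bin/env bash
# Sketch: print the "docker run" command from this guide for a given OCR4all folder.
# build_run_cmd is a hypothetical helper, not part of Docker or OCR4all.
set -eu

build_run_cmd() {
  # $1: absolute path of the OCR4all main folder as Docker sees it,
  #     e.g. /c/Users/Public/ocr4all in the Docker Quickstart Terminal.
  base="${1%/}"   # drop one trailing slash, if present
  printf 'docker run -p 1476:8080 -p 5000:5000 --name ocr4all -v %s/data:/var/ocr4all/data -v %s/models:/var/ocr4all/models/custom -it ls6uniwue/ocr4all\n' "$base" "$base"
}

# Print the command for the recommended location; copy it into the terminal
# once both -v paths look correct.
build_run_cmd "/c/Users/Public/ocr4all"
```

Printing the command rather than executing it directly lets you check both host paths before the container is created, which avoids the “docker rm ocr4all” round trip described above.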