docker and ocr updates
This commit is contained in:
@@ -2,6 +2,9 @@
|
||||
|
||||
This document provides instructions on how to add additional language packs for the OCR tab in Stirling-PDF, both inside and outside of Docker.
|
||||
|
||||
## My OCR used to work and now doesnt!
|
||||
Please update your tesseract docker volume path version from 4.00 to 5
|
||||
|
||||
## How does the OCR Work
|
||||
Stirling-PDF uses [OCRmyPDF](https://github.com/ocrmypdf/OCRmyPDF) which in turn uses tesseract for its text recognition.
|
||||
All credit goes to them for this awesome work!
|
||||
@@ -18,7 +21,7 @@ Depending on your requirements, you can choose the appropriate language pack for
|
||||
### Installing Language Packs
|
||||
|
||||
1. Download the desired language pack(s) by selecting the `.traineddata` file(s) for the language(s) you need.
|
||||
2. Place the `.traineddata` files in the Tesseract tessdata directory: `/usr/share/tesseract-ocr/4.00/tessdata` (Debian) or `/usr/share/tesseract/tessdata` (Fedora)
|
||||
2. Place the `.traineddata` files in the Tesseract tessdata directory: `/usr/share/tesseract-ocr/5/tessdata` (Debian) or `/usr/share/tesseract/tessdata` (Fedora)
|
||||
|
||||
# DO NOT REMOVE EXISTING ENG.TRAINEDDATA, IT'S REQUIRED.
|
||||
|
||||
@@ -34,14 +37,14 @@ services:
|
||||
your_service_name:
|
||||
image: your_docker_image_name
|
||||
volumes:
|
||||
- /location/of/trainingData:/usr/share/tesseract-ocr/4.00/tessdata
|
||||
- /location/of/trainingData:/usr/share/tesseract-ocr/5/tessdata
|
||||
```
|
||||
|
||||
|
||||
#### Docker run
|
||||
Add the following to your existing docker run command
|
||||
```bash
|
||||
-v /location/of/trainingData:/usr/share/tesseract-ocr/4.00/tessdata
|
||||
-v /location/of/trainingData:/usr/share/tesseract-ocr/5/tessdata
|
||||
```
|
||||
|
||||
#### Non-Docker
|
||||
|
||||
Reference in New Issue
Block a user