docker and ocr updates

This commit is contained in:
Anthony Stirling
2023-12-10 22:02:30 +00:00
parent 8b55ffff96
commit 59c7978330
28 changed files with 100 additions and 110 deletions

View File

@@ -2,6 +2,9 @@
This document provides instructions on how to add additional language packs for the OCR tab in Stirling-PDF, both inside and outside of Docker.
## My OCR used to work and now doesnt!
Please update your tesseract docker volume path version from 4.00 to 5
## How does the OCR Work
Stirling-PDF uses [OCRmyPDF](https://github.com/ocrmypdf/OCRmyPDF) which in turn uses tesseract for its text recognition.
All credit goes to them for this awesome work!
@@ -18,7 +21,7 @@ Depending on your requirements, you can choose the appropriate language pack for
### Installing Language Packs
1. Download the desired language pack(s) by selecting the `.traineddata` file(s) for the language(s) you need.
2. Place the `.traineddata` files in the Tesseract tessdata directory: `/usr/share/tesseract-ocr/4.00/tessdata` (Debian) or `/usr/share/tesseract/tessdata` (Fedora)
2. Place the `.traineddata` files in the Tesseract tessdata directory: `/usr/share/tesseract-ocr/5/tessdata` (Debian) or `/usr/share/tesseract/tessdata` (Fedora)
# DO NOT REMOVE EXISTING ENG.TRAINEDDATA, IT'S REQUIRED.
@@ -34,14 +37,14 @@ services:
your_service_name:
image: your_docker_image_name
volumes:
- /location/of/trainingData:/usr/share/tesseract-ocr/4.00/tessdata
- /location/of/trainingData:/usr/share/tesseract-ocr/5/tessdata
```
#### Docker run
Add the following to your existing docker run command
```bash
-v /location/of/trainingData:/usr/share/tesseract-ocr/4.00/tessdata
-v /location/of/trainingData:/usr/share/tesseract-ocr/5/tessdata
```
#### Non-Docker