Removal of Ghostscript to use qpdf and tesseract directly (#2338)

* navbar fix multi tool and compress location

* release notes and ghostscript removal

* cleanups

* formatting

* update docs

* more

* more

* docs

* release bump

* Hardening suggestions for Stirling-PDF / ghostscript (#2339)

* Protect `readLine()` against DoS

* Sanitized user-provided file names in HTTP multipart uploads

---------

Co-authored-by: pixeebot[bot] <104101892+pixeebot[bot]@users.noreply.github.com>

---------

Co-authored-by: pixeebot[bot] <104101892+pixeebot[bot]@users.noreply.github.com>
This commit is contained in:
Anthony Stirling
2024-11-26 20:50:35 +00:00
committed by GitHub
parent 654bc94d44
commit 833b3c45c6
69 changed files with 1106 additions and 665 deletions

View File

@@ -68,7 +68,7 @@ nix-env -iA nixpkgs.jbig2enc
### Step 3: Install Additional Software
Next we need to install LibreOffice for conversions, ocrmypdf for OCR, and OpenCV for pattern recognition functionality.
Next we need to install LibreOffice for conversions, qpdf for OCR, and OpenCV for pattern recognition functionality.
Install the following software:
@@ -81,27 +81,27 @@ Install the following software:
- unoconv
- pngquant
- unpaper
- ocrmypdf
- qpdf
- opencv-python-headless
For Debian-based systems, you can use the following command:
```bash
sudo apt-get install -y libreoffice-writer libreoffice-calc libreoffice-impress unpaper ocrmypdf
sudo apt-get install -y libreoffice-writer libreoffice-calc libreoffice-impress unpaper qpdf
pip3 install uno opencv-python-headless unoconv pngquant WeasyPrint --break-system-packages
```
For Fedora:
```bash
sudo dnf install -y libreoffice-writer libreoffice-calc libreoffice-impress unpaper ocrmypdf
sudo dnf install -y libreoffice-writer libreoffice-calc libreoffice-impress unpaper qpdf
pip3 install uno opencv-python-headless unoconv pngquant WeasyPrint
```
For Nix:
```bash
nix-env -iA nixpkgs.unpaper nixpkgs.libreoffice nixpkgs.ocrmypdf nixpkgs.poppler_utils
nix-env -iA nixpkgs.unpaper nixpkgs.libreoffice nixpkgs.qpdf nixpkgs.poppler_utils
pip3 install uno opencv-python-headless unoconv pngquant WeasyPrint
```
@@ -146,7 +146,6 @@ The easiest method is to use the language packs provided by your repositories. S
1. Download the desired language pack(s) by selecting the `.traineddata` file(s) for the language(s) you need.
2. Place the `.traineddata` files in the Tesseract tessdata directory: `/usr/share/tessdata`
3. Please view [OCRmyPDF install guide](https://ocrmypdf.readthedocs.io/en/latest/installation.html) for more info.
**IMPORTANT:** DO NOT REMOVE EXISTING `eng.traineddata`, IT'S REQUIRED.