Extract tables from PDF to CSV using Tabula (#2312)

* Add Tabula dependency and exclude slf4j-simple

- Add tabula-java dependency to extract tables into CSV.
- Exclude slf4j-simple due to Logback

* Add a flexible CSVWriter

- Add FlexibleCSVWriter which extends CSVWriter to pass a custom CSVFormat, as CSVWriter's parameterized constructor (that allows changing CSVFormat) is protected.

* Use Tabula in extracting tables from PDF

- Use Tabula in extracting tables from PDF instead of the existing implementation

* Delete PDFTableStripper as It is unneeded

- Delete PDFTableStripper as It is unneeded as Tabula-Java is used instead.

* Use correct class in ExtractCSVController logger

* Exclude gson and bcprov-jdk15on dependencies from tabula

- Exclude gson and bcprov-jdk15on from tabula-java due to detected security vulnerabilities.
This commit is contained in:
Omar Ahmed Hassan
2024-11-24 01:28:44 +02:00
committed by GitHub
parent faa8a9752c
commit afad06bed4
4 changed files with 43 additions and 419 deletions

View File

@@ -0,0 +1,16 @@
package stirling.software.SPDF.pdf;
import org.apache.commons.csv.CSVFormat;
import technology.tabula.writers.CSVWriter;
public class FlexibleCSVWriter extends CSVWriter {
public FlexibleCSVWriter() {
super();
}
public FlexibleCSVWriter(CSVFormat csvFormat) {
super(csvFormat);
}
}