Extract tables from PDF to CSV using Tabula (#2312)
* Add Tabula dependency and exclude slf4j-simple - Add tabula-java dependency to extract tables into CSV. - Exclude slf4j-simple due to Logback * Add a flexible CSVWriter - Add FlexibleCSVWriter which extends CSVWriter to pass a custom CSVFormat, as CSVWriter's parameterized constructor (that allows changing CSVFormat) is protected. * Use Tabula in extracting tables from PDF - Use Tabula in extracting tables from PDF instead of the existing implementation * Delete PDFTableStripper as It is unneeded - Delete PDFTableStripper as It is unneeded as Tabula-Java is used instead. * Use correct class in ExtractCSVController logger * Exclude gson and bcprov-jdk15on dependencies from tabula - Exclude gson and bcprov-jdk15on from tabula-java due to detected security vulnerabilities.
This commit is contained in:
committed by
GitHub
parent
faa8a9752c
commit
afad06bed4
@@ -0,0 +1,16 @@
|
||||
package stirling.software.SPDF.pdf;
|
||||
|
||||
import org.apache.commons.csv.CSVFormat;
|
||||
|
||||
import technology.tabula.writers.CSVWriter;
|
||||
|
||||
public class FlexibleCSVWriter extends CSVWriter {
|
||||
|
||||
public FlexibleCSVWriter() {
|
||||
super();
|
||||
}
|
||||
|
||||
public FlexibleCSVWriter(CSVFormat csvFormat) {
|
||||
super(csvFormat);
|
||||
}
|
||||
}
|
||||
Reference in New Issue
Block a user