leaffocus 0.6.2
leaf-focus
Extract structured text from PDF files.
Install
Install from PyPI using pip:
pip install leaf-focus
Download the Xpdf command line tools and extract the executable files.
Provide the directory containing the executable files as --exe-dir.
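Before passing a directory as --exe-dir, it can help to confirm that the extracted executables are actually in it. The following is a minimal sketch, not part of leaf-focus: the helper name find_missing_tools is hypothetical, and the default tool names (pdfinfo, pdftotext, pdftopng) are an assumption about which Xpdf tools matter, so adjust the tuple to suit your setup.

```python
from pathlib import Path

# Assumption: these are the Xpdf tools of interest. They are real Xpdf
# command line tools, but which ones leaf-focus calls is not stated here.
ASSUMED_TOOLS = ("pdfinfo", "pdftotext", "pdftopng")


def find_missing_tools(exe_dir, tool_names=ASSUMED_TOOLS):
    """Return the tool names that have no matching file in exe_dir."""
    directory = Path(exe_dir)
    if not directory.is_dir():
        # A missing directory means every tool is missing.
        return list(tool_names)
    # Compare on the file stem so both 'pdfinfo' and 'pdfinfo.exe' match.
    present = {p.stem.lower() for p in directory.iterdir() if p.is_file()}
    return [name for name in tool_names if name.lower() not in present]
```

An empty list from find_missing_tools means every assumed tool was found; any returned names point to files that still need to be extracted into the directory.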
Usage
usage: leaf-focus [-h] [--version] --exe-dir EXE_DIR [--page-images] [--ocr]
                  [--first FIRST] [--last LAST]
                  [--log-level {debug,info,warning,error,critical}]
                  input_pdf output_dir

Extract structured text from a pdf file.

positional arguments:
  input_pdf             path to the pdf file to read
  output_dir            path to the directory to save the extracted text files

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --exe-dir EXE_DIR     path to the directory containing xpdf executable files
  --page-images         save each page of the pdf as a separate image
  --ocr                 run optical character recognition on each page of the
                        pdf
  --first FIRST         the first pdf page to process
  --last LAST           the last pdf page to process
  --log-level {debug,info,warning,error,critical}
                        the log level: debug, info, warning, error, critical
Examples
# Extract the pdf information and embedded text.
leaf-focus --exe-dir [path-to-xpdf-exe-dir] file.pdf file-pages
# Extract the pdf information, embedded text, an image of each page, and optical character recognition (OCR) results for each page.
leaf-focus --exe-dir [path-to-xpdf-exe-dir] file.pdf file-pages --ocr
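To call the command line tool from a script, for example to process only a page range, the documented options can be assembled into an argument list. This is a rough sketch using only the flags listed under Usage; the function names build_command and run_leaf_focus are hypothetical, and it assumes leaf-focus is on the PATH and exits with a non-zero status on failure.

```python
import subprocess


def build_command(exe_dir, input_pdf, output_dir, ocr=False,
                  page_images=False, first=None, last=None, log_level=None):
    """Build the leaf-focus argument list from the documented options."""
    cmd = ["leaf-focus", "--exe-dir", str(exe_dir)]
    if page_images:
        cmd.append("--page-images")
    if ocr:
        cmd.append("--ocr")
    if first is not None:
        cmd += ["--first", str(first)]
    if last is not None:
        cmd += ["--last", str(last)]
    if log_level is not None:
        cmd += ["--log-level", log_level]
    # The two positional arguments come last: input pdf, then output directory.
    cmd += [str(input_pdf), str(output_dir)]
    return cmd


def run_leaf_focus(*args, **kwargs):
    """Run leaf-focus; raises CalledProcessError on a non-zero exit status."""
    # Assumption: the 'leaf-focus' executable is installed and on the PATH.
    return subprocess.run(build_command(*args, **kwargs), check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when paths contain spaces.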
Dependencies
xpdf
keras-ocr
TensorFlow (can optionally run more efficiently on one or more GPUs)