leaf-focus 0.6.2

Creator: rpa-with-ash

Description:

Extract structured text from pdf files.
Install
Install from PyPI using pip:
pip install leaf-focus

Download the Xpdf command line tools and extract the executable files.
Provide the directory containing the executable files as --exe-dir.
Usage
usage: leaf-focus [-h] [--version] --exe-dir EXE_DIR [--page-images] [--ocr]
                  [--first FIRST] [--last LAST]
                  [--log-level {debug,info,warning,error,critical}]
                  input_pdf output_dir

Extract structured text from a pdf file.

positional arguments:
  input_pdf             path to the pdf file to read
  output_dir            path to the directory to save the extracted text files

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --exe-dir EXE_DIR     path to the directory containing xpdf executable files
  --page-images         save each page of the pdf as a separate image
  --ocr                 run optical character recognition on each page of the
                        pdf
  --first FIRST         the first pdf page to process
  --last LAST           the last pdf page to process
  --log-level {debug,info,warning,error,critical}
                        the log level: debug, info, warning, error, critical
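
When leaf-focus is called from a script, the argument list described in the usage text can be assembled in Python and passed to subprocess. This is a minimal sketch, assuming the leaf-focus executable is on the PATH. The helper names build_leaf_focus_command and run_leaf_focus are hypothetical and are not part of the package; only flags listed in the usage text are used.

```python
import subprocess
from typing import List, Optional

# Log levels copied from the --log-level choices in the usage text.
LOG_LEVELS = ("debug", "info", "warning", "error", "critical")


def build_leaf_focus_command(
    exe_dir: str,
    input_pdf: str,
    output_dir: str,
    page_images: bool = False,
    ocr: bool = False,
    first: Optional[int] = None,
    last: Optional[int] = None,
    log_level: Optional[str] = None,
) -> List[str]:
    """Build the leaf-focus argument list (hypothetical helper, not part of the package)."""
    if log_level is not None and log_level not in LOG_LEVELS:
        raise ValueError(f"log_level must be one of {LOG_LEVELS}")

    # --exe-dir is the only required option.
    command = ["leaf-focus", "--exe-dir", str(exe_dir)]
    if page_images:
        command.append("--page-images")
    if ocr:
        command.append("--ocr")
    if first is not None:
        command += ["--first", str(first)]
    if last is not None:
        command += ["--last", str(last)]
    if log_level is not None:
        command += ["--log-level", log_level]

    # The two positional arguments come last.
    command += [str(input_pdf), str(output_dir)]
    return command


def run_leaf_focus(command: List[str]) -> None:
    """Run the command and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(command, check=True)
```

For example, build_leaf_focus_command("xpdf-bin", "file.pdf", "file-pages", ocr=True) produces the same arguments as the OCR example in the Examples section, with "xpdf-bin" standing in for the Xpdf directory.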

Examples
# Extract the pdf information and embedded text.
leaf-focus --exe-dir [path-to-xpdf-exe-dir] file.pdf file-pages

# Extract the pdf information, embedded text, an image of each page, and Optical Character Recognition results of each page.
leaf-focus --exe-dir [path-to-xpdf-exe-dir] file.pdf file-pages --ocr
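
The examples above process one file at a time. For a folder of PDFs, the same command can be repeated per file. The following is only a sketch: plan_batch is a hypothetical helper, and the "<name>-pages" output directory naming simply mirrors the file.pdf / file-pages pattern in the examples rather than anything the tool requires.

```python
from pathlib import Path
from typing import List


def plan_batch(
    pdf_dir: str, output_root: str, exe_dir: str, ocr: bool = False
) -> List[List[str]]:
    """Return one leaf-focus command per .pdf file in pdf_dir (hypothetical helper)."""
    commands = []
    # Sort so the processing order is stable between runs.
    for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
        # Assumed naming convention: file.pdf -> <output_root>/file-pages
        output_dir = Path(output_root) / f"{pdf_path.stem}-pages"
        command = [
            "leaf-focus",
            "--exe-dir",
            str(exe_dir),
            str(pdf_path),
            str(output_dir),
        ]
        if ocr:
            command.append("--ocr")
        commands.append(command)
    return commands
```

Each returned list can be passed to subprocess.run(command, check=True). Building the commands first makes it easy to print and review them before starting a long OCR run.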

Dependencies

xpdf
keras-ocr
TensorFlow (can optionally run more efficiently using one or more GPUs)

License

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
