pdfdata 0.1.3.2
{pdfdata}
Python package for extracting text and data from PDFs.
Installation
pip install pdfdata
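To check which release pip actually installed (this page lists 0.1.3.2), you can query the standard library's `importlib.metadata`. This is a minimal sketch. `installed_version` is a hypothetical helper name and is not part of pdfdata:

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str = "pdfdata") -> str | None:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # pip has not installed this distribution in the current environment
        return None
```

Calling `installed_version()` after `pip install pdfdata` should return the installed version string.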
Usage
from pdfdata import (
    parse_pdf,
    pdf_doc_extract_span_list,
    pdf_doc_extract_span_df,
    pdf_text_to_jsonnl,
)
from pprint import pprint
# parse the PDF into a dictionary, then extract its spans as a list
pdf_parsed = parse_pdf('pdfs/0641-20.pdf')
res = pdf_doc_extract_span_list(pdf_parsed)
pprint(res, depth=3)
# parse the PDF, then extract its spans in data-frame form
pdf_parsed = parse_pdf('pdfs/0641-20.pdf')
res = pdf_doc_extract_span_df(pdf_parsed)
pprint(res[0])
# write the PDF's text to a JSON Lines (.jsonnl) file
pdf_text_to_jsonnl('pdfs/0641-20.pdf', '0641-20.jsonnl')
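You can read the `.jsonnl` output back one line at a time. This is a minimal sketch. It assumes the file follows the JSON Lines convention (one JSON value per line, UTF-8 encoded). The fields that pdfdata writes into each record are not documented here, so the code only decodes them. `parse_jsonl_lines` and `read_jsonl` are hypothetical helper names:

```python
from __future__ import annotations

import json
from typing import Any, Iterable, Iterator


def parse_jsonl_lines(lines: Iterable[str]) -> Iterator[Any]:
    """Yield one decoded JSON value per non-blank line (JSON Lines convention)."""
    for line_no, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # tolerate blank lines, e.g. a trailing newline at end of file
        try:
            yield json.loads(text)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no}: not valid JSON") from exc


def read_jsonl(path: str) -> list[Any]:
    """Read every record from a JSON Lines file (assumed UTF-8) into a list."""
    with open(path, encoding="utf-8") as fh:
        return list(parse_jsonl_lines(fh))
```

For example, `records = read_jsonl('0641-20.jsonnl')` loads the file written by the call above. Keeping the parsing separate from the file handling lets you reuse it on any iterable of lines.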
DevNotes
build
python -m build
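By default, `python -m build` writes a source distribution (`.tar.gz`) and a wheel (`.whl`) into `dist/`. Files from earlier versions can accumulate there, so it can help to list exactly what one version produced before you upload. This is a sketch. `built_artifacts` is a hypothetical helper, and the filename patterns assume the standard `<name>-<version>.tar.gz` and `<name>-<version>-<tags>.whl` naming:

```python
from __future__ import annotations

from pathlib import Path


def built_artifacts(version: str, dist_dir: str = "dist") -> list[Path]:
    """List the sdist and wheel files in dist_dir that belong to one version."""
    dist = Path(dist_dir)
    if not dist.is_dir():
        return []  # nothing has been built yet
    # sdist names look like <name>-<version>.tar.gz (assumed standard naming)
    sdists = sorted(dist.glob(f"*-{version}.tar.gz"))
    # wheel names look like <name>-<version>-<python>-<abi>-<platform>.whl
    wheels = sorted(dist.glob(f"*-{version}-*.whl"))
    return sdists + wheels
```

If you pass the returned paths to twine in place of `dist/*`, twine uploads only the files for that version.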
TestPyPI upload
python -m twine upload --repository testpypi dist/* --skip-existing
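With `--skip-existing`, twine skips files that are already on the index, so a re-run can appear to succeed without publishing anything new. One way to see what TestPyPI serves is its JSON API at `https://test.pypi.org/pypi/<project>/json`. This is a sketch. `json_api_url` and `latest_version` are hypothetical helpers, and `info.version` reports only the version the index considers the project's latest:

```python
import json
from urllib.request import urlopen


def json_api_url(project: str, base_url: str = "https://test.pypi.org") -> str:
    """Build the JSON API URL for a project on a PyPI-compatible index."""
    return f"{base_url.rstrip('/')}/pypi/{project}/json"


def latest_version(project: str, base_url: str = "https://test.pypi.org") -> str:
    """Fetch the project's metadata over the network and return its latest version."""
    with urlopen(json_api_url(project, base_url), timeout=10) as resp:
        data = json.load(resp)
    return data["info"]["version"]
```

Once the upload has gone through, `latest_version("pdfdata")` should return `'0.1.3.2'`.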
For personal and professional use. You cannot resell or redistribute these repositories in their original state.