pandadoc 0.1.0
pandadoc: lightweight pandoc wrapper
An extremely lightweight pandoc wrapper for Python 3.8+.
Its features:
Supports conversion between all formats that pandoc supports -
markdown, HTML, LaTeX, Word, epub, pdf (output),
and more.
Output to raw bytes (binary formats - e.g. PDF), to str objects
(text formats - e.g. markdown), or to file (any format).
pandoc errors are raised as (informative) exceptions.
Full flexibility of the pandoc command-line tool, and the same syntax. (See the
pandoc manual for more information.)
Getting Started Guide
Installation
First, ensure pandoc is on your PATH.
(In other words, install pandoc and add it to
your PATH.)
Then install pandadoc from PyPI:
$ python -m pip install pandadoc
That’s it.
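If you want to confirm the first step from Python before calling pandadoc, a minimal sketch using only the standard library might look like the following. The helper name find_pandoc is hypothetical and is not part of pandadoc.

```python
import shutil
from typing import Optional


def find_pandoc(executable: str = "pandoc") -> Optional[str]:
    """Return the full path of the pandoc executable, or None if it is not on PATH.

    Hypothetical helper for illustration; not part of pandadoc.
    """
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_pandoc()
    if path is None:
        print("pandoc was not found on PATH; install it first")
    else:
        print(f"pandoc found at {path}")
```

If the function returns None, pandadoc will not be able to start pandoc either, so fix your PATH first.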
Usage
Convert a webpage to markdown, and store it as a Python str:
>>> import pandadoc
>>> input_url = "https://example.com/"
>>> example_md = pandadoc.call_pandoc(
...     options=["-t", "markdown"], files=[input_url]
... )
>>> print(example_md)
<div>
# Example Domain
This domain is for use in illustrative examples in documents.
...
Now convert the markdown to RTF, and write it to a file:
>>> rtf_output_file = "example.rtf"
>>> pandadoc.call_pandoc(
...     options=["-f", "markdown", "-t", "rtf", "-o", rtf_output_file],
...     input_text=example_md,
... )
''
Notice that call_pandoc returns an empty string '' when a file output is used.
Looking at the output file:
{\pard \ql \f0 \sa180 \li0 \fi0 \outlinelevel0 \b \fs36 Example Domain\par}
{\pard \ql \f0 \sa180 \li0 \fi0 This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.\par}
{\pard \ql \f0 \sa180 \li0 \fi0 {\field{\*\fldinst{HYPERLINK "https://www.iana.org/domains/example"}}{\fldrslt{\ul
More information...
}}}
\par}
Convert this RTF document to PDF, using xelatex with a custom font,
and store the result as raw bytes:
>>> raw_pdf = pandadoc.call_pandoc(
...     options=["-f", "rtf", "-t", "pdf", "--pdf-engine", "xelatex", "--variable", "mainfont=Palatino"],
...     files=[rtf_output_file],
...     decode=False,
... )
Note that PDF conversion requires a
“PDF engine”
(e.g. pdflatex, latexmk etc.) to be installed.
Now you can send those raw bytes over a network, or write them to a file:
>>> with open("example.pdf", "wb") as f:
...     f.write(raw_pdf)
...
>>> # Finished
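Before sending or saving the bytes, it can be useful to sanity-check that the result really is a PDF. A minimal sketch, relying on the fact that PDF files begin with the marker %PDF-; the helper name looks_like_pdf is hypothetical and is not part of pandadoc:

```python
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the bytes start with the PDF header marker.

    Hypothetical helper for illustration; a cheap check, not full validation.
    """
    return data.startswith(PDF_MAGIC)


# Stand-in data, so the sketch runs without pandoc installed.
sample_pdf_header = b"%PDF-1.5\n"
sample_rtf_header = b"{\\rtf1\\ansi"

print(looks_like_pdf(sample_pdf_header))
print(looks_like_pdf(sample_rtf_header))
```

In the session above, you would pass raw_pdf to the helper instead of the stand-in data.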
You can find more examples in the pandoc documentation.
Exceptions
If pandoc exits with an error, an appropriate exception is raised (based on the
exit code):
>>> pandadoc.call_pandoc(
...     options=["-f", "markdown", "-t", "zzz"],  # non-existent format
...     input_text=example_md,
... )
Traceback (most recent call last):
...
pandadoc.exceptions.PandocUnknownWriterError: Unknown output format zzz
>>> isinstance(pandadoc.exceptions.PandocUnknownWriterError(), pandadoc.PandocError)
True
You can find a full list of exceptions in the pandadoc.exceptions module.
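To illustrate the idea of choosing an exception from the exit code, here is a minimal sketch. It is not pandadoc's actual implementation: the class names and the error_for_exit_code function are hypothetical stand-ins, and the exit codes used (21 for an unknown reader, 22 for an unknown writer, 43 for a PDF error) are taken from the exit code table in the pandoc manual, so check them against your pandoc version.

```python
class SketchPandocError(Exception):
    """Base class for the hypothetical exceptions in this sketch."""


class SketchUnknownReaderError(SketchPandocError):
    """Stand-in for an 'unknown input format' error."""


class SketchUnknownWriterError(SketchPandocError):
    """Stand-in for an 'unknown output format' error."""


class SketchPDFError(SketchPandocError):
    """Stand-in for a PDF generation error."""


# Assumed mapping, based on the pandoc manual's exit code table.
_EXIT_CODE_TO_ERROR = {
    21: SketchUnknownReaderError,
    22: SketchUnknownWriterError,
    43: SketchPDFError,
}


def error_for_exit_code(exit_code: int, message: str) -> SketchPandocError:
    """Build the most specific exception known for this exit code."""
    error_class = _EXIT_CODE_TO_ERROR.get(exit_code, SketchPandocError)
    return error_class(message)


try:
    raise error_for_exit_code(22, "Unknown output format zzz")
except SketchPandocError as err:
    # Catching the base class handles every subclass as well.
    print(f"{type(err).__name__}: {err}")
```

The same pattern applies to the real library: catching pandadoc.PandocError handles any of the more specific exceptions.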
Explanation
The pandoc command-line tool works like this:
pandoc [OPTIONS] [FILES]
In addition to the OPTIONS
(documented in the pandoc manual),
you can provide either some FILES, or some input text (via stdin).
The call_pandoc function of pandadoc works in a similar way:
The options argument contains a list of pandoc options.
E.g. ["-f", "markdown", "-t", "html"].
The files argument is a list of file paths (or absolute URIs).
E.g. ["path/to/file.md", "https://www.fsf.org"]
The input_text argument is used as text input to pandoc.
E.g. "# Simple Doc\n\nA simple markdown document\n".
The timeout argument controls whether the pandoc process times out, and the
decode argument controls whether the result is decoded to a str
(True by default) or returned as raw bytes.
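The mapping from these arguments to a pandoc invocation can be sketched with the standard library's subprocess module. This is an illustration under stated assumptions, not pandadoc's actual implementation: build_command and run_pandoc_sketch are hypothetical names, and the real call_pandoc may handle errors and encodings differently.

```python
import subprocess
from typing import List, Optional, Sequence, Union


def build_command(options: Sequence[str] = (), files: Sequence[str] = ()) -> List[str]:
    """Assemble the argument list, mirroring `pandoc [OPTIONS] [FILES]`."""
    return ["pandoc", *options, *files]


def run_pandoc_sketch(
    options: Sequence[str] = (),
    files: Sequence[str] = (),
    input_text: Optional[str] = None,
    timeout: Optional[float] = None,
    decode: bool = True,
) -> Union[str, bytes]:
    """Run pandoc and return its stdout (hypothetical stand-in for call_pandoc)."""
    completed = subprocess.run(
        build_command(options, files),
        # Input text, if any, is sent to pandoc on stdin as UTF-8 bytes.
        input=None if input_text is None else input_text.encode("utf-8"),
        capture_output=True,
        timeout=timeout,  # in seconds; None means wait indefinitely
        check=True,  # raises CalledProcessError on a non-zero exit code
    )
    return completed.stdout.decode("utf-8") if decode else completed.stdout


# Building the command does not require pandoc to be installed.
print(build_command(["-f", "markdown", "-t", "html"], ["path/to/file.md"]))
```

Passing the arguments as a list, rather than as a single shell string, avoids quoting problems with paths and option values that contain spaces.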
Bugs/Requests
Please use the GitHub issue tracker
to submit bugs or request features.
Feedback is always appreciated.
License
Distributed under the
MIT license.