pdfsim 0.3

Creator: railscoder56

PDF Similarity Matcher
The PDF Similarity Matcher is a command-line tool that finds PDF documents similar to a given input PDF. It extracts text from each file, derives features from that text, and ranks the PDFs in a directory by their similarity score against the input.

Extracts text from PDF files.
Processes and compares features from multiple PDFs.
Calculates similarity scores between an input PDF and PDFs in the directory.
Optionally displays detailed key-value feature information for similar PDFs.
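
To make the scoring step concrete, here is a minimal sketch of one common way to compare extracted text: cosine similarity over word counts. This is an assumption for illustration only. The tool's actual features and similarity measure are not described here, and the helper names (term_counts, cosine_similarity, top_matches) and the sample file names are hypothetical. Plain strings stand in for text that would come from PDF extraction.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors (0.0 to 1.0)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


def top_matches(input_text, database_texts, top_n=1):
    """Rank database entries by similarity to the input text."""
    query = term_counts(input_text)
    scored = [
        (name, cosine_similarity(query, term_counts(text)))
        for name, text in database_texts.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Stand-in strings; a real run would use text extracted from each PDF.
database = {
    "invoice_a.pdf": "invoice number total amount due payment terms",
    "manual.pdf": "installation guide for the device and safety notes",
}
print(top_matches("invoice total amount payment", database, top_n=1))
```

The top_n argument plays the same role as the -t option: it limits how many of the highest-scoring files are reported.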

Follow these steps to install and set up the PDF Similarity Matcher:

Clone the repository:
git clone https://github.com/yourusername/pdfsim.git
cd pdfsim

Create a virtual environment:
python3 -m venv venv

Activate the virtual environment:

On Windows:
venv\Scripts\activate

On macOS/Linux:
source venv/bin/activate

Install the required packages:
pip install -r requirements.txt

Ensure requirements.txt includes the necessary libraries:

To find similar PDFs, use the following command:
python3 main.py -d <directory_containing_pdf> -i <input_pdf> [-t <top_n>] [-kv]


-d, --database (required): Path to the directory containing PDF files to compare against.
-i, --input (required): Path to the input PDF file you want to compare.
-t, --top (optional, default: 1): Number of top similar PDFs to display.
-kv (optional): Enable detailed key-value feature output for similar PDFs.
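
The options above could be declared with Python's standard argparse module roughly as follows. This is a sketch under assumptions, not the contents of main.py: the build_parser helper, the help strings, and the sample paths (pdfs, query.pdf) are hypothetical, and only the flag names and the default for -t come from the list above.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented options (hypothetical layout)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Find PDFs similar to an input PDF.",
    )
    parser.add_argument("-d", "--database", required=True,
                        help="directory containing the PDFs to compare against")
    parser.add_argument("-i", "--input", required=True,
                        help="input PDF to compare")
    parser.add_argument("-t", "--top", type=int, default=1,
                        help="number of top similar PDFs to display")
    # Single-dash, two-letter flag; argparse stores it as args.kv.
    parser.add_argument("-kv", action="store_true",
                        help="show key-value feature details for each match")
    return parser


# Parse a sample argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["-d", "pdfs", "-i", "query.pdf", "-t", "3", "-kv"])
print(args.database, args.input, args.top, args.kv)
```

Leaving out -t and -kv gives args.top == 1 and args.kv == False, which matches the documented defaults.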


For personal and professional use. You cannot resell or redistribute these repositories in their original state.
