pdfsim 0.3

Creator: railscoder56

Description:

PDF Similarity Matcher
The PDF Similarity Matcher is a command-line tool that finds and displays the PDF documents most similar to a given input PDF, based on features extracted from their text. It extracts text from every PDF in a directory, compares each one against the input, and reports the closest matches.
Features

Extracts text from PDF files.
Processes and compares features from multiple PDFs.
Calculates similarity scores between an input PDF and PDFs in the directory.
Optionally displays detailed key-value feature information for similar PDFs.
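
The README does not say which similarity measure the tool uses, so the following is only a minimal sketch of how such a score could be computed, assuming plain bag-of-words term counts and cosine similarity. It uses only the standard library and works on already-extracted text; the names tokenize, cosine_similarity, and rank_similar are hypothetical and not taken from pdfsim itself, which may rely on scikit-learn and nltk instead.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two term-count Counters (0.0 to 1.0)."""
    # Only terms present in both documents contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


def rank_similar(input_text, corpus, top_n=1):
    """Return the top_n (name, score) pairs from corpus, most similar first.

    corpus maps a file name to its extracted text (hypothetical structure).
    """
    query = Counter(tokenize(input_text))
    scores = [
        (name, cosine_similarity(query, Counter(tokenize(text))))
        for name, text in corpus.items()
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


if __name__ == "__main__":
    # Stand-in strings; in practice these would come from PDF text extraction.
    sample_corpus = {
        "a.pdf": "invoice total amount due",
        "b.pdf": "poem about the sea",
    }
    for name, score in rank_similar("invoice amount due", sample_corpus, top_n=2):
        print(f"{name}: {score:.3f}")
```

Cosine similarity is a common choice here because it ignores document length and depends only on the relative mix of terms, but any weighting scheme (for example TF-IDF) could be substituted.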

Installation
Follow these steps to install and set up the PDF Similarity Matcher:

Clone the repository:
git clone https://github.com/yourusername/pdfsim.git
cd pdfsim

Create a virtual environment:
python3 -m venv venv

Activate the virtual environment:

On Windows:
venv\Scripts\activate

On macOS/Linux:
source venv/bin/activate

Install the required packages:
pip install -r requirements.txt

Ensure requirements.txt includes the necessary libraries:
PyPDF2
scikit-learn
nltk

Usage
To find similar PDFs, use the following command:
python3 main.py -d <directory_containing_pdfs> -i <input_pdf> [-t <top_n>] [-kv]

Arguments

-d, --database (required): Path to the directory containing PDF files to compare against.
-i, --input (required): Path to the input PDF file you want to compare.
-t, --top (optional, default: 1): Number of top similar PDFs to display.
-kv (optional): Enable detailed key-value feature output for similar PDFs.
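
As an illustration of how the arguments above fit together, here is a sketch of an equivalent parser written with Python's standard argparse module. It is an assumption about how main.py might declare its options, not the project's actual code; the function name build_parser and the help strings are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Find PDFs similar to an input PDF.",
    )
    parser.add_argument(
        "-d", "--database", required=True,
        help="directory containing the PDF files to compare against",
    )
    parser.add_argument(
        "-i", "--input", required=True,
        help="input PDF file to compare",
    )
    parser.add_argument(
        "-t", "--top", type=int, default=1,
        help="number of top similar PDFs to display (default: 1)",
    )
    # A single-dash, multi-letter flag; argparse stores it as args.kv.
    parser.add_argument(
        "-kv", action="store_true",
        help="show detailed key-value feature output for similar PDFs",
    )
    return parser


if __name__ == "__main__":
    # Parse an explicit example argument list rather than sys.argv.
    args = build_parser().parse_args(["-d", "pdfs", "-i", "query.pdf", "-t", "3", "-kv"])
    print(args.database, args.input, args.top, args.kv)
```

With this layout, omitting -t falls back to a single result and omitting -kv leaves the detailed output disabled, matching the defaults listed above.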

License

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
