Quackling

Quackling enables document-native generative AI applications, such as RAG, based on Docling.
Features

🧠 Enables rich gen AI applications by providing capabilities at the native document level, not just plain text / Markdown!
⚡️ Leverages Docling's conversion quality and speed.
⚙️ Integrates with standard LLM application frameworks, such as LlamaIndex, for building powerful applications like RAG.

Installation
To use Quackling, simply install it via your package manager, e.g. pip:
pip install quackling

Usage
Quackling offers core capabilities (quackling.core) as well as framework integration components, e.g. for
LlamaIndex (quackling.llama_index). Below you will find examples of both.
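For orientation, the two entry points are imported like this (the paths are taken from the examples below):
from quackling.core.chunkers.hierarchical_chunker import HierarchicalChunker  # core capability
from quackling.llama_index.readers.docling_reader import DoclingReader  # LlamaIndex integration component
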
Basic RAG
Below is a basic RAG pipeline using LlamaIndex.

[!NOTE]
To use this example as is, first pip install llama-index-embeddings-huggingface llama-index-llms-huggingface-api
in addition to quackling, in order to install the model integrations.
Otherwise, you can set EMBED_MODEL & LLM as desired, e.g. using local models.

import os

from llama_index.core import VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
from quackling.llama_index.node_parsers.hier_node_parser import HierarchicalNodeParser
from quackling.llama_index.readers.docling_reader import DoclingReader

DOCS = ["https://arxiv.org/pdf/2311.18481"]
QUERY = "What is DocQA?"
EMBED_MODEL = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
LLM = HuggingFaceInferenceAPI(
    token=os.getenv("HF_TOKEN"),
    model_name="mistralai/Mixtral-8x7B-Instruct-v0.1",
)

# Read the documents with Docling, split them along the document structure,
# and index the resulting nodes:
index = VectorStoreIndex.from_documents(
    documents=DoclingReader(parse_type=DoclingReader.ParseType.JSON).load_data(DOCS),
    embed_model=EMBED_MODEL,
    transformations=[HierarchicalNodeParser()],
)
query_engine = index.as_query_engine(llm=LLM)
response = query_engine.query(QUERY)
# > DocQA is a question-answering conversational assistant [...]
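
If you would rather run the generation model locally instead of via the Hugging Face Inference API, you can swap the LLM as hinted in the note above. A minimal sketch, assuming the llama-index-llms-huggingface package is installed and the model fits on your hardware (the model name below is just an illustrative choice):
from llama_index.llms.huggingface import HuggingFaceLLM

# Illustrative local model; any locally runnable instruct model works here.
LOCAL_LLM = HuggingFaceLLM(
    model_name="mistralai/Mistral-7B-Instruct-v0.2",
    tokenizer_name="mistralai/Mistral-7B-Instruct-v0.2",
)
query_engine = index.as_query_engine(llm=LOCAL_LLM)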

Chunking
You can also use Quackling standalone with any pipeline.
For instance, to split a document into chunks based on its structure, returning pointers
to the nodes of the underlying Docling document:
from docling.document_converter import DocumentConverter
from quackling.core.chunkers.hierarchical_chunker import HierarchicalChunker

# Convert the PDF with Docling, then chunk it along the document hierarchy:
doc = DocumentConverter().convert_single("https://arxiv.org/pdf/2408.09869").output
chunks = list(HierarchicalChunker().chunk(doc))
# > [
# >     ChunkWithMetadata(
# >         path='$.main-text[4]',
# >         text='Docling Technical Report\n[...]',
# >         page=1,
# >         bbox=[117.56, 439.85, 494.07, 482.42]
# >     ),
# >     [...]
# > ]
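
Since the chunker returns plain Python objects, the chunks can feed any downstream pipeline. A minimal sketch of collecting the fields shown above (embed_fn is a hypothetical stand-in for your own embedding step):
# Collect chunk texts plus their pointers back into the Docling document:
texts = [chunk.text for chunk in chunks]
paths = [chunk.path for chunk in chunks]
# vectors = embed_fn(texts)  # hypothetical embedding step; keep paths as provenance metadata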

More examples
Check out the examples, showcasing different variants of RAG incl. vector ingestion & retrieval:

[LlamaIndex] Milvus dense-embedding RAG
[LlamaIndex] Milvus hybrid RAG, combining dense & sparse embeddings
[LlamaIndex] Milvus RAG, also fetching native document metadata for search results
[LlamaIndex] Local node transformations (e.g. embeddings)
...

Contributing
Please read Contributing to Quackling for details.
References
If you use Quackling in your projects, please consider citing the following:
@software{Docling,
  author = {Deep Search Team},
  month = {7},
  title = {{Docling}},
  url = {https://github.com/DS4SD/docling},
  version = {main},
  year = {2024}
}

License
The Quackling codebase is under the MIT license.
For individual component usage, please refer to the component licenses found in the original packages.
