assemblyai-haystack 0.1.1

Last updated: September 3, 2024

Creator: coderz1093

Languages

Python

Description:


AssemblyAI Audio Transcript Loader
The AssemblyAI Audio Transcript Loader allows you to transcribe audio files with the AssemblyAI API and load the transcribed text into Haystack documents.
To use this package, set the environment variable ASSEMBLYAI_API_KEY to your API key. Alternatively, pass the API key as an argument when adding the component to a pipeline (see the usage code example below).
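The two ways of supplying the key can be combined into one lookup. The following is a minimal sketch; resolve_api_key is a hypothetical helper written for illustration and is not part of assemblyai-haystack. It prefers an explicitly passed key and otherwise falls back to the ASSEMBLYAI_API_KEY environment variable.

```python
import os
from typing import Optional


def resolve_api_key(explicit_key: Optional[str] = None) -> str:
    """Return the explicit key if given, else the ASSEMBLYAI_API_KEY variable.

    Hypothetical helper for illustration; not part of assemblyai-haystack.
    """
    key = explicit_key or os.environ.get("ASSEMBLYAI_API_KEY")
    if not key:
        # Fail early with a clear message instead of failing inside the API call.
        raise RuntimeError(
            "No AssemblyAI API key found: pass one explicitly "
            "or set the ASSEMBLYAI_API_KEY environment variable."
        )
    return key
```

The returned value could then be passed as the api_key argument when constructing AssemblyAITranscriber.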
More info about AssemblyAI:

Website
Get a Free API key
AssemblyAI API Docs

Installation
First, install the assemblyai-haystack Python package.
pip install assemblyai-haystack

This package installs and uses the AssemblyAI Python SDK. You can find more info about the SDK at the assemblyai-python-sdk GitHub repo.
Usage
The AssemblyAITranscriber needs to be initialized with the AssemblyAI API key.
The run function needs at least the file_path argument. Audio files can be specified as a URL or a local file path.
You can also use the summarization and speaker_labels arguments of the run function to specify whether you want summarization and speaker diarization results.
import os

from assemblyai_haystack.transcriber import AssemblyAITranscriber
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack import Pipeline
from haystack.components.writers import DocumentWriter

ASSEMBLYAI_API_KEY = os.environ.get("ASSEMBLYAI_API_KEY")

# Use AssemblyAITranscriber in a pipeline
document_store = InMemoryDocumentStore()
file_url = "https://github.com/AssemblyAI-Examples/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3"

indexing = Pipeline()
indexing.add_component("transcriber", AssemblyAITranscriber(api_key=ASSEMBLYAI_API_KEY))
indexing.add_component("writer", DocumentWriter(document_store))
indexing.connect("transcriber.transcription", "writer.documents")
indexing.run(
    {
        "transcriber": {
            "file_path": file_url,
            "summarization": None,
            "speaker_labels": None,
        }
    }
)

print("Indexed Document Count:", document_store.count_documents())

Note: Calling indexing.run() blocks until the transcription is finished.
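Because the call blocks while the audio is transcribed, it can help to validate the input before starting the pipeline. The sketch below builds the same input dictionary as the pipeline above. build_transcriber_input is a hypothetical helper, not part of the package, and it assumes that summarization and speaker_labels accept booleans and that the component was added under the name "transcriber".

```python
from pathlib import Path
from typing import Any, Dict
from urllib.parse import urlparse


def build_transcriber_input(
    file_path: str,
    summarization: bool = False,
    speaker_labels: bool = False,
) -> Dict[str, Dict[str, Any]]:
    """Build the run input for a component registered as "transcriber".

    Hypothetical helper for illustration; the boolean flags are an assumption.
    """
    # Treat only http(s) strings as URLs; anything else must be an existing file.
    is_url = urlparse(file_path).scheme in ("http", "https")
    if not is_url and not Path(file_path).is_file():
        raise FileNotFoundError(f"Audio file not found: {file_path}")
    return {
        "transcriber": {
            "file_path": file_path,
            "summarization": summarization,
            "speaker_labels": speaker_labels,
        }
    }
```

The result could be passed directly to indexing.run(), so a mistyped local path is caught before any request is made.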
The results of the transcription, summarization and speaker diarization are returned in separate document lists:

transcription
summarization
speaker_labels

The metadata of the transcription document contains the transcript ID and the URL of the uploaded audio file.
{
    "transcript_id": "73089e32-...-4ae9-97a4-eca7fe20a8b1",
    "audio_url": "https://storage.googleapis.com/aai-docs-samples/nbc.mp3"
}
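The three output lists and the metadata can be inspected with a few lines of code. This is a sketch under stated assumptions: StandInDocument is a minimal stand-in for a Haystack document (text in content, metadata in meta) so the example runs without the package installed, and count_outputs and first_transcript_id are hypothetical helpers. The outputs argument is assumed to be the dictionary of document lists returned by the transcriber's run function.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Names of the three output lists described above.
OUTPUT_KEYS = ("transcription", "summarization", "speaker_labels")


@dataclass
class StandInDocument:
    """Stand-in for a Haystack document; only for this illustration."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def count_outputs(outputs: Dict[str, List[Any]]) -> Dict[str, int]:
    """Count the documents in each output list; missing lists count as 0."""
    return {key: len(outputs.get(key) or []) for key in OUTPUT_KEYS}


def first_transcript_id(outputs: Dict[str, List[Any]]) -> Optional[str]:
    """Return transcript_id from the first transcription document, if any."""
    docs = outputs.get("transcription") or []
    return docs[0].meta.get("transcript_id") if docs else None


# Made-up sample data; the ID and URL are placeholders, not real values.
sample = {
    "transcription": [
        StandInDocument(
            content="example transcript text",
            meta={
                "transcript_id": "example-transcript-id",
                "audio_url": "https://example.com/audio.mp3",
            },
        )
    ],
    "summarization": [],
    "speaker_labels": [],
}

print(count_outputs(sample))
print(first_transcript_id(sample))
```

An empty summarization or speaker_labels list in this sketch simply models a run in which that result was not requested.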

License:

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
