audioanalyser 0.0.6
Audio Analyser: Speech-to-Text, Analysis, Recommendations & Translations
• Website
• Report Bug
• Request Feature
• Contributing Guidelines
Overview
Audio Analyser leverages the power of Microsoft Azure's advanced AI services to transform your audio data into valuable insight reports in no time through automatic speech-to-text, text analysis, and recommendations.
Solve the pain of manual audio analysis: Manually analyzing audio is time-consuming and limited. Audio Analyser automates the process, quickly surfacing key insights through AI-powered speech and language processing.
Discover Hidden Insights in Minutes: AI-Powered Audio Analysis for Your Call Recordings and Audio Files.
Streamline call recording and audio file transcription, and uncover actionable insights in seconds with advanced text analysis powered by Microsoft Azure AI services.
Go beyond simple transcription: Discover sentiment, key information, and gain a multi-faceted understanding of your conversations through in-depth analysis and comprehensive reports.
Table of Contents
Audio Analyser: Speech-to-Text, Analysis, Recommendations & Translations
Overview
Table of Contents
Key Features
Built on a Robust Foundation
Dependencies
Installation
Create a Virtual Environment
Installation and Setup
Getting Started
Usage Instructions
To run the Audio Analyser CLI
To run the Audio Analyser server
Usage
Requirements
Configuration
Modules
Audio Recorder Module
Key Features
How It Works
Usage
Customization and Flexibility
Scalability and Reliability
Analyze Text Files Module
Key Features
How It Works
Usage
Customization
Scalability and Performance
Azure Recommendation Module
Key Features
How It Works
Usage
Customization and Flexibility
Scalability and Innovation
Speech Text Server Module
Key Features
How It Works
Usage
Customization and Scalability
Advanced Technology Integration
Text-to-Speech Synthesis Module
Key Features
How It Works
Usage
Customization and Versatility
Scalability and Integration
Transcribe Audio Files Module
Key Features
How It Works
Usage
Customization and Versatility
Scalability and Integration
Translations Module
Key Features
How It Works
Usage
Supported Languages
Error Handling and Logging
Extensibility
License
Contribution
Acknowledgements
Key Features
Audio Recording: Record audio files and conversations.
Speech to Text: Convert spoken language into text using Azure's speech-to-text service.
Text to Speech: Convert text into spoken language using Azure's text-to-speech service.
Instant Transcription: Instantly transcribe audio files and recordings into text.
Text Analysis: Analyze text for various features using Azure's text analytics service.
Recommendations: Get actionable recommendations based on the results of the analysis.
Multiple Output Formats: Save results as JSON, TXT, or SQLite.
Actionable Insights:
Analyze text for overall sentiment, positive and negative sentiment, key topics and entities, language, and personally identifiable information (PII).
Uncover sentiment and key information within conversations.
Data-Driven Reports:
Generate detailed reports for easy sharing and analysis.
Translations: Translate text to and from a variety of languages using Azure's Translator API.
Support for Multiple Languages: Supports a wide range of languages, including English, French, German, Spanish, and more.
Batch Translation: Translate multiple text files simultaneously, saving time and effort.
Flexible Output Options: Output translation results in various formats, including plain text files, JSON, and SQLite databases.
Web Server: A CherryPy-based web server to handle incoming requests and process them.
Built on a Robust Foundation
Azure-powered technology and a secure CherryPy web server ensure accurate analysis and reliable data management.
Scalable architecture: Adapt seamlessly to your needs, handling large datasets with ease.
Experience the power of Audio Analyser today!
Dependencies
CherryPy
Azure Cognitive Services Speech SDK
Azure AI Text Analytics
Azure OpenAI Service
Python standard libraries: asyncio, threading, logging, sqlite3, json
Dotenv for environment variable management
Installation
Audio Analyser is built on Azure Cognitive Services for speech and language processing, with a CherryPy web server frontend. Key components include:
Audio Recorder - record audio clips
Speech-to-Text - transcribe audio
Text-to-Speech - convert text to speech
Text Analytics - analyze transcripts
Recommendation Generator - suggest actions
Web Server - handle API requests
Create a Virtual Environment
We recommend creating a virtual environment to install the Audio Analyser. This will ensure that the package is installed in an isolated environment and will not affect other projects.
python3 -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
Installation and Setup
Install required Python packages:
pip install cherrypy azure-ai-textanalytics azure-cognitiveservices-speech
Set up Azure services and obtain necessary API keys.
Configure environment variables for Azure services in a .env file.
Getting Started
Install audioanalyser with just one command:
pip install audioanalyser
Usage Instructions
To run the Audio Analyser CLI
Start the CLI using audioanalyser:
python -m audioanalyser
Follow the instructions to utilize speech-to-text and text analysis features.
Access the generated transcript and report files in the resources directory in the root folder.
To run the Audio Analyser server
Start the server using audioanalyser:
python -m audioanalyser -s
Access the server at the specified host and port to utilize speech-to-text and text analysis features.
Usage
To run the application, use the following command:
python server.py
This will start the CherryPy web server, and you can interact with the application through the defined endpoints.
Requirements
The minimum supported Python version is 3.6.
Azure Cognitive Services for speech and text processing.
CherryPy for the web server.
OpenAI services for summarization.
Python's standard libraries including asyncio, sqlite3, and threading.
Configuration
Ensure that your Azure credentials and other configurations are correctly set in a .env file in the root directory.
Please refer to the env.example file for the required environment variables.
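A minimal sketch of how such settings might be read and validated at startup. The variable names (`AZURE_SPEECH_KEY`, `AZURE_SPEECH_REGION`, `OUTPUT_DIR`) and the `Settings` container are assumptions made for illustration; the names the package actually expects are the ones listed in env.example. The sketch uses only the standard library and assumes the variables are already in the process environment, for example after python-dotenv has loaded the .env file.

```python
import os
from collections import namedtuple

# Hypothetical settings container; not the package's own Config class.
Settings = namedtuple("Settings", ["speech_key", "speech_region", "output_dir"])


def load_settings(env=os.environ):
    """Read required settings and fail early with a clear message."""
    required = ("AZURE_SPEECH_KEY", "AZURE_SPEECH_REGION")  # assumed names
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        speech_key=env["AZURE_SPEECH_KEY"],
        speech_region=env["AZURE_SPEECH_REGION"],
        output_dir=env.get("OUTPUT_DIR", "resources"),  # assumed default
    )
```

Failing at startup with the list of missing names is usually easier to debug than an authentication error raised later by an Azure client.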
Modules
Audio Recorder Module
The Audio Recorder Module in Audio Analyser is a robust tool designed for high-quality audio recording. It integrates seamlessly with the rest of the application, providing a user-friendly interface for capturing audio data, which is essential for the subsequent speech-to-text and analysis processes.
Key Features
High-Quality Recording: Capture clear and crisp audio, which is vital for accurate speech-to-text conversion.
Flexible Configuration: Utilizes a Config class to load settings from a .env file, allowing for easy customization of recording parameters such as duration, format, and quality.
Directory Management: Automatically validates and manages input and output directories, ensuring a smooth and error-free recording experience.
Advanced Audio Settings Validation: Checks and confirms audio settings before recording begins, thereby minimizing potential issues during the recording process.
Automated File Path Generation: Dynamically generates file paths for the recorded audio, streamlining the file management process.
How It Works
Setup and Configuration: The module reads configurations from the .env file, setting up necessary parameters for recording.
Directory Validation: It checks the specified input and output directories to ensure they exist and are accessible.
Recording Execution: On initiating the recording process, the module captures audio based on predefined settings. This can be triggered manually or automatically as part of a larger workflow.
File Management: After recording, the audio file is saved to the designated output directory, with a file name generated based on customizable rules.
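The directory validation and file path generation steps above could look roughly like the following. This is a sketch under assumptions, not the module's actual code: the function names, the `recording` prefix, and the timestamp naming rule are all hypothetical.

```python
from datetime import datetime
from pathlib import Path


def ensure_directory(path):
    """Create the directory if needed (raises FileExistsError if it is a file)."""
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    return directory


def build_recording_path(output_dir, prefix="recording", extension="wav", now=None):
    """Return a timestamped path such as recording_20240131_093000.wav."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return ensure_directory(output_dir) / f"{prefix}_{stamp}.{extension}"
```

A timestamp in the file name keeps successive recordings from overwriting each other; the `now` parameter exists only so the naming rule can be tested deterministically.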
Usage
To start recording, ensure that the environment variables are set up in the .env file.
Run the Audio Recorder Module through the Audio Analyser interface or as a standalone process.
The module will handle the rest, from validating settings to saving the recorded audio file.
Customization and Flexibility
The module can be customized to record audio for variable durations and in different formats, as required by the user.
It's designed to be flexible enough to integrate with different audio sources and output requirements.
Scalability and Reliability
Designed to handle both small-scale and large-scale audio recording tasks.
Implements robust error handling to deal with potential recording issues, ensuring reliability in diverse environments.
Analyze Text Files Module
The Analyze Text Files Module in Audio Analyser is a sophisticated tool designed for in-depth analysis of text data, utilizing Azure Text Analytics. It’s capable of extracting meaningful insights from text files, such as sentiment, key entities, and more, making it an essential component for understanding and interpreting textual data.
Key Features
Advanced Text Analytics: Leverages Azure's AI capabilities for comprehensive analysis including sentiment analysis, entity recognition, and key phrase extraction.
Configurable Environment: Uses the Config class to seamlessly integrate with Azure Language services, ensuring a flexible and customizable setup.
Diverse Output Formats: Capable of saving analysis results in multiple formats, accommodating various data presentation and storage needs.
Efficient File Processing: Processes text files for analysis efficiently, handling both single files and batches, suitable for different scales of data.
How It Works
Environment Setup: The module begins by setting up necessary configurations using environment variables. This includes connecting to Azure Language services.
File Processing: It reads text files from a specified directory, preparing them for analysis.
Executing Text Analysis: The TextAnalysis class performs various analytics tasks on the text data, extracting insights like overall sentiment, key entities, and phrases.
Storing Results: Analysis results are then stored in the preferred format, be it plain text, JSON, or another format, in the designated output directory.
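As an illustration of the flow above, the sketch below reads every .txt file in a directory, runs an analysis callable over the texts, and stores the result as JSON. The function names are hypothetical and are not the package's TextAnalysis class. `analyze_sentiment` shows one way to call the Azure Text Analytics SDK; it is imported lazily so the file-handling part runs without the SDK or credentials.

```python
import json
from pathlib import Path


def analyze_sentiment(texts, endpoint, key):
    """Return one sentiment label per text using Azure Text Analytics."""
    # Imported lazily so the file helper below works without the SDK installed.
    from azure.ai.textanalytics import TextAnalyticsClient
    from azure.core.credentials import AzureKeyCredential

    client = TextAnalyticsClient(endpoint=endpoint, credential=AzureKeyCredential(key))
    results = client.analyze_sentiment(documents=texts)
    return [None if doc.is_error else doc.sentiment for doc in results]


def analyze_directory(input_dir, output_file, analyze):
    """Run `analyze` over every .txt file and store the labels as JSON."""
    paths = sorted(Path(input_dir).glob("*.txt"))
    texts = [p.read_text(encoding="utf-8") for p in paths]
    labels = analyze(texts) if texts else []
    report = {p.name: label for p, label in zip(paths, labels)}
    Path(output_file).write_text(json.dumps(report, indent=2), encoding="utf-8")
    return report
```

A call might look like `analyze_directory("transcripts", "report.json", lambda t: analyze_sentiment(t, endpoint, key))`. The service caps the number of documents per request, so a large directory would need to be sent in smaller chunks.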
Usage
Ensure that the Azure service credentials and other settings are correctly configured in the .env file.
Place the text files to be analyzed in the specified input directory.
Execute the Analyze Text Files Module, which will automatically process the files and save the analysis results.
Customization
The module allows for customization of analysis parameters and output formats, catering to specific needs of the analysis task.
Users can specify particular aspects of text analysis to focus on, such as sentiment analysis or entity extraction, based on their requirements.
Scalability and Performance
Optimized for performance, the module can handle large volumes of text data without compromising on speed or accuracy.
Scalable architecture ensures that the module can adapt to increasing amounts of data as the application grows.
This module represents a vital part of the Audio Analyser’s capability to turn textual data into actionable insights, enhancing the overall value of the analysis process.
Azure Recommendation Module
The Azure Recommendation Module in Audio Analyser is an advanced tool that leverages the power of OpenAI's GPT-3 to generate insightful and relevant recommendations from customer transcripts. This module transforms raw text data into actionable advice, enhancing decision-making processes.
Key Features
Intelligent Recommendations: Utilizes OpenAI's GPT-3 for generating smart and contextually relevant recommendations based on the content of customer transcripts.
Automated Transcript Processing: Automatically reads and processes transcripts from a designated directory, streamlining the workflow.
Customizable Output: Offers flexibility in saving recommendations to a preferred format and location, tailored to user requirements.
Configurable Settings: Allows users to configure various parameters like API keys, folder paths, and output preferences through environment variables.
How It Works
Reading Transcripts: The module scans a specified directory to load customer transcripts, ensuring that all relevant data is considered for analysis.
Generating Recommendations: Leverages GPT-3's advanced natural language understanding capabilities to analyze the transcripts and generate recommendations.
Saving Outputs: The insightful recommendations are then saved in a designated folder, in a format that facilitates easy review and implementation.
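The read, generate, and save steps above can be sketched as follows. The prompt wording, the function names, and the `_recommendations.txt` suffix are assumptions for illustration. The model call is passed in as a `complete(prompt)` callable so the sketch does not commit to a particular client library or deployment.

```python
from pathlib import Path


def build_prompt(transcript, max_chars=4000):
    """Wrap a transcript in an instruction; the wording is illustrative only."""
    return (
        "Read the customer transcript below and list three concrete, "
        "actionable recommendations.\n\nTranscript:\n" + transcript[:max_chars]
    )


def recommend_for_directory(input_dir, output_dir, complete):
    """Call `complete(prompt)` per transcript and save each answer as text."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(Path(input_dir).glob("*.txt")):
        prompt = build_prompt(path.read_text(encoding="utf-8"))
        target = out / f"{path.stem}_recommendations.txt"
        target.write_text(complete(prompt), encoding="utf-8")
        written.append(target)
    return written
```

Keeping the input and output directories separate avoids feeding earlier recommendations back in as transcripts on the next run. Changing `build_prompt` is the simplest way to steer the kind of advice returned.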
Usage
Set up the necessary environment variables, including API keys and directory paths, in the .env file.
Place the transcripts in the specified input directory.
Run the Azure Recommendation Module to automatically process the transcripts and generate recommendations.
Access the generated recommendations in the specified output directory.
Customization and Flexibility
Users can customize the type of recommendations generated by tweaking the prompt strategy sent to GPT-3, enabling tailored advice for different scenarios.
The module supports various output preferences, allowing users to choose how and where the recommendations are stored.
Scalability and Innovation
Designed to handle a wide range of transcript volumes, from individual files to large batches, ensuring scalability.
Represents a cutting-edge application of AI in text analysis, setting a new standard for automated recommendation systems.
This module is a testament to the Audio Analyser's commitment to harnessing the latest in AI technology to provide valuable, data-driven insights and recommendations.
Speech Text Server Module
The Speech Text Server Module in Audio Analyser is a robust server-side component designed to handle speech-to-text processing efficiently. This module serves as the backbone of the application, managing the conversion of audio data into text and further analyzing this textual data for insights.
Key Features
Comprehensive Speech-to-Text Operations: Employs advanced algorithms to accurately transcribe spoken words into written text, forming the basis for further analysis.
Integrated Audio Recording and Analysis: Seamlessly records audio, transcribes it, and then analyzes the text to extract meaningful insights.
Recommendation Generation: Utilizes transcribed text to generate actionable recommendations, adding significant value to the analysis process.
Efficient Request Handling: Capable of managing various server operations and handling multiple client requests simultaneously, ensuring a smooth user experience.
How It Works
Audio Processing: Initially, the module captures and processes audio recordings, preparing them for transcription.
Speech-to-Text Conversion: Utilizes advanced speech recognition technology to transcribe audio data into text with high accuracy.
Text Analysis and Recommendations: Once the audio is transcribed, the module analyzes the text data, extracting key insights and generating recommendations based on the content.
Server Operations: Manages all server-side functionalities, ensuring efficient processing and response to client requests.
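To make the server-side role concrete, here is a minimal CherryPy sketch. The routes (`/health`, `/analyze`), the class name, and the default port are hypothetical; the endpoints the real server defines may differ. Handlers are marked with `exposed = True`, which is what the `@cherrypy.expose` decorator sets, so the class can be exercised without CherryPy installed; only `serve` imports it.

```python
import json


class SpeechTextApp:
    """Hypothetical request handlers; the real server's routes may differ."""

    def __init__(self, analyze):
        self._analyze = analyze  # callable: text -> dict of insights

    def health(self):
        return json.dumps({"status": "ok"})
    health.exposed = True  # same effect as decorating with @cherrypy.expose

    def analyze(self, text=""):
        if not text.strip():
            return json.dumps({"error": "text is required"})
        return json.dumps(self._analyze(text))
    analyze.exposed = True


def serve(app, host="127.0.0.1", port=8080):
    """Start CherryPy; imported here so the handlers can be tested without it."""
    import cherrypy

    cherrypy.config.update({"server.socket_host": host, "server.socket_port": port})
    cherrypy.quickstart(app, "/")
```

With this layout a request to `/analyze?text=hello` reaches `analyze(text="hello")`. CherryPy serves requests from a thread pool, so the injected `analyze` callable should be safe to call from several threads at once.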
Usage
The module is typically used as a part of the Audio Analyser's server-side operations.
It can handle requests for audio processing, transcription, text analysis, and recommendation generation.
Ideal for applications requiring real-time speech-to-text conversion and subsequent analysis.
Customization and Scalability
Customizable to suit various speech-to-text scenarios and can be configured to handle specific analysis requirements.
Scalable to accommodate a growing number of requests and larger data sets, making it suitable for both small-scale and large-scale applications.
Advanced Technology Integration
Integrates state-of-the-art speech recognition and natural language processing technologies to provide fast and accurate transcriptions.
The module's architecture allows for easy integration with additional AI services and tools for enhanced functionality.
The Speech Text Server Module is crucial for transforming raw audio data into actionable textual information, thereby playing a vital role in the Audio Analyser's capability to deliver comprehensive audio analysis solutions.
Text-to-Speech Synthesis Module
The Text-to-Speech Synthesis Module in the application is a highly efficient component crafted to transform text into spoken audio using Azure's cutting-edge Text-to-Speech API. This module stands out as a crucial instrument for generating audible content from textual data, facilitating diverse applications such as audiobook production, voice notifications, or enhancing accessibility features.
Key Features
Superior Voice Quality: Employs Azure's Text-to-Speech API to produce clear and natural-sounding voice outputs from text.
Customizable Voice Attributes: Offers flexibility in choosing voice tones, accents, and languages to suit varied requirements.
Efficient Error Management: Features advanced error detection and handling to ensure high reliability across different operational scenarios.
Diverse Output Formats: Supports saving synthesized speech in various audio file formats, accommodating different usage contexts.
How It Works
Text Input Processing: Accepts textual data as input, which can range from simple sentences to comprehensive paragraphs.
Speech Synthesis: Leverages Azure's API to convert text into digital speech with options for customizing voice properties.
Error Handling: Implements robust mechanisms to manage errors, ensuring smooth and consistent audio output generation.
Audio File Saving: Outputs the synthesized speech into designated audio formats, ready for playback or integration into other systems.
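The synthesis step above might be implemented along these lines with the Azure Speech SDK. This is a sketch, not the module's own code: the function names and the default voice `en-US-JennyNeural` are assumptions. `build_ssml` escapes the input so characters such as `&` and `<` do not break the SSML document, and the SDK is imported lazily inside `synthesize_to_file`.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="en-US-JennyNeural", lang="en-US"):
    """Return an SSML document that selects a voice and escapes the text."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice></speak>"
    )


def synthesize_to_file(text, output_path, key, region, voice="en-US-JennyNeural"):
    """Synthesize `text` to an audio file with the Azure Speech SDK."""
    import azure.cognitiveservices.speech as speechsdk  # lazy: needs the SDK

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    audio_config = speechsdk.audio.AudioOutputConfig(filename=output_path)
    synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config, audio_config=audio_config)
    result = synthesizer.speak_ssml_async(build_ssml(text, voice)).get()
    if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
        raise RuntimeError(f"Synthesis failed: {result.reason}")
    return output_path
```

Checking `result.reason` matters because the SDK reports failures such as a bad key or an unknown voice through the result object rather than by raising an exception.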
Usage
Input the desired text into the module via its programming interface.
Configure the module settings, including voice type and output format preferences.
Trigger the text-to-speech synthesis process through the module's execution command.
Retrieve the generated audio file from the specified output location.
Customization and Versatility
Enables extensive customization of voice characteristics and speech parameters, enhancing the module's adaptability to different text types and use cases.
Designed to process a wide range of textual inputs, making it versatile for various applications and user needs.
Scalability and Integration
Scalable architecture allows for handling growing amounts of text inputs efficiently, suitable for both small and extensive text-to-speech conversion tasks.
Easily integrates with Azure services and other components within the application ecosystem, contributing to a seamless operational flow.
Transcribe Audio Files Module
The Transcribe Audio Files Module in Audio Analyser is a specialized component designed to convert spoken language in audio files into accurate text. Utilizing Azure's state-of-the-art Speech-to-Text API, this module is an essential tool for transforming audio data into a format that can be easily analyzed and processed.
Key Features
High-Efficiency Transcription: Leverages Azure's powerful Speech-to-Text API to provide fast and accurate transcription of audio files.
Batch Processing Capability: Capable of processing both individual audio files and large batches, making it versatile for various project sizes.
Robust Error Handling: Incorporates sophisticated error handling mechanisms to ensure reliability even in cases of challenging audio quality or API issues.
Flexible Output Options: Transcriptions can be saved in multiple formats, including plain text files, JSON, and SQLite databases, catering to diverse data management needs.
How It Works
Audio File Processing: The module accepts audio files as input, processing them individually or in batches based on user requirements.
Speech-to-Text Conversion: Utilizes Azure's Speech-to-Text API to accurately transcribe the spoken words in the audio files into written text.
Error Management: During transcription, the module efficiently handles any errors or exceptions, ensuring consistent output quality.
Saving Transcripts: The transcribed text is then saved in the specified format, allowing for easy integration with other modules or systems.
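A rough sketch of batch transcription with per-file error handling, as described above. The function names, the logger name, and the `*.wav` pattern are hypothetical. `transcribe_file` uses the Azure Speech SDK's `recognize_once`, which returns after the first utterance; longer recordings would need the SDK's continuous recognition instead.

```python
import logging
from pathlib import Path

log = logging.getLogger("transcribe_sketch")


def transcribe_file(path, key, region):
    """Recognize a single utterance from an audio file with the Azure Speech SDK."""
    import azure.cognitiveservices.speech as speechsdk  # lazy: needs the SDK

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    audio_config = speechsdk.audio.AudioConfig(filename=str(path))
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config)
    result = recognizer.recognize_once()
    if result.reason != speechsdk.ResultReason.RecognizedSpeech:
        raise RuntimeError(f"No speech recognized in {path}: {result.reason}")
    return result.text


def transcribe_batch(input_dir, transcribe, pattern="*.wav"):
    """Transcribe every matching file; log failures and keep going."""
    transcripts, failures = {}, []
    for path in sorted(Path(input_dir).glob(pattern)):
        try:
            transcripts[path.name] = transcribe(path)
        except Exception as exc:  # one bad file should not stop the batch
            log.error("Transcription failed for %s: %s", path.name, exc)
            failures.append(path.name)
    return transcripts, failures
```

Returning the list of failed files alongside the transcripts lets the caller retry or report them instead of silently losing input.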
Usage
Place the audio files in the designated input directory.
Execute the Transcribe Audio Files Module through the Audio Analyser interface.
The module will automatically process the audio files and save the transcriptions in the chosen format.
Customization and Versatility
Users can customize various aspects of the transcription process, including the choice of output format and error handling strategies.
The module's design allows it to handle different audio formats and qualities, making it adaptable to a wide range of audio data sources.
Scalability and Integration
Scalable to handle increasing volumes of audio data, suitable for both small-scale and large-scale transcription tasks.
Seamlessly integrates with other Azure services and modules within the Audio Analyser application, enhancing the overall functionality of the system.
This module plays a pivotal role in the Audio Analyser's ability to extract textual data from audio, laying the foundation for in-depth analysis and insight generation.
Translations Module
The Translations Module in Audio Analyser is specifically designed to handle multilingual text translation tasks, leveraging Azure AI Translator API. This powerful service offers cloud-based neural machine translation, compatible across different operating systems, to provide seamless translation experiences.
Key Features
Batch Translation: Process multiple text files simultaneously, offering efficiency and time-saving for large-scale translation tasks.
Support for Multiple Languages: Capable of translating text to and from a variety of languages, as listed in the Supported Languages section.
Format Versatility: Output translation results in diverse formats, including plain text files, JSON, and SQLite databases, catering to different use case requirements.
Seamless Integration with Azure Translator API: Utilizes Azure's robust machine translation capabilities for accurate and context-aware translations.
Error Handling: Incorporates comprehensive error handling mechanisms to ensure reliable translation processes even in case of unexpected API behavior.
How It Works
File Processing: The module takes text files as input. It can process individual files or batches of files, making it adaptable to both small and large-scale translation tasks.
Translation Execution: Utilizes Azure's Translator API to translate the content of the text files. It supports a wide range of languages, providing versatility for global use cases.
Output Generation: After translation, the results are written in the user's preferred format. The module supports JSON, TXT, and SQLite output, providing flexibility in how the results are used.
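For reference, a translation call against the Translator v3 REST API can be sketched with the standard library alone. The helper names are hypothetical and this is not necessarily how the module talks to the service; the endpoint shown is the global Translator endpoint, and a resource with a custom endpoint would use that instead.

```python
import json
import urllib.parse
import urllib.request

TRANSLATOR_ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"


def build_translate_request(texts, to_lang, key, region, from_lang=None):
    """Build (but do not send) a Translator v3 request for a list of strings."""
    params = {"api-version": "3.0", "to": to_lang}
    if from_lang:
        params["from"] = from_lang
    body = json.dumps([{"Text": t} for t in texts]).encode("utf-8")
    return urllib.request.Request(
        TRANSLATOR_ENDPOINT + "?" + urllib.parse.urlencode(params),
        data=body,
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Ocp-Apim-Subscription-Region": region,
            "Content-Type": "application/json; charset=UTF-8",
        },
    )


def translate(texts, to_lang, key, region):
    """Send the request and return one translated string per input text."""
    request = build_translate_request(texts, to_lang, key, region)
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return [item["translations"][0]["text"] for item in payload]
```

Sending several strings in one request body is what makes batch translation cheaper than one call per file. Omitting `from_lang` lets the service detect the source language.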
Usage
To translate a text file, place it in the specified input directory.
Run the translation module through the Audio Analyser interface.
Choose your target language and output format.
The translated text will be saved in the designated output directory in the chosen format.
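How a result could be written in each of the three output formats is sketched below. The function name, the file naming, and the `results` table schema are assumptions for illustration, not the module's actual storage layout.

```python
import json
import sqlite3
from pathlib import Path


def save_result(name, text, output_dir, fmt="txt"):
    """Save one result as .txt, .json, or a row in results.db; return the path."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    if fmt == "txt":
        target = out / f"{name}.txt"
        target.write_text(text, encoding="utf-8")
    elif fmt == "json":
        target = out / f"{name}.json"
        payload = json.dumps({"name": name, "text": text}, ensure_ascii=False)
        target.write_text(payload, encoding="utf-8")
    elif fmt == "sqlite":
        target = out / "results.db"
        conn = sqlite3.connect(str(target))
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute("CREATE TABLE IF NOT EXISTS results "
                             "(name TEXT PRIMARY KEY, text TEXT NOT NULL)")
                conn.execute("INSERT OR REPLACE INTO results VALUES (?, ?)",
                             (name, text))
        finally:
            conn.close()
    else:
        raise ValueError(f"Unsupported format: {fmt}")
    return target
```

`ensure_ascii=False` keeps translated text readable in the JSON file instead of escaping every non-ASCII character. The primary key on `name` means saving the same file twice replaces the earlier row.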
Supported Languages
Below is a list of languages supported by the Translations Module, along with their respective language codes:
Language | Language code
Afrikaans | af
Albanian | sq
Amharic | am
Arabic | ar
Armenian | hy
Assamese | as
Azerbaijani (Latin) | az
Bangla | bn
Bashkir | ba
Basque | eu
Bhojpuri | bho
Bodo | brx
Bosnian (Latin) | bs
Bulgarian | bg
Cantonese (Traditional) | yue
Catalan | ca
Chinese (Literary) | lzh
Chinese Simplified | zh
Chinese Traditional | zh
chiShona | sn
Croatian | hr
Czech | cs
Danish | da
Dari | prs
Divehi | dv
Dogri | doi
Dutch | nl
English | en
Estonian | et
Faroese | fo
Fijian | fj
Filipino | fil
Finnish | fi
French | fr
French (Canada) | fr
Galician | gl
Georgian | ka
German | de
Greek | el
Gujarati | gu
Haitian Creole | ht
Hausa | ha
Hebrew | he
Hindi | hi
Hmong Daw (Latin) | mww
Hungarian | hu
Icelandic | is
Igbo | ig
Indonesian | id
Inuinnaqtun | ikt
Inuktitut | iu
Inuktitut (Latin) | iu
Irish | ga
Italian | it
Japanese | ja
Kannada | kn
Kashmiri | ks
Kazakh | kk
Khmer | km
Kinyarwanda | rw
Klingon | tlh
Klingon (pIqaD) | tlh
Konkani | gom
Korean | ko
Kurdish (Central) | ku
Kurdish (Northern) | kmr
Kyrgyz (Cyrillic) | ky
Lao | lo
Latvian | lv
Lithuanian | lt
Lingala | ln
Lower Sorbian | dsb
Luganda | lug
Macedonian | mk
Maithili | mai
Malagasy | mg
Malay (Latin) | ms
Malayalam | ml
Maltese | mt
Maori | mi
Marathi | mr
Mongolian (Cyrillic) | mn
Mongolian (Traditional) | mn
Myanmar | my
Nepali | ne
Norwegian | nb
Nyanja | nya
Odia | or
Pashto | ps
Persian | fa
Polish | pl
Portuguese (Brazil) | pt
Portuguese (Portugal) | pt
Punjabi | pa
Queretaro Otomi | otq
Romanian | ro
Rundi | run
Russian | ru
Samoan (Latin) | sm
Serbian (Cyrillic) | sr
Serbian (Latin) | sr
Sesotho | st
Sesotho sa Leboa | nso
Setswana | tn
Sindhi | sd
Sinhala | si
Slovak | sk
Slovenian | sl
Somali (Arabic) | so
Spanish | es
Swahili (Latin) | sw
Swedish | sv
Tahitian | ty
Tamil | ta
Tatar (Latin) | tt
Telugu | te
Thai | th
Tibetan | bo
Tigrinya | ti
Tongan | to
Turkish | tr
Turkmen (Latin) | tk
Ukrainian | uk
Upper Sorbian | hsb
Urdu | ur
Uyghur (Arabic) | ug
Uzbek (Latin) | uz
Vietnamese | vi
Welsh | cy
Xhosa | xh
Yoruba | yo
Yucatec Maya | yua
Zulu | zu
Error Handling and Logging
The module is designed to robustly handle various errors, including API connection issues, file reading/writing errors, and unsupported language codes. Detailed logs are generated for troubleshooting and audit purposes.
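Two of the error cases named above, unsupported language codes and file read errors, could be handled roughly as follows. The names, the logger, and the small `SUPPORTED_CODES` subset are hypothetical and only illustrate the pattern of logging the problem before failing or skipping.

```python
import logging

log = logging.getLogger("translations_sketch")

# A small illustrative subset of the language codes listed above.
SUPPORTED_CODES = {"af", "ar", "de", "en", "es", "fr", "ja"}


def validate_language_code(code, supported=SUPPORTED_CODES):
    """Strip whitespace and reject anything outside the supported set."""
    normalized = code.strip()
    if normalized not in supported:
        log.error("Unsupported language code: %r", code)
        raise ValueError(f"Unsupported language code: {code!r}")
    return normalized


def read_text_safely(path):
    """Return file contents, or None after logging a read error."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except (OSError, UnicodeDecodeError) as exc:
        log.error("Could not read %s: %s", path, exc)
        return None
```

Validating the target language before calling the API avoids spending a request on input that is certain to be rejected.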
Extensibility
This module is built with extensibility in mind, allowing for future enhancements such as additional language support, improved translation accuracy, and integration with other translation services or custom models.
License
The project is licensed under the terms of both the MIT license and the
Apache License (Version 2.0).
Apache License, Version 2.0
MIT license
Contribution
We welcome contributions to audioanalyser. Please see the
contributing instructions for more information.
Unless you explicitly state otherwise, any contribution intentionally
submitted for inclusion in the work by you, as defined in the
Apache-2.0 license, shall be dual licensed as above, without any
additional terms or conditions.
Acknowledgements
We would like to extend a big thank you to all the awesome contributors
of audioanalyser for their help and support.
For personal and professional use. You cannot resell or redistribute these repositories in their original state.