Questions tagged [audio-recognition]
115 questions
0 votes
0 answers
5 views
What has Meta's "Omnilingual ASR" really learned?
Meta has recently published its new transcription model Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages. However, I am somewhat sceptical about it, particularly given ...
2 votes
0 answers
51 views
Do you need paired data to train a multimodal model?
I have video, audio, and text data, and I intend to use a multimodal model for binary classification. However, the data is not paired (i.e., the audio and text are not from the same video recording). I've ...
5 votes
2 answers
437 views
How can you efficiently cluster speech segments by speaker?
We have ~30 audio snippets, of which around 50% are from the same speaker, who is our target speaker, and the rest are from various different speakers. We want to extract all audio snippets from our ...
0 votes
1 answer
47 views
CNN model is not learning enough. Accuracy remains the same throughout
I am working on a deep learning model to help me predict deepfake voices. For the data preprocessing, I have done everything to a T, following papers that have already been published. But ...
0 votes
1 answer
61 views
How to Classify Blueberries as "Crunchy", "Juicy" or "Soft" using Acoustic Signal Processing and Machine Learning?
I'm working on a project to classify blueberries based on their texture—specifically, whether they are soft, juicy, or crunchy—using the sounds they produce when crushed. I have about 1100 audio ...
1 vote
1 answer
295 views
Detection of musical instruments using YAMNet
My goal is to detect musical instruments with AI (machine learning). I'm currently using the YAMNet model to make inferences, but it has a very wide range of categories, for example, "Growling" ...
0 votes
0 answers
35 views
Drum sound classification using RNN issues - help needed
I am new to the field of machine learning, even though I have a solid background in semi-related fields (I am a control systems engineer by trade), and as a hobby project I wanted to work a bit with sound ...
0 votes
1 answer
174 views
Generate VTT file from speech to text
I was able to generate text from an audio file with Hugging Face, using this code ...
1 vote
1 answer
120 views
Using a SVM to classify audio data
Suppose I have 1000 audio files in which three people independently say the name of an animal at the same time, and there can be 9 independent animal labels. What features should I select from the audio file, and ...
1 vote
0 answers
48 views
How to make a dataset for Audio Speech Recognition of Arabic Language
I am trying to map Quranic verses for the identification of errors. Different recordings of the same verse have different durations, depending on the reciter. I used Audacity to normalize but the ...
1 vote
2 answers
111 views
Classification on sound data
My goal is to detect a problem in a wind turbine. I have a dataset of 2 hours (1 hour for each class). Keep in mind that it will be embedded on an MCU target, so the neural network has to be less than 10M ...
1 vote
0 answers
85 views
Labelling spectrograms
I'm currently working on an ML project and need some information: is there any tool that can load audio files and generate spectrograms, with an option to annotate/label the ...
1 vote
1 answer
64 views
What process actually takes place during audio feedback suppression with machine learning?
After training a neural network (NN) to tell the difference between a clean audio signal and a signal with a specific "noise", what are the mechanics that actually take place where an unseen ...
0 votes
0 answers
132 views
Best Feature Extraction Practice for Long Audio Data
I have a video dataset and my aim is to classify predefined scenes in these videos at 1 fps (that is, I perform classification once per second). Therefore, I plan to fuse audio and visual features for ...
0 votes
1 answer
1k views
Dimensions of mel spectrogram
Can someone explain the dimensions in ASR to me? For example, if I have an audio clip and convert it to a mel spectrogram, I get a tensor of dimension [1, 128, 850]. Am I right in understanding that 128 is the number of ...