Questions tagged [audio-recognition]

Question 1

Meta has recently published its new transcription model Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages. However, I am somewhat sceptical about it, particularly given ...

Question 2

I have video, audio, and text data. The intent is to use a multimodal model for binary classification. However, the data is not paired (i.e., the audio and text are not from the same video recording). I've ...

Question 3

We have ~30 audio snippets, of which around 50% are from the same speaker, who is our target speaker, and the rest are from various different speakers. We want to extract all audio snippets from our ...

Question 4

I am working on a deep learning model that will help me detect deepfake voices. For the data preprocessing, I have done everything to a T, following papers that have already been published. But ...

Question 5

I'm working on a project to classify blueberries based on their texture—specifically, whether they are soft, juicy, or crunchy—using the sounds they produce when crushed. I have about 1100 audio ...

Question 6

My goal is to detect musical instruments with AI (machine learning). I'm currently using the Yamnet model to make inferences, but it has a very wide range of categories, for example, "Growling&...

Question 7

I am new to the field of machine learning, even though I have a solid background in semi-related fields (I am a control systems engineer by trade), and as a hobby project I wanted to work a bit with sound ...

Question 8

I was able to generate text from an audio file using Hugging Face with this code ...

Question 9

If I had 1000 audio files where three people are independently saying an animal at the same time, there can be 9 independent labels of animals. What features should I select from the audio file, and ...

Question 10

I am trying to map Quranic Verses for the identification of errors. Different audios for the same verse have different durations as it depends on the reciter. I used Audacity to normalize but the ...

Question 11

My goal is to detect a problem in a wind turbine. I have a dataset of 2 hours (1 hour for each class). Keep in mind that it will be embedded on an MCU target, so the neural network has to be less than 10M ...

Question 12

I'm currently working on an ML project and need some information: is there any tool that can load audio files and generate spectrograms, with an option to annotate/label the ...

Question 13

After training a neural network (NN) to tell the difference between a clean audio signal and a signal with a specific "noise", what are the mechanics that actually take place where an unseen ...

Question 14

I have a video dataset, and my aim is to classify predefined scenes in these videos at 1 fps (that is, I perform classification once per second). Therefore, I plan to fuse audio and visual features for ...

Question 15

Can someone explain the dimensions in ASR to me? For example, suppose I have an audio clip and convert it to a mel spectrogram, so that I now have a tensor of dimension [1, 128, 850]. Am I right that 128 is the number of ...

What has Meta's "Omnilingual ASR" really learned?

Do you need paired data to train multimodal?

How can you efficiently cluster speech segments by speaker?

CNN model is not learning enough. Accuracy remains the same throughout

How to Classify Blueberries as "Crunchy", "Juicy" or "Soft" using Acoustic Signal Processing and Machine Learning?

Detection of musical instruments using Yamnet

Drum sound classification using RNN issues - help needed

Generate VTT file from speech to text

Using a SVM to classify audio data

How to make a dataset for Audio Speech Recognition of Arabic Language

Classification on sound data

Labelling spectrograms

What process actually takes place during audio feedback suppression machine learning

Best Feature Extraction Practice for Long Audio Data

Dimensions of mel spectrogram