
Questions tagged [audio-recognition]

0 votes
0 answers
5 views

Meta has recently published its new transcription model Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages. However, I am somewhat sceptical about it, particularly given ...
bilalj's user avatar
  • 1
2 votes
0 answers
51 views

I have video, audio, and text data. The intent is to use the multimodal for binary classification. However, the data is not paired (i.e., the audio and text are not from the same video recording). I've ...
myts999's user avatar
  • 21
5 votes
2 answers
437 views

We have ~30 audio snippets, of which around 50% are from the same speaker, who is our target speaker, and the rest are from various different speakers. We want to extract all audio snippets from our ...
Yes's user avatar
  • 181
0 votes
1 answer
47 views

I am working on a deep learning model which will help me predict deepfake voices. For the data preprocessing, I have done everything to a T, following papers which have already been published. But ...
HaughtyNavigator's user avatar
0 votes
1 answer
61 views

I'm working on a project to classify blueberries based on their texture—specifically, whether they are soft, juicy, or crunchy—using the sounds they produce when crushed. I have about 1100 audio ...
Raghav Rathi's user avatar
1 vote
1 answer
295 views

My goal is to detect musical instruments with AI (machine learning). I'm currently using the YAMNet model to make inferences, but it has a very wide range of categories, for example, "Growling&...
Maxime Dupré's user avatar
0 votes
0 answers
35 views

I am new to the field of machine learning, even though I have a solid background in semi-related fields (I am a control systems engineer by trade), and as a hobby project I wanted to work a bit with sound ...
APasagic's user avatar
0 votes
1 answer
174 views

I was able to generate text from an audio file with Hugging Face, using this code ...
Kelly Goedert's user avatar
1 vote
1 answer
120 views

If I had 1000 audio files where three people are independently saying an animal at the same time, there can be 9 independent labels of animals. What features should I select from the audio file, and ...
Joe's user avatar
  • 11
1 vote
0 answers
48 views

I am trying to map Quranic verses for the identification of errors. Different audio recordings of the same verse have different durations, as this depends on the reciter. I used Audacity to normalize but the ...
Sijran's user avatar
  • 11
1 vote
2 answers
111 views

My goal is to detect a problem in a wind turbine. I have a 2-hour dataset (1 hour for each class). Keep in mind that it will be embedded on an MCU target, so the neural network has to be less than 10M ...
Nept0's user avatar
  • 11
1 vote
0 answers
85 views

I'm currently working on an ML project and need some information: is there any tool that can load audio files and generate spectrograms, as well as provide an option to annotate/label the ...
Karthik Sudapelli's user avatar
1 vote
1 answer
64 views

After training a neural network (NN) to tell the difference between a clean audio signal and a signal with a specific "noise", what are the mechanics that actually take place where an unseen ...
Joe's user avatar
  • 147
0 votes
0 answers
132 views

I have a video dataset and my aim is to classify predefined scenes in these videos at 1 fps (that is, I perform classification once per second). Therefore, I plan to fuse audio and visual features for ...
kubicwerke's user avatar
0 votes
1 answer
1k views

Can someone explain the dimensions in ASR to me? For example, if I have an audio clip and convert it to a mel spectrogram, I now have a tensor of dimension [1, 128, 850]. Am I right in understanding that 128 is the number of ...
randomuser228's user avatar
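
For the [1, 128, 850] tensor in the excerpt above, a common reading is [channels, mel bins, time frames]. Here is a minimal sketch of where such a shape could come from, assuming a centered STFT (the default convention in libraries such as librosa and torchaudio), where the frame count is 1 + num_samples // hop_length. The helper name `mel_spectrogram_shape`, the sample count, and the hop length are hypothetical values chosen only to reproduce the shape in the question; they are not taken from the original post.

```python
def mel_spectrogram_shape(num_samples, hop_length, n_mels, channels=1):
    """Expected (channels, n_mels, frames) shape of a mel spectrogram.

    Assumes a centered STFT, for which the number of time frames is
    1 + num_samples // hop_length. This is an illustrative helper,
    not a function from any library.
    """
    num_frames = 1 + num_samples // hop_length
    return (channels, n_mels, num_frames)


# Hypothetical values: 135,840 samples with a hop of 160 samples
# (about 8.5 s of audio if the sample rate were 16 kHz).
shape = mel_spectrogram_shape(num_samples=135_840, hop_length=160, n_mels=128)
print(shape)
```

Under these assumptions, the middle axis is the number of mel frequency bins and the last axis grows with the length of the audio.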
