How can I group transcribed phrases into meaningful chunks without using complex models?

Asked 1 month ago

I have a large set of phrases obtained via Azure Fast Transcription, and I need to group them into coherent semantic chunks (to use later in a RAG pipeline).

Initially, I tried grouping phrases based on speaker pauses (e.g., merging phrases when pauses are below a certain threshold), but this approach isn’t generic enough — different speakers have very different pause patterns (some pause for 0.5s, others for 2s, even within the same recording).
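For concreteness, the pause-based merging described above can be sketched roughly as follows. This is a minimal illustration, not the exact code in use: the `Phrase` class, its field names (`text`, `offset_ms`, `duration_ms`), and the `max_pause_ms` default are assumptions made for the example and are not taken from the Azure response schema.

```python
from dataclasses import dataclass


@dataclass
class Phrase:
    # Hypothetical shape for one transcribed phrase; names are illustrative.
    text: str
    offset_ms: int    # start time of the phrase in milliseconds
    duration_ms: int  # length of the phrase in milliseconds


def merge_by_pause(phrases, max_pause_ms=1000):
    """Merge consecutive phrases whose silent gap is at most max_pause_ms."""
    chunks = []
    current = []
    prev_end = None
    for p in phrases:
        # Gap between the end of the previous phrase and the start of this one.
        gap = None if prev_end is None else p.offset_ms - prev_end
        if current and gap is not None and gap > max_pause_ms:
            # Pause is too long: close the current chunk and start a new one.
            chunks.append(" ".join(q.text for q in current))
            current = []
        current.append(p)
        prev_end = p.offset_ms + p.duration_ms
    if current:
        chunks.append(" ".join(q.text for q in current))
    return chunks
```

With a single fixed `max_pause_ms`, a value that suits a speaker who pauses for about 0.5s will split a 2s-pause speaker too eagerly, and a value tuned for the slower speaker will merge too much for the faster one.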

Due to project constraints, I can’t use complex NLP models or embeddings, so I’m looking for a lightweight or heuristic-based approach to merge consecutive phrases into semantically meaningful chunks.

asked Nov 7 at 8:53

Daniel
