Questions tagged [embeddings]
The embeddings tag has no summary.
195 questions
0 votes
0 answers
9 views
Approach to creating vector similarities between primitive voice sounds (i.e. basic consonants and vowels)?
I am working on some natural language stuff for fun, basically a rhyming dictionary, trying to figure it out. Trying next to figure out how to properly/decently capture the basic consonants + vowels ...
0 votes
0 answers
9 views
What is the problem with symmetry in the GloVe motivation?
I am currently studying the GloVe paper on word embeddings. In Section 3, "The GloVe Model", the model is derived from several desiderata, one of which confuses me. It is around Equation 3 which ...
0 votes
1 answer
18 views
Is there a fast method for sampling from document embeddings to *maximize* pairwise distances?
I have a large set of document embeddings, and I would like to sample a subset where the median or average pairwise distance is maximized. The idea here is to get a more balanced sample set where long ...
1 vote
0 answers
36 views
Combining Embeddings and Ontology (DAG) in Visualisation
How can I visualise a hierarchical ontology of items in embedding space, combining text embeddings with the graphical structure? (Something similar to the example below) I have a hierarchical ...
3 votes
1 answer
50 views
Single nn.Embedding instance vs multiple nn.Embedding instances
I am trying to determine if using multiple instances of nn.Embedding() has any value over using a single instance in training a model. As an example, let's say I ...
6 votes
1 answer
147 views
How to implement this simple recommender system in Keras?
I've been working in data science for a long time, but very rarely have I been called upon to implement an ML algorithm; I've just run other people's libraries. I'm trying to pick up the skill. I'm ...
1 vote
0 answers
29 views
Preference learning for collaborative scheduling
I am working on a project that integrates workers' preferences into a schedule: we want to satisfy not only the systematic constraints but also users' preferences as constraints, so we are ...
2 votes
1 answer
37 views
MTEB/MMTEB: dataset and metric to determine threshold for pair classification task
I'm trying to locally replicate the pair classification task of MMTEB/MTEB. However, I didn't find train/dev sets for all datasets in this task. Table 2 in the original MTEB paper (Muennighoff et al., ...
5 votes
2 answers
436 views
How can you efficiently cluster speech segments by speaker?
We have ~30 audio snippets, of which around 50% are from the same speaker, who is our target speaker, and the rest are from various different speakers. We want to extract all audio snippets from our ...
1 vote
2 answers
132 views
Embeddings for multiple categorical features with different cardinality
I have multiple categorical features, each of which has its own cardinality, and I want to use an embedding layer to reduce the dimensions fed to an MLP. Should I have one big embedding matrix ...
0 votes
0 answers
23 views
What is the best approach to creating an embedding layer for a combination of two categorical variables?
I have two integer encoded categorical variables, one indexed from 0 and another indexed from 1. What is the best way of embedding unique tuples (Category A, Category B), taking into account that ...
1 vote
1 answer
74 views
Combine multiple embeddings to create a user representation
I’m building a recommendation system where each user interacts with sessions (topics with a title and description). I want to represent each user using their last 5 session interactions by creating a ...
1 vote
0 answers
31 views
RAG System Design: Context-Aware Customer Support for Property Management with Mixed Property-Specific and Global Information
Background: I manage a property portfolio on platforms like Airbnb, handling customer support through the entire guest journey (pre-booking to post-stay). I'm building a RAG system to help automate ...
0 votes
1 answer
85 views
Find the correlation between two lists of texts
Let's say that I have some lists of texts such as: ...
1 vote
0 answers
149 views
Understanding the embeddings model (dunzhang/stella_en_400M_v5) by Alibaba. The details about the retrieve task and the s2s task
The model I am talking about is hosted here: From the documentation: We simplify usage of prompts, providing two prompts for most general tasks, one is for s2p, another one is for s2s. Prompt of s2p ...