2
$\begingroup$

I have a dataset with following data format:

3 -> a -> b -> c -> d -> ikd

a -> c -> 3 -> dk -> 2 -> l2i

Each row represents a path from start to end. Let's take the first row as an example. The start point is 3 and the endpoint is ikd. I have millions of rows like that. And each row may have a different length. What I want to do is let users input a path with the above format and I predict a new path based on the existent dataset. The idea is to find the closest path for each user. Each node on the path may have a weight to indicate how important this node is. So the closest path should respect this weight.

Is there any ML algorithm for this kind of problem? I am new to ML so I need some guidance on solving this problem.


EDIT: There is no specific meaning for the type of each node, like number, letter, etc. All these nodes are of type string and the user's input is from an existent string list.

$\endgroup$
4
  • $\begingroup$ What is the significance of the different types of vertices; letters, numbers, ngrams. Is there a grammar that determines which transitions are possible? $\endgroup$ Commented Mar 13, 2018 at 0:12
  • $\begingroup$ there is no meaning of the vertices type. Everything is in string format. And users will input the node from a node category which means there should not be any non-existed value. $\endgroup$ Commented Mar 13, 2018 at 0:15
  • $\begingroup$ Read about sequence prediction. There are many tutorials; e.g., keras, tensorflow. $\endgroup$ Commented Mar 13, 2018 at 16:49
  • $\begingroup$ It seems that you are looking to find best match according to some sequence similarity metric, like Edit distance. From that Wikipedia page you can navigate to Levenshtein distance, graph versions an applications in biochemistry. For millions of inputs you probably also needed to use some solutions to make the search more efficient, information retrieval is a very broad topic. $\endgroup$ Commented Jan 26, 2024 at 8:47

1 Answer 1

1
$\begingroup$

This is called sequence prediction.

Common algorithms for sequence prediction are:

The choice of algorithm depends on the underlying structure of the data and how much data is available.

$\endgroup$

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.