Questions tagged [text]

Question 1

Cross-posting what I wrote here, Chinese Character Frequency for all ~100,000 Chinese Unicode Characters?, where I explain in more detail how I have been unable to find a Chinese character frequency ...

Question 2

What known state-of-the-art techniques might ChatGPT-4o, Claude 3, or other similar systems be using to understand both text and image data? I noticed that ChatGPT-4o can recognize text in an image well. ...

Question 3

I'm looking to use ML to read in a blob of text and extract a name from it. (The blob is an OCR result from an iPhone.) The text blob varies in size, but the name is always present in ...

Question 4

In How can the accuracy of the dictionary-based approach be measured and improved?, one user says that the dictionary-based approach is a heuristic method. Isn't this approach a type of rule-...

Question 5

Question: I'm working on a project where I need to cluster a dataset of articles based on various features, including text, numeric values, and categorical data. I've implemented a clustering approach ...

Question 6

I have a dataset that I have collected for a specific topic. The dataset is in the following format: Raw text (similar to the Shakespeare dataset), which has no label or input, just text. Question and ...

Question 7

I have been facing difficulties while loading specific lines from a text file. The lines contain characters such as Ù¹Ø§Ù… Ø¨ÛŒÙ…Ø§Ø± ÛÛ’Û” Ù¹Ø§Ù… Ø¨ÛŒÙ…Ø§Ø± ÛÛ’. I have tried using different ...

Question 8

I am new to ML and trying to solve a text segmentation problem. I have a transcript of a news show and I want to split it into parts by topic. I searched Google and asked ChatGPT, and found ...

Question 9

I have a set of podcast episode transcriptions in Arabic. I wish to convert these to embedding vectors so I can run a similarity comparison of them. Here are the summary statistics on the episodes: ...

Question 10

An important limiting factor on the performance of large language models is the amount of training text available. Of course, using e.g. the Gutenberg archive of public domain books is an obvious ...

Question 11

I want to 'lemmatize' phrases to dictionary entries. For instance, the following collocates can be standardized to the idiom in the aforementioned link ...

Question 12

I am developing a fine-tuned model to emulate a tech support chatbot based on information I provide. I am struggling to create a large dataset (aiming for 1000 prompt/completion pairs). Does anyone have ...

Question 13

I am working on an analysis using a dictionary-based text-as-data approach. I have a dataset of texts (n=1200), and I am applying a dictionary of 50 words (I tokenize the text with each word being one ...

Question 14

Can you suggest some papers to read about deep learning models that find patterns/similarities between different texts? What I have is a set of reviews with the following categories for each review:...

Question 15

I am working on a classification model using one of the following three algorithms: RandomForestClassifier, a TensorFlow model and a LogisticRegression model. The data set I am working with has a ...

Possible ways to collect frequency data for all ~100,000 Chinese Unicode characters?

How does ChatGPT-4o work on text + image data?

Text Classification with unlimited labels, Text Extraction?

Why is dictionary-based approach a heuristic method?

Clustering Similar Articles Using Mixed Data: Seeking Advice and Validation

Best practice for fine tuning LLM

Trouble Loading Lines from Text File with Various Encodings

Text segmentation problem

How do people usually handle creating an embedding vector of longer texts (32000 characters)?

Were any LLMs trained on Google books?

Can this task for phrases be called lemmatization?

Creating variations of prompts for ChatGPT

Dictionary-based text analysis: dealing with length

How to extract common aspects from text using deep learning?

Predictive value of short text fields
