Questions tagged [feature-engineering]
The process of using domain knowledge of the data to create features that improve the performance of machine learning algorithms.
648 questions
0 votes
0 answers
9 views
Approach to creating vector similarities between primitive voice sounds (i.e. basic consonants and vowels)?
I am working on a natural language project for fun, basically a rhyming dictionary. Next, I am trying to figure out how to properly capture the basic consonants and vowels ...
5 votes
1 answer
46 views
How Do You Balance Feature Search Strategy and HP Optimization Cost?
What I'm trying to figure out: I'm working on a machine learning project and would love to hear your thoughts on two things: (A) how to prioritize feature exploration, and (B) whether to fix hyperparameters (...
7 votes
1 answer
125 views
LSTM feature scaling with windowing?
Beginner ML practitioner here. I'm trying to do some time series forecasting on a fairly high-resolution dataset that stretches over a long period of time. The values vary pretty widely over time: to ...
4 votes
2 answers
462 views
NLP of noisy unpredictable text to extract dates--just regex?
Question: Are there better approaches than regex for extracting event dates (including relative) from noisy text? Are there NLP tools that can help disambiguate multiple date mentions in various ...
9 votes
2 answers
216 views
Is it best practice to remove outliers from transaction data used for training?
I am building a random forest regression model. The goal is to predict the maximum amount each customer will spend in a single transaction during the next 90 days. I have transaction data for 7 million customers, ...
3 votes
1 answer
367 views
What does it mean when even a small set of samples don't give 0 loss?
I'm working on a regression problem where I predict the molar compositions of some chemical species. I'm using this kind of network: ...
3 votes
1 answer
105 views
Principal Component Analysis - how to determine the key features that contribute to PC1 using scikit-learn in Python
I am struggling to select the key features that contribute to PC1. I will use the public breast cancer dataset to illustrate the issue. Please feel free to point me to a previous post if this question has ...
0 votes
1 answer
95 views
Lagged feature engineering - time series forecasting with blocked cross validation
I am working in a team developing a time series forecasting model using xgboost (or similar). We have a draft workflow for optimising model hyperparameters, incorporating an initial train-test split ...
1 vote
1 answer
67 views
Time series prediction with a large number of static features
I need to make time series predictions on a large data set. There are both static and dynamic features: static features such as store location ID (10k+), and dynamic features such as daily sales and daily ...
2 votes
0 answers
49 views
Feature engineering using Knowledge Graph
I have built a knowledge graph with every customer having the same attributes, for example, gradeA, gradeB, gradeC. With this graph I want to attempt to find patterns between customers with shared ...
8 votes
4 answers
940 views
Rounding Float Values in ML Models
Let's assume I have a column with float values (e.g., 3.12334354454, 5.75434331354, and so on). If I round these values to two decimal places (e.g., 3.12, 5.75), I think the advantages and ...
1 vote
0 answers
39 views
SHAP vs. Manual Analysis: Why Opposite Correlations for a feature?
When plotting a SHAP beeswarm plot for my binary classification model (predicting subscription renewal probability), one of the columns indicates that high feature values correlate with low SHAP values ...
0 votes
0 answers
20 views
How to Represent Structured Inputs in a Neural Network for Multi-Entity Prediction?
I'm building a neural network model to predict which student in a class will achieve the highest score on an upcoming exam (this is not the actual task; I modified the task to maintain ...
2 votes
1 answer
71 views
I didn't scale all the features I used for prediction. Does that make sense?
In my regression-based machine learning project, I have features like coordinates (latitude and longitude) that I prefer not to scale or transform. The main reason is that reversing the transformation ...
0 votes
1 answer
32 views
Calculating risk or amount of slipperiness based on historical weather data
Given hourly updates of precipitation amount (for the preceding hour) and temperature, how would you calculate if it's slippery or not?