Questions tagged [outlier]
For questions regarding outliers or unusual points in the data.
227 questions
3 votes
0 answers
32 views
Why can zero-inflated generalized Poisson model not capture the overdispersion in the count data?
I have fitted a zero-inflated Poisson (ZIP) model to my count data with an excess of zeros (figure 1). Since it didn't capture the overdispersion in the data, and overdispersion is indeed ...
9 votes
2 answers
216 views
Is it best practice to remove outliers from transaction data used for training?
I am building a random forest regression model. The goal is to predict the maximum each customer will spend in a single transaction during the next 90 days. I have transaction data for 7m customers, ...
2 votes
0 answers
41 views
How to remove spurious data points recorded in a measurement? How to improve the result obtained using a Savitzky-Golay filter?
The following two figures show raw data and filtered data recorded in a measurement. I have used SciPy's Savitzky-Golay filter with window_length = 6 and polyorder of 3 to obtain the second plot. One ...
3 votes
0 answers
62 views
Rolling z-score and normalizing
I am using a rolling window z-score method to flag if a record is an outlier. Is it necessary to first normalize the values of the desired feature before computing the rolling z-score?
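A minimal sketch of the point behind this question, assuming a trailing window and made-up values (the function name `rolling_zscore`, the window size, and the data are all hypothetical, not taken from the question). A z-score already subtracts the window mean and divides by the window standard deviation, so a positive linear rescaling of the feature beforehand should leave the scores unchanged:

```python
from statistics import mean, stdev


def rolling_zscore(values, window):
    """Z-score of each point relative to the (up to) `window` points before it."""
    scores = []
    for i, x in enumerate(values):
        history = values[max(0, i - window):i]
        if len(history) < 2:
            scores.append(None)  # not enough history to estimate a spread
            continue
        sd = stdev(history)
        scores.append((x - mean(history)) / sd if sd > 0 else None)
    return scores


# Hypothetical feature values with one obvious spike.
raw = [10.0, 11.0, 9.0, 10.5, 30.0, 10.0, 9.5]
# The same values after an arbitrary positive affine rescaling.
rescaled = [(x - 5.0) / 2.5 for x in raw]

z_raw = rolling_zscore(raw, window=4)
z_rescaled = rolling_zscore(rescaled, window=4)

for a, b in zip(z_raw, z_rescaled):
    print(a, b)  # each pair agrees up to floating-point error
```

Non-linear transforms (a log transform, for example) are a different matter and would change the scores.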
2 votes
0 answers
36 views
Anomaly detection in time series for drops
I am looking into different statistical methods for determining a decrease in a numeric "count" feature across a time-series dataset. The dataset is relatively small (about 50 records), and ...
9 votes
3 answers
2k views
Regression model R2 drops when I remove outliers: is that even possible?
I'm analyzing how outliers in my dataset of size 8x8000 affect regression models. I have three scenarios: raw dataset (with outliers), Winsorized dataset (2% of the extreme outliers adjusted), and ...
0 votes
1 answer
132 views
Is normalization required before outlier detection?
When working with machine learning or data preprocessing, the order of operations is crucial for accurate results. One common question is: Should normalization or standardization be applied before ...
0 votes
0 answers
38 views
How to determine outliers based on a log-scaled regression?
I'm facing a problem where I'd like to detect outliers in a data collection. The goal is to be able to identify outliers in a variable Y based on its relation with the variable X. To do that, I did:...
3 votes
1 answer
55 views
Huge outliers in a small dataset
I have a small dataset that has 66 samples and 19 features. It is a numerical and tabular dataset. The goal is to predict a value according to these 19 features. The data is about a medical physics ...
0 votes
0 answers
123 views
Confused with Isolation Forest
Let's say I have an anomaly detection (unsupervised learning) dataset with 10 observations (two features). The dataset is shown below: After executing the model, the following are the results (anomalies ...
0 votes
1 answer
115 views
How to identify outliers on a box and whisker plot that seems to be compressed?
I have plotted box plots for the features of an ML problem, to identify outliers. I have scaled the data using a MinMaxScaler so that the scaled data is in the range [0,1]. For some columns, the two ...
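One possible cause of a compressed-looking box plot, sketched with invented numbers: min-max scaling maps the column minimum to 0 and the maximum to 1, so a single extreme value pushes everything else into a narrow band near 0. The helper below is a plain-Python stand-in for that formula, not the scikit-learn `MinMaxScaler` class itself, and the column values are hypothetical:

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical column: five ordinary values and one extreme value.
column = [1.0, 2.0, 3.0, 4.0, 5.0, 1000.0]
scaled = min_max_scale(column)

# Everything except the extreme value ends up squeezed close to 0.
bulk = scaled[:-1]
print(scaled)
print("range of the bulk after scaling:", max(bulk) - min(bulk))
```

Because the scaling is linear, it changes only the axis units, not which points lie beyond the whiskers. Plotting on a logarithmic axis or zooming the axis limits to the bulk of the data are common ways to make such a plot readable.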
0 votes
1 answer
69 views
Can we use the tanh activation function to detect outliers?
Can we use the tanh activation function to detect outliers? Does my image below hold true for dataset outliers (after training a model with the tanh activation function)?
1 vote
0 answers
48 views
Outlier detection with elliptic envelope - unexpected error
I am trying to detect outliers with sklearn.covariance.EllipticEnvelope for a single variable, but it throws an unexpected error. Here is an example that reproduces ...
0 votes
1 answer
962 views
Min-Max Scaling more sensitive to outliers than 'Simple Feature Scaling'?
I am confused as to the pros and cons of two different approaches to normalization: Min-Max Scaling, and what the lecturer in the course I am taking refers to as 'Simple Feature Scaling'. The latter ...
1 vote
1 answer
1k views
Outlier handling when most values are 0
Just a question: I know that when we plot the distribution of numerical data, the points that fall outside the boxplot (the diamond-shaped points) are considered outliers. However, I met a case where ...