Questions tagged [outlier]

For questions regarding outliers or unusual points in the data.

3 votes
0 answers
32 views

I have fitted a zero-inflated Poisson (ZIP) model to my count data with an excess of zeros (figure 1). Since it didn't capture the overdispersion in the data, and overdispersion is indeed ...
Paw in Data
9 votes
2 answers
216 views

I am building a random forest regression model. The goal is to predict the maximum each customer will spend in a single transaction during the next 90 days. I have transaction data for 7m customers, ...
SRJCoding
  • 191
2 votes
0 answers
41 views

The following two figures show raw data and filtered data recorded in a measurement. I have used SciPy's Savitzky-Golay filter with window_length = 6 and polyorder = 3 to obtain the second plot. One ...
Subhadeep Bej
3 votes
0 answers
62 views

I am using a rolling window z-score method to flag if a record is an outlier. Is it necessary to first normalize the values of the desired feature before computing the rolling z-score?
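A minimal sketch of the comparison this question describes, assuming "normalize" means a global min-max rescaling and that each point is scored against the `window` points before it (both are assumptions; the excerpt does not say). The helper names `rolling_zscores` and `min_max_scale` and the sample values are hypothetical. Because a z-score subtracts the window mean and divides by the window standard deviation, any rescaling of the form a*x + b with a > 0 cancels out, so the two runs should agree up to floating-point error.

```python
from statistics import mean, pstdev


def rolling_zscores(values, window):
    """Z-score of each point relative to the `window` points before it."""
    scores = []
    for i in range(len(values)):
        if i < window:
            scores.append(None)  # not enough history yet
            continue
        history = values[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        # A flat window has no spread, so no z-score is defined for it.
        scores.append((values[i] - mu) / sigma if sigma > 0 else None)
    return scores


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Illustrative data only: one obvious spike at index 5.
raw = [10.0, 12.0, 11.0, 13.0, 12.0, 40.0, 11.0, 12.0]

z_raw = rolling_zscores(raw, window=4)
z_scaled = rolling_zscores(min_max_scale(raw), window=4)

for r, s in zip(z_raw, z_scaled):
    print(r, s)
```

This only holds for linear rescaling such as min-max scaling or standardization. A nonlinear transform (for example a log) changes the shape of the data within each window and would generally change the scores.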
Mar
  • 165
2 votes
0 answers
36 views

I am looking into different statistical methods for determining a decrease in a numeric "count" feature across a time-series dataset. The dataset is relatively small (about 50 records), and ...
Mar
  • 165
9 votes
3 answers
2k views

I'm analyzing how outliers in my dataset of size 8x8000 affect regression models. I have three scenarios: raw dataset (with outliers), Winsorized dataset (2% of the extreme outliers adjusted), and ...
ml.freak
  • 113
0 votes
1 answer
132 views

When working with machine learning or data preprocessing, the order of operations is crucial for accurate results. One common question is: Should normalization or standardization be applied before ...
Yogananda
0 votes
0 answers
38 views

I'm facing a problem where I'd like to detect outliers in a data collection. The goal is to identify outliers in a variable Y based on its relation to the variable X. To do that, I did: ...
Arya
  • 1
3 votes
1 answer
55 views

I have a small dataset that has 66 samples and 19 features. It is a numerical and tabular dataset. The goal is to predict a value according to these 19 features. The data is about a medical physics ...
Erfan Mollai
0 votes
0 answers
123 views

Let's say I have an anomaly detection (unsupervised learning) dataset with 10 observations (two features). The dataset is shown below: After executing the model, the following are the results (anomalies ...
Bits
  • 131
0 votes
1 answer
115 views

I have plotted box plots for the features of an ML problem, to identify outliers. I have scaled the data using a MinMaxScaler so that the scaled data is in the range [0,1]. For some columns, the two ...
san
  • 1
0 votes
1 answer
69 views

Can we use the tanh activation function to detect outliers? Is my image below correct for dataset outliers (after training a model with the tanh activation function)?
user3668129
1 vote
0 answers
48 views

I am trying to detect outliers with sklearn.covariance.EllipticEnvelope for a single variable, but it throws an unexpected error. Here is an example that reproduces ...
Maya
  • 11
0 votes
1 answer
962 views

I am confused as to the pros and cons of two different approaches to normalization: Min-Max Scaling, and what the lecturer in the course I am taking refers to as 'Simple Feature Scaling'. The latter ...
Chris Bedford
1 vote
1 answer
1k views

Just a question: I know that when we plot the distribution of numerical data, the points that fall outside the boxplot (diamond-shaped points) are considered outliers. However, I met a case where ...
Razark
  • 11