0
$\begingroup$

I am trying to do unsupervised anomaly detection on a dataset with a dozen variables. None of them have descriptions, and the dataset has no labels or class variable.

I used a RobustScaler to fit and transform the data, then fit an Isolation Forest and a Local Outlier Factor model on the scaled data. When I visualize the results by running PCA on the scaled data, I cannot see any anomalies or clear outliers, and I'm not sure why. The tutorials I've looked at follow the same steps I did.
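For reference, the pipeline described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data is a synthetic stand-in with 12 numeric columns and a few injected anomalies, and all variable names are made up. It keeps the continuous anomaly scores (rather than only the -1/1 predictions) and prints how much variance the two principal components retain, since a 2D projection that keeps little variance may not show points that are unusual in the full feature space.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import RobustScaler

# Synthetic stand-in for the unlabeled dataset (assumption: 12 numeric columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))
X[:10] += 6.0  # hypothetical injected anomalies, only to make the sketch runnable

# Scale with statistics that are robust to outliers (median and IQR).
X_scaled = RobustScaler().fit_transform(X)

# Isolation Forest: score_samples is lower for more abnormal points, so negate it.
iso = IsolationForest(random_state=0).fit(X_scaled)
iso_score = -iso.score_samples(X_scaled)  # higher = more anomalous

# Local Outlier Factor: negative_outlier_factor_ is lower for outliers, so negate it.
lof = LocalOutlierFactor(n_neighbors=20).fit(X_scaled)
lof_score = -lof.negative_outlier_factor_  # higher = more anomalous

# Project to 2D for plotting and check how much variance the projection keeps.
pca = PCA(n_components=2).fit(X_scaled)
coords = pca.transform(X_scaled)
explained = pca.explained_variance_ratio_.sum()
print(f"variance kept by 2 components: {explained:.2f}")
```

The `coords` array can then be drawn as a scatter plot colored by `iso_score` or `lof_score` instead of a binary red/normal split.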

How can I evaluate my models and visualize the outliers? When I plot the anomalies in red on a PCA plot, they don't seem to be far from, or different from, the normal data points.

Am I missing something, or forgetting an important step before fitting the algorithms?

$\endgroup$
0

2 Answers

0
$\begingroup$

To evaluate your method in a structured fashion, you need to label some data. You cannot determine how well it identifies anomalies if you have not defined what an anomaly actually means.
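As a rough sketch of what that could look like, assuming a small subset of rows has been reviewed by hand: the example below scores every row with an Isolation Forest and then computes ranking metrics only on the labelled subset. The data, the labels, and the choice of the first 100 rows as the reviewed subset are all hypothetical and exist only to make the example runnable.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import average_precision_score, roc_auc_score

# Synthetic stand-in data (assumption); the first 10 rows are injected anomalies.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))
X[:10] += 6.0

# Hypothetical hand-made labels: 1 = anomaly, 0 = normal.
y = np.zeros(500, dtype=int)
y[:10] = 1

# Fit on all rows without labels; negate so that higher = more anomalous.
scores = -IsolationForest(random_state=0).fit(X).score_samples(X)

# Hypothetical: only the first 100 rows were reviewed and labelled by hand.
labelled_idx = np.arange(100)

# Ranking metrics on the labelled subset only.
roc_auc = roc_auc_score(y[labelled_idx], scores[labelled_idx])
avg_precision = average_precision_score(y[labelled_idx], scores[labelled_idx])
print(f"ROC AUC: {roc_auc:.3f}, average precision: {avg_precision:.3f}")
```

Both metrics use the continuous scores, so they do not depend on the contamination threshold chosen for the detector.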

$\endgroup$
0
$\begingroup$

https://medium.com/simform-engineering/anomaly-detection-with-unsupervised-machine-learning-3bcf4c431aff

You can refer to this article too. Am I super late, though?

$\endgroup$
1
  • $\begingroup$ While the linked article is very interesting, posting only a link as an answer is discouraged on Stack Exchange sites for several reasons, such as the link going dead in the future. At the least, could you write a summary of it? Otherwise it would be better as a comment on the question, not an answer. $\endgroup$ Commented Oct 1 at 19:25
