thestraf/README

This git repository compares two algorithms for anomaly detection on the "thyroid" dataset (a more or less arbitrary choice) from here: http://odds.cs.stonybrook.edu/
Alongside a classical autoencoder (ae.py), you will find the algorithm in question (traf.py), which is explained further down.
As is usual in anomaly detection, both algorithms are given a set of normal datapoints and try to learn some structure from them. Afterwards, they are evaluated on some known anomalies (and some known normals) by calculating a score known as the AUC. The only important things here are that a higher AUC is a better AUC (and maybe that an AUC of about 0.5 corresponds to randomly guessing whether an event is normal or abnormal), but you can find more information here: https://en.wikipedia.org/wiki/Receiver_operating_characteristic
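To make this concrete, here is a minimal sketch of such an evaluation, assuming scikit-learn is available (the scores and labels here are made-up placeholders, not output of the actual code in this repository):

    import numpy as np
    from sklearn.metrics import roc_auc_score

    # hypothetical anomaly scores from some detector: higher = more anomalous
    scores = np.array([0.1, 0.2, 0.15, 0.9, 0.85])
    # ground-truth labels for the test set: 0 = normal, 1 = anomaly
    labels = np.array([0, 0, 0, 1, 1])

    # 1.0 means perfect separation, ~0.5 means random guessing
    print("AUC:", roc_auc_score(labels, scores))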
This score is printed after running each script. It is subject to statistical fluctuations, but the best scores I could achieve were about 0.7 for the autoencoder, while my proposed algorithm always reached an AUC above 0.9. Even if this improvement probably does not hold for every dataset, the algorithm is still useful, as it has far fewer parameters and hyperparameters while being much easier to interpret: every parameter sits in a single 6x6 matrix (since every datapoint is 6-dimensional), instead of a complicated neural network.
To explain the algorithms:
Autoencoders are a common enough algorithm that somebody has already explained them better than I ever could. Here is a random article that I just googled: https://towardsdatascience.com/anomaly-detection-with-autoencoder-b4cdce4866a6
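To still give a rough idea, here is a minimal sketch of an autoencoder-based anomaly detector. This is only an illustration under assumptions (Keras, random toy data with the 6 thyroid features), not the actual ae.py:

    import numpy as np
    from tensorflow import keras

    # toy stand-in for the normal training data (6 features, like thyroid)
    x_train = np.random.rand(1000, 6).astype("float32")

    # encoder compresses to a small bottleneck, decoder reconstructs
    model = keras.Sequential([
        keras.Input(shape=(6,)),
        keras.layers.Dense(4, activation="relu"),
        keras.layers.Dense(2, activation="relu"),  # bottleneck
        keras.layers.Dense(4, activation="relu"),
        keras.layers.Dense(6),                     # reconstruction
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(x_train, x_train, epochs=10, batch_size=32, verbose=0)

    # anomaly score = reconstruction error; anomalies reconstruct badly
    x_test = np.random.rand(10, 6).astype("float32")
    scores = np.mean((model.predict(x_test) - x_test) ** 2, axis=1)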
Assuming you understand them, my (still unnamed) algorithm could be seen as a simplification of them.
Instead of having multiple layers, it uses just one. This removes the encoder part of the autoencoder and creates the thing that makes this algorithm most interesting to me: finding a matrix that maps x to x should be trivial (x = 1*x, i.e. the identity matrix), and such a trivial matrix would not be able to differentiate between normal and abnormal events at all. This is basically what happens if you use tensorflow to optimize this matrix. Even though the resulting AUC score is not random (0.5), it is still much lower and represents a much less decisive anomaly detection.
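As a sketch of this reduced model (again only an illustration under assumptions, not the actual traf.py): the whole model is a single 6x6 matrix W, the anomaly score is the reconstruction error of W, and plain gradient descent drives W towards the trivial identity solution:

    import numpy as np
    import tensorflow as tf

    x_train = np.random.rand(1000, 6).astype("float32")

    # the entire model: one 6x6 matrix
    W = tf.Variable(tf.random.normal((6, 6)))
    opt = tf.keras.optimizers.Adam(0.01)

    for step in range(500):
        with tf.GradientTape() as tape:
            # reconstruction loss ||xW - x||^2
            loss = tf.reduce_mean(tf.square(tf.matmul(x_train, W) - x_train))
        opt.apply_gradients([(tape.gradient(loss, W), W)])

    # gradient descent converges to the trivial solution: W ends up close
    # to the identity, so normal and abnormal points score almost alike
    scores = np.mean((x_train @ W.numpy() - x_train) ** 2, axis=1)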
The interesting part comes when you use a different optimizer, namely evolutionary optimisation. I've been interested in evolutionary optimisation for a while, and when I noticed that this problem should lend itself fairly well to it (since it involves only one relatively small matrix), I used the opportunity to try out a new optimizer I had been thinking about (most of the other .py files in this git belong to it). Suddenly, this anomaly detection algorithm becomes good (good in the sense that it is able to differentiate well between normal and abnormal). It is much slower (I simply stop traf.py after 10 minutes of optimisation) and scales terribly to higher dimensions (without thinking much about it, I would guess with n**4).
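For readers unfamiliar with evolutionary optimisation, here is the same matrix fitted with a generic (1+lambda)-style evolution strategy. This is deliberately the simplest textbook variant, not the new optimizer developed in this repository:

    import numpy as np

    x_train = np.random.rand(1000, 6).astype("float32")

    def fitness(W):
        # lower reconstruction error on normal data = better
        return np.mean((x_train @ W - x_train) ** 2)

    rng = np.random.default_rng(0)
    parent = rng.normal(size=(6, 6))
    sigma = 0.1  # mutation strength

    for generation in range(1000):
        # mutate the parent, keep the best offspring if it improves
        offspring = [parent + sigma * rng.normal(size=(6, 6)) for _ in range(20)]
        best = min(offspring, key=fitness)
        if fitness(best) < fitness(parent):
            parent = best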
And I think that is super interesting: This algorithm should not work at all. But for some reason, it is actually quite good.
A task for a thesis would be to try to understand why this works, and then maybe to improve its speed, its scaling and/or its quality.
If you have any questions, please feel free to write an email to Simon.Kluettermann@cs.tu-dortmund.de