diff --git a/config.tex b/config.tex
index 42db43a..1951a7f 100644
--- a/config.tex
+++ b/config.tex
@@ -5,7 +5,7 @@
 %%%%%%%%%%%%%%%%%%
 % Talk's title
-\settalktitle{Anomaly Detection Seminar 2021/2022}
+\settalktitle{Seminar Unsupervised Machine Learning - Anomaly Detection}
 % Author's name
 \settalkauthor{}
diff --git a/slides.blg b/slides.blg
index 5395a5e..bb5cbee 100644
--- a/slides.blg
+++ b/slides.blg
@@ -1,9 +1,9 @@
 [0] Config.pm:312> INFO - This is Biber 2.15
 [0] Config.pm:315> INFO - Logfile is 'slides.blg'
-[53] biber:330> INFO - === Mo Okt 11, 2021, 17:26:11
-[63] Biber.pm:415> INFO - Reading 'slides.bcf'
-[115] Biber.pm:952> INFO - Found 0 citekeys in bib section 0
-[119] Utils.pm:395> WARN - The file 'slides.bcf' does not contain any citations!
-[125] bbl.pm:651> INFO - Writing 'slides.bbl' with encoding 'UTF-8'
-[126] bbl.pm:754> INFO - Output to slides.bbl
-[126] Biber.pm:128> INFO - WARNINGS: 1
+[51] biber:330> INFO - === Mo Okt 11, 2021, 17:38:14
+[61] Biber.pm:415> INFO - Reading 'slides.bcf'
+[111] Biber.pm:952> INFO - Found 0 citekeys in bib section 0
+[116] Utils.pm:395> WARN - The file 'slides.bcf' does not contain any citations!
+[122] bbl.pm:651> INFO - Writing 'slides.bbl' with encoding 'UTF-8'
+[122] bbl.pm:754> INFO - Output to slides.bbl
+[122] Biber.pm:128> INFO - WARNINGS: 1
diff --git a/slides.log b/slides.log
index c18a4e2..15ef296 100644
--- a/slides.log
+++ b/slides.log
@@ -1,4 +1,4 @@
-This is LuaHBTeX, Version 1.13.0 (TeX Live 2021/Arch Linux) (format=lualatex 2021.6.8) 11 OCT 2021 17:26
+This is LuaHBTeX, Version 1.13.0 (TeX Live 2021/Arch Linux) (format=lualatex 2021.6.8) 11 OCT 2021 17:38
 system commands enabled.
 **slides
 (./slides.tex
@@ -2068,12 +2068,12 @@ Package polyglossia Info: Option: English, variant=american.
 Package polyglossia Info: Option: english variant=american (with additional pat
 terns).
 Module polyglossia Info: Language data for usenglishmax
-(polyglossia) patterns hyph-en-us.pat.txt
-(polyglossia) hyphenation hyph-en-us.hyp.txt
-(polyglossia) righthyphenmin 3
-(polyglossia) lefthyphenmin 2
 (polyglossia) loader loadhyph-en-us.tex
-(polyglossia) synonyms on input line 35
+(polyglossia) hyphenation hyph-en-us.hyp.txt
+(polyglossia) synonyms
+(polyglossia) lefthyphenmin 2
+(polyglossia) patterns hyph-en-us.pat.txt
+(polyglossia) righthyphenmin 3 on input line 35
 Module polyglossia Info: Language usenglishmax was not yet loaded; created with
 id 2 on input line 35
 Package polyglossia Info: Option: english variant=american (with additional pat
@@ -2135,12 +2135,12 @@ braces):
 > {german/localnumeral} => {polyglossia@C@localnumeral}
 > {german/Localnumeral} => {polyglossia@C@localnumeral}.
 Module polyglossia Info: Language data for german
-(polyglossia) patterns hyph-de-1901.pat.txt
-(polyglossia) hyphenation
-(polyglossia) righthyphenmin 2
-(polyglossia) lefthyphenmin 2
 (polyglossia) loader loadhyph-de-1901.tex
-(polyglossia) synonyms on input line 10
+(polyglossia) hyphenation
+(polyglossia) synonyms
+(polyglossia) lefthyphenmin 2
+(polyglossia) patterns hyph-de-1901.pat.txt
+(polyglossia) righthyphenmin 2 on input line 10
 Module polyglossia Info: Language german was not yet loaded; created with id 3 o
 n input line 10
 Package polyglossia Info: Option: German, spelling=new.
@@ -3104,12 +3104,12 @@ Package biblatex Info: ... file 'german.lbx' found.
 (/usr/share/texmf-dist/tex/latex/biblatex/lbx/german.lbx
 File: german.lbx 2020/12/31 v3.16 biblatex localization (PK/MW)
 Module polyglossia Info: Language data for ngerman
-(polyglossia) patterns hyph-de-1996.pat.txt
-(polyglossia) hyphenation
-(polyglossia) righthyphenmin 2
-(polyglossia) lefthyphenmin 2
 (polyglossia) loader loadhyph-de-1996.tex
-(polyglossia) synonyms on input line 561
+(polyglossia) hyphenation
+(polyglossia) synonyms
+(polyglossia) lefthyphenmin 2
+(polyglossia) patterns hyph-de-1996.pat.txt
+(polyglossia) righthyphenmin 2 on input line 561
 Module polyglossia Info: Language ngerman was not yet loaded; created with id 5
 on input line 561
 )
@@ -3961,15 +3961,15 @@ Here is how much of LuaTeX's memory you used:
 n, 63 penalty, 5 margin_kern, 361 glyph, 256 attribute, 92 glue_spec, 256 attrib
 ute_list, 4 write, 24 pdf_literal, 92 pdf_colorstack, 1 pdf_setmatrix, 1 pdf_sav
 e, 1 pdf_restore nodes
- avail lists: 1:3,2:387,3:215,4:335,5:215,6:59,7:2079,8:9,9:466,10:24,11:136,1
+ avail lists: 1:4,2:387,3:215,4:335,5:215,6:59,7:2079,8:9,9:466,10:24,11:136,1
 2:1
 81249 multiletter control sequences out of 65536+600000
- 116 fonts using 34613951 bytes
+ 116 fonts using 34614111 bytes
 136i,20n,154p,819b,2327s stack positions out of 5000i,500n,10000p,200000b,80000s
-Output written on slides.pdf (22 pages, 2035288 bytes).
+Output written on slides.pdf (22 pages, 2035304 bytes).
 PDF statistics: 262 PDF objects out of 1000 (max.
8388607)
 172 compressed objects within 2 object streams
diff --git a/slides.pdf b/slides.pdf
index f5ed2d6..e5a5eea 100644
Binary files a/slides.pdf and b/slides.pdf differ
diff --git a/slides.tex b/slides.tex
index f5fbca1..0c4eeab 100644
--- a/slides.tex
+++ b/slides.tex
@@ -39,8 +39,8 @@
 \begin{columns}
 \begin{column}{.475\textwidth}
 \begin{itemize}
- \item Kick-Off
- \item Some Formal Stuff
+ \item Kick-Off Meeting
+ \item Some Formalities
 \item Short Overview of the Topics
 \end{itemize}
 \begin{center}
@@ -53,7 +53,7 @@
 \begin{itemize}
 \item Choose a couple topics
 \begin{itemize}
- \item Since we are only a few, you can make these requests quite complicated if you like (I prefer topic 1, but I would also take 3 or 7, except when I can do it in german, then I would prefer topic 12)
+ \item Since we are only a few, you can make these requests quite complicated (I prefer topic 1, but I would also take 3 or 7, except when I can do it in german, then I would prefer topic 12)
 \end{itemize}
 \item Send your choice to Simon.Kluettermann@cs.tu-dortmund.de (till tomorrow 13.10.2021 23:59)
 \item You will be assigned one in the next days
diff --git a/summary.txt b/summary.txt
index 0d408fe..d5c34eb 100644
--- a/summary.txt
+++ b/summary.txt
@@ -1,63 +1,66 @@
-Contextual Outliers
-This Paper focuses on interpretability of anomaly detection methods. The Method described works by splitting up the set of normal events into groups and tries to relate any abnormal event to its surrounding normal ones. I would say it is more practical and I want to strongly encurage you to implement this algorithm if choosen.
-https://arxiv.org/abs/1711.10589
+1) Anomaly Detection for Monitoring
+This is more a book and less of a Paper. So it should be perfect for you if you have not that much experience. If focusses on Time Series Analyses, namely the Task of detection when a continous datastream becomes anomalous.
This is for examle useful for a machine supervised by sensors that at some point stops working (and thus changes the sensor output)
+https://assets.dynatrace.com/content/dam/en/wp/Anomaly-Detection-for-Monitoring-Ruxit.pdf
-Active AD via Ensembles...
-This Paper tries do to a lot. I suggest that you focus on the active learning part. Alternatively we have also a paper on ensembles so if you both want, you can combine these papers to be worked on by two Students. Active AD extends the task of finding anomalies to the case in which the anomaly status of the training events is not clearly defined. Its focus here lies in minimizing the amount of human work needed to classify a given dataset (given some labels, train a model, find those new events that are unclear, classify those, restart).
-I want to note here, that great work on an easy topic if for us the same as good work on a hard topic.
-https://arxiv.org/pdf/1901.08930
+2) A comprehensive survey of anomaly detection techniques for high dimensional big data
+Anomaly Detection is generally more complicated when you are given higher dimensional data (Curse of dimensionality). This seems a little weird, as usually machine learning improves when you are given more informations. I imagine it as useless features confusing this algorithm. This Paper could be seen as a study of this phenomena.
+https://journalofbigdata.springeropen.com/track/pdf/10.1186/s40537-020-00320-x.pdf
-Interpretable AD for Device Failure
-This is an Application Paper. Its complexity comes mostly from the fact that real world data is messy and the Paper addresses ways to mitigate this.
-https://arxiv.org/pdf/2007.10088
-
-Neural Transformation Learning for Deep Anomaly Detection Beyond Images
-While for Image data, certain pre-Transformations(like Rotations) can clearly improve Machine Learning Tasks like Anomaly Detection, this is much less well defined for Time-Series/Tabular data.
This Paper tries to solve this by defining learnable Transformations.
-https://arxiv.org/pdf/2103.16440
-
-Unsupervised Anomaly Detection Ensembles using Item Response Theory
-Different AD algorithms are usually better at finding different types of anomalies. To get a more general algorithm you can combine multiple ones into one using Ensembles.
-This Paper could be merged together with "Active AD via Ensembles" to be handled by two students.
-https://arxiv.org/pdf/2106.06243
-
-A Comprehensive Survey on Graph Anomaly Detection with Deep Learning
+3) A Comprehensive Survey on Graph Anomaly Detection with Deep Learning
 A lot of datasets that are interesting to AD (For example Email Communications or Trading Data) can be best represented as graphs. This provides unique challeges for AD algorithms. This is a paper that could either be handled by two students or split up into two. Maybe one considers anomalous graphs, while the other one considers anomalous nodes in graphs.
 https://arxiv.org/pdf/2106.07178
-Additive Explanations for Anomalies Detected from Multivariate Temporal Data
+4) LOF: identifying Density-Based Local Outliers
+LOF is a classical algorithm used in many Applications. This is the original Paper introducing it. As this is a fairly old Paper, you will also find a lot of other sources describing LOF.
+https://www.dbs.ifi.lmu.de/Publikationen/Papers/LOF.pdf
+
+5) HiCS: High Contrast Subspaces for Density-Based Outlier Ranking
+As most datapoints are quite high-dimensional, it is often the case that some features are useless and could actually take part in hiding the true abnomalities. This Paper suggests a method to select a subspace that filters out unimportant features.
+This paper was cowritten by Prof. Müller and might be related to a future Masters thesis.
+https://www.ipd.kit.edu/~muellere/publications/ICDE2012.pdf
+
+6) Neural Transformation Learning for Deep Anomaly Detection Beyond Images
+While for Image data, certain pre-Transformations(like Rotations) can clearly improve Machine Learning Tasks like Anomaly Detection, this is much less well defined for Time-Series/Tabular data. This Paper tries to solve this by defining learnable Transformations.
+https://arxiv.org/pdf/2103.16440
+
+7) A Survey on GANs for Anomaly Detection
+GANs are an advanced ML method, normally used to generate really realistic artificial Images (check out https://thispersondoesnotexist.com/ if you have never done so). But these can also be used for anomaly detection. Your task would be to explain how.
+https://arxiv.org/pdf/1906.11632
+
+
+8) Unsupervised Anomaly Detection Ensembles using Item Response Theory
+Different AD algorithms are usually better at finding different types of anomalies. To get a more general algorithm you can combine multiple ones into one using Ensembles.
+This Paper could be merged together with "Active AD via Ensembles" to be handled by two students.
+https://arxiv.org/pdf/2106.06243
+
+
+9) Active AD via Ensembles...
+This Paper tries do to a lot. I suggest that you focus on the active learning part. Alternatively we have also a paper on ensembles so if you both want, you can combine these papers to be worked on by two Students. Active AD extends the task of finding anomalies to the case in which the anomaly status of the training events is not clearly defined. Its focus here lies in minimizing the amount of human work needed to classify a given dataset (given some labels, train a model, find those new events that are unclear, classify those, restart).
+I want to note here, that great work on an easy topic if for us the same as good work on a hard topic.
+https://arxiv.org/pdf/1901.08930
+
+10) Contextual Outliers
+This Paper focuses on interpretability of anomaly detection methods.
The Method described works by splitting up the set of normal events into groups and tries to relate any abnormal event to its surrounding normal ones. I would say it is more practical and I want to strongly encurage you to implement this algorithm if choosen.
+https://arxiv.org/abs/1711.10589
+
+
+11) Additive Explanations for Anomalies Detected from Multivariate Temporal Data
 Explaining why a given event is anomalous can be as important as detecting it, as it helps to create Trust. This Paper suggests a Method that is based on differentiating between features that contribute more and less. It is also a quite short paper, so it is extra important to look for other papers.
 https://dl.acm.org/doi/abs/10.1145/3357384.3358121 requires vpn, contact me if you have problems with this
-Anomaly Detection for Monitoring
-This is more a book and less of a Paper. So it should be perfect for you if you have not that much experience. If focusses on Time Series Analyses, namely the Task of detection when a continous datastream becomes anomalous. This is for examle useful for a machine supervised by sensors that at some point stops working (and thus changes the sensor output)
-https://assets.dynatrace.com/content/dam/en/wp/Anomaly-Detection-for-Monitoring-Ruxit.pdf
+12) Interpretable AD for Device Failure
+This is an Application Paper. Its complexity comes mostly from the fact that real world data is messy and the Paper addresses ways to mitigate this.
+https://arxiv.org/pdf/2007.10088
-Fast Unsupervised Anomaly Detection in Traffic Videos
+13) Fast Unsupervised Anomaly Detection in Traffic Videos
 This is another Application Paper. Its main complexity is the Input data type, as this uses videos (which are very high dimensional and contain temporal correlations). You will see how good preprocessing can make even a basic algorithm viable for complicated problems.
 https://openaccess.thecvf.com/content_CVPRW_2020/papers/w35/Doshi_Fast_Unsupervised_Anomaly_Detection_in_Traffic_Videos_CVPRW_2020_paper.pdf
-HiCS: High Contrast Subspaces for Density-Based Outlier Ranking
-As most datapoints are quite high-dimensional, it is often the case that some features are useless and could actually take part in hiding the true abnomalities. This Paper suggests a method to select a subspace that filters out unimportant features.
-This paper was cowritten by Prof. Müller and might be related to a future Masters thesis.
-https://www.ipd.kit.edu/~muellere/publications/ICDE2012.pdf
-LOF: identifying Density-Based Local Outliers
-LOF is a classical algorithm used in many Applications. This is the original Paper introducing it. As this is a fairly old Paper, you will also find a lot of other sources describing LOF.
-https://www.dbs.ifi.lmu.de/Publikationen/Papers/LOF.pdf
-
-A comprehensive survey of anomaly detection techniques for high dimensional big data
-Anomaly Detection is generally more complicated when you are given higher dimensional data (Curse of dimensionality). This seems a little weird, as usually machine learning improves when you are given more informations. I imagine it as useless features confusing this algorithm. This Paper could be seen as a study of this phenomena.
-https://journalofbigdata.springeropen.com/track/pdf/10.1186/s40537-020-00320-x.pdf
-
-A Survey on GANs for Anomaly Detection
-GANs are an advanced ML method, normally used to generate really realistic artificial Images (check out https://thispersondoesnotexist.com/ if you have never done so). But these can also be used for anomaly detection. Your task would be to explain how.
-https://arxiv.org/pdf/1906.11632
-
-Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding
+14) Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding
 This is another Application Paper, but this time using a more complicated algorithms from recurrent ML. It tries to monitor the evergrowing amount of spacecrafts for anomalous behaviour.
 https://arxiv.org/pdf/1802.04431
-