In recent decades, significant research effort has been put into developing solutions to support automated or semi-automated analysis of log files. A large number of algorithms appeared based on neural networks. This paper introduces a new approach to anomaly detection in log files that does not rely on neural networks. The building blocks of our approach have been well-known in machine learning for a long time. The author proposes to use a weighted Damerau-Levenshtein distance metric to quantify the similarity between log sequences. The author introduces a kNN-based algorithm for semi-supervised log anomaly detection, and an HDBSCAN-based solution for the unsupervised problem. For the latter, he extends the algorithm by incorporating a manual feedback mechanism, enabling human domain experts to modify sequence labels when necessary.
TY - JOURAU - Horvath, GaborAU - Mészáros, AndrásAU - Charaf, KamelAU - Szilágyi, PéterPY - 2026/01/10SP - T1 - Detecting anomalies in log files using the Damerau-Levenshtein distance metricVL - 40DO - 10.1007/s10618-025-01182-8JO - Data Mining and Knowledge DiscoveryER -
For full paper: https://www.researchgate.net/publication/399652190_Detecting_anomalies_in_log_files_using_the_Damerau-Levenshtein_distance_metric





