MTFL: Multi-Timescale Feature Learning for Weakly-supervised Anomaly Detection in Surveillance Videos

Yiling Zhang, Erkut Akdag (Corresponding author), Egor Bondarev, Peter H.N. de With

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Detection of anomalous events is relevant for public safety and requires a combination of fine-grained motion information and contextual events at variable time-scales. To this end, we propose a Multi-Timescale Feature Learning (MTFL) method to enhance the representation of anomaly features. Short, medium, and long temporal tubelets are employed to extract spatio-temporal video features using a Video Swin Transformer. Experimental results demonstrate that MTFL outperforms state-of-the-art (SotA) methods on the UCF-Crime dataset, achieving an anomaly detection performance of 89.78% AUC. Moreover, its performance is complementary to SotA methods, with 95.32% AUC on the ShanghaiTech dataset and 84.57% AP on the XD-Violence dataset. Furthermore, we generate an extended version of the UCF-Crime dataset for development and evaluation on a wider range of anomalies, namely the Video Anomaly Detection Dataset (VADD), which comprises 2,591 videos in 18 classes with extensive coverage of realistic anomalies.
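
As a rough illustration of the multi-timescale idea in the abstract, the sketch below splits a video's frame range into short, medium, and long tubelets. It is not the authors' implementation: the tubelet lengths (8, 32, and 64 frames), the non-overlapping split, and the function names are all assumptions made for this example, and the feature extraction step with a Video Swin Transformer is omitted.

```python
from typing import Dict, List, Tuple

# Hypothetical tubelet lengths in frames; the abstract does not state the actual values.
TUBELET_LENGTHS: Dict[str, int] = {"short": 8, "medium": 32, "long": 64}


def split_into_tubelets(num_frames: int, length: int) -> List[Tuple[int, int]]:
    """Return non-overlapping (start, end) frame ranges, with end exclusive.

    A trailing remainder shorter than `length` is dropped for simplicity.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    return [(s, s + length) for s in range(0, num_frames - length + 1, length)]


def multi_timescale_tubelets(num_frames: int) -> Dict[str, List[Tuple[int, int]]]:
    """Split one video into tubelets at every configured timescale."""
    return {
        name: split_into_tubelets(num_frames, length)
        for name, length in TUBELET_LENGTHS.items()
    }
```

In a full pipeline, each frame range would be passed to a video backbone to obtain one feature vector per tubelet and timescale; how those features are then fused is not described in the abstract and is left out here.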

Original language: English
Title of host publication: Seventeenth International Conference on Machine Vision, ICMV 2024
Editors: Wolfgang Osten
Publisher: SPIE
Number of pages: 8
ISBN (Electronic): 9781510688285
ISBN (Print): 9781510688278
Status: Published - 24 Feb 2025
Event: 17th International Conference on Machine Vision, ICMV 2024 - Edinburgh, United Kingdom
Duration: 10 Oct 2024 to 13 Oct 2024

Publication series

Name: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 13517
ISSN (Print): 0277-786X
ISSN (Electronic): 1996-756X

Conference

Conference: 17th International Conference on Machine Vision, ICMV 2024
Country/Territory: United Kingdom
City: Edinburgh
Period: 10/10/24 to 13/10/24
