TY - GEN
T1 - Semi-automated Generation of Accurate Ground-Truth for 3D Object Detection
AU - Zwemer, M.H.
AU - Scholte, D.
AU - de With, P.H.N.
PY - 2023/10/17
Y1 - 2023/10/17
N2 - Visual algorithms for traffic surveillance systems typically locate and observe traffic movement by representing all traffic with 2D boxes. These 2D bounding boxes around vehicles are insufficient to generate accurate real-world locations. However, 3D annotation datasets are not available for training and evaluation of detection for traffic surveillance. Therefore, a new dataset for training the 3D detector is required. We propose and validate seven different annotation configurations for automated generation of 3D box annotations using only camera calibration, scene information (static vanishing points) and existing 2D annotations. The proposed novel Simple Box method does not require segmentation of vehicles and provides a simpler 3D box construction, which assumes a fixed predefined vehicle width and height. The existing CNN-based KM3D 3D detection model, which directly estimates 3D boxes around vehicles in the camera image, is adopted for traffic surveillance by training the detector on the newly generated dataset. The KM3D detector trained with the Simple Box configuration provides the best 3D object detection results, resulting in 51.9% AP3D on this data. The 3D object detector can estimate an accurate 3D box up to a distance of 125 m from the camera, with a median middle point mean error of only 0.5–1.0 m.
AB - Visual algorithms for traffic surveillance systems typically locate and observe traffic movement by representing all traffic with 2D boxes. These 2D bounding boxes around vehicles are insufficient to generate accurate real-world locations. However, 3D annotation datasets are not available for training and evaluation of detection for traffic surveillance. Therefore, a new dataset for training the 3D detector is required. We propose and validate seven different annotation configurations for automated generation of 3D box annotations using only camera calibration, scene information (static vanishing points) and existing 2D annotations. The proposed novel Simple Box method does not require segmentation of vehicles and provides a simpler 3D box construction, which assumes a fixed predefined vehicle width and height. The existing CNN-based KM3D 3D detection model, which directly estimates 3D boxes around vehicles in the camera image, is adopted for traffic surveillance by training the detector on the newly generated dataset. The KM3D detector trained with the Simple Box configuration provides the best 3D object detection results, resulting in 51.9% AP3D on this data. The 3D object detector can estimate an accurate 3D box up to a distance of 125 m from the camera, with a median middle point mean error of only 0.5–1.0 m.
KW - 3D object detection
KW - Semi-automated annotation
KW - Traffic surveillance application
UR - http://www.scopus.com/inward/record.url?scp=85175958996&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-45725-8_2
DO - 10.1007/978-3-031-45725-8_2
M3 - Conference contribution
AN - SCOPUS:85175958996
SN - 978-3-031-45724-1
T3 - Communications in Computer and Information Science (CCIS)
SP - 21
EP - 50
BT - Computer Vision, Imaging and Computer Graphics Theory and Applications
A2 - de Sousa, A. Augusto
A2 - Debattista, Kurt
A2 - Paljic, Alexis
A2 - Ziat, Mounia
A2 - Hurter, Christophe
A2 - Purchase, Helen
A2 - Farinella, Giovanni Maria
A2 - Radeva, Petia
A2 - Bouatouch, Kadi
PB - Springer
CY - Cham
T2 - 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, VISIGRAPP 2022
Y2 - 6 February 2022 through 8 February 2022
ER -