Abstract
Dynamic Vision Sensors (DVS) offer unique advantages, such as high temporal resolution and low power consumption, making them ideal for low-latency, energy-efficient applications. Current techniques frequently underutilize their capabilities because they depend on conventional frame-based deep neural networks, which sacrifice temporal detail and demand high computational resources. In this work, we propose SpikeVision, a Transformer-inspired Spiking Neural Network (SNN) model with a fully event-based encoding and processing strategy tailored for DVS input streams. SpikeVision integrates attention-inspired mechanisms adapted for spiking computations, enabling efficient spatial feature extraction without relying on matrix multiplications while leveraging stateful neurons for temporal event processing. We demonstrate that SpikeVision achieves state-of-the-art classification accuracy (99.3%) on the DVS128 Gesture benchmark while maintaining low energy consumption in Field-Programmable Gate Array (FPGA) implementations, highlighting its potential for real-time, edge-based vision tasks.
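The abstract's claim of attention-inspired spiking computation without matrix multiplications can be illustrated with a small sketch. The code below is a hypothetical illustration, not the paper's implementation: it assumes a leaky integrate-and-fire (LIF) neuron model for the stateful temporal processing and replaces the query-key multiply of standard attention with an element-wise AND over binary spike tensors. All module and parameter names (`LIFNeuron`, `SpikeGate`, `beta`, `threshold`) are invented for illustration.

```python
# Hypothetical sketch of the mechanism described in the abstract.
# Assumptions (not from the paper): LIF neuron dynamics with soft reset,
# 1x1 convolutions for query/key projections, and element-wise AND
# (minimum of binary tensors) in place of attention's multiply.
import torch
import torch.nn as nn


class LIFNeuron(nn.Module):
    """Stateful leaky integrate-and-fire neuron (assumed model)."""

    def __init__(self, beta=0.9, threshold=1.0):
        super().__init__()
        self.beta = beta            # membrane leak factor per time step
        self.threshold = threshold  # firing threshold

    def forward(self, x):           # x: (T, B, C, H, W) spike/event stream
        mem = torch.zeros_like(x[0])
        spikes = []
        for t in range(x.shape[0]):
            mem = self.beta * mem + x[t]           # leaky integration
            spk = (mem >= self.threshold).float()  # fire on threshold crossing
            mem = mem - spk * self.threshold       # soft reset after firing
            spikes.append(spk)
        return torch.stack(spikes)  # binary spike train, same shape as x


class SpikeGate(nn.Module):
    """Attention-inspired gating: binary query/key spike maps are combined
    with an element-wise AND (minimum), avoiding matrix multiplications
    between activations."""

    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.k = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.lif_q = LIFNeuron()
        self.lif_k = LIFNeuron()

    def forward(self, x):           # x: (T, B, C, H, W) binary spikes
        T, B, C, H, W = x.shape
        flat = x.reshape(T * B, C, H, W)
        q = self.lif_q(self.q(flat).reshape(T, B, C, H, W))
        k = self.lif_k(self.k(flat).reshape(T, B, C, H, W))
        return torch.minimum(q, k)  # AND of {0,1} tensors: a mask, no multiply


# Toy DVS-like input: 16 time steps, batch 2, 8 channels, 32x32 spatial grid.
x = (torch.rand(16, 2, 8, 32, 32) > 0.9).float()
print(SpikeGate(8)(x).shape)        # torch.Size([16, 2, 8, 32, 32])
```

On binary spike inputs, the 1x1 projections reduce to weight accumulation and the element-wise minimum of two {0,1} tensors is a logical AND, so no floating-point multiplication between activations is required; this is one plausible reading of the multiplication-free computation the abstract describes.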
Original language | English
---|---
Publication status | Accepted/In press - 2024
Event | The Asilomar Conference on Signals, Systems, and Computers, Asilomar, Pacific Grove, United States. Duration: 27 Oct 2024 → 30 Oct 2024. https://www.asilomarsscconf.org/
Conference
Conference | The Asilomar Conference on Signals, Systems, and Computers |
---|---
Abbreviated title | ACSSC 2024 |
Country/Territory | United States |
City | Pacific Grove |
Period | 27/10/24 → 30/10/24 |
Internet address | https://www.asilomarsscconf.org/
Funding
This work was supported by the Dutch Research Council (NWO) IMAGINE project under Grant IDs 17911 and KICH1.ST04.22.033.
Funders | Funder number |
---|---|
Dutch Research Council (NWO) | KICH1.ST04.22.033
Keywords
- spiking neural network
- Transformers
- dynamic vision sensor
- Edge AI
- gesture classification
- FPGA