Samenvatting
Mutual Information (MI) is an established measure for the dependence of two variables and is often used as a generalization of correlation measures. Existing methods to estimate MI focus on static data. However, dynamic data is ubiquitous as well, and MI estimates on it are useful for stream mining and advanced monitoring tasks. In dynamic data, small changes (e.g., insertion or deletion of a value) may often invalidate the previous estimate. In this article, we study how to efficiently adjust an existing MI estimate when such a change occurs. As a first step, we focus on the well-known nearest-neighbor based estimators for static data and derive a tight lower bound for their computational complexity, which is unknown so far. We then propose two dynamic data structures that can update existing estimates asymptotically faster than any approach that computes the estimates independently, i.e., from scratch. Next, we infer a lower bound for the computational complexity of such updates, irrespective of the data structure and the algorithm, and present an algorithm that is only a logarithmic factor slower than this bound. In absolute numbers, these solutions offer fast and accurate estimates of MI on dynamic data as well.
| Originele taal-2 | Engels |
|---|---|
| Titel | Advances in Database Technology - EDBT 2018 |
| Subtitel | 21st International Conference on Extending Database Technology, Proceedings |
| Redacteuren | Michael Bohlen, Reinhard Pichler, Norman May, Erhard Rahm, Shan-Hung Wu, Katja Hose |
| Plaats van productie | Konstanz |
| Uitgeverij | OpenProceedings.org |
| Pagina's | 49-60 |
| Aantal pagina's | 12 |
| ISBN van elektronische versie | 978-3-89318-078-3 |
| Status | Gepubliceerd - 2018 |
| Evenement | 21st International Conference on Extending Database Technology, EDBT 2018 - Vienna, Oostenrijk Duur: 26 mrt. 2018 → 29 mrt. 2018 |
Congres
| Congres | 21st International Conference on Extending Database Technology, EDBT 2018 |
|---|---|
| Land/Regio | Oostenrijk |
| Stad | Vienna |
| Periode | 26/03/18 → 29/03/18 |
Financiering
This work was partially supported by the DFG Research Training Group 2153: “Energy Status Data − Informatics Methods for its Collection, Analysis and Exploitation”
Vingerafdruk
Duik in de onderzoeksthema's van 'On complexity and efficiency of mutual information estimation on static and dynamic data'. Samen vormen ze een unieke vingerafdruk.Citeer dit
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver