Universal noiseless compression for noisy data

G.I. Shamir, T.J. Tjalkens, F.M.J. Willems

Research output: Chapter in Book/Report/Conference proceedingConference contributionAcademicpeer-review

Abstract

We study universal compression for discrete data sequences that were corrupted by noise. We show that while, as expected, there exist many cases in which the entropy of these sequences increases from that of the original data, somewhat surprisingly and counter-intuitively, universal coding redundancy of such sequences cannot increase compared to the original data. We derive conditions that guarantee that this redundancy does not decrease asymptotically (in first order) from the original sequence redundancy in the stationary memoryless case. We then provide bounds on the redundancy for coding finite length (large) noisy blocks generated by stationary memoryless sources and corrupted by some speci??c memoryless channels. Finally, we propose a sequential probability estimation method that can be used to compress binary data corrupted by some noisy channel. While there is much benefit in using this method in compressing short blocks of noise corrupted data, the new method is more general and allows sequential compression of binary sequences for which the probability of a bit is known to be limited within any given interval (not necessarily between 0 and 1). Additionally, this method has many different applications, including, prediction, sequential channel estimation, and others.
Original languageEnglish
Title of host publicationProceedings of the 2nd Information Theory and Applications workshop (ITA 2007) 29 January - 2 February 2007, San Diego, California, USA
Pages1/10-
Publication statusPublished - 2007
Eventconference; ITA 2007, San Diego, California, USA; 2007-01-29; 2007-02-02 -
Duration: 29 Jan 20072 Feb 2007

Conference

Conferenceconference; ITA 2007, San Diego, California, USA; 2007-01-29; 2007-02-02
Period29/01/072/02/07
OtherITA 2007, San Diego, California, USA

Fingerprint

Dive into the research topics of 'Universal noiseless compression for noisy data'. Together they form a unique fingerprint.

Cite this