Formal modeling and verification of distributed failure detectors

M. Atif

Research output: ThesisPhd Thesis 1 (Research TU/e / Graduation TU/e)

170 Downloads (Pure)

Abstract

Model checking is a systematic way of checking the absence of errors in a distributed system, i.e., assessing the functional requirements in a distributed system. However, there are certain challenges in this field, e.g., developing true abstract models and on their basis generalizing/guranteeing results, limited capacity of model checking tools and computational resources, identification of all requirements and their accurate specifications, etc. To understand and face such challenges, it is necessary to apply the prominent model checking techniques to different distributed systems designed for different communication models. In this thesis this challenge is accepted and resultantly encountered issues are discussed/addressed. The results reported are sufficient for advocating the need for applying model checking techniques as debugging. Therefore, we report bugs and the propose fixes but for ambiguous algorithms, we reconstruct them. We model check both fixed and reconstructed algorithms. We assess the following protocols: • Accelerated heartbeat protocols, • Consensus protocols in asynchronous distributed systems, • Group membership protocols and • Efficient algorithms to implement failure detectors in partially synchronous systems. We found that the accelerated heartbeat protocols proposed in [M.G. Gouda and T.M. McGuire, Accelerated Heartbeat Protocols, Proc. Of ICDCS’98], violated some natural and essential properties. We proved the results by giving counterexamples and developed the techniques to address the time-triggered events in mCRL2 and investigated the correct time bounds for all the protocols. Regarding consensus problem, we proved the correctness of the proposed algorithms where the failure detectors are unreliable (i.e., failure detectors may make mistakes). These algorithms are proposed in [T. Deepak Chandra and S. Toueg, Unreliable Failure Detectors for Reliable Distributed Systems, J. ACM, 1996 ]. For the group membership protocols proposed in [Y. Amir, D. Dolev, S. Kramer and D. Malki, Membership Algorithms for Multicast Communication Groups, Springer-Verlag, 1992], we found that the original specifications and the text explaining the protocols can be interpreted in different ways and even some natural interpretations contradict each other. Our formalization with respect to different interpretations showed the violation of claimed properties. So to resolve the ambiguities, we reconstructed the protocols and model-checked them. For analyzing the algorithms proposed in [M. Larrea, S. Arevalo and A.Fernndez, Efficient Algorithms to Implement Unreliable Failure Detectors in Partially Synchronous Systems, Proc. of DISC’99 ], we applied symmetry reduction techniques. We found that every algorithm encounters a deadlock if there is a bounded (yet arbitrarily large) buffer in the communication channel between a pair of nodes. We propose fixes for deadlock avoidance and model check the proposed algorithm in UPPAAL, FDR2 and MCRL2. We also present a comparison of these three tools for model checking one of the given four protocols.
Original languageEnglish
QualificationDoctor of Philosophy
Awarding Institution
  • Department of Mathematics and Computer Science
Supervisors/Advisors
  • Groote, Jan Friso, Promotor
  • van den Brand, Mark G.J., Promotor
  • Mousavi, Mohammad, Copromotor
Award date28 Sep 2011
Place of PublicationEindhoven
Publisher
Print ISBNs978-90-386-2620-8
DOIs
Publication statusPublished - 2011

Fingerprint Dive into the research topics of 'Formal modeling and verification of distributed failure detectors'. Together they form a unique fingerprint.

  • Cite this