TY - GEN
T1 - Quantifying the resiliency of fail-operational real-time networked control systems
AU - Gujarati, Arpan
AU - Nasri, Mitra
AU - Brandenburg, Björn B.
PY - 2018/6/1
Y1 - 2018/6/1
AB - In time-sensitive, safety-critical systems that must be fail-operational, active replication is commonly used to mitigate transient faults that arise due to electromagnetic interference (EMI). However, designing an effective and well-performing active replication scheme is challenging since replication conflicts with the size, weight, power, and cost constraints of embedded applications. To enable a systematic and rigorous exploration of the resulting tradeoffs, we present an analysis to quantify the resiliency of fail-operational networked control systems against EMI-induced memory corruption, host crashes, and retransmission delays. Since control systems are typically robust to a few failed iterations, e.g., one missed actuation does not crash an inverted pendulum, traditional solutions based on hard real-time assumptions are often too pessimistic. Our analysis reduces this pessimism by modeling a control system's inherent robustness as an (m, k)-firm specification. A case study with an active suspension workload indicates that the analytical bounds closely predict the failure rate estimates obtained through simulation, thereby enabling a meaningful design-space exploration, and also demonstrates the utility of the analysis in identifying non-trivial and non-obvious reliability tradeoffs.
KW - Networked control systems
KW - Probabilistic analysis
KW - Reliability analysis
UR - http://www.scopus.com/inward/record.url?scp=85049298712&partnerID=8YFLogxK
U2 - 10.4230/LIPIcs.ECRTS.2018.16
DO - 10.4230/LIPIcs.ECRTS.2018.16
M3 - Conference contribution
AN - SCOPUS:85049298712
T3 - Leibniz International Proceedings in Informatics (LIPIcs)
BT - 30th Euromicro Conference on Real-Time Systems (ECRTS 2018)
A2 - Altmeyer, Sebastian
PB - Schloss Dagstuhl - Leibniz-Zentrum für Informatik
T2 - 30th Euromicro Conference on Real-Time Systems, ECRTS 2018
Y2 - 3 June 2018 through 6 June 2018
ER -