Samenvatting
Respecting stringent timing constraints is a fundamental requirement for many safety-critical cyber-physical systems, such as autonomous vehicles, avionics, robotics, and industrial automation systems. These systems often operate under real-time constraints, where missing a deadline can lead to system failure or even catastrophic consequences. At the heart of ensuring timing predictability and meeting timing constraints in real-time systems lies the scheduling problem: constructing a schedule or scheduling policy for a set of tasks, each characterized by its execution time, dependencies, and deadlines, on a system with a given set of computing resources, such that all timing constraints are satisfied. To assess whether such a schedule can meet all timing requirements, scheduling analysis focuses on determining the worst-case response time of each task, since a schedule is feasible only if every task’s worst-case response time does not exceed its deadline. This feasibility condition is the core timing requirement in real‑time systems, and in safety-critical domains, it becomes even more crucial, as meeting timing requirements must be guaranteed not only on average but under all scenarios, including the worst case. As hardware platforms and software applications become increasingly complex, system designers face significant obstacles in ensuring predictable timing behavior. The use of multicore and heterogeneous architectures, while offering higher performance, introduces challenges such as task migration, designing a proper scheduling policy, and accurately analyzing the system’s timing behavior to determine the worst-case response time of each task and ensure compliance with timing requirements. At the same time, meeting system-level requirements, such as end-to-end latencies and ensuring reliability in the presence of faults, requires analysis techniques that can account for complex interactions between tasks, resources, and fault-tolerance mechanisms. These challenges are further complicated by the substantial timing variability due to modern processor features, including caches, pipelines, and out-of-order execution, which make the worst-case execution time highly sensitive to both hardware and software conditions. Moreover, factors like data dependencies and interference from co-running tasks worsen this variability, widening the gap between average-case and worst-case execution times. Combined with release jitter, these uncertainties make response-time analysis increasingly challenging. As a result, computing tight and safe bounds on the worst-case response time remains one of the most critical and difficult problems in the verification and design of real-time systems. In practice, response-time analysis must meet two essential criteria: accuracy (providing safe and tight bounds), and scalability (remaining computationally feasible as the size and complexity of the system increases). Accurate analysis enables the use of sufficiently tight response-time bounds, helping designers avoid the inefficiencies of overly pessimistic estimates that waste computational resources. In contrast, using unsafe or overly optimistic bounds can lead to missed deadlines and potentially catastrophic system failures. Scalability, on the other hand, is an important property for integrating the analysis into iterative design space exploration and validation processes. Although response-time analysis is typically performed offline and does not affect the system’s runtime performance, it must still produce results within a reasonable time to support development cycles and facilitate the certification of safety-critical systems. This is particularly challenging because many instances of the underlying worst-case response time problem are NP-hard, so practical, scalable analysis typically relies on efficient heuristics or approximations rather than exact solutions. This thesis tackles the above challenges by developing new scheduling techniques and response-time analysis methods that improve the process of designing and analyzing real-time systems. Our solutions reduce the computational cost of response-time analysis, particularly in complex systems with large task sets, dependencies, and multicore architectures, where exhaustive or naïve response-time analyses quickly become infeasible. By improving the efficiency of the analysis itself, our techniques enable faster design iterations and broader design-space exploration, without sacrificing accuracy. Our proposed techniques are applicable to practical problems in domains such as automotive systems, avionics, and industrial automation, where strict timing and reliability requirements must be met. First, this thesis introduces ReTA, the first unified framework for modeling and analyzing user-defined scheduling problems. Unlike existing approaches that are typically restricted to fixed scheduling policies, ReTA allows users to formally specify arbitrary scheduling strategies through a user-friendly domain-specific language. This enables, for the first time, automated and accurate response-time analysis for a wide class of custom scheduling policies that were previously out of reach for existing response time analyses. Even for scheduling policies with well-known response time analysis methods, ReTA provides a more accurate worst-case response time. It can identify up to 50 times more schedulable task sets compared to the existing analysis. Moreover, it delivers results up to two orders of magnitude faster than the existing exact analysis, which relies on exhaustive enumeration techniques that do not scale for large-scale systems and quickly become impractical. Second, we introduce the first response-time analysis tailored for fault-tolerant multicore systems under global scheduling. In global scheduling, tasks share a ready queue and can be dispatched for execution on different cores at runtime, allowing better processor utilization than partitioned scheduling, where tasks are statically mapped to specific core. This flexibility, however, complicates response-time analysis since it leads to increased variability in scheduling. Our approach systematically derives safe upper bounds on the worst-case response time of non-preemptive tasks with re-execution fault-tolerant mechanism. The experimental results demonstrate that our analysis can successfully handle a wide range of systems and scale to systems with up to 16 cores, 20 tasks, and 3 faults. This work represents a key advancement toward making response-time analysis practical for fault-tolerant multicore real-time systems. Third, we develop a reachability-based response-time analysis for preemptive periodic tasks with release jitter under global scheduling. Supporting preemption in response-time analysis is particularly challenging, as it requires accounting for an unknown number of preemptions and their timing, which significantly increases the complexity of the state space exploration. Our work addresses this challenge by introducing the concept of time partitions, enabling an efficient and accurate extension of the schedule-abstraction graph technique to preemptive task sets. The resulting analysis identifies up to 12 times more schedulable task sets than existing methods, while keeping analysis times practical. This advancement contributes to the development of more effective tools for the design and verification of complex multicore real-time systems. Fourth, we develop a state-space-reduction technique for reachability-based response-time analysis introduced in the third step, which enhances scalability by reducing the number of explored states without compromising accuracy. This technique reduces runtime by a factor of six while improving schedulability results by 16%, enabling the analysis of larger systems with better efficiency. Fifth, we introduce the first data-age analysis technique to account for timing uncertainties, such as release jitter and execution-time variation. Data age is an important form of end-to-end latency constraints and shows the freshness of data as it flows through processing stages, which is essential for ensuring that the input data is still accurately representing the current state of the environment. In this work, we model timing uncertainties using timing intervals and develop an analysis that leverages these intervals to explore possible dependencies between jobs. To maintain computational efficiency, we incorporate pruning rules that significantly reduce the complexity of the analysis, making the method well-suited for practical problem sizes in industry. Our approach achieves, on average, a 36% tighter bound on data age compared to the state of the art. This contribution is especially relevant for latency-sensitive applications in the automotive domain, such as advanced driver-assistance system, where precise data age estimation is critical for ensuring timely and safe decision-making under variable and unpredictable operating conditions. Finally, we propose a latency-aware fault-tolerant scheduling technique for multi-rate task chains (data-dependent tasks executing at different activation rates), addressing the critical need for both reliability and meeting stringent end-to-end latency requirements in safety-critical applications such as autonomous vehicles. By leveraging job redundancies and efficiently managing fault recovery operations, this method reduces data age by 21% and achieves up to six times higher schedulability than traditional fault-tolerance mechanisms, ensuring reliability without compromising end-to-end latency. Together, these contributions address important gaps in real-time system design and analysis, providing practical, scalable, and accurate solutions to key challenges and laying the groundwork for the development of more reliable, efficient, and versatile real-time systems.
| Originele taal-2 | Engels |
|---|---|
| Kwalificatie | Doctor in de Filosofie |
| Toekennende instantie |
|
| Begeleider(s)/adviseur |
|
| Datum van toekenning | 25 mrt. 2026 |
| Plaats van publicatie | Eindhoven |
| Uitgever | |
| Gedrukte ISBN's | 978-90-386-6650-1 |
| Status | Gepubliceerd - 25 mrt. 2026 |
Bibliografische nota
Proefschrift.Vingerafdruk
Duik in de onderzoeksthema's van 'Scheduling and Response-Time Analysis for Complex Real-Time Systems'. Samen vormen ze een unieke vingerafdruk.Citeer dit
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver