Accurate Dead Code Detection in Embedded C Code by Arithmetic Constraint Solving
Participating Staff and Cooperation Partners

Lehrstuhl für Rechnerarchitektur:
  Bernd Becker, Prof. Dr. (contact)
  Felix Neubauer, M.Sc. (research associate)
  Karsten Scheibler, Dipl.-Inf. (research associate)

Universität Oldenburg:
  Martin Fränzle, Prof. Dr. (contact)
  Ahmed Mahdi, M.Sc. (research associate)
  Thomas Scheidsteger, Dr. (research associate)

BTC Embedded Systems (Oldenburg):
  Tom Bienmüller, Dr. (industrial partner)
  Tino Teige, Dr. (industrial partner)

Sick AG (Waldkirch):
  Detlef Fehrer, Dr. (industrial partner)

Project Description
The transfer project aims at efficient and precise detection of dead code in reactive, control-
oriented, and floating-point dominated C programs, in particular those derived from Simulink-
plus-Stateflow models of embedded control applications via automatic C-code generation or those
hand-coded according to guidelines imposed in the embedded domain. Dead, i.e. dynamically
unreachable, code segments occur regularly in generated code due to case distinctions, exception
handling, etc., which are necessary for a semantically adequate implementation of, e.g., a Simulink
block in an unrestricted application context, yet become redundant in a specific context. Likewise, dead
code is regularly introduced into hand-written code either intentionally by proven engineering
methods like defensive programming or inadvertently during the course of code reuse. Finding
such dead code is more than a mere convenience: (1) it facilitates code optimization, enhancing
space (less code volume) and time (less branching, enhanced cacheability and code prefetch)
efficiency; (2) it adds to the readability of the resulting code corpus, thus making it less prone to
the addition of further errors as well as positively influencing the time needed for development and
maintenance, and last but not least, (3) it permits a more rigorous interpretation of coverage-based
criteria in code validation and certification, as present in testing regimes recommended in various
standards for embedded system development, as well as meeting the demand for absence of dead
code imposed by some standards for high-integrity software.
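
The following minimal C sketch (all function and constant names are invented for illustration and are not drawn from partner code) shows the hand-coded variant of the phenomenon: a reused helper carries a defensive range check that is perfectly reasonable in general, yet dynamically unreachable from the specific caller, i.e. dead code in that context.

    #include <stdio.h>

    /* Reused, library-style helper with a defensive range check. */
    static double lookup_gain(int mode)
    {
        static const double gains[4] = { 1.0, 0.5, 0.25, 0.125 };

        /* Defensive branch: sensible in an unrestricted context, but
         * dynamically unreachable, i.e. dead, when every caller passes
         * only modes 0 or 1, as controller_step below does. */
        if (mode < 0 || mode > 3) {
            return 0.0;
        }
        return gains[mode];
    }

    static double controller_step(double input, int fast_mode)
    {
        int mode = fast_mode ? 1 : 0;   /* only 0 or 1 ever reaches lookup_gain */
        return lookup_gain(mode) * input;
    }

    int main(void)
    {
        printf("%f\n", controller_step(2.0, 1));   /* prints 1.000000 */
        return 0;
    }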
Dead code detection for code generated from model-based design tools like Simulink is complicated
by the fact that their code generators tend to encode the major part of the model’s control
flow into the data part of the generated C code, for example by representing the hierarchical state
of Stateflow charts as vectors of Booleans, thereby blurring the distinction between control and data. On top
of this comes an intrinsic dependency of control flow on deep histories of complex floating-point
computations, e.g., having to answer questions like “can this saturating integrator in the middle
of a large Simulink model ever receive a sequence of inputs causing it to wind up?”. Deciding this
question, which may qualify code associated with wind-up control as redundant, would in principle
require tracing the integrator’s possible inputs over an unbounded temporal horizon, where in each
time instant, this input may depend on a complex computation involving numerous floating-point
and discrete data entities. Constructions blurring the distinction between control and data are
also commonplace in hand-written embedded control software, where a central control loop often
defers obligations to a set of dedicated handlers, some of which may be generic in offering state-
and data-dependent control flow. With respect to a clear architectural structure of the program,
this must in many cases be considered good programming style and therefore cannot simply
be eliminated by more restrictive coding guidelines (though there are attempts to do so, e.g., in
MISRA).
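
A hedged sketch of the kind of auto-generated code described above (the type, field, and constant names, as well as the simple Init/Run/Fault chart, are invented for illustration and do not stem from any particular code generator): the chart's hierarchical state is flattened into Boolean flags, and the reachability of the wind-up branch, and with it of the fault handling it activates, depends on the unbounded history of floating-point inputs.

    #include <stdbool.h>

    /* The chart's hierarchical control state is flattened into Booleans,
     * so the model's control flow lives in the data part of the C code. */
    typedef struct {
        bool   in_Init;
        bool   in_Run;
        bool   in_Fault;
        double integ;            /* discrete-time integrator state */
    } model_state_t;

    #define INTEG_MAX 100.0      /* saturation limit of the integrator */

    static void model_step(model_state_t *s, double in)
    {
        if (s->in_Init) {
            s->integ   = 0.0;
            s->in_Init = false;
            s->in_Run  = true;
        } else if (s->in_Run) {
            s->integ += 0.01 * in;          /* integrator update */
            if (s->integ > INTEG_MAX) {     /* wind-up guard: dead iff no
                                               admissible input history can
                                               drive integ beyond INTEG_MAX */
                s->integ    = INTEG_MAX;
                s->in_Run   = false;
                s->in_Fault = true;
            }
        } else if (s->in_Fault) {
            s->integ = 0.0;                 /* fault handling, possibly dead */
        }
    }

    int main(void)
    {
        model_state_t s = { true, false, false, 0.0 };
        for (int i = 0; i < 1000; ++i) {
            model_step(&s, 1.0);            /* one bounded input sequence */
        }
        return s.in_Fault ? 1 : 0;          /* 0 here; deciding deadness needs
                                               all unbounded input histories */
    }

Testing or simulation, as in the main function above, only exercises individual input sequences; proving the branch dead requires reasoning over every admissible input history.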
The main goal of the transfer project is to develop a method which (1) is able to cope with
the blurred distinction between control and data as well as with the dominant impact of (frequently
non-linear) floating-point computations on the reachability of code, and (2) at the same time scales to
industrial-size programs. Various techniques for dealing with individual issues have been proposed
in the recent past. Abstract interpretation (AI) provides scalability to very large programs, yet
falls short on exactness when data operations can fragment and reshuffle the data part of the
state space quite arbitrarily. Counterexample-guided abstraction refinement (CEGAR) has been
shown to provide adaptive abstraction of the state space, often uncovering surprisingly
concise abstractions of the interplay of control and data in a property-driven manner, but is as
of now constrained to analyzing either programs exploiting only confined fragments of arithmetic
(mostly linear) or full-fledged hybrid systems featuring just tiny control skeletons. Recent
breakthroughs in non-linear arithmetic constraint solving based on conflict-driven clause learning
over arithmetic theories (CDCL(T)), to a significant extent originating in AVACS project H1/2,
can solve extremely large arithmetic constraint systems with complex Boolean structure and linear,
polynomial, and transcendental arithmetic, yet fail to provide summaries for loops in the control
flow and do not yet scale far enough to cover the full branching structure of a complex program in just
one sweep.
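
To give a flavour of how such a constraint solver is put to work, the following hypothetical C fragment (the function name, bounds, and guard are invented for illustration; this is not the project's actual encoding) shows a branch whose deadness reduces to the unsatisfiability of an arithmetic constraint system involving a transcendental function.

    #include <math.h>
    #include <stdio.h>

    /* Assume the calling context guarantees 0.0 <= x && x <= 1.0. */
    static double normalize(double x)
    {
        double y = x * x + 0.5;             /* non-linear floating-point update */

        if (sin(y) > 1.2) {                 /* candidate dead branch */
            /* A CDCL(T)-style solver is asked whether
             *   0 <= x <= 1  and  y = x*x + 0.5  and  sin(y) > 1.2
             * is satisfiable; since sin never exceeds 1, the constraint
             * system is unsatisfiable and the branch is provably dead. */
            return 0.0;
        }
        return y;
    }

    int main(void)
    {
        printf("%f\n", normalize(0.5));     /* the guarded branch is never taken */
        return 0;
    }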
While adequate building blocks for the desired analysis technology are thus available, none of
them in isolation would be able to solve the problem at hand, calling for a combination of these
technologies and for tuning on real application code, which motivates the proposed transfer project.
It is expected that this combination will provide dead code detection of unprecedented accuracy
in floating-point dominated reactive C code, whether originating from code generation in model-based
development environments or from hand-coding in the embedded control domain.