Table of contents

 1 Introduction
 2 Theoretical foundations
 3 Implementation
 4 Discussion and results
 5 Conclusion

Picon: Control Flow Integrity on LLVM IR

Thomas Coudray, Arnaud Fontaine¹, and Pierre Chifflier¹
¹ Agence Nationale de la Sécurité des Systèmes d'Information
Abstract. Control flow integrity has been a well-explored field of software security for more than a decade. However, most of the proposed approaches are stalled in a proof-of-concept state when their implementation is publicly available, or have been designed with minimal performance overhead as their main objective, sacrificing security. Currently, none of the proposed approaches can be used to fully protect real-world programs compiled with the most common compilers (e.g. GCC, Clang/LLVM). In this paper we describe a control flow integrity enforcement mechanism for LLVM IR, called Picon, whose main objective is security. Our approach is based on compile-time code instrumentation that makes the program communicate with its external execution monitor. The program is terminated by the monitor as soon as a control flow integrity violation is detected. Our approach is implemented as an LLVM plugin and works on LLVM's Intermediate Representation.

1 Introduction

Traditional program exploitation by an attacker often involves overflowing a buffer to write to an arbitrary address in memory, and then redirecting execution to the code newly written at this address. This has led to the introduction of protections to prevent these problems. Stack canaries [10] add random values between frames in the call stack to detect stack overflows, and equivalent protections exist to prevent heap overflows. Data Execution Prevention (DEP) [2] adds a separation between data, which can be read or written, and code, which should be executable and never written. It can be enforced by the hardware, e.g. the NX (No-eXecute) bit on x86 or XN (eXecute-Never) on ARM.

The generalization of these protections, now widely used in modern operating systems, has changed the typical form of exploits to work around them. In addition, the separation between code and data in W⊕X is not so clear in real programs: some data is interpreted not directly as code, but as an indirect way of executing code. This is the case of the return address, which specifies the address of the instruction to be executed when returning from a function. This address, if modified, can be used by an attacker to execute some existing code in the application, leading to code-reuse attacks.

The initial attack vector has to be provided by the attacker, usually in a data buffer. Even when these data are not directly executable, the attacker can specify a sequence of return addresses, each of them pointing to some instructions followed by a return. By choosing the effect of these instructions, the executed code can be controlled by the attacker. This technique is known as Return-Oriented Programming (ROP) [21] and has been proved Turing-complete using a set of gadgets from the standard GNU C library. Other code-reuse techniques exist, such as Jump-Oriented Programming [6], which removes the reliance on the stack and the return instruction. String Oriented Programming (SOP) [20] and Signal Return Oriented Programming (SROP) [8] are based on the same principle.

To protect the execution of a program against code-reuse attacks, a common technique is to randomize the memory layout of a program on every execution, using Address Space Layout Randomization (ASLR). This way, memory addresses change at each program execution and are harder to predict. While powerful, this technique is not sufficient, mostly because the addresses of some parts of the program may leak or be guessed, but also because of remaining problems like format-string vulnerabilities [17]. Because randomization is done for entire sections at once, the discovery of one address often means defeating the entire randomization. In some other cases, brute-force techniques or blind ROP [4] can be used to detect the small parts of code preceding a return, also known as ROP gadgets.

To protect the execution flow of a program, other techniques must be used in addition to existing protections, such as control flow integrity.

1.1 Control flow integrity

Program exploitation often subverts the intended data flow in a vulnerable program. This in turn makes it possible to hijack the control flow in order to control the program behaviour. Control Flow Integrity (CFI) provides a protection against control flow hijacking attacks. The CFI property was formalized by Abadi et al. in 2005 [1]. In that paper, CFI is used to constrain the program execution to follow only paths existing in the Control Flow Graph (CFG), obtained using static analysis of the binary. The binary is then rewritten to add checkpoints before branch instructions, along with tags to check that the target is in accordance with the predicted/expected control flow.

One of the difficult parts of CFI is the extraction of the control flow graph. It can be recovered statically from the source code [28] or from the binary file (using structural analysis with tools like the Hex-Rays decompiler, for example), or dynamically (e.g. execution profiling [27]). Some of these reconstructions may be incomplete, when a branch cannot be predicted precisely, or when the analyzed form does not match the code actually produced by the compiler (modified by some optimizations).

A first way to classify control flow integrity methods is based on the type of input program: some of them work on the original sources [13, 15, 26], while others work on the assembly or binary form of the program [17, 29]. Working at the source code level allows one to extract a precise control flow graph, and more information about the program. On the other hand, working at the binary level allows one to protect programs without requiring the source code, but is architecture-dependent and inherently less precise.

The second classification method is based on the type of component responsible for enforcing the security policy, called an Execution Monitor. The enforcement of security policies by monitoring executions was formalized by Schneider in 2000 [24], with the definition of safety properties, later extended by Basin et al. [3]. An execution monitor can be internal or external to the protected program. It can also have a different granularity depending on the enforcement policy mechanism used. An execution monitor must be tamper-proof to ensure control flow integrity, and have the ability to terminate the process in case the policy is not respected.

An inlined execution monitor integrates the verification code into the program code during compilation or via binary rewriting. An inline monitor shares the same address space as the monitored program, and the verification code has to be protected in the binary itself.

An execution monitor can also be externalized, i.e. moved out of the program. In this case, the monitor observes the program execution and checks that the expected control flow of the program is respected. On one hand, this approach is more secure than an inlined monitor, as the monitor has its own logic, independent from the monitored program. It can be implemented in different locations: as a user process, a kernel module, a hypervisor, etc. The more hardened the monitor is, the harder it will be for an attacker to compromise the binary protected using CFI. On the other hand, since the monitor must be able to observe the execution flow of the program quite precisely, and to kill it in case an unexpected execution flow is detected, an external monitor is a very sensitive process. Some additional care must thus be taken to ensure the monitor itself cannot be subverted and/or compromised. Depending on how the execution flow of the monitored program is concretely observed by the monitor, the runtime cost of the external approach is by definition higher than that of the integrated/inlined one.

The main advantage of using an external monitor is to globally improve the security offered by a CFI protection: an attacker needs to control both the program and its monitor to successfully exploit a vulnerability.

1.2 Limitations of current approaches

Most implementations of control flow integrity make trade-offs between security and performance. This implies removing some of the checkpoints, for example by not instrumenting function calls and only taking care of return instructions.

However, removing protections leaves the protected program vulnerable to attacks, because some gadgets are still available to an attacker. As shown in [12] and [11], most control flow integrity implementations have been tested and demonstrated to be quite permissive, still allowing an attacker to build ROP attacks.

Other techniques, like forward-edge CFI [26] or control flow guard [5, 13] as implemented recently in Windows, only protect indirect forward calls, and thus only provide partial protection.

Another technique, called control flow locking, was introduced in [7] to mitigate these attacks. The method relies on a lock that is set before any control flow change; the next instrumented point then verifies some predefined condition before unlocking it and permitting the execution to continue. However, this work was limited to statically linked binaries.

Unaligned ROP gadgets

Many of the ROP gadgets found in binaries consist of unaligned instructions that have not been produced by the compiler, but that happen to be interpretable as valid instructions by the processor. This mainly concerns the x86 architecture, due to the number and the distribution of possible opcodes, and the fact that instructions are not required to be aligned on this architecture. This limitation is also shared by most implementations of CFI on these architectures.

It is possible, however, to analyze the instructions of the binary file and to automatically apply instruction translation or randomization, and insertion of neutral instructions, to ensure that the unaligned gadgets are replaced by other sequences, as described in [18].

However, even when the unaligned gadgets are removed from a binary, some gadgets still remain because of the control flow, especially the function returns. As a complement to the elimination of unaligned gadgets, the program still needs to be protected so that an attacker will not be able to use either kind of gadget.

1.3 Control flow integrity on LLVM IR

In this paper, we present a practical approach to control flow integrity, with some similarities to control flow locking, but with different properties.

Our work has several key properties :

This paper is organized as follows. In Section 2, we describe a formalization of our model of control flow integrity on LLVM IR, based on a pushdown automaton. In Section 3, we describe an implementation of the proposed model, called Picon, as a plugin for the LLVM compiler, with a separate process as the execution monitor. In Section 4, we analyze the security impact on binaries protected by Picon, and discuss the results.

2 Theoretical foundations

The goal of this section is to formalize our model of control flow integrity on LLVM IR, and to show that it can enforce strong protection against the most common attack patterns. One of the main advantages of this model is to permit easier debugging and a posteriori analysis of a program execution whose control flow integrity has been compromised. The last part of this section shows how to tackle a forthcoming implementation issue of the model, with identical security guarantees and reasonable debugging and a posteriori analysis features.

2.1 Control flow integrity model

Roughly, the proposed control flow integrity model is based on an execution monitor [24] whose goal is to enforce a security policy described by a pushdown automaton (i.e. a state machine equipped with a stack). In order to formally define the control flow integrity policy enforced, some notations and definitions need to be introduced.

First of all, the class of programs considered needs to be defined. Basically, a program is a set of functions, one of which is the single entrypoint of the program, i.e. the main function.

Definition 1 (Program). A program P is defined as a pair (F, f_P) where F is a finite non-empty set of functions defined and/or called in P (directly or transitively), and f_P ∈ F is the function corresponding to the main and unique entrypoint of the program.

In LLVM IR, a function is a set of basic blocks, each of which is a sequence of LLVM IR instructions. Only a very small subset of these instructions has to be considered to enforce the control flow integrity property. Indeed, it is useless to model the behaviour of instructions that cannot alter the control flow, such as arithmetic/logic operations. Only instructions that influence the control flow are thus modeled. To keep the model as simple as possible, LLVM IR instructions with equivalent semantics with respect to the control flow are grouped, and an abstract instruction is introduced to represent each group.

Definition 2 (Basic block). A basic block in LLVM IR is a finite non-empty sequence of instructions. Among the set of all possible LLVM IR instructions, only the four following abstract instructions are considered: call, ret, unreachable and branch.

A call instruction interrupts the execution of its enclosing basic block to execute some function, and so on recursively. It corresponds to the LLVM instructions call and invoke.

A ret instruction can only occur as the last instruction of a basic block. Its execution definitively stops the execution of the current function (and thus of its enclosing basic block), and the execution restarts at the instruction immediately following the last executed call instruction. The corresponding LLVM instructions are ret and resume.

A branch instruction targets a finite non-empty set of basic blocks. Its execution selects one of the targeted blocks as the next one to be executed, based on some condition. If only one basic block is targeted, the branch is unconditional to the single targeted block. It corresponds to the LLVM instructions br, indirectbr and switch.

An unreachable instruction is a special instruction indicating an unreachable statement; its execution is equivalent to doing nothing. It corresponds to the LLVM instruction unreachable.

The last instruction of a basic block is always a ret, a branch or an unreachable instruction.
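As a purely illustrative sketch (not taken from the Picon sources), this grouping can be expressed directly on the LLVM C++ API, which provides one class per concrete instruction:

#include "llvm/IR/Instructions.h"

// Abstract instructions of Definition 2; the enum and function names are ours.
enum class AbstractInst { Call, Ret, Branch, Unreachable, Other };

static AbstractInst classify(const llvm::Instruction &I) {
  using namespace llvm;
  if (isa<CallInst>(I) || isa<InvokeInst>(I))
    return AbstractInst::Call;              // call, invoke
  if (isa<ReturnInst>(I) || isa<ResumeInst>(I))
    return AbstractInst::Ret;               // ret, resume
  if (isa<BranchInst>(I) || isa<IndirectBrInst>(I) || isa<SwitchInst>(I))
    return AbstractInst::Branch;            // br, indirectbr, switch
  if (isa<UnreachableInst>(I))
    return AbstractInst::Unreachable;       // unreachable
  return AbstractInst::Other;               // arithmetic, memory, ...: ignored by the model
}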

The organization of basic blocks within a function is constrained in LLVM IR. Each function has a single entry block which has no predecessors (i.e. it is not possible to jump to the entry block from within the function). Basic blocks with no successors are terminated either by a ret instruction, when the function returns to its caller, or by an unreachable instruction, which is a special LLVM IR instruction indicating an unreachable statement.

Definition 3 (Control flow graph (CFG) of a function). The control flow graph of a function f is a directed graph G_f = (V_f, E_f) where V_f is a finite non-empty set of vertices consisting of the basic blocks of f, and E_f ⊆ V_f × V_f is a finite set of edges. An edge (b1, b2) exists in E_f iff the basic block b1 ends with a branch instruction targeting a set of basic blocks containing b2.

The main and unique entry of a function f is a basic block of V_f, denoted entry(f), such that ∀(b, b') ∈ E_f, b' ≠ entry(f).

The finite set exit(f) ⊆ V_f denotes the set of basic blocks with no successors that are terminated by a ret instruction. The finite set unreachable(f) ⊆ V_f denotes the set of basic blocks with no successors that are terminated by an unreachable instruction. There exists no basic block b ∈ V_f ∖ (exit(f) ∪ unreachable(f)) containing a ret or unreachable instruction, or without successors.

Interactions between functions defined and called/used in the program must be defined. These interactions have to be completely and statically known at the model level in order to enforce a control flow integrity property. This assumption may seem strong, but any missing interaction will be detected as a control flow integrity violation, so that integrity is not compromised.

Definition 4 (Call graph of a program). Let P = (F, f_P) be a program. The call graph of P is a directed graph G_P = (V_P, E_P, B_P) where V_P = F is its set of vertices, E_P ⊆ V_P × V_P is its set of edges, and B_P is its edge labeling function.

An edge (f, g) ∈ E_P denotes that function f calls function g. This edge is labeled with B_P(f, g) ⊆ V_f, the subset of basic blocks of f in which g is called, where G_f = (V_f, E_f) is the CFG of f.

Some sequences of instructions have to be inserted in the program, either to report an upcoming execution flow change to an external entity able to kill the program, or to enforce the control flow policy directly within the program, which then terminates itself if compromised. At the model level, both alternatives are equivalent. Each kind of sequence to be inserted is called a hook, and the insertion of these hooks is called instrumentation. The instrumentation step is crucial for the later definition of control flow integrity policy enforcement, since a control flow integrity violation will be detectable only where a hook is inserted.

Roughly, a hook is inserted before any bifurcation of the execution flow occurs, that is just before call instructions with a hook called cfiCall, before ret and unreachable instructions with a hook called cfiExit, and before branch instructions with a hook called cfiBeforeJump. While it is necessary to perform a check before the execution flow is modified, it is also important to insert a check after the execution flow is modified, that is at the entry of a function with a hook called cfiEnter, at the entry of a basic block with a hook called cfiAfterJump, and at the return of function calls with a hook called cfiReturned.

Definition 5 (Program instrumentation). An instrumented program is a program P = (F, f_P), denoted P̄, verifying all the following properties for each function f ∈ F and each basic block b ∈ V_f appearing in its CFG G_f = (V_f, E_f):

An instrumented program contains sufficient hooks to protect its control flow integrity.

Definition 6 (Control flow integrity policy). Let P̄ = (F, f_P) be an instrumented program and G_P = (V_P, E_P, B_P) its call graph. The control flow integrity policy for program P̄ is described by a deterministic pushdown automaton M = (Q, Σ, Γ, δ, q_0, Z_0, F) where Q = {q_e, q_c, q_r, q_b} is its set of states, Σ = {cfiCall f, cfiEnter f, cfiExit f, cfiReturned f | f ∈ V_P} ∪ {cfiBeforeJump (f, b), cfiAfterJump (f, b) | f ∈ V_P, b ∈ V_f} is its set of inputs, Γ = {⟨f, σ⟩ | f ∈ V_P, σ a non-empty sequence of basic blocks of V_f} is its set of stack symbols, q_0 = q_c is its initial state, Z_0 = ⟨f_P, entry(f_P)⟩ is its initial stack symbol, F = {q_e} is its set of accepting states, and δ : Q × Σ × Γ → ℘(Q × Γ*) is its transition function, such that

\[
\delta(q_e,\ \mathit{cfiCall}\ f',\ \langle f, b\sigma\rangle) =
\begin{cases}
\{(q_c,\ \langle f', \mathit{entry}(f')\rangle\,\langle f, b\sigma\rangle) \mid (f, f') \in E_P\} & \text{if } b \in B_P(f, f')\\
\emptyset & \text{otherwise}
\end{cases}
\tag{1}
\]

\[
\delta(q_r,\ \mathit{cfiReturned}\ f',\ \langle f, b\sigma\rangle) =
\begin{cases}
\{(q_e,\ \langle f, b\sigma\rangle) \mid (f, f') \in E_P\} & \text{if } b \in B_P(f, f')\\
\emptyset & \text{otherwise}
\end{cases}
\tag{2}
\]

\[
\delta(q_e,\ \mathit{cfiExit}\ f,\ \langle f, b\sigma\rangle) =
\begin{cases}
\{(q_r,\ \varepsilon)\} & \text{if } b \in \mathit{exit}(f)\\
\emptyset & \text{otherwise}
\end{cases}
\tag{3}
\]

\[
\delta(q_c,\ \mathit{cfiEnter}\ f,\ \langle f, b\sigma\rangle) =
\begin{cases}
\{(q_e,\ \langle f, b\sigma\rangle)\} & \text{if } b = \mathit{entry}(f)\\
\emptyset & \text{otherwise}
\end{cases}
\tag{4}
\]

\[
\delta(q_e,\ \mathit{cfiBeforeJump}\ (f, b),\ \langle f, b\sigma\rangle) = \{(q_b,\ \langle f, b\sigma\rangle)\}
\tag{5}
\]

\[
\delta(q_b,\ \mathit{cfiAfterJump}\ (f, b'),\ \langle f, b\sigma\rangle) = \{(q_e,\ \langle f, b'b\sigma\rangle) \mid (b, b') \in E_f\}
\tag{6}
\]

where G_f = (V_f, E_f) is the CFG of f and ε denotes the empty sequence.

An instantaneous description of M is a triple (q, ω, γ) ∈ Q × Σ* × Γ* describing a situation of M, where q is a state of the automaton, ω is a sequence of inputs to treat, and γ is a stack.
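As an illustration (a toy example of ours, not taken from the original paper), consider an instrumented program whose entrypoint main has a single basic block b_0 = entry(main) containing one call to a function g, with entry(g) ∈ exit(g). On the input prefix ω = cfiEnter main · cfiCall g · cfiEnter g · cfiExit g · cfiReturned g, the automaton M goes through the following instantaneous descriptions (the remaining inputs are elided):

\[
\begin{array}{lcll}
(q_c,\ \omega,\ \langle \mathit{main}, b_0\rangle)
 & \vdash & (q_e,\ \ldots,\ \langle \mathit{main}, b_0\rangle) & \text{by (4)}\\
 & \vdash & (q_c,\ \ldots,\ \langle g, \mathit{entry}(g)\rangle\,\langle \mathit{main}, b_0\rangle) & \text{by (1)}\\
 & \vdash & (q_e,\ \ldots,\ \langle g, \mathit{entry}(g)\rangle\,\langle \mathit{main}, b_0\rangle) & \text{by (4)}\\
 & \vdash & (q_r,\ \ldots,\ \langle \mathit{main}, b_0\rangle) & \text{by (3)}\\
 & \vdash & (q_e,\ \ldots,\ \langle \mathit{main}, b_0\rangle) & \text{by (2)}
\end{array}
\]

Any deviation from this sequence, for example a cfiEnter for a function other than g right after cfiCall g, has no image by δ and is therefore detected as a violation.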

When no transition exists in the automaton for a given input, according to its current state and stack, it indicates that the control flow integrity of the program has been compromised. As a consequence, the compromised program must be immediately terminated.

Proposition 1 (Detection of compromised CFI). If the control flow integrity of an instrumented program P̄ is compromised while it is protected by the control flow integrity policy given in Definition 6, then P̄ is terminated at the first hook encountered in the resulting execution flow, or by the end of P̄ itself.

Proof. Let P̄ = (F, f_P) be an instrumented program and M = (Q, Σ, Γ, δ, q_0, Z_0, F) the automaton describing the control flow integrity policy enforced on P̄. Let (q, ω, γ) be the instantaneous description of M just after the exploited instruction i in basic block b of f is executed.

By definition, instruction i can only be a call, a ret, a branch or an unreachable instruction, as only those instructions can influence the control flow.

If i is a call, then it is immediately preceded by a cfiCall f' hook. So, according to the definition of the transition function δ, the only valid values for q and the top of γ are q_c and ⟨f', entry(f')⟩, respectively. From this instantaneous description, the only valid next input is cfiEnter f', which can be emitted only when f' is called, by definition.

If i is a branch, then it is immediately preceded by a cfiBeforeJump (f, b) hook. So, according to the definition of the transition function δ, the only valid values for q and the top of γ are q_b and ⟨f, bσ⟩, respectively. From this instantaneous description, the only valid next input is cfiAfterJump (f, b') with (b, b') ∈ E_f, which can be emitted only by jumping to an expected basic block, by definition.

If i is a ret, then it is immediately preceded by a cfiExit f hook. So, according to the definition of the transition function δ, and because b ∈ exit(f) by definition, the only valid values for q and the top of γ are q_r and ⟨f', b'σ⟩, respectively, where (f', f) ∈ E_P and b' ∈ B_P(f', f). From this instantaneous description, the only valid next input is cfiReturned f, which can be emitted only after returning from a call to f occurring in basic block b' of f', by definition.

If i is an unreachable, then it is immediately preceded by a cfiExit f hook. Since b ∉ exit(f), there exists no definition of this transition in δ. So this instantaneous description does not exist, as the unreachable instruction cannot have been executed.

When a violation of the control flow integrity policy occurs, the stack of the monitor contains the call stack trace, i.e. the function calls and the basic block trace of each called function. This information is crucial for debugging purposes, but also for a posteriori analysis of the compromised program execution. However, defined as is, the sequence of basic blocks explored within a function only grows, and will grow very quickly in the presence of cycles in control flow graphs.

2.2 Partial tracing of intra-procedural executions

Keeping basic block sequences in the automaton stack is useless for enforcing the control flow integrity policy, as only the topmost (i.e. currently executing) basic block of the stack is used in practice. The main benefit of keeping the complete trace of executed basic blocks is to provide very precise information for a posteriori analysis of a control flow integrity violation. If only the last basic block were kept, it would be very hard to understand how a violation occurred. As a compromise, we propose in this section to keep incomplete sequences of basic blocks, in a way that guarantees a bounded size for any basic block sequence without losing too much information for a posteriori analysis of a compromised execution.

In order to build only finite sequences of basic blocks in the transition corresponding to the input cfiAfterJump of the control flow integrity policy, we rely on the domination relationship. Computing this relation makes it possible to discover the set of basic blocks that will systematically be explored/executed from a given basic block in order to reach the exit basic block of a function. Consequently, the sequence of explored basic blocks will be extended only if the basic block prepended to the current sequence cannot be executed anymore before the end of the function.

Definition 7 (Post-dominator). Let G_f = (V_f, E_f) be the control flow graph of a function f with a single entry basic block denoted entry(f) and a single exit basic block denoted exit(f).

A basic block b1 ∈ V_f post-dominates a basic block b2 ∈ V_f such that b1 ≠ b2, noted b1 ∈ pd(b2), iff b1 is involved in every path from b2 to exit(f) in the CFG.

Knowing that a basic block b1 post-dominates a basic block b2 is sufficient to decide whether b1 can be prepended to the current sequence starting with b2 when b1 is executed. However, a more efficient test than inclusion in a set can be defined if the immediate post-dominator relationship is used. A basic block b1 is the immediate post-dominator of a basic block b2 if it is the first basic block that will systematically be explored/executed to reach the end of the function when b2 is executed.

Definition 8 (Immediate post-dominator). Let G_f = (V_f, E_f) be the control flow graph of a function f.

A node b1 ∈ V_f is the immediate post-dominator of a node b2 ∈ V_f, noted b1 = ipd(b2), iff b1 ∈ pd(b2) and ∄b ∈ V_f such that b1 ∈ pd(b) ∧ b ∈ pd(b2).

Given these definitions, the transition rule associated to the cfiAfterJump hook in Definition 6 is modified to push/prepend the basic block to be executed only if it is the immediate post-dominator of the last pushed/prepended one. In order to maintain Proposition 1, proved in the previous section, the top of the stack (i.e. the first basic block of the sequence) must always be the currently executing one. So, when the immediate domination condition is not verified, the top of the stack is updated to always contain the currently executing basic block.

\[
\delta(q_b,\ \mathit{cfiAfterJump}\ (f, b'),\ \langle f, b\sigma\rangle) =
\begin{cases}
\{(q_e,\ \langle f, b'b\sigma\rangle) \mid (b, b') \in E_f\} & \text{if } b' = \mathit{ipd}(b)\\
\{(q_e,\ \langle f, b'\sigma\rangle) \mid (b, b') \in E_f\} & \text{otherwise}
\end{cases}
\]

The size of the sequence of basic blocks explored is now bounded by the number of basic blocks in the CFG of the currently executing function.
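On the monitor side, this bounded tracing boils down to a simple update of the per-function trace on every cfiAfterJump event. The following sketch is our own illustration (the names and containers are hypothetical); it assumes that E_f and the ipd relation have been computed at compile time and shipped with the transition tables:

#include <map>
#include <vector>

using BlockId = unsigned;

// Returns false on a control flow integrity violation, true otherwise.
bool onAfterJump(std::vector<BlockId> &trace, BlockId next,
                 const std::multimap<BlockId, BlockId> &edges,   // E_f
                 const std::map<BlockId, BlockId> &ipd) {        // immediate post-dominators
  BlockId cur = trace.back();                 // currently executing basic block
  bool allowed = false;                       // check that (cur, next) is an edge of the CFG
  auto range = edges.equal_range(cur);
  for (auto it = range.first; it != range.second; ++it)
    if (it->second == next) { allowed = true; break; }
  if (!allowed)
    return false;                             // unexpected basic block transition
  auto it = ipd.find(cur);
  if (it != ipd.end() && it->second == next)
    trace.push_back(next);                    // next = ipd(cur): extend the trace
  else
    trace.back() = next;                      // otherwise the top only tracks the executing block
  return true;
}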

3 Implementation

To experiment with our proposed CFI protection on LLVM IR, an implementation called Picon has been developed, based on version 3.5 of the LLVM compiler framework [16]. This implementation supports any program compiled by the Clang frontend, which is the LLVM native C/C++/Objective-C compiler.

3.1 Overview

Picon is implemented as a two-step process, following the definitions given in the previous section:

  1. during compilation, a plugin instruments the code;
  2. at runtime, an external execution monitor implements the state automaton to enforce the control flow integrity policy of the instrumented program.

Unlike others [15, 19], we have chosen not to fork the LLVM compiler, but rather to create a dynamically loaded module for the opt tool to implement the compilation step. Currently, compilation of an input source file (C or C++) by the Clang frontend produces a file in the LLVM Intermediate Representation (IR), which is the common code representation used throughout all target-independent phases of the LLVM compilation process. We have chosen to instrument the LLVM IR because of the following advantages :

While this choice may restrict possible actions to the APIs exported by LLVM, it also greatly reduces the dependency on LLVM internal functions, and makes maintenance easier (especially keeping the plugin up to date, LLVM being a very active project). The compilation workflow from a C file to a final binary, including the Picon plugin step, is depicted in Figure 1.


Figure 1: Compilation of a single source file with the Picon plugin in gray.


The main goal of the plugin is to instrument the code to introduce communication hooks with an external execution monitor. However, it is also in charge of producing several files where essential data is stored: identifiers generated for functions and basic blocks, to handle separate compilation units and dynamically linked libraries, and transition tables, which contain all control-flow-related data that will be passed to the external execution monitor. In fact, when an instrumented program is executed, it requires the execution monitor to be running with the corresponding transition tables loaded.

It is important to note that the instrumentation and the creation of the transition tables are done in a completely static and automatic manner.
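For reference, a minimal skeleton of such an opt-loadable pass, using the legacy pass registration mechanism of the LLVM 3.5 era, could look as follows (the class and pass names are placeholders, not Picon's actual code):

#include "llvm/Pass.h"
#include "llvm/IR/Module.h"

using namespace llvm;

namespace {
struct PiconPass : public ModulePass {
  static char ID;
  PiconPass() : ModulePass(ID) {}

  bool runOnModule(Module &M) override {
    // 1. assign idFun/idBB, 2. emit the transition tables, 3. inject the hooks.
    return true;                 // the module has been modified
  }
};
}

char PiconPass::ID = 0;
static RegisterPass<PiconPass> X("picon-cfi", "Picon CFI instrumentation (sketch)");

Such a pass is then loaded and run with opt (e.g. opt -load ./LLVMPicon.so -picon-cfi); the shared object name and the pass flag above are ours, not the plugin's real ones.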

3.2 Instrumentation of the LLVM IR

The Picon plugin has two levels of granularity. One can decide to protect only the inter-procedural control flow (i.e. function calls), or both the inter- and intra-procedural (i.e. basic block transitions) control flow.

The instrumentation is a two-step process. First, unique identifiers are computed and assigned to each function, and to each basic block if intra-procedural protection is activated. Then, the instrumentation code is injected to communicate with the execution monitor, relying on the previously computed identifiers to name functions and basic blocks.

Attribution of identifiers

A unique identifier is assigned to each function and, when appropriate, to each basic block; these identifiers are later denoted idFun and idBB, respectively.

Assigning unique identifiers to each function may appear trivial. However, in the case of binaries created from multiple source files, some difficulties arise because function identifiers must be identical for the same functions across different compiler executions. To solve this problem, the plugin creates and maintains across compilations a file where each function already encountered is mapped to its unique identifier. When a call to a function not yet defined is encountered, the plugin assigns it a new unique identifier according to those already used and updates the file. Algorithm 1 depicts this straightforward process. An analogous process is applied to compute and assign a unique identifier to each basic block, per function.


1:  procedure GetFunctionIdentifier
2:    for f in all functions do
3:      if HasAlreadyBeenIdentified(f) then
4:        idFun ← GetIdentifier(f)
5:      else
6:        idFun ← GetUniqRandomID(f)
7:      end if
8:    end for
9:  end procedure

Algorithm 1: Function Identifier Attribution at Compile Time

Once a unique identifier is assigned to each function and each basic block, the plugin creates the transition tables according to the desired level of granularity for the control flow integrity protection, i.e. with or without intra-procedural control flow. The inter-procedural transition table consists exactly of the set of edges appearing in the call graph along with the edge labeling function (Definition 4), so when the plugin has to build this transition table, it iterates over all functions, and all basic blocks of these functions, of the compilation unit to build its call graph. Building the intra-procedural transition table is a completely analogous process, but applied to each function, for which it has to build its control flow graph (Definition 3).
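For the inter-procedural case, this iteration can be sketched as follows with the LLVM C++ API (our own illustration; in particular, the identifier lookup here is a simplified per-run stand-in for the persistent identifiers of Algorithm 1):

#include "llvm/IR/Module.h"
#include "llvm/IR/Instructions.h"
#include <map>
#include <set>
#include <utility>

// Simplified stand-in for the identifier lookups of Algorithm 1 (per-run only).
static unsigned idOf(const void *key) {
  static std::map<const void *, unsigned> ids;
  auto it = ids.find(key);
  if (it != ids.end()) return it->second;
  unsigned id = ids.size();
  return ids[key] = id;
}

// Inter-procedural transition table: for each call edge (f, g), the basic
// blocks of f in which g is called, i.e. the labeling B_P of Definition 4.
using InterTable = std::map<std::pair<unsigned, unsigned>, std::set<unsigned>>;

InterTable buildInterTable(llvm::Module &M) {
  InterTable table;
  for (llvm::Function &F : M)
    for (llvm::BasicBlock &BB : F)
      for (llvm::Instruction &I : BB) {
        llvm::Function *callee = nullptr;
        if (auto *CI = llvm::dyn_cast<llvm::CallInst>(&I))
          callee = CI->getCalledFunction();      // nullptr for indirect calls
        else if (auto *II = llvm::dyn_cast<llvm::InvokeInst>(&I))
          callee = II->getCalledFunction();
        if (callee)
          table[{idOf(&F), idOf(callee)}].insert(idOf(&BB));
      }
  return table;
}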

Code instrumentation

Code instrumentation must be done according to Definition 5 in the model section. That is, instrumentation consists in inserting some hooks, i.e. predefined sequences of instructions, at strategic positions in the code, to report execution flow bifurcations to the execution monitor, as shown in Definition 6.

There are several ways to implement these hooks: a hook can be a call to a custom function, a specific syscall, a jump to some inlined basic block, etc. It is important to note that a hook is a sensitive piece of code that must be written carefully so as not to introduce vulnerable code such as gadgets. In the proposed implementation, we have chosen to implement each hook as a custom function call. Figure 2 gives an example of a non-instrumented foo function, and Figure 3 shows the same foo function with the injected instrumentation code (both inter- and intra-procedural hooks).


Figure 2: Example CFG of a foo function.



Figure 3: CFG of the instrumented foo function of Figure 2.


According to the level of granularity set in the plugin, only a subset of the six available hooks may be inserted, according to Definition 5. For the mandatory inter-procedural protection, the four following hooks are systematically inserted: cfiCall, cfiReturned, cfiEnter and cfiExit. If intra-procedural protection is activated, then the two following hooks are also inserted: cfiAfterJump and cfiBeforeJump.
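As an indication of what the injection looks like for one of these hooks, the following sketch inserts a call to a hypothetical external hook void picon_cfi_call(i32 idFun) before every direct call site (simplified: invoke, indirect calls and the intra-procedural hooks are not handled here, and the hook name and signature are ours):

#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Module.h"
#include <map>

using namespace llvm;

// Simplified per-run stand-in for the idFun lookup of Algorithm 1.
static unsigned getIdFun(const Function *F) {
  static std::map<const Function *, unsigned> ids;
  auto it = ids.find(F);
  if (it != ids.end()) return it->second;
  unsigned id = ids.size();
  return ids[F] = id;
}

void insertCfiCallHooks(Module &M) {
  LLVMContext &Ctx = M.getContext();
  FunctionType *HookTy =
      FunctionType::get(Type::getVoidTy(Ctx), {Type::getInt32Ty(Ctx)}, false);
  Constant *Hook = M.getOrInsertFunction("picon_cfi_call", HookTy);

  for (Function &F : M)
    for (BasicBlock &BB : F)
      for (Instruction &I : BB)
        if (auto *CI = dyn_cast<CallInst>(&I))
          if (Function *Callee = CI->getCalledFunction()) {
            IRBuilder<> B(CI);                    // insertion point: just before the call
            B.CreateCall(Hook, {B.getInt32(getIdFun(Callee))});
          }
}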

3.3 Resolution of externals

The compilation of a source file is a local process: the LLVM compiler only has information on the file being compiled. This causes problems when handling calls to external functions. In particular, it is not possible at that point to distinguish functions that will be defined in another object file linked into the same executable from functions that are provided by external shared libraries. In the following, the term module designates a single binary compilation target, for example an executable file or a shared library.

A module identifier, later denoted idMod, is assigned to the complete binary target being compiled (all object files that are part of the same binary share the same module identifier). The module identifier is generated at compile time; it has to be unique and deterministic. For example, it can be derived from a cryptographic hash of the binary.
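One possible way to compute such an identifier (an assumption on our side, not necessarily what Picon does) is to hash the produced file with a cryptographic hash, here SHA-256 from OpenSSL, and keep a fixed-size prefix of the digest:

#include <openssl/sha.h>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Derive a 64-bit idMod from the SHA-256 digest of the binary at 'path'.
uint64_t moduleIdentifier(const std::string &path) {
  std::ifstream in(path, std::ios::binary);
  std::vector<unsigned char> data((std::istreambuf_iterator<char>(in)),
                                  std::istreambuf_iterator<char>());
  unsigned char digest[SHA256_DIGEST_LENGTH];
  SHA256(data.data(), data.size(), digest);
  uint64_t id = 0;
  for (int i = 0; i < 8; ++i)                // keep the first 8 bytes of the digest
    id = (id << 8) | digest[i];
  return id;
}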

When analyzing a C file, it is not possible to know whether a function, e.g. printf, will be defined in the same binary or in a shared library before the link step. A function defined in a shared library can also be shadowed by a function with the same name in the current binary.

We have decided not to try to identify the modules of functions during the compilation process, because this is difficult, if at all feasible. Instead, function identifiers are automatically assigned. These identifiers are relative and unique to the current module. This method allows the compilation process to remain simple, but results in a new problem: the identifier of a function f will not be the same in module m1 and in module m2.

The compilation process has to be modified to add new steps, in order to identify the symbols and the different modules, and to add information to the files created for the monitor. The modified compilation workflow is depicted in Figure 4.


Figure 4: Modified compilation process, with externals.


After the compilation process, there is one identifier file per resulting binary (a standalone executable or a shared library, for example), containing function identifiers. We have to bind the caller module identifier to the callee module identifier. This process is done in two steps: symbol identification and symbol binding (described in the next section). The symbol identification step resolves dynamically linked functions and their identifiers. For this, a Python script analyzes the compiled binary using objdump, and finds all symbols related to external functions. Each external symbol is then searched for recursively in each shared library dependency of the binary, to identify in which library it is defined, and thus to find the associated module and transition files. The transition file of the binary is then updated to link each symbol to the identifier of the found module.

If a function is defined in several libraries, the binary instrumented with Picon will only be allowed to call the one that was matched during symbol identification. This provides a protection against library replacement, or symbol override by one of the libraries.

The identification of symbols is described in Algorithm 2. To be correct, this algorithm must follow the resolution of symbols as done by the dynamic loader, otherwise the function that will effectively be called at runtime will not be authorized.


1:  procedure ResolveExternalSymbols
2:    libs ← GetAllLibrariesRecursively(binary)
3:    for sym in all symbols of binary do
4:      if IsExternalSymbol(sym) then
5:        lib ← GetLibraryContaining(sym, libs)
6:        ModuleId ← GetFileIdentifier(lib)
7:        UpdateModuleIdentifierForFunction(binary, sym, ModuleId)
8:      end if
9:    end for
10:  end procedure

Algorithm 2: Module Identifier Merging after the linker pass

For example, the printf function is first marked as an external symbol in a.out.cfi. Using ldd and objdump recursively, the symbol is found in /lib/x86_64-linux-gnu/libc.so.6. The corresponding identifier file is libc.so.6.cfi, which has been created during the compilation of libc.so.6 with Picon enabled. Using this file, we check that an identifier exists in order to verify the transitions, but, at this point, the exact identifier of the function is not important. Finally, we update the binary's identifier file and add the information that the printf function is associated with the module identifier of libc.so.6.

3.4 Execution monitor

The execution monitor is externalized from the instrumented binary; it can be implemented in different places, for example in a user process or in the kernel, as described in Section 1.1. In Picon, the execution monitor is implemented as an external process, which forks and uses the child to run the instrumented program, and uses pipes to communicate. To ease the burden of storing the transition files and running the monitor before each instrumented program, the following enhancements have been added :

When the execution monitor starts, it looks in the ELF section headers of the instrumented binary for a Picon description file, describing the needed information for that binary. The Picon description file contains the unique module identifier idMod of the binary, its dependencies, and the transition table. The monitor must then recursively load all transition files and dependencies. If they are embedded, it is important to ensure that neither the Picon description files nor the transition tables have been modified. A simple solution is to use an asymmetric signature to sign the file headers, so that the monitor is able to verify the integrity of the headers.

The transition table contains the allowed transitions inside the binary, as described in Section 3.2. Dependencies indicate the list of transition tables required for external libraries. Algorithm 3 describes how the monitor loads the transition files, and marks identifiers for the same function in different modules as equivalent.


1:  procedure MergeTransitionsFile
2:    for m in all modules do
3:      m_id ← GetModuleIdentifier(m)
4:      for f in all functions(m) do
5:        f_id ← GetFunctionIdentifier(f)
6:        AddEquivalence(f, (m_id, f_id))
7:      end for
8:    end for
9:  end procedure

Algorithm 3: Loading and unifying the transition files in the monitor

Each time a function f is used (or defined), the pair (m_i, f_i) is added to the equivalence class of f. After loading all the transition files, if the function f is used in different modules, its equivalence class contains a list of tuples [(m1, f1), (m2, f2), …].

When a function f in a module m1 is about to call a function g in a module m2, the Picon plugin has inserted a cfiCall with the identifiers (m1, g1), that is, the identifier of g as seen in module m1. To verify this function call into m2, the monitor verifies during the cfiEnter of g that the value (m2, g2), as seen in module m2, is equivalent to (m1, g1).
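The equivalence classes built by Algorithm 3 and this check can be sketched as follows (our own illustration; the types and names are not those of the Picon monitor):

#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>

using ModuleId = uint64_t;                       // idMod
using FunId = unsigned;                          // idFun, relative to one module
using QualifiedId = std::pair<ModuleId, FunId>;

// One equivalence class of (module, identifier) pairs per function symbol.
static std::map<std::string, std::set<QualifiedId>> equivalences;

void addEquivalence(const std::string &symbol, QualifiedId id) {
  equivalences[symbol].insert(id);               // AddEquivalence of Algorithm 3
}

// cfiEnter check: the identifier announced by the caller's cfiCall (m1, g1)
// and the one seen at the entered function (m2, g2) must name the same symbol.
bool sameFunction(QualifiedId announced, QualifiedId entered) {
  for (const auto &entry : equivalences)
    if (entry.second.count(announced) && entry.second.count(entered))
      return true;
  return false;                                  // identification mismatch
}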

Each time the monitored process hits a Picon hook, it notifies the execution monitor with current information about the context, as defined in Definition 6 of the model section: the current module and function identifiers, and the return address for cfiEnter. The monitor updates the state of the process being instrumented. Two types of unauthorized behaviour can be detected: state mismatch and identification mismatch.

After a cfiCall instrumentation, if the next instrumentation is a cfiBeforeJump, the execution monitor triggers a state mismatch, because a cfiEnter is expected after a cfiCall. The list of all possible automaton states is described in Figure 5.


Figure 5: Execution monitor's state machine.


An identification mismatch can happen, for example, when a cfiCall registers a function identifier to be called, and a function identifier different from the CFG-allowed transitions (defined at compile time in the transition file) is provided during the cfiEnter. Identification can also mismatch on a basic block transition, when cfiBeforeJump provides a given idBB and cfiAfterJump provides one different from the one expected from the CFG.

Identification mismatch and state mismatch both trigger an alert from the execution monitor, which is handled depending on the security policy used. The best action is to kill the instrumented process, but the execution monitor can also log the unexpected behaviour with precise information about the transition, for debugging purposes.
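Putting the pieces together, the monitor's reaction to the inter-procedural hooks can be sketched by the following state machine (a condensed illustration of ours: the intra-procedural hooks, the pipe protocol and the module equivalence checks are omitted, and the names are not Picon's):

#include <cstdio>
#include <cstdlib>
#include <set>
#include <utility>
#include <vector>

class Monitor {
  enum State { Exec, Calling, Returning };       // q_e, q_c, q_r of Definition 6
  State state = Calling;                         // waiting for cfiEnter(main)
  std::vector<unsigned> stack;                   // idFun call stack
  std::set<std::pair<unsigned, unsigned>> callEdges;   // allowed (caller, callee) pairs

  void violation(const char *hook) {             // state or identification mismatch
    std::fprintf(stderr, "CFI violation at %s\n", hook);
    std::abort();                                // the real monitor kills the monitored process
  }

public:
  Monitor(unsigned mainId, std::set<std::pair<unsigned, unsigned>> edges)
      : stack{mainId}, callEdges(std::move(edges)) {}

  void cfiEnter(unsigned f) {
    if (state != Calling || stack.back() != f) violation("cfiEnter");
    state = Exec;
  }
  void cfiCall(unsigned callee) {
    if (state != Exec || !callEdges.count({stack.back(), callee})) violation("cfiCall");
    stack.push_back(callee);
    state = Calling;
  }
  void cfiExit(unsigned f) {
    if (state != Exec || stack.back() != f) violation("cfiExit");
    stack.pop_back();
    state = Returning;
  }
  void cfiReturned(unsigned callee) {
    if (state != Returning || !callEdges.count({stack.back(), callee})) violation("cfiReturned");
    state = Exec;
  }
};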

4 Discussion and results

4.1 Security evaluation

The evaluation of the security of the protected program is done by comparing the number of gadgets in the original binary, and in the protected one. While it is not possible to prevent code-reuse attacks globally, our objective is to reduce the number of available ROP gadgets as much as possible, and to verify that tools cannot reconstruct a shellcode.

Return-to-libc attacks

Return-to-libc is the perfect candidate to bypass the well-known NX protection. With the Picon protection applied to all functions dynamically linked to an instrumented binary, it is possible to prevent this type of attack by denying calls to forbidden libc functions like system or execve beyond their expected uses.

Return-oriented programming attacks

Picon can successfully instrument all non-dynamic call instructions, which results in a significant decrease of the usable ROP gadgets. To successfully bypass our model, attackers have to find ROP gadgets that are not protected by Picon. However, as seen before, CFI instrumentation is pervasive in a protected binary, and to build a reliable ROP payload, attackers have to build their entire payload while taking care not to fall into an instrumented portion of code, which would result in an execution monitor alert.

Picon, by instrumenting all return sequences in compiled programs, avoids all these potential ROP gadgets. Our current implementation does, however, keep the linked C runtime unchanged, which has 6 such gadgets in the glibc version of crt1.o used :

Another source of gadgets is the Picon runtime itself, which embeds a few functions containing potential gadgets: seven functions contain potential gadgets, but five of them are strictly identical in terms of instruction sequence.

Using a standard disassembler, we measured the number of potential gadgets in a shared library, the Better String Library. With a standard compiler, 134 potential gadgets were found. With Picon enabled, only 9 potential gadgets on aligned instructions remain.

The following programs were also tested, looking for potential aligned gadgets in Picon-protected binaries, including all their dependencies :

This confirms that the remaining potential gadgets on aligned instructions are those previously described, i.e. coming from the Picon and C runtimes. All return instructions in these binaries are unusable as ROP gadgets.

Note that tools like ROPgadget [23] will search for gadgets in the entire program, including the unaligned instruction stream. This increases the resulting number of available gadgets, but it can be greatly reduced (down to the number of gadgets protected by Picon, minus the runtime gadgets described previously) with in-place code randomization [18].

Return address attacks

Another way to hinder ROP is to replace the return address when entering a function (using the cfiEnter hook), and to have the execution monitor restore it in the cfiExit hook. This means that, between the entry and the exit point of a function, the return address is invalid, further complicating the work of an attacker. We have implemented this extra feature in Picon, but it depends on some security-oriented changes to the target-specific code generator, which are target- and architecture-dependent. The modification of LLVM to allow changing the return address in a portable way is ongoing work in the LLVM project, and has not yet been finalized. We plan to submit it upstream to LLVM later, as there might be other uses for this feature.

Jump-oriented programming attacks

The possibilities of jump-oriented attacks are reduced, since the source code must not use indirect jumps or calls. All calls and jumps are statically known during compilation, so the attacker cannot use any gadget to jump to a non-instrumented point. This, however, puts strong limits on input files: indirect calls are used in C++ vtables, for example.

Different solutions exist to add support for indirect calls while retaining protection of the control flow. The first solution is to use the instrumentation of forward function calls which is currently being added to Clang [25]. Forward-edge CFI could be used as a complement to our protection, and protect indirect calls using restricted jump tables. Forward-edge CFI is done during LTO and would not be integrated by Picon, so another solution is to use the information provided by LLVM to instrument indirect branches and dynamic calls in our model. This is left for future work.

4.2 Implementation remarks

Compiler optimizations

In some cases, compiler optimizations break the symmetry of the enter/exit points of functions, for example the tail-call optimization. This optimization changes the instructions of a function to transform recursive function calls into iterative execution of basic blocks, heavily modifying the structure and the contents of the function.

To avoid this kind of problem, the Picon pass must be the last pass executed on the LLVM IR. Other optimizations must be applied before, especially those modifying the control flow graph.

Execution environment

As our implementation uses an external monitor, it is critical to ensure the security of communications with the monitor. The process and the monitor should be mutually authenticated, and the integrity of the communication channel should be ensured to avoid Man-In-The-Middle (MITM) classes of attacks.

In Picon, the communication channel between the monitor and the instrumented binary is a pair of unnamed pipes. This requires, however, the monitor and the process to be created in the same process hierarchy.

Another possible attack is to preload shared libraries (for example using LD_PRELOAD) to override some functions, most importantly the functions used to communicate with the monitor. To avoid that, the execution environment should be restricted to prevent preloading custom libraries, for example by using the noexec mount option to prevent the user from building libraries and using them, and/or by patching the dynamic loader to remove the preload feature. Another workaround is to set file capabilities on the instrumented program, using SELinux or any other mechanism to disable the preload feature.

Authorized functions whitelist

Sometimes, the program has to be linked with closed-source, binary-only dynamic libraries. Picon has the ability to handle these cases, and implements a whitelisting mechanism to permit non-instrumented calls to, and returns from, some given functions. When compiling a program with Picon enabled, instrumentation will not be inserted into functions present in the whitelist, and the corresponding transitions will not be verified. However, it is clear that excluding dynamic libraries from the control flow integrity protection is insecure, and results in the addition of free ROP gadgets to the resulting executable.

A workaround, for closed-source libraries, could be to implement the same protection by disassembling the file, reconstructing the control flow graph, and adding the hooks to protect it. This has several drawbacks: notably, it is not portable, and rebuilt basic blocks are not as precise as basic blocks computed from source code.

4.3 Limitations

Multi-programming

Our implementation is currently not able to handle programs using parallel/concurrent programming, i.e. multiple threads/processes. One straightforward way to overcome this limitation is to instrument concurrency-related system calls (clone(2) and fork(2) on POSIX systems, for instance).

By injecting specific instrumentation code for these calls, it has been possible to successfully detect the creation of new processes, and to instantiate a dedicated execution monitor for each process. Each monitor had its own dedicated state machine, and was able to monitor one process. However, more work is required to fully implement multi-programming support in Picon, but also to support multi-threading.

Parallel compilation

When compiling different source files of a program with Picon enabled, each file requires information about the other files, for example the function identifiers in the transition file, as shown in Figure 4. This requires a sequential compilation, because of a race condition on the access to the transition file. A solution could be to implement locking on the transition file, to ensure that only one instance of the compiler process can modify it at a time.
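A sketch of such a lock, using POSIX advisory file locks around each update of the shared transition file (whether Picon adopts this exact mechanism is left open by the paper):

#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

// Serialize concurrent updates of the shared transition file between
// parallel compiler processes with an exclusive advisory lock.
bool withTransitionFileLock(const char *path, void (*update)(int fd)) {
  int fd = open(path, O_RDWR | O_CREAT, 0644);
  if (fd < 0)
    return false;
  if (flock(fd, LOCK_EX) != 0) {                 // block until exclusive access is granted
    close(fd);
    return false;
  }
  update(fd);                                    // read, modify and rewrite the identifiers
  flock(fd, LOCK_UN);
  close(fd);
  return true;
}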

Dynamic code

To implement the CFI protection, we rely on the static construction of the control flow graph, and thus are not able to track dynamic function calls as used in just-in-time (JIT) compilation, exceptions, pointer arithmetic on function addresses, or dynamic loading of shared libraries. This limitation is shared by most CFI approaches.

4.4 Future work

Binary instrumentation

One fundamental limitation of our approach is the requirement for the source code, and the need to compile it. When the source code of a program is not available, as is usually the case for commercial software, or when it is not supported by Clang, instrumenting the program is not possible with our current compile-time approach.

To instrument binary files, one solution is to decompile the executable into LLVM IR, apply the Picon pass on the resulting LLVM IR, and then compile it back to an executable. However, the binary translation environment brings some additional challenges. Static translation has some fundamental limitations, due to its equivalence to the halting problem [14], which makes it undecidable. One such limitation is the presence of indirect branches and calls, which do not have statically discoverable targets. In practice, indirect branches are usually caused by the following constructions :

In the case of an executable that has not been specially crafted, indirect call targets are expected to be in the set of all function symbols available in the binary, and can usually be recovered by static or dynamic analysis.

Projects such as Dagger [9] have been successfully tested, and provide a straightforward method to instrument executable files without requiring the source code.

Link-time optimization

Modern compilers such as LLVM support a feature called Link-Time Optimization (LTO), which defers code generation to link time and keeps the intermediate object files in LLVM IR. Traditionally, this has been used with great success to enable optimizations otherwise impossible on isolated files (such as cross-object function inlining). The Picon pass could be implemented as part of the LTO IR optimizations. This would solve the problems related to cross-object transition tables and identifier uniquing, by having all functions visible in a single IR module. LTO also improves precision by making the program's complete control flow graph available. Finally, since LTO passes run as part of the ld linker, it is also possible to directly use the linker to resolve external function symbols in linked shared libraries, thus avoiding the need for symbol identification at compile time.

Picon and obfuscation mechanisms

Picon protects the execution integrity of the binary, but does not hide the instrumentation or the control flow graph of the binary. Other protection mechanisms, especially obfuscation techniques such as O-LLVM [19] at the LLVM IR layer, or a protector/packer [22] like UPX at the binary layer, could be used in addition to the CFI protection.

However, obfuscation and CFI might interfere. The obfuscation must not break the CFI protection by altering the semantics of the program. The obfuscation step must not create dynamic or self-modifying code (sometimes used in virtualizing packers), nor add gadgets of its own. Although CFI and obfuscation could be used together, the CFI adds extra information about the CFG that could help an attacker reconstruct the logic of the program.

5 Conclusion

In this paper, we have discussed a model for robust control flow integrity protection, and the security properties of programs protected by this model. A proof-of-concept implementation has been proposed, based on the LLVM compiler framework and an external monitor. The result is a plugin for the LLVM compiler called Picon, which does not complicate the compilation process. The plugin allows a global protection of the program, including shared libraries, without having to sacrifice parts of the protection.

As complementary, simpler protections like prevention of stack execution and randomization become commonplace, we believe that control flow integrity will become more systematic in the future, as it is a key part of the protection against ROP attacks. The protection of control flow integrity must be complete to be powerful, and thus must not be weakened for the sake of performance.

References

1. Martín Abadi, Mihai Budiu, Úlfar Erlingsson, and Jay Ligatti. Control-flow Integrity. pages 340–353, 2005.

2. Starr Andersen and Vincent Abella. Changes to functionality in Microsoft Windows XP Service Pack 2, part 3: Memory protection technologies, Data Execution Prevention. https://technet.microsoft.com/en-us/library/bb457155.aspx. Accessed: 2015-01-21.

3. David Basin, Vincent Jugé, Felix Klaedtke, and Eugen Zălinescu. Enforceable Security Policies Revisited. In Pierpaolo Degano and Joshua D. Guttman, editors, Principles of Security and Trust, volume 7215 of Lecture Notes in Computer Science, pages 309–328. Springer Berlin Heidelberg, 2012.

4. Andrea Bittau, Adam Belay, Ali Mashtizadeh, David Mazières, and Dan Boneh. Hacking Blind. pages 227–242, 2014.

5. R.J. Black, T.W. Burrell, M.O.T. de Castro, M.S. Da Silva Costa, K. Johnson, and M.R. Miller. Control flow integrity enforcement at scale, October 24, 2013. US Patent App. 13/450,487.

6. Tyler Bletsch, Xuxian Jiang, Vince W. Freeh, and Zhenkai Liang. Jump-oriented Programming: A New Class of Code-reuse Attack. pages 30–40, 2011.

7. Tyler K. Bletsch, Xuxian Jiang, and Vincent W. Freeh. Mitigating code-reuse attacks with control-flow locking. pages 353–362, 2011.

8. Erik Bosman and Herbert Bos. Framing Signals - A Return to Portable Shellcode. pages 243–258, 2014.

9. Ahmed Bougacha, Geoffroy Aubey, Pierre Collet, Thomas Coudray, Amaury de la Vieuville, and Jonathan Salwan. Dagger: decompiling to LLVM IR. LLVM Europe, 2013.

10. Crispin Cowan, Calton Pu, Dave Maier, Heather Hintony, Jonathan Walpole, Peat Bakke, Steve Beattie, Aaron Grier, Perry Wagle, and Qian Zhang. StackGuard: Automatic Adaptive Detection and Prevention of Buffer-overflow Attacks. pages 5–5, 1998.

11. Isaac Evans, Sam Fingeret, Julian Gonzalez, Ulziibayar Otgonbaatar, Tiffany Tang, Howard Shrobe, Stelios Sidiroglou-Douskos, Martin Rinard, and Hamed Okhravi. Missing the Point(er): On the Effectiveness of Code Pointer Integrity. May 2015.

12. Enes Goktas, Elias Athanasopoulos, Herbert Bos, and Georgios Portokalidis. Out Of Control: Overcoming Control-Flow Integrity. May 2014.

13. Jim Hogg. Visual Studio 2015 Preview: Work-in-Progress Security Feature. http://blogs.msdn.com/b/vcblog/archive/2014/12/08/visual-studio-2015-preview-work-in-progress-security-feature.aspx. Accessed: 2015-01-21.

14. R. Nigel Horspool and Nenad Marovac. An Approach to the Problem of Detranslation of Computer Programs. Comput. J., 23(3):223–229, 1980.

15. Volodymyr Kuznetsov, László Szekeres, Mathias Payer, George Candea, R. Sekar, and Dawn Song. Code-pointer Integrity. pages 147–163, 2014.

16. Chris Lattner and Vikram Adve. LLVM: A Compilation Framework for Lifelong Program Analysis and Transformation. pages 75–88, March 2004.

17. Kyung-Suk Lhee and Steve J. Chapin. Buffer Overflow and Format String Overflow Vulnerabilities. Softw. Pract. Exper., 33(5):423–460, April 2003.

18. Vasilis Pappas, Michalis Polychronakis, and Angelos D. Keromytis. Smashing the Gadgets: Hindering Return-Oriented Programming Using In-place Code Randomization. pages 601–615, 2012.

19. Pascal Junod, Julien Rinaldini, and Grégory Ruch. Obfuscator-LLVM. http://www.o-llvm.org. Accessed: 2015-01-21.

20. Mathias Payer and Thomas R. Gross. String Oriented Programming: When ASLR is Not Enough. pages 2:1–2:9, 2013.

21. Ryan Roemer, Erik Buchanan, Hovav Shacham, and Stefan Savage. Return-Oriented Programming: Systems, Languages, and Applications. ACM Trans. Inf. Syst. Secur., 15(1):2:1–2:34, March 2012.

22. Kevin A. Roundy and Barton P. Miller. Binary-code Obfuscations in Prevalent Packer Tools. ACM Comput. Surv., 46(1):4:1–4:32, July 2013.

23. Jonathan Salwan. ROPgadget. https://github.com/JonathanSalwan/ROPgadget. Accessed: 2015-01-21.

24. Fred B. Schneider. Enforceable Security Policies. ACM Trans. Inf. Syst. Secur., 3(1), February 2000.

25. The Clang Team. Clang 3.7 documentation: Control Flow Integrity. http://clang.llvm.org/docs/ControlFlowIntegrity.html. Accessed: 2015-04-08.

26. Caroline Tice, Tom Roeder, Peter Collingbourne, Stephen Checkoway, Úlfar Erlingsson, Luis Lozano, and Geoff Pike. Enforcing Forward-edge Control-flow Integrity in GCC & LLVM. pages 941–955, 2014.

27. Yubin Xia, Yutao Liu, Haibo Chen, and Binyu Zang. CFIMon: Detecting Violation of Control Flow Integrity Using Performance Counters. pages 1–12, 2012.

28. Zhongxing Xu, Ted Kremenek, and Jian Zhang. A Memory Model for Static Analysis of C Programs. pages 535–548, 2010.

29. Mingwei Zhang and R. Sekar. Control Flow Integrity for COTS Binaries. pages 337–352, 2013.