# Compiling Techniques Lecture 17: Register Allocation #### Overview # Register Allocation #### Introduction #### This lecture: - Local Allocation spill code - Global Allocation based on graph colouring - Techniques to reduce spill code #### Register Allocation - Physical machines have limited number of registers - Scheduling and selection typically assume infinite registers - Register allocation and assignment from infinite to k registers - Produce correct code that uses k (or fewer) registers - Minimise added loads and stores - Minimise space used to hold spilled values - Operate efficiently: - $\circ$ O(n), O(n<sup>2</sup>), but not O(2<sup>n</sup>) #### Register Allocation: Definitions #### Allocation vs Assignment: - *Allocation* is deciding which values to keep in registers - Assignment is choosing specific registers for values #### Liveness A value is live from its definition to its last use. #### Interference Two values cannot be mapped to the same register wherever they are both live. Such values are said to **interfere**. #### Live Range The live range of a value is the set of statements at which it is live. A live range may be conservatively overestimated (e.g, just begin $\rightarrow$ end) #### Register Allocation: Definitions #### **Spilling** Spilling saves a value from a register to memory. That register is then free - Another value often loaded Requires *F* registers to be reserved. #### Clean and dirty values A previously spilled value is **clean** if not changed since last spill. Otherwise it is **dirty**. A clean value can be spilled without a new store instruction. #### Local Register Allocation Register allocation only on basic block. Let MAXLIVE be the maximum, over each instruction i in the block, of the number of values (pseudo-registers) live at i. - If MAXLIVE ≤ k, allocation should be easy - If MAXLIVE ≤ k, no need to reserve F registers for spilling - If MAXLIVE > k, some values must be spilled to memory - If MAXLIVE > k, need to reserve F registers for spilling #### Two main forms: - Top down - Bottom up #### Local Register Allocation: MAXLIVE ``` loadI 1028 \Rightarrow r<sub>a</sub> // r<sub>a</sub> \leftarrow 1028 load r<sub>a</sub> \Rightarrow r<sub>b</sub> // r<sub>b</sub> \leftarrow MEM(r<sub>a</sub>) mult r<sub>a</sub>, r<sub>b</sub> \Rightarrow r<sub>c</sub> // r<sub>c</sub> \leftarrow 1028 ·y load x \Rightarrow r<sub>d</sub> // r<sub>d</sub> \leftarrow x sub r<sub>d</sub>, r<sub>b</sub> \Rightarrow r<sub>e</sub> // r<sub>e</sub> \leftarrow x-y load z \Rightarrow r<sub>f</sub> // r<sub>f</sub> \leftarrow z mult r<sub>e</sub>, r<sub>f</sub> \Rightarrow r<sub>g</sub> // r<sub>g</sub> \leftarrow z · (x-y) sub r<sub>g</sub>, r<sub>c</sub> \Rightarrow r<sub>h</sub> // r<sub>h</sub> \leftarrow z · (x-y) -1028 ·y store r<sub>h</sub> \rightarrow r<sub>a</sub> // MEM(r<sub>a</sub>) \leftarrow z · (x-y) -1028 ·y ``` #### Local Register Allocation: MAXLIVE #### **Example MAXLIVE computation** #### Live registers #### Local Register Allocation: MAXLIVE #### **Example MAXLIVE computation** MAXLIVE is 4 # Local register allocation: Top Down #### **Algorithm:** - If number of values > k - Rank values by occurrences - Allocate first k F values to registers - Spill other values #### Local register allocation: top down #### **Example top down** #### Usage counts #### Local register allocation: Top Down #### Local Register Allocation: Top down #### **Example top down** Spill code inserted ``` loadI 1028 r_a load ra r_b mult ra, rb \Rightarrow r<sub>c</sub> r<sub>arp</sub>, spill<sub>c</sub> store r<sub>c</sub> load x r_d sub r_d, r_b re load z r_f mult r_e, r_f r_{q} load r<sub>arp</sub>, spill<sub>c</sub> r sub r_f, r_c r_h store rh r<sub>a</sub> ``` #### Local register allocation: Top Down #### **Example top down** Register assignment straightforward ``` loadI 1028 r_1 load r_1 \mathbf{r}_2 mult r_1, r_2 \Rightarrow r<sub>3</sub> r<sub>arp</sub>, spill<sub>c</sub> store r<sub>3</sub> load x r_3 sub r_3, r_2 \mathbf{r}_2 load z r_3 mult r_2, r_3 r_2 load r<sub>arp</sub>, spill<sub>c</sub> \mathbf{r}_3 sub r_2, r_3 \mathbf{r}_2 store r<sub>2</sub> r<sub>1</sub> ``` # Local register allocation: Bottom Up #### **Algorithm:** - Start with empty register set - Load on demand - When no register is available, free one #### Replacement: - Spill the value whose next use is farthest in the future - Prefer clean value to dirty value #### Local register allocation: Bottom Up #### Local register allocation: Bottom Up #### **Example bottom up** Spill code inserted ``` loadI 1028 r_a load ra r_{b} \Rightarrow r<sub>c</sub> mult ra, rb store ra load x r_{d} sub r<sub>d</sub>, r<sub>b</sub> r_e load z r_f mult re, rf r_{a} sub r_f, r_c r_h load r<sub>arp</sub>, spill<sub>a</sub> ra store rh r_a ``` # Global register allocation Local allocation does not capture reuse of values across multiple blocks Most modern, global allocators use a graph-colouring paradigm Build a "conflict graph" or "interference graph" Data flow based liveness analysis for interference Find a **k-colouring** for the graph, or change the code to a nearby problem that it can k-colour NP-complete under nearly all assumptions<sup>1</sup> <sup>&</sup>lt;sup>1</sup>Local allocation is NP-complete with dirty vs clean # Global register allocation: algorithm sketch - From live ranges construct an interference graph - Colour interference graph so that no two neighbouring nodes have same colour - If graph needs more than k colours transform code - Coalesce merge-able copies - Split live ranges - o Spill - Colouring is NP-complete so we will need heuristics - Map colours onto physical registers # Global register allocation: Graph Coloring A graph G is said to be **k-colourable** if the nodes can be labeled with integers 1 ... k so that no edge in G connects two nodes with the same label # Global register allocation: Interference Graph The interference graph, G = (N, E) - Nodes in G represent values, or live ranges - Edges in G represent individual interferences - $\forall x, y \in N, x \rightarrow y \in E \text{ iff } x \text{ and } y \text{ interfere}$ A *k-colouring* of *G* can be mapped into an allocation to *k* registers Two values interfere wherever they are both live Two live ranges interfere if their values interfere at any point # Global register allocation: Coloring the Register Graph - Degree<sup>3</sup> of a node (n°) is a loose upper bound on colourability - Any node, n, such that n° < k is always trivially k-colourable</li> - Trivially colourable nodes cannot adversely affect the colourability of neighbours - Can remove them from graph - Reduces degree of neighbours may be trivially colourable - If left with any nodes such that n° ≥ k spill one - Reduces degree of neighbours may be trivially colourable - While ∃ vertices with < k neighbours in G</li> - Pick any vertex n such that n° < k and put it on the stack</li> - o Remove n and all edges incident to it from G - 2. If G is non-empty ( $n^{\circ} >= k$ , $\exists n \ni G$ ) then: - Pick vertex n (heuristic), spill live range of n - Remove vertex n and edges from GI, put n on "spill list" - Goto step 1 - 3. If the spill list is not empty, insert spill code, then rebuild the interference graph and try to allocate, again - 4. Otherwise, successively pop vertices of the stack and colour them in the lowest colour not used by some neighbour Push d and remove from graph d e r1 b **r2** a r3 Stack **Colours** е Pop d, neighbours use red, choose blue е r1 b **r2** a r3 Stack **Colours** # Global Register Allocation: Optimistic Colouring If Chaitin's algorithm reaches a state where every node has k or more neighbours, it chooses a node to spill. Example of Chaitin overzealous spilling $$k = 2$$ Graph is 2-colourable Chaitin must immediately spill one of these nodes Briggs said, take that same node and push it on the stack! When you pop it off, a colour might be available for it! Chaitin-Briggs algorithm uses this to colour that graph b a - While ∃ vertices with < k neighbours in G</li> - Pick any vertex n such that n° < k and put it on the stack</li> - Remove n and all edges incident to it from GI - If G is non-empty (n° >= k, ∀n ∈ G) then: - Pick vertex n (heuristic) (Do not spill) - Remove vertex n from GI, put n on stack (Not spill list) - o Goto step 1 - Otherwise, successively pop vertices off the stack and colour them in the lowest colour not used by some neighbour - If some vertex cannot be coloured, then pick an uncoloured vertex to spill, spill it, and restart at step 1 Pop d, neighbours use no colours, choose blue d r1 a **r2** Stack **Colours** ## Global register allocation: Spill Candidates - Minimise spill cost/degree - Spill cost is the loads and stores needed. Weighted by scope - i.e. avoid inner loops - The higher the degree of a node to spill the greater the chance that it will help colouring - Negative spill cost load and store to same memory location with no other uses - Infinite cost definition immediately followed by use. Spilling does not decrease live range ## Global Register Allocation: Alternative Spilling - Splitting live ranges - Coalesce # Global Register Allocation: Live Range Splitting - A whole live range may have many interferences, but perhaps not all at the same time - Split live range into two variables connected by copy - Can reduce degree of interference graph - Smart splitting allows spilling to occur in "cheap" regions # Global register allocation Splitting example: Non contiguous live ranges - cannot be 2 coloured # Global Register Allocation: Live Range Splitting Splitting example: Non contiguous live ranges - can be 2 coloured Live ranges Interference Graph ## Global register allocation: Coalescing If two ranges don't interfere and are connected by a copy coalesce into one – opposite of splitting. Reduces degree of nodes that interfered with both If x := y and $x \to y \in G_1$ then can combine $LR_x$ and $LR_y$ - Eliminates the copy operation - Reduces degree of LRs that interfere with both x and y - If a node interfered with both before, coalescing helps - As it reduces degree, often applied before colouring takes place # Global register allocation: Coalescing Coalescing can make the graph harder to color - Typically, $LR_{xy}^{\circ} > max (LR_{x}^{\circ}, LR_{y}^{\circ})$ If $max (LRx^{\circ}, LRy^{\circ}) < k$ and $k < LR_{xy}^{\circ}$ then $LR_{xy}$ might spill, while $LR_{x}$ and LR<sub>v</sub> would not spill # Global register allocation: Coalescing #### Observation led to conservative coalescing - 1. Conceptually, coalesce x and y iff $x \to y \in G_l$ and $LR_{xy}^{\circ} < k$ - 2. We can do better - Coalesce LR<sub>x</sub> and LR<sub>y</sub> iff LR<sub>xy</sub> has < k neighbours with degree > k - $\circ$ $\;$ Only neighbours of "significant degree" can force $\mathsf{LR}_{\mathsf{X}\mathsf{Y}}$ to spill - 3. Always safe to perform coalesce - Cannot introduce a node of non-trivial degree - Cannot introduce a new spill ## Global register allocation: Other Approaches - Top-down uses high level priorities to decide on colouring - Hierarchical approaches use control flow structure to guide allocation - Exhaustive allocation go through combinatorial options very expensive but occasional improvement - Re-materialisation if easy to recreate a value do so rather than spill - Passive splitting using a containment graph to make spills effective - Linear scan fast but weak; useful for JITs ## Global register allocation: Ongoing work - Eisenbeis et al examining optimality of combined reg alloc and scheduling. Difficulty with general control-flow - Partitioned register sets complicate matters. Allocation can require insertion of code which in turn affects allocation. - Leupers investigated use of genetic algs for TM series partitioned reg sets. - New work by Fabrice Rastello and others. Chordal graphs reduce complexity - As latency increases see work in combined code generation, instruction scheduling and register allocation #### Summary - Local Allocation spill code - Global Allocation based on graph colouring - Techniques to reduce spill code