Does A Language's Calling Convention Include The Preserving Of Certain Registers By Procedure

Calling convention

A calling convention governs how functions on a particular architecture and operating system interact. This includes rules about how function arguments are placed, where return values go, what registers functions may use, how they may allocate local variables, and so forth. Calling conventions ensure that functions compiled by different compilers can interoperate, and they ensure that operating systems can run code from different programming languages and compilers. Some aspects of a calling convention are derived from the instruction set itself, but some are conventional, meaning decided upon by people (for instance, at a convention).

Calling conventions constrain both callers and callees. A caller is a function that calls another function; a callee is a function that was called. The currently-executing function is a callee, but not a caller.

For concreteness, we learn the x86-64 calling conventions for Linux. These conventions are shared by many OSes, including MacOS (but not Windows), and are officially called the "System V AMD64 ABI."

The official specification: AMD64 ABI

Argument passing and stack frames

One set of calling convention rules governs how function arguments and return values are passed. On x86-64 Linux, the first six function arguments are passed in registers %rdi, %rsi, %rdx, %rcx, %r8, and %r9, respectively. The seventh and subsequent arguments are passed on the stack, about which more below. The return value is passed in register %rax.
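As a rough illustration of the register assignment just described, here is a hypothetical eight-argument function (the name `sum8` is made up for this sketch). The comments record where each integer argument would travel under the convention; the C++ itself cannot observe the registers, so treat the mapping as annotation rather than proof.

```cpp
// Hypothetical function used only to annotate argument placement
// under the x86-64 Linux calling convention.
long sum8(long a, long b, long c, long d,
          long e, long f, long g, long h) {
    // a -> %rdi, b -> %rsi, c -> %rdx, d -> %rcx, e -> %r8, f -> %r9
    // g, h -> stored by the caller on its stack frame (7th and 8th arguments)
    return a + b + c + d + e + f + g + h;   // return value travels in %rax
}
```

Compiling a file like this with `-S` and reading the assembly is one way to check the annotations against a real compiler.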

The full rules are more complex than this. You can read them in the AMD64 ABI, section 3.2.3, but they're quite detailed. Some highlights:

A structure argument that fits in a single machine word (64 bits/8 bytes) is passed in a single register.

Example: struct small { char a1, a2; }
A structure that fits in two to four machine words (16–32 bytes) is passed in sequential registers, as if it were multiple arguments.

Example: struct medium { long a1, a2; }
A structure that's larger than four machine words is always passed on the stack.

Example: struct big { long a, b, c, d, e, f, g; }
Floating point arguments are generally passed in special registers, the "SSE registers," that we don't discuss further.
If the return value takes more than eight bytes, then the caller reserves space for the return value, and passes the address of that space as the first argument of the function. The callee will fill in that space when it returns.

Writing small programs to demonstrate these rules is a pleasant exercise; for example:

    struct small { char a1, a2; };
    int f(small s) {
        return s.a1 + 2 * s.a2;
    }

compiles to:

    movl %edi, %eax             # copy argument to %eax
    movsbl %dil, %edi           # %edi := sign-extension of lowest byte of argument (s.a1)
    movsbl %ah, %eax            # %eax := sign-extension of 2nd byte of argument (s.a2)
    movsbl %al, %eax
    leal (%rdi,%rax,2), %eax    # %eax := %edi + 2 * %eax
    ret
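The size classes in the highlights above depend on how many machine words a structure occupies. The sketch below checks that for the three example structures; `machine_word` and `words_needed` are names invented here, and the code assumes an 8-byte `long` as on x86-64 Linux.

```cpp
#include <cstddef>

struct small  { char a1, a2; };
struct medium { long a1, a2; };
struct big    { long a, b, c, d, e, f, g; };

// Assumption: a machine word is 8 bytes (x86-64).
constexpr std::size_t machine_word = 8;

// Number of machine words needed to hold an object of type T.
template <typename T>
constexpr std::size_t words_needed() {
    return (sizeof(T) + machine_word - 1) / machine_word;
}

static_assert(sizeof(long) == 8, "sketch assumes a 64-bit long");
static_assert(words_needed<small>() == 1, "fits in a single register");
static_assert(words_needed<medium>() == 2, "two machine words");
static_assert(words_needed<big>() == 7, "larger than four machine words");
```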

Stack

Recall that the stack is a segment of memory used to store objects with automatic lifetime. Typical stack addresses on x86-64 look like 0x7ffd'9f10'4f58, that is, close to 2⁴⁷.

The stack is named after a data structure, which was sort of named after pancakes. Stack data structures support at least three operations: push adds a new element to the "top" of the stack; pop removes the top element, showing whatever was underneath; and top accesses the top element. Note what's missing: the data structure does not allow access to elements other than the top. (Which is sort of how stacks of pancakes work.) This restriction can speed up stack implementations.
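A minimal sketch of such a data structure, assuming a `std::vector` as backing storage (the class name `long_stack` is hypothetical). Only the three operations named above are exposed, plus an emptiness check; there is deliberately no way to reach elements below the top.

```cpp
#include <cassert>
#include <vector>

// Minimal stack of longs: push, pop, and top only.
class long_stack {
    std::vector<long> items_;   // the back of the vector is the "top"
public:
    void push(long x) { items_.push_back(x); }
    void pop() {
        assert(!items_.empty());
        items_.pop_back();      // reveals whatever was underneath
    }
    long top() const {
        assert(!items_.empty());
        return items_.back();
    }
    bool empty() const { return items_.empty(); }
};
```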

Like a stack data structure, the stack memory segment is only accessed from the top. The currently running function accesses its local variables; the function's caller, grand-caller, great-grand-caller, and so forth are dormant until the currently running function returns.

x86-64 stacks look like this:

The x86-64 %rsp register is a special-purpose register that defines the current "stack pointer." This holds the address of the current top of the stack. On x86-64, as on many architectures, stacks grow down: a "push" operation adds space for more automatic-lifetime objects by moving the stack pointer left, to a numerically-smaller address, and a "pop" operation recycles space by moving the stack pointer right, to a numerically-larger address. This means that, considered numerically, the "top" of the stack has a smaller address than the "bottom."

This is built in to the architecture by the operation of instructions like pushq, popq, call, and ret. A push instruction pushes a value onto the stack. This both modifies the stack pointer (making it smaller) and modifies the stack segment (by moving data there). For instance, the instruction pushq X means:

    subq $8, %rsp
    movq X, (%rsp)

And popq X undoes the effect of pushq X. It means:

    movq (%rsp), X
    addq $8, %rsp

X can be a register or a memory reference.
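The two expansions above can be modeled in a few lines. This is a toy sketch, not real machine state: a byte array stands in for the stack segment, `rsp` is an offset into that array rather than an address, and the type name `toy_stack` is made up. No bounds checking is done.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy model of the stack segment and stack pointer.
struct toy_stack {
    std::vector<unsigned char> mem;   // stands in for the stack segment
    std::uint64_t rsp;                // offset of the current top of the stack

    // The stack starts empty, with rsp at the highest offset.
    explicit toy_stack(std::size_t size) : mem(size), rsp(size) {}

    // pushq X  ==  subq $8, %rsp; movq X, (%rsp)
    void pushq(std::uint64_t x) {
        rsp -= 8;
        std::memcpy(&mem[rsp], &x, 8);
    }

    // popq X  ==  movq (%rsp), X; addq $8, %rsp
    std::uint64_t popq() {
        std::uint64_t x;
        std::memcpy(&x, &mem[rsp], 8);
        rsp += 8;
        return x;
    }
};
```

Pushing makes `rsp` numerically smaller and popping makes it larger, matching the "stacks grow down" description.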

The portion of the stack reserved for a function is called that function's stack frame. Stack frames are aligned: x86-64 requires that each stack frame be a multiple of 16 bytes, and when a callq instruction begins execution, the %rsp register must be 16-byte aligned. This means that every function's entry %rsp address will be 8 bytes off a multiple of 16.
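The alignment arithmetic can be checked directly. In this sketch the helper names `entry_rsp` and `round_up_16` are invented, and the address is simply an example value chosen to be 16-byte aligned.

```cpp
#include <cstdint>

// %rsp as seen on entry to the callee: callq has pushed an
// 8-byte return address below the caller's aligned %rsp.
constexpr std::uint64_t entry_rsp(std::uint64_t rsp_at_callq) {
    return rsp_at_callq - 8;
}

// Round a requested frame size up to the next multiple of 16 bytes.
constexpr std::uint64_t round_up_16(std::uint64_t n) {
    return (n + 15) & ~std::uint64_t(15);
}

// A 16-byte-aligned %rsp at the callq yields an entry %rsp that is
// 8 bytes off a multiple of 16.
static_assert(0x7ffd9f104f60 % 16 == 0, "example address is aligned");
static_assert(entry_rsp(0x7ffd9f104f60) % 16 == 8, "entry %rsp is 8 off");
static_assert(round_up_16(20) == 32, "20 bytes of locals need a 32-byte frame");
```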

Return address and entry and exit sequence

The steps required to call a function are sometimes called the entry sequence and the steps required to return are called the exit sequence. Both caller and callee have responsibilities in each sequence.

To prepare for a function call, the caller performs the following tasks in its entry sequence.

The caller stores the first six arguments in the corresponding registers.
If the callee takes more than six arguments, or if some of its arguments are large, the caller must store the surplus arguments on its stack frame. It stores these in increasing order, so that the 7th argument has a smaller address than the 8th argument, and so forth. The 7th argument must be stored at (%rsp) (that is, the top of the stack) when the caller executes its callq instruction.
The caller saves any caller-saved registers (see below).
The caller executes callq FUNCTION. This has an effect like pushq $NEXT_INSTRUCTION; jmp FUNCTION (or, equivalently, subq $8, %rsp; movq $NEXT_INSTRUCTION, (%rsp); jmp FUNCTION), where NEXT_INSTRUCTION is the address of the instruction immediately following callq.

This leaves a stack like this:

To return from a function:

The callee places its return value in %rax.
The callee restores the stack pointer to its value at entry ("entry %rsp"), if necessary.
The callee executes the retq instruction. This has an effect like popq %rip, which removes the return address from the stack and jumps to that address.
The caller then cleans up any space it prepared for arguments and restores caller-saved registers if necessary.
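The effect of callq and retq on %rip, %rsp, and the stack can be sketched with a toy machine. Everything here is a model: `toy_machine` is a hypothetical type, memory is a sparse map of 8-byte words, and the addresses used with it are arbitrary example values.

```cpp
#include <cstdint>
#include <map>

// Toy machine with just %rip, %rsp, and word-granular memory.
struct toy_machine {
    std::uint64_t rip = 0;
    std::uint64_t rsp = 0;
    std::map<std::uint64_t, std::uint64_t> mem;

    // callq FUNCTION  ==  pushq $NEXT_INSTRUCTION; jmp FUNCTION
    void callq(std::uint64_t function, std::uint64_t next_instruction) {
        rsp -= 8;
        mem[rsp] = next_instruction;   // return address now at (%rsp)
        rip = function;
    }

    // retq  ==  popq %rip
    void retq() {
        rip = mem.at(rsp);             // jump back to the return address
        rsp += 8;
    }
};
```

After a matching callq and retq, `rsp` is back at its pre-call value and `rip` holds the address of the instruction following the call.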

Particularly simple callees don't need to do much more than return, but most callees will perform more tasks, such as allocating space for local variables and calling functions themselves.

Callee-saved registers and caller-saved registers

The calling convention gives callers and callees certain guarantees and responsibilities about the values of registers across function calls. Function implementations may expect these guarantees to hold, and must work to fulfill their responsibilities.

The most important responsibility is that certain registers' values must be preserved across function calls. A callee may use these registers, but if it changes them, it must restore them to their original values before returning. These registers are called callee-saved registers. All other registers are caller-saved.

Callers can simply use callee-saved registers across function calls; in this sense they behave like C++ local variables. Caller-saved registers behave differently: if a caller wants to preserve the value of a caller-saved register across a function call, the caller must explicitly save it before the callq and restore it when the function resumes.

On x86-64 Linux, %rbp, %rbx, %r12, %r13, %r14, and %r15 are callee-saved, as (sort of) are %rsp and %rip. The other registers are caller-saved.
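To see why the distinction matters, consider a value that must stay live across a call. In the sketch below (function names `helper` and `caller` are hypothetical), the comments describe what a compiler might do; the actual register choice depends on the compiler and optimization level, and the call disappears entirely if `helper` is inlined.

```cpp
// Hypothetical callee; it is free to overwrite any caller-saved register.
long helper(long x) {
    return x + 1;
}

long caller(long a) {
    long kept = a * 2;    // live across the call below
    // If this becomes a real callq, `kept` cannot stay in a caller-saved
    // register such as %rdi or %rcx. A compiler will typically either keep
    // it in a callee-saved register such as %rbx (after first saving the
    // old %rbx, since `caller` is itself a callee), or spill it to the stack.
    long r = helper(a);
    return kept + r;
}
```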

Base pointer (frame pointer)

The %rbp register is called the base pointer (and sometimes the frame pointer). For simple functions, an optimizing compiler generally treats this like any other callee-saved general-purpose register. However, for more complex functions, %rbp is used in a specific pattern that facilitates debugging. It works like this:

The first instruction executed on function entry is pushq %rbp. This saves the caller's value for %rbp into the callee's stack. (Since %rbp is callee-saved, the callee must save it.)
The second instruction is movq %rsp, %rbp. This saves the current stack pointer in %rbp (so %rbp = entry %rsp - 8).

This adjusted value of %rbp is the callee's "frame pointer." The callee will not change this value until it returns. The frame pointer provides a stable reference point for local variables and caller arguments. (Complex functions may need a stable reference point because they reserve varying amounts of space for calling different functions.)

Note, also, that the value stored at (%rbp) is the caller's %rbp, and the value stored at 8(%rbp) is the return address. This information can be used to trace backwards through callers' stack frames by functions such as debuggers.
The function ends with movq %rbp, %rsp; popq %rbp; retq, or, equivalently, leave; retq. This sequence restores the caller's %rbp and entry %rsp before returning.
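The pattern above can be simulated to show how a debugger-style backtrace falls out of it. This is a toy model under stated assumptions: `frame_machine` is a hypothetical type, memory is a map of 8-byte words, the starting %rsp of 0x1000 is arbitrary, and a saved %rbp of 0 is used as a made-up marker for the outermost frame.

```cpp
#include <cstdint>
#include <map>
#include <vector>

struct frame_machine {
    std::uint64_t rsp = 0x1000;   // arbitrary starting stack pointer
    std::uint64_t rbp = 0;        // 0 marks "no caller frame" in this model
    std::map<std::uint64_t, std::uint64_t> mem;

    void pushq(std::uint64_t x) { rsp -= 8; mem[rsp] = x; }
    std::uint64_t popq() { std::uint64_t x = mem.at(rsp); rsp += 8; return x; }

    // callq (pushes the return address), then pushq %rbp; movq %rsp, %rbp
    void enter(std::uint64_t return_address) {
        pushq(return_address);
        pushq(rbp);               // save the caller's %rbp
        rbp = rsp;                // %rbp = entry %rsp - 8
    }

    // movq %rbp, %rsp; popq %rbp; retq. Returns the address jumped to.
    std::uint64_t leave_and_ret() {
        rsp = rbp;
        rbp = popq();             // restore the caller's %rbp
        return popq();            // pop the return address
    }

    // Follow the chain of saved %rbp values, collecting the return
    // address stored at 8(%rbp) in each frame, innermost first.
    std::vector<std::uint64_t> backtrace() const {
        std::vector<std::uint64_t> out;
        for (std::uint64_t fp = rbp; fp != 0; fp = mem.at(fp)) {
            out.push_back(mem.at(fp + 8));
        }
        return out;
    }
};
```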

Stack size and red zone

Functions execute fast because allocating space within a function is simply a matter of decrementing %rsp. This is much cheaper than a call to malloc or new! But making this work takes a lot of machinery. We'll see this in more detail later; but in brief: The operating system knows that %rsp points to the stack, so if a function accesses nonexistent memory near %rsp, the OS assumes it's for the stack and transparently allocates new memory there.

So how can a program "run out of stack"? The operating system puts a limit on each process's stack, and if %rsp gets too low, the program segmentation faults.

The diagram above also shows a nice feature of the x86-64 architecture, namely the red zone. This is a small area above the stack pointer (that is, at lower addresses than %rsp) that can be used by the currently-running function for local variables. The red zone is nice because it can be used without mucking around with the stack pointer; for small functions push and pop instructions end up taking time.

Branches

The processor typically executes instructions in sequence, incrementing %rip each time. Deviations from sequential instruction execution, such as function calls, are called control flow transfers.

Function calls aren't the only kind of control flow transfer. A branch instruction jumps to a new instruction without saving a return address on the stack.

Branches come in two flavors, unconditional and conditional. The jmp or j instruction executes an unconditional branch (like a goto). All other branch instructions are conditional: they only branch if some condition holds. That condition is represented by condition flags that are set as a side effect of every arithmetic operation.

Arithmetic instructions change part of the %rflags register as a side effect of their operation. The most often used flags are:

ZF (zero flag): set iff the result was zero.
SF (sign flag): set iff the most significant bit (the sign bit) of the result was one (i.e., the result was negative if considered as a signed integer).
CF (carry flag): set iff the result overflowed when considered as unsigned (i.e., the result was greater than 2^W - 1).
OF (overflow flag): set iff the result overflowed when considered as signed (i.e., the result was greater than 2^(W-1) - 1 or less than -2^(W-1)).
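These four definitions can be computed by hand for a small word size. The sketch below uses W = 8 purely to keep the numbers readable; `add_flags`, `as_signed8`, and `add8` are names invented for the illustration, and the code models only addition.

```cpp
#include <cstdint>

struct add_flags { bool zf, sf, cf, of; };

// Interpret an 8-bit pattern as a two's-complement signed value.
constexpr int as_signed8(std::uint8_t x) {
    return x < 128 ? int(x) : int(x) - 256;
}

// Flags after the 8-bit addition a + b.
inline add_flags add8(std::uint8_t a, std::uint8_t b) {
    unsigned wide = unsigned(a) + unsigned(b);    // exact unsigned sum
    int swide = as_signed8(a) + as_signed8(b);    // exact signed sum
    std::uint8_t r = std::uint8_t(wide);          // result truncated to W bits
    add_flags f;
    f.zf = (r == 0);
    f.sf = (r & 0x80) != 0;                       // most significant bit
    f.cf = wide > 0xFF;                           // greater than 2^W - 1
    f.of = swide > 127 || swide < -128;           // outside the signed range
    return f;
}
```

For instance, 0x7F + 1 sets SF and OF but not CF, while 0xFF + 1 sets ZF and CF but not OF.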

Although some instructions let you load specific flags into registers (e.g., setz; see CS:APP3e §3.6.2, p203), code more often accesses them via conditional jump or conditional move instructions.

Instruction	Mnemonic	C example	Flags
j (jmp)	Jump	`break;`	(Unconditional)
je (jz)	Jump if equal (zero)	`if (x == y)`	ZF
jne (jnz)	Jump if not equal (nonzero)	`if (x != y)`	!ZF
jg (jnle)	Jump if greater	`if (x > y)`, signed	!ZF && !(SF ^ OF)
jge (jnl)	Jump if greater or equal	`if (x >= y)`, signed	!(SF ^ OF)
jl (jnge)	Jump if less	`if (x < y)`, signed	SF ^ OF
jle (jng)	Jump if less or equal	`if (x <= y)`, signed	(SF ^ OF) || ZF
ja (jnbe)	Jump if above	`if (x > y)`, unsigned	!CF && !ZF
jae (jnb)	Jump if above or equal	`if (x >= y)`, unsigned	!CF
jb (jnae)	Jump if below	`if (x < y)`, unsigned	CF
jbe (jna)	Jump if below or equal	`if (x <= y)`, unsigned	CF || ZF
js	Jump if sign bit	`if (x < 0)`, signed	SF
jns	Jump if not sign bit	`if (x >= 0)`, signed	!SF
jc	Jump if carry bit	N/A	CF
jnc	Jump if not carry bit	N/A	!CF
jo	Jump if overflow bit	N/A	OF
jno	Jump if not overflow bit	N/A	!OF

The test and cmp instructions are frequently seen before a conditional branch. These operations perform arithmetic but throw away the result, except for condition codes. test performs binary-and, cmp performs subtraction.

cmp is hard to grasp: remember that subq %rax, %rbx performs %rbx := %rbx - %rax; the source/destination operand, written second in the instruction, is the left operand of the subtraction. So cmpq %rax, %rbx evaluates %rbx - %rax. The sequence cmpq %rax, %rbx; jg L will jump to label L if and only if %rbx is greater than %rax (signed).
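One way to convince yourself that the flag expressions in the table match ordinary comparisons is to model cmp on 8-bit values and check every operand pair. The names below (`cmp_flags`, `cmp8`, `jump_conditions_agree`, and so on) are invented for this sketch; in `cmp8(a, b)` the operand `a` plays the role of %rbx and `b` the role of %rax in cmpq %rax, %rbx.

```cpp
#include <cstdint>

struct cmp_flags { bool zf, sf, cf, of; };

// Interpret an 8-bit pattern as a two's-complement signed value.
constexpr int signed8(std::uint8_t x) {
    return x < 128 ? int(x) : int(x) - 256;
}

// Flags from the subtraction a - b, with the result itself discarded.
inline cmp_flags cmp8(std::uint8_t a, std::uint8_t b) {
    std::uint8_t r = std::uint8_t(a - b);         // truncated difference
    int swide = signed8(a) - signed8(b);          // exact signed difference
    cmp_flags f;
    f.zf = (r == 0);
    f.sf = (r & 0x80) != 0;
    f.cf = a < b;                                 // unsigned borrow
    f.of = swide > 127 || swide < -128;           // signed overflow
    return f;
}

// Conditions copied from the table.
inline bool jg(cmp_flags f) { return !f.zf && !(f.sf ^ f.of); }
inline bool jl(cmp_flags f) { return f.sf ^ f.of; }
inline bool ja(cmp_flags f) { return !f.cf && !f.zf; }
inline bool jb(cmp_flags f) { return f.cf; }

// Exhaustively compare the flag conditions with C++ comparisons.
inline bool jump_conditions_agree() {
    for (int a = 0; a < 256; ++a) {
        for (int b = 0; b < 256; ++b) {
            std::uint8_t ua = std::uint8_t(a), ub = std::uint8_t(b);
            cmp_flags f = cmp8(ua, ub);
            int sa = signed8(ua), sb = signed8(ub);
            if (jg(f) != (sa > sb) || jl(f) != (sa < sb)) return false;
            if (ja(f) != (ua > ub) || jb(f) != (ua < ub)) return false;
        }
    }
    return true;
}
```

The pair 0xFF and 1 shows why signed and unsigned jumps differ: as signed values -1 is not greater than 1, so jg is false, but as unsigned values 255 is above 1, so ja is true.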

The weird-looking instruction testq %rax, %rax, or more generally testq REG, SAMEREG, is used to load the condition flags appropriately for a single register. For example, the bitwise-and of %rax and %rax is zero if and only if %rax is zero, so testq %rax, %rax; je L jumps to L if and only if %rax is zero.

C++ compilers and data structure implementations have been designed to avoid the so-called abstraction penalty, which is when convenient data structures compile to more and more-expensive instructions than simple, raw memory accesses. When this works, it works quite well; for example, this:

    long f(std::vector<int>& v) {
        long sum = 0;
        for (auto& i : v) {
            sum += i;
        }
        return sum;
    }

compiles to this, a very tight loop similar to the C version:

        movq (%rdi), %rax
        movq 8(%rdi), %rcx
        cmpq %rcx, %rax
        je .L4
        movq %rax, %rdx
        addq $4, %rax
        subq %rax, %rcx
        andq $-4, %rcx
        addq %rax, %rcx
        movl $0, %eax
    .L3:
        movslq (%rdx), %rsi
        addq %rsi, %rax
        addq $4, %rdx
        cmpq %rcx, %rdx
        jne .L3
        rep ret
    .L4:
        movl $0, %eax
        ret

We can also use this output to infer some aspects of std::vector's implementation. It looks like:

The first element of a std::vector structure is a pointer to the first element of the vector;
The elements are stored in memory in a simple array;
The second element of a std::vector structure is a pointer to one-past-the-end of the elements of the vector (i.e., if the vector is empty, the first and second elements of the structure have the same value).
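The inferred layout can be written down as a stand-in structure. This is only a model of what the assembly suggests for this compiler's library, not the real std::vector definition; `int_vector_model` and `f_model` are hypothetical names, and the comments tie each line to the instructions it is meant to mirror.

```cpp
// Hypothetical stand-in for the inferred layout; NOT the real std::vector.
struct int_vector_model {
    int* first;   // offset 0: pointer to the first element   (movq (%rdi), %rax)
    int* last;    // offset 8: pointer to one-past-the-end    (movq 8(%rdi), %rcx)
};

long f_model(const int_vector_model& v) {
    long sum = 0;
    // An empty vector has first == last, so the loop body never runs
    // (the cmpq/je .L4 check in the assembly).
    for (int* p = v.first; p != v.last; ++p) {
        sum += *p;    // movslq (%rdx), %rsi; addq %rsi, %rax
    }
    return sum;
}
```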