Does A Language's Calling Convention Include The Preserving Of Certain Registers By Procedure
Calling convention
A calling convention governs how functions on a particular architecture and operating system interact. This includes rules about how function arguments are placed, where return values go, what registers functions may use, how they may allocate local variables, and so forth. Calling conventions ensure that functions compiled by different compilers can interoperate, and they ensure that operating systems can run code from different programming languages and compilers. Some aspects of a calling convention are derived from the instruction set itself, but some are conventional, meaning decided upon by people (for instance, at a convention).
Calling conventions constrain both callers and callees. A caller is a function that calls another function; a callee is a function that was called. The currently-executing function is a callee, but not a caller.
For concreteness, we learn the x86-64 calling conventions for Linux. These conventions are shared by many OSes, including MacOS (but not Windows), and are officially called the "System V AMD64 ABI."
The official specification: AMD64 ABI
Argument passing and stack frames
One set of calling convention rules governs how function arguments and return values are passed. On x86-64 Linux, the first six function arguments are passed in registers %rdi, %rsi, %rdx, %rcx, %r8, and %r9, respectively. The seventh and subsequent arguments are passed on the stack, about which more below. The return value is passed in register %rax.
The full rules are more complex than this. You can read them in the AMD64 ABI, section 3.2.3, but they're quite detailed. Some highlights:
- A structure argument that fits in a single machine word (64 bits/8 bytes) is passed in a single register.
  Example: struct small { char a1, a2; }
- A structure that fits in two to four machine words (16–32 bytes) is passed in sequential registers, as if it were multiple arguments.
  Example: struct medium { long a1, a2; }
- A structure that's larger than four machine words is always passed on the stack.
  Example: struct big { long a, b, c, d, e, f, g; }
- Floating point arguments are generally passed in special registers, the "SSE registers," that we don't discuss further.
- If the return value takes more than eight bytes, then the caller reserves space for the return value, and passes the address of that space as the first argument of the function. The callee will fill in that space when it returns.
Writing small programs to demonstrate these rules is a pleasant exercise; for example:

    struct small { char a1, a2; };
    int f(small s) {
        return s.a1 + 2 * s.a2;
    }

compiles to:

    movl    %edi, %eax            # copy argument to %eax
    movsbl  %dil, %edi            # %edi := sign-extension of lowest byte of argument (s.a1)
    movsbl  %ah, %eax             # %eax := sign-extension of 2nd byte of argument (s.a2)
    movsbl  %al, %eax
    leal    (%rdi,%rax,2), %eax   # %eax := %edi + 2 * %eax
    ret

Stack
Recall that the stack is a segment of memory used to store objects with automatic lifetime. Typical stack addresses on x86-64 look like 0x7ffd'9f10'4f58 (that is, close to 2^47).
The stack is named after a data structure, which was sort of named after pancakes. Stack data structures support at least three operations: push adds a new element to the "top" of the stack; pop removes the top element, showing whatever was underneath; and top accesses the top element. Note what's missing: the data structure does not allow access to elements other than the top. (Which is sort of how stacks of pancakes work.) This restriction can speed up stack implementations.
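The three operations can be sketched in C++ as a thin wrapper over std::vector. This is only an illustration: the class name IntStack is made up for this example, and the standard library's std::stack offers the same interface.

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal stack: only the top element is reachable.
class IntStack {
public:
    // push: add a new element on top.
    void push(long value) { items_.push_back(value); }

    // pop: remove the top element, exposing whatever was underneath.
    void pop() {
        assert(!items_.empty());
        items_.pop_back();
    }

    // top: read the top element without removing it.
    long top() const {
        assert(!items_.empty());
        return items_.back();
    }

    bool empty() const { return items_.empty(); }

private:
    std::vector<long> items_;  // back() of the vector is the top of the stack
};
```

Because there is no way to index into the middle, the implementation only ever touches the end of its storage.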
Like a stack data structure, the stack memory segment is only accessed from the top. The currently running function accesses its local variables; the function's caller, grand-caller, great-grand-caller, and so forth are dormant until the currently running function returns.
x86-64 stacks look like this:

The x86-64 %rsp register is a special-purpose register that defines the current "stack pointer." This holds the address of the current top of the stack. On x86-64, as on many architectures, stacks grow down: a "push" operation adds space for more automatic-lifetime objects by moving the stack pointer left, to a numerically-smaller address, and a "pop" operation recycles space by moving the stack pointer right, to a numerically-larger address. This means that, considered numerically, the "top" of the stack has a smaller address than the "bottom."
This is built in to the architecture by the operation of instructions like pushq, popq, call, and ret. A push instruction pushes a value onto the stack. This both modifies the stack pointer (making it smaller) and modifies the stack segment (by moving data there). For instance, the instruction pushq X means:

    subq $8, %rsp
    movq X, (%rsp)

And popq X undoes the effect of pushq X. It means:

    movq (%rsp), X
    addq $8, %rsp

X can be a register or a memory reference.
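The two expansions can be modeled in a few lines of C++. This is a toy sketch, not how hardware works: ToyStack is a hypothetical type, its byte array stands in for the stack segment, and rsp is an offset into that array rather than a real address.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical toy model of a downward-growing stack.
struct ToyStack {
    std::vector<unsigned char> memory;  // stands in for the stack segment
    std::uint64_t rsp;                  // offset of the current top of stack

    // The stack starts empty, with rsp at the highest offset (the "bottom").
    explicit ToyStack(std::uint64_t size) : memory(size), rsp(size) {}

    // pushq X  ==  subq $8, %rsp; movq X, (%rsp)
    void pushq(std::uint64_t x) {
        assert(rsp >= 8);  // toy-model check: no room left otherwise
        rsp -= 8;
        std::memcpy(&memory[rsp], &x, 8);
    }

    // popq X  ==  movq (%rsp), X; addq $8, %rsp
    std::uint64_t popq() {
        assert(rsp + 8 <= memory.size());  // toy-model check: stack not empty
        std::uint64_t x;
        std::memcpy(&x, &memory[rsp], 8);
        rsp += 8;
        return x;
    }
};
```

Pushing makes rsp numerically smaller and popping makes it larger, which matches the "stacks grow down" description.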
The portion of the stack reserved for a function is called that function's stack frame. Stack frames are aligned: x86-64 requires that each stack frame be a multiple of 16 bytes, and when a callq instruction begins execution, the %rsp register must be 16-byte aligned. Since callq then pushes an 8-byte return address, every function's entry %rsp address will be 8 bytes off a multiple of 16.
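The alignment arithmetic is easy to check with a short C++ sketch. The helper names round_up_16 and entry_rsp are invented for this illustration, and the addresses are treated as plain integers.

```cpp
#include <cassert>
#include <cstdint>

// Round a frame size up to the next multiple of 16 bytes.
std::uint64_t round_up_16(std::uint64_t size) {
    return (size + 15) & ~std::uint64_t{15};
}

// Given %rsp at the moment callq begins (which must be 16-byte aligned),
// return the callee's entry %rsp.
std::uint64_t entry_rsp(std::uint64_t rsp_at_call) {
    assert(rsp_at_call % 16 == 0);
    return rsp_at_call - 8;  // callq pushes an 8-byte return address
}
```

For any aligned input, entry_rsp(x) % 16 comes out as 8, which is the "8 bytes off a multiple of 16" property.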
Return address and entry and exit sequence
The steps required to call a function are sometimes called the entry sequence and the steps required to return are called the exit sequence. Both caller and callee have responsibilities in each sequence.
To prepare for a function call, the caller performs the following tasks in its entry sequence.
- The caller stores the first six arguments in the corresponding registers.
- If the callee takes more than six arguments, or if some of its arguments are large, the caller must store the surplus arguments on its stack frame. It stores these in increasing order, so that the 7th argument has a smaller address than the 8th argument, and so forth. The 7th argument must be stored at (%rsp) (that is, the top of the stack) when the caller executes its callq instruction.
- The caller saves any caller-saved registers (see below).
- The caller executes callq FUNCTION. This has an effect like pushq $NEXT_INSTRUCTION; jmp FUNCTION (or, equivalently, subq $8, %rsp; movq $NEXT_INSTRUCTION, (%rsp); jmp FUNCTION), where NEXT_INSTRUCTION is the address of the instruction immediately following callq.
This leaves a stack like this:

To return from a function:
- The callee places its return value in %rax.
- The callee restores the stack pointer to its value at entry ("entry %rsp"), if necessary.
- The callee executes the retq instruction. This has an effect like popq %rip, which removes the return address from the stack and jumps to that address.
- The caller then cleans up any space it prepared for arguments and restores caller-saved registers if necessary.
Particularly simple callees don't need to do much more than return, but most callees will perform more tasks, such as allocating space for local variables and calling functions themselves.
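The effect of callq and retq on %rsp and %rip can be sketched with a toy C++ model. Everything here is an assumption made for illustration: ToyCpu is a hypothetical type, the stack is a map from address to 8-byte word, and the addresses are arbitrary.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical toy CPU with just the two registers the call sequence touches.
struct ToyCpu {
    std::uint64_t rip = 0;  // address of the next instruction
    std::uint64_t rsp = 0;  // current top of stack
    std::map<std::uint64_t, std::uint64_t> stack;  // address -> 8-byte word

    // callq FUNCTION  ==  pushq $NEXT_INSTRUCTION; jmp FUNCTION
    void callq(std::uint64_t function, std::uint64_t next_instruction) {
        rsp -= 8;
        stack[rsp] = next_instruction;  // the return address
        rip = function;
    }

    // retq  ==  popq %rip
    void retq() {
        assert(stack.count(rsp) == 1);  // %rsp must be back at its entry value
        rip = stack[rsp];
        stack.erase(rsp);
        rsp += 8;
    }
};
```

The assertion in retq reflects the callee's responsibility to restore the entry %rsp before returning: otherwise the word it pops would not be the return address.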
Callee-saved registers and caller-saved registers
The calling convention gives callers and callees certain guarantees and responsibilities about the values of registers across function calls. Function implementations may expect these guarantees to hold, and must work to fulfill their responsibilities.
The most important responsibility is that certain registers' values must be preserved across function calls. A callee may use these registers, but if it changes them, it must restore them to their original values before returning. These registers are called callee-saved registers. All other registers are caller-saved.
Callers can simply use callee-saved registers across function calls; in this sense they behave like C++ local variables. Caller-saved registers behave differently: if a caller wants to preserve the value of a caller-saved register across a function call, the caller must explicitly save it before the callq and restore it when the function resumes.
On x86-64 Linux, %rbp, %rbx, %r12, %r13, %r14, and %r15 are callee-saved, as (sort of) are %rsp and %rip. The other registers are caller-saved.
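As a quick reference, the register list above can be written as a small C++ lookup. The function name is_callee_saved is made up for this sketch, and it deliberately leaves out %rsp and %rip, which the text describes as only "sort of" callee-saved.

```cpp
#include <cassert>
#include <set>
#include <string>

// Returns true for the general-purpose registers that a callee must
// preserve on x86-64 Linux, according to the list in the text.
bool is_callee_saved(const std::string& reg) {
    assert(!reg.empty() && reg[0] == '%');  // AT&T-style names, e.g. "%rbx"
    static const std::set<std::string> callee_saved = {
        "%rbp", "%rbx", "%r12", "%r13", "%r14", "%r15"
    };
    return callee_saved.count(reg) != 0;
}
```

A caller that keeps a live value in a register for which this returns false has to save that value itself before executing callq.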
Base pointer (frame pointer)
The %rbp register is called the base pointer (and sometimes the frame pointer). For simple functions, an optimizing compiler generally treats this like any other callee-saved general-purpose register. However, for more complex functions, %rbp is used in a specific pattern that facilitates debugging. It works like this:

- The first instruction executed on function entry is pushq %rbp. This saves the caller's value for %rbp into the callee's stack. (Since %rbp is callee-saved, the callee must save it.)
- The second instruction is movq %rsp, %rbp. This saves the current stack pointer in %rbp (so %rbp = entry %rsp - 8).

  This adjusted value of %rbp is the callee's "frame pointer." The callee will not change this value until it returns. The frame pointer provides a stable reference point for local variables and caller arguments. (Complex functions may need a stable reference point because they reserve varying amounts of space for calling different functions.)

  Note, also, that the value stored at (%rbp) is the caller's %rbp, and the value stored at 8(%rbp) is the return address. This information can be used to trace backwards through callers' stack frames by functions such as debuggers.
- The function ends with movq %rbp, %rsp; popq %rbp; retq, or, equivalently, leave; retq. This sequence restores the caller's %rbp and entry %rsp before returning.
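The way a debugger can trace backwards through frames is sketched below in C++. This is a toy model under stated assumptions: memory is a map from address to 8-byte word, the helper name backtrace is invented here, and the outermost frame is assumed to hold a saved %rbp of 0 to mark the end of the chain.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using Memory = std::map<std::uint64_t, std::uint64_t>;  // address -> 8-byte word

// Collect return addresses, innermost first, by following saved %rbp values.
std::vector<std::uint64_t> backtrace(const Memory& mem, std::uint64_t rbp) {
    std::vector<std::uint64_t> return_addresses;
    while (rbp != 0) {
        return_addresses.push_back(mem.at(rbp + 8));  // 8(%rbp): return address
        std::uint64_t caller_rbp = mem.at(rbp);        // (%rbp): caller's %rbp
        // Stacks grow down, so a caller's frame sits at a higher address.
        assert(caller_rbp == 0 || caller_rbp > rbp);
        rbp = caller_rbp;
    }
    return return_addresses;
}
```

Each loop iteration steps from one frame to its caller's frame, which is exactly the linked list that the pushq %rbp; movq %rsp, %rbp pattern builds.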
Stack size and red zone
Functions execute fast because allocating space within a function is simply a matter of decrementing %rsp. This is much cheaper than a call to malloc or new! But making this work takes a lot of machinery. We'll see this in more detail later; but in brief: The operating system knows that %rsp points to the stack, so if a function accesses nonexistent memory near %rsp, the OS assumes it's for the stack and transparently allocates new memory there.
So how can a program "run out of stack"? The operating system puts a limit on each process's stack, and if %rsp gets too low, the program segmentation faults.
The diagram above also shows a nice feature of the x86-64 architecture, namely the red zone. This is a small area above the stack pointer (that is, at lower addresses than %rsp) that can be used by the currently-running function for local variables. The red zone is nice because it can be used without mucking around with the stack pointer; for small functions push and pop instructions end up taking time.
Branches
The processor typically executes instructions in sequence, incrementing %rip each time. Deviations from sequential instruction execution, such as function calls, are called control flow transfers.
Function calls aren't the only kind of control flow transfer. A branch instruction jumps to a new instruction without saving a return address on the stack.
Branches come in two flavors, unconditional and conditional. The jmp or j instruction executes an unconditional branch (like a goto). All other branch instructions are conditional: they only branch if some condition holds. That condition is represented by condition flags that are set as a side effect of every arithmetic operation.
Arithmetic instructions change part of the %rflags register as a side effect of their operation. The most often used flags are:
- ZF (zero flag): set iff the result was zero.
- SF (sign flag): set iff the most significant bit (the sign bit) of the result was one (i.e., the result was negative if considered as a signed integer).
- CF (carry flag): set iff the result overflowed when considered as unsigned (i.e., the result was greater than 2^W - 1, where W is the operand width in bits).
- OF (overflow flag): set iff the result overflowed when considered as signed (i.e., the result was greater than 2^(W-1) - 1 or less than -2^(W-1)).
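These definitions can be tried out in C++ for a small width. The sketch below models an addition with W = 8; the names AddFlags and add8_flags are made up for this example, and the model covers only the four flags listed above.

```cpp
#include <cassert>
#include <cstdint>

struct AddFlags { bool zf, sf, cf, of; };

// Flags produced by adding two 8-bit values (W = 8).
AddFlags add8_flags(std::uint8_t a, std::uint8_t b) {
    unsigned wide = static_cast<unsigned>(a) + static_cast<unsigned>(b);
    std::uint8_t result = static_cast<std::uint8_t>(wide);  // keep the low 8 bits
    int signed_sum = static_cast<std::int8_t>(a) + static_cast<std::int8_t>(b);

    AddFlags f;
    f.zf = (result == 0);
    f.sf = (result & 0x80) != 0;                   // most significant bit
    f.cf = wide > 0xFF;                            // greater than 2^W - 1
    f.of = signed_sum > 127 || signed_sum < -128;  // outside the signed range

    // Cross-check with the usual bit trick: signed overflow happens exactly
    // when both inputs share a sign bit that differs from the result's.
    assert(f.of == (((a ^ result) & (b ^ result) & 0x80) != 0));
    return f;
}
```

For instance, adding 0x7F and 0x01 sets SF and OF but not CF, while adding 0xFF and 0x01 sets ZF and CF but not OF.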
Although some instructions let you load specific flags into registers (e.g., setz; see CS:APP3e §3.6.2, p203), code more often accesses them via conditional jump or conditional move instructions.
Instruction | Mnemonic | C example | Flags |
---|---|---|---|
j (jmp) | Jump | break; | (Unconditional) |
je (jz) | Jump if equal (zero) | if (x == y) | ZF |
jne (jnz) | Jump if not equal (nonzero) | if (x != y) | !ZF |
jg (jnle) | Jump if greater | if (x > y) , signed | !ZF && !(SF ^ OF) |
jge (jnl) | Jump if greater or equal | if (x >= y) , signed | !(SF ^ OF) |
jl (jnge) | Jump if less | if (x < y) , signed | SF ^ OF |
jle (jng) | Jump if less or equal | if (x <= y) , signed | (SF ^ OF) || ZF |
ja (jnbe) | Jump if above | if (x > y) , unsigned | !CF && !ZF |
jae (jnb) | Jump if above or equal | if (x >= y) , unsigned | !CF |
jb (jnae) | Jump if below | if (x < y) , unsigned | CF |
jbe (jna) | Jump if below or equal | if (x <= y) , unsigned | CF || ZF |
js | Jump if sign bit | if (x < 0) , signed | SF |
jns | Jump if not sign bit | if (x >= 0) , signed | !SF |
jc | Jump if carry bit | N/A | CF |
jnc | Jump if not carry bit | N/A | !CF |
jo | Jump if overflow bit | N/A | OF |
jno | Jump if not overflow bit | N/A | !OF |
The test and cmp instructions are frequently seen before a conditional branch. These operations perform arithmetic but throw away the result, except for condition codes. test performs binary-and, cmp performs subtraction.
cmp is hard to grasp: remember that subq %rax, %rbx performs %rbx := %rbx - %rax, so the operand that is both source and destination (%rbx) is written second. Then cmpq %rax, %rbx evaluates %rbx - %rax. The sequence cmpq %rax, %rbx; jg L will jump to label L if and only if %rbx is greater than %rax (signed).
The weird-looking instruction testq %rax, %rax, or more generally testq REG, SAMEREG, is used to load the condition flags appropriately for a single register. For example, the bitwise-and of %rax and %rax is zero if and only if %rax is zero, so testq %rax, %rax; je L jumps to L if and only if %rax is zero.
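The operand order of cmp and the test REG, SAMEREG idiom can be checked with a small C++ model. This is a sketch with 8-bit operands to keep exhaustive checking cheap; the names CmpFlags, cmp8, test8, jg_taken, ja_taken, and je_taken are invented for this example, and the branch conditions are the flag expressions usually given for these jumps.

```cpp
#include <cassert>
#include <cstdint>

struct CmpFlags { bool zf, sf, cf, of; };

// cmp SRC, DST: compute DST - SRC, set the flags, discard the result.
CmpFlags cmp8(std::int8_t src, std::int8_t dst) {
    int wide = static_cast<int>(dst) - static_cast<int>(src);
    std::uint8_t result = static_cast<std::uint8_t>(wide);
    CmpFlags f;
    f.zf = (result == 0);
    f.sf = (result & 0x80) != 0;
    f.cf = static_cast<std::uint8_t>(dst) < static_cast<std::uint8_t>(src);  // unsigned borrow
    f.of = wide > 127 || wide < -128;                                        // signed overflow
    assert(f.zf == (dst == src));  // equal operands always give ZF
    return f;
}

// test SRC, DST: compute DST & SRC, set ZF and SF, clear CF and OF.
CmpFlags test8(std::int8_t src, std::int8_t dst) {
    std::uint8_t result = static_cast<std::uint8_t>(dst) & static_cast<std::uint8_t>(src);
    CmpFlags f;
    f.zf = (result == 0);
    f.sf = (result & 0x80) != 0;
    f.cf = false;
    f.of = false;
    return f;
}

bool jg_taken(const CmpFlags& f) { return !f.zf && !(f.sf ^ f.of); }
bool ja_taken(const CmpFlags& f) { return !f.cf && !f.zf; }
bool je_taken(const CmpFlags& f) { return f.zf; }
```

With this model, jg_taken(cmp8(a, b)) corresponds to cmp a, b; jg L and is true exactly when b > a as signed values, mirroring the cmpq %rax, %rbx; jg L example.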
C++ compilers and data structure implementations have been designed to avoid the so-called abstraction penalty, which is when convenient data structures compile to more and more-expensive instructions than simple, raw memory accesses. When this works, it works quite well; for example, this:

    long f(std::vector<int>& v) {
        long sum = 0;
        for (auto& i : v) {
            sum += i;
        }
        return sum;
    }

compiles to this, a very tight loop similar to the C version:

        movq    (%rdi), %rax
        movq    8(%rdi), %rcx
        cmpq    %rcx, %rax
        je      .L4
        movq    %rax, %rdx
        addq    $4, %rax
        subq    %rax, %rcx
        andq    $-4, %rcx
        addq    %rax, %rcx
        movl    $0, %eax
    .L3:
        movslq  (%rdx), %rsi
        addq    %rsi, %rax
        addq    $4, %rdx
        cmpq    %rcx, %rdx
        jne     .L3
        rep ret
    .L4:
        movl    $0, %eax
        ret

We can also use this output to infer some aspects of std::vector's implementation. It looks like:
- The first element of a std::vector structure is a pointer to the first element of the vector;
- The elements are stored in memory in a simple array;
- The second element of a std::vector structure is a pointer to one-past-the-end of the elements of the vector (i.e., if the vector is empty, the first and second elements of the structure have the same value).
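The inferred layout can be written down as a C++ sketch. IntVectorView and sum are hypothetical names used only here; real std::vector implementations are not required to look like this and typically carry more state, such as capacity.

```cpp
#include <cassert>

// Hypothetical two-pointer layout inferred from the assembly.
struct IntVectorView {
    int* first;  // offset 0: pointer to the first element
    int* last;   // offset 8: pointer to one past the last element
};

// Same loop shape as the compiled code: walk a pointer from first to last.
long sum(const IntVectorView& v) {
    assert(v.first <= v.last);
    long total = 0;
    for (const int* p = v.first; p != v.last; ++p) {  // cmpq %rcx, %rdx; jne .L3
        total += *p;                                   // movslq (%rdx), %rsi; addq %rsi, %rax
    }
    return total;
}
```

An empty view has first == last, so the loop body never runs and the result is 0, matching the je .L4 path in the assembly.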
Source: https://cs61.seas.harvard.edu/site/2018/Asm2/
Posted by: thomsonhise1955.blogspot.com