Wednesday, January 28, 2015

x86 Calling Conventions

I have used the Wikipedia article as my starting point. Then I have added some information that I have found experimentally (by looking at the generated ASM code) and I've written this article in a form that was useful to me.

Introduction

A calling convention determines the way parameters and return values are passed to/from functions. All calling conventions use the stack for parameters; additionally, processor registers can be used.

Call Stack

The call stack is a special memory area, storing:
  • Function parameters and return values
  • Local variables
  • Additional function call information (address of the instruction to return to, original stack pointer etc.)
The processor has special instructions for working with the stack; notably PUSH, POP, CALL and RET.
The x86 call stack grows, perhaps counter-intuitively, from higher memory addresses to lower ones.
Since x86 is a 32-bit architecture, everything on the stack is in 32-bit chunks:
  • Values that have fewer than 32 bits are expanded to 32 bits.
  • Values that have more than 32 bits are split into 32-bit chunks.
  • Structures with sizes that are not a multiple of 32 bits are padded to the next multiple of 32 bits.
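The rounding rule above can be sketched in C. The helper name stack_bytes is hypothetical; it only illustrates the arithmetic for a 32-bit stack and is not taken from any compiler.

```c
#include <stddef.h>

/* Hypothetical helper: number of bytes an argument of the given size
   occupies on a 32-bit x86 stack, i.e. its size rounded up to a
   multiple of 4 bytes. */
size_t stack_bytes(size_t size)
{
    return (size + 3u) & ~(size_t)3u;
}
```

For example, a 1-byte char takes 4 bytes, an 8-byte double takes 8 bytes (two slots), and a 5-byte structure takes 8 bytes.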

Parameter Order

The calling convention determines the order in which the parameters are pushed on the stack. It can be either left-to-right or right-to-left.
For example, given the function call:
f(p1, p2, p3, p4, p5)
We can pass the parameters in two ways:
Left-to-Right (p1 is pushed first)
  Top (lower address)
    p5
    p4
    p3
    p2
    p1
  Bottom (higher address)
Right-to-Left (p5 is pushed first)
  Top (lower address)
    p1
    p2
    p3
    p4
    p5
  Bottom (higher address)
In both cases the stack grows toward lower addresses, so the parameter pushed last ends up on top.
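The left-to-right (a.k.a. Pascal) order is the obvious one. However, the right-to-left (a.k.a. C) order has the advantage that it supports functions with a variable number of parameters: the first parameter is pushed last, so the called function always finds it at a fixed position near the top of the stack, no matter how many parameters follow. (Of course, for this to work, the caller must clean up the stack.) This enables C to have functions like printf.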
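As a small illustration of a function with a variable number of parameters, here is a sketch using the standard stdarg.h macros. The function name sum_ints is made up for this example; the point is that the called function only knows how many values to read because the first, fixed parameter tells it.

```c
#include <stdarg.h>

/* Sums `count` int arguments. Only the caller knows how many
   arguments were really passed; the callee trusts `count`. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;
    int i;

    va_start(args, count);
    for (i = 0; i < count; ++i)
        total += va_arg(args, int);
    va_end(args);
    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) returns 6. printf works the same way, except that it derives the number and types of the arguments from the format string.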

Stack Clean-up

Either the calling or the called function can restore the stack to its position before the call (i.e. by changing the ESP register).
It is more convenient if the called function restores the stack, because the clean-up code then exists once, in the function itself, instead of after every call. However, for functions with a variable number of arguments, this doesn’t work, because the called function cannot know how many arguments were passed. Instead, the calling function must clean up the stack.

Function Name Decoration (“Mangling”)

While not part of the calling convention, per se, compilers decorate the function names according to their calling convention. For example, void Fn(int) is decorated to _Fn@4 in the stdcall calling convention (the number after the @ sign is the number of bytes on the stack).
When exporting functions from a DLL, the exported name of the function can be changed (it’s usually exported without any decoration).
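The stdcall decoration rule can be sketched as a string-building helper. The name decorate_stdcall is hypothetical; real compilers do this internally, and the sketch assumes you already know how many bytes the arguments occupy on the stack.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes the stdcall-decorated form of `name`
   into `out`, for a function whose arguments take `arg_bytes` bytes
   on the stack. Returns what snprintf returns. */
int decorate_stdcall(char *out, size_t out_size,
                     const char *name, unsigned arg_bytes)
{
    return snprintf(out, out_size, "_%s@%u", name, arg_bytes);
}
```

With name "Fn" and 4 bytes of arguments, the buffer receives _Fn@4, matching the void Fn(int) example.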

stdcall Calling Convention

This is the calling convention used by the Windows API.
Decoration: names are prefixed with an underscore (_) and are followed by an @ sign and the number of bytes required on the stack; for example: void Fn(int) is decorated to _Fn@4.
Note: the return value is not counted, even when it takes up space on the stack.
Arguments are passed right-to-left (C order) [1]; the called function cleans up the stack.
[1] However, the 16-bit Windows API used the left-to-right (Pascal) order for arguments.
All arguments are passed on the stack.
  • Integers: Arguments smaller than 32 bits are enlarged to 32 bits. 64-bit arguments are passed as two 32-bit values; the upper half is pushed first, then the lower half. That means that they are in the normal order in the stack memory (i.e. little endian).
  • Floats are 32 bits wide; they are passed in one stack position.
  • Doubles are 64 bits wide; they are passed in two stack positions (similar to 64-bit integers).
  • Pointers are treated as 32 bit integers.
  • Arrays are passed by reference, i.e. a pointer to their start is passed as a 32 bit value.
  • Structures are just copied on the stack, extended to a multiple of 32 bits.
Return values:
  • 32 bit integer results (this includes pointers) are returned in the EAX register. Integers shorter than 32 bits are extended to 32 bits.
  • 64 bit integer results are returned in EAX (lower half) and EDX (upper half).
  • Floating point results are returned in ST(0), which is 80 bits wide. From here, they can be copied to float or double variables, with a corresponding precision loss.
  • Structures:
    • 1, 2, 4, 8 byte structures are returned in EAX and EDX (upper half, if necessary) but not 3, 5, 6, 7 byte structures!
      NOTE: The IBM Visual Age C++ 3.5 compiler doesn’t do this; it just puts the pointer on the stack! Also the functions don’t take this into account when cleaning up.
    • Everything else is not returned per se; instead, the caller reserves space for the structure and pushes a pointer to its start on the top of the stack.
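To make the EAX/EDX split of a 64-bit result concrete, here is a sketch in portable C. The Halves structure and the split64/join64 names are invented for illustration; the fields are merely named after the registers that would carry each half.

```c
#include <stdint.h>

/* The two 32-bit halves of a 64-bit integer, named after the
   registers that carry them when the value is returned. */
typedef struct Halves {
    uint32_t eax; /* lower half */
    uint32_t edx; /* upper half */
} Halves;

/* Splits a 64-bit value into its lower and upper 32 bits. */
Halves split64(uint64_t value)
{
    Halves h;
    h.eax = (uint32_t)(value & 0xFFFFFFFFu);
    h.edx = (uint32_t)(value >> 32);
    return h;
}

/* Reassembles the 64-bit value from the two halves. */
uint64_t join64(Halves h)
{
    return ((uint64_t)h.edx << 32) | h.eax;
}
```

The same split applies to 64-bit arguments: the upper half is pushed first, so the lower half ends up at the lower address.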

cdecl Calling Convention

This is the default calling convention used by the Microsoft Visual C++ compiler; it’s also supported by most compilers on Windows.
(It differs from the stdcall calling convention in the name decoration and in who cleans up the stack.)
Decoration: names are prefixed with an underscore (_); for example: void Fn(int) is decorated to _Fn. (But, according to MSDN, they are not decorated when exported from DLLs.)
Arguments are passed right-to-left (C order); the calling function cleans up the stack – this enables calling functions with a variable number of parameters.
All arguments are passed on the stack.
  • Integers: Arguments smaller than 32 bits are enlarged to 32 bits. 64-bit arguments are passed as two 32-bit values; the upper half is pushed first, then the lower half. That means that they are in the normal order in the stack memory (i.e. little endian).
  • Floats are 32 bits wide; they are passed in one stack position.
  • Doubles are 64 bits wide; they are passed in two stack positions (similar to 64-bit integers).
  • Pointers are treated as 32 bit integers.
  • Arrays are passed by reference, i.e. a pointer to their start is passed as a 32 bit value.
  • Structures are just copied on the stack, extended to a multiple of 32 bits.
Return values:
  • 32 bit integer results (this includes pointers) are returned in the EAX register. Integers shorter than 32 bits are extended to 32 bits.
  • 64 bit integer results are returned in EAX (lower half) and EDX (upper half).
  • Floating point results are returned in ST(0), which is 80 bits wide. From here, they can be copied to float or double variables, with a corresponding precision loss.
  • Structures:
    • 1, 2, 4, 8 byte structures are returned in EAX and EDX (upper half, if necessary) but not 3, 5, 6, 7 byte structures!
    • Everything else is not returned per se; instead, the caller reserves space for the structure and pushes a pointer to its start on the top of the stack.

optlink Calling Convention

This is the default calling convention used by the IBM Visual Age compiler.
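(It differs from the stdcall calling convention in the name decoration, in who cleans up the stack, in the use of registers for arguments and in how structures are returned.)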
Decoration: names are prefixed with a question mark (?); for example: void Fn(int) is decorated to ?Fn.
Arguments are passed right-to-left (C order); the calling function cleans up the stack – this enables calling functions with a variable number of parameters.
All arguments are passed on the stack and in certain registers.
  • Integers: Arguments smaller than 32 bits are enlarged to 32 bits. 64-bit arguments are passed as two 32-bit values; the upper half is pushed first, then the lower half. That means that they are in the normal order in the stack memory (i.e. little endian).
  • Floats are 32 bits wide; they are passed in one stack position.
  • Doubles are 64 bits wide; they are passed in two stack positions (similar to 64-bit integers).
  • Pointers are treated as 32 bit integers.
  • Arrays are passed by reference, i.e. a pointer to their start is passed as a 32 bit value.
  • Structures are just copied on the stack, extended to a multiple of 32 bits.
The first 3 integer values are passed in the EAX, EDX and ECX registers (1st argument in EAX, 2nd in EDX, 3rd in ECX). A 64-bit value is treated simply as two 32-bit values; if the 3rd argument is a 64-bit integer, ECX will contain the lower half and the rest will be on the stack. The first 4 floating point values are passed in four floating point registers: 1st value in ST(0), 2nd in ST(1), 3rd in ST(2) and 4th in ST(3). These registers are 80 bits wide, so every value takes up just one register.
If all values fit into registers, nothing is passed onto the stack. However, if they don’t, space is left on the stack, in the regular positions, even for the arguments passed in registers. (The IBM compiler doesn’t initialize this stack space.)
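The register assignment for integer arguments described above can be summarized in a small lookup. The function optlink_int_location is a hypothetical name for this sketch; it only encodes the rule stated in the text (first three 32-bit integer values in EAX, EDX, ECX, the rest on the stack) and ignores floating point arguments.

```c
#include <stddef.h>

/* Hypothetical helper: where the 32-bit integer value at `index`
   (0-based, counted left to right) is passed under optlink. */
const char *optlink_int_location(size_t index)
{
    static const char *const registers[] = { "EAX", "EDX", "ECX" };

    if (index < sizeof registers / sizeof registers[0])
        return registers[index];
    return "stack";
}
```

A 64-bit integer counts as two consecutive values, so one passed as the 3rd argument occupies index 2 (a register) and index 3 (the stack).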
Return values:
  • 32 bit integer results (this includes pointers) are returned in the EAX register. Integers shorter than 32 bits are extended to 32 bits.
  • 64 bit integer results are returned in EAX (lower half) and EDX (upper half).
  • Floating point results are returned in ST(0), which is 80 bits wide. From here, they can be copied to float or double variables, with a corresponding precision loss.
  • Structures are not returned per se; instead, the caller reserves space for the structure and pushes a pointer to its start on the top of the stack, regardless of its size.

Tuesday, January 27, 2015

Pointer Safety Tips

0. Understand the difference between global (static), stack, heap (free memory) and unallocated memory.

A pointer is just the address of a memory location. However, you must consider what you are pointing to, and how long the pointed object “lives”.
The objects that live the longest are global objects. They are “alive” as long as the program runs. A global object is simply any variable defined outside a function.
Additionally, local variables defined as static are also valid as long as the program runs – they are global variables that are only visible in the block that defines them.
The stack is a special memory area that holds local variables, function parameters (which are, in fact, also local variables), and function call information. The stack is allocated by the operating system for each thread; there are special processor instructions (and registers) for dealing with the function call.
We don’t need to care about allocating and releasing memory for the local variables; the compiler does this for us. However, we must remember that local variables live only as long as the block that defines them – so never return a pointer to a local variable.
The free store, or heap, is where malloc allocates memory. The responsibility for allocating and releasing memory from the heap falls to the programmer.
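The lifetimes described above can be put side by side in a short sketch. All names here (g_calls, next_id, make_int) are invented for the illustration.

```c
#include <stdlib.h>

int g_calls = 0;              /* global: lives as long as the program runs */

/* Returns a pointer to a static local, which is safe because
   the variable outlives the call. */
int *next_id(void)
{
    static int id = 0;        /* static local: same lifetime as a global */
    int step = 1;             /* stack: gone when the function returns */

    ++g_calls;
    id += step;
    return &id;               /* returning &step instead would be a bug */
}

/* Returns a heap block holding `value`, or NULL on failure.
   The caller owns the block and must free it. */
int *make_int(int value)
{
    int *p = malloc(sizeof *p);   /* heap: lives until free(p) */

    if (p)
        *p = value;
    return p;
}
```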

1. Prefer local variables.

Of course, you don’t always have a choice. You may not know how many objects you have until runtime. Or the object might be too big for the stack. But for simple cases, you don’t need the headache.
int * pArray = calloc(3, sizeof(int));
pArray[0] = 3;
pArray[1] = 7;
pArray[2] = 12;
DoSomething(pArray, 3);
free(pArray);
In this case, we know the whole array in advance, so we can keep the code simple:
int array[] = { 3, 7, 12 };
DoSomething(array, ARRAYSIZE(array));

2. Never return a pointer to a local variable from a function.

This is obvious: the variable is only valid during the function call. Of course, you can return the address of a static local variable, since this is valid for the whole duration of the program.
int *a(void) {
    int x = 0;
    static int y = 12;
    // ...
    printf("&x = %p\n", (void *)&x); // OK, x is valid for the whole duration of the printf call.
    return &x; // Bad, x ceases to be valid when exiting the function.
    return &y; // OK, y is static. (Unreachable here; shown only for comparison.)
}

3. Initialize any variable you define.

This piece of advice is valid regardless of the actual data type.
By convention, any pointer that doesn’t point anywhere should be initialized with NULL.

4. Allocated memory is not initialized. You must initialize it yourself.

Well, you might argue that calloc does initialize the memory. That’s true:
int * p = malloc(sizeof(int));
memset(p, 0, sizeof(int));
is equivalent to:
int * p = calloc(1, sizeof(int));
But there are more complicated cases:
int *p1 = realloc(p, sizeof(int) * 2);
p1[1] = 0; // Only the new, second element needs initializing.
In this case, we want to extend the allocated memory area, from enough for one integer to two. The first integer will keep its value, but the second will not be initialized, so we need to do it ourselves.
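The same idea can be wrapped in a helper that grows an array and zeroes only the newly added part. The name grow_zeroed is hypothetical, and the sketch assumes new_count is greater than zero and that new_count * sizeof(int) does not overflow.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: grows an int array from old_count to new_count
   elements and zeroes the new tail. Returns NULL on failure, in which
   case the original block is untouched and still owned by the caller. */
int *grow_zeroed(int *p, size_t old_count, size_t new_count)
{
    int *q = realloc(p, new_count * sizeof *q);

    if (q && new_count > old_count)
        memset(q + old_count, 0, (new_count - old_count) * sizeof *q);
    return q;
}
```

The existing elements keep their values; only the elements from old_count onwards are set to zero.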

5. Everything that was allocated must be freed, or you’ll have a memory “leak”.

It’s always fun to watch the memory consumption for this program, using Process Explorer or a similar tool:
size_t i;
for (i = 0; i < SIZE_MAX; ++i) {
    int * p = malloc(sizeof(int));
    if (!p)
        break;
}
Hmm, kind of boring; let’s make it leak faster:
size_t i;
for (i = 0; i < SIZE_MAX; ++i) {
    int * p = malloc(sizeof(int) * 100000);
    if (!p)
        break;
}
It’s true that the operating system frees all allocated memory after a program ends its execution. Still, the leaked memory means less memory for the other programs running. It can also mean that your program will run out of memory instead of keeping on running for days or years.

6. Freed memory looks just like allocated memory* – always set pointers to NULL after deallocating the memory.

* Just the same as, in the chemistry lab, hot glass looks just like cold glass.

Quick now: what does this program print?
int *p = calloc(1000, sizeof(int));
printf("p = %x\n", p[3]);
free(p);

printf("p = %x\n", p[3]);
The answer is: it depends. If you use Visual Studio, it will probably show:
p = 0
p = feeefeee
But it might also print:
p = 0
p = 0
Why? Because memory keeps its value until overwritten. In general, deallocation does not overwrite the freed memory – as you know, C was designed to be fast (or to let you shoot yourself in the foot quickly). Formally, reading freed memory is undefined behavior, so neither output is guaranteed.
So, to prevent problems, always set pointers to NULL after freeing them.
free(p);
p = NULL;
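If you want to make the two steps hard to forget, one option is a small macro. FREE_AND_NULL is a made-up name for this sketch, not a standard facility.

```c
#include <stdlib.h>

/* Hypothetical macro: frees the block and resets the pointer in one step.
   The argument is evaluated twice, so pass a plain pointer variable. */
#define FREE_AND_NULL(ptr) do { free(ptr); (ptr) = NULL; } while (0)
```

Since free(NULL) does nothing, using the macro twice on the same variable is harmless, whereas calling free twice on a dangling pointer is not.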

For advanced practitioners: sometimes even this may not be enough.

int *p = calloc(1000, sizeof(int));
int *q = p + 3;
printf("p = %x\n", p[3]);
printf("q = %x\n", *q);
free(p);
p = NULL;

printf("q = %x\n", *q);
Here, we have several pointers. While we set the pointer to the beginning of the block to NULL, we may not know who else references our memory. The code might even work correctly, using the “ghost” values from the freed memory, until a change in a completely different place will mysteriously modify the value pointed to by q.
So how would you debug a situation like this? I would overwrite the whole memory block with 0 (or another suitable value) just before freeing it:
memset(p, 0, 1000 * sizeof(int)); // The size is in bytes, not in elements.
free(p);
p = NULL;
This, of course, will not solve the problem, but it might help find the bug faster – the value that q points to will change after freeing p.

7. Always check that the allocation was successful.

In general, you should always check the return code of any function.
int *p = malloc(1000000000 * sizeof(int));
p[0] = 13;
free(p);
What’s wrong here? It’s not the fact that we try to allocate enough space for a billion integers, but the fact that we never check if we actually got it.
Let’s try again:
int *p = malloc(1000000000 * sizeof(int));
if (!p) {
    fprintf(stderr, "Not enough memory.\n");
}
else {
    p[0] = 13;
    free(p);
}
Easy, right?
Let’s try a more complicated case:
typedef struct Employee {
    char name[10];
    // ...
} Employee;

typedef struct Employees {
    size_t count;
    Employee *pList;
} Employees;

bool AddEmployee(Employees *pEmployees, const Employee *pPers) {
    assert(pEmployees);
    assert(pPers);
    if (pEmployees->count == 0) {
        assert(pEmployees->pList == NULL);
        if (NULL == (pEmployees->pList = malloc(sizeof(Employee))))
            return false;
        pEmployees->pList[0] = *pPers;
        ++pEmployees->count;
        return true;
    }
    else {
        assert(pEmployees->pList != NULL);
        if (NULL == (pEmployees->pList = realloc(pEmployees->pList,
                sizeof(Employee) * (pEmployees->count + 1))))
            return false;
        pEmployees->pList[pEmployees->count] = *pPers;
        ++pEmployees->count;
        return true;
    }
}
At first sight, this looks correct. We verify if the allocation was successful and we stop if it didn’t work. Maybe it could’ve been written using only realloc, but that’s not our main concern now.
However, a quick look in the documentation tells us that realloc returns NULL on failure and leaves the original block allocated. What we’ve created here is a memory leak – if there is not enough room for a new employee, we overwrite our only pointer to the list with NULL and lose the list (even though it’s still allocated).
The correct approach would be to use a temporary variable:
Employee *pNewList = NULL;
assert(pEmployees->pList != NULL);
if (NULL == (pNewList = realloc(pEmployees->pList, sizeof(Employee) * (pEmployees->count + 1))))
    return false;
pEmployees->pList = pNewList;
pEmployees->pList[pEmployees->count] = *pPers;
++pEmployees->count;
return true;
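As mentioned earlier, the function could be written using only realloc. Here is a sketch of that variant, relying on the fact that realloc(NULL, n) behaves like malloc(n). The Employee structure is reduced to a single field to keep the example self-contained, and the asserts are left out.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct Employee {
    char name[10];
} Employee;

typedef struct Employees {
    size_t count;
    Employee *pList;
} Employees;

/* Appends a copy of *pPers. On failure the existing list is left intact. */
bool AddEmployee(Employees *pEmployees, const Employee *pPers)
{
    /* realloc(NULL, n) behaves like malloc(n), so the empty list
       needs no special case. */
    Employee *pNewList = realloc(pEmployees->pList,
                                 sizeof(Employee) * (pEmployees->count + 1));
    if (!pNewList)
        return false;
    pEmployees->pList = pNewList;
    pEmployees->pList[pEmployees->count] = *pPers;
    ++pEmployees->count;
    return true;
}
```

This assumes the list starts out as { 0, NULL }; an uninitialized pList would be passed straight to realloc.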

8. free() needs a pointer to the beginning of the allocated memory block.

Consider the following code:
size_t count = 100;
size_t i = 0;
int *p = malloc(count * sizeof(int));
int *q = p + count;
while (p != q) {
    *p = i;
    ++p;
}
// ...
free(p);
The result is undefined and probably bad, because p no longer points to the beginning of the block. Fortunately, it’s easy to fix:
size_t count = 100;
size_t i = 0;
int * const pStart = malloc(count * sizeof(int));
int *p = pStart;
int *q = p + count;
while (p != q) {
    *p = i;
    ++p;
}
// ...
free(pStart);
The const keeps us honest, by preventing us from moving the pointer. Remember that only the const after the asterisk makes the pointer constant.

9. Never write more than you have allocated.

This goes even if you don’t use the heap. There is still too much code like this out there:
char c[3];
sprintf(c, "Abc");
puts(c);
There are tools to find this. But it’s still the programmer’s responsibility to keep track of allocated memory and to make sure only the allocated memory is accessed.
The traditional C functions like sprintf and strcpy are not very safe, since they don’t check if you’re writing outside the allocated string. There are strncpy and strncat, but each has a gotcha: strncpy does not add the string terminator if the source does not fit in the block, and the size passed to strncat is the maximum number of characters to append, not the size of the destination.
char c[4];
strncpy(c, "Hello, world!", ARRAYSIZE(c) - 1);
c[ARRAYSIZE(c) - 1] = '\0'; // Make it safe.
Newer versions of the standard (C11, Annex K) add bounds-checking functions modeled on Microsoft’s safe string functions (sprintf_s, strcat_s), but only as an optional extension. Use them if you can. If you don’t have them, you may want to create your own.
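If you do roll your own, a minimal sketch could look like this. The name copy_string is hypothetical; it is built on the standard snprintf (C99), which always terminates the result when the buffer size is not zero.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: copies src into dst, always terminating the result.
   Returns true if the whole string fit, false if it was truncated. */
bool copy_string(char *dst, size_t dst_size, const char *src)
{
    int needed;

    if (dst_size == 0)
        return false;
    needed = snprintf(dst, dst_size, "%s", src);
    return needed >= 0 && (size_t)needed < dst_size;
}
```

Unlike strncpy, the caller does not have to terminate the string by hand, and the return value reports whether anything was cut off.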

10. String literals are constant.

This is the type of thing that can cause problems when upgrading from an old compiler to a newer, stricter one.
char *c = "abc";
c[0] = 'A';
printf("%s", c);
On modern compilers, this gives you an access violation – you are trying to write to a read-only memory block. Older compilers let you get away with this. Some compilers make the strings read-only by default, but have a switch to make them writable.
It’s perhaps surprising that most compilers let you compile this code without warnings, even on the highest warning level. I would have expected them to complain that c is not const char * (or const char[]).
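When you do need to modify the text, one way is to initialize an array from the literal instead of pointing at it. The function name modify_copy is invented for this sketch.

```c
/* Returns 'A': the array is a writable copy of the literal. */
char modify_copy(void)
{
    char c[] = "abc";      /* array: the literal's characters are copied into it */
    const char *p = "abc"; /* pointer to the literal itself: treat as read-only */

    c[0] = 'A';            /* fine, c is our own storage */
    /* p[0] = 'A'; would be rejected by the compiler, thanks to the const */
    return p[0] == 'a' ? c[0] : '\0';
}
```

Declaring pointers to literals as const char * lets the compiler catch the mistake that would otherwise only show up at run time.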

11. Always use the correct deallocation function.

If you also use C++, you will know that whatever you’ve allocated with new has to be freed with delete and what’s been allocated with malloc/realloc/calloc should be freed with free. But that’s not the end of the story.
Operating systems offer various memory management functions. For example, on Windows you have GlobalAlloc, VirtualAlloc, CoTaskMemAlloc and perhaps others, each with its own counterpart for freeing the memory. If you call the wrong function, there’s no telling what will happen (but you can be sure that it’s not going to be good).
You may think you’re safe if you use just malloc/calloc/realloc and free. Think again.
Employee * CreateEmployee(const char *name, Gender gender) {
    Employee *pEmp = calloc(1, sizeof(Employee));
    if (pEmp) {
        strcpy_s(pEmp->name, ARRAYSIZE(pEmp->name), name);
        pEmp->gender = gender;
    }
    return pEmp;
}

void f(void) {
    Employee *pEmp = CreateEmployee("John Doe", Gender_Male);
    if (pEmp) {
        // Process and store employee
        // ...
        free(pEmp);
        pEmp = NULL;
    }
}
As long as you use the two functions in the same module, you’re fine. But at some point you might want to move the CreateEmployee function to a library. Somebody with a different compiler (or the same compiler but different settings) will use your library but recompile f(). Suddenly, you’ll have a problem, because the copy of free from the client code will not know how the calloc from the Employee library has allocated the memory.
The solution is to create and export a DeleteEmployee function together with the CreateEmployee function:
// Employee library
Employee * CreateEmployee(const char *name, Gender gender);
void DeleteEmployee(Employee *pEmp);

// Main program
void f(void) {
    Employee *pEmp = CreateEmployee("John Doe", Gender_Male);
    if (pEmp) {
        // Process and store employee
        // ...
        DeleteEmployee(pEmp);
        pEmp = NULL;
    }
}
Which brings me to the next point:

12. Remember that structures containing pointers need special treatment.

What I mean by this is that you cannot just free the top level object; you need to also free any object owned by it – but not the ones that are just referenced.
For example:
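typedef struct Employee {
    Gender gender;
    char *name;
    char *surname;
    Date birthday;
    struct Employee *pManager; // The typedef name is not available yet inside the struct.
    size_t subordinatesCount;
    struct Employee *pSubordinates;
} Employee;

Employee * CreateEmployee(const char *name, const char *surname, Gender gender) {
    Employee *pEmp = calloc(1, sizeof(Employee));
    if (pEmp) {
        // name and surname are pointers now, so each needs its own heap block.
        pEmp->name = malloc(strlen(name) + 1);
        pEmp->surname = malloc(strlen(surname) + 1);
        if (!pEmp->name || !pEmp->surname) {
            free(pEmp->name);
            free(pEmp->surname);
            free(pEmp);
            return NULL;
        }
        strcpy(pEmp->name, name);
        strcpy(pEmp->surname, surname);
        pEmp->gender = gender;
    }
    return pEmp;
}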

void f(void) {
    Employee *pEmp = CreateEmployee("John", "Doe", Gender_Male);
    if (pEmp) {
        // Process and store employee
        // ...
        free(pEmp->name);
        free(pEmp->surname);
        // Don't free manager and subordinates, those are just references, not owned by this object
        free(pEmp);
        pEmp = NULL;
    }
}
As Employee grows more complicated, it becomes more and more convenient to have a DeleteEmployee function that takes care of all these details. In other words, even in plain C it is worthwhile to think object-oriented.
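A sketch of such a DeleteEmployee function follows. The structure is a trimmed-down version of the one above (only the owned strings and one non-owned reference), so the exact fields are an assumption made for the example.

```c
#include <stdlib.h>

typedef struct Employee {
    char *name;                  /* owned by this object */
    char *surname;               /* owned by this object */
    struct Employee *pManager;   /* referenced, not owned */
} Employee;

/* Frees the employee and everything it owns. Accepts NULL, like free. */
void DeleteEmployee(Employee *pEmp)
{
    if (!pEmp)
        return;
    free(pEmp->name);
    free(pEmp->surname);
    /* pManager is only a reference; whoever owns it frees it. */
    free(pEmp);
}
```

The caller still sets its own pointer to NULL afterwards, as in the earlier tips.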

13. Be sure to document who owns the memory block.

Consider the following function declaration:
const char * GetLastErrorText(void);
Should the caller free the text buffer? Is it a pointer to a static buffer? Maybe the library allocates memory for the message and keeps it until the next call?
There is only one answer: see the documentation.
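For your own functions, that documentation can be as simple as a comment next to the declaration. The sketch below shows one possible answer (a static buffer); the implementation, the SetLastErrorCode function and the g_lastError variable are all hypothetical and exist only to make the example complete.

```c
#include <stdio.h>

static int g_lastError = 0;   /* hypothetical error state of the library */

/* Hypothetical setter, so the example has something to report. */
void SetLastErrorCode(int code)
{
    g_lastError = code;
}

/* Returns the text for the last error.
   Ownership: the string lives in a static buffer owned by the library.
   The caller must NOT free it. The text is valid until the next call. */
const char *GetLastErrorText(void)
{
    static char buffer[32];

    snprintf(buffer, sizeof buffer, "Error %d", g_lastError);
    return buffer;
}
```

Had the function returned a freshly allocated string instead, the comment would have to say so and name the function that frees it.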