Tuesday, January 27, 2015

Pointer Safety Tips

0. Understand the difference between global (static), stack, heap (free memory) and unallocated memory.

A pointer is just the address of a memory location. However, you must consider what you are pointing to, and how long the pointed object “lives”.
The objects that live the longest are global objects. They are “alive” as long as the program runs. A global object is simply any variable defined outside a function.
Additionally, local variables defined as static are also valid as long as the program runs – they are global variables that are only visible in the block that defines them.
The stack is a special memory area that holds local variables, function parameters (which are, in fact, also local variables), and function call information. The stack is allocated by the operating system for each thread; there are special processor instructions (and registers) for dealing with the function call.
We don’t need to care about allocating and releasing memory for the local variables; the compiler does this for us. However, we must remember that local variables live only as long as the block that defines them – so never return a pointer to a local variable.
The free store, or heap, is where malloc allocates memory. The responsibility for allocating and releasing memory from the heap falls to the programmer.

1. Prefer local variables.

Of course, you don’t always have a choice. You may not know how many objects you have until runtime. Or the object might be too big for the stack. But for simple cases, you don’t need the headache.
int * pArray = calloc(3, sizeof(int));
pArray[1] = 3;
pArray[2] = 7;
pArray[3] = 12;
DoSomething(pArray, 3);
free(pArray);
In this case, we know the whole array in advance, so we can keep the code simple:
int array[] = { 3, 7, 5 };
DoSomething(array, ARRAYSIZE(array));

2. Never return a pointer to a local variable from a function.

This is obvious: the variable is only valid during the function call. Of course, you can return the address of a static local variable, since this is valid for the whole duration of the program.
int *a(void) {
    int x;
    static int y = 12;
    // ...
    printf("x = %d", &x); // OK, x is valid for the whole duration of the printf call.
    return &x; // Bad, x ceases to be valid when exiting the function.
    return &y; // OK, y is static.
}

3. Initialize any variable you define.

This piece of advice is valid regardless of the actual data type.
By convention, any pointer that doesn’t point anywhere should be initialized with NULL.

4.Allocated memory is not initialized. You must initialize it yourself.

Well, you might argue that calloc does initialize the memory. That’s true,
int * p = malloc(sizeof(int));
memset(p, 0, sizeof(int));
is equivalent to:
int * p = calloc(1, sizeof(int));
But there are more complicated cases:
int *p1 = realloc(p, sizeof(int) * 2);
p1[0] = 0;
In this case, we want to extend the allocated memory area, from enough for one integer to two. The first integer will keep its value, but the second will not be initialized, so we need to do it ourselves.

5. Everything that was allocated must be freed, or you’ll have a memory “leak”.

It’s always fun to watch the memory consumption for this program, using Process Explorer or a similar tool:
size_t i;
for (i = 0; i < SIZE_MAX; ++i) {
    int * p = malloc(sizeof(int));
    if (!p)
        break;
}
Hmm, kind of boring; let’s make it leak faster:
size_t i;
for (i = 0; i < SIZE_MAX; ++i) {
    int * p = malloc(sizeof(int) * 100000);
    if (!p)
        break;
}
It’s true that the operating system frees all allocated memory after a program ends its executions. Still, the leaked memory means less memory for the other programs running. It can also mean that your program will run out of memory instead of keeping on running for days or years.

6. Freed memory looks just like allocated memory* – always set pointers to NULL after deallocating the memory.

* Just the same as, in the chemistry lab, hot glass looks just like cold glass.

Quick now: what does this program print?
int *p = calloc(1000, sizeof(int));
printf("p = %x\n", p[3]);
free(p);

printf("p = %x\n", p[3]);
The answer is: it depends. If you use Visual Studio, it will probably show:
p = 0
p = feeefeee
But it might also print:
p = 0
p = 0
Why? Because memory keeps its value until overwritten. In general, deallocation does not overwrite the freed memory – as you know, C was designed to be fast (or to shoot yourself in the leg quickly).
So, to prevent problems, always set pointers to NULL after freeing them.
free(p);
p = NULL;

For advanced practitioners: sometimes even this may be enough.

 int *p = calloc(1000, sizeof(int));
int *q = p + 3;
printf("p = %x\n", p[3]);
printf("q = %x\n", *q);
free(p);
p = NULL;

printf("q = %x\n", *q);
Here, we have several pointers. While we set the pointer to the beginning of the block to NULL, we may not know who else references our memory. The code might even work correctly, using the “ghost” values from the freed memory, until a change in a completely different place will mysteriously modify the value pointed to by q.
So how would you debug a situation like this? I would overwrite the whole memory block with 0 (or another suitable value) just before freeing it:
memset(p, 0, 1000);
free(p);
p = NULL;
This, of course, will not solve the problem, but it might help find the bug faster – the value that q points to will change after freeing p.

7. Always check that the allocation was successful.

In general, you should always check the return code of any function.
int *p = malloc(1000000000 * sizeof(int));
p[0] = 13;
free(p);
What’s wrong here? It’s not the fact that we try to allocate enough space for a billion integers, but the fact that we never check if we actually got it.
Let’s try again:
int *p = malloc(1000000000 * sizeof(int));
if (!p) {
    fprintf(stderr, "Not enough memory.\n");
}
else {
    p[0] = 13;
    free(p);
}
Easy, right?
Let’s try a more complicated case:
typedef struct Employee {
    char name[10];
    // ...
} Employee;

typedef struct Employees {
    size_t count;
    Employee *pList;
} Employees;

bool AddEmployee(Employees *pEmployees, const Employee *pPers) {
    assert(pEmployees);
    assert(pPers);
    if (pEmployees->count == 0) {
        assert(pEmployees->pList == NULL);
        if (NULL == (pEmployees->pList = malloc(sizeof(Employee))))
            return false;
        pEmployees ->pList[0] = *pPers;
        ++pEmployees->count;
        return true;
    }
    else {
        assert(pEmployees->pList != NULL);
        if (NULL == (pEmployees->pList = realloc(pEmployees->pList,
                sizeof(Employee) * (pEmployees->count + 1))))
            return false;
        pEmployees->pList[pEmployees->count] = *pPers;
        ++pEmployees->count;
        return true;
    }
}
At first sight, this looks correct. We verify if the allocation was successful and we stop if it didn’t work. Maybe it could’ve been written using only realloc, but that’s not our main concern now.
However, a quick look in the documentation tells us that realloc returns NULL on failure. What we’ve created here is a memory leak – if there is not enough room for a new employee, we lose the list (even though it’s still allocated).
The correct approach would be to use a temporary variable:
Employee *pNewList = NULL;
assert(pEmployees->pList != NULL);
if (NULL == (pNewList = realloc(pEmployees->pList, sizeof(Employee) * (pEmployees->count + 1))))
    return false;
pEmployees->pList = pNewList;
pEmployees->pList[pEmployees->count] = *pPers;
++pEmployees->count;
return true;

8. free() needs a pointer to the beginning of the allocated memory block.

Consider the following code:
size_t count = 100;
size_t i = 0;
int *p = malloc(count * sizeof(int));
int *q = p + count;
while (p != q) {
    *p = i;
    ++p;
}
// ...
free(p);
The result is unspecified and probably bad. Fortunately, it’s easy to fix:
size_t count = 100;
size_t i = 0;
int * const pStart = malloc(count * sizeof(int));
int *p = pStart;
int *q = p + count;
while (p != q) {
    *p = i;
    ++p;
}
// ...
free(pStart);
The const keeps us honest, by preventing us from moving the pointer. Remember that only the const after the asterisk makes the pointer constant.

9. Never write more than you have allocated.

This goes even if you don’t use the heap. There is still too much code like this out there:
char c[3];
sprintf(c, "Abc");
puts(c);
There are tools to find this. But it’s still the programmer’s responsibility to keep track of allocated memory and to make sure only the allocated memory is accessed.
The traditional C functions like sprintf and strcpy are not very safe, since they don’t check if you’re writing outside the allocated string. There is a strncat and strncpy, but there’s a gotcha: if the block is not enough to write the string, the string terminator is not added automatically.
char c[4];
strncpy(c, "Hello, world!", ARRAYSIZE(c) - 1);
c[ARRAYSIZE(c) - 1] = '\0'; // Make it safe.
The newer versions of the standard add Microsoft’s safe string functions (sprintf_s, strcat_s), but only optional. Use them if you can. If you don’t have them, you may want to create your own.

10. String literals are constant.

This is the type of thing that can cause problems when upgrading from an old compiler to a newer, stricter one.
char *c = "abc";
c[0] = 'A';
printf("%s", c);
On modern compilers, this give you an access violation – you are trying to write in a read-only memory block. Older compilers let you get away with this. Some compilers make the strings read-only by default, but have a switch to make them writable.
It’s perhaps surprising that most compilers let you compile this code without warnings, even on the highest warning level. I would have expected them to complain that c is not const char * (or const char[]).

11. Always use the correct deallocation function.

If you also use C++, you will know that whatever you’ve allocated with new has to be freed with delete and what’s been allocated with malloc/realloc/calloc should be freed with free. But that’s not the end of the story.
Operating systems offer various memory management functions. For example, on Windows you have GlobalAlloc, VirtualAlloc, CoTaskMemoryAlloc and perhaps others, each with its own counterpart for freeing the memory. If you call the wrong function, there’s no telling what will happen (but you can be sure that it’s not going to be good).
You may think you’re safe if you use just malloc/calloc/realloc and free. Think again.
Employee * CreateEmployee(const char *name, Gender gender) {
    Employee *pEmp = calloc(1, sizeof(Employee));
    if (pEmp) {
        strcpy_s(pEmp->name, ARRAYSIZE(pEmp->name), name);
        pEmp->gender = gender;
    }
    return pEmp;
}

void f(void) {
    Employee *pEmp = CreateEmployee("John Doe", Gender_Male);
    if (pEmp) {
        // Process and store employee
        // ...
        free(pEmp);
        pEmp = NULL;
    }
}
As long as you use the two functions in the same module, you’re fine. But at some point you might want to move the CreateEmployee function to a library. Somebody with a different compiler (or the same compiler but different settings) will use your library but recompile f(). Suddenly, you’ll have a problem, because the copy of free from the client code will not know how the calloc from the Employee library has allocated the memory.
The solution is to create and export a DeleteEmployee function together with the CreateEmployee function:
// Employee library
Employee * CreateEmployee(const char *name, Gender gender);
void DeleteEmployee(Employee *pEmp);

// Main program
void f(void) {
    Employee *pEmp = CreateEmployee("John Doe", Gender_Male);
    if (pEmp) {
        // Process and store employee
        // ...
        DeleteEmployee(pEmp);
        pEmp = NULL;
    }
}
Which brings me to the next point:

12. Remember that structures containing pointers need special treatment.

What I mean by this is that you cannot just free the top level object; you need to also free any object owned by it – but not the ones that are just referenced.
For example:
typedef struct Employee {
    Gender gender;
    char *name;
    char *surname;
    Date birthday;
    Employee *pManager;
    size_t subordinatesCount;
    Employee *pSubordinates;
} Employee;

Employee * CreateEmployee(const char *name, const char *surname, Gender gender) {
    Employee *pEmp = calloc(1, sizeof(Employee));
    if (pEmp) {
        strcpy_s(pEmp->name, ARRAYSIZE(pEmp->name), name);
        pEmp->gender = gender;
    }
    return pEmp;
}

void f(void) {
    Employee *pEmp = CreateEmployee("John", "Doe", Gender_Male);
    if (pEmp) {
        // Process and store employee
        // ...
        free(pEmp->name);
        free(pEmp->surname);
        // Don't free manager and subordinates, those are just references, not owned by this object
        free(pEmp);
        pEmp = NULL;
    }
}
As Employee grows more complicated, it becomes more and more convenient to have a DeleteEmployee function that takes care of all these details. In other words, even in plains C it is worthwhile to think object-oriented.

13. Be sure to document who owns the memory block.

Consider the following function declaration:
const char * GetLastErrorText(void);
Should the caller free the text buffer? It is a pointer to a static buffer? Maybe the library allocates memory for the message and keeps it until the next call?
There is only one answer: see the documentation.

No comments:

Post a Comment