Writing Secure Code

By Alex Allain
Writing secure code is a big deal. There are a lot of viruses in the world, and many of them rely on exploiting poorly written programs. Sometimes the solution is just to use a safer language, such as Java, that typically runs code in a protected environment (for instance, the Java Virtual Machine). But this isn't always an option: you might need top performance, or you might be working on legacy code written in C or C++. In those cases, you need to be aware of the issues involved in writing code that can't be exploited.

Two common attacks are buffer overflows and the double free attack. Since I'm not out to write a how-to on cracking security, I won't cover the details of carrying out these attacks any more than you need to know to avoid them. Instead, I'll talk about practices you can use to prevent them.

Buffer Overflows - Smashing the Stack

A buffer overflow occurs when your program accepts more data than the buffer meant to hold it has room for, so the extra data overwrites adjacent memory. Because of the way the stack is laid out, an attacker can use this to write code into memory and redirect execution to it. This is one of the techniques the Morris Worm used, and many exploits since have worked the same way.

When functions are called, both the memory to store the variables declared in the function and the memory to store the arguments to the function are pushed onto the stack as part of a "stack frame".

Here is a rough picture of what the stack frame would look like for a function call:
[memory for variables in the function][FP][ret][function arguments]
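On a typical system where the stack grows toward lower addresses, this picture reads from low addresses on the left to high addresses on the right. First comes the memory for variables declared in the function. Next comes the frame pointer, FP, which is used to address variables within the stack frame, then the address to return to after the function call, ret, and finally the arguments to the function.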

The gist of this attack is that, for every function call, the ret value on the stack indicates where in memory to jump once the function has finished executing. In normal execution, this should be back in the calling function. However, if the program lets input overflow a buffer stored among the function's local variables, an attacker can overwrite ret so that it points to an arbitrary region of memory into which the attacker has written executable code. (Often, this will actually be the buffer itself.)

How can a buffer overflow attack happen?

When you declare arrays in C or C++, you get a specified amount of memory to work with. This memory, on many systems, is placed before the pointer to the return site (where the function should return to after executing).

For instance, you might declare an array of 100 bytes:
char memory[100];
This is all well and good. But what if you really wanted to use 200 bytes?

C will let you:

memory[150] = 'a';
There are no bounds checks on array accesses, and the code might even appear to work. (In some cases you'll get a segmentation fault, but that depends on whether the memory you're accessing belongs to your program. You might just silently overwrite other data in your portion of the stack.)
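Since the language won't check the index for you, one option is to check it yourself before writing. The following is a minimal sketch of that idea; set_byte is a hypothetical helper made up for this illustration, not a standard library function.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the standard library): writes value
   into array[index] only if index lies inside the array.
   Returns 1 if the write happened, 0 if the index was out of range. */
int set_byte(char *array, size_t size, size_t index, char value)
{
    if (index >= size) {
        return 0; /* refuse to touch memory we didn't ask for */
    }
    array[index] = value;
    return 1;
}
```

With the 100-byte array declared above, set_byte(memory, sizeof(memory), 150, 'a') returns 0 and leaves the stack alone instead of writing past the end of the array.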

But you know not to just play with memory you haven't asked for, so you probably won't do things quite like that. Generally, what happens is that a function you call will overwrite the memory instead.

You might use a function such as gets() or strcpy that isn't aware of how much memory you asked for, and consequently, how large an array you have space for. This is particularly a problem for standard library functions that work on null-terminated strings, such as strcpy and strcat. Since they rely on finding a terminating character, if the string being worked with is too long, they'll happily write past the end of the buffer, and if you had declared the memory as a local array, you might end up overwriting other data on the stack. This is what is referred to as "smashing the stack".

Other dangerous functions include scanf (when it reads a string with an unbounded %s) and sprintf, for the same reason as gets: neither knows how large the destination buffer is.

What should you use instead? In place of gets(), or of scanf for reading in a string, use fgets():
char *fgets(char *buffer, int size, FILE *stream);
fgets takes a size -- make this the size of your buffer, and it will read at most size - 1 characters into buffer from the file pointed to by stream, then add a '\0' terminator. So, if you want to read from standard input (stdin) in order to replace gets:
char buffer[10];
fgets(buffer, sizeof(buffer), stdin);
This will allow the user to input up to 9 characters for use in buffer, because fgets reserves the last byte for the '\0' terminator, which it always adds. This means that you don't have to add the null terminator manually before using functions such as printf that rely on it. Be aware that fgets also stores the newline character if the line fits, and that any input that doesn't fit is left in the stream for the next read.
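As a rough sketch of how a gets replacement built on fgets might look, the function below reads one line, strips the trailing newline, and throws away whatever did not fit. The name read_line and its return convention are assumptions made for this example; they are not part of the standard library.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from stream into buffer, which has
   room for size bytes. Returns 1 on success, 0 on end of file or error.
   The stored string is always null-terminated and never longer than
   size - 1 characters. */
int read_line(char *buffer, size_t size, FILE *stream)
{
    if (size == 0 || size > INT_MAX) {
        return 0; /* fgets takes the size as an int */
    }
    if (fgets(buffer, (int)size, stream) == NULL) {
        return 0;
    }

    size_t length = strlen(buffer);
    if (length > 0 && buffer[length - 1] == '\n') {
        buffer[length - 1] = '\0'; /* whole line fit: drop the newline */
    } else {
        /* Line was too long (or input ended): discard the rest of it so
           the next call starts on a fresh line. */
        int c;
        while ((c = fgetc(stream)) != '\n' && c != EOF) {
            /* nothing to do */
        }
    }
    return 1;
}
```

To replace gets, you would call read_line(buffer, sizeof(buffer), stdin). Silently truncating an overlong line is a design choice for this sketch; a real program might prefer to report it as an error.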

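Note that sizeof(buffer) works because buffer is an array; you cannot do the same thing using a char* that you dynamically allocate using malloc or new, because sizeof would then give you the size of the pointer rather than the size of the memory it points to. In that case, you have to keep track of the allocated size yourself.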
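For dynamically allocated memory, that usually means carrying the capacity around alongside the pointer. Here is a minimal sketch under that assumption; read_heap_line is a hypothetical function invented for this example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocates capacity bytes and reads one line into
   them with fgets. Returns the buffer (which the caller must free), or
   NULL on allocation failure, end of file, or error. */
char *read_heap_line(FILE *stream, int capacity)
{
    if (capacity <= 0) {
        return NULL;
    }
    char *buffer = malloc((size_t)capacity);
    if (buffer == NULL) {
        return NULL;
    }
    /* sizeof(buffer) here would be the size of a pointer, not of the
       allocation, so pass the capacity we asked malloc for. */
    if (fgets(buffer, capacity, stream) == NULL) {
        free(buffer);
        return NULL;
    }
    return buffer;
}
```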

In place of strcpy or strcat, use the corresponding strncpy or strncat functions that take the length of data to be copied.
char *strncpy(char *destination, const char *source, size_t n);
Using strncpy will result in at most n bytes being copied from source to destination, and strncpy will return a pointer to destination. For instance, if you know that destination can hold only 20 characters:
char destination[20];
const char *source = "This is a particularly long string";
strncpy(destination, source, sizeof(destination));
This will copy only as much of source as can fit in destination and return a pointer to destination. You should be aware that strncpy will not append a null terminator if source is at least n bytes long, which means that you can go from a regular, null-terminated string to a non-null-terminated string if you try to copy a string that won't fit into the destination. To be safe, copy at most sizeof(destination) - 1 bytes and set the last byte of destination to '\0' yourself.
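One common way to guard against the missing terminator is to wrap strncpy so that the last byte of the destination is always set to '\0'. The sketch below assumes a helper named copy_string, which is made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies source into destination, which has room
   for size bytes, truncating if necessary. The result is always
   null-terminated (provided size is at least 1). */
void copy_string(char *destination, size_t size, const char *source)
{
    if (size == 0) {
        return; /* no room even for the terminator */
    }
    strncpy(destination, source, size - 1); /* leave the last byte alone */
    destination[size - 1] = '\0';           /* terminate it ourselves */
}
```

Called as copy_string(destination, sizeof(destination), source) with the 20-byte destination above, this stores the first 19 characters of source followed by a terminator.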
char *strncat(char *destination, const char *source, size_t n);
The strncat function appends up to n characters from source onto the end of destination, starting at the null terminator of destination. Unlike strncpy, strncat always writes a terminating null character after the characters it appends, so it can write up to n + 1 bytes. This means n should be the space remaining in destination minus one, not the total size of the buffer. And, to replace sprintf, you can use snprintf
int snprintf(char *destination, size_t n, const char *format, ...);
which will write at most n bytes, including the null terminator, into the memory pointed to by destination, protecting you from an attacker's overflowing your buffer when you do something like
sprintf(dest, "The user entered: %s", user_input_string);
which allows the result written to dest to be as long as is necessary to hold user_input_string, no matter how much space dest actually has.
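The two functions below sketch how these bounded calls might be used. Both append_string and format_user_input are hypothetical names chosen for this example. The first computes the remaining space for strncat; the second uses the return value of snprintf, which is the length the full string would have had, to detect truncation.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: appends source to the null-terminated string in
   destination, which has room for size bytes in total. */
void append_string(char *destination, size_t size, const char *source)
{
    size_t used = strlen(destination);
    if (used + 1 >= size) {
        return; /* already full */
    }
    /* Leave one byte for the terminator that strncat always writes. */
    strncat(destination, source, size - used - 1);
}

/* Hypothetical helper: formats the message into dest, which has room
   for size bytes. Returns 1 if the whole message fit, 0 if it was
   truncated or an error occurred. */
int format_user_input(char *dest, size_t size, const char *user_input_string)
{
    int needed = snprintf(dest, size, "The user entered: %s", user_input_string);
    return needed >= 0 && (size_t)needed < size;
}
```

Checking the return value matters because a truncated string is safe for memory but may still be wrong for your program, for example if it is later used as a file name or a command.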

Although we've only looked at examples where the size of memory was fixed at compile time, and the memory consequently ended up on the stack, similar problems apply to memory allocated during program execution. Overflowing a buffer on the heap doesn't overwrite a return address directly, but an attacker may still be able to change the flow of control by overwriting nearby function pointers or the allocator's bookkeeping data. Even simply changing variables can cause problems (imagine having a username field that your program uses for access control, and that an attacker can find a way to change that memory).
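To make the access control scenario concrete, here is a small sketch in which a username sits next to a privilege flag in the same heap allocation. The struct account layout and the functions new_account and set_username are all invented for this illustration. The setter rejects names that don't fit rather than letting them spill into the neighboring field.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: an overflow of username could reach is_admin. */
struct account {
    char username[16];
    int  is_admin;
};

/* Allocates a zeroed account on the heap; the caller must free it. */
struct account *new_account(void)
{
    return calloc(1, sizeof(struct account));
}

/* Stores name in acct->username only if it fits, terminator included.
   Returns 1 on success, 0 if the name was too long (nothing is written). */
int set_username(struct account *acct, const char *name)
{
    size_t length = strlen(name);
    if (length >= sizeof(acct->username)) {
        return 0;
    }
    memcpy(acct->username, name, length + 1); /* +1 copies the '\0' */
    return 1;
}
```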

Double Free Attack

Another, more sophisticated attack is the double free attack, which affects some implementations of malloc. The attack can happen when you call free on a pointer that has already been freed before you have re-initialized the pointer with a new memory address:
free(x);
/* code */
free(x);
This shouldn't ever need to happen in your code. The easiest way to avoid it is simply to set your pointer to point to NULL once you've freed it:
free(x); x = NULL;
/* code */
free(x); /* harmless now: x is NULL */
Since free(NULL) is a valid operation that doesn't do anything, you're safe. Of course, this doesn't protect you from code that frees memory without telling you (or without making it obvious that memory is getting freed). One way of detecting these types of problems is to use a memory error detector such as Valgrind to ensure that you only execute valid operations. (Valgrind will also detect use of uninitialized memory, invalid memory accesses such as those caused by some buffer overflows, and memory leaks.) Since the double free problem is one that should show up more or less regardless of user input -- you don't need a malicious attempt to overflow the buffer to test for double frees -- Valgrind is a good tool for finding these bugs. Of course, this isn't always the case. For instance, a double free vulnerability in the zlib library (CERT Advisory) required a certain type of user input to even cause free to be called twice on the same memory.
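If you want to make the free-then-clear habit harder to forget, one option is to wrap it in a macro. This is only a sketch: FREE_AND_NULL is a hypothetical name, not something the standard library provides.

```c
#include <stdlib.h>

/* Hypothetical macro: frees the memory p points to and then sets p to
   NULL, so that a later free of the same variable is a harmless
   free(NULL). The argument is evaluated twice, so pass a plain variable
   rather than an expression with side effects. */
#define FREE_AND_NULL(p) \
    do {                 \
        free(p);         \
        (p) = NULL;      \
    } while (0)
```

After char *x = malloc(16); you can call FREE_AND_NULL(x); twice in a row without harm. Note that this only clears the one variable you pass in; other copies of the same pointer are still dangling.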

This bug is harder to exploit than potential buffer overflows, and it also relies on a particular implementation of the memory allocation system. Nevertheless, it's important to ensure that your code correctly frees only valid blocks of memory.