C++ References


By Alex Allain
C++ references allow you to create a second name for a variable that you can use to read or modify the original data stored in that variable. While this may not sound appealing at first, it means that when you declare a reference and initialize it with a variable, you can treat the reference exactly as though it were the original variable for the purpose of accessing and modifying its value, even if the second name (the reference) is located in a different scope. For instance, if you make your function parameters references, you effectively have a way to change the original data passed into the function. This is quite different from how C++ normally works, where function arguments are copied into new variables. References also allow you to dramatically reduce the amount of copying that takes place behind the scenes, both with functions and in other areas of C++, such as catch clauses.


Basic Syntax

Declaring a variable as a reference rather than a normal variable simply entails appending an ampersand to the type name. For example, this declares a "reference to an int":
int& foo = ....;
Did you notice the "...."? (Probably, right? After all, it's 25% of the example.) When a reference is created, you must tell it which variable it will become an alias for. After you create the reference, whenever you use it, you can treat it as though it were a regular integer variable. But when you create it, you must initialize it with another variable, whose address it keeps behind the scenes so that you can use the reference to read and modify that variable.

In a way, this is similar to having a pointer that always points to the same thing. One key difference is that references do not require dereferencing in the same way that pointers do; you just treat them as normal variables. A second difference is that when you create a reference to a variable, you need not do anything special to get the memory address. The compiler figures this out for you:
#include <iostream>

int main()
{
        int x = 0;
        int& foo = x;

        // foo is now a reference to x, so this sets x to 56
        foo = 56;
        std::cout << x << std::endl;
}
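Because a reference stays bound to the variable it was initialized with, assigning another variable to it copies that variable's value into the original; it does not make the reference refer to something new. A minimal sketch (the function name rebindAttempt is hypothetical, chosen just for this illustration):

```cpp
#include <utility>

// Hypothetical helper: returns the final values of x and y after
// "assigning" y to a reference that is bound to x.
std::pair<int, int> rebindAttempt()
{
    int x = 1;
    int y = 2;
    int& ref = x;   // ref is bound to x for its whole lifetime

    ref = y;        // copies y's value (2) into x; ref still refers to x
    ref = 10;       // so this changes x again, and y is untouched

    return {x, y};  // x == 10, y == 2
}
```

This is the "pointer that always points to the same thing" behavior in action: there is no syntax for re-seating a reference once it exists.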

Functions Taking Reference Parameters

Here's a simple example of setting up a function to take an argument "by reference", implementing the swap function:
void swap (int& first, int& second)
{
        int temp = first;
        first = second;
        second = temp;
}
Both arguments are passed "by reference"--the caller of the function need not even be aware of it:
int a = 2;
int b = 3;
swap( a, b );
After the swap, a will be 3 and b will be 2. The fact that references require no extra work can lead to confusion at times when variables magically change after being passed into a function. Bjarne Stroustrup suggests that for arguments that the function is expected to change, using a pointer instead of a reference helps make this clear--pointers require that the caller explicitly pass in the memory address.
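For comparison, a pointer-based swap in the spirit of that suggestion might look like the sketch below; the name swapByPointer is our own, not something from the standard library. The caller has to write &a and &b, which makes it visible at the call site that the arguments may be changed.

```cpp
// Pointer-based swap (hypothetical name): the caller must pass
// addresses explicitly, e.g. swapByPointer( &a, &b ).
void swapByPointer(int* first, int* second)
{
    int temp = *first;   // read through the pointers
    *first = *second;
    *second = temp;
}
```

With int a = 2 and int b = 3, calling swapByPointer( &a, &b ) leaves a equal to 3 and b equal to 2, just like the reference version.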

Efficiency Gains

You might wonder why you would ever want to use references other than to change the value. The answer is that passing by reference means the variable need not be copied, yet it can still be passed into a function without doing anything special. This gives you the most bang for your buck when working with objects of class type: if you want to pass an object into a function, it almost always makes sense for the function to take it by reference, and generally you want that to be a const reference.

This might look something like this:
void workWithClass( const MyClass& a_class_object )
{
        // a_class_object can be read here, but not modified
}
The great thing about const references is that you can be sure the variable isn't modified, so you can immediately change all of your functions that take large objects by value to take const references instead, with no need to make a copy anymore. And even if you were conscientious and used pointers to pass around large objects, switching to references can still make your code that much cleaner.
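To see the copying difference directly, here is a small sketch using a hypothetical BigRecord class whose copy constructor counts how many times it runs. The class and function names are made up for this illustration; the point is only that the by-value function copies its argument and the const-reference function does not.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical "large" class that counts how many times it is copied.
struct BigRecord
{
    static int copies;
    std::vector<std::string> rows;

    BigRecord() = default;
    BigRecord(const BigRecord& other) : rows(other.rows) { ++copies; }
    BigRecord& operator=(const BigRecord&) = default;
};

int BigRecord::copies = 0;

// Pass by value: the argument is copied into the parameter.
std::size_t countRowsByValue(BigRecord record)
{
    return record.rows.size();
}

// Pass by const reference: no copy is made, and record cannot be modified.
std::size_t countRowsByConstRef(const BigRecord& record)
{
    return record.rows.size();
}
```

Passing an existing BigRecord to countRowsByValue increments BigRecord::copies once per call, while countRowsByConstRef leaves the counter unchanged.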

References and Safety

You're probably noticing a similarity to pointers here--and that's true, references are often implemented by the compiler writers as pointers. A major difference is that references are "safer". In general, references should always be valid because you must always initialize a reference. This means that barring some bizarre circumstances (see below), you can be certain that using a reference is just like using a plain old non-reference variable. You don't need to check to make sure that a reference isn't pointing to NULL, and you won't get bitten by an uninitialized reference that you forgot to allocate memory for.
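One way to see this difference in practice: a function that takes a pointer generally has to decide what to do when it is handed a null pointer, while a function that takes a reference can assume it has a real object. Both function names below are hypothetical, and the fallback value of 0 is an arbitrary choice for the sketch.

```cpp
// Pointer version: must guard against a null argument.
int doubledFromPointer(const int* value)
{
    if (value == nullptr)
    {
        return 0;   // arbitrary fallback chosen for this sketch
    }
    return *value * 2;
}

// Reference version: a reference must be bound to an object,
// so no null check is needed.
int doubledFromReference(const int& value)
{
    return value * 2;
}
```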

References and Safety: the Exceptions

For the sake of full disclosure, it is possible to have an invalid reference in one minor and one major case.

First, if you explicitly initialize a reference with a dereferenced NULL pointer, your reference will be invalid:
int *x = 0;
int& y = *x;
Now when you try to use the reference, you'll get a segmentation fault since you're trying to access invalid memory (well, on most systems anyhow).

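By the way, this really does compile: the compiler generally can't tell that x is null, and many implementations don't read the value stored in *x when binding the reference, so nothing visibly goes wrong until the reference is used. Strictly speaking, though, dereferencing a null pointer is undefined behavior as soon as *x is evaluated, so there is no guarantee about when or how the program will fail.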
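If you have a pointer that might be null and you want to work with it through a reference, the safe pattern is to check the pointer first and only form *p once you know it points to something. A minimal sketch with a hypothetical helper named setThroughPointer:

```cpp
// Hypothetical helper: binds a reference only after checking the pointer.
// Returns true and writes newValue through the reference if p is non-null.
bool setThroughPointer(int* p, int newValue)
{
    if (p == nullptr)
    {
        return false;       // never evaluate *p when p is null
    }
    int& target = *p;       // safe: p points to a real int
    target = newValue;
    return true;
}
```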

A more pressing issue is that it is possible to "invalidate" a reference in the sense that it is possible that a reference to a block of memory can live on past the time when the memory is valid. The most immediate example is that you shouldn't return a reference to local memory:
int& getLocalVariable()
{
        int x;
        return x;   // wrong: x is destroyed when the function returns
}
Once getLocalVariable returns and its stack frame is taken off the stack, x no longer exists, so the reference returned by this function is no longer valid. Oops.
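Two safer alternatives, sketched with hypothetical function names: return the value itself, or return a reference only to storage that is still alive after the function returns, such as an element of a container the caller owns.

```cpp
#include <vector>

// Returning by value copies the result out, so nothing dangles.
int getLocalValue()
{
    int x = 42;
    return x;
}

// Returning a reference is fine when it refers to something that
// outlives the call: here, an element of a vector owned by the caller.
int& firstElement(std::vector<int>& values)
{
    return values.front();   // caller must ensure values is non-empty
}
```

The reference returned by firstElement stays valid only as long as the vector is alive and doesn't reallocate; growing a vector (for example with push_back) can move its elements and leave the reference dangling.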

References and Dynamically Allocated Memory

Finally, beware of references to dynamically allocated memory. One problem is that when you use references, it's not clear that the memory backing the reference needs to be deallocated--it usually doesn't, after all. This can be fine when you're passing data into a function since the function would generally not be responsible for deallocating the memory anyway.

On the other hand, if you return a reference to dynamically allocated memory, then you're asking for trouble since it won't be clear that there is something that needs to be cleaned up by the function caller.
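One common way to make the cleanup responsibility explicit (a technique swapped in here, not one the discussion above prescribes) is to return a std::unique_ptr instead of a reference. The unique_ptr deletes the object automatically when its owner goes out of scope, and functions that merely use the object can still take a plain reference. The names createCounter and increment are hypothetical, and std::make_unique requires C++14.

```cpp
#include <memory>

// Hypothetical factory: returns ownership of a heap-allocated int.
// The caller never calls delete; the unique_ptr frees the memory
// when it goes out of scope.
std::unique_ptr<int> createCounter(int start)
{
    return std::make_unique<int>(start);
}

// Functions that only use the object can take a plain reference,
// since they are not responsible for freeing it.
void increment(int& counter)
{
    ++counter;
}
```

A caller might write auto c = createCounter( 5 ); and then increment( *c );, borrowing the object through a reference while c remains the sole owner.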