Writing Remote Procedure Calls (RPCs) in C

by Jeff Bezanson

Getting to Know RPCs

Perhaps you are bored of writing programs that run on only one computer at a time—how limiting! If so, you'll be glad to hear that there's a tool designed to get your programs on the network in a hurry: remote procedure calls (RPCs). The idea of RPCs is to enable network programming without requiring you to learn new interfaces like BSD sockets. Instead, network communication is done through an interface all programmers are already quite familiar with: the humble function call. The semantics are simple; you make a function call and control transfers to some other code as usual, except the code happens to be running on a different machine.

Hopefully this strikes you as a neat idea, and I'm not sure why you don't hear about RPCs more often. My guess is that programmers don't like to think about the new failure modes that are introduced. All of a sudden, what looks like a simple function call can fail due to any of the myriad issues that plague networks: broken links, long ping times, low bandwidth, malicious users, and so on. Our systems professors try to get it through our skulls that, no matter what, every operation can fail, but shamefully we acknowledge this only about as far as checking for -1 after a getline or recv. If a call is going to fail, the thinking goes, it had better look like one of these. That's my theory, anyway. But if you're comfortable checking for errors outside your I/O subsystem, read on and reap the benefits of networking without writing socket calls yourself or handling the obscure data structures they tend to require.

Several different RPC implementations exist; I'll be using the Sun (ONC) RPC implementation provided with glibc on Linux. (Recent glibc releases no longer include it; on current distributions the same interface is provided by the libtirpc library.) There are also several ways to use RPCs, and I won't be able to go into all the details here. Hopefully I'll be able to get you up and running, and if you want more, O'Reilly has a pretty good book on the subject called Power Programming with RPC. I'm also planning a follow-up article showing how to deal with some of the shortcomings of RPCs by using them asynchronously, so stay tuned.

Let's begin. RPCs could be used for any network application in theory, but are particularly attractive for distributed computation—when you've got a huge amount of scientific data to crunch and systems programmers are in short supply. Like most network applications, a program using RPC can be understood with the client/server model. The client is a program making a call, and the "RPC server" is the receiver that actually performs the function. In other words, an RPC server provides the service of executing some code for you. Simple enough, but before we can actually make an RPC, we have to wonder: how do you pass arguments over a network? Sending a primitive type like an int should be easy enough, but you also might want to pass a pointer to a data structure in memory. Sending the pointer over the network won't do anybody any good: it's an address in your memory after all, which will not be valid on some other machine.
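
To see the pointer problem concretely, here is a small sketch that "sends" a node the naive way, by copying its raw bytes as write(fd, &n, sizeof n) would. The node type and the naive_send/naive_receive helpers are hypothetical, made up for this illustration. The received copy's next field holds the sender's address, not the second node's contents; it only happens to be usable here because both "ends" live in the same process.

```c
#include <string.h>

/* Hypothetical node type for this illustration only. */
struct node {
    int key;
    struct node *next;
};

/* Naive "send": copy the node's raw bytes into a buffer, exactly what
   writing the struct straight to a socket would put on the wire. */
void naive_send(const struct node *n, unsigned char *wire)
{
    memcpy(wire, n, sizeof *n);
}

/* Naive "receive": rebuild a node from those raw bytes. */
void naive_receive(const unsigned char *wire, struct node *out)
{
    memcpy(out, wire, sizeof *out);
}
```

On a real second machine, the copied next value would be an address from the sender's memory and would point at nothing meaningful.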

RPC solves this problem for you, using a standard called XDR (External Data Representation). Behind the scenes, your data structure is serialized into XDR format, sent over the network, and reassembled properly on the other end. This service is one of the major benefits of using RPC; not only do you not have to program sockets, you don't have to write tedious and error-prone functions to serialize your linked structures either. In fact, you only write a description of your data structures, and such code is generated for you. The program that does this is called an RPC compiler, and its language for describing data structures is similar to C. You describe a procedure call interface along with the data specifications, and the RPC compiler generates client stubs (which you call on the client side to issue an RPC) and server stubs (which run on the server side when a request comes in and pass it on to your code).
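
To give a feel for what the generated serialization code accomplishes, here is a hand-written sketch that encodes the linked list used later in this article following the XDR rules: integers and enums are 4 bytes, most significant byte first; strings are a 4-byte length, the characters, and zero padding to a multiple of 4; and a pointer becomes a 4-byte "present" flag followed by the pointed-to data. This is not the code rpcgen generates (that calls library routines such as xdr_string and xdr_int), and the struct layout and encode_list name are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum color { ORANGE, PUCE, TURQUOISE };

/* Assumed C layout of the list type from llist.x. */
struct list {
    char *data;
    int key;
    enum color col;
    struct list *next;
};

/* XDR integer: 4 bytes, big-endian (two's complement for signed values). */
static size_t put_u32(unsigned char *buf, uint32_t v)
{
    buf[0] = (unsigned char)(v >> 24);
    buf[1] = (unsigned char)(v >> 16);
    buf[2] = (unsigned char)(v >> 8);
    buf[3] = (unsigned char)v;
    return 4;
}

/* XDR string: 4-byte length, the bytes, then zero padding to a multiple of 4. */
static size_t put_string(unsigned char *buf, const char *s)
{
    uint32_t len = (uint32_t)strlen(s);
    size_t pad = (4 - len % 4) % 4;
    size_t off = put_u32(buf, len);
    memcpy(buf + off, s, len);
    memset(buf + off + len, 0, pad);
    return off + len + pad;
}

/* Encode a list starting at node l. Each next pointer is replaced by a
   flag (1 = another node follows, 0 = end of list), so no addresses are
   ever transmitted. buf must be large enough; a real encoder checks bounds. */
size_t encode_list(unsigned char *buf, const struct list *l)
{
    size_t off = 0;
    while (l != NULL) {
        off += put_string(buf + off, l->data);
        off += put_u32(buf + off, (uint32_t)l->key);
        off += put_u32(buf + off, (uint32_t)l->col);
        l = l->next;
        off += put_u32(buf + off, l != NULL ? 1u : 0u);
    }
    return off;
}
```

For a two-node list {"hi", 7, PUCE} followed by {"abcde", -1, ORANGE}, this produces 44 bytes: 20 for the first node (8 for the padded string plus key, color, and flag) and 24 for the second.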

Skip this paragraph

Big tangent: XDR is cool because it almost solves an interesting and important open problem in programming practice: creating a framework for dynamically sharing structured data that would, among other things, provide a standard interface for serializing that data. XDR lets you describe data, but unfortunately, this metadata does not persist at run time; you're still stuck with all your code having to assume particular data structures at some point. Technologies like XML, gconf, the GIMP's procedural database, plugins, extension languages, and SQL all sort of chip away at the issue from different angles, but none of them decisively nails it.

.NET and Mono pretty much solve the problem, I admit, but they go a little far, extending interoperability to code as well as data. Every language compiler has to be rewritten (instead of just porting a lightweight API), plus you get this big (and, let's face it, evil) thing running under you. All the tools are still ad hoc; you get the Glade UI builder, which is great for putting together standard GTK+ components, but where are the real creative tools? In other words, I won't be convinced until I see something that's hard to use.

I have my ideas about how it should be done—if you're interested, email me and let's set up a CVS repository :)

Using the RPC Compiler

Since you already know what a function call looks like, I will begin from the perspective of data instead of code. The development process will start with specifying data structures using XDR, then specifying an interface. Then we'll write a server, then the client (which, by the way, could very well be the same program). Our server will perform the highly contrived function of printing out some data stored as a linked list, in order to demonstrate passing somewhat complicated data to an RPC.

Here is the input to the RPC compiler, a file called "llist.x" (.x is the conventional extension for these files):

enum color {ORANGE, PUCE, TURQUOISE};

struct list {
	string data<>;
	int key;
	color col;
	list *next;
};

program PRINTER {
	version PRINTER_V1 {
		int PRINT_LIST(list) = 1;
		int SUM_LIST(list) = 2;
	} = 1;
} = 0x2fffffff;

As you can see, enums and structs can be declared using a syntax very similar to C. One obvious variation is the use of angle brackets ("<>") to denote a variable-length element. You can declare variable-length arrays of any type, which the RPC compiler will translate into a pointer together with an integer length field. Strings are null-terminated, though, so in the generated C code a string is just a char * with no separate length field (on the wire, XDR still sends the length ahead of the characters). The pointer in "list *next" is not sent as an address either: XDR treats it as optional data and transmits a flag saying whether another node follows.

Next we assign an arbitrary ID (0x2fffffff, taken from the range 0x20000000 to 0x3fffffff that is set aside for user-defined programs) and a version number (1) to our program, and indicate that our program has two RPCs, each with its own unique procedure number (actually, there is a third RPC, since the RPC compiler automatically generates a "null" procedure with number 0 for you, to be used for testing). The program ID provides a way to address your application specifically on a system, hiding the details of assigning IP ports: the server registers its program and version numbers with the portmapper (rpcbind on newer systems), which clients consult to find the right port. In short, you will be able to specify a server connection simply using a hostname and a program ID. The version number keeps clients from talking to a different, and possibly incompatible, version of the server than the one they were written for.
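
For reference, here is a sketch of roughly what llist.h declares for the types in llist.x; exact formatting differs between rpcgen versions. The samples_example struct is hypothetical, showing how a field like "int samples<>;" would be represented (rpcgen uses u_int for the length; plain unsigned int is used here so the block compiles on its own). The is_user_program_number helper is also illustrative and not part of any RPC library.

```c
#include <stdint.h>

/* Roughly what rpcgen emits for "enum color" ... */
enum color {
    ORANGE = 0,
    PUCE = 1,
    TURQUOISE = 2
};
typedef enum color color;

/* ... and for "struct list". */
struct list {
    char *data;          /* string data<>: a plain NUL-terminated char * */
    int key;
    color col;
    struct list *next;   /* list *next: NULL marks the end of the list */
};
typedef struct list list;

/* Program and version numbers become constants. */
#define PRINTER 0x2fffffff
#define PRINTER_V1 1

/* Hypothetical: a variable-length array of a non-string type, such as
   "int samples<>;", becomes a nested struct with a length and a pointer. */
struct samples_example {
    struct {
        unsigned int samples_len;
        int *samples_val;
    } samples;
};

/* Illustrative check: 0x20000000-0x3fffffff is the user-defined range. */
int is_user_program_number(uint32_t prog)
{
    return prog >= 0x20000000u && prog <= 0x3fffffffu;
}
```

The portmapper itself uses program number 100000, which falls in the range reserved for officially assigned programs rather than the user-defined one.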

Invoke the RPC compiler using "rpcgen llist.x". rpcgen spits a slew of files back out at you:

llist.h: Actual declarations in C, based on your .x file.

llist_svc.c: Implements main() for the server. Handles listening for incoming RPCs and dispatching control appropriately.

llist_xdr.c: Code to convert your data structures for network transmission.

llist_clnt.c: Functions the client can call to issue RPCs.

You should definitely read llist.h, as it contains definitions that every file in your project will need to access. Understanding the other files is less important, but it should make sense to you that llist_xdr.c will be linked with both the client and server, llist_svc.c will be linked with the server, and llist_clnt.c will be linked with the client. The client and server have a certain duality: for the client, rpcgen writes functions for making RPCs and you have to supply main(), whereas for the server, rpcgen writes main() for you and you have to write functions to service incoming RPCs. Therefore we'll be creating two more .c files: llist_svc_proc.c (implementing RPCs on the server side) and llist.c (the client).
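
To make this division of labor concrete, here is a toy, single-process model of it that uses nothing from rpcgen: the "network" is a plain function call, and the argument is an array of keys rather than a list. All toy_ names are hypothetical. The point is only which pieces you write (the _svc function on the server, main() on the client) and which pieces rpcgen writes (the server's dispatcher and the client's stub).

```c
#include <stddef.h>

/* Procedure numbers mirror llist.x; 0 is the automatic null procedure. */
enum { TOY_NULLPROC = 0, TOY_PRINT_LIST = 1, TOY_SUM_LIST = 2 };

/* Server side, written by you: the actual work. */
static int toy_sum_list_svc(const int *keys, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += keys[i];
    return sum;
}

/* Server side, written by rpcgen in the real thing: route an incoming
   request by procedure number. Returns 1 if handled, 0 if unknown. */
static int toy_dispatch(unsigned proc, const int *keys, size_t n, int *result)
{
    switch (proc) {
    case TOY_NULLPROC:
        return 1;                /* does nothing; useful as a ping */
    case TOY_SUM_LIST:
        *result = toy_sum_list_svc(keys, n);
        return 1;
    default:
        return 0;                /* PRINT_LIST is left out of this toy */
    }
}

/* Client side, written by rpcgen in the real thing: an ordinary-looking
   function that hides the transport. Like rpcgen's stubs, it returns a
   pointer to the result, or NULL if the call failed. */
int *toy_sum_list_stub(const int *keys, size_t n)
{
    static int result;
    if (!toy_dispatch(TOY_SUM_LIST, keys, n, &result))
        return NULL;
    return &result;
}
```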

Looking at the bottom of llist.h, we find function prototypes:

#define PRINT_LIST 1
extern  int * print_list_1(list *, CLIENT *);
extern  int * print_list_1_svc(list *, struct svc_req *);
#define SUM_LIST 2
extern  int * sum_list_1(list *, CLIENT *);
extern  int * sum_list_1_svc(list *, struct svc_req *);

Names ending in "_svc" are prototypes for the server functions we need to fill in. The other function in each pair is the client stub, already written for us. The "_1" in each name is the version number from llist.x, and both functions return a pointer to the result rather than the result itself. "CLIENT" is a data type specifying a connection to an RPC server, and "svc_req" gives a server function some information about an incoming request.
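
As a preview of the server side, here is a minimal sketch of sum_list_1_svc, assuming SUM_LIST is meant to add up the key fields (llist.x only gives the name). The list types are repeated from llist.h so the block compiles by itself; in the real project you would #include "llist.h" instead. The result is declared static, following the convention in rpcgen's own server templates, because the generated dispatcher serializes the result after this function has returned.

```c
#include <stddef.h>

/* Types as llist.h declares them, repeated so this block stands alone. */
typedef enum color { ORANGE, PUCE, TURQUOISE } color;
typedef struct list {
    char *data;
    int key;
    color col;
    struct list *next;
} list;

/* Only a pointer to svc_req is used, so an incomplete declaration is
   enough here; the real definition comes from <rpc/rpc.h>. */
struct svc_req;

/* Assumed behavior: sum the key fields of the list. */
int *sum_list_1_svc(list *argp, struct svc_req *rqstp)
{
    static int result;   /* must outlive this call: it is sent after we return */
    (void)rqstp;         /* request info (procedure, credentials, transport) unused */
    result = 0;
    for (list *l = argp; l != NULL; l = l->next)
        result += l->key;
    return &result;
}
```

On the client, the generated sum_list_1() stub also hands back a pointer, and returns NULL when the call fails (a timeout or an unreachable host, for example), so check it before dereferencing.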
