Code Capsules

A Better C

Chuck Allison


Chuck Allison is a regular columnist with CUJ and a Senior Software Engineer in the Information and Communication Systems Department of the Church of Jesus Christ of Latter Day Saints in Salt Lake City. He has a B.S. and M.S. in mathematics, has been programming since 1975, and has been teaching and developing in C since 1984. His current interest is object-oriented technology and education. He is a member of X3J16, the ANSI C++ Standards Committee. Chuck can be reached on the Internet at 72640.1507@compuserve.com.

Bjarne Stroustrup, the inventor of C++, has suggested three ways to look at C++:

1) as a better C

2) as a language that supports data abstraction, and

3) as an object-oriented programming language

Because C++ truly is all of these things, it is somewhat more complex than most popular programming languages, especially the other object-oriented ones. The proponents of Smalltalk, Eiffel and CLOS (object-oriented Common LISP) engage in much religious debate about purity, simplicity, and the "true" definition of the term "object-oriented." But C++ is in fact a multi-paradigm language: it supports the traditional procedural style of programming, as C and Pascal do; like Ada, it supports data abstraction and generics (templates); and it supports inheritance and polymorphism, like all the other object-oriented languages. All this may make for a somewhat impure programming language, but it also makes C++ the more practical choice for production programming. C++ typically gives the best run-time performance of these languages, functions well in mixed-language environments (not just with C, but with FORTRAN, COBOL, and other languages as well), and does not require the enormous resources that Smalltalk, Eiffel and LISP do (the latter being environments, not just compile-and-link processes).

In this article I explore the non-object-oriented features of C++ that make it "a Better C." Remember, however, that C++ came about by adding classes to C — in other words, by adding data abstraction support to C. It is difficult to motivate some of the Better-C features without showing their class-based origins. I therefore use the class mechanism of C++ liberally.

The Type System

Perhaps the most important thing to understand about C++ is its devotion to type safety. Several of the OO languages mentioned above, Smalltalk and CLOS in particular, check types mainly during execution rather than at compile time. C++, on the other hand, requires you to declare the type of every variable, and the types of the parameters and the return types of every function. It fastidiously checks your usage of these objects at compile time. It is type safety, more than anything else, that makes C++ a Better C, and the most reasonable choice for most programming tasks.

Function Prototypes

ANSI-C-style function prototypes are not optional in C++. In fact, the prototype mechanism was invented for C++ before the ANSI C committee adopted it. You must either declare or fully define each function before its first use. The compiler checks each function invocation for the correct number and type of arguments, and it performs automatic conversions where they apply. The program in Listing 1 shows a common error that occurs in C when you don't use prototypes. The function dprint expects a double argument. Without dprint's prototype in scope, the compiler cannot tell that the call dprint(123) is an error, so it passes an int where the function expects a double. When you provide the prototype for dprint, the compiler automatically converts 123 to a double for you (see Listing 2).
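The effect of a visible prototype can be sketched as follows. This is not Listing 1 or Listing 2; the function half and the check function are hypothetical names chosen for illustration.

```cpp
#include <cassert>

// Prototype: tells the compiler that half() takes a double.
double half(double x);

// Hypothetical check: the int literal 123 is converted to 123.0
// because the prototype is visible at the point of the call.
void check_prototype_conversion()
{
    assert(half(123) == 61.5);
}

double half(double x)
{
    return x / 2.0;
}
```

Calling check_prototype_conversion() from main exercises the conversion; if you remove the prototype, a C++ compiler rejects the call rather than passing the wrong type.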

Since C++ allows types that you define to behave like built-in types, it allows implicit conversions for user-defined types as well. Since the constructor for struct A in Listing 3 expects a double argument, the compiler automatically converts the integer 1 to a double in the definition for a. The call f(2) generates the equivalent of the following actions:

1. convert 2 to a double

2. initialize a temporary A object with the value 2.0

3. pass that object to f

In other words, the compiler generates code equivalent to:

f(A(double(2)));
Note C++'s function-style cast syntax. The expression

double(2)
is equivalent to

(double) 2
The compiler applies at most one implicit user-defined conversion to any single value, however. The program in Listing 4 requires you to provide a B object to initialize an A object. A B object in turn requires a double, because its only constructor is B::B(double). The expression

A a(1)
becomes:

A a(B(double(1)))
which has only one user-defined conversion. The expression f(3), however, is invalid, because it would require the compiler to provide two automatic user-defined conversions:

// Can't do both an A and a B
// conversion implicitly
f(A(B(double(3))))   // invalid
The expression f(B(3)) is okay, because you explicitly requested the conversion B(double(3)), so the compiler provides only the remaining conversion to A.
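The rule can be sketched with two small structures. This is an approximation under stated assumptions, not a copy of Listing 4: the member named value and the body of f are hypothetical.

```cpp
#include <cassert>

struct B
{
    double value;
    B(double d) : value(d) {}          // converts double -> B
};

struct A
{
    double value;
    A(const B &b) : value(b.value) {}  // converts B -> A
};

// Hypothetical function taking an A by reference-to-const.
double f(const A &a)
{
    return a.value;
}

void check_conversions()
{
    A a(1);                  // 1 -> double -> B implicitly, then A's constructor
    assert(a.value == 1.0);
    assert(f(B(3)) == 3.0);  // B requested explicitly; compiler supplies B -> A
    // f(3);                 // would not compile: needs both B and A implicitly
}
```

Uncommenting the call f(3) should produce a compile-time error, which is the point of the restriction.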

Type-Safe Linkage

C++ can even detect improper function calls across compilation units. The program in Listing 5 calls a function in Listing 6. When compiled as a C program, the situation is analogous to Listing 1, giving the output:

f:0.000000
C has no way of knowing that the f's are different. The conventional work-around is to put the correct prototype in a header file that all compilation units include. In C++, a function call will only link with a function that has the same signature, which is the combination of the function name and its sequence of argument types. When compiled as a C++ program, the output of Listing 5 and Listing 6 from Borland C++ is

Error: Undefined symbol f(int) in module safe1.cpp
Most compilers achieve this type-safe linkage by encoding the function's signature along with its name, a technique often referred to as function name encoding, name decorating, or my favorite, name mangling. For example, the function f(int) might appear to the linker as

f__Fi  // f is a function
      // taking an int
but f(double) would be

f__Fd  // f is a function
      // taking a double
Since the names are different, the linker can't find f(int) in this example, and reports an error.

References

Since C passes function parameters by value, passing large structures to functions can waste a lot of time and stack space. The typical work-around is to pass a pointer. For example, if struct Foo is a large record structure, you can do something like the following:

void f(struct Foo *fp)
{
    /* Access the structure
      through fp */
    fp->x=...
    etc.
}
You have to pass the address of a struct Foo in order to use this function, of course:

struct Foo a;
...
f(&a);
The C++ reference mechanism is a notational convenience that saves you the bother of providing explicit indirection of pointer variables. In C++, you can render the above code as:

void f(Foo &fr)
{
    /* Access members directly */
    fr.x = ...
    etc.
}
You can now call f without using the address-of operator, like this:

Foo a;
...
f(a);
The ampersand in the prototype for f instructs the compiler to pass its argument by reference, which in effect takes care of all the indirection for you. For you Pascal programmers, reference parameters are equivalent to VAR parameters.

Call-by-reference means that any changes you make to a function parameter affect the original argument in the calling program. Thus, you can write a swap function (not a macro) that actually works (see Listing 7). If you don't plan on modifying a reference argument, declare it a reference-to-const, as I did in Listing 4. A reference-to-const argument has the safety and notational convenience of call-by-value, and the efficiency of call-by-reference.
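Both uses of references can be sketched in a few lines. The names swap_ints, Big, and first are hypothetical and are not taken from Listing 7.

```cpp
#include <cassert>

// Exchanges the caller's variables; no pointers or address-of needed.
void swap_ints(int &x, int &y)
{
    int temp = x;
    x = y;
    y = temp;
}

struct Big { double data[100]; };  // hypothetical large structure

// Reference-to-const: no copy is made, and the function cannot modify b.
double first(const Big &b)
{
    return b.data[0];
}

void check_references()
{
    int i = 1, j = 2;
    swap_ints(i, j);
    assert(i == 2 && j == 1);

    Big b = {};            // zero-initialized aggregate
    b.data[0] = 3.5;
    assert(first(b) == 3.5);
}
```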

As Listing 8 illustrates, you can also return an object from a function by reference. It may look strange to have a function call on the left-hand side of an assignment, but this comes in handy when overloading operators (especially operator= and operator[] when used as member functions).
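A function call on the left-hand side of an assignment might look like the following sketch. The Table structure and its at member are hypothetical and stand in for whatever Listing 8 uses.

```cpp
#include <cassert>

// Hypothetical fixed-size table; at() returns a reference to an element.
struct Table
{
    int data[4];
    int &at(int i) { return data[i]; }
};

void check_return_by_reference()
{
    Table t = { {0, 0, 0, 0} };
    t.at(2) = 42;          // the call yields an lvalue, so it can be assigned to
    assert(t.data[2] == 42);
}
```

An overloaded operator[] would typically return a reference for the same reason.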

Type-Safe I/O

I'm sure every C programmer has been bitten by using incorrect format descriptors in printf. printf has no way to check if the data items you pass it match your format string. How often have you done something like the following, only to discover the problem at run time:

double d;
...
printf("%d\n",d); /* should've used
                 %f */
The C++ IOStreams library, on the other hand, uses the object's type to determine the proper formatting:

double d;
...
cout << d << endl;  // can't fail
There is no way for the output stream to misinterpret your value. If you want to print floating-point numbers with a fixed precision, you can say so just once:

double x = 1.5, y = 2.5;
cout.setf(ios::fixed, ios::floatfield);  // Fixed-point notation
cout.precision(2);                       // Show 2 decimals
cout.setf(ios::showpoint);               // Preserve trailing 0's
cout << x << endl;                       // prints 1.50
cout << y << endl;                       // prints 2.50
The token endl is a manipulator, a special object you insert into a stream to create a side effect; in this case, it starts a new line and flushes the output buffer. For more information on IOStreams, see the Code Capsule in the July 1993 issue of CUJ.

Function Overloading and Templates

The swap function in Listing 7 is useful only if you want to swap integers. What if you want to swap two objects of any built-in type? C++ allows you to define multiple functions of the same name, as long as their signatures are different. Therefore you can define a swap for all built-in types:

void swap(char &, char &);
void swap(int &, int &);
void swap(long &, long &);
void swap(float &, float &);
void swap(double &, double &);
etc.
You can then call swap for any two objects of the same built-in type. If you were to implement each of these functions, however, before long you would discover that you were doing the same thing over and over — the only thing that changes is the type of the objects you want to swap. To save tedium and the chance of making a silly mistake, you can define a function template instead. As Listing 9 demonstrates, you preface the function with the phrase

template<class T>
which says that in the code that follows, the token T stands for an arbitrary data type, either built-in or user-defined. You then replace all occurrences of the data type of the objects to be swapped with the template parameter T. When the compiler sees a call to swap, it instantiates the appropriate function, inferring the type from the type of the operands. In other words, when the compiler sees

swap(i, j);
it actually generates the code for swap(int &, int &), as if you had created it yourself (but without human error).
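A function template along these lines might look like the sketch below. It is named swap_any (a hypothetical name) rather than swap, and it is not a reproduction of Listing 9.

```cpp
#include <cassert>

// T stands for whatever type the two arguments share.
template<class T>
void swap_any(T &x, T &y)
{
    T temp = x;
    x = y;
    y = temp;
}

void check_template_swap()
{
    int i = 1, j = 2;
    swap_any(i, j);              // instantiates swap_any<int>
    assert(i == 2 && j == 1);

    double a = 1.5, b = 2.5;
    swap_any(a, b);              // instantiates swap_any<double>
    assert(a == 2.5 && b == 1.5);
}
```

A call such as swap_any(i, a) would not compile, because the compiler cannot deduce a single type T from an int and a double.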

Operator Overloading

You can also overload operators in C++. For example, suppose you define a complex number data type as:

struct complex
{
   double real, imag;
};
It would be quite convenient if you could use infix notation for adding complex numbers, such as:

complex c1, c2;
...
complex c3 = c1 + c2;
Operator overloading enables you to do just this. When the compiler encounters an expression such as c1 + c2, it looks for one of the following two functions:

operator+(const complex &, const complex &);
complex::operator+(const complex &);
The operator keyword is part of the function name. You could define a global operator+ for adding two complex numbers like this:

complex operator+(const complex &c1, const complex &c2)
{
   complex r;
   r.real = c1.real + c2.real;
   r.imag = c1.imag + c2.imag;
   return r;
}
The compiler will not allow you to overload built-in operations, such as addition of two ints; at least one of the operands must therefore be of a user-defined type.

The IOStreams library uses operator overloading to determine how to format the various built-in types. For example, the ostream class, of which cout is an instance, overloads operator<< for all the built-in types. When the compiler sees the expression

cout << i;        // where i is an int
it generates the following function invocation

cout.operator<<(i);
which formats the number correctly.

Listing 10 shows how to extend IOStreams by overloading operator<< for complex numbers (sample output in Listing 11). The compiler transforms the expression

cout << c    // c is a complex number
into the function call

operator<<(cout, c)
which invokes operator<<(ostream&, const complex&). This function in turn breaks the operation down into formatting objects of built-in types. This function also returns the stream so that you can chain multiple stream insertions in a single statement. For example, the expression

cout << c1 << c2
becomes

operator<<(operator<<(cout,c1), c2)
which requires that operator<<(ostream&, const complex&) return the stream. The operator returns the stream by reference for efficiency.
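The chaining described above can be sketched as follows. The output format (real,imag) is an assumption made for illustration and may differ from Listing 10; a string stream stands in for cout so that the result can be inspected.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

struct complex
{
    double real, imag;
};

// Writes (real,imag) and returns the stream so insertions can be chained.
std::ostream &operator<<(std::ostream &os, const complex &c)
{
    return os << '(' << c.real << ',' << c.imag << ')';
}

void check_stream_insertion()
{
    complex c1 = {1.0, 2.0};
    complex c2 = {3.0, 4.0};
    std::ostringstream out;       // stands in for cout
    out << c1 << c2;              // operator<<(operator<<(out, c1), c2)
    assert(out.str() == "(1,2)(3,4)");
}
```

If the operator returned void instead of the stream, the second insertion would have nothing to operate on and the chained statement would not compile.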

Inline Functions

The inline keyword, seen in Listing 10, tells the compiler that you want the code "inlined" for efficiency. Each call to an inline function inserts the appropriate code inline, avoiding the usual overhead of an actual function call. This mechanism is different from a function-like macro, which performs text substitution before program translation. Inline functions have all the type checking and semantics of true functions, without the sensitivity to side effects that macros have. For example, you might define a macro to find the smaller of two numbers as follows:

#define min(x,y) ((x) < (y) ? (x) : (y))
which fails miserably with an incremented argument, such as

min(x++,y++)
Inline functions don't have this problem, since they behave like real functions.
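The difference shows up as soon as the arguments have side effects. In this sketch the names MIN_MACRO and min_inline are hypothetical, chosen so the two versions can coexist in one file.

```cpp
#include <cassert>

#define MIN_MACRO(x, y) ((x) < (y) ? (x) : (y))

inline int min_inline(int x, int y)
{
    return x < y ? x : y;
}

void check_inline_vs_macro()
{
    int a = 1, b = 5;
    int m = MIN_MACRO(a++, b++);  // expands to ((a++) < (b++) ? (a++) : (b++))
    assert(m == 2);               // the selected operand was evaluated again
    assert(a == 3 && b == 6);     // a was incremented twice

    int c = 1, d = 5;
    int n = min_inline(c++, d++); // each argument is evaluated exactly once
    assert(n == 1);
    assert(c == 2 && d == 6);
}
```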

Not all functions can or should be inlined, however. Certainly a recursive function doesn't qualify for inlining. Large functions can increase code size substantially when inlined. Inlining is mainly for small, simple functions.

Default Arguments

Default arguments let the compiler fill in omitted trailing arguments with values taken from the function's prototype. The program in Listing 12 has a function with the prototype:

int minutes(int hrs, int min = 0);
The "= 0" after the last parameter instructs the compiler to supply the value 0 for the second argument when you omit it in a program. This mechanism is essentially a shorthand for defining related overloaded functions. In this case, it is a short cut for the following:

int minutes(int hrs, int min);
int minutes(int hrs);     // ignores minutes
The complex constructor in Listing 10 uses default arguments to allow you to define a complex number with 0, 1, or 2 arguments, as in

complex c1;        // (0,0)
complex c2(1);     // (1,0)
complex c3(2,3);   // (2,3)
The third form is the one used in the return statement of operator+ in Listing 10.
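A function with the minutes prototype could be sketched like this. The body is an assumption (it converts hours and minutes to total minutes) and is not taken from Listing 12.

```cpp
#include <cassert>

// The default value appears in the prototype only.
int minutes(int hrs, int min = 0);

// Assumed behavior: total number of minutes.
int minutes(int hrs, int min)
{
    return hrs * 60 + min;
}

void check_default_arguments()
{
    assert(minutes(2, 30) == 150);
    assert(minutes(2) == 120);     // compiler supplies min = 0
}
```

Note that the default is stated once, in the declaration; repeating it in the definition is an error.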

new and delete

To use the heap in C, you need to compute the size of the object you want to create:

struct Foo *fp = malloc(sizeof(struct Foo));
In C++, the new operator computes the size of an object for you:

Foo *fp = new Foo;
To allocate an array in C, you call a different function:

struct Foo *fpa = calloc(n,sizeof(struct Foo));
In C++, new knows about arrays:

Foo *fpa = new Foo[n];
In addition, the new operator automatically invokes the appropriate constructor to initialize the object(s) before it returns you the pointer. For example, creating complex numbers on the heap automatically initializes them, as in

complex *cp1 = new complex;        // -> (0,0)
complex *cp2 = new complex(1);     // -> (1,0)
complex *cp3 = new complex(2,3);   // -> (2,3)
To return dynamic memory to the heap, you use the delete operator, which comes in two forms. For singleton objects you use this one:

delete fp;
delete cp1;
but deleting arrays requires the second form, with the following syntax:

delete [] fpa; // array-delete syntax
Like other C++ features, new and delete improve the type safety of your programs — you aren't just asking for an amount of memory, you are requesting objects, with the appropriate type-checking and initialization.
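Both forms of new and delete can be sketched with a small complex type. The constructor with default arguments is modeled on the description of Listing 10 above, but the code here is an assumption, not that listing.

```cpp
#include <cassert>

struct complex
{
    double real, imag;
    complex(double r = 0, double i = 0) : real(r), imag(i) {}
};

void check_new_and_delete()
{
    complex *cp = new complex(2, 3);   // constructor runs on the new object
    assert(cp->real == 2 && cp->imag == 3);
    delete cp;                         // single-object form

    complex *cpa = new complex[4];     // default constructor runs per element
    assert(cpa[3].real == 0 && cpa[3].imag == 0);
    delete [] cpa;                     // array form
}
```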

Declaration Statements

In C++, a declaration can appear anywhere a statement can. Instead of having to group declarations at the beginning of a block, you can declare objects at their point of first use. For example, in Listing 13 the array a is visible throughout the function body, but n is not valid until its declaration, and i not until the next line. With current compilers all three objects persist until the end of the block, which is why i is not redeclared in the second for-loop. The C++ standard now states, however, that the scope of variables declared in a control loop is the loop itself. In the future, therefore, you will have to redeclare i in the second loop, as in

for (int i = n-1; i >= 0; --i)
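Under the rule stated in the standard, two loops in the same block each declare their own counter. The function below is a hypothetical sketch, not Listing 13, and assumes a compiler that follows the standard's scoping.

```cpp
#include <cassert>

// Adds the elements once forward and once backward.
int sum_both_ways(const int a[], int n)
{
    int total = 0;
    for (int i = 0; i < n; ++i)        // this i is local to the first loop
        total += a[i];
    for (int i = n - 1; i >= 0; --i)   // so the second loop declares its own i
        total += a[i];
    return total;
}

void check_loop_scope()
{
    int a[] = {1, 2, 3};
    assert(sum_both_ways(a, 3) == 12);
}
```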

Static Initialization

In C, variables of static storage duration (those declared at file scope or with the static keyword) are initialized at program startup and persist throughout program execution. Since such objects can use only constant expressions as initializers, this isn't a challenge for compiler writers to implement. Because objects in C++ usually require constructors, however, static objects must be able to call functions in their initialization. For this reason, C++ allows declarations at file scope such as

double x = sqrt(5);
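The same idea works with your own functions. In this sketch, golden_ratio and phi are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper called during static initialization.
double golden_ratio()
{
    return (1.0 + std::sqrt(5.0)) / 2.0;
}

// File-scope object whose initializer is a function call, not a constant.
double phi = golden_ratio();

void check_static_initialization()
{
    assert(phi > 1.618 && phi < 1.619);
}
```

A C compiler would reject the definition of phi, because the initializer is not a constant expression.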

C Compatibility

To accommodate strong type-checking and object-orientation, C++ has had to part ways with C on a few language issues. If you are going to use C++ as a better C, you should be aware of those features that behave differently in the two languages.

First of all, C++ has more keywords than C. You must avoid using any of the tokens in Table 1 as identifiers in your programs.

You can use const integer objects and enumerated constants in array declarations in C++, as in

const int SIZE = 100;
enum {BIGGER = 1000};
int a[SIZE], b[BIGGER];
Global const declarations have internal linkage by default, whereas in C they have external linkage. As a result, you can use const definitions at file scope in C++, in place of #define macros in header files. If you want a const object to have external linkage, you must use the extern keyword.

In C, you can assign a pointer to any type to and from a void *. This allows you to use malloc without a cast, as in

#include <stdlib.h>
...
char *p = malloc(strlen(s) + 1);
The C++ type system will not allow you to assign from a void pointer without a cast. For the example above, you should use the new operator anyway.

If you omit the parameter list in a function declaration in C, the compiler does not check how you call that function (i.e., you can pass any number and type of arguments to it). In C++, the prototype f() is equivalent to f(void). If you insist upon the unsafe C behavior, use f(...).

And finally, single-quoted character constants are of type char in C++, not int. Otherwise, the expression

cout << 'a'
would print the internal character code (e.g., 97 for ASCII) instead of the letter a.
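You can observe the type of a character constant directly. This sketch uses a string stream in place of cout so that the inserted text can be checked; the function name is hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>

void check_char_literal()
{
    // In C++, 'a' has type char; in C it has type int.
    assert(sizeof('a') == sizeof(char));

    std::ostringstream out;   // stands in for cout
    out << 'a';               // selects the char overload of operator<<
    assert(out.str() == "a");
}
```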

Summary

You might think that the key distinction between C and C++ is support for object-oriented programming, but even more crucial is the issue of type safety. Not all programs are object-oriented programs. Even if you don't use classes and other object-oriented mechanisms, using C++ as a type-safe C will help make you a safer programmer.