Code Capsules

The Standard C++ Library

Chuck Allison


Chuck Allison is a regular columnist with CUJ and a Senior Software Engineer in the Information and Communication Systems Department of the Church of Jesus Christ of Latter Day Saints in Salt Lake City. He has a B.S. and M.S. in mathematics, has been programming since 1975, and has been teaching and developing in C since 1984. His current interest is object-oriented technology and education. He is a member of X3J16, the ANSI C++ Standards Committee. Chuck can be reached on the Internet at 72640.1507@compuserve.com.

The C and C++ standard libraries have both evolved from efforts to make new functionality available to programmers without the addition of new language features. It is much easier to use a language's existing facilities to create a standard library than it is to invent a new language construct with its associated syntax rules and potential interactions with the rest of the language. The widespread success of the Standard C library shows how far you can go with just the concepts of functions and separate compilation. It's no surprise, then, that C++ features such as classes, templates, namespaces, exceptions, and run-time type identification make for powerful libraries. The Standard C++ library, and the commercial libraries that use it, provide a framework for tackling very complex programming tasks. We have a long way to go before we need a more powerful language. In this survey article I'll share how the Standard C++ library came to be, who the major players were, and then I'll illustrate some of its major components.

History

The very first C++ library was Bjarne Stroustrup's task library, a collection of classes that support coroutines. Coroutines provide a technique for implementing synchronous, concurrent functions. Note the present tense in "support"; the task library is still in use today, and comes with AT&T's release of C++. Release 1.0 of C++ in 1985 included only the task library along with a complex number class and the streams library, which Bjarne also wrote.

Jonathan Shopiro developed the first string and list classes, which eventually led to the development of the container classes now found in the AT&T Standard C++ Components, a library of more than 50 classes. By the time Release 2.0 came out in 1989, the Components were standard AT&T issue. The major alteration to the library occurred in the stream classes, which came to be known as iostreams in Release 2.0. Jerry Schwarz of AT&T redesigned streams to make use of multiple inheritance and virtual base classes, which were key 2.0 innovations.

When ANSI Committee X3J16 convened in 1989, the plan for the Standard C++ library was to at least standardize the stream classes. Under the leadership of Mike Vilot, the Library Working Group emerged with more specific direction at the close of the third meeting in July 1990: 1) to standardize the stream classes, 2) to define mechanisms to support language features such as the new and delete operators and exceptions, 3) to define the relationship between the Standard C library and the C++ library, and 4) to define a standard string class. By the end of 1990, Jerry had initiated a proposal to standardize iostreams, Steve Clamage of TauMetric was defining the relationship between the C and C++ libraries, and Aron Insinga of Digital had made a good start on a standard string class.

Shortly after I joined the committee early in 1991, I accepted the assignment to review a number of commercial class libraries, just to see what "existing practice" was. All vendors offered at least one string class and a number of container classes, such as lists, vectors, queues, stacks, maps, and sets. Many vendors also supported bitsets. By the end of 1991 the C library had become an integral part of the C++ library (modulo a few tunings in the direction of type-safety), the language support functions and iostreams were still under construction, and I was working on classes to support bitsets. Uwe Steinmueller of Siemens-Nixdorf and Pete Becker of Borland took over the string class, since Aron was no longer able to attend meetings of the committee. Uwe also volunteered to author a dynamic array specification.

At the London meeting in March of 1992, the Library WG got the grandiose idea of having a set of related classes to support strings over a spectrum of fundamental data types: bits, bytes, characters, and wide characters. The latter two were to include very high-level operations for international text processing. This "text class" work turned out to be a little too ambitious for the committee and was abandoned in July 1993. The major library milestone for 1992 was the acceptance of the language support functions for exception handling and memory management, which was mainly the work of Mike Vilot.

The library finally started taking shape in 1993. The string and bit handling classes and Jerry's revision of iostreams were approved in March, the dynamic array classes and Beman Dawes' wide character version of the string class in July, and some complex number classes designed by Al Vermulen of Rogue Wave in November. We also accepted some useful new iostream manipulators proposed by Bruce Eckel.

Some major new work also began in 1993. For some time a number of committee members had been requesting a language change to allow for "smart arrays" — arrays with intelligent automatic memory management for objects with value semantics. When it became clear that the committee as a whole wasn't interested in such a language extension, Kent Budge of Sandia Labs and Al Vermulen came forth with a library solution. Their val_array class is a smart array class with support for most of the mathematical operations needed for scientific applications.

About the same time, Tom Keffer, president of Rogue Wave, developed an error handling model that would guide the design of exception classes used in the standard library. Standard exception classes derive from one of two base classes: logic_error, which describes errors that the programmer could have prevented, and runtime_error, which describes unforeseen errors, such as memory exhaustion. (See Listing 1.)

The final innovation of 1993 was the Standard Template Library (STL), a proposal from Alex Stepanov and Meng Lee of Hewlett-Packard. STL is a collection of algorithms and data structures, including a useful set of container classes: vector, list, deque, set, multiset, map, and multimap. The algorithms are designed to work on both built-in and user-defined types uniformly. As STL is integrated into the standard library, it will likely displace dyn_array and bitstring, since it has an expandable vector template class, along with a specialization for a vector of boolean objects.

STL fills some gaps that have long plagued the draft Standard C++ library. Of C++ Release 1.0, Bjarne has said, "To my mind, there really is only one contender for the title of Worst Mistake. Release 1.0 ... should have been delayed until a larger library including some fundamental classes such as singly and doubly linked lists, an associative array class, a range-checked array class, and a simple string class could have been included. The absence of those led to everybody reinventing the wheel and to an unnecessary diversity in the most fundamental classes." [1] It looks like STL is the finishing touch that has made the C++ standard library the robust collection of foundation classes Bjarne wished for. (NOTE: Pete Becker will be giving a full-day tutorial on STL at "C/C++ Solutions '95," a seminar sponsored by R&D Publications, in Kansas City, Jan. 26-27, 1995.)

The July 1994 meeting in Waterloo, Ontario was the last chance to add anything to Standard C++. STL and val_array were accepted, and the string and complex number classes were "templatized." Prior to the July meeting, the library included two nearly identical string classes, string and wstring. The only substantive difference between them was that string operated on sequences of type char while wstring dealt with type wchar_t. Both classes are now replaced by instantiations of the template class basic_string, which can be written roughly as:

    template<class T> class basic_string{...};
    typedef basic_string<char> string;
    typedef basic_string<wchar_t> wstring;
Templatizing the complex number classes is not so simple; it requires a change to the language. Instead there are three classes:

    class float_complex;
    class double_complex;
    class long_double_complex;
These classes are defined with mutual implicit conversions to accommodate mixed expressions. For example, in the expression:

    float_complex f;
    double_complex d;
    // ...
    f + d ...
f is converted to a double_complex by the constructor

   double_complex::double_complex(float_complex);
If you try to define a complex number class template, you can't include such conversions. To see why, consider the following class template:

template <class T> class complex
{
   T real, imag;

public:
   template<class T2> complex(const complex<T2>& c)
     : real(c.real), imag(c.imag)
   {}
   // ...
};
The constructor definition above is invalid in current C++ compilers, because the Annotated Reference Manual (ARM — the ANSI C++ base document) does not allow nesting of template declarations. If this definition were valid, it would allow conversion from one complex number type to another only if T2 converted to a T, which would be a reasonable conversion to make. This nested template mechanism is called a member template and became part of the language at the end of 1993. STL relies heavily on member templates. But the complex classes have not (yet) been replaced by a single template.
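For readers whose compilers do accept member templates, the conversion idea can be sketched as follows. This is only an illustration: the class name Complex, the accessors real() and imag(), and the helper add() are hypothetical and are not taken from the draft. The sketch reads the source object through public accessors because Complex<T2> is a different class from Complex<T>, so its private members are not reachable from the converting constructor.

```cpp
// Hypothetical class for illustration only; it is named Complex so that it
// cannot be confused with any library-supplied complex number class.
template <class T>
class Complex {
    T re_, im_;

public:
    Complex(T re = T(), T im = T()) : re_(re), im_(im) {}

    // Member template: converts from Complex<T2> whenever T2 converts to T.
    // Public accessors are used because Complex<T2> is a distinct class.
    template <class T2>
    Complex(const Complex<T2>& c) : re_(c.real()), im_(c.imag()) {}

    T real() const { return re_; }
    T imag() const { return im_; }
};

// Hypothetical helper: a Complex<float> argument is widened to
// Complex<double> by the member template before the addition happens.
inline Complex<double> add(const Complex<double>& a, const Complex<double>& b) {
    return Complex<double>(a.real() + b.real(), a.imag() + b.imag());
}
```

With these definitions, a call such as add(f, d) with a Complex<float> f and a Complex<double> d compiles because f is converted implicitly, which is the mixed-expression behavior the three separate classes provide through ordinary constructors.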

The first draft of the C++ standard, the "Committee Draft" (CD), was finished in September 1994, and is on its way to ISO. The remainder of this article summarizes some of the major components of the standard C++ library as found in the CD. Remember that most of these features are not yet available in any current C++ compiler. For more detailed information, see P. J. Plauger's new book, The Draft Standard C++ Library [2].

Using the Standard C Library in C++

The C library was part of C++ from the beginning. There are some "gotchas" when using the C library in a C++ environment, however. For example, the Standard C header <string.h> declares the following function:

   char *strchr(const char *s, int c);
This function returns a pointer to the first occurrence of c in the string s, but it returns a non-const pointer. That non-const pointer will let you do something weird like this:

    const char *s;
    int c;
    //...
    *strchr(s,c)= '!';    /* Probably an error */
You wouldn't have declared the string that s points to as const if you had wanted to change it. This example reveals a hole in C's type system. Since C++ encourages a "const-correct" style of programming, conforming C++ compilers will provide a modified version of <string.h> that contains the following two overloaded prototypes of strchr, in place of the original C prototype:

    const char * strchr(const char *, int);
         char * strchr(char *, int);
The appropriate version executes depending on the const-ness of the first argument, rendering the spurious assignment in the example a compile-time error. Conforming C++ compilers will apply this same treatment to the functions memchr, strpbrk, strrchr, and strstr.
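A small sketch of how the two overloads behave in practice follows. It assumes a library that supplies both overloads through <cstring>. The helper names replace_first and find_pos are hypothetical.

```cpp
#include <cstddef>
#include <cstring>

// Writable buffer: the non-const overload is selected, so the result
// may be assigned through.
bool replace_first(char* s, char from, char to) {
    char* p = std::strchr(s, from);
    if (!p) return false;
    *p = to;
    return true;
}

// Read-only string: the const overload is selected and returns
// const char*, so an assignment through the result would not compile.
std::ptrdiff_t find_pos(const char* s, char c) {
    const char* p = std::strchr(s, c);
    // *p = '!';   // error: assignment of read-only location
    return p ? p - s : -1;
}
```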

Conditions for using the longjmp function defined in <setjmp.h> have changed slightly in C++. Calling longjmp is undefined if any automatic objects defined between the target and the jump point would be destroyed if an exception had been thrown instead. You should be using exceptions anyway. (See the Code Capsule "C++ Exceptions," CUJ, July 1994.)

You must also be careful if you use the offsetof macro, which calculates the relative position of members in a structure. Since C++ objects can have pointers to virtual function tables (and who knows what else), you can only use offsetof safely on plain old data structures, i.e., objects without member functions.
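As a brief illustration of the safe case, here is offsetof applied to a plain structure. The Record type and the two helper functions are hypothetical, and the exact offsets beyond the first member depend on the implementation's padding rules.

```cpp
#include <cstddef>

// Hypothetical plain-old-data structure: no virtual functions, no base
// classes, only simple data members.
struct Record {
    int    id;
    double balance;
    char   tag[8];
};

// The first member of such a structure sits at offset zero.
std::size_t id_offset()  { return offsetof(Record, id); }

// Later members follow, possibly after padding chosen by the compiler.
std::size_t tag_offset() { return offsetof(Record, tag); }
```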

C++ reserves a few keywords that are mere macros in C. Although most C programmers are unaware of it, an addendum to ISO C approved this year (the "Normative Addendum") defines a number of features that promote international portability of C programs. These features include more multibyte and wide-character processing functions, and new language tokens. The header <iso646.h> defines macros which compensate for foreign keyboards (see Listing 2). These macros make for more readable programs than their trigraph equivalents, shown in Table 1. (Wouldn't you rather use "or" instead of "??!"?) If you didn't #include <iso646.h>, you could use these tokens as variable names in a C program. In C++, these tokens are keywords instead of macro definitions, so you can't use them as variable names under any circumstances.
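The following sketch shows the tokens used as C++ keywords, with no header included. It assumes a compiler that implements them. The function names are hypothetical.

```cpp
// and, or, and not are spelled directly here; no #include is needed
// because in C++ they are part of the language rather than macros.
bool in_range(int x, int lo, int hi) {
    return not (x < lo or x > hi);
}

bool is_blank(char c) {
    return c == ' ' or c == '\t';
}

bool both_positive(int a, int b) {
    return a > 0 and b > 0;
}
```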

Finally, be aware that the token wchar_t is a keyword in C++. In C, it is a typedef, defined in <wchar.h>, <stdlib.h>, and <stddef.h>. In C++, wchar_t names a type that is distinguishable from other integral types during function overload resolution. In all other respects, however, wchar_t behaves like an ordinary integral type (usually 16-bit). (For more information about trigraphs see "Code Capsules: The Preprocessor," CUJ, March 1994.)
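Because wchar_t is a distinct type, it can take part in overloading alongside the other integral types. A minimal sketch, using a hypothetical overloaded function kind:

```cpp
#include <string>

// Three overloads that differ only in an integral parameter type. If
// wchar_t were just a typedef for another integral type, the wchar_t
// overload could collide with an overload on that underlying type.
std::string kind(char)    { return "char"; }
std::string kind(wchar_t) { return "wchar_t"; }
std::string kind(int)     { return "int"; }
```

A wide character literal such as L'x' selects the wchar_t overload, while 'x' and 120 select the char and int overloads respectively.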

Iostreams

The iostreams classes are essentially the same as in Release 2.0. One notable difference is the absence of the classes iostream, fstream, and strstream in the CD. The committee omitted these classes to avoid the use of multiple inheritance in the standard library.

If you want to open a stream for simultaneous I/O, you connect an input stream and an output stream to the same streambuf, as in:

    ofstream f("myfile", ios::in | ios::out);
    ifstream g(f.rdbuf());
The rdbuf() member function returns a pointer to a stream's underlying streambuf object. Another alternative is to create a filebuf separately and connect streams to it:

    filebuf fb("myfile",ios::in | ios::out);
    ostream f(&fb);
    istream g(&fb);
Because those three I/O classes have been in the streams library for so long, there is some talk of putting them back in.
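The shared-buffer technique can be tried without touching the file system by substituting an in-memory buffer for the filebuf. The sketch below assumes the stringbuf class of the library as later standardized, which is not one of the classes discussed above. The function round_trip is hypothetical.

```cpp
#include <istream>
#include <ostream>
#include <sstream>

// Writes two integers through an output stream and reads them back
// through a separate input stream attached to the same buffer.
// Returns the sum of the values read.
int round_trip(int a, int b) {
    std::stringbuf buf;        // in-memory stand-in for a filebuf
    std::ostream out(&buf);    // output side
    std::istream in(&buf);     // input side, same underlying buffer

    out << a << ' ' << b;

    int x = 0, y = 0;
    in >> x >> y;
    return x + y;
}
```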

While the CD dropped a few classes from iostreams, it added some new manipulators (see Table 2). These new manipulators make a number of stream insertions more convenient. For example, instead of setting the floatfield flag to scientific notation mode like this:

   float x;
   cout.setf(ios::scientific, ios::floatfield);
   cout << x;
you can do it in-line like this:

   float x;
   cout << scientific << x;
You can find a detailed description of iostreams in the Code Capsule "C++ Streams," CUJ, July 1993.
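To see that the two forms are interchangeable, the sketch below formats the same value both ways into a string. It assumes the ostringstream class and the std namespace of the library as later standardized. The function names are hypothetical.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Sets the floatfield flag explicitly before inserting the value.
std::string with_flag(double x) {
    std::ostringstream os;
    os.setf(std::ios::scientific, std::ios::floatfield);
    os << x;
    return os.str();
}

// Uses the scientific manipulator in-line instead.
std::string with_manipulator(double x) {
    std::ostringstream os;
    os << std::scientific << x;
    return os.str();
}
```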

Locales

A locale in Standard C is a set of preferences that regulate the processing and display of information that is sensitive to culture, language, or national origin. Date and monetary formats are examples of locale-sensitive information. Standard C defines five categories of information, named by macros defined in <locale.h>, which locales affect (listed in Table 3). Each of these five categories can be set to a different locale ("American," "Italian," etc.). For want of a better term, I call the collection of settings for all categories the locale profile.

Standard C specifies two functions that deal with locales directly:

    struct lconv *localeconv(void);
which returns a static structure containing settings for the LC_MONETARY and LC_NUMERIC categories, and

    char *setlocale(int category, const char *locale);
which changes the locale for the given category to that specified in locale. You can set all categories to the given locale by specifying the category LC_ALL. If locale is a null pointer, setlocale returns the current locale string for the category. All implementations must support the minimalist "C" locale, as well as a native locale named by the empty string (which may be the same as the "C" locale). Unfortunately, few U.S. vendors provide any additional locale support.
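A short sketch of the two functions in use follows. It restricts itself to the "C" locale, since that is the only named locale every implementation must provide. The helper names are hypothetical.

```cpp
#include <clocale>
#include <string>

// Selects the "C" locale for all categories, then queries the current
// setting by passing a null pointer as the locale argument.
std::string select_c_locale() {
    std::setlocale(LC_ALL, "C");
    const char* name = std::setlocale(LC_ALL, NULL);   // query only
    return name ? name : "";
}

// Reads one LC_NUMERIC setting from the structure localeconv returns.
std::string decimal_point() {
    return std::localeconv()->decimal_point;
}
```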

In July 1993 I proposed that each stream have an associated locale object to maintain its own locale profile. Nathan Myers of Rogue Wave subsequently engineered a locale class specification which is extremely well integrated with the rest of the library. The committee accepted his solution early in 1994. Each stream begins with a locale object set to the classic "C" locale. You can "imbue" a stream with a different locale profile at any time, for example:

    ostream str;
    locale loc;
    ...
    // Set locale
    str.locale(loc);

    // Get locale
    locale loc2 = str.locale();
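For comparison, here is the same set-and-get sequence written against the library as it was later standardized. Note the assumption: in that library the member functions are spelled imbue and getloc rather than the draft spelling shown above. The function imbued_name is hypothetical.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Imbues a stream with the classic "C" locale and reads the locale back.
std::string imbued_name() {
    std::ostringstream os;
    std::locale loc = std::locale::classic();   // the "C" locale

    os.imbue(loc);                     // set the stream's locale
    std::locale loc2 = os.getloc();    // get the stream's locale

    return loc2.name();
}
```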

Language Support

The Standard C++ library defines a small number of functions to support three important language features: memory management, exception handling, and run-time type identification. The memory management functions are defined in <new> (see Listing 3). When you use the new operator to allocate dynamic memory, two things happen: 1) the compiler computes the number of bytes needed and calls ::operator new(size_t) or ::operator new[](size_t) (or their class-specific counterparts) to allocate the memory, and 2) the compiler calls a suitable constructor.

Overriding the functions declared in <new> allows you to usurp step 1 by replacing the default memory management provided by the compiler.

When a dynamic object is deleted, its destructor executes and its memory is freed via the appropriate version of operator delete(void *). Along similar lines, Laura Yaker of Mentor Graphics developed the specification for operator new[]() and operator delete[](), which allocate and free array memory. For more detailed information on overriding the various incarnations of operator new(), see last month's Code Capsule ("Memory Management in C++," CUJ, November 1994).
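The class-specific counterparts mentioned above are the easiest way to watch step 1 happen. In the sketch below, the class Tracked and its counters are hypothetical, and malloc and free merely stand in for whatever allocator a real class might use.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical class that takes over its own allocation and counts calls.
struct Tracked {
    static int allocations;
    static int releases;
    int value;

    // Called by a new-expression before the constructor runs.
    static void* operator new(std::size_t n) {
        void* p = std::malloc(n);
        if (!p) throw std::bad_alloc();
        ++allocations;
        return p;
    }

    // Called by a delete-expression after the destructor runs.
    static void operator delete(void* p) {
        if (p) ++releases;
        std::free(p);
    }
};

int Tracked::allocations = 0;
int Tracked::releases = 0;
```

Each new Tracked followed by delete increments both counters once, which makes this kind of class a simple leak detector.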

When memory is exhausted, the default behavior specified by the CD is to throw a bad_alloc exception, not to return a null pointer. You can intercept this operation by providing your own new handler. Just call set_new_handler, and provide the address of your new handler function:

    #include <new>
    void my_handler(void);
    new_handler old_handler;
    old_handler = set_new_handler(my_handler);
Since set_new_handler returns the current new handler's address, you can easily restore the default new handler, and resume throwing bad_alloc exceptions when things go awry:

    set_new_handler(old_handler);
Unless you can free up some memory somehow, the usual thing to do in a new handler is to abort the program.
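The install-and-restore sequence can be checked with a sketch like the following, which assumes the std namespace of the library as later standardized. The handler simply aborts and is never actually invoked here. The function install_and_restore is hypothetical.

```cpp
#include <cstdlib>
#include <new>

// A handler with nothing to free: give up.
void my_handler() {
    std::abort();
}

// Installs my_handler, then puts the previous handler back. Returns true
// if the second call reports my_handler as the handler it replaced.
bool install_and_restore() {
    std::new_handler old_handler = std::set_new_handler(my_handler);
    std::new_handler replaced    = std::set_new_handler(old_handler);
    return replaced == my_handler;
}
```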

Language Support for Exceptions

When an exception is thrown, the run-time mechanism searches for a handler that matches the type of the exception (either an exact match or an accessible base class). When the system can't find a handler for an exception, it calls the standard library function terminate(), which aborts the program. You can substitute your own termination function by passing it as a parameter to the set_terminate() library function, defined in the standard header <exception>.

You can enumerate the exceptions that a function will throw with an exception specification:

    void f() throw(A,B)
    {
       // Whatever
    }
This definition states that while f is executing, only exceptions of type A or B will occur. Not only does this make good documentation, but the run-time system will verify that only these types of exceptions occur. In the presence of any other exception, control passes to the standard library function unexpected, which by default terminates the program. The definition of f above is equivalent to the program in Listing 4. You can provide your own unexpected handler by passing it to the standard library function set_unexpected (also declared in <exception>).

Language Support for RTTI

The Standard Library also includes a support class called type_info, used in connection with C++'s run-time type identification facility (RTTI). A type_info object is returned by the typeid operator, which is useful for discovering the actual (dynamic) type of an object of a polymorphic type, i.e., a class with virtual functions. typeid also accepts built-in types and non-polymorphic classes, for which it reports the static type. You can compare type_info objects to see if the objects they refer to have the same type. The program in Listing 5 illustrates the typeid operator and dynamic casts. For more information on C++'s new-style casts, see the Code Capsule "Conversions and Casts," CUJ, September 1994.
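A compact sketch of both facilities follows. It is not a substitute for Listing 5. The Shape hierarchy and the two helper functions are hypothetical.

```cpp
#include <typeinfo>

// A polymorphic base class: the virtual destructor is what makes typeid
// and dynamic_cast look at the dynamic type.
struct Shape  { virtual ~Shape() {} };
struct Circle : Shape {};
struct Square : Shape {};

// Compares type_info objects for the dynamic type of s and for Circle.
bool is_circle(const Shape& s) {
    return typeid(s) == typeid(Circle);
}

// Yields a null pointer when s does not actually point to a Circle.
const Circle* as_circle(const Shape* s) {
    return dynamic_cast<const Circle*>(s);
}
```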

Standard Exceptions

In March 1994 the standards committee decided that the Standard C++ library will only throw exceptions from the hierarchy shown in Listing 1. Logic exceptions are those due to errors in the internal logic of a program. A domain error, which is a particular type of logic error, violates the preconditions of some operation. Examples of domain errors are accessing an array with an out-of-range index, or attempting to use a bad file descriptor. The bitstring class from the standard library throws an invalid_argument exception if you ask it to set a bit that doesn't exist:

    #include <stdexcept>

    bitstring& bitstring::set(size_t pos, int val)
    {
       if (pos >= nbits_)
          throw invalid_argument("invalid position");
       set_(pos,val);
       return *this;
    }
As opposed to domain errors, run-time errors are those that you cannot easily predict in advance, and are usually due to forces external to a program. A range error, for example, is one that violates the postcondition of a function, such as arithmetic overflow from valid arguments.
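The value of a two-branch hierarchy shows up at the catch site, where a handler can respond to a whole family of errors. The sketch below assumes the exception classes of the library as later standardized, where out_of_range derives from logic_error. The functions checked_get and classify are hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Rejects an out-of-range index, a mistake the caller could have avoided.
int checked_get(const int* data, std::size_t size, std::size_t index) {
    if (index >= size)
        throw std::out_of_range("index out of range");
    return data[index];
}

// Reports which branch of the hierarchy, if any, an access falls into.
std::string classify(const int* data, std::size_t size, std::size_t index) {
    try {
        checked_get(data, size, index);
        return "ok";
    } catch (const std::logic_error&) {
        return "logic_error";      // preventable by the programmer
    } catch (const std::runtime_error&) {
        return "runtime_error";    // external or unforeseen
    }
}
```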

String Class

Text processing in C is inconvenient because the programmer has to worry about array bounds. The C++ basic_string class template provides automatic memory management, eliminating most concerns about bounds checking, and also provides member functions for common text-processing operations, like search, replace, insert, remove, and append. You can now paste strings together without worrying, as in:

    string s1, s2, s3;
    // ...
    string s4 = s1 + s2 + s3;
See Listing 6 for a sample program that illustrates the string class.
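As a quick taste of the member functions, separate from Listing 6, the following sketch searches, replaces, inserts, and appends without any explicit buffer management. It assumes the string class of the library as later standardized. The function build_title is hypothetical.

```cpp
#include <string>

// Builds a title by editing a string in place; the string grows as
// needed, so no array bounds have to be tracked by hand.
std::string build_title() {
    std::string s = "Standard C Library";

    std::string::size_type pos = s.find("C");   // search
    if (pos != std::string::npos)
        s.replace(pos, 1, "C++");               // replace one character

    s.insert(0, "The ");                        // insert at the front
    s += " (draft)";                            // append
    return s;
}
```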

Containers

Before STL, the only containers in the standard library were the dynamic array class, its pointer version ptr_dyn_array, and the bit manipulation classes bits and bitstring. dyn_array does for arrays in general what string does for arrays of characters: it provides memory management plus convenient access and retrieval (see Listing 7). The bitstring class provides string processing on sequences of bits. Since STL includes a vector class that manages memory (all STL classes do), vector will doubtless supplant dyn_array, and a template specialization, vector<bool>, will doubtless replace bitstring.

The bits class will remain, however, because it operates on a fixed number of bits (see Listing 8). You can think of a bits object as an arbitrarily large unsigned integer. The bits object is implemented as a template, with the number of bits in the collection as the template parameter:

    template<size_t N>
    class bits {...};
bits is highly suitable for interfacing with the host operating system, and is designed for efficiency (it will be stack-based under most implementations). Although the example in Listing 8 uses a word-sized object, you can have bit collections as large as your stack allows.
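Readers who want to experiment can try the sketch below. It is written against std::bitset, the fixed-size bit class of the library as later standardized, used here as a stand-in for bits. The function names are hypothetical.

```cpp
#include <bitset>
#include <string>

// Sets bits 0 and 3, flips bit 7, and renders the result with the
// highest-numbered bit first.
std::string flags_text() {
    std::bitset<8> b;    // eight bits, all initially zero
    b.set(0);
    b.set(3);
    b.flip(7);
    return b.to_string();
}

// The same bit pattern viewed as an unsigned integer.
unsigned long flags_value() {
    std::bitset<8> b;
    b.set(0);
    b.set(3);
    b.flip(7);
    return b.to_ulong();
}
```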

Using an expression instead of a type as a template parameter has a couple of interesting effects:

The bits class exposed a weakness in the C++ language definition. Consider, for example, how you would define an ostream inserter:

    template<size_t N>
    ostream& operator<<(ostream& os,
                            const bits<N>& b)
    { /* details omitted */ }
This is a template function with a non-type parameter, which was prohibited by the ARM's language definition. The work-around for global functions such as this, that take a template class parameter, is to define them inline in the class template definition itself. In order to allow out-of-line definitions for such functions, the committee voted to allow template functions with non-type parameters in November 1993. For a detailed treatment of the bits class template, see "Bit Handling in C++, Part 2," CUJ, January 1994.
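With a compiler that implements the committee's decision, such an inserter can be written out of line. The sketch below uses a deliberately tiny, hypothetical class Flags in place of bits, along with a hypothetical helper to_text.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for bits<N>: a fixed number of boolean flags.
template <std::size_t N>
class Flags {
    bool bit_[N];

public:
    Flags() { for (std::size_t i = 0; i < N; ++i) bit_[i] = false; }
    void set(std::size_t pos) { if (pos < N) bit_[pos] = true; }
    bool test(std::size_t pos) const { return pos < N && bit_[pos]; }
};

// A global function template whose only template parameter is the
// non-type parameter N, defined outside the class template.
template <std::size_t N>
std::ostream& operator<<(std::ostream& os, const Flags<N>& f) {
    for (std::size_t i = N; i-- > 0; )     // highest position first
        os << (f.test(i) ? '1' : '0');
    return os;
}

// Convenience wrapper so the inserter's output can be inspected.
template <std::size_t N>
std::string to_text(const Flags<N>& f) {
    std::ostringstream os;
    os << f;
    return os.str();
}
```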

In addition to a set of useful containers, STL defines common algorithms that operate on both user-defined and built-in types. STL achieves this transparency in part by defining generic iterators that can operate on all of its data structures. Iterators are pointer-like objects that traverse a container by, among other things, overloading operator++(), operator--(), and operator*(). STL is still being integrated into the standard library and is definitely a topic for an article all its own.
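To give a flavor of that uniformity, the sketch below applies the same algorithms to a built-in array and to a vector. It assumes the <algorithm> and <vector> headers of the library as later standardized. The helper names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Ordinary pointers serve as iterators into a built-in array.
bool array_contains(const int* first, std::size_t n, int value) {
    const int* last = first + n;
    return std::find(first, last, value) != last;
}

// The same algorithm, driven by a container's own iterators.
bool vector_contains(const std::vector<int>& v, int value) {
    return std::find(v.begin(), v.end(), value) != v.end();
}

// A user-written algorithm in the same style: it relies only on
// operator!=, operator++, and operator* of the iterator type.
template <class Iter>
int total(Iter first, Iter last) {
    int sum = 0;
    for (; first != last; ++first)
        sum += *first;
    return sum;
}
```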

Afterword

Remember that the charter of a standards committee is to standardize existing practice. It is not feasible to expect such a committee to design a comprehensive object-oriented library. A large group of volunteers that meets together only three or four times per year cannot accomplish what a small, focused department can. The philosophy of the committee is to standardize a small number of popular, low-level classes for use as building blocks in a wide spectrum of applications. Bjarne said,

"Any libraries beyond the C libraries and iostreams accepted by the committee must be in the nature of building blocks rather than more ambitious frameworks. The key role of a standard is to ease communication between separately-developed, more ambitious libraries." [3]

The standard C++ library provides a sound foundation for the development of sophisticated, portable, reusable class libraries. I'm sure we'll be seeing a lot more of them.

References

[1] Bjarne Stroustrup. The Design and Evolution of C++ (Addison-Wesley, 1994), p. 200.

[2] P. J. Plauger. The Draft Standard C++ Library (Prentice-Hall, 1995).

[3] Bjarne Stroustrup. The Design and Evolution of C++, p. 194.