Code Capsules

The Standard C Library, Part 1

Chuck Allison


Chuck Allison is a regular columnist with CUJ and a Senior Software Engineer in the Information and Communication Systems Department of the Church of Jesus Christ of Latter Day Saints in Salt Lake City. He has a B.S. and M.S. in mathematics, has been programming since 1975, and has been teaching and developing in C since 1984. His current interest is object-oriented technology and education. He is a member of X3J16, the ANSI C++ Standards Committee. Chuck can be reached on the Internet at 72640.1507@compuserve.com.

Although it may not seem like it, C is a very small language. In fact, it was first implemented on a very small platform by today's standards. The first C compiler I owned ran on a Commodore 64! C's simplicity and compactness make it ideal for systems programming and for developing programs that run in embedded systems, such as in automobiles or cameras.

A key difference between C in such freestanding environments and C in a hosted environment, such as a desktop or mid-range computer, is the presence of the Standard C library. In a freestanding environment a conforming compiler only needs to provide the types and macros specified in <float.h>, <limits.h>, <stdarg.h>, and <stddef.h>. By contrast, in hosted environments, programmers who work on typical data processing projects take the library for granted — in fact, they think of it as part of the language. A large portion of everyday C code consists of library calls. Even I/O facilities like printf and scanf are part of the library, not the language.

The Standard C library consists of functions, type definitions, and macros declared in fifteen header files. Each header more or less represents a domain of programming functionality, such as I/O or string processing operations. Some macros and type definitions, such as NULL and size_t, appear in more than one header file for convenience.

In this article I divide the Standard library into three groups (see Table 1, Table 2, and Table 3). Group I represents library components that you should understand thoroughly if you want to consider yourself a C programmer. Too often I have seen programs that "reinvent" basic library facilities such as memcpy or strchr. Such programs should not be executed, but sometimes I think their programmers should be (or at least fired). To receive your paycheck in good conscience, though, you should really master Group II as well. And although you may need the functions in Group III only once in a blue moon, you should be familiar enough with them that you know how to use them when that need arises.

I certainly don't plan to review the entire library in this article. If you want a comprehensive reference, there is nothing better than Plauger's book, The Standard C Library [1]. For tips, tricks, and tutorials, you can read most of the back issues of this column (which started in October, 1992). What I will do here is bring to light some of the library functions you may have overlooked, and some behavior you may not be aware of.

Group I — For the "Adequate" C Programmer

<ctype.h>

The functions in <ctype.h> support typical operations for handling single characters (see Table 4). For example, to determine if a character c is upper case, use the expression isupper(c). Many old-time C programs are peppered with expressions such as

   ('A' <= c && c <= 'Z')
instead, which makes poor reading. Putting such an expression in a macro helps, as in

   #define ISUPPER(c) ('A' <= c && c <= 'Z')
But this makes expressions with side effects (such as ISUPPER(c++)) unreliable. And of course this test for uppercase membership works only with a character set that encodes the alphabet contiguously, such as ASCII. By contrast, the character classification functions in <ctype.h> are safe and portable across all platforms.
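As a minimal sketch of the library alternative, the function below counts uppercase letters with isupper. The name count_upper is hypothetical and does not come from any of the listings. The cast to unsigned char is an added precaution: the <ctype.h> functions expect a value representable as unsigned char (or EOF), and plain char may be signed.

```c
#include <ctype.h>
#include <stddef.h>

/* Count the uppercase letters in a null-terminated string.
   count_upper is a hypothetical name used only for this sketch. */
size_t count_upper(const char *s)
{
    size_t n = 0;

    while (*s != '\0') {
        /* Cast first: passing a negative char to isupper is undefined. */
        if (isupper((unsigned char)*s))
            ++n;
        ++s;
    }
    return n;
}
```

Because isupper is called as a function here, its argument is evaluated exactly once, so the side-effect problem of the ISUPPER macro does not arise.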

It is important not to assume that ASCII is always the execution character set. For example, ASCII control characters comprise the code 127 and those less than 32, but only seven control characters behave uniformly across all environments: alert ('\a'), backspace ('\b'), carriage return ('\r'), form feed ('\f'), horizontal tab ('\t'), newline ('\n'), and vertical tab ('\v'). The only character-handling functions that do not change behavior when you change locale are isdigit and isxdigit. (See Group III next month for more on locales.)

Although you can assume that the digits '0' through '9' have contiguous codes in all C execution character sets, the hexadecimal digits, being alphabetic characters, do not. The function atox in Listing 1 shows how to convert a hexadecimal string to an integer value. Unfortunately, it only works for ASCII-like character sets. The offending line is:

   digit = toupper(*s) - 'A' + 10;
There is no guarantee that the expression

   toupper(*s) - 'A'
will give the correct result. The version in Listing 2 works on any platform because it stores all hexadecimal digits contiguously in its own array. It searches the array with strchr and then uses pointer arithmetic to compute the value of the digit.
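The listings are not reproduced here, so the following is only a sketch of the technique just described, not the code of Listing 2. The name atox_portable and the digit table xdigits are assumptions made for illustration. The function looks each character up in its own array with strchr and takes the pointer difference as the digit's value.

```c
#include <ctype.h>
#include <string.h>

/* Convert a hexadecimal string to a long without assuming that
   'A' through 'F' have contiguous codes.
   atox_portable and xdigits are hypothetical names. */
long atox_portable(const char *s)
{
    static const char xdigits[] = "0123456789ABCDEF";
    long sum = 0;

    /* Skip leading white space. */
    while (isspace((unsigned char)*s))
        ++s;

    for (; isxdigit((unsigned char)*s); ++s) {
        /* Find the digit in our own table; its offset is its value. */
        const char *p = strchr(xdigits, toupper((unsigned char)*s));
        if (p == NULL)
            break;
        sum = 16 * sum + (long)(p - xdigits);
    }
    return sum;
}
```

With this sketch, atox_portable("ff") yields 255 and conversion stops at the first character that is not a hexadecimal digit. It does not check for overflow.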

<stdio.h>

The author of Listing 1 and Listing 2 could have avoided a lot of trouble if he had only understood scanf formatting a little better. As Listing 3 illustrates, the "%x" edit descriptor does all the work of reading hexadecimal numbers for you. Unlike the previous two versions, it even handles a leading plus or minus sign. Both scanf and printf are laden with features that so many programmers overlook. For more detail on these two functions, see the Code Capsules in the October 1992 and November 1992 issues of CUJ.
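A short sketch of the same idea using sscanf follows. It is not Listing 3; the name parse_hex and its return convention are assumptions chosen for illustration. Note that "%x" expects a pointer to unsigned int.

```c
#include <stdio.h>

/* Read one hexadecimal number from a string with the "%x" descriptor.
   Returns 1 and stores the value on success, 0 if nothing was converted.
   parse_hex is a hypothetical name. */
int parse_hex(const char *s, unsigned int *out)
{
    return sscanf(s, "%x", out) == 1;
}
```

A caller would check the return value before using the result, since sscanf reports how many items it converted.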

The printf/scanf families of functions shown in Table 5 perform formatted I/O. Furthermore, they provide these facilities for three types of streams: standard streams, file streams, and string (i.e., in-core) streams. Formatting operates identically on the different types of streams, but of necessity the function names and calling sequences are somewhat different.

The <stdio.h> component of the Standard C library provides two other classes of input/output facilities: character I/O and block I/O (see Table 6 and Table 7). The functions in Listing 4 and Listing 5 copy one file to another using character I/O and block I/O functions respectively. Note that since fread does not return an error code, I must make an explicit call to ferror to detect a read error.
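The block I/O approach might look like the sketch below. It is not Listing 5; the name copy_stream and the choice of BUFSIZ as the buffer size are assumptions. It shows why the call to ferror is needed: fread returns a short count both at end of file and on error.

```c
#include <stdio.h>

/* Copy everything from one open stream to another using block I/O.
   Returns 0 on success, -1 on a read or write error.
   copy_stream is a hypothetical name. */
int copy_stream(FILE *in, FILE *out)
{
    char buf[BUFSIZ];
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n)
            return -1;              /* write error */
    }

    /* A zero count means end of file or an error; ferror tells which. */
    return ferror(in) ? -1 : 0;
}
```

For a true file copy, both streams would normally be opened in binary mode ("rb" and "wb") so that no character translation takes place.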

As Table 7 illustrates, <stdio.h> provides functions for file positioning. The time-worn functions fseek and ftell work reliably only on files opened in binary mode, and are limited to file positions that can be represented by a long integer. To overcome these limitations, the ANSI committee invented fgetpos and fsetpos, which use the abstract type fpos_t (Table 8) as a file position indicator.

The program in Listing 6 puts fgetpos and fsetpos to good use in a simple four-way scrolling browser for large files. The browser keeps only one screen's worth of text in memory. If you want to scroll up or down through the file, it reads (or re-reads) the adjacent text and displays it. When scrolling down (i.e., forward) through the file, the program pushes the file position corresponding to the old screen data on a stack, and reads the next screenful from the current file position. To scroll up, the program retrieves the file position of the previous screen from the stack. For the complete program, and for more detailed information about file I/O, see the Code Capsule in the May 1993 edition of CUJ.
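The position stack at the heart of such a browser can be sketched as follows. This is not the code of Listing 6; the names push_position, pop_position, and MAX_SCREENS, and the fixed stack size, are assumptions made for illustration.

```c
#include <stdio.h>

#define MAX_SCREENS 100             /* hypothetical limit for the sketch */

static fpos_t stack[MAX_SCREENS];   /* saved screen positions */
static size_t depth = 0;            /* number of saved positions */

/* Save the current file position. Returns 0 on success, -1 on failure. */
int push_position(FILE *fp)
{
    if (depth == MAX_SCREENS)
        return -1;
    if (fgetpos(fp, &stack[depth]) != 0)
        return -1;
    ++depth;
    return 0;
}

/* Return to the most recently saved position.
   Returns 0 on success, -1 if the stack is empty or fsetpos fails. */
int pop_position(FILE *fp)
{
    if (depth == 0)
        return -1;
    --depth;
    return fsetpos(fp, &stack[depth]) == 0 ? 0 : -1;
}
```

A browser would call push_position before reading each new screen and pop_position to scroll back. Because fpos_t is an abstract type, the sketch never inspects or does arithmetic on the saved values.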

<stdlib.h>

The header <stdlib.h> is a bit of a catch-all. It defines types, macros, and functions for memory management, sorting and searching, integer arithmetic, string-to-number conversions, sequences of pseudorandom numbers, interfacing with the environment, and converting multi-byte strings and characters to and from wide character representations (see Table 9). The program in Listing 7 uses all four memory management functions to sort a text file. Whenever its array of pointers to char fills up, it expands it with realloc, which preserves the original contents. See "Code Capsules: Dynamic Memory Management, Part 1," CUJ, October 1994, for an in-depth treatment of memory management in C. For more information on the sort function qsort, see the Code Capsule "Sorting with qsort," CUJ, April 1993. The search function, bsearch, searches a sorted array for a given key. Like qsort, bsearch takes a compare function that you supply; it returns a pointer to the array element containing the key, or a null pointer if the key is not found (see Listing 8).
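To show how qsort and bsearch share one compare function, here is a small sketch that sorts and searches an array of string pointers. It is not Listing 8; the names compare_strings, sort_words, and find_word are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Compare two array elements, each of which is a pointer to a string. */
static int compare_strings(const void *a, const void *b)
{
    const char *const *pa = a;
    const char *const *pb = b;
    return strcmp(*pa, *pb);
}

/* Sort an array of n string pointers in place. */
void sort_words(const char **words, size_t n)
{
    qsort(words, n, sizeof words[0], compare_strings);
}

/* Look up key in a sorted array; returns the stored string or NULL. */
const char *find_word(const char *key, const char **words, size_t n)
{
    /* The key is passed by address so it has the same shape as an element. */
    const char **hit = bsearch(&key, words, n, sizeof words[0],
                               compare_strings);
    return hit != NULL ? *hit : NULL;
}
```

The array must already be sorted with the same ordering before bsearch is called; otherwise the result is unreliable.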

The program in Listing 9 illustrates some of <stdlib.h>'s seldom-used functions. It shuffles a deck of 52 cards by creating a randomized sequence of the numbers 0 through 51. The srand function seeds the pseudo-random generator by encoding the current time and date. To derive the suit and denomination from a number, I divide the number by 13, the number of cards in each suit. The remainder of this division is the denomination (0 through 12 corresponding to ace through king), and the quotient represents the suit as follows:

0 == clubs

1 == diamonds

2 == hearts

3 == spades

The div function computes the quotient and remainder all at once and stores the result in a structure of type div_t.
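The card decoding described above can be sketched with div as follows. This is not Listing 9; the function name describe_card and the spelling of the rank and suit names are assumptions.

```c
#include <stdlib.h>

static const char *const suits[] = {
    "clubs", "diamonds", "hearts", "spades"
};
static const char *const ranks[] = {
    "ace", "2", "3", "4", "5", "6", "7",
    "8", "9", "10", "jack", "queen", "king"
};

/* Translate a card number (0 through 51) into its rank and suit.
   Returns 0 on success, -1 if the number is out of range.
   describe_card is a hypothetical name. */
int describe_card(int card, const char **rank, const char **suit)
{
    div_t d;

    if (card < 0 || card > 51)
        return -1;

    d = div(card, 13);      /* quot selects the suit, rem the denomination */
    *suit = suits[d.quot];
    *rank = ranks[d.rem];
    return 0;
}
```

For example, card 27 gives a quotient of 2 and a remainder of 1, which this sketch reports as the 2 of hearts.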

The functions in the scanf family call strtol to convert character strings to integers. Using strtol directly, however, you can read numbers in any base from 2 to 36, as the program in Listing 10 illustrates. strtol updates nextp through its second argument so you can progress through the string, converting one number after another. The functions strtoul and strtod behave similarly for unsigned longs and doubles respectively. With strtol, I can write a superior version of the atox conversion function, as shown in Listing 11.
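The following sketch shows the pattern of walking through a string with strtol. It is not Listing 10; the function name parse_longs and its parameters are assumptions, and only the variable name nextp is borrowed from the text. It omits overflow checking through errno.

```c
#include <stddef.h>
#include <stdlib.h>

/* Convert successive numbers in s, written in the given base, storing
   at most max of them in out. Returns how many were converted.
   parse_longs is a hypothetical name. */
size_t parse_longs(const char *s, int base, long *out, size_t max)
{
    size_t n = 0;
    char *nextp;

    while (n < max) {
        long value = strtol(s, &nextp, base);
        if (nextp == s)         /* nothing converted: stop */
            break;
        out[n++] = value;
        s = nextp;              /* resume where strtol stopped */
    }
    return n;
}
```

Comparing nextp with the starting pointer is how the sketch detects that no conversion took place, since strtol returns 0 in that case and 0 is also a valid result.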

Everyone reading this column is probably familiar with the functions exit and abort. You may not know, however, that you can "register" functions to be called automatically at program exit. These functions are usually called "exit handlers" and you register them with the atexit function, like this:

   void my_handler();
   atexit(my_handler);
An exit handler must take no arguments and must return void. Upon normal exit (i.e., return from main or a call to exit), C calls all of your handlers in the reverse of the order that they were registered. The standard guarantees that you can register at least 32 exit handlers.
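A minimal sketch of registering two handlers follows. The handler names and the helper register_handlers are hypothetical. atexit returns zero when registration succeeds, which the sketch checks.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical handlers: the messages describe the expected order. */
static void first_registered(void)
{
    puts("registered first, called last");
}

static void second_registered(void)
{
    puts("registered second, called first");
}

/* Register both handlers. Returns 0 on success, -1 on failure. */
int register_handlers(void)
{
    if (atexit(first_registered) != 0)
        return -1;
    if (atexit(second_registered) != 0)
        return -1;
    return 0;
}
```

A program that calls register_handlers and then returns from main should see the second handler's message before the first, reflecting the reverse order of registration.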

The getenv function allows you to query strings in your host environment. For example, to find the current setting of the PATH variable, which is common to many environments, you can do the following:

   char *path = getenv("PATH");
The pointer refers to memory outside of your program, so if you want to keep the value, you'll have to copy it to a program variable before the next call to getenv.
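One way to keep the value is to copy it into memory that your program owns, as in this sketch. The name copy_env is hypothetical, and the caller is assumed to release the copy with free.

```c
#include <stdlib.h>
#include <string.h>

/* Return a freshly allocated copy of an environment string, or NULL
   if the variable is not set or memory is exhausted.
   copy_env is a hypothetical name; the caller frees the result. */
char *copy_env(const char *name)
{
    const char *value = getenv(name);
    char *copy;

    if (value == NULL)
        return NULL;

    copy = malloc(strlen(value) + 1);
    if (copy != NULL)
        strcpy(copy, value);
    return copy;
}
```

Note that getenv returns a null pointer when the variable does not exist, so the result should always be checked before use.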

<string.h>

The functions defined in <string.h> are shown in Table 10. All the functions with the str prefix work on null-terminated strings, and the mem-functions process raw memory. You've already seen strchr in Listing 2, and its companion memchr in Listing 9. To transfer raw bytes from one location to another, use memcpy, or use memmove if the source and destination buffers overlap. (Judging by the number of times I've seen memcpy reinvented in others' code, I believe that it is the most overlooked function in the standard library.)
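Deleting characters from the middle of a string is a typical case where source and destination overlap, so memmove is the right tool. The sketch below is illustrative only; the name delete_chars is hypothetical.

```c
#include <string.h>

/* Remove up to count characters from s, starting at index pos, by
   sliding the tail of the string (terminator included) to the left.
   delete_chars is a hypothetical name. */
void delete_chars(char *s, size_t pos, size_t count)
{
    size_t len = strlen(s);

    if (pos >= len)
        return;                     /* nothing to delete */
    if (count > len - pos)
        count = len - pos;          /* clamp to the end of the string */

    /* The regions overlap, so memcpy would be undefined here. */
    memmove(s + pos, s + pos + count, len - pos - count + 1);
}
```

The byte count passed to memmove includes one extra byte so that the terminating null character moves along with the tail.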

The string search functions also go too often unused. The program in Listing 12 uses strstr to extract all lines from a text file that contain a given string. Due to their cryptic names, the following three <string.h> functions are probably the least used:

1) size_t strspn(char *s1, char *s2); "Spans" the characters from s2 occurring in s1. In plain English, strspn returns the index of the first character in s1 which is not in s2.

2) size_t strcspn(char *s1, char *s2); "Spans" the characters not in s2 occurring in s1. In other words, strcspn returns the index of the first character in s1 that is also in s2.

3) char *strpbrk(char *s1, char *s2); Returns a pointer to the first character in s1 that also occurs in s2, or a null pointer if there is none. It's kind of like a cross between strchr and strcspn.

The program in Listing 13 illustrates these functions. For more on string handling, see the "Code Capsule" in the December 1992 issue (vol. 10, no. 12).
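As a quick sketch of the three functions (not the code of Listing 13), the wrappers below each apply one of them to a fixed character set. The wrapper names and the character sets are assumptions chosen for illustration.

```c
#include <string.h>

/* Length of the leading run of decimal digits in s. */
size_t leading_digits(const char *s)
{
    return strspn(s, "0123456789");
}

/* Length of the text before the first comma or semicolon in s,
   or the whole length if s contains neither. */
size_t field_length(const char *s)
{
    return strcspn(s, ",;");
}

/* Pointer to the first lowercase vowel in s, or NULL if none. */
const char *first_vowel(const char *s)
{
    return strpbrk(s, "aeiou");
}
```

For instance, leading_digits("123abc") is 3, field_length("name,value") is 4, and first_vowel("string") points at the 'i', so the remaining text reads "ing".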

Summary

Although you may not agree totally with my categorization of Standard C library components, I hope I have caused you to think about your own practices and level of expertise. Whatever our opinions, one bit of advice is indisputably in order: know and use the Standard library! Next month I'll cover Group II.

Further Reading

[1] Plauger, P. J. The Standard C Library. Prentice-Hall, 1992. ISBN 0-13-131509-9.