November 1993/Code Capsules

Code Capsules

Bit Handling in C

Chuck Allison

The ability to manipulate the individual bits of an integer is essential in systems programming. In the MS-DOS file system, for example, each disk file has an associated attribute byte. You can delete-protect a file or even omit it from directory listings, depending on which attribute bits you set. Before the days of C, most programmers had to resort to assembly language to manipulate bits. Nowadays, with bitwise operators and the host system interface that most compilers provide, there is almost nothing that you cannot do with your operating system from within a C program. In this article I'll review common bit-twiddling techniques and extend them with a couple of useful abstract data types.

Bitwise Operators
There are six bitwise operators in C (see Table 1) . As the program in Listing 1 illustrates, the bitwise-OR operator sets a bit in the result if either one of the corresponding operand bits is "set" (has a value of 1). For example:
     01010101  (== 0x55)
  |  11110000  (== 0xf0)
-------------
  =  11110101  (== 0xf5)
The bitwise-AND operator sets only those bits whose corresponding operand bits are both set:
     01010101
  &  11110000
-------------
  =  01010000  (== 0x50)
The bitwise-EXCLUSIVE-OR operator sets those bits when one or the other (but not both) of the original operand bits is set:
     01010101
  ^  11110000
-------------
  =  10100000  (== 0xa0)
The bitwise-NOT operator "flips" each bit by making each 1 a 0, and vice-versa. The cast to unsigned char in Listing 1 is necessary because Standard-conforming compilers always promote chars to ints before applying bitwise operations. Without the cast the output would be:
   ~a == FFAA
since printf never truncates significant values in numbers, and ints on my machine are two bytes wide.
The shift operators shift all bits left or right by the number of positions specified in the right operand. Bits that "fall off the end" are lost. The left-shift operator sets vacated bits to zero. When shifting signed integers to the right, some machines zero-fill while others replicate the sign bit. It is common practice, therefore, to declare integers unsigned, which will always be zero-filled when you shift them to the right.

Accessing Individual Bits
Quite often you need to set, reset, or test individual bits. If not too many bits are involved, you might consider using bitfield structures. For example, suppose you are only interested in bits 4 and 5 of an integer. Logically, bits are numbered right-to-left, starting from 0. Each bit number represents a power of 2, signifying the weight of each digit in a number's binary representation. For example, bit 0 represents 20, bit 1 represents 21, bit 2 represents 22, and so on. On a little-endian machine, the following structure would typically be suitable:

struct bits4_5 { unsigned : 4; /* Skip bits 0-> 3 */ unsigned bit4 : 1; unsigned bit5 : 1; };
A big-endian architecture typically calls for

struct bits4_5 { unsigned : 10; /* Skip bits 15 -> 6 */ unsigned bit5 : 1; unsigned bit4 : 1; };
You can then process the bits via the named variables:

unsigned x; struct bits4_5 *xp = (struct bits4_5 *) &x; xp->bit4 = 0; xp->bit5 = 1; f(xp); /* May modify the bits */ if (xp->bi t4) . . .
This technique is somewhat unattractive not only because it isn't very portable, but also because it is cumbersome with a large number of bits. The preferred method to manipulate arbitrary, individual bits is via the bitwise operators. For example, to set the ith bit of an integer, form a mask with a 1 in the ith bit position and zeroes elsewhere (I'll call it the "1-mask"), then bitwise-OR it into the number:

x | = (1u << i)
To turn a bit off, form the complement of the above mask (the "0-mask") and bitwise-AND it in:

x &= ~(1u << i)
The program in Listing 2 sets all the bits of an integer in turn, right-to-left (logically), and resets them in the same order. The quantity CHAR_BIT, from limits.h, is the number of bits in a byte.
To toggle a bit, exclusive-OR the 1-mask into the number:
    x ^= (1u << i)
To see whether or not a bit is set, bitwise-AND the integer with the same mask:
    x & (1u << i)
This expression evaluates to 0 if the bit is not set, or non-zero (2 to the ith power, in particular) if it is. The following useful idiom yields a 0 or 1 only:
    !!(x & (1u << i))
in case you want to use the result as an index into an array.
The header file in Listing 3 defines macros for non-destructive versions of these operations, and declares some useful functions. The function implementations in Listing 4 use the nbits macro to compute the size of an unsigned int in bits. The function fputb prints an unsigned integer in binary form with bit-0 rightmost and with leading zeroes if necessary. fgetb reads a string of bit characters and converts it into an unsigned. The curious statement
    sprintf(format," %%%d [01]", nb);
builds a scan format string that skips whitespace and then reads up to nb 1s and 0s, or reads until it scans a non-bit character. You might think that you could avoid the sprintf by using a variable-width format descriptor in the scan, such as:
    fscanf(f," %*[01] ",nb,buf);
Unfortunately, variable-width substitution doesn't work with scansets. (If this is Greek to you, see October 1992's Code Capsule, "Text Processing I".)
The count function computes the number of 1-bits in a number by right-shifting one bit at a time and testing bit 0. The test program in Listing 5 illustrates these functions.

Large Bitsets
What if you need more bits than an integer holds? For example, suppose that you must create a "picklist" user interface object. A picklist is a scrollable, pop-up listbox that allows the user to choose multiple entries. The number of entries can easily exceed the number of bits in a long integer. The most efficient way to track the state of each entry is to associate a bitset with the picklist — if the user highlights the ith entry, then you set the ith bit.
You can of course use an array of unsigneds to hold the bits you need:

If you need n bits, then you need N = ceil (n / BLKSIZ) array elements, where BLKSIZ is the number of bits in an unsigned (or whatever integral type you derive the array from). If n is not a multiple of BLKSIZ, then there will be unused bits in the zeroth block (represented by XX in the preceding figure). Now, accessing a particular bit reduces to finding the particular block it resides in, and computing the offset for the mask to apply to that block. To find the right block, notice the following pattern:

bits in this range are in this block b/BLKSIZ --------------------------------------------------- [0 , BLKSIZ-1] N-1 0 [BLKSIZ , 2*BLKSIZ-1] N-2 1 . . . . . . . . . [(N-1)*BLKSIZ , n-1] 0 N-1
If b is the bit number in question, then

B + b/BLKSIZ == N-1
where B is the block we are after. Therefore,

B = N-1 - b/BLKSIZ
The offset of bit b within block B is simply

offset = b % BLKSIZ
If bits is the array name, then you can process the Bth bit with these expressions:

bits[B] |= (1u << offset); /* set */ bits[B] &= ~(1u << offset); /* reset */ bits[B] ^= (1u << offset); /* toggle */ !!(bits[B] & (1u << offset)); /* test */
The header file in Listing 6 presents an interface for processing arbitrarily large bitsets. In addition to the functions already discussed, it adds conversions to and from unsigned, in case you are working only with word-sized bitsets and want to interface with your host environment. The statement
    typedef struct bits Bits;
declares struct bits as an incomplete type with synonym Bits. The definition of the structure is in the implementation file. (See Listing 7 — If incomplete types are Greek to you, see last month's installment.) If you want to use another integral type for the base array, just substitute it for unsigned int in line 11 of bits. c.
A Bits object consists of the number of bits allowed, the array of integers, the number of elements in the array, and a mask to reset the unused bits to zero that have inadvertently been set by operations on other bits. The test program in Listing 8 shows how you might use a Bits object in an application.

Bit Strings
The storage scheme in a Bits object puts bit 0 last in the array, so bits are numbered right-to-left, just like they are in single integers. In situations where there is no numerical application of bitsets, like the picklist example above, it might seem more natural to number bits from left to right, like a string. It is trivial to modify bits.c to make this change in orientation. The orientation is represented as:

The block for the bth bit is now

B = b / BLKSIZ
and the offset is calculated from the left instead of the right of the block:

offset = BLKSIZ - b%BLKSIZ - 1
Listing 9 and Listing 10 summarize the changes required to transform the header file and implementation respectively from a bitset into a bit string. The test program is in Listing 11.

Stay Tuned
There are a number of features we could and should add to these objects. It would be convenient to provide bitwise operations that apply to entire objects, for example

Bits *bp1, *bp2, *bp3; /* . . . */ bits_and(bp3,bp1,bp2); bits_shift_left(bp3,4); /* . . . */
It would also seem natural to allow bit strings to grow and shrink, just like strings. Before we get too carried away, however, it's time to consider moving to C++. In C++, we can overload operators so that the above excerpt looks like:

Bits bp1, bp2; // . . . Bits bp3 = bp1 & bp2; bp3 <<= 4;
The next two installments will cover the two bitset classes that were voted into the standard C++ library in March, 1993 in Portland.