Code Capsules

Data Abstraction

Chuck Allison


Chuck Allison is a regular columnist with CUJ and a Senior Software Engineer in the Information and Communication Systems Department of the Church of Jesus Christ of Latter Day Saints in Salt Lake City. He has a B.S. and M.S. in mathematics, has been programming since 1975, and has been teaching and developing in C since 1984. His current interest is object-oriented technology and education. He is a member of X3J16, the ANSI C++ Standards Committee. Chuck can be reached on the Internet at 72640.1507@compuserve.com.

You may be tired of hearing it, but let me say it one last time: you can view C++ in three ways: 1) as a better C, 2) as a language that supports data abstraction, and 3) as a vehicle for object-oriented programming. In last month's article I explored the "better C" features of C++. This month I'll show how it supports data abstraction, or, in other words, how it helps you create new data types.

Abstraction

Unlike a machine, the human mind can handle only a very limited amount of complexity. Life is inherently complex, and so are software systems that model reality. To master a complex system, it is necessary to focus on only the few things that matter most in a given context, and ignore the rest. This process, called abstraction, enables system designers to solve complex problems in an organized, manageable way. Grady Booch defines an abstraction as follows:

"An abstraction denotes the essential characteristics of an object that distinguish it from all other kinds of objects and thus provide crisply-defined conceptual boundaries, relative to the perspective of the viewer." [1]

Abstraction often manifests itself in software through user-defined data types, new classes of objects composed of built-in or other user-defined types. This concept is not new, of course, but languages that support it well, such as Ada, C++, CLOS, and Smalltalk, have only recently seen widespread use.

If you've been programming in C for any length of time, you've probably used the struct mechanism. You've also had to provide functions to process those structures. Whether you knew it or not, you were creating a user-defined type. Chances are, for example, you've needed to handle dates in a program. Listing 1 defines a struct Date and two functions for processing dates, one to format a date as Month Day, Year, and one to compare dates. (See Listing 2 for the functions' implementation and Listing 3 for a test program.)

To use this abstract data type, you only need to know the protocol for its two functions, which constitute its interface. You don't need to know how the functions were implemented, or even the structure layout. A well-defined abstraction enables concentration on the outside view while ignoring implementation details. This barrier between interface and implementation is called encapsulation. There is a hole in type Date's encapsulation barrier, of course, because it's possible to bypass the interface and directly manipulate one of the structure members, as in

   dp->month = 7;
That hole can be closed up, since the interface itself doesn't need to know anything about a Date, it just needs a pointer to it. In Listing 4 I define Date as an incomplete type — i.e., I just state that it exists and nothing more. A client program sees only date2.h, and therefore knows nothing about a Date's layout or implementation. I complete the type in the implementation in Listing 5 (the test program is in Listing 6) . This extra measure of protection costs something, however. Programs using Date must now call explicit create and destroy functions, and must access all Date objects through pointers only.

C++ improves support for user-defined data types in at least two ways: 1) by allowing definition of the interface functions within the scope of the class, and 2) by barring client access to implementation details via a language keyword, as opposed to the incomplete type trick. The Date class in Listing 7 and Listing 8 differs from that in Listing 1 and Listing 2 in the following ways:

1) The data members are private, so client programs can't access them directly (only the member functions can).

2) The interface functions are public member functions, so they can only apply to Date objects, and the function names do not pollute the global namespace. The date_ prefix is no longer necessary, since the operations belong to the class. Note that format and compare are const member functions, meaning that they promise not to alter a Date's data members.

3) A constructor replaces C's structure initialization syntax.

4) The text representations for the month names now appear in the scope of struct Date (being static members).

The sample program in Listing 9 shows that member functions are invoked with the usual dot operator for structure members.

If it bothers you that a client programmer can still see the layout of your data members — albeit without direct access — you can hide the data member declarations in a new incomplete type, but this time with absolutely no impact to the interface (see Listing 10 and Listing 11) . To demonstrate this, I define the DateRep helper class with a constructor, and for efficiency and simplicity I give the Date class direct access to the its data members by making it a friend to DateRep. The Date constructor now needs to create its associated DateRep object on the heap, so a destructor is needed to free the heap memory. Only the header file has to change in the test program in Listing 9. With a well-defined abstraction, a change in implementation will not inflict change on client code.

Operator Overloading

You can make user-defined types nearly as convenient to use as built-in types by adding operator functions to your class definition. In Listing 12, Listing 13, and Listing 14 I've added the six relational operations and a stream inserter for easy output. For efficiency I've also defined the smaller functions inline within the header file. Note also the declaration of ostream as an incomplete type in the header file. Since an ostream object appears there only by reference, date5.h does not have to be dependent on the iostream.h header, which is one of the largest in the Standard C++ library. The implementation of operator<< shows that defining a stream inserter usually just reduces to inserting an object's members into the stream. I simulate the new C++ Boolean data type, bool, with the definitions in Listing 15.

As the definition of the Person class in Listing 16 shows, you can compose a new type from other user-defined types, as well as built-in types. Each Person has a birth date, which is a Date object wholly contained within a Person object. The Person constructor passes the Date information on to its Date subobject via an initialization list. The initialization list follows the colon in the function header, as defined in the implementation file (Listing 17) . Notice how Person::operator<< implicitly uses Date::operator<<. Class Person also uses the new C++ standard string class for its text data members. (Borland C++, the compiler I used for most of these examples, defines the string class in the header (cstring.h>).

Concrete Data Types

C++ support for data abstraction, and operator overloading in particular, can make user-defined data types as convenient to use as built-in types. What are typical operations performed on built-in types? Programs can initialize them, assign them to other objects of compatible type, and pass them to or receive them back from functions. In C++, you can perform these same actions with your own types, as long as either you or the compiler provide certain special member functions. The presence of these member functions constitutes a concrete data type, one that behaves intelligently wherever a built-in type does. These functions include the following:

copy constructor
default constructor (and others as needed)
assignment operator
destructor

Whenever an object is created, a constructor is called to initialize it. The initializer arguments must match a constructor's parameters in number and type. The default constructor is the one that takes no arguments. Another special constructor, called the copy constructor, initializes a new object from an existing object of the same type, and has the signature T(const T&), or T(T&), for some type T. This constructor executes whenever you pass or return an object by value, or explicitly initialize a new object from an old one, as with y in the following example:

   T x;
   ...
   T y = x;     // same as "T y(x)"
The assignment operator executes whenever you assign the value of an object to an existing one, and has the signature

   T& T::operator=(const T&)
Note the assignment statement in
Listing 18 (p3 = p2;). I did not explicitly define Person::operator=(), but it seems to have worked anyway. This is because the compiler generated it for me. The absence of operator=() in the class definition caused the compiler to generate one that does memberwise assignment, like this:

   Person& Person::operator=(const Person& p)
   {
      last = p.last;
      first = p.first;
      birth = p.birth;
      ssn = p.ssn;
      return *this;
   }
Similarly, if you do not supply a copy constructor, the compiler generates this one:

   Person::Person(const Person& p)
     : last(p.last), first(p.first), birth(p.birth),
       ssn(p.ssn)
   {}
which, as you can see, calls the copy constructor for each member object.

If you define any constructor other than a copy constructor, then the compiler will not generate a default constructor for you. If you don't define any constructors (other than copy constructors) the compiler will generate a default constructor which will invoke the corresponding default constructor for all user-defined member objects. A compiler-generated destructor calls the destructors associated with any user-defined member objects.

The compiler-generated assignment operator and copy constructor for Person worked because the standard string class provides its own version of these two functions, and because the Data class data members are built-in, so its compiler-generated constructors and operators suffice. In general, whenever the state of an object is contained entirely within the object itself, you do not need to override the compiler-generated member functions.

The Person class in Listing 19 and Listing 20 stores its text data on the heap. A compiler-generated assignment operator or copy constructor will only copy pointers to the receiving object, when what it really needs to do is allocate space on the heap for the duplicated text. When I execute the test program on this class, Metaware's C++ gives the following output on Windows NT:

p1 == {Richardson,Alice,[December 16, 1947], 123-45-6789}
p2 == {Doe,John,[Bad month 0, 0],}
p1 does not equal p2
p3 does equal p2
***Free(0X4C105C):X POINTER already free
ABORTING. . .
At program exit, the compiler calls the Person destructor to destroy p3, which frees the text memory on the heap. Since p2 points to the same memory, the destructor attempts to delete it a second time when destroying p2, which is an error. Listing 21 and Listing 22 fix the problem by adding the missing member functions. The assignment operator uses the clever-but-lazy trick of explicitly invoking the destructor, followed by a call to placement new, which in this case executes the copy constructor on the current object. (For more information on placement new, see the Code Capsule "Dynamic Memory Management, Part II," CUJ, November 1994.)

Generic Types

In programming, one of the most useful abstractions is the kind that allows independence from data type. An example of such an abstraction is a container. Containers are objects that hold other objects. A set is a container that provides operations to insert, remove, and test elements for membership. Listing 23 - Listing 25 illustrate a class that implements a set type for integers. It uses a fixed-size array as the underlying data structure (although bit sets are a more efficient implementation for sets of ordinal values). The implementation in Listing 24 uses the find algorithm, a new addition to the Standard C++ library, which expects pointers to the beginning and to one past the end of the array as its first two parameters. If you were to define SetOfLong, or SetOfString, or a set for any other type supporting the equality operator, you would find that the only thing that changed would be the type of objects in the set. If you think about it, a set really shouldn't have to care about the type of objects it holds.

C++'s template mechanism allows you to define a generic set type that essentially ignores the type of the objects it contains, by specifying that type as a parameter (see Listing 26) . For efficiency, I use the dynamically-sized vector template class from the Standard C++ library in the implementation. Dynamic sizing is crucial when using a set of large objects, since a fixed-size array of the same would result in much wasted space. As you can see in set2.h in Listing 26, the find algorithm also works on vectors. This is because its begin and end member functions return an iterator, which is a generalization of a pointer. The erase member function expects an iterator argument, which find returns. The program in Listing 27 uses the Set template to hold Person objects.

Summary

Complexity is an unavoidable fact of life. Support for user-defined data types enables the programmer to model and manage a complex system in software. C++ supports data abstraction via classes, member access control, constructors and destructors, operator functions, and templates. If you provide the suitable member functions, you can define abstractions that have the convenience of built-in data types.

Computers In Kansas

During the last couple of years I've enjoyed wearing the official T-shirt of C/C++ Users Journal. On the back it says, "Yes, there really are computers in Kansas." Not only are there computers, but there is a group of talented and pleasant individuals who comprise R&D Publications. In January they sponsored "C/C++ Solutions '95," a practical, "how-to" seminar in Kansas City. (Missouri. All right, it's not in Kansas, but it's close.) We had ten of the best C/C++ developers and teachers you will find anywhere, including Scott Meyers, author of "Effective C++," and our own Dan Saks. The balance of material and the emphasis on C as well as C++ was very well received. Sorry if you missed it. Stay tuned — we'll do it again. You won't get this type of intensive training at any other seminar.

I've also enjoyed contributing as a columnist for this journal since October 1992. This has been a valuable outlet for me to share some of my learnings as a C/C++ developer and contributor to the C++ standard. Due to personal time constraints, I reluctantly announce that this is my last column. I've appreciated your feedback and support, and will miss you and the folks at R&D. Goodbye, farewell, and keep on hacking.

Reference

[1] Grady Booch, Object-oriented Analysis and Design with Applications, Second Edition (Benjamin-Cummings, 1994).