November 1994/Code Capsules

Code Capsules

Dynamic Memory Management, Part 2

Chuck Allison

In last month's article I explored some of the finer points of using heap memory in C programs, and then introduced the new and delete operators of C++. The word operators is key. In C, you allocate dynamic memory via the library functions malloc, calloc, and realloc. You must explicitly pass the size of the objects you're working with to these functions. Since new is an operator in C++, the compiler figures out how much memory is needed — you just deal with objects.
To get a pointer to a dynamic object of any type, just ask for it:
   T *tp = new T;
Using new does two things:
1) Allocates memory for the object
2) Calls the appropriate constructor
Arrays are easy, too:

const size_t SIZE = 100; T *tap = new T[SIZE];
In this case new calls the default constructor to initialize each array element, in order of increasing index.
Even multi-dimensional arrays are a snap. For example, to allocate a 2-by-3 array, you can do something like:

const size_t NROWS = 2; const size_t NCOLS = 3; T (*p)[NCOLS] = new T[NROWS][NCOLS];
You can now use p with array syntax, as in

T t; ... p[0][1] = t;
(If p's declaration syntax is puzzling to you, see the Code Capsule, "Pointers and Arrays," CUJ, September 1993.)
To dispose of a heap object, you use the delete operator:

delete tp; // scalar version delete [] tap; // array version
A call to delete invokes the appropriate destructor before deallocating memory. The array version, delete[], destroys the array elements in decreasing index order. It is important to use the correct version of delete when returning memory to the heap.
In this month's article I discuss some of the things you should be aware of when using the heap in C++.

Deep vs. Shallow Copy
The simple string class in Listing 1 and Listing 2 holds its data in an underlying C-style string. Since the string can grow and shrink, the class allocates the data buffer from the heap. The test program in Listing 3 exposes a problem with this class, however. Somehow a string which is either initialized or assigned from another stays "connected" to the original — changing one changes the other. If you look closely at the class definition, you'll notice that I did not define a copy constructor or an assignment operator. A copy constructor has the signature
   string(const string&);
and executes whenever a new string object is initialized with the value of another, as in
   string t = s;
The assignment operator has the following signature:
   string& operator=(const string&);
The compiler automatically generates these functions if you don't. These automatic versions use memberwise semantics, which means that when you initialize a string object with another's value, each member of the receiving object is initialized with the value of the corresponding member in the original object. If the data member is an instance of a type that has a copy constructor, that constructor gets called. In the case of string, the data members are built-in types, so no constructor is called for each member; the pointer and counter are just copied as scalar values. This simple copy operation results in the situation shown in Figure 1.
The string objects s and t actually share the underlying C-string buffer, since the compiler-generated copy constructor just copied s.data to t.data. This is called a shallow copy. Sharing private data this way is not only inconvenient, but dangerous. If s should go out of scope before t does, the string destructor will turn t.data into a dangling pointer, as depicted in Figure 2.
What you really want is a "deep copy," which creates a separate copy of the underlying character buffer for each initialization or assignment. A deep copy requires a special copy constructor and assignment operator, both of which allocate a new buffer for the receiving object, and copy the characters from the original with strcpy. (See Listing 4, Listing 5, and Listing 6. ) In general, when an object contains a pointer to heap memory, you should define a custom copy constructor and assignment operator, not to mention a destructor and other constructors as necessary.

Virtual Destructors
The essence of object-oriented programming is polymorphism, a fancy word for being able to invoke different but related behavior through a single interface. For example, the universal interface of a brake pedal allows you to stop any motor vehicle, even though the mechanics of stopping may differ from vehicle to vehicle, depending on the type of brake system (disc, drum, air, etc.). You can stop a car without even knowing what type of brake system it has, and without understanding what happens under the hood. In programming, polymorphism allows a higher degree of reuse and maintainability than traditional procedural systems.
C++ supports polymorphism through virtual functions. The program in Listing 7 uses an array of pointers that can point to objects of either of two class types: B (base) or D (derived). As the program traverses the array the appropriate member function executes transparently. If the current pointer refers to a B object, B::f executes, otherwise D::f does.
A user interface library might support objects for data entry, such as tables, popup menus, or single data cells. A data entry form might consist of a collection of such objects in a particular layout. It would be convenient if a form-management system could simply traverse its objects (lets call them "fields") to invoke a particular action, without having to know what type of field each object was. With this sort of capability, you could refresh a form display by doing this:
   for (int i = 0; i < nfields; ++i)
       fields[i]->display();
The appropriate display member function would be called depending on the specific type of the object that fields[i] refered to.
Novices to C++ often don't realize that destructors need to be virtual too. To illustrate, the programs in Listing 8 - Listing 12 contain skeletal definitions of the following class hierarchy:
   field           (Listing 8)
      cell        (Listing 9)
      column      (Listing 10)
         popmenu  (Listing 11)
      table        (Listing 12)
field is an abstract class, a class that exists only to unite the other classes as derived classes in a single hierarchy. A form object contains an array of pointers to dynamic field objects, and knows nothing about the other concrete types (see Listing 13 and Listing 14) . The test program in Listing 15 merely creates a form with a few fields so that you can observe what happens when the form is destroyed. The statement
   delete fields[i]
in Form::~Form() as usual calls a destructor and then deallocates the associated memory. Because the destructors are virtual, the correct destructor gets called according to the dynamic type of the referenced object. By contrast, Listing 16 shows that only the base part of each object is destroyed when the destructor is not virtual. In general, all base classes should have a virtual destructor, because you never know when a user might access derived objects on the heap through a base class pointer.

Handling Memory Allocation Failures
Traditional C++ compilers return a null pointer when new fails to allocate memory:

T *tp = new T; if (t == 0) { cerr << "memory exhausted" << endl; abort(); }
Since inserting code like this can be quite tedious in applications that make heavy use of the heap, programmers of such applications often don't bother to check if the pointer is valid. Even if you religiously check every call to new, what do you do when memory fails within a constructor? Constructors don't return a value, so it can be awkward to pass that information down the line to client code.
One alternative is to define a new handler, a function that executes whenever memory runs out. To install a function as a new_handler, pass it to the standard library function set_new_handler (see Listing 17) . A typical custom new handler tries to free up some memory and then returns, allowing new to try again. But if it can't come up with any more memory, then you should probably abort the program.
A standard-conforming C++ compiler throws a bad_alloc exception when memory fails. Some compilers allow you to revert to the traditional "return-null" behavior by calling set_new_handler(0) (see Listing 18) . The program in Listing 19 does not handle the bad_alloc exception, so the standard library function terminate executes, aborting the program. Listing 20 shows how to catch a bad_alloc exception (except that the Borland compiler I use calls it xalloc, which was its old name). For more information on exceptions, see the Code Capsule "C++ Exceptions," CUJ, July 1994.

Overriding new and delete
The Standard C++ library defines six functions for allocating and deallocating memory. When you use the new operator, it calculates the number of bytes needed and in turn calls one of these functions to allocate those bytes. The six functions are:

// Scalar versions void *operator new(size_t); void operator delete(void *); // Array versions void *operator new[](size_t); void operator delete[](void *); // Placement new void *operator new(size_t, void *); void *operator new[](size_t, void *);
Whenever you allocate an object of a built-in type from the heap, new calls the first function shown above to get the memory. You can easily implement this new via the malloc() function from Standard C (see Listing 21) . Likewise, deleting a dynamic, built-in object results in a call to ::operator delete(void *). Listing 21 also illustrates that you can override these functions. You override a function by defining it with the same signature as a previously-defined function. Any function you define with the same signature as one of the above replaces the library version of that function in your program.
Listing 22 shows a typical implementation of the default ::operator new and ::operator delete functions. The bad_alloc exception mentioned previously is thrown from the default new handler, which ::operator new calls if the allocation fails. As you might expect, the array versions of these functions are called whenever you deal with dynamic arrays of built-in types.
You can also replace the various versions of operator new and operator delete on a per-class basis. As Listing 23 shows, you just include declarations of the functions you want to replace in the class definition. The test program in Listing 24 creates and destroys both scalar objects and arrays of both base and derived classes so you can observe the interaction of constructors and destructors with class-specific versions of operator new and delete (see Listing 25) .

Placement new
Sometimes you want to create an object at a pre-determined address. The location could be in a region of RAM mapped to some device, or within a class-specific heap. You can construct such an object using the placement syntax for the new operator. For example, to construct a T at address p, do the following:

#include <new.h> T *tp = new (p) T;
In this case, no version of operator new executes, since the memory is already available. In fact, the default version of operator new associated with this use of new ignores its size parameter and just returns the address parameter:

void *operator new(size_t, void *p) { return p; }
You may need to include this code in your program since not all compilers provide this function yet. As Listing 26 illustrates, you can overload placement new with as many parameters as you like.
You can use placement new to keep a class's assignment operator in sync with its copy constructor without duplicating the code that initializes a new object. You explicitly invoke the destructor on the object's this pointer and reconstruct the new copy in-place, with placement new:
   T& T::operator=(const T& x)
   {
      if (this != &x)
      {
         this->T::~T();
         new (this) T(x);
      }
      return *this;
   }
Class-specific Storage Management
A common use for placement new is in managing class-specific heaps. As I illustrated last month, to save the overhead inherent in creating dynamic objects, it is often a good idea to allocate one large chunk off the heap and manage it yourself for allocating smaller objects. This action results in critical savings of time and space for container classes. The program in Listing 27 uses a class template for an expandable vector. To save overhead, the program allocates storage to hold a pre-determined number (CHUNK) of elements. Whenever you append an element to the vector, the program just constructs that element at the next available location:
   new (arena + length_++) T(x);
The program only calls the real operator new when it is time to expand the underlying storage pool:
   T *new_arena = (T*) ::operator new(sizeof(T) *
                                   new_capacity);
When it comes time to destroy a single element, you don't want to deallocate any memory, since it is part of the storage arena and can be reused. Instead, all you do is explicitly invoke the destructor:
   (arena+i)->T::~T();
The test program and sample output are in Listing 28 and Listing 29 respectively.

Summary
Using the new and delete operators is more convenient than using malloc and free, but there are a few gotchas to watch out for, namely:
1) Be sure to do deep copies for objects with members that point to the heap.
2) Make base class destructors virtual.
3) Always define a bad_alloc exception handler.
4) Override new and delete and/or their array versions for custom control of the heap.
I haven't exhausted all there is to say about dynamic memory in C++. In a future article I'll discuss smart pointers and garbage collection.

Answer to Last Month's Exercise
Last month I asked you to write a generic memory pool manager in C with the following interface:

Pool *pool_create(size_t elem_size, size_t init_alloc, size_t extent);
Creates a pool to manage elements of size elem_size. Allocates init_alloc elements initially, and increases the pool by extent elements as needed.

void *pool_get_elem(Pool *p);
Returns a pointer to the next free element in the pool, and updates the free pointer.

void pool_release_elem(Pool *p, void *elem);
Returns an element to the pool.

void pool_free(Pool *p);
Releases all the memory used by the pool.
My solution is in Listing 30 - Listing 31. Like the less generic solution in last month's article, it allocates a region large enough to hold a given number of elements, and then links them together by placing in each element a pointer to the subsequent element, as represented by Figure 3. The following structure definition allows me to interpret the contents as a pointer:
struct Overlay
{
   void *next;
};
For example, to return a pointer to the next free space and then advance past it, the function pool_get_elem does this:
   new_node = p">free_ptr;
   p">free_ptr = ((Overlay *) p">free_ptr)">next;
   return new_node;
See the cross-reference program (xref2.c) in last month's article for hints on where to put calls to the pool interface.