Client/Server Programming


Object Persistence with Relational Databases

Chuck Allison

A heavy-duty client/server application requires a lot more technology than a simple C++ program. But a judicious use of C++ machinery can really make a difference.


"Companies without a strategy for client/server are creating the next generation of legacy systems." [1] Those are strong words for today's developers who are peddling as fast as they can to keep up with the pace of our industry. Client/server (C/S) computing can indeed be a formidable challenge, but it is one that developers must face and master to be productive and competitive in the long term. Fortunately for this journal's audience, object-oriented (OO) technology, and C++ in particular, play a key role in C/S solutions.

In this article I present a simple C++ class framework for object persistence that can work with any SQL-conforming relational database, in either a stand-alone or C/S environment.

Client/Server Architecture

Client/Server concepts have evolved in part from a key insight of the structured development era — the separation of interface and implementation, which has also provided the basis for the current revolution in object-orientation. This high degree of modularity enables a "building block" approach that results in systems and components that are flexible, reusable, reliable, and efficient.

A typical C/S environment has three main categories of services (see Figure 1) :

Separating systems into these three layers has a number of benefits. First, a logically-layered architecture allows an orderly division of labor that optimizes the use of technical resources. Data services personnel are typically DBAs, experts in both the conceptual and physical aspects of the corporate DBMS. Business services developers create reusable components without any concern for user interface issues. User services personnel call upon business services without any direct knowledge of the data services layer.

In short, distinct teams can do what they do well, with results that fit together. Secondly, the number of opportunities for reuse increases. By developing modular components, new applications can plug into existing services, or can extend those services with a minimum of effort. Modular components also promote maintainability and distributivity. Each service resides in a unique component, providing a single point of update for manageable evolution. Service components can be deployed on a single server and shared by all clients in a way that optimizes network performance.

Traditional corporate information-systems (IS) solutions have been anchored in data modeling. This has caused undue emphasis on data structures and database models too early in the development cycle. Object modeling, on the other hand, encourages a service-based approach, allowing designers to focus on the responsibilities of business objects, and to leave physical data issues to implementers. In an object-centered architecture, developers and users alike focus on the business, and user services and data services exist merely to display and store business objects (see Figure 2) . It is usually much easier to get all the parties involved to agree on functionality than on schema and data type concerns.

Distributed Computing

From the perspective of modularity, this three-tier architecture is beneficial whether or not you deploy the different layers on separate platforms. Three logical tiers can exist on a single machine, or you can physically distribute them across distinct client machines, application servers, and data servers. In our environment, we use Visual BASIC for presentation on client machines, Visual C++ to implement business objects on application servers, and Oracle on data servers to provide object persistence. Our client machines still run 16-bit Windows, while the application servers operate under Windows NT 3.51, and the Oracle databases reside on VAXes running OpenVMS.

Two very critical architectural issues immediately present themselves:

1) How to connect the different layers efficiently across networks, and

2) How to store objects in a relational database.

Being mainly a Microsoft shop, we use Microsoft's connectivity solutions: OLE and ODBC (see Figure 3) . Remote OLE Automation provides client applications concurrent access to business objects in a networked environment. The ODBC standard resolves connectivity issues between applications and relational databases.

Mapping Objects to Tables

Persistence is unquestionably the struggling step-child of the OO revolution. Object concepts have been around since 1967, Smalltalk and C++ were under development almost 20 years ago, and methodologies have now evolved to the point that anyone not doing objects is just not "with it." When it comes to object persistence, however, most of the corporate world is definitely not "with it," and for good reason.

Not to say that robust object-oriented database management systems aren't available. There are a number of powerful OODBMSs that provide persistent storage of objects with a minimum of fuss. Typically, an OODBMS reads existing class specifications and arranges for retrieval and update of objects with little or no programmer intervention. But the complexity of corporate information systems as a rule stretches a DBMS to its limits, and OODBMSs have not yet gained wide acceptance in a corporate environment. Even if they had, the expense of converting corporate data services guarantees that relational systems will be the norm for some time to come.

Since a relational DBMS stores information in two-dimensional tables, it is necessary to map objects attributes to fit the relational model. The academic approach is to store objects of each class in a table unique to that class. If an object references another, its table contains the object ID of the referent as a foreign key. For embedded objects you have to find or assign the ID, and update both tables as required. With objects of derived classes, the base and derived parts reside in different tables, connected by a common ID. In all three cases you need to perform joins to compose an object from its separately-stored parts, which can result in unacceptable performance.

An alternative approach pieces subobjects together by reference. A typical OODBMS in C++ overloads operator-> to automatically query the database for referenced subobjects only when you need them. It's a lot of work to do this yourself, of course (otherwise there wouldn't be an OODBMS market!). We chose the following hybrid approach:

Objects having an identity (that is, containing an identifier member) must have their own table. Objects from the same hierarchy reside in the same table. Objects with trivial embedded objects, such as addresses and dates, are flattened — the subobjects are stored in situ in the database row, and reconstituted upon retrieval. All other embedded or referenced objects reside in their own tables and are accessed through simple pointers returned by member functions (with names like getParentOrg).

Our derived objects usually add functionality without adding extra attributes, so storing a hierarchy in a single table isn't as space-wasteful as it might seem at first blush. Coupled with Oracle's variable-length storage scheme, it's a non issue. To make this work, it is necessary to have a field in the table that indicates the type of object the row represents. As an example, Figure 4 depicts a simple hierarchy of organization types. The corresponding SQL columns might include some of the following:


ObjectID TypeID Name LocationID LeaderID


The C++ class definitions might then look like this:


class Org
{
    // Record attributes:
    long ObjectID;
    long TypeID;
    CString Name;
    long LocationID;
    long LeaderID;
public:
    //Accessors:
    long getObjectID()const;
    long getTypeID()const;
    CString getName()const;
    long LocationID()const;
    long LeaderID()const;
    //Other methods:
    Org *getParentOrg()const;
    Location *getLocation()const;
    Person *getLeader()const;
    protected:
    Org(longid=0);
};
class Division : public Org
{
public:
    // Will invoke Org ctor
    Division(long id = 0);
    // Functionality for Divisions...
};
    // etc...


To find the parent organization of another, you call the appropriate method and get a pointer back:


//Retrieve an Org by ID
Org myOrg(1234);
Org *myParent = myOrg.getParentOrg();
//Use parent object
cout << myParent.getName();
//Must delete referents explicitly
delete myParent;

The Open Database Connectivity specification (ODBC) manages database access through a connection. A connection string contains all the information necessary to identify and log on to a network database. Through the connection a program can execute any SQL statement and retrieve query results.

The Microsoft Foundation Class library (MFC) that accompanies Visual C++ encapsulates a connection in the CDatabase class. A CRecordset object provides the functions you would expect to traverse a query result set and update records. Visual C++'s Class Wizard creates custom recordset classes for you directly from pre-defined ODBC datasets. Listing 1 shows the header file generated for a projection of the Employees table in the North Wind sample database distributed with Microsoft Access, as shown in Table 1 (I've left out some fields to save print space). Listing 2 has the implementation file, and a test program is in Listing 3.

A Persistence Framework for C++

A business object class like Org needs to shield you from the details of DBMS access. It also needs to shield itself from the fickle forces of the class library industry. Instead of depending directly on MFC, it is better to have a layer of abstraction between the business object model and the DBMS access mechanism. This way the business services layer doesn't have to change should you replace MFC with another library.

My department has developed a persistence framework, which I call PFX, that encapsulates database access, and allows you to customize its functionality to suit your needs through polymorphism. As the object diagram in Figure 6 illustrates, PFX consists of five classes, which in the current implementation use CDatabase and CRecordset.

Connection is a concrete class that encapsulates the connectivity mechanism, and extends CDatabase with some housekeeping functionality (see Listing 4). The other four classes are abstract. (See Listings 5, 6, 7, and 8 — I've left out the .CPP files, but you can download them plus more from CUJ's FTP site). DBXface manages the connection and recordset objects required for reading or writing the database.

Class Query is an interface for building and executing queries and performing read-only operations on result sets. Update provides methods for updating or creating an object's row in the database. To integrate query and update capability with an object, objects derived from Persist cooperate with classes derived from Query and Update.

As Figure 7 illustrates, you use PFX by deriving one class each from Query, Update, and Persist. EmpRecSet, which Class Wizard builds for you, inherits from CRecordset. As you can see in Listing 9, you must provide overrides for the virtual functions getObjID, getDesc, and SQLName, in addition to defining member functions to access the fields.

Query's criteria-selection methods call SQLName to get the column names that correspond to the object's attributes. The test program in Listing 10 shows that you can specify selection criteria in two different ways:

1) store the desired SQL WHERE clause in the filter data member of the underlying recordset, or

2) build the WHERE clause with the criteria-building functions from the Query class. The latter is more tedious but doesn't require that users know SQL or the exact names of fields in the table. You can append any number of subclauses to be ORed with the first. You then execute the query and fetch the results.

In EmpUpdate (see Listing 11) , you specialize Update by overriding Update::getID, which returns the value in the recordset that corresponds to the object's identifier in the table, and Update::getIDTable, which returns the name of an auxiliary table in the database which contains the next available identifier for new records. You must provide this table. I added the table EmployeeID to the North Wind database with the value 10, since there were 9 records already. (NOTE: This ID-hack is single-user only. The production version of PFX uses Oracle sequence objects, which auto-increment in a way that allows multiple concurrent users.)

To finally obtain a persistent Employee class (Listing 12) , derive it from Persist, override the write and remove methods (these are the most challenging of all), and provide accessors to retrieve and modify attributes as needed. The test program in Listing 13 updates an existing object and creates a new one.

Summary

The architecture presented here lends itself well to heterogeneous environments which are subject to change. The modularity of objects allows you to define systems in layers so that they are more easily maintainable and pluggable. The production implementation supports simultaneous users through the ODBC connection mechanism. Each user gets his or her own Connection object which acts as a handle to PFX functionality. We support transactions through the CDatabase methods BeginTrans, CommitTrans, and Rollback, which pass the corresponding SQL command on to the database. This means that transactions cannot span multiple databases. You can fix this limitation, however, by using a transaction manager, such as Tuxedo. PFX has low overhead, performs well, and is flexible enough to accommodate any objects you want to make persistent.

References

[1] "Developing a Client-server Strategy," Final Report 102, Index Foundation/CSC, 1995, p. 3.

[2] "Implementing Business Objects," Final Report 104, Index Foundation/CSC, October 1995, p. 31.