May 1994/Code Capsules

Visibility in C++

Chuck Allison

In last month's article I explained the C concepts of scope, lifetime, linkage, and namespace. An identifier's scope is the region of program text where it is visible. In addition to the local, function prototype, function, and file scopes of standard C, C++ introduces class scope and namespace scope. These two scoping mechanisms are unique in that they do not have to span a single, contiguous region of text, and can even be distributed across multiple files. You can also minimize the scope of local variables in C++ by delaying their definition to their point of first use. The lifetime (or storage duration) of an object is the duration between its creation and destruction. C++ has special rules for the duration of static objects and for temporary objects that compilers generate when evaluating expressions. Remote uses of the same identifier are related by the rules of linkage. Unlike C, C++ provides type-safe linkage, an extension of type-checking across translation units. C++ also allows you to link with functions defined in other languages. C-style namespaces are lists of identifiers that play the same syntactic role in a program (e.g., labels, tags, or structure members). C++, in addition to making a slight change to the namespace rules of C, allows you to define your own namespaces. This feature allows you to use multiple libraries without the inconvenience of name conflicts. In this article I will attempt to illustrate the C++ concepts of scope, lifetime, linkage, and namespace as defined in the current standard working paper.

Declaration Statements
In C++, a declaration can appear anywhere a statement can. This means that instead of having to group declarations at the beginning of a block, you can define objects at their point of first use. For example, in Listing 1 the array a is visible throughout the function body, but n is not valid until line 9, and i until line 10. The variables a and n persist until the end of the block. In current implementations so does i, as this code reflects by not redefining i in the second loop. At the standards meeting in March of this year, however, we voted to restrict the scope of i to the body of its for loop. We did this so that variables declared in the initialization portion of a for loop have the same scope as those declared in the condition.
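Listing 1 is not reproduced here, so the following is only a minimal sketch of the same idea under the newly voted rule. The function sum_of_positives and its parameters are hypothetical names chosen for illustration. A compiler that still follows the older rule would reject the second definition of i as a redefinition.

```cpp
#include <cassert>

// Hypothetical helper: adds up the positive entries of an array.
int sum_of_positives(const int* values, int count)
{
    assert(count >= 0);

    int positives = 0;                  // defined at its point of first use
    for (int i = 0; i < count; ++i)     // this i belongs to the first loop only
        if (values[i] > 0)
            ++positives;
    if (positives == 0)
        return 0;

    int sum = 0;                        // not visible above this line
    for (int i = 0; i < count; ++i)     // a new, unrelated i under the new rule
        if (values[i] > 0)
            sum += values[i];
    return sum;
}
```

Each loop gets its own i, and sum does not exist until the code actually needs it.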

Conditional Scope
Although few (if any) compilers currently support it, C++ allows you to declare objects within a condition clause, as in

if (int i = x)     // for some visible x
   whatever...
The scope of i is the body of the if, just as if you had written the following instead:

if (x)
{
   int i = x;
   whatever...
}  // i dies here
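A declaration in a condition is most useful when the value being tested is also the value you want to use. The sketch below assumes a compiler that supports the feature; value_length is a hypothetical helper, not part of any listing.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: length of the text after the first '=' in s, or -1.
int value_length(const char* s)
{
    assert(s != 0);
    if (const char* eq = std::strchr(s, '='))   // eq is declared in the condition
        return static_cast<int>(std::strlen(eq + 1));
    // eq is not visible here
    return -1;
}
```

The pointer eq is tested and used in one place, and it cannot leak into the code that follows the if.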

Scope of Caught Exceptions
An exception parameter in a catch clause is visible throughout the body of the clause, similar to a function parameter, as in:

catch (T& x)
{
   cout << "EXCEPTION: " << x.what() << endl;
}  // x dies here
It is an error to define an entity with the same name as an exception or function parameter in the outermost body of its respective catch clause or function.
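Here is a small sketch of the same rule using the standard exception classes. The functions checked_divide and describe_quotient are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical function that reports failure by throwing.
int checked_divide(int a, int b)
{
    if (b == 0)
        throw std::domain_error("division by zero");
    return a / b;
}

// Describes the sign of a / b, or returns the exception's message on failure.
std::string describe_quotient(int a, int b)
{
    try
    {
        int q = checked_divide(a, b);
        return q < 0 ? "negative" : "non-negative";
    }
    catch (const std::exception& x)
    {
        // int x = 0;   // error: redefines the exception parameter
        return std::string("EXCEPTION: ") + x.what();
    }   // x dies here
}

// Usage: the parameter x is visible only inside the catch clause above.
void check_describe_quotient()
{
    assert(describe_quotient(6, 3) == "non-negative");
    assert(describe_quotient(1, 0) == "EXCEPTION: division by zero");
}
```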

Class Scope
Every class definition creates a new scope. The scope of a class includes its declaration, all member function bodies, and constructor initializers (this extends to derived classes and nested classes as well). You can only access members of a class in one of the following ways:
1) as the target of the . operator (e.g., x.foo(), where x is an object of the class or of a derived class)
2) as the target of the -> operator (e.g., xp->foo(), where xp points to an object of the class or of a derived class)
3) as the target of the scope resolution operator used with the name of its class or a base class (e.g., ios::binary — this applies only to static members)
4) from within the body of a member function of the class or of a derived class
These rules are themselves subject to the rules of the access specifiers public, protected, and private. Private members are visible only to member functions of the class and to friends of the class. Protected members are also visible to members and friends of derived classes. Public members are available universally.
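The four kinds of access can be seen side by side in a short sketch. The classes Shape and Triangle are hypothetical and exist only to exercise the rules; they are unrelated to the Date class in the listings.

```cpp
#include <cassert>

class Shape
{
public:
    static int count;                 // static member: one copy per class
    Shape() : sides_(0) { ++count; }
    int sides() const { return sides_; }
protected:
    int sides_;                       // visible to members of derived classes
};
int Shape::count = 0;

class Triangle : public Shape
{
public:
    Triangle() { sides_ = 3; }        // way 4: inside a derived member function
};

int demo_access()
{
    Triangle t;
    Triangle* tp = &t;
    assert(Shape::count >= 1);        // t has been counted
    int total = 0;
    total += t.sides();               // way 1: the . operator
    total += tp->sides();             // way 2: the -> operator
    total += Shape::count;            // way 3: class name plus ::
    // total += t.sides_;             // error: sides_ is protected
    return total;
}
```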
Listing 2 shows the definition of a simple date class. An implementation is shown in Listing 3. If you want to know the month associated with a certain date, you do not attempt to access the data member directly, as in
Date d;            // today's date
cout << d.month_;  // access denied
You use the appropriate member function instead:
Date d;            // today's date
cout << d.month(); // OK
Most of the members of the Date class are non-static. Each Date object has its own copy of each non-static data member (i.e., month_, day_ and year_), and each non-static member function must be called in connection with a Date object (e.g., d.month()). A static member belongs to the entire class instead of an individual object. (Other object-oriented languages call static data members class variables and static member functions class methods, while they call the non-static equivalents instance variables and instance methods.) For example, there is only one class-wide copy of the array dtab. If it were public, you could access it like this:
//  display the number of days in a leap-February:
cout << Date::dtab[1][2];
Using the scope resolution operator (::) in connection with a class name allows you to access a static class member directly. Since the isleap member function is public, you can use it to determine the "leapness" of any year like this:
int y = 1994;
if (Date::isleap(y))
   //  It's a leap year...
else
   //  It's not...
You must explicitly define static data members at file scope (see the definition of dtab in Listing 3).
As Date::day(int) in Listing 3 shows, other class members are in scope within the body of a member function, so you can access them directly, without explicit scope resolution. Every non-static member function uses a hidden parameter which points to the object with which it is associated; that object is available to you through the keyword this. Inside Date::day(int), the compiler interprets the usage of day_, isleap, and dtab as if you had written this->day_, Date::isleap, and Date::dtab instead. There is no this pointer for static member functions, since they have no associated object. Hence using the identifier day_ unadorned in a static member function such as isleap makes no sense (what object does it belong to?), and would refer to an identifier of the same name from an enclosing scope, if there were one. Static member functions can directly refer only to enumerators, nested class names, and other static members.
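Since Listings 2 and 3 are not reproduced here, the following is a cut-down stand-in rather than the actual Date class. MiniDate, its table layout, and its leap-year test are assumptions made for illustration. They show how unqualified member names resolve inside static and non-static member functions.

```cpp
#include <cassert>

class MiniDate
{
public:
    MiniDate(int m, int d, int y) : month_(m), day_(d), year_(y)
    {
        assert(m >= 1 && m <= 12);
    }
    int day() const { return day_; }
    bool day(int d);                    // setter; returns false if d is out of range
    static bool isleap(int y);          // static: called without an object
private:
    int month_, day_, year_;
    static const int dtab[2][13];       // one copy for the whole class
};

// Static data members are defined once, at file scope.
const int MiniDate::dtab[2][13] =
{
    {0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31},
    {0, 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31}
};

bool MiniDate::isleap(int y)
{
    // return day_ > 0;   // error: there is no this pointer here, so no day_
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

bool MiniDate::day(int d)
{
    // isleap, year_, dtab, and month_ are found in class scope, unqualified.
    if (d < 1 || d > dtab[isleap(year_)][month_])
        return false;
    this->day_ = d;                     // the same as writing day_ = d
    return true;
}
```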
Since the body of Date::day(int) is part of the class scope, any local or member identifier with the same name as a global identifier hides the global identifier. For example, in the unlikely case that I had a global identifier named day_, it would be hidden by the data member day_ inside all non-static member functions of Date. You can still access a global identifier, however, by using the scope resolution operator without a prefix:
day_ = ::day_;
     // assign global day_ to member
In effect, C++ allows you to "unhide" global identifiers. Here's another example:
int i = 10;

int main()
{
   int i = 20;
   cout << "local i:  " << i << endl;
   cout << "global i: " << ::i << endl;
   return 0;
}

// Output:
local i: 20
global i: 10
It is good practice to explicitly qualify global functions with the scope resolution operator for documentation purposes, and to avoid name conflicts with member names (see the implementation of the constructor Date::Date() in Listing 3).

Nested Classes
In addition to data members, member functions, and enumerated constants, you can define other classes within a class definition. For example, a common string class implementation technique called "copy-on-write" allows several string objects to point to the same underlying text (see Listing 4). It only makes a separate copy when one of the strings changes its contents (this saves time when strings are passed as parameters, since local read-only processing uses a "shared" copy). A separate class (Srep) handles the details of connecting and disconnecting references to the underlying text. There is no reason for the Srep class to be globally accessible, so I define it inside the String class. The implementation in Listing 5 shows how to use the scope resolution operator to reach the members of Srep. For example, to inform the compiler that I want to define the Srep constructor, I use the qualified name String::Srep::Srep.
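Listings 4 and 5 are not shown here, so the sketch below is a much smaller stand-in and not the String class itself. Text and Rep are hypothetical names. The sketch shares one representation among copies and counts references, but it leaves out the copy-on-write step, since the point is only how nested members are defined outside the class.

```cpp
#include <cassert>
#include <cstring>

class Text
{
    struct Rep                       // nested: the name Rep is local to Text
    {
        char* chars;
        int refs;
        Rep(const char* s);
        ~Rep();
    };
    Rep* rep;
    Text& operator=(const Text&);    // assignment is not needed for this sketch
public:
    Text(const char* s);
    Text(const Text& other);
    ~Text();
    int share_count() const;
};

// Members of the nested class are named through both enclosing scopes.
Text::Rep::Rep(const char* s) : chars(0), refs(1)
{
    assert(s != 0);
    chars = new char[std::strlen(s) + 1];
    std::strcpy(chars, s);
}
Text::Rep::~Rep() { delete [] chars; }

Text::Text(const char* s) : rep(new Rep(s)) {}
Text::Text(const Text& other) : rep(other.rep) { ++rep->refs; }   // share, don't copy
Text::~Text() { if (--rep->refs == 0) delete rep; }
int Text::share_count() const { return rep->refs; }
```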
In the unusual case that a name in a nested class hides a name in an enclosing class, you can reach the outer instance with the scope resolution operator (only if it's public, of course — the usual access rules apply to and from nested classes). For example, in the following situation
class A
{
public:
   static int i;   // static, so B::f needs no A object to reach it

   class B
   {
       int i;
       void f() { cout << i + A::i; }
   };
};
you can still reach the i in class A by the expression A::i.

Local Classes
A class defined within a function definition is called a local class (see Listing 6). Its name is local to the function body, like any other identifier declared therein. You must define all member functions within the class definition. You can't define them at file scope, because the class is not visible there. This also rules out static data members of Local, since you must define them at file scope. You can't define function bodies elsewhere within f, either, because function definitions cannot nest in C or C++. For the same reason, a member function of Local can't refer to an automatic variable of f. As always, member access rules apply, so k is not visible in f outside of Local.
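Listing 6 is not reproduced here. As a rough illustration of the same restrictions, the sketch below uses a hypothetical function count_in_range with a local class Range; neither name comes from the listing.

```cpp
#include <cassert>

int count_in_range(const int* values, int count, int low, int high)
{
    // A local class: the name Range is unknown outside this function.
    class Range
    {
        int low_, high_;
    public:
        Range(int lo, int hi) : low_(lo), high_(hi) { assert(lo <= hi); }
        // Member functions must be defined inside the class body.
        bool contains(int n) const { return low_ <= n && n <= high_; }
        // bool small(int n) const { return n < count; }  // error: count is automatic
        // static int hits;                               // error: no static data members
    };

    Range r(low, high);     // the bounds are handed over explicitly
    int hits = 0;
    for (int i = 0; i < count; ++i)
        if (r.contains(values[i]))
            ++hits;
    return hits;
}
```

Because Range cannot see the automatic variables of the enclosing function, anything it needs has to arrive through its constructor or its member function arguments.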

Modules
Since the class construct in C++ is a scoping mechanism, you can use it to create modules. A module is a packaging of related data and functionality. Listing 7 has a definition of a class to control the shape and position of the cursor. You will notice that the Cursor class has only static members. Since I do not plan on instantiating any Cursor objects, I just want to package some cursor-control functions together. I don't need any non-static data members. Enumerations in a class definition belong to the whole class, just like static data members. The state data member keeps track of whether the cursor is block shape or a thin line, using the enumerated constants LINE and BLOCK. The sample program in Listing 8 illustrates some of the Cursor functions.
The implementation in Listing 9 is for IBM PC-compatibles. It initializes the state data member to LINE. As a side effect, this implementation also changes the cursor to a line shape. This illustrates another departure from C: in C++, you can use functions to initialize static objects (only automatic objects can be initialized this way in C). cursor.cpp itself uses another module, class Video, which writes directly to video-mapped memory (see Listing 10 and Listing 11).
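For readers without the listings at hand, here is a portable stand-in for the module idea. CursorModel is a hypothetical class, not the Cursor class of Listings 7 and 9: it only records the shape in memory and never touches the hardware.

```cpp
#include <cassert>

class CursorModel
{
public:
    enum Shape { LINE, BLOCK };            // enumerators belong to the whole class
    static void line()   { state = LINE; }
    static void block()  { state = BLOCK; }
    static Shape shape() { return state; }
private:
    CursorModel();                         // declared but never defined: no objects
    static Shape init();
    static Shape state;
};

CursorModel::Shape CursorModel::init()
{
    return LINE;    // a real implementation would also program the hardware here
}

// A function call initializes a static object, which C does not allow.
CursorModel::Shape CursorModel::state = CursorModel::init();

// Usage: every call goes through the class name; no object is involved.
void cursor_demo()
{
    CursorModel::block();
    assert(CursorModel::shape() == CursorModel::BLOCK);
    CursorModel::line();
    assert(CursorModel::shape() == CursorModel::LINE);
}
```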

Storage Duration
Automatic objects behave the same in C++ as they do in C — they are created when execution encounters their definition within a block, and are destroyed when execution leaves the block. C++ does add meaning to the words "created" and "destroyed," though. When C++ creates a user-defined object it calls the appropriate constructor. If you do not supply constructors the compiler will generate them for you. Similarly, a destructor is required to destroy a user-defined object.
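One way to watch creation and destruction happen is to count them. In this sketch, Tracker is a hypothetical class whose constructors and destructor maintain a count of live objects.

```cpp
#include <cassert>

class Tracker
{
public:
    static int live;                  // number of Tracker objects currently alive
    Tracker()               { ++live; }
    Tracker(const Tracker&) { ++live; }
    ~Tracker()              { --live; }
};
int Tracker::live = 0;

// Returns how many Trackers were alive inside the inner block.
int live_inside_block()
{
    assert(Tracker::live == 0);
    int seen = 0;
    {
        Tracker a;                    // constructor runs at the definition
        Tracker b;
        seen = Tracker::live;         // both are alive here
    }                                 // destructors run here, b first and then a
    assert(Tracker::live == 0);
    return seen;
}
```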
In C, objects with static storage duration are initialized at program startup and remain active throughout program execution. Since such objects can only use constant expressions as initializers, this isn't a challenge for compiler writers to implement. Because objects in C++ usually require constructors, however, static objects must be able to call functions in their initialization. For this reason, C++ allows statements at file scope such as this one from Listing 9:
int Cursor::state = Cursor::init();
A local static variable is initialized the first time control passes through its declaration. Non-local static objects (including static members) are initialized before the first use of any function in their translation unit (but exactly when is unspecified). Since initialization functions can conceivably rely on other objects which rely on other initialization functions etc., (any of which can be in another translation unit!), there is no single point in time that the compiler can identify as "program startup" for static initialization. If dependencies between inter-file objects are sufficiently complex, it is possible that you might use a global static object before initializing it. There is no foolproof way to include in a single translation unit all the information necessary to initialize global objects in different translation units in the correct order — the compiler needs information about the inter-file dependencies. This is a non-trivial problem that the standards committee is currently working on.
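The first-use rule for local statics is easy to observe. It is also the basis of one common workaround for initialization-order trouble: wrap the object in a function so that it is initialized on first use. This is my illustration, not something the working paper prescribes. The names setting, expensive_default, and init_calls are hypothetical.

```cpp
#include <cassert>

int init_calls = 0;            // counts how often the initializer runs

int expensive_default()
{
    ++init_calls;
    return 42;
}

// The local static is initialized the first time control reaches it.
int& setting()
{
    static int value = expensive_default();
    return value;
}

// Usage: the initializer runs once, however often setting() is called.
int calls_after_three_uses()
{
    setting();
    setting() = 7;
    assert(setting() == 7);    // the assignment survived; no re-initialization
    return init_calls;
}
```

Any caller in any translation unit that goes through setting() is guaranteed to see an initialized object, because the call itself triggers the initialization.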

Lifetime of Temporaries
Compilers generate a number of temporary objects when evaluating expressions at run time. The expression x = a + b + c, for example, requires at least one temporary. This expression might result in actions equivalent to the following statements:

temp = a + b;
x = temp + c;
// destroy temp here
In order to conserve resources, it is important for the compiler to destroy temporaries early, but not too early. The following example from Andrew Koenig will illustrate. Consider the class declaration:

class string
{
public:
   string(const char *);
   operator const char*();
   string operator+(const string&);  // concatenation
   // other members omitted
};
The const char * operator lets you use string arguments with standard C library functions, for example

string s, t;
...
printf("%s\n", (const char *)(s + t));
You would certainly want the temporary from the expression s+t to persist long enough for printf to process it (but not all current compilers guarantee this!). You might want to break the print into two statements, as in

string s, t;
...
const char *p = s + t;
printf("%s\n", p);
but it's not a good idea. Until July 1993, the point at which temporaries are destroyed was left to the discretion of implementers, so some compilers may have produced the desired results with the preceding code. Now, however, the policy is defined such that the temporary whose text p points to shall not persist until the printf statement, which leaves p dangling and the above code invalid. The rule states that temporaries remain until the end of the full expression in which they appear. A full expression is one that is not part of another expression. In most cases, this just means until the end of the enclosing statement, so temporaries will not persist from one statement to the next. Some examples of full expressions are: an initializer; the controlling expression of an if, while, or switch; the expressions in a for statement; or an expression returned from a function.
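A safe alternative is to give the result a name, so that an object you control owns the text. The sketch below assumes a hypothetical class Str, a stripped-down stand-in for the string class above with just enough members to run; joined_length is likewise an invented helper.

```cpp
#include <cassert>
#include <cstring>

class Str
{
public:
    Str(const char* s) : text_(dup(s)) {}
    Str(const Str& other) : text_(dup(other.text_)) {}
    ~Str() { delete [] text_; }
    operator const char*() const { return text_; }
    Str operator+(const Str& rhs) const;    // concatenation
private:
    Str& operator=(const Str&);             // assignment is not needed here
    static char* dup(const char* s)
    {
        char* copy = new char[std::strlen(s) + 1];
        std::strcpy(copy, s);
        return copy;
    }
    char* text_;
};

Str Str::operator+(const Str& rhs) const
{
    char* joined = new char[std::strlen(text_) + std::strlen(rhs.text_) + 1];
    std::strcpy(joined, text_);
    std::strcat(joined, rhs.text_);
    Str result(joined);
    delete [] joined;
    return result;
}

int joined_length(const Str& s, const Str& t)
{
    // Safe: the temporary s + t lives until the end of this full expression.
    int n = static_cast<int>(std::strlen(s + t));
    // const char* p = s + t;   // unsafe: p would dangle after this statement
    Str keep = s + t;           // safe: a named object owns the text
    const char* p = keep;
    assert(static_cast<int>(std::strlen(p)) == n);
    return n;
}
```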
The only exception to this rule is for references bound to a temporary. For example, consider the following statements:

class T { /* whatever */ };
T f();
const T &r = f();
The temporary object returned by f persists as long as r is in scope, as you would expect.
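The difference between a bound and an unbound temporary can be checked with a counter. Probe and make are hypothetical names used only in this sketch.

```cpp
#include <cassert>

class Probe
{
public:
    static int live;                // number of Probe objects currently alive
    Probe()             { ++live; }
    Probe(const Probe&) { ++live; }
    ~Probe()            { --live; }
};
int Probe::live = 0;

Probe make() { return Probe(); }

// Returns how many Probes were alive while the reference was in scope.
int live_while_bound()
{
    make();                         // unbound: destroyed at the end of the statement
    assert(Probe::live == 0);
    int seen = 0;
    {
        const Probe& r = make();    // bound: the temporary lives as long as r
        seen = Probe::live;
        (void)r;                    // silences unused-variable warnings
    }
    assert(Probe::live == 0);       // destroyed when r went out of scope
    return seen;
}
```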

Linkage
Linkage occurs when two or more declarations refer to the same entity. Functions and global objects declared without the static storage class specifier have external linkage, which means they are visible across translation units. The static specifier gives a global identifier internal linkage, which limits its scope to its translation unit. Typedef names, enumerators, template names, and local names declared without the extern specifier have no linkage, and therefore denote unique entities within a program.
In another departure from standard C, const objects at file scope have internal linkage, unless declared with the extern specifier. This convention allows the common practice of placing const definitions (instead of macro definitions) in include files without the linker complaining about multiple definitions. If you want const objects with external linkage, qualify each with extern and initialize exactly one, as in:

File 1
extern const int x = 10;

File 2
extern const int x;   // refers to x in File 1
Static members have external linkage, which explains why you must explicitly define them at file scope. They are essentially global identifiers that also obey scope resolution and member access rules. Non-inline class member functions have external linkage, which allows them to be defined in a separate file from the class declaration. Non-member inline functions have internal linkage.
Because the inline keyword is merely a hint to the translator, it is difficult to classify the linkage of inline member functions. If such a function is inlined, you can think of it as having internal linkage (it is common practice to put them in the class header file, anyway). If the compiler is unable to insert the code inline, it must create an out-of-line function. Some implementations like to have only one out-of-line definition to save code space, but having this restriction implies external linkage. Ambiguous linkage of inline functions is another problem the standards committee members are working on. The current paper merely states that an inline member function must have only one definition in a program.
Class names have linkage too. A class name has external linkage if:

1) it has any static members
2) it has a non-inline member function
3) you use it in declaring a function, object, or another class with external linkage
4) it appears in a template definition
Otherwise it has internal linkage. Classes in typical applications tend to have external linkage.

Type-safe Linkage
Standard C does not guarantee that a function defined in one translation unit will be used correctly in another. For example, consider the source files:

File 1
void f(double x) { ... }

File 2
extern void f(int);
...
f(1);
The function defined in File 1 is expecting a double argument, but someone misinformed the user in File 2, so the program is broken. Existing C practice calls for both files to include a common header file that has the correct prototype for f. C++ takes this responsibility from the programmer and gives it to the development environment through the concept of type-safe linkage, which guarantees that the two versions of f above do not link. Most implementations achieve this by encoding information about the formal parameters into a hidden name that the linker sees (this technique is usually called "name mangling"). For example, a compiler might encode the name in File 1 as

f__Fd // "f is a function taking a double"
and the one in File 2 as

f__Fi // "f is a function taking an int"
Since the linker sees two distinct names, the reference to f from File 2 to File 1 is not resolved, causing a linker error. Hence, you can't even produce an executable unless you call your functions correctly!

Linkage Specifications
C++ appeals to C programmers for a number of reasons. Most of what they know applies unchanged in C++. They can ease their way into object-oriented programming by first using C++ as a "better C" until they get used to its syntax and idiosyncrasies. But production programmers need something even more important: to be able to use existing C code without even recompiling. It might seem that name mangling makes this practice impossible. How do you call a C function, say, f(double), from a C++ program when the C name is f (or more commonly, _f), and the C++ name is f__Fd? You can instruct the compiler to turn off C++ name mangling with a linkage specification:

In the C++ file:
extern "C" void f(double);
The extern "C" specification tells the compiler to generate the link name according to C rules so the linker can find it in the existing C object code. An implementation may support other specifications in addition to "C".
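Both forms of the linkage specification are sketched below. To keep the example self-contained, the functions are defined in the same file; in real use they would usually live in an existing C object file. The names c_style_length and add_ints are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Single-declaration form: the link name follows C rules, with no mangling.
extern "C" int c_style_length(const char* s);

// Block form: everything declared inside the braces gets C linkage.
extern "C"
{
    int add_ints(int a, int b);
}

extern "C" int c_style_length(const char* s)
{
    assert(s != 0);
    return static_cast<int>(std::strlen(s));
}

// This definition keeps the C linkage of its earlier declaration.
int add_ints(int a, int b)
{
    return a + b;
}
```

One consequence of turning mangling off is that a C-linkage name cannot be overloaded, since the parameter types are no longer part of the link name.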

Namespaces
An identifier can play various roles in a program. For example, in the following excerpt, pair is both a function name and a structure tag:

struct pair {int x; int y;};

void pair(struct pair p)
{
   printf("(%d,%d)\n", p.x, p.y);
}
The compiler keeps separate lists of identifiers used in different roles, so there is no danger of ambiguity. These lists are called namespaces. There are four different types of namespaces in Standard C:
1) labels
2) tags for structures, unions, and enumerations
3) members of structures and unions
4) ordinary identifiers (i.e., all others: objects, functions, types, and enumeration constants).
Because of the preeminence of types in C++, you can use structure tags (which are really class names) without the struct keyword, for example:

void f(pair p); // struct keyword optional
In other words, all struct names act as type names, as if you had also defined a typedef with the same name as the tag:

typedef struct pair pair;
This means that you cannot use a name as both a structure tag and a typedef for a different type in the same scope. Since typedefs belong to the space of ordinary identifiers, C++ has merged that namespace with the tag namespace (numbers 2 and 4 above). As a compatibility gesture, C++ still allows you to define identifiers with the same name as a tag, mainly because of the following common C practice:

struct stat{...};
extern int stat();
Problems can now arise, however, when you use the same name for struct tags and ordinary identifiers. In the program in Listing 12, the local integer named pair hides the global struct identifier, causing the compiler to flag the line
pair p = {pair,pair};
as an error. You can get around this problem by using the struct or class keyword explicitly:
struct pair p = {pair,pair};
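Listing 12 is not reproduced here. As a minimal sketch of the same situation, sum_with_hidden_tag below is a hypothetical function in which a local int hides the struct name and the explicit struct keyword recovers it.

```cpp
#include <cassert>

struct pair { int x; int y; };

int sum_with_hidden_tag()
{
    int pair = 3;                     // hides the type name pair in this scope
    // pair p = {pair, pair};         // error: pair now names the int
    struct pair p = {pair, pair};     // the struct keyword still finds the type
    assert(p.x == 3 && p.y == 3);
    return p.x + p.y;
}
```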

Namespace Scope
In both C and C++ there is a single global scope. Unfortunately, most people call this the global namespace (sorry — we have to live with it). Whatever you call it, it's the place where a lot of names get dumped: all of your global types, functions, and variables, along with those of the libraries that you use to build your applications. The buzzword for this prolific dumping is "namespace pollution." Since you and all the vendors in the world didn't sit down previously and decide who will use what names, it is conceivable that name conflicts will arise. To alleviate the problem, many vendors attach a corporate prefix to their global names (e.g., ACME_List, ACME_Widget, etc.). If vendors don't, you may have to (My_list, My_Widget, etc.). An alternative work-around is to wrap the names in a class, if you can, like I did in Listing 7. Just hope the vendors haven't used the same class names as you have!
The standards committee solved the pollution problem by allowing you and vendors alike to define named scopes. But they don't call them scopes. You guessed it! They call them namespaces! You can place anything into a namespace that you can define at global scope, which is more flexible than the class wrapper hack. Listing 13 and Listing 14 show the interface and implementation of a namespace version of the Cursor module.
There are three ways to use the Cursor namespace. The first is with the scope resolution operator, exactly as shown in Listing 8. The second is to import the names you need into the local scope with using declarations (see Listing 15). The statement
using Cursor::block;
injects the name block into the local scope. Like other local names, block can hide a global identifier of the same name, or clash with another local one. This is the preferred way to use namespaces, since it doesn't pollute the global namespace, and it doesn't require explicit scope resolution to get at the namespace's identifiers.
The third method is, in essence, to dump the entire set of names right back into the global namespace with a using directive (see Listing 16). The statement
using namespace Cursor;
makes all the names from the namespace available almost as if you had declared them globally. In this case the Cursor identifiers can be hidden by like-named identifiers at local scope, and can clash with other global identifiers. You cannot use global scope resolution to access names from the namespace scope, however (e.g., ::init).
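The three methods can be compared in one small file, assuming a compiler that already implements namespaces. The namespace Shapes and its functions are hypothetical; they stand in for the Cursor namespace, whose listings are not reproduced here.

```cpp
#include <cassert>

namespace Shapes
{
    int square()   { return 4; }
    int triangle() { return 3; }
}

int square() { return -1; }   // a global that happens to share a name

// Method 1: explicit scope resolution.
int by_qualification()
{
    return Shapes::triangle();
}

// Method 2: a using declaration names a member; it does not call it.
int by_using_declaration()
{
    using Shapes::square;              // the local name hides ::square
    assert(square() != ::square());
    return square();
}

// Method 3: a using directive makes every name in Shapes available.
int by_using_directive()
{
    using namespace Shapes;
    // return square();                // error: ambiguous with the global square
    return triangle();
}
```

The commented-out call shows the kind of clash a using directive can bring back, which is part of why the using declaration is the more selective tool.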
With namespaces, the only things left to conflict with each other are namespace names, but their numbers will be comparatively few. To minimize conflicts, I can use an elaborate name, like My_cool_cursor_package, and then define a short alias locally for convenience:
namespace Cursor =
   My_cool_cursor_package;
using Cursor::block;
// etc.
The linker only sees the original name. If I find a package I like better, I only have to replace one statement:
namespace Cursor =
   Your_cool_cursor_package;
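A compilable sketch of the alias technique follows. The function rows and the values it returns are invented for the example; only the namespace names come from the text above.

```cpp
#include <cassert>

namespace My_cool_cursor_package   { int rows() { return 25; } }
namespace Your_cool_cursor_package { int rows() { return 50; } }

// Switching packages means editing this one line.
namespace Cursor = My_cool_cursor_package;

int screen_rows()
{
    // The alias and the original name denote the same namespace.
    assert(Cursor::rows() == My_cool_cursor_package::rows());
    return Cursor::rows();
}
```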
The standards committee is currently integrating the namespace mechanism into the standard C++ library. There is more to say about namespaces (for example, you can nest them), but I can hear my editors complaining about the length of this article already.

Don't Try This at Home!
Some of the features mentioned in this article are not yet available in any C++ implementation. Namespaces, the rules for lifetime of temporaries, and conditional scope were accepted as part of the language in 1993. If you feel overwhelmed by all of this, you are not alone. C++ is evolving to meet the needs of developers who solve complex problems. Fortunately, you don't have to master everything at once. Use language features only as you need them, and the language won't get in your way.