Code Capsules

Visibility in C

Chuck Allison


What's in a Name?

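Every token in a C program that begins with a letter or an underscore, other than a keyword or macro, names one of the following entities: a function, a data object, a typedef name, a tag (for a structure, union, or enumeration), a structure or union member, an enumeration constant, or a label.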

These entities are active at certain times and places in your program, depending on how and where their declarations appear. Whether or not you are aware of it, when you declare an identifier you determine its scope, lifetime, linkage, and namespace. In this article I will illustrate these inter-related concepts as Standard C defines them.

Scope

The scope of an identifier is the region of your program's text where you can use that identifier (in other words, where it is visible). There are four types of scope in Standard C:

1) Block: the region inside a pair of matching braces, beginning where the identifier's declarator is complete and ending at the closing brace of the block that contains the declaration

2) Function Prototype: the region in a function prototype from where the identifier occurs to the closing parenthesis that ends the parameter list

3) Function: the entire body of a function

4) File: the region in a source file from where an identifier first appears outside of any block to the end of the source file

Formal parameters have the same scope as if they were declared in the outermost block of their function. Identifier names are optional in function prototypes, and serve only for documentation if they appear. Block scope and function prototype scope together are sometimes referred to as local scope.

In Listing 1, the optional identifier val serves as documentation only, and is not visible outside the prototype for the function f. f's formal parameter i is visible immediately after the opening brace that defines f. Since each block introduces a new scope, the i initialized by j in the innermost block temporarily hides the i in the outer block. The value of j is available to initialize i in the inner block because j was declared first. If j had been declared like this:

{
   int j = i;    /* outer i */
then j would have received the value 1. There is no conflict because the inner i isn't visible until the next statement.
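
The hiding rules just described can be sketched as follows. This is a minimal illustration and not the author's Listing 1; the function name scope_demo and the variables outer_seen and inner_seen are hypothetical names chosen for the example.

```c
/* 'val' has function prototype scope: it only documents the
   parameter and is not visible past the closing parenthesis. */
int scope_demo(int val);

/* Hypothetical function: the parameter i has block scope, as if
   declared in the outermost block of the function body. */
int scope_demo(int i)
{
    int outer_seen = i;      /* the parameter i */
    int inner_seen;
    {
        int j = i;           /* still the parameter: the inner i is not declared yet */
        int i = j + 1;       /* hides the parameter until the closing brace */
        inner_seen = i;      /* the inner i */
    }
    /* The parameter i is visible again here, unchanged. */
    return (i == outer_seen) ? inner_seen : -1;
}
```

Calling scope_demo(1) returns 2, the value of the inner i, while the parameter keeps the value 1 throughout.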

An identifier is visible as soon as its declarator is complete. The following declaration at block scope compiles, but it does nothing useful:

int i = i;
Since int i is sufficient to declare an integer named i, the i on the right is the same object as the i on the left. The new i is initialized with its own indeterminate value, so in effect it is left uninitialized.

Only labels have function scope. There's a difference between function scope and the scope of a function's outermost block. Labels, which have function scope, are visible throughout the function, even before they are "declared," but identifiers with block scope are visible only after their point of declaration and can be hidden by other declarations in nested blocks.
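
A small sketch of function scope for labels, assuming a hypothetical helper named count_steps. The label done is used before it appears in the text of the function, and the label again is reached from inside a nested block.

```c
/* Hypothetical helper: returns how many times the loop body ran. */
int count_steps(int n)
{
    int steps = 0;

    if (n <= 0)
        goto done;          /* 'done' appears later, but is already visible */

again:
    ++steps;
    {
        /* Labels are visible from nested blocks too. */
        if (--n > 0)
            goto again;
    }

done:
    return steps;
}
```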

An identifier declared outside of any block or function parameter list has file scope. Such identifiers are sometimes referred to as global, and are visible from their point of declaration until the end of the translation unit, unless they are hidden by an identifier with the same name having block scope.

The program in Listing 2 illustrates function and file scope. Since the identifier i with file scope (the one initialized to 13) is visible only after its declaration, it would be an error to try to use it in the main program. The global i is not available anywhere in f1 since f1 has a parameter named i. The innermost block of f1 in turn hides that parameter with its own i (the types don't have to be the same). Since f2 declares no identifiers named i, it has access to the global i. The declarations of f1 and f2 in main inject those names into the body of main only. It would be an error, for example, to call f2 from f1.
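
The following sketch shows the same ideas in one source file. It is not the author's Listing 2; the function names caller, hides_global, and sees_global are hypothetical, and only the file-scope i initialized to 13 is taken from the description above.

```c
int caller(void)
{
    /* This declaration is visible only inside caller's block.
       The file-scope i below cannot be named here at all. */
    int sees_global(void);
    return sees_global();
}

int i = 13;                  /* file scope: visible from here to the end of the file */

int hides_global(int i)      /* the parameter hides the file-scope i */
{
    {
        double i = 0.5;      /* hides the parameter; the type may differ */
        (void)i;             /* silence unused-variable warnings */
    }
    return i;                /* the parameter again */
}

int sees_global(void)
{
    return i;                /* no local i, so this is the file-scope i */
}
```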

Minimal Scoping

Thoughtful placement of declarations can greatly enhance the readability of a program. Most programmers still seem to follow the convention, required by languages such as FORTRAN and COBOL, of placing all declarations together in a single section of the source code. This gives you the advantage of always knowing where to look for variable definitions. But when a program gets large, you spend a good deal of time flipping back and forth between the declaration of an identifier and its point of use. Those of you who program in Microsoft Windows know that Hungarian notation, a convention of encoding the type of an identifier into its name, evolved as a means of compensating for the distance between a name's declaration and its use. At the risk of inciting a flurry of letters to the editor, I would like to suggest instead a simple technique known to C++ programmers which, when coupled with modular design, renders strange-looking Hungarian names unnecessary in most cases. I call it minimal scoping. Simply put, it means to declare an identifier as close as possible to its first use.

For example, what do you infer from the following program segment?

void f(void)
{
   int i, j;
   ...
}
Even though you may only use i and j in a small portion of f, the declaration says that they are available everywhere within the function. Therefore, i and j have more scope than they deserve. If, for example, i and j only apply under certain conditions, you can easily limit their scope by declaring them within an appropriate inner block:

void f(void)
{
   ...
   if (<some condition>)
   {
      int i, j;
      /* only use i and j here */
   }
   ...
}
Another advantage to this practice is that i and j exist only while execution is inside their block. Conceptually, they are never created if the condition is false (although many compilers reserve stack space for them on function entry anyway). C++ encourages minimal scoping by allowing declarations to appear anywhere a statement can, as in:

for (int i = 0; i < n; ++i)
   ...
The index i is visible from its point of declaration to the end of its block. Minimal scoping aids readability because you don't even see identifiers until you need to — when they add meaning to the program.

Lifetime

The lifetime, or storage duration, of an object is the period from the time it is created to the time it is destroyed. Objects that have static duration are created and initialized once, prior to program startup, and are destroyed when the program terminates normally. Objects with file scope, as well as objects declared with the static specifier at block scope, have static duration. Listing 3 shows an example of the latter. The variable n is initialized once at program startup and retains its last-assigned value throughout the program. (Its scope, however, is just the body of the function count.)
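
A minimal sketch of a block-scope object with static duration, modeled loosely on the description of count and n above. It is not the author's Listing 3; the starting value and the return type are assumptions.

```c
/* Returns how many times it has been called so far. */
int count(void)
{
    static int n = 0;   /* initialized once, before program startup */
    return ++n;         /* n keeps its value between calls */
}
```

Three successive calls return 1, 2, and 3, even though n cannot be named outside the body of count.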

Function parameters and objects declared within a block without the extern or static specifier have automatic duration. Such objects are created anew every time execution enters their block. If execution enters the block normally, that is, from the top and not as the result of a goto into the middle of the block, any initialization you have specified is also performed each time. When execution falls through or jumps past the end of a block, or returns from a function, all automatic variables in that scope are destroyed.
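
To see that an automatic object is initialized again on each entry to its block, consider this sketch. The function name last_auto_value is hypothetical.

```c
/* Returns the value of the automatic variable a as of the final
   loop iteration, or 0 if the loop body never runs. */
int last_auto_value(int iterations)
{
    int last = 0;
    int k;

    for (k = 0; k < iterations; ++k) {
        int a = 10;     /* created and initialized on every entry to the block */
        a += k;
        last = a;
    }                   /* a is destroyed here */
    return last;
}
```

With three iterations the result is 12 (10 + 2), not 13, because a starts over at 10 each time instead of accumulating.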

The program in Listing 4 illustrates both static and automatic duration with the familiar factorial function. The token n!, pronounced "n-factorial" (without yelling), denotes the product of all positive integers up to and including n. For example,

3! = 3 x 2 x 1
4! = 4 x 3 x 2 x 1
etc.
Most math textbooks give the following equivalent recursive definition instead:

n! = 1               if n <= 1
n! = n x (n-1)!      if n > 1

You can render this definition concisely in C with the following recursive function:

long fac(long n)
{
   return (n <= 1) ? 1 : n * fac(n-1);
}
When n is greater than 1, fac calls itself recursively, with an argument equal to one less than it started with. This action temporarily suspends the current scope and creates a new one, with its own copy of n. This process continues until the most deeply nested copy of n is equal to 1. This scope terminates and returns 1 to the scope that called it, and so on up to the original invocation. For example, consider the execution of the expression fac(3):

fac(3):  return (3 <= 1) ? 1 : 3 * fac(2);
This calls fac(2):

fac(2):  return (2 <= 1) ? 1 : 2 * fac(1);
which in turn calls fac(1):

fac(1):  return (1 <= 1) ? 1 : 1 * fac(0);
which returns 1:

fac(1):  return 1;
fac(2) now resumes and returns the following to fac(3):

fac(2):  return 2 * 1;
fac(3) in turn resumes, computes 3 * 2, and returns the value 6 to the original caller.

The program in Listing 4 traces this recursive computation by wrapping the factorial formula with statements to print the value coming into the function and the computed value going out. This program keeps track of how deep the recursion has nested with the static variable depth. Since depth has static duration, it is allocated and initialized once prior to program startup and retains its value across function calls (including recursive ones). Only automatic variables, like n, are replicated with each recursive call. The auto keyword is purely documentary, since all variables with block scope are automatic by default.
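
A tracing version of fac might look like the sketch below. It is not the author's Listing 4: the message wording, the indentation scheme, and the local variable result are assumptions. Only fac, n, and the static variable depth come from the text.

```c
#include <stdio.h>

long fac(long n)
{
    static int depth = 0;    /* one copy, shared by every invocation */
    long result;             /* automatic: each invocation gets its own */

    ++depth;
    /* Indent each trace line by two spaces per level of recursion. */
    printf("%*sentering fac(%ld) at depth %d\n", depth * 2, "", n, depth);
    result = (n <= 1) ? 1 : n * fac(n - 1);
    printf("%*sleaving fac(%ld) with %ld\n", depth * 2, "", n, result);
    --depth;
    return result;
}
```

Because depth is decremented before each return, it is back at zero when the outermost call finishes, so later calls start their trace at depth 1 again.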

Linkage

According to the rules of linkage, two same-named identifiers can refer to the same object, even if they occupy different translation units. There are three types of linkage in C:

1) External linkage: the name refers to the same entity in every translation unit of the program

2) Internal linkage: the name refers to the same entity throughout a single translation unit

3) No linkage: each declaration of the name denotes a unique entity, so there is nothing to link

Both functions and global objects declared without the static keyword have external linkage. There must be only one definition of each such object, but there may be many declarations that refer to that definition. For example, if the following declaration occurs at file scope in a file:

/* file1.c */
int x;
then it can be used in another file at any scope where the following occurs:

/* file2.c */
extern int x;
The extern specifier in essence says, "find an object named x defined at file scope." The extern specifier is not required to link to a function with external linkage:

/* file1.c */
int f(void)
{
   return 1;
}

/* file2.c */
int f(void); /* extern specifier assumed for functions
              - links to f in file1.c */
Although the C standard doesn't explicitly define it, you can think of objects with external linkage as constituting a new scope: that of objects visible across translation units. On the street this is known as program scope.
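
The effect of extern can also be seen within a single source file, which makes for a compact sketch. Here one file stands in for both file1.c and file2.c, which is a simplification; the function name read_x and the value 42 are hypothetical.

```c
int read_x(void)
{
    extern int x;    /* a declaration only: refers to the x defined at file scope */
    return x;
}

int x = 42;          /* the one definition of x, with external linkage */
```

The declaration inside read_x has block scope, so the name x is visible only there, yet it links to the object defined later at file scope.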

Functions and global objects declared with the static specifier have internal linkage. Identifiers with internal linkage are visible only within their translation unit. This use of the keyword static has little to do with the static storage duration specifier discussed previously. It's a good idea to commit the following pseudo-formulas to memory:

static + block scope == static storage duration
static + file scope == internal linkage
The first use of static alters lifetime, the second linkage. If you think this is confusing, C++ muddies the waters further by introducing a third use of the term (static class members). I'll spare you that one until next month.
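
Both pseudo-formulas appear in the following sketch. All of the names (calls, next_id, issue_two, total_calls) are hypothetical and chosen only for illustration.

```c
static int calls = 0;        /* static + file scope: internal linkage */

static int next_id(void)     /* internal linkage: private to this file */
{
    static int id = 100;     /* static + block scope: static storage duration */
    ++calls;
    return id++;
}

int issue_two(void)          /* external linkage: other files may call this */
{
    int first = next_id();
    int second = next_id();
    return second - first;   /* 1, because id persists between calls */
}

int total_calls(void)        /* external linkage */
{
    return calls;
}
```

Another source file could call issue_two and total_calls, but it could not refer to calls or next_id by name.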

Certain program entities, which are always unique, are said to have no linkage. These entities include objects having block scope but no extern specifier, function parameters, and anything other than a function or object, such as a label, tag name, member name, typedef name, or enumeration constant.

The source files in Listing 5 and Listing 6 comprise a single executable program that illustrates the different types of linkage. Listing 5 is similar to Listing 2 except that functions f1 and f2 are private to the source file, and a function from the source file in Listing 6 is added to the executable program. The integer i at file scope in Listing 5 has external linkage because it does not carry the static specifier. A variable of the same name in another file can refer to it if declared with the extern specifier, as Listing 6 does. (It is an error to have two definitions of the same object with external linkage, e.g., two i's modified by neither static nor extern.)

The functions f1 and f2 have internal linkage because they use the static specifier. The float object named i in f1 has no linkage because it is declared at block scope without the extern specifier. The integer j in Listing 6 has internal linkage because of the static specifier, and the function f3 has external linkage because of the absence of the static specifier.

The following three lines from Listing 5 require particular explanation:

extern void f1(int);   /* Internal Linkage */
extern void f2(void);  /* Internal Linkage */
extern void f3(void);  /* External Linkage */
Since the extern specifier means "link with something at file scope," the declarations of f1 and f2 in main link respectively with the functions of the same name in the same file, which happen to have internal linkage. It is important that you declare f1 and f2 static at file scope before the extern references in main, or else the compiler will assume that they have external linkage (as it does for f3), which conflicts with the actual function definitions later in the file.
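
The ordering requirement can be sketched in a few lines. This is not Listing 5; the caller call_f1 and the body of f1 are assumptions made for the example.

```c
static int f1(int);          /* internal linkage is established first */

int call_f1(int n)
{
    extern int f1(int);      /* links with the f1 already declared static above */
    return f1(n);
}

static int f1(int n)         /* the definition agrees: still internal linkage */
{
    return n + 1;
}
```

If the first line were removed, the extern declaration inside call_f1 would give f1 external linkage, and the later static definition would conflict with it.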

Namespaces

An identifier can play various roles in a C program. For example, in the following excerpt, pair is both a function name and a structure tag:

#include <stdio.h>

struct pair {int x; int y;};

void pair(struct pair p)
{
    printf("(%d,%d)\n", p.x, p.y);
}
The compiler keeps separate lists of identifiers used in different roles, so there is no danger of ambiguity. These lists are called namespaces. There are four different types of namespaces in Standard C:

1) labels

2) tags for structures, unions, and enumerations

3) members of structures and unions

4) ordinary identifiers (i.e., all others: data objects, functions, types, and enumeration constants).

Each function keeps its own list of labels. Each translation unit and each block keeps its own set of namespaces for tags and for ordinary identifiers, which is what allows an inner scope to hide like-named entities at outer scopes. Each structure or union type keeps its own list of member names. Enumeration constants belong to the space of ordinary identifiers, since you use them like objects in a program.

The program in Listing 7 uses the following namespaces (arbitrary names are mine):

The comments in the source file indicate the namespace that claims each use of the identifier foo. I hope you find this program a little (nay, a lot) confusing. My monotonous overuse of a single identifier was for illustrative purposes only. You should only reuse names for good reason — and I can't think of any at the moment, without discussing C++ overloading. If you see such code in "real life," treat it like a sensitive government document: DESTROY BEFORE READING!
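
For readers without the listing at hand, here is a much smaller sketch in the same spirit. It is not the author's Listing 7; it simply uses the name foo once in each of the four namespaces, with comments naming the namespace involved.

```c
struct foo {            /* tag namespace */
    int foo;            /* member namespace of struct foo */
};

int foo(struct foo *p)  /* ordinary identifier: a function named foo */
{
    if (p == 0)
        goto foo;       /* label namespace */
    return p->foo;      /* member foo of the structure tagged foo */
foo:                    /* label foo, local to this function */
    return -1;
}
```

The compiler resolves each use from its syntactic context: after struct it looks for a tag, after -> a member, after goto a label, and everywhere else an ordinary identifier.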

Summary

There's a lot to a name in a C program. Each identifier has a scope (where it is visible), a lifetime (when it is active), a linkage (whether remote uses of the same name refer to the same entity), and a namespace (its role among identifiers). If this article hasn't made sense to you (but I hope it has), guess what: it gets worse! I can't think of a single concept I've discussed here that C++ doesn't affect. Now you know the subject of next month's article.