Classes are unavoidable in Java, but instances of classes (a.k.a. objects) are just plain a good idea.
You can't avoid classes in Java; they are the stuff that Java programs are made of. Every class definition is compiled into byte code in its own .class file. When you invoke a Java program, .class files load automatically as needed. But what are classes for? So far in this column I have used classes merely as containers for a static main method to illustrate basic language features such as primitive data types, operators, and control structures. But the power of object-oriented programming comes from objects, which are instances of a class. This month I'll introduce objects in Java.
Handles and the Heap
Suppose you're writing an application that deals with employees of a company, such as a personnel or payroll system. Somehow you need to represent the concept of an employee in your program. How would you do it? A C programmer would probably define an employee record with a struct that holds the pertinent data, and then write functions that process variables of the struct type. Here's a sample definition of such a struct:
struct Employee {
    char last[20];
    char first[20];
    char title[15];
    int age;
};

and here's how you might use it in a program:
#include <stdio.h>
#include <string.h>

void process_employee(struct Employee* ep) {
    printf("{%s,%s,%s,%d}\n", ep->last, ep->first, ep->title, ep->age);
}

int main() {
    struct Employee e;
    strcpy(e.last, "Malone");
    strcpy(e.first, "Karl");
    strcpy(e.title, "Forward");
    e.age = 36;
    process_employee(&e);
    return 0;
}

Compare this to the equivalent Java code in Figure 1. Aside from syntactic differences, such as the absence of explicit pointer indirection, the use of the class keyword instead of struct, and the absence of a semicolon after the class definition, the first important feature to note in the Java program is the public keyword. In Java, as in C++, you control the access level of class members. Java, however, requires the access specification to accompany each member definition; you can't have sections of members that implicitly share the same access level, as you can in C++. Java has four access levels, listed below in order of decreasing visibility:
1. Public access, denoted by the keyword public. Public members can be accessed anywhere.
2. Protected access, denoted by the keyword protected. Protected members are accessible not only in derived-class methods, as in C++, but they also have package access. (See the next item; I'll discuss packages in the September 1999 issue.)
3. Package access, denoted by the absence of any access keyword. Members with package access are visible anywhere in the package in which they appear. (You can think of a package as a directory.)
4. Private access, denoted by the keyword private. As in C++, private members are visible only within their class. Java has no concept of friends; where a C++ programmer would break encapsulation by granting friendship, you use package access instead (more on this in September).
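Figure 1 is not reproduced here, so the following is only a hypothetical stand-in for a Java counterpart of the C program: public fields, an object created with new, and a processEmployee method that receives a copy of the handle. The format helper is an assumption added so the output can be checked; it is not from the article.

```java
// Hypothetical Java counterpart of the C program above (not the article's Figure 1).
class Employee {
    public String last;   // each member carries its own access specifier
    public String first;
    public String title;
    public int age;
}                         // no semicolon needed after a class definition

class EmployeeTest {
    // Hypothetical helper: builds the same "{last,first,title,age}" text the C version prints.
    static String format(Employee ep) {
        return "{" + ep.last + "," + ep.first + "," + ep.title + "," + ep.age + "}";
    }

    // Receives a copy of the caller's handle; ep.last needs no explicit ->.
    static void processEmployee(Employee ep) {
        System.out.println(format(ep));
    }

    public static void main(String[] args) {
        Employee e = new Employee();   // object on the heap, handle in local variable e
        e.last = "Malone";
        e.first = "Karl";
        e.title = "Forward";
        e.age = 36;
        processEmployee(e);            // prints {Malone,Karl,Forward,36}
    }
}
```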
The more crucial difference between the two code examples is in the declaration of the Employee object itself. In the C version this object resides on the stack, although I could have made it static or could have allocated it on the heap. In Java you have no choice: all objects reside on the heap. Only local variables, which hold either primitive values or object handles, live on the program stack. Whenever you create an object in Java, you must use the new operator, which allocates the object on the heap and returns a handle to that object for local use. Some people call what the new operator returns a pointer or a reference, but since it behaves only partially like a C++ pointer (you can't perform arithmetic on it or dereference it explicitly) and not like a C++ reference at all (C++-style references don't exist in Java), I, along with many others, prefer the term handle. When processEmployee executes in Figure 1, it receives a copy of the handle to the Employee object from main (which is similar to passing a pointer in C, except that the indirection is automatic).
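A small sketch of these handle semantics, assuming a stripped-down Employee with a single public age field (a hypothetical simplification). Copying a handle, whether by assignment or by passing it to a method, never copies the object; reassigning the parameter inside a method changes only the method's own copy of the handle.

```java
// Hypothetical, stripped-down Employee used only to illustrate handles.
class Employee {
    public int age;
}

class HandleDemo {
    static void birthday(Employee ep) {
        ep.age++;               // changes the heap object the caller also sees
        ep = new Employee();    // rebinds only this method's copy of the handle
        ep.age = 99;            // affects the new object, not the caller's
    }

    public static void main(String[] args) {
        Employee e = new Employee();
        e.age = 36;
        Employee alias = e;     // copies the handle, not the object
        birthday(e);
        System.out.println(e.age + " " + alias.age + " " + (e == alias)); // 37 37 true
    }
}
```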
C++ programmers will be quick to notice something "wrong" in Figure 1: there is no evidence that the Employee object's heap memory is ever freed. Hunting for memory leaks is common sport in the C++ world, and one of the first habits a C++ developer acquires is ensuring that there is a delete for every new in a program. Java has no delete because it has garbage collection, which means that the JVM (Java Virtual Machine) is responsible for detecting when an object is no longer referenced and freeing its heap memory. Hurray! We don't have to worry about it! But wait a minute. What does this mean for performance? Well, there is a cost, and its size is still a subject of debate between advocates of the two languages. If performance is super-critical, then C++ is probably still the better choice, but for a large class of applications, especially GUI-based ones, Java is quite acceptable. Many people find that freedom from worrying about memory leaks more than compensates for the performance gap between Java and C++, and that gap seems to be narrowing all the time.
From Records to Objects
The C example above is, of course, woefully out of date. Practically no one uses plain structs anymore. Instead they use C++ so they can define methods within the Employee class itself. It is also usually a bad idea to give your data members public access, so the object-smart programmer makes them private and provides methods to control how users access them. I have done this very thing in Figure 2. I declared the get and set methods non-static because they operate on behalf of an Employee object. You invoke these methods with the dot operator, as I did in EmployeeTest2.main. As in C++, all non-static methods have a hidden this parameter, which is a handle to the object associated with the current invocation. Inside the getLast method, for example, the statement
return last;

resolves to

return this.last; // same thing

because the compiler looks in the scope of the class to find a match for the identifier last. In the set methods I have to use the this keyword because I gave each parameter the same name as the field it sets. Some people object to the extra typing and prefer something like the following instead:

public void setLast(String lst) { last = lst; }

I have also removed the definition of processEmployee and added a toString method. Since all I want to do is print a representation of an Employee's data, I took advantage of this special method name for convenience. When you pass any object other than a String to System.out.println, it calls that object's toString method to obtain a String. (Every class inherits a default toString from Object; you override it to get meaningful output.)
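Figure 2 is not shown here either, so what follows is only a hedged sketch of the kind of class the text describes: private fields, non-static get and set methods that use this to tell parameters from fields, and a toString override that println picks up. The class name EmployeeTest2 comes from the text; the rest is an assumption.

```java
// Sketch in the spirit of Figure 2, not the figure itself.
class Employee {
    private String last;
    private String first;
    private String title;
    private int age;

    public String getLast() { return last; }                // same as: return this.last;
    public void setLast(String last) { this.last = last; }  // this.last is the field
    public String getFirst() { return first; }
    public void setFirst(String first) { this.first = first; }
    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }
    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }

    // println(Object) obtains its text by calling this method.
    @Override
    public String toString() {
        return "{" + last + "," + first + "," + title + "," + age + "}";
    }
}

class EmployeeTest2 {
    public static void main(String[] args) {
        Employee e = new Employee();
        e.setLast("Malone");      // instance methods, invoked with the dot operator
        e.setFirst("Karl");
        e.setTitle("Forward");
        e.setAge(36);
        System.out.println(e);    // {Malone,Karl,Forward,36}
    }
}
```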
You may have noticed that making the data fields private hasn't accomplished much here, since the user can still change an Employee's attributes at will. Well, use your imagination. You could easily insert processing into the set methods to control the changes a user can make. The point is that the object's state is separated from its interface (i.e., its public methods), so I can change the former without affecting the latter, and users are none the wiser. This technique of hiding implementation is called encapsulation, and is crucial for creating well-behaved classes.
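As one example of the processing a set method could perform, here is a sketch of a setAge that enforces an invariant. The 0-to-150 range is a made-up rule for illustration only; the point is that callers still just call setAge, while only the class's implementation has changed.

```java
// Sketch only: the accepted age range is a hypothetical business rule.
class Employee {
    private int age;

    public int getAge() {
        return age;
    }

    // Same interface as before; the validation is an implementation detail.
    public void setAge(int age) {
        if (age < 0 || age > 150) {
            throw new IllegalArgumentException("implausible age: " + age);
        }
        this.age = age;
    }
}

class EmployeeValidationDemo {
    public static void main(String[] args) {
        Employee e = new Employee();
        e.setAge(36);
        try {
            e.setAge(-5);
        } catch (IllegalArgumentException ex) {
            System.out.println(ex.getMessage()); // implausible age: -5
        }
        System.out.println(e.getAge());          // 36: the bad value was rejected
    }
}
```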
Object Initialization and Destruction
What would have happened in Figure 2 if I had tried to print the Employee object before calling the set methods for its data members? In C or C++ there's no telling what would happen, since uninitialized (non-static) objects contain whatever garbage happens to be in memory. Java, on the other hand, always initializes the fields of an object to default values before anything else happens. Primitive values are initialized to the appropriate flavor of zero (false, for boolean), and object handles are initialized to a special value represented by the keyword null. This initialization applies only to fields, not to local variables defined in methods. If you try to use an uninitialized local variable, the compiler reports an error.
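A minimal sketch of default initialization, assuming an Employee with package-access fields and a toString like the one discussed above: a freshly created object prints its default values, while the commented-out lines show the local-variable case the compiler rejects.

```java
// Minimal sketch: fields get default values, local variables do not.
class Employee {
    String last;     // object handles default to null
    String first;
    String title;
    int age;         // numeric primitives default to zero (a boolean would be false)

    @Override
    public String toString() {
        return "{" + last + "," + first + "," + title + "," + age + "}";
    }
}

class DefaultsDemo {
    public static void main(String[] args) {
        Employee e0 = new Employee();
        System.out.println(e0);      // {null,null,null,0}

        // int n;
        // System.out.println(n);   // compile-time error: variable n might not have been initialized
    }
}
```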
Default initialization is a good thing, but what you really want is a way to do your own initialization automatically when an object is created. You do this in Java the same way you do in C++: with constructors. A constructor looks like a method that has the same name as the class and no return type, and it can be overloaded, just as regular methods can. The program in Figure 3 has two constructors: a default (i.e., no-argument) constructor and a constructor that expects one argument for each object field. These constructors execute for the variables e0 and e, respectively, in the main method. You can see evidence of default initialization when e0 prints.
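A hedged sketch of a class with the two overloaded constructors described above (Figure 3 itself is not reproduced here); the variable names e0 and e follow the text.

```java
// Sketch of two overloaded constructors, not Figure 3 verbatim.
class Employee {
    private String last;
    private String first;
    private String title;
    private int age;

    // Default (no-argument) constructor: fields keep their default values.
    public Employee() {
    }

    // One argument per field.
    public Employee(String last, String first, String title, int age) {
        this.last = last;
        this.first = first;
        this.title = title;
        this.age = age;
    }

    @Override
    public String toString() {
        return "{" + last + "," + first + "," + title + "," + age + "}";
    }
}

class EmployeeTest3 {
    public static void main(String[] args) {
        Employee e0 = new Employee();                               // default constructor
        Employee e = new Employee("Malone", "Karl", "Forward", 36); // four-argument constructor
        System.out.println(e0);   // {null,null,null,0}
        System.out.println(e);    // {Malone,Karl,Forward,36}
    }
}
```

Because this class declares a constructor, the compiler no longer supplies an implicit no-argument one, so the empty Employee() has to be written out for new Employee() to compile.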
The effect of defining the fields last, first, title, and age without the static keyword is that each Employee object has its own copy of those fields. For this reason, non-static fields are sometimes called instance fields (or data members, or attributes, whatever) because they apply to each instance. Likewise, non-static methods always execute on behalf of an object of their class and are called instance methods. Sometimes you want a field or method that applies to an entire class, not just a particular object. These members, called class members, are defined with the static keyword. Since this column began at the start of this year, we have seen one particular static method over and over again: main. When you invoke the virtual machine with a command such as
$ java Foo

the JRE (Java Runtime Environment) loads the class Foo from the file Foo.class and invokes Foo.main. If you leave the static keyword off main's definition, the JRE won't find it and will report an error.
Likewise, if Foo has a static method named f, you can call it with the syntax Foo.f().
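A tiny sketch of class members versus instance members, assuming a hypothetical body for f: because f and main are static, they belong to Foo itself, so Foo.f() works without any Foo object, and java Foo can call main before any object exists.

```java
// Hypothetical Foo: the body of f and the id field are made up for illustration.
class Foo {
    private int id;                  // instance field: one copy per Foo object

    static int f() {                 // class method: no object, no this
        return 42;
    }

    int getId() {                    // instance method: needs an object
        return id;
    }

    public static void main(String[] args) {
        System.out.println(Foo.f()); // called through the class name
        Foo foo = new Foo();
        System.out.println(foo.getId()); // 0, the default for an int field
    }
}
```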
The program in Figure 4 has a static field, count, which keeps track of the number of Employee objects currently in use, and a static method, getCount, which yields the value of count on demand. Note that you can initialize a field right in its definition. If static fields require more complex initialization than can easily be put in a single expression, you can do the work in a static initialization block, as follows:
class Employee {
    private static int count;

    static {
        // ...do some complex stuff...
        count = 0; // ...or whatever value the complex stuff computes
    }
}

Of course, you can access only static fields and methods in a static initialization block (there is no this). Static initialization occurs when a class is first loaded. You can also have a non-static initialization block (just remove the static keyword), which executes whenever an object of its class is created, no matter which constructor ultimately executes. Such a block acts as a shared constructor (see Figure 4): it executes first, followed by the body of the particular constructor that matches the new expression. You can have multiple initialization blocks, and they execute in the order written (static blocks first, of course, as explained above).
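Here is a hedged sketch in the spirit of Figure 4 (whose exact code isn't reproduced here): a static count initialized in its definition, a static initialization block, and a non-static initialization block shared by two constructors. The trace list is a hypothetical addition that records the order in which the pieces run.

```java
import java.util.ArrayList;
import java.util.List;

class Employee {
    // Hypothetical trace of initialization order. Declared first because
    // static initializers and static blocks run in textual order.
    static final List<String> trace = new ArrayList<>();

    private static int count = 0;    // initialized right in its definition

    static {
        // Runs once, when the class is initialized; only static members are visible.
        trace.add("static block");
    }

    private String last;

    {
        // Instance initializer: runs for every new Employee, whichever
        // constructor is chosen, before that constructor's body.
        ++count;
        trace.add("instance block");
    }

    public Employee() {
        trace.add("Employee()");
    }

    public Employee(String last) {
        this.last = last;
        trace.add("Employee(String)");
    }

    public String getLast() {
        return last;
    }

    public static int getCount() {
        return count;
    }
}

class EmployeeTest4 {
    public static void main(String[] args) {
        new Employee();
        new Employee("Malone");
        System.out.println(Employee.getCount()); // 2
        System.out.println(Employee.trace);
        // [static block, instance block, Employee(), instance block, Employee(String)]
    }
}
```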
Java has no concept of a destructor that executes when an object goes out of scope. This presents a problem for my object-count scheme, since there is no implicit way to mark an object unused. The only sure way to get the job done is to do it yourself, so I've defined a release method to call explicitly to decrement the count. Of course, this simple example isn't very robust: nothing will stop you from using an object after you call release. The only sure-fire alternative would be to define a state field that indicates whether an object is usable, and which you must manage yourself. This is definitely an inconvenience for C++ programmers using Java.
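A hedged sketch of the explicit-release approach combined with the state field the paragraph suggests. The released flag, the checkUsable helper, and the choice of IllegalStateException are assumptions for illustration, not the article's code.

```java
// Sketch: explicit release plus a hypothetical "released" state field.
class Employee {
    private static int count = 0;        // Employees currently in use
    private final String last;
    private boolean released = false;    // state field managed by hand

    public Employee(String last) {
        this.last = last;
        ++count;
    }

    public static int getCount() {
        return count;
    }

    // Must be called explicitly: Java has no destructor to call it for you.
    public void release() {
        if (!released) {                 // a second release is ignored
            released = true;
            --count;
        }
    }

    private void checkUsable() {
        if (released) {
            throw new IllegalStateException("Employee " + last + " was already released");
        }
    }

    public String getLast() {
        checkUsable();
        return last;
    }
}

class ReleaseDemo {
    public static void main(String[] args) {
        Employee e = new Employee("Malone");
        System.out.println(Employee.getCount()); // 1
        e.release();
        System.out.println(Employee.getCount()); // 0
        try {
            e.getLast();
        } catch (IllegalStateException ex) {
            System.out.println(ex.getMessage()); // Employee Malone was already released
        }
    }
}
```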
It is tempting to use Java's finalize method as a destructor. When an object is about to be garbage-collected, its finalize method, if you have defined one, will run on the object's behalf, so you could have it do the cleanup. There are a number of problems to consider, though:
1. Objects can be garbage-collected in any order, so you have no control over the order of object cleanup via finalize.
2. Objects are not guaranteed to be garbage-collected at all.
3. Any objects that are awaiting garbage collection when a program exits are not cleaned up.
There is a static method in the System class, System.gc, that will invoke the garbage collector, but again, there are no guarantees. The program in Figure 5 moves the decrement of count in the Employee class from release to finalize and sets each handle to null (which leaves its object unreferenced and hence eligible for garbage collection) before invoking the garbage collector, and everything seems to work fine. But the following example, contributed by an Astute Reader, shows that you can't rely on finalize:
public class Dummy {
    public static void main(String[] args) {
        int count = Integer.parseInt(args[0]);
        for (int i = 0; i < count; i++) {
            jimbo j = new jimbo(i);
        }
        System.gc();
    }
}

class jimbo {
    private int i;

    public jimbo(int i) {
        this.i = i;
    }

    protected void finalize() {
        System.out.println(i);
    }
}

This program creates a number of jimbo objects, which hold consecutive integers. After these objects go out of scope, it invokes System.gc, but every time I run this program, regardless of the number I enter for args[0] on the command line, the last two objects are not cleaned up. Go figure. Bottom line: if you must deallocate resources on a per-object basis, do it explicitly. A practical use for finalize would be to verify that an object had previously been cleaned up by inspecting a flag that the cleanup routine sets, and logging an error if the flag was not set (an exception thrown from finalize is simply ignored by the garbage collector).
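A hedged sketch of that audit pattern, using a hypothetical Resource class and cleanUp method: the real cleanup happens in an explicit call, and finalize merely reports objects that were never cleaned up. Because finalize may run late or not at all, this is only a diagnostic aid. Note also that Object.finalize has been deprecated since Java 9, so current compilers will warn about overriding it.

```java
// Hypothetical resource-holding class; finalize only audits, it never cleans up.
class Resource {
    private boolean cleanedUp = false;   // set by the explicit cleanup routine

    public void cleanUp() {
        // ...release whatever this object holds...
        cleanedUp = true;
    }

    public boolean isCleanedUp() {
        return cleanedUp;
    }

    // May run late or never; an exception thrown here would be ignored, so just log.
    @Override
    protected void finalize() throws Throwable {
        try {
            if (!cleanedUp) {
                System.err.println("Resource was garbage-collected without cleanUp()");
            }
        } finally {
            super.finalize();
        }
    }
}

class ResourceDemo {
    public static void main(String[] args) {
        Resource r = new Resource();
        try {
            // ...use r...
        } finally {
            r.cleanUp();                     // explicit, deterministic cleanup
        }
        System.out.println(r.isCleanedUp()); // true
    }
}
```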
Class Relationships
The usual way to express simple relationships between objects is to have them store handles to each other as needed. Consider the relationship between an employee and a Department. If each employee belongs to only one Department, then you can place a non-static Department field in the Employee class and initialize it with a Department object. This is also sufficient to express that a Department has many employees; there is no need to place a list of Employees in a Department object. (You can just traverse all the Employee objects looking for a particular Department.) You may, however, want to record who is the manager of each Department. The program in Figure 6 defines a Department class and implements these relationships by defining the appropriate fields and get/set methods in each class. As you look at EmployeeTest6.main, remember that the arguments to the set methods and the return values from the get methods are just handles. This is why I don't need to use the new operator in the statement
Department d = e.getDepartment();

Objects are always referred to through handles: this statement copies a handle, while the Department object it refers to stays put on the heap.
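A hedged sketch in the spirit of Figure 6 (not the figure itself): each Employee holds a handle to its Department, each Department holds a handle to its manager, and the one-to-many direction is recovered by scanning employees, as the text suggests. The department name "Sales", the employee "Stockton", and the membersOf helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Department {
    private final String name;
    private Employee manager;            // a handle, not a copy of the Employee

    public Department(String name) { this.name = name; }
    public String getName() { return name; }
    public Employee getManager() { return manager; }
    public void setManager(Employee manager) { this.manager = manager; }
}

class Employee {
    private final String last;
    private Department department;       // each employee belongs to one department

    public Employee(String last) { this.last = last; }
    public String getLast() { return last; }
    public Department getDepartment() { return department; }
    public void setDepartment(Department department) { this.department = department; }
}

class EmployeeTest6 {
    // "A Department has many Employees" needs no list in Department:
    // scan the employees for a matching handle instead.
    static List<Employee> membersOf(Department d, List<Employee> all) {
        List<Employee> result = new ArrayList<>();
        for (Employee e : all) {
            if (e.getDepartment() == d) { // same object, compared by handle
                result.add(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Department sales = new Department("Sales");
        Employee e = new Employee("Malone");
        Employee f = new Employee("Stockton");
        e.setDepartment(sales);
        f.setDepartment(sales);
        sales.setManager(e);

        Department d = e.getDepartment(); // no new: this just copies a handle
        System.out.println(d == sales);                               // true
        System.out.println(d.getManager().getLast());                 // Malone
        System.out.println(membersOf(d, Arrays.asList(e, f)).size()); // 2
    }
}
```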
Summary
The virtues of objects are no longer confined to the research laboratory or the Smalltalk shop. In fact, it's very difficult to find developers nowadays who aren't object-savvy. But whereas C++ supports several programming paradigms, you just can't use Java effectively without thinking in objects. In the next article I'll talk more about objects and how to package them.