Java Solutions


The Java 2 Collections — Finally, Some News You Can Use

Chuck Allison

Inspired by STL, Java 2 adds still more containers and algorithms for your coding enjoyment.


It's been over 20 years since I first studied data structures. Yes, that was in the days of FORTRAN, COBOL, BASIC, and PL/I, and I am happy to be among the crowd of programmers to have implemented trees in FORTRAN (what fun!). Actually, it was frustrating to learn recursive data structures and algorithms and not be able to implement them directly because of a language that came up short on features. Then along came Pascal and C and we all went crazy writing code that looked like the pseudocode in our textbooks.

But implementing lists and trees for every new user data type that comes along gets old after a while, not to mention error prone. Two things have come along to change all that: objects and templates. With objects you can write data structures once and for all in terms of some high-level class (like Object), as long as your classes hook into that hierarchy. This is the approach Smalltalk, Java, and some early C++ container libraries took. The C++ STL (Standard Template Library) revolutionized generic programming by defining templates for type-safe containers and algorithms. Whatever approach you take, you can use an existing data structure implementation for virtually any programming task. Hurray!

But Java containers through version 1.1 of the JDK were a disappointment, to say the least. There were only four containers: Bitset, Hashtable, Vector, and Stack; and they didn't come off as shining examples of object-oriented design. For some reason unknown to mankind, Stack inherited from Vector, instead of just using the latter as an artifact of implementation. And there were no algorithms; you couldn't even sort a Vector without writing your own sort routine!

With the collections and algorithms in Java 2, you now have a full array (sic) of generic data structures to handle most of the programming tasks that come your way. In this article I illustrate all the key features of the new collections; but first, let's talk about arrays. That's right. Arrays.

Arrays

The array is the quintessential data structure. Just because you have trees and lists at your fingertips doesn't mean that arrays don't matter. In fact, if you need to hold primitive types, arrays are the way to go, since the collection classes can only hold instances of Object. The class java.util.Arrays includes some useful algorithms for processing arrays, namely

These algorithms come in the form of overloaded methods that take arrays of Object and also of every primitive data type. The program in Figure 1 uses all four algorithms on arrays of int. The binarySearch algorithm requires an ordered array, of course, so the first thing I do is call the sort algorithm with the statement

Arrays.sort(array);

binarySearch returns the zero-based index of the element you're after, if it's in the array; otherwise it returns

-pos - 1

where pos is the index of the position it should occupy.

Solving for pos by adding 1 and then changing sign will either give you the location before which the number should be inserted, or array.length, in case it should be appended to the array. Be aware that if the array contains multiple instances of a search object, there is no guarantee as to which one the search algorithm will find.

Unfortunately, when used with arrays of primitives, Arrays.sort puts things in ascending order only. If you want some other ordering scheme, you have to use arrays of object and pass sort an extra comparator argument.

A comparator is an object that implements the following interface:

public interface Comparator
{
    int compare(Object o1, Object o2);
}

The compare method must behave like C's strcmp function by returning a negative number if o1 precedes o2 in your ordering, zero if they are equal, or a positive number otherwise. The program in Figure 2 sorts an array of Integers in descending order by using the following comparator:

class Descending implements Comparator
{
    public int compare(Object o1, Object o2)
    {
        Comparable c1 = (Comparable) o1;
        Comparable c2 = (Comparable) o2;
        return -c1.compareTo(c2);
    }
}

A Comparable object implements the method int compareTo(Object o), which also behaves like C's strcmp. String, Date, and all the primitive wrapper classes implement Comparable, and the collections described below can hold Comparable objects. Arrays.sort uses Descending.compare to compare Integers in Figure 2.

Note that the definition of Descending above is totally generic — you can use it to reverse-sort any collection of objects derived from Comparable. Since this is something you'll want to do quite often, the Java 2 library provides this comparator as the return value of Collections.reverseOrder, so you can replace the definition of desc in Figure 2 with the following one:

static Comparator desc = Collections.reverseOrder();

Since all Java 2 algorithms process instances of Object, it is easy to handle objects of any class. Figure 3 defines a Person class. Note that it implements Comparable, in case I want to sort an array of Persons, and so I will also be able to store Person objects in other types of collections later. The fact that Person also implements Cloneable is a coincidence of my test program. I tried to implement hashCode and compareTo to insure that different Person objects will yield distinct results. The program in Figure 4 searches an array of Persons.

It is possible to order and search an array by a subset of the objects' fields by defining a suitable comparator. In Figure 5, I use a comparator that just looks at a Person's name. Unfortunately, I have to use a complete Person object as the search key, even though the last three fields are ignored. As far as I can tell, there is no way to define a comparator to use a Person as one argument and a string as the other, as you can with function objects in the Standard C++ library.

Collections

Java 2 provides two families of containers: Collections and Maps. Collections are containers that implement the Collection interface (see Figure 6). The Collection interface is further refined into List and Set (see Figures 7 and 8). Lists are ordered sequences that support direct indexing and bidirectional traversal. Java 2 comes with LinkedList and ArrayList as implementations for List, as shown in Figure 9. Vector is also there for historical reasons, but ArrayList is preferred over Vector in new code. LinkedList implements a doubly-linked list, and is optimized for inserting and removing elements anywhere in a list. ArrayList is simply an expandable array, and is preferred when you need to use random access to list elements.

The program in Figure 10 puts both ArrayList and LinkedList to the test. The get method returns the object at a particular position, while indexOf performs a linear search to find the index of an element, and returns -1 if it's not present. The subList method is a useful feature of the List interface. It returns a List that is a subset of the original, using an asymmetric range specification, similar to STL ranges, where the first index is inclusive and the second is exclusive. In Figure 10, for example, the statements

LinkedList list = new LinkedList(array);
List view = list.subList(1,3);

define view as containing the second and third elements of list (i.e., indexes 1 and 2), where list is itself a copy of array. Since view is just a wrapper for a subset of list, any operations on view affect list. Appending Gregory's instance to view, therefore, inserts it after James, not Charles, the latter being outside of view.

Figure 10 also illustrates various algorithms found in the java.util.Collections class, such as sort, binarySearch, max, min, and reverse. These are similar in spirit to those in java.util.Arrays, but apply only to collections. The complete listing of algorithms in java.util.Collections is in Figure 11. Notice the methods containing the prefixes unmodifiable and synchronized. These methods act as wrappers for existing collections and restrict their use. If you call unmodifiableCollection on an existing collection, for example, the object returned will throw an UnsupportedOperationException if you try to call any of the Collection methods that modify a collection.

The synchronized methods return objects that are thread safe. Using these methods is the only way to use collections in multithreaded situations, since for efficiency none of the Collection methods are synchronized. The wrappers act as a synchronized view into the original collection. If you want to iterate through a synchronized collection you need to synchronize on the collection to preserve thread safety, as in:

Collection coll =
    Collections.synchronizedCollection(aCollection);
...
synchronized(coll)
{
    Iterator iter = coll.iterator();
    while (iter.hasNext())
}

You may have noticed the glaring omission of some popular data structures, such as queue, stack, and deque. Although they're not there explicitly, you can use LinkedList to achieve the desired effect. LinkedList includes the following methods, among others:

// Some LinkedList methods:
void addFirst(Object o); // push_front()
void addLast(Object o);  // push_back()
Object getFirst();       // front()
Object getLast();        // back()
Object removeFirst();    // pop_front()
Object removeLast();     // pop_back()

For the STL-aware I have included the names of the equivalent C++ functions in the comments above. To implement a queue, all you have to do is use LinkedList's addFirst and removeLast methods, as follows:

class Queue
{
    private LinkedList data;
    
    public Queue()
    {
        data = new LinkedList();
    }
    public void put(Object o)
    {
        data.addFirst(o);
    }
    public Object get()
        throws NoSuchElementException
    {
        return data.removeLast();
    }
    public int size()
    {
        return data.size();
    }
}

For a stack, just replace removeLast with removeFirst. For a deque, which allows insertions and removals at both ends, just use LinkedList as it is.

Iterators

Iterators, like Enumerations, are objects that facilitate traversing through an ordered sequence (see Figure 12). Unlike Enumerations, you can remove elements via an iterator. All collections support iterators, but the order of the traversal depends on the collection itself. For lists, you visit objects in the order they are stored. For sets (discussed below), objects appear in no particular order unless the set is ordered. The ListIterator interface, also in Figure 12, lets you update an object through an iterator, as shown in Figure 13. All the iterators returned by the Java 2 collections are fail-fast, which means that if the underlying structure changes during the lifetime of an iterator, other than through operations performed through the iterator itself, the iterator becomes invalid. Any attempted use of it thereafter will result in a ConcurrentModificationException.

Sets

The Set interface conforms to the usual notion of a mathematical set, which basically is just an unordered receptacle for elements, and provides operations for insertion, removal, and testing of membership. Since these are already part of the Collections interface, the Set interface is identical to Collection, with the added specification that duplicates are not allowed. The SortedSet interface adds methods useful for processing a set ordered by comparator (see Figure 14). The HashSet class implements Set, and TreeSet implements SortedSet. HashSets are more efficient, since they use hashing algorithms; but if order is important, TreeSets at least provide logarithmic-time algorithms, since they store objects in balanced tree structures. As you can see in Figure 15, any attempts to insert duplicates into a set are ignored. For TreeSets, headSet returns a view from the beginning up to but not including the specified element and tailSet comprises the rest of the sequence.

Maps

A map is a dictionary, or associative array. A map stores pairs, treating the first element of the pair as a key, and the second as its associated value. Maps provide fast search times for keys, and duplicate keys are not allowed. Like Sets, there are HashMaps and TreeMaps, the former being faster and unordered while instances of the latter are ordered. For some reason maps are in no way related to collections in Java 2; they have their own interface and implementation hierarchies (see Figures 16 and 17). A WeakHashMap stores keys as weak references, which means that if the only reference to the key object is the key in the map, then that object can be garbage-collected, whereupon the entry is removed from the map. How's that for easy caching?

The key methods of the Map interface are put and get (see Figure 18 for the complete specification). The put method stores a pair of objects in the map, and get retrieves the value for the given key. If you want to just see the keys, keySet returns them as a Set view, and the values method returns the values as a Collection. If you want to iterate through the pairs, you can call entrySet, which returns the pairs as a Set of instances of Map.EntrySet, which also appears in Figure 18. All the above is illustrated in the program in Figure 19. Note that calling put a second time with the same key replaces its associated value.

An Application

Let's finish by doing something useful. The program in Figure 20 implements a typical cross-reference lister, which prints each word from a text file with the line numbers where it appears. Applying Figure 20 to its own text produces the output:

a: 2, 28
Add: 45, 68
addLast: 69
already: 49
and: 45
args: 95, 98, 107, 114, 115
at: 28
boolean: 51
BufferedReader: 21, 102, 116
(etc.)

This program uses a Map that associates a token with a list of line numbers. Since I want the tokens to appear in alphabetical order, I use a TreeMap with the default comparator. I want to ignore case in comparisons, so I define a comparator, called NoCase, that calls on String.compareToIgnoreCase to do the job. Before inserting a new pair, I first check to see if it is already in the map. If not, I insert it with the statement:

map.put(token, new ArrayList());

Then I retrieve the list of lines with

ArrayList lines = (ArrayList) map.get(token);

I then check to see if the current line number has already been added to the list; if not, I append it. It would have been simpler to just use a TreeSet, but the constant re-balancing of the tree would degrade performance, and since the lines are entered in the correct order, a list suffices.

Conclusion

I hope you'll agree that Java has finally come of age as far as data structures and algorithms are concerned. The Arrays and Collections classes have useful algorithms for arrays and collections, respectively; and the families of Collection and Map implementations provide virtually everything you need in the way of data structures. As a C++ programmer, I must admit that I miss function objects a little, but only a little. I must also confess that I find the Java 2 collections easier to use than STL. Although my C++ version of the cross-reference program was 13 lines shorter than the Java version, it took me longer to get it right. It's hard to imagine a situation that the Java 2 collections wouldn't handle, but if you need something more specialized, you have a nice hierarchy to plug into and algorithms ready to use. Finally, news you can use! o

Chuck Allison is Consulting Editor and a columnist with CUJ. He is the owner of Fresh Sources, a company specializing in object-oriented software development, training, and mentoring. He has been a contributing member of J16, the C++ Standards Committee, since 1991, and is the author of C and C++ Code Capsules: A Guide for Practitioners, Prentice-Hall, 1998. You can email Chuck at chuck@freshsources.com.