C/C++ Contributing Editors


import java.*: Using Primitive Types and Wrappers

Chuck Allison

The primitive types of C/C++ are much the same in Java, except that they're even more primitive and more predictable.


A pure object-oriented programming language treats each datum as a complete object. This makes for a uniform way to handle data, but it also adds overhead in both time and space that can sometimes be a significant disadvantage. If all you want to do is process integers, for example, you'd rather not bother with the heap space and implicit pointer indirection that using objects would bring about. This explains why C++ is preferable to Smalltalk for time-critical applications. Everything in Smalltalk, even an integer, is an object. C++, on the other hand, has efficient built-in integer and floating-point types, so you only incur object overhead when you explicitly ask for it.

Java allows you to choose either data-handling paradigm. It has built-in numeric types similar to those in C++, but it also provides wrapper classes for integral and floating-point objects (see Table 1). You usually use the wrapper classes only when complete objects are required, such as in collections. In this article I'll talk about built-in (or primitive) types, wrappers, and operators.

The most significant feature of Java's primitive types is that they are truly portable. Because C was conceived primarily as a systems programming language, its data types are tailored to each host system and therefore vary in size from platform to platform. Building portable programs in C/C++ can consequently require conditional compilation to determine the correct numeric type to use. In Java, on the other hand, an int is a 32-bit two's-complement integer wherever you go. On systems where the size of a machine word is not 32 bits, there could be a small performance penalty, but for applications that can spare a nanosecond here and there (i.e., most applications) no one will ever notice. This uniform data size for primitive types eliminates the need to recompile code for different platforms, thus allowing precompiled byte codes to run unchanged on any platform that has a Java Virtual Machine. That's portability.

Size, sizeof, and Signs

The C and C++ standards require a long to be at least 32 bits, and that's exactly what most compilers give you, but Java mandates that you get 64 bits, resulting in a range over 4 billion times larger. C/C++ compilers are also at liberty to make float and double the same size, but again, Java guarantees distinct sizes of 32 and 64 bits respectively. Because of the fixed sizes of primitive types, and because of the way objects are created in Java, there is no need for a sizeof operator.
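
The fixed widths can be checked directly. The following is only a sketch: the class name FixedSizes and the helper rangeRatio are made up for illustration, and the SIZE constants it relies on were added to the wrapper classes in releases later than JDK 1.1.

```java
// Hypothetical demo class: prints the widths Java guarantees on every platform.
class FixedSizes {
    // Number of long values divided by number of int values: 2^64 / 2^32 = 2^32.
    static long rangeRatio() {
        return 1L << (Long.SIZE - Integer.SIZE);
    }

    public static void main(String[] args) {
        System.out.println("int:    " + Integer.SIZE + " bits, max " + Integer.MAX_VALUE);
        System.out.println("long:   " + Long.SIZE + " bits, max " + Long.MAX_VALUE);
        System.out.println("float:  " + Float.SIZE + " bits");
        System.out.println("double: " + Double.SIZE + " bits");
        System.out.println("long/int range ratio: " + rangeRatio()); // 4294967296
    }
}
```

The ratio of 4,294,967,296 is the "over 4 billion times larger" figure quoted above.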

There is no unsigned qualifier in Java. A boolean holds only two values, true and false, so signedness doesn't apply. The char type represents a Unicode character, which is the industry standard 16-bit encoding covering the range [0, 65535], so it is in effect unsigned. (For more on Unicode, see my article "The Standard C Library, Part 3," CUJ, February 1995, p. 94.) All the other numeric types are signed. This makes it a little tedious to enforce non-negative parameter values, but it also avoids the signed/unsigned mismatch problems that occur all too frequently in C++.

A key use of unsigned in C++ is in zero-filling an integer when right-shifting. For example, the result of the expression

x >> 3

may differ among C/C++ compilers when x is signed and negative. Most implementations will propagate the sign bit into the upper three slots of the result. To guarantee zero-fill instead of sign extension, you must declare x as unsigned. Once again, Java is more precise. With the normal right-shift operator (>>) you always get sign extension. If you want zero-fill semantics you use the >>> operator, as in

x >>> 3
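
To make the difference concrete, here is a small sketch that applies both operators to the same negative value. The class and method names (ShiftDemo, signExtend, zeroFill) are invented for this illustration.

```java
// Hypothetical demo class contrasting the two right-shift operators.
class ShiftDemo {
    // >> copies the sign bit into the vacated upper bits.
    static int signExtend(int x, int n) { return x >> n; }

    // >>> always fills the vacated upper bits with zeroes.
    static int zeroFill(int x, int n) { return x >>> n; }

    public static void main(String[] args) {
        int x = -16;                                   // bit pattern 0xFFFFFFF0
        System.out.println(signExtend(x, 3));          // -2
        System.out.println(zeroFill(x, 3));            // 536870910
        System.out.println(Integer.toHexString(zeroFill(x, 3))); // 1ffffffe
    }
}
```

For non-negative values the two operators give the same result, so the choice matters only when the sign bit is set.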

Operators

Speaking of operators, Java follows C very closely by providing identical bitwise operators, the ternary conditional operator, and the full array of binary operators, including the handy assignment operators C programmers are so fond of (see Table 2). As far as precedence goes, unary operators are high priority, assignment operators are low, and everything else in between is pretty much as you would expect. The bitwise operators have the same priority as in C, so you have the same surprises you've always had. When in doubt, use parentheses. Operators in Java also associate as they do in C: unary and assignment operators group right-to-left, everything else groups left-to-right. The logical and conditional operators also obey the short-circuit semantics that C programmers are used to. For example, if i is not less than N in the following expression, then a[i] is never evaluated:

if (i < N && a[i] != 0)
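
A minimal sketch of this guard in action follows. The names ShortCircuit and nonZeroAt are hypothetical, and the array length stands in for N.

```java
// Hypothetical demo class: the range test protects the array access.
class ShortCircuit {
    // True only when i is a valid index and the element there is nonzero.
    // When a range test fails, && stops and a[i] is never evaluated.
    static boolean nonZeroAt(int[] a, int i) {
        return i >= 0 && i < a.length && a[i] != 0;
    }

    public static void main(String[] args) {
        int[] a = {5, 0, 7};
        System.out.println(nonZeroAt(a, 0)); // true
        System.out.println(nonZeroAt(a, 1)); // false
        System.out.println(nonZeroAt(a, 3)); // false, and no exception is thrown
    }
}
```

Without short-circuit evaluation the last call would raise an ArrayIndexOutOfBoundsException.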

The most interesting feature of Java operators for C/C++ programmers is what's missing. You already know there is no sizeof operator. Since Java does not use explicit pointers you won't find the pointer operations: unary & and *, and the -> operator. There is also no general comma operator, although you can place comma-separated sequences of expressions in the initialization and iteration parts of a for loop as you can in C. The only other surprises are the additions: the >>> operator mentioned above, and the instanceof operator, discussed in the Wrappers section below.

The equality operator (==) requires some special care in Java. When comparing primitive types, it is similar to C with one improvement: Java's type system catches the following common error:

if (x = y) // Oops! Meant to type ==

Java expects the condition of an if statement to be of type boolean, but the type of an assignment expression is the type of its left operand, so unless x is a boolean, the compiler will flag the typo above as an error.

But when it comes to objects, == is rarely what you want. For example, if x and y are instances of class Foo, then the expression x == y compares the objects' handles, not the values of the objects' fields. I'll reveal the secrets of this mystery in a future column. (If you know Lisp, the situation is analogous to eq vs. equal in that language). For now, just remember not to use == to compare objects, but to use the equals method instead.
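
The following sketch shows the difference using String, a class whose equals method compares contents. The class and helper names (EqualityDemo, sameObject, sameValue) are invented for the example.

```java
// Hypothetical demo class: identity comparison versus value comparison.
class EqualityDemo {
    // == on object references asks "is this the very same object?"
    static boolean sameObject(Object a, Object b) { return a == b; }

    // equals asks "do these objects represent the same value?"
    static boolean sameValue(Object a, Object b) { return a.equals(b); }

    public static void main(String[] args) {
        String x = new String("java");
        String y = new String("java");        // distinct object, same characters
        System.out.println(sameObject(x, y)); // false
        System.out.println(sameValue(x, y));  // true
        System.out.println(sameObject(x, x)); // true
    }
}
```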

A final word on operators that C programmers will find interesting. It concerns how Java evaluates operands. Java always fully evaluates the operands of a binary operator left-to-right. Guaranteed. For example, in the expression f() + g() you can count on any side effects of f being complete before the call to g. C, on the other hand, makes no guarantees whatsoever on the order of evaluation of operands, which is why the C standards committee had to define sequence points to give programmers some control over side effects. You don't need to worry about sequence points in Java.
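
A small sketch can make the guarantee visible by recording the order of the calls. The class EvalOrder and its trace buffer are assumptions made for illustration; f and g stand for the functions in the expression above.

```java
// Hypothetical demo class: records which operand is evaluated first.
class EvalOrder {
    private static final StringBuilder trace = new StringBuilder();

    static int f() { trace.append('f'); return 1; }
    static int g() { trace.append('g'); return 2; }

    // Evaluates f() + g() and reports the order in which the calls ran.
    static String order() {
        trace.setLength(0);          // start with an empty trace
        int sum = f() + g();
        return trace + " sum=" + sum;
    }

    public static void main(String[] args) {
        System.out.println(order()); // fg sum=3
    }
}
```

The equivalent C program could legitimately record either "fg" or "gf".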

Literals and Constants

Objects can contain objects which contain objects seemingly ad infinitum, but even the most complex objects ultimately reduce to primitive types, and primitive types ultimately get initialized with literal values, whether in source code or from external input. The program in Figure 1 illustrates the various types of literals.

As you can see, literals are similar to those in C++. The boolean literals are true and false. Unadorned numeric literals are of type double if they have a decimal point, and int if they don't. As in C, a leading 0 denotes an octal int, and a 0x prefix introduces a hexadecimal number. Any numeric expression can initialize a double. If you want to be explicit, you can use an L suffix for long, an f suffix for float, and a d for double. In all cases the letter you use to identify the type of a numeric literal can be either upper or lower case, but a lower case l is discouraged since you can too easily mistake it for the digit 1. There are no suffixes for short and byte. You either assign an int literal in the correct range, or you cast to the appropriate type as needed (see Conversions and Casts below).

Character literals occur between single quotes, as in C, except that you can also specify a Unicode character escape sequence with a lower case u, as in '\u001c', much like in C++. Unicode escape sequences are always interpreted as hexadecimal. Java does not support 32-bit ISO 10646 characters like C++ does (e.g., '\U0000001c'). Java supports most of the character escape sequences that C does, such as '\n', '\t', etc., except for '\a' (audible bell) and '\v' (vertical tab).
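
As a compact illustration of the rules above, here is a sketch that declares one variable per kind of literal. It is not the program from Figure 1; the class name Literals and the helper allEqual are made up for this example.

```java
// Hypothetical demo class: one literal of each kind discussed above.
class Literals {
    // Decimal, octal, and hexadecimal spellings of the same int value.
    static boolean allEqual() {
        return 255 == 0377 && 0377 == 0xff;
    }

    public static void main(String[] args) {
        boolean flag = true;
        long big = 4294967296L;   // too large for int, so the L suffix is required
        float f = 1.5f;           // without the f this would be a double literal
        double d = 1.5;           // could also be written 1.5d
        char c = 'A';
        char u = '\u0041';        // Unicode escape for the same letter
        char tab = '\t';
        byte b = 127;             // int literal in byte range, no cast needed
        short s = 32767;          // likewise for short
        System.out.println(allEqual());          // true
        System.out.println(c == u);              // true
        System.out.println(flag + " " + big + " " + f + " " + d);
        System.out.println(b + " " + s + " " + (int) tab);
    }
}
```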

The Java equivalent of const, as far as variables are concerned, is the final keyword, which suggests that a variable cannot be changed (i.e., it has its final value). The following declares a constant int:

final int max = 32767;

Local final variables should always be initialized in their declaration. In future installments you'll see alternative initialization techniques for class data members.

Conversions and Casts

You can always assign a numeric value to a wider type, such as a float to a double or an int to a long. Assigning the other way usually requires a cast, as in

// A "narrowing" conversion
int i = 2;
byte b = (byte) i; // note C-style cast

Java's cast syntax is identical to C (i.e., the target type precedes the operand in parentheses). If a literal represents a value small enough to fit into the range of the target variable then a cast is not required. For example:

// 127 is an int literal; 128 would fail
byte b = 127;

If you substituted 128 for 127 above the compiler would complain, since 128 is outside the range of a byte. Narrowing conversions often result in a loss of information, including sign, since you lose bits off the top. For example, forcing the issue with a cast, as in byte b = (byte) 128, would initialize b to -128. Starting with the bit representation of 128 (0...010000000, i.e., 24 zeroes, a one, then seven zeroes), the upper 24 zeroes are dropped, and the remaining eight bits (10000000) are interpreted as a signed integer.
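
The effect of an explicit narrowing cast on out-of-range values can be seen in this sketch. The names Narrowing and toByte are hypothetical.

```java
// Hypothetical demo class: what (byte) does to values outside -128..127.
class Narrowing {
    // Keeps only the low-order eight bits and reinterprets them as signed.
    static byte toByte(int i) { return (byte) i; }

    public static void main(String[] args) {
        System.out.println(toByte(127));  // 127  (fits, value preserved)
        System.out.println(toByte(128));  // -128 (bit pattern 10000000)
        System.out.println(toByte(255));  // -1   (bit pattern 11111111)
        System.out.println(toByte(256));  // 0    (low eight bits are all zero)
        System.out.println((int) 3.99);   // 3    (floating-point to int truncates)
    }
}
```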

Like standard numeric conversions in C, widening conversions occur implicitly for primitive types when you use them in an expression or as parameters to a function. Binary numeric and comparison operations, for example, follow this simple logic:

if either operand is a double then
    convert the other to double if needed
else if either operand is float then
    convert the other to float if needed
else if either operand is long then
    convert the other to long if needed
else
    convert both to int as needed
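
One way to watch these rules at work is to let overload resolution report the type of an expression. This is only a sketch; the class Promotion and its overloaded typeOf methods are invented for the purpose.

```java
// Hypothetical demo class: overloads reveal the static type of an expression.
class Promotion {
    static String typeOf(int x)    { return "int"; }
    static String typeOf(long x)   { return "long"; }
    static String typeOf(float x)  { return "float"; }
    static String typeOf(double x) { return "double"; }

    public static void main(String[] args) {
        byte a = 100, b = 100;
        System.out.println(typeOf(a + b));         // int: both bytes are promoted
        System.out.println(typeOf(a + 1L));        // long
        System.out.println(typeOf(1L + 2.0f));     // float
        System.out.println(typeOf(2.0f + 3.0));    // double
        System.out.println(a + b);                 // 200, no overflow in int
        System.out.println(7 / 2 + " " + 7 / 2.0); // 3 3.5
    }
}
```

Because a + b has type int, assigning it back to a byte requires a cast even though both operands are bytes.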

Passing a byte as an argument to a function expecting an int likewise causes an implicit conversion of the byte to an int. You don't, however, get implicit conversions from a primitive type to a class object like you do with single argument constructors in C++. For example, if you have a class Foo with a constructor that takes a single int, and a function f that takes a single Foo argument, you can't call f(1), nor even f((Foo)1). Why? Because objects must always be created via the new operator, so the correct form is f(new Foo(1)). The key motivation for implicit conversions via single-argument constructors in C++ was to complement operator overloading, which doesn't exist in Java. One less thing to worry about.
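
The Foo example just described might look like the following sketch. Foo and f are the names used in the text; the field value, its accessor, and the class ConversionDemo are assumptions added to make the example complete.

```java
// The class Foo from the text, with a hypothetical field and accessor.
class Foo {
    private final int value;
    Foo(int value) { this.value = value; }
    int value() { return value; }
}

// Hypothetical demo class holding the function f.
class ConversionDemo {
    static int f(Foo foo) { return foo.value(); }

    public static void main(String[] args) {
        // f(1);        // would not compile: an int is not a Foo
        // f((Foo) 1);  // would not compile either: no cast from int to Foo
        System.out.println(f(new Foo(1))); // 1
    }
}
```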

Wrappers

The wrapper classes listed in Table 1 provide methods and constants relevant to their corresponding primitive type. You can, for example, inquire as to the range of values, as the program in Figure 2 illustrates.

Many classes in the Java library work with generic objects, or, in other words, with instances of the Object class. A class that does not explicitly extend another class implicitly extends Object, so all classes inherit from Object one way or the other. A collection class, such as Vector, can act as a generic container in that it holds objects of type Object, and can therefore hold any Java object. But primitive types are not objects, so a Vector cannot hold integers or any other numeric type directly. The work-around is to populate the Vector with objects of type Integer, the wrapper for int. The following program uses this technique to store ten integers in a Vector.

import java.util.*; // Import the Vector class

public class UseVector {
    public static void main(String[] args) {
        Vector v = new Vector();
        for (int i = 0; i < 10; ++i)
            v.addElement(new Integer(i));
        for (int i = 0; i < v.size(); ++i)
            System.out.print(v.elementAt(i) + " ");
    }
}

/* Output:
0 1 2 3 4 5 6 7 8 9
*/

The wrapper classes have a number of useful methods. Each integer-related type has an atoi equivalent for converting a string representation of a number to a number. For example, Integer has parseInt, Long has parseLong, and so on. Each wrapper type also has functions that return its value in all numeric formats, e.g., byteValue, longValue, doubleValue, etc. The following program converts strings to int and float.

public class ParseNums {
    public static void main(String[] args) {
        int i = Integer.parseInt("123");
        int j = Integer.parseInt("4f", 16);
        float x = Float.valueOf("123.45").floatValue();
        System.out.println("i = " + i + "," + "j = " + j + ","
            + "x = " + x);
        System.out.println("i = " + Integer.toBinaryString(i));
    }
}

/* Output:
i = 123,j = 79,x = 123.45
i = 1111011
*/

JDK 1.1 has no parse function for the floating-point wrapper types (Float.parseFloat and Double.parseDouble arrived in JDK 1.2). All wrappers except Character have a valueOf method that parses a string but returns a wrapper object, not a primitive, so I used that for the Float example above.

The six numeric wrapper classes all inherit from the Number abstract class, which defines the methods byteValue, intValue, etc. This allows you to define classes that can process any numeric type, simply by writing to the interface of the Number class.
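
As a sketch of writing to the Number interface, the following method sums an array whose elements are of different wrapper types. The names NumberSum and sum are hypothetical, and the valueOf factory methods that take primitives come from releases later than JDK 1.1.

```java
// Hypothetical demo class: one method handles every numeric wrapper type.
class NumberSum {
    // Works for any mix of wrappers because each one implements doubleValue.
    static double sum(Number[] values) {
        double total = 0.0;
        for (int i = 0; i < values.length; ++i)
            total += values[i].doubleValue();
        return total;
    }

    public static void main(String[] args) {
        Number[] mixed = {
            Integer.valueOf(1), Long.valueOf(2L),
            Short.valueOf((short) 3), Double.valueOf(0.5)
        };
        System.out.println(sum(mixed)); // 6.5
    }
}
```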

If you ever want to verify that an object is an instance of a particular type you can use the instanceof operator. For example, if you have a function that takes a single Number parameter, f(Number n), say, you can verify that the argument is of a type derived from Number, as follows:

void f(Number n) {
    if (n instanceof Number) {
        // go ahead (n is a Byte, Short, Integer, etc.)
    } else {
        // error (n is null)
    }
}

The instanceof operator returns true if its left operand is an instance of its right operand or of any class that inherits from its right operand.
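
The inheritance rule is easier to see when the parameter is declared as Object, so that the test can actually fail. This sketch uses invented names (InstanceofDemo, describe).

```java
// Hypothetical demo class: instanceof against a class and its superclass.
class InstanceofDemo {
    static String describe(Object o) {
        if (o instanceof Integer) return "Integer";
        if (o instanceof Number)  return "some other Number";
        return "not a Number";    // also reached when o is null
    }

    public static void main(String[] args) {
        System.out.println(describe(Integer.valueOf(1)));  // Integer
        System.out.println(describe(Double.valueOf(1.5))); // some other Number
        System.out.println(describe("text"));              // not a Number
        System.out.println(describe(null));                // not a Number
    }
}
```

An Integer would also satisfy the second test, which is why the more specific class is checked first.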

The Character wrapper contains a number of methods for classifying characters, similar to the functionality found in the C header <ctype.h>, such as isDigit, isISOControl, isLetter, isLetterOrDigit, isSpaceChar, isUpperCase, toUpperCase, etc. The methods isJavaIdentifierStart and isJavaIdentifierPart tell you whether a character is valid as the first character or as a subsequent character of an identifier, respectively. Java identifiers can begin with a dollar sign, an underscore, or a valid letter from any Unicode script. The following are valid Java identifiers:

A$very_long$identifier
preço
pme
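
The two identifier methods combine naturally into a checker, sketched below. The names CharClass and isIdentifier are hypothetical, and the check deliberately ignores reserved words, which pass the character tests but are still not legal identifiers.

```java
// Hypothetical demo class built on the Character classification methods.
class CharClass {
    // True when every character is legal in its position in an identifier.
    // Note: keywords such as "int" also pass this character-level test.
    static boolean isIdentifier(String s) {
        if (s.length() == 0 || !Character.isJavaIdentifierStart(s.charAt(0)))
            return false;
        for (int i = 1; i < s.length(); ++i)
            if (!Character.isJavaIdentifierPart(s.charAt(i)))
                return false;
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isIdentifier("A$very_long$identifier")); // true
        System.out.println(isIdentifier("pre\u00e7o"));             // true
        System.out.println(isIdentifier("2fast"));                  // false
        System.out.println(Character.isDigit('7'));                 // true
        System.out.println(Character.toUpperCase('q'));             // Q
    }
}
```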

At the beginning of this article I mentioned that the wrappers incur a performance hit compared to using primitive types. To prove that point, the program in Listing 1 creates an identity vector in an array of 250,000 ints, computes its sum, and then displays the elapsed time using a Date object. Listing 2 shows a program that does the same thing using Integer objects. When I run these programs on my 400 MHz Pentium II, the primitive version takes 50 milliseconds while the object version takes 1,710 ms, which is slower by a factor of 34. (I'm using JDK 1.1.7A.) So use primitive types whenever you can!
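
For readers who want to repeat the experiment, here is a rough sketch of such a comparison. It is not Listing 1 or Listing 2: the class name SumTiming is invented, "identity vector" is assumed to mean a[i] = i, the sum is kept in a long to avoid overflow, and System.nanoTime replaces the Date object. Timings on a current JVM will differ from the figures quoted above, so no expected times are shown.

```java
// Hypothetical benchmark sketch: primitive ints versus Integer objects.
class SumTiming {
    static final int N = 250000;

    static long sumPrimitive() {
        int[] a = new int[N];
        for (int i = 0; i < N; ++i) a[i] = i;          // assumed "identity vector"
        long sum = 0;
        for (int i = 0; i < N; ++i) sum += a[i];
        return sum;
    }

    static long sumWrapped() {
        Integer[] a = new Integer[N];
        for (int i = 0; i < N; ++i) a[i] = Integer.valueOf(i);
        long sum = 0;
        for (int i = 0; i < N; ++i) sum += a[i].intValue();
        return sum;
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        long p = sumPrimitive();
        long t1 = System.nanoTime();
        long w = sumWrapped();
        long t2 = System.nanoTime();
        System.out.println("primitive: sum = " + p + ", " + (t1 - t0) / 1000 + " us");
        System.out.println("wrapped:   sum = " + w + ", " + (t2 - t1) / 1000 + " us");
    }
}
```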

Arbitrary Precision Arithmetic

Java provides two classes for arbitrary-precision arithmetic: BigInteger and BigDecimal, both of which inherit from Number. There are methods in each for the usual numeric and logical operations. Listing 3 shows an example of BigInteger.

The number of bits is derived from a two's-complement representation of the number. BigInteger also has methods for shifting and primality testing.
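
The operations mentioned here can be tried with a short sketch like the one below. It is not Listing 3; the class BigDemo and its helper methods are made up for illustration.

```java
import java.math.BigInteger;

// Hypothetical demo class: multiplication, bit length, shifting, primality.
class BigDemo {
    // n! computed exactly, however many digits that takes.
    static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; ++i)
            result = result.multiply(BigInteger.valueOf(i));
        return result;
    }

    // 2 raised to the given power, produced by shifting 1 to the left.
    static BigInteger powerOfTwo(int bits) {
        return BigInteger.ONE.shiftLeft(bits);
    }

    public static void main(String[] args) {
        BigInteger f = factorial(25);
        System.out.println(f);                       // 15511210043330985984000000
        System.out.println("bits needed: " + f.bitLength());
        System.out.println(powerOfTwo(64));          // 18446744073709551616
        // isProbablePrime is probabilistic for composites, certain for primes.
        System.out.println(BigInteger.valueOf(97).isProbablePrime(20)); // true
    }
}
```

Both results are far outside the range of a long, which tops out at 9223372036854775807.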

Wrap-up

Java's primitive types resemble C's built-in types very closely in name, and somewhat closely in functionality. For me, the key feature is that primitives are portable across all platforms that implement a Java Virtual Machine. I also like not worrying about signed vs. unsigned issues that sometimes bite you in C. Java's operators are almost 100% identical to those of C, except there is no need for pointer operations, and Java has the >>> and instanceof operators. Wrappers provide the features that <limits.h> does in C, and then some. In typical Java style, the functionality you need is where you expect it to be: in appropriately-named classes. You may sometimes end up doing a little more typing than you do in C, what with class name prefixes and all, but you probably will do less hunting for the right identifiers.

Chuck Allison is Consulting Editor and a columnist with CUJ. He is the owner of Fresh Sources, a company specializing in object-oriented software development, training, and mentoring. He has been a contributing member of J16, the C++ Standards Committee, since 1991, and is the author of C and C++ Code Capsules: A Guide for Practitioners, Prentice-Hall, 1998. You can email Chuck at chuck@freshsources.com.