Project Valhalla: A look inside Java’s epic refactor

In Java, everything is an object—except primitives like int. Turns out that small caveat has had big implications for the language, which have compounded over the years. This seemingly minor design decision causes problems in key areas like collections and generics. It also limits certain performance optimizations. Project Valhalla, the Java language refactor, aims to correct these issues. Valhalla project lead Brian Goetz has said that Valhalla will “heal the rift between primitives and objects.”

It’s fair to say Project Valhalla is an epic refactor, seeking to address technical debt buried in the platform since Java’s inception. This thoroughgoing evolution proves that Java is not only a classic but remains at the forefront of programming language design. Let’s take a look at the key technical components of Project Valhalla and why they are so critical to the future of Java.

Performance issues in Java

When Java was first introduced way back in the ’90s, it was decided that all user-created types would be classes. Only a handful of primitive types were put aside as special. These were not handled as pointer-based class structures but mapped directly to machine-level value types. The eight primitive types are int, byte, short, long, float, double, boolean, and char.

Mapping these variables directly to machine-level values was better for performance because numerical operations performed better when divested of the referential overhead of objects. Moreover, all data in a program ultimately resolves to these eight primitive types. Classes are just a kind of structural and organizational layer that offers more powerful ways of handling primitive types. The only other kind of structure is the array. Primitives, classes, and arrays make up the whole range of Java’s expressive power.

But primitives are a different category of animal from classes and arrays. As programmers, we have learned to deal with the differences intuitively. Primitives are passed by value, for example, while objects are handled through references. (Strictly speaking, Java passes those references by value too, but the caller and callee end up sharing one object.) The why of this goes quite deep. It comes down to the question of identity. We can say that primitive values are fungible: int x = 4 is the integer 4, no matter where it appears. We see this distinction in equals() vs ==, where the former tests objects for value equivalence and the latter tests for identity. If two references point to the same location in memory, they satisfy ==, meaning that they are the same object. Any two ints set to 4 will also satisfy ==, whereas int doesn’t support .equals() at all.
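The identity distinction can be seen in a few lines of present-day Java. The following is a minimal sketch; the Point class and the helper methods are hypothetical names made up for illustration. It also shows that a method receives a copy of the reference, so reassigning the parameter does not affect the caller.

```java
import java.util.Objects;

class IdentityDemo {
    // Hypothetical class for illustration only.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        // Value equivalence: two points are equal if their fields match.
        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    // == on references asks "is this the same object?"
    static boolean sameIdentity(Point a, Point b) {
        return a == b;
    }

    // equals() asks "do these objects hold the same value?"
    static boolean sameValue(Point a, Point b) {
        return a.equals(b);
    }

    // Reassigning the parameter changes only the local copy of the reference.
    static void reassign(Point p) {
        p = new Point(0, 0);
    }

    public static void main(String[] args) {
        int a = 4;
        int b = 4;
        System.out.println(a == b);             // true: ints have no identity

        Point p = new Point(1, 2);
        Point q = new Point(1, 2);
        System.out.println(sameIdentity(p, q)); // false: two distinct objects
        System.out.println(sameValue(p, q));    // true: same field values

        reassign(p);
        System.out.println(p.x);                // 1: the caller's object is untouched
    }
}
```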

The Java virtual machine (JVM) can take advantage of the way primitives are handled to optimize how it stores, retrieves, and operates on them. In particular, if the platform determines that a variable is not altered (that is, it’s a constant or immutable) then it is available to be optimized.

Objects, by contrast, are resistant to this kind of optimization because they have an identity. As an instance of a class, an object holds data that can be both primitives and other classes. The object itself is addressed with a pointer handle. This creates a network of references: the object graph. Whenever some value is changed—or even if it might be changed—the JVM is forced to maintain a definitive record of the object for referencing. The need to reference objects is a barrier to some performance optimizations.

The performance difficulties don’t stop there. The nature of objects as buckets of references means they exist in memory in a very fluffy way. Fluffy is my technical term to describe the fact that the JVM cannot compress objects to minimize their memory footprint. When one object has a reference to another object as part of its makeup, the JVM is forced to maintain that pointer relationship. (In some cases, a clever optimization could help determine that a nested reference is the only handle on a particular entity.)

In his State of Valhalla blog post, Goetz uses an array of points to illustrate the non-dense nature of references. We can use a class. For example, let’s say we have a Landmark class with a name and a geolocation field. These imply a memory structure like the one shown here:

Diagram of object memory. — Figure 1. A ‘fluffy’ memory footprint of Java objects.

What we’d like to achieve is the ability to hold an object, when appropriate, as shown in Figure 2.

Diagram of a Java object held in memory. — Figure 2. A dense object in memory.
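To make the contrast concrete, here is a rough sketch of the Landmark example in present-day Java. The class and field names (Landmark, GeoLocation, latitude, longitude) and the flatten helper are assumptions made for illustration. The flatten method shows the kind of hand-rolled dense layout developers sometimes resort to today, which is the layout we would like the JVM to be able to choose on its own.

```java
class LandmarkLayout {
    // Hypothetical type: a pair of coordinates stored as its own heap object.
    static final class GeoLocation {
        final double latitude;
        final double longitude;

        GeoLocation(double latitude, double longitude) {
            this.latitude = latitude;
            this.longitude = longitude;
        }
    }

    // The "fluffy" shape: a Landmark holds two references (name, location),
    // each pointing to a separate object elsewhere on the heap.
    static final class Landmark {
        final String name;
        final GeoLocation location;

        Landmark(String name, GeoLocation location) {
            this.name = name;
            this.location = location;
        }
    }

    // Hand-flattened storage: landmark i keeps its latitude at index 2*i
    // and its longitude at index 2*i + 1 of one contiguous double[].
    static double[] flatten(Landmark[] landmarks) {
        double[] coords = new double[landmarks.length * 2];
        for (int i = 0; i < landmarks.length; i++) {
            coords[2 * i] = landmarks[i].location.latitude;
            coords[2 * i + 1] = landmarks[i].location.longitude;
        }
        return coords;
    }

    public static void main(String[] args) {
        Landmark[] landmarks = {
            new Landmark("Eiffel Tower", new GeoLocation(48.8584, 2.2945)),
            new Landmark("Statue of Liberty", new GeoLocation(40.6892, -74.0445))
        };
        double[] dense = flatten(landmarks);
        System.out.println(dense.length); // 4
    }
}
```

The dense array is cache-friendly, but it gives up the class syntax: there is no longer a GeoLocation to call methods on.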

That’s an overview of the performance challenges that were baked into the Java platform by early design decisions. Now let’s consider how these decisions impact performance in three key areas.

Problem 1: Method calling and pass-by-value

The default structure of objects in memory is inefficient for both memory and caching. In addition, there is an opportunity to make gains in method calling conventions. Being able to pass instances of a class to methods by value (when appropriate), rather than as pointers to heap objects, would yield serious performance benefits.

Problem 2: Boxes and autoboxing

Beyond inefficiencies, the distinction between primitive and class creates language-level difficulties. Creating primitive “boxes” like Integer and Long (along with autoboxing) is an attempt to alleviate the problems caused by this distinction. It doesn’t really fix them, however, and it introduces a degree of overhead for both the developer and the machine. As a developer, you have to know about and remember the difference between int and Integer (and ArrayList<Integer>, int[], Integer[], and the lack of an ArrayList<int>). The machine, meanwhile, has to convert between the two.
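A short sketch of what that overhead looks like in practice, using only standard library classes. The class and method names (BoxingDemo, sumBoxed, and so on) are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

class BoxingDemo {
    // Every element is an Integer object that must be unboxed before adding.
    static long sumBoxed(List<Integer> values) {
        long total = 0;
        for (Integer v : values) {
            total += v; // implicit unboxing via Integer.intValue()
        }
        return total;
    }

    // The same computation over a dense array of primitives; no boxes involved.
    static long sumPrimitive(int[] values) {
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // A box can be null, which a primitive never can. Unboxing null fails at runtime.
    static boolean unboxingNullThrows() {
        Integer missing = null;
        try {
            int n = missing; // compiles, but throws NullPointerException
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(sumBoxed(Arrays.asList(1, 2, 3)));  // 6
        System.out.println(sumPrimitive(new int[] {1, 2, 3})); // 6
        System.out.println(unboxingNullThrows());              // true
    }
}
```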

In a way, boxing gives us the worst of both worlds. Obscuring the underlying nuances of how these entities work makes it harder to access both the power of class syntax and the performance of primitives.

Problem 3: Generics and streams

All these considerations come to a head in generics. Generics are intended to make generalizing across functionality easier and more explicit, but the persnickety presence of this set of non-object variables (the primitives) just causes it to break down. <int> doesn’t exist—it can’t exist because int is not a class at all; it doesn’t descend from Object.

This problem then manifests in libraries like collections and streams, where the ideal of generic library functions is forced to deal with the reality of int versus Integer, long versus Long, and so on, by offering IntStream and other non-generic variations.
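The split is easy to see in the streams API as it exists today. In this sketch (the class and method names are illustrative), the same sum of squares is written once against the primitive specialization IntStream and once against the generic Stream<Integer>, with explicit conversions needed to cross between the two.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class StreamDemo {
    // Primitive specialization: no boxing, but a separate non-generic API.
    static int sumSquaresPrimitive(int n) {
        return IntStream.rangeClosed(1, n)
                .map(i -> i * i)
                .sum();
    }

    // Generic stream: works with any T, but ints must travel as Integer boxes,
    // and sum() is only reachable after converting back with mapToInt.
    static int sumSquaresBoxed(List<Integer> values) {
        return values.stream()
                .map(v -> v * v)
                .mapToInt(Integer::intValue)
                .sum();
    }

    // Crossing from the primitive world to the generic one requires boxed().
    static List<Integer> boxedRange(int n) {
        return IntStream.rangeClosed(1, n)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sumSquaresPrimitive(3));        // 14
        System.out.println(sumSquaresBoxed(boxedRange(3))); // 14
    }
}
```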

Valhalla’s solution: Value classes and primitive types

Project Valhalla attacks these problems at the root. The first and most fundamental concept is the value class. The idea here is that you can define a class that partakes of everything that is great about classes, like having methods and being able to fulfill generics, but without the identity. In practice, that means the classes are immutable and cannot be layout-polymorphic (wherein the superclass can operate upon the subclasses via abstract properties).

Value classes give us a clear and definitive way to obtain the performance characteristics we are after while still accessing the benefits of class syntax and behavior. That means library builders can also use them and thereby improve their API design.

A step further is the primitive class, which is like a more extreme value class. In essence, the primitive class is a thin wrapper around a true primitive variable, but with class methods. This is something like custom, streamlined primitive boxes. The improvement is in making the boxing system more explicit and extensible. Additionally, the primitive value wrapped by a primitive class retains the performance characteristics of the primitive (no under-the-hood boxing and unboxing). Therefore, the primitive class can be used wherever classes can be—in an Object[] array, for instance. Primitive types will not be nullable (they cannot be set to null).

In general, we could say that Project Valhalla brings primitives and user-defined types closer together. This gives developers more options in the spectrum between pure primitives and objects and makes the tradeoffs explicit. It also makes these operations overall more consistent. In particular, the new primitive system will smooth out how primitives and objects work, how they are boxed, and how new ones can be added.

How Java’s syntax will change

Valhalla has seen a few different syntax proposals, but now the project is taking a clear form and direction. Two new keywords modify the class keyword: value and primitive. A class declared with the value class syntax will surrender its identity, and in the process gain performance improvements. Besides mutability and polymorphism restrictions, most of the things you’d expect from a class still apply and such classes can fully participate in generic code (such as Object[] or ArrayList<T>). Variables of a value class type default to null.

The primitive class syntax creates a class that is one step further from traditional objects and toward traditional primitives. Variables of these types default to the zero value of the underlying fields (0 for int, 0.0 for double, and so on) and cannot be null. Primitive classes gain the most in optimization and sacrifice the most in terms of features. They are also not guaranteed to be tear-safe: as with long and double today, a value wider than 32 bits may be read or written in separate pieces when threads race on it. The primitive class will ultimately be used to model all the primitives in the platform, meaning user- and library-defined primitive additions will participate in the same system as built-ins.
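The value and primitive modifiers are not available in released JDKs, so the sketch below only approximates the idea with an ordinary final, immutable class that compiles today. The Money class and helper names are hypothetical, and the declaration shown in the comment is an assumption based on the syntax described above, which may change before release. The helpers also show the two default behaviors that already exist in the language: reference types default to null and primitives default to zero.

```java
class ValueStyleDemo {
    // Under the proposed syntax this might be declared as:
    //     value class Money { ... }
    // That line is an assumption about a feature still in design. The plain
    // final class below still has identity, but it is immutable and defines
    // equality purely by its field, which is the behavior a value class promises.
    static final class Money {
        final long cents;

        Money(long cents) {
            this.cents = cents;
        }

        // No setters: "changing" a value means creating a new one.
        Money plus(Money other) {
            return new Money(cents + other.cents);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Money && ((Money) o).cents == cents;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(cents);
        }
    }

    // Reference-typed array elements start out as null.
    static Money defaultReference() {
        Money[] slots = new Money[1];
        return slots[0];
    }

    // Primitive array elements start out as zero.
    static long defaultPrimitive() {
        long[] slots = new long[1];
        return slots[0];
    }

    public static void main(String[] args) {
        Money total = new Money(150).plus(new Money(50));
        System.out.println(total.equals(new Money(200))); // true
        System.out.println(defaultReference());           // null
        System.out.println(defaultPrimitive());           // 0
    }
}
```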

IdentityObject and ValueObject

IdentityObject and ValueObject are two new interfaces being introduced in Project Valhalla. These will allow for the runtime determination of what kind of class you are dealing with.

Perhaps the most radical syntax change for experienced Java developers is the addition of .ref. Every type V will now have a companion V.ref type. It operates like the box on primitives, so int.ref is analogous to wrapping an int with an Integer. For normal classes, .ref resolves to the class’s ordinary reference type. The overall effect is to make for a consistent way to ask for a reference on a variable regardless of its kind. This also extends array covariance, which arrays of reference types already have, to arrays of primitives. Therefore, int[] becomes a subtype of Object[] and can be used wherever an Object[] is called for.
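For comparison, here is how array covariance behaves in Java as it stands today; this is a sketch with illustrative names, not Valhalla code. Arrays of reference types can already be treated as Object[], with a runtime check on stores, while an int[] cannot and has to be copied into boxes by hand.

```java
class ArrayCovarianceDemo {
    // Integer[] can be viewed as Object[] today, but the JVM checks every store.
    static boolean storeFailsAtRuntime() {
        Object[] objects = new Integer[] {1, 2, 3};
        try {
            objects[0] = "not an integer"; // compiles, fails when executed
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    // int[] is not an Object[] today, so the elements must be boxed one by one.
    // The following line would be rejected by the current compiler:
    //     Object[] bad = new int[] {1, 2, 3};
    static Object[] boxAll(int[] values) {
        Object[] boxed = new Object[values.length];
        for (int i = 0; i < values.length; i++) {
            boxed[i] = values[i]; // autoboxes each int to an Integer
        }
        return boxed;
    }

    public static void main(String[] args) {
        System.out.println(storeFailsAtRuntime());                      // true
        System.out.println(boxAll(new int[] {1, 2, 3})[0] instanceof Integer); // true
    }
}
```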

Conclusion

Value classes and primitive classes will have a big impact on Java and its ecosystem. The current roadmap plans to introduce value classes first, followed by primitive classes. Next will be the migration of the existing primitive boxing classes (like Integer) to use the new primitive class. With those features in hand, the next feature, called universal generics, will allow primitive classes to be used directly with generics, smoothing out many of the complexities of reuse in APIs. Finally, specialized generics (allowing for all the expressive capability of T extends Foo) will be integrated with primitive classes.

Project Valhalla and the projects that make it up are still in design stages, but we are getting closer, and activity around the project indicates it won’t be long before value classes drop in a JDK preview.

Beyond all the interesting technical work is the sense of Java’s ongoing vitality. That there is both will and ability to undergo the process of identifying where the platform can be evolved in fundamental ways is evidence of real commitment to keeping Java relevant. Project Loom is another undertaking that lends weight to an optimistic view of Java’s future.
