
Exploring Project Valhalla: Unveiling the Massive Refactoring of Java


Everything in Java is an object, except primitives like int. Turns out that small caveat has big implications for the language, which have compounded over the years. This seemingly minor design decision causes problems in key areas like collections and generics. It also limits certain performance optimizations. Project Valhalla, the Java language refactor, aims to correct these issues. Valhalla project lead Brian Goetz has said that Valhalla will “heal the rift between primitives and objects.”

It’s fair to say Project Valhalla is an epic refactor, seeking to address technical debt buried deep in the platform since Java’s inception. This thoroughgoing evolution proves that Java is not only a classic but remains at the forefront of programming language design. Let’s take a look at the key technical components of Project Valhalla and why they are so critical to the future of Java.

Java primitives and performance

When Java was first introduced in the 1990s, it was decided that all user-created types would be classes. Only a handful of primitive types were put aside as special. These were not handled as pointer-based class structures but were stored directly as fixed-size machine values. The eight primitive types are int, byte, short, long, float, double, boolean, and char.

Storing these values directly was better for performance because numerical operations run faster when divested of the referential overhead of objects. Moreover, all data in a program ultimately resolves to these eight primitive types. Classes are just a kind of structural and organizational layer that offers more powerful ways of grouping and handling primitive types. The only other kind of structure is the array. Primitives, classes, and arrays comprise the whole range of Java’s expressive power. And it is powerful.

But primitives are a different type of animal than classes and arrays. As programmers, we have learned to deal with the differences intuitively. Primitives are passed by value, so the callee gets its own copy. Objects are handled through references, and it is the reference that gets copied, so caller and callee end up sharing one object. The why of this goes quite deep. It comes down to the question of identity. We can say that primitive values are fungible: int x = 4 is the integer 4, no matter where it appears. Any instance of integer 4 is just as good as another. We see this distinction in equals() versus ==, where the former tests for the value equivalence of objects and the latter tests for identity. If two references point to the same space in memory, they satisfy ==, meaning that they are the same object. Any ints set to 4 will also satisfy ==, whereas int doesn’t support .equals() at all.
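
The difference between identity and value is easy to see in a few lines of today's Java. The sketch below uses a hypothetical Coordinate class (not from any library) that overrides equals(); the helper names sameIdentity and sameValue are likewise made up for illustration.

```java
// Hypothetical immutable class used only to illustrate identity vs. value.
final class Coordinate {
    final int x;
    final int y;

    Coordinate(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Value equivalence: two coordinates are equal when their fields match.
    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Coordinate)) {
            return false;
        }
        Coordinate c = (Coordinate) other;
        return x == c.x && y == c.y;
    }

    @Override
    public int hashCode() {
        return 31 * x + y;
    }
}

final class IdentityDemo {
    // == on references asks "is this the very same instance?"
    static boolean sameIdentity(Object a, Object b) {
        return a == b;
    }

    // equals() asks "do these two instances represent the same value?"
    static boolean sameValue(Object a, Object b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        Coordinate a = new Coordinate(1, 2);
        Coordinate b = new Coordinate(1, 2);
        System.out.println(sameIdentity(a, b)); // false: two distinct instances
        System.out.println(sameValue(a, b));    // true: same field values

        int p = 4;
        int q = 4;
        System.out.println(p == q);             // true: for an int, the value is the identity
    }
}
```

Two Coordinate instances with the same fields are interchangeable for most purposes, yet the platform still has to track them as separate objects. That gap is what the rest of this article is about.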

In a way, we can say that for a primitive, the value is the identity. This simple notion is the source of primitive variable performance. The platform all the way down to the CPU instructions can safely assume that any int 4 is just as good as another, caching and copying them freely. 

Object references and memory

The Java virtual machine (JVM) can take advantage of the way primitives are handled to optimize how it stores, retrieves, and operates on them. In particular, if the platform determines that a variable is not altered (that is, it’s a constant or immutable) then it is available to be optimized.

Objects, by contrast, are resistant to this kind of optimization because they have an identity distinct from their value. This is what == tests for when it is applied to references: whether both refer to the same instance in memory. This moves the platform and machine into the world of references, where only the reference to the actual instance will do.

As an instance of a class, an object holds data that can be both primitives and other classes and arrays. The object itself is addressed with a pointer handle. This creates a network of references: the object graph. Whenever some value is changed—or even if it might be changed—the JVM is forced to maintain a definitive record of the object for referencing. The need to reference objects is a barrier to performance optimizations.

The performance difficulties don’t stop there. The nature of objects as buckets of references means they exist in memory in a very fluffy way. Fluffy is my technical term to describe the fact that the JVM cannot compress objects to minimize their memory footprint. When one object has a reference to another object as part of its makeup, the JVM is forced to maintain that pointer relationship. (In some cases, a clever optimization could help determine that a nested reference is the only handle on a particular entity.)

In his State of Valhalla blog post, Goetz uses an array of points to illustrate the non-dense nature of references. We can use a class. For example, let’s say we have a Landmark class with a name and a geolocation field. These imply a memory structure like the one shown here:


Figure 1. A ‘fluffy’ memory footprint of Java objects.

What we’d like to achieve is the ability to hold an object, when appropriate, as shown in Figure 2.


Figure 2. A dense object in memory.
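
To make the two layouts concrete, here is a rough sketch in today's Java. The Landmark and Geolocation classes follow the example in the text, but their fields, and the DenseLandmarks helper, are assumptions made for illustration. The first form stores each location behind a pointer; the second flattens the coordinates by hand into primitive arrays, which is the kind of workaround developers reach for now when they need a dense layout.

```java
// Hypothetical classes matching the Landmark example in the text.
final class Geolocation {
    final double latitude;
    final double longitude;

    Geolocation(double latitude, double longitude) {
        this.latitude = latitude;
        this.longitude = longitude;
    }
}

final class Landmark {
    final String name;
    final Geolocation location; // a pointer to a separate heap object

    Landmark(String name, Geolocation location) {
        this.name = name;
        this.location = location;
    }
}

// A hand-flattened alternative: coordinates sit side by side in primitive arrays.
final class DenseLandmarks {
    private final String[] names;
    private final double[] latitudes;
    private final double[] longitudes;

    DenseLandmarks(Landmark[] landmarks) {
        int n = landmarks.length;
        names = new String[n];
        latitudes = new double[n];
        longitudes = new double[n];
        for (int i = 0; i < n; i++) {
            names[i] = landmarks[i].name;
            latitudes[i] = landmarks[i].location.latitude;
            longitudes[i] = landmarks[i].location.longitude;
        }
    }

    // Walks one contiguous double[]; no pointer is followed per element.
    // (Returns NaN for an empty collection.)
    double averageLatitude() {
        double sum = 0.0;
        for (double latitude : latitudes) {
            sum += latitude;
        }
        return sum / latitudes.length;
    }

    // The same computation on the pointer-based form: each element costs
    // one hop to the Landmark and another to its Geolocation.
    static double averageLatitude(Landmark[] landmarks) {
        double sum = 0.0;
        for (Landmark landmark : landmarks) {
            sum += landmark.location.latitude;
        }
        return sum / landmarks.length;
    }
}
```

Both methods return the same answer. The difference is that the flattened version gives up the convenience of a Geolocation type to get its layout, which is the tradeoff Valhalla hopes to remove.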

Java performance pain points

That’s an overview of the performance challenges that were baked into the Java platform by early design decisions. Now let’s consider how these decisions impact performance in three key areas:

  • Method calling and pass-by-value
  • Boxes and autoboxing
  • Generics and streams

Method calling and pass-by-value

The default structure of objects in memory is inefficient for both memory use and caching. There is also an opportunity to improve method calling conventions. Being able to pass instances of a class to methods by value (when appropriate), rather than through a pointer, would yield serious performance benefits.
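
For reference, this is how argument passing behaves in Java today. Strictly speaking, every argument is copied: a primitive's value is copied, and for an object it is the reference that is copied, so the callee still reaches the caller's instance. The Counter class and the method names below are hypothetical.

```java
// Hypothetical mutable holder used to show what a callee can and cannot change.
final class Counter {
    int count;
}

final class CallDemo {
    // The int is copied; incrementing it changes only the local copy.
    static void bumpInt(int n) {
        n++;
    }

    // The reference is copied, but it still points at the caller's object,
    // so the mutation is visible to the caller.
    static void bumpCounter(Counter c) {
        c.count++;
    }

    // Rebinding the parameter changes only the local copy of the reference.
    static void replaceCounter(Counter c) {
        c = new Counter();
        c.count = 99;
    }

    public static void main(String[] args) {
        int n = 1;
        bumpInt(n);
        System.out.println(n);             // 1

        Counter counter = new Counter();
        bumpCounter(counter);
        System.out.println(counter.count); // 1

        replaceCounter(counter);
        System.out.println(counter.count); // 1
    }
}
```

Because the callee holds a pointer to a shared, mutable instance, the JVM generally has to keep that instance addressable on the heap. A class without identity would leave the runtime free to hand over the field values themselves.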

Boxes and autoboxing

Beyond inefficiencies, the distinction between primitives and classes creates language-level difficulties. Creating primitive “boxes” like Integer and Long (along with autoboxing) is an attempt to alleviate the problems caused by this distinction. It doesn’t really fix them, however, and it introduces a degree of overhead for both the developer and the machine. As a developer, you have to remember the difference between int and Integer. (Not to mention ArrayList<Integer>, int[], Integer[], and the lack of an ArrayList<int>.) The machine, meanwhile, has to convert between the two. 
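
A short sketch of what boxing costs in practice, using only standard library types. The class and method names are made up for illustration. The first two methods compute the same sum, one through Integer objects and one through raw ints; the third shows a hazard that exists only on the boxed side.

```java
import java.util.Arrays;
import java.util.List;

final class BoxingDemo {
    // Each element is an Integer object; the loop unboxes it before adding.
    static long sumBoxed(List<Integer> values) {
        long total = 0;
        for (Integer value : values) {
            total += value; // auto-unboxing happens here
        }
        return total;
    }

    // The same sum over an int[]: no objects, no conversions.
    static long sumPrimitive(int[] values) {
        long total = 0;
        for (int value : values) {
            total += value;
        }
        return total;
    }

    // A box can be null, a primitive cannot. Unboxing null fails at run time.
    static boolean unboxingNullThrows() {
        Integer missing = null;
        try {
            int n = missing; // compiles, then throws NullPointerException
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(sumBoxed(Arrays.asList(1, 2, 3)));    // 6
        System.out.println(sumPrimitive(new int[] {1, 2, 3}));   // 6
        System.out.println(unboxingNullThrows());                // true
    }
}
```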

In a way, boxing gives us the worst of both worlds. Obscuring the underlying nuances of how these entities work makes it harder to access both the power of class syntax and the performance of primitives.

Generics and streams

All these considerations come to a head in generics. Generics are intended to make generalizing across functionality easier and more explicit. But the persnickety presence of this set of non-object variables (the primitives) causes it to break down. <int> doesn’t exist—it can’t exist because int is not a class at all; it doesn’t descend from Object. It doesn’t allow for polymorphism. 

This problem then manifests in libraries like Java collections and streams, where the ideal of generic library functions is forced to deal with the reality of int versus Integer, long versus Long, and so on. Currently, the workaround is to offer IntStream and other non-generic variations.
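
The split shows up clearly in the streams API as it stands today. In the sketch below (class and method names are hypothetical), the same computation is written once against IntStream and once against Stream<Integer>, and a third method shows the explicit boxed() step needed to cross from one world to the other.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

final class StreamDemo {
    // Primitive specialization: works on ints end to end.
    static int sumOfSquaresPrimitive(int n) {
        return IntStream.rangeClosed(1, n)
                .map(i -> i * i)
                .sum();
    }

    // Generic version: every element is boxed into an Integer along the way.
    static int sumOfSquaresBoxed(int n) {
        return Stream.iterate(1, i -> i + 1)
                .limit(n)
                .map(i -> i * i)
                .reduce(0, Integer::sum);
    }

    // Collections cannot hold int, so the stream has to be boxed explicitly
    // before it can be collected into a List.
    static List<Integer> firstN(int n) {
        return IntStream.rangeClosed(1, n)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquaresPrimitive(3)); // 14
        System.out.println(sumOfSquaresBoxed(3));     // 14
        System.out.println(firstN(3));                // [1, 2, 3]
    }
}
```

IntStream exists only because Stream<int> cannot. Each such specialization is an extra API surface for library authors to maintain and for developers to learn.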

Valhalla’s solution: Value classes and primitive classes

Project Valhalla attacks these three Java performance pain points at the root. The first and most fundamental concept is the value class. The idea here is that you can define a class that partakes of everything that is great about classes, like having methods and being able to fulfill generics, but without the identity. In practice, that means the classes are immutable and cannot be layout-polymorphic (wherein the superclass can operate upon the subclasses via abstract properties). It’s like a class that is just a bucket of primitives. 

Value classes give us a clear and definitive way of obtaining the performance characteristics we are after while still accessing the benefits of class syntax and behavior. That means library builders can also use them to improve their API design. 
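
Valhalla's syntax is not final and is not part of a standard JDK release, so the following is only an approximation in today's Java of the contract a value class asks for: a final class, final fields, and no behavior that depends on identity. The Complex class is hypothetical. Under the proposal described in this article, the declaration would carry the value modifier, and the JVM would then be free to store instances inline rather than behind a pointer.

```java
// A sketch of a class that already lives by value-class rules:
// immutable, final, and compared by its fields rather than by identity.
final class Complex {
    private final double re;
    private final double im;

    Complex(double re, double im) {
        this.re = re;
        this.im = im;
    }

    // "Modification" returns a new value instead of changing this one.
    Complex plus(Complex other) {
        return new Complex(re + other.re, im + other.im);
    }

    Complex times(Complex other) {
        return new Complex(
                re * other.re - im * other.im,
                re * other.im + im * other.re);
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof Complex)) {
            return false;
        }
        Complex c = (Complex) obj;
        return Double.compare(re, c.re) == 0 && Double.compare(im, c.im) == 0;
    }

    @Override
    public int hashCode() {
        return 31 * Double.hashCode(re) + Double.hashCode(im);
    }

    @Override
    public String toString() {
        return re + " + " + im + "i";
    }
}
```

Today, each call to plus() or times() still allocates a heap object with its own identity, even though nothing in the class uses that identity. That unused identity is what a value class would give up.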

A step further is the primitive class, which is like a more extreme value class. In essence, the primitive class is a thin wrapper around a true primitive variable, but with class methods. This is something like custom, streamlined primitive boxes. The improvement is in making the boxing system more explicit and extensible. Additionally, the primitive value wrapped by a primitive class retains the performance characteristics of the primitive (no under-the-hood boxing and unboxing). The primitive class can nonetheless be used wherever classes can be, in an Object[] array, for instance. Primitive classes will not be nullable (variables of these types cannot be set to null).

In general, we could say that Project Valhalla brings primitives and user-defined types closer together. This gives developers more options in the spectrum between pure primitives and objects and makes the tradeoffs explicit. It also makes these operations overall more consistent. In particular, the new primitive system will smooth out how primitives and objects work, how they are boxed, and how new ones can be added.

The new value and primitive keywords

Valhalla has seen a few different syntax proposals, but now the project is taking a clear form and direction. Two new keywords modify the class keyword: value and primitive. A class declared with the value class syntax surrenders its identity but gains performance improvements. Besides mutability and polymorphism restrictions, most of the things you’d expect from a class still apply, and such classes can fully participate in generic code (such as Object[] or ArrayList<T>). Variables of a value class type default to null.

The primitive class syntax creates a class that is one step further from traditional objects and toward traditional primitives. Variables of these types default to a value built from the defaults of the underlying fields (0 for int, 0.0 for double, and so on) and cannot be null. Primitive classes gain the most in optimization and sacrifice the most in terms of features. They are also not tear-safe beyond 32 bits: as with long and double today, a value wider than 32 bits that is written without synchronization can be observed half-updated by another thread. The primitive class will ultimately be used to model all the primitives in the platform, meaning user- and library-defined primitive additions will participate in the same system as built-ins.
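
The two default behaviors described above mirror a split that already exists between primitives and their boxes. The sketch below shows that split in today's Java and does not use any Valhalla syntax. The class and method names are hypothetical.

```java
final class DefaultsDemo {
    // Fields without an initializer take their type's default value.
    static final class Holder {
        int count;       // defaults to 0
        double ratio;    // defaults to 0.0
        Integer boxed;   // defaults to null
    }

    // A fresh int[] is filled with zeros: there is always a usable value.
    static int defaultIntElement() {
        int[] values = new int[1];
        return values[0];
    }

    // A fresh Integer[] is filled with nulls: there is no value yet.
    static Integer defaultBoxedElement() {
        Integer[] values = new Integer[1];
        return values[0];
    }

    public static void main(String[] args) {
        Holder holder = new Holder();
        System.out.println(holder.count);          // 0
        System.out.println(holder.ratio);          // 0.0
        System.out.println(holder.boxed);          // null
        System.out.println(defaultIntElement());   // 0
        System.out.println(defaultBoxedElement()); // null
    }
}
```

As the article describes it, a value class would sit on the Integer side of this split (null by default) and a primitive class on the int side (a zeroed value by default, never null).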

IdentityObject and ValueObject

Project Valhalla also introduces two new interfaces: IdentityObject and ValueObject. These will allow for the runtime determination of what kind of class you are dealing with.

Perhaps the most radical syntax change for experienced Java developers is the addition of the .ref suffix. Under this proposal, every type V has a companion reference type written V.ref. It operates like the box on primitives, so int.ref is analogous to wrapping an int with an Integer. For normal classes, .ref simply resolves to their usual reference type. The overall effect is to have a consistent way to ask for a reference to a value regardless of its kind. This also has the effect of making all Java arrays “covariant,” which is to say, they all descend from Object[]. Under the proposal, int[] would descend from Object[] and could be used wherever an Object[] is called for.
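
For comparison, here is how array covariance behaves in Java today, which is the baseline the proposal would change. The class and method names are hypothetical. Arrays of reference types are already covariant with Object[], at the price of a runtime store check, while arrays of primitives are not Object[] at all.

```java
final class ArrayCovarianceDemo {
    // True for any array of reference types, false for primitive arrays.
    static boolean isObjectArray(Object candidate) {
        return candidate instanceof Object[];
    }

    // Covariance lets an Integer[] be viewed as Object[], but the JVM
    // checks every store against the array's real element type.
    static boolean storingWrongTypeFails() {
        Object[] objects = new Integer[1]; // allowed: Integer[] is an Object[]
        try {
            objects[0] = "not an integer";
            return false;
        } catch (ArrayStoreException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isObjectArray(new Integer[2])); // true
        System.out.println(isObjectArray(new int[2]));     // false
        System.out.println(storingWrongTypeFails());       // true
    }
}
```

The second line of output is the one the article expects to change: if int[] joins the same hierarchy, code written against Object[] would accept it directly.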

Conclusion

Value classes and primitive classes will have a big impact on Java and its ecosystem. The current roadmap for Project Valhalla plans to introduce value classes first, followed by primitive classes. Next will be the migration of the existing primitive boxing classes (like Integer) to use the new primitive class. With those features in hand, the next feature, called universal generics, will allow primitive classes to be used directly with generics, smoothing out many of the complexities of reuse in APIs. Finally, specialized generics (allowing for all the expressive capability of T extends Foo) will be integrated with primitive classes.

Project Valhalla and the projects that comprise it are still in design stages, but we are getting closer. Current activity indicates it won’t be long before value classes drop in a JDK preview.

Beyond all the interesting technical work is the sense of Java’s ongoing vitality. That there is both will and ability to undergo the process of identifying where the platform can be evolved in fundamental ways is evidence of real commitment to keeping Java relevant. Project Loom is another undertaking that lends weight to an optimistic view of Java’s future. An essential feature of that project is virtual threads, an efficient alternative to traditional Java threads.

On a language design level, Valhalla is one of the most interesting things happening anywhere. As 2023 winds down, many developers are anticipating what’s next for Valhalla. The Java enthusiast in me can’t wait to see the finalized syntax. At the same time, I hope it won’t increase the complexity of the language too much, especially for beginners.

Copyright © 2023 IDG Communications, Inc.
