Oracle JRockit
The Definitive Guide

Develop and manage robust Java applications with
Oracle's high-performance Java Virtual Machine

Marcus Hirt
Marcus Lagergren

BIRMINGHAM - MUMBAI

Oracle JRockit

The Definitive Guide
Copyright © 2010 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval
system, or transmitted in any form or by any means, without the prior written
permission of the publisher, except in the case of brief quotations embedded in
critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy
of the information presented. However, the information contained in this book is
sold without warranty, either express or implied. Neither the authors, nor Packt
Publishing, and its dealers and distributors will be held liable for any damages
caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the
companies and products mentioned in this book by the appropriate use of capitals.
However, Packt Publishing cannot guarantee the accuracy of this information.

First published: June 2010

Production Reference: 1260510

Published by Packt Publishing Ltd.
32 Lincoln Road
Olton
Birmingham, B27 6PA, UK.
ISBN 978-1-847198-06-8
www.packtpub.com

Cover Image by Mark Holland (MJH767@bham.ac.uk)

Credits

Authors
Marcus Hirt
Marcus Lagergren

Reviewers
Anders Åstrand
Staffan Friberg
Markus Grönlund
Daniel Källander
Bengt Rutisson
Henrik Ståhl

Acquisition Editor
James Lumsden

Development Editor
Rakesh Shejwal

Technical Editor
Sandesh Modhe

Indexer
Rekha Nair

Editorial Team Leader
Gagandeep Singh

Project Team Leader
Priya Mukherji

Project Coordinator
Ashwin Shetty

Proofreader
Andie Scothern

Graphics
Geetanjali Sawant

Production Coordinator
Melwyn D'sa

Cover Work
Melwyn D'sa

Foreword
I remember quite clearly the first time I met the JRockit team. It was JavaOne 1999
and I was there representing WebLogic. Here were these Swedish college kids in
black T-shirts describing how they would build the world's best server VM. I was
interested in hearing their story as the 1.2 release of HotSpot had been delayed again
and we'd been running into no end of scalability problems with the Classic VM.
However, I walked away from the booth thinking that, while these guys were smart,
they had no idea what they were biting off.
Fast-forward a few years. BEA buys JRockit and I become the technical liaison
between the WebLogic and JRockit teams. By now JRockit has developed into an
excellent offering—providing great scalability and performance on server-side
systems. As we begin working together I have the distinct pleasure of getting to
know the authors of this book: Marcus Lagergren and Marcus Hirt.
Lagergren is a remarkably prolific developer, who at the time was working on the
compiler. He and I spent several sessions together examining optimizations of
WebLogic code and deciphering why this method or that wasn't getting inlined or
devirtualized. In the process we, along with the rest of the WebLogic and JRockit
teams, were able to produce several SPECjAppServer world records and cement
JRockit's reputation for performance.
Hirt, on the other hand, is extremely focused on profiling and diagnostics. It was
natural, therefore, that he should lead the nascent tooling effort that would become
JRockit Mission Control. This was an extension of an early observation we had, that
in order to scale the JRockit engineering team, we would have to invest in tooling to
make support and debugging easier.
Fast-forward a few more years. I'm now at Oracle when it acquires BEA. I have the
distinct pleasure of again welcoming the JRockit team into a new company as they
joined my team at Oracle. The core of the JRockit team is still the same and they now
have a place among the small group of the world's experts in virtual machines.

Lagergren is still working on internals—now on JRockit Virtual Edition—and is
as productive as ever. Under Hirt's leadership, Mission Control has evolved from
an internal developer's tool into one of the JRockit features most appreciated by
customers. With this combination of long experience and expertise in all layers of
JRockit, it is difficult for me to imagine a better combination of authors to write
this book.
Therefore, as has been the case many times before, I'm proud to be associated in some
small way with the JRockit team. I trust that you will enjoy reading this book and hope
that you will find the topic to be as satisfying as I have found it to be over the years.
Adam Messinger
Vice President of Development, Oracle Fusion Middleware group
February 14, 2010
San Francisco, CA

About the Authors
Marcus Hirt is one of the founders of Appeal Virtual Machines, the company that
created the JRockit Java Virtual Machine. He is currently working as Architect, Team
Lead, and Engineering Manager for the JRockit Mission Control team. In his spare
time he enjoys coding on his many pet projects, composing music, and scuba diving.
Marcus has contributed JRockit related articles, whitepapers, tutorials, and webinars
to the JRockit community, and has been an appreciated speaker at Oracle Open World,
eWorld, BEAWorld, EclipseCon, Nordev, and Expert Zone Developer Summit. He
received his M.Sc. education in Computer Science at the Royal Institute of Technology
in Stockholm. Marcus Hirt lives in Stockholm with his wife and two children.

Marcus Lagergren has an M.Sc. in Computer Science from the Royal Institute of
Technology in Stockholm, Sweden. He majored in theoretical computer science and
complexity theory since it was way more manly than, for example, database systems.
He has a background in computer security but has worked with runtimes since 1999.
Marcus was one of the founding members of Appeal Virtual Machines, the company
that developed the JRockit JVM. Marcus has been Team Lead and Architect for the
JRockit code generators and has been involved in pretty much every other aspect of
the JRockit JVM internals over the years. He has presented at various conferences, such
as JavaOne, BEAWorld, and eWorld and holds several patents on runtime technology.
Since 2008, he has worked for Oracle on a fast virtualization platform. Marcus likes power
tools, heavy metal, and scuba diving. Marcus Lagergren lives in Stockholm with his
wife and two daughters.

Acknowledgement
We'd like to thank all the people who have been creative with us throughout the
years, especially the other Appeal guys who have been a part of our lives for quite
some time now. The authors cannot think of a finer and more competent group of
people to have shared this journey with.
Furthermore, a great thank you is in order to our families who have been extremely
patient with us during the writing of this book.

About the Reviewers
Anders Åstrand has a Master's degree in Computer Science from the Royal
Institute of Technology, Sweden. He has worked at Oracle (formerly BEA Systems)
since 2007, in the JRockit performance team.

Staffan Friberg leads the JRockit Performance Team at Oracle, with seven years
of experience in QA and Performance Engineering for the JVM.

Markus Grönlund is a Senior Software Engineer with Oracle Corporation and has
worked extensively in the Oracle JRockit Virtual Machine development and support
arena for the past three years. Markus has been supporting the largest mission-critical
JRockit customers, providing expertise in debugging, configuration, and training.
Prior to joining Oracle Corporation, Markus worked for seven years as a Senior
Technical Architect for Intel Corporation, driving early adoption of next-generation
Intel Architectures.
I would like to thank the entire Oracle JRockit Virtual Machine team
in Stockholm, Sweden. It is a true privilege to be part of such an
amazing group of talented people. Thank you all!

Daniel Källander is Development Manager for JRockit and has been with the
JRockit team since 2005. Since 1996, he has been a founding member of three IT
companies. Before entering the IT industry he completed a Ph.D. in Theoretical
Physics, and later also an MBA in International Business.

Bengt Rutisson is Development Manager at Oracle focusing on JRockit garbage
collection and memory management. He joined the JRockit team in 2006 and has
been working with garbage collection and memory management since then.
Prior to working with JRockit, Bengt has been responsible for several products
in Java (for example, the Appear Context Engine) and in Component Pascal
(for example, the BlackBox Component Builder).

Henrik Ståhl is Senior Director of Product Management at Oracle, responsible for
product strategy for JRockit. In this position, he is constantly looking for new ways
to make the Java Virtual Machine more useful. He has been working with the JRockit
team since 2004, starting out as Team Lead for the JVM performance team before
moving to a product management role. Prior to Oracle, he was Co-Founder and
CTO of the Swedish IT consultancy Omegapoint, lead developer for the core part of
the Swedish BankID service and Senior Consultant at Icon Medialab. Henrik holds
an M.Sc. in Engineering Physics from the Royal Institute of Technology and lives
outside of Stockholm, Sweden with his family.

For my family, who endured me being inaccessible for nights and weekends both working on
the new major release and writing this book: Malin, Alexander, and little Natalie.

– Marcus Hirt

For my family: Klara, Alice, and Ylva. Especially for my lovely wife Klara, who ended up
having to singlehandedly juggle two children too often and who, more than once, expressed
her desire to purchase a copy of this book and burn it.

– Marcus Lagergren

Table of Contents

Preface

Chapter 1: Getting Started
  Obtaining the JRockit JVM
  Migrating to JRockit
    Command-line options
      System properties
      Standardized options
      Non-standard options
    Changes in behavior
  A note on JRockit versioning
  Getting help
  Summary

Chapter 2: Adaptive Code Generation
  Platform independence
  The Java Virtual Machine
    Stack machine
    Bytecode format
      Operations and operands
      The constant pool
  Code generation strategies
    Pure bytecode interpretation
    Static compilation
    Total JIT compilation
    Mixed mode interpretation
  Adaptive code generation
    Determining "hotness"
      Invocation counters
      Software-based thread sampling
      Hardware-based sampling
    Optimizing a changing program
  Inside the JIT compiler
    Working with bytecode
      Bytecode obfuscation
      Bytecode "optimizers"
      Abstract syntax trees
    Where to optimize
  The JRockit code pipeline
    Why JRockit has no bytecode interpreter
    Bootstrapping
    Runtime code generation
      Trampolines
      Code generation requests
      Optimization requests
      On-stack replacement
      Bookkeeping
    A walkthrough of method generation in JRockit
      The JRockit IR format
      JIT compilation
      Generating optimized code
  Controlling code generation in JRockit
    Command-line flags and directive files
      Command-line flags
      Directive files
  Summary

Chapter 3: Adaptive Memory Management
  The concept of automatic memory management
  Adaptive memory management
    Advantages of automatic memory management
    Disadvantages of automatic memory management
  Fundamental heap management
    Allocating and releasing objects
    Fragmentation and compaction
  Garbage collection algorithms
    Reference counting
    Tracing techniques
      Mark and sweep
      Stop and copy
    Stopping the world
      Conservative versus exact collectors
      Livemaps
    Generational garbage collection
      Multi generation nurseries
      Write barriers
    Throughput versus low latency
      Optimizing for throughput
      Optimizing for low latency
    Garbage collection in JRockit
      Old collections
      Nursery collections
      Permanent generations
      Compaction
  Speeding it up and making it scale
    Thread local allocation
    Larger heaps
      32-Bits and the 4-GB Barrier
      The 64-bit world
    Cache friendliness
      Prefetching
      Data placement
    NUMA
    Large pages
    Adaptability
  Near-real-time garbage collection
    Hard and soft real-time
    JRockit Real Time
      Does the soft real-time approach work?
      How does it work?
  The Java memory API
    Finalizers
    References
      Weak references
      Soft references
      Phantom references
  Differences in JVM behavior
  Pitfalls and false optimizations
    Java is not C++
  Controlling JRockit memory management
    Basic switches
      Outputting GC data
      Set initial and maximum heap size
      Controlling what to optimize for
      Specifying a garbage collection strategy
    Compressed references
    Advanced switches
  Summary

Chapter 4: Threads and Synchronization
  Fundamental concepts
    Hard to debug
    Difficult to optimize
      Latency analysis
  Java API
    The synchronized keyword
    The java.lang.Thread class
    The java.util.concurrent package
    Semaphores
    The volatile keyword
  Implementing threads and synchronization in Java
    The Java Memory Model
      Early problems and ambiguities
      JSR-133
    Implementing synchronization
      Primitives
      Locks
      The Java bytecode implementation
        Lock pairing
    Implementing threads
      Green threads
      OS threads
  Optimizing threads and synchronization
    Lock inflation and lock deflation
    Recursive locking
    Lock fusion
    Lazy unlocking
      Implementation
      Object banning
      Class banning
      Results
  Pitfalls and false optimizations
    Thread.stop, Thread.resume and Thread.suspend
    Double checked locking
  JRockit flags
    Examining locks and lazy unlocking
      Lock details from -Xverbose:locks
      Controlling lazy unlocking with -XX:UseLazyUnlocking
    Using SIGQUIT or Ctrl-Break for Stack Traces
    Lock profiling
      Enabling lock profiling with -XX:UseLockProfiling
    Setting thread stack size using -Xss
    Controlling lock heuristics
  Summary

Chapter 5: Benchmarking and Tuning
  Reasons for benchmarking
    Performance goals
    Performance regression testing
    Easier problem domains to optimize
    Commercial success
  What to think of when creating a benchmark
    Measuring outside the system
    Measuring several times
    Micro benchmarks
      Micro benchmarks and on-stack replacement
      Micro benchmarks and startup time
    Give the benchmark a chance to warm-up
  Deciding what to measure
    Throughput
    Throughput with response time and latency
    Scalability
    Power consumption
    Other issues
  Industry-standard benchmarks
    The SPEC benchmarks
      The SPECjvm suite
      The SPECjAppServer / SPECjEnterprise2010 suite
      The SPECjbb suite
    SipStone
    The DaCapo benchmarks
    Real world applications
    The dangers of benchmarking
  Tuning
    Out of the box behavior
    What to tune for
      Tuning memory management
      Tuning code generation
      Tuning locks and threads
      Generic tuning
  Common bottlenecks and how to avoid them
    The -XXaggressive flag
    Too many finalizers
    Too many reference objects
    Object pooling
    Bad algorithms and data structures
      Classic textbook issues
      Unwanted intrinsic properties
    Misuse of System.gc
    Too many threads
    One contended lock is the global bottleneck
    Unnecessary exceptions
    Large objects
    Native memory versus heap memory
    Wait/notify and fat locks
    Wrong heap size
    Too much live data
    Java is not a silver bullet
  Summary

Chapter 6: JRockit Mission Control
  Background
    Sampling-based profiling versus exact profiling
    A different animal to different people
  Mission Control overview
    Mission Control server-side components
    Mission Control client-side components
    Terminology
  Running the standalone version of Mission Control
  Running JRockit Mission Control inside Eclipse
  Starting JRockit for remote management
  The JRockit Discovery Protocol
    Running in a secure environment
    Troubleshooting connections
      Hostname resolution issues
  The Experimental Update Site
  Debugging JRockit Mission Control
  Summary

Chapter 7: The Management Console
  A JMX Management Console
  Using the console
    General
      The Overview
    MBeans
      MBean Browser
      Triggers
    Runtime
      System
      Memory
      Threads
    Advanced
      Method Profiler
      Exception Count
      Diagnostic Commands
    Other
      JConsole
  Extending the JRockit Mission Control Console
  Summary

Chapter 8: The Runtime Analyzer
  The need for feedback
  Recording
  Analyzing JRA recordings
    General
      Overview
      Recording
      System
    Memory
      Overview
      GCs
      GC Statistics
      Allocation
      Heap Contents
      Object Statistics
    Code
      Overview
      Hot Methods
      Optimizations
    Thread/Locks
      Overview
      Threads
      Java Locks
      JVM Locks
      Thread Dumps
    Latency
      Overview
      Log
      Graph
      Threads
      Traces
      Histogram
    Using the Operative Set
    Troubleshooting
  Summary

Chapter 9: The Flight Recorder
  The evolved Runtime Analyzer
  A word on events
  The recording engine
  Startup options
    Starting time-limited recordings
  Flight Recorder in JRockit Mission Control
    Advanced Flight Recorder Wizard concepts
  Differences to JRA
    The range selector
    The Operative Set
    The relational key
  What's in a Latency?
  Exception profiling
  Memory
  Adding custom events
  Extending the Flight Recorder client
  Summary

Chapter 10: The Memory Leak Detector
  A Java memory leak
    Memory leaks in static languages
    Memory leaks in garbage collected languages
  Detecting a Java memory leak
  Memleak technology
  Tracking down the leak
  A look at classloader-related information
  Interactive memory leak hunting
  The general purpose heap analyzer
  Allocation traces
  Troubleshooting Memleak
  Summary

Chapter 11: JRCMD
  Introduction
  Overriding SIGQUIT
  Special commands
  Limitations of JRCMD
  JRCMD command reference
    check_flightrecording (R28)
    checkjrarecording (R27)
    command_line
    dump_flightrecording (R28)
    heap_diagnostics (R28)
    hprofdump (R28)
    kill_management_server
    list_vmflags (R28)
    lockprofile_print
    lockprofile_reset
    memleakserver
    oom_diagnostics (R27)
    print_class_summary
    print_codegen_list
    print_memusage (R27)
    print_memusage (R28)
    print_object_summary
    print_properties
    print_threads
    print_utf8pool
    print_vm_state
    run_optfile (R27)
    run_optfile (R28)
    runfinalization
    runsystemgc
    set_vmflag (R28)
    start_flightrecording (R28)
    start_management_server
    startjrarecording (R27)
    stop_flightrecording (R28)
    timestamp
    verbosity
    version
  Summary

Chapter 12: Using the JRockit Management APIs
  JMAPI
    JMAPI examples
  JMXMAPI
    The JRockit internal performance counters
    An example—building a remote version of JRCMD
  Summary

Chapter 13: JRockit Virtual Edition
  Introduction to virtualization
    Full virtualization
    Paravirtualization
    Other virtualization keywords
    Hypervisors
      Hosted hypervisors
      Native hypervisors
    Hypervisors in the market
    Advantages of virtualization
    Disadvantages of virtualization
  Virtualizing Java
  Introducing JRockit Virtual Edition
    The JRockit VE kernel
      The virtual machine image concept and management frameworks
    Benefits of JRockit VE
      Performance and better resource utilization
      Manageability
      Simplicity and security
    Constraints and limitations of JRockit VE
  A look ahead—can virtual be faster than real?
    Quality of hot code samples
    Adaptive heap resizing
    Inter-thread page protection
    Improved garbage collection
      Concurrent compaction
  Summary

Appendix A: Bibliography
Appendix B: Glossary
Index

Preface
This book is the result of an amazing series of events.
In high school, back in the pre-Internet era, the authors used to hang out at the same
bulletin board systems and found each other in a particularly geeky thread about
math problems. Bulletin board friendship led to friendship in real life, as well as
several collaborative software projects. Eventually, both authors went on to study
at the Royal Institute of Technology (KTH) in Stockholm.
More friends were made at KTH, and a course in database systems in our third year
brought enough people with a similar mindset together to achieve critical mass. The
decision was made to form a consulting business named Appeal Software Solutions
(the acronym A.S.S. seemed like a perfectly valid choice at the time). Several of us
started to work alongside our studies and a certain percentage of our earnings was
put away so that the business could be bootstrapped into a full-time occupation
when everyone was out of university. Our long-term goal was always to work with
product development, not consulting. However, at the time we did not know what
the products would turn out to be.
In 1997, Joakim Dahlstedt, Fredrik Stridsman and Mattias Joëlson won a trip to
one of the first JavaOne conferences by out-coding everyone in a Sun sponsored
competition for university students. For fun, they did it again the next year with
the same result.
It all started when our three heroes noticed that between the two JavaOne
conferences in 1997 and 1998, the presentation of Sun's adaptive virtual machine
HotSpot remained virtually unchanged. HotSpot, it seemed at the time, was the
answer to the Java performance problem. Java back then was mostly an interpreted
language and several static compilers for Java were on the market, producing code
that ran faster than bytecode, but that usually violated the language semantics in
some fundamental way. As this book will stress again and again, the potential power
of an adaptive runtime approach exceeds, by far, that of any ahead-of-time solution,
but is harder to achieve.

Preface

Since there was no news about HotSpot in 1998, youthful hubris caused us to ask
ourselves "How hard can it be? Let's make a better adaptive VM, and faster!" We
had the right academic backgrounds and thought we knew in which direction to go.
Even though it definitely was more of a challenge than we expected, we would still
like to remind the reader that in 1998, Java on the server side was only just beginning
to take off, J2EE hardly existed and no one had ever heard of a JSP. The problem
domain was indeed a lot smaller in 1998.
The original plan was to have a proof of concept implementation of our own JVM
finished in a year, while running the consulting business at the same time to finance
the JVM development. The JVM was originally christened "RockIT", being both rock
'n' roll, rock solid and IT. A leading "J" was later added for trademark reasons.
Naturally, after a few false starts, we needed to bring in venture capital. Explaining
how to capitalize on an adaptive runtime (that the competitors gave away their own
free versions of) provided quite a challenge. Not just because this was 1998, and
investors had trouble understanding any venture not ultimately designed to either
(1) send text messages with advertisements to cell phones or (2) start up a web-based
mail order company.
Eventually, venture capital was secured and in early 2000, the first prototype of
JRockit 1.0 went public. JRockit 1.0, besides being, as someone on the Internet put it
"very 1.0", made some headlines by being extremely fast at things like multi-threaded
server applications. Further venture capital was acquired using this as leverage. The
consulting business was broken out into a separate corporation and Appeal Software
Solutions was renamed Appeal Virtual Machines. Sales people were hired and we
started negotiations with Sun for a Java license.
Thus, JRockit started taking up more and more of our time. In 2001, the remaining
engineers working in the consulting business, which had also grown, were all finally
absorbed into the full-time JVM project and the consulting company was mothballed.
At this time we realized that we both knew exactly how to take JRockit to the next
level and that our burn rate was too high. Management started looking for a suitor in
the form of a larger company to marry.
In February 2002, BEA Systems acquired Appeal Virtual Machines, letting nervous
venture capitalists sleep at night, and finally securing us the resources that we
needed for a proper research and development lab. A good-sized server hall for
testing was built, requiring reinforced floors and more electricity than was available
in our building. For quite a while, there was a huge cable from a junction box on
the street outside coming in through the server room window. After some time,
we outgrew that lab as well and had to rent another site to host some of our servers.


As part of the BEA platform, JRockit matured considerably. During the first two
years at BEA, plenty of the value-adds and key differentiators between JRockit and other
Java solutions were invented, for example the framework that was later to become
JRockit Mission Control. Several press releases, world-beating benchmark scores,
and a virtualization platform quickly followed. With JRockit, BEA turned into one of
the "big three" JVM vendors on the market, along with Sun and IBM, and a customer
base of thousands of users developed. A celebration was in order when JRockit
started generating revenue, first from the tools suite and later from the unparalleled
GC performance provided by the JRockit Real Time product.
In 2008, BEA was acquired by Oracle, which caused some initial concerns, but
JRockit and the JRockit team ended up getting a lot of attention and appreciation.
For many years now, JRockit has been running mission-critical applications all over
the world. We are proud to have been part of the making of a piece of software
with that kind of market penetration and importance. We are equally proud to have
gone from a pre-alpha designed by six guys in a cramped office in the Old Town of
Stockholm to a world-class product with a world-class product organization.
The contents of this book stem from more than a decade of our experience with
adaptive runtimes in general, and with JRockit in particular. Plenty of the information
in this book has, to our knowledge, never been published anywhere before.
We hope you will find it both useful and educational!

What this book covers

Chapter 1: Getting Started. This chapter introduces the JRockit JVM and JRockit
Mission Control. It explains how to obtain the software and what the support matrix
is for different platforms. We point out things to watch out for when migrating
between JVMs from different vendors, and explain the versioning scheme for
JRockit and JRockit Mission Control. We also give pointers to resources where
further information and assistance can be found.
Chapter 2: Adaptive Code Generation. Code generation in an adaptive runtime is
introduced. We explain why adaptive code generation is both harder to do in a
JVM than in a static environment as well as why it is potentially much more
powerful. The concept of "gambling" for performance is introduced. We examine
the JRockit code generation and optimization pipeline and walk through it with
an example. Adaptive and classic code optimizations are discussed. Finally, we
introduce various flags and directive files that can be used to control code
generation in JRockit.


Chapter 3: Adaptive Memory Management. Memory management in an adaptive
runtime is introduced. We explain how a garbage collector works, both by looking
at the concept of automatic memory management as well as at specific algorithms.
Object allocation in a JVM is covered in some detail, as well as the meta-info needed
for a garbage collector to do its work. The latter part of the chapter is dedicated to the
most important Java APIs for controlling memory management. We also introduce
the JRockit Real Time product, which can produce deterministic latencies in a Java
application. Finally, flags for controlling the JRockit JVM memory management
system are introduced.
Chapter 4: Threads and Synchronization. Threads and synchronization are very important
building blocks in Java and a JVM. We explain how these concepts work in the Java
language and how they are implemented in the JVM. We talk about the need for a Java
Memory Model and the intrinsic complexity it brings. Adaptive optimization based on
runtime feedback is done here as well as in all other areas of the JVM. A few important
anti-patterns such as double-checked locking are introduced, along with common
pitfalls in parallel programming. Finally, we discuss how to do lock profiling in JRockit
and introduce flags that control the thread system.
Chapter 5: Benchmarking and Tuning. The relevance of benchmarking and the
importance of performance goals and metrics is discussed. We explain how to create
an appropriate benchmark for a particular problem set. Some industrial benchmarks
for Java are introduced. Finally, we discuss in detail how to modify application
and JVM behavior based on benchmark feedback. Extensive examples of useful
command-line flags for the JRockit JVM are given.
Chapter 6: JRockit Mission Control. The JRockit Mission Control tools suite is
introduced. Startup and configuration details for different setups are given. We
explain how to run JRockit Mission Control in Eclipse, along with tips on how
to configure JRockit to run Eclipse itself. The different tools are introduced and
common terminology is established. Various ways to enable JRockit Mission
Control to access a remotely running JRockit, together with trouble-shooting tips,
are provided.
Chapter 7: The Management Console. This chapter is about the Management Console
component in JRockit Mission Control. We introduce the concept of diagnostic
commands and online monitoring of a JVM instance. We explain how trigger rules
can be set, so that notifications can be given upon certain events. Finally, we show
how to extend the Management Console with custom components.


Chapter 8: The Runtime Analyzer. The JRockit Runtime Analyzer (JRA) is introduced.
The JRockit Runtime Analyzer is an on-demand profiling framework that produces
detailed recordings about the JVM and the application it is running. The recorded
profile can later be analyzed offline, using the JRA Mission Control plugin.
Recorded data includes profiling of methods and locks, as well as garbage collection
information, optimization decisions, object statistics, and latency events. You will
learn how to detect some common problems in a JRA recording and how the latency
analyzer works.
Chapter 9: The Flight Recorder. The JRockit Flight Recorder has superseded JRA
in newer versions of the JRockit Mission Control suite. This chapter explains the
features that have been added that facilitate even more verbose runtime recordings.
Differences in functionality and GUI are covered.
Chapter 10: The Memory Leak Detector. This chapter introduces the JRockit Memory
Leak Detector, the final tool in the JRockit Mission Control tools suite. We explain
the concept of a memory leak in a garbage collected language and discuss several
use cases for the Memory Leak Detector. Not only can it be used to find unintentional
object retention in a Java application, but it also works as a generic heap analyzer.
Some of the internal implementation details are given, explaining why this tool
also runs with a very low overhead.
Chapter 11: JRCMD. The command-line tool JRCMD is introduced. JRCMD enables
a user to interact with all JVMs that are running on a particular machine and to
issue them diagnostic commands. The chapter has the form of a reference guide and
explains the most important available diagnostic commands. A diagnostic command
can be used to examine or modify the state of a running JRockit JVM.
Chapter 12: Using the JRockit Management APIs. This chapter explains how to
programmatically access some of the functionality in the JRockit JVM. This is
the way the JRockit Mission Control suite does it. The APIs JMAPI and JMXMAPI
are introduced. While they are not fully officially supported, several insights can be
gained about the inner mechanisms of the JVM by understanding how they work.
We encourage you to experiment with your own setup.
Chapter 13: JRockit Virtual Edition. We explain virtualization in a modern
"cloud-based" environment. We introduce the product JRockit Virtual Edition.
Removing the OS layer from a virtualized Java setup is less problematic than
one might think. It can also help get rid of some of the runtime overhead
that is typically associated with virtualization. We go on to explain how this
can potentially even reduce Java virtualization overhead to levels not possible
on physical hardware.


What you need for this book

You will need a correctly installed JRockit JVM and runtime environment. To get full
benefits from this book, a JRockit version of R28 or later is recommended. However,
an R27 version will also work. Also, a correctly installed Eclipse for RCP/Plug-in
Developers is useful, especially if trying out the different ways to extend JRockit
Mission Control and for working with the programs in the code bundle.

Who this book is for

This book is for anyone with a working knowledge of Java, such as developers or
administrators with experience from a few years of professional Java development
or from managing larger Java installations. The book is divided into three parts.
The first part is focused on what a Java Virtual Machine, and to some extent any
adaptive runtime, does and how it works. It will bring up strengths and weaknesses
of runtimes in general and of JRockit more specifically, attempting to explain good
Java coding practices where appropriate. Peeking inside the "black box" that is the
JVM will hopefully provide key insights into what happens when a Java system
runs. The information in the first part of the book will help developers and architects
understand the consequences of certain design decisions and help them make better
ones. This part might also work as study material in a university-level course on
adaptive runtimes.
The second part of the book focuses on using the JRockit Mission Control to
make Java applications run more optimally. This part of the book is useful for
administrators and developers who want to tune JRockit to run their particular
applications with maximum performance. It is also useful for developers who want
to tune their Java applications for better resource utilization and performance. It
should be realized, however, that there is only so much that can be done by tuning
the JVM—sometimes there are simple or complex issues in the actual applications
that, if resolved, will lead to massive performance increases. We teach you how the
JRockit Mission Control suite assists you in finding such bottlenecks and helps
you cut hardware and processing costs.
The final part of the book deals with important JRockit-related technologies that
have recently, or will soon, be released. This part is for anyone interested in
how the Java landscape is transforming over the next few years and why. The
emphasis is on virtualization.
Finally, there is a bibliography and a glossary of all technical terms used in the book.


Conventions

This book will, at times, show Java source code and command lines. Java code
is formatted with a fixed width font with standard Java formatting. Command-line
utilities and parameters are also printed with a fixed width font. Likewise,
references to file names, code fragments, and Java packages in sentences will use a
Short and important information, or anecdotes, relevant to the current section of text
is placed in information boxes.

The contents of an information box—this is important!

Technical terms and fundamental concepts are highlighted as keywords. Keywords
also often appear in the glossary for quick reference.
Throughout the book, the capitalized tags JROCKIT_HOME and JAVA_HOME should be
expanded to the full path of your JRockit JDK/JRE installation. For example, if you
have installed JRockit so that your java executable is located in:
C:\jrockits\jrockit-jdk1.5.0_17\bin\java.exe

the JROCKIT_HOME and JAVA_HOME variables should be expanded to:
C:\jrockits\jrockit-jdk1.5.0_17\

The JRockit JVM has its own version number. The latest major version of JRockit is
R28. Minor revisions of JRockit are annotated with point release numbers after the
major version number. For example R27.1 and R27.2. We will, throughout the book,
assume R27.x to mean any R27-based version of the JRockit JVM, and R28.x to mean
any R28-based version of the JRockit JVM.
This book assumes that R28 is the JRockit JVM being used, where
no other context is supplied. Information relevant only to earlier
versions of JRockit is specifically tagged.

JRockit Mission Control clients use more standard revision numbers, for example
4.0. Any reference to 3.x or 4.0 in the context of tools means the corresponding
version of the JRockit Mission Control client. At the time of this writing, 4.0 is the
latest version of the Mission Control client, and is, unless explicitly stated otherwise,
assumed to be the version in use in the examples in this book.

We will sometimes refer to third-party products. No deeper familiarity with them
is required to get full benefits from this book. The products mentioned are:
Oracle WebLogic Server—the Oracle J2EE application server.
http://www.oracle.com/weblogicserver

Oracle Coherence—the Oracle in-memory distributed cache technology.
http://www.oracle.com/technology/products/coherence/index.html

Oracle Enterprise Manager—the Oracle application management suite.
http://www.oracle.com/us/products/enterprise-manager/index.htm

Eclipse—the Integrated Development Environment for Java (and other languages).
http://www.eclipse.org

HotSpot™—the HotSpot™ virtual machine.
http://java.sun.com/products/hotspot

See the link associated with each product for further information.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about
this book—what you liked or may have disliked. Reader feedback is important for
us to develop titles that you really get the most out of.
To send us general feedback, simply send an e-mail to feedback@packtpub.com,
and mention the book title via the subject of your message.
If there is a book that you need and would like to see us publish, please send
us a note in the SUGGEST A TITLE form on www.packtpub.com or
e-mail suggest@packtpub.com.
If there is a topic that you have expertise in and you are interested in either writing
or contributing to a book on, see our author guide on www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to
help you to get the most from your purchase.


Downloading the example code for the book
Visit http://www.packtpub.com/site/default/files/8068_Code.zip
to directly download the example code.
The downloadable files contain instructions on how to use them.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes
do happen. If you find a mistake in one of our books—maybe a mistake in the text
or the code—we would be grateful if you would report this to us. By doing so, you
can save other readers from frustration and help us improve subsequent versions
of this book. If you find any errata, please report them by visiting
http://www.packtpub.com/support, selecting your book, clicking on the let us know link, and
entering the details of your errata. Once your errata are verified, your submission
will be accepted and the errata will be uploaded on our website, or added to any list
of existing errata, under the Errata section of that title. Any existing errata can be
viewed by selecting your title from http://www.packtpub.com/support.

Piracy

Piracy of copyright material on the Internet is an ongoing problem across all media.
At Packt, we take the protection of our copyright and licenses very seriously. If you
come across any illegal copies of our works, in any form, on the Internet, please
provide us with the location address or website name immediately so that we can
pursue a remedy.
Please contact us at copyright@packtpub.com with a link to the suspected
pirated material.
We appreciate your help in protecting our authors, and our ability to bring you
valuable content.

Questions

You can contact us at questions@packtpub.com if you are having a problem with
any aspect of the book, and we will do our best to address it.


Getting Started
While parts of this book, mainly the first part, contain generic information on the
inner workings of all adaptive runtimes, the examples and in-depth information
still assume that the JRockit JVM is used. This chapter briefly explains how to obtain
the JRockit JVM and covers porting issues that may arise while deploying your Java
application on JRockit.
In this chapter, you will learn:
•	How to obtain JRockit
•	The platforms supported by JRockit
•	How to migrate to JRockit
•	About the command-line options to JRockit
•	How to interpret JRockit version numbers
•	Where to get help if you run into trouble

Obtaining the JRockit JVM

To get the most out of this book, the latest version of the JRockit JVM is required. For
JRockit versions prior to R27.5, a license key was required to access some of the more
advanced features in JRockit. As part of the Oracle acquisition of BEA Systems, the
license system was removed and it is now possible to access all features in JRockit
without any license key at all. This makes it much easier to evaluate JRockit and
to use JRockit in development. To use JRockit in production, a license must still be
purchased. For Oracle customers, this is rarely an issue, as JRockit is included with
most application suites, for example, any suite that includes WebLogic Server will
also include JRockit.


At the time of writing, the easiest way to get a JRockit JVM is to download and
install JRockit Mission Control—the diagnostics and profiling tools suite for JRockit.
The folder layout of the Mission Control distribution is nearly identical to that of any
JDK and can readily be used as a JDK. The authors would very much like to be able
to provide a self-contained JVM-only JDK for JRockit, but this is currently beyond
our control. We anticipate this will change in the near future.
Before JRockit Mission Control is downloaded, ensure that a supported platform
is used. The server part of Mission Control is supported on all platforms for which
JRockit is supported.
Following is the platform matrix for JRockit Mission Control 3.1.x:

Platform                  Java 1.4.2        Java 5.0          Java 6
Linux x86                 X                 X                 X
Linux x86-64              N/A               X                 X
Linux Itanium             X (server only)   X (server only)   N/A
Solaris SPARC (64-bit)    X (server only)   X (server only)   X (server only)
Windows x86               X                 X                 X
Windows x86-64            N/A               X (server only)   X (server only)
Windows Itanium           X (server only)   X (server only)   N/A

Following is the platform matrix for JRockit Mission Control 4.0.0:

Platform                  Java 5.0          Java 6
Linux x86                 X                 X
Linux x86-64              X                 X
Solaris SPARC (64-bit)    X (server only)   X (server only)
Windows x86               X                 X
Windows x86-64            X                 X
Note that the JRockit Mission Control client is not (yet) supported on Solaris, but
that 64-bit Windows support has been added in 4.0.0.
When running JRockit Mission Control on Windows, ensure that
the system's temporary directory is on a file system that supports
per-user file access rights. In other words, make sure it is not on a
FAT formatted disk. On a FAT formatted disk, essential features
such as automatic discovery of local JVMs will be disabled.


The easiest way to get to the JRockit home page is to go to your favorite search
engine and type in "download JRockit". You should end up on a page on the
Oracle Technology Network from which the JVM and the Mission Control suite
can be downloaded. The installation process varies between platforms, but should
be rather self-explanatory.

Migrating to JRockit

Throughout this book, we will refer to the directory where the JRockit JVM is installed
as JROCKIT_HOME. It might simplify things to make JROCKIT_HOME a system variable
pointing to that particular path. After the installation has completed, it is a good idea to
put the JROCKIT_HOME/bin directory on the path and to update the scripts for any Java
applications that should be migrated to JRockit. Setting the JAVA_HOME environment
variable to JROCKIT_HOME is also recommended. In most respects, JRockit is a direct
drop-in replacement for other JVMs, but some startup arguments, for example,
arguments that control specific garbage collection behavior, typically differ between
JVMs from different vendors. Common arguments, however, such as arguments for
setting a maximum heap size, tend to be standardized between JVMs.
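For example, on a Linux system, a startup script for a migrated application could
set up the environment along these lines (the installation path below is hypothetical
and will vary with the JRockit version installed):

export JROCKIT_HOME=/opt/jrockit-jdk1.6.0_14-R28
export JAVA_HOME=$JROCKIT_HOME
export PATH=$JROCKIT_HOME/bin:$PATH
java -version

If everything is set up correctly, the last command prints the JRockit version banner
shown later in this chapter.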
For more information about specific migration details, see the
Migrating Applications to the Oracle JRockit JDK Chapter in the
online documentation for JRockit.

Command-line options

There are three main types of command-line options to JRockit—system properties,
standardized options (-X flags), and non-standard ones (-XX flags).

System properties

Startup arguments to a JVM come in many different flavors. Arguments starting
with -D are interpreted as a directive to set a system property. Such system
properties can provide configuration settings for various parts of the Java class
libraries, for example RMI. JRockit Mission Control provides debugging information
if started with -Dcom.jrockit.mc.debug=true. In JRockit versions post R28, the use
of system properties to provide parameters to the JVM has been mostly deprecated.
Instead, most options to the JVM are provided through non-standard options and the
new HotSpot style VM flags.
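As a sketch of how a system property travels from the command line into a running
program, consider the following self-contained example (the property name
myapp.greeting is made up for illustration):

// Launched as: java -Dmyapp.greeting=hello PropertyDemo
public class PropertyDemo {
    public static void main(String[] args) {
        // Prints "hello" if the property was set with -D,
        // otherwise the supplied default value.
        System.out.println(System.getProperty("myapp.greeting", "not set"));
    }
}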


Standardized options

Configuration settings for the JVM typically start with -X for settings that are
commonly supported across vendors. For example, the option for setting the
maximum heap size, -Xmx, is the same on most JVMs, JRockit included. There
are a few exceptions here. The JRockit flag -Xverbose provides logging with
optional submodules. The similar (but more limited) flag in HotSpot is called
just -verbose.
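For instance, an initial and a maximum heap size can be requested with the
standardized -Xms and -Xmx options, and JRockit logging can be enabled with an
-Xverbose submodule, such as the locks module covered in Chapter 4 (the heap
sizes below are arbitrary examples):

java -Xms512M -Xmx1024M MyApplication
java -Xverbose:locks MyApplication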

Non-standard options

Vendor-specific configuration options are usually prefixed with -XX. These options
should be treated as potentially unsupported and subject to change without notice.
If any JVM setup depends on -XX-prefixed options, those flags should be removed
or ported before an application is started on a JVM from a different vendor.
Once the JVM options have been determined, the user application can be started.
Typically, moving an existing application to JRockit leads to an increase in runtime
performance and a slight increase in memory consumption.
The JVM documentation should always be consulted to determine if non-standard
command-line options have the same semantics between different JVMs and
JVM versions.

VM flags

In JRockit versions post R28, there is also a subset of the non-standard options called
VM flags. The VM flags use the -XX:&lt;flag&gt;=&lt;value&gt; syntax. These flags can also be
read and, depending on the particular flag, written using the command-line utility
JRCMD after the JVM has been started. For more information on JRCMD,
see Chapter 11.
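As an illustration, the lazy unlocking flag discussed in Chapter 4 could be disabled
at startup, and the flag values of a running JVM could later be inspected with the
list_vmflags command described in Chapter 11 (the process ID 4711 is made up, and
the exact flag syntax should be verified against the JRockit documentation):

java -XX:UseLazyUnlocking=false MyApplication
JROCKIT_HOME/bin/jrcmd 4711 list_vmflags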

Changes in behavior

Sometimes there is a change of runtime behavior when moving from one JVM to
another. Usually it boils down to different JVMs interpreting the Java Language
Specification or Java Virtual Machine Specification differently, but correctly. In
several places there is some leeway in the specification that allows different vendors
to implement the functionality in a way that best suits the vendor's architecture. If an
application relies too much on a particular implementation of the specification, the
application will almost certainly fail when switching to another implementation.
For example, during the milestone testing for an older version of Eclipse, some of
the tests started failing when running on JRockit. This was due to the tests having
inter-test dependencies, and this particular set of tests was relying on the test
harness running the tests in a particular order. The JRockit implementation of the
reflective listing of methods (Class#getDeclaredMethods) did not return the
methods in the same order as other JVMs, which according to the specification
is fine. It was later decided by the Eclipse development team that relying on a
particular method ordering was a bug, and the tests were consequently corrected.
If an application has not been written to the specification, but rather to the behavior
of the JVM from a certain vendor, it can fail. It can even fail when running with a
more recent version of the JVM from the same vendor. When in doubt, consult the
Java Language Specification and the documentation for the JDK.
Differences in performance may also be an issue when switching JVMs for an
application. Latent bugs that weren't an issue with one JVM may well be an issue
with another, if for example, performance differences cause events to trigger earlier
or later than before. These things tend to generate support issues but are rarely the
fault of the JVM.
For example, a customer reported that JRockit crashed after only a day. Investigation
concluded that the application also crashed with a JVM from another vendor, but
it took a few more days for the application to crash. It was found that the crashing
program ran faster in JRockit, and that the problem, a memory leak, simply came to
light much more quickly.
Naturally, any JVM, JRockit included, can have bugs. In order to brand itself "Java",
a Java Virtual Machine implementation has to pass an extensive test suite—the Java
Compatibility Kit (JCK).
JRockit is continuously subjected to a battery of tests using a distributed test system.
Large test suites, of which the JCK is one component, are run to ensure that JRockit
can be released as a stable, Java compatible, and certified JVM. Large test suites
from various high profile products, such as Eclipse and WebLogic Server, as well as
specially designed stress tests, are run on all supported platforms before a release
can take place. Continuous testing against performance regressions is also done as a
fundamental part of our QA infrastructure. Even so, bugs do happen. If JRockit does
crash, it should always be reported to Oracle support engineers.

A note on JRockit versioning

The way JRockit is versioned can be a little confusing. There are at least three version
numbers of interest for each JRockit release:
1. The JRockit JVM version.
2. The JDK version.
3. The Mission Control version.

One way to obtain the version number of the JVM is to run java -version from the
command prompt. This would typically result in something like the following lines
being printed to the console:
java version "1.6.0_14"
Java(TM) SE Runtime Environment (build 1.6.0_14-b08)
Oracle JRockit(R) (build R28.0.0-582-123273-1.6.0_
14-20091029-2121-windows-ia32, compiled mode)

The first version number is the JDK version being bundled with the JVM. This
number is in sync with the standard JDK versions, for the JDK shipped with
HotSpot. From the example, we can gather that Java 1.6 is supported and that it is
bundled with the JDK classes from update 14-b08. If you, for example, are looking
to see what JDK class-level security fixes are included in a certain release, this would
be the version number to check.
The JRockit version is the version number starting with an 'R'. In the above example
this would be R28.0.0. Each version of the JRockit JVM is built for several different
JDKs. The R27.6.5, for instance, exists in versions for Java 1.4, 1.5 (5.0) and 1.6 (6.0).
With the R28 version of JRockit, the support for Java 1.4 was phased out.
The number following the version number is the build number, and the number
after that is the change number from the versioning system. In the example, the
build number was 582 and the change number 123273. The two numbers after the
change number are the date (in compact ISO 8601 format) and time (CET) the build
was made. After that comes the operating system and CPU architecture that the JVM
was built for.
The version number for JRockit Mission Control can be gathered by executing
jrmc -version or jrmc -version | more from the command line.
On Windows, the JRockit Mission Control launcher (jrmc)
is based on the javaw launcher to avoid opening a console
window. Console output will not show unless explicitly
redirected, for example to more.


The output should look like this:
Oracle JRockit(R) Mission Control(TM) 4.0 (for JRockit R28.0.0)
java.vm.version = R28.0.0-582-123273-1.6.0_14-20091029-2121-windows-ia32
build = R28.0.0-582
chno = 123217
jrmc.fullversion = 4.0.0
jrmc.version = 4.0
jrockit.version = R28.0.0
year = 2009

The first line tells us what version of Mission Control this is and what version of
JRockit it was created for. The java.vm.version line tells us what JVM Mission
Control is actually running on. If Mission Control has been launched too "creatively",
for example by directly invoking its main class, there may be differences between
the JVM information in the two lines. If this is the case, some functionality in
JRockit Mission Control, such as automatic local JVM discovery, may be disabled.

Getting help

There are plenty of helpful resources on JRockit and JRockit Mission Control
available on the Oracle Technology Network, such as blogs, articles, and forums.
JRockit developers and support staff are continuously monitoring the forums, so
if an answer to a particular question cannot be found in the forums already, it is
usually answered within a few days. Some questions are asked more frequently
than others and have been made into "stickies"—forum posts that will stay at the
top of the topic listings. There is, for example, a "sticky" available on how to acquire
license files for older versions of JRockit.
The JRockit Forum can, at the time of writing, be found here:
http://forums.oracle.com/forums/forum.jspa?forumID=561
Here are the locations of some popular JRockit blogs:
http://blogs.oracle.com/jrockit/
http://blogs.oracle.com/hirt/
http://blogs.oracle.com/staffan/


Summary

This chapter provided a short guide for getting started with the JRockit JVM and for
migrating existing applications to the JRockit JVM. We covered installing JRockit and
provided insights into common pitfalls when migrating a Java application from one
JVM to another.
The different categories of command-line flags that JRockit supports were explained,
and we showed examples of how to find the version numbers for the different
components of the JRockit JDK.
Finally, we provided pointers to additional help.


Adaptive Code Generation
This chapter covers code generation and code optimization in a JVM runtime
environment, both as a general concept as well as taking a closer look at the JRockit
code generation internals. We start by discussing the Java bytecode format, and how
a JIT compiler works, making a case for the power of adaptive runtimes. After that,
we drill down into the JRockit JVM. Finally, the reader learns how to control code
generation and optimization in JRockit.
You will learn the following from this chapter:
•	The benefits of a portable platform-independent language such as Java.
•	The structure of the Java bytecode format and key details of the Java Virtual Machine specification.
•	How the JVM interprets bytecode in order to execute a Java program.
•	Adaptive optimizations at runtime versus static ahead-of-time compilation. Why the former is better but harder to do. The "gambling on performance" metaphor.
•	Why code generation in an adaptive runtime is potentially very powerful.
•	How Java can be compiled to native code, and what the main problems are. Where should optimizations be done—by the Java programmer, by the JVM, or at the bytecode level?
•	How the JRockit code pipeline works and its design rationales.
•	How to control the code generator in JRockit.


Platform independence

The main selling point for Java when it first came out, and the main contributor to its
success as a mainstream language, was the write once/run everywhere concept. Java
programs compile into platform-independent, compact Java bytecode (.class files).
There is no need to recompile a Java application for different architectures, since all
Java programs run on a platform-specific Java Virtual Machine that takes care of the
final transition to native code.
This greatly enhanced portability is a good thing. An application, such as a C++
program, that compiles to a platform-dependent format, has a lot less flexibility.
The C++ compiler may compile and heavily optimize the program, for example for
the x86 architecture. Then x86 will be the only architecture on which the program
can run. We can't readily move the program, optimizations and all, to SPARC. It has
to be recompiled, perhaps by a weaker compiler that doesn't optimize as well as for
x86. Also, if the x86 architecture is upgraded with new instructions, the program will
not be able to take advantage of these without being recompiled. Portability can of
course be achieved by distributing source code, but that may instead be subject to
various license restrictions. In Java, the portability problem is moved to the JVM,
and thus stops being the programmer's responsibility.
In the Java world, all platforms on which a JVM exists can execute Java.
Platform-independent bytecode is not a new concept per se, and has been used in
several languages in the past, for example Pascal and Smalltalk. However, Java was
the first language where it was a major factor in its widespread adoption.
When Java was new, its applications were mainly in the form of Applets, designed
for embedded execution in a web browser. Applets are typical examples of client-side
programs. However, Java is not only platform-independent, but it also has
several other nice intrinsic language properties such as built-in memory management
and protection against buffer overruns. The JVM also provides the application with
a secure sandboxed platform model. All of these things make Java ideal not only for
client applications, but also for complex server-side logic.
It took a few years before the benefits of Java as a server-side language were
fully acknowledged. Its inherent robustness led to markedly shorter application
development times compared to C++, and to widespread server adoption. Shorter
development cycles matter a lot when the application being developed is fairly
complex, as is typically the case on the server side.


The Java Virtual Machine

While platform-independent bytecode provides complete portability between
different hardware platforms, a physical CPU still can't execute it. The CPU
only knows how to execute its own particular flavor of native code.
Throughout this text, we will refer to code that is specific to a certain
hardware architecture as native code. For example, x86 assembly language or
x86 machine code is native code for the x86 platform. Machine code should
be taken to mean code in binary platform-dependent format. Assembly
language should be taken to mean machine code in human-readable form.

Thus, the JVM is required to turn the bytecodes into native code for the CPU on
which the Java application executes. This can be done in one of the following two
ways (or a combination of both):

• The Java Virtual Machine specification fully describes the JVM as a state machine, so there is no need to actually translate bytecode to native code. The JVM can emulate the entire execution state of the Java program, including emulating each bytecode instruction as a function of the JVM state. This is referred to as bytecode interpretation. The only native code (barring JNI) that executes directly here is the JVM itself.
• The Java Virtual Machine compiles the bytecode that is to be executed to native code for a particular platform and then calls the native code. When bytecode programs are compiled to native code, this is typically done one method at a time, just before the method in question is to be executed for the first time. This is known as Just-In-Time compilation (JIT).

Naturally, a native code version of a program executes orders of magnitude
faster than an interpreted one. The tradeoff is, as we shall see, bookkeeping
and compilation time overhead.

Stack machine

The Java Virtual Machine is a stack machine. All bytecode operations, with few
exceptions, are computed on an evaluation stack by popping operands from the
stack, executing the operation and pushing the result back to the stack. For example,
an addition is performed by pushing the two terms to the stack, executing an add
instruction that consumes the operands and produces a sum, which is placed on
the stack. The party interested in the result of the addition then pops the result.
In addition to the stack, the bytecode format specifies up to 65,536 registers or
local variables.

An operation in bytecode is encoded by just one byte, so Java supports up to 256
opcodes, most of whose available values are already claimed. Each operation has
a unique byte value and a human-readable mnemonic.
The only new bytecode value that has been assigned throughout the
history of the Java Virtual Machine specification is 0xba—previously
reserved, but about to be used for the new operation invokedynamic.
This operation can be used to implement dynamic dispatch when a
dynamic language (such as Ruby) has been compiled to Java bytecode.
For more information about using Java bytecode for dynamic languages,
please refer to Java Specification Request (JSR) 292 on the Internet.

Bytecode format

Consider the following example of an add method in Java source code and then in
Java bytecode format:
public int add(int a, int b) {
    return a + b;
}

public int add(int, int);
  Code:
   0:   iload_1     // stack: a
   1:   iload_2     // stack: a, b
   2:   iadd        // stack: (a+b)
   3:   ireturn     // stack:
}

The input parameters to the add method, a and b, are passed in local variable
slots 1 and 2 (Slot 0 in an instance method is reserved for this, according to the
JVM specification, and this particular example is an instance method). The first
two operations, with opcodes iload_1 and iload_2, push the contents of these
local variables onto the evaluation stack. The third operation, iadd, pops the two
values from the stack, adds them and pushes the resulting sum. The fourth and
final operation, ireturn, pops the sum from the bytecode stack and terminates
the method using the sum as return value. The bytecode in the previous example
has been annotated with the contents of the evaluation stack after each operation
has been executed.
Bytecode for a class can be dumped using the javap command with
the -c command-line switch. The command javap is part of the JDK.
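For example, assuming the add method above lives in a class called Adder (a file and class name we have made up for illustration), the bytecode can be dumped like this:

javac Adder.java
javap -c Adder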


Operations and operands

As we see, Java bytecode is a relatively compact format, the previous method
only being four bytes in length (a fraction of the source code mass). Operations
are always encoded with one byte for the opcode, followed by an optional number
of operands of variable length. Typically, a bytecode instruction complete with
operands is just one to three bytes.
Here is another small example, a method that determines if a number is even or not.
The bytecode has been annotated with the hexadecimal values corresponding to the
opcodes and operand data.
public boolean even(int number) {
    return (number & 1) == 0;
}

public boolean even(int);
  Code:
   0:   iload_1        // 0x1b              number
   1:   iconst_1       // 0x04              number, 1
   2:   iand           // 0x7e              (number & 1)
   3:   ifne    10     // 0x9a 0x00 0x07
   6:   iconst_1       // 0x04              1
   7:   goto    11     // 0xa7 0x00 0x04
  10:   iconst_0       // 0x03              0
  11:   ireturn        // 0xac
}

The program pushes its in-parameter, number, and the constant 1 onto the evaluation
stack. The values are then popped, ANDed together, and the result is pushed back on
the stack. The ifne instruction is a conditional branch that pops its operand from
the stack and branches if it is not zero. The iconst_0 operation pushes the constant
0 onto the evaluation stack. It has the opcode value 0x3 in bytecode and takes no
operands. In a similar fashion iconst_1 pushes the constant 1. The constants are
used for the boolean return value.
Compare and jump instructions, for example ifne (branch on not equal, bytecode
0x9a), generally take two bytes of operand data (enough for a 16-bit jump offset).
For example, if a conditional jump should move the instruction pointer
10,000 bytes forward in the case of a true condition, the operation would
be encoded as 0x9a 0x27 0x10 (0x2710 is 10,000 in hexadecimal. All
values in bytecode are big-endian).

Other more complex constructs such as table switches also exist in bytecode with
an entire jump table of offsets following the opcode in the bytecode.

The constant pool

A program requires data as well as code. Data is used for operands. The operand
data for a bytecode program can, as we have seen, be kept in the bytecode instruction
itself. But this is only true when the data is small enough, or commonly used (such as
the constant 0).
Larger chunks of data, such as string constants or large numbers, are stored
in a constant pool at the beginning of the .class file. Indexes to the data in
the pool are used as operands instead of the actual data itself. If the string
aVeryLongFunctionName had to be separately encoded in a compiled method
each time it was operated on, bytecode would not be compact at all.
Furthermore, references to other parts of the Java program in the form of method, field,
and class metadata are also part of the .class file and stored in the constant pool.
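The constant pool of a compiled class can also be inspected with the javap tool; the -verbose flag prints the pool entries along with other class file details (the class name below is just a placeholder):

javap -verbose MyClass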

Code generation strategies

There are several ways of executing bytecode in a JVM, from just emulating the
bytecode in a pure bytecode interpreter to converting everything to native code
for a particular platform.

Pure bytecode interpretation

Early JVMs contained only simple bytecode interpreters as a means of executing
Java code. To simplify this a little, a bytecode interpreter is just a main function
with a large switch construct on the possible opcodes. The function is called with
a state representing the contents of the Java evaluation stack and the local variables.
Interpreting a bytecode operation uses this state as input and output. All in all,
the fundamentals of a working interpreter shouldn't amount to more than a
couple of thousand lines of code.
There are several simplicity benefits to using a pure interpreter. An interpreting
JVM just needs to be recompiled to support a new hardware architecture. No new
native compiler needs to be written. Also, a native compiler for just one platform
is probably much larger than our simple switch construct.
A pure bytecode interpreter also needs little bookkeeping. A JVM that compiles
some or all methods to native code would need to keep track of all compiled code.
If a method is changed at runtime, which Java allows, it needs to be scheduled
for regeneration as the old code is obsolete. In a pure interpreter, its new bytecodes
are simply interpreted again from the start the next time that we emulate a call to
the method.

It follows that the amount of bookkeeping in a completely interpreted model is
minimal. This lends itself well to being used in an adaptive runtime such as a
JVM, where things change all the time.
Naturally, there is a significant performance penalty to a purely interpreted language
when comparing the execution time of an interpreted method with a native code
version of the same code. Sun Microsystems' Classic Virtual Machine started out as
a pure bytecode interpreter.
Running our previous add method, with its four bytecode instructions, might easily
require the execution of ten times as many native instructions in an interpreter
written in C. A native version of our add, on the other hand, would most likely be
just two assembly instructions (add and return).
int evaluate(int opcode, int* stack, int* localvars) {
    switch (opcode) {
    ...
    case iload_1:
    case iload_2: {
        int lslot = opcode - iload_0;    // iload_<n> reads local var slot n
        stack[sp++] = localvars[lslot];
        break;
    }
    case iadd: {
        int sum = stack[--sp] + stack[--sp];
        stack[sp++] = sum;
        break;
    }
    case ireturn:
        return stack[--sp];
    ...
    }
}

The previous example shows simple pseudo code for a bytecode interpreter with
just enough functionality to execute our add method. Even this simple code snippet
amounts to tens of assembly instructions in the natively compiled JVM. Considering
that a natively compiled version of the add method would just be two instructions,
this illustrates the performance problem with pure bytecode interpretation.
JIT compiling the add method on x86 yields us:

add eax, edx    // eax = edx + eax
ret             // return eax


Note that this book will sometimes show assembly code examples in
places, to illustrate points. No prior knowledge of assembly code on
any platform is needed to reap the full benefits of the text. However, the
concept of low level languages should be familiar to the reader. If you
feel yourself breaking out in a cold sweat over the assembly listings that
are displayed in a few places throughout the text, don't worry too much.
They are not necessary to understand the big picture.

Static compilation

In the early days of Java, several simple "brute force" approaches to getting around the
bytecode performance problem were made. These usually involved static compilation
in some form. Usually, an entire Java program was compiled into native code before
execution. This is known as ahead-of-time compilation. Basically, ahead-of-time
compilation is what your average C++ compiler does all the time.
As a limited subset of the problem of static compilation for Java is easy to solve,
a number of products appeared in the late nineties, using methodologies like turning
bytecodes into naive C code and then passing it to a C compiler. Most of the time,
the resulting code ran significantly faster than purely interpreted bytecode.
However, these kinds of products rarely supported the full dynamic nature of the
Java language and were unable to gracefully handle things like code being replaced
at runtime without large workarounds.
The obvious disadvantage of static compilation for Java is that the benefits of platform
independence immediately disappear. The JVM is removed from the equation.
Another disadvantage is that the automatic memory management of Java has
to be handled more or less explicitly, leading to limited implementations with
scalability issues.
As Java gradually moved more and more towards server side applications, where its
dynamic nature was put to even more use, static solutions became impractical. For
example, an application server generating plenty of Java Server Pages (JSPs) on the
fly reduces a static compiler to a JIT compiling JVM, only slower and less adaptive.


Note that static ahead-of-time solutions, while unsuitable for
implementing Java, can be useful in certain other contexts, for example
ahead-of-time analysis. Program analysis is a time-consuming
business. If some of it can be done offline, before program execution, and
communicated to the JVM, there may be performance benefits to be had.
For example, .class files may be annotated with offline profiling data,
perhaps in the form of Java Annotations.
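As a purely hypothetical sketch of what that could look like (the annotation and its field are invented for this example and are not part of any real JVM interface):

import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Hypothetical annotation carrying offline profiling data in the .class file.
@Retention(RetentionPolicy.CLASS)
@interface ProfiledHot {
    int expectedInvocations();
}

class OrderProcessor {
    // A JVM that understood the annotation could JIT compile and optimize
    // this method eagerly, skipping part of the usual warm-up phase.
    @ProfiledHot(expectedInvocations = 1000000)
    void process() {
        // ...
    }
}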

Total JIT compilation

Another way to speed up bytecode execution is to not use an interpreter at all, and JIT
compile all Java methods to native code immediately when they are first encountered.
The compilation takes place at runtime, inside the JVM, not ahead-of-time.
Unlike completely static ahead-of-time compilation, on-the-fly compilation fits the
Java model of a mobile, adaptive language better.
Total JIT compilation has the advantage that we do not need to maintain an
interpreter, but the disadvantage is that compile time becomes a factor in the total
runtime. While we definitely see benefits in JIT compiling hot methods, we also
unnecessarily spend expensive compile time on cold methods and methods that
are run only once. Those methods might as well have been interpreted instead.

A frequently executed method is said to be hot. A method that is not
frequently executed and doesn't contribute to the overall program
performance regardless of its implementation is said to be cold.

This can be remedied by implementing different levels of compiler quality in the
JIT compiler, starting out with every method as a quick and dirty version. When
the JVM knows that a method is hot, for example if the number of invocations of
the method reaches a certain threshold value, it can be queued for recompilation
with more optimizations applied. This naturally takes longer.


The main disadvantage of total JIT compilation is still low code generation speed.
In the same way that an interpreted method executes hundreds of times slower than
a native one, a native method that has to be generated from Java bytecodes takes
hundreds of times longer to get ready for execution than an interpreted method.
When using total JIT compilation, it is extremely important to spend clock cycles on
optimizing code only where it will pay off in better execution time. The mechanism
that detects hot methods has to be very advanced, indeed. Even a quick and dirty JIT
compiler is still significantly slower at getting code ready for execution than a pure
interpreter. The interpreter never needs to translate bytecodes into anything else.
Another issue that becomes more important with total JIT compilation is the large
amounts of throwaway code that is produced. If a method is regenerated, for
example because assumptions made by the compiler are no longer valid, the old code
takes up precious memory. The same is true for the old version of a method that has
been optimized.
Therefore, the JVM requires some kind of "garbage collection" for generated code
or a system with large amounts of JIT compilation would slowly run out of native
memory as code buffers grow.
JRockit is an example of a JVM that uses an advanced variant of total JIT compilation
as its code generation strategy.

Mixed mode interpretation

The first workable solution that was proposed, which would both increase
execution speed and not compromise the dynamic nature of Java, was mixed
mode interpretation.
In a JVM using mixed mode interpretation, all methods start out as interpreted when
they are first encountered. However, when a method is found to be hot, it is scheduled
for JIT compilation and turned into more efficient native code. This adaptive approach
is similar to that of keeping different code quality levels in the JIT, described in the
previous section.
Detecting hot methods is a fundamental functionality of every modern JVM,
regardless of code execution model, and it will be covered to a greater extent later
in this chapter. Early mixed mode interpreters typically detected the hotness of a
method by counting the number of times it was invoked. If this number was large
enough, optimizing JIT compilation would be triggered for the method.
Similar to total JIT compilation, if the process of determining if a method is hot is
good enough, the JVM spends compilation time only on the methods where it makes
the most difference. If a method is seldom executed, the JVM would waste no time
turning it into native code, but rather keep interpreting it each time that it is called.

Bookkeeping JIT code is a simple problem with mixed mode interpretation. If a version
of a compiled method needs to be regenerated or an assumption is invalidated, its code
is thrown out. The next time the method is called, it will once again be interpreted. If
the method is still hot, it will eventually be recompiled with the changed model of the
world incorporated.
Sun Microsystems was the first vendor to embrace mixed mode
interpretation in the HotSpot compiler, available both in a client
version and a server side version, the latter with more advanced code
optimizations. HotSpot, in turn, was based on technology acquired
from Longview Technologies LLC (which started out as Animorphic).

Adaptive code generation

Java is dynamic in nature and certain code generation strategies fit less well than
others. From the earlier discussion, the following conclusions can be drawn:

• Code generation should be done at runtime, not ahead of time.
• All methods cannot be treated equally by the code generator. There needs to be a way to discern a hot method from a cold one. Otherwise, unnecessary optimization effort is spent on cold methods, or worse, not enough optimization effort on hot methods.
• In a JIT compiler, bookkeeping needs to be in place in order to keep up with the adaptive runtime. This is because generated native code invalidated by changes to the running program must be thrown away and potentially regenerated.

Achieving code execution efficiency in an adaptive runtime, no matter what JIT
or interpretation strategy it uses, all boils down to the equation:
Total Execution Time = Code Generation Time + Execution Time
In other words, if we spend lots of effort carefully generating and optimizing every
method to make sure it turns into efficient native code, we contribute too much code
generation time to the total execution time. We want the JVM to execute our Java
code in every available clock cycle, not use the expensive cycles to garbage collect
or generate code.
If we spend too little time preparing methods for execution, their runtime performance
is likely to be bad and thus contribute too many "inefficient" cycles to the total
execution time.
The JVM needs to know precisely which methods are worth the extra time spent
on more elaborate code generation and optimization efforts.

There are, of course, other aspects of total execution time, such as time spent in
garbage collection. This, however, is beyond the scope of this chapter and will be
covered in more detail in the chapter on memory management. Here it is sufficient
to mention that the code optimizer can sometimes help reduce garbage collection
overhead by generating efficient code that is less memory bound. One example
would be by applying escape analysis, which is briefly covered later in
this chapter.

Determining "hotness"

As we have seen, "one size fits all" code generation that interprets every method,
or JIT compiling every method with a high optimization level, is a bad idea in an
adaptive runtime. The former, because although it keeps code generation time
down, execution time goes way up. The latter, because even though execution is fast,
generating the highly optimized code takes up a significant part of the total runtime.
We need to know if a method is hot or not in order to know if we should give it lots
of code generator attention, as we can't treat all methods the same.
Profiling to determine "hotness" can, as was hinted at in the previous sections, be
implemented in several different ways. The common denominator for all ways of
profiling is that a number of samples of where code spends execution time are collected.
These are used by the runtime to make optimization decisions—the more samples
available, the better informed decisions are made. Just a few isolated samples in
different methods won't really tell us much about the execution profile of a program.
Naturally, collecting samples always incurs some overhead in itself, and there is a
tradeoff between having enough samples and the overhead of collecting them.

Invocation counters

One way to sample hot methods is to use invocation counters. An invocation counter
is typically associated with each method and is incremented when the method is
called. This is done either by the bytecode interpreter or in the form of an extra add
instruction compiled into the prologue of the native code version of the method.
Especially in the JIT compiled world, where code execution speed doesn't
disappear into interpretation overhead, invocation counters may incur some
visible runtime overhead, usually in the form of cache misses in the CPU. This
is because a particular location in memory has to be frequently written to by the
add at the start of each method.
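A minimal sketch of the idea in Java, purely for illustration (the class, the threshold value, and the OptimizationQueue type are all invented; a real JVM implements this in native code or directly in the compiled method prologue):

// Hypothetical per-method profiling record kept by the runtime.
class MethodProfile {
    static final int HOT_THRESHOLD = 10000; // invented threshold value

    private int invocationCount;

    // Conceptually executed in the method prologue on every call, either
    // by the interpreter or as an extra add instruction in native code.
    void onInvocation(OptimizationQueue queue) {
        if (++invocationCount == HOT_THRESHOLD) {
            // The method is now considered hot; schedule it for
            // recompilation at a higher optimization level.
            queue.enqueue(this);
        }
    }
}

// Invented stand-in for the JVM's optimization queue.
interface OptimizationQueue {
    void enqueue(MethodProfile method);
}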


Software-based thread sampling

Another, more cache friendly, way to determine hotness is by using thread
sampling. This means periodically examining where in the program Java threads are
currently executing and logging their instruction pointers. Thread sampling requires
no code instrumentation.
Stopping threads, which is normally required in order to extract their contexts, is,
however, quite an expensive operation. Thus, getting a large number of samples
without disrupting anything at all requires a complete JVM-internal thread
implementation, a custom operating system such as in Oracle JRockit Virtual Edition,
or specialized hardware support.

Hardware-based sampling

Certain hardware platforms, such as Intel IA-64, provide hardware instrumentation
mechanisms that may be used by an application. One example is the hardware IP
sample buffer. While generating code for IA-64 is a rather complex business, at least
the hardware architecture allows for collecting a large amount of samples cheaply,
thus facilitating better optimization decisions.
Another benefit of hardware-based sampling is that it may provide other data, not
just instruction pointers, cheaply. For example, hardware profilers may export data
on how often a hardware branch predictor makes an incorrect assumption, or on
how often the CPU caches miss in particular locations. The runtime can use this
information to generate more optimal code. Inverting the condition of the jump
instruction that caused the branch prediction miss, or prefetching data ahead
of the instruction that caused the cache miss, could remedy these issues. Thus, efficient
hardware-based sampling can lay an excellent groundwork for further adaptive
code optimizations in the runtime.

Optimizing a changing program

In assembly code, method calls typically end up as call instructions. Variants of
these exist in all hardware architectures. Depending on the type of call, the format
of the call instruction varies.
In object-oriented languages, virtual method dispatch is usually compiled as indirect
calls (that is the destination has to be read from memory) to addresses in a dispatch
table. This is because a virtual call can have several possible receivers depending on
the class hierarchy. A dispatch table exists for every class and contains the receivers
of its virtual calls. A static method or a virtual method that is known to have only
one implementation can instead be turned into a direct call with a fixed destination.
This is typically much faster to execute.

In native code, a static call would look something like:

call 0x2345670         (a jump to a fixed location)

A virtual call would look something like:

mov  eax, [esi]        (load type info from receiver in esi)
call [eax+0x4c]        (eax + 0x4c is the dispatch table entry)

As we have to dereference memory twice for the virtual call, it is slower
than just calling a fixed destination address.

Consider a static environment, such as a compiled C++ program. For the code
generator, everything that can be known about the application is known at
compile time. For example, we know that any given virtual method with a single
implementation will never be overridden by another, simply because no overriding
method exists and no new code can enter the system. This not only removes the need
for the extra bookkeeping required for throwing out old code, but it also allows
the C++ compiler to generate static calls to the virtual method.
Now, consider the same virtual method in a Java program. At the moment, it exists
in only one version, but Java allows it to be overridden at any time during
program execution. When the JIT compiler wants to generate a call to this method,
it would prefer that the method remained a single implementation forever. Then,
the previous C++ optimization can be used and the call can be generated as a fast
fixed call instead of a slower virtual dispatch. However, if the method is not declared
final, it can be overridden at any time. It looks like we don't dare use the direct call
at all, even though it is highly unlikely that the method will ever be overridden.
There are several other situations in Java where the world looks good right now
to the compiler, and optimizations can be applied, but if the world changes in the
future, the optimizations would have to be immediately reverted. For compiled Java,
in order to match compiled C++ in speed, there must be a way to do these kinds of
optimizations anyway.
The JVM solves this by "gambling". It bases its code generation decisions on
assumptions that the world will remain unchanged forever, which is usually
the case. If it turns out not to be so, its bookkeeping system triggers callbacks if
any assumption is violated. When this happens, the code containing the original
assumption needs to be regenerated—in our example the static dispatch needs to be
replaced by a virtual one. Having to revert code generated based on an assumption
about a closed world is typically very costly, but if it happens rarely enough, the
benefit of the original assumption will deliver a performance increase anyway.
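As a concrete illustration of such a gamble, consider devirtualization (the classes below are our own, made up for this example):

// Shape currently has exactly one implementation of area().
class Shape {
    double area() {
        return 0.0;
    }
}

class Renderer {
    double totalArea(Shape[] shapes) {
        double sum = 0.0;
        for (Shape s : shapes) {
            // The JIT may gamble that area() is never overridden and
            // compile this as a direct (possibly inlined) call.
            sum += s.area();
        }
        return sum;
    }
}

// If a subclass overriding area() is loaded later, for example:
//     class Circle extends Shape { double area() { ... } }
// the assumption is violated, and the compiled code for totalArea()
// must be patched or regenerated to use virtual dispatch again.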


Some typical assumptions that the JIT compiler and JVM, in general, might bet on are:

• A virtual method probably won't be overridden. As it exists in only one version, it can always be called with a fixed destination address, like a static method.
• A float will probably never be NaN. We can use hardware instructions instead of an expensive call to the native floating point library that is required for corner cases.
• The program probably won't throw an exception in a particular try block. Schedule the catch clause as cold code and give it less attention from the optimizer.
• The hardware instruction fsin probably has the right precision for most trigonometry. If it doesn't, cause an exception and call the native floating point library instead.
• A lock probably won't be too saturated and can start out as a fast spinlock.
• A lock will probably be repeatedly taken and released by the same thread, so the unlock operation and future reacquisitions of the lock can optimistically be treated as no-ops.

A static environment that was compiled ahead of time and runs in a closed world
cannot, in general, make these kinds of assumptions. An adaptive runtime, however,
can revert decisions that turn out to be wrong when the criteria they were based on
are violated. In theory, it can make any crazy assumption that might pay off, as long
as it can be reverted at a small enough cost. Thus, an adaptive runtime is potentially
far more powerful than a static environment, given that the "gambling" pays off.
Getting the gambling right is a very difficult problem. If we assume that relatively
rare events will occur frequently, in order to avoid regenerating code, we can
never achieve anything near the performance of a static compiler. However, if very
frequent events are assumed to be rare, we will instead have to pay the penalty in
increased code generation time for reoptimizations or invalidations. There is a fine
middle ground here concerning which kinds of assumptions can be made. There is a
significant art to finding this middle ground, and this is where a high performance
runtime can make its impact. Given that we find this middle ground—and JRockit is
based on runtime information feedback in all relevant areas to make the best
decisions—an adaptive runtime has the potential to outperform a static environment
every time.


Inside the JIT compiler

While it is one thing to compile bytecodes to native code and have it executed within
the JVM, getting it to run as efficiently as possible is a different story. This is where
40 years of research into compilers is useful, along with some insight into the Java
language. This section discusses how a JIT compiler can turn bytecode into efficient
native code.

Working with bytecode

A compiler for a programming language typically starts out with source code,
such as C++. A Java JIT compiler, in a JVM, is different in that it has to
start out with Java bytecode, parts of which are quite low level and assembly-like.
The JIT compiler frontend, similar to a C++ compiler frontend, can be reused on
all architectures, as it's all about tokenizing and understanding a format that is
platform-independent—bytecode.
While compiled bytecode may sound low level, it is still a well-defined format
that keeps its code (operations) and data (operands and constant pool entries)
strictly separated from each other. Parsing bytecode and turning it into a program
description for the compiler frontend actually has a lot more in common with
compiling Java or C++ source code, than trying to deconstruct a binary executable.
Thus, it is easier to think of bytecode as just a different form of source code—a
structured program description. The bytecode format adds no serious complexities
to the compiler frontend compared to source code. In some ways, bytecode helps
the compiler by being unambiguous. Types of variables, for instance, can always
be easily inferred by the kind of bytecode instruction that operates on a variable.
However, bytecode also makes things more complex for the compiler writer.
Compiling bytecode to native code is, somewhat surprisingly, in some ways
harder than compiling human-readable source code.
One of the problems that has to be solved is the evaluation stack metaphor that
the Java Virtual Machine specification mandates. As we have seen, most bytecode
operations pop operands from the stack and push results. No native platforms are
stack machines, rather they rely on registers for storing intermediate values. Mapping
a language that uses local variables to native registers is straightforward, but mapping
an evaluation stack to registers is slightly more complex. Java also defines plenty of
virtual registers, local variables, but uses an evaluation stack anyway. It is the authors'
opinion that this is less than optimal. One might argue that it is strange that the
virtual stack is there at all, when we have plenty of virtual registers. Why isn't an add
operation implemented simply as "x = y + z" instead of "push y, push z, add, pop x"?
Clearly the former is simpler, given that we have an ample supply of registers.

It turns out that when one needs to compile bytecode to native code, the stack
metaphor often adds extra complexity. In order to reconstruct an expression, such
as add, the contents of the execution stack must be emulated at any given point in
the program.
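A minimal sketch of what such emulation might look like (the class and its expression types are invented for illustration; the opcode values are the real ones from the JVM specification):

import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical compiler frontend fragment: rebuilds expressions such as
// "x = y + z" by emulating the evaluation stack while walking the bytecode.
class StackEmulator {
    // Minimal expression tree, invented for this sketch.
    interface Expr {}
    static class LocalVar implements Expr {
        final int slot;
        LocalVar(int slot) { this.slot = slot; }
    }
    static class BinOp implements Expr {
        final String op;
        final Expr left, right;
        BinOp(String op, Expr left, Expr right) {
            this.op = op; this.left = left; this.right = right;
        }
    }

    static final int ILOAD = 0x15, IADD = 0x60, ISTORE = 0x36;

    private final Deque<Expr> stack = new ArrayDeque<Expr>();

    void visit(int opcode, int operand) {
        switch (opcode) {
        case ILOAD:   // push local variable slot
            stack.push(new LocalVar(operand));
            break;
        case IADD: {  // pop two operands, push the reconstructed sum
            Expr right = stack.pop();
            Expr left = stack.pop();
            stack.push(new BinOp("+", left, right));
            break;
        }
        case ISTORE:  // pop: we now know "localvar = expression"
            emitAssignment(operand, stack.pop());
            break;
        }
    }

    private void emitAssignment(int slot, Expr value) {
        // Hand the recovered expression off to the next compiler stage.
    }
}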
Another problem, which in rare cases may be a design advantage, is the ability of Java
bytecode to express more than Java source code. This sounds like a good thing
when it comes to portability—Java bytecode is a mobile format executable by any
JVM. Wouldn't it make sense to decouple the Java source code from the bytecode
format, so that compilers for other languages could also target Java Virtual Machines?
Of course it would, and it was probably argued
that this would further help the spread and adoption of Java early in its design
stage. However, for some reason or other, auto-generated bytecode from foreign
environments is rarely encountered. A small number of products that turn other
languages into Java bytecode exist, but they are rarely used. It seems that when the
need for automatic bytecode generation exists, the industry prefers to convert the
alien source code to Java and then compile the generated Java code. Also, when
auto-generated Java code exists, it tends to conform pretty much to the structure
of compiled Java source code.

[Figure: the power of expression of Java bytecode is a superset of the power of expression of Java source code.]

The problem that bytecode can express more than Java has led to the need
for bytecode verification in the JVM, a requirement defined by the Java
Virtual Machine specification. Each JVM implementation needs to check
that the bytecode of an executing program does not attempt any malicious
tricks, such as jumping outside the method, overflowing the evaluation
stack, or creating recursive subroutines.

Though bytecode portability and cross-compiling several languages to bytecode
are potentially good things, they also lead to problems. This is especially because
bytecode allows unstructured control flow. Control flow with gotos to arbitrary
labels is available in bytecode, which is not possible in the Java language. Therefore,
it is possible to generate bytecodes that have no Java source code equivalent.


Allowing bytecodes that have no Java equivalent can lead to some other problems.
For example, how would a Java source code debugger handle bytecode that cannot
be decompiled into Java?
Consider the following examples:

• In bytecode, it is conceivable to create a goto that jumps into the body of a loop from before the loop header (irreducible flow graphs). This construct is not allowed in Java source code. Irreducible flow graphs are a classic obstacle for an optimizing compiler.
• It is possible to create a try block that is its own catch block. This is not allowed in Java source code.
• It is possible to take a lock in one method and release it in another method. This is not allowed in Java source code and will be discussed further in Chapter 4.

Bytecode obfuscation

The problem of bytecode expressing more than source code is even more complex.
Through the years, various bytecode obfuscators have been sold with promises of
"protecting your Java program from prying eyes". For Java bytecode this is mostly
a futile exercise because, as we have already discussed, there is a strict separation
between code and data. Classic anti-cracking techniques are designed to make it
hard or impossible for adversaries to find a sensitive place in a program. Typically,
this works best in something like a native binary executable, such as an .exe file,
where distinctions between code and data are less clear. The same applies to an
environment that allows self-modifying code. Java bytecode allows none of these.
So, for a human adversary with enough determination, any compiled Java program
is more vulnerable by design.
Bytecode obfuscators use different techniques to protect bytecode. Usually, it boils
down to name mangling or control flow obfuscation.
Name mangling means that the obfuscator goes over all the variable info and field
and method names in a Java program, changing them to short and inexplicable
strings, such as a, a_, and a__ (or even more obscure Unicode strings) instead of
getPassword, setPassword, and decryptPassword. This makes it harder for an
adversary to crack your program, since no clues can be gleaned from method and
field names. Name mangling isn't too much of a problem for the compiler writer,
as no control flow has been changed from the Java source code.
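For instance, a name-mangling obfuscator might perform a transformation along these lines (our own illustration, with method bodies elided):

// Original source:
class PasswordManager {
    String getPassword() { /* ... */ return null; }
    void setPassword(String password) { /* ... */ }
}

// What a decompiler sees after name mangling:
class a {
    String a() { /* ... */ return null; }
    void a_(String a) { /* ... */ }
}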


It is more problematic if the bytecode obfuscator deliberately creates unstructured
control flow not allowed in Java source code. This technique is used to prevent
decompilers from reconstructing the original program. Sadly though, obfuscated
control flow usually leads to the compiler having to do extra work, restructuring
the lost control flow information that is needed for optimizations. Sometimes it isn't
possible for the JVM to do a proper job at all, and the result is lowered performance.
Thus, control flow obfuscation should be avoided.

Bytecode "optimizers"

Various bytecode "optimizers" are also available in the market. They were especially
popular in the early days of Java, but they are still encountered from time to time.
Bytecode "optimizers" claim performance through restructuring bytecodes into
more "efficient" forms. For example, divisions with powers of two can be replaced
by shifts, or a loop can be inverted, potentially saving a goto instruction.
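For example, such a tool might attempt a rewrite along these lines (our own illustration; note that this particular transformation is not even semantics-preserving for negative numbers, which hints at why such rewrites can change program behavior):

class Divisions {
    // Straightforward code, as compiled by javac:
    int half(int x) {
        return x / 2;
    }

    // What a bytecode "optimizer" might turn it into:
    int halfShifted(int x) {
        return x >> 1;   // NOT equivalent: -3 / 2 == -1, but -3 >> 1 == -2
    }
}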
In modern JVMs, we have failed to find proof that "optimized" bytecodes are
superior to unaltered ones straight out of javac. A modern JVM has a code
generator well capable of doing a fine job optimizing code, and even though
bytecode may look low level, it certainly isn't to the JVM. Any optimization
done already at the bytecode level is likely to be retransformed into something
else several times over on the long journey to native code.
We have never seen a case where a customer has been able to demonstrate a
performance benefit from bytecode optimization. However, we have frequently
run into customer cases where the program behavior isn't the expected one and
varies between VMs because of failed bytecode optimization.
Our advice is to not use bytecode optimizers, ever!


Abstract syntax trees

As we have seen, Java bytecode has its advantages and disadvantages. The authors
find it helpful just to think of bytecode as serialized source code, and not as some
low level assembler that needs to run as fast as possible. In an interpreter, bytecode
performance matters, but not to a great extent as the interpretation process is so
slow anyway. Performance comes later in the code pipeline.
While bytecode is both compact and extremely portable, it suffers from
the strength of expression problem. It contains low-level constructs such
as gotos and conditional jumps, and even the dreaded jsr (jump to
subroutine, used for implementing finally clauses) instruction. As of
Java 1.6, however, subroutines are inlined instead by javac and most
other Java compilers.

A bytecode to native compiler can't simply assume that the given bytecode is
compiled Java source code, but needs to cover all eventualities. A compiler whose
frontend reads source code (be it Java, C++, or anything else) usually works by
first tokenizing the source code into known constructs and building an Abstract
Syntax Tree (AST). Clean ASTs are possible only if control flow is structured and
no arbitrary goto instructions exist, which is the case for Java source code. The AST
represents code as sequences, expressions, or iterations (loop nodes). Doing an
in-order traversal of the tree reconstructs the program. The AST representation
has many advantages compared to bytecode.
For example, consider the following method that computes the sum of the elements
in an array:
public int add(int[] series) {
    int sum = 0;
    for (int i = 0; i < series.length; i++) {
        sum += series[i];
    }
    return sum;
}


When turning it into bytecode, the javac compiler most likely first creates an
abstract syntax tree that looks something like this:
[Figure: abstract syntax tree for the add method. The root is a SEQUENCE node containing the assignment sum = 0, a LOOP node, and RETURN sum. The LOOP node has init i = 0, condition i < series.length, iteration i = i + 1, and a body performing sum = sum + aload(series, i).]

Several important prerequisites for code optimization, such as identifying loop
invariants and loop bodies, require expensive analysis in a control flow graph.
Here, this comes very cheap, as loops are given implicitly by the representation.


However, in order to create the bytecode, the structured for loop, probably already
represented as a loop node in the Java compiler's abstract syntax tree, needs to be
broken up into conditional and unconditional jumps.
public int add(int[]);
  Code:
   0:   iconst_0          // sum = 0
   1:   istore_2
   2:   iconst_0          // i = 0
   3:   istore_3
   4:   iload_3           // loop_header:
   5:   aload_1
   6:   arraylength
   7:   if_icmpge  22     // if (i >= series.length) then goto 22
  10:   iload_2
  11:   aload_1
  12:   iload_3
  13:   iaload
  14:   iadd
  15:   istore_2          // sum += series[i]
  16:   iinc  3, 1        // i++
  19:   goto  4           // goto loop_header
  22:   iload_2
  23:   ireturn           // return sum

Now, without structured control flow information, the bytecode compiler has to
spend expensive CPU cycles restructuring control flow information that has been
lost, sometimes irretrievably.
Perhaps, in retrospect, it would have been a better design decision to directly use
an encoded version of the compiler's ASTs as the bytecode format. Various academic
papers have shown that ASTs are possible to represent in an equally compact or
more compact way than Java bytecode, so space is not a problem. Interpreting an
AST at runtime would also only be slightly more difficult than interpreting bytecode.
The earliest versions of the JRockit JIT used a decompiling frontend.
Starting from bytecode, it tried to recreate the ASTs present when
javac turned source code into bytecode. If unsuccessful, the
decompiler fell back to more naive JIT compilation. Reconstructing
ASTs, however, turned out to be a very complex problem and the
decompiler was scrapped in the early 2000s, to be replaced by a unified
frontend that created control flow graphs, able to support arbitrary
control flow directly from bytecode.


Where to optimize

Programmers tend to optimize their Java programs prematurely. This is completely
understandable. How can you trust the black box, the JVM below your application,
to do something optimal with such a high-level construct as Java source code?
Of course this is partially true, but even though the JVM can't fully create an
understanding of what your program does, it can still do a lot with what it gets.
It is sometimes surprising how much faster a program runs after automatic adaptive
optimization, simply because the JVM is better at detecting patterns in a very large
runtime environment than a human. On the other hand, some things lend themselves
better to manual optimization. This book is in no way expressing the viewpoint that all
code optimizations should be left to the JVM; however, as we have explained, explicit
optimization on the bytecode level is probably a good thing to avoid.
There are plenty of opportunities for writing efficient programs in Java and
situations where an adaptive runtime can't help. For example, a JVM can never turn
a quadratic algorithm into a linear one, replacing your BubbleSort with a QuickSort.
A JVM can never invent its own object cache where you should have written one
yourself. These are the kinds of cases that matter in Java code. The JVM isn't magical.
Adaptive optimization can never substitute bad algorithms with good ones. At most,
it can make the bad ones run a little bit faster.
However, the JVM can easily deal with many constructs in standard object-oriented
code. The programmer will probably gain very little by avoiding the declaration of
an extra variable or by copying and pasting field loads and stores all over the place
instead of calling a simple getter or setter. These are examples of micro-optimizations
that make the Java code harder to read and that don't make the JIT compiled code
execute any faster.


Sometimes, Java source code level optimizations are downright
destructive. All too often people come up with hard-to-read Java
code that they claim is optimal because some micro benchmark
(where only the interpreter was active and not the optimizing JIT)
told them so. An example from the real world is a server application
where the programmer did a lot of iterations over elements in
arrays. Believing it important to avoid using a loop condition, he
surrounded the for loops with a try block and catch clause for the
ArrayIndexOutOfBoundsException, that was thrown when the
program tried to read outside the array. Not only was the source very
hard to read, but once the runtime had optimized the method, it was
also significantly slower than a standard loop would have been. This is
because exceptions are very expensive operations and are assumed to
be just that—exceptions. The "gambling" behavior of the JVM, thinking
that exceptions are rare, became a bad fit.

It is all too easy to misunderstand what you are measuring when you are looking
for a performance bottleneck. Not every problem can be stripped down into a small
self contained benchmark. Not every benchmark accurately reflects the problem
that is being examined. Chapter 5 will go into extensive detail on benchmarking and
how to know what to look for in Java performance. The second part of this book will
cover the various components of the JRockit Mission Control Suite, which is an ideal
toolbox for performance analysis.

The JRockit code pipeline

Given that the frontend of the JIT compiler is finished with the bytecode, having
turned it into some other form that is easier to process, what happens next? Typically,
the code goes through several levels of transformations and optimizations, each level
becoming increasingly platform-dependent. The final level of code is native code
for a particular platform. The native code is emitted into a code buffer and executed
whenever the function it represents is called.
Naturally, it makes sense to keep the JIT compiler portable as far as possible. So,
most optimizations are usually done when the intermediate code format is still
platform-independent. This makes it easier to port the JIT compiler to different
architectures. However, low-level, platform-specific optimizations must naturally
be implemented as well to achieve industrial strength performance.
This section describes how the JIT compiler gets from bytecode to native code and
the stages involved. We concentrate on the JRockit JIT compiler, but in general terms
the process of generating native code is similar between JVMs.

Why JRockit has no bytecode interpreter
JRockit uses the code generation strategy total JIT compilation.

When the JRockit project started in 1998, the JVM architects realized early on that
pure server-side Java was a niche so far unexploited, and JRockit was originally
designed to be only a server-side JVM. Most server-side applications, it was argued,
stay running for a long time and can afford to take some time reaching a steady state.
Thus, for a server-side-only JVM, it was decided that code generation time was a
smaller problem than efficient execution. This saved us the trouble of implementing
both a JIT compiler and a bytecode interpreter as well as handling the state
transitions between them.
It was quickly noted that compiling every method contributed to additional startup
time. This was initially not considered to be a major issue. Server-side applications,
once they are up and running, stay running for a long time.
Later, as JRockit became a major mainstream JVM, known for its performance,
the need to diversify the code pipeline into client and server parts was recognized.
No interpreter was added, however. Rather, the JIT was modified to differentiate even
further between cold and hot code, enabling faster "sloppy" code generation the first
time a method was encountered. This greatly improved startup time to a satisfying
degree but, of course, matching the startup speed of a pure interpreter with a
compile-only approach is still very hard.
Another aspect that makes life easier with an interpreter is debuggability. Bytecode
contains meta information about things like variable names and line numbers. These
are needed by the debugger. In order to support debuggability, the JRockit JIT had to
propagate this kind of information all the way from a per-bytecode basis to a
per-native-instruction basis. Once that bookkeeping problem was solved, there was little reason
to add an interpreter. This has the added benefit that, to our knowledge, JRockit is the
only virtual machine that lets the user debug optimized code.
The main problems with the compile-only strategy in JRockit are the code bloat
(solved by garbage collecting code buffers with methods no longer in use) and
compilation time for large methods (solved by having a sloppy mode for the JIT).


The compile-only strategy is sometimes less scalable than it should
be. For example, sometimes, JRockit will use a lot of time generating a
relatively large method, the typical example being a JSP. Once finished,
however, the response time for accessing that JSP will be better than
that of an interpreted version.
If you run into problems with code generation time using JRockit,
these can often be worked around. More on this will be covered in the
Controlling code generation in JRockit section at the end of this chapter.

Bootstrapping

The "brain" of the JRockit JVM is the runtime system itself. It keeps track of what
goes on in the world that comprises the virtual execution environment. The
runtime system is aware of which Java classes and methods make up the "world"
and requests that the code generator compiles them at appropriate times with
appropriate levels of code quality.
To simplify things a bit, the first thing the runtime wants to do when the JVM
is started, is to look up and jump to the main method of a Java program. This is
done through a standard JNI call from the native JVM, just like any other native
application would use JNI to call Java code.
Searching for main triggers a complex chain of actions and dependencies. A lot of
other Java methods required for bootstrapping and fundamental JVM behavior need
to be generated in order to resolve the main function. When main is finally ready and
compiled to native code, the JVM can execute its first native-to-Java stub and pass
control from the JVM to the Java program.
To study the bootstrap behavior of JRockit, try running a simple Java
program with the command-line switch -Xverbose:codegen. It may
seem shocking that running a simple "Hello World" program involves JIT
compiling around 1,000 methods. This, however, takes very little time. On
a modern Intel Core2 machine, the total code generation time is less than
250 milliseconds.

Runtime code generation

Total JIT compilation needs to be a lazy process. If every method or class referenced from
another method were fully generated depth-first at referral time, there would be
significant code generation overhead. Also, just because a class is referenced from the
code doesn't mean that every method of the class has to be compiled right away or
even that any of its methods will ever be executed. Control flow through the Java
program might take a different path. This problem obviously doesn't exist in a mixed
mode solution, in which everything starts out as interpreted bytecode with no need
to compile ahead of execution.

Trampolines

JRockit solves this problem by generating stub code for newly referred but not yet
generated methods. These stubs are called trampolines, and basically consist of a few
lines of native code pretending to be the final version of the method. When the method
is first called, and control jumps to the trampoline, all it does is execute a call that
tells JRockit that the real method needs to be generated. The code generator fulfils the
request and returns the starting address of the real method, to which the trampoline
then dispatches control. To the user, it looks like the Java method was called directly,
when in fact it was generated only when it was first actually called.
0x1000: method A
        call method B @ 0x2000

0x2000: method B (trampoline)
        call JVM.Generate(B) -> start
        write trap @ 0x2000
        goto start @ 0x4000

0x3000: method C
        call method B @ 0x2000

0x4000: The "real" method B
        ...

Consider the previous example. method A, whose generated code resides at
address 0x1000, is executing a call to method B, which it believes is placed at address
0x2000. This is the first call to method B ever. Consequently, all that is at address
0x2000 is a trampoline. The first thing the trampoline does is to issue a native call
to the JVM, telling it to generate the real method B. Execution then halts until this
code generation request has been fulfilled, and a starting address for method B is
returned, let's say 0x4000. The trampoline then dispatches control to method B by
jumping to that address.
Note that there may be several calls to method B in the code already, also pointing
to the trampoline address 0x2000. Consider, for example, the call in method C
that hasn't been executed yet. These calls need to be updated as well, without
method B being regenerated. JRockit solves this by writing an illegal instruction at
address 0x2000, when the trampoline has run. This way, the system will trap if the
trampoline is called more than once. The JVM has a special exception handler that
catches the trap, and patches the call to the trampoline so that it points to the real
method instead. In this case it means overwriting the call to 0x2000 in method C
with a call to 0x4000. This process is called back patching.
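
To make the mechanics more concrete, here is a toy model of trampolines and back
patching in plain Java. This is our own sketch, not JRockit code: the "call sites"
are slots in a dispatch table that start out pointing to a stub and are patched to the
real implementation on first use.

import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

public class TrampolineSketch {
    // name -> "code address"; initially populated with trampolines
    private final Map<String, IntUnaryOperator> dispatch = new HashMap<>();

    public TrampolineSketch() {
        dispatch.put("square", this::squareTrampoline);
    }

    // Stands in for the few lines of native stub code.
    private int squareTrampoline(int x) {
        IntUnaryOperator real = n -> n * n;   // "call JVM.Generate(square)"
        dispatch.put("square", real);         // back patch the call site
        return real.applyAsInt(x);            // dispatch to the real method
    }

    public int call(String name, int arg) {
        return dispatch.get(name).applyAsInt(arg);
    }

    public static void main(String[] args) {
        TrampolineSketch vm = new TrampolineSketch();
        System.out.println(vm.call("square", 5)); // first call: via trampoline
        System.out.println(vm.call("square", 6)); // later calls: direct
    }
}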


Back patching is used for all kinds of code replacement in the virtual machine, not
just for method generation. If, for example, a hot method has been regenerated to a
more efficient version, the cold version of the code is fitted with a trap at the start
and back patching takes place in a similar manner, gradually redirecting calls from
the old method to the new one.
Again, note that this is a lazy approach. We don't have time to go over the entire
compiled code base and look for potential jumps to code that has changed since
the caller was generated.
If there are no more references to an older version of a method, its native code buffer
can be scheduled for garbage collection by the runtime system so as to unclutter
the memory. This is necessary in a world that uses a total JIT strategy, because the
amount of code produced can be quite large.

Code generation requests

In JRockit, code generation requests are passed to the code generator from
the runtime when a method needs to be compiled. The requests can be either
synchronous or asynchronous.
Synchronous code generation requests do one of the following:

•	Quickly generate a method for the JIT, with a specified level of efficiency
•	Generate an optimized method, with a specified level of efficiency

An asynchronous request:

•	Acts upon an invalidated assumption, for example, forcing regeneration of a method or patching the native code of a method

Internally, JRockit keeps synchronous code generation requests in a code generation
queue and an optimization queue, depending on request type. The queues are
consumed by one or more code generation and/or optimization threads, depending
on system configuration.
The code generation queue contains generation requests for methods that are needed
for program execution to proceed. These requests, except for special cases during
bootstrapping, are essentially generated by trampolines. The "generate me" call that
each trampoline contains inserts a request in the code generation queue and blocks
until the method generation is complete. The return value of the call is the address in
memory where the new method starts, to which the trampoline finally jumps. A rough
model of this mechanism is sketched below.
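
The blocking behavior can be modeled with standard Java concurrency primitives.
The following is a rough sketch under our own assumptions, not JRockit source:
trampolines enqueue a request and wait on a future, while a code generation thread
consumes the queue.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

class CodeGenQueueSketch {
    static final class Request {
        final String method;
        final CompletableFuture<Long> address = new CompletableFuture<>();
        Request(String method) { this.method = method; }
    }

    private final BlockingQueue<Request> queue = new LinkedBlockingQueue<>();

    // Called from a trampoline: blocks until the method has been generated.
    long generateMe(String method) throws InterruptedException {
        Request r = new Request(method);
        queue.put(r);
        return r.address.join(); // the address the trampoline will jump to
    }

    // Body of a code generation thread.
    void codeGenLoop() throws InterruptedException {
        while (true) {
            Request r = queue.take();
            r.address.complete(compile(r.method));
        }
    }

    private long compile(String method) {
        return method.hashCode(); // stand-in for actual code generation
    }
}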


Optimization requests

Optimization requests are added to the optimization queue whenever a method is
found to be hot, that is when the runtime system has realized that we are spending
enough time executing the Java code of that method so that optimization is warranted.
The optimization queue understandably runs at a lower priority than the code
generation queue as its work is not necessary for code execution, but just for code
performance. Also, an optimization request usually takes orders of magnitude
longer than a standard code generation request to execute, trading compile time
for efficient code.

On-stack replacement

Once an optimized version of a method is generated, the existing version of the
code for that method needs to be replaced. As previously described, the method
entry point of the existing cold version of the method is overwritten with a trap
instruction. Calls to the old method will be back patched to point to the new,
optimized piece of code.
If the Java program spends a very large amount of time in a method, it will be
flagged as hot and queued for replacement. However, consider the case where the
method contains a loop that executes for a very long time. This method may well
be hotspotted and regenerated, but the old method still keeps executing even if the
method entry to the old method is fitted with a trap. Obviously, the performance
enhancement that the optimized method contributes will enter the runtime much
later, or never if the loop is infinite.
Some optimizers swap out code on the existing execution stack by replacing the code
of a method with a new version in the middle of its execution. This is referred to as
on-stack replacement and requires extensive bookkeeping. Though this is possible
in a completely JIT-compiled world, it is easier to implement where there is an
interpreter to fall back to.
JRockit doesn't do on-stack replacement, as the complexity required to do so is
deemed too great. Even though the code for a more optimal version of the method
may have been generated, JRockit will continue executing the old version of the
method if it is currently running.
Our research has shown that in the real world, this matters little for achieving
performance. The only places where we have encountered performance penalties
because of not doing on-stack replacement are badly written micro benchmarks, for
example when the main function contains all the computations in a very long loop.
Moving the bulk of the benchmark into a separate method and calling it repeatedly
from main resolves this problem, as the sketch below illustrates. We will thoroughly
discuss the most important aspects of benchmarking in Chapter 5.
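
A minimal sketch of such a restructured micro benchmark (hypothetical names, of
course) could look as follows; each call to work re-enters the method through its
entry point, so an optimized version is picked up as soon as it exists:

public class Benchmark {
    static long work(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += i * i;   // the hot computation lives here, not in main
        }
        return sum;
    }

    public static void main(String[] args) {
        long total = 0;
        for (int i = 0; i < 1000; i++) {
            total += work(1000000);
        }
        System.out.println(total);
    }
}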

Bookkeeping

The code generator in the JVM has to perform a number of necessary bookkeeping
tasks for the runtime system.

Object information for GC

For various reasons, a garbage collector needs to keep track of which registers and
stack frame locations contain Java objects at any given point in the program. This
information is generated by the JIT compiler and is stored in a database in the runtime
system. The JIT compiler is the component responsible for creating this data because
type information is available "for free" while generating code. The compiler has to deal
with types anyway. In JRockit, the object meta info is called livemaps, and a detailed
explanation of how the code generation system works with the garbage collector is
given in Chapter 3, Adaptive Memory Management.

Source code and variable information

Another bookkeeping issue in the compiled world is the challenge of preserving
source code level information all the way down to machine language. The JVM
must always be able to trace program points back from an arbitrary native
instruction to a particular line of Java source code. We need to support proper
stack traces for debugging purposes, even stack traces containing optimized code.
This gets even more complicated as the optimizer may have transformed a method
heavily from its original form. A method may even contain parts of other methods
due to inlining. If an exception occurs anywhere in our highly optimized native
code, the stack trace must still be able to show the line number where this happened.
This is not a conceptually difficult problem to solve; the bookkeeping just involves
some kind of database, but it is a large and complex one. JRockit successfully
preserves mappings between
most native instructions and the actual line of Java source code that created them.
This, obviously, is much more work in a compiled world than in an interpreted one.
In the Java bytecode format, local variable information and line number information
are mapped to individual bytecodes, but JRockit has to make sure that the mapping
survives all the way down to native code. Each bytecode instruction eventually turns
into zero or more native code instructions that may or may not execute in sequence.
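
As a small illustration of the property being preserved, consider a method that is a
prime inlining candidate; even when its code has been inlined into the caller, the
resulting stack trace has to point at the correct source line:

public class TraceDemo {
    static int divide(int a, int b) {
        return a / b;   // an ArithmeticException here must report this line
    }

    public static void main(String[] args) {
        try {
            divide(1, 0);
        } catch (ArithmeticException e) {
            // The trace shows divide() and its line number, even if divide
            // was inlined into main by the optimizer.
            e.printStackTrace();
        }
    }
}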

Assumptions made about the generated code

Finally, as we have already discussed, remembering what assumptions or "gambles"
have been made while generating methods is vital in Java. As soon as one of the
assumptions is violated, we need to send asynchronous code regeneration requests
for whatever methods are affected. Thus, an assumption database is another part of
the JRockit runtime that communicates with the code generator.


A walkthrough of method generation in
JRockit

Let us now take a look at what happens on the road from bytecode to native code in
the JRockit JIT compiler. This section describes how a small method is transformed
to native code by the JRockit JIT. Large parts of this process are similar in other JIT
compilers (as well as in other static compilers), and some parts are not. The end
result, native code, is the same.
Let us consider the following Java method as an example:
public static int md5_F(int x, int y, int z) {
    return (x & y) | ((~x) & z);
}

This is part of the well known MD5 hash function and performs bit operations on
three pieces of input.
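
The function selects, for each bit position, the bit from y where x has a one-bit and
the bit from z where x has a zero-bit. As an aside (our own observation, not part of
the example), the same function can be computed without the negation as
z ^ (x & (y ^ z)), which the following hypothetical check verifies:

public class Md5FCheck {
    static int md5_F(int x, int y, int z) {
        return (x & y) | ((~x) & z);
    }

    public static void main(String[] args) {
        java.util.Random r = new java.util.Random();
        for (int i = 0; i < 1000000; i++) {
            int x = r.nextInt(), y = r.nextInt(), z = r.nextInt();
            if (md5_F(x, y, z) != (z ^ (x & (y ^ z)))) {
                throw new AssertionError("not equivalent");
            }
        }
        System.out.println("equivalent for all sampled inputs");
    }
}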

The JRockit IR format

The first stage of the JRockit code pipeline turns the bytecode into an Intermediate
Representation (IR). As it is conceivable that other languages may be compiled
by the same frontend, and also for convenience, optimizers tend to work with a
common internal intermediate format.
JRockit works with an intermediate format that differs from bytecode, looking
more like classic text book compiler formats. This is the common approach that
most compilers use, but of course the format of IR that a compiler uses always
varies slightly depending on implementation and the language being compiled.
Aside from the previously mentioned portability issue, JRockit also doesn't work
with bytecode internally because of the issues with unstructured control flow and
the execution stack model, which differs from any modern hardware register model.
Because we lack the information to completely reconstruct the ASTs, a method in
JRockit is represented as a directed graph, a control flow graph, whose nodes are
basic blocks. The definition of a basic block is that if one instruction in the basic
block is executed, all other instructions in it will be executed as well. Since there are
no branches in our example, the md5_F function will turn into exactly one basic block.

Data flow

A basic block contains zero to many operations, which in turn have operands.
Operands can be other operations (forming expression trees), variables (virtual
registers or atomic operands), constants, addresses, and so on, depending on
how close to the actual hardware representation the IR is.

Control flow

Basic blocks can have multiple entries and multiple exits. The edges in the graph
represent control flow. Any control flow construct, such as a simple fallthrough to
the next basic block, a goto, a conditional jump, a switch, or an exception,
produces one or more edges in the graph.
When control enters a method, there is a designated start basic block for the execution.
A basic block with no exits ends the method execution. Typically such a block ends
with a return or throw clause.

A word about exceptions

A small complication is the presence of exceptions which, if consistent with this model,
should form conditional jumps from every bytecode operation that may fault to an
appropriate catch block, where one is available.
This would quickly turn into a combinatorial explosion of edges in the flow graph
(and consequently of basic blocks), severely handicapping any O(|V||E|) (nodes
× edges) graph traversal algorithm that needs to work on the code. Therefore,
exceptions are treated specially, on a per-basic block basis instead.

Consider, as a slightly larger example, a basic block graph where method entry is
at Block 0, which has three exits: two normal ones, as it contains a conditional branch,
and an exception edge. This means that Block 0 is a try block, whose catch starts
at Block 3. The same try block spans Block 1 and Block 2 as well. The method can
exit either by triggering the exception and ending up in Block 3 or by falling through
to Block 5. Both these blocks end with return instructions. Even though the only
instruction that can trigger an exception is the div in Block 2 (on division by zero),
the try block spans several nodes, because this is what the bytecode (and possibly
the source code) looked like. Optimizers may choose to deal with this later.
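
A hypothetical Java method shaped roughly like the graph just described might look
like this (our own reconstruction for illustration, not the book's exact example):

static int example(int a, int b) {
    try {
        if (a > 0) {        // Block 0: conditional branch, two normal exits
            a = a + 1;      // Block 1
        }
        a = a / b;          // Block 2: the div that can actually throw
    } catch (ArithmeticException e) {
        return -1;          // Block 3: the catch block, ends with return
    }
    return a;               // Block 5: fallthrough exit, ends with return
}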

JIT compilation

The following figure illustrates the different stages of the JRockit code pipeline:

BC2HIR → HIR2MIR → MIR2LIR → RegAlloc → EMIT

Generating HIR

The first module in the code generator, BC2HIR, is the frontend against the bytecode
and its purpose is to quickly translate bytecodes into IR. HIR in this case stands for
High-level Intermediate Representation. For the md5_F method, where no control
flow in the form of conditional or unconditional jumps is present, we get just one
basic block.
The following code snippet shows the md5_F method in bytecode form:
public static int md5_F(int, int, int);
Code:           Stack contents:                 Emitted code:
 0: iload_0     v0
 1: iload_1     v0, v1
 2: iand        (v0&v1)
 3: iload_0     (v0&v1), v0
 4: iconst_m1   (v0&v1), v0, -1
 5: ixor        (v0&v1), (v0^-1)
 6: iload_2     (v0&v1), (v0^-1), v2
 7: iand        (v0&v1), ((v0^-1) & v2)
 8: ior         ((v0&v1) | ((v0^-1) & v2))
 9: ireturn                                     return ((v0&v1) |
                                                  ((v0^-1) & v2));

The JIT works by computing a control flow graph for the IR by examining where
the jumps in the bytecode are, and then filling its basic blocks with code. Code is
constructed by emulating the contents of the evaluation stack at any given location
in the program. Emulating a bytecode operation results in changes to the evaluation
stack and/or code being generated. The example has been annotated with the
contents of the emulated evaluation stack and the resulting generated code after each
bytecode has been processed.
Bit negation (~) is implemented by javac as an xor with -1
(0xffffffff), as bytecode lacks a specific not operator.


As we can see, by representing the contents of a variable on the evaluation stack with
a variable handle, we can reconstruct the expressions from the original source code.
For example, the iload_0 instruction, which means "push the contents of variable
0" turns into the expression "variable 0" on the emulated stack. In the example, the
emulator gradually forms a more and more complex expression on the stack, and
when it is time to pop it and return it, the expression in its entirety can be used to
form code.
This is the output, the High-level IR, or HIR:
params: v1 v2 v3
block0: [first] [id=0]
10 @9:49    (i32)   return {or {and v1 v2} {and {xor v1 -1} v3}}

In JRockit IR, the annotation @ before each statement identifies its
program point in the code all the way down to assembler level. The
first number following the @ is the bytecode offset of the expression
and the last is the source code line number information. This is part
of the complex meta info framework in JRockit that maps individual
native instructions back to their Java program points.

The variable indexes were assigned by JRockit, and differ from those in the
bytecode. Notice that operations may contain other operations as their operands,
similar to the original Java code. These nested expressions are actually a useful
byproduct of turning the bytecode stack back into expressions. This way we get a
High-level Representation instead of typical "flat" compiler code with temporary
variable assignments, where operations may not contain other operations. The HIR
lends itself well to some optimizations that are harder to do on another format; for
example, discovering if a sub-expression (in the form of a subtree) is present twice
in an expression. Then the sub-expression can be folded into a temporary variable,
saving it from being evaluated twice.
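
For example (a hypothetical method of our own), the subtree a * b + c below occurs
twice; on HIR, the matching subtrees can be folded into one temporary:

static int twice(int a, int b, int c) {
    return (a * b + c) | (a * b + c);
    // conceptually rewritten by the optimizer to:
    //     int tmp = a * b + c;
    //     return tmp | tmp;
}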
Emulating the bytecode stack to form HIR is not without problems though. Since
at compile time, we only know what expression is on the stack, and not its value,
we run into various problems. One example would be in situations where the stack
is used as memory. Take for example the construct result = x ? a : b. The
bytecode compiles into something like this:
/* bytecode for: "return x ? a : b" */
static int test(boolean x, int a, int b);
0: iload_0    //push x
1: ifeq 8     //if x == false then goto 8
4: iload_1    //push a
5: goto 9
8: iload_2    //push b
9: ireturn    //return pop

When the emulator gets to the ireturn instruction, the value popped can be either
a (local variable 1) or b (local variable 2). Since we can't express "either a or b" as a
variable, we need to replace the loads at offsets 4 and 8 with writes to one and the
same temporary variable, and place that on the stack instead.
The BC2HIR module that turns bytecodes into a control flow graph with expressions
is not computationally complex. However, it contains several other little special
cases, similar to the earlier one, that are beyond the scope of this book. Most of
them have to do with the lack of structure in bytecode and with the evaluation stack
metaphor. Another example would be the need to associate monitorenter bytecodes
with their corresponding monitorexit(s), the need for which is explained in great
detail in Chapter 4.

MIR

MIR, or Middle-level Intermediate Representation, is the transform domain where
most code optimizations take place. This is because most optimizations work best
with three address code, that is, instructions that only contain atomic operands,
not other instructions. Transforming HIR to MIR is simply an in-order traversal
of the expression trees mentioned earlier and the creation of temporary variables.
As no hardware deals with expression trees, it is natural that code turns into
progressively simpler operations on the path through the code pipeline.
Our md5_F example would look something like the following code to the JIT
compiler, when the expression trees have been flattened. Note that no operation
contains other operations anymore. Each operation writes its result to a temporary
variable, which is in turn used by later operations.
params: v1 v2 v3
block0: [first] [id=0]
 2 @2:49*   (i32)   and      v1 v2 -> v4
 5 @5:49*   (i32)   xor      v1 -1 -> v5
 7 @7:49*   (i32)   and      v5 v3 -> v5
 8 @8:49*   (i32)   or       v4 v5 -> v4
10 @9:49*   (i32)   return   v4

If the JIT compiler is executing a code generation request from the optimizer, most
optimizations on the way down to native code are carried out on MIR. This will be
discussed later in the chapter.


LIR
After MIR, it is time to turn platform dependent as we are approaching native code.
LIR, or Low-level IR, looks different depending on hardware architecture.
Consider the Intel x86, where the biggest customer base for JRockit exists. The x86
has legacy operations dating back to the early 1980s. The RISC-like format of the
previous MIR operations is inappropriate. For example, a logical and operation on
the x86 requires the same first source and destination operand. That is why we need
to introduce a number of new temporaries in order to turn the code into something
that fits the x86 model better.
If we were compiling for SPARC, whose native format looks more like the JRockit IR,
fewer transformations would have been needed.
Following is the LIR for the md5_F method on a 32-bit x86 platform:
params: v1 v2 v3
block0: [first] [id=0]
 2 @2:49*   (i32)   x86_and   v2 v1 -> v2
11 @2:49    (i32)   x86_mov   v2 -> v4
 5 @5:49*   (i32)   x86_xor   v1 -1 -> v1
12 @5:49    (i32)   x86_mov   v1 -> v5
 7 @7:49*   (i32)   x86_and   v5 v3 -> v5
 8 @8:49*   (i32)   x86_or    v4 v5 -> v4
14 @9:49    (i32)   x86_mov   v4 -> eax
13 @9:49*   (i32)   x86_ret   eax

A couple of platform-independent mov instructions have been inserted to get the
correct x86 semantics. Note that the and, xor, and or operations now have the same
first operand as destination, the way x86 requires. Another interesting thing is that
we already see hard-coded machine registers here. The JRockit calling convention
demands that integers be returned in the register eax, so the register allocator that
is the next step of the code pipeline doesn't really have a choice for a register for the
return value.

Register allocation

There can be any number of virtual registers (variables) in the code, but the physical
platform only has a small number of them. Therefore, the JIT compiler needs to do
register allocation, transforming the virtual variable mappings into machine registers.
If at any given point in the program, we need to use more variables than there are
physical registers in the machine at the same time, the local stack frame has to be used
for temporary storage. This is called spilling, and the register allocator implements
spills by inserting move instructions that shuffle registers back and forth from the
stack. Naturally spill moves incur overhead, so their placement is highly significant
in optimized code.

Register allocation is a very fast process if done sloppily, such as in the first JIT
stage, but computationally intensive if a good job is needed, especially when
there are many variables in use (or live) at the same time. However, because of
the small number of variables, we get an optimal result with little effort in our
example method. Several of the temporary mov instructions have been coalesced
and removed.
Our md5_F method needs no spills, as x86 has seven available registers (15 on the
64-bit platforms), and we use only three.
params: ecx eax edx
block0: [first] [id=0]
 2 @2:49*   (i32)    x86_and   eax ecx -> eax
 5 @5:49*   (i32)    x86_xor   ecx -1 -> ecx
 7 @7:49*   (i32)    x86_and   ecx edx -> ecx
 8 @8:49*   (i32)    x86_or    eax ecx -> eax
13 @9:49*   (void)   x86_ret   eax

Every instruction in our register allocated LIR has a native instruction equivalent
on the platform that we are generating code for.
Just to put spill code into perspective, following is a slightly longer example. The
main method of the Spill program does eight field loads to eight variables that
are then used at the same time (for multiplying them together).
public class Spill {
    static int aField, bField, cField, dField;
    static int eField, fField, gField, hField;
    static int answer;

    public static void main(String args[]) {
        int a = aField;
        int b = bField;
        int c = cField;
        int d = dField;
        int e = eField;
        int f = fField;
        int g = gField;
        int h = hField;
        answer = a*b*c*d*e*f*g*h;
    }
}


We will examine the native code for this method on a 32-bit x86 platform. As
32-bit x86 has only seven available registers, one of the intermediate values has
to be spilled to the stack. The resulting register allocated LIR code is shown in the
following code snippet:
Assembly or LIR instructions that dereference memory typically annotate
their pointers as a value or variable within square brackets. For example,
[esp+8] dereferences the memory eight bytes above the stack pointer
(esp) on x86 architectures.
block0: [first] [id=0]
68          (i32)   x86_push ebx                   //store callee save reg
69          (i32)   x86_push ebp                   //store callee save reg
70          (i32)   x86_sub  esp 4 -> esp          //alloc stack for 1 spill
43 @0:7*    (i32)   x86_mov  [0xf56bd7f8] -> esi   //*aField->esi (a)
44 @4:8*    (i32)   x86_mov  [0xf56bd7fc] -> edx   //*bField->edx (b)
67 @4:8     (i32)   x86_mov  edx -> [esp+0x0]      //spill b to stack
45 @8:9*    (i32)   x86_mov  [0xf56bd800] -> edi   //*cField->edi (c)
46 @12:10*  (i32)   x86_mov  [0xf56bd804] -> ecx   //*dField->ecx (d)
47 @17:11*  (i32)   x86_mov  [0xf56bd808] -> edx   //*eField->edx (e)
48 @22:12*  (i32)   x86_mov  [0xf56bd80c] -> eax   //*fField->eax (f)
49 @27:13*  (i32)   x86_mov  [0xf56bd810] -> ebx   //*gField->ebx (g)
50 @32:14*  (i32)   x86_mov  [0xf56bd814] -> ebp   //*hField->ebp (h)
26 @39:16   (i32)   x86_imul esi [esp+0x0] -> esi  //a *= b
28 @41:16   (i32)   x86_imul esi edi -> esi        //a *= c
30 @44:16   (i32)   x86_imul esi ecx -> esi        //a *= d
32 @47:16   (i32)   x86_imul esi edx -> esi        //a *= e
34 @50:16   (i32)   x86_imul esi eax -> esi        //a *= f
36 @53:16   (i32)   x86_imul esi ebx -> esi        //a *= g
38 @56:16   (i32)   x86_imul esi ebp -> esi        //a *= h
65 @57:16*  (i32)   x86_mov  esi -> [0xf56bd818]   //*answer = a
71 @60:18*  (i32)   x86_add  esp, 4 -> esp         //free stack slot
72 @60:18   (i32)   x86_pop  -> ebp                //restore used callee save
73 @60:18   (i32)   x86_pop  -> ebx                //restore used callee save
66 @60:18   (void)  x86_ret                        //return


We can also note that the register allocator has added a prologue and an epilogue to
the method, in which stack manipulation takes place. This is because it has figured
out that one stack position will be required for the spilled variable and that it also
needs to use two callee-save registers for storage. A register being callee-save means
that a called method has to preserve the contents of the register for the caller. If the
method needs to overwrite callee-save registers, they have to be stored on the local
stack frame and restored just before the method returns. By JRockit convention on
x86, callee-save registers for Java code are ebx and ebp. Any calling convention
typically includes a few callee-save registers since if every register was potentially
destroyed over a call, the end result would be even more spill code.

Native code emission

After register allocation, every operation in the IR maps one-to-one to a native
operation in x86 machine language and we can send the IR to the code emitter. The last
thing that the JIT compiler does to the register allocated LIR is to add mov instructions
for parameter marshalling (in this case moving values from in-parameters as defined
by the calling convention to positions that the register allocator has picked). Even
though the register allocator thought it appropriate to put the first parameter in ecx,
compilers work internally with a predefined calling convention. JRockit passes the first
parameter in eax instead, requiring a shuffle mov. In the example, the JRockit calling
convention passes parameters x in eax, y in edx, and z in esi respectively.

Assembly code displayed in figures generated by code dumps from
JRockit uses Intel-style syntax on the x86, with the destination as the first
operand; for example, "and ebx, eax" means "ebx = ebx & eax".

Following is the resulting native code in a code buffer:
[method is md5_F(III)I [02DB2FF0 - 02DB3002]]
02DB2FF0:   mov   ecx,eax
02DB2FF2:   mov   eax,edx
02DB2FF4:   and   eax,ecx
02DB2FF6:   xor   ecx,0xffffffff
02DB2FF9:   and   ecx,esi
02DB2FFC:   or    eax,ecx
02DB2FFF:   ret


Generating optimized code

Regenerating an optimized version of a method found to be hot is not too dissimilar
to normal JIT compilation. The optimizing JIT compiler basically piggybacks on the
original code pipeline, using it as a "spine" for the code generation process, but at each
stage, an optimization module is plugged into the JIT.
BC2HIR → Optimize HIR → HIR2MIR → Optimize MIR → MIR2LIR → Optimize LIR →
Graph Fusion Based RegAlloc → EMIT → Optimize Native Code

A general overview

Different optimizations are suitable for different levels of IR. For example, HIR
lends itself well to value numbering in expression trees, substituting two equivalent
subtrees of an expression with one subtree and a temporary variable assignment.
MIR readily transforms into Static Single Assignment (SSA) form, a transform
domain that makes sure that any variable has only one definition. SSA transformation
is part of virtually every commercial compiler today and makes implementing many
code optimizations much easier. Another added benefit is that code optimizations in
SSA form can be potentially more powerful.
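
The kind of Java source that gives rise to the flow graphs below could look like this
(an assumed example for illustration):

static int sign(int x) {
    int result;
    if (x > 0) {
        result = 1;
    } else {
        result = 0;
    }
    return result;
}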
Before SSA form:                    After SSA form:

if (x > 0)                          if (x1 > 0)
    result = 1                          result1 = 1
else                                else
    result = 0                          result2 = 0
                                    result3 = Φ(result1, result2)
return result                       return result3

The previous flow graph shows what happens before and after transformation to
SSA form. The result variable that is returned by the program is assigned either
1 or 0 depending on the value of x and the branch destination. Since SSA form
allows only one assignment of each variable, the result variable has been split into
three different variables. At the return statement, result can either be result1
or result2. To express this "either" semantic, a special join operator, denoted
by the Greek letter phi (Φ), is used. Trivially, no hardware platform can express
this ambiguity, so the code has to be transformed back to normal form before
emission. The reverse transform basically replaces each join operator with preceding
assignments, one per flow path, to the destination of the join instruction.

Many classic code optimizations such as constant propagation and copy propagation
have their own faster SSA form equivalents. This mostly has to do with the fact that
given any use of a variable, it is unambiguous where in the code that variable is
defined. There is plenty of literature on the subject and a thorough discussion on
all the benefits of SSA form is beyond the scope of this book.
LIR is platform-dependent and initially not register allocated, so transformations
that form more efficient native operation sequences can be performed here.
An example would be replacing dumb copy loops with specialized Intel SSE4
instructions for faster array copies on the x86.

When generating optimized code, register allocation tends to be very
important. Any compiler textbook will tell you that optimal register
allocation basically is the same problem as graph coloring. This is
because if two variables are in use at the same time, they obviously
cannot share the same register. Variables in use at the same time can
be represented as connected nodes in a graph. The problem of register
allocation can then be reduced to assigning colors to the nodes in the
graph, so that no connected nodes have the same color. The amount of
colors available is the same as the number of registers on the platform.
Sadly enough, in computational complexity terms, graph coloring is
NP-hard, so no efficient (polynomial time) algorithm that solves the
problem exactly is known. However, graph coloring can be approximated
in quadratic time. Most compilers contain some variant of the graph
coloring algorithm for register allocation. A greedy sketch of the
basic idea follows.
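
The following is a minimal greedy sketch of the idea, under our own simplified
representation of an interference graph (each node mapped to the set of variables
simultaneously live with it); real allocators, including JRockit's, are far more involved.

import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class GreedyColoringSketch {
    // Assign one of k "colors" (registers) to each node so that no two
    // interfering nodes share a color. Greedy, so not optimal.
    static Map<String, Integer> color(Map<String, Set<String>> graph, int k) {
        Map<String, Integer> colors = new HashMap<>();
        for (String node : graph.keySet()) {
            BitSet used = new BitSet(k);
            for (String neighbor : graph.get(node)) {
                Integer c = colors.get(neighbor);
                if (c != null) {
                    used.set(c);   // neighbor's register is unavailable
                }
            }
            int c = used.nextClearBit(0);
            if (c >= k) {
                throw new IllegalStateException("must spill " + node);
            }
            colors.put(node, c);
        }
        return colors;
    }
}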

The JRockit optimizer contains a very advanced register allocator that is based
on a technique called graph fusion, that extends the standard graph coloring
approximation algorithm to work on subregions in the IR. Graph fusion has the
attractive property that the edges in the flow graph, processed early, generate fewer
spills than the edges processed later. Therefore, if we can pick hot subregions before
cold ones, the resulting code will be more optimal. Additional penalty comes from
the need to insert shuffle code when fusing regions in order to form a complete
method. Shuffle code consists of sequences of move instructions to copy the
contents of one local register allocation into another one.
Finally, just before code emission, various peephole optimizations can be applied to
the native code, replacing one to several register allocated instructions in sequence
with more optimal ones.


Clearing a register is usually done by XORing the register with itself.
Replacing instructions such as mov eax, 0 with xor eax, eax,
which is potentially faster, is an example of a peephole optimization that
works on exactly one instruction. Another example would be turning a
multiplication with the power of two followed by an add instruction into
a simple lea instruction on x86, optimized to do both.
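
A peephole pass can be modeled as a simple rewrite over an instruction stream. The
following toy Java sketch (using our own textual instruction format, nothing like
JRockit's internal one) performs the mov-to-xor rewrite mentioned above:

import java.util.ArrayList;
import java.util.List;

class PeepholeSketch {
    static List<String> run(List<String> code) {
        List<String> out = new ArrayList<>();
        for (String insn : code) {
            if (insn.matches("mov (\\w+), 0")) {
                // "mov reg, 0" -> "xor reg, reg": same effect, smaller encoding
                String reg = insn.replaceAll("mov (\\w+), 0", "$1");
                out.add("xor " + reg + ", " + reg);
            } else {
                out.add(insn);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("mov eax, 0", "add eax, ecx")));
        // prints: [xor eax, eax, add eax, ecx]
    }
}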

How does the optimizer work?

A complete walkthrough of the JRockit code pipeline with the algorithms and
optimizations within would be the subject for an entire book of its own. This
section merely tries to highlight some of the things that a JVM can do with code,
given adequate runtime feedback.
Generating optimized code for a method in JRockit generally takes 10 to 100 times
as long as JITing it with no demands for execution speed. Therefore, it is important
to only optimize frequently executed methods.
Grossly oversimplifying things, the bulk of the optimizer modules plugged into the
code pipeline work like this:
do {
    1) Get rid of calls, exposing more control flow,
       through aggressive inlining.
    2) Apply optimizations on the enlarged code mass, try to shrink it.
} while ("enough time left" && "code not growing too fast");

Java is an object-oriented language and contains a lot of getters, setters, and other
small "nuisance" calls. The compiler has to presume that calls do very complex things
and have side effects unless it knows what's inside them. So, for simplification, small
methods are frequently inlined, replacing the call with the code of the called function.
JRockit tries to aggressively inline everything that seems remotely interesting on
hot execution paths, with reasonable prioritization of candidates using sample and
profiling information.
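
Conceptually, inlining replaces a call with the callee's body, as in this hedged
before/after sketch:

class Point {
    private int x;

    int getX() {
        return x;
    }

    int squaredX() {
        return getX() * getX();     // before inlining: two calls
        // after inlining, conceptually:
        //     return x * x;        // the calls replaced by the getter's body
    }
}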
In a statically compiled environment, too aggressive inlining would be total
overkill, and too large methods would cause instruction cache penalties and slow
down execution. In a runtime, however, we can hope to have good enough sample
information to make more realistic guesses about what needs to be inlined.
After bringing in whatever good inlining candidates we can find into the method, the
JIT applies optimizations to the, now usually quite large, code mass, trying to shrink
it. For example, this is done by folding constants, eliminating expressions based on
escape analysis, and applying plenty of other simplifying transforms. Dead code
that is proved never to be executed is removed. Multiple loads and stores that access
the same memory location can, under certain conditions, be eliminated, and so on.
Surprisingly enough, the size of the total code mass after inlining and then optimizing
the inlined code is often less than the original code mass of a method before anything
was inlined.
The runtime system can perform relatively large simplifications, given relatively
little input. Consider the following program that implements the representation
of a circle by its radius and allows for area computation:
public class Circle {
    private double radius;

    public Circle(int radius) {
        this.radius = radius;
    }

    public double getArea() {
        return 3.1415 * radius * radius;
    }

    public static double getAreaFromRadius(int radius) {
        Circle c = new Circle(radius);
        return c.getArea();
    }

    static int areas[] = new int[0x10000];
    static int radii[] = new int[0x10000];
    static java.util.Random r = new java.util.Random();
    static int MAX_ITERATIONS = 1000;

    public static void gen() {
        for (int i = 0; i < areas.length; i++) {
            areas[i] = (int)getAreaFromRadius(radii[i]);
        }
    }

    public static void main(String args[]) {
        for (int i = 0; i < radii.length; i++) {
            radii[i] = r.nextInt();
        }
        for (int i = 0; i < MAX_ITERATIONS; i++) {
            gen(); //avoid on stack replacement problems
        }
    }
}


Running the previous program with JRockit with the command-line
flag -Xverbose:opt,gc, to make JRockit dump all garbage collection
and code optimization events, produces the following output:
hastur:material marcus$ java -Xverbose:opt,gc Circle
[INFO ][memory ] [YC#1] 0.584-0.587: YC 33280KB->8962KB (65536KB), 0.003 s, sum of pauses 2.546 ms, longest pause 2.546 ms
[INFO ][memory ] [YC#2] 0.665-0.666: YC 33536KB->9026KB (65536KB), 0.001 s, sum of pauses 0.533 ms, longest pause 0.533 ms
[INFO ][memory ] [YC#3] 0.743-0.743: YC 33600KB->9026KB (65536KB), 0.001 s, sum of pauses 0.462 ms, longest pause 0.462 ms
[INFO ][memory ] [YC#4] 0.821-0.821: YC 33600KB->9026KB (65536KB), 0.001 s, sum of pauses 0.462 ms, longest pause 0.462 ms
[INFO ][memory ] [YC#5] 0.898-0.899: YC 33600KB->9026KB (65536KB), 0.001 s, sum of pauses 0.463 ms, longest pause 0.463 ms
[INFO ][memory ] [YC#6] 0.975-0.976: YC 33600KB->9026KB (65536KB), 0.001 s, sum of pauses 0.448 ms, longest pause 0.448 ms
[INFO ][memory ] [YC#7] 1.055-1.055: YC 33600KB->9026KB (65536KB), 0.001 s, sum of pauses 0.461 ms, longest pause 0.461 ms
[INFO ][memory ] [YC#8] 1.132-1.133: YC 33600KB->9026KB (65536KB), 0.001 s, sum of pauses 0.448 ms, longest pause 0.448 ms
[INFO ][memory ] [YC#9] 1.210-1.210: YC 33600KB->9026KB (65536KB), 0.001 s, sum of pauses 0.480 ms, longest pause 0.480 ms
[INFO ][opt    ][00020] #1 (Opt) jrockit/vm/Allocator.allocObjectOrArray(IIIZ)Ljava/lang/Object;
[INFO ][opt    ][00020] #1 1.575-1.581 0x9e04c000-0x9e04c1ad 5.72 ms 192KB 49274 bc/s (5.72 ms 49274 bc/s)
[INFO ][memory ] [YC#10] 1.607-1.608: YC 33600KB->9090KB (65536KB), 0.001 s, sum of pauses 0.650 ms, longest pause 0.650 ms
[INFO ][memory ] [YC#11] 1.671-1.672: YC 33664KB->9090KB (65536KB), 0.001 s, sum of pauses 0.453 ms, longest pause 0.453 ms.
[INFO ][opt    ][00020] #2 (Opt) jrockit/vm/Allocator.allocObject(I)Ljava/lang/Object;
[INFO ][opt    ][00020] #2 1.685-1.689 0x9e04c1c0-0x9e04c30d 3.88 ms 192KB 83078 bc/s (9.60 ms 62923 bc/s)
[INFO ][memory ] [YC#12] 1.733-1.734: YC 33664KB->9090KB (65536KB), 0.001 s, sum of pauses 0.459 ms, longest pause 0.459 ms.
[INFO ][opt    ][00020] #3 (Opt) Circle.gen()V
[INFO ][opt    ][00020] #3 1.741-1.743 0x9e04c320-0x9e04c3f2 2.43 ms 128KB 44937 bc/s (12.02 ms 59295 bc/s)
[INFO ][opt    ][00020] #4 (Opt) Circle.main([Ljava/lang/String;)V
[INFO ][opt    ][00020] #4 1.818-1.829 0x9e04c400-0x9e04c7af 11.04 ms 384KB 27364 bc/s (23.06 ms 44013 bc/s)
hastur:material marcus$

No more output is produced until the program is finished.

The various log formats for the code generator will be discussed in more detail at the
end of this chapter. Log formats for the memory manager are covered in Chapter 3.
Notice that, aside from optimization being performed on the four hottest
methods in the code (two of which are JRockit internal), the garbage collections
stop after the optimizations have completed. This is because the optimizer was able
to prove that the Circle objects created in the getAreaFromRadius method aren't
escaping from the scope of the method. They are only used for an area calculation.
Once the call to c.getArea is inlined, it becomes clear that the entire lifecycle of the
Circle objects is spent in the getAreaFromRadius method. A Circle object just
contains a radius field, a single double, and is thus easily represented just as that
double if we know it has a limited lifespan. An allocation that caused significant
garbage collection overhead was removed by intelligent optimization.
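
In effect, the optimizer reduces getAreaFromRadius to something like the following
allocation-free form (our sketch of the end result, not actual JRockit output):

public static double getAreaFromRadius(int radius) {
    double r = radius;   // the Circle object reduced to its only field
    return 3.1415 * r * r;
}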
Naturally, this is a fairly trivial example, and the optimization issue is easy for the
programmer to avoid by not instantiating a Circle every time the area method is
called in the first place. However, if properly implemented, adaptive optimizations
scale well to large object-oriented applications.
The runtime is always better than the programmer at detecting certain
patterns. It is often surprising what optimization opportunities the
virtual machine discovers, that a human programmer hasn't seen. It
is equally surprising how rarely a "gamble", such as assuming that a
particular method never will be overridden, is invalidated. This shows
some of the true strength of the adaptive runtime.

Unoptimized Java carries plenty of overhead. The javac compiler needs to do
workarounds to implement some language features in bytecode. For example,
string concatenation with the + operator is just syntactic sugar for the creation of
StringBuilder objects and calls to their append functions. An optimizing compiler
should, however, have very few problems transforming things like this into more
optimal constructs. For example, we can use the fact that the implementation of
java.lang.StringBuilder is known, and tell the optimizer that its methods
have no harmful side effects, even though they haven't been generated yet.
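
For instance, a concatenation like the one below is compiled by javac into roughly
the following (the exact shape varies between javac versions):

String greet(String name) {
    return "Hello, " + name;
    // javac emits, approximately:
    //     return new StringBuilder().append("Hello, ")
    //                               .append(name).toString();
}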
Similar issues exist with boxed types. Boxed types turn into hidden objects
(for example instances of java.lang.Integer) on the bytecode level. Several
traditional compiler optimizations, such as escape analysis, can often easily strip
down a boxed type to its primitive value. This removes the hidden object allocation
that javac put in the bytecode to implement the boxed type.
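
A typical case (a hypothetical example of ours) is a boxed accumulator; each iteration
below re-boxes the sum (via Integer.valueOf) at the bytecode level, hidden allocations
that escape analysis can often remove entirely:

static int sumBoxed(int n) {
    Integer sum = 0;            // java.lang.Integer, not a primitive int
    for (int i = 0; i < n; i++) {
        sum = sum + i;          // unbox, add, re-box on every iteration
    }
    return sum;                 // final unbox
}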


Controlling code generation in JRockit

JRockit is designed to work well out of the box, and it is generally discouraged to
play around too much with its command-line options. Changing the behavior of
code generation and optimization is no exception. This section exists mostly for
informational purposes and the user should be aware of possible unwanted side
effects that can arise from changing the behavior of the code generators.
This section applies mainly to versions of JRockit from R28 onwards.
For earlier versions of JRockit, please consult the JRockit documentation
for equivalent ways of doing the same thing. Note that not all R28
functionality has equivalents in earlier versions of JRockit.

Command-line flags and directive files

In the rare case that the code generator causes problems in JRockit, or an application
behaves strangely or erroneously, or it just takes too long to optimize a particular
method, the behavior of the JRockit code generator can be altered and controlled.
Naturally, if you know what you are doing, code generation can be controlled for
other purposes as well.

Command-line flags

JRockit has several command-line flags that control code generation behavior in
a coarse-grained way. For the purpose of this text, we will only mention a few.

Logging

The -Xverbose:codegen (and -Xverbose:opt) options make JRockit output two
lines of information per JIT compiled (or optimized) method to stderr.
Consider the output for a simple HelloWorld program. Every code generation event
produces two lines in the log, one when it starts and one when it finishes.
hastur:material marcus$ java -Xverbose:codegen HelloWorld
[INFO ][codegen][00004] #1 (Normal) jrockit/vm/RNI.transitToJava(I)V
[INFO ][codegen][00004] #1 0.027-0.027 0x9e5c0000-0x9e5c0023 0.14 ms (0.00 ms)
[INFO ][codegen][00004] #2 (Normal) jrockit/vm/RNI.transitToJavaFromDbgEvent(I)V
[INFO ][codegen][00004] #2 0.027-0.027 0x9e5c0040-0x9e5c0063 0.03 ms (0.00 ms)
[INFO ][codegen][00004] #3 (Normal) jrockit/vm/RNI.debuggerEvent()V
[INFO ][codegen][00004] #3 0.027-0.027 0x9e5c0080-0x9e5c0131 0.40 ms 64KB 0 bc/s (0.40 ms 0 bc/s)
[INFO ][codegen][00004] #4 (Normal) jrockit/vm/ExceptionHandler.enterExceptionHandler()Ljava/lang/Throwable;
[INFO ][codegen][00004] #4 0.027-0.028 0x9e5c0140-0x9e5c01ff 0.34 ms 64KB 0 bc/s (0.74 ms 0 bc/s)
[INFO ][codegen][00004] #5 (Normal) jrockit/vm/ExceptionHandler.gotoHandler()V
[INFO ][codegen][00004] #5 0.028-0.028 0x9e5c0200-0x9e5c025c 0.02 ms (0.74 ms)
...
[INFO ][codegen][00044] #1149 (Normal) java/lang/Shutdown.runHooks()V
[INFO ][codegen][00044] #1149 0.347-0.348 0x9e3b4040-0x9e3b4106 0.26 ms 128KB 219584 bc/s (270.77 ms 215775 bc/s)
hastur:material marcus$

The first log line of a code generation request (event start) contains the following
information from left to right:
•	Info tag and log module identifier (code generator).
•	The thread ID of the thread generating the code: Depending on the system configuration, there can be more than one code generator thread and more than one code optimizer thread.
•	The index of the generated method: The first method to be generated starts at index 1. As we notice, at the beginning of the output, code generation is single threaded, and the order between the start and end of a code generation event is maintained, forming consecutive entries. This doesn't have to be the case if multiple code generation and optimization threads are working.
•	The code generation strategy: The code generation strategy tells you how this particular method will be generated. As it is too early to have received runtime feedback information, all methods are generated using a normal code generator strategy, or even a quick one that is even sloppier. The quick strategy is applied for methods that are known to be of exceedingly little importance for the runtime performance. This can be, for example, static initializers that will run only once and thus make no sense to even register allocate properly.
•	The generated method: This is uniquely identified by class name, method name, and descriptor.


The second line of a code generation request (event end) contains the following
information from left to right:

•	Info tag and log module identifier (code generator).
•	The thread ID of the thread generating the code.
•	The index of the generated method.
•	Start and end time for the code generation event: This is measured in seconds from the start of the JVM.
•	The address range: This is where the resulting native code is placed in memory.
•	Code generation time: The number of milliseconds it took for the code generator to turn this particular method into machine language (starting from bytecode).
•	Maximum amount of thread local memory used: This is the maximum amount of memory that the code generator thread needed to allocate in order to generate the method.
•	Average number of bytecodes per second: The number of bytecodes processed per second for this method. 0 should be interpreted as infinity; the precision was not good enough.
•	Total code generation time: The total number of milliseconds this thread has spent in code generation since JVM startup, and the average number of bytecodes compiled per second for the thread so far.

Turning off optimizations

The command-line flag -XnoOpt, or -XX:DisableOptsAfter=
