<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.9.3">Jekyll</generator><link href="https://blog.ludovic.dev/feed.xml" rel="self" type="application/atom+xml" /><link href="https://blog.ludovic.dev/" rel="alternate" type="text/html" /><updated>2023-11-19T14:38:43+00:00</updated><id>https://blog.ludovic.dev/feed.xml</id><title type="html">Ludovic Henry</title><author><name>Ludovic Henry</name></author><entry><title type="html">QEMU and Github Actions</title><link href="https://blog.ludovic.dev/2023/11/19/qemu-and-gha.html" rel="alternate" type="text/html" title="QEMU and Github Actions" /><published>2023-11-19T00:00:00+00:00</published><updated>2023-11-19T00:00:00+00:00</updated><id>https://blog.ludovic.dev/2023/11/19/qemu-and-gha</id><content type="html" xml:base="https://blog.ludovic.dev/2023/11/19/qemu-and-gha.html">&lt;p&gt;GitHub Actions (GHA) is great! It’s free for public repositories (20 concurrent runners per organization), it has a rich ecosystem of actions, the runners are managed by the team at GitHub (updates and security fixes), it integrates with GitHub (releases, pull requests, issues, etc.), and it supports multiple platforms: Linux, Windows, and macOS on x86.&lt;/p&gt;

&lt;p&gt;However, the set of architectures is narrower than what your users may run on. For example, Java is available on aarch64, armhf, s390x (IBM mainframes), ppc64el, and riscv64. You could add self-hosted runners for each of these platforms, but you would lose most of the advantages of GitHub-provided runners.&lt;/p&gt;

&lt;p&gt;An alternative to self-hosted runners is to use the GitHub-provided runners with a twist: emulation 🫣 (not as scary as you think).&lt;/p&gt;

&lt;p&gt;Let’s look into what emulation is and how it works.&lt;/p&gt;

&lt;h1 id=&quot;emulation-with-qemu&quot;&gt;Emulation with QEMU&lt;/h1&gt;

&lt;p&gt;Emulation allows you to run an application compiled for one architecture (e.g., riscv64) on another architecture (e.g., x86). The emulator takes care of translating the riscv64 or s390x instructions into x86 instructions, allowing you to transparently run programs across architectures.&lt;/p&gt;

&lt;p&gt;The most commonly used emulation software in the Unix ecosystem is QEMU. It has become the Swiss Army knife of emulation, supporting most architectures (many of which I had never heard of), which makes it an essential tool when bringing up the software ecosystem of any new architecture.&lt;/p&gt;

&lt;p&gt;QEMU supports two modes of execution:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;System emulation: QEMU emulates a whole machine, on which you need to boot a Linux kernel. You then SSH into that machine to run your application. It is similar to launching a VM on your machine.&lt;/li&gt;
  &lt;li&gt;User-mode emulation: the kernel is the host’s own, and QEMU only emulates your application and its syscalls. Whenever your application makes a syscall, QEMU takes over and acts accordingly, calling into the host kernel where it makes sense and “faking” the syscall otherwise.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;User-mode emulation is the easiest to set up on CI: you launch your application just like any other process, and QEMU makes sure everything “just works”.&lt;/p&gt;

&lt;p&gt;So how do you run your application with QEMU? The most explicit way is to run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;qemu-riscv64-static myapplication&lt;/code&gt;. Let’s take the example of an OpenJDK build compiled for riscv64 and run it on an x86 machine (you can download one from Adoptium &lt;a href=&quot;https://ci.adoptium.net/job/build-scripts/job/jobs/job/evaluation/job/jobs/job/jdk21u/job/jdk21u-evaluation-linux-riscv64-temurin/lastSuccessfulBuild/artifact/workspace/target/OpenJDK21U-jdk_riscv64_linux_hotspot_2023-11-13-16-12.tar.gz&quot;&gt;here&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;If you try running it, you’ll get the following:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$&amp;gt; /path/to/jdk/bin/java -version
bash: exec format error: /path/to/jdk/bin/java
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;That makes sense: the binary targets riscv64, and we are trying to run it on x86. We can confirm it’s a riscv64 binary with:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$&amp;gt; file /path/to/jdk/bin/java
/path/to/jdk/bin/java: ELF 64-bit LSB pie executable, UCB RISC-V, RVC, double-float ABI, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-riscv64-lp64d.so.1, BuildID[sha1]=e4445fabaa78b36248d15f0e6a3652939c1f64c1, for GNU/Linux 4.15.0, stripped
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Notice the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ELF 64-bit LSB pie executable, UCB RISC-V&lt;/code&gt;.&lt;/p&gt;
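
&lt;p&gt;Here is a rough sketch of what &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;file&lt;/code&gt; relies on here (it is not its actual implementation). The architecture can be read straight from the ELF header:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EI_CLASS&lt;/code&gt; gives the word size and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EI_DATA&lt;/code&gt; gives the byte order.&lt;/li&gt;
  &lt;li&gt;The 16-bit &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;e_machine&lt;/code&gt; field at offset 18 identifies the instruction set (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0xf3&lt;/code&gt; is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EM_RISCV&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The helper names are hypothetical, and the table of machine values is deliberately incomplete:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# A few e_machine values from the ELF specification (incomplete, for illustration)
EM_NAMES = {
    0x03: 'x86',
    0x15: 'PowerPC64',
    0x16: 's390',
    0x3e: 'x86-64',
    0xb7: 'AArch64',
    0xf3: 'RISC-V',
}


def elf_machine(header):
    '''Return (bits, byte order, architecture) from the first 20 bytes of an ELF file.'''
    if header[:4] != b'\x7fELF':
        raise ValueError('not an ELF file')
    bits = {1: 32, 2: 64}[header[4]]            # e_ident[EI_CLASS]
    order = {1: 'little', 2: 'big'}[header[5]]  # e_ident[EI_DATA]
    # e_machine sits at offset 18 in both ELF32 and ELF64 headers
    machine = int.from_bytes(header[18:20], order)
    return bits, order, EM_NAMES.get(machine, hex(machine))


def file_machine(path):
    with open(path, 'rb') as f:
        return elf_machine(f.read(20))


# A synthetic header for a little-endian 64-bit RISC-V executable
header = bytes.fromhex('7f454c460201010000000000000000000200f300')
print(elf_machine(header))  # (64, 'little', 'RISC-V')
# On the real binary: print(file_machine('/path/to/jdk/bin/java'))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;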

&lt;p&gt;Now, let’s try running it with QEMU as I mentioned before. First, install QEMU’s user-mode emulators:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$&amp;gt; sudo apt install qemu-user-static
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then, run:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$&amp;gt; qemu-riscv64-static /path/to/jdk/bin/java -version
qemu-riscv64-static: Could not open '/lib/ld-linux-riscv64-lp64d.so.1': No such file or directory
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Damn, what’s happening here? Well, QEMU only knows how to translate machine code from riscv64 to x86; it doesn’t know how to load shared libraries. That’s the role of the dynamic linker, the “interpreter” listed in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;file&lt;/code&gt; output above. QEMU is looking for this dynamic linker at &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/lib/ld-linux-riscv64-lp64d.so.1&lt;/code&gt;, but it can’t find it. That makes sense: there is no such file on an x86 machine.&lt;/p&gt;
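
&lt;p&gt;QEMU doesn’t pick that path out of thin air. It comes from the binary’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PT_INTERP&lt;/code&gt; program header, which holds the same “interpreter” that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;file&lt;/code&gt; printed. Below is a minimal sketch of reading it, assuming a 64-bit ELF file. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;elf64_interpreter&lt;/code&gt; is a hypothetical helper, and 32-bit files are not handled:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;PT_INTERP = 3  # program header type holding the dynamic linker path


def elf64_interpreter(data):
    '''Return the PT_INTERP path of a 64-bit ELF image, or None if there is none.'''
    if data[:4] != b'\x7fELF' or data[4] != 2:
        raise ValueError('not a 64-bit ELF file')
    order = {1: 'little', 2: 'big'}[data[5]]

    def field(offset, size):
        return int.from_bytes(data[offset:offset + size], order)

    e_phoff = field(0x20, 8)      # file offset of the program header table
    e_phentsize = field(0x36, 2)  # size of one program header (56 for ELF64)
    e_phnum = field(0x38, 2)      # number of program headers
    for i in range(e_phnum):
        ph = e_phoff + i * e_phentsize
        if field(ph, 4) == PT_INTERP:     # p_type
            p_offset = field(ph + 8, 8)   # where the path is stored in the file
            p_filesz = field(ph + 32, 8)  # its length, including the trailing NUL
            return data[p_offset:p_offset + p_filesz].rstrip(b'\x00').decode()
    return None


# Usage (should print the same path as the interpreter reported by file):
# with open('/path/to/jdk/bin/java', 'rb') as f:
#     print(elf64_interpreter(f.read()))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;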

&lt;p&gt;So where can we find it and how can we tell QEMU where to find it? Via a sysroot and environment variables of course!&lt;/p&gt;

&lt;p&gt;(Please don’t run away! It’s easier to set up than you think, I promise!)&lt;/p&gt;

&lt;h1 id=&quot;the-sysroot&quot;&gt;The sysroot&lt;/h1&gt;

&lt;p&gt;First, let’s set up a sysroot using debootstrap. Start by installing it:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$&amp;gt; sudo apt install debootstrap
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then create a sysroot for riscv64 with the stock Ubuntu 22.04 content:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$&amp;gt; sudo debootstrap --arch=riscv64 --verbose --resolve-deps --components=main,universe jammy sysroot
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If you check what’s in that sysroot folder, you’ll find everything a fresh install of Ubuntu 22.04 has:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$&amp;gt; ls -alh sysroot
total 68K
drwxr-xr-x 17 root	root	4.0K Nov 13 16:25 .
drwxr-xr-x 40 ludovic ludovic 4.0K Nov 13 17:04 ..
lrwxrwxrwx  1 root	root   	7 Nov 13 16:20 bin -&amp;gt; usr/bin
drwxr-xr-x  2 root	root	4.0K Oct  9 22:54 boot
drwxr-xr-x  4 root	root	4.0K Nov 13 16:20 dev
drwxr-xr-x 68 root	root	4.0K Nov 13 16:25 etc
drwxr-xr-x  2 root	root	4.0K Oct  9 22:54 home
lrwxrwxrwx  1 root	root   	7 Nov 13 16:20 lib -&amp;gt; usr/lib
drwxr-xr-x  2 root	root	4.0K Nov 13 16:20 media
drwxr-xr-x  2 root	root	4.0K Nov 13 16:20 mnt
drwxr-xr-x  2 root	root	4.0K Nov 13 16:20 opt
drwxr-xr-x  2 root	root	4.0K Oct  9 22:54 proc
drwx------  3 root	root	4.0K Nov 13 16:21 root
drwxr-xr-x 10 root	root	4.0K Nov 13 16:24 run
lrwxrwxrwx  1 root	root   	8 Nov 13 16:20 sbin -&amp;gt; usr/sbin
drwxr-xr-x  2 root	root	4.0K Nov 13 16:20 srv
drwxr-xr-x  2 root	root	4.0K Oct  9 22:54 sys
drwxrwxrwt  3 root	root	4.0K Nov 13 16:25 tmp
drwxr-xr-x 11 root	root	4.0K Nov 13 16:20 usr
drwxr-xr-x 11 root	root	4.0K Nov 13 16:20 var
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;But you’ll also notice that everything in there is riscv64:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$&amp;gt; file sysroot/bin/bash
sysroot/bin/bash: ELF 64-bit LSB pie executable, UCB RISC-V, RVC, double-float ABI, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-riscv64-lp64d.so.1, BuildID[sha1]=7ed4e703c21cd514edcf8100a05580e75e174735, for GNU/Linux 4.15.0, stripped
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

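&lt;p&gt;Now, how do we use that sysroot to run our riscv64 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;java&lt;/code&gt; binary? Simply tell QEMU where to load files such as the dynamic linker and shared libraries from, with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;QEMU_LD_PREFIX&lt;/code&gt;:&lt;/p&gt;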
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$&amp;gt; QEMU_LD_PREFIX=sysroot qemu-riscv64-static /path/to/jdk/bin/java -version
openjdk version &quot;21&quot; 2023-09-19
OpenJDK Runtime Environment (build 21+35-Ubuntu-1)
OpenJDK 64-Bit Server VM (build 21+35-Ubuntu-1, mixed mode, sharing)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
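
&lt;p&gt;Conceptually, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;QEMU_LD_PREFIX&lt;/code&gt; acts like a lightweight chroot for the files QEMU looks up on behalf of the guest program. An absolute path is first looked up under the prefix, and the host path is used if nothing is found there. The sketch below is an approximation of that behavior (the real logic lives in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;util/path.c&lt;/code&gt; in the QEMU sources), not a copy of it. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;qemu_path&lt;/code&gt; is a hypothetical name:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import os


def qemu_path(prefix, name):
    '''Approximate how QEMU_LD_PREFIX redirects a guest file lookup.'''
    # Only absolute paths are redirected, and only when a prefix is set
    if not prefix or not name.startswith('/'):
        return name
    candidate = os.path.join(prefix, name.lstrip('/'))
    # Prefer the file from the sysroot, fall back to the host file otherwise
    return candidate if os.path.exists(candidate) else name


# With QEMU_LD_PREFIX=sysroot, the dynamic linker requested by the riscv64
# java binary resolves to the riscv64 copy installed by debootstrap
print(qemu_path('sysroot', '/lib/ld-linux-riscv64-lp64d.so.1'))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;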

&lt;p&gt;And it works! Congratulations, you have a riscv64 binary running on x86. Isn’t technology incredible? Go ahead, try it out and run a more complex workload. I frequently run the whole of the OpenJDK or larger applications like Apache Spark, and it works (mostly) flawlessly.&lt;/p&gt;

&lt;p&gt;OK, prefixing every command with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;qemu-riscv64-static&lt;/code&gt; is a bit tedious. And how does it even work when the process spawns children that execute other riscv64 binaries? Well, it doesn’t out of the box. Actually, it does once you install qemu-user-static, because it’s smart, but let’s figure out exactly how.&lt;/p&gt;

&lt;h1 id=&quot;binary-format&quot;&gt;Binary format&lt;/h1&gt;

&lt;p&gt;First, let me show you some magic. Go ahead, try the following:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$&amp;gt; QEMU_LD_PREFIX=sysroot /path/to/jdk/bin/java -version
openjdk version &quot;21&quot; 2023-09-19
OpenJDK Runtime Environment (build 21+35-Ubuntu-1)
OpenJDK 64-Bit Server VM (build 21+35-Ubuntu-1, mixed mode, sharing)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Wait! It Just Works™?? What’s that magic!? Welcome to the wonderful world of binfmt (binary format).&lt;/p&gt;

&lt;p&gt;The kernel knows how to load a variety of formats: ELF, a.out, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;#!&lt;/code&gt; scripts, among others. But it’s not feasible for the kernel to know every executable format out there, especially as you can get pretty creative. Want to execute a JAR file or a Python script as a plain old executable, without prefixing it with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;java&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;python&lt;/code&gt;? Of course you can do that!&lt;/p&gt;

&lt;p&gt;The &lt;a href=&quot;https://en.m.wikipedia.org/wiki/Binfmt_misc&quot;&gt;binfmt_misc&lt;/a&gt; mechanism lets you add support for these additional formats. It allows you to register with the kernel an “interpreter” for “arbitrary executable file formats to be recognized and passed to certain user space applications, such as emulators and virtual machines.” (Read the &lt;a href=&quot;https://lwn.net/Articles/630727/&quot;&gt;How programs get run&lt;/a&gt; and &lt;a href=&quot;https://lwn.net/Articles/631631/&quot;&gt;How programs get run: ELF binaries&lt;/a&gt; articles on LWN by David Drysdale for in-depth details.)&lt;/p&gt;
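
&lt;p&gt;Registering a format boils down to writing one line to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/proc/sys/fs/binfmt_misc/register&lt;/code&gt;, in the kernel’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;:name:type:offset:magic:mask:interpreter:flags&lt;/code&gt; format. Installing qemu-user-static takes care of this for you.&lt;/p&gt;
&lt;p&gt;The sketch below only builds such a line; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;binfmt_register_line&lt;/code&gt; is a hypothetical helper. The example values are the riscv64 ones from the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;qemu-riscv64&lt;/code&gt; entry on this machine, listed further down:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;def binfmt_register_line(name, magic_hex, mask_hex, interpreter, flags='', offset=0):
    '''Build a binfmt_misc registration line: :name:type:offset:magic:mask:interpreter:flags'''
    def escape(hex_str):
        # Every byte is written as \xHH, so a 0x3a byte cannot be mistaken for the ':' separator
        return ''.join('\\x' + hex_str[i:i + 2] for i in range(0, len(hex_str), 2))

    # Type M: recognize the format by magic bytes (E would match a file extension)
    return ':'.join(['', name, 'M', str(offset), escape(magic_hex), escape(mask_hex),
                     interpreter, flags])


line = binfmt_register_line(
    'qemu-riscv64',
    '7f454c460201010000000000000000000200f300',
    'ffffffffffffff00fffffffffffffffffeffffff',
    '/usr/libexec/qemu-binfmt/riscv64-binfmt-P',
    'POCF')
print(line)
# Writing this line (as root) to /proc/sys/fs/binfmt_misc/register would register
# a new entry with that name; this sketch deliberately does not touch /proc.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;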

&lt;p&gt;You can find the registered interpreters at &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/proc/sys/fs/binfmt_misc&lt;/code&gt;. On my machine I have:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$&amp;gt; ls -alh /proc/sys/fs/binfmt_misc
total 0
drwxr-xr-x 2 root root 0 Oct 24 12:26 .
dr-xr-xr-x 1 root root 0 Oct 24 12:26 ..
-rw-r--r-- 1 root root 0 Oct 24 12:26 jar
-rw-r--r-- 1 root root 0 Oct 24 12:26 llvm-10-runtime.binfmt
-rw-r--r-- 1 root root 0 Oct 24 12:26 llvm-11-runtime.binfmt
-rw-r--r-- 1 root root 0 Oct 24 12:26 llvm-14-runtime.binfmt
-rw-r--r-- 1 root root 0 Oct 24 12:26 python3.10
-rw-r--r-- 1 root root 0 Nov 13 17:06 qemu-aarch64
-rw-r--r-- 1 root root 0 Nov 13 17:06 qemu-alpha
-rw-r--r-- 1 root root 0 Nov 13 17:06 qemu-arm
-rw-r--r-- 1 root root 0 Nov 13 17:06 qemu-armeb
-rw-r--r-- 1 root root 0 Nov 13 17:06 qemu-cris
-rw-r--r-- 1 root root 0 Nov 13 17:06 qemu-hexagon
-rw-r--r-- 1 root root 0 Nov 13 17:06 qemu-hppa
-rw-r--r-- 1 root root 0 Nov 13 17:06 qemu-m68k
-rw-r--r-- 1 root root 0 Nov 13 17:06 qemu-microblaze
-rw-r--r-- 1 root root 0 Nov 13 17:06 qemu-mips
-rw-r--r-- 1 root root 0 Nov 13 17:06 qemu-mips64
-rw-r--r-- 1 root root 0 Nov 13 17:06 qemu-mips64el
-rw-r--r-- 1 root root 0 Nov 13 17:06 qemu-mipsel
-rw-r--r-- 1 root root 0 Nov 13 17:06 qemu-mipsn32
-rw-r--r-- 1 root root 0 Nov 13 17:06 qemu-mipsn32el
-rw-r--r-- 1 root root 0 Nov 13 17:06 qemu-ppc
-rw-r--r-- 1 root root 0 Nov 13 17:06 qemu-ppc64
-rw-r--r-- 1 root root 0 Nov 13 17:06 qemu-ppc64le
-rw-r--r-- 1 root root 0 Nov 13 17:06 qemu-riscv32
-rw-r--r-- 1 root root 0 Nov 13 16:08 qemu-riscv64
-rw-r--r-- 1 root root 0 Nov 13 17:06 qemu-s390x
-rw-r--r-- 1 root root 0 Nov 13 17:06 qemu-sh4
-rw-r--r-- 1 root root 0 Nov 13 17:06 qemu-sh4eb
-rw-r--r-- 1 root root 0 Nov 13 17:06 qemu-sparc
-rw-r--r-- 1 root root 0 Nov 13 17:06 qemu-sparc32plus
-rw-r--r-- 1 root root 0 Nov 13 17:06 qemu-sparc64
-rw-r--r-- 1 root root 0 Nov 13 17:06 qemu-xtensa
-rw-r--r-- 1 root root 0 Nov 13 17:06 qemu-xtensaeb
--w------- 1 root root 0 Nov 13 17:06 register
-rw-r--r-- 1 root root 0 Nov 13 17:06 status
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Let’s look at the interpreter for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;qemu-riscv64&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$&amp;gt; cat /proc/sys/fs/binfmt_misc/qemu-riscv64
enabled
interpreter /usr/libexec/qemu-binfmt/riscv64-binfmt-P
flags: POCF
offset 0
magic 7f454c460201010000000000000000000200f300
mask ffffffffffffff00fffffffffffffffffeffffff
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
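
&lt;p&gt;To make the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;magic&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mask&lt;/code&gt; fields concrete, here is a rough Python model of the test binfmt_misc applies. The kernel’s real implementation lives in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fs/binfmt_misc.c&lt;/code&gt;; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;parse_binfmt_entry&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;matches&lt;/code&gt; are hypothetical helpers written for this illustration.&lt;/p&gt;
&lt;p&gt;A byte only has to match where the corresponding mask bits are set. That is why:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fe&lt;/code&gt; at offset 16 accepts both &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ET_EXEC&lt;/code&gt; (2) and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ET_DYN&lt;/code&gt; (3) executables, such as the PIE &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;java&lt;/code&gt; binary;&lt;/li&gt;
  &lt;li&gt;the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;00&lt;/code&gt; at offset 7 ignores the OS ABI byte.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;from operator import and_


def parse_binfmt_entry(text):
    '''Parse the content of /proc/sys/fs/binfmt_misc/NAME into a dict.'''
    entry = {}
    for line in text.strip().splitlines():
        if line in ('enabled', 'disabled'):
            entry['enabled'] = line == 'enabled'
            continue
        key, _, value = line.partition(' ')
        entry[key.rstrip(':')] = value  # 'flags:' is printed with a colon
    return entry


def matches(entry, header):
    '''Model of the magic test: bytes must agree wherever the mask has bits set.'''
    offset = int(entry.get('offset', '0'))
    magic = bytes.fromhex(entry['magic'])
    mask = bytes.fromhex(entry['mask']) if 'mask' in entry else b'\xff' * len(magic)
    window = header[offset:offset + len(magic)]
    if len(window) != len(magic):
        return False
    return all(and_(b ^ m, k) == 0 for b, m, k in zip(window, magic, mask))


ENTRY = '''enabled
interpreter /usr/libexec/qemu-binfmt/riscv64-binfmt-P
flags: POCF
offset 0
magic 7f454c460201010000000000000000000200f300
mask ffffffffffffff00fffffffffffffffffeffffff'''

entry = parse_binfmt_entry(ENTRY)
# A riscv64 PIE (e_type 3, like the java binary) and an x86-64 PIE (e_machine 0x3e)
riscv64_pie = bytes.fromhex('7f454c460201010000000000000000000300f300')
x86_64_pie = bytes.fromhex('7f454c4602010100000000000000000003003e00')
print(matches(entry, riscv64_pie), matches(entry, x86_64_pie))  # True False
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;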

&lt;p&gt;Here is what we can find:&lt;/p&gt;
&lt;ul&gt;
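  &lt;li&gt;The magic bytes and mask that identify a riscv64 ELF executable, matched at offset 0 of the file&lt;/li&gt;
  &lt;li&gt;The path to the qemu-riscv64-static executable (here under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/usr/libexec/qemu-binfmt&lt;/code&gt;)&lt;/li&gt;
  &lt;li&gt;Some flags (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;POCF&lt;/code&gt;) that control how the interpreter is invoked&lt;/li&gt;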
&lt;/ul&gt;
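
&lt;p&gt;The four flags in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;POCF&lt;/code&gt; are described in the kernel’s binfmt-misc documentation. The summary below paraphrases them; the dictionary and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;describe_flags&lt;/code&gt; are just for illustration.&lt;/p&gt;
&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;F&lt;/code&gt; flag is worth a note. The interpreter is opened when the entry is registered, so it stays usable even from chroots or containers that don’t contain a copy of QEMU.&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# binfmt_misc flags, paraphrased from the kernel's binfmt-misc documentation
BINFMT_FLAGS = {
    'P': 'preserve-argv[0]: pass the original argv[0] as an extra argument',
    'O': 'open-binary: give the interpreter an open file descriptor for the binary',
    'C': 'credentials: compute credentials (e.g. setuid) from the binary, not the interpreter; implies O',
    'F': 'fix-binary: open the interpreter at registration time, so it stays available in chroots and containers',
}


def describe_flags(flags):
    return [flag + ': ' + BINFMT_FLAGS.get(flag, 'unknown flag') for flag in flags]


for line in describe_flags('POCF'):
    print(line)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;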

&lt;p&gt;Let’s recap how it works:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;In your shell, you launch your &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;java&lt;/code&gt; executable compiled for riscv64&lt;/li&gt;
  &lt;li&gt;The shell calls &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;execve&lt;/code&gt; with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;java&lt;/code&gt; executable as argument&lt;/li&gt;
  &lt;li&gt;The kernel compares the first bytes of the riscv64 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;java&lt;/code&gt; executable against the magic numbers (and masks) of the registered formats&lt;/li&gt;
  &lt;li&gt;They match the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;qemu-riscv64&lt;/code&gt; entry&lt;/li&gt;
  &lt;li&gt;The kernel invokes the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;qemu-riscv64&lt;/code&gt; interpreter to “interpret” the riscv64 executable&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;qemu-riscv64&lt;/code&gt; then starts translating the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;java&lt;/code&gt; executable from riscv64 machine code to x86, and executes that newly generated x86 code&lt;/li&gt;
&lt;/ol&gt;
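
&lt;p&gt;The recap above can be modeled in a few lines of Python. This is a simplified sketch under stated assumptions:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;it only handles magic-based entries at offset 0;&lt;/li&gt;
  &lt;li&gt;it only handles the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;P&lt;/code&gt; flag and ignores &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;O&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;F&lt;/code&gt;;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;find_entry&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rewrite_execve&lt;/code&gt; are hypothetical names.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As the kernel documentation describes it, with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;P&lt;/code&gt; set the interpreter receives both the full path to the binary and the original &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;argv[0]&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;from operator import and_


def find_entry(entries, header):
    '''Return the first entry whose magic matches under its mask (offset 0 assumed).'''
    for entry in entries:
        magic, mask = entry['magic'], entry['mask']
        window = header[:len(magic)]
        if len(window) == len(magic) and all(
                and_(b ^ m, k) == 0 for b, m, k in zip(window, magic, mask)):
            return entry
    return None


def rewrite_execve(entries, path, argv, header):
    '''Return the argv the interpreter is started with, or None when no entry
    matches (execve then fails with ENOEXEC, the exec format error seen earlier).'''
    entry = find_entry(entries, header)
    if entry is None:
        return None
    if 'P' in entry['flags']:
        # preserve-argv[0]: interpreter, path to the binary, original argv[0], args
        return [entry['interpreter'], path] + argv
    # Legacy behavior: the original argv[0] is dropped in favor of the path
    return [entry['interpreter'], path] + argv[1:]


RISCV64 = {
    'interpreter': '/usr/libexec/qemu-binfmt/riscv64-binfmt-P',
    'flags': 'POCF',
    'magic': bytes.fromhex('7f454c460201010000000000000000000200f300'),
    'mask': bytes.fromhex('ffffffffffffff00fffffffffffffffffeffffff'),
}
java_header = bytes.fromhex('7f454c460201010000000000000000000300f300')
print(rewrite_execve([RISCV64], '/path/to/jdk/bin/java', ['java', '-version'], java_header))
# ['/usr/libexec/qemu-binfmt/riscv64-binfmt-P', '/path/to/jdk/bin/java', 'java', '-version']
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;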

&lt;p&gt;And that’s it! That’s how riscv64 assembly is executed transparently on x86 machines with QEMU.&lt;/p&gt;

&lt;h1 id=&quot;facilitating-packaging&quot;&gt;Facilitating packaging&lt;/h1&gt;

&lt;p&gt;The main remaining issue is that I still need to set up a sysroot, which is cumbersome. If only we had a mechanism to ship filesystems around, packaging everything we need, and simply run them.&lt;/p&gt;

&lt;p&gt;Docker of course! (When is it not a solution?)&lt;/p&gt;

&lt;p&gt;The easiest way is the following (assuming &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;qemu-user-static&lt;/code&gt; is already set up):&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$&amp;gt; docker run --rm -it --platform linux/riscv64 riscv64/ubuntu:23.04
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And with that, you have a riscv64 Ubuntu 23.04 container running on your x86 machine 🤯. Isn’t that amazing?&lt;/p&gt;

&lt;p&gt;Go ahead, try it. Run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;uname -m&lt;/code&gt; for fun. Or even install a package of your choice with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;apt install&lt;/code&gt;; it just works! (If it doesn’t, let me know: it’s a bug.)&lt;/p&gt;

&lt;p&gt;Do you need to do all that on your machine, all of it by hand? Luckily, no, especially on GHA, where a bunch of actions are already available. Let’s explore some of them in the next post.&lt;/p&gt;

&lt;h1 id=&quot;next&quot;&gt;Next&lt;/h1&gt;

&lt;p&gt;In the next post, let’s look at how to use all of this on GHA to build and test your projects on RISC-V. (Link incoming once posted.)&lt;/p&gt;</content><author><name>Ludovic Henry</name></author><summary type="html">GitHub Actions (GHA) is great! It’s free for public repositories (20 concurrent runners per organization), it has a rich ecosystem of actions, the runners are managed by the team at GitHub (updates and security fixes), it integrates with GitHub (releases, pull requests, issues, etc.), and it supports multiple platforms: Linux, Windows, and macOS on x86.</summary></entry><entry><title type="html">Differences in Calling Conventions</title><link href="https://blog.ludovic.dev/2020/09/14/differences-in-calling-conventions.html" rel="alternate" type="text/html" title="Differences in Calling Conventions" /><published>2020-09-14T00:00:00+00:00</published><updated>2020-09-14T00:00:00+00:00</updated><id>https://blog.ludovic.dev/2020/09/14/differences-in-calling-conventions</id><content type="html" xml:base="https://blog.ludovic.dev/2020/09/14/differences-in-calling-conventions.html">&lt;p&gt;&lt;em&gt;This is an installment in a &lt;a href=&quot;/2020/09/07/openjdk-on-aarch64.html&quot;&gt;series of posts&lt;/a&gt; that will highlight discoveries I am making as I add support for Windows-AArch64 and macOS-AArch64 to the OpenJDK.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In the &lt;a href=&quot;https://developer.apple.com/library/archive/documentation/Xcode/Conceptual/iPhoneOSABIReference/Articles/ARM64FunctionCallingConventions.html&quot;&gt;ARM64 Function Calling Convention&lt;/a&gt;, Apple describes where and how the macOS-AArch64 calling convention differs from the &lt;a href=&quot;https://developer.arm.com/documentation/ihi0055/b/&quot;&gt;official one&lt;/a&gt; used on Linux and Windows. This calling convention is part of the ABI, which you can read more about at &lt;a href=&quot;/2020/09/08/whats-an-abi-anyways.html&quot;&gt;What’s an ABI anyways?&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In the official calling convention, each argument passed on the stack occupies its own 8-byte-aligned slot. On macOS (and iOS), stack arguments are aligned only on their size and packed together. For example, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;int&lt;/code&gt; is 4 bytes wide and 4-byte aligned, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;short&lt;/code&gt; is 2 bytes wide and 2-byte aligned. That impacts any Java code calling into native code (into the VM or via JNI, for example). We can expose this difference with something as simple as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;java -version&lt;/code&gt;.&lt;/p&gt;
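
&lt;p&gt;To make the difference concrete, here is a small model of where arguments land once the eight general-purpose argument registers are used up and the remaining arguments go on the stack. It is a simplified sketch: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;stack_offsets&lt;/code&gt; is a hypothetical helper that only handles scalar and pointer arguments. It ignores structs, floating-point registers, and variadic functions (which Apple also treats differently):&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;def stack_offsets(sizes, apple):
    '''Byte offset of each stack-passed argument, given the argument sizes in bytes.

    Linux/Windows (AAPCS64): each argument gets its own 8-byte, 8-byte-aligned slot.
    macOS/iOS (Apple arm64): each argument is aligned only to its own size, so
    small arguments are packed next to each other.
    '''
    offsets, cursor = [], 0
    for size in sizes:
        align = size if apple else 8
        cursor = -(-cursor // align) * align  # round up to the alignment
        offsets.append(cursor)
        cursor += size if apple else 8
    return offsets


# Four arguments spilled to the stack: int, short, short, long
sizes = [4, 2, 2, 8]
print(stack_offsets(sizes, apple=False))  # [0, 8, 16, 24]
print(stack_offsets(sizes, apple=True))   # [0, 4, 6, 8]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;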

&lt;h2 id=&quot;the-symptoms&quot;&gt;The symptoms&lt;/h2&gt;

&lt;p&gt;This is what happens when I run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;java -version&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$&amp;gt; build/macosx-aarch64-server-slowdebug/jdk/bin/java -version
Error occurred during initialization of boot layer
java.lang.InternalError: DMH.invokeStatic=Lambda(a0:L,a1:L,a2:L,a3:L,a4:L,a5:L,a6:L)=&amp;gt;{
    t7:L=DirectMethodHandle.internalMemberName(a0:L);
    t8:L=MethodHandle.linkToStatic(a1:L,a2:L,a3:L,a4:L,a5:L,a6:L,t7:L);t8:L}
Caused by: java.lang.IllegalArgumentException: classData is only applicable for hidden classes
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;From a quick search in the OpenJDK source code for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;classData is only applicable for hidden classes&lt;/code&gt;, we find that the exception is thrown from &lt;a href=&quot;https://github.com/openjdk/jdk/blob/869b05169fdb3a1ac851b367a2284ca0c5bb4d7a/src/hotspot/share/prims/jvm.cpp#L1025&quot;&gt;src/hotspot/share/prims/jvm.cpp:1025&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Running with a debugger yields more information about the crash:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$&amp;gt; lldb -- build/macosx-aarch64-server-slowdebug/jdk/bin/java -version
(lldb) target create &quot;build/macosx-aarch64-server-slowdebug/jdk/bin/java&quot;
Current executable set to '/Users/luhenry/openjdk-jdk/build/macosx-aarch64-server-slowdebug/jdk/bin/java' (arm64).
(lldb) settings set -- target.run-args  &quot;-version&quot;
(lldb) b jvm.cpp:1025
Breakpoint 1: no locations (pending).
WARNING:  Unable to resolve breakpoint to any actual locations.
(lldb) r
Process 61939 launched: '/Users/luhenry/openjdk-jdk/build/macosx-aarch64-server-slowdebug/jdk/bin/java' (arm64)
1 location added to breakpoint 1
Process 61939 stopped
* thread #3, stop reason = breakpoint 1.1
    frame #0: 0x000000010604df0c libjvm.dylib`jvm_lookup_define_class(env=0x0000000100816ba8, lookup=0x000000017008c7a0, name=&quot;java/lang/invoke/LambdaForm$DMH&quot;, buf=0x000000010501b000, len=1212, pd=0x0000000000000000, init='\x01', flags=0, classData=0x000000000000000a, __the_thread__=0x0000000100816820) at jvm.cpp:1025:7
   1022   if (!is_hidden) {
   1023     // classData is only applicable for hidden classes
   1024     if (classData != NULL) {
-&amp;gt; 1025       THROW_MSG_0(vmSymbols::java_lang_IllegalArgumentException(), &quot;classData is only applicable for hidden classes&quot;);
   1026     }
   1027     if (is_nestmate) {
   1028       THROW_MSG_0(vmSymbols::java_lang_IllegalArgumentException(), &quot;dynamic nestmate is only applicable for hidden classes&quot;);
Target 0: (java) stopped.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We have &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;is_hidden = (flags &amp;amp; HIDDEN_CLASS) == HIDDEN_CLASS&lt;/code&gt;, which is false with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;flags = 0&lt;/code&gt;, the value received on the native side. However, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;classData&lt;/code&gt; has an unexpected value: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0xa&lt;/code&gt;. It is indeed non-NULL, which is why the VM throws an exception, but we expect either a NULL value or a valid pointer to a Java object. Here, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0xa&lt;/code&gt; is neither of those.&lt;/p&gt;

&lt;p&gt;Let’s backtrack a bit and figure out where these values come from.&lt;/p&gt;

&lt;p&gt;First, let’s take a look at the backtrace:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;(lldb) bt
* thread #3, stop reason = breakpoint 1.1
  * frame #0: 0x000000010604df0c libjvm.dylib`jvm_lookup_define_class(env=0x0000000100816ba8, lookup=0x000000017008c7a0, name=&quot;java/lang/invoke/LambdaForm$DMH&quot;, buf=0x000000010501b000, len=1212, pd=0x0000000000000000, init='\x01', flags=0, classData=0x000000000000000a, __the_thread__=0x0000000100816820) at jvm.cpp:1025:7
    frame #1: 0x000000010604dbb0 libjvm.dylib`::JVM_LookupDefineClass(env=0x0000000100816ba8, lookup=0x000000017008c7a0, name=&quot;java/lang/invoke/LambdaForm$DMH&quot;, buf=0x000000010501b000, len=1212, pd=0x0000000000000000, initialize='\x01', flags=0, classData=0x000000000000000a) at jvm.cpp:1139:10
    frame #2: 0x0000000100502cfc libjava.dylib`Java_java_lang_ClassLoader_defineClass0(env=0x0000000100816ba8, cls=0x000000017008c758, loader=0x0000000000000000, lookup=0x000000017008c7a0, name=0x000000017008c798, data=0x000000017008c790, offset=0, length=1212, pd=0x0000000000000000, initialize='\x01', flags=0, classData=0x000000000000000a) at ClassLoader.c:263:12
    frame #3: 0x0000000108080aa0
    frame #4: 0x000000010807bde0
[...]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We can see that the value of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;classData&lt;/code&gt; comes straight from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Java_java_lang_ClassLoader_defineClass0&lt;/code&gt;. Looking further into this function, we note that it is the native implementation of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;java.lang.ClassLoader.defineClass0&lt;/code&gt; (see &lt;a href=&quot;https://github.com/openjdk/jdk/blob/869b05169fdb3a1ac851b367a2284ca0c5bb4d7a/src/java.base/share/classes/java/lang/ClassLoader.java#L1134&quot;&gt;src/java.base/share/classes/java/lang/ClassLoader.java:1134&lt;/a&gt;).&lt;/p&gt;
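
&lt;p&gt;The native function name itself follows the JNI naming rules for the “short name” of a native method: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Java_&lt;/code&gt;, then the mangled fully-qualified class name, an underscore, and the mangled method name. Below is a rough sketch of that mangling. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jni_short_name&lt;/code&gt; is a hypothetical helper, and it does not handle the long names used for overloaded methods, which append &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;__&lt;/code&gt; and the mangled signature:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;def jni_mangle(name):
    '''Mangle a class or method name following the JNI naming rules (BMP characters only).'''
    out = []
    for c in name:
        if c in './':
            out.append('_')                # package separator
        elif c == '_':
            out.append('_1')
        elif c == ';':
            out.append('_2')
        elif c == '[':
            out.append('_3')
        elif c.isascii() and c.isalnum():
            out.append(c)
        else:
            out.append('_0%04x' % ord(c))  # other characters as _0xxxx, lowercase hex
    return ''.join(out)


def jni_short_name(class_name, method_name):
    return 'Java_' + jni_mangle(class_name) + '_' + jni_mangle(method_name)


print(jni_short_name('java.lang.ClassLoader', 'defineClass0'))
# Java_java_lang_ClassLoader_defineClass0
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;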

&lt;p&gt;Next, let’s verify what values Java is passing:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;--- a/src/java.base/share/classes/java/lang/System.java
+++ b/src/java.base/share/classes/java/lang/System.java
@@ -2190,6 +2190,7 @@ public final class System {
             }
             public Class&amp;lt;?&amp;gt; defineClass(ClassLoader loader, Class&amp;lt;?&amp;gt; lookup, String name, byte[] b, ProtectionDomain pd,
                                         boolean initialize, int flags, Object classData) {
+                System.err.println(&quot;ClassLoader.defineClass0(&quot; + loader + &quot;, &quot; + lookup + &quot;, &quot; + name + &quot;, &quot; + b + &quot;, &quot; + 0 + &quot;, &quot; + b.length + &quot;, &quot; + pd + &quot;, &quot; + initialize + &quot;, &quot; + flags + &quot;, &quot; + classData + &quot;)&quot;);
                 return ClassLoader.defineClass0(loader, lookup, name, b, 0, b.length, pd, initialize, flags, classData);
             }
             public Class&amp;lt;?&amp;gt; findBootstrapClassOrNull(ClassLoader cl, String name) {
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$&amp;gt; lldb -- build/macosx-aarch64-server-slowdebug/jdk/bin/java -version
(lldb) target create &quot;build/macosx-aarch64-server-slowdebug/jdk/bin/java&quot;
Current executable set to '/Users/luhenry/openjdk-jdk/build/macosx-aarch64-server-slowdebug/jdk/bin/java' (arm64).
(lldb) settings set -- target.run-args  &quot;-version&quot;
(lldb) b Java_java_lang_ClassLoader_defineClass0
Breakpoint 1: no locations (pending).
WARNING:  Unable to resolve breakpoint to any actual locations.
(lldb) r
Process 64011 launched: '/Users/luhenry/openjdk-jdk/build/macosx-aarch64-server-slowdebug/jdk/bin/java' (arm64)
1 location added to breakpoint 1
Process 64011 stopped
ClassLoader.defineClass0(null, class java.lang.invoke.LambdaForm, java/lang/invoke/LambdaForm$DMH, [B@7e0b37bc, 0, 1212, null, true, 10, [DMH.invokeStatic=Lambda(a0:L,a1:L,a2:L,a3:L,a4:L,a5:L,a6:L)=&amp;gt;{
    t7:L=DirectMethodHandle.internalMemberName(a0:L);
    t8:L=MethodHandle.linkToStatic(a1:L,a2:L,a3:L,a4:L,a5:L,a6:L,t7:L);t8:L}])
Process 64011 stopped
* thread #3, stop reason = breakpoint 1.1
    frame #0: 0x0000000100502bbc libjava.dylib`Java_java_lang_ClassLoader_defineClass0(env=0x0000000100816ba8, cls=0x000000017008c758, loader=0x0000000000000000, lookup=0x000000017008c7a0, name=0x000000017008c798, data=0x000000017008c790, offset=0, length=1212, pd=0x0000000000000000, initialize='\x01', flags=0, classData=0x000000000000000a) at ClassLoader.c:226:12
   223  {
   224      jbyte *body;
   225      char *utfName;
-&amp;gt; 226      jclass result = 0;
   227      char buf[128];
   228
   229      if (data == NULL) {
Target 0: (java) stopped.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Here is what we have learned so far:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;flags&lt;/code&gt; is equal to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;10&lt;/code&gt; in Java but &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0&lt;/code&gt; in native code.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;classData&lt;/code&gt; is a valid, non-NULL object in Java, but it is equal to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0xa&lt;/code&gt; (that is, 10, the value of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;flags&lt;/code&gt; on the Java side) in native code.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a classic example of a calling convention mismatch between the caller and the callee. On the one hand, the caller, respecting a specific ABI, puts the parameters in a pre-defined set of locations (registers or stack slots). On the other hand, the callee, respecting another ABI, expects the parameters to be passed in a different pre-defined set of locations.&lt;/p&gt;
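
&lt;p&gt;A small model reproduces what the debugger showed. As a simplification, it assumes that:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;the caller (the native-call code generated by HotSpot) lays out the stack-passed arguments following the Linux-AArch64 rules;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Java_java_lang_ClassLoader_defineClass0&lt;/code&gt;, compiled by Apple’s toolchain, reads them following the macOS rules.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The first eight arguments (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;env&lt;/code&gt; through &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;length&lt;/code&gt;) travel in registers and agree on both sides. Only &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pd&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;initialize&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;flags&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;classData&lt;/code&gt; go on the stack. The helper names and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;classData&lt;/code&gt; value are made up, and padding bytes are modeled as zeros:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;def stack_offsets(sizes, apple):
    '''Offsets of stack-passed arguments: 8-byte slots (Linux) or natural alignment (macOS).'''
    offsets, cursor = [], 0
    for size in sizes:
        align = size if apple else 8
        cursor = -(-cursor // align) * align  # round up to the alignment
        offsets.append(cursor)
        cursor += size if apple else 8
    return offsets


def write_args(values, sizes, offsets):
    '''Caller side: store each little-endian value at its offset (padding stays zero).'''
    stack = bytearray(max(offsets) + 8)
    for value, size, offset in zip(values, sizes, offsets):
        stack[offset:offset + size] = value.to_bytes(size, 'little')
    return bytes(stack)


def read_args(stack, sizes, offsets):
    '''Callee side: load each argument from where the callee believes it is.'''
    return [int.from_bytes(stack[offset:offset + size], 'little')
            for size, offset in zip(sizes, offsets)]


names = ['pd', 'initialize', 'flags', 'classData']
sizes = [8, 1, 4, 8]                  # jobject, jboolean, jint, jobject
values = [0x0, 0x1, 10, 0x17008c7b0]  # classData: a made-up non-NULL handle
caller = stack_offsets(sizes, apple=False)  # [0, 8, 16, 24]
callee = stack_offsets(sizes, apple=True)   # [0, 8, 12, 16]
stack = write_args(values, sizes, caller)
for name, value in zip(names, read_args(stack, sizes, callee)):
    print(name, hex(value))
# pd 0x0
# initialize 0x1
# flags 0x0
# classData 0xa
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The callee reads &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;flags&lt;/code&gt; from padding and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;classData&lt;/code&gt; from the slot where the caller stored &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;flags&lt;/code&gt;. That matches the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;flags=0&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;classData=0x000000000000000a&lt;/code&gt; seen in lldb.&lt;/p&gt;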

&lt;h2 id=&quot;understanding-the-difference&quot;&gt;Understanding the difference&lt;/h2&gt;

&lt;p&gt;Let’s visualize the differences between the calling conventions of Linux-AArch64 and macOS-AArch64.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Parameter&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;Size (bytes)&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;Linux-AArch64&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;macOS-AArch64&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;env&lt;/code&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;8&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r0&lt;/code&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r0&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cls&lt;/code&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;8&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r1&lt;/code&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r1&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;loader&lt;/code&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;8&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r2&lt;/code&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r2&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lookup&lt;/code&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;8&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r3&lt;/code&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r3&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;name&lt;/code&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;8&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r4&lt;/code&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r4&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data&lt;/code&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;8&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r5&lt;/code&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r5&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;offset&lt;/code&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;4&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r6&lt;/code&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r6&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;length&lt;/code&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;4&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r7&lt;/code&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r7&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pd&lt;/code&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;8&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sp+0&lt;/code&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sp+0&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;initialize&lt;/code&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;1&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sp+8&lt;/code&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sp+8&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;flags&lt;/code&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;4&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;&lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sp+16&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;&lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sp+12&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;classData&lt;/code&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;8&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;&lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sp+24&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;&lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sp+16&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

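&lt;p&gt;Notice the differences for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;flags&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;classData&lt;/code&gt;. Linux-AArch64 gives every stack-passed parameter its own 8-byte slot. In contrast, macOS-AArch64 packs stack-passed parameters according to their natural size and alignment. As a result, the 1-byte &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;initialize&lt;/code&gt; at &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sp+8&lt;/code&gt; is followed by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;flags&lt;/code&gt; at the next 4-byte boundary, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sp+12&lt;/code&gt;. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;classData&lt;/code&gt; then lands at the next 8-byte boundary, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sp+16&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Hotspot currently follows the Linux-AArch64 calling convention when it calls native code, while the native code follows the macOS-AArch64 calling convention.&lt;/p&gt;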
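&lt;p&gt;To make the difference concrete, here is a minimal C sketch (not Hotspot code). It computes the stack offsets of the four stack-passed parameters under both conventions. It rests on two assumptions, both consistent with the table above:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Linux-AArch64 gives each argument of at most 8 bytes its own 8-byte slot.&lt;/li&gt;
  &lt;li&gt;macOS-AArch64 places each argument at the next offset aligned to its own size.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The names &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;align_up&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;layout_stack_args&lt;/code&gt; are hypothetical.&lt;/p&gt;

```c
/* Rounds offset up to the next multiple of align (align must be non-zero). */
unsigned align_up(unsigned offset, unsigned align) {
    return (offset + align - 1) / align * align;
}

/*
 * Computes the sp-relative offset of each stack-passed argument.
 * sizes[i] is the size in bytes (at most 8) of argument i, and
 * offsets[i] receives its offset. Returns the number of bytes used,
 * before any final padding.
 */
unsigned layout_stack_args(const unsigned *sizes, unsigned *offsets,
                           unsigned count, int is_macos) {
    unsigned offset = 0;
    for (unsigned i = 0; i != count; i++) {
        if (is_macos) {
            /* macOS-AArch64 (assumed): pack, aligning each argument to its size. */
            offset = align_up(offset, sizes[i]);
            offsets[i] = offset;
            offset += sizes[i];
        } else {
            /* Linux-AArch64: every argument takes a full 8-byte slot. */
            offsets[i] = offset;
            offset += 8;
        }
    }
    return offset;
}
```

&lt;p&gt;The sizes for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pd&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;initialize&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;flags&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;classData&lt;/code&gt; are &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{8, 1, 4, 8}&lt;/code&gt;. With those sizes, the sketch yields offsets 0, 8, 16, 24 for Linux and 0, 8, 12, 16 for macOS, which matches the last two columns of the table.&lt;/p&gt;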

&lt;p&gt;Let’s map the stack at the time of the call. &lt;em&gt;(Note that the memory ordering is little-endian.)&lt;/em&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;                     Java         native
sp+28 | 10000000 |             
sp+24 | 022d1a9f | &amp;lt; classData
sp+20 | 00000000 |            
sp+16 | a0000000 | &amp;lt; flags      &amp;lt; classData
sp+12 | 00000000 |              &amp;lt; flags
sp+8  | 10000000 | &amp;lt; init       &amp;lt; init
sp+4  | 00000000 |            
sp+0  | 00000000 | &amp;lt; pd         &amp;lt; pd
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This clarifies why &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;flags&lt;/code&gt; is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;10&lt;/code&gt; in Java but &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0&lt;/code&gt; in native, and why &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;classData&lt;/code&gt; is a valid pointer in Java but &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0xa&lt;/code&gt; in native.&lt;/p&gt;
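&lt;p&gt;The same misread can be reproduced with a small C simulation. This is a sketch that treats the stack as a plain byte array:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The caller stores each parameter in an 8-byte little-endian slot (Linux-AArch64).&lt;/li&gt;
  &lt;li&gt;The callee reads &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;flags&lt;/code&gt; as 4 bytes at &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sp+12&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;classData&lt;/code&gt; as 8 bytes at &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sp+16&lt;/code&gt; (macOS-AArch64).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The helpers &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;store_le&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;load_le&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;simulate_mismatch&lt;/code&gt; are hypothetical. Any &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;classData&lt;/code&gt; value passed in is only a stand-in for a real pointer.&lt;/p&gt;

```c
/* Stores the low `size` bytes of `value` at stack[offset], little-endian. */
void store_le(unsigned char *stack, unsigned offset,
              unsigned long long value, unsigned size) {
    for (unsigned i = 0; i != size; i++) {
        stack[offset + i] = (unsigned char)(value % 256);
        value /= 256;
    }
}

/* Loads `size` bytes at stack[offset] as a little-endian integer. */
unsigned long long load_le(const unsigned char *stack, unsigned offset,
                           unsigned size) {
    unsigned long long value = 0;
    for (unsigned i = size; i != 0; i--) {
        value = value * 256 + stack[offset + i - 1];  /* most significant first */
    }
    return value;
}

/* What the native callee sees for flags and classData. */
typedef struct {
    unsigned long long flags;
    unsigned long long class_data;
} NativeView;

NativeView simulate_mismatch(unsigned flags, unsigned long long class_data) {
    unsigned char stack[32] = {0};

    /* Caller side (Hotspot): Linux-AArch64 layout, one 8-byte slot each. */
    store_le(stack, 0, 0, 8);           /* pd, NULL in this sketch */
    store_le(stack, 8, 1, 8);           /* initialize = true */
    store_le(stack, 16, flags, 8);      /* flags */
    store_le(stack, 24, class_data, 8); /* classData */

    /* Callee side (native): macOS-AArch64 layout. */
    NativeView view;
    view.flags = load_le(stack, 12, 4);       /* 4 bytes at sp+12 */
    view.class_data = load_le(stack, 16, 8);  /* 8 bytes at sp+16 */
    return view;
}
```

&lt;p&gt;With &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;flags&lt;/code&gt; set to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;10&lt;/code&gt;, the callee sees &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0&lt;/code&gt; for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;flags&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0xa&lt;/code&gt; for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;classData&lt;/code&gt;, the same values listed earlier.&lt;/p&gt;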

&lt;h2 id=&quot;how-to-fix-it&quot;&gt;How to fix it?&lt;/h2&gt;

&lt;p&gt;The fix is to teach Hotspot to use the macOS-AArch64 calling convention when running on macOS-AArch64.&lt;/p&gt;

&lt;p&gt;Luckily, there are only a few places in Hotspot that generate this transition from Java to native: the interpreter and the compiler. For technical and historical reasons, the code is not shared between the two, so we need to modify both for everything to work.&lt;/p&gt;

&lt;p&gt;In &lt;a href=&quot;https://github.com/openjdk/jdk/blob/869b05169fdb3a1ac851b367a2284ca0c5bb4d7a/src/hotspot/cpu/aarch64/interpreterRT_aarch64.cpp#L54-L93&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;InterpreterRuntime::SignatureHandlerGenerator::pass_int&lt;/code&gt;&lt;/a&gt;, we have the following:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;void InterpreterRuntime::SignatureHandlerGenerator::pass_int() {
  const Address src(from(), Interpreter::local_offset_in_bytes(offset()));

  switch (_num_int_args) {
  case 0:
    __ ldr(c_rarg1, src);
    _num_int_args++;
    break;
  case 1:
    __ ldr(c_rarg2, src);
    _num_int_args++;
    break;

[...]

  default: // for any parameter passed on the stack
    __ ldr(r0, src);
    __ str(r0, Address(to(), _stack_offset));
    _stack_offset += wordSize;
    _num_int_args++;
    break;
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The solution is to ensure that, on macOS, an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;int&lt;/code&gt; only consumes 4 bytes of the stack. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_stack_offset&lt;/code&gt; for the next parameter is then 4-byte aligned rather than 8-byte aligned.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;--- a/src/hotspot/cpu/aarch64/interpreterRT_aarch64.cpp
+++ b/src/hotspot/cpu/aarch64/interpreterRT_aarch64.cpp
@@ -86,7 +86,7 @@ void InterpreterRuntime::SignatureHandlerGenerator::pass_int() {
   default:
     __ ldr(r0, src);
     __ str(r0, Address(to(), _stack_offset));
-    _stack_offset += wordSize;
+    _stack_offset += MACOS_ONLY(4) NOT_MACOS(wordSize);
     _num_int_args++;
     break;
   }
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;However, we still need to ensure that 8-byte values (such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;long&lt;/code&gt;s, objects, or pointers in general) remain 8-byte aligned.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;--- a/src/hotspot/cpu/aarch64/interpreterRT_aarch64.cpp
+++ b/src/hotspot/cpu/aarch64/interpreterRT_aarch64.cpp
@@ -125,6 +125,7 @@ void InterpreterRuntime::SignatureHandlerGenerator::pass_long() {
     _num_int_args++;
     break;
   default:
+    _stack_offset = align_up(_stack_offset, 8);
     __ ldr(r0, src);
     __ str(r0, Address(to(), _stack_offset));
     _stack_offset += wordSize;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
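&lt;p&gt;The combined effect of both changes on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_stack_offset&lt;/code&gt; can be modelled in a few lines of C. This is a simplified sketch, not the actual Hotspot code, and it makes these assumptions:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;It only tracks stack offsets.&lt;/li&gt;
  &lt;li&gt;It treats the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;boolean&lt;/code&gt; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;initialize&lt;/code&gt; as an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;int&lt;/code&gt;-like parameter.&lt;/li&gt;
  &lt;li&gt;Object parameters such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pd&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;classData&lt;/code&gt; receive the same 8-byte alignment fix as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;long&lt;/code&gt;s.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The names &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Kind&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;assign_stack_offsets&lt;/code&gt; are hypothetical.&lt;/p&gt;

```c
/* Kinds of stack-passed parameters tracked by this sketch. */
typedef enum {
    KIND_INT,  /* int-like: int, boolean, ... (handled like pass_int) */
    KIND_LONG  /* 8-byte values: long and, by assumption, objects */
} Kind;

/* Rounds offset up to the next multiple of align (align must be non-zero). */
unsigned align_up(unsigned offset, unsigned align) {
    return (offset + align - 1) / align * align;
}

/* Assigns stack offsets the way the patched generator would: on macOS an
 * int-like parameter advances the offset by 4 instead of wordSize, and an
 * 8-byte parameter first realigns the offset to 8 (a no-op elsewhere). */
void assign_stack_offsets(const Kind *kinds, unsigned *offsets,
                          unsigned count, int is_macos) {
    const unsigned word_size = 8;  /* wordSize on AArch64 */
    unsigned stack_offset = 0;     /* plays the role of _stack_offset */
    for (unsigned i = 0; i != count; i++) {
        if (kinds[i] == KIND_INT) {
            offsets[i] = stack_offset;
            stack_offset += is_macos ? 4 : word_size;
        } else {
            stack_offset = align_up(stack_offset, 8);
            offsets[i] = stack_offset;
            stack_offset += word_size;
        }
    }
}
```

&lt;p&gt;Take the parameter kinds &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LONG, INT, INT, LONG&lt;/code&gt; (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pd&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;initialize&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;flags&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;classData&lt;/code&gt;). For these, the sketch produces offsets 0, 8, 12, 16 on macOS and 0, 8, 16, 24 elsewhere, matching the table.&lt;/p&gt;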

&lt;p&gt;With these fixes, and a few similar ones, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;java -version&lt;/code&gt; now runs successfully:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$&amp;gt; build/macosx-aarch64-server-slowdebug/jdk/bin/java -version
openjdk version &quot;16-internal&quot; 2021-03-16
OpenJDK Runtime Environment (slowdebug build 16-internal+0-adhoc.luhenry.openjdk-jdk)
OpenJDK 64-Bit Server VM (slowdebug build 16-internal+0-adhoc.luhenry.openjdk-jdk, mixed mode)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;We explored how the macOS-AArch64 ABI differs from the Linux-AArch64 ABI and how this affects calls from Java to native methods. We also explored the modifications Hotspot needs in order to follow the macOS-AArch64 calling convention instead of the Linux-AArch64 one.&lt;/p&gt;

&lt;p&gt;In later posts, I’ll talk more about some of the issues I ran into when porting the OpenJDK to Windows-AArch64 and macOS-AArch64, the subtle differences in their ABI and APIs, and the necessary modifications to the OpenJDK.&lt;/p&gt;</content><author><name>Ludovic Henry</name></author><summary type="html">This is an installment in a series of posts that will highlight discoveries I am making as I add support for Windows-AArch64 and macOS-AArch64 to the OpenJDK.</summary></entry><entry><title type="html">What’s an ABI anyways?</title><link href="https://blog.ludovic.dev/2020/09/08/whats-an-abi-anyways.html" rel="alternate" type="text/html" title="What’s an ABI anyways?" /><published>2020-09-08T00:00:00+00:00</published><updated>2020-09-08T00:00:00+00:00</updated><id>https://blog.ludovic.dev/2020/09/08/whats-an-abi-anyways</id><content type="html" xml:base="https://blog.ludovic.dev/2020/09/08/whats-an-abi-anyways.html">&lt;p&gt;&lt;em&gt;This is an installment in a &lt;a href=&quot;/2020/09/07/openjdk-on-aarch64.html&quot;&gt;series of posts&lt;/a&gt; that will highlight discoveries I am making as I add support for Windows-AArch64 and macOS-AArch64 to the OpenJDK.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Before we dive further into the specifics of Windows-AArch64 and macOS-AArch64, it’s essential to lay out some of the platform’s fundamental concepts. In this post, I dive deeper into what an ABI is.&lt;/p&gt;

&lt;h2 id=&quot;definition&quot;&gt;Definition&lt;/h2&gt;

&lt;p&gt;ABI = Application Binary Interface.&lt;/p&gt;

&lt;p&gt;It sounds very similar to API (Application Programming Interface) because it is, in fact, a very similar concept.&lt;/p&gt;

&lt;p&gt;Let’s take a look at an example of a C function declared in a header file:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;// my_library.h
int my_func(long p0, long p1, long p2, long p3, long p4, long p5, int p6, int p7, long p8, char p9, int p10, long p11);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To invoke this function from another file, you do the following:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;// my_exe.c
#include &quot;my_library.h&quot;
int main(int argc, char **argv) {
  return my_func(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Because the signature of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;my_func&lt;/code&gt; is declared, the compiler knows how to generate the code that invokes it. It knows how many parameters the function takes, the type of each parameter, and the return type. The compiler can then check whether you’re invoking the function correctly. For example:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;If you try to pass more or fewer than 12 parameters, it will fail to compile&lt;/li&gt;
  &lt;li&gt;If you try to pass a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;long&lt;/code&gt; instead of a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;char&lt;/code&gt;, it may emit a warning about the narrowing conversion (or fail to compile if warnings are treated as errors)&lt;/li&gt;
  &lt;li&gt;If you ignore the return value, it may emit a warning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The API is then a contract between the caller and the callee for functions and data types. In Object-Oriented Programming, the API is similarly a contract for classes and methods.&lt;/p&gt;

&lt;p&gt;After the compiler validates that a caller invokes a function the way the callee expects, it generates the assembly code that performs the call. This is where the ABI comes into play.&lt;/p&gt;

&lt;p&gt;Overall, an ABI is similar to an API: both define a contract, the API at the source level and the ABI at the binary level.&lt;/p&gt;

&lt;p&gt;Parts of what the ABI defines are:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The &lt;strong&gt;calling convention&lt;/strong&gt;, or “how to invoke a function”: the instructions to emit, how to pass parameters, how to set up the stack, whether the stack grows up or down, and more.&lt;/li&gt;
  &lt;li&gt;The &lt;strong&gt;size and alignment of basic data types&lt;/strong&gt;, such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;int&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;short&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;long&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;long long&lt;/code&gt;, or pointers.&lt;/li&gt;
  &lt;li&gt;The &lt;strong&gt;usage of machine registers&lt;/strong&gt;. For example, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r31&lt;/code&gt; holds the stack pointer and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r8&lt;/code&gt; serves as a scratch register.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The ABI is also specific to each platform: Windows vs. Linux vs. macOS, ARM vs. x86, 32-bit vs. 64-bit.&lt;/p&gt;

&lt;p&gt;The most widespread ABI is the C ABI, which different languages commonly use to call into each other. For example, the Java compiler knows nothing about C/C++ headers. Still, given a Java method declared with a signature equivalent to the C function, the JVM knows where to put the parameters, the size of each parameter, and where to get the return value, simply by following the C ABI.&lt;/p&gt;

&lt;h3 id=&quot;an-analogy&quot;&gt;An analogy&lt;/h3&gt;

&lt;p&gt;An ABI is like a contract between two robots assembling parts to build a product. These robots only communicate by exchanging parts through predefined boxes.&lt;/p&gt;

&lt;p&gt;For example, Robot-1 puts parts into five boxes, waits for Robot-2 to do its job, picks up the product from the “return” box, and stores it elsewhere. Robot-2 takes the parts from the five boxes, assembles them, and drops the product back into the “return” box.&lt;/p&gt;

&lt;p&gt;For a smooth operation, both robots need to agree on which part goes into which box. If, at any point, Robot-1 starts putting a part into a different box, Robot-2 will miss necessary parts and assemble the wrong product.&lt;/p&gt;

&lt;p&gt;The ABI also makes it possible to drop in a new robot, as long as it abides by the established contract. This new robot doesn’t even need to come from the same manufacturer or be programmed the same way; it only has to follow the ABI.&lt;/p&gt;

&lt;h2 id=&quot;example&quot;&gt;Example&lt;/h2&gt;

&lt;p&gt;Let’s take &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;my_func&lt;/code&gt; above and focus on the Linux-AArch64 ABI.&lt;/p&gt;

&lt;p&gt;Per &lt;a href=&quot;https://developer.arm.com/documentation/ihi0055/b/&quot;&gt;the official documentation&lt;/a&gt;, the ABI defines:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;A &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;char&lt;/code&gt; is 1 byte wide, an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;int&lt;/code&gt; is 4 bytes wide, and a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;long&lt;/code&gt; is 8 bytes wide.&lt;/li&gt;
  &lt;li&gt;The first 8 integer parameters are passed in registers &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r0...r7&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;The remaining parameters (here, the last 4) are passed on the stack, each in its own 8-byte-aligned slot.&lt;/li&gt;
  &lt;li&gt;The return value is passed in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r0&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
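&lt;p&gt;These rules translate almost directly into code. The C sketch below assigns each parameter either a register or a stack offset. It only handles integer and pointer parameters of at most 8 bytes; floating-point, composite, and 16-byte arguments follow additional rules that it ignores. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ArgLocation&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;assign_locations&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```c
/* Where a parameter lives: an argument register or a stack offset. */
typedef struct {
    int in_register;  /* 1: passed in a register; 0: passed on the stack */
    unsigned index;   /* register number, or byte offset from sp */
} ArgLocation;

/* Assigns locations to `count` integer/pointer parameters, in order. */
void assign_locations(unsigned count, ArgLocation *locs) {
    const unsigned num_arg_regs = 8;  /* r0..r7 */
    unsigned next_reg = 0;            /* next free argument register */
    unsigned next_stack = 0;          /* next free stack offset */
    for (unsigned i = 0; i != count; i++) {
        if (next_reg != num_arg_regs) {
            locs[i].in_register = 1;
            locs[i].index = next_reg++;
        } else {
            locs[i].in_register = 0;
            locs[i].index = next_stack;
            next_stack += 8;  /* one 8-byte slot, even for a char or an int */
        }
    }
}
```

&lt;p&gt;For the 12 parameters of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;my_func&lt;/code&gt;, the sketch places &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;p0&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;p7&lt;/code&gt; in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r0&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;r7&lt;/code&gt;. It places &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;p8&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;p11&lt;/code&gt; at &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sp+0&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sp+8&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sp+16&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sp+24&lt;/code&gt;.&lt;/p&gt;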

&lt;p&gt;For &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;my_func&lt;/code&gt;, the following illustrates where the arguments are passed, in registers and on the stack. &lt;em&gt;Note that the memory ordering is little-endian.&lt;/em&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;// parameters passed in registers
r0: 00000000 00000000    &amp;lt; p0 and return value
r1: 10000000 00000000    &amp;lt; p1
r2: 20000000 00000000    &amp;lt; p2
r3: 30000000 00000000    &amp;lt; p3
r4: 40000000 00000000    &amp;lt; p4
r5: 50000000 00000000    &amp;lt; p5
r6: 60000000 00000000    &amp;lt; p6
r7: 70000000 00000000    &amp;lt; p7

// parameters passed on the stack
sp+0:  80000000 00000000 &amp;lt; p8
sp+8:  90000000 00000000 &amp;lt; p9
sp+16: a0000000 00000000 &amp;lt; p10
sp+24: b0000000 00000000 &amp;lt; p11
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt; to call &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;my_func&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main&lt;/code&gt; needs to put the parameters in these predefined locations. Failing to do so leads &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;my_func&lt;/code&gt; to look for parameters where they haven’t been passed. At best, that leads to a crash; at worst, it silently corrupts the application’s state.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;We learned that an ABI is a binary-level contract that lets different pieces of code interact with each other, which is why it is an essential component of a platform. We also explored some of the calling convention aspects of the ABI on Linux-AArch64.&lt;/p&gt;

&lt;p&gt;In later posts, I’ll talk about some of the issues I ran into when porting the OpenJDK to Windows-AArch64 and macOS-AArch64, the subtle differences in their ABI, and the necessary modifications to the OpenJDK.&lt;/p&gt;</content><author><name>Ludovic Henry</name></author><summary type="html">This is an installment in a series of posts that will highlight discoveries I am making as I add support for Windows-AArch64 and macOS-AArch64 to the OpenJDK.</summary></entry><entry><title type="html">OpenJDK on AArch64</title><link href="https://blog.ludovic.dev/2020/09/07/openjdk-on-aarch64.html" rel="alternate" type="text/html" title="OpenJDK on AArch64" /><published>2020-09-07T00:00:00+00:00</published><updated>2020-09-07T00:00:00+00:00</updated><id>https://blog.ludovic.dev/2020/09/07/openjdk-on-aarch64</id><content type="html" xml:base="https://blog.ludovic.dev/2020/09/07/openjdk-on-aarch64.html">&lt;p&gt;In response to recent developments around ARM64 (the &lt;a href=&quot;https://www.apple.com/newsroom/2020/06/apple-announces-mac-transition-to-apple-silicon/&quot;&gt;Apple Silicon&lt;/a&gt; announcement for example), the Java Engineering Group here at Microsoft decided to join in the effort to port the OpenJDK to ARM64 on Windows and macOS. I wanted to share some of the more interesting aspects of the work I’m involved in a series of blog posts.&lt;/p&gt;

&lt;p&gt;In these posts, I’ll document my journey discovering and resolving the differences between ARM64 and x86 on Linux, Windows, and macOS.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/2020/09/08/whats-an-abi-anyways.html&quot;&gt;What’s an ABI anyways?&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/2020/09/14/differences-in-calling-conventions.html&quot;&gt;Differences in Calling Conventions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content><author><name>Ludovic Henry</name></author><summary type="html">In response to recent developments around ARM64 (the Apple Silicon announcement for example), the Java Engineering Group here at Microsoft decided to join in the effort to port the OpenJDK to ARM64 on Windows and macOS. I wanted to share some of the more interesting aspects of the work I’m involved in a series of blog posts.</summary></entry><entry><title type="html">AOT Compilation in HotSpot: Introduction</title><link href="https://blog.ludovic.dev/2019/10/31/aot-compilation-in-hotspot-introduction.html" rel="alternate" type="text/html" title="AOT Compilation in HotSpot: Introduction" /><published>2019-10-31T00:00:00+00:00</published><updated>2019-10-31T00:00:00+00:00</updated><id>https://blog.ludovic.dev/2019/10/31/aot-compilation-in-hotspot-introduction</id><content type="html" xml:base="https://blog.ludovic.dev/2019/10/31/aot-compilation-in-hotspot-introduction.html">&lt;p&gt;&lt;em&gt;This blog post is not about SubstrateVM nor GraalVM but focuses on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jaotc&lt;/code&gt; AOT compiler in HotSpot.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In this blog post, we are going to focus on the Ahead-Of-Time (AOT) Compilation that was introduced in Java 9 (&lt;a href=&quot;https://openjdk.java.net/jeps/295&quot;&gt;https://openjdk.java.net/jeps/295&lt;/a&gt;) with the addition of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jaotc&lt;/code&gt; command-line utility. This AOT compiler is based on the work done in Graal JIT.&lt;/p&gt;

&lt;p&gt;We are going to explore some of the tradeoffs that the AOT compiler needs to make, and how the generated code fits in the Tiered Compilation (TC) pipeline. Then, we will go through a simple example showing how to use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jaotc&lt;/code&gt; command-line utility. Finally, we are going to explore some alternatives to the AOT compiler, like JIT at Startup, JIT caching, and Distributed JIT.&lt;/p&gt;

&lt;h2 id=&quot;aot-compilation-in-hotspot&quot;&gt;AOT Compilation in HotSpot&lt;/h2&gt;

&lt;p&gt;An AOT compiler’s primary capability is to generate machine code for an application without having to run it, allowing a future run of the application to pick up the generated code. Like C1 and C2, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jaotc&lt;/code&gt; compiles Java bytecode to native code.&lt;/p&gt;

&lt;p&gt;The primary motivation behind using AOT in Java is to bypass the interpreter. It is generally faster for the machine to execute machine code than to execute the code via the bytecode interpreter. In many cases, this is a definite advantage. It matters most for code that is executed only a few times and would otherwise never be JIT-compiled.&lt;/p&gt;

&lt;h3 id=&quot;tradeoffs-of-generated-code&quot;&gt;Tradeoffs of generated code&lt;/h3&gt;

&lt;p&gt;An AOT compiler cannot make the same class of assumptions as a JIT compiler. It doesn’t have access to as much information as the JIT compiler does, because the process generating the code and the process executing the application are not the same.&lt;/p&gt;

&lt;p&gt;For example, AOT compilers are required to generate Position Independent Code (PIC) to produce shared libraries. That is because there is no way to know ahead of execution where in memory the code will be loaded. The AOT compiler therefore can’t assume anything about the location (relative or absolute) of a symbol, so it cannot reference the address of any symbol directly. Whenever a symbol (such as a function or a constant) is accessed, the AOT compiler has to generate an indirection, with the resolution happening on first access to the symbol.&lt;/p&gt;
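&lt;p&gt;The indirection can be illustrated with a lazily patched slot, loosely similar to how position-independent code reaches functions through a PLT/GOT entry. This C sketch is a simplification, not how &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jaotc&lt;/code&gt; or the dynamic loader actually implements it. Code always calls through &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;slot&lt;/code&gt;, which initially points at a resolver. On first use, the resolver looks up the target and patches the slot. All names are hypothetical.&lt;/p&gt;

```c
/* Number of times the (hypothetical) resolver ran. */
int resolve_count = 0;

/* The symbol the generated code wants to reach. */
int target_function(int x) { return x * 2; }

int resolver_stub(int x);

/* The indirection slot: it starts out pointing at the resolver. */
int (*slot)(int) = resolver_stub;

/* First call lands here: "resolve" the symbol, patch the slot, then
 * forward the call. Later calls go straight to the target. */
int resolver_stub(int x) {
    resolve_count++;
    slot = target_function;
    return slot(x);
}

/* Generated code never embeds target_function's address; it always
 * calls through the slot. */
int call_through_slot(int x) { return slot(x); }
```

&lt;p&gt;The resolver runs only on the first call. Every call, however, still pays one extra indirection compared to JIT code that embeds the address directly.&lt;/p&gt;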

&lt;p&gt;On the other hand, a JIT compiler can take the address in memory of a symbol and embed it directly in the code. It works because the JIT compiler can assume the code to have a shorter lifetime than the symbol: the code generation happens after the symbol initialization (or at least the code generation initializes the symbol), and the shutdown of the process triggers the destruction of both the code and the symbol.&lt;/p&gt;

&lt;p&gt;Another example is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;final static&lt;/code&gt; variables. A JIT compiler can make certain assumptions allowing it to generate code based on the value of the variable. An AOT compiler cannot know that value before the variable is initialized, which only happens when the code executes, so it can’t make the same assumptions. That can lead to missed optimization opportunities, such as dead-code elimination or inlining.&lt;/p&gt;

&lt;p&gt;Finally, the OS and architecture on which you generate the code must be the same as the ones on which you execute it. For example, if you want to execute the code on Windows, you cannot generate it on Linux or macOS, only on Windows. That is because &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jaotc&lt;/code&gt; does not support cross-compilation.&lt;/p&gt;

&lt;h3 id=&quot;integration-with-the-tiered-compilation-pipeline&quot;&gt;Integration with the Tiered Compilation pipeline&lt;/h3&gt;

&lt;p&gt;The goal of Tiered Compilation (TC), introduced in Java 7, is to provide both fast startup and fast steady-state throughput. It is implemented as a pipeline of multiple tiers of code generation, whose three main components are the interpreter, the C1 compiler, and the C2 compiler. It replaced the -client and -server command-line parameters available in previous versions of Java.&lt;/p&gt;

&lt;p&gt;As a method goes through the different tiers, each tier gathers information about the method’s execution. This information is called Profiling Data (PD). The C2 compiler uses the PD to make assumptions, such as which code paths are cold, warm, or hot, and which types are used at each call site. It can then generate code better suited to the specific context in which it is currently executing.&lt;/p&gt;

&lt;p&gt;The five tiers of code generation are:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;none (0):&lt;/strong&gt; Interpreter gathering full PD&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;simple (1):&lt;/strong&gt; C1 compiler with no profiling&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;limited profile (2):&lt;/strong&gt; C1 compiler with light profiling gathering some PD&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;full profile (3):&lt;/strong&gt; C1 compiler with full profiling gathering full PD&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;full optimization (4):&lt;/strong&gt; C2 compiler with no profiling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jaotc&lt;/code&gt;, you have the option to generate code with or without support for TC. Enabling TC generates slightly slower code due to the profiling overhead. Disabling TC blocks the use of the TC pipeline leading to slower steady-state throughput.&lt;/p&gt;

&lt;p&gt;Figure 1 and Figure 2 show the flow through the TC pipeline without and with AOT, respectively.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/aot-compilation-in-hotspot-introduction/tiered-compilation-pipeline-without-aot.png&quot; alt=&quot;Figure 1: Tiered Compilation pipeline without AOT.&quot; /&gt;&lt;br /&gt;
&lt;em&gt;Figure 1: Tiered Compilation pipeline without AOT.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/aot-compilation-in-hotspot-introduction/tiered-compilation-pipeline-with-aot.png&quot; alt=&quot;Figure 2: Tiered Compilation pipeline with AOT.&quot; /&gt;&lt;br /&gt;
&lt;em&gt;Figure 2: Tiered Compilation pipeline with AOT.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The code the AOT compiler generates with and without support for TC differs only slightly. For the TC pipeline to decide whether to compile a method at a particular tier, the generated code updates a set of counters (invocation counters, backedge counters) as it executes. Whenever one of these counters overflows a given threshold, the instrumented code calls back into the runtime. This call contains all the information the runtime needs to figure out which method has reached the threshold, so the runtime can decide whether to compile the method at the next tier of the TC pipeline. Consequently, code generated without TC support never updates the counters. They never overflow, and the code never calls back into the runtime to request compilation at the next tier.&lt;/p&gt;

&lt;p&gt;When the code has been generated with support for TC, AOT code fits in the TC pipeline at roughly the same tier as the limited profile (2) tier. The thresholds differ, however. For example, for the invocation threshold to go from Tier 0 or Tier 2 to Tier 3, the default value without AOT is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Tier3InvocationThreshold=200&lt;/code&gt;, while the default value with AOT is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Tier3AOTInvocationThreshold=10000&lt;/code&gt;.&lt;/p&gt;
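&lt;p&gt;The effect of these two defaults can be sketched as follows. This is a simplified, hypothetical model: only the values of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Tier3InvocationThreshold&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Tier3AOTInvocationThreshold&lt;/code&gt; come from HotSpot, while the class &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Tier3Policy&lt;/code&gt; and its methods are illustrative. The real policy also takes backedge counters, minimum thresholds, and compiler queue load into account.&lt;/p&gt;

```java
// Simplified, hypothetical model of the invocation check for moving to Tier 3.
// Only the two default threshold values are taken from HotSpot's flags.
final class Tier3Policy {
    static final int TIER3_INVOCATION_THRESHOLD = 200;        // Tier3InvocationThreshold
    static final int TIER3_AOT_INVOCATION_THRESHOLD = 10_000; // Tier3AOTInvocationThreshold

    /** Where the currently executing code for a method comes from. */
    enum CodeKind {
        INTERPRETED,        // Tier 0
        C1_LIMITED_PROFILE, // Tier 2
        AOT                 // AOT code generated with support for TC
    }

    static int invocationThreshold(CodeKind kind) {
        return kind == CodeKind.AOT ? TIER3_AOT_INVOCATION_THRESHOLD : TIER3_INVOCATION_THRESHOLD;
    }

    static boolean shouldMoveToTier3(CodeKind kind, int invocations) {
        return invocations >= invocationThreshold(kind);
    }

    public static void main(String[] args) {
        for (CodeKind kind : CodeKind.values()) {
            System.out.println(kind + " after 5000 invocations, move to Tier 3? "
                    + shouldMoveToTier3(kind, 5_000));
        }
    }
}
```

&lt;p&gt;After 5000 invocations, the interpreted and Tier 2 cases print &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;true&lt;/code&gt; while the AOT case prints &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;false&lt;/code&gt;: with these defaults, AOT code needs 50 times more invocations (10000 versus 200) before being considered for Tier 3.&lt;/p&gt;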

&lt;h2 id=&quot;usage&quot;&gt;Usage&lt;/h2&gt;

&lt;p&gt;For the AOT compiler to successfully generate code, the same environment as for the JIT compiler needs to be available. That means that all dependencies (jars, jmods) must be present and accessible to the AOT compiler.&lt;/p&gt;

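&lt;p&gt;&lt;em&gt;The example below assumes you are using Java 11 through 16; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jaotc&lt;/code&gt; was removed in JDK 17 (JEP 410).&lt;/em&gt;&lt;/p&gt;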

&lt;p&gt;Let’s take a simple example, HelloWorld.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;class HelloWorld {
    public static void main(String[] args) {
        System.out.println(&quot;Hello, World&quot;);
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To compile it to Java bytecode, run the usual:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$&amp;gt; javac HelloWorld.java
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If you want to run without AOT, you simply run:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$&amp;gt; java HelloWorld
Hello, World
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If you want to run with AOT, you first need to run the AOT compiler:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$&amp;gt; jaotc --compile-for-tiered --output libHelloWorld.so --verbose HelloWorld
Compiling libHelloWorld.so...
1 classes found (25 ms)
Scanning HelloWorld
added &amp;lt;init&amp;gt;()V
added main([Ljava/lang/String;)V
2 methods total, 2 methods to compile (4 ms)
Freeing memory [used: 4.0 MB , comm: 12.0 MB, freeRatio ~= 66.7%] (44 ms)
Compiling with 12 threads
.
2 methods compiled, 0 methods failed (363 ms)
Freeing memory [used: 5.4 MB , comm: 18.0 MB, freeRatio ~= 70.0%] (17 ms)
Parsing compiled code (2 ms)
Freeing memory [used: 5.8 MB , comm: 24.0 MB, freeRatio ~= 75.9%] (18 ms)
Processing metadata (10 ms)
Freeing memory [used: 5.7 MB , comm: 24.0 MB, freeRatio ~= 76.2%] (18 ms)
Preparing stubs binary (0 ms)
Preparing compiled binary (0 ms)
.header: 63 bytes
.config: 43 bytes
.kls.offsets: 336 bytes
.meth.offsets: 52 bytes
.kls.dependencies: 76 bytes
.stubs.offsets: 1036 bytes
.meth.metadata: 7832 bytes
.text: 17800 bytes
.code.segments: 137 bytes
.meth.constdata: 14344 bytes
.kls.got: 224 bytes
.cnt.got: 48 bytes
.meta.got: 32 bytes
.meth.state: 360 bytes
.oop.got: 8 bytes
.meta.names: 2234 bytes
Freeing memory [used: 5.7 MB , comm: 24.0 MB, freeRatio ~= 76.2%] (18 ms)
Creating binary: libHelloWorld.o (14 ms)
Freeing memory [used: 5.7 MB , comm: 24.0 MB, freeRatio ~= 76.2%] (18 ms)
Creating shared library: libHelloWorld.so (19 ms)
Final memory   [used: 5.6 MB , comm: 24.0 MB, freeRatio ~= 76.8%]
Total time: 911 ms
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then, to use the code generated by the AOT compiler, run:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$&amp;gt; java -XX:AOTLibrary=./libHelloWorld.so HelloWorld
Hello, World
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To verify that the AOT-compiled code is loaded and executed, run the above command with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-XX:+PrintAOT&lt;/code&gt;; you should observe the following output:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$&amp;gt; java -XX:AOTLibrary=./libHelloWorld.so -XX:+PrintAOT HelloWorld
17    1     loaded    ./libHelloWorld.so  aot library
58    1     aot[ 1]   HelloWorld.&amp;lt;init&amp;gt;()V
58    2     aot[ 1]   HelloWorld.main([Ljava/lang/String;)V
Hello, World
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The first three lines are the output of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-XX:+PrintAOT&lt;/code&gt;. Line 1 shows that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;./libHelloWorld.so&lt;/code&gt; was loaded successfully. Lines 2 and 3 show that the constructor &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;HelloWorld.&amp;lt;init&amp;gt;()&lt;/code&gt; and the main method &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;HelloWorld.main()&lt;/code&gt; were loaded from the AOT library and used for this execution of the application.&lt;/p&gt;
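&lt;p&gt;If you want to check this in a script, a small filter over the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-XX:+PrintAOT&lt;/code&gt; output can be enough. The sketch below is a hypothetical helper, not part of the JDK; its regular expression assumes the column layout shown above (timestamp, id, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aot[ N]&lt;/code&gt;, method) and may need adjusting for other JDK builds.&lt;/p&gt;

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the JDK: lists the methods that -XX:+PrintAOT
// reports as served from an AOT library. The pattern assumes the layout
// "timestamp  id  aot[ N]  method" and may need adjusting for other JDK builds.
final class PrintAotParser {
    private static final Pattern AOT_METHOD =
            Pattern.compile("^\\s*\\d+\\s+\\d+\\s+aot\\[\\s*\\d+\\]\\s+(\\S+)\\s*$");

    static String[] aotMethods(String log) {
        return log.lines()                   // String.lines() requires Java 11+
                .map(AOT_METHOD::matcher)
                .filter(Matcher::matches)    // keep only the "aot[ N]" lines
                .map(m -> m.group(1))        // method name and descriptor
                .toArray(String[]::new);
    }

    public static void main(String[] args) throws IOException {
        String log = new String(System.in.readAllBytes(), StandardCharsets.UTF_8);
        for (String method : aotMethods(log)) {
            System.out.println("AOT: " + method);
        }
    }
}
```

&lt;p&gt;Since Java 11 can launch single-file source programs directly, you could pipe the output into it: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;java -XX:AOTLibrary=./libHelloWorld.so -XX:+PrintAOT HelloWorld | java PrintAotParser.java&lt;/code&gt;. Lines such as the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;loaded&lt;/code&gt; line and the program’s own output are ignored.&lt;/p&gt;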

&lt;h2 id=&quot;alternatives&quot;&gt;Alternatives&lt;/h2&gt;

&lt;p&gt;Apart from AOT compilation, other approaches to code generation are in development in the Java ecosystem. Like AOT, some of them focus on startup throughput (JIT at Startup and JIT Caching), while others focus on compilation footprint (Distributed JIT).&lt;/p&gt;

&lt;h3 id=&quot;jit-at-startup&quot;&gt;JIT at Startup&lt;/h3&gt;

&lt;p&gt;At startup, before the Java main() method executes, the JVM compiles a predefined set of methods. The C2 compiler is used to compile these methods and, because both the generation and the execution of the code happen in the same process, the C2 compiler doesn’t require any modifications. PD is available alongside the predefined set of methods and is used to generate better code.&lt;/p&gt;

&lt;p&gt;The procedure to determine the predefined set of methods is simple: previous runs record it, saving both the compiled methods and the PD used to compile them.&lt;/p&gt;
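&lt;p&gt;The record-and-replay idea can be sketched in a few lines of Java. Everything here is hypothetical: the one-line-per-method file format, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WarmupProfile&lt;/code&gt; class, and the example method names are made up, and a single invocation count stands in for the PD that real implementations persist.&lt;/p&gt;

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical sketch of the record/replay idea behind JIT at Startup.
// The one-line-per-method file format is made up for illustration.
final class WarmupProfile {

    /** A method compiled during a previous run, with an invocation count standing in for its PD. */
    static final class Entry {
        final String method;
        final int invocations;

        Entry(String method, int invocations) {
            this.method = method;
            this.invocations = invocations;
        }

        static Entry parse(String line) {
            int sep = line.lastIndexOf('=');
            return new Entry(line.substring(0, sep), Integer.parseInt(line.substring(sep + 1)));
        }

        @Override
        public String toString() {
            return method + "=" + invocations;
        }
    }

    /** Previous run: save the compiled methods and their (simplified) PD. */
    static void record(Path file, Entry... compiled) throws IOException {
        String[] lines = Arrays.stream(compiled).map(Entry::toString).toArray(String[]::new);
        Files.write(file, Arrays.asList(lines));
    }

    /** Next run, before main(): reload the predefined set of methods to compile. */
    static Entry[] load(Path file) throws IOException {
        return Files.readAllLines(file).stream().map(Entry::parse).toArray(Entry[]::new);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("warmup", ".profile");
        try {
            record(file,
                    new Entry("Example.parse(Ljava/lang/String;)I", 15_000),
                    new Entry("java.lang.String.hashCode()I", 12_000));
            for (Entry entry : load(file)) {
                // A real JVM would hand each method and its PD to C2 at this point.
                System.out.println("compile before main(): " + entry);
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```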

&lt;p&gt;This approach answers a very particular need: guaranteeing a high throughput from the get-go, at the expense of startup time.&lt;/p&gt;

&lt;p&gt;Two implementations are &lt;a href=&quot;https://docs.azul.com/zing/Zing_AT_ReadyNow_ReadyNow.htm&quot;&gt;Azul ReadyNow&lt;/a&gt;, available in Azul Zing, and &lt;a href=&quot;https://openjdk.java.net/jeps/8203832&quot;&gt;JWarmup&lt;/a&gt;, available in Alibaba Dragonwell (currently a draft JEP in OpenJDK).&lt;/p&gt;

&lt;h3 id=&quot;jit-caching&quot;&gt;JIT Caching&lt;/h3&gt;

&lt;p&gt;During the execution of an application, the JVM dumps the generated code to disk. At the next execution of the application, the JVM only has to pick up where it left off: it loads the previously generated code from disk and gets high throughput right from startup. This differs from the AOT compiler in that the application has to run to generate the code, whereas the AOT compiler only needs to parse the application’s code.&lt;/p&gt;

&lt;p&gt;This method requires persistent storage between runs of the application. It also requires the same dependencies and environment between runs; otherwise, there is no guarantee that the code generated on a previous run is compatible with the current one.&lt;/p&gt;
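&lt;p&gt;One simple way to enforce that requirement is to key the cache on a fingerprint of the JVM version and the dependencies, and to reuse it only when the fingerprint matches. The sketch below is hypothetical: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CacheKey&lt;/code&gt; and the dependency strings are made up, and real implementations such as the ones listed next perform their own validation.&lt;/p&gt;

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: key a JIT code cache on the runtime environment and the
// application's dependencies, so code from a previous run is only reused when neither changed.
final class CacheKey {

    /** SHA-256 over the JVM version and each dependency, as a hex string. */
    static String fingerprint(String jvmVersion, String... dependencies) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            sha.update(jvmVersion.getBytes(StandardCharsets.UTF_8));
            for (String dependency : dependencies) {
                sha.update((byte) 0); // separator, so {"ab", "c"} and {"a", "bc"} differ
                sha.update(dependency.getBytes(StandardCharsets.UTF_8));
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : sha.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("every Java platform must support SHA-256", e);
        }
    }

    /** The cached code is only considered compatible if nothing in the fingerprint changed. */
    static boolean canReuse(String cachedKey, String jvmVersion, String... dependencies) {
        return cachedKey.equals(fingerprint(jvmVersion, dependencies));
    }

    public static void main(String[] args) {
        // Hypothetical dependency identifiers, e.g. a jar name plus a hash of its contents.
        String stored = fingerprint("11.0.2", "app.jar@3f2a", "lib.jar@9c41");
        System.out.println(canReuse(stored, "11.0.2", "app.jar@3f2a", "lib.jar@9c41")); // true
        System.out.println(canReuse(stored, "11.0.2", "app.jar@3f2a", "lib.jar@7d00")); // false
    }
}
```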

&lt;p&gt;Two implementations are &lt;a href=&quot;https://www.eclipse.org/openj9/docs/aot/&quot;&gt;OpenJ9 Dynamic AOT&lt;/a&gt; and &lt;a href=&quot;https://docs.azul.com/zing/UseZVM_CompileStashing_Overview.htm&quot;&gt;Azul Compile Stashing&lt;/a&gt;, available in Azul Zing.&lt;/p&gt;

&lt;h3 id=&quot;distributed-jit&quot;&gt;Distributed JIT&lt;/h3&gt;

&lt;p&gt;This method assumes that offloading code generation to a process on another machine has a smaller footprint than generating the code in-process. It works particularly well in constrained environments (for example, less than 512 MB of RAM and half of a CPU core), where offloading code generation to bigger machines frees precious resources for the application. Moreover, it enables system-level optimizations, such as caching the generated code across many runs of the same application (for example, running Hadoop across dozens or hundreds of machines).&lt;/p&gt;

&lt;p&gt;The overall goal is not to reduce startup time or improve startup throughput, as AOT compilation, JIT at Startup, and JIT Caching do, but to reduce the impact of the JIT compiler on the application footprint. That makes it a great complement to these other methods.&lt;/p&gt;

&lt;p&gt;An implementation is &lt;a href=&quot;https://blog.openj9.org/2019/04/01/a-simple-jitaas-demo-on-docker-containers/&quot;&gt;OpenJ9 JITaaS&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;In this blog post, we explored tradeoffs in the code generated by the AOT compiler (like Position Independent Code) and how that code fits in the TC pipeline. We also looked at other solutions in the Java ecosystem, namely JIT at Startup, JIT Caching, and Distributed JIT, and how they fit in the larger picture of code generation in the JVM.&lt;/p&gt;

&lt;p&gt;In a future post, we’ll dig deeper into the implementation of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jaotc&lt;/code&gt; and how the code is loaded and used by HotSpot.&lt;/p&gt;</content><author><name>Ludovic Henry</name></author><summary type="html">This blog post is not about SubstrateVM nor GraalVM but focuses on the jaotc AOT compiler in HotSpot.</summary></entry></feed>