Intro to the long-awaited Java Panama

Dec 3, 2023 · 5466 words · 26 minute read

Coding Java

Background #

For a long time, Java has lacked a built-in, easy-to-use C interop mechanism. Calling platform APIs and C libraries is more difficult in Java than in other high-level languages like C# or Rust. Traditionally, Java developers have had to use JNI, which requires us to write dynamic libraries ourselves that manually wrap native function calls in C and export JVM-specific functions. For those who haven’t used JNI before, here is what a native method looks like in C:

/*
 * Class:     Native
 * Method:    func
 * Signature: (ILjava/lang/String;JD)V
 */
JNIEXPORT void JNICALL Java_Native_func
  (JNIEnv *, jobject, jint, jstring, jlong, jdouble);

The function name and argument types all have to match, and passing structs or unions requires extra care with the ABI. Most painfully, developers have to wrap native APIs themselves. There are third-party libraries like JNA and JavaCPP that make the process less error-prone, but JNI is still the only official way to do C interop in Java.

Finally, there is Project Panama, a set of changes to the Java platform that aims to provide an easy-to-use alternative to JNI, allowing developers to safely and efficiently access foreign memory and functions from pure Java code, without any C glue.

(JEP-424) Foreign memory access: Manually allocate and free memory segments in Java as well as access them in a type-safe manner.
(JEP-424) Foreign function call
(JEP-460) Vector APIs (Not covered in this post)

Current status #

At the time of writing, these changes are out of the incubator and ready for release with OpenJDK 22. OpenJDK 21 (LTS) has preview support.

To start, download OpenJDK 21 LTS or 22 EA build, and append the following arguments to javac(1) and java(1):

$ javac --enable-preview --source 21

$ # For OpenJDK 22 EA
$ javac --enable-preview --source 22

$ java --enable-preview

For IntelliJ IDEA, select 21 in Project Structure > Language Level.

Note that the API still differs between OpenJDK 21 (preview) and OpenJDK 22 EA. For example, allocateUtf8String is renamed to allocateFrom.

Changes are listed at https://javaalmanac.io/jdk/22/apidiff/21/.

You can read more on the proposal at https://bugs.openjdk.org/browse/JDK-8312523.

Unless noted, this post will use OpenJDK 21. The latest jextract tool is also based on OpenJDK 21. If you switch to OpenJDK 22 EA or use the final stable release (which will be shipped in 2024), please make sure to make corresponding changes.

Manual memory management #

Before jumping into calling C functions, we should discuss how the Panama APIs deal with manual memory management first. The reason is simple: to call the C functions, we have to correctly model the C types and structs in Java.

Panama provides a set of APIs to safely load and store data in memory (the heap or even something else). It guarantees both memory safety (no memory leaks, no dangling pointers, no double frees) and type safety (no corruption). In other words, Panama memory access is strongly typed.

Allocation and arena #

Panama memory buffers have lifecycles bound to specific arenas. An arena is a lifecycle scope for memory buffers, and one arena can own multiple buffers. It can be global (its buffers are never freed), auto (buffers are freed by the GC once they become unreachable), confined (buffers are freed when the arena is closed, and only the owning thread may use them), or shared (like confined, but usable from multiple threads). You allocate buffers from an arena with an explicit size and get back a MemorySegment object representing the buffer.

MemorySegment buf1, buf2;
// Create a confined arena; all of its buffers are freed at the end of the try {} block.
try (Arena arena = Arena.ofConfined()) {
    // Allocate an 8-byte buffer.
    buf1 = arena.allocate(8);
    // Allocate a 16-byte buffer.
    buf2 = arena.allocate(16);
    // Store data
    buf1.set(type, offset, value);
    // Load data from the buffer at the given offset, starting with zero.
    buf1.get(type, offset);
    // Not enough bytes: reading 8 bytes at offset 5 overflows the 8-byte buffer,
    // so IndexOutOfBoundsException is thrown.
    buf1.get(JAVA_DOUBLE, 5);
}
// Both buffers are freed here.
buf1.get(...); // Use-after-free! IllegalStateException is thrown.

You can use other arena types by calling Arena#global(), Arena#ofAuto(), and Arena#ofShared(). See the Javadoc on Arena for more details.

Compatibility guide: Arena replaced ResourceScope from earlier Java versions (when the Panama APIs were still in the incubator). Older resources may refer to APIs like ResourceScope or MemorySegment.allocateNative. Both APIs are now gone.
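The two failure modes described in this section (out-of-bounds access and use-after-free) can be observed directly. The following is a minimal sketch, assuming JDK 22 or later (or JDK 21 with --enable-preview); the class name ArenaLifecycle and its helper methods are made up for illustration.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class ArenaLifecycle {
    // Runs the action and reports which runtime exception (if any) it threw.
    static String thrownBy(Runnable action) {
        try {
            action.run();
            return "none";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    // Reads 8 bytes starting at offset 8 of an 8-byte buffer: out of bounds.
    static String readPastEnd() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment buf = arena.allocate(8, 8); // 8 bytes, 8-byte aligned
            return thrownBy(() -> buf.get(ValueLayout.JAVA_LONG, 8));
        }
    }

    // Reads from a buffer after its arena has been closed: use-after-free.
    static String readAfterClose() {
        Arena arena = Arena.ofConfined();
        MemorySegment buf = arena.allocate(8, 8);
        buf.set(ValueLayout.JAVA_LONG, 0, 42L);
        arena.close(); // frees buf
        return thrownBy(() -> buf.get(ValueLayout.JAVA_LONG, 0));
    }

    public static void main(String[] args) {
        System.out.println(readPastEnd());    // IndexOutOfBoundsException
        System.out.println(readAfterClose()); // IllegalStateException
    }
}
```

Both errors surface as ordinary Java exceptions instead of corrupting memory, which is the point of binding buffers to an arena.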

Basic value types #

As you may infer from the code above, all load / store operations are based on specific types, which have pre-defined sizes and byte orders. In fact, we can easily load and store Java primitive types from and into memory with ValueLayout:

try (Arena arena = Arena.ofConfined()) {
    MemorySegment buf = arena.allocate(8); // Buffers are zeroed

    // All of the following types use native endianness! Learn more at ValueLayout Javadoc.

    // Load the first 8 bytes as a signed long (int64)
    assert 0 == buf.get(ValueLayout.JAVA_LONG, 0);
    // Load the first 4 bytes as a signed int (int32)
    assert 0 == buf.get(ValueLayout.JAVA_INT, 0);
    // Load the next 4 bytes as a signed int (int32)
    assert 0 == buf.get(ValueLayout.JAVA_INT, 4);

    // Store an int32 into the first 4 bytes
    buf.set(ValueLayout.JAVA_INT, 0, 114514);
    assert 114514 == buf.get(ValueLayout.JAVA_INT, 0);

    // Store an int32 into the next 4 bytes
    buf.set(ValueLayout.JAVA_INT, 4, 1919810);
    assert 1919810 == buf.get(ValueLayout.JAVA_INT, 4);

    // Load as an int64 instead!
    buf.get(ValueLayout.JAVA_LONG, 0);
}

You can also use other types like byte (8 bits, the counterpart of C’s char), char (16 bits), short, double, boolean, etc.
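What the final "load as an int64" returns depends on byte order. As a sketch of that dependency, the example below pins the byte order explicitly with withOrder instead of relying on the native one. It assumes JDK 22 or later (or JDK 21 with --enable-preview); the class name ByteOrderDemo and the constant names are invented for this illustration.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteOrder;

public class ByteOrderDemo {
    static final ValueLayout.OfInt INT_LE =
            ValueLayout.JAVA_INT.withOrder(ByteOrder.LITTLE_ENDIAN);
    static final ValueLayout.OfLong LONG_LE =
            ValueLayout.JAVA_LONG.withOrder(ByteOrder.LITTLE_ENDIAN);
    static final ValueLayout.OfLong LONG_BE =
            ValueLayout.JAVA_LONG.withOrder(ByteOrder.BIG_ENDIAN);

    // Stores two little-endian ints side by side, then reads the same
    // 8 bytes back as one long, once per byte order.
    static long[] readBothWays(int low, int high) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment buf = arena.allocate(8, 8); // 8 bytes, 8-byte aligned
            buf.set(INT_LE, 0, low);
            buf.set(INT_LE, 4, high);
            return new long[] { buf.get(LONG_LE, 0), buf.get(LONG_BE, 0) };
        }
    }

    public static void main(String[] args) {
        long[] values = readBothWays(1, 2);
        System.out.printf("little-endian: 0x%016x%n", values[0]);
        System.out.printf("big-endian:    0x%016x%n", values[1]);
    }
}
```

With the little-endian layout, the int at offset 0 becomes the low half of the long; the big-endian read of the same bytes is simply the byte-reversed value.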

Beside value types: structures, layout path, and VarHandle #

Manipulating primitive types is straightforward enough. It would be better if we could model C structs or unions in Java.

A struct is just a set of fields stored consecutively (with padding) in memory. So, all we need to do is mimic that behaviour ourselves: load / store fields with byte offsets. Suppose we want to model the following C struct in Java:

struct {
    int32_t a;
    // 4byte padding
    double b;
};

We can do the following:

try (Arena arena = Arena.ofConfined()) {
    MemorySegment buf = arena.allocate(ValueLayout.JAVA_INT.byteSize() + 4 +
                                        ValueLayout.JAVA_DOUBLE.byteSize());
    // a = 100
    buf.set(ValueLayout.JAVA_INT, 0, 100);
    // b = 100.1
    buf.set(ValueLayout.JAVA_DOUBLE, 8 /* sizeof(int32_t) + 4 */, 100.1);
}

And it works! The buffer buf has the same memory layout as the struct in C. If we pass it to a C function, it will work, as long as we take care of compiler-specific details like alignment and padding ourselves.

However, we are still not modelling anything. We are just manually operating on types and byte offsets, like doing pointer arithmetic in C. Surely we can do better by pre-defining a memory layout for the struct.

If you look closer at ValueLayout, you’ll find that it is extending MemoryLayout. Yes, ValueLayout is just one type of MemoryLayout, and other types allow you to combine multiple ValueLayout’s into something complex, like a struct! There are 5 different layouts in total:

ValueLayout: Carries a single primitive value.
StructLayout: Combines multiple MemoryLayout into a contiguous struct.
PaddingLayout: Used for alignment purposes, having an explicitly specified size and not caring about its contents.
SequenceLayout: Repeats any MemoryLayout a number of times, used to model arrays.
UnionLayout: Used to model unions.

To really model our previous struct in a StructLayout, we can do the following:

/*
Models the following:

struct {
    int32_t a;
    // 4byte padding
    double b;
};
*/

MemoryLayout struct = MemoryLayout.structLayout(
    ValueLayout.JAVA_INT.withName("a"),
    MemoryLayout.paddingLayout(4),
    ValueLayout.JAVA_DOUBLE.withName("b")
);

assert 16 == struct.byteSize();

Panama enforces manual padding: unlike C, where the compiler pads automatically in an implementation-defined manner, Java forces you to insert the padding yourself; otherwise, IllegalArgumentException is thrown.
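The padding rule can be checked with a small experiment. This is a sketch, assuming JDK 22 or later (or JDK 21 with --enable-preview); the class PaddingDemo and its method names are hypothetical. It also shows one way to model a packed struct (similar to GCC's __attribute__((packed))) by lowering the alignment of the double with withByteAlignment.

```java
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemoryLayout.PathElement;
import java.lang.foreign.StructLayout;
import java.lang.foreign.ValueLayout;

public class PaddingDemo {
    // An int directly followed by an 8-byte-aligned double is rejected.
    static boolean unpaddedIsRejected() {
        try {
            MemoryLayout.structLayout(ValueLayout.JAVA_INT, ValueLayout.JAVA_DOUBLE);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    // Explicit 4-byte padding puts b at offset 8.
    static StructLayout padded() {
        return MemoryLayout.structLayout(
                ValueLayout.JAVA_INT.withName("a"),
                MemoryLayout.paddingLayout(4),
                ValueLayout.JAVA_DOUBLE.withName("b"));
    }

    // Relaxing the alignment of b to 1 byte allows a packed layout.
    static StructLayout packed() {
        return MemoryLayout.structLayout(
                ValueLayout.JAVA_INT.withName("a"),
                ValueLayout.JAVA_DOUBLE.withByteAlignment(1).withName("b"));
    }

    public static void main(String[] args) {
        System.out.println(unpaddedIsRejected());
        System.out.println(padded().byteSize() + " / b at "
                + padded().byteOffset(PathElement.groupElement("b")));
        System.out.println(packed().byteSize() + " / b at "
                + packed().byteOffset(PathElement.groupElement("b")));
    }
}
```

Whether the padded or the packed variant is correct depends entirely on how the C side declares the struct.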

Here, we created our memory layout! You can verify its size by struct.byteSize(). Now let’s allocate the buffer again and access values with the following code:

try (Arena arena = Arena.ofConfined()) {
    MemorySegment buf = arena.allocate(struct);

    buf.set(ValueLayout.JAVA_INT,
            struct.byteOffset(MemoryLayout.PathElement.groupElement("a")),
            100); // a = 100
    buf.set(ValueLayout.JAVA_DOUBLE,
            struct.byteOffset(MemoryLayout.PathElement.groupElement("b")),
            100.1); // b = 100.1
}

The hard-coded sizes and offsets are replaced with references to struct, nice. The StructLayout now calculates the byte offsets for us, and the only thing we need to do is tell it which field we want … wait, how do we specify fields? What are the new calls to groupElement()?

It turns out that the call to groupElement("") forms a layout path. A layout path is an unambiguous path to a single field in a memory layout. StructLayout uses layout paths to locate fields and return their byte offsets.

Finally, let’s do one more optimization: remove the redundant JAVA_INT, JAVA_DOUBLE, and byteOffset() calls. We can replace them with VarHandle, a mechanism introduced with Java 9 that acts as a typed reference to a variable. VarHandle provides set / get methods, whose implementation here stores / loads the values into and out of our memory buffer, at specific offsets that we don’t need to care about.

We can obtain a VarHandle to a field using the MemoryLayout#varHandle method:

VarHandle handleA = struct.varHandle(MemoryLayout.PathElement.groupElement("a"));
VarHandle handleB = struct.varHandle(MemoryLayout.PathElement.groupElement("b"));

To load / store the value, you can replace buf.get / buf.set calls with handleA.get / handleA.set.

try (Arena arena = Arena.ofConfined()) {
    MemorySegment buf = arena.allocate(struct);
    handleA.set(buf, 100); // (MemorySegment, Value)
    handleB.set(buf, 100.1);

    // Use this on Java 22 instead:
    // handleA.set(buf, 0, 100); // (MemorySegment, Base Offset, Value)
    // handleB.set(buf, 0, 100.1);

    assert 100 == (int) handleA.get(buf); // (MemorySegment)
    assert 100.1 == (double) handleB.get(buf);

    // Use this on Java 22 instead:
    // assert 100 == (int) handleA.get(buf, 0); // (MemorySegment, Base Offset)
    // assert 100.1 == (double) handleB.get(buf, 0);
}

VarHandle get / set methods take varargs, and their arguments are defined by your layout path. The first argument is always the buffer. Next come the logical indexes for open elements (used for arrays, see the section below). Finally, for set, the last argument is the value.

Note: On Java 22, there is a new argument after the buffer: the base offset. It specifies the byte offset within the buffer at which the layout starts when doing the load / store. This is a breaking API change:

// Java 21
handleA.set(buf, 100); // (MemorySegment, Value)
handleB.set(buf, 100.1);

assert 100 == (int) handleA.get(buf); // (MemorySegment)
assert 100.1 == (double) handleB.get(buf);

// Java 22
handleA.set(buf, 0, 100); // (MemorySegment, Base Offset, Value)
handleB.set(buf, 0, 100.1);

assert 100 == (int) handleA.get(buf, 0); // (MemorySegment, Base Offset)
assert 100.1 == (double) handleB.get(buf, 0);
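Because the coordinate list differs between releases, it can help to ask the handle itself what it expects. The sketch below uses the standard VarHandle#coordinateTypes() and VarHandle#toMethodHandle() methods to build the argument list at run time, filling every long coordinate with 0, so the same code is intended to work on both Java 21 (with --enable-preview) and Java 22. The class CoordinateDemo and its helpers are hypothetical names for this illustration, not part of Panama.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemoryLayout.PathElement;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.VarHandle;
import java.util.ArrayList;
import java.util.List;

public class CoordinateDemo {
    static final MemoryLayout STRUCT = MemoryLayout.structLayout(
            ValueLayout.JAVA_INT.withName("a"),
            MemoryLayout.paddingLayout(4),
            ValueLayout.JAVA_DOUBLE.withName("b"));

    static final VarHandle HANDLE_A = STRUCT.varHandle(PathElement.groupElement("a"));

    // Builds (segment, 0L for each long coordinate, extra...) for the handle.
    static Object[] arguments(VarHandle handle, MemorySegment segment, Object... extra) {
        List<Object> args = new ArrayList<>();
        for (Class<?> type : handle.coordinateTypes()) {
            args.add(type == MemorySegment.class ? segment : (Object) 0L);
        }
        args.addAll(List.of(extra));
        return args.toArray();
    }

    // Stores a value into field a and loads it back, whatever the release.
    static int roundTrip(int value) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment buf = arena.allocate(STRUCT);
            HANDLE_A.toMethodHandle(VarHandle.AccessMode.SET)
                    .invokeWithArguments(arguments(HANDLE_A, buf, value));
            return (Integer) HANDLE_A.toMethodHandle(VarHandle.AccessMode.GET)
                    .invokeWithArguments(arguments(HANDLE_A, buf));
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        // Prints the coordinate list of the running JDK.
        System.out.println(HANDLE_A.coordinateTypes());
        System.out.println(roundTrip(100));
    }
}
```

Going through invokeWithArguments is slower than a direct handleA.set(...) call, so this is meant for exploring the API rather than for hot paths.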

Arrays and open elements #

Now let’s consider turning our C struct into an array:

struct s {
    int32_t a;
    // 4byte padding
    double b;
};

struct s arr[10];

To model this structure, we need to use the SequenceLayout:

MemoryLayout arr = MemoryLayout.sequenceLayout(10, struct); // Repeat struct 10 times
assert 10 * struct.byteSize() == arr.byteSize();

Because the struct is now wrapped in a sequence layout, the layout path has to select an array item first and then the field:

assert 0 == arr.byteOffset(MemoryLayout.PathElement.sequenceElement(0),  // Match the first item in array
                            MemoryLayout.PathElement.groupElement("a")); // Match the member with name 'a'
assert 16 * 2 + 8 == arr.byteOffset(MemoryLayout.PathElement.sequenceElement(2), // Match the third item in array
                            MemoryLayout.PathElement.groupElement("b"));         // Match the member with name 'b'

As usual, we can access the fields by either calculating the byte offset or using VarHandle:

VarHandle handleA0 = arr.varHandle(MemoryLayout.PathElement.sequenceElement(0),
                                    MemoryLayout.PathElement.groupElement("a"));
VarHandle handleA2 = arr.varHandle(MemoryLayout.PathElement.sequenceElement(2),
                                    MemoryLayout.PathElement.groupElement("a"));
try (Arena arena = Arena.ofConfined()) {
    MemorySegment buf = arena.allocate(arr);

    handleA0.set(buf, 10); // (MemorySegment, value)
    // On Java 22, the previous line would become
    // handleA0.set(buf, 0, 10); // (MemorySegment, base offset, value)
    handleA2.set(buf, 20);

    assert 10 == (int) handleA0.get(buf);
    // On Java 22, the previous line would become
    // handleA0.get(buf, 0); // (MemorySegment, base offset)
    assert 20 == (int) handleA2.get(buf);
}

Furthermore, we want to avoid defining a separate handle for the same field of every item in the array. Instead of specifying the index in sequenceElement(i), we can omit the argument to build an open element. An open element works like a placeholder in a SQL prepared statement: it adds an extra argument for the index when calling get / set:

VarHandle handleA = arr.varHandle(MemoryLayout.PathElement.sequenceElement(), // Match every item
                                    MemoryLayout.PathElement.groupElement("a"));
try (Arena arena = Arena.ofConfined()) {
    MemorySegment buf = arena.allocate(arr);

    // New argument here for the logical index: (Buffer, Index, Value)
    handleA.set(buf, 0, 10); // (MemorySegment, logical index[0], value)
    handleA.set(buf, 2, 20);
    // On Java 22, the previous two lines have to become:
    // handleA.set(buf, 0, 0, 10); // (MemorySegment, base offset, logical index[0], value)
    // handleA.set(buf, 0, 2, 20);

    // Same arguments without the value: (Buffer, Index)
    assert 10 == (int) handleA.get(buf, 0); // (MemorySegment, logical index[0])
    assert 20 == (int) handleA.get(buf, 2);
    // On Java 22, the previous two lines have to become:
    // assert 10 == (int) handleA.get(buf, 0, 0); // (MemorySegment, base offset, logical index[0])
    // assert 20 == (int) handleA.get(buf, 0, 2);
}

Java 22 also provides a useful method to create such a var handle: arrayElementVarHandle(). It is called on the element layout (here, struct) and, as the name suggests, automatically prepends an open sequence element to the layout path, making it easier to use:

Note: This API is Java 22 only. It is not available in OpenJDK 21 LTS.

VarHandle handleA = struct.arrayElementVarHandle(MemoryLayout.PathElement.groupElement("a")); // Same coordinates!
try (Arena arena = Arena.ofConfined()) {
    MemorySegment buf = arena.allocate(arr);

    handleA.set(buf, 0, 2, 20); // (MemorySegment, base offset, logical index, value)
    assert 20 == (int) handleA.get(buf, 0, 2);
}

For nested arrays, just put more open elements in the var handle, and there will be additional index arguments.
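Nested layout paths can also be checked without a VarHandle. Since the VarHandle coordinates differ between Java 21 and 22, the sketch below sticks to byteOffset with concrete indexes, which behaves the same on both. It models a C array int grid[4][5] and assumes JDK 22 or later (or JDK 21 with --enable-preview); the class NestedArrayDemo is a made-up name.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemoryLayout.PathElement;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class NestedArrayDemo {
    // Models: int grid[4][5];
    static final MemoryLayout GRID = MemoryLayout.sequenceLayout(4,
            MemoryLayout.sequenceLayout(5, ValueLayout.JAVA_INT));

    // One path element per array dimension, outermost first.
    static long offsetOf(long row, long col) {
        return GRID.byteOffset(PathElement.sequenceElement(row),
                               PathElement.sequenceElement(col));
    }

    // Writes through the layout path, reads back with plain row-major indexing.
    static int writeThenRead(long row, long col, int value) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment buf = arena.allocate(GRID);
            buf.set(ValueLayout.JAVA_INT, offsetOf(row, col), value);
            return buf.getAtIndex(ValueLayout.JAVA_INT, row * 5 + col);
        }
    }

    public static void main(String[] args) {
        System.out.println(offsetOf(2, 3));          // (2 * 5 + 3) * 4 = 52
        System.out.println(writeThenRead(2, 3, 7));  // 7
    }
}
```

Replacing each sequenceElement(i) with the no-argument sequenceElement() in a varHandle call turns each dimension into one extra index argument.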

Foreign function call #

After wrapping up manual memory access, it’s now time to talk about foreign function calls. Panama allows developers to call any function that follows the C ABI from pure Java, without C glue code, external libraries, or a performance penalty (compared to JNI). Panama also allows developers to pass Java methods as function pointers into C functions and have them called back (and this is 3x ~ 4x faster than JNI).

Into unsafe memory #

Recall from the previous section: manual memory load / store operations in Java are strongly typed, and boundary checks happen at runtime to guarantee that they won’t crash the JVM. However, when dealing with foreign function calls, there is no way to guarantee safe memory access inside those foreign functions. For example, a C function may allocate a buffer internally, fill in some data, and return a pointer to the caller. The caller has no idea of the size and content of the memory behind the pointer. This kind of memory is considered raw and unsafe because MemorySegment boundary and type checks cannot be applied to it. Operating on such memory, even in Java, may crash the JVM.

Panama represents unsafe memory as zero-length MemorySegments, which you can’t usually allocate yourself.

There are several options for dealing with unsafe memory:

Reinterpret with a known size, so Java will check boundaries for you:

MemorySegment foreign = // from some C call
MemorySegment foreignWithBoundaryCheck = foreign.reinterpret(8); // Now has size 8

Reinterpret with a known size and bind to an Arena:

try (Arena arena = Arena.ofConfined()) {
    MemorySegment foreign = // from some C call, not bound to arena
    MemorySegment foreignWithBoundaryAndLifeCheck = foreign.reinterpret(8, arena, null); // Now has size 8 and is bound to arena
} // It's illegal to access foreignWithBoundaryAndLifeCheck here. The native memory itself is not freed, because the cleanup action is null.

Bind to an existing MemoryLayout:

MemorySegment foreign = // from some C call
MemorySegment foreignWithKnownLayout = foreign.get(ValueLayout.ADDRESS.withTargetLayout(XXX), 0);
// This reads a pointer stored inside foreign at offset 0, so foreign itself must be
// a sized segment of at least ADDRESS.byteSize() bytes. On a zero-length segment,
// the get() call throws IndexOutOfBoundsException. The returned segment has the size of XXX.

Because these memory operations are unsafe and may crash the JVM, they are restricted methods. Pass --enable-native-access=*MODULE* (or ALL-UNNAMED for code on the classpath) on the JVM command line to allow them; without the flag, the JVM prints a warning when a restricted method is used.
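To see a zero-length segment and reinterpret in action without writing any C, one can simulate a "pointer returned by C" with MemorySegment.ofAddress. The following is only a sketch under that assumption, targeting JDK 22 or later (or JDK 21 with --enable-preview); the class ReinterpretDemo and its methods are hypothetical. Expect a restricted-method warning on stderr unless --enable-native-access is passed.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class ReinterpretDemo {
    // A raw address has size 0, so any read through it is rejected.
    static boolean zeroLengthRejectsRead() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment owner = arena.allocate(8, 8);
            // Stand-in for a pointer handed back by a C function.
            MemorySegment raw = MemorySegment.ofAddress(owner.address());
            if (raw.byteSize() != 0) {
                return false;
            }
            try {
                raw.get(ValueLayout.JAVA_LONG, 0);
                return false;
            } catch (IndexOutOfBoundsException e) {
                return true;
            }
        }
    }

    // After reinterpret(8), the same address can be read with bounds checks.
    static long readThroughRawPointer(long value) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment owner = arena.allocate(8, 8);
            owner.set(ValueLayout.JAVA_LONG, 0, value);
            MemorySegment raw = MemorySegment.ofAddress(owner.address());
            MemorySegment sized = raw.reinterpret(8); // restricted method
            // Must happen while the arena is open: sized is not tied to it.
            return sized.get(ValueLayout.JAVA_LONG, 0);
        }
    }

    public static void main(String[] args) {
        System.out.println(zeroLengthRejectsRead());
        System.out.println(readThroughRawPointer(0xCAFEL));
    }
}
```

Note that the reinterpreted segment outlives the arena from Java's point of view; reading it after the arena is closed would be exactly the kind of unchecked access this section warns about.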

Symbol lookup #

Panama FFI splits foreign function calls into two parts: Symbol lookup (same as dynamic loader: resolve symbol name into address) and Linker (adapts function address and arguments to native ABI).

Symbol lookups implement the SymbolLookup interface, and there are three options:

Linker.nativeLinker().defaultLookup(): Looks up symbols in commonly used libraries, such as libc.
SymbolLookup.loaderLookup(): Looks up symbols in libraries loaded by the current class loader, i.e. those loaded with System.load() or System.loadLibrary().
SymbolLookup.libraryLookup(Path, Arena): Opens the specified dynamic library (in an arena) and looks up symbols in it. Similar to dlopen(3).

The most important method in SymbolLookup is find(String). It returns an Optional<MemorySegment>. It can be used to look up both exported functions and global variables. As you may guess, the returned MemorySegment is zero-length: the lookup only yields an address, not how large the function or global variable behind it is. Moreover, knowing the length would be useless for a function address anyway.

For example, we want to look up the gethostname(3) function from libc:

// Get the location to gethostname from libc. It's in the libc, so no dlopen(3) / arena
// required!
MemorySegment pGethostname = Linker.nativeLinker().defaultLookup().find("gethostname").get();

// It's unsafe memory by default.
assert 0 == pGethostname.byteSize();

System.out.printf("0x%x\n", pGethostname.address());

For another example, let’s read and write the well-known errno from libc:

// Get the location to errno from libc. It's in the libc, so no dlopen(3) / arena
// required! Also, it's an integer, not a function.
MemorySegment errno = Linker.nativeLinker().defaultLookup().find("errno").get();

// It's unsafe memory by default.
assert 0 == errno.byteSize();

// We know that errno has the type of int, so manually reinterpret it to 4 bytes,
// and JVM will automatically do boundary check for us later on.
// Remember, this is an unsafe operation because JVM has no guarantee of what
// the memory layout is at the target location.
errno = errno.reinterpret(4);
assert 4 == errno.byteSize();

// Read it as an int, as usual
System.out.println(errno.get(ValueLayout.JAVA_INT, 0));

// errno = 0
errno.set(ValueLayout.JAVA_INT, 0, 0);
System.out.println(errno.get(ValueLayout.JAVA_INT, 0));
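The lookups above call .get() on the Optional directly, which throws NoSuchElementException when a symbol is missing. A slightly more defensive sketch keeps the Optional and lets the caller decide. It assumes JDK 22 or later (or JDK 21 with --enable-preview) and a platform whose default lookup exposes strlen; the class LookupDemo and the deliberately bogus symbol name are made up for illustration.

```java
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.util.Optional;

public class LookupDemo {
    // Resolves a symbol in the default lookup without throwing if it is absent.
    static Optional<MemorySegment> find(String name) {
        return Linker.nativeLinker().defaultLookup().find(name);
    }

    static String describe(String name) {
        return find(name)
                .map(seg -> String.format("%s at 0x%x (byteSize=%d)",
                        name, seg.address(), seg.byteSize()))
                .orElse(name + " not found");
    }

    public static void main(String[] args) {
        System.out.println(describe("strlen"));
        System.out.println(describe("no_such_symbol_for_demo"));
    }
}
```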

A review of ABIs for those who don’t know #

After getting the function address, we need to figure out how to call it.

Calling a function requires passing arguments back and forth. However, neither the function pointer nor the ELF export table says anything about the function’s arguments and return value. All they give us is the location of the first instruction of each exported function. In fact, assembly language has no notion of functions. It only has labels, which are names for specific locations in the binary. Consider the following RISC-V assembly:

label_a: # This is a label, and its value is the location of the first instruction below it: li
    li a0, 0x1
    li a1, 0x2
    add a2, a0, a1

label_b: # Resolved to the location of the jump below when assembling
    j label_a # Jump to label_a

In machine code, jumping to a label is (mostly) as simple as setting the PC to the given address, and the CPU will fetch its next instruction from there. If you want to pass something to a label, just leave some values in registers (a jump does not alter them) or in memory.

Functions are provided by higher-level languages like C. Functions have a name, a list of arguments, and a return value. Everyone knows about functions, like the following one:

int add(int x, int y) {
    return x + y;
}

How does the code above translate to machine code? It is for sure going to become a label called add:, but where do we put x, y, and the return value? As a programmer, you are free to choose arbitrary locations for these values. You can put them in a0, a1, or a2, but it’s better to have a contract for those locations. Here comes the C ABI (a.k.a. calling convention).

The C ABI (a.k.a. calling convention) defines the relationship between a C function definition and native registers and memory. It is ISA-specific, and some ISAs like x86 have multiple conventions.

A calling convention defines:

Where to put the arguments? Normally we prefer registers because they are faster, but registers are limited, so we also have to use the memory (stack) for larger data.
Where to put the return value? Normally we also use registers.
How does the stack grow? Upwards or downwards? This affects the stack frame and stack pointer.
What registers must not be changed during a function call (callee-saved)? What registers may be changed during a function call (caller-saved)?

As long as you know the target calling convention and a C-style function definition, you can correctly place arguments and get the value back, without worrying about messing up the memory or registers.

Linker and MethodHandle #

Because calling conventions are platform-dependent, Panama has to figure out a way to (automatically) convert the given list of argument types and return type into a format that correctly sets up the register and stack. This tool is called Linker.

Don’t get confused! It has nothing to do with the linker that links object files into another binary.

A linker accepts a return type and a list of argument types (represented as MemoryLayout, because the most important thing about these types in an ABI is their size!) and returns a MethodHandle. MethodHandle was introduced in Java 7 (VarHandle followed in Java 9), and it represents a method, similar to the plain-old reflection Method.

Let’s see it in action! To obtain a Linker instance, use the Linker.nativeLinker() method. Then, use downcallHandle to get a MethodHandle for a library function. A downcall is calling C from Java, and an upcall is calling Java from C.

Then, allocate the arguments manually, using the mechanism described above. Except for primitive types (passed by value), everything passed by pointer must be allocated manually from an arena! Panama doesn’t distinguish stack / heap pointers.

Finally, invoke the MethodHandle and get the result. Just as a reminder, any pointer returned by the foreign function is unsafe and must be handled with care!

Here is how to call gethostname(3) from libc:

// Define an Arena for managing the memory involved in this call
try (Arena arena = Arena.ofConfined()) {
    // Get the location of exported label gethostname(3) from libc.
    MemorySegment ptr = Linker.nativeLinker().defaultLookup().find("gethostname").get();
    // It will have zero length!
    assert 0 == ptr.byteSize();

    // Create a MethodHandle wrapper for that function by supplying the function pointer and sizes of arguments
    MethodHandle gethostname = Linker.nativeLinker().downcallHandle(ptr,
        FunctionDescriptor.of(ValueLayout.JAVA_INT, // Return type
                                ValueLayout.ADDRESS, // First argument: char*
                                ValueLayout.JAVA_LONG)); // Second argument: size_t
    // Allocate the first argument (string buffer)
    MemorySegment arg = arena.allocate(64);
    // It is zeroed by default!
    // getUtf8String simply derefs the given segment byte-by-byte and get its result.
    assert "".equals(arg.getUtf8String(0));

    // Invoke the function
    assert 0 == (int) gethostname.invokeExact(arg, arg.byteSize());

    // Read the buffer: Because gethostname(3) only writes to that buffer, it is
    // still the same buffer allocated from Java. Thus, we can use it safely.
    System.out.println(arg.getUtf8String(0));
}
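The example above relies on getUtf8String, which only exists on Java 21 (it is one of the methods renamed in Java 22). As a sketch of the same lookup / link / invoke steps that avoids the renamed string helpers, the code below calls strlen(3) and builds the C string by hand. It assumes JDK 22 or later (or JDK 21 with --enable-preview), a 64-bit platform where size_t maps to JAVA_LONG, and a default lookup that exposes strlen; the class name StrlenDemo is hypothetical.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;
import java.nio.charset.StandardCharsets;

public class StrlenDemo {
    // size_t strlen(const char *s);
    static long strlen(String text) {
        Linker linker = Linker.nativeLinker();
        // 1. Symbol lookup: resolve the function address.
        MemorySegment address = linker.defaultLookup().find("strlen").orElseThrow();
        // 2. Linker: describe return and argument types to get a MethodHandle.
        MethodHandle handle = linker.downcallHandle(address,
                FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));

        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        try (Arena arena = Arena.ofConfined()) {
            // 3. Allocate the argument: the buffer is zeroed, so the extra
            //    byte at the end acts as the NUL terminator.
            MemorySegment cString = arena.allocate(bytes.length + 1);
            MemorySegment.copy(MemorySegment.ofArray(bytes), 0, cString, 0, bytes.length);
            // 4. Invoke: static types must match (MemorySegment) -> long exactly.
            return (long) handle.invokeExact(cString);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(strlen("Panama")); // 6
    }
}
```

In real code the MethodHandle would normally be created once and cached in a static final field, since creating a downcall handle is far more expensive than invoking it.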

Here is the table for Java types vs. C types:

C type	Layout	Java carrier
`bool`	`JAVA_BOOLEAN`	`boolean`
`char`	`JAVA_BYTE`	`byte`
`short`	`JAVA_SHORT`	`short`, `char`
`int`	`JAVA_INT`	`int`
`long`	`JAVA_LONG`	`long`
`long long`	`JAVA_LONG`	`long`
`float`	`JAVA_FLOAT`	`float`
`double`	`JAVA_DOUBLE`	`double`
`char*`, `int**`, …	`ADDRESS`	`MemorySegment`
`struct Point { int x; int y; };`, `union Choice { float a; int b; };`, …	`MemoryLayout.structLayout(...)`, `MemoryLayout.unionLayout(...)`	`MemorySegment`

(Taken from https://github.com/openjdk/panama-foreign/blob/foreign-memaccess%2Babi/doc/panama_ffi.md#linker)

Similar to VarHandle, MethodHandle#invokeExact has strict argument type requirements. These types come from FunctionDescriptor#of. When you call invokeExact with the wrong combination of parameters (or cast the result to the wrong return type), it throws WrongMethodTypeException. It’s common to hit these exceptions! Just fix your arguments to make it work.
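The strictness of invokeExact is a property of java.lang.invoke in general, not of Panama, so it can be demonstrated with an ordinary Java method. The sketch below uses Math.max(long, long) as a stand-in for a downcall handle whose descriptor expects two JAVA_LONG arguments; the class InvokeExactDemo is a hypothetical name, and the code runs on any JDK since Java 7.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.WrongMethodTypeException;

public class InvokeExactDemo {
    // Handle of type (long, long) -> long.
    static final MethodHandle MAX;
    static {
        try {
            MAX = MethodHandles.lookup().findStatic(Math.class, "max",
                    MethodType.methodType(long.class, long.class, long.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Passing ints makes the call-site type (int, int) -> long, which does
    // not match exactly, so invokeExact refuses to run.
    static boolean intArgumentsRejected() {
        try {
            long ignored = (long) MAX.invokeExact(1, 2);
            return false;
        } catch (WrongMethodTypeException e) {
            return true;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // Long literals give the exact call-site type.
    static long exact() {
        try {
            return (long) MAX.invokeExact(1L, 2L);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // invoke() (without "Exact") converts int -> long for us, at some cost.
    static long lenient() {
        try {
            return (long) MAX.invoke(1, 2);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(intArgumentsRejected());
        System.out.println(exact());
        System.out.println(lenient());
    }
}
```

For a size_t parameter described as JAVA_LONG, this is why an int literal such as 64 has to be written as 64L when using invokeExact.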

Let’s do another example using SQLite3, which heavily uses ** out-parameters (the library allocates an object and writes a pointer to it into the location the caller provides):

try (Arena arena = Arena.ofConfined()) {
    // Open the library (similar to dlopen(3))
    // Because this library is not already loaded, we have to specify an Arena to place
    // required memory buffers.
    SymbolLookup libsqlite3 = SymbolLookup.libraryLookup("/usr/lib/libsqlite3.so", arena);

    // Lookup symbols and convert them into MethodHandle
    // int sqlite3_open(const char *, sqlite3 **)
    MethodHandle sqlite3Open =
            Linker.nativeLinker().downcallHandle(libsqlite3.find("sqlite3_open").get(),
            FunctionDescriptor.of(ValueLayout.JAVA_INT, // Return
                    ValueLayout.ADDRESS, // First argument: char *
                    ValueLayout.ADDRESS)); // Second argument: pointer-to-unknown-size-pointer (still a pointer)
    // int sqlite3_prepare_v2(sqlite3 *, const char *, int, sqlite3_stmt **, const char **)
    MethodHandle sqlite3PrepareV2 = Linker.nativeLinker().downcallHandle(
            libsqlite3.find("sqlite3_prepare_v2").get(),
            FunctionDescriptor.of(ValueLayout.JAVA_INT, // Return
                    ValueLayout.ADDRESS, // First argument: pointer-to-unknown-size
                    ValueLayout.ADDRESS, // Second argument: char *
                    ValueLayout.JAVA_INT, // Third argument: int
                    ValueLayout.ADDRESS, // Fourth argument: pointer-to-unknown-size-pointer
                    ValueLayout.ADDRESS)); // Fifth argument: char **
    // int sqlite3_step(sqlite3_stmt *)
    MethodHandle sqlite3Step = Linker.nativeLinker().downcallHandle(libsqlite3.find("sqlite3_step").get(),
            FunctionDescriptor.of(ValueLayout.JAVA_INT, // Return
                    ValueLayout.ADDRESS)); // First argument: pointer-to-unknown-size
    // int sqlite3_finalize(sqlite3_stmt *)
    MethodHandle sqlite3Finalize = Linker.nativeLinker().downcallHandle(libsqlite3.find("sqlite3_finalize").get(),
            FunctionDescriptor.of(ValueLayout.JAVA_INT,
                    ValueLayout.ADDRESS));
    // int sqlite3_close(sqlite3 *)
    MethodHandle sqlite3Close = Linker.nativeLinker().downcallHandle(libsqlite3.find("sqlite3_close").get(),
            FunctionDescriptor.of(ValueLayout.JAVA_INT,
                    ValueLayout.ADDRESS));

    // The way SQLite3 runs is as follows: the caller provides a pointer to a
    // pointer, and sqlite3_open will allocate a sqlite3 struct internally (its
    // size is opaque). Then, it will write the address to this struct to the
    // location provided by the caller.
    // It works as follows:
    // sqlite3 *db; // Unknown size, blank pointer
    // sqlite3_open(path, &db);
    //      Inside (ppDb is the second parameter):
    //      sqlite3 *s = malloc(...);
    //      *ppDb = s;
    // Now db has a value (pointing to an internally allocated buffer).

    // Here, we are going to allocate for a pointer. This segment has a size of
    // a (platform-dependent) pointer. Its content is zero, and it will be filled
    // with the pointer of an opaque sqlite3 struct.
    // void **dbPtr;
    MemorySegment dbPtr = arena.allocate(ValueLayout.ADDRESS);
    // String literal "db"
    // char *path = "db";
    MemorySegment path = arena.allocateUtf8String("db");

    // sqlite3_open(path, dbPtr);
    assert 0 == (int) sqlite3Open.invokeExact(path, dbPtr);
    // Now dbPtr is filled with a pointer
    // Let's get its contents (deref)
    // This is a safe operation: It derefs the pointer dbPtr (having size of an
    // address, allocated safely from Java) as a pointer, starting from offset 0.
    // However, the dereferenced pointer has zero size because its contents are
    // unknown (returned from sqlite3_open).
    // We don't need to care about its contents because sqlite3 struct is opaque.
    // This is same as:
    // void *db = *dbPtr;
    MemorySegment db = dbPtr.get(ValueLayout.ADDRESS, 0);

    // Similar to sqlite3 but for statement.
    MemorySegment stmtPtr = arena.allocate(ValueLayout.ADDRESS);
    MemorySegment err = arena.allocate(64);

    assert 0 == (int) sqlite3PrepareV2.invokeExact(db,
            arena.allocateUtf8String("PRAGMA foreign_keys = ON"),
            -1,
            stmtPtr,
            MemorySegment.NULL);

    MemorySegment stmt = stmtPtr.get(ValueLayout.ADDRESS, 0);

    assert 101 == (int) sqlite3Step.invokeExact(stmt);
    assert 0 == (int) sqlite3Finalize.invokeExact(stmt);
    assert 0 == (int) sqlite3Close.invokeExact(db);
}

Upcalls #

One of the most useful features of Panama is being able to efficiently call Java methods from foreign code (callbacks). The Linker will automatically allocate a function pointer in the memory, arrange arguments based on the given list of argument types and native ABI, and call your given Java method when needed.

It is very easy to use, as shown in the code below, which invokes sqlite3_exec:

int sqlite3_exec(
  sqlite3*,                                  /* An open database */
  const char *sql,                           /* SQL to be evaluated */
  int (*callback)(void*,int,char**,char**),  /* Callback function */
  void *,                                    /* 1st argument to callback */
  char **errmsg                              /* Error msg written here */
);

// First, you still need to model the C function pointer into a Panama
// FunctionDescriptor, the same as downcalls.
// You also need to define a Java method with the same argument types!
// int (*callback)(void*,int,char**,char**),  /* Callback function */
FunctionDescriptor callbackDesc = FunctionDescriptor.of(ValueLayout.JAVA_INT, // Return
        ValueLayout.ADDRESS,
        ValueLayout.JAVA_INT,
        ValueLayout.ADDRESS,
        ValueLayout.ADDRESS);
// Upcalls are also built upon MethodHandles. Use the MethodHandles API
// (available since Java 7) to look up your target Java method.
MethodHandle callback = MethodHandles.lookup().findStatic(Main.class,
                                                        "onExec",
                                                        // Convert FunctionDescriptor to method type
                                                        callbackDesc.toMethodType());
// Wrap the above descriptor and Java MethodHandle into a ready-to-use C function pointer,
// so it can be passed to foreign functions.
MemorySegment callbackPtr = Linker.nativeLinker().upcallStub(callback, callbackDesc, arena);

assert 0 == (int) sqlite3Exec.invokeExact(db,
        arena.allocateUtf8String("select * from pragma_function_list;"),
        callbackPtr, // Pass as usual
        arena.allocateUtf8String("demo"),
        MemorySegment.NULL);

Define the Java-side method, with matching arguments:

private static int onExec(MemorySegment d,
                          int columns,
                          MemorySegment datas,
                          MemorySegment names) {
    // Returning non-zero makes sqlite3_exec abort the query.
    return 0;
}

And that’s it!
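
If the lookup step feels opaque, it can be tried on its own, because findStatic only needs java.lang.invoke and no native code. Below is a minimal sketch. The LookupDemo class and its onRow method are made up for illustration, and they use plain Java types instead of MemorySegment:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class LookupDemo {
    // Hypothetical callback target: returns 0 when there is at least one column.
    private static int onRow(long userData, int columns) {
        return columns > 0 ? 0 : 1;
    }

    // Looks up onRow by name and exact signature, then calls it through the handle.
    static int callThroughHandle(long userData, int columns) {
        try {
            // (long, int) -> int, the same role callbackDesc.toMethodType() plays above.
            MethodType type = MethodType.methodType(int.class, long.class, int.class);
            MethodHandle handle = MethodHandles.lookup()
                    .findStatic(LookupDemo.class, "onRow", type);
            // invokeExact requires argument and return types to match the handle exactly.
            return (int) handle.invokeExact(userData, columns);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(callThroughHandle(0L, 3));
    }
}
```

A misspelled method name or a mismatched MethodType makes findStatic throw NoSuchMethodException at lookup time, which this sketch wraps in an IllegalStateException. That is much easier to debug than a broken callback inside native code.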

To make our demo work, I’ll use the following onExec implementation that dumps everything. It’s also a good example of dealing with foreign nested arrays and unsafe memory:

private static int onExec(MemorySegment d,
                          int columns,
                          MemorySegment datas,
                          MemorySegment names) {
    // d is the void* we passed as the 4th argument of sqlite3_exec; here it is a char*.
    // Because C strings are null-terminated, we don't know their length unless
    // we iterate through them. Segments created from raw foreign pointers have
    // size 0, so we widen Java's bounds check by reinterpreting with Long.MAX_VALUE.
    // This lets getUtf8String read bytes starting from the given pointer
    // and stop at '\0'.
    d = d.reinterpret(Long.MAX_VALUE);
    // d is now a char* with usable bounds.
    // getUtf8String reads bytes up to the terminator and decodes them as UTF-8.
    String str = d.getUtf8String(0);

    // Because datas and names are both char**, let's define a SequenceLayout and
    // get a VarHandle to a single char*, so we don't need to calculate byte offsets.
    // Here, it's known (from the SQLite manual) that both datas and names hold
    // `columns` items. Thus, we pass columns as the element count to sequenceLayout,
    // so it has an accurate size (and we don't need Long.MAX_VALUE here).
    MemoryLayout layout = MemoryLayout.sequenceLayout(columns,
            ValueLayout.ADDRESS);
    // The layout now has the correct size.
    assert columns * ValueLayout.ADDRESS.byteSize() == layout.byteSize();

    // A VarHandle to get a single char* out of the char** array.
    VarHandle single = layout.varHandle(MemoryLayout.PathElement.sequenceElement());

    // datas and names are both char**, each with a size of columns * pointer size.
    // They are unsafe, so reinterpret them to have known sizes.
    datas = datas.reinterpret(layout.byteSize());
    names = names.reinterpret(layout.byteSize());

    System.out.printf("%s - %d\n", str, columns);
    for (int i = 0; i < columns; i ++) {
        // Get the char* elements from names and datas.
        // They are also raw foreign pointers of size 0, so for getUtf8String
        // to get past the bounds check, we need to call
        // reinterpret(Long.MAX_VALUE).
        MemorySegment namMem = ((MemorySegment) single.get(names, i))
                .reinterpret(Long.MAX_VALUE);
        final MemorySegment datMem = ((MemorySegment) single.get(datas, i))
                .reinterpret(Long.MAX_VALUE);

        // Now, deref all bytes.
        String nam = namMem.getUtf8String(0);
        String dat;
        // Data can be NULL. Check it first.
        if (datMem.address() == 0) {
            dat = "(Null)";
        } else {
            dat = datMem.getUtf8String(0);
        }
        System.out.printf("\t%s = %s\n", nam, dat);
    }

    return 0;
}
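
To see the reinterpret steps in isolation, here is a minimal sketch that builds a char** in native memory from Java, then reads it back the way onExec does. The CStringArrayDemo class and its helpers are hypothetical. It reads each string byte by byte instead of calling getUtf8String, and it only uses calls that I believe exist in both the Java 21 preview API and the Java 22 API, so it still needs --enable-preview on Java 21:

```java
import java.io.ByteArrayOutputStream;
import java.lang.foreign.Arena;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.charset.StandardCharsets;

final class CStringArrayDemo {
    // Copies a Java string into native memory as a NUL-terminated C string.
    static MemorySegment toCString(Arena arena, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        MemorySegment seg = arena.allocate(bytes.length + 1);
        seg.copyFrom(MemorySegment.ofArray(bytes));
        seg.set(ValueLayout.JAVA_BYTE, bytes.length, (byte) 0);
        return seg;
    }

    // Reads a C string from a pointer of unknown length, stopping at '\0'.
    static String readCString(MemorySegment ptr) {
        // A raw pointer has size 0, so widen the bounds before reading.
        MemorySegment view = ptr.reinterpret(Long.MAX_VALUE);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (long i = 0; ; i++) {
            byte b = view.get(ValueLayout.JAVA_BYTE, i);
            if (b == 0) {
                break;
            }
            out.write(b);
        }
        return out.toString(StandardCharsets.UTF_8);
    }

    // Builds a char** from the given strings and reads every element back.
    static String[] roundTrip(String... values) {
        try (Arena arena = Arena.ofConfined()) {
            MemoryLayout layout = MemoryLayout.sequenceLayout(values.length, ValueLayout.ADDRESS);
            MemorySegment array = arena.allocate(layout);
            for (int i = 0; i < values.length; i++) {
                array.setAtIndex(ValueLayout.ADDRESS, i, toCString(arena, values[i]));
            }

            // Simulate what a C callback hands us: a bare address with size 0.
            MemorySegment raw = MemorySegment.ofAddress(array.address());
            // We know the element count, so give the array its real size.
            MemorySegment sized = raw.reinterpret(layout.byteSize());

            String[] result = new String[values.length];
            for (int i = 0; i < values.length; i++) {
                result[i] = readCString(sized.getAtIndex(ValueLayout.ADDRESS, i));
            }
            return result;
        }
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", roundTrip("name", "builtin", "type")));
    }
}
```

The sketch uses getAtIndex instead of a VarHandle to keep it short. Both approaches read the same pointer-sized slots of the array.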

Put them all together: A `snprintf(3)` demo #

Let’s do a demo on a fairly complicated function: snprintf(3). We’ll convert the following C code into Panama API calls:

char buf[64];
snprintf(buf, sizeof(buf), "Hello World from Java, calling printf @ %p\n", snprintf);

Because snprintf(3) takes variadic arguments, we have to give Panama the specific list of arguments we want to pass, since FunctionDescriptor has no notion of variadic arguments. We also need to tell the linker the index of the argument at which the variadic part starts.

try (Arena arena = Arena.ofConfined()) {
    // Resolve the label 'snprintf' from libc and get its location.
    MemorySegment snprintfPtr = Linker.nativeLinker().defaultLookup().find("snprintf").get();

    // Adapt the function pointer and its arguments into a MethodHandle.
    MethodHandle snprintf = Linker.nativeLinker().downcallHandle(snprintfPtr,
            FunctionDescriptor.of(ValueLayout.JAVA_INT, // Return int
                    ValueLayout.ADDRESS, ValueLayout.JAVA_LONG, // First two arguments: char *, size_t
                    ValueLayout.ADDRESS, // Third argument: const char * (the format string)
                    ValueLayout.ADDRESS), // Fourth argument: %p (the first variadic argument)
            Linker.Option.firstVariadicArg(3)); // The index (starting from 0) of the first variadic argument.

    // Allocate the buffer for output string.
    MemorySegment buf = arena.allocate(64);

    // Call the function with correct arguments (including return value type).
    int res = (int) snprintf.invokeExact(buf, buf.byteSize(),
            arena.allocateUtf8String("Hello World from Java, calling printf @ %p\n"),
            snprintfPtr);

    // Read the buffer.
    System.out.print(buf.getUtf8String(0));
}

`jextract(1)` comes to the rescue #

It would be tedious and error-prone to manually translate C function declarations into FunctionDescriptors. Fortunately, Java has a new command-line tool called jextract to help us with the mechanical work. It uses LLVM (libclang) to parse the input header files and automatically generates Java wrappers that take care of everything.

Currently, jextract is only available as an early-access build that has to be downloaded separately from https://jdk.java.net/jextract/. It is based on OpenJDK 21 and hasn’t been migrated to the latest Java 22 APIs yet. It also has other limitations, such as not supporting function-like macros. I recommend watching https://jdk.java.net/jextract/ and their mailing list for the latest updates.

Let’s rewrite our above SQLite example with jextract. The process is very straightforward. First, generate Java sources (or classes) using jextract(1):

./jextract \
    -t org.sqlite \
    --source \
    --output /path/to/src/ \
    -l sqlite3 \
    /usr/include/sqlite3.h

-t org.sqlite: Put generated classes or sources into a package.
--source: Generate source files rather than class bytecodes.
--output: Output dir.
-l sqlite3: Add System.loadLibrary("sqlite3") for SymbolLookup.loaderLookup().
/usr/include/sqlite3.h: The input header.

Now we can rewrite the code above as:

try (Arena arena = Arena.ofConfined()) {
    MemorySegment dbPtr = arena.allocate(ValueLayout.ADDRESS);
    MemorySegment path = arena.allocateUtf8String("db");

    assert sqlite3_h.SQLITE_OK() == sqlite3_h.sqlite3_open(path, dbPtr);

    MemorySegment db = dbPtr.get(ValueLayout.ADDRESS, 0);

    MemorySegment stmtPtr = arena.allocate(ValueLayout.ADDRESS);
    MemorySegment err = arena.allocate(64);

    assert sqlite3_h.SQLITE_OK() == sqlite3_h.sqlite3_prepare_v2(db,
            arena.allocateUtf8String("PRAGMA foreign_keys = ON"),
            -1,
            stmtPtr,
            MemorySegment.NULL);

    MemorySegment stmt = stmtPtr.get(ValueLayout.ADDRESS, 0);

    assert sqlite3_h.SQLITE_DONE() == sqlite3_h.sqlite3_step(stmt);

    FunctionDescriptor onExecDescr = FunctionDescriptor.of(ValueLayout.JAVA_INT,
            ValueLayout.ADDRESS,
            ValueLayout.JAVA_INT,
            ValueLayout.ADDRESS,
            ValueLayout.ADDRESS);
    MethodHandle onExec = MethodHandles.lookup().findStatic(Main.class, "onExec", onExecDescr.toMethodType());
    MemorySegment onExecPtr = Linker.nativeLinker().upcallStub(onExec, onExecDescr, arena);
    assert 0 == sqlite3_h.sqlite3_exec(db,
            arena.allocateUtf8String("select * from pragma_function_list;"),
            onExecPtr,
            arena.allocateUtf8String("demo1"),
            MemorySegment.NULL);

    assert sqlite3_h.SQLITE_OK() == sqlite3_h.sqlite3_finalize(stmt);
    assert sqlite3_h.SQLITE_OK() == sqlite3_h.sqlite3_close(db);
}

It looks better! At least in some ways. We still need to manually create easy-to-use Java-style bindings for SQLite, for example by converting return codes into Java exceptions. That is not something jextract can help with!
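
As a rough idea of what such a hand-written layer could look like, here is a minimal sketch that turns return codes into exceptions. SqliteCheck, SqliteException and check are hypothetical names, not something jextract generates. The numeric values are SQLite's documented codes for SQLITE_OK, SQLITE_ROW and SQLITE_DONE:

```java
final class SqliteCheck {
    // Result codes as documented by SQLite.
    static final int SQLITE_OK = 0;
    static final int SQLITE_ROW = 100;
    static final int SQLITE_DONE = 101;

    // Hypothetical unchecked exception carrying the raw result code.
    static final class SqliteException extends RuntimeException {
        final int code;

        SqliteException(int code) {
            super("SQLite call failed with result code " + code);
            this.code = code;
        }
    }

    // Returns rc unchanged for the three success codes, throws otherwise.
    static int check(int rc) {
        if (rc != SQLITE_OK && rc != SQLITE_ROW && rc != SQLITE_DONE) {
            throw new SqliteException(rc);
        }
        return rc;
    }

    public static void main(String[] args) {
        System.out.println(check(SQLITE_DONE));
        try {
            check(1); // 1 is SQLITE_ERROR
        } catch (SqliteException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a helper like this, a call would read SqliteCheck.check(sqlite3_h.sqlite3_open(path, dbPtr)). Unlike assert, which the JVM skips entirely unless it is started with -ea, both the native call and the check always run.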

Acknowledgements and external links #

I hope this post gives you an easy-to-follow introduction to Panama. If you haven’t gotten your hands dirty with the Panama APIs yet, please do so! Online resources on Panama are still rare, and their mailing list could be a good place to ask.

The videos at https://openjdk.org/projects/panama/ are really good resources! They provide a gentle introduction to the Panama APIs. However, they are quite outdated, because the Panama APIs have changed over the years.

https://github.com/openjdk/panama-foreign/blob/foreign-memaccess%2Babi/doc/panama_ffi.md and https://github.com/openjdk/panama-foreign/blob/foreign-memaccess%2Babi/doc/panama_memaccess.md are more relevant.

For the latest docs, always consult Javadoc. Note the difference between Java 21 and Java 22 docs.
