Experimenting with GC-less (heap-less) Java
February 2024
What
https://github.com/xonixx/gc_less – in this repository I did a series of experiments in GC-less (heap-less) Java using sun.misc.Unsafe and, in addition, the newer alternative java.lang.foreign.MemorySegment.
GC-less (heap-less) means we allocate data structures directly in native memory (outside the JVM heap). Such structures are not visible to the GC. But this means we are responsible for their (explicit) de-allocation (i.e., manual memory management).
Why
For fun and out of curiosity.
Basically, it started when I wanted to check whether the well-known sun.misc.Unsafe approach is still viable with the latest Java.
Also, I wanted to try programming in Java as if I were programming in C and see how it goes.
How
sun.misc.Unsafe
It was a surprise to find that this class is still supported by the latest (at the time of this writing) Java version, 23.
The sun.misc.Unsafe class is unsafe indeed. You can easily crash your JVM with just a couple of lines of code:
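For illustration, here is a minimal sketch (this exact snippet is my own, not necessarily the repository's example): we grab the theUnsafe singleton via reflection, and a single read from address 0 would kill the JVM with a SIGSEGV, not a catchable NullPointerException.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class CrashDemo {
    public static void main(String[] args) throws Exception {
        // The constructor is private; the canonical workaround is reflection
        // on the static singleton field.
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);
        System.out.println("unsafe obtained: " + (unsafe != null));
        // Uncommenting the next line dereferences address 0 natively and
        // crashes the whole JVM (SIGSEGV), bypassing all of Java's safety checks:
        // unsafe.getInt(0);
    }
}
```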
But nonetheless it appears to be fairly widely used.
Anyway, what is of interest to us is the family of methods in this class related to off-heap memory management.
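At the core of that family are malloc/free-like methods. A minimal sketch of their use (the method names are the real Unsafe ones):

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class OffHeapDemo {
    public static void main(String[] args) throws Exception {
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);

        long address = unsafe.allocateMemory(16); // malloc-like: raw native memory, invisible to GC
        unsafe.putLong(address, 42L);             // write 8 bytes at the raw address
        long value = unsafe.getLong(address);     // read them back
        System.out.println(value);                // prints 42
        unsafe.freeMemory(address);               // no GC here: forget this call and you leak
    }
}
```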
Data structures implementation
As a practical application of the GC-less (heap-less) style of programming I decided to implement some basic data structures:
- array
- array list
- stack
- hash table
Not only was I interested in how easy it is, or whether it is feasible at all. I also wondered how the performance would compare to standard Java data structures.
Generating code from a template
Instead of writing separate implementations for each Java type (int, long, double, etc.) I decided to generate them from templates.
This way, from a single template (for example, TemplateArrayList.java) the set of specialized implementations is generated:
It is interesting to note that the template itself is runnable code. In fact, implementation-wise it corresponds to the long-specialized version.
This is convenient, because we can test the template, and this makes sure all the specialized implementations are correct too.
The generation is implemented in the form of a script, gen.awk (Why AWK?).
We use the annotation @Type and the class Tpl to denote patterns to be replaced by the generator:
Epsilon GC
To make sure that we indeed don't (accidentally) consume heap, I enabled the Epsilon GC setting.
Epsilon GC is "a GC that handles memory allocation but does not implement any actual memory reclamation mechanism. Once the available Java heap is exhausted, the JVM will shut down."
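Concretely, Epsilon GC is experimental and is enabled with flags like these (Main stands for whatever entry class is being run):

```shell
java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -Xmx20m -Xlog:gc Main
```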
Some implementation details
I implemented the data structures in a slightly "strange" OOP-resembling way.
So, for example, take a look at IntArrayList.
All the methods of this class are static, because we don't want to allocate objects on the heap.
Also, all the methods (apart from allocate, which plays the role of the constructor) accept a long address as the first parameter. It plays the role of this (in a sense, it is somewhat analogous to Python's self parameter).
So the usage looks like:

```java
long address = IntArray.allocate(cleaner, 10);
IntArray.set(address, 7, 222);
System.out.println(IntArray.get(address, 7));
```
So this lengthy handle
performs a job of a reference. Clearly, with this you may’t have inheritance / overriding, however we don’t want it.
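This style can be sketched as follows (simplified: no Cleaner, no stored length, no bounds checks; the repository's real classes handle all of these):

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// All state lives off-heap behind a raw address; the class itself is never instantiated.
final class IntArrayDemo {
    private static final Unsafe U = unsafe();

    static long allocate(int length) {                    // plays the role of the constructor
        return U.allocateMemory((long) length * Integer.BYTES);
    }

    static void set(long address, int index, int value) { // 'address' plays the role of 'this'
        U.putInt(address + (long) index * Integer.BYTES, value);
    }

    static int get(long address, int index) {
        return U.getInt(address + (long) index * Integer.BYTES);
    }

    static void free(long address) {
        U.freeMemory(address);
    }

    private static Unsafe unsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }
}
```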
try-with-resources + Cleaner and Ref classes
The idea of the Cleaner class is that you pass an instance of it into the allocation procedure of a (GC-less) class, so that the Cleaner becomes responsible for freeing the memory of the allocated instance. This plays best with try-with-resources.
```java
try (Cleaner cleaner = new Cleaner()) {
    long array = IntArray.allocate(cleaner, 10);
    IntArray.set(array, 1, 1);
    long map = IntHashtable.allocate(cleaner, 10, .75f);
    map = IntHashtable.put(map, 1, 1);
    /*
    Why do we re-assign the map?
    It's because when the data structure grows,
    it can eventually outgrow its initially-allocated memory region.
    The algorithm will re-allocate a bigger internal storage, which will cause
    the address to change.
    */
    map = IntHashtable.put(map, 2, 22);
    System.out.println(IntArray.getLength(array));
    System.out.println(IntHashtable.getSize(map));
}
// both array and map above are de-allocated, memory reclaimed
```
The Ref class is needed when we want to register some object for cleanup. But we can't simply register its address, because the object can be relocated due to the memory reallocation caused by growing. So the GC-less object registers for cleanup with the Cleaner via a Ref, and then the Ref pointer gets updated on every object relocation.
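A sketch of how Cleaner and Ref can fit together (a hypothetical simplification; the repository's actual classes differ in detail):

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;
import sun.misc.Unsafe;

// Ref is a mutable address holder: when a structure reallocates and moves,
// it updates its Ref, and the Cleaner (which holds Refs, not raw addresses)
// still frees the right block on close().
final class Ref {
    long address;
    Ref(long address) { this.address = address; }
}

final class Cleaner implements AutoCloseable {
    static final Unsafe UNSAFE = unsafe();
    private final List<Ref> refs = new ArrayList<>();

    Ref register(long address) {
        Ref ref = new Ref(address);
        refs.add(ref);
        return ref;
    }

    @Override
    public void close() {
        for (Ref ref : refs) UNSAFE.freeMemory(ref.address); // frees the *current* address
        refs.clear();
    }

    private static Unsafe unsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }
}
```

A growing structure then only does ref.address = UNSAFE.reallocateMemory(ref.address, newSize), and the cleanup registration stays correct.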
Memory leaks detection
I had an idea to implement memory leak detection. This seemed rather easy to achieve. The idea: on each memory allocation we remember the place (we instantiate new Exception() to capture a stack trace). On each corresponding memory free() we discard it.
Thus, at the end we check what's left.
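The bookkeeping can be sketched like this (a simplification of the idea; not the repository's actual tracker):

```java
import java.util.HashMap;
import java.util.Map;

// On every allocation remember where it happened (an Exception captures the
// stack trace); on every free(), forget it. Whatever remains at the end is
// a leak, complete with its allocation site.
final class LeakTracker {
    private static final Map<Long, Exception> LIVE = new HashMap<>();

    static void onAlloc(long address) {
        LIVE.put(address, new Exception("allocation site"));
    }

    static void onFree(long address) {
        LIVE.remove(address);
    }

    static void assertNoLeaks() {
        for (Exception site : LIVE.values()) site.printStackTrace();
        if (!LIVE.isEmpty()) throw new AssertionError(LIVE.size() + " leak(s) detected");
    }
}
```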
I implemented a base test class such that a test that extends it can automatically ensure the absence of memory leaks.
Visual demonstration
So we can say that one of the benefits of manual off-heap memory management is deterministic memory usage.
The class Main4 provides a visual demonstration of allocation and de-allocation in a loop, as seen in the memory graph of Task Manager:
java.lang.foreign.MemorySegment
It turns out that very recently this JEP emerged: "Deprecate Memory-Access Methods in sun.misc.Unsafe for Removal".
This means that although using the memory-access methods of sun.misc.Unsafe is fun and sometimes useful, we can no longer rely on this functionality.
The JEP happens to offer the safer alternatives. I decided to take a deeper look at the updated API and understand how it compares to the now-deprecated sun.misc.Unsafe.
As a replacement we now have the java.lang.foreign.MemorySegment class. It's worth mentioning that this class is part of a bigger API dedicated to "invoking foreign functions (i.e., code outside the JVM)" that enables "Java applications to call native libraries and process native data without the brittleness and danger of JNI".
Overall, I found that the new API provides a "managed" API over native memory. That is, where previously we accessed memory via a long that represented the raw memory address, now we have a java.lang.foreign.MemorySegment object (which represents both a pointer and a memory region).
This is, obviously, good. Using the dedicated get/set methods of that class gives much safer guarantees, like, for example, built-in out-of-bounds access checks, control of properly aligned memory access, and so on.
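For comparison with the Unsafe-style usage shown earlier, here is a sketch of the same kind of array access through the new API (JDK 22+); the Arena releases the memory when closed:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class SegmentDemo {
    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            // 10 ints of off-heap memory; freed when the arena closes
            MemorySegment segment = arena.allocate(ValueLayout.JAVA_INT, 10);
            segment.setAtIndex(ValueLayout.JAVA_INT, 7, 222);
            System.out.println(segment.getAtIndex(ValueLayout.JAVA_INT, 7)); // prints 222
            // segment.getAtIndex(ValueLayout.JAVA_INT, 10) would throw
            // IndexOutOfBoundsException, whereas Unsafe would silently read garbage
        }
    }
}
```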
It may seem that this API is much heavier. For each off-heap allocation we need an on-heap handle in the form of a MemorySegment instance. So we might think this would render the idea of using algorithms with lots of small allocations non-viable.
Surprisingly, this is not the case!
My sun.misc.Unsafe-based hashtable implementation is an example of an algorithm that uses many allocations. It's a well-known hashtable algorithm (analogous to standard Java's HashMap), with an array of buckets filled based on the hash code, and linked lists of nodes for the elements.
For the sake of the experiment I converted it to MemorySegment. I was expecting that, because of the many MemorySegment objects allocated, its JVM heap consumption would be proportional to the hashtable size.
Instead, the experiment shows that the heap memory consumption is pretty much O(1).
How is this possible?
It turns out that:
- MemorySegment is a value-based class, as stated by its javadoc. Value-based classes represent the first step towards the inline classes planned for Project Valhalla.
- It looks like inlining of value objects is already implemented, at least for some JDK classes, like our MemorySegment. This is done via extensive usage of the @jdk.internal.vm.annotation.ForceInline annotation:

My take on this is that during JIT compilation all allocations of MemorySegments are completely eliminated by inlining this class.
Python-like hashtable implementation
While doing these experiments I was lucky enough to come across this article: Internals of sets and dicts. It describes the implementation details of sets and dicts in the Python programming language.
It turns out that the dict (that's the name for a hashtable in Python) algorithm in Python doesn't use many allocations. Instead, it allocates one piece of memory and lays out the key, value, and hash data in it. When the collection grows, it re-allocates this contiguous piece of memory and redistributes the data. This is exactly what we need!
So I came up with this MemorySegment-based, Python-like off-heap hashtable implementation.
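To illustrate the single-allocation idea, here is a minimal open-addressing int-to-int map sketch. For brevity it uses an on-heap long[]; the off-heap variant performs the same index arithmetic on a single MemorySegment. Names and details are illustrative, not the repository's actual code:

```java
import java.util.Arrays;

// One contiguous block holds everything: each slot packs the key (high 32 bits)
// and the value (low 32 bits). Growing reallocates a single bigger block and
// redistributes the entries, with no per-entry allocations, like CPython's dict.
final class FlatIntMap {
    private static final long EMPTY = Long.MIN_VALUE; // sentinel; a real impl marks empty slots explicitly
    private long[] slots; // capacity is kept a power of two
    private int size;

    FlatIntMap(int capacity) { // capacity must be a power of two
        slots = new long[capacity];
        Arrays.fill(slots, EMPTY);
    }

    void put(int key, int value) {
        if ((size + 1) * 3 > slots.length * 2) grow(); // keep load factor under 2/3
        if (insert(slots, key, value)) size++;
    }

    int get(int key, int absent) {
        int mask = slots.length - 1;
        int i = (key * 0x9E3779B9) & mask;
        while (slots[i] != EMPTY) {
            if ((int) (slots[i] >>> 32) == key) return (int) slots[i];
            i = (i + 1) & mask; // linear probing
        }
        return absent;
    }

    private static boolean insert(long[] table, int key, int value) {
        int mask = table.length - 1;
        int i = (key * 0x9E3779B9) & mask;
        while (table[i] != EMPTY && (int) (table[i] >>> 32) != key) i = (i + 1) & mask;
        boolean fresh = table[i] == EMPTY;
        table[i] = ((long) key << 32) | (value & 0xFFFFFFFFL);
        return fresh;
    }

    private void grow() {
        long[] bigger = new long[slots.length * 2]; // the single re-allocation
        Arrays.fill(bigger, EMPTY);
        for (long slot : slots)
            if (slot != EMPTY) insert(bigger, (int) (slot >>> 32), (int) slot);
        slots = bigger;
    }
}
```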
Stress test with limited heap
With Java it's inevitable that even if you implement an off-heap algorithm, some heap will be used. So we can only consider an off-heap algorithm practical if it consumes a small/fixed amount of heap. Otherwise, if its extra heap consumption is proportional to the problem size, off-heap doesn't make much sense.
I created a small experiment – a program that allocates a hashtable and fills it with 50 million elements, for each of the three implementations:
- Unsafe-based hashtable
- Python-based hashtable
- MemorySegment-based hashtable

The experiment runs with Epsilon GC and the heap size limited to 20 MB.
We can confirm that all three implementations pass the test within this heap limit.
[0.002s][info][gc] Using Epsilon
[0.003s][warning][gc,init] Consider enabling -XX:+AlwaysPreTouch to avoid memory commit hiccups
[0.011s][info ][gc ] Heap: 20480K reserved, 20480K (100.00%) committed, 1060K (5.18%) used
===== Unsafe-based hashtable =====
Unsafe-based: 0
Unsafe-based: 1000000
Unsafe-based: 2000000
Unsafe-based: 3000000
...
Unsafe-based: 47000000
Unsafe-based: 48000000
Unsafe-based: 49000000
Freeing...
Freeing native addr 140590889037840...
Freeing native ref 140596292148752...
Freeing locals 140596292147328...
===== MemorySegment-based hashtable =====
[16.869s][info ][gc ] Heap: 20480K reserved, 20480K (100.00%) committed, 2106K (10.28%) used
[16.941s][info ][gc ] Heap: 20480K reserved, 20480K (100.00%) committed, 3159K (15.43%) used
[16.980s][info ][gc ] Heap: 20480K reserved, 20480K (100.00%) committed, 4262K (20.82%) used
[17.019s][info ][gc ] Heap: 20480K reserved, 20480K (100.00%) committed, 5438K (26.55%) used
MemorySegment-based: 0
[17.084s][info ][gc ] Heap: 20480K reserved, 20480K (100.00%) committed, 6665K (32.55%) used
[17.129s][info ][gc ] Heap: 20480K reserved, 20480K (100.00%) committed, 7894K (38.55%) used
[17.165s][info ][gc ] Heap: 20480K reserved, 20480K (100.00%) committed, 9123K (44.55%) used
[17.186s][info ][gc ] Heap: 20480K reserved, 20480K (100.00%) committed, 10352K (50.55%) used
[17.203s][info ][gc ] Heap: 20480K reserved, 20480K (100.00%) committed, 11581K (56.55%) used
MemorySegment-based: 1000000
MemorySegment-based: 2000000
MemorySegment-based: 3000000
...
MemorySegment-based: 47000000
MemorySegment-based: 48000000
MemorySegment-based: 49000000
===== Python-based hashtable =====
Python-based: 0
Python-based: 1000000
Python-based: 2000000
Python-based: 3000000
...
Python-based: 47000000
Python-based: 48000000
Python-based: 49000000
[33.169s][info ][gc ] Heap: 20480K reserved, 20480K (100.00%) committed, 11993K (58.56%) used
Benchmarks
Finally, I would like to present a couple of benchmarks I created.
AccessSpeedTest.java measures read/write speed (lower = better):

| | int[] | Unsafe | MemorySegment |
|---|---|---|---|
| Read test, ms | 1885 | 1883 | 1879 |
| Write test, ms | 2177 | 2213 | 2214 |

Surprisingly, we can hardly see any noticeable difference in speed here, despite int[] and MemorySegment having bounds checking; I expected them to be slower than Unsafe.
Another speed test is IntHashtableSpeedTest (lower = better):
What can we see here?
- It turns out that GC-less algorithms are generally slower than conventional GC-full ones:
  - The built-in HashMap is 3+ times faster than the Unsafe-based one with a similar algorithm.
  - The Python-like on-heap version is almost 4 times faster than the off-heap one.
- It also turns out that for the particular case of map<int,int> it's possible to come up with an implementation (Python-like) that is 8 times faster than the built-in one (HashMap).
Takeaways
- It's possible to use manual memory management in Java, for example, if you need deterministic memory consumption.
- Speed is not a good reason to use off-heap algorithms. We've seen that the same algorithms are 3+ times slower off-heap, probably due to better optimization opportunities for the JIT compiler with conventional Java code.
- For your implementations it's advised to prefer java.lang.foreign.MemorySegment over sun.misc.Unsafe.

If you noticed a typo or have other feedback, please email me at xonixx@gmail.com