Discovering Java Thread Leaks With JDK Flight Recorder and a Bit Of SQL
The other day at work, we had a situation where we suspected a thread leak in one particular service,
i.e. code which continuously starts new threads, without ever taking care of stopping them again.
Each thread requires a bit of memory for its stack space,
so starting an unbounded number of threads can be considered a form of memory leak, causing your application to run out of memory eventually.
In addition, the more threads there are, the more overhead the operating system incurs for scheduling them,
until the scheduler itself consumes most of the available CPU resources.
Thus it’s vital to detect and fix this kind of problem early on.
The usual starting point for analyzing a suspected thread leak is taking a thread dump,
for instance using the jstack CLI tool or via JDK Mission Control;
if there’s an unexpectedly large number of threads (oftentimes with similar or even identical names), then it’s very likely that something is wrong indeed.
But a thread dump by itself is only a snapshot of the thread state at a given time,
i.e. it doesn’t tell you how the thread count is changing over time (perhaps there are many threads which are started but also stopped again?),
and it also doesn’t give you information about the cause, i.e. which part of your application is starting all those threads. Does it happen in your own code base, or within some 3rd party dependency? While the thread names and stacks in the dump can give you some idea, that information isn’t necessarily enough for a conclusive root cause analysis.
Luckily, Java’s built-in event recorder and performance analysis tool, JDK Flight Recorder,
exposes all the data you need to identify thread leaks and their cause.
So let’s take a look at the details, bidding farewell to those pesky thread leaks once and forever!
The first JFR event type to look at is jdk.JavaThreadStatistics:
recorded every second by default, it keeps track of active, accumulated, and peak thread counts.
Here is a JFR recording from a simple thread leak demo application I’ve created
(newest events at the top):
The number of active threads is continuously increasing, never going back down again; pretty clearly, this is a thread leak.
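If you'd rather check that trend programmatically than eyeball it in JDK Mission Control, the JDK's own jdk.jfr.consumer API can read the same events from a recording file. The following is only a rough sketch: the class name ThreadCountTrend and the looksLikeLeak() heuristic (never decreasing, and grown by at least a given amount) are my own assumptions for illustration, not part of JFR.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

class ThreadCountTrend {

    // Reads the "activeCount" field of all jdk.JavaThreadStatistics events,
    // ordered by event start time (oldest first).
    static List<Long> activeCounts(Path recording) throws IOException {
        return RecordingFile.readAllEvents(recording).stream()
                .filter(e -> "jdk.JavaThreadStatistics".equals(e.getEventType().getName()))
                .sorted(Comparator.comparing(RecordedEvent::getStartTime))
                .map(e -> e.getLong("activeCount"))
                .collect(Collectors.toList());
    }

    // Hypothetical heuristic: the count never goes down between samples
    // and has grown by at least minGrowth over the whole recording.
    static boolean looksLikeLeak(List<Long> counts, long minGrowth) {
        if (counts.size() < 2) {
            return false;
        }
        for (int i = 1; i < counts.size(); i++) {
            if (counts.get(i) < counts.get(i - 1)) {
                return false;
            }
        }
        return counts.get(counts.size() - 1) - counts.get(0) >= minGrowth;
    }
}
```

A real service will show some ups and downs even while leaking, so in practice you'd probably want a more forgiving check, e.g. comparing averages over longer windows.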
Now let’s figure out where exactly all those threads are coming from.
For this, two other JFR event types come in handy: jdk.ThreadStart and jdk.ThreadEnd.
The former captures all the relevant information when a thread is started:
time stamp, name of the new thread and the parent thread, and the stack trace of the parent thread when starting the child thread.
The latter event type is recorded when a thread finishes.
If we find many thread start events originating at the same code location without a corresponding end event
(correlated via the thread id contained in the events), this is very likely a source of a thread leak.
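To make that correlation concrete, here is a minimal sketch in plain Java, again using the jdk.jfr.consumer API. The class and method names are made up for this example; the only JFR calls are RecordingFile.readAllEvents(), RecordedEvent.getThread() (which returns the event thread) and the RecordedThread accessors.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordedThread;
import jdk.jfr.consumer.RecordingFile;

class UnmatchedThreadStarts {

    // Keeps only those started threads (id -> name) for which no end event was seen.
    static Map<Long, String> unmatched(Map<Long, String> startedById, Set<Long> endedIds) {
        Map<Long, String> result = new LinkedHashMap<>(startedById);
        result.keySet().removeAll(endedIds);
        return result;
    }

    // Collects thread ids from jdk.ThreadStart and jdk.ThreadEnd events of a recording
    // and returns the threads which were started but never ended.
    static Map<Long, String> fromRecording(Path recording) throws IOException {
        Map<Long, String> started = new LinkedHashMap<>();
        Set<Long> ended = new HashSet<>();
        for (RecordedEvent event : RecordingFile.readAllEvents(recording)) {
            String type = event.getEventType().getName();
            RecordedThread thread = event.getThread(); // may be null if not recorded
            if (thread == null) {
                continue;
            }
            if ("jdk.ThreadStart".equals(type)) {
                started.put(thread.getJavaThreadId(), thread.getJavaName());
            } else if ("jdk.ThreadEnd".equals(type)) {
                ended.add(thread.getJavaThreadId());
            }
        }
        return unmatched(started, ended);
    }
}
```

Note that threads which are simply still running when the recording is dumped show up here as well, so a single unmatched start isn't a problem in itself; it's the sheer number of them from one code location that gives the leak away.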
This sort of event analysis is a perfect use case for JFR Analytics.
This tool allows you to analyze JFR recordings using standard SQL (leveraging Apache Calcite under the hood).
In JFR Analytics, each event type is represented by its own “table”.
Finding thread start events without matching end events is as simple as running a LEFT JOIN
on the two event types and keeping only those start events which don’t have a join partner.
So let’s load the file into the SQLLine command line client
(see the README of JFR Analytics for instructions on building and launching this tool):
!connect jdbc:calcite:schemaFactory=org.moditect.jfranalytics.JfrSchemaFactory;schema.file=thread_leak_recording.jfr dummy dummy

!outputformat vertical
Run the following SQL query for finding thread start events without corresponding thread end events:
SELECT
  ts."startTime",
  ts."parentThread"."javaName" AS "parentThread",
  ts."eventThread"."javaName" AS "newThread",
  TRUNCATE_STACKTRACE(ts."stackTrace", 20) AS "stackTrace"
FROM "jdk.ThreadStart" ts LEFT JOIN "jdk.ThreadEnd" te
  ON ts."eventThread"."javaThreadId" = te."eventThread"."javaThreadId"
WHERE te."startTime" IS NULL;
Note how the parentThread and eventThread columns are of a complex SQL type, allowing you to refer to thread properties such as javaName or javaThreadId using dot notation.
In that simple example recording, there’s one stack trace which dominates the result set, so any of the events reveals the culprit:
startTime   2023-02-26 11:36:04.284
javaName    executor-thread-0
javaName    pool-1060-thread-1
stackTrace  java.lang.System$2.start(Thread, ThreadContainer):2528
            jdk.internal.vm.SharedThreadContainer.start(Thread):160
            java.util.concurrent.ThreadPoolExecutor.addWorker(Runnable, boolean):953
            java.util.concurrent.ThreadPoolExecutor.execute(Runnable):1364
            java.util.concurrent.AbstractExecutorService.submit(Callable):145
            java.util.concurrent.Executors$DelegatedExecutorService.submit(Callable):791
            org.acme.GreetingResource.hey():18
            null
            null
            null
            null
            jdk.internal.reflect.DirectMethodHandleAccessor.invoke(Object, Object[]):104
            java.lang.reflect.Method.invoke(Object, Object[]):578
            org.jboss.resteasy.core.MethodInjectorImpl.invoke(HttpRequest, HttpResponse, Object, Object[]):170
            org.jboss.resteasy.core.MethodInjectorImpl.invoke(HttpRequest, HttpResponse, Object):130
            org.jboss.resteasy.core.ResourceMethodInvoker.internalInvokeOnTarget(HttpRequest, HttpResponse, Object):660
            org.jboss.resteasy.core.ResourceMethodInvoker.invokeOnTargetAfterFilter(HttpRequest, HttpResponse, Object):524
            org.jboss.resteasy.core.ResourceMethodInvoker.lambda$invokeOnTarget$2(HttpRequest, HttpResponse, Object):474
            null
            org.jboss.resteasy.core.interception.jaxrs.PreMatchContainerRequestContext.filter():364
The call for creating a new thread apparently is initiated by the GreetingResource::hey() method by submitting a Callable to an executor service.
And sure enough, this is what it looks like:
@GET
@Produces(MediaType.TEXT_PLAIN)
public String hey() {
    // The leak: every request creates a new executor (and thus a new thread)
    // which is never shut down
    ExecutorService executor = Executors.newSingleThreadExecutor();
    executor.submit(() -> {
        while (true) {
            Thread.sleep(1000L);
        }
    });
    return "Hey World";
}
If things aren’t as clear-cut as in that contrived example, it can be useful to truncate stack traces to a reasonable line count
(e.g. it should be safe to assume that the user code starting a thread is never further away than ten frames from the actual thread start call) and group by that.
JFR Analytics provides the built-in function TRUNCATE_STACKTRACE for this purpose:
SELECT
  TRUNCATE_STACKTRACE(ts."stackTrace", 10) AS "stackTrace",
  COUNT(1) AS "threadCount"
FROM "jdk.ThreadStart" ts LEFT JOIN "jdk.ThreadEnd" te
  ON ts."eventThread"."javaThreadId" = te."eventThread"."javaThreadId"
WHERE te."startTime" IS NULL
GROUP BY
  TRUNCATE_STACKTRACE(ts."stackTrace", 10)
ORDER BY "threadCount" DESC;
This points at the problematic stack traces and code locations in a very pronounced way (output slightly adjusted for better readability):
stackTrace   java.lang.System$2.start(Thread, ThreadContainer):2528
             jdk.internal.vm.SharedThreadContainer.start(Thread):160
             java.util.concurrent.ThreadPoolExecutor.addWorker(Runnable, boolean):953
             java.util.concurrent.ThreadPoolExecutor.execute(Runnable):1364
             java.util.concurrent.AbstractExecutorService.submit(Callable):145
             java.util.concurrent.Executors$DelegatedExecutorService.submit(Callable):791
             org.acme.GreetingResource.hey():18
             null
             null
             null
threadCount  414
---
stackTrace   java.util.Timer.<init>(String, boolean):188
             jdk.jfr.internal.PlatformRecorder.lambda$createTimer$0(List):101
             null
             java.lang.Thread.run():1589
threadCount  1
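To illustrate what the truncate-and-group step boils down to, here is a small sketch in plain Java. It is not how JFR Analytics implements TRUNCATE_STACKTRACE; the class StackTraceGrouping and the representation of a stack trace as a list of frame strings are assumptions made just for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class StackTraceGrouping {

    // Keeps only the top-most `depth` frames of a stack trace.
    static List<String> truncate(List<String> frames, int depth) {
        return new ArrayList<>(frames.subList(0, Math.min(depth, frames.size())));
    }

    // Counts traces per truncated stack trace, most frequent first.
    static Map<List<String>, Long> countByTruncatedTrace(List<List<String>> traces, int depth) {
        Map<List<String>, Long> counts = traces.stream()
                .collect(Collectors.groupingBy(t -> truncate(t, depth), Collectors.counting()));

        return counts.entrySet().stream()
                .sorted(Map.Entry.<List<String>, Long>comparingByValue().reversed())
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        (a, b) -> a,
                        LinkedHashMap::new)); // preserves the sorted order
    }
}
```

Traces which only differ in frames below the cut-off, for instance because the same resource method is reached through different request filters, end up in the same group, which is exactly why truncating before grouping makes the offending location stand out.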
Sometimes you may encounter a situation where new threads are started from within other threads in a 3rd party dependency,
rather than directly from threads within your own code base.
In that case the stack traces of the thread start events may not tell you enough about the root cause of the problem,
i.e. where those other “intermediary” threads are coming from, and how they relate to your own code.
To dig into the details here, you can leverage the fact that each jdk.ThreadStart event contains information about the parent thread which started a new thread.
So you can join the jdk.ThreadStart table to itself on the parent thread’s id,
fetching also the stack traces of the code starting those parent threads:
SELECT
  ts."startTime",
  pts."parentThread"."javaName" AS "grandParentThread",
  ts."parentThread"."javaName" AS "parentThread",
  ts."eventThread"."javaName" AS "newThread",
  TRUNCATE_STACKTRACE(pts."stackTrace", 15) AS "parentStackTrace",
  TRUNCATE_STACKTRACE(ts."stackTrace", 15) AS "stackTrace"
FROM "jdk.ThreadStart" ts LEFT JOIN "jdk.ThreadEnd" te
  ON ts."eventThread"."javaThreadId" = te."eventThread"."javaThreadId"
JOIN "jdk.ThreadStart" pts
  ON ts."parentThread"."javaThreadId" = pts."eventThread"."javaThreadId"
WHERE te."startTime" IS NULL;
Here, stackTrace is the trace of a thread (named “pool-728-thread-1”) of an external library, “greeting provider”, which starts another (leaking) thread (named “pool-729-thread-1”),
and parentStackTrace points to the code in our own application (thread name “executor-thread-0”) which started that first thread:
startTime          2023-02-28 09:15:24.493
grandParentThread  executor-thread-0
parentThread       pool-728-thread-1
newThread          pool-729-thread-1
parentStackTrace   java.lang.System$2.start(Thread, ThreadContainer):2528
                   jdk.internal.vm.SharedThreadContainer.start(Thread):160
                   java.util.concurrent.ThreadPoolExecutor.addWorker(Runnable, boolean):953
                   java.util.concurrent.ThreadPoolExecutor.execute(Runnable):1364
                   java.util.concurrent.AbstractExecutorService.submit(Runnable):123
                   java.util.concurrent.Executors$DelegatedExecutorService.submit(Runnable):786
                   com.example.greeting.GreetingService.greet():20
                   com.example.greeting.GreetingService_ClientProxy.greet()
                   org.acme.GreetingResource.hey():20
                   null
                   null
                   null
                   null
                   jdk.internal.reflect.DirectMethodHandleAccessor.invoke(Object, Object[]):104
                   java.lang.reflect.Method.invoke(Object, Object[]):578
---
stackTrace         java.lang.System$2.start(Thread, ThreadContainer):2528
                   jdk.internal.vm.SharedThreadContainer.start(Thread):160
                   java.util.concurrent.ThreadPoolExecutor.addWorker(Runnable, boolean):953
                   java.util.concurrent.ThreadPoolExecutor.execute(Runnable):1364
                   java.util.concurrent.AbstractExecutorService.submit(Callable):145
                   java.util.concurrent.Executors$DelegatedExecutorService.submit(Callable):791
                   com.example.greeting.GreetingProvider.createGreeting():13
                   com.example.greeting.GreetingProvider_ClientProxy.createGreeting()
                   com.example.greeting.GreetingService.lambda$greet$0(AtomicReference):21
                   null
                   java.util.concurrent.Executors$RunnableAdapter.call():577
                   java.util.concurrent.FutureTask.run():317
                   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker):1144
                   java.util.concurrent.ThreadPoolExecutor$Worker.run():642
                   java.lang.Thread.run():1589
If the thread hierarchy is even deeper, you could continue down that path and keep joining more and more parent threads until you’ve arrived at the application’s main thread.
I hoped to leverage recursive query support in Calcite for this purpose,
but as it turned out, support for this only exists in the Calcite RelBuilder API at the moment,
while the RECURSIVE keyword is not supported for SQL queries yet.
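Until recursive queries are an option, walking up the thread hierarchy is easy enough to do outside of SQL. The following sketch assumes you have already extracted a map from each thread's id to the id of its parent thread (e.g. from the eventThread and parentThread fields of the jdk.ThreadStart events); the class ThreadAncestry and its method are hypothetical names for this example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ThreadAncestry {

    // Returns the ids of all ancestors of the given thread, nearest parent first.
    // parentById maps a thread id to the id of the thread which started it;
    // threads without a recorded start event simply end the chain.
    static List<Long> ancestors(long threadId, Map<Long, Long> parentById) {
        List<Long> chain = new ArrayList<>();
        Set<Long> seen = new HashSet<>();
        seen.add(threadId);

        Long current = parentById.get(threadId);
        // The `seen` check guards against cycles in inconsistent input data
        while (current != null && seen.add(current)) {
            chain.add(current);
            current = parentById.get(current);
        }
        return chain;
    }
}
```

For the example above, a map containing the entries 729 → 728 and 728 → 5 (with 5 standing in for the id of "executor-thread-0") would yield the chain [728, 5] for thread 729.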
Equipped with JDK Flight Recorder, JDK Mission Control, and JFR Analytics,
identifying and fixing thread leaks in your Java application becomes a relatively simple task.
The jdk.JavaThreadStatistics, jdk.ThreadStart, and jdk.ThreadEnd event types are enabled in the default JFR profile,
which is meant for permanent usage in production.
I.e. you can keep a size-capped continuous recording running all the time,
dump it into a file whenever needed, and then start the analysis process as described above.
Taking things a step further, you could also set up monitoring and alerting on the number of active threads,
e.g. by exposing the jdk.JavaThreadStatistics event via a remote JFR event recording stream,
allowing you to react in real-time whenever the active thread count reaches an unexpectedly high level.
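As a rough idea of what such monitoring could look like, here is a minimal sketch based on JFR event streaming. To keep it self-contained, it uses an in-process jdk.jfr.consumer.RecordingStream (JDK 14+) rather than a remote stream; the class ThreadCountAlert, the threshold handling, and printing to standard error in place of a real alert are all assumptions for illustration.

```java
import java.time.Duration;

import jdk.jfr.consumer.RecordingStream;

class ThreadCountAlert {

    // True if the observed number of active threads is above the threshold.
    static boolean exceeds(long activeCount, long threshold) {
        return activeCount > threshold;
    }

    // Subscribes to jdk.JavaThreadStatistics events of the current JVM and
    // reports whenever the active thread count exceeds the given threshold.
    static void watch(long threshold) {
        try (RecordingStream stream = new RecordingStream()) {
            stream.enable("jdk.JavaThreadStatistics").withPeriod(Duration.ofSeconds(1));
            stream.onEvent("jdk.JavaThreadStatistics", event -> {
                long active = event.getLong("activeCount");
                if (exceeds(active, threshold)) {
                    // Placeholder for a real alert, e.g. a metric or a notification
                    System.err.println("Active thread count " + active
                            + " exceeds threshold " + threshold);
                }
            });
            stream.start(); // blocks until the stream is closed
        }
    }
}
```

Since start() blocks, you'd typically call watch() from a dedicated daemon thread, or use startAsync() instead. For watching a service from the outside, jdk.management.jfr.RemoteRecordingStream (JDK 16+) should allow for much the same approach over a JMX connection.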