This post dives into virtual threads and explores whether they are a silver bullet for making a high volume of API calls.
Intro
The application I am working on performs a significant number of concurrent REST calls (over 10_000) with minimal processing required on the response. This scenario seems ideal for virtual threads. Let's analyze how they perform in this application and whether the switch is worthwhile.
Background
Virtual threads introduce an abstraction layer on top of traditional platform threads. They run on carrier threads (essentially platform threads) but differ in how they handle blocking operations (like waiting for a response). When a virtual thread blocks, it unmounts from its carrier, allowing the carrier to pick up other virtual threads and boosting hardware utilization. This flexibility comes with some scheduling overhead. The goal is to identify when virtual threads offer a clear advantage.
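As an illustration of that unmount-on-block behavior (this sketch is mine, not from the original post), the following starts a few thousand virtual threads that each block in Thread.sleep; the JDK multiplexes them over a small set of carrier threads:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: start many virtual threads that each block for 100 ms.
// While a virtual thread sleeps, it unmounts from its carrier thread,
// so a small pool of carriers can drive thousands of these tasks.
public class UnmountDemo {
    public static int runBlockingTasks(int count) throws InterruptedException {
        AtomicInteger completed = new AtomicInteger();
        Thread[] threads = new Thread[count];
        for (int i = 0; i < count; i++) {
            threads[i] = Thread.ofVirtual().start(() -> {
                try {
                    Thread.sleep(100);           // blocking call: the thread unmounts here
                    completed.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        for (Thread t : threads) {
            t.join();                            // wait for every virtual thread
        }
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Instant start = Instant.now();
        int done = runBlockingTasks(5_000);
        System.out.println(done + " tasks in "
                + Duration.between(start, Instant.now()).toMillis() + " ms");
    }
}
```

If each sleep occupied a platform thread one-to-one, 5_000 concurrent 100 ms sleeps would need 5_000 OS threads; here the default scheduler gets by with roughly one carrier per core.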
JEP-444 highlights the key benefits:
To put it another way, virtual threads can significantly improve application throughput when
- The number of concurrent tasks is high (more than a few thousand), and
- The workload is not CPU-bound, since having many more threads than processor cores cannot improve throughput in that case.
The application I am working on checks the health of services via REST requests. It is not CPU-bound, since most of the time is spent waiting for responses, and there are enough URLs to generate thousands of virtual threads. This suggests virtual threads could be a good fit.
Setup
The benchmark setup involves two machines: one to run the benchmark and another acting as a server to receive requests. The goal is to measure the time it takes to process 10,000 tasks, each involving one or two API calls. The server endpoint simulates different response times using a path variable.
The requests are made to the following endpoint:
[endpoint code listing omitted]
Getting a response from this endpoint takes around 4ms. Using Thread.sleep with a delay taken from a path variable, the server can add extra delay to the response, which makes benchmarking different response times a lot easier.
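The endpoint code itself isn't preserved above. As a rough stand-in, here is a minimal sketch using only the JDK's built-in com.sun.net.httpserver.HttpServer (the real project may well use a web framework instead); the last path segment carries the extra delay in milliseconds:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;

// Hypothetical stand-in for the benchmark server: GET /hello/{delayMs}
// sleeps for the requested number of milliseconds before responding.
public class DelayServer {
    public static HttpServer start(int port) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/hello/", exchange -> {
            String path = exchange.getRequestURI().getPath();
            long delayMs = Long.parseLong(path.substring(path.lastIndexOf('/') + 1));
            try {
                Thread.sleep(delayMs);           // simulate a slower endpoint
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            byte[] body = "hello".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

Starting it with port 0 picks a free port, so the same sketch works for local experiments without configuration.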
The benchmark is created with JMH. The goal is to submit 10_000 tasks to a newVirtualThreadPerTaskExecutor and to a newFixedThreadPool, and see which one takes the least time on average to run. There are a few combinations of parameters I tested.
So the number of benchmarks is:
- 2 types of executors (virtual and fixed pool of platform threads)
- tasks with 1 or 2 API calls
- 0, 1, 2, 3, 4, or 5ms of extra delay.
This gives me 2 * 2 * 6 = 24 possible combinations to run. For each combination, 10_000 requests are made.
This is the benchmark code:
[JMH benchmark code omitted]
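Since the benchmark listing isn't preserved here, its core presumably looks something like this sketch (names and structure are my guesses, with the JMH annotations stripped): submit 10_000 tasks to the executor under test and block until they all complete.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch of the benchmark body: submit taskCount tasks and block until all
// complete. In the real benchmark each task performs one or two HTTP calls
// against the delay endpoint; here the task is passed in, so anything works.
public class SubmitAndAwait {
    public static void run(ExecutorService executor, int taskCount, Runnable task)
            throws InterruptedException {
        for (int i = 0; i < taskCount; i++) {
            executor.submit(task);
        }
        executor.shutdown();                          // stop accepting new tasks
        if (!executor.awaitTermination(5, TimeUnit.MINUTES)) {
            throw new IllegalStateException("tasks did not finish in time");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // The two executors the benchmark compares. Note: the post does not
        // state the fixed pool's size; 100 here is purely a placeholder.
        run(Executors.newVirtualThreadPerTaskExecutor(), 10_000, () -> { });
        run(Executors.newFixedThreadPool(100), 10_000, () -> { });
    }
}
```

JMH would time each call to run(); the shutdown-and-await pattern ensures the measured interval covers every submitted task.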
This benchmark used the following execution plan:
[JMH execution plan omitted]
JMH will run all possible combinations of these variables and print the results after running all the benchmarks.
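The execution plan itself isn't preserved above; in JMH, the parameter matrix described earlier would typically be declared with annotations roughly like the following (the annotation names are real JMH, but the values are reconstructed from the description, not copied from the post):

```java
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)            // report average time per run
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
public class ApiCallBenchmark {

    @Param({"0", "1", "2", "3", "4", "5"})  // extra server-side delay in ms
    int extraDelayMs;

    @Param({"1", "2"})                      // API calls per task
    int callsPerTask;

    @Param({"VIRTUAL", "FIXED"})            // executor under test
    String executorType;

    @Benchmark
    public void submitTasks() {
        // build the chosen executor, submit 10_000 tasks, await completion
    }
}
```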
Results
The benchmark is run with JDK 24-loom+1-17 (2024/6/22). This is an early access build of Project Loom and has the latest changes to the virtual thread scheduler. I created two separate graphs, one for the “1 API call” benchmark and one for the “2 API call” benchmark, to make it easier to compare the two kinds of executors.
These are the results of submitting 10_000 tasks that each make 1 API call:
As you can see, the virtual threads are very stable at around 3 seconds. The platform threads perform better, needing 0.8 seconds for the 10_000 requests compared to the 3 seconds the virtual threads need. As the delay increases, so does the time the platform threads need to perform all those requests. The virtual threads, on the other hand, seem to perform a little better when requests have more delay.
These are the results of submitting 10_000 tasks that make 2 API calls:
As you can see, the virtual threads stay very stable at around 3 seconds to perform 2 x 10_000 calls. Looking at the platform threads, you see that up to 2ms of extra delay they perform better than virtual threads; after that, virtual threads are the clear winner.
For complete transparency, these are JMH benchmark results:
[JMH results table omitted]
Key Takeaways:
- For single API calls with a response time under 9ms (4ms base delay + 5ms extra delay), platform threads perform better.
- When making multiple calls, virtual threads show more consistent performance around 3 seconds.
- Platform threads might be preferable for 2 calls if the response time of each stays below 6ms (4ms base delay + 2ms extra delay).
Remember, these are general guidelines. The optimal choice depends on your specific application's characteristics.