How a Fair Benchmark Shifted My View on Jakarta EE in 2026

Backend Reporter

A step‑by‑step benchmark comparing Spring Boot, Payara Micro and Embedded GlassFish revealed that methodological flaws can flip the winner. After normalising JDK versions, heap size, warm‑up and DB metrics, Spring Boot showed the best latency and throughput on a single workstation, while Payara Micro offered the cleanest failure‑free run and GlassFish remained a viable lightweight Jakarta EE option. The article explains the experiment, the trade‑offs and a decision tree for choosing a runtime.

The problem: intuition vs. evidence

When I first ran a realistic workload against three Java runtimes (Spring Boot, Embedded GlassFish and Payara Micro), the raw numbers suggested that Embedded GlassFish was faster than Spring Boot. Publishing that result would have made a catchy headline, but the methodology had several flaws:

  • Different JDK builds were used.
  • Warm‑up periods were mixed with measurement windows.
  • Heap sizes and connection‑pool settings varied.
  • No database‑level statistics were collected.

Relying on such an incomplete experiment would have reinforced a bias rather than clarified the real performance characteristics.

The solution approach: a stepwise, reproducible lab

I designed a small but realistic API that mimics a shipment‑intelligence service. The same contract is implemented in the three runtimes and points to a single PostgreSQL instance populated with 100 k shipment rows. The workload includes:

  • Reads by tracking ID (lightweight).
  • Aggregations for route and volume summaries (DB‑heavy).
  • Paginated delayed‑shipment queries.
  • Event ingestion that writes to the DB.
  • Health/readiness endpoints.

Load is generated with k6, using identical scenarios across runs. The measurement stack captures:

  • HTTP latency percentiles (p50, p95, p99).
  • Throughput (requests per second).
  • Resident set size (RSS) before and after the test.
  • GC logs.
  • PostgreSQL pg_stat_statements to attribute DB cost.
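
To make the latency and throughput figures concrete, here is a minimal sketch of how p50/p95/p99 and requests per second can be derived from raw per-request latencies. It assumes the nearest-rank method; k6 reports these values itself and may use a different interpolation rule. The helper names `percentile` and `summarize` are invented for this example and are not part of the lab.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(latencies_ms, window_s):
    """Summarize one measurement window (hypothetical helper, not from the repository)."""
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
        "throughput_rps": len(latencies_ms) / window_s,
    }
```

Because p99 is driven by the slowest 1 % of requests, it reacts strongly to a handful of expensive calls, which is why tail latency and median latency can tell different stories for the same run.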

All code lives in the public repository enterprise‑runtime‑lab. Tags mark each phase of the experiment (scaffold, baseline, realistic benchmark, fairness matrix, Railway smoke, final).

Phase 2 – the tempting quick result

| Runtime | p50 | p95 | p99 | Throughput |
| --- | --- | --- | --- | --- |
| Embedded GlassFish | 4.66 ms | 58.77 ms | 111.85 ms | 86.46 req/s |
| Payara Micro | 16.32 ms | 135.76 ms | 238.61 ms | 71.17 req/s |
| Spring Boot | 36.59 ms | 340.50 ms | 594.74 ms | 53.36 req/s |

At first glance GlassFish looks like the winner, but the test omitted warm‑up separation, used different JDKs and did not record RSS or DB metrics.

Phase 3 – adding causality

I introduced multiple virtual‑user levels (10, 25, 50, 100), ran each configuration three times, and captured RSS and GC logs. GlassFish kept tail latency low under high load, Payara led on throughput, and Spring Boot consistently used less RSS than the other two. However, Payara logged unsupported‑JDK warnings on some runs, a sign that the comparison was still not fair.
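
Repeating each configuration only helps if the repeats are combined in a way that resists outliers. The sketch below shows one plausible way to do it: group runs by runtime and VU level, then take the per-metric median. The record layout and the metric keys (`p50_ms`, `rps`, `rss_mb` and so on) are assumptions made for this example, not the lab's actual data format.

```python
from collections import defaultdict
from statistics import median

# Hypothetical metric keys; the lab's own result files may be shaped differently.
METRICS = ("p50_ms", "p95_ms", "p99_ms", "rps", "rss_mb")


def median_per_configuration(runs):
    """Group runs by (runtime, vus) and take the median of each metric."""
    grouped = defaultdict(list)
    for run in runs:
        grouped[(run["runtime"], run["vus"])].append(run)
    return {
        config: {metric: median(r[metric] for r in group) for metric in METRICS}
        for config, group in grouped.items()
    }
```

With three runs the median is simply the middle value, so one noisy run (a background process, a cold cache) cannot move the reported number the way it would move a mean.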

Phase 4 – the fair benchmark (final)

Controls applied:

  • Temurin 21.0.10 for every runtime.
  • Fixed heap -Xms512m -Xmx512m.
  • Separate warm‑up phase, then a 180 s measurement window.
  • Identical HikariCP pool settings.
  • pg_stat_statements reset after warm‑up.
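
The JVM-level controls are easiest to enforce when every runtime is launched through the same code path. The following is a minimal sketch of that idea, not the lab's actual harness: the JAR names and log file names are hypothetical, and a real launch of Payara Micro or Embedded GlassFish would likely need additional runtime-specific arguments that are omitted here.

```python
# Fixed heap, taken from the controls listed above.
COMMON_JVM_FLAGS = ("-Xms512m", "-Xmx512m")


def jvm_command(jar_path, gc_log_path, java_bin="java"):
    """Build a launch command that differs between runtimes only in JAR and log path."""
    return [
        java_bin,
        *COMMON_JVM_FLAGS,
        f"-Xlog:gc*:file={gc_log_path}",  # unified GC logging, available since JDK 9
        "-jar",
        jar_path,
    ]


# Hypothetical artifact names; the real ones live in the repository.
RUNTIME_JARS = {
    "spring-boot": "spring-app.jar",
    "payara-micro": "payara-app.jar",
    "embedded-glassfish": "glassfish-app.jar",
}

commands = {
    name: jvm_command(jar, f"gc-{name}.log") for name, jar in RUNTIME_JARS.items()
}
```

Pointing `java_bin` at one specific JDK installation for all three entries is what would pin the JDK build; relying on whatever `java` happens to be on the PATH is how mixed-JDK runs creep in.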

| Runtime | VUs | Median p50 | Median p95 | Median p99 | Throughput | Error rate | Median RSS |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Spring Boot | 25 | 4.59 ms | 66.92 ms | 110.03 ms | 213.13 req/s | 0.01 % | 517.5 MB |
| Payara Micro | 25 | 33.10 ms | 188.16 ms | 336.77 ms | 156.48 req/s | 0.00 % | 694.3 MB |
| Embedded GlassFish | 25 | 38.03 ms | 198.83 ms | 371.96 ms | 151.26 req/s | 0.00 % | 579.1 MB |
| Spring Boot | 100 | 149.36 ms | 341.69 ms | 473.41 ms | 372.56 req/s | 0.04 % | 543.0 MB |
| Payara Micro | 100 | 204.61 ms | 588.31 ms | 870.53 ms | 284.29 req/s | 0.00 % | 715.7 MB |
| Embedded GlassFish | 100 | 320.12 ms | 540.00 ms | 677.23 ms | 229.28 req/s | 0.01 % | 593.9 MB |

Editorial reading

  • At 25 VUs Spring Boot leads on every latency percentile, throughput and RSS.
  • At 100 VUs Spring Boot still holds the best p95/p99 and the highest throughput, though it registers a few check failures (0.04 % error rate, the highest of the three).
  • Payara Micro shows zero check failures at both loads, making it the cleanest Jakarta EE option.
  • Embedded GlassFish remains viable but no longer dominates once the test is fair.
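
To put a size on "leads in throughput", the gap can be computed directly from the Phase 4 table. The snippet below copies the throughput column from that table; the function names are made up for this example.

```python
# Throughput in req/s, copied from the Phase 4 table.
THROUGHPUT = {
    25: {"Spring Boot": 213.13, "Payara Micro": 156.48, "Embedded GlassFish": 151.26},
    100: {"Spring Boot": 372.56, "Payara Micro": 284.29, "Embedded GlassFish": 229.28},
}


def advantage_pct(baseline, other):
    """How much higher `baseline` is than `other`, in percent."""
    return (baseline / other - 1) * 100


def spring_boot_lead(vus):
    """Spring Boot's throughput lead over each other runtime at one VU level."""
    row = THROUGHPUT[vus]
    spring = row["Spring Boot"]
    return {
        name: round(advantage_pct(spring, value), 1)
        for name, value in row.items()
        if name != "Spring Boot"
    }
```

At 25 VUs this works out to roughly 36 % more throughput than Payara Micro and 41 % more than Embedded GlassFish; at 100 VUs the lead is about 31 % and 62 % respectively. Those percentages describe this workstation and this workload only.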

Why the database mattered

pg_stat_statements revealed that the aggregation queries (route/volume summaries) dominate the tail latency. Simple tracking reads are cheap. This tells us that the observed differences are not solely due to the Java runtime; the DB, its connection pool and the host environment also contribute. A benchmark that isolates the runtime would need a CPU‑bound workload with minimal DB interaction.
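
One way to do this kind of attribution is to rank statements by their share of total execution time. The sketch below assumes the rows have already been fetched from pg_stat_statements (the column names in the comment are those used by PostgreSQL 13 and later); the input values are placeholders that show the output shape and are not measurements from the lab.

```python
# One way to pull the raw numbers (PostgreSQL 13+ column names):
#   SELECT pg_stat_statements_reset();   -- run once, after warm-up
#   SELECT query, calls, total_exec_time
#     FROM pg_stat_statements
#    ORDER BY total_exec_time DESC;


def exec_time_shares(rows):
    """Rank statements by their share of total execution time.

    rows: iterable of dicts with "query" and "total_exec_time" (milliseconds).
    """
    rows = list(rows)
    total = sum(row["total_exec_time"] for row in rows)
    if total == 0:
        return []
    ranked = sorted(rows, key=lambda row: row["total_exec_time"], reverse=True)
    return [(row["query"], row["total_exec_time"] / total) for row in ranked]


# Placeholder values, NOT lab results.
example = [
    {"query": "tracking lookup", "total_exec_time": 100.0},
    {"query": "route summary", "total_exec_time": 900.0},
]
for query, share in exec_time_shares(example):
    print(f"{share:6.1%}  {query}")
```

If a small number of statements account for most of the execution time, differences in HTTP tail latency between runtimes are at least partly a reflection of how each one queues for those statements rather than of framework overhead.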

Development experience

  • Spring Boot – fastest to prototype. Auto‑configuration, health endpoints and observability come out of the box. The team I work with already uses it, so the friction cost is low.
  • Payara Micro – feels natural if the organisation already ships WARs on Jakarta EE. No check failures in the final phase, but log analysis is more involved.
  • Embedded GlassFish – surprised me with a lightweight executable that still supports the full Jakarta EE API set. Not a winner here, but worth a look when a tiny footprint is required.

Decision tree derived from the lab

  1. Greenfield with a Spring‑savvy team → choose Spring Boot. The lab shows the best latency/throughput on a single node, and the ecosystem reduces operational risk.
  2. Existing Payara/Jakarta EE deployment → evaluate Payara Micro first. It runs cleanly under pressure and avoids a full rewrite.
  3. Need for a tiny Jakarta EE executable → consider Embedded GlassFish as a bridge; it can host Jakarta EE code without a full app server.
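
Read as ordered rules where the first match wins, the list above can be written down as a small function. This is one interpretation for illustration only: the parameter names are invented, and the fallback branch is an assumption that follows from the article's general advice rather than a fourth rule from the lab.

```python
def recommend_runtime(*, greenfield, spring_team, existing_jakarta_ee, needs_tiny_executable):
    """Apply the three rules in listed order; the first matching rule wins."""
    if greenfield and spring_team:
        return "Spring Boot"
    if existing_jakarta_ee:
        return "Payara Micro (evaluate first)"
    if needs_tiny_executable:
        return "Embedded GlassFish"
    # Assumed fallback: none of the three rules covers this situation.
    return "no rule applies; benchmark your own workload"


print(recommend_runtime(
    greenfield=False,
    spring_team=True,
    existing_jakarta_ee=True,
    needs_tiny_executable=False,
))
```

Writing the rules this way makes the ordering explicit: a team that knows Spring but already runs Payara falls under rule 2, not rule 1, because the project is not greenfield.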

Limits of the experiment (no hype)

  • Single workstation; results may differ on multi‑node or cloud environments.
  • Workload is DB‑heavy; CPU‑bound micro‑benchmarks would shift the balance.
  • No Kafka, PostGIS, native images, Kubernetes, or long soak tests.
  • Phase 5 (Railway smoke) only proves that the services start and answer a minimal request; it is not a performance metric.

What I would change in a repeat run

  • Extend measurement windows to several hours to capture slow‑drift GC behaviour.
  • Run the harness on a CI runner to reduce local noise.
  • Align connection‑pool limits with PostgreSQL max_connections to see if queueing moves from the DB to the runtime.
  • Add a CPU‑bound scenario (e.g., JSON serialization) to isolate runtime overhead.
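
For the connection-pool item, a quick budget check shows whether the configured pools can all be saturated without exhausting the database. This is a sketch under stated assumptions: the defaults mirror PostgreSQL's stock `max_connections` (100) and `superuser_reserved_connections` (3), and the pool sizes are whatever the harness configures; the lab's real values may differ.

```python
def connection_headroom(pool_sizes, max_connections=100, reserved=3):
    """Connections left over once every pool is fully open.

    Defaults follow PostgreSQL's stock settings; pass the server's real values.
    """
    return max_connections - reserved - sum(pool_sizes)


def pools_fit(pool_sizes, **server):
    """True when all pools can be saturated without exhausting the server."""
    return connection_headroom(pool_sizes, **server) >= 0


# A single pool at HikariCP's default maximumPoolSize of 10.
print(connection_headroom([10]))
```

A pool that is much smaller than the number of concurrent requests makes callers wait inside the runtime for a free connection, while a pool sized close to the server limit pushes the contention into PostgreSQL. Varying the pool size between those extremes is one way to observe where the queue forms.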

How this fits my day‑to‑day work

I spend most of my time building Java/Spring Boot services for digital‑identity, biometrics and secure storage. The evidence that Spring Boot delivers the best local latency, combined with its familiar tooling, makes it the pragmatic choice for new products. At the same time, the lab gives me concrete numbers to convince clients with existing Payara installations that a migration is not mandatory; a trial with Payara Micro or Embedded GlassFish can be justified with data.

The eureka moment

Seeing the ranking flip once all runtimes shared the same JDK, heap and warm‑up made me realise how fragile “quick benchmarks” are. The dominant factor for p99 under load turned out to be the aggregation query cost, not a mysterious framework penalty. From that point the conversation shifted from “which framework is superior” to “what do we really need to optimise?”

The role of editorial briefs

Before publishing, I wrote separate briefs that listed:

  • What evidence is allowed to be claimed.
  • What claims are prohibited.
  • The explicit limitations of the experiment.

These briefs forced me to withhold the Phase 2 headline, to disclose Spring Boot’s check failures and to label the Railway run as a smoke test only. The process prevented the article from becoming a polemic and kept it focused on architecture.

Reproducibility

All artifacts are versioned in the GitHub repo. The final state is tagged runtime-lab-final (commit d176ed6). Tags for each phase let anyone replay the exact conditions that produced the tables shown above.

Bottom line

If the team already knows Spring, start with Spring Boot – the lab backs that choice with latency, throughput and memory numbers. If the organisation lives in Jakarta EE, run a Payara Micro trial before deciding on a rewrite – it showed zero check failures and competitive throughput. Embedded GlassFish is a viable lightweight Jakarta EE option, but it does not win the final fair benchmark.

The real lesson is that migration decisions must be backed by a benchmark that mirrors the actual workload, not by generic numbers or personal preference.


Original post published on juanchi.dev. The benchmark code, data and detailed run logs are available in the public repo linked above.
