A step‑by‑step benchmark comparing Spring Boot, Payara Micro and Embedded GlassFish revealed that methodological flaws can flip the winner. After normalising the JDK version, heap size, warm‑up handling and database metrics, Spring Boot showed the best latency and throughput on a single workstation, Payara Micro was the only runtime with zero check failures at both load levels, and Embedded GlassFish remained a viable lightweight Jakarta EE option. The article explains the experiment, the trade‑offs and a decision tree for choosing a runtime.

How a Fair Benchmark Shifted My View on Jakarta EE in 2026

The problem: intuition vs. evidence

When I first ran a realistic workload against three Java runtimes – Spring Boot, Embedded GlassFish and Payara Micro – the raw numbers suggested that the embedded GlassFish container was faster than Spring Boot. Publishing that result would have made a catchy headline, but the methodology was missing several crucial controls:

Different JDK builds were used.
Warm‑up periods were mixed with measurement windows.
Heap sizes and connection‑pool settings varied.
No database‑level statistics were collected.

Relying on such an incomplete experiment would have reinforced a bias rather than clarified the real performance characteristics.

The solution approach: a stepwise, reproducible lab

I designed a small but realistic API that mimics a shipment‑intelligence service. The same contract is implemented on all three runtimes, and every implementation points to a single PostgreSQL instance populated with 100 k shipment rows. The workload includes:

Reads by tracking ID (lightweight).
Aggregations for route and volume summaries (DB‑heavy).
Paginated delayed‑shipment queries.
Event ingestion that writes to the DB.
Health/readiness endpoints.

Load is generated with k6, using identical scenarios across runs. The measurement stack captures:

HTTP latency percentiles (p50, p95, p99).
Throughput (requests per second).
Resident set size (RSS) before and after the test.
GC logs.
PostgreSQL pg_stat_statements to attribute DB cost.
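
As a rough illustration of how the latency and throughput figures can be derived from raw samples, the sketch below computes nearest-rank percentiles and requests per second in Python. It is not taken from the repository: the function names are hypothetical, and k6 reports its own percentiles, which may differ slightly from the nearest-rank method used here.

```python
import math


def percentile(samples_ms, q):
    """Nearest-rank percentile: the smallest sample such that at least
    q % of all samples are at or below it."""
    if not samples_ms:
        raise ValueError("no samples")
    if not 0 < q <= 100:
        raise ValueError("q must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(q * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def throughput_rps(request_count, window_seconds):
    """Completed requests divided by the length of the measurement window."""
    if window_seconds <= 0:
        raise ValueError("window must be positive")
    return request_count / window_seconds


def summarize(samples_ms, window_seconds):
    """Collect the headline metrics for one run into a dict."""
    return {
        "p50": percentile(samples_ms, 50),
        "p95": percentile(samples_ms, 95),
        "p99": percentile(samples_ms, 99),
        "rps": throughput_rps(len(samples_ms), window_seconds),
    }
```

Keeping p95 and p99 next to p50 matters here because the runtimes differ far more in the tail than in the median.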

All code lives in the public repository enterprise‑runtime‑lab. Tags mark each phase of the experiment (scaffold, baseline, realistic benchmark, fairness matrix, Railway smoke, final).

Phase 2 – the tempting quick result

Runtime	p50	p95	p99	Throughput
Embedded GlassFish	4.66 ms	58.77 ms	111.85 ms	86.46 req/s
Payara Micro	16.32 ms	135.76 ms	238.61 ms	71.17 req/s
Spring Boot	36.59 ms	340.50 ms	594.74 ms	53.36 req/s

At first glance GlassFish looks like the winner, but the test omitted warm‑up separation, used different JDKs and did not record RSS or DB metrics.

Phase 3 – adding causality

I introduced multiple virtual‑user levels (10, 25, 50, 100), ran each configuration three times, and captured RSS and GC logs. GlassFish kept tail latency low under high load, Payara Micro excelled in throughput, and Spring Boot consistently used the least RSS. However, Payara Micro reported unsupported‑JDK warnings on some runs, so the comparison was still not fair.
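
The run matrix described above (three runtimes, four VU levels, three runs each) and the per-configuration median can be sketched as follows. This is an illustrative Python sketch, not the repository's harness: the runtime labels, the dict layout and the function names are assumptions.

```python
from itertools import product
from statistics import median

RUNTIMES = ("spring-boot", "payara-micro", "embedded-glassfish")  # labels are hypothetical
VU_LEVELS = (10, 25, 50, 100)
RUNS_PER_CONFIG = 3


def run_matrix():
    """Every (runtime, VU level, run number) combination to execute."""
    return [
        {"runtime": runtime, "vus": vus, "run": run}
        for runtime, vus, run in product(
            RUNTIMES, VU_LEVELS, range(1, RUNS_PER_CONFIG + 1)
        )
    ]


def median_by_config(results, metric):
    """Group run results by (runtime, vus) and take the median of one metric."""
    grouped = {}
    for row in results:
        grouped.setdefault((row["runtime"], row["vus"]), []).append(row[metric])
    return {config: median(values) for config, values in grouped.items()}
```

Reporting the median of three runs rather than a single run keeps one noisy execution from deciding the ranking.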

Phase 4 – the fair benchmark (final)

Controls applied:

Temurin 21.0.10 for every runtime.
Fixed heap -Xms512m -Xmx512m.
Separate warm‑up phase, then a 180 s measurement window.
Identical HikariCP pool settings.
pg_stat_statements reset after warm‑up.
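
One way to keep these controls from drifting between runs is to encode them once and reuse them for every runtime. The Python sketch below is an assumption about how that could look, not the repository's code: the class name, the jar and log paths, and the warm-up length (which the article does not state) are all hypothetical. The heap flags and the 180 s window come from the list above, and `-Xlog:gc*` is the unified GC logging flag available since JDK 9.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FairnessControls:
    warmup_s: int                # assumption: must be supplied, not given in the article
    heap: str = "512m"           # from the article: -Xms512m -Xmx512m
    measurement_s: int = 180     # from the article: 180 s measurement window

    def jvm_command(self, jar_path, gc_log, *app_args):
        """Build the same JVM invocation for every runtime under test."""
        return [
            "java",
            f"-Xms{self.heap}",
            f"-Xmx{self.heap}",
            f"-Xlog:gc*:file={gc_log}",
            "-jar",
            jar_path,
            *app_args,
        ]

    def in_measurement_window(self, seconds_since_start):
        """True only for samples taken after warm-up and before the window ends."""
        end = self.warmup_s + self.measurement_s
        return self.warmup_s <= seconds_since_start < end
```

Samples for which `in_measurement_window` returns False would be discarded before any percentile is computed, which is what separating warm-up from measurement means in practice.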

Runtime	VUs	Median p50	Median p95	Median p99	Throughput	Error rate	Median RSS
Spring Boot	25	4.59 ms	66.92 ms	110.03 ms	213.13 req/s	0.01 %	517.5 MB
Payara Micro	25	33.10 ms	188.16 ms	336.77 ms	156.48 req/s	0.00 %	694.3 MB
Embedded GlassFish	25	38.03 ms	198.83 ms	371.96 ms	151.26 req/s	0.00 %	579.1 MB
Spring Boot	100	149.36 ms	341.69 ms	473.41 ms	372.56 req/s	0.04 %	543.0 MB
Payara Micro	100	204.61 ms	588.31 ms	870.53 ms	284.29 req/s	0.00 %	715.7 MB
Embedded GlassFish	100	320.12 ms	540.00 ms	677.23 ms	229.28 req/s	0.01 %	593.9 MB

Editorial reading

At 25 VUs Spring Boot leads in median latency, throughput and RSS.
At 100 VUs Spring Boot still holds the best p95/p99 and the highest throughput, though it registers a few check failures (an error rate of 0.04 %).
Payara Micro shows zero check failures at both loads, making it the most “clean” Jakarta EE option.
Embedded GlassFish remains viable but no longer dominates when the test is fair.
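
To put the gaps in the Phase 4 table into relative terms, the short Python sketch below computes the throughput advantage of one runtime over another. The throughput values are copied from the table; the function name is hypothetical.

```python
# Throughput in req/s, copied from the Phase 4 table.
PHASE4_THROUGHPUT = {
    25: {
        "Spring Boot": 213.13,
        "Payara Micro": 156.48,
        "Embedded GlassFish": 151.26,
    },
    100: {
        "Spring Boot": 372.56,
        "Payara Micro": 284.29,
        "Embedded GlassFish": 229.28,
    },
}


def throughput_gain_pct(vus, runtime, baseline):
    """Percentage by which `runtime` out-performs `baseline` at a VU level."""
    row = PHASE4_THROUGHPUT[vus]
    return (row[runtime] / row[baseline] - 1) * 100


if __name__ == "__main__":
    for vus in PHASE4_THROUGHPUT:
        for baseline in ("Payara Micro", "Embedded GlassFish"):
            gain = throughput_gain_pct(vus, "Spring Boot", baseline)
            print(f"{vus} VUs: Spring Boot vs {baseline}: +{gain:.1f} %")
```

On these numbers Spring Boot's throughput is about 36 % above Payara Micro at 25 VUs and about 31 % above it at 100 VUs. Against Embedded GlassFish the gap is about 41 % and 62 % respectively.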

Why the database mattered

pg_stat_statements revealed that the aggregation queries (route/volume summaries) dominate the tail latency. Simple tracking reads are cheap. This tells us that the observed differences are not solely due to the Java runtime; the DB, its connection pool and the host environment also contribute. A benchmark that isolates the runtime would need a CPU‑bound workload with minimal DB interaction.
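
The attribution step can be sketched as ranking statements by their share of total execution time. The Python below is a minimal sketch under assumptions: it expects rows already fetched as dicts (the database client is left out), the helper name is hypothetical, and the column names follow pg_stat_statements on PostgreSQL 13 and later, where `total_exec_time` is reported in milliseconds.

```python
# Run after warm-up so that only the measurement window is counted.
RESET_SQL = "SELECT pg_stat_statements_reset()"

# Run after the measurement window to collect per-statement cost.
STATEMENTS_SQL = """
SELECT query, calls, total_exec_time
FROM pg_stat_statements
"""


def time_share(rows):
    """Return (query, share of total execution time) pairs, largest first.

    `rows` is a list of dicts with at least 'query' and 'total_exec_time'.
    """
    total = sum(row["total_exec_time"] for row in rows)
    if total == 0:
        return []
    ranked = sorted(rows, key=lambda row: row["total_exec_time"], reverse=True)
    return [(row["query"], row["total_exec_time"] / total) for row in ranked]
```

If the aggregation queries take most of the share for every runtime, differences in tail latency cannot be attributed to the framework alone.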

Development experience

Spring Boot – fastest to prototype. Auto‑configuration, health endpoints and observability come out of the box. The team I work with already uses it, so the friction cost is low.
Payara Micro – feels natural if the organisation already ships WARs on Jakarta EE. No check failures in the final phase, but log analysis is more involved.
Embedded GlassFish – surprised me with a lightweight executable that still supports the full Jakarta EE API set. Not a winner here, but worth a look when a tiny footprint is required.

Decision tree derived from the lab

Greenfield with a Spring‑savvy team → choose Spring Boot. The lab shows the best latency/throughput on a single node, and the ecosystem reduces operational risk.
Existing Payara/Jakarta EE deployment → evaluate Payara Micro first. It runs cleanly under pressure and avoids a full rewrite.
Need for a tiny Jakarta EE executable → consider Embedded GlassFish as a bridge; it can host Jakarta EE code without a full app server.

Limits of the experiment (no hype)

Single workstation; results may differ on multi‑node or cloud environments.
Workload is DB‑heavy; CPU‑bound micro‑benchmarks would shift the balance.
No Kafka, PostGIS, native images, Kubernetes, or long soak tests.
Phase 5 (the Railway smoke test) only proves that the services start and answer a minimal request; it is not a performance measurement.

What I would change in a repeat run

Extend measurement windows to several hours to capture slow‑drift GC behaviour.
Run the harness on a CI runner to reduce local noise.
Align connection‑pool limits with PostgreSQL max_connections to see if queueing moves from the DB to the runtime.
Add a CPU‑bound scenario (e.g., JSON serialization) to isolate runtime overhead.
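
For the connection-pool item, a quick budget check shows whether the configured pools can exhaust the database before the runtime ever queues a request. The Python sketch below is illustrative only: the function name and pool labels are hypothetical. The default of 3 reserved connections matches PostgreSQL's default `superuser_reserved_connections`, and PostgreSQL's default `max_connections` is 100.

```python
def pool_headroom(max_connections, pool_sizes, reserved=3):
    """Connections left on the server once every pool is at its maximum.

    A negative result means the pools can ask for more connections than
    PostgreSQL will hand out, so queueing happens at the database.
    """
    usable = max_connections - reserved
    return usable - sum(pool_sizes.values())


def queueing_side(max_connections, pool_sizes, reserved=3):
    """Name the layer where waiting is expected once the pools saturate."""
    if pool_headroom(max_connections, pool_sizes, reserved) < 0:
        return "database"
    return "runtime pool"
```

If only one runtime is connected during a run, `pool_sizes` would contain just that runtime's pool. The sum matters when several services stay connected at once.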

How this fits my day‑to‑day work

I spend most of my time building Java/Spring Boot services for digital‑identity, biometrics and secure storage. The evidence that Spring Boot delivers the best local latency, combined with its familiar tooling, makes it the pragmatic choice for new products. At the same time, the lab gives me concrete numbers to convince clients with existing Payara installations that a migration is not mandatory; a trial with Payara Micro or Embedded GlassFish can be justified with data.

The eureka moment

Seeing the ranking flip once all runtimes shared the same JDK, heap and warm‑up made me realise how fragile “quick benchmarks” are. The dominant factor for p99 under load turned out to be the aggregation query cost, not a mysterious framework penalty. From that point the conversation shifted from “which framework is superior” to “what do we really need to optimise?”

The role of editorial briefs

Before publishing, I wrote separate briefs that listed:

What evidence is allowed to be claimed.
What claims are prohibited.
The explicit limitations of the experiment.

These briefs forced me to withhold the Phase 2 headline, to disclose Spring Boot’s check failures and to label the Railway run as a smoke test only. The process prevented the article from becoming a polemic and kept it focused on architecture.

Reproducibility

All artifacts are versioned in the GitHub repo. The final state is tagged runtime-lab-final (commit d176ed6). Tags for each phase let anyone replay the exact conditions that produced the tables shown above.

Bottom line

If the team already knows Spring, start with Spring Boot – the lab backs that choice with latency, throughput and memory numbers. If the organisation lives in Jakarta EE, run a Payara Micro trial before deciding on a rewrite – it showed zero check failures and competitive throughput. Embedded GlassFish is a viable lightweight Jakarta EE option, but it does not win the final fair benchmark.

The real lesson is that migration decisions must be backed by a benchmark that mirrors the actual workload, not by generic numbers or personal preference.

Original post published on juanchi.dev. The benchmark code, data and detailed run logs are available in the public enterprise‑runtime‑lab repository.
