Java Lambda tuning comes down to cold starts, memory, and build control

Vadym Kazulkin’s InfoQ talk gives Java teams a practical AWS Lambda tuning map: start with SnapStart, prime the code paths that load heavy dependencies, and use GraalVM Native Image when build complexity buys enough latency and memory gain.

AWS Serverless Hero Vadym Kazulkin used an InfoQ presentation to show how Java teams can cut AWS Lambda cold starts from multi-second delays to sub-second ranges with Lambda SnapStart, priming hooks, smaller deployment packages, and GraalVM Native Image.

The target was a common enterprise shape: API Gateway calls a Java Lambda handler, the handler deserializes a proxy request event, and the function reads product data from DynamoDB through the AWS SDK. Kazulkin used that path because it exposes the startup costs Java teams hit in production: class loading, runtime setup, dependency injection, JSON serialization, HTTP client creation, and SDK initialization.

AWS now lists Java 25, Java 21, Java 17, Java 11, and Java 8 in its Lambda Java runtime documentation. That matters for teams that want recent Java language and runtime features without leaving managed Lambda runtimes.

A Lambda execution environment handles one request at a time. AWS starts more environments when concurrency rises, when developers deploy new code, or when AWS replaces older environments for maintenance. That startup path creates the cold start: Lambda downloads code, starts the runtime, initializes reachable code, and then calls the handler.

Kazulkin’s baseline Java function used 1 GB of memory on x86, Amazon Corretto, the Apache HTTP client through the AWS SDK, and a 14 MB deployment artifact. In his test run, warm DynamoDB reads landed around 7 milliseconds at p90, while cold starts sat around 3 seconds at p90.

SnapStart changes the deployment path. With Lambda SnapStart, AWS initializes a published function version, captures a Firecracker microVM snapshot, encrypts it, and caches copies. Later invocations resume from that snapshot instead of repeating the initialization path.

Kazulkin measured about 2 seconds at p90 after enabling SnapStart without application changes. He then added a pre-snapshot hook through the CRaC API and called ProductDao.getProductById("0") before AWS captured the snapshot. That call forced the function to load the DynamoDB client path, the HTTP client path, and Jackson serialization code before runtime traffic arrived.
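The shape of that priming hook can be sketched in plain Java. This is a minimal illustration, not Kazulkin's code: SnapshotResource is a local stand-in shaped like org.crac.Resource (whose real callbacks also take a Context argument), the registry list stands in for Core.getGlobalContext().register(this), and the ProductDao body is a hypothetical placeholder for the DynamoDB call.

```java
import java.util.List;

// Local stand-in shaped like org.crac.Resource. Assumption: the real callbacks
// also receive a Context argument and may throw checked exceptions.
interface SnapshotResource {
    void beforeCheckpoint();
    void afterRestore();
}

// Hypothetical DAO. A real one would read DynamoDB through the AWS SDK.
final class ProductDao {
    private boolean clientPathLoaded = false;

    String getProductById(String id) {
        // Placeholder for the SDK call that pulls in HTTP client,
        // marshaller, and JSON serialization classes on first use.
        clientPathLoaded = true;
        return null; // product "0" is not expected to exist
    }

    boolean isClientPathLoaded() {
        return clientPathLoaded;
    }
}

final class PrimingHandler implements SnapshotResource {
    private final ProductDao productDao;

    PrimingHandler(ProductDao productDao, List<SnapshotResource> registry) {
        this.productDao = productDao;
        // With the real library: Core.getGlobalContext().register(this).
        registry.add(this);
    }

    @Override
    public void beforeCheckpoint() {
        // Priming: run one harmless read so its classes load before the snapshot.
        productDao.getProductById("0");
    }

    @Override
    public void afterRestore() {
        // Nothing to refresh in this sketch.
    }
}
```

The design point is that the read happens inside the pre-snapshot callback, so the loaded classes become part of the snapshot rather than part of the first request.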

That priming step cut p90 cold start time to about 1 second in his first measurements. After AWS warmed its snapshot cache across more invocations, Kazulkin reported about 650 milliseconds at p90 for the last 70 cold starts in the test set.

The priming result explains a common Lambda surprise. A Java function can show a warm invocation outlier because the handler touches code that the initializer did not load. A first DynamoDB call can create response classes, serializers, marshallers, and HTTP machinery during the handler path. Priming moves that work into the snapshot phase.

Teams should prime code paths that match production payloads. An API Gateway handler can prime request deserialization with a small synthetic event. A DynamoDB reader can prime one harmless read path. A PostgreSQL client needs connection validation after restore, because the function may resume from a snapshot long after AWS captured connection state.
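The connection case can be illustrated with a small holder that checks validity before handing out a connection. This is a sketch under assumptions: PooledConnection and RestoreAwareConnectionHolder are hypothetical names, and with JDBC the validity check would typically be java.sql.Connection.isValid(int timeoutSeconds).

```java
import java.util.function.Supplier;

// Hypothetical minimal connection type. With JDBC this role is played by
// java.sql.Connection and its isValid(int timeoutSeconds) method.
interface PooledConnection {
    boolean isValid();
}

final class RestoreAwareConnectionHolder {
    private final Supplier<PooledConnection> connector;
    private PooledConnection current;

    RestoreAwareConnectionHolder(Supplier<PooledConnection> connector) {
        this.connector = connector;
        // Opened during initialization, so this object ends up in the snapshot.
        this.current = connector.get();
    }

    // Call from an after-restore hook or at the start of each invocation.
    PooledConnection get() {
        if (current == null || !current.isValid()) {
            // The connection captured in the snapshot is stale: reconnect.
            current = connector.get();
        }
        return current;
    }
}
```

Checking on access rather than only in a restore hook also covers connections that the database closes for unrelated reasons.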

AWS documents SnapStart limits that matter in design reviews. SnapStart covers Java 11 and later managed runtimes, Python 3.12 and later, and .NET 8 and later. AWS excludes container images, OS-only runtimes, provisioned concurrency, Amazon Elastic File System, Amazon S3 Files, and ephemeral storage above 512 MB from SnapStart support.

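Snapshot state also creates correctness risks. Developers should generate unique IDs, fresh secrets, entropy, and short-lived cache values after restore rather than during initialization, because every environment resumed from the same snapshot inherits whatever initialization produced. A timestamp captured before the snapshot can mislead code that expects the current time during invocation.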
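A short sketch shows the difference between a value fixed at initialization and one produced per invocation. The class and method names are hypothetical, and the snapshot is not simulated here: the comments describe what would happen to the static fields if this class were initialized before a snapshot.

```java
import java.util.UUID;

final class RequestIdSketch {
    // Risky with snapshots: evaluated once during initialization and then
    // frozen, so every environment restored from the snapshot would share it.
    static final String INIT_TIME_ID = UUID.randomUUID().toString();
    static final long INIT_TIME_MILLIS = System.currentTimeMillis();

    // Safer: produce the value inside the invocation path.
    static String newRequestId() {
        return UUID.randomUUID().toString();
    }

    // Safer: read the clock at invocation time instead of reusing INIT_TIME_MILLIS.
    static long nowMillis() {
        return System.currentTimeMillis();
    }
}
```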

Package size also shaped Kazulkin’s results. A 130 KB Hello World function started faster than the 14 MB sample. A 50 MB function with tracing and extra dependencies started slower. Java Lambda teams should audit dependencies with the same care they bring to container image size, because Lambda must fetch, initialize, snapshot, or restore that code.

Memory tuning gives teams another lever. Lambda allocates CPU in proportion to configured memory, so a 1 GB function gets less CPU than a larger configuration. Kazulkin advised testing memory up to the point where added CPU stops improving startup or handler time. Cost needs measurement because Lambda bills compute by memory multiplied by duration, so a faster, larger function can cost less than a smaller function that runs longer.
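The cost trade-off is simple arithmetic over GB-seconds. The sketch below is illustrative only: the durations are invented, the per-GB-second price is passed in as a parameter and must come from current AWS pricing for the region and architecture, and per-request fees are left out.

```java
final class LambdaCostSketch {
    // Compute charge in USD: memory (GB) x duration (s) x invocations x price.
    // Assumption: per-request fees and free-tier allowances are ignored.
    static double computeCostUsd(int memoryMb,
                                 double avgDurationMs,
                                 long invocations,
                                 double pricePerGbSecondUsd) {
        double gbSeconds = (memoryMb / 1024.0) * (avgDurationMs / 1000.0) * invocations;
        return gbSeconds * pricePerGbSecondUsd;
    }

    public static void main(String[] args) {
        double price = 0.0000166667; // assumed example price, not a quoted rate
        // Hypothetical measurements: doubling memory cuts duration from 100 ms to 40 ms.
        double small = computeCostUsd(1024, 100.0, 1_000_000L, price);
        double large = computeCostUsd(2048, 40.0, 1_000_000L, price);
        System.out.printf("1 GB: %.2f USD, 2 GB: %.2f USD%n", small, large);
    }
}
```

With these made-up numbers the 2 GB configuration consumes 80,000 GB-seconds against 100,000 for the 1 GB one, so it is the cheaper option despite the higher memory setting.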

JIT settings also matter. Kazulkin used -XX:TieredStopAtLevel=1, which limits HotSpot to the C1 compiler, to reduce compiler work during short Lambda lifetimes. Many Lambda execution environments exit before HotSpot reaches peak optimization paths, so full tiered compilation through C2 can spend CPU on work that the function cannot recover through later throughput.
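On Lambda, JVM flags like this are usually supplied through the JAVA_TOOL_OPTIONS environment variable. A function can log at startup whether the flag took effect. The sketch below assumes the option appears in the JVM's reported input arguments; JitFlagCheck is a hypothetical helper name.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

final class JitFlagCheck {
    // True when the argument list contains the C1-only tiered compilation flag.
    static boolean stopsAtC1(List<String> jvmArgs) {
        return jvmArgs.contains("-XX:TieredStopAtLevel=1");
    }

    // Inspects the arguments the running JVM reports for itself.
    static boolean currentJvmStopsAtC1() {
        return stopsAtC1(ManagementFactory.getRuntimeMXBean().getInputArguments());
    }

    public static void main(String[] args) {
        System.out.println("C1-only compilation configured: " + currentJvmStopsAtC1());
    }
}
```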

GraalVM Native Image attacks the same problem from the build side. The GraalVM Native Image documentation describes ahead-of-time compilation that produces a native executable with reachable application classes, standard library code, runtime pieces, and native code. Kazulkin measured lower cold starts than SnapStart in his sample and saw lower warm start outliers.

The trade-off moves into CI and dependency management. Native Image needs a build pipeline that can allocate enough CPU and memory, and Kazulkin’s sample needed about 6 GB of memory and up to 3 minutes to build one function. Teams also need reachability metadata for reflection, logging, dynamic class loading, and framework internals.

Framework choice influences that cost. Quarkus, Micronaut, Helidon, and Spring support Native Image paths, but a transitive dependency can still break a build or produce a runtime ClassNotFoundException if the team misses reflection metadata. The GraalVM tracing agent can create metadata from test runs, but test gaps can leave production paths uncovered.
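The kind of code that needs reflection metadata is easy to write without noticing. In the sketch below, the class name arrives as a runtime string, so Native Image's static analysis cannot tell which class must be kept. On a regular JVM the lookup simply works, while a native executable would need the class registered in reachability metadata. ReflectiveLoaderSketch is a hypothetical name used for illustration.

```java
final class ReflectiveLoaderSketch {
    // Loads and instantiates a class by name through its no-argument constructor.
    // A class name computed at run time is invisible to ahead-of-time analysis.
    static Object instantiate(String className) {
        try {
            Class<?> type = Class.forName(className);
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // ClassNotFoundException lands here when the class is unavailable.
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }
}
```

Frameworks and logging libraries perform lookups like this internally, which is why a transitive dependency can fail at run time even when the application code itself uses no reflection.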

Project Leyden adds another variable for Java teams. The OpenJDK Project Leyden effort focuses on startup time, time to peak performance, and footprint. Kazulkin framed Leyden as part of the long-term Java startup story, while GraalVM Native Image remains the more aggressive AOT option for Lambda latency and memory goals.

The operational decision comes down to control. SnapStart keeps teams inside AWS managed runtimes and requires less build work, but developers must understand snapshot state and priming. GraalVM can produce faster starts and smaller memory profiles, but teams own build time, metadata, framework support, and dependency drift.

For many Java Lambda APIs, Kazulkin’s data points to a practical order of work: trim dependencies, enable SnapStart on published versions, add pre-snapshot priming for SDK and serialization paths, tune memory, and measure p90 and p99 across enough cold starts for AWS cache behavior to settle. Teams that still need lower latency can then price the GraalVM build path against the extra engineering work.
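For the measurement step, percentiles can be computed from exported cold start durations with a few lines of Java. This sketch uses the nearest-rank definition, which is an assumption: monitoring tools may interpolate differently, so small differences from dashboard values are expected. The class name is hypothetical.

```java
import java.util.Arrays;

final class ColdStartPercentiles {
    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static double percentile(double[] samplesMs, double p) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        double[] sorted = samplesMs.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }
}
```

Computing the value over a sliding window, such as the last 70 cold starts Kazulkin reported on, makes the effect of snapshot cache warm-up visible instead of averaging it away.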
