Siddhish Sutaria and Jolly Shah and the Quiet Work of Embedded ML

A HackerNoon profile spotlights two engineers working at the seam where machine learning meets constrained hardware. The piece itself is thin on specifics, so here is what the embedded AI space actually looks like right now, and why the people who tune firmware for wearables and edge devices matter more than the funding headlines suggest.

A new HackerNoon profile titled Shaping Embedded System Evolution puts the spotlight on Siddhish Sutaria and Jolly Shah, two engineers whose listed focus areas read like a map of where machine learning is heading: embedded AI engineering, ML firmware, edge AI for wearables, data center systems, and cross-platform embedded design. The original write-up is short on hard detail, so rather than restate a headline, it is worth looking at the actual work this kind of profile points to, and why it is one of the more interesting and least hyped corners of the current tech cycle.

The problem these engineers are solving

Most of the attention in AI goes to models that live in data centers. Large language models, image generators, and recommendation engines run on racks of GPUs with power and cooling budgets no battery could supply. That is not where most computing actually happens. The vast majority of processors shipped every year are microcontrollers and other embedded chips: the part inside a fitness band, a hearing aid, an industrial sensor, a car's brake controller. These devices have kilobytes to a few megabytes of memory, often run on coin cells or small batteries, and cannot phone home to a server every time they need to make a decision.

Getting machine learning to run usefully on that hardware is a genuinely hard engineering problem. A model that needs hundreds of megabytes and a discrete GPU has to be compressed, quantized, and rewritten to fit in a budget that is thousands of times smaller, while still producing answers fast enough to be useful and cheap enough in power terms not to drain the battery in an hour. This is the discipline that sits behind tags like embedded AI engineering and scalable embedded ML. It is firmware work, signal processing, and model optimization all at once.
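To put a rough number on "thousands of times smaller", the arithmetic can be sketched in a few lines of Python. This is only an illustration: the parameter count and the 256 KiB budget are made-up figures, not numbers from the profile, and the function names are hypothetical.

```python
# Back-of-the-envelope footprint arithmetic. All figures are illustrative.
BYTES_PER_WEIGHT = {"float32": 4, "float16": 2, "int8": 1}


def model_bytes(num_params: int, dtype: str) -> int:
    """Storage needed for the weights alone (no activations, no runtime)."""
    return num_params * BYTES_PER_WEIGHT[dtype]


def shrink_factor_needed(num_params: int, dtype: str, budget_bytes: int) -> float:
    """How many times smaller the weights must get to fit the budget."""
    return model_bytes(num_params, dtype) / budget_bytes


if __name__ == "__main__":
    params = 50_000_000      # hypothetical server-side model
    budget = 256 * 1024      # hypothetical 256 KiB weight budget
    factor = shrink_factor_needed(params, "float32", budget)
    print(f"float32 size: {model_bytes(params, 'float32') / 1e6:.0f} MB")
    print(f"must shrink by roughly {factor:.0f}x to fit {budget} bytes")
```

Going from float32 to int8 only buys a factor of four on its own, which is why the rest of the gap has to come from a smaller architecture, pruning, or distillation.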

How embedded ML actually works

The pipeline usually starts with a model trained the normal way, in a framework like PyTorch or TensorFlow. The interesting part is everything that happens after training. Quantization converts the model's 32-bit floating point weights down to 8-bit integers, or sometimes even smaller, which shrinks the memory footprint and lets cheap integer math units do the work. Pruning strips out connections that contribute little to the output. Knowledge distillation trains a small model to imitate a larger one, capturing most of the accuracy at a fraction of the size.
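The quantization step is easier to see in miniature. What follows is a minimal sketch of affine (scale plus zero-point) quantization to 8-bit integers in plain Python. It assumes a single scale for the whole weight list and simple round-to-nearest; production toolchains typically add per-channel scales and calibration data, and the function names here are hypothetical rather than any framework's API.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float, int]:
    """Map floats to int8 values using one scale and zero point for the list."""
    # Widen the range to include 0.0 so that zero is exactly representable.
    lo = min(min(weights), 0.0)
    hi = max(max(weights), 0.0)
    scale = (hi - lo) / 255.0
    if scale == 0.0:
        scale = 1.0  # all-zero input: any scale works
    # Choose the integer that represents 0.0, so that lo lands near -128.
    zero_point = round(-128 - lo / scale)
    quantized = [
        max(-128, min(127, round(w / scale) + zero_point)) for w in weights
    ]
    return quantized, scale, zero_point


def dequantize(quantized: List[int], scale: float, zero_point: int) -> List[float]:
    """Recover approximate float values from the int8 representation."""
    return [(q - zero_point) * scale for q in quantized]


if __name__ == "__main__":
    weights = [-0.62, -0.1, 0.0, 0.33, 0.91]  # illustrative values
    q, scale, zp = quantize_int8(weights)
    restored = dequantize(q, scale, zp)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print("int8 values:", q)
    print(f"scale={scale:.6f} zero_point={zp} worst error={worst:.6f}")
```

Each weight now takes one byte instead of four, and the reconstruction error stays within about one quantization step (`scale`). Whether that error is tolerable is an empirical question that has to be checked against validation accuracy for the specific model.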

From there the model gets handed to a runtime built for constrained hardware. TensorFlow Lite for Microcontrollers is a common target on microcontroller-class chips, ONNX Runtime fills a similar role on more capable edge and mobile processors, and toolchains like Edge Impulse have grown up specifically to make this workflow approachable. The engineer's job is to balance three quantities that constantly fight each other: model accuracy, inference latency, and energy per inference. Chase accuracy with a larger model and the battery dies. Optimize too aggressively for power and the model starts making bad calls. There is no universally correct answer, only trade-offs tuned to a specific device and use case.
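One simple way to frame that three-way balance is as a constrained choice: treat latency and energy as hard budgets set by the product, then take the most accurate model that fits. The sketch below assumes exactly that policy, which is only one of several reasonable ones. The candidate names and every number in it are hypothetical, standing in for measurements that would have to come from the real device.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    accuracy: float    # fraction correct on a validation set
    latency_ms: float  # time per inference on the target device
    energy_uj: float   # energy per inference, in microjoules


def pick_model(
    candidates: Sequence[Candidate],
    max_latency_ms: float,
    max_energy_uj: float,
) -> Optional[Candidate]:
    """Return the most accurate candidate within both budgets, or None."""
    feasible = [
        c
        for c in candidates
        if c.latency_ms <= max_latency_ms and c.energy_uj <= max_energy_uj
    ]
    if not feasible:
        return None
    # Prefer higher accuracy; break ties in favor of lower energy.
    return max(feasible, key=lambda c: (c.accuracy, -c.energy_uj))


if __name__ == "__main__":
    # Hypothetical measurements for three variants of one model.
    variants = [
        Candidate("small", accuracy=0.88, latency_ms=4.0, energy_uj=30.0),
        Candidate("medium", accuracy=0.93, latency_ms=12.0, energy_uj=90.0),
        Candidate("large", accuracy=0.96, latency_ms=45.0, energy_uj=400.0),
    ]
    print(pick_model(variants, max_latency_ms=20.0, max_energy_uj=150.0))
```

Returning `None` when nothing fits is deliberate: it signals that the budget or the model family has to change, rather than quietly shipping something that overruns the battery.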

This is where wearables make the constraints vivid. A device on your wrist has to classify motion, detect a heartbeat irregularity, or wake on a voice command while drawing an average current measured in microamps. Doing that on-device rather than in the cloud is not just an efficiency choice; it is also a privacy and reliability choice. The raw sensor data does not have to leave the device, and the feature keeps working when the network does not.
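The battery arithmetic behind a budget like that can be sketched with a simple duty-cycle model: the chip wakes, runs one inference at its active current, then sleeps. The figures below are illustrative assumptions rather than measurements from any real wearable, and the model ignores regulator losses, radio activity, and battery self-discharge.

```python
def average_current_ua(
    active_ua: float,
    sleep_ua: float,
    inference_ms: float,
    inferences_per_s: float,
) -> float:
    """Average current for a device that wakes to infer, then sleeps."""
    # Fraction of each second spent awake, capped at always-on.
    duty = min(1.0, inference_ms * inferences_per_s / 1000.0)
    return active_ua * duty + sleep_ua * (1.0 - duty)


def battery_life_hours(capacity_mah: float, avg_current_ua: float) -> float:
    """Idealized runtime: capacity divided by average draw."""
    if avg_current_ua <= 0:
        raise ValueError("average current must be positive")
    return capacity_mah * 1000.0 / avg_current_ua  # mAh -> uAh


if __name__ == "__main__":
    # Hypothetical figures: 5 mA active, 2 uA asleep, one 10 ms inference
    # per second, and a 200 mAh cell.
    avg = average_current_ua(5000.0, 2.0, 10.0, 1.0)
    hours = battery_life_hours(200.0, avg)
    print(f"average draw: {avg:.2f} uA, about {hours / 24:.0f} days")
```

The point of the sketch is the shape of the formula: halving inference time or inference rate cuts the active term nearly in half, which is why latency and energy per inference matter as much as accuracy on this class of device.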

Why the cross-platform angle matters

The profile's mention of cross-platform embedded systems and consumer-cloud integration points at the other half of the problem. A wearable is rarely a standalone product. It pairs with a phone, syncs to a cloud backend, and increasingly shares model updates over the air. Designing a system where the same logic behaves consistently across a microcontroller, a mobile app, and a server backend is its own engineering challenge, because each layer has different constraints and failure modes. Getting the split right, deciding what runs at the edge and what runs in the data center, is a recurring architectural decision rather than a solved problem.

The data center side has not gone away either. Training the models, aggregating telemetry, and pushing improvements back down to fleets of devices all happen server-side. The engineers who can reason across that whole stack, from a constrained sensor up to scalable backend infrastructure, are rare precisely because the two worlds reward opposite instincts. Embedded work prizes frugality and predictability. Cloud work prizes elasticity and abstraction.

The skeptic's note

Profiles like this one tend to arrive with a lot of momentum and not much measurable detail, and this one is no exception: it names focus areas without naming shipped products, benchmark numbers, or funding. That is worth flagging honestly rather than dressing up. The substance in this space is not in the press treatment; it is in whether a model actually runs within its power budget on real silicon, and those numbers are usually buried in datasheets and internal benchmarks rather than announcements.

What is real is the direction. As model compression tools mature and microcontrollers ship with dedicated neural accelerators, more intelligence is migrating to the edge, and the engineers who know how to make that work are quietly more valuable than the field's funding headlines imply. If you want to understand where consumer hardware is going over the next few years, the embedded ML layer is a more reliable signal than the model-of-the-week. The people tuning firmware to squeeze a useful model into a wristband are doing the unglamorous work that decides whether any of the bigger promises actually reach a user's hand.

For anyone wanting to get hands-on with the discipline rather than read about it, the TensorFlow Lite Micro repository and the ONNX Runtime documentation are good starting points, and they make the trade-offs described here concrete in a way no profile can.