What this is. Opinion + Experience + Fact (40% opinion · 40% experience · 20% fact). Written in collaboration with AI — I discuss, I do not outsource.


1. The module I have written five times

Of all the modules every embedded team rebuilds, OTA is the one I have watched the same engineer write five different ways across five products. Same engineer. Same fundamentals. Five codebases, five subtly different implementations of "download an image, verify it, write it to the spare slot, flip the boot flag, recover if the power drops mid-write."

Each one is a sound implementation. Each one works. Each one ships. And each one is a fresh few weeks of an experienced engineer's time spent re-deriving the same state machine, re-learning the same flash-wear edge cases, and re-discovering the same way a brownout at the wrong moment can brick a unit. The fifth rewrite is as careful as the first, because the cost of getting OTA wrong is a truck roll or a dead device in a customer's hands.

This post is about the better answer the last post pointed at. Not "write OTA more carefully." Write it once — contract-frozen, observability built in — and use it everywhere. The same is true for a short list of services every embedded team rebuilds on every project. OTA is just the one that hurts most when it goes wrong.

First principle. The module worth writing once is the one whose fundamentals do not change across products. OTA's fundamentals have been the same for a decade — only the chip underneath moves.


2. The five everyone rebuilds

OTA is not alone. There is a short list of services that show up in nearly every connected embedded product, get rebuilt from scratch each time, and have almost identical requirements across products. Here is the list, with the shape of the waste and what each looks like when it is written once.

Service                    | Rebuilt across                        | Written once as
OTA update                 | every connected product, every time   | svc_ota: slot management, verify, rollback
Non-volatile storage / NVM | nearly every product with settings    | svc_nvm: wear-aware, versioned schema
Watchdog                   | every product with an MCU             | svc_watchdog: kick policy, fault capture
Time sync                  | every networked product               | svc_time: monotonic + wall-clock, drift
Logging / event capture    | every product, with printf reinvented | svc_log: bounded taxonomy, by default

Five services. Every connected product needs most of them. Each gets rewritten because it lives inside the product's codebase rather than above it, in a shared layer. The requirements barely move between products. The implementations are written fresh every time anyway.

First principle. A service whose requirements stay constant across products is a service the team is paying to rewrite, not paying to invent.


3. Why it keeps happening

The rewrite is not a failure of discipline. It happens for reasons that feel right in the moment.

The first reason is coupling. OTA written inside a product reaches into that product's flash map, its boot path, its specific MCU. Lifting it out feels like more work than rewriting it, so the team rewrites it. The second reason is trust: OTA is the riskiest module to get wrong, so an engineer would rather own a version they understand line by line than inherit one they would have to audit anyway. The third reason is that there was no shared place to put it. Each product is its own repository, its own build, its own world. A module that could be shared had nowhere above the product to live, so it lived in five places.

All three reasons dissolve the moment a structured application layer above the RTOS exists — a place where a service can sit above any single product, talk to the hardware through the same clean seam the rest of the layer uses, and carry its own contract and its own observability. The shared place is the thing that was missing, not the willingness to share.

First principle. Services get rewritten because there was no shared place above the product to keep them. Build the place, and the rewrite stops being the path of least resistance.


4. What "write once" actually means

Writing a service once is more than copying a folder between repositories. Copied code drifts — five copies become five variants within a year, each patched separately, and the shared-once benefit evaporates. Write once means three specific things.

It means a frozen contract: the service exposes a small, stable interface — ota_begin, ota_write_chunk, ota_finalize, ota_rollback — that holds steady when the chip underneath it changes. Products depend on the contract, not the implementation. It means observability built in: the service emits its own bounded events — download started, chunk verified, slot flipped, rollback triggered — so every product inherits a debuggable OTA without adding instrumentation. And it means a clean hardware seam: the service talks to flash and boot through the same HAL interface the rest of the layer uses, so the one part that genuinely differs between products — the flash driver — is the only part that gets re-supplied.

When those three hold, the OTA state machine, the verification logic, the rollback policy, the wear handling — all the parts that stay the same — are written and hardened once. The product supplies a flash driver and a configuration. That is the whole integration.
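To make the three properties concrete, here is a minimal sketch in C. The four function names come from the contract above. Everything else is an assumption made for illustration: the ota_hal_t seam, the ota_ctx_t fields, the two-slot layout, the toy checksum standing in for real signature verification, and the RAM-backed driver standing in for a product's flash driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bounded event set, taken from the four events named above. */
typedef enum {
    OTA_EVT_DOWNLOAD_STARTED,
    OTA_EVT_CHUNK_VERIFIED,
    OTA_EVT_SLOT_FLIPPED,
    OTA_EVT_ROLLBACK_TRIGGERED,
    OTA_EVT_COUNT
} ota_event_t;

/* Hardware seam (hypothetical shape): the only part a product re-supplies. */
typedef struct {
    bool (*erase_slot)(int slot);
    bool (*write)(int slot, size_t off, const uint8_t *data, size_t len);
    bool (*read)(int slot, size_t off, uint8_t *out, size_t len);
    void (*set_boot_slot)(int slot);
    void (*emit)(ota_event_t evt);            /* observability sink */
} ota_hal_t;

typedef struct {
    const ota_hal_t *hal;
    int active_slot, spare_slot;
    size_t written, expected_len;
    uint32_t running_sum, expected_sum;
    bool in_progress;
} ota_ctx_t;

/* Placeholder integrity check; a real service would verify a signature. */
uint32_t ota_sum32(uint32_t acc, const uint8_t *d, size_t n) {
    for (size_t i = 0; i < n; i++) acc = acc * 31u + d[i];
    return acc;
}

bool ota_begin(ota_ctx_t *c, const ota_hal_t *hal, int active_slot,
               size_t image_len, uint32_t image_sum) {
    *c = (ota_ctx_t){ .hal = hal, .active_slot = active_slot,
                      .spare_slot = 1 - active_slot, /* two-slot layout assumed */
                      .expected_len = image_len, .expected_sum = image_sum };
    c->in_progress = hal->erase_slot(c->spare_slot);
    if (c->in_progress) hal->emit(OTA_EVT_DOWNLOAD_STARTED);
    return c->in_progress;
}

bool ota_write_chunk(ota_ctx_t *c, const uint8_t *data, size_t len) {
    if (!c->in_progress || len > c->expected_len - c->written) return false;
    if (!c->hal->write(c->spare_slot, c->written, data, len)) return false;
    for (size_t i = 0; i < len; i++) {        /* read back what landed in flash */
        uint8_t b;
        if (!c->hal->read(c->spare_slot, c->written + i, &b, 1) || b != data[i])
            return false;
    }
    c->running_sum = ota_sum32(c->running_sum, data, len);
    c->written += len;
    c->hal->emit(OTA_EVT_CHUNK_VERIFIED);
    return true;
}

bool ota_finalize(ota_ctx_t *c) {
    if (!c->in_progress) return false;
    c->in_progress = false;
    if (c->written != c->expected_len || c->running_sum != c->expected_sum)
        return false;                         /* boot slot left untouched */
    c->hal->set_boot_slot(c->spare_slot);
    c->hal->emit(OTA_EVT_SLOT_FLIPPED);
    return true;
}

void ota_rollback(ota_ctx_t *c) {
    c->in_progress = false;
    c->hal->set_boot_slot(c->active_slot);
    c->hal->emit(OTA_EVT_ROLLBACK_TRIGGERED);
}

/* What a product supplies: here, a RAM-backed stand-in for a flash driver. */
#define SLOT_SIZE 64u
static uint8_t ram_flash[2][SLOT_SIZE];
static int ram_boot_slot = 0;
static int ram_event_count[OTA_EVT_COUNT];

static bool ram_in_range(size_t off, size_t n) { return off <= SLOT_SIZE && n <= SLOT_SIZE - off; }
static bool ram_erase(int s) { memset(ram_flash[s], 0xFF, SLOT_SIZE); return true; }
static bool ram_write(int s, size_t off, const uint8_t *d, size_t n) {
    return ram_in_range(off, n) && memcpy(&ram_flash[s][off], d, n) != NULL;
}
static bool ram_read(int s, size_t off, uint8_t *out, size_t n) {
    return ram_in_range(off, n) && memcpy(out, &ram_flash[s][off], n) != NULL;
}
static void ram_set_boot(int s) { ram_boot_slot = s; }
static void ram_emit(ota_event_t e) { ram_event_count[e]++; }

const ota_hal_t ram_hal = { ram_erase, ram_write, ram_read, ram_set_boot, ram_emit };
```

In this sketch a new product would replace only the last block, the five functions behind ram_hal. The state, the verification, and the events above it stay untouched, which is the point of the seam.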

First principle. Write once means frozen contract, built-in observability, and a clean hardware seam — so the parts that stay the same are shared and only the part that genuinely differs is re-supplied.


5. The cost of the fifth rewrite

It is worth being concrete about what the rewrite actually costs, because the cost hides well.

The direct cost is the few weeks of senior-engineer time per product. That is visible, and teams budget for it. The hidden cost is larger. Every fresh OTA implementation is a fresh set of edge cases that have to be re-found in the field rather than inherited as already solved. The brownout-mid-write case that product three handled correctly is a bug waiting to be rediscovered in product four, because product four's OTA is a separate body of code that did not inherit product three's lesson. Five implementations mean five separate hardening curves, each climbed from the bottom.

A service written once climbs the hardening curve one time. The brownout case gets solved on whichever product hits it first, and every product after that inherits the fix the day it lands. The fifth product ships with OTA that has five products' worth of field hardening in it, instead of its own first draft. That is the compounding the title points at — write once, and every product after the first starts from the hardened version.
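To show what "the brownout case gets solved once" can look like in code, here is a sketch of one common pattern: write a commit marker last, and have the boot path refuse any slot that lacks it or fails its checksum. This is an illustration under assumptions, not a description of any particular product's OTA. The slot_t layout, the marker value, SLOT_MAX_LEN, and the toy checksum are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_MAGIC_COMMITTED 0xC0FFEE01u  /* hypothetical marker value */
#define SLOT_MAX_LEN         4096u        /* hypothetical slot capacity */

typedef struct {
    uint32_t commit_magic;   /* written LAST, only after the payload verifies */
    uint32_t length;
    uint32_t checksum;
    const uint8_t *payload;  /* on real hardware: a fixed flash address */
} slot_t;

/* Toy checksum standing in for a real integrity check. */
uint32_t slot_sum(const uint8_t *d, size_t n) {
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++) acc = acc * 31u + d[i];
    return acc;
}

/* A slot interrupted mid-write never received its commit marker, so it is
 * rejected before its (possibly garbage) length is ever used. */
bool slot_is_bootable(const slot_t *s) {
    return s->commit_magic == SLOT_MAGIC_COMMITTED &&
           s->payload != NULL &&
           s->length <= SLOT_MAX_LEN &&
           slot_sum(s->payload, s->length) == s->checksum;
}

/* Prefer the requested slot, fall back to the other, or -1 if neither boots. */
int choose_boot_slot(const slot_t slots[2], int preferred) {
    if (slot_is_bootable(&slots[preferred])) return preferred;
    if (slot_is_bootable(&slots[1 - preferred])) return 1 - preferred;
    return -1;
}
```

If the power drops while the spare slot is being written, the marker is still in its erased state at the next boot, so choose_boot_slot falls back to the old image. Once that logic lives in the shared service, product four gets it without anyone having to remember product three.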

First principle. Five implementations climb five hardening curves from the bottom. One implementation climbs once, and every product after inherits the altitude.


6. Services as part of the layer

This is where the services connect back to everything the last several posts have built. A structured application layer above the RTOS already owns the message bus, the contracts, the state-machine tables, and the observability taxonomy. A shared service library is the natural next tenant of that layer.

A service like svc_ota is just a well-behaved citizen of the layer: it communicates over the same typed message bus, it exposes a frozen contract like every other module, its state machine lives in a flat table, and its events flow into the same observability stream. It is host-simulatable for the same reason the rest of the layer is — the flash it writes to can be a faked HAL on a laptop, so the OTA state machine can be tested and replayed without bricking a real unit. Everything the layer gives the application, it gives the services too.
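As a rough sketch of what "its state machine lives in a flat table" can mean, the block below encodes an OTA lifecycle as a state-by-input lookup and replays a recorded input log on the host. The state names, input names, and transitions are my assumptions for illustration, not the layer's actual table.

```c
#include <stddef.h>

typedef enum { ST_IDLE, ST_DOWNLOADING, ST_PENDING_BOOT, ST_CONFIRMED,
               ST_ROLLED_BACK, ST_COUNT } ota_state_t;
typedef enum { IN_BEGIN, IN_CHUNK, IN_VERIFY_OK, IN_VERIFY_FAIL,
               IN_BOOT_OK, IN_BOOT_FAIL, IN_COUNT } ota_input_t;

/* One row per state, one column per input; unhandled inputs keep the state.
 * Column order: BEGIN, CHUNK, VERIFY_OK, VERIFY_FAIL, BOOT_OK, BOOT_FAIL. */
static const ota_state_t ota_table[ST_COUNT][IN_COUNT] = {
    [ST_IDLE]         = { ST_DOWNLOADING,  ST_IDLE,         ST_IDLE,
                          ST_IDLE,         ST_IDLE,         ST_IDLE },
    [ST_DOWNLOADING]  = { ST_DOWNLOADING,  ST_DOWNLOADING,  ST_PENDING_BOOT,
                          ST_IDLE,         ST_DOWNLOADING,  ST_DOWNLOADING },
    [ST_PENDING_BOOT] = { ST_PENDING_BOOT, ST_PENDING_BOOT, ST_PENDING_BOOT,
                          ST_PENDING_BOOT, ST_CONFIRMED,    ST_ROLLED_BACK },
    [ST_CONFIRMED]    = { ST_DOWNLOADING,  ST_CONFIRMED,    ST_CONFIRMED,
                          ST_CONFIRMED,    ST_CONFIRMED,    ST_CONFIRMED },
    [ST_ROLLED_BACK]  = { ST_DOWNLOADING,  ST_ROLLED_BACK,  ST_ROLLED_BACK,
                          ST_ROLLED_BACK,  ST_ROLLED_BACK,  ST_ROLLED_BACK },
};

/* A single transition is a table lookup; out-of-range values are ignored. */
ota_state_t ota_step(ota_state_t s, ota_input_t in) {
    if ((unsigned)s >= (unsigned)ST_COUNT || (unsigned)in >= (unsigned)IN_COUNT)
        return s;
    return ota_table[s][in];
}

/* Replay a recorded input log, e.g. one captured from a field unit. */
ota_state_t ota_replay(ota_state_t start, const ota_input_t *log, size_t n) {
    ota_state_t s = start;
    for (size_t i = 0; i < n; i++) s = ota_step(s, log[i]);
    return s;
}
```

Because a step is a pure lookup with no hardware behind it, a log of inputs captured in the field can be fed through ota_replay on a laptop to reproduce the exact sequence of states, with no real unit at risk.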

That is why services belong in the framework rather than in each product. They are not a special category. They are ordinary modules that happen to be useful to every product, written to the same standard as the rest of the layer, and kept in a place where every product can reach them.

First principle. A shared service is not a special artifact — it is an ordinary module of the layer that happens to be useful everywhere, written once to the layer's standard.


7. Write once, use everywhere

The pattern is simple to state and slow to adopt, because adopting it means building the shared place before the second product needs it. The team that builds the layer and puts its first service in it pays a little extra on product one and collects the dividend on every product after. The team that keeps each service inside each product pays the rewrite forever, a few weeks at a time, and rarely sees the bill as one number.

OTA is the one I lead with because the stakes make the argument for me — a bricked unit in the field is a cost no one argues with. But the same logic runs through NVM, the watchdog, time sync, and logging. Each is a service whose fundamentals are constant, whose risk of a fresh bug is real, and whose right home is a shared layer above the product rather than a fresh implementation inside it.

Write the hard parts once. Harden them across every product instead of inside one. Supply only the part that genuinely differs. That is what it means for a service to belong in the framework.

First principle. The services worth sharing are the ones whose fundamentals are constant and whose failure is expensive. Write those once, above the product, and let every product inherit the hardening.

Which service has your team written the most times — and what would change if the next product inherited the hardened version instead of starting fresh?

Next: the architecture, the contracts, the services — they all sit in service of one thing we're announcing soon. Stay close.




— Ritesh | ritzylab.com

#EmbeddedSystems #Firmware #OTA #SystemsArchitecture #FirstPrinciples