What this is.
Opinion + Experience + Fact (40% opinion · 40% experience · 15% fact · 5% fiction)
Written in collaboration with AI — I discuss, I do not outsource.


Chapter 1. The Choice After Observability

The last post ended on a promise: when observability is built in, one architectural choice decides how cleanly the system scales — message-driven or shared state. Both compound, in opposite directions ⚖️.

This post is that choice, written down.

I have shipped products in both shapes. A motor controller where every module held a shared status struct, protected by a mutex the team mostly trusted. A connected sensor where every module published typed events on a bus and read its inputs the same way. Both products shipped. Both products worked. Only one of them was calm to extend a year later.

The choice looks small at the start. The two architectures cost about the same in the first sprint. By the third quarter the cost lines have separated, and by the second year they are not in the same room anymore. The team holding shared state is reasoning about which task touched which field. The team holding messages is reasoning about which contract changed and what the next contract should be 📨.

The choice is structural. The shape it pays back is structural too.

First principle. Shared state and message-driven cost about the same on day one. By year two their costs are an order of magnitude apart.


Chapter 2. The Shared-State Product

The first product was a small motor controller for a connected appliance 🌀.

The architecture was honest: a g_motor_status struct held current, voltage, RPM, fault flags, command state. A mutex around the struct. Eight modules read from it. Four modules wrote to it. The senior engineer kept the rules in his head — module A always writes before module B reads, module C only writes during state transitions, module D reads outside the lock because that field is "atomic."

The product shipped. The first six months in the field were quiet. Then a customer reported a fault that the device recovered from incorrectly: the motor restarted in the wrong mode, drew current the supply could not deliver, and tripped its own protection. The device did recover. But it recovered down the wrong path.

The engineer chasing the bug spent two days reading the eight readers and four writers, mapping which field each one touched, drawing the implicit ordering on a whiteboard. The bug turned out to be a write from the fault handler that happened between two reads in the recovery state machine — a small race the original design had not enumerated.

The fix took an hour. The reasoning took two days. There were a dozen other places the same shape of bug could be sitting, and no one on the team could prove they were safe ⏱️.

First principle. Shared state is fast to write and slow to reason about. The reasoning cost shows up after the product ships, not during the build.


Chapter 3. The Message-Driven Product

The second product was a connected sensor a few years later 📡.

The architecture asked for more at the start: the team paid for an extra day or two of design before writing the first module. Every module published typed messages on a bus. Every module read its inputs as messages. There was no shared struct. The state lived inside each module, and the interactions between modules were a finite set of typed contracts the team wrote down on day one.

The first sprint was slower. The team had to define the message types — MOTOR_CMD, MOTOR_TELEMETRY, FAULT_REPORT, BOOT_STATUS. Each had a versioned schema. Each had a publisher and a list of subscribers. The first code review took longer because the contracts were new and the reviewers had to learn them.
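Here is a minimal sketch of what one versioned contract can look like in C. The message names come from the list above; the header layout, the field names, and the msg_accepts helper are my assumptions for illustration, not that team's actual schema. Pinning the struct layout with static assertions means any edit to the struct fails the build until someone updates the pin, which is the natural moment to bump the version.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Contract IDs named after the post; the numbering is an assumption. */
typedef enum {
    MSG_MOTOR_CMD = 1,
    MSG_MOTOR_TELEMETRY,
    MSG_FAULT_REPORT,
    MSG_BOOT_STATUS
} msg_id_t;

/* Every contract starts with the same header, so a subscriber can
   reject a schema version it was not written against. */
typedef struct {
    uint16_t id;       /* a msg_id_t value */
    uint16_t version;  /* bumped on any layout or meaning change */
} msg_hdr_t;

#define MOTOR_TELEMETRY_VERSION 2u

typedef struct {
    msg_hdr_t hdr;
    uint32_t  ts_us;
    int32_t   current_ma;
    int32_t   voltage_mv;
    int32_t   rpm;
} motor_telemetry_t;

/* Layout pins (20 bytes on common 32- and 64-bit ABIs): editing the
   struct breaks the build until the pin and the version are updated. */
static_assert(offsetof(motor_telemetry_t, hdr) == 0, "header must come first");
static_assert(sizeof(motor_telemetry_t) == 20, "layout changed: bump the version");

/* Subscriber-side guard: accept only the contract and version it knows. */
static inline int msg_accepts(const msg_hdr_t *h, msg_id_t id, uint16_t version) {
    return h->id == (uint16_t)id && h->version == version;
}
```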

By the third sprint, the contracts were the team's shared vocabulary. New modules plugged in by declaring which contracts they read and which they published. The state machine inside each module became a small, local thing — a few states, a few events, a few transitions, all bounded by the contracts.
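A minimal sketch of that small, local state machine, assuming the module's inbound contracts have already been decoded into an event enum. The event names, states, and transition table are hypothetical. The state variable is static to the file, so the only way to change it is to deliver an event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Inbound contracts this module subscribes to (hypothetical). */
typedef enum { EV_CMD_START, EV_CMD_STOP, EV_FAULT } event_t;
typedef enum { ST_IDLE, ST_RUNNING, ST_FAULTED, ST_COUNT } state_t;

typedef struct { state_t from; event_t ev; state_t to; } transition_t;

/* The module's whole behavior: every legal transition in one table. */
static const transition_t MOTOR_FSM[] = {
    { ST_IDLE,    EV_CMD_START, ST_RUNNING },
    { ST_RUNNING, EV_CMD_STOP,  ST_IDLE    },
    { ST_IDLE,    EV_FAULT,     ST_FAULTED },
    { ST_RUNNING, EV_FAULT,     ST_FAULTED },
};

/* Private to this file: no other module can write it. */
static state_t motor_state = ST_IDLE;

/* Apply one event. Returns true if it caused a transition;
   events with no matching row are ignored. */
bool motor_handle(event_t ev) {
    for (size_t i = 0; i < sizeof MOTOR_FSM / sizeof MOTOR_FSM[0]; ++i) {
        if (MOTOR_FSM[i].from == motor_state && MOTOR_FSM[i].ev == ev) {
            motor_state = MOTOR_FSM[i].to;
            assert(motor_state < ST_COUNT);
            return true;
        }
    }
    return false;
}

state_t motor_get_state(void) { return motor_state; }
```

ST_FAULTED has no outgoing row, so a fault latches until a reset contract (not modeled here) is added to the table.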

The bug class that took two days to find on the first product was architecturally impossible on the second. There was no shared field. The fault handler could not silently write to a struct two readers were sampling. If the fault handler wanted to change anything in another module, it had to publish a message, and the message had to be in the contract.

The first product had a race in its third quarter. The second product had its third quarter pass without one 📭.

First principle. Message-driven costs an extra day of design and removes whole classes of bugs that shared-state ships with by default.


Chapter 4. The Same Feature, Two Shapes

Here is the same small feature — "when the motor faults, latch the system into a safe state and report it" — written in both shapes 🪞.

═════════════════════════ SHARED STATE ═════════════════════════
// motor_isr.c
void motor_fault_isr(void) {
    g_motor_status.fault = FAULT_OVERCURRENT;
    g_motor_status.state = MOTOR_FAULTED;
    g_safety.latched = true;          // module C reads this
}

// system_sm.c (in recovery loop)
if (g_safety.latched) {
    if (g_motor_status.state == MOTOR_FAULTED) {
        enter_safe_state();           // intended path
    } else {
        log_unexpected_recovery();    // ← bug lives here
    }
}
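// Race: these are plain stores with no volatile, barrier, or lock,
// so the compiler (or, on some cores, the CPU) may make latched
// visible before state. If the loop sees latched == true while
// state is still stale, or another writer changes state between
// the two reads, the recovery branch goes the wrong way.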

═══════════════════════ MESSAGE-DRIVEN ═════════════════════════
// motor_module.c
void motor_fault_isr(void) {
    fault_evt_t e = {
        .source = MOTOR,
        .kind   = FAULT_OVERCURRENT,
        .ts_us  = now_us(),
    };
    bus_publish(EVT_FAULT, &e);       // contract: typed; the bus copies e
                                      // by value, since e dies with the ISR
}

// safety_module.c (subscriber)
void on_fault(const fault_evt_t* e) {
    enter_safe_state(e->source, e->kind);
}
// No shared field to race on. The fault is one typed message.
// The safety module reads it as one atomic delivery.
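The message-driven version leans on one property: a fault published from an ISR reaches the subscriber as one atomic delivery. One common way to get that property is a single-producer, single-consumer ring buffer between the ISR and the task. The sketch below shows that technique, not the bus from either product; it assumes exactly one ISR producer, one task consumer, and lock-free 32-bit atomics on the target, and the enum values are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { SRC_MOTOR = 1 } fault_source_t;
typedef enum { FAULT_OVERCURRENT = 1 } fault_kind_t;
typedef struct { fault_source_t source; fault_kind_t kind; uint32_t ts_us; } fault_evt_t;

#define FAULT_Q_LEN 8u   /* power of two, so wrap-around is a mask */

static fault_evt_t fault_q[FAULT_Q_LEN];
static _Atomic uint32_t q_head;   /* advanced only by the producer (ISR) */
static _Atomic uint32_t q_tail;   /* advanced only by the consumer (task) */

/* Producer, called from one ISR. Copies the event into the queue
   before making it visible, so the ISR's stack copy can die safely. */
bool fault_q_push(const fault_evt_t *e) {
    uint32_t head = atomic_load_explicit(&q_head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&q_tail, memory_order_acquire);
    if (head - tail == FAULT_Q_LEN) {
        return false;                              /* full: caller records a drop */
    }
    fault_q[head & (FAULT_Q_LEN - 1u)] = *e;       /* whole struct, by value */
    atomic_store_explicit(&q_head, head + 1u, memory_order_release);
    return true;
}

/* Consumer, called from one task. Sees a complete event or nothing. */
bool fault_q_pop(fault_evt_t *out) {
    uint32_t tail = atomic_load_explicit(&q_tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&q_head, memory_order_acquire);
    if (head == tail) {
        return false;                              /* empty */
    }
    *out = fault_q[tail & (FAULT_Q_LEN - 1u)];
    atomic_store_explicit(&q_tail, tail + 1u, memory_order_release);
    return true;
}
```

Under an RTOS, the task would typically block on a notification or semaphore that the ISR signals after a successful push; that wake-up is left out of the sketch.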

Two implementations. Same product behavior on the happy path. Different behavior on the unhappy path — the path the customer reports, the path the audit reviews, the path the AI agent has to reason about when it proposes the next module ⚖️.

The shared-state version asks the reviewer to replay the whole implicit ordering behind every change to any of the four writers. The message-driven version has a contract (one struct, one publisher, a list of subscribers), and every change to the contract is a deliberate version bump.

First principle. The same feature in two architectures has the same happy path and two different unhappy paths. The unhappy path is the one the team lives in.


Chapter 5. The Form The Agent Reads

After the engineer and the auditor, the third reader of every embedded codebase in 2026 is the AI agent: the one that proposes the next handler, the next state, the next module, given the existing code as context 🤖.

The agent reads shared state poorly. The implicit ordering between writers and readers is in the senior engineer's head. The agent sees the struct, the mutex, the writes, the reads — and proposes code that violates the ordering, because the ordering is not in the source. The reviewer either catches it or ships a bug.

The agent reads message contracts cleanly. Each contract is a typed schema. Each handler is a function whose inputs are the contract and whose outputs are other contracts. The agent's proposal is a new handler or a new contract — both auditable, both reviewable, both inside a structure that already exists.
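A minimal sketch of that handler shape, with hypothetical contract names: the input is one typed struct, the output is another, and nothing else is read or written. Returning the outbound message instead of publishing it from inside the handler is a design choice I made here so the handler can be tested and reviewed with no bus at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical contracts: one inbound, one outbound. */
typedef enum { SRC_MOTOR = 1, SRC_SENSOR } source_t;
typedef enum { FAULT_OVERCURRENT = 1, FAULT_OVERTEMP } fault_kind_t;

typedef struct {
    source_t     source;
    fault_kind_t kind;
    uint32_t     ts_us;
} fault_evt_t;

typedef struct {
    source_t     cause;             /* which module's fault latched the system */
    fault_kind_t reason;            /* carried through from the fault */
    uint32_t     ts_us;             /* carried through from the fault */
    uint8_t      outputs_disabled;  /* 1 once the safe state is entered */
} safe_state_evt_t;

/* The whole interface is the signature: a fault_evt_t in, a
   safe_state_evt_t out. No globals are read or written. */
safe_state_evt_t safety_on_fault(const fault_evt_t *in) {
    assert(in != NULL);
    safe_state_evt_t out = {
        .cause = in->source,
        .reason = in->kind,
        .ts_us = in->ts_us,
        .outputs_disabled = 1,
    };
    return out;
}
```

The wiring to the bus then lives in one small glue function per contract, and an agent proposing a new handler only has to match a signature that is already in the source.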

This is not a property of any specific framework. It is a property of the architecture. A team can pick any RTOS, any transport, any tooling — if the modules talk to each other through typed messages, the agent has something to read. If the modules talk to each other through shared state and mutexes, the agent is guessing.

The structured application layer above the RTOS — the layer that holds the message bus, the contracts, the FSM tables, the observability events — is the layer the agent reads. Message-driven is what makes that layer legible 📖.

First principle. Message-driven is the only model an AI agent can reliably extend, because every interaction has a contract the agent can read.


Chapter 6. Where Shared State Earns Its Place

There is a class of system where shared state is the right answer, and it is worth naming honestly 🎯.

A small bare-metal main loop with one or two state variables and no concurrent writers — shared state is the natural fit. An ISR that updates a single counter the main loop drains — shared state is the natural fit. A device with a few hundred lines of total firmware and no future modules planned — shared state is the natural fit.
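The ISR counter case, sketched with C11 atomics. It assumes the toolchain provides lock-free 32-bit atomics (ARMv7-M cores such as the Cortex-M3, M4 and M7 do; on cores without them, briefly masking the interrupt around the drain does the same job). The function names are hypothetical. The detail that matters is the drain: read and reset happen in one atomic exchange, so a pulse that lands between a separate read and a separate reset is never lost.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* One writer (the ISR), one reader (the main loop). */
static _Atomic uint32_t g_pulse_count;

/* Hypothetical encoder ISR: count one pulse. */
void encoder_isr(void) {
    atomic_fetch_add_explicit(&g_pulse_count, 1u, memory_order_relaxed);
}

/* Main loop: take every pulse counted so far and reset to zero in a
   single atomic step, so no pulse can land between read and reset. */
uint32_t drain_pulses(void) {
    return atomic_exchange_explicit(&g_pulse_count, 0u, memory_order_relaxed);
}
```

Relaxed ordering is enough here because nothing else is published alongside the count. The moment the ISR starts updating a second field that must stay consistent with the first, this is no longer the single-counter case.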

The architectural question is not "shared state is always wrong." The architectural question is "where does this product's scale curve go." A product that ships once and is replaced is a different conversation from a product that ships across three generations and earns a tooling integration in the second.

The teams that get hurt by shared state are the teams whose products outgrew the original assumption. The codebase was small. The team was small. The status struct was honest. The product shipped, then ran for five years, then earned three new modules and a partner integration — and the implicit ordering that worked at module four broke at module nine.

Message-driven costs an extra day on day one. That day is paid on behalf of the version of the project that will still be running in year five. Most products in 2026 are that project ⏳.

First principle. Shared state is the right answer at small scale and short horizons. Message-driven is the right answer the moment either of those changes.


Chapter 7. What I Would Try This Sprint

If I were on a firmware team with a shared-state architecture and a long horizon, here is the smallest experiment I would run this week 📌.

Pick one module that currently writes to the shared struct. Define one typed message that captures everything that module communicates outward — MOTOR_TELEMETRY, SENSOR_READING, whichever. Define one subscriber that reads the message and updates the local copy it cares about.
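A minimal sketch of that experiment, under loud assumptions: a synchronous, single-threaded bus with a fixed subscriber table, one SENSOR_READING contract, and one subscriber that keeps its own local copy. Every name here is hypothetical. Handlers take a void pointer to keep the sketch short; a typed wrapper per contract would restore compile-time checking. Because delivery is synchronous, publishing a pointer to a stack variable is safe here; a queued bus would have to copy the payload.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical minimal bus: synchronous, single-threaded, fixed table. */
#define BUS_MAX_SUBS 8

typedef enum { MSG_SENSOR_READING = 1 } msg_id_t;
typedef void (*handler_t)(const void *payload);

static struct { msg_id_t id; handler_t fn; } subs[BUS_MAX_SUBS];
static int n_subs;

void bus_subscribe(msg_id_t id, handler_t fn) {
    assert(n_subs < BUS_MAX_SUBS);
    subs[n_subs].id = id;
    subs[n_subs].fn = fn;
    n_subs++;
}

/* Calls every subscriber of id before returning, so the payload only
   has to outlive this call. */
void bus_publish(msg_id_t id, const void *payload) {
    for (int i = 0; i < n_subs; ++i) {
        if (subs[i].id == id) {
            subs[i].fn(payload);
        }
    }
}

/* The one contract this experiment introduces. */
typedef struct {
    uint32_t ts_us;
    int32_t  temp_mc;            /* millidegrees Celsius */
    int32_t  humidity_permille;
} sensor_reading_t;

/* Subscriber: keeps its own copy instead of reading a shared struct. */
static sensor_reading_t display_last;

static void display_on_reading(const void *payload) {
    memcpy(&display_last, payload, sizeof display_last);
}

void display_init(void) {
    bus_subscribe(MSG_SENSOR_READING, display_on_reading);
}

/* Publisher side of the migrated module. */
void sensor_report(uint32_t ts_us, int32_t temp_mc, int32_t humidity_permille) {
    sensor_reading_t r = { ts_us, temp_mc, humidity_permille };
    bus_publish(MSG_SENSOR_READING, &r);   /* safe: delivery is synchronous */
}
```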

Run both paths in parallel for a sprint. The shared struct stays. The new message bus runs alongside. The team sees the contract in code, the subscriber in code, the test in code — and decides whether the next module gets the same treatment.
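One way to run both paths for a sprint, sketched with hypothetical names: the migrated writer keeps its legacy store into the shared struct and also emits the new message, and a shadow check compares the subscriber's local copy with the struct. A direct call stands in for the bus publish so the sketch stays self-contained. A mismatch means some writer is still changing the struct outside the contract, which is exactly what the experiment is meant to surface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Legacy shared struct: stays in place during the experiment. */
typedef struct { int32_t rpm; int32_t current_ma; } motor_status_t;
static motor_status_t g_motor_status;

/* New contract carrying the same fields. */
typedef struct { int32_t rpm; int32_t current_ma; } motor_telemetry_t;

/* Subscriber's local copy, fed only through the message path. */
static motor_telemetry_t telemetry_local;
static void telemetry_on_msg(const motor_telemetry_t *m) { telemetry_local = *m; }

/* Shadow check: true while both paths agree. */
bool telemetry_matches_shared(void) {
    return telemetry_local.rpm == g_motor_status.rpm &&
           telemetry_local.current_ma == g_motor_status.current_ma;
}

/* The one migrated writer: old path and new path side by side. */
void motor_update(int32_t rpm, int32_t current_ma) {
    g_motor_status.rpm = rpm;                 /* legacy path, unchanged */
    g_motor_status.current_ma = current_ma;

    motor_telemetry_t m = { .rpm = rpm, .current_ma = current_ma };
    telemetry_on_msg(&m);                     /* stand-in for publishing on the bus */

    assert(telemetry_matches_shared());       /* debug builds: paths must agree */
}
```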

The cost is a few hundred lines for the bus and one module. The payoff is the first contract the team can point at when the next architectural debate starts. Most teams I have worked with have one module whose interactions with the rest of the system are the source of half their bugs. That is the module to start with 🔧.

The choice between message-driven and shared state is rarely a clean greenfield decision. It is usually a quiet migration, one module at a time, that the team realises in retrospect was the most important refactor of the year.

If your team had one sprint to introduce a typed message bus alongside the existing shared state — which module would you move first?

Next: the architectural choice extends to state machines themselves — and the form that reads cleanest for humans, auditors, and AI is the table.

First principle. The shift from shared state to message-driven is rarely a rewrite. It is a one-module experiment that the team repeats until the architecture changes shape.


New to this labeling? Read the framework → 20+ Years of Ideas. Articulation Is the Craft.

— Ritesh | ritzylab.com

#EmbeddedSystems #Firmware #Architecture #MessageDriven #FirstPrinciples