JM Valino
Telemetry · 2025 · Architect & Platform Engineer

Telemetry and Fleet Management System

A system for managing device fleets at scale — health, configuration, and remote operations — backed by a high-throughput telemetry pipeline.

AWS IoT Core · Telemetry Pipelines · TypeScript · Event-Driven

A non-confidential, case-study-style overview. Specifics are generalized.

Problem

Operating a growing device fleet meant answering basic questions — which devices are healthy, what firmware are they on, can I reconfigure a subset safely — that the existing tooling couldn’t answer without manual work. At the same time, raw telemetry volume was outpacing the ability to store and query it cost-effectively.

Decisions

  • Built a telemetry pipeline that separated hot-path ingestion from analytical storage, aggregating and down-sampling before long-term persistence.
  • Introduced a device-state model — health, configuration, and lifecycle — kept current from device-reported events rather than polled on demand.
  • Designed remote operations (configuration, staged updates) around idempotent, acknowledged commands so the fleet could be changed safely in batches.
  • Made per-device and per-cohort observability a built-in capability.
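
To make the first decision concrete, here is a minimal sketch of down-sampling before long-term persistence, assuming fixed (tumbling) time windows per device and metric. The type names, field names, and the choice of count/min/max/mean are illustrative assumptions, not the system's actual schema or aggregation policy.

```typescript
// Hypothetical shapes: these fields are assumptions made for illustration.
interface TelemetrySample {
  deviceId: string;
  metric: string;
  timestampMs: number;
  value: number;
}

interface WindowAggregate {
  deviceId: string;
  metric: string;
  windowStartMs: number;
  count: number;
  min: number;
  max: number;
  mean: number;
}

// Collapse raw samples into one aggregate per (device, metric, window).
// Only the aggregates would be written to analytical storage.
function downsample(
  samples: TelemetrySample[],
  windowMs: number
): WindowAggregate[] {
  if (windowMs <= 0) {
    throw new RangeError("windowMs must be positive");
  }
  const buckets = new Map<string, { agg: WindowAggregate; sum: number }>();

  for (const s of samples) {
    // Align each sample to the start of its window.
    const windowStartMs = Math.floor(s.timestampMs / windowMs) * windowMs;
    const key = `${s.deviceId}|${s.metric}|${windowStartMs}`;

    let bucket = buckets.get(key);
    if (bucket === undefined) {
      bucket = {
        agg: {
          deviceId: s.deviceId,
          metric: s.metric,
          windowStartMs,
          count: 0,
          min: s.value,
          max: s.value,
          mean: 0,
        },
        sum: 0,
      };
      buckets.set(key, bucket);
    }

    bucket.agg.count += 1;
    bucket.sum += s.value;
    bucket.agg.min = Math.min(bucket.agg.min, s.value);
    bucket.agg.max = Math.max(bucket.agg.max, s.value);
  }

  // Finalize the mean once per bucket; Map preserves insertion order.
  return Array.from(buckets.values()).map((b) => ({
    ...b.agg,
    mean: b.sum / b.agg.count,
  }));
}
```

The window size and the set of retained statistics are where the fidelity decision is made: anything not captured in the aggregate is unavailable to later queries unless the raw samples are kept elsewhere.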

Tradeoffs

  • Aggregating at ingestion reduced storage and query cost but meant deciding early which fidelity to keep — a reversible-but-not-free decision.
  • An eventually-consistent device-state model favored availability and throughput over always-current reads, which fit the operational use cases.
  • Staged, acknowledged rollouts were slower than fire-and-forget but eliminated whole classes of fleet-wide mistakes.
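
The staged, acknowledged rollout can be sketched as follows. This is a simplified in-memory model under stated assumptions: `DeviceSimulator`, `stagedRollout`, the `commandId` de-duplication, and the failure-rate threshold are hypothetical names and policies chosen for illustration, and real delivery would be asynchronous over the device messaging layer rather than a direct method call.

```typescript
type Ack = "applied" | "failed";

// Hypothetical command shape; commandId is the idempotency key.
interface Command {
  commandId: string;
  payload: Record<string, unknown>;
}

// Stand-in for a device: remembers the ack it gave for each commandId,
// so a redelivered command is acknowledged again without being re-applied.
class DeviceSimulator {
  deviceId: string;
  applyCount = 0;
  private healthy: boolean;
  private seen = new Map<string, Ack>();

  constructor(deviceId: string, healthy = true) {
    this.deviceId = deviceId;
    this.healthy = healthy;
  }

  handle(cmd: Command): Ack {
    const prior = this.seen.get(cmd.commandId);
    if (prior !== undefined) {
      return prior; // duplicate delivery: replay the earlier ack
    }
    const ack: Ack = this.healthy ? "applied" : "failed";
    if (ack === "applied") {
      this.applyCount += 1;
    }
    this.seen.set(cmd.commandId, ack);
    return ack;
  }
}

interface RolloutResult {
  completed: boolean;
  appliedTo: string[];
  haltedAtBatch: number | null;
}

// Send the command batch by batch and stop as soon as a batch's
// failure rate exceeds the threshold, leaving later batches untouched.
function stagedRollout(
  devices: DeviceSimulator[],
  cmd: Command,
  batchSize: number,
  maxFailureRate: number
): RolloutResult {
  if (batchSize <= 0) {
    throw new RangeError("batchSize must be positive");
  }
  const appliedTo: string[] = [];

  for (let start = 0, batch = 0; start < devices.length; start += batchSize, batch++) {
    const slice = devices.slice(start, start + batchSize);
    let failures = 0;

    for (const device of slice) {
      if (device.handle(cmd) === "applied") {
        appliedTo.push(device.deviceId);
      } else {
        failures += 1;
      }
    }

    if (failures / slice.length > maxFailureRate) {
      return { completed: false, appliedTo, haltedAtBatch: batch };
    }
  }

  return { completed: true, appliedTo, haltedAtBatch: null };
}
```

In this sketch the blast radius of a bad change is bounded by the batches processed before the halt, and because handling is keyed on `commandId`, retrying after a lost acknowledgement is safe.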

Outcome

Operators gained a current, queryable view of fleet health and the ability to make changes in controlled batches with a known blast radius. Telemetry costs became predictable as volume grew, and incident response shifted from manual investigation to dashboards and targeted action.

Technologies

AWS IoT Core · telemetry pipeline · stream processing · device-state modeling · staged remote operations · TypeScript.