Edge AI · Thermal-Aware Dynamic Scaling · Senior Thesis Project
Adaptive Edge AI Controller
A hardware-aware control layer for NVIDIA Jetson Orin NX that uses real-time thermal telemetry, FOPDT prediction, and fuzzy logic to dynamically scale YOLOv8 inference parameters, preventing thermal shutdowns.
Overview
Running YOLO and real-time vision workloads on Jetson-class edge devices is not only a peak-performance problem; it is a long-duration reliability problem. In security cameras, robotic platforms, and unattended field systems expected to operate continuously, sustained inference load gradually accumulates heat, risking thermal throttling, performance degradation, and sudden device crashes.
Adaptive Edge AI Controller introduces a software control layer that sits above the inference loop. By querying hardware temperature sensors in real time and predicting near-future thermal pressure via a First-Order Plus Dead Time (FOPDT) model, it dynamically manages inference resolution (imgsz) and frame-processing ratios (percentage) using a fuzzy-logic controller. The result is a self-regulating system that maintains stable operation without triggering emergency shutdowns.
Continuous closed-loop run on Jetson hardware
Synchronized CSV logs of temperatures, load, and FPS
Maintained safely below the 85 °C emergency threshold
Stabilized within warning and critical operating bands
Demonstration Video
Closed-loop thermal adaptation in action.
Watch the controller manage a real-time YOLOv8 human detection pipeline on Jetson hardware for over 2 hours, dynamically switching between operating modes to maintain thermal equilibrium.
Real-time YOLOv8 pipeline demonstrating live parameter scaling based on device telemetry.
The Problem
Thermal instability in unattended Edge AI nodes.
Real-time object detectors continuously stress GPU, CPU, memory, and hardware decoders. On compact edge systems, this sustained load generates significant heat. If unchecked, the device enters thermal throttling, creating erratic latency spikes and frame drops. In severe cases, the operating system locks up entirely or performs a hard shutdown, requiring manual physical intervention on site.
Developer Forum Reports & Field Evidence
| Source | Reported Problem | Implication |
|---|---|---|
| NVIDIA Developer Forums | GPU temperature climbs to 70 °C within 10 minutes of running YOLOv8 on Jetson Nano, causing stability concerns for outdoor deployments. | Long-duration field operations require temperature-trend-aware scaling rather than static configurations. |
| NVIDIA Jetson Forum | Embedded system shuts down completely after 15–20 minutes of real-time object detection from RTSP stream due to heatsink overheating. | Model optimization is insufficient on its own; application-level thermal safety must be built into the runtime. |
| Ultralytics Community | Pipeline freezes completely within 1–2 hours under multi-camera YOLO inference, necessitating physical power cycling. | Unattended remote nodes (e.g., security cameras, outdoor sensors) require proactive self-healing mechanisms. |
| NVIDIA Developer Forums | DeepStream pipeline experiences frame delays and thermal throttling at 68–70 °C on newer Jetson Orin Nano Super hardware. | Sustained inference stresses the thermal envelope of even latest-generation compact edge AI hardware. |
A/B Evaluation
Self-regulation vs. hardware-level failure.
Without the controller, the system passively waits for operating-system throttling or system lockup. With the controller, the application proactively adapts its compute needs to maintain stability.
Thermal Behavior
Proactive vs. Passive
Without controller: GPU temperature rises unchecked, leading to severe thermal stress and eventual hardware safety shutoffs.
With controller: Workload is actively scaled back as temperatures cross thresholds, keeping the system below the 85 °C emergency line.
Throughput (FPS)
Stable vs. Erratic
Without controller: Throughput starts high, then suffers erratic drops and latency spikes as OS-level DVFS throttling lowers clock speeds.
With controller: A controlled trade-off. Workload levels are adjusted smoothly, keeping pipeline frame rates predictable and stable.
System Uptime
Continuous vs. Bounded
Without controller: High risk of system lockups, kernel freezes, or sudden shutdowns after 15 to 60 minutes of heavy YOLO execution.
With controller: Continuous operation for more than 130 minutes, verified under high ambient load without a single thermal freeze or crash event.
Performance Loss
Deterministic vs. Random
Without controller: Unpredictable and arbitrary. The operating system decides which components to throttle and when.
With controller: Managed degradation. The application chooses which dimension (resolution or frame count) to sacrifice in order to keep running.
Operator Needs
Autonomous vs. Manual
Without controller: High. Frozen field devices must be reset or physically cooled by an operator.
With controller: Zero. The system self-regulates in real time and automatically restores full parameters once the device cools down.
Operating Modes
Multi-stage thermal-region traversal.
Safe Region
GPU Temp < 70 °C
Full-performance mode. YOLO inference runs at maximum quality (imgsz=640) and processing ratio (percentage=1.0) to prioritize detection accuracy.
Warning Region
GPU Temp 70 °C – 80 °C
Gradual scaling mode. The fuzzy-logic controller dynamically scales down input resolution (imgsz) and limits processed frames to prevent heat buildup.
Critical Region
GPU Temp 80 °C – 85 °C
Aggressive scaling mode. The controller enforces strict workload reduction (processed frame ratio drops sharply) to arrest the upward temperature trend.
Emergency Region
GPU Temp >= 85 °C
Safety override mode. Bypasses the fuzzy loop to force immediate, hard-coded limits (imgsz=320, percentage=0.25) to protect the device from thermal damage.
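The four regions can be expressed as a small lookup with an emergency override. The sketch below is illustrative only: the names `classify_region`, `Decision`, and `decide` are hypothetical, and it assumes each region includes its lower bound (70, 80, and 85 °C), which the ranges above do not state explicitly.

```python
from dataclasses import dataclass

# Region thresholds from the operating modes above (degrees Celsius).
WARNING_C = 70.0
CRITICAL_C = 80.0
EMERGENCY_C = 85.0


@dataclass(frozen=True)
class Decision:
    """Hypothetical container for the two control levers."""
    imgsz: int
    percentage: float


FULL_PERFORMANCE = Decision(imgsz=640, percentage=1.0)
EMERGENCY_LIMITS = Decision(imgsz=320, percentage=0.25)


def classify_region(gpu_temp_c: float) -> str:
    """Map a GPU temperature to a region name (lower bounds assumed inclusive)."""
    if gpu_temp_c >= EMERGENCY_C:
        return "emergency"
    if gpu_temp_c >= CRITICAL_C:
        return "critical"
    if gpu_temp_c >= WARNING_C:
        return "warning"
    return "safe"


def decide(gpu_temp_c: float, fuzzy_decision: Decision) -> Decision:
    """Pick the final decision: fixed values at the extremes, fuzzy output in between."""
    region = classify_region(gpu_temp_c)
    if region == "emergency":
        return EMERGENCY_LIMITS   # bypass the fuzzy loop entirely
    if region == "safe":
        return FULL_PERFORMANCE   # no scaling needed
    return fuzzy_decision         # warning or critical: trust the fuzzy controller
```

Keeping the override outside the fuzzy path means the hard limits still apply even if the fuzzy stage produces an unexpected value.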
Control Architecture
A closed-loop software control cycle.
The inference loop queries the latest control decisions before each frame. The controller runs as a parallel thread, updating metrics and resolving fuzzy parameters periodically.
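One way to arrange this split is a lock-protected value that the controller thread publishes and the inference loop reads before each frame. The sketch below is an assumption about the structure, not the project's code: `SharedControl`, `ControlDecision`, and `controller_loop` are hypothetical names, and the simple threshold rule stands in for the real predictor and fuzzy stages.

```python
import threading
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlDecision:
    """Immutable snapshot of the two control levers."""
    imgsz: int = 640
    percentage: float = 1.0


class SharedControl:
    """Thread-safe holder for the most recent decision."""

    def __init__(self):
        self._lock = threading.Lock()
        self._decision = ControlDecision()

    def publish(self, decision):
        with self._lock:
            self._decision = decision

    def latest(self):
        with self._lock:
            return self._decision


def controller_loop(shared, read_temp_c, stop_event, period_s=0.5):
    """Periodically read the temperature and publish a new decision.

    `read_temp_c` is any zero-argument callable returning degrees Celsius.
    The rule below is a placeholder for the FOPDT and fuzzy stages.
    """
    while not stop_event.is_set():
        temp_c = read_temp_c()
        if temp_c >= 85.0:
            shared.publish(ControlDecision(imgsz=320, percentage=0.25))
        else:
            shared.publish(ControlDecision(imgsz=640, percentage=1.0))
        # Wait on the event rather than sleeping so shutdown is prompt.
        stop_event.wait(period_s)
```

The inference loop would call `shared.latest()` once per frame. Because the decision object is immutable, the loop never sees a half-updated pair of values.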
Directly queries NVIDIA Tegra sysfs nodes to extract GPU temperature, CPU utilization, and GPU load at sub-second intervals.
Stores recent samples to smooth sensor noise and compute short-horizon temperature derivatives (dT/dt) for trend analysis.
Estimates near-future thermal pressure by modeling the Jetson device as a First-Order Plus Dead Time dynamic process.
Translates thermal error and rate of change into continuous workload adjustment factors using Mamdani-style inference rules.
Monitors hardware thresholds and enforces hard override values when the GPU temperature reaches the emergency threshold (>= 85 °C).
Dynamically updates the YOLOv8 pipeline, adjusting input image size (imgsz) and processed frame ratio on the fly.
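The trend-analysis and fuzzy-inference stages above could be sketched as follows. This is a minimal, dependency-free illustration of Mamdani-style inference (min for AND, max aggregation, centroid defuzzification), not the project's actual rule base. The names `TrendWindow`, `trap`, and `fuzzy_scale`, the 70 °C setpoint, and every membership breakpoint are assumptions chosen for the example.

```python
from collections import deque


class TrendWindow:
    """Keep recent (time, temperature) samples and estimate dT/dt."""

    def __init__(self, maxlen=10):
        self._samples = deque(maxlen=maxlen)

    def add(self, t_s, temp_c):
        self._samples.append((t_s, temp_c))

    def rate(self):
        """Average slope in °C/s across the window (0.0 until two samples exist)."""
        if len(self._samples) < 2:
            return 0.0
        (t0, c0), (t1, c1) = self._samples[0], self._samples[-1]
        return (c1 - c0) / (t1 - t0) if t1 > t0 else 0.0


def trap(x, a, b, c, d):
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear in between."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


SETPOINT_C = 70.0  # assumed reference temperature for the error term


def fuzzy_scale(temp_c, rate_c_per_s):
    """Return a workload scale factor in [0.25, 1.0]."""
    error = temp_c - SETPOINT_C
    # Fuzzify the two inputs (breakpoints are illustrative).
    small = trap(error, -100, -99, 0, 6)
    medium = trap(error, 2, 7, 7, 12)
    large = trap(error, 8, 14, 100, 101)
    falling = trap(rate_c_per_s, -10, -9, -0.05, 0.0)
    steady = trap(rate_c_per_s, -0.05, 0.0, 0.0, 0.05)
    rising = trap(rate_c_per_s, 0.0, 0.05, 9, 10)

    # Rule base: AND is min; rules sharing an output set combine with max.
    fire_high = max(min(small, max(falling, steady)), min(medium, falling))
    fire_mid = max(min(small, rising), min(medium, steady))
    fire_low = max(min(medium, rising), large)

    # Clip each output set, aggregate with max, defuzzify by discrete centroid.
    num = den = 0.0
    for i in range(76):
        s = 0.25 + 0.01 * i
        mu = max(
            min(fire_low, trap(s, 0.0, 0.25, 0.25, 0.55)),
            min(fire_mid, trap(s, 0.35, 0.6, 0.6, 0.85)),
            min(fire_high, trap(s, 0.65, 1.0, 1.0, 1.3)),
        )
        num += s * mu
        den += mu
    return num / den if den > 0 else 1.0
```

Centroid defuzzification never reaches the exact ends of the output range, so a real controller would likely rescale or clamp the result before mapping it onto imgsz and the processed-frame ratio.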
Telemetry Analysis
Empirical validation of control effectiveness.
Telemetry recorded during a 130-minute test run demonstrates the controller successfully managing temperatures, keeping the system inside a stable thermal envelope.
Experiment A: Processed Frame Ratio Throttling (Percentage Lever)
This experiment isolates the processed-frame ratio lever, scaling it from 1.0 (all frames inferred) to 0.25 (1 in 4 frames inferred) at a fixed resolution of 640px.
| Control State | Average GPU Temp | Average FPS | Average GPU Load | Average CPU Load |
|---|---|---|---|---|
| percentage=1.0 (baseline) | 56.47 °C | 24.06 | 72.88% | 19.10% |
| percentage=0.25 (scaled) | 54.63 °C | 8.34 | 26.39% | 10.70% |
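The processed-frame ratio lever can be implemented with a simple credit accumulator, sketched below. `FrameGate` is a hypothetical name, and this is only one possible way to realise a ratio such as 0.25 (1 in 4 frames); the project may select frames differently.

```python
class FrameGate:
    """Decide per frame whether to run inference so that, over time,
    the fraction of inferred frames approaches `percentage`."""

    def __init__(self):
        self._credit = 0.0

    def should_infer(self, percentage):
        # Each incoming frame earns `percentage` credit; one full credit
        # pays for one inference. The ratio may change between calls.
        self._credit += percentage
        if self._credit >= 1.0:
            self._credit -= 1.0
            return True
        return False
```

With `percentage=0.25` the gate passes every fourth frame, and with `percentage=1.0` it passes every frame. Because the credit carries over, the ratio can be changed between frames without resetting the gate.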



Experiment B: Input Resolution Scaling (Image-Size Lever)
This experiment isolates the resolution control lever, scaling input width between 640px and 320px while keeping the frame-processing ratio fixed at 1.0.
| Control State | Average GPU Temp | Average FPS | Average GPU Load | Average CPU Load |
|---|---|---|---|---|
| resolution=640 (baseline) | 57.15 °C | 23.75 | 71.03% | 18.98% |
| resolution=320 (scaled) | 55.99 °C | 25.87 | 67.04% | 19.11% |
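When the resolution lever is driven by a continuous scale factor, the result has to be turned into a valid image size. The helper below is a sketch under two assumptions: the function name `scale_to_imgsz` is hypothetical, and sizes are snapped to multiples of 32 because YOLOv8 models use a maximum stride of 32.

```python
def scale_to_imgsz(scale, lo=320, hi=640, stride=32):
    """Map a scale factor in [0, 1] to an image size between lo and hi,
    rounded to the nearest multiple of `stride`."""
    scale = min(1.0, max(0.0, scale))            # clamp out-of-range inputs
    raw = lo + scale * (hi - lo)                 # linear interpolation
    snapped = int(round(raw / stride)) * stride  # nearest stride multiple
    return max(lo, min(hi, snapped))
```

For example, a scale of 0.5 maps to 480, halfway between the two resolutions used in this experiment.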



Offline First-Order Plus Dead Time Model Fits
The thermal response parameters were identified offline by applying step changes to each control lever and fitting a First-Order Plus Dead Time transfer function to the recorded temperature response, yielding the process gain (K), time constant (Tau), and dead time (Theta).
| Step Experiment | Process Gain (K) | Time Constant (Tau) | Dead Time (Theta) | RMSE | R² |
|---|---|---|---|---|---|
| FPS percentage step | 3.96 °C/u | 119.1 s | 0.0 s | 0.154 °C | 0.936 |
| Image-size step | 2.50 °C/u | 82.3 s | 5.2 s | 0.076 °C | 0.963 |
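For reference, a FOPDT process has the transfer function K·e^(−Theta·s) / (Tau·s + 1), and its response to a step input has a closed form. The sketch below evaluates that response using the image-size row of the table. The function name `fopdt_step` is hypothetical, and it treats "u" as one unit of actuator step, which the table does not define further.

```python
import math


def fopdt_step(t_s, t0_c, delta_u, gain, tau_s, theta_s):
    """Temperature at time t_s after a step of size delta_u applied at t = 0.

    Nothing happens during the dead time theta_s; afterwards the
    temperature approaches t0_c + gain * delta_u exponentially.
    """
    if t_s <= theta_s:
        return t0_c
    return t0_c + gain * delta_u * (1.0 - math.exp(-(t_s - theta_s) / tau_s))


# Image-size step parameters from the table above.
K, TAU, THETA = 2.50, 82.3, 5.2

# Predicted temperature one time constant after the dead time, starting
# from an assumed 55.0 °C baseline with a unit step.
predicted = fopdt_step(THETA + TAU, 55.0, 1.0, K, TAU, THETA)
```

One time constant after the dead time, the response has covered about 63% of the final change, so `predicted` is roughly 56.58 °C against a steady-state value of 57.5 °C.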
Target Systems
Practical deployment scenarios.
Smart Surveillance
Security cameras expected to run continuous target identification in outdoor enclosures under direct sunlight.
Robotics & UAVs
Compact mobile systems with strict battery limits, small physical footprints, and limited airflow cooling.
Traffic Analytics
Street-level camera nodes tracking vehicle and pedestrian volumes continuously without physical access.
Remote Edge Sensors
Unattended environmental and industrial monitoring stations where physical maintenance is costly or impossible.
My Contribution
I designed the closed-loop thermal experiments and implemented the real-time YOLOv8 GStreamer camera capture pipeline. I built the telemetry logger to sample Jetson Orin NX sysfs nodes, coded the FOPDT thermal predictor, and developed the Mamdani fuzzy-logic rules. I validated the control levers empirically, demonstrating that application-level self-regulation preserves system uptime.
Technology Stack