AeroLoop, Autonomous Aircraft Detection System

Context

Built for the 2025 Edge Impulse Hackathon over one month (31 Oct to 30 Nov).
The concept: a fully autonomous pipeline that collects audio, validates samples, and improves its own aircraft detection model with minimal human oversight.

This wasn’t a model project.
It was a system project from end to end.

The Problem

Detecting aircraft in noisy outdoor environments generates mountains of irrelevant data. A Pi recording 24/7 might capture 1 hour of aircraft and 23 hours of wind, traffic, and silence.

Manual annotation would take 23-34 hours per day.
Even worse: most negative samples are useless for training. What you need are hard negatives, the sounds that confuse the model, like machinery, cars, and construction.

The real question: How do you find hard negatives without drowning your annotators?

The Solution

Three techniques in one autonomous loop:

1. Ground Truth via Sensor Fusion
An RTL-SDR receiver decodes aircraft transponder (ADS-B) signals. When a plane enters a 3 km radius, the system triggers a 60-second recording. The aircraft’s presence becomes the label.
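
The range trigger can be sketched roughly as below. This is a minimal illustration, not the project's actual code: the function names, the receiver coordinates, and the record shape (dictionaries with `lat` and `lon` keys, as commonly seen in dump1090's JSON output) are assumptions, and the distance is horizontal only, ignoring altitude.

```python
import math

EARTH_RADIUS_KM = 6371.0
TRIGGER_RADIUS_KM = 3.0  # from the text: record when a plane is within 3 km


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def aircraft_in_range(receiver, aircraft, radius_km=TRIGGER_RADIUS_KM):
    """Return True if any aircraft with a known position is within radius_km.

    receiver is a (lat, lon) tuple; aircraft is a list of dicts (assumed shape).
    """
    for ac in aircraft:
        if "lat" not in ac or "lon" not in ac:
            continue  # transponder heard, but no position decoded yet
        if haversine_km(receiver[0], receiver[1], ac["lat"], ac["lon"]) <= radius_km:
            return True
    return False
```

Calling the same function with `radius_km=10.0` and negating the result would give the "no aircraft nearby" condition used for background recordings.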

2. Hard-Negative Mining at the Edge
When no aircraft is within 10 km, the system records 20 seconds of background noise and runs on-device inference immediately. If every prediction stays below 0.4 confidence, the sample is deleted. If any prediction reaches 0.4 or higher, the sample is kept, because model confusion is valuable training data.

Result: 91-98% of negatives rejected on-device. ~5.5 hours of annotation time saved.
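
The keep-or-delete rule is simple enough to write down directly. The sketch below assumes each clip yields a list of per-window "aircraft" confidence scores; the function names and the way the clip is split into windows are illustrative assumptions, while the 0.4 threshold comes from the description above.

```python
KEEP_THRESHOLD = 0.4  # from the text: keep the clip if any prediction is at least 0.4


def keep_negative(aircraft_scores, threshold=KEEP_THRESHOLD):
    """Keep a background clip if any window scores at or above the threshold."""
    return any(score >= threshold for score in aircraft_scores)


def drop_rate(decisions):
    """Fraction of clips deleted, given keep/delete booleans (True = kept)."""
    if not decisions:
        return 0.0
    return decisions.count(False) / len(decisions)
```

A clip scoring `[0.05, 0.12, 0.41]` would be kept as a hard negative, while `[0.05, 0.12, 0.39]` would be deleted on the device.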

3. Remote MLOps Pipeline
Automated workflow: trim audio, upload to Edge Impulse, retrain, evaluate, build TFLite, and deploy to the Pi only if accuracy improves. Runs autonomously via SSH and the Edge Impulse (EI) API.
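
The conditional deployment step is the part that keeps the loop safe, so here is a minimal sketch of that gate. `train_and_evaluate` and `deploy` are hypothetical callables standing in for the Edge Impulse API calls and the SSH transfer; nothing below reflects the real API.

```python
def run_iteration(current_accuracy, train_and_evaluate, deploy):
    """Retrain, then deploy only when test accuracy improves.

    train_and_evaluate: hypothetical callable returning the new test accuracy.
    deploy: hypothetical callable that pushes the new model to the Pi.
    Returns (accuracy of the model now in production, whether a deploy happened).
    """
    new_accuracy = train_and_evaluate()
    if new_accuracy > current_accuracy:
        deploy()
        return new_accuracy, True
    # No improvement: leave the currently deployed model untouched.
    return current_accuracy, False
```

Passing the steps in as callables keeps the gate testable without network access or hardware attached.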

System Architecture

Raspberry Pi 4 (The Collector)

Decodes ADS-B transponder signals (dump1090)
Records via Arduino Nano (USB microphone)
Runs TFLite inference for negative filtering
Stores validated samples with metadata

Arduino Nano 33 BLE Sense (The Target Device)

256 KB RAM / 1 MB flash constraint
Triple duty: collection mic, validation target, production deployment
Using the same hardware throughout ensures acoustic consistency

Local Machine (The Orchestrator)

Remote download via SSH
Streamlit annotation GUI with audio trimming
Automated training, evaluation, conditional deployment

Key Design Decisions

Audio Processing

2-second windows (fits Nano memory, <500 ms inference)
MFE DSP: 32,000 samples to 1,984 features
Compact CNN: 4x Conv2D to 128 neurons to 2-class output
Weighted moving average over 3 predictions for stable real-time output
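
A weighted moving average over the last three predictions could look like the sketch below. The weights `(1, 2, 3)`, which favour the newest prediction, are an assumption for illustration; the actual weights used in the project are not stated here.

```python
from collections import deque


class WeightedSmoother:
    """Weighted moving average over the most recent predictions."""

    def __init__(self, weights=(1.0, 2.0, 3.0)):  # assumed weights, oldest to newest
        self.weights = weights
        self.history = deque(maxlen=len(weights))

    def update(self, score):
        """Add a new confidence score and return the smoothed value."""
        self.history.append(score)
        # Until the buffer is full, use only the weights for the newest slots,
        # so the latest score always pairs with the largest weight.
        recent_weights = self.weights[-len(self.history):]
        total = sum(w * s for w, s in zip(recent_weights, self.history))
        return total / sum(recent_weights)
```

With these weights, a single spike of 0.9 after two silent windows smooths to about 0.45, so one noisy window is less likely to flip the output on its own.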

Training Strategy

50:50 train/test split initially (needed headroom to measure improvement)
Fixed test set throughout; as new training data arrived, the overall split shifted to 80:20 by project end
GPU training: <10 minutes per iteration

Results

Baseline (Iteration 0)

25 hours collection, 4.5 hours annotation
142.6 minutes total (40.5 aircraft, 102.1 negative)
94.73% test accuracy

Autonomous Loop (Iterations 1-4)

Test accuracy: 94.73% to 95.9%
Class distribution shifted naturally: 28:72 to 62:38, towards model weaknesses
Negative drop rate: 91-98%
Annotation time saved: 5.5 hours

The DSP Failure (Iterations 5-7)

Aircraft flight patterns changed (takeoffs vs. landings). Model adapted, but my custom DSP implementation diverged from Edge Impulse’s MFE block. On-device filtering broke (0% drop rate) while studio accuracy kept improving.

Root cause: the Pi’s 32-bit OS couldn’t run EI’s Linux SDK. My custom NumPy/SciPy DSP worked initially but failed as model sophistication increased.
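
One way to catch this kind of divergence early is a parity check: run the same clip through the custom DSP and compare the result with reference features for that clip. The sketch below is an assumption about how such a check might look, not something the project used; the function names and the tolerance are hypothetical, and the reference features would have to be obtained from Edge Impulse separately.

```python
def max_abs_diff(custom, reference):
    """Largest element-wise difference between two feature vectors."""
    if len(custom) != len(reference):
        raise ValueError("feature vectors differ in length")
    return max((abs(c - r) for c, r in zip(custom, reference)), default=0.0)


def dsp_matches(custom, reference, tolerance=1e-3):  # tolerance is an assumed value
    """True when the custom DSP output stays within tolerance of the reference."""
    return max_abs_diff(custom, reference) <= tolerance
```

Running a check like this on a handful of clips after every retrain would flag a mismatch before the on-device drop rate collapses.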

The lesson: Production systems need production-grade tooling.

Final Production Model

97.11% test accuracy (F1: 0.96 aircraft, 0.98 negative)
Total negatives rejected: 330.31 minutes (5.5 hours)
Annotation time savings: 73% (7.5 hours to 2 hours)
Real-world validation: Nano deployed at window, predictions aligned with FlightRadar24
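
The headline figures can be checked with a few lines of arithmetic. All numbers come from the results above; only the variable names are mine.

```python
rejected_minutes = 330.31
rejected_hours = rejected_minutes / 60

hours_without_filtering = 7.5
hours_with_filtering = 2.0
saving = (hours_without_filtering - hours_with_filtering) / hours_without_filtering

print(f"{rejected_hours:.1f} hours of negatives rejected")  # 5.5 hours of negatives rejected
print(f"{saving:.0%} annotation time saved")                # 73% annotation time saved
```

The 5.5-hour gap between the two annotation figures matches the 5.5 hours of audio rejected on-device.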

Why This Matters

Traditional ML: collect everything, label everything, train, then hope it works

AeroLoop: deploy model, collect only what improves it, retrain automatically, repeat

This approach scales. The 73% reduction in annotation time over 60 hours isn’t the story. The story is a self-improving system that respects human time, hardware constraints, and real-world messiness.

Domain Transferability

The methodology generalises to any domain with:

Ground-truth sensor (SDR, GPS, camera, scheduled events)
Target sensor (microphone, IMU, accelerometer)
Sparse events in noisy data

Hypothetical applications: gunshot detection, predictive maintenance, wildlife monitoring.