Illuminate IQ — How the System Learns
Three levels of intelligence. Compounding over time. Every site benefits from every other site.
SD-WAN Overwatch · Nepean Networks · April 2026
Core principle: Intelligence accumulates at three levels simultaneously. Node-level knowledge is site-specific and precise. SD-WAN group knowledge links connected sites. Fleet knowledge compounds across every site and every MSP — and flows back to benefit every individual node automatically.
🌐 Fleet Level
All sites · All partners · Global
Shared intelligence that protects every node simultaneously — without sharing any customer data
Fleet Patterns: 39+ active patterns. Auto-suppress known-benign events before any AI call. 5+ confirmations at 75%+ confidence to activate.
Threat Signatures: Confirmed threat patterns from MSP feedback. Once Hikvision→China Telecom is confirmed as a threat (3+ confirmations at 85%+ confidence), every Hikvision device is protected at every site.
False Positive Library: Confirmed false positives suppressed fleet-wide. Once a Microsoft Teams bandwidth spike has been marked OK often enough to activate the pattern, it never alerts again at any site.
Partner Campaigns: Same suspicious IP/ASN/country at 2+ sites within 2 hrs = campaign alert. Detects coordinated attacks that individual sites would miss.
Carrier Events: Same WAN anomaly at 10+ nodes globally within 60 mins = ISP fault, not site-specific. Context added to all affected alerts automatically.
Threat Intel Cache: AbuseIPDB + OTX + GreyNoise + Talos enrichment. Known malicious IPs escalated to P1. Known scanners auto-suppressed.
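The carrier-event rule above (same WAN anomaly at 10+ nodes within 60 minutes) can be illustrated with a sliding window over timestamps. This is a minimal sketch, not the production logic: the function name `is_carrier_event` and the `(node_id, timestamp)` input shape are assumptions made for illustration; only the two thresholds come from the text.

```python
from datetime import datetime, timedelta

CARRIER_MIN_NODES = 10                  # "10+ nodes" from the text
CARRIER_WINDOW = timedelta(minutes=60)  # "within 60 mins" from the text


def is_carrier_event(events, min_nodes=CARRIER_MIN_NODES, window=CARRIER_WINDOW):
    """events: list of (node_id, timestamp) for ONE WAN anomaly type.

    Returns True if at least `min_nodes` distinct nodes reported the
    anomaly inside any time span no longer than `window`.
    """
    events = sorted(events, key=lambda e: e[1])
    start = 0
    in_window = {}  # node_id -> number of its events currently in the window
    for node_id, ts in events:
        in_window[node_id] = in_window.get(node_id, 0) + 1
        # Drop events from the left until the window is at most `window` long.
        while ts - events[start][1] > window:
            old = events[start][0]
            in_window[old] -= 1
            if in_window[old] == 0:
                del in_window[old]
            start += 1
        if len(in_window) >= min_nodes:
            return True
    return False
```

Counting distinct nodes rather than raw events keeps one flapping site from looking like an ISP-wide fault.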
↕
🔗 SD-WAN Group Level
Connected sites · Same customer fabric
Correlation across nodes that share an SD-WAN fabric — lateral movement, carrier faults, shared incidents
Shared Incidents: Same anomaly type at 2+ nodes in same SD-WAN group within 30 mins. Likely carrier fault or shared threat: one incident, not one alert per site.
Lateral Movement: Unusual traffic volumes between nodes in the same SD-WAN fabric. Compromise at one site spreading to another.
Inter-Site Traffic: New or anomalous traffic between sites that have never communicated before. Potential staging or exfiltration path.
Subnet Discovery: LAN subnets per node derived from flow data and cached. Enables accurate inter-site traffic detection without manual configuration.
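To show how cached subnets make inter-site detection possible, here is a small sketch using Python's standard `ipaddress` module. The `SUBNETS` cache, the site names, and the function names are all hypothetical; the text only states that LAN subnets are derived from flow data and cached per node.

```python
import ipaddress

# Hypothetical subnet cache: node -> LAN subnets learned from flow data.
SUBNETS = {
    "site-a": [ipaddress.ip_network("10.1.0.0/24")],
    "site-b": [ipaddress.ip_network("10.2.0.0/24")],
    "site-c": [ipaddress.ip_network("10.3.0.0/24")],
}


def site_for_ip(ip, subnets=SUBNETS):
    """Return the site whose cached subnets contain `ip`, or None."""
    addr = ipaddress.ip_address(ip)
    for site, nets in subnets.items():
        if any(addr in net for net in nets):
            return site
    return None


def new_inter_site_pair(src_ip, dst_ip, seen_pairs, subnets=SUBNETS):
    """Return (src_site, dst_site) if this flow links two sites that have
    not communicated before; otherwise return None.

    `seen_pairs` is a mutable set of frozensets and is updated in place.
    """
    src = site_for_ip(src_ip, subnets)
    dst = site_for_ip(dst_ip, subnets)
    if src is None or dst is None or src == dst:
        return None  # not inter-site traffic
    pair = frozenset((src, dst))
    if pair in seen_pairs:
        return None  # these sites already talk to each other
    seen_pairs.add(pair)
    return (src, dst)
```

Storing each pair as a `frozenset` treats A→B and B→A as the same relationship, which is one possible design choice; a direction-sensitive detector would store ordered tuples instead.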
↕
📍 Node Level
Individual site · Highly specific
Site-specific baselines built from scratch. Every device and every link gets its own precise behavioural profile.
Device Baselines: Per-device: seen countries, ASNs, applications, daily bytes out (mean + stddev), hourly activity patterns. 288 time buckets per metric.
Link Baselines: Per-WAN-link: latency, jitter, packet loss per 5-minute time bucket. Monday 9am has different normal than Sunday 2am.
App Profiles: Per-site: top applications by bandwidth, device count, daily volume. Baseline for what "normal" application mix looks like.
Historical Context: 30-day rolling window: weekly alert trends, anomaly frequency, destination IP history, MSP feedback summary. Fed to Claude on every investigation.
Device Inventory: 636+ devices across 11 sites. OUI vendor, device class, observation days, countries seen, ASNs seen.
Incident Memory: Open incidents tracked with full timeline, device MACs, destinations, bytes. Context for every new alert at the same site.
The compounding effect: Knowledge flows upward — node feedback updates fleet patterns. Knowledge flows downward — fleet patterns protect every node. A new site deployed today immediately benefits from everything every other site has ever learned.
Per-node learning timeline: Every Antares node starts knowing nothing about its own network. Over 14 days it builds a precise baseline. Fleet intelligence arrives from day one and continuously enriches local detection.
Per-node learning timeline
Day 1 — 13
Learning phase: data collection only · NO ALERTS
Every flow event from Illuminate is consumed and stored. Device baselines are building: which countries each device talks to, which applications it uses, how much data it moves, and what time of day it is active. WAN link metrics are recorded to establish latency, jitter, and loss norms per 5-minute time bucket. No anomaly alerts fire because the baseline doesn't exist yet. Security-critical events (crypto mining, Tor, known malicious IPs) bypass the learning phase and alert from day one.
Node learning · Fleet patterns active immediately
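The learning-phase gate described above can be expressed in a few lines. This is a sketch under stated assumptions: the function name and the string labels in `SECURITY_CRITICAL` are hypothetical stand-ins for the event types named in the text (crypto mining, Tor, known malicious IPs); only the 14-day threshold and the bypass rule are taken from the text.

```python
LEARNING_DAYS = 14  # detection goes live on day 14, per the timeline

# Hypothetical labels for the security-critical event types named in the text.
SECURITY_CRITICAL = {"crypto_mining", "tor", "known_malicious_ip"}


def should_alert(day_number, anomaly_type):
    """Decide whether an anomaly may raise an alert.

    day_number: 1-based day since the node was deployed.
    Security-critical events always alert; baseline anomalies only
    alert once the learning phase is over.
    """
    if anomaly_type in SECURITY_CRITICAL:
        return True
    return day_number >= LEARNING_DAYS
```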
Day 14
Baseline unlocked: detection goes live · ALERTS ON
After 14 days, the system has seen each device across two full weekly cycles — Monday morning traffic vs Saturday night. Anomaly detection switches on. The first alerts start firing against real deviations from the established baseline. Fleet patterns are already suppressing the most common false positives.
Node baseline active · SD-WAN group correlation active · Fleet patterns enriching
Day 14 — 30
30-day historical context builds
As the site accumulates alert history, the Historical Context module gains substance. Claude now sees weekly alert trends, anomaly frequency by type, and destination IP history when investigating new anomalies. "This IP has been seen 200 times in 30 days and always dismissed" is now available context — reducing false positives significantly.
Historical context filling · Fleet feedback accumulating
Day 14 — 90
Baseline refinement — false positives declining
The baseline continues updating via exponential moving average (EMA, alpha = 0.05). Seasonal patterns settle. If a customer installs a new application permanently, the baseline adapts and stops alerting within 1–2 weeks. MSP feedback actively shapes sensitivity. Fleet intelligence continuously enriches local detection — known threat patterns from other sites arrive and immediately improve detection quality.
Node baseline maturing · Fleet patterns: confirmed patterns propagating
Day 90+
Mature baseline: high confidence detection · FULL COVERAGE
Three months of history. The baseline is highly stable. Alerts are precise. The system has enough data to distinguish genuine anomalies from seasonal variation, business events, and infrastructure changes. A new device appearing is immediately compared against 90 days of normal behaviour for every other device of the same type across the fleet. WAN baselines now cover all 288 daily time buckets with multiple samples per bucket.
Mature node · Group intelligence mature · Fleet-enriched
Ongoing
Continuous adaptation
The EMA smoothing factor means the baseline adapts slowly to permanent changes while still detecting sudden deviations. A camera that starts legitimately communicating with a new CDN will gradually absorb that into its profile over ~3 weeks. The same camera suddenly sending 2 GB to an unknown ASN overnight fires immediately — the deviation is too large and too sudden to be normal adaptation. Claude now has 30+ days of historical context, full device inventory, open incidents, and fleet intelligence for every investigation.
Node · Group · Fleet
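The interplay between slow adaptation and sudden-deviation detection can be sketched with an exponentially weighted mean and variance. The text gives alpha = 0.05 and a mean + 4σ rule for daily outbound volume; the exact variance recurrence used here (a standard incremental EWMA form) and the function names are assumptions, not the documented implementation.

```python
import math

ALPHA = 0.05           # EMA smoothing factor from the text
SIGMA_THRESHOLD = 4.0  # flow_volume_spike fires above mean + 4 sigma


def update_baseline(mean, var, x, alpha=ALPHA):
    """Return the (mean, variance) pair after observing value x.

    Uses the standard incremental exponentially weighted update:
    each new observation moves the mean by alpha * (x - mean).
    """
    delta = x - mean
    new_mean = mean + alpha * delta
    new_var = (1 - alpha) * (var + alpha * delta * delta)
    return new_mean, new_var


def is_volume_spike(mean, var, x, k=SIGMA_THRESHOLD):
    """True when x exceeds the baseline mean by more than k standard deviations."""
    return x > mean + k * math.sqrt(var)
```

With a small alpha, a single day's observation shifts the mean by only 5% of the gap, so a gradual change is absorbed over many days while a one-off jump of several orders of magnitude still lands far outside the 4σ band.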
Each data element has a different learning mechanism, retention policy, and scope of application. Some things are learned per-device. Some per-link. Some are inherited from fleet intelligence on day one.
Device behaviour baselines (per-device, per-site)
| What is learned | How | Stored as | Used for |
|---|---|---|---|
| Destination countries: which countries each device talks to | EMA frequency weight per country. Alpha = 0.05. Threshold for "normal" = 3%. | seen_countries JSONB {"AU":0.91,"US":0.07} | flow_geo_new: flag first-ever contact with country below threshold. P2 if high-risk country. |
| Destination ASNs: which autonomous systems it reaches | EMA frequency weight per ASN. Normalised separately from country. | seen_asns JSONB {"AS13335":0.45,"AS16509":0.33} | flow_volume_spike context. Escalated if ASN is in high-risk list. |
| Applications used: Illuminate-identified app categories | EMA frequency weight per application identifier. | seen_applications JSONB | app_bw_spike: flag when single app exceeds 5× normal bandwidth share. |
| Daily outbound volume: how much data each device normally sends | Rolling mean + standard deviation updated via EMA each day. | daily_bytes_out_mean, daily_bytes_out_stddev | flow_volume_spike: fire when current day exceeds mean + 4σ. |
| Device identity: what type of device this is | OUI lookup from MAC address prefix. Manual labels from MSP via commentary. | oui_vendor, oui_class, observation_days | Alert narratives, fleet pattern matching, NLQ device queries, incident context. |
| Destination IP history: which specific IPs have been seen and what happened | Derived from anomalies + alert feedback over 30 days. Not a persistent baseline; queried on demand. | Queried from anomalies + alerts tables at investigation time | Historical context: "This IP seen 200× in 30 days, always dismissed → likely benign." Reduces false positives dramatically. |
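As an illustration of the EMA frequency weights stored in seen_countries, the sketch below updates a weight map once per day and checks a country against the 3% "normal" threshold. The update rule (blending yesterday's weight with today's share of flows) and the function names are assumptions; the text specifies only alpha = 0.05, the 3% threshold, and the JSONB shape.

```python
ALPHA = 0.05             # EMA smoothing factor from the table
NORMAL_THRESHOLD = 0.03  # weight below 3% is not yet "normal"


def update_weights(weights, todays_share, alpha=ALPHA):
    """Blend today's per-country share of flows into the stored weights.

    weights, todays_share: dicts of country code -> fraction in [0, 1].
    Countries absent from either dict are treated as 0.0.
    """
    keys = set(weights) | set(todays_share)
    return {
        k: (1 - alpha) * weights.get(k, 0.0) + alpha * todays_share.get(k, 0.0)
        for k in keys
    }


def is_geo_new(weights, country, threshold=NORMAL_THRESHOLD):
    """True when a country's learned weight is below the normal threshold."""
    return weights.get(country, 0.0) < threshold
```

Usage, starting from the example values in the table:

```python
weights = {"AU": 0.91, "US": 0.07}
print(is_geo_new(weights, "US"))  # US is above the 3% threshold
print(is_geo_new(weights, "RU"))  # RU has never been seen
```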
WAN link baselines (per-link, per-time-bucket)
| What is learned | How | Stored as | Used for |
|---|---|---|---|
| Latency by time of day: normal latency for each 5-minute window | 288 time buckets (5-minute slots). EMA mean + stddev per bucket. Monday 9am has its own baseline. | latency_mean[288], latency_stddev[288] | link_latency_high: fire when current > bucket mean + 3σ. P2 on primary link. |
| Packet loss by time of day: normal loss for each 5-minute window | Same 288-bucket EMA mechanism. Upstream loss tracked separately from downstream. | loss_mean[288], loss_stddev[288] | link_loss_high: fire when current > bucket mean + 3σ. compound:isp_fault when combined with latency. |
| Failover history: how often this link has failed over | Counter per 24h and 7d window. Flap detection when multiple failovers in short window. | failover_count_24h, failover_count_7d, flap_detected | compound:link_flapping: fire when failover rate exceeds normal frequency. |
| Conntrack size: normal connection table size | EMA per time bucket. Conntrack spikes indicate port scanning, DDoS, or connection floods. | conntrack_mean[288] | compound:conntrack_spike, compound:ddos_inbound detection. |
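The 288-bucket scheme follows from 24 hours × 12 five-minute slots. A sketch of the bucket lookup and the mean + 3σ latency check is shown below; the function names are hypothetical, and this version indexes by time of day only, since the text does not specify how day-of-week is distinguished.

```python
from datetime import datetime

BUCKETS_PER_DAY = 288  # 24 hours * 12 five-minute slots
SIGMA_THRESHOLD = 3.0  # link_latency_high fires above bucket mean + 3 sigma


def bucket_index(ts):
    """Map a timestamp to its 5-minute slot of the day (0..287)."""
    return (ts.hour * 60 + ts.minute) // 5


def latency_high(latency_mean, latency_stddev, ts, current_ms, k=SIGMA_THRESHOLD):
    """Compare the current latency with the baseline for this time bucket.

    latency_mean, latency_stddev: sequences of length 288, one entry per bucket.
    """
    b = bucket_index(ts)
    return current_ms > latency_mean[b] + k * latency_stddev[b]
```

Comparing against the bucket for the current time of day means a link that is routinely slower at 9am does not alert for being slower than its 2am baseline.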
Fleet-level intelligence (applies to every node immediately)
| What is learned | Activation threshold | Effect |
|---|---|---|
| Known-benign patterns, e.g. Synology NAS → Microsoft Azure = not a concern | 5+ confirmations across fleet at ≥ 75% confidence | Auto-suppressed before any AI call. No alert. No MSP fatigue. Zero cost. |
| Known-threat patterns, e.g. device → specific malicious ASN = confirmed threat | 3+ confirmations at ≥ 85% confidence | Immediate P1/P2 alert. Narrative references fleet confirmation. MSP acts faster with higher confidence. |
| Threat intel enrichment: AbuseIPDB, OTX, GreyNoise, Talos per destination IP | Queried for every novel anomaly that survives pre-filter | Known malicious IPs → P1. GreyNoise scanners → suppress. OTX C2/botnet → force escalation. |
| Partner campaigns: same suspicious IP/ASN at 2+ customer sites within 2hrs | 2+ sites, within 120 minutes, novel anomaly type only | Campaign alert: "This IP contacted 3 of your customer sites simultaneously. Coordinated targeting likely." |
The fleet pattern library is the compounding intelligence asset. It grows with every confirmed alert and every marked false positive. The larger the fleet, the more precise it becomes — and the less work each individual MSP has to do.
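The activation thresholds for fleet patterns can be modelled as a small state check. This sketch assumes that "confidence" is the fraction of MSP feedback agreeing with the pattern's label; the text does not define how confidence is computed, so that formula, the `FleetPattern` class, and its field names are hypothetical. The confirmation counts and percentages are the ones quoted in the text.

```python
from dataclasses import dataclass

# (minimum confirmations, minimum confidence) per pattern kind, from the text.
THRESHOLDS = {
    "benign": (5, 0.75),
    "threat": (3, 0.85),
}


@dataclass
class FleetPattern:
    kind: str               # "benign" or "threat"
    confirmations: int = 0  # feedback agreeing with the pattern label
    rejections: int = 0     # feedback disagreeing with it

    @property
    def confidence(self):
        # Assumed definition: share of all feedback that confirms the pattern.
        total = self.confirmations + self.rejections
        return self.confirmations / total if total else 0.0

    @property
    def active(self):
        min_count, min_confidence = THRESHOLDS[self.kind]
        return self.confirmations >= min_count and self.confidence >= min_confidence
```

Requiring both a count and a confidence floor means a pattern with mixed feedback stays inactive even after many reviews.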
[Chart: Fleet intelligence coverage by fleet size. Series: threat signature coverage, false positive suppression, campaign detection accuracy, new node day-1 protection, carrier event detection.]
What a new node gets on day 1 vs day 90
New node: Day 1 · LEARNING
No local baseline yet. 14-day learning phase active.
✓ Fleet threat signatures (immediate)
✓ False positive suppression rules (immediate)
✓ Threat intel enrichment on novel IPs (immediate)
✓ SD-WAN group correlation (immediate)
○ Local device baselines (building...)
○ Local link baselines (building...)
○ 30-day historical context (building...)
○ Site-specific feedback shaping (pending)
Same node: Day 90 · FULL COVERAGE
Mature local baseline + full fleet intelligence.
✓ Fleet patterns (updated continuously)
✓ False positive suppression (site-tuned)
✓ Threat intel (4-source enrichment)
✓ SD-WAN group + partner campaigns
✓ Per-device baselines (90 days deep)
✓ Per-link baselines (all 288 time buckets)
✓ 30-day historical context (rich)
✓ Local feedback shaping sensitivity
The feedback flywheel: Every MSP interaction teaches the system. A "Not a Concern" click at one site counts toward a fleet pattern that, once activated, suppresses the same alert at every other site. The system gets smarter every time someone uses it, and the cost drops as it learns.
Alert → feedback → fleet improvement loop
① Anomaly detected
A device deviates from its local baseline, matches a known threat pattern, or a fleet campaign is detected. The statistical detector fires.
→
② Pre-filter check
97%+ of anomalies handled locally. Fleet patterns auto-dismiss known-benign events. P4 auto-dismissed. No AI call needed.
→
③ Claude investigation
Surviving anomalies sent to Claude with full context: 30-day history, device baselines, fleet patterns, WAN state, open incidents.
↓
④ Alert delivered to Antares
Plain-English narrative. Specific device MACs. Named destinations. Recommended actions. Severity P1–P4.
→
⑤ MSP reviews and responds
Confirm threat / mark false positive / ask a question via NLQ. Single click. Takes 3 seconds. Recorded against the anomaly pattern.
→
⑥ Site-level learning
Feedback recorded. Same pattern from same device at this site will not alert again (false positive) or will escalate faster (confirmed threat).
↓
⑦ Fleet pattern update
Confirmation increments the fleet pattern confidence score. Only pattern type, device class, destination ASN, and country are recorded; no customer data is shared.
→
⑧ Activation threshold
5+ confirmations across fleet at ≥ 75% confidence = pattern activates. Immediate effect on pre-filter for every node.
→
⑨ Every node protected
All 11 nodes (and every future node) now suppress this pattern before it reaches Claude. AI cost drops. MSP alert fatigue drops.
Real example from the live fleet
Sorrento Office 2 — IBM Cloud Japan traffic: Multiple alerts fired for high-volume traffic to AS2914 (NTT/IBM Cloud Japan). MSP reviewed and confirmed Not a Concern — this is a legitimate business application. After 5 confirmations across the fleet, the pattern activated globally. Now all 11 sites suppress the same pattern automatically. Claude never sees it again. Cost: zero. MSP effort: 15 seconds total across 5 reviews spread over 2 weeks.
Partner campaign detection (new in April 2026): If Rose and Crown and Freight Lines Welshpool both contact the same suspicious IP within 2 hours, a campaign alert fires — one alert covering both sites. This is invisible to any system that analyses each site independently. The partner_campaigns table tracks these cross-site detections without sharing any customer data between sites.
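The campaign rule (same indicator at 2+ sites within 120 minutes) can be sketched as a grouping step followed by a window check. The function name and the `(indicator, site, timestamp)` input shape are assumptions for illustration; how the partner_campaigns table is actually populated is not described in the text.

```python
from collections import defaultdict
from datetime import datetime, timedelta

CAMPAIGN_MIN_SITES = 2                    # "2+ sites" from the text
CAMPAIGN_WINDOW = timedelta(minutes=120)  # "within 120 minutes" from the text


def find_campaigns(sightings, min_sites=CAMPAIGN_MIN_SITES, window=CAMPAIGN_WINDOW):
    """sightings: iterable of (indicator, site, timestamp).

    Returns {indicator: sorted list of sites} for every indicator seen at
    `min_sites` or more distinct sites inside one time window.
    """
    by_indicator = defaultdict(list)
    for indicator, site, ts in sightings:
        by_indicator[indicator].append((ts, site))

    campaigns = {}
    for indicator, hits in by_indicator.items():
        hits.sort()
        for i, (start_ts, _) in enumerate(hits):
            # Distinct sites seen from this hit up to `window` later.
            sites = {site for ts, site in hits[i:] if ts - start_ts <= window}
            if len(sites) >= min_sites:
                campaigns[indicator] = sorted(sites)
                break
    return campaigns
```

Usage with the two sites named above and a documentation-range IP address:

```python
t0 = datetime(2026, 4, 1, 22, 0)
sightings = [
    ("203.0.113.7", "Rose and Crown", t0),
    ("203.0.113.7", "Freight Lines Welshpool", t0 + timedelta(minutes=90)),
]
print(find_campaigns(sightings))
```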
Natural language Q&A (new in April 2026): MSPs can ask plain-English questions from the Antares portal. "What are the Hikvision cameras doing?" now returns: 19 cameras identified by MAC address, their behaviour is normal (0–6 KB/day average), and the alarms come from unknown devices with randomised MACs. This answer is built from device baselines, alert history, incident data, and IoT class detection, all learned from the live network.