Signal Detection
siftlogd detects three classes of signal: cascade, anomaly, and silence. Each represents a distinct failure pattern that is difficult or impossible to spot by watching a single service in isolation.
Cascade
A cascade signal fires when one service begins failing and a downstream service follows, particularly when log events share a trace ID. siftlogd identifies the origin service and names the propagation chain in order.
[signal:cascade] auth-service -> api-gateway -> user-service
03:14:19.002 auth-service ERROR db connection pool exhausted
03:14:19.441 auth-service ERROR token validation timeout [trace: f8a21c]
03:14:19.887 api-gateway ERROR auth-service unavailable [trace: f8a21c]
03:14:20.103 api-gateway ERROR circuit breaker OPEN: auth-service
03:14:20.341 user-service ERROR upstream auth failure [trace: f8a21c]
noise suppressed: 61,204 events | signal: 5 events | elapsed: 0.8s
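The ordering logic above can be sketched in a few lines: group error events by trace ID, then read the propagation chain off the timestamps. This is an illustrative sketch, not siftlogd's actual implementation; the event tuples are taken from the example output.

```python
from collections import defaultdict

# Error events as (timestamp, service, trace_id), in arrival order
events = [
    ("03:14:19.441", "auth-service", "f8a21c"),
    ("03:14:20.341", "user-service", "f8a21c"),
    ("03:14:19.887", "api-gateway", "f8a21c"),
]

by_trace = defaultdict(list)
for ts, service, trace in events:
    by_trace[trace].append((ts, service))

# The propagation chain is the services in timestamp order
# (string sort works because the timestamps share one fixed format)
chain = [svc for _, svc in sorted(by_trace["f8a21c"])]
print(" -> ".join(chain))  # auth-service -> api-gateway -> user-service
```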
What to do: Start at the origin. cascade_from in the signal record tells you which service failed first. The downstream services are victims — restarting them will not resolve the underlying problem.
Tuning: correlation.window_ms controls how close in time two services must be failing to be considered a cascade.
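A minimal sketch of that setting in the daemon's YAML config. Only the correlation.window_ms key is documented here; the surrounding layout and the 5000 ms value are assumptions for illustration.

```yaml
# Assumed layout; only correlation.window_ms is documented
correlation:
  window_ms: 5000   # services failing within 5s of each other can form a cascade
```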
Anomaly
An anomaly signal fires when a service’s error rate spikes significantly beyond its own historical baseline. siftlogd adapts to each service’s normal behavior — a service running at 2% error rate under load will not trigger just because it is busy.
[signal:anomaly] order-service
error rate 8x above baseline (0.4/min -> 3.2/min)
Tuning: Lower signal.anomaly_threshold_multiplier to catch smaller spikes earlier. The default of 3.0 fires on a 3x increase; a value of 10.0 fires only on a 10x increase. Raise it for services with naturally spiky traffic.
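The threshold comparison can be sketched as follows. This is a hypothetical helper illustrating the rule, not siftlogd's internals; the multiplier is passed explicitly.

```python
def is_anomaly(current_rate: float, baseline_rate: float, multiplier: float) -> bool:
    """Fire when the current error rate reaches multiplier x the baseline."""
    if baseline_rate <= 0:
        return False  # no baseline learned yet; never fire on a cold start
    return current_rate >= multiplier * baseline_rate

# order-service example: 0.4/min baseline spiking to 3.2/min is an 8x jump
print(is_anomaly(3.2, 0.4, 3.0))   # True: fires at a 3.0 multiplier
print(is_anomaly(3.2, 0.4, 10.0))  # False: a 10.0 multiplier needs a 10x spike
```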
Silence
A silence signal fires when a service that normally produces steady log traffic goes unexpectedly quiet. Silence is often the hardest failure mode to catch because it produces no errors to alert on.
[signal:silence] inventory-service
volume dropped 99% from baseline (847 -> 12 events/min)
What silence usually means:
- A process has hung silently — no crash, no errors, just stopped
- A deployment went wrong and the new instance is not logging
- A log shipper has stopped forwarding events
- A queue consumer has stalled
Tuning: Lower signal.silence_threshold_pct to detect smaller volume drops. At 70.0, a service that drops to 30% of normal volume will trigger.
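The drop check can be sketched the same way. A hypothetical helper, not siftlogd's code; 70.0 is used as an illustrative threshold matching the example above.

```python
def is_silent(current_per_min: float, baseline_per_min: float,
              threshold_pct: float = 70.0) -> bool:
    """Fire when log volume has dropped by at least threshold_pct from baseline."""
    if baseline_per_min <= 0:
        return False  # no traffic expected; nothing to go quiet
    drop_pct = 100.0 * (1.0 - current_per_min / baseline_per_min)
    return drop_pct >= threshold_pct

# inventory-service example: 847 -> 12 events/min is roughly a 99% drop
print(is_silent(12, 847))   # True: far past the 70% threshold
print(is_silent(800, 847))  # False: only about a 5% dip
```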
Alert thresholds
siftlogd uses two independent thresholds: one for terminal signal detection, one for outbound alerting. The terminal shows everything the correlator detects. Email fires only when a higher, separately configured threshold is crossed.
alerts:
  email:
    to: oncall@company.com
    smtp: smtp.company.com:587
  notify_on:
    cascade: true
    anomaly_multiplier: 5.0   # alert threshold, higher than detection threshold
    silence_minutes: 10       # alert only after 10 min silence
Signal history
All signals are written to ~/.siftlogd/signals.db (SQLite). Query with any SQLite client:
-- Last 20 signals
SELECT created_at, signal_type, service, cascade_from
FROM signals
ORDER BY id DESC
LIMIT 20;
-- All cascades in the last hour
SELECT created_at, service, cascade_from, trace_id
FROM signals
WHERE signal_type = 'cascade'
  AND created_at > datetime('now', '-1 hour');
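The same queries work from Python's built-in sqlite3 module. The sketch below builds a throwaway in-memory table with the columns named above (the real schema may have more) to demonstrate the cascade query:

```python
import sqlite3

# Throwaway in-memory database mirroring the documented columns (schema assumed)
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE signals (
    id INTEGER PRIMARY KEY,
    created_at TEXT DEFAULT (datetime('now')),
    signal_type TEXT, service TEXT, cascade_from TEXT, trace_id TEXT)""")
conn.execute(
    "INSERT INTO signals (signal_type, service, cascade_from, trace_id) "
    "VALUES ('cascade', 'api-gateway', 'auth-service', 'f8a21c')")

# All cascades in the last hour
rows = conn.execute(
    "SELECT service, cascade_from FROM signals "
    "WHERE signal_type = 'cascade' AND created_at > datetime('now', '-1 hour')"
).fetchall()
print(rows)  # [('api-gateway', 'auth-service')]
```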
Next: REST API