The Symbolic System Is the Labeler
The system that decides is also the system that labels. When a symbolic rule in our agent graduates — earns its place by firing well under reinforcement — it does not just act. It emits a training example: the signal features at firing time, paired with the pattern it matched. That pair is a supervised label, produced for free by a deterministic system that was never asked to be a labeler.
This note describes the mechanism and where it stands. It is an approach, not a result: the neural layer that consumes these labels is in development, and we have no generalization numbers to report yet.
Two speeds, and what is not the claim
The agent learns at two speeds. A fast brain updates online: every few minutes it takes a scraper event, fuzzy-matches it against its rules, computes a firing strength, and updates rule weights, all on unlabeled signals. A slow brain trains offline on a much longer cycle, on labeled data only.
The two-speed split is not new, and we do not claim it. It echoes hierarchical RL, complementary learning systems (McClelland et al., 1995), and Bengio’s consciousness-prior framing. The contribution is narrower and sits between the two brains: where the labels come from.
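The fast loop described above can be pictured with a small sketch. Everything specific here is an assumption made for illustration: the `ramp` membership function, the learning rate, the reward signal, and the `GRADUATION_THRESHOLD` value are hypothetical and are not taken from the agent itself.

```python
from dataclasses import dataclass

GRADUATION_THRESHOLD = 0.9  # hypothetical cutoff; the note does not state one


def ramp(x: float, low: float, high: float) -> float:
    """Fuzzy membership that rises linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


@dataclass
class Rule:
    pattern: str
    weight: float = 0.5      # hypothetical starting weight
    graduated: bool = False

    def update(self, strength: float, reward: float, lr: float = 0.2) -> None:
        # Move the weight toward the reward, scaled by how strongly the rule fired.
        # A rule that did not fire (strength 0) is left unchanged.
        self.weight += lr * strength * (reward - self.weight)
        if self.weight >= GRADUATION_THRESHOLD:
            self.graduated = True
```

In this sketch a rule that keeps firing at full strength and keeps being rewarded crosses the threshold after a handful of updates, which is the moment a labeled episode would be emitted.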
The mechanism
Each graduation event writes a labeled episode:
{
  "signal_features": {
    "event_type": "metric_pod_restart",
    "service": "pgbouncer",
    "restart_count": 15,
    "namespace": "default"
  },
  "pattern_label": "database_connection_failure",
  "confidence": 1.0,
  "source": "hitl"
}
The decision store already records pattern name and confidence; the addition is extracting the signal features at decision time and exporting accumulated episodes. Labels arrive from four sources — human-in-the-loop, incident retrospectives, an autonomous investigation engine, and pre-classified anomaly streams — so the dataset densifies with every pattern that graduates, rather than depending on hand-authored heuristics.
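A minimal sketch of the graduation hook and export step, assuming the record shape shown above. The function names `on_graduation` and `export_jsonl` are hypothetical, as are the source identifiers other than `"hitl"`, which is the only one that appears in the sample record. The JSON Lines export format is also an assumption.

```python
from __future__ import annotations

import copy
import json
from dataclasses import asdict, dataclass
from typing import Any

# Identifiers for the four label sources named in the text.
# Only "hitl" appears in the sample record; the other three names are assumed.
KNOWN_SOURCES = {"hitl", "retrospective", "investigation", "anomaly_stream"}


@dataclass(frozen=True)
class LabeledEpisode:
    signal_features: dict[str, Any]
    pattern_label: str
    confidence: float
    source: str


def on_graduation(signal: dict[str, Any], pattern: str,
                  confidence: float, source: str) -> LabeledEpisode:
    """Build one labeled episode at the moment a rule graduates."""
    if source not in KNOWN_SOURCES:
        raise ValueError(f"unknown label source: {source!r}")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    # Deep-copy so the episode keeps the features as they stood at firing time,
    # even if the caller mutates the live signal afterwards.
    return LabeledEpisode(copy.deepcopy(signal), pattern, confidence, source)


def export_jsonl(episodes: list[LabeledEpisode]) -> str:
    """Serialize accumulated episodes as JSON Lines, one episode per line."""
    return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in episodes)
```

The deep copy is the part that matters for the temporal signature: the label is tied to the signal as it was observed when the rule fired, not to whatever the signal becomes later.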
Why it isn’t ANFIS, and isn’t just Snorkel
ANFIS tunes the shape of fuzzy membership functions from sparse reward; this asks a different question — given a signal, which pattern name applies? — learned from accumulating labels. The contrast, point by point:
| ANFIS | Symbolic labeling pipeline |
|---|---|
| Trains on sparse RL reward | Trains on accumulated labeled episodes |
| Learns membership-function (MF) shape parameters | Generalizes from labeled patterns |
| Fails with <100 positive examples | Densifies with every graduation |
| Needs MF boundary supervision | Labels come from the symbolic system |
| Input: (signal, reward) | Input: (signal_features, pattern_label) |
Against Snorkel’s weak supervision the distinction is sharper still: our labels do not come from hand-authored labeling functions. They are emitted by a system that earned them through RL graduation, and they carry a temporal signature (the features as they stood at firing time). Self-bootstrapping combined with temporality is the pairing for which we have found no precedent.
What it builds on
The lineage is explicit — we extend it rather than displace it:
- Logic Tensor Networks (2016). Symbolic facts constrain neural training.
- A Semantic Loss Function for Deep Learning with Symbolic Knowledge (2018). Logical constraints as differentiable loss.
- Snorkel: Data Programming (2016). Programmatic weak supervision.
- DeepProbLog (2018). Neural-probabilistic logic.
Limitations
Stated plainly, because they are the honest state of the work:
- No generalization results yet. The training phase is ahead of us, not behind. Everything above is method.
- Minimum viable dataset is unknown. We estimate 50+ labeled episodes before training is meaningful; unvalidated.
- Override vs. supplement is open. When the neural prediction should override symbolic matching, versus only supplement it, is undecided.
- Feature design is open. Which event features form the input vector is unspecified.
Where it stands
The work ships in phases: add signal features to the decision record; export the four-source labeled set; train a deliberately simple model first (logistic regression or a small MLP) before anything larger; then let the slow brain propose new trial patterns from its predictions. Starting simple is intentional: the point is to validate that symbolic graduation produces learnable labels at all before reaching for architecture.
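To make the "deliberately simple model" phase concrete, here is a sketch of multinomial logistic regression over exported episodes, written in plain Python so it has no dependencies. It is an illustration under stated assumptions, not the planned implementation: the `featurize` scheme (one-hot strings, log-squashed numbers) is hypothetical because feature design is still open, and weighting each update by the episode's `confidence` is likewise an assumption.

```python
import math
from collections import defaultdict


def featurize(signal_features: dict) -> dict:
    """Hypothetical featurization: one-hot for strings, log1p for numbers."""
    vec = {"bias": 1.0}
    for key, value in signal_features.items():
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            vec[f"{key}={value}"] = 1.0
        else:
            vec[key] = math.log1p(max(float(value), 0.0))
    return vec


def predict_proba(weights: dict, x: dict) -> dict:
    """Softmax over per-label linear scores."""
    scores = {label: sum(w.get(name, 0.0) * v for name, v in x.items())
              for label, w in weights.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def train(episodes: list, epochs: int = 200, lr: float = 0.1) -> dict:
    """Fit softmax regression with plain stochastic gradient steps."""
    labels = sorted({ep["pattern_label"] for ep in episodes})
    weights = {label: defaultdict(float) for label in labels}
    for _ in range(epochs):
        for ep in episodes:
            x = featurize(ep["signal_features"])
            probs = predict_proba(weights, x)
            for label in labels:
                target = 1.0 if label == ep["pattern_label"] else 0.0
                # Cross-entropy gradient step, scaled by episode confidence (assumed).
                step = lr * ep.get("confidence", 1.0) * (target - probs[label])
                for name, value in x.items():
                    weights[label][name] += step * value
    return weights


def predict(weights: dict, signal_features: dict) -> str:
    """Return the most probable pattern label for a signal."""
    probs = predict_proba(weights, featurize(signal_features))
    return max(probs, key=probs.get)
```

A model this small keeps the experiment honest: if it cannot separate graduated patterns from their firing-time features, a larger architecture is unlikely to be the missing piece.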
If it holds, the agent’s rule system becomes a renewable supply of supervision for its neural layer. We will know when the numbers exist.