The Symbolic System Is the Labeler

The system that decides is also the system that labels. When a symbolic rule in our agent graduates — earns its place by firing well under reinforcement — it does not just act. It emits a training example: the signal features at firing time, paired with the pattern it matched. That pair is a supervised label, produced for free by a deterministic system that was never asked to be a labeler.

This note describes the mechanism and where it stands. It is an approach, not a result: the neural layer that consumes these labels is in development, and we have no generalization numbers to report yet.

Two speeds, and what is not the claim

The agent learns at two speeds. A fast brain updates online — scraper event, fuzzy match, firing strength, weight update, every few minutes, on unlabeled signals. A slow brain trains offline on a much longer cycle, on labeled data only.

The two-speed split is not new, and we do not claim it. It echoes hierarchical RL, complementary learning systems (McClelland et al., 1995), and Bengio’s consciousness-prior framing. The contribution is narrower and sits between the two brains: where the labels come from.

The mechanism

Each graduation event writes a labeled episode:

{
  "signal_features": {
    "event_type": "metric_pod_restart",
    "service": "pgbouncer",
    "restart_count": 15,
    "namespace": "default"
  },
  "pattern_label": "database_connection_failure",
  "confidence": 1.0,
  "source": "hitl"
}

The decision store already records pattern name and confidence; the addition is extracting the signal features at decision time and exporting accumulated episodes. Labels arrive from four sources — human-in-the-loop, incident retrospectives, an autonomous investigation engine, and pre-classified anomaly streams — so the dataset densifies with every pattern that graduates, rather than depending on hand-authored heuristics.
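As a rough illustration of that addition, the sketch below snapshots the signal features when a rule graduates and exports the accumulated episodes as JSON Lines. The names `LabeledEpisode`, `on_graduation`, and `export_jsonl` are hypothetical, and JSON Lines is an assumed export format; only the four record fields come from the example above.

```python
import copy
import json
from dataclasses import dataclass, asdict
from typing import Any


@dataclass(frozen=True)
class LabeledEpisode:
    """One labeled episode, mirroring the record fields shown above."""
    signal_features: dict[str, Any]
    pattern_label: str
    confidence: float
    source: str  # e.g. "hitl"


def on_graduation(signal: dict[str, Any], pattern: str,
                  confidence: float, source: str) -> LabeledEpisode:
    # Hypothetical hook. Deep-copy the features so the episode keeps the
    # values as they stood at firing time, even if the live signal changes.
    return LabeledEpisode(copy.deepcopy(signal), pattern, confidence, source)


def export_jsonl(episodes: list[LabeledEpisode]) -> str:
    # One JSON object per line (assumed export format).
    return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in episodes)
```

The deep copy is the part that matters here: it is what preserves the firing-time snapshot if the underlying signal object is later updated.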

Why it isn’t ANFIS, and isn’t just Snorkel

ANFIS tunes the shape of fuzzy membership functions from sparse reward; this asks a different question — given a signal, which pattern name applies? — learned from accumulating labels. The contrast, point by point:

ANFIS	Symbolic labeling pipeline
Trains on sparse RL reward	Trains on accumulated labeled episodes
Learns MF shape parameters	Generalizes from labeled patterns
Fails with <100 positive examples	Densifies with every graduation
Needs MF boundary supervision	Labels come from the symbolic system
Input: (signal, reward)	Input: (signal_features, pattern_label)

Table 1. ANFIS vs. the symbolic labeling pipeline.

Against Snorkel’s weak supervision the distinction is sharper still: our labels do not come from hand-authored labeling functions. They are emitted by a system that earned them through RL graduation, and they carry a temporal signature (the features as they stood at firing time). Self-bootstrapping plus temporality is the combination for which we have not found a precedent.

What it builds on

The lineage is explicit — we extend it rather than displace it:

Serafini & d'Avila Garcez (2016). Logic Tensor Networks. Symbolic facts constrain neural training.
Xu et al. (2018). A Semantic Loss Function for Deep Learning with Symbolic Knowledge. Logical constraints as differentiable loss.
Ratner et al. (2016). Snorkel: Data Programming. Programmatic weak supervision.
Manhaeve et al. (2018). DeepProbLog. Neural-probabilistic logic.

Limitations

Stated plainly, because they are the honest state of the work:

No generalization results yet. The training phase is ahead of us, not behind. Everything above is method.
Minimum viable dataset is unknown. We estimate 50+ labeled episodes before training is meaningful; unvalidated.
Override vs. supplement is open. When the neural prediction should override symbolic matching, versus only supplement it, is undecided.
Feature design is open. Which event features form the input vector is unspecified.

Where it stands

The work ships in phases: add signal features to the decision record; export the four-source labeled set; train a deliberately simple model first (logistic regression or a small MLP) before anything larger; then let the slow brain propose new trial patterns from its predictions. Starting simple is intentional: the point is to validate that symbolic graduation produces learnable labels at all, before reaching for architecture.
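To make the "simple model first" phase concrete, here is a minimal sketch of a multiclass logistic regression (softmax regression) over exported episodes, written in plain Python with no dependencies. It is not the project's model. The `featurize` encoding (one-hot for categorical fields, `log1p` for numeric ones) is an assumption, since feature design is still open, and the hyperparameters are arbitrary.

```python
import math
from collections import defaultdict


def featurize(signal_features):
    # Hypothetical encoding: one-hot for categorical fields,
    # log-scaled value for numeric fields, plus a bias term.
    vec = {"bias": 1.0}
    for key, value in signal_features.items():
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            vec[key] = math.log1p(max(value, 0))
        else:
            vec[f"{key}={value}"] = 1.0
    return vec


def predict_proba(weights, vec):
    # Linear score per label, then a numerically stable softmax.
    scores = {label: sum(w.get(f, 0.0) * x for f, x in vec.items())
              for label, w in weights.items()}
    top = max(scores.values())
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def train(episodes, epochs=200, lr=0.1):
    # Plain SGD on the cross-entropy loss; one weight map per pattern label.
    labels = sorted({e["pattern_label"] for e in episodes})
    weights = {label: defaultdict(float) for label in labels}
    data = [(featurize(e["signal_features"]), e["pattern_label"])
            for e in episodes]
    for _ in range(epochs):
        for vec, gold in data:
            probs = predict_proba(weights, vec)
            for label in labels:
                grad = probs[label] - (1.0 if label == gold else 0.0)
                for f, x in vec.items():
                    weights[label][f] -= lr * grad * x
    return weights
```

Given a list of episode dicts in the record shape shown earlier, `train(episodes)` returns per-label weights and `predict_proba(weights, featurize(features))` returns a distribution over pattern names. Whether such a model generalizes beyond the episodes it was trained on is exactly the open question.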

If it holds, the agent’s rule system becomes a renewable supply of supervision for its neural layer. We will know when the numbers exist.