Signal Normalization

Why Normalization Is Necessary

Raw Telegram metrics are not comparable across channels of different sizes, ages, and ecosystems. A 5% view rate means something different for a 500-member channel than for a 500,000-member channel. A 1,000-member spike means something different for a 3-day-old channel than for a 3-year-old channel.

Normalization makes signals comparable and filters out the structural noise that would corrupt model outputs.


What Is Removed

Bot Clusters

Bot clusters are identified through behavioral signature analysis:

  • Timing regularity: Human engagement follows irregular patterns. Bot engagement follows clock-aligned bursts.
  • Account age clustering: Groups of accounts created within the same short window that all join the same channel constitute a strong bot indicator.
  • Engagement rate uniformity: Groups of accounts that all show identical engagement rates (exactly the same reactions to the same messages) exhibit bot-like homogeneity.

When a bot cluster is detected, its contribution to view counts, join counts, and reaction metrics is downweighted before the signal is fed into models.
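The timing-regularity check and the downweighting step can be sketched as follows. This is a minimal illustration, not the production detector; the function names and the `penalty` parameter are hypothetical, and a real system would combine all three signatures before flagging a cluster.

```python
from statistics import mean, pstdev

def timing_regularity(timestamps):
    """Coefficient of variation of inter-arrival gaps (seconds).
    Human engagement is bursty (high CV); clock-aligned bot bursts
    produce near-uniform gaps, driving the CV toward 0."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if not gaps or mean(gaps) == 0:
        return 0.0
    return pstdev(gaps) / mean(gaps)

def downweight(raw_count, bot_fraction, penalty=0.9):
    """Discount a metric by the estimated bot share before it feeds
    the models. `penalty` (illustrative) sets how aggressively the
    flagged contribution is removed: 1.0 strips it entirely."""
    return raw_count * (1 - penalty * bot_fraction)
```

A cluster whose joins arrive exactly 60 seconds apart scores a regularity of 0.0 (perfectly clock-aligned), while organic traffic typically scores well above it.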

Echo Amplification

Echo amplification occurs when a small number of accounts forward content repeatedly across multiple channels to simulate organic spread. Signs:

  • Content forwarded from a channel back into itself via intermediary channels
  • Identical forward events from the same small group of accounts across multiple channels
  • View spikes that appear simultaneously across channels with no organic discovery mechanism linking them

Echo amplification is identified through creator-graph and forward-graph analysis. Amplified signal is downweighted in NSM creator density calculations.
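The first sign listed above, content returning to its origin via intermediaries, amounts to a cycle through the forward graph. A minimal sketch of that check, assuming forwards are available as (source, destination) channel pairs:

```python
from collections import defaultdict

def has_echo_cycle(forward_edges, origin):
    """Return True if content forwarded out of `origin` can reach
    `origin` again through intermediary channels, i.e. the forward
    graph contains a cycle passing through the origin channel."""
    graph = defaultdict(set)
    for src, dst in forward_edges:
        graph[src].add(dst)
    seen, stack = set(), list(graph[origin])
    while stack:
        node = stack.pop()
        if node == origin:
            return True          # forwarded content came back home
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return False
```

The same graph supports the other two signs: repeated identical edges from a small account set, and simultaneous spikes across disconnected channels.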

Artificial Spikes

Symmetrical join spikes and view spikes that do not correspond to any discoverable organic catalyst (news event, partnership, viral post) are flagged as artificial.

Flagged spikes are:

  • Retained as input to TRI (they contribute to BotRisk)
  • Removed from AVI calculations (they would artificially inflate velocity)
  • Penalized in SIS (they increase the anomaly score A)
⚠️ Normalization reduces but does not eliminate sophisticated bot activity. Highly coordinated campaigns that mimic organic patterns more carefully can partially evade spike detection. This is why multiple models are run simultaneously — a single normalization layer cannot catch everything, but the composite of five model outputs is substantially harder to game.
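The three routing rules for flagged spikes can be expressed as a small dispatch step. The model names (TRI, AVI, SIS) come from the text; the field names and the flat anomaly bonus are illustrative placeholders, not the actual scoring weights:

```python
def route_spike(spike_value, flagged):
    """Route one spike observation into per-model inputs.
    Flagged spikes stay visible to TRI, are dropped from AVI,
    and raise the SIS anomaly score A."""
    return {
        "tri_input": spike_value,                        # retained: feeds BotRisk
        "avi_input": None if flagged else spike_value,   # removed: would inflate velocity
        "sis_anomaly_delta": 1.0 if flagged else 0.0,    # illustrative penalty on A
    }
```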


How Normalization Works

Channel-Relative Baseline

Each metric is normalized relative to the channel’s own historical baseline rather than an absolute scale:

NormalizedMetric = (CurrentValue − BaselineMean) / BaselineStdDev

Where BaselineMean and BaselineStdDev are computed over the channel’s 30-day rolling history.

This means a channel’s current performance is always evaluated against its own established patterns, not against ecosystem averages.
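The baseline formula is a standard z-score over the channel's own history. A minimal sketch, assuming the 30-day rolling window is passed in as a list of daily values:

```python
from statistics import mean, pstdev

def normalize_metric(current_value, rolling_history):
    """NormalizedMetric = (CurrentValue - BaselineMean) / BaselineStdDev,
    computed against the channel's own 30-day rolling history.
    A zero-variance history yields 0.0 (no deviation measurable)."""
    baseline_mean = mean(rolling_history)
    baseline_std = pstdev(rolling_history)
    if baseline_std == 0:
        return 0.0
    return (current_value - baseline_mean) / baseline_std
```

A value of +2.0 means the channel is two standard deviations above its own norm, regardless of whether the channel has 500 or 500,000 members.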

Ecosystem Adjustment

After channel-relative normalization, signals are adjusted for ecosystem baseline:

EcosystemAdjusted = NormalizedMetric × EcosystemMultiplier

EcosystemMultiplier accounts for the fact that, for example, Solana ecosystem channels typically have higher AVI volatility than TON ecosystem channels. Without this adjustment, Solana channels would score artificially high on AVI relative to their ecosystem peers.
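In code, the adjustment is a single multiplication against a per-ecosystem calibration table. The multiplier values below are illustrative only (a damping factor below 1.0 for the more volatile ecosystem, per the Solana example above); the real constants are calibrated, and unknown ecosystems fall back to a neutral 1.0:

```python
# Illustrative calibration table -- real values are fitted per ecosystem.
ECOSYSTEM_MULTIPLIER = {
    "solana": 0.8,  # damps the higher baseline AVI volatility
    "ton": 1.0,     # reference ecosystem
}

def ecosystem_adjust(normalized_metric, ecosystem):
    """EcosystemAdjusted = NormalizedMetric x EcosystemMultiplier."""
    return normalized_metric * ECOSYSTEM_MULTIPLIER.get(ecosystem.lower(), 1.0)
```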

Temporal Weighting

Recent signals are weighted more heavily than historical signals:

WeightedSignal = Signal × e^(−λ × DaysAgo)

Where λ is the temporal decay constant. This ensures the model reflects current structural conditions rather than historical performance peaks.
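The exponential decay weighting maps directly to one line of code. The λ value used here is purely illustrative (λ = 0.1 gives a half-life of about 6.9 days); the document does not specify the actual constant:

```python
import math

def temporal_weight(signal, days_ago, decay_lambda=0.1):
    """WeightedSignal = Signal x e^(-lambda x DaysAgo).
    decay_lambda is illustrative; at 0.1 a signal loses half its
    weight roughly every ln(2)/0.1 ~= 6.9 days."""
    return signal * math.exp(-decay_lambda * days_ago)
```

A today's-signal keeps full weight, while a ten-day-old signal at λ = 0.1 retains about 37% of its original contribution.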