FRAMEWORK · 30 April 2026 · 9 min read

Reading the Whale Confirmation Coach output.

Most "AI signal" features narrate the score they just computed. The Whale Confirmation Coach does the opposite — it goes looking for evidence the score is wrong. When the framework says BULLISH but the verification layer says CONTRADICTED, the AI is usually right. Here's why, how to read the verdicts, and the case where the layer caught $435M of insider selling the framework had silently scored as bullish.

The closed-loop problem

Pre-v6.3, whale_sentiment worked like every other indicator: it read OBV, dark-pool proxies, block-trade detection, and unusual options activity, then output a 0-100 score. That score showed up in the per-ticker card as a chip and got fed back into the AI thesis prompt. The AI would then narrate the score back. "Whale sentiment is bullish at 78/100, supported by rising OBV and elevated block-trade activity."
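A composite score like that is typically a weighted blend of its sub-signals. The sketch below is a hypothetical reconstruction from the inputs the paragraph names (OBV, dark-pool proxy, block trades, unusual options); the function name, weights, and normalization are illustrative assumptions, not the framework's actual code.

```python
# Hypothetical sketch of a composite whale-sentiment score.
# Sub-signal names come from the article; the weights are invented
# for illustration and are NOT the framework's real parameters.

def whale_sentiment(obv_trend: float,
                    dark_pool: float,
                    block_trades: float,
                    options_flow: float) -> int:
    """Blend four 0-1 sub-signals into a single 0-100 score."""
    weights = {"obv": 0.35, "dark_pool": 0.25, "blocks": 0.25, "options": 0.15}
    raw = (weights["obv"] * obv_trend
           + weights["dark_pool"] * dark_pool
           + weights["blocks"] * block_trades
           + weights["options"] * options_flow)
    return round(raw * 100)

print(whale_sentiment(0.8, 0.7, 0.75, 0.6))  # -> 73, a solidly bullish read
```

Whatever the real weights are, the structural point stands: every input is tape-derived, so the output can only be as complete as the tape.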

The problem isn't that the narration was wrong. The narration was always faithful to the input. The problem is that the input had nowhere to be checked against. If whale_sentiment.py missed something — a Form 4 filing, a 13D position change, a sector-wide dark-pool divergence the proxy didn't model — the score never knew. The AI happily restated the gap.

That's a closed loop. And closed loops are what kill traders who use signal stacks: every layer sounds confident because every layer is reading the same partial truth.

What the verification layer actually does

v6.3 added a separate prompt that runs after the framework score is computed. The Whale Confirmation Coach takes the framework's verdict — say, "BULLISH 78" — and says: "go check that against the world." It runs Anthropic web_search against sources like:

- SEC filings: Form 4 insider transactions, 13D and 13F position changes
- FINRA dark-pool and short-interest reports
- Analyst rating moves and notes

Then it returns one of three verdicts:

- CONFIRMED: outside evidence supports the framework score
- CONFLICT: the evidence points both ways
- CONTRADICTED: outside evidence points the opposite direction

Plus an agreement percentage (0-100) that quantifies how much of the cited evidence supports the framework score, and a list of citations — actual links to the SEC pages, FINRA reports, and analyst notes the AI cross-checked.
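Put together, the payload has three parts: verdict, agreement percentage, and citations. A minimal sketch of that shape, with field names that are my assumptions rather than the actual API:

```python
# Hypothetical shape of the verification layer's return payload,
# inferred from the article. Field names are assumptions.
from dataclasses import dataclass, field


@dataclass
class WhaleVerdict:
    verdict: str                 # "CONFIRMED" | "CONFLICT" | "CONTRADICTED"
    agreement: int               # 0-100 corroboration density
    citations: list = field(default_factory=list)  # SEC / FINRA / analyst links


v = WhaleVerdict("CONTRADICTED", 28, ["https://www.sec.gov/..."])
print(v.verdict, v.agreement)  # CONTRADICTED 28
```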

How to read the agreement %

The agreement percent is not a confidence score. It's a corroboration density. Read it like this:

| Agreement % | Reading | Action |
| --- | --- | --- |
| ≥ 75% | Multiple independent sources align with the framework score. | Treat the framework score as load-bearing. Size normally. |
| 50-74% | Sources are mixed. Some real evidence both ways. | Read the citations. Often the AI surfaces a real concern (insider selling alongside institutional buying). Reduce size or wait. |
| 25-49% | Most sources lean against the framework score. | The framework is probably reading a partial signal. Don't size up on this one. |
| < 25% | Outside evidence broadly contradicts. | Skip the trade or scale into a hedge. The framework's reading the noise. |

The line that matters is 50%. Below it, the verification layer is telling you the per-ticker indicators are out of phase with what the larger market is doing — which is the dangerous regime, not the obviously-bullish one.

The verdict that matters most: CONTRADICTED

When CONFIRMED fires, it's a green-light comfort signal. Useful, but not high-leverage — you were probably going to take the trade anyway. The verdicts that earn the coach's $0.10 per call are the contradictions. Specifically: cases where the framework says BULLISH 75+ and the verification layer comes back CONFLICT or CONTRADICTED with citations.

Why the asymmetry? Because the framework's whale_sentiment score is built from tape behavior — OBV, block prints, unusual options. Tape behavior leads price by minutes to days. Insider Form 4 filings, 13D position changes, and analyst rating moves are structural data — they describe ownership decisions made weeks earlier. When the structural data points the opposite direction from the tape, you're almost always looking at one of two things:

- Distribution dressed up as accumulation: insiders and institutions selling into retail demand, so the tape prints bullish volume while ownership walks out the door (the PLTR pattern below).
- The mirror image: structural buyers accumulating into a bearish tape, where panic selling masks the position-building.

Either way, the framework's surface read is wrong about who's behind the price action. The verification layer sees through the tape because it's reading the receipts.

The PLTR $435M case

The cleanest calibration moment came on PLTR. The framework had been scoring it BULLISH 72 — solid OBV, decent dark-pool flow, a recent earnings beat. The card was green. The thesis read fine.

The Whale Confirmation Coach came back CONTRADICTED. Agreement: 28%. The citations pointed to roughly $435M of insider selling on file with the SEC.

The framework's tape read wasn't wrong about volume. The volume was real. But the volume was institutional sells being absorbed by retail buys. The 13F-level data the AI surfaced told the actual story. The bullish tape was the symptom of distribution, not accumulation.

Same pattern fired for GOOGL (Pichai 2.53M shares sold) and NVDA (953,976 shares sold across 18 months with a 15:0 sell-to-buy ratio across executives). On three of the six tickers in the calibration window, the verification layer caught insider selling the framework had silently scored as bullish.

These aren't edge cases. These are mega-caps in 2026, and the same Form 4 / 13F machinery applies to every ticker the framework tracks. Without the verification layer, the framework's whale chip would have lit up green on three trades that were structurally bearish.

Why CONFIRMED is sometimes the bigger flag

There's a second-order use of the verdicts that's less obvious: a CONFIRMED verdict on a high-conviction setup is the green light to size up, not just to enter. The default is to enter at sleeve weight (15-25% per ticker depending on regime). When CONFIRMED fires with agreement ≥85% and Velocity Exception conditions are also met, the cap lifts to 35% — but only because the verification layer just told you the framework wasn't reading a partial signal.

Without the coach, the Velocity Exception would still exist as a rule in the framework, but the rule would gate on per-ticker indicators alone. Adding outside evidence to the gate is what makes the 35% cap a defensible policy rather than a way to oversize on lucky data.
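The sizing gate described above can be sketched as a single function. The numbers (15-25% sleeve weight, 85% agreement threshold, 35% cap) come from the article; the function itself, its name, and the `velocity_exception` flag are illustrative assumptions, not the framework's actual code.

```python
# Sketch of the cap-lift rule: default sleeve weight, lifted to 35%
# only when CONFIRMED fires at >=85% agreement AND the Velocity
# Exception conditions are met. Illustrative, not the real framework.
def max_position_pct(verdict: str, agreement: int,
                     velocity_exception: bool,
                     sleeve_weight: float = 0.25) -> float:
    if verdict == "CONFIRMED" and agreement >= 85 and velocity_exception:
        return 0.35       # outside evidence corroborates; cap lifts
    return sleeve_weight  # default 15-25% depending on regime

print(max_position_pct("CONFIRMED", 90, True))   # 0.35
print(max_position_pct("CONFIRMED", 80, True))   # 0.25: stays at sleeve weight
```

Note that all three conditions gate the lift; a CONFIRMED verdict alone never raises the cap.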

The cost numbers, in case you're wondering

The Whale Confirmation Coach is the most expensive of the eight AI surfaces, because web_search requests are billed separately ($10/1000 requests) and the prompt typically fires 6-8 of them per call. From the v6.3 calibration data:

| Ticker | Agreement % | Cost | Time |
| --- | --- | --- | --- |
| GOOGL | 55% | $0.099 | 30s |
| NVDA | 42% | $0.081 | 38s |
| TSLA | 42% | $0.092 | 36s |
| MU | 62% | $0.111 | 28s |
| VRT | 35% | $0.110 | 37s |

Mean: $0.099 per call, 33.8 seconds. Cached for 5 minutes, so a single trading session typically fires it 3-6 times across your watchlist. At ten cents per check on a setup that could save a $5,000 mistake, the unit economics aren't close. The bar for keeping this surface is "did it catch one PLTR per quarter?" — and so far it has.
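The search-fee arithmetic checks out against the mean. At $10 per 1,000 requests and 6-8 requests per call (both figures from above), search fees alone run $0.06-$0.08, with the remainder of the ~$0.099 mean being token cost:

```python
# Back-of-envelope check of per-call search fees: web_search billed
# at $10 per 1,000 requests, 6-8 requests per typical call.
# Token costs are excluded; figures are from the article.
SEARCH_PRICE = 10 / 1000  # $0.01 per request

for n in (6, 7, 8):
    print(f"{n} searches -> ${n * SEARCH_PRICE:.2f} in search fees")
```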

The discipline use

The most disciplined way to use the Whale Confirmation Coach isn't to consult it before every entry. It's to make a rule: any BULLISH 70+ score on a ticker you're about to size above 15% requires a CONFIRMED verdict before placing the order. Below the size threshold, the framework's own indicators are sufficient. Above it, the verification layer earns its keep.
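That rule is mechanical enough to express as a pre-flight check. This is a hypothetical sketch of the gate, not the dashboard's actual broker pre-flight code; the `override` flag mirrors the override behavior described below:

```python
# The discipline rule as a pre-flight gate: BULLISH 70+ sized above
# 15% requires a CONFIRMED verdict. Hypothetical sketch only.
from typing import Optional


def entry_allowed(framework_score: int, intended_size_pct: float,
                  verdict: Optional[str], override: bool = False) -> bool:
    needs_confirmation = framework_score >= 70 and intended_size_pct > 0.15
    if not needs_confirmation or override:
        return True                   # below threshold, or deliberately waived
    return verdict == "CONFIRMED"     # the gate that sometimes refuses

print(entry_allowed(78, 0.20, "CONTRADICTED"))  # False: gate refuses
print(entry_allowed(78, 0.10, None))            # True: below size threshold
```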

That rule is what turns the coach from a feature you sometimes click into a gate that sometimes refuses. The dashboard's broker pre-flight chain treats it the same way the R:R floor is treated — a default-on safety, with an override flag for the cases you've thought through.

The framework's job is to be confident. The coach's job is to find the cases where confidence is misplaced. They work together when you let them.


Related: The eight AI surfaces — full list · What 10b5-1 actually means (the insider distinction the coach uses) · Why surface-bound AI > chat boxes
