Reading the Whale Confirmation Coach output.
Most "AI signal" features narrate the score they just computed. The Whale Confirmation Coach does the opposite — it goes looking for evidence the score is wrong. When the framework says BULLISH but the verification layer says CONTRADICTED, the AI is usually right. Here's why, how to read the verdicts, and the case where the layer caught $435M of insider selling the framework had silently scored as bullish.
The closed-loop problem
Pre-v6.3, whale_sentiment worked like every other indicator: it read OBV, dark-pool proxies, block-trade detection, and unusual options activity, then output a 0-100 score. That score showed up in the per-ticker card as a chip and got fed back into the AI thesis prompt. The AI would then narrate the score back. "Whale sentiment is bullish at 78/100, supported by rising OBV and elevated block-trade activity."
The problem isn't that the narration was wrong. The narration was always faithful to the input. The problem is that the input had nowhere to be checked against. If whale_sentiment.py missed something — a Form 4 filing, a 13D position change, a sector-wide dark-pool divergence the proxy didn't model — the score never knew. The AI happily restated the gap.
That's a closed loop. And closed loops are what kill traders who use signal stacks: every layer sounds confident because every layer is reading the same partial truth.
What the verification layer actually does
v6.3 added a separate prompt that runs after the framework score is computed. The Whale Confirmation Coach takes the framework's verdict — say, "BULLISH 78" — and says: "go check that against the world." It runs Anthropic web_search against:
- SEC filings — Form 4 insider transactions, 13D/13G institutional positions, S-1 / 10-Q filings tagged in the last 90 days.
- Dark-pool reports — published aggregates from FINRA TRF, Cboe LULD, off-exchange print attribution.
- Analyst sentiment — published price target changes and rating moves in the last 30 days.
- Sector flow — material 13F changes in the same theme bucket, in case sector rotation is driving what the per-ticker score reads as ticker-specific.
Then it returns one of four verdicts:
- CONFIRMED — outside evidence agrees with the framework score.
- CONFLICT — outside evidence partially contradicts; some sources agree, some don't.
- CONTRADICTED — outside evidence broadly disagrees with the framework score.
- WEAK SIGNAL — outside evidence is sparse or inconclusive; the framework score is functionally uncorroborated.
Plus an agreement percentage (0-100) that quantifies how much of the cited evidence supports the framework score, and a list of citations — actual links to the SEC pages, FINRA reports, and analyst notes the AI cross-checked.
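Putting the verdict, agreement percentage, and citations together, the coach's output can be pictured as a small structured record. A minimal sketch, with hypothetical field and method names (the article doesn't show the actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class WhaleVerification:
    """Hypothetical shape of the Whale Confirmation Coach output."""
    verdict: str        # "CONFIRMED" | "CONFLICT" | "CONTRADICTED" | "WEAK SIGNAL"
    agreement_pct: int  # 0-100: how much cited evidence supports the framework score
    citations: list[str] = field(default_factory=list)  # SEC / FINRA / analyst links

    def supports_framework(self) -> bool:
        # A framework score is load-bearing only when independently corroborated.
        return self.verdict == "CONFIRMED" and self.agreement_pct >= 75
```

The point of keeping citations as raw links rather than summaries is that the reader can audit the cross-check, not just trust it.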
How to read the agreement %
The agreement percent is not a confidence score. It's a measure of corroboration density. Read it like this:
| Agreement % | Reading | Action |
|---|---|---|
| ≥ 75% | Multiple independent sources align with the framework score. | Treat the framework score as load-bearing. Size normally. |
| 50-74% | Sources are mixed. Some real evidence both ways. | Read the citations. Often the AI surfaces a real concern (insider selling alongside institutional buying). Reduce size or wait. |
| 25-49% | Most sources lean against the framework score. | The framework is probably reading a partial signal. Don't size up on this one. |
| < 25% | Outside evidence broadly contradicts. | Skip the trade or scale into a hedge. The framework's reading the noise. |
The pivotal threshold is 50%. Below it, the verification layer is telling you the per-ticker indicators are out of phase with what the larger market is doing, which is the dangerous regime, not the obviously-bullish one.
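The table above reads naturally as a lookup. A minimal sketch (the function name and action labels are mine, not the dashboard's; the thresholds come from the table):

```python
def agreement_action(agreement_pct: int) -> str:
    """Map corroboration density to a sizing action, per the table above."""
    if agreement_pct >= 75:
        return "size normally"        # multiple independent sources align
    if agreement_pct >= 50:
        return "reduce size or wait"  # mixed evidence; read the citations
    if agreement_pct >= 25:
        return "do not size up"       # most sources lean against the score
    return "skip or hedge"            # outside evidence broadly contradicts
```

The PLTR case discussed below came back at 28%, which lands in the "do not size up" band even before reading a single citation.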
The verdict that matters most: CONTRADICTED
When CONFIRMED fires, it's a green-light comfort signal. Useful, but not high-leverage — you were probably going to take the trade anyway. The verdicts that earn the coach's $0.10 per call are the contradictions. Specifically: cases where the framework says BULLISH 75+ and the verification layer comes back CONFLICT or CONTRADICTED with citations.
Why the asymmetry? Because the framework's whale_sentiment score is built from tape behavior — OBV, block prints, unusual options. Tape behavior leads price by minutes to days. Insider Form 4 filings, 13D position changes, and analyst rating moves are structural data — they describe ownership decisions made weeks earlier. When the structural data points the opposite direction from the tape, you're almost always looking at one of two things:
- Distribution to retail — institutions are selling into your buying. The tape looks bullish because the volume is there. The selling is structural.
- Short squeeze setup — institutions are positioning ahead of forced cover. The tape looks neutral because most of the buying hasn't happened yet.
Either way, the framework's surface read is wrong about who's behind the price action. The verification layer sees through the tape because it's reading the receipts.
The PLTR $435M case
The cleanest calibration moment came on PLTR. The framework had been scoring it BULLISH 72 — solid OBV, decent dark-pool flow, a recent earnings beat. The card was green. The thesis read fine.
The Whale Confirmation Coach came back CONTRADICTED. Agreement: 28%. The citations:
- Peter Thiel sold 4M PLTR shares in the prior 90 days.
- Director Stephen Moore filed a 16,000-share Form 4 sell on 15 April.
- Aggregate insider sells across 6 months: 227 sales vs. 0 purchases, $435M total.
- Analyst price targets: net flat, with three downgrades in the prior 30 days.
The framework's tape read wasn't wrong about volume. The volume was real. But the volume was institutional sells being absorbed by retail buys. The Form 4 and filing-level data the AI surfaced told the actual story. The bullish tape was the symptom of distribution, not accumulation.
Same pattern fired for GOOGL (Pichai 2.53M shares sold) and NVDA (953,976 shares sold across 18 months with a 15:0 sell-to-buy ratio across executives). On three of the six tickers in the calibration window, the verification layer caught insider selling the framework had silently scored as bullish.
These aren't edge cases. These are mega-caps in 2026, and the same Form 4 / 13F machinery applies to every ticker the framework tracks. Without the verification layer, the framework's whale chip would have lit up green on three trades that were structurally bearish.
Why CONFIRMED is sometimes the bigger flag
There's a second-order use of the verdicts that's less obvious: a CONFIRMED verdict on a high-conviction setup is the green light to size up, not just to enter. The default is to enter at sleeve weight (15-25% per ticker depending on regime). When CONFIRMED fires with agreement ≥85% and Velocity Exception conditions are also met, the cap lifts to 35% — but only because the verification layer just told you the framework wasn't reading a partial signal.
Without the coach, the Velocity Exception would still exist as a rule in the framework, but the rule would gate on per-ticker indicators alone. Adding outside evidence to the gate is what makes the 35% cap a defensible policy rather than a way to oversize on lucky data.
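The sizing gate described in the last two paragraphs can be sketched as one function. The names (`position_cap`, `velocity_exception`, `sleeve_cap`) are illustrative; the thresholds (sleeve default, ≥85% agreement, 35% lift) come from the text:

```python
def position_cap(verdict: str, agreement_pct: int,
                 velocity_exception: bool, sleeve_cap: float = 0.25) -> float:
    """Return the maximum position weight for a ticker.

    The default cap is the sleeve weight (15-25% depending on regime;
    0.25 used here as a stand-in). The 35% lift requires BOTH a
    CONFIRMED verdict at >=85% agreement AND the Velocity Exception
    conditions being met.
    """
    if verdict == "CONFIRMED" and agreement_pct >= 85 and velocity_exception:
        return 0.35
    return sleeve_cap
```

Note that outside evidence is a necessary condition here, not a substitute: a Velocity Exception without a CONFIRMED verdict still caps at sleeve weight.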
The cost numbers, in case you're wondering
The Whale Confirmation Coach is the most expensive of the eight AI surfaces, because web_search requests are billed separately ($10/1000 requests) and the prompt typically fires 6-8 of them per call. From the v6.3 calibration data:
| Ticker | Agreement % | Cost | Time |
|---|---|---|---|
| GOOGL | 55% | $0.099 | 30s |
| NVDA | 42% | $0.081 | 38s |
| TSLA | 42% | $0.092 | 36s |
| MU | 62% | $0.111 | 28s |
| VRT | 35% | $0.110 | 37s |
Mean: $0.099 per call, 33.8 seconds. Cached for 5 minutes, so a single trading session typically fires it 3-6 times across your watchlist. At ten cents per check on a setup that could save a $5,000 mistake, the unit economics aren't close. The break-even question for keeping this surface is simple: did it catch one PLTR per quarter? So far it has.
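The unit economics are easy to reproduce from the stated billing. A quick check, assuming the midpoint of the stated 6-8 searches per call (the remainder of the observed mean is token cost, which isn't broken out in the article):

```python
SEARCH_PRICE = 10 / 1000           # $10 per 1,000 web_search requests
searches_per_call = 7              # midpoint of the stated 6-8 per call
search_cost = searches_per_call * SEARCH_PRICE   # $0.07 of the observed mean

calls = [0.099, 0.081, 0.092, 0.111, 0.110]      # per-ticker costs from the table
mean_cost = sum(calls) / len(calls)              # rounds to the stated $0.099

# A session firing the stated 3-6 cached checks costs roughly:
session_low, session_high = 3 * mean_cost, 6 * mean_cost
```

So roughly 70% of the per-call cost is search billing, and a full session of checks runs about $0.30 to $0.60.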
The discipline use
The most disciplined way to use the Whale Confirmation Coach isn't to consult it before every entry. It's to make a rule: any BULLISH 70+ score on a ticker you're about to size above 15% requires a CONFIRMED verdict before placing the order. Below the size threshold, the framework's own indicators are sufficient. Above it, the verification layer earns its keep.
That rule is what turns the coach from a feature you sometimes click into a gate that sometimes refuses. The dashboard's broker pre-flight chain treats it the same way the R:R floor is treated — a default-on safety, with an override flag for the cases you've thought through.
The framework's job is to be confident. The coach's job is to find the cases where confidence is misplaced. They work together when you let them.
Related: The eight AI surfaces — full list · What 10b5-1 actually means (the insider distinction the coach uses) · Why surface-bound AI > chat boxes