Methodology · Data lineage
What we scraped, what we didn't, and why
The dataset behind these awards is publicly auditable. This page enumerates every field on the site, where it came from, and where it's missing — so nothing is taken on faith.
Data coverage at a glance
| Metric | Count | Notes |
|---|---|---|
| Top-25 communities | 25 | All have full public-data coverage. |
| Fully covered | 5 | Public + auth-gated fields populated. |
| Partial coverage | 20 | Auth-gated fields missing. |
Field-by-field source map
Every field surfaced on the site is one of two kinds: public (sourced from the public Skool API, authoritative across all 25 communities) or auth-gated (requires an authenticated session cookie to scrape — currently complete for 5/25).
| Field | Gate | Coverage | Source / actor |
|---|---|---|---|
| Member count | Public | 25/25 | Public Skool API. Authoritative. |
| Price per month | Public | 25/25 | Public Skool API (monthlyPriceUsd). Authoritative. |
| MRR tier | Public | 25/25 | Public — Skool's internal MRR ranking exposed on owner profile. |
| Founder social links | Public | 25/25 | Public — owner.social object on community page. |
| Landing page copy | Public | 25/25 | Public — marketing.landingPage.description. |
| Category labels & post counts | Auth-gated | 5/25 | Requires authenticated community page scrape (community.labels). |
| Top contributors | Auth-gated | 5/25 | Requires authenticated leaderboard scrape (sourabhbgp/skool-scraper). |
| Recent posts feed | Auth-gated | 5/25 | Requires auth_token cookie + memo23/skool-posts-with-comments-scraper. |
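The source map above can also be held as data, which makes the coverage gaps queryable. This is a minimal sketch: the type names, the `FIELD_SOURCES` constant, and the `incompleteFields` helper are hypothetical and are not taken from the site's codebase. Only the field names, gates, and coverage counts come from the table.

```typescript
// Hypothetical types for illustration; the site's real schema may differ.
type Gate = "public" | "auth-gated";

interface FieldSource {
  field: string;
  gate: Gate;
  covered: number; // communities with this field populated
  total: number;   // communities in the top-25 set
}

// Values copied from the field-by-field source map.
const FIELD_SOURCES: FieldSource[] = [
  { field: "Member count", gate: "public", covered: 25, total: 25 },
  { field: "Price per month", gate: "public", covered: 25, total: 25 },
  { field: "MRR tier", gate: "public", covered: 25, total: 25 },
  { field: "Founder social links", gate: "public", covered: 25, total: 25 },
  { field: "Landing page copy", gate: "public", covered: 25, total: 25 },
  { field: "Category labels & post counts", gate: "auth-gated", covered: 5, total: 25 },
  { field: "Top contributors", gate: "auth-gated", covered: 5, total: 25 },
  { field: "Recent posts feed", gate: "auth-gated", covered: 5, total: 25 },
];

// Returns the names of fields that are not populated for every community.
function incompleteFields(sources: FieldSource[]): string[] {
  return sources.filter((s) => s.covered < s.total).map((s) => s.field);
}
```

Calling `incompleteFields(FIELD_SOURCES)` on this data returns exactly the three auth-gated fields.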
Actors used
Each row below is an Apify actor we ran. The whole pipeline (discovery, deep-dive, control group, posts feed, and founder research) ran under a hard cap of $40 total spend. Spend is tracked in data/raw/spend.json.
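A spend cap like this is easiest to enforce as a check before each actor run. The sketch below shows one way to do it. The `SpendEntry` shape is an assumption (the actual layout of data/raw/spend.json is not documented here), and `totalSpend` and `assertWithinCap` are hypothetical helper names.

```typescript
// Assumed shape for one entry in data/raw/spend.json; the real file may differ.
interface SpendEntry {
  actor: string;
  phase: string;
  usd: number;
}

// The $40 hard cap stated above.
const HARD_CAP_USD = 40;

// Sums the recorded spend across all entries.
function totalSpend(entries: SpendEntry[]): number {
  return entries.reduce((sum, e) => sum + e.usd, 0);
}

// Throws if running one more actor at the estimated cost would exceed the cap.
function assertWithinCap(entries: SpendEntry[], nextRunUsd: number): void {
  const projected = totalSpend(entries) + nextRunUsd;
  if (projected > HARD_CAP_USD) {
    throw new Error(
      `Projected spend $${projected.toFixed(2)} exceeds the $${HARD_CAP_USD} cap`
    );
  }
}
```

Checking the projected total rather than the current total means a run is refused before the money is spent, not after.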
| Actor | Purpose | Auth | Phase |
|---|---|---|---|
| easyapi/skool-groups-scraper | Discovery — query-based search across 50 AI terms | Public | Phase 2 |
| futurizerush/skool-group-scraper | Per-community deep-dive (member count, price, owner) | Public | Phase 4 |
| goat255/skool-scraper-goat | Enriched community profile (features, MRR tier, marketing) | Public | Phase 4 |
| sourabhbgp/skool-scraper | Top contributors leaderboard | Auth-gated | Phase 4 |
| memo23/skool-posts-with-comments-scraper | Posts feed + comment threads | Auth-gated | Phase 4 |
| futurizerush/skool-profile-scraper | Founder property crawl (Phase 6.8 Pass 2) | Public | Phase 6.8 |
Why some data is missing
Skool.com gates posts feeds, leaderboards, and category labels behind an authenticated session. Scraping those fields requires a fresh auth_token cookie from a logged-in browser session, which expires after 24 to 72 hours.
During this build, the auth cookie expired partway through the deep-dive pass. Five of the 25 communities completed before expiry; the remaining 20 have authoritative public-data fields but empty auth-gated fields. We deliberately do not fabricate or interpolate missing values: gaps are surfaced as 🔒 callouts on each affected community page rather than hidden.
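One way to keep a missing value from being mistaken for an empty one is to model the gap explicitly in the type. The following is a sketch of that idea only. The `Gated` type and `renderContributors` function are hypothetical and do not describe the site's actual rendering code.

```typescript
// Hypothetical wrapper: an auth-gated field is either populated or explicitly locked.
type Gated<T> =
  | { status: "populated"; value: T }
  | { status: "locked"; reason: string };

// Renders a contributors list, or a lock callout when the data was never scraped.
function renderContributors(field: Gated<string[]>): string {
  if (field.status === "locked") {
    return `🔒 Not available: ${field.reason}`;
  }
  return field.value.join(", ");
}

// Example: a community whose leaderboard scrape did not complete.
const locked: Gated<string[]> = {
  status: "locked",
  reason: "auth cookie expired before this community was scraped",
};
console.log(renderContributors(locked));
```

Because the compiler forces every consumer to handle the `locked` case, a gap cannot silently render as an empty list or a zero.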
To complete coverage, refresh the cookie at data/skool.cookies.json and re-run npm run posts-feed && npm run render-data.
Causal blueprint methodology
Beyond the descriptive leaderboard, we ran a control-group pairwise analysis: each top-25 community was paired with a near-miss "control" (same niche, same launch year, different founder, 5-25% of its size). Differences that show up in the top-25 but not in controls are candidate causal factors; differences common to both are noise.
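The pairing criteria can be written as a single predicate. This is a minimal sketch, assuming size is measured by member count and that niche and launch year are compared by exact match. The `Community` interface and `isValidControl` function are hypothetical names.

```typescript
// Hypothetical record; only the attributes used for pairing are included.
interface Community {
  slug: string;
  niche: string;
  launchYear: number;
  founder: string;
  members: number;
}

// True if `candidate` qualifies as a near-miss control for `top`:
// same niche, same launch year, different founder, 5-25% of the size.
function isValidControl(top: Community, candidate: Community): boolean {
  const ratio = candidate.members / top.members;
  return (
    candidate.niche === top.niche &&
    candidate.launchYear === top.launchYear &&
    candidate.founder !== top.founder &&
    ratio >= 0.05 &&
    ratio <= 0.25
  );
}
```

For example, a 1,000-member community passes as a control for a 10,000-member one in the same niche and launch year, while a 5,000-member one does not because it is 50% of the size.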
Eighteen factors across six thematic clusters (founder/distribution, timing/waves, offer/format, trust/proof, AI-platform/infrastructure, network/geography) are scored on prevalence gap. A factor is flagged "high confidence" when it differentiates the top-25 community from its control in at least 18 of the 25 paired comparisons; these are the load-bearing levers in the K1 launch playbook.
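The high-confidence flag reduces to a count over the 25 pairs. The sketch below assumes that a pair counts as "differentiated" when the top-25 community shows the factor and its control does not; the page does not spell out the exact rule, so treat that as an assumption. The `PairObservation` type and both function names are hypothetical.

```typescript
// One paired comparison for a single factor.
interface PairObservation {
  topHasFactor: boolean;
  controlHasFactor: boolean;
}

// Threshold stated above: at least 18 of 25 pairs.
const HIGH_CONFIDENCE_MIN = 18;

// Counts pairs where the factor appears in the top community but not in its control.
// Assumption: this is what "differentiated" means.
function differentiatedCount(pairs: PairObservation[]): number {
  return pairs.filter((p) => p.topHasFactor && !p.controlHasFactor).length;
}

// True when the factor clears the high-confidence threshold.
function isHighConfidence(pairs: PairObservation[]): boolean {
  return differentiatedCount(pairs) >= HIGH_CONFIDENCE_MIN;
}
```

Pairs where both communities show the factor, or neither does, contribute nothing to the count, which matches the "common to both is noise" reading.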
Full results: /blueprint.