Methodology
The Market Opportunity Engine joins three federal datasets (ACS, CBP, and TIGER/Line) and produces a transparent opportunity score per ZIP Code Tabulation Area (ZCTA) per business type. There are no black-box weights: every input and subscore is visible in the rankings table.
Data sources
- Census ACS 5-year estimates — population, age, income, housing, household composition.
- Census County Business Patterns (CBP) ZIP detail — establishment counts by NAICS industry code at the ZIP level.
- Census County Business Patterns (CBP) county — same, at county level. Used as a fallback when ZIP-level data is suppressed.
- Census TIGER/Line — geographic boundaries and ZCTA→county crosswalk.
Scoring formula
Demand (per ZCTA)
- pop_score: log-scaled population, 0 below 1,000, 1 at 50,000+.
- income_fit: 0 below the business type's income floor, 1 at its ideal income, linear in between.
- age_fit: tent function peaking at the business type's ideal age, dropping to 0 at ±tolerance years.
- demand_score = 0.4·pop_score + 0.4·income_fit + 0.2·age_fit
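The demand subscores can be sketched in Python as follows. This is a minimal sketch, not the engine's actual code: the function and parameter names are hypothetical, the logarithmic interpolation between 1,000 and 50,000 is one plausible reading of "log-scaled", and holding income_fit at 1 above the ideal income is an assumption the text does not state.

```python
import math

POP_FLOOR = 1_000   # pop_score is 0 at or below this population
POP_CAP = 50_000    # pop_score is 1 at or above this population


def pop_score(population: float) -> float:
    """Log-scaled population score in [0, 1] (interpolation form is assumed)."""
    if population <= POP_FLOOR:
        return 0.0
    if population >= POP_CAP:
        return 1.0
    return math.log(population / POP_FLOOR) / math.log(POP_CAP / POP_FLOOR)


def income_fit(income: float, floor: float, ideal: float) -> float:
    """0 below the floor, 1 at the ideal income, linear in between.

    Assumption: incomes above the ideal stay at 1.
    """
    if income <= floor:
        return 0.0
    if income >= ideal:
        return 1.0
    return (income - floor) / (ideal - floor)


def age_fit(median_age: float, ideal_age: float, tolerance: float) -> float:
    """Tent function: 1 at the ideal age, 0 at ideal_age ± tolerance and beyond."""
    return max(0.0, 1.0 - abs(median_age - ideal_age) / tolerance)


def demand_score(pop: float, income: float, age: float) -> float:
    """Weighted sum of the three subscores, using the weights given above."""
    return 0.4 * pop + 0.4 * income + 0.2 * age
```

Because the three weights sum to 1 and each subscore lies in [0, 1], demand_score also stays in [0, 1].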
Supply (per ZCTA)
- observed_count: sum of establishments matching this business type's NAICS codes from CBP ZIP detail.
- county_density_per_10k: county-wide establishments / 10k residents (used when ZIP data is missing).
- estimated_per_10k: final density used for scoring: observed when available, county fallback otherwise.
- supply_source: observed, county_fallback, or no_data.
- supply_score: percentile rank of estimated_per_10k across populated ZCTAs (0 = least competition, 1 = most).
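A rough Python sketch of how these supply fields could fit together is shown below. Several details are assumptions rather than documented behavior: suppressed or missing values are represented as None, the observed density is derived from observed_count and the ZCTA's population, ZCTAs with no_data are left out of the ranking, and ties share the rank of the lowest tied value. The function names are hypothetical.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


def resolve_supply(
    observed_count: Optional[int],
    zcta_population: int,
    county_density_per_10k: Optional[float],
) -> Tuple[Optional[float], str]:
    """Return (estimated_per_10k, supply_source) for one ZCTA."""
    if observed_count is not None and zcta_population > 0:
        # Assumed conversion of a raw count to establishments per 10k residents.
        return observed_count * 10_000 / zcta_population, "observed"
    if county_density_per_10k is not None:
        return county_density_per_10k, "county_fallback"
    return None, "no_data"


def percentile_ranks(densities: List[float]) -> List[float]:
    """Rank each density in [0, 1]: share of the other values strictly below it."""
    n = len(densities)
    if n < 2:
        return [0.0] * n
    ordered = sorted(densities)
    # bisect_left counts how many values are strictly smaller than v.
    return [bisect_left(ordered, v) / (n - 1) for v in densities]
```

With this definition the ZCTA with the lowest density gets a supply_score of 0 (least competition) and the one with the highest gets 1, matching the scale described above.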
Composite
opportunity_score = demand_score − 0.5 × supply_score
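As an illustration of the composite, the sketch below scores and ranks three placeholder ZCTAs. The labels and subscores are made up for the example and are not real data; only the formula comes from the text above.

```python
def opportunity_score(demand_score: float, supply_score: float) -> float:
    """Composite score: demand minus half of the supply (competition) score."""
    return demand_score - 0.5 * supply_score


# Hypothetical rows; "A", "B", "C" are placeholder labels, not real ZCTAs.
rows = [
    {"zcta": "A", "demand_score": 0.80, "supply_score": 0.90},
    {"zcta": "B", "demand_score": 0.60, "supply_score": 0.10},
    {"zcta": "C", "demand_score": 0.30, "supply_score": 0.00},
]

for row in rows:
    row["opportunity_score"] = opportunity_score(
        row["demand_score"], row["supply_score"]
    )

# Highest opportunity first.
ranked = sorted(rows, key=lambda r: r["opportunity_score"], reverse=True)
```

Here "B" ranks above "A" even though "A" has stronger demand, because "A" sits near the top of the competition percentile. Since both subscores lie in [0, 1], the composite ranges from −0.5 to 1.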
Why county fallback exists
CBP suppresses entire NAICS rows at the ZIP level when revealing them would expose individual businesses. For narrow industry codes, this creates massive false-zero gaps — we observed ~80% of ZCTAs reporting zero gyms when reality is closer to 50%. County-level CBP has far less suppression because counties are larger and individual businesses harder to identify, so we substitute the county's per-capita rate when ZIP-level data is missing.
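The substitution described here might look like the sketch below. It assumes each ZCTA maps to a single county in the crosswalk (the text does not say how ZCTAs that span several counties are handled), and all identifiers and figures are hypothetical placeholders.

```python
from typing import Dict, Optional


def county_fallback_density(
    zcta: str,
    zcta_to_county: Dict[str, str],
    county_establishments: Dict[str, int],
    county_population: Dict[str, int],
) -> Optional[float]:
    """County-wide establishments per 10k residents for the ZCTA's county.

    Returns None when the county is unknown or has no usable data.
    """
    county = zcta_to_county.get(zcta)
    if county is None:
        return None
    establishments = county_establishments.get(county)
    population = county_population.get(county, 0)
    if establishments is None or population <= 0:
        return None
    return establishments * 10_000 / population


# Hypothetical inputs: one county with 45 matching establishments
# and 300,000 residents, and one ZCTA inside it.
crosswalk = {"zcta_a": "county_1"}
establishments = {"county_1": 45}
population = {"county_1": 300_000}

density = county_fallback_density("zcta_a", crosswalk, establishments, population)
```

In this example the suppressed ZCTA inherits its county's rate of 1.5 establishments per 10k residents instead of a false zero.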
Known limitations
- No service-area awareness. Adjacent ZCTAs share customers but the engine treats each in isolation.
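- National supply baseline. supply_score is a percentile rank across all populated ZCTAs nationwide, so dense urban ZCTAs and rural ones are ranked on the same scale, which doesn't fully account for local context.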
- NAICS coverage. Some businesses span multiple NAICS codes; the engine matches a curated list per type, not every possible code.
- Median age is coarse. Daycare scoring would benefit from under-5 population brackets, not median age. (Coming.)
- No housing/family weighting yet. HVAC scoring should weight housing age; daycare should weight households-with-kids.