Methodology

The Market Opportunity Engine joins three federal datasets (Census ACS, County Business Patterns, and TIGER/Line) and produces a transparent opportunity score for each ZIP code (ZCTA) and business type. There are no black-box weights: every input and subscore is visible in the rankings table.

Data sources

Census ACS 5-year estimates — population, age, income, housing, household composition.
Census County Business Patterns (CBP) ZIP detail — establishment counts by NAICS industry code at the ZIP level.
Census County Business Patterns (CBP) county — same, at county level. Used as a fallback when ZIP-level data is suppressed.
Census TIGER/Line — geographic boundaries and ZCTA→county crosswalk.

Scoring formula

Demand (per ZCTA)

pop_score — log-scaled population, 0 below 1,000, 1 at 50,000+.
income_fit — 0 below the business type's income floor, 1 at its ideal income, linear in between.
age_fit — tent function peaking at the business type's ideal age, dropping to 0 at ±tolerance years.
demand_score = 0.4·pop_score + 0.4·income_fit + 0.2·age_fit
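
As a rough illustration, the three demand subscores and their weighted sum can be sketched in Python. This is a minimal sketch under stated assumptions, not the engine's actual code: the function and parameter names are hypothetical, the log interpolation between 1,000 and 50,000 is one plausible reading of "log-scaled", and income above the ideal is assumed to stay at 1 (the text does not say what happens there).

```python
import math


def clamp01(x: float) -> float:
    """Clamp a value into the [0, 1] range."""
    return max(0.0, min(1.0, x))


def pop_score(population: float, lo: float = 1_000, hi: float = 50_000) -> float:
    """Log-scaled population: 0 at or below `lo`, 1 at or above `hi`.

    Assumption: linear interpolation in log space between the two anchors.
    """
    if population <= lo:
        return 0.0
    return clamp01((math.log(population) - math.log(lo)) / (math.log(hi) - math.log(lo)))


def income_fit(income: float, floor: float, ideal: float) -> float:
    """0 below the income floor, 1 at the ideal income, linear in between.

    Assumption: incomes above the ideal are clamped to 1.
    """
    return clamp01((income - floor) / (ideal - floor))


def age_fit(median_age: float, ideal_age: float, tolerance: float) -> float:
    """Tent function: 1 at the ideal age, falling to 0 at +/- tolerance years."""
    return max(0.0, 1.0 - abs(median_age - ideal_age) / tolerance)


def demand_score(pop_s: float, income_s: float, age_s: float) -> float:
    """Weighted sum from the formula above: 0.4, 0.4, 0.2."""
    return 0.4 * pop_s + 0.4 * income_s + 0.2 * age_s
```

Because each subscore lies in [0, 1] and the weights sum to 1, demand_score also lies in [0, 1].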

Supply (per ZCTA)

observed_count — sum of establishments matching this business type's NAICS codes from CBP ZIP detail.
county_density_per_10k — county-wide establishments / 10k residents (used when ZIP data is missing).
estimated_per_10k — final density used for scoring: observed when available, county fallback otherwise.
supply_source — observed, county_fallback, or no_data.
supply_score — percentile rank of estimated_per_10k across populated ZCTAs (0 = least competition, 1 = most).
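
The supply fields above can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, a missing (suppressed) ZIP-level count is represented as None, and the percentile rank uses one common convention (share of values strictly below, scaled so the minimum maps to 0 and a unique maximum maps to 1). The engine's exact tie handling and rank definition are not stated in the text.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


def estimate_density(
    observed_count: Optional[int],
    population: int,
    county_density_per_10k: Optional[float],
) -> Tuple[Optional[float], str]:
    """Return (estimated_per_10k, supply_source) for one ZCTA.

    Assumption: None means the ZIP-level CBP row is missing or suppressed.
    """
    if observed_count is not None and population > 0:
        return observed_count * 10_000 / population, "observed"
    if county_density_per_10k is not None:
        return county_density_per_10k, "county_fallback"
    return None, "no_data"


def percentile_ranks(densities: List[float]) -> List[float]:
    """Percentile rank of each density: 0 = least competition, 1 = most.

    Assumption: rank = (number of values strictly below) / (n - 1).
    """
    n = len(densities)
    if n <= 1:
        return [0.0] * n
    ordered = sorted(densities)
    return [bisect_left(ordered, d) / (n - 1) for d in densities]
```

In this sketch, ZCTAs with supply_source of no_data would need to be excluded (or handled separately) before ranking, since they have no density to compare.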

Composite

opportunity_score = demand_score − 0.5 × supply_score
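
A short sketch of the composite and how it could drive a ranking. The ZCTA labels and subscore values below are made up for illustration, and the function name is hypothetical.

```python
def opportunity_score(demand_score: float, supply_score: float) -> float:
    """Composite from the formula above: demand minus half of supply."""
    return demand_score - 0.5 * supply_score


# Hypothetical (demand_score, supply_score) pairs keyed by placeholder labels.
candidates = {
    "zcta_a": (0.75, 0.5),
    "zcta_b": (0.5, 0.0),
    "zcta_c": (1.0, 1.0),
}

# Rank labels from highest to lowest opportunity.
ranking = sorted(
    candidates,
    key=lambda z: opportunity_score(*candidates[z]),
    reverse=True,
)
```

Since both subscores lie in [0, 1], the composite ranges from −0.5 (no demand, maximum competition) to 1 (maximum demand, no competition).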

Why county fallback exists

CBP suppresses entire NAICS rows at the ZIP level when revealing them would expose individual businesses. For narrow industry codes, this creates massive false-zero gaps: we observed ~80% of ZCTAs reporting zero gyms when reality is closer to 50%. County-level CBP has far less suppression because counties are larger and individual businesses are harder to identify, so we substitute the county's per-capita rate when ZIP-level data is missing.

Known limitations

No service-area awareness. Adjacent ZCTAs share customers, but the engine treats each in isolation.
National supply baseline. supply_score ranks every populated ZCTA against a single national distribution, so dense urban ZCTAs are compared directly with rural ones without adjusting for local context.
NAICS coverage. Some businesses span multiple NAICS codes; the engine matches a curated list per type, not every possible code.
Median age is coarse. Daycare scoring would benefit from under-5 population brackets, not median age. (Coming.)
No housing/family weighting yet. HVAC scoring should weight housing age; daycare should weight households-with-kids.