Testing Philosophy and the 70/20/10 Pyramid
If you have ever watched a test suite go green while the app was objectively broken—or fail after a harmless refactor—you already know that what you assert matters as much as that you assert. Advanced React work amplifies the problem: hooks hide state, concurrent rendering reorders effects, and design systems wrap primitive elements in layers of composition. Tests that reach past the public surface of a component become a second implementation you must maintain. The guiding principle for resilient suites is simple to state and disciplined to apply: test behavior the way a user would perceive it, not the way your source files happen to be organized today.
Behavior over implementation
“User-centric” testing is not a moral slogan; it is a maintenance strategy. Users interact with
roles, names, text, and keyboard focus. They do not interact with useReducer
dispatch shapes, private helper functions, or the fact that you chose CSS modules over Tailwind this
quarter. When your test queries getByRole('button', { name: /submit/i }) or
getByLabelText(/email/i), it is anchored to an accessibility contract that tends to survive
refactors. When it queries .submit-btn or container.querySelector('[data-testid="x"]'), it is
anchored to whatever detail was convenient the day the test was written—a detail that often stops
meaning anything to a human being long before the test itself is retired.
The same logic applies to simulating input. fireEvent dispatches a single DOM event with no
surrounding ceremony. user-event simulates realistic interaction sequences—typing, tabbing,
pointer actions—including the intermediate focus, key, and pointer events a real session would
produce, so you catch issues that only appear when events arrive in believable order and timing.
The extra await noise in tests is the price of catching bugs that slip past shallow event stubs.
That does not mean every internal function deserves a UI test. Pure utilities, parsers, and state machines without DOM are still excellent candidates for fast unit tests that import functions directly. The distinction is boundary: test through the user’s boundary unless there is a compelling reason not to.
The pyramid is a budget, not a badge
The testing pyramid is a rough allocation of effort: roughly 70% unit and component tests, 20% integration-style tests, and 10% end-to-end tests. Treat those numbers as a conversation starter, not a KPI to game. The underlying idea is economic. Component tests under jsdom are cheap enough to run on every save. Integration tests that spin up larger subgraphs cost more but catch wiring mistakes—wrong provider order, mistaken route assumptions, broken data flow between two components that each had perfect isolated tests. E2E tests running in real browsers are the most expensive; they should earn their keep by covering critical paths where failure has outsized impact: authentication, payments, core CRUD flows, and anything regulated or contractual.
Skewing too much toward E2E produces a familiar failure mode: long CI times, flaky failures from timing and environment drift, and teams that stop trusting red builds. Skewing everything to shallow unit tests produces another: green suites and broken journeys. The pyramid is a reminder to keep most feedback fast, some feedback realistic, and a little feedback as close to production as you can afford.
What “good coverage” actually means
Coverage percentages are a diagnostic, not a definition of quality. A file can hit 80% lines while asserting nothing meaningful, or sit at 60% with excellent behavioral checks on the riskiest flows. Still, for teams that have already agreed on behavioral testing, thresholds in CI—often in the 70–80% range for lines, branches, functions, and statements—prevent slow erosion. The threshold should be paired with review culture: raising coverage by testing getters and setters is worse than leaving the number alone and adding one test that would have caught last week’s outage.
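As a sketch, a Vitest configuration can encode such a floor so erosion fails CI rather than accumulating quietly; the 75% figures below are placeholders for whatever your team actually agrees on:

```typescript
// vitest.config.ts — coverage thresholds fail the run when any metric
// drops below the agreed floor. Numbers here are illustrative.
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    coverage: {
      provider: 'v8',
      thresholds: {
        lines: 75,
        branches: 75,
        functions: 75,
        statements: 75,
      },
    },
  },
});
```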
Expand the idea of coverage beyond lines executed. Edge cases deserve explicit examples: empty lists, pagination boundaries, validation errors, and partial server responses. Loading and error states are where asynchronous React apps usually fail in production; if your tests only assert the success render, you have documented a demo, not a product. Accessibility is part of behavior: expanded regions, live regions, disabled controls during in-flight work, and focus management after dialogs open and close. Queries that lean on roles and accessible names naturally push you toward testing those concerns instead of bolting them on later.
End-to-end tests as spotlight, not floodlight
Reserve E2E for journeys that are hard to reconstruct in jsdom or that involve cross-cutting infrastructure: cookies, multiple origins, service workers, file uploads, and third-party widgets. Even then, keep scenarios short and deterministic. Prefer stable selectors tied to user-visible labels, and isolate data setup so tests do not depend on whatever happened to exist in a shared staging database on Tuesday afternoon.
Flakiness is rarely random. It is usually shared mutable state, implicit timing, or
environment drift. The pyramid pushes most assertions into layers where you control the clock
(vi.useFakeTimers in Vitest when appropriate), reset MSW handlers between cases, and avoid
parallel tests mutating the same global singleton. E2E suites that refuse those disciplines become
theater: people rerun until green and learn to ignore failures.
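Those disciplines can live in one shared setup file. A sketch, assuming an MSW server module at ./mocks/server and a team decision to make fake timers the default (some suites prefer opting in per test, since fake timers interact with user-event and waitFor):

```typescript
// test/setup.ts — shared hygiene so cases cannot leak state into each other.
import { afterAll, afterEach, beforeAll, beforeEach, vi } from 'vitest';
import { server } from './mocks/server'; // hypothetical MSW server module

// Unhandled requests are bugs, not background noise.
beforeAll(() => server.listen({ onUnhandledRequest: 'error' }));

beforeEach(() => {
  vi.useFakeTimers(); // own the clock: debounce and polling become deterministic
});

afterEach(() => {
  vi.useRealTimers();
  vi.restoreAllMocks();
  server.resetHandlers(); // no case inherits another case's overrides
});

afterAll(() => server.close());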
Tests as documentation for the next maintainer
Behavior-driven tests double as executable specifications when they speak in domain language. A
test named “calls setState twice” documents implementation; one named “shows validation when email
is missing” documents intent. That distinction shows up in code review: the former invites
bikeshedding about internals, the latter invites discussion about product rules. For TypeScript
teams, the same mindset applies to factories and fixtures—type your test data the way you type
production models so refactors surface compile errors in tests instead of silent divergence.
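A minimal sketch of that mindset; the User shape and buildUser factory are hypothetical stand-ins for production types:

```typescript
// Hypothetical User model; in a real codebase, import the production type.
interface User {
  id: string;
  email: string;
  role: 'admin' | 'member';
  createdAt: Date;
}

// Typed factory: if User gains a required field, every test that builds
// users fails to compile instead of drifting silently out of date.
function buildUser(overrides: Partial<User> = {}): User {
  return {
    id: 'user-1',
    email: 'ada@example.com',
    role: 'member',
    createdAt: new Date('2024-01-01T00:00:00Z'),
    ...overrides,
  };
}

const admin = buildUser({ role: 'admin' });
// admin keeps the defaults except role, which is now 'admin'.
```

Tests then state only the fields they care about, which keeps each case readable and resilient to unrelated model changes.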
CI as the honest mirror
Philosophy means little if it only runs on a laptop. Fast suites belong on every pull request;
coverage thresholds belong in CI so main stays honest. When you align philosophy with the
pyramid—behavioral assertions at the bottom and middle, sparse but ruthless E2E at the top—you get a
suite that fails for reasons users would care about and passes when the product still feels the
same. That is the difference between testing as paperwork and testing as a senior engineer’s
leverage.