HIGH | Quality | medium | Testing

Test Behavior, Not Implementation

Test what the code does (outputs, side effects), not how it does it (internal method calls, private state). Implementation-coupled tests break every time you refactor, even when behavior is unchanged — making tests a liability instead of a safety net.

Why This Matters

Implementation-coupled tests assert on internal details: which private methods were called, what internal state was set, how many times a dependency was invoked. When you refactor the internals without changing behavior, these tests fail — even though the code is correct. This creates a perverse incentive to avoid refactoring because "the tests will break." Tests that verify behavior (given input X, expect output Y) survive refactoring and catch real bugs.

Related Rules

Maintain Test Isolation (Quality, HIGH)
Use Factories Over Fixtures (Quality, MEDIUM)
Structure tests with Arrange-Act-Assert (Quality, MEDIUM)
Assert one logical concept per test (Quality, MEDIUM)

Why this matters

The purpose of tests is to give you confidence that your code works correctly. Implementation-coupled tests undermine this purpose:

Refactoring breaks tests: you rename an internal function or change how a result is computed, and 20 tests fail — even though the public API and behavior are identical. Developers spend hours updating tests that didn't catch any bugs.
False confidence: tests pass because the implementation matches the test's expectations, not because the behavior is correct. If you assert "mock was called 3 times" and the bug is in what happens with the result, the test passes while the bug ships.
Refactoring fear: teams stop improving code structure because the cost of updating implementation-coupled tests exceeds the benefit of the refactoring.

Behavior-focused tests verify inputs and outputs. They survive refactoring, catch real regressions, and give genuine confidence.
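The false-confidence case can be made concrete with a minimal sketch. The names here (`TaxService`, `totalWithTax`, `createTaxSpy`) are hypothetical and not part of this rule, and the spy is hand-rolled in plain TypeScript so the snippet stands alone; in a Vitest suite it would typically be a `vi.fn()`. The function contains a deliberate bug that the interaction check misses and the output check catches.

```typescript
// Hypothetical dependency, for illustration only.
interface TaxService {
  rateFor(region: string): number;
}

// Buggy on purpose: the rate is looked up correctly, then subtracted instead of added.
function totalWithTax(net: number, region: string, tax: TaxService): number {
  const rate = tax.rateFor(region);
  return net - net * rate; // BUG: should be net + net * rate
}

// Hand-rolled spy that records the regions it was asked about.
function createTaxSpy(rate: number) {
  const calls: string[] = [];
  const service: TaxService = {
    rateFor(region) {
      calls.push(region);
      return rate;
    },
  };
  return { service, calls };
}

const { service, calls } = createTaxSpy(0.25);
const total = totalWithTax(100, "EU", service);

// Implementation-coupled check: true, even though the result is wrong.
const interactionCheckPasses = calls.length === 1 && calls[0] === "EU";

// Behavior check: false, which is what exposes the bug.
const behaviorCheckPasses = total === 125;
```

A test built only on `interactionCheckPasses` would stay green while the wrong total ships; a test on the returned value fails immediately.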

The rule

Test the public API of your code: given specific inputs, assert on specific outputs or observable side effects. Don't assert on internal state, private method calls, or the number of times a dependency was invoked — unless the invocation count is the behavior being tested.
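The exception in the rule, where the invocation count is itself the behavior, might look like a caching wrapper. The sketch below is an assumption for illustration: `memoize` and `loadPrice` are hypothetical names, and the counter stands in for a mock's call count. Here "the loader runs at most once per key" is the contract callers rely on, so asserting on the count is a behavior assertion.

```typescript
// Hypothetical helper: caches results per key so the loader runs at most once per key.
function memoize<T>(load: (key: string) => T): (key: string) => T {
  const cache = new Map<string, T>();
  return (key) => {
    if (!cache.has(key)) {
      cache.set(key, load(key));
    }
    return cache.get(key) as T;
  };
}

// Counting stub standing in for an expensive dependency.
let loadCount = 0;
const loadPrice = (sku: string): number => {
  loadCount += 1;
  return sku.length * 10;
};

const cachedPrice = memoize(loadPrice);
const first = cachedPrice("abc");   // loader runs
const second = cachedPrice("abc");  // served from the cache
const other = cachedPrice("abcd");  // new key, loader runs again

// The outputs are checked as usual, and the count is checked because
// avoiding repeated loads is the documented purpose of the wrapper.
const outputsMatch = first === second && other === 40;
const loaderRanOncePerKey = loadCount === 2;
```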

Bad example

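// BAD: testing implementation details
import { describe, expect, it, vi } from "vitest";
import { UserService } from "./user-service"; // hypothetical path; adjust to your project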
describe("UserService", () => {
  it("creates a user", async () => {
    const mockRepo = {
      save: vi.fn().mockResolvedValue({ id: "1", name: "Alice" }),
      findByEmail: vi.fn().mockResolvedValue(null),
    };
    const mockHasher = {
      hash: vi.fn().mockResolvedValue("hashed-password"),
    };
 
    const service = new UserService(mockRepo, mockHasher);
    await service.createUser({ name: "Alice", email: "a@b.com", password: "pass" });
 
    // Testing HOW it works, not WHAT it does
    expect(mockRepo.findByEmail).toHaveBeenCalledWith("a@b.com");
    expect(mockHasher.hash).toHaveBeenCalledWith("pass");
    expect(mockRepo.save).toHaveBeenCalledTimes(1);
    expect(mockRepo.save).toHaveBeenCalledWith({
      name: "Alice",
      email: "a@b.com",
      passwordHash: "hashed-password",
    });
  });
});

Good example

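// GOOD: testing behavior (what goes in and what comes out)
import { describe, expect, it } from "vitest";
// Hypothetical test helpers: a factory that wires real or in-memory implementations,
// and a handle on the same store so tests can inspect persisted state.
import { createTestUserService, testDb } from "./test-helpers";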
describe("UserService", () => {
  it("creates a user and returns their profile", async () => {
    const service = createTestUserService(); // uses real or in-memory implementations
 
    const user = await service.createUser({
      name: "Alice",
      email: "a@b.com",
      password: "pass",
    });
 
    // Testing WHAT it does
    expect(user.name).toBe("Alice");
    expect(user.email).toBe("a@b.com");
    expect(user.id).toBeDefined();
  });
 
  it("rejects duplicate email addresses", async () => {
    const service = createTestUserService();
    await service.createUser({ name: "Alice", email: "a@b.com", password: "pass" });
 
    // Testing BEHAVIOR — the observable outcome
    await expect(
      service.createUser({ name: "Bob", email: "a@b.com", password: "pass" })
    ).rejects.toThrow("Email already registered");
  });
 
  it("does not store the password in plain text", async () => {
    const service = createTestUserService();
    const user = await service.createUser({
      name: "Alice",
      email: "a@b.com",
      password: "pass",
    });
 
    // This is behavior, not implementation — we care about the security property
    const stored = await testDb.users.findById(user.id);
    expect(stored.passwordHash).not.toBe("pass");
  });
});
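The `createTestUserService` and `testDb` helpers used above are not defined in this rule. One possible shape, sketched under assumptions, is an in-memory repository plus a deterministic fake hasher. Every type and class below is an illustrative stand-in, including the minimal `UserService`, which only mirrors the constructor and `createUser` call seen in the examples.

```typescript
// Assumed shapes: the real service, repository, and hasher are not shown in this rule.
interface StoredUser {
  id: string;
  name: string;
  email: string;
  passwordHash: string;
}

interface UserRepo {
  findByEmail(email: string): Promise<StoredUser | null>;
  findById(id: string): Promise<StoredUser | null>;
  save(user: Omit<StoredUser, "id">): Promise<StoredUser>;
}

interface Hasher {
  hash(plain: string): Promise<string>;
}

// In-memory repository: behaves like a tiny database, no mocking library involved.
class InMemoryUserRepo implements UserRepo {
  private users: StoredUser[] = [];

  async findByEmail(email: string): Promise<StoredUser | null> {
    return this.users.find((u) => u.email === email) ?? null;
  }

  async findById(id: string): Promise<StoredUser | null> {
    return this.users.find((u) => u.id === id) ?? null;
  }

  async save(user: Omit<StoredUser, "id">): Promise<StoredUser> {
    const stored: StoredUser = { id: String(this.users.length + 1), ...user };
    this.users.push(stored);
    return stored;
  }
}

// Deterministic fake hasher: NOT secure, only suitable for tests.
const fakeHasher: Hasher = {
  async hash(plain) {
    return `hashed:${plain}`;
  },
};

// Minimal stand-in for the service under test.
class UserService {
  private repo: UserRepo;
  private hasher: Hasher;

  constructor(repo: UserRepo, hasher: Hasher) {
    this.repo = repo;
    this.hasher = hasher;
  }

  async createUser(input: { name: string; email: string; password: string }) {
    if (await this.repo.findByEmail(input.email)) {
      throw new Error("Email already registered");
    }
    const passwordHash = await this.hasher.hash(input.password);
    const saved = await this.repo.save({ name: input.name, email: input.email, passwordHash });
    return { id: saved.id, name: saved.name, email: saved.email };
  }
}

// Handle on the current store so tests can inspect persisted state.
let testDb = { users: new InMemoryUserRepo() };

// Each call starts from a fresh store, which keeps tests isolated from each other.
function createTestUserService(): UserService {
  testDb = { users: new InMemoryUserRepo() };
  return new UserService(testDb.users, fakeHasher);
}
```

Because the service runs against a working repository rather than canned mock returns, the duplicate-email and password-storage tests exercise the real code path and keep passing if the internals are restructured.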

How to detect

Search for implementation-coupled test patterns:

grep -rnE "toHaveBeenCalledTimes|toHaveBeenCalledWith|mock.*\.calls" --include="*.test.ts" --include="*.spec.ts" src/

Not all mock assertions are bad — but if a test only asserts on mock calls and never checks outputs, it's testing implementation.
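The "only asserts on mock calls" condition can be approximated with a small script. This is a rough heuristic sketch, not a feature of any tool: `summarizeAssertions` is a hypothetical helper that looks at each line containing `expect(` and checks whether it uses a mock-call matcher such as `toHaveBeenCalledWith`. It works line by line, so an assertion whose matcher sits on a later line than its `expect(` is counted as an output assertion.

```typescript
// Matches mock-call matchers such as toHaveBeenCalled, toHaveBeenCalledTimes,
// toHaveBeenCalledWith, toHaveBeenLastCalledWith, and toHaveBeenNthCalledWith.
const CALL_MATCHER = /\.toHaveBeen\w*Called\w*\(/;

interface AssertionSummary {
  total: number;       // lines containing an expect( assertion
  callOnly: number;    // of those, lines that use a mock-call matcher
  suspicious: boolean; // true when every assertion is a mock-call assertion
}

// Hypothetical helper: summarizes the assertions found in a test file's source text.
function summarizeAssertions(source: string): AssertionSummary {
  const assertionLines = source.split("\n").filter((line) => line.includes("expect("));
  const callOnly = assertionLines.filter((line) => CALL_MATCHER.test(line)).length;
  return {
    total: assertionLines.length,
    callOnly,
    suspicious: assertionLines.length > 0 && callOnly === assertionLines.length,
  };
}

// Only mock-call assertions: flagged.
const coupledSource = [
  'expect(mockRepo.findByEmail).toHaveBeenCalledWith("a@b.com");',
  "expect(mockRepo.save).toHaveBeenCalledTimes(1);",
].join("\n");

// An output assertion alongside a mock-call assertion: not flagged.
const mixedSource = [
  'expect(user.name).toBe("Alice");',
  "expect(mockRepo.save).toHaveBeenCalledTimes(1);",
].join("\n");

const coupledSummary = summarizeAssertions(coupledSource);
const mixedSummary = summarizeAssertions(mixedSource);
```

A flagged file is a candidate for review rather than proof of a problem, since the call itself is sometimes the behavior under test.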

Remediation

For each test, ask: "If I completely rewrite the implementation but keep the same public API, would this test still pass?"
Replace mock call assertions with output assertions
Use real or in-memory implementations instead of mocks where possible
Keep mocks for true external boundaries (HTTP APIs, databases in unit tests)
Where a mock remains, assert on the observable effect it produces (the value returned, the state changed), not on the details of the call
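As an illustration of the last two points, the sketch below stubs only the external HTTP boundary and asserts on what the code returns. The `HttpClient` interface, the `priceInEuros` function, and the `rates.example` URL are all hypothetical, chosen only to keep the example self-contained.

```typescript
// Hypothetical external boundary: the only thing replaced in the test.
interface HttpClient {
  getJson(url: string): Promise<unknown>;
}

// Code under test: converts a USD amount using a rate fetched over HTTP.
async function priceInEuros(usd: number, http: HttpClient): Promise<number> {
  const body = (await http.getJson("https://rates.example/usd-eur")) as { rate: number };
  return Math.round(usd * body.rate * 100) / 100; // round to cents
}

// Stub for the boundary: returns a canned response and records nothing.
const stubHttp: HttpClient = {
  async getJson() {
    return { rate: 0.5 };
  },
};

// The assertion targets the result the caller sees, not the URL or call count,
// so it keeps passing if the function later adds caching or changes endpoints.
async function checkPriceConversion(): Promise<boolean> {
  const euros = await priceInEuros(10, stubHttp);
  return euros === 5;
}
```

Because the stub keeps no call record, the test has nothing to assert on except the result, which is the point.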
