This article was adapted from a Google Testing on the Toilet (TotT) episode. You can download a printer-friendly version of this TotT episode and post it in your office.

By Adam Bender

The test pyramid is the canonical heuristic for guiding test suite evolution. It conveys a simple message: prefer more unit tests than integration tests, and more integration tests than end-to-end tests.

A diagram of the test pyramid

While useful, the test pyramid lacks the details you need as your test suite grows and you face challenging trade-offs. To scale your test suite, go beyond the test pyramid.

The SMURF mnemonic is an easy way to remember the tradeoffs to consider when balancing your test suite:

  • Speed: Unit tests are faster than other test types and can be run more often—you’ll catch problems sooner.

  • Maintainability: The aggregated cost of debugging and maintaining tests (of all types) adds up quickly. A larger system under test has more code, and thus greater exposure to dependency churn and requirement drift, which, in turn, creates more maintenance work.

  • Utilization: Tests that use fewer resources (memory, disk, CPU) cost less to run. A good test suite optimizes resource utilization so that it does not grow super-linearly with the number of tests. Unit tests usually have better utilization characteristics, often because they use test doubles or only involve limited parts of a system. 

  • Reliability: Reliable tests only fail when an actual problem has been discovered. Sorting through flaky tests for problems wastes developer time and costs resources in rerunning the tests. As the size of a system and its corresponding tests grow, non-determinism (and thus, flakiness) creeps in, and your test suite is more likely to become unreliable.

  • Fidelity: High-fidelity tests come closer to approximating real operating conditions (e.g., real databases or traffic loads) and better predict the behavior of your production systems. Integration and end-to-end tests can better reflect realistic conditions, while unit tests have to simulate the environment, which can lead to drift between test expectations and reality.

A radar chart depicting the relationship between SMURF attributes as applied to unit, integration, and end-to-end tests. Unit tests perform best on all attributes except fidelity, where they are the worst. Integration tests are mid-way performers on all aspects. End-to-end tests are worst on all aspects, except fidelity where they are the best.

A radar chart of Test Type vs. Test Property (i.e., SMURF). Farther from center is better.


In many cases, the relationships between the SMURF dimensions are in tension: improving one dimension can affect the others. However, if you can improve one or more dimensions of a test without harming the others, then you should do so. When thinking about the types of your tests (unit, integration, end-to-end), your choices have meaningful implications for your test suite’s cost and the value it provides.
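The speed-versus-fidelity tension shows up concretely when a unit test swaps a real dependency for a test double. The sketch below is hypothetical (CheckoutService and FakeGateway are illustrative names, not from this article): the fake keeps Speed and Utilization excellent, at the cost of Fidelity.

```python
# Hypothetical sketch: a unit test that trades Fidelity for Speed and
# Utilization by replacing a real payment gateway with a test double.

class CheckoutService:
    """Charges an order total through a payment gateway."""
    def __init__(self, gateway):
        self.gateway = gateway

    def checkout(self, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        return self.gateway.charge(amount)

class FakeGateway:
    """Test double: no network, no credentials -- microsecond-fast and
    cheap to run, but the real gateway's quirks are never exercised."""
    def __init__(self):
        self.charged = []

    def charge(self, amount):
        self.charged.append(amount)
        return "fake-receipt"

def test_checkout_charges_gateway():
    gateway = FakeGateway()
    assert CheckoutService(gateway).checkout(10) == "fake-receipt"
    assert gateway.charged == [10]
```

An integration test against a sandboxed real gateway would score the opposite way: higher fidelity, but slower, costlier, and more prone to flakiness.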



This is another post in our Code Health series. A version of this post originally appeared in Google bathrooms worldwide as a Google Testing on the Toilet episode. You can download a printer-friendly version to display in your office.

By Amy Fu

Although a product's requirements can change often, its fundamental ideas usually change slowly. This leads to an interesting insight: if we write code that matches the fundamental ideas of the product, it will be more likely to survive future product changes.

Domain objects are building blocks (such as classes and interfaces) in our code that match the fundamental ideas of the product. Instead of writing code to match the desired behavior for the product's requirements ("configure text to be white"), we match the underlying idea ("text color settings").

For example, imagine you’re part of the gPizza team, which sells tasty, fresh pizzas to feed hungry Googlers. Due to popular demand, your team has decided to add a delivery service.

Without domain objects, the quickest path to pizza delivery is to simply create a deliverPizza method:

public class DeliveryService {
  public void deliverPizza(List<Pizza> pizzas) { ... }
}

Although this works well at first, what happens if gPizza expands its offerings to other foods?
You could add a new method:

  public void deliverWithDrinks(List<Pizza> pizzas, List<Drink> drinks) { ... }

But as your list of requirements grows (snacks, sweets, etc.), you’ll be stuck adding more and more methods. How can you change your initial implementation to avoid this continued maintenance burden?

You could add a domain object that models the product's ideas, instead of its requirements:

  • A use case is a specific behavior that helps the product satisfy its business requirements.
    (In this case, "Deliver pizzas so we make more money".)

  • A domain object represents a common idea that is shared by several similar use cases.

To identify the appropriate domain object, ask yourself:

  1. What related use cases does the product support, and what do we plan to support in future?

A: gPizza wants to deliver pizzas now, and eventually other products such as drinks and snacks.

  2. What common idea do these use cases share?

A: gPizza wants to send the customer the food they ordered.

  3. What is a domain object we can use to represent this common idea?

A: The domain object is a food order. We can encapsulate the use cases in a FoodOrder class.
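A minimal sketch of that domain object might look as follows (in Python for brevity, while the article's snippets are Java; the field names are illustrative, not gPizza's actual API):

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the FoodOrder domain object. It models the
# underlying idea ("a food order") rather than one requirement
# ("deliver pizzas"), so planned food types fit without new methods.
@dataclass
class FoodOrder:
    pizzas: list = field(default_factory=list)
    drinks: list = field(default_factory=list)

    def items(self):
        # Everything the customer ordered, regardless of food type.
        return list(self.pizzas) + list(self.drinks)
```

A single deliver(order) method can then serve pizzas, drinks, and future planned food types without accumulating overloads like deliverWithDrinks.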

Domain objects can be a useful generalization, but avoid choosing objects that are too generic: there is a tradeoff between improved maintainability and more complex, ambiguous code. Generally, aim to support only planned use cases, not all possible use cases (see the YAGNI principle).

// GOOD: It's clear what we're delivering.
public void deliver(FoodOrder order) {}

// BAD: Don't support furniture delivery.
public void deliver(DeliveryList items) {}

Learn more about domain objects and the more advanced topic of domain-driven design in the book Domain-Driven Design by Eric Evans.

This is another post in our Code Health series. A version of this post originally appeared in Google bathrooms worldwide as a Google Testing on the Toilet episode. You can download a printer-friendly version to display in your office.

By David Bendory

Simplicity is the ultimate sophistication. — Leonardo da Vinci

You’re staring at a wall of code resembling a Gordian knot of Klingon. What’s making it worse? A sea of code comments so long that you’d need a bathroom break just to read them all! Let’s fix that.

  • Adopt the mindset of someone unfamiliar with the project to ensure simplicity. One approach is to separate the process of writing your comments from reviewing them; proofreading your comments without code context in mind helps ensure they are clear and concise for future readers.

  • Use self-contained comments to clearly convey intent without relying on the surrounding code for context. If you need to read the code to understand the comment, you’ve got it backwards!

Not self-contained; requires reading the code:

  // Respond to flashing lights in
  // rearview mirror.
  while flashing_lights_in_rearview_mirror() {
    if !move_to_slower_lane() { stop_on_shoulder(); }
  }

Suggested alternative:

  // Pull over for police and/or yield to
  // emergency vehicles.
  while flashing_lights_in_rearview_mirror() {
    if !move_to_slower_lane() { stop_on_shoulder(); }
  }

  • Include only essential information in the comments and leverage external references to reduce cognitive load on the reader. For comments suggesting improvements, links to relevant bugs or docs keep comments concise while providing a path for follow-up. Note that linked docs may be inaccessible, so use judgment in deciding how much context to include directly in the comments.

Too much potential improvement in the comment:

  // The local bus offers good average-case
  // performance. Consider using the subway,
  // which may be faster depending on factors
  // like time of day, weather, etc.
  commute_by_local_bus();

Suggested alternative:

  // TODO: Consider various factors to
  // present the best transit option.
  // See issuetracker.fake/bus-vs-subway
  commute_by_local_bus();

  • Avoid extensive implementation details in function-level comments. When implementations change, such details often result in outdated comments. Instead, describe the public API contract, focusing on what the function does.

Too much implementation detail:

  // For high-traffic intersections
  // prone to accidents, pass through
  // the intersection and make 3 right
  // turns, which is equivalent to
  // turning left.

Suggested alternative:

  // Perform a safe left turn at a
  // high-traffic intersection.
  // See discussion in
  // dangerous-left-turns.fake/about.

  fn safe_turn_left() {
    go_straight();
    for i in 0..3 {
      turn_right();
    }
  }


This is another post in our Code Health series. A version of this post originally appeared in Google bathrooms worldwide as a Google Testing on the Toilet episode. You can download a printer-friendly version to display in your office.


By Elliotte Rusty Harold

Note: A “pull request” refers to one self-contained change that has been submitted to version control or that is undergoing code review. At Google, this is referred to as a “CL”, which is short for “changelist”.

Prefer small, focused pull requests that do exactly one thing each. Why? Several reasons:
  • Small pull requests are easier to review. A mistake in a focused pull request is more obvious. In a 40-file pull request that does several things, would you notice that one if statement used true where it should have used false? By contrast, if that if block and its test were the only things that changed, you’d be far more likely to catch the error.

  • Small pull requests can be reviewed quickly. A reviewer can often respond quickly by slipping small reviews in between other tasks. Larger pull requests are a big task by themselves, often waiting until the reviewer has a significant chunk of time.

  • If something does go wrong and your continuous build breaks on a small pull request, the small size makes it much easier to pinpoint the mistake. Small pull requests are also easier to roll back.

  • By virtue of their size, small pull requests are less likely to conflict with other developers’ work. Merge conflicts are less frequent and easier to resolve.

  • If you’ve made a critical error, it saves a lot of work when the reviewer can point this out after you’ve only gone a little way down the wrong path. Better to find out after an hour than after several weeks.

  • Pull request descriptions are more accurate when pull requests are focused on one task. The revision history becomes easier to read.

  • Small pull requests can lead to increased code coverage because it’s easier to make sure each individual pull request is completely tested.

Small pull requests are not always possible. In particular:

  • Frequent pull requests require reviewers to respond quickly to code review requests. If it takes multiple hours to get a pull request reviewed, developers spend more time blocked. Small pull requests often work better when reviewers are co-located (ideally within Nerf gun range for gentle reminders). 

  • Some features cannot safely be committed in partial states. If this is a concern, try to put the new feature behind a flag.
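One common shape for such a flag is a guard at the feature's entry point, so incomplete code can land in small pull requests without ever running in production. This is a minimal sketch; the flag name and dict-based flag store are hypothetical stand-ins for a real flag system.

```python
# Minimal sketch of committing a partial feature behind a flag.
FLAGS = {"enable_new_checkout": False}  # off until the feature is complete

def legacy_checkout(cart):
    # Current behavior stays live for all users.
    return sum(cart)

def new_checkout(cart):
    # Incomplete: future small pull requests flesh this out safely,
    # because no user reaches it while the flag is off.
    return sum(cart)

def checkout(cart):
    if FLAGS["enable_new_checkout"]:
        return new_checkout(cart)
    return legacy_checkout(cart)
```

Flipping the flag (gradually, if the flag system supports it) then becomes the launch, and flipping it back is the rollback.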

  • Refactorings such as changing an argument type in a public method may require modifying many dozens of files at once.

Nonetheless, even if a pull request can’t be small, it can still be focused, e.g., fixing one bug, adding one feature or UI element, or refactoring one method.

This is another post in our Code Health series. A version of this post originally appeared in Google bathrooms worldwide as a Google Testing on the Toilet episode. You can download a printer-friendly version to display in your office.

By Dan Maksimovich

Many of us have been told the virtues of “Don’t Repeat Yourself,” or DRY. Pause and consider: is the duplication truly redundant, or will the functionality need to evolve independently over time? Applying DRY principles too rigidly leads to premature abstractions that make future changes more complex than necessary.

Consider carefully whether code is truly redundant or just superficially similar. While functions or classes may look the same, they may serve different contexts and business requirements that evolve differently over time. Think about how each function’s purpose holds up over time, not just about making the code shorter. When designing abstractions, do not prematurely couple behaviors that may evolve separately in the longer term.

When does introducing an abstraction harm our code? Let’s consider the following code: 

# Premature DRY abstraction assuming
# uniform rules, limiting entity-
# specific changes.
class DeadlineSetter:
  def __init__(self, entity_type):
    self.entity_type = entity_type

  def set_deadline(self, deadline):
    if deadline <= datetime.now():
      raise ValueError(
          "Date must be in the future")

task = DeadlineSetter("task")
task.set_deadline(datetime(2024, 3, 12))

payment = DeadlineSetter("payment")
payment.set_deadline(datetime(2024, 3, 18))

# Repetitive but allows for clear,
# entity-specific logic and future
# changes.
def set_task_deadline(task_deadline):
  if task_deadline <= datetime.now():
    raise ValueError(
        "Date must be in the future")

def set_payment_deadline(payment_deadline):
  if payment_deadline <= datetime.now():
    raise ValueError(
        "Date must be in the future")

set_task_deadline(datetime(2024, 3, 12))
set_payment_deadline(datetime(2024, 3, 18))

The second approach seems to violate the DRY principle since the ValueError checks are coincidentally the same. However, tasks and payments represent distinct concepts with potentially diverging logic. If payments later required a new validation, you could easily add it to the entity-specific function; adding it to the shared abstraction is much more invasive.
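For instance, suppose payments later need a cutoff rule that tasks do not. In the duplicated version, the change lands locally in one function (the 5pm business rule below is hypothetical, added only to illustrate divergence):

```python
from datetime import datetime

def set_task_deadline(task_deadline):
  if task_deadline <= datetime.now():
    raise ValueError("Date must be in the future")

def set_payment_deadline(payment_deadline):
  if payment_deadline <= datetime.now():
    raise ValueError("Date must be in the future")
  # New, payment-only rule: added here without
  # touching the task logic at all.
  if payment_deadline.hour >= 17:
    raise ValueError("Payments must be scheduled before 5pm")
```

Had both entities shared a single DeadlineSetter, the same change would require an entity_type branch or a subclass, complicating every caller that only cares about tasks.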

When in doubt, keep behaviors separate until enough common patterns emerge over time that justify the coupling. On a small scale, managing duplication can be simpler than resolving a premature abstraction’s complexity. In early stages of development, tolerate a little duplication and wait to abstract. 

Future requirements are often unpredictable. Think about the “You Aren’t Gonna Need It” or YAGNI principle. Either the duplication will prove to be a nonissue, or with time, it will clearly indicate the need for a well-considered abstraction.