Always measure one level deeper

2025-06-19

John Ousterhout explains why performance evaluation is often done more like marketing than science, and how to do it correctly

Most common mistakes

1. Trusting the numbers

Bugs in performance measurement code do not cause crashes or failures. They simply give the wrong numbers, so engineers think everything is working as expected when it is not. There may be bugs in the benchmarks or tests, in the code that gathers metrics (e.g. a miscomputed 99th percentile), or in the system itself, which may have functional or performance bugs
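
As a purely hypothetical illustration of this kind of bug (the helper and numbers are mine, not from the talk), a metric-gathering routine that miscomputes the 99th percentile still returns plausible-looking numbers and never crashes:

```python
import random

def p99_buggy(samples):
    # Bug: the list is never sorted, so the "99th percentile" is just
    # whatever value happens to sit at that index in arrival order.
    idx = int(len(samples) * 0.99)
    return samples[idx]

def p99_correct(samples):
    # Sort first, then index, clamping to the last element.
    ordered = sorted(samples)
    idx = min(int(len(ordered) * 0.99), len(ordered) - 1)
    return ordered[idx]

latencies_ms = [random.expovariate(1 / 5.0) for _ in range(10_000)]
print("buggy   p99:", p99_buggy(latencies_ms))    # plausible-looking, but meaningless
print("correct p99:", p99_correct(latencies_ms))
```

Both versions print a number in a believable range; only a cross-check (here, against a correct implementation) reveals that one of them is nonsense.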

2. Guessing instead of measuring

Making unsubstantiated claims based on intuition is common. People say "what else could the reason be?", implying that it is up to others to prove the theory wrong and that guessing is fine until proven false. In some cases the person making the claim feels that a process of elimination has been applied and every possible cause has been considered, but the real causes are usually non-obvious and cannot be found without measuring

3. Superficial measurements

Only measuring the outermost visible behavior of a system, such as the overall running time of an application. These measurements are necessary but not sufficient: they leave questions unanswered - what causes the greatest improvements? What are the limits to better performance? We need to measure deeper in addition to the top-level measurements

4. Confirmation bias

This causes people to select and interpret data that supports their hypotheses. It affects your level of trust: if you expect your system to perform better and see results that support this, you will likely accept them without question. Another example is picking metrics that make your system look good. This is more marketing than science, and it is ineffective when you want to uncover the truth about a system

5. Haste

Building the system takes more time than expected. You have a deadline to meet, so performance measurement is done in haste, which leads to sloppiness

Keys to high-quality performance analysis

1. Allow lots of time

To measure, analyze and fix

2. Never trust a number generated by a computer

Performance measurements should be considered guilty until proven innocent. Take different measurements at the same level, measure the system at a lower level, make back-of-the-envelope calculations to check that the results are in the expected ballpark, and run simulations and compare their results with measurements of the real implementation. Always question things you don't understand. Curmudgeons make good performance evaluators because they trust nothing and enjoy finding problems
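
For example, here is a minimal sketch (my own; the memory-bandwidth figure is an assumption) of cross-checking one measurement against a back-of-the-envelope calculation:

```python
import time

# Hypothetical micro-benchmark: copy 100 MB in memory and cross-check the
# measured rate against a back-of-the-envelope expectation.
SIZE = 100_000_000  # 100 MB
src = bytearray(SIZE)

start = time.perf_counter()
dst = bytes(src)                      # one full copy of the buffer
elapsed = time.perf_counter() - start
measured_gb_per_s = SIZE / elapsed / 1e9

# Assumed envelope: in-memory copies on a modern machine run at a few GB/s
# to a few tens of GB/s. A result far outside that range points to a broken
# benchmark or measurement, not to a miraculously fast or slow system.
print(f"measured copy rate: {measured_gb_per_s:.1f} GB/s")
if not 0.5 < measured_gb_per_s < 100:
    print("outside any plausible range -- distrust this number")
```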

3. Use your intuition to ask questions, not to answer them

Your intuition can save you a lot of time and effort, but it should not make you overconfident. Back your intuition with data before making decisions or claims

4. Always measure one level deeper

Break the top-level measurements down into smaller measurements until you find the cause of any contradictions or surprising results
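
A minimal sketch of what this can look like in practice (the request handler and stage names are hypothetical): wrap each stage in a timer and compare the per-stage totals against the top-level number.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

stage_totals = defaultdict(float)

@contextmanager
def timed(stage):
    # Accumulate wall-clock time per stage so the top-level number
    # can be decomposed and cross-checked against the sum of its parts.
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_totals[stage] += time.perf_counter() - start

def handle_request(payload):
    # Hypothetical stages of a request; replace with the real ones.
    with timed("parse"):
        fields = payload.split(",")
    with timed("compute"):
        result = sum(len(f) for f in fields)
    with timed("serialize"):
        return str(result)

start = time.perf_counter()
for _ in range(100_000):
    handle_request("a,bb,ccc,dddd")
total = time.perf_counter() - start

print(f"top level: {total:.3f}s")
for stage, t in sorted(stage_totals.items(), key=lambda kv: -kv[1]):
    print(f"  {stage:10s} {t:.3f}s ({100 * t / total:.0f}%)")
# If the stages do not roughly add up to the top-level time, something
# unmeasured (or a measurement bug) is hiding in the gap.
```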

Measurement infrastructure

Other investments that pay for themselves in the long run: automation, dashboards, and good presentation of data
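
A sketch of the kind of automation this points at (the file name, workload, and CSV columns are my assumptions): run the benchmark on every revision and append the results somewhere a dashboard can plot them over time.

```python
import csv
import datetime
import statistics
import subprocess
import time
from pathlib import Path

RESULTS = Path("perf_results.csv")  # assumed output file for the dashboard

def current_revision():
    # Tag each result with the git revision so regressions can be traced.
    try:
        out = subprocess.run(["git", "rev-parse", "--short", "HEAD"],
                             capture_output=True, text=True)
        return out.stdout.strip() or "unknown"
    except OSError:
        return "unknown"

def run_benchmark():
    # Stand-in workload; replace with the real benchmark invocation.
    start = time.perf_counter()
    sum(i * i for i in range(1_000_000))
    return time.perf_counter() - start

samples = [run_benchmark() for _ in range(5)]
row = [datetime.datetime.now().isoformat(timespec="seconds"),
       current_revision(),
       f"{statistics.median(samples):.4f}",
       f"{min(samples):.4f}",
       f"{max(samples):.4f}"]

new_file = not RESULTS.exists()
with RESULTS.open("a", newline="") as f:
    writer = csv.writer(f)
    if new_file:
        writer.writerow(["timestamp", "revision", "median_s", "min_s", "max_s"])
    writer.writerow(row)
```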