Split Testing (A/B Testing)
Test product changes with controlled experiments to know what actually works
Split testing, or A/B testing, is the practice of offering different versions of a product to different customers simultaneously and comparing how each version performs on specific metrics. Rather than debating whether a change will improve the product, you run an experiment that produces an unambiguous answer. One group of customers sees the current version (the control) while another group sees the modified version (the treatment), and the difference in behavior reveals whether the change is genuinely valuable.
In the Lean Startup context, split testing is particularly powerful because it forces teams to confront whether their improvements actually change customer behavior. Many product changes that feel like improvements, based on internal feedback or aesthetic preferences, turn out to have zero effect on the metrics that matter. Split testing eliminates subjective arguments about product direction and replaces them with empirical evidence.
Split testing also reveals the true cost of features. Some features that engineers and product managers consider essential turn out to have no measurable impact on customer behavior. Discovering this early prevents wasted development effort and keeps teams focused on changes that produce validated learning.
- Changes that feel like improvements often have zero measurable effect on customer behavior.
- Controlled experiments replace subjective arguments with empirical evidence about what actually works.
- Discovering early that a feature has no impact is far cheaper than continuing to build, ship, and maintain it.
- The only valid measure of an improvement is a change in the metrics that define customer value.
- Form a Clear Hypothesis: Before running any test, articulate what you expect to happen and why. State the specific metric you expect to improve and by how much. A good hypothesis looks like: 'Adding social proof testimonials to the sign-up page will increase conversion rate from 8% to 12%.'
- Create a Control and Treatment Version: Build two versions of the experience that differ only in the variable you are testing. The control is the current version, and the treatment includes the single change you are testing. Avoid changing multiple variables at once, as this makes it impossible to determine which change caused the result.
- Randomly Assign Customers to Each Group: Ensure that customers are randomly assigned to see either the control or the treatment version. Random assignment is essential for eliminating confounding variables. Both groups should be large enough that a difference of the size stated in your hypothesis can be distinguished from chance.
- Measure the Difference in Behavior: Track the metric specified in your hypothesis for both groups over a sufficient time period. Use statistical significance testing to determine whether the observed difference is real or could be due to chance. Do not call the test early if the initial results look good, because stopping at the first favorable reading inflates the chance of a false positive.
- Act on the Results: If the treatment significantly outperforms the control, implement the change for all customers. If there is no significant difference, you have learned that the data do not support your hypothesis, which is valuable information. Document the result either way and move on to the next hypothesis.
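The assignment, sizing, and measurement steps above can be sketched in Python using only the standard library. This is an illustrative sketch under stated assumptions, not a prescribed implementation: the function names, the experiment label "signup-social-proof", the 50/50 split, the 5% significance level, and the 80% power target are all hypothetical choices made for the example. The conversion rates of 8% and 12% come from the sample hypothesis above. The significance check shown is a pooled two-proportion z-test, which is one common option among several.

```python
import hashlib
import math
from statistics import NormalDist


def assign_variant(customer_id: str, experiment: str = "signup-social-proof") -> str:
    """Bucket a customer into 'control' or 'treatment' with a 50/50 split.

    Hashing the (hypothetical) experiment label together with the customer ID
    spreads customers evenly across groups while keeping each customer in the
    same group on every visit.
    """
    digest = hashlib.sha256(f"{experiment}:{customer_id}".encode("utf-8")).hexdigest()
    return "treatment" if int(digest, 16) % 2 else "control"


def sample_size_per_group(p_control: float, p_treatment: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate customers needed in each group to detect the stated lift.

    Uses the standard normal-approximation formula for comparing two
    proportions with a two-sided test. alpha and power are assumed defaults.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    n = (z_alpha + z_power) ** 2 * variance / (p_treatment - p_control) ** 2
    return math.ceil(n)


def two_proportion_z_test(conv_control: int, n_control: int,
                          conv_treatment: int, n_treatment: int) -> tuple[float, float]:
    """Return (z statistic, two-sided p-value) for treatment vs. control."""
    rate_control = conv_control / n_control
    rate_treatment = conv_treatment / n_treatment
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_control + conv_treatment) / (n_control + n_treatment)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treatment))
    z = (rate_treatment - rate_control) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Decide the sample size before the test starts, then stick to it.
    print("needed per group:", sample_size_per_group(0.08, 0.12))

    # Made-up counts for illustration: 80/1000 control, 120/1000 treatment.
    z, p = two_proportion_z_test(80, 1000, 120, 1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
    print("significant at 5%" if p < 0.05 else "no significant difference")
```

Under these assumptions, detecting a lift from 8% to 12% needs roughly 880 customers in each group. Fixing that number before the test starts is one simple way to honor the "do not call the test early" rule: the comparison is run once, when both groups have reached the planned size.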
Grockit, an online education platform, used split testing to evaluate which product features actually improved learning outcomes and customer engagement. Their team would form hypotheses about features they believed would increase student retention, then run controlled experiments where some students saw the new feature and others used the existing product. Many features that the team was confident about turned out to have zero measurable effect.