Harvard Business Review: The Surprising Power of Online Experiments

FEATURE
THE SURPRISING POWER OF ONLINE EXPERIMENTS
GETTING THE MOST OUT OF A/B AND OTHER CONTROLLED TESTS
by Ron Kohavi and Stefan Thomke
SEPTEMBER–OCTOBER 2017 HARVARD BUSINESS REVIEW

THE NEED
When building websites and applications, too many companies rely on subjective opinions rather than hard data to make decisions on everything from new product features to look and feel to marketing campaigns.
THE SOLUTION
Companies should conduct online controlled experiments to evaluate their ideas. Potential improvements should be rigorously tested, because large investments can fail to deliver, and some tiny changes can be surprisingly detrimental while others have big payoffs.
IMPLEMENTATION
Leaders should understand how to properly design and execute A/B tests and other controlled experiments, ensure their integrity, interpret their results, and avoid pitfalls.

We have found that too many organizations, including some major digital enterprises, are haphazard in their experimentation approach, don’t know how to run rigorous scientific tests, or conduct way too few of them. Together we’ve spent more than 35 years studying and practicing experiments and advising companies in a wide range of industries about them. In these pages we’ll share the lessons we’ve gleaned about how to design and execute them, ensure their integrity, interpret their results, and address the challenges they’re likely to pose. Though we’ll focus on the simplest kind of controlled experiment, the A/B test, our findings and suggestions apply to more-complex experimental designs as well.
APPRECIATE THE VALUE OF A/B TESTS
In an A/B test the experimenter sets up two experiences: “A,” the control, is usually the current system and is considered the “champion,” and “B,” the treatment, is a modification that attempts to improve something (the “challenger”). Users are randomly assigned to the experiences, and key metrics are computed and compared. (By contrast, univariable A/B/C and A/B/C/D tests assess several treatments of a single variable at the same time, and multivariable tests assess modifications of several different variables at once.) Online, the modification could be a new feature, a change to the user interface (such as a new layout), a back-end change (such as an improvement to an algorithm that, say, recommends books at Amazon), or a different business model (such as an offer of free shipping). Whatever aspect of operations companies care most about (be it sales, repeat usage, click-through rates, or time users spend on a site), they can use online A/B tests to learn how to optimize it.
Any company that has at least a few thousand daily active users can conduct these tests. The ability to access large customer samples, to automatically collect huge amounts of data about user interactions on websites and apps, and to run concurrent experiments gives companies an unprecedented opportunity to evaluate many ideas quickly, with great precision, and at a negligible cost per incremental experiment. That allows organizations to iterate rapidly, fail fast, and pivot.
COPYRIGHT © 2017 HARVARD BUSINESS SCHOOL PUBLISHING CORPORATION. ALL RIGHTS RESERVED.
In 2012 a Microsoft employee working on Bing had an idea about changing the way the search engine displayed ad headlines. Developing it wouldn’t require much effort (just a few days of an engineer’s time), but it was one of hundreds of ideas proposed, and the program managers deemed it a low priority.
So it languished for more than six months, until an engineer, who saw that the cost of writing the code for it would be small, launched a simple online controlled experiment (an A/B test) to assess its impact. Within hours the new headline variation was producing abnormally high revenue, triggering a “too good to be true” alert. Usually, such alerts signal a bug, but not in this case. An analysis showed that the change had increased revenue by an astonishing 12% without hurting key user-experience metrics; on an annual basis, that would come to more than $100 million in the United States alone. It was the best revenue-generating idea in Bing’s history, but until the test its value was underappreciated. Humbling!
This example illustrates how difficult it can be to assess the potential of new ideas. Just as important, it demonstrates the benefit of having a capability for running many tests cheaply and concurrently, something more businesses are starting to recognize. Today, Microsoft and several other leading companies (including Amazon, Booking.com, Facebook, and Google) each conduct more than 10,000 online controlled experiments annually, with many tests engaging millions of users. Start-ups and companies without digital roots, such as Walmart, Hertz, and Singapore Airlines, also run them regularly, though on a smaller scale. These organizations have discovered that an “experiment with everything” approach has surprisingly large payoffs. It has helped Bing, for instance, identify dozens of revenue-related changes to make each month, improvements that have collectively increased revenue per search by 10% to 25% each year. These enhancements, along with hundreds of other changes per month that increase user satisfaction, are the major reason that Bing is profitable and that its share of U.S. searches conducted on personal computers has risen to 23%, up from 8% in 2009, the year it was launched.
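The “too good to be true” alert in the Bing story can be approximated as a simple guardrail: compute the treatment’s relative lift over control on a metric such as revenue per user, and flag any lift above a threshold for investigation before trusting it. This is a hedged sketch, not Bing’s actual alerting logic, which the article does not describe: the `ArmTotals` structure, the 10% threshold, and the sample totals are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ArmTotals:
    """Aggregated totals for one experiment arm (hypothetical schema)."""
    users: int      # assumed to be positive
    revenue: float

    def revenue_per_user(self) -> float:
        return self.revenue / self.users


def relative_lift(control: ArmTotals, treatment: ArmTotals) -> float:
    """Relative change in revenue per user, treatment vs. control (0.12 = +12%)."""
    baseline = control.revenue_per_user()
    if baseline == 0:
        raise ValueError("control revenue per user is zero; lift is undefined")
    return treatment.revenue_per_user() / baseline - 1.0


def too_good_to_be_true(control: ArmTotals, treatment: ArmTotals,
                        threshold: float = 0.10) -> bool:
    """Flag a lift above a (hypothetical) threshold for investigation.

    True is not a verdict: it means "look for a bug before trusting this
    number", since, as in the Bing story, such alerts usually signal a bug.
    """
    return relative_lift(control, treatment) > threshold


if __name__ == "__main__":
    # Made-up illustrative totals, not Bing data.
    control = ArmTotals(users=50_000, revenue=100_000.0)
    treatment = ArmTotals(users=50_000, revenue=112_000.0)
    print(f"lift = {relative_lift(control, treatment):.1%}")
    print("alert:", too_good_to_be_true(control, treatment))
```

The check only raises a flag; a person still has to investigate. In the Bing case that investigation found no bug, and the lift turned out to be real.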
At a time when the web is vital to almost all businesses, rigorous online experiments should be standard operating procedure. If a company develops the software infrastructure and organizational skills to conduct them, it will be able to assess not only ideas for websites but also potential business models, strategies, products, services, and marketing campaigns—all relatively inexpensively. Controlled experiments can transform decision making into a scientific, evidence-driven process—rather than an intuitive reaction. Without them, many breakthroughs might never happen, and many bad ideas would be implemented, only to fail, wasting resources.
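The A/B test mechanics described earlier, in which users are randomly assigned to a control (“A”) or a treatment (“B”) and key metrics are then computed and compared, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the infrastructure used by Microsoft or the authors: assignment hashes a user ID together with a hypothetical experiment name (one common way to get stable, effectively random assignment), the metric is assumed to be a simple rate such as click-through, and the comparison is a textbook two-proportion z-test. The sample-size helper, with conventional defaults of 5% significance and 80% power, gives a rough sense of how many users a test needs; that number grows quickly as the effect to be detected shrinks.

```python
import hashlib
import math
from statistics import NormalDist


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to "A" (control) or "B" (treatment).

    Hashing the user ID with the experiment name gives each user a stable,
    effectively random bucket, so a returning user keeps the same experience.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # bucket in 0..9999
    return "B" if bucket < treatment_share * 10_000 else "A"


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Compare a rate metric between control A and treatment B.

    Returns (difference B - A, z statistic, two-sided p-value).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return p_b - p_a, z, p_value


def sample_size_per_arm(base_rate: float, relative_lift: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Normal-approximation estimate of users needed per arm to detect a lift."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


if __name__ == "__main__":
    # Illustrative, made-up numbers; "ad-headline-test" is a hypothetical name.
    arms = [assign_variant(f"user-{i}", "ad-headline-test") for i in range(20_000)]
    print("treatment share:", arms.count("B") / len(arms))
    diff, z, p = two_proportion_z_test(500, 10_000, 600, 10_000)
    print(f"diff={diff:.4f}, z={z:.2f}, p={p:.4f}")
    print("users per arm, 20% lift on a 5% rate:", sample_size_per_arm(0.05, 0.20))
```

Hashing rather than flipping a coin on every page view keeps each user in one arm for the whole experiment. The z-test treats users as independent observations, which is an assumption to verify, not a guarantee. With these made-up inputs the sample-size estimate comes to roughly 8,000 users per arm, consistent in scale with the article’s “few thousand daily active users” once a test runs for several days.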