The Launch Review: bringing it all together …
We’ve talked about statistical vs. human approaches to search, we’ve talked about metrics, and we’ve talked about A/B testing. Now we’re going to bring it all together in something I like to call “the Launch Review.”
The Launch Review is simple; it’s the meeting where we decide to launch a search algorithm (or any other experiment, really). It’s also complex; there are many different priorities, stakeholders, and agendas that must be considered when deciding to launch. Experiments don’t happen in a vacuum, and that’s especially true for search experiments that can majorly impact the user experience of your product/app/website.
Let’s start with the purpose. The purpose of a launch review is to decide whether to launch an experimental treatment to 100% of users. It is a decision point. As such, it’s critical to have as much information as possible upon which to base that decision.
The launch review should bring together whatever data you have. For search algorithm changes, that means you will need human relevance evaluation results and A/B test results at the very least. Ideally, you also want any other relevant feedback, such as survey responses (if you aren’t getting feedback on your search page, what are you doing with your life?).
Collect all that sweet, sweet data in one document. We called this (creatively) a Launch Review Doc. The purpose of the launch review doc is to get all the data in one place. It should start with an explanation of the problem to be solved, then cover the experimental change being made, any early explorations, technical considerations, the hypothesis, the methods, and the results. Here’s a handy template for a launch review doc. Some of this, such as the hypothesis, should be written by the experiment owner before the experiment begins. The experiment owner might be a PM, an engineer, or a data scientist, depending on your organization. This makes the launch review doc a “living” document that can be updated as time goes by. However, the hypothesis and success criteria should be fixed before the experiment begins.
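If it helps to see that structure spelled out, here is a minimal sketch of a launch review doc as a Python data structure. This is not the template itself, just one way to picture it: the class name, field names, and the `start_experiment` method are all my own assumptions for illustration. The one rule it enforces is the one above: the doc stays editable, but the hypothesis and success criteria lock once the experiment starts.

```python
from dataclasses import dataclass, field

# Hypothetical rule for this sketch: these fields are frozen once the
# experiment has started. Everything else stays "living".
LOCKED_AFTER_START = {"hypothesis", "success_criteria"}


@dataclass
class LaunchReviewDoc:
    # Written by the experiment owner before the experiment begins.
    problem: str
    change: str
    hypothesis: str
    success_criteria: str
    owner: str
    # Filled in and revised as the work progresses.
    early_explorations: str = ""
    technical_considerations: str = ""
    methods: str = ""
    results: dict = field(default_factory=dict)
    started: bool = False

    def __setattr__(self, name, value):
        # Reject edits to the locked fields after the experiment starts.
        if getattr(self, "started", False) and name in LOCKED_AFTER_START:
            raise AttributeError(f"{name} is fixed once the experiment has started")
        super().__setattr__(name, value)

    def start_experiment(self):
        self.started = True

    def add_result(self, source, summary):
        # e.g. source = "human relevance evaluation" or "A/B test"
        self.results[source] = summary
```

In use, you would create the doc up front, call `start_experiment()` when the test goes live, and keep calling `add_result(...)` as data arrives. Trying to rewrite the hypothesis after the fact raises an error, which is the whole point.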
Once data has been collected through HRT and A/B testing, those results go into the launch review doc. With all the data collected in one place, it’s time to have the launch review meeting itself.
Much like the Query Triage meeting, it’s important that the Launch Review have a cross-functional group. You want product, data science, and engineering in the room. However, unlike query triage, you want to keep the meeting size reasonably small. It should be decision-makers only in the room, along with the experiment owner. Everyone else can provide feedback to the decision-makers and be informed after the decision is made.
The approach I like to take for the meeting is to have the experiment owner walk the stakeholders through the document, the results, and the implications. What were we trying to achieve, what were the results, and were we successful? Based on that, the stakeholders can weigh in: does this help us achieve strategic objectives, is it aligned with our brand, does it align with our values as an organization, will it help the company long term? These are all important questions that need to be addressed. It’s important for the experiment owner to have the data to answer them ready, where possible.
When we look at search experiments, results are often mixed. Perhaps top-line results are positive, but only because we’ve improved the experience for established users at the expense of new users. Would we want to ship in that case? Or maybe there are considerations around certain segments. If users who buy expensive items like cars get more engaged, we may increase revenue, even though buyers of housewares show a decrease in purchasing. There may be many more housewares buyers than car buyers, but the overall revenue tilts towards car shoppers. These are just a couple of examples; I’m sure you can think of many more for your own domain.
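To make the cars-versus-housewares case concrete, here is a small sketch with made-up numbers. None of these figures come from a real experiment; the segment sizes, revenue-per-user values, and function names are all invented for illustration. It shows how a top-line revenue gain can sit on top of a loss in the much larger segment.

```python
# Hypothetical data, invented purely for illustration.
# segment name: (users, control revenue per user, treatment revenue per user)
segments = {
    "cars": (1_000, 500.0, 550.0),
    "housewares": (50_000, 4.0, 3.6),
}


def total_revenue(segments, arm):
    """Sum revenue across segments for the given arm ('control' or 'treatment')."""
    idx = {"control": 1, "treatment": 2}[arm]
    return sum(row[0] * row[idx] for row in segments.values())


def relative_lift(control, treatment):
    """Relative change of treatment over control, e.g. 0.10 for +10%."""
    return (treatment - control) / control


# Per-segment view: what each group of users experienced.
for name, (users, control, treatment) in segments.items():
    print(f"{name}: {users} users, lift {relative_lift(control, treatment):+.1%}")

# Top-line view: what the headline metric reports.
overall = relative_lift(
    total_revenue(segments, "control"),
    total_revenue(segments, "treatment"),
)
print(f"overall revenue lift: {overall:+.1%}")
```

With these numbers, car shoppers are up 10%, housewares shoppers are down 10%, and overall revenue is still up a little over 4%, even though fifty times as many users had a worse outcome. Whether that is a "ship" is exactly the kind of question the launch review exists to answer.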
The point here is that you need to have all the data available to the key decision-makers and let them have the discussion, led by the experiment owner and the data science team. The decision might be “ship”, it could be “don’t ship”, or maybe something like “continue collecting data”, or “ship, but only in XYZ category”. The goal is to come to an agreement on what should be done, and to do so with rigor in our decision-making process. Experiments can be messy, and trying to set clear go/no-go goals only in the context of experimentation and statistics misses the reality of business: that we have to be data-informed but not necessarily data-driven. Or maybe that we should be data-driven, but some of that data comes from outside the narrow experimental context.
The launch review meeting is that opportunity. It’s the time and place to lay it all on the table and say “this is the case for and against this product change.” It provides accountability for the experiment-owning team, insight for the stakeholders, and a chance to get input from disparate parts of the organization.
Launch reviews aren’t only for search changes, but search is the domain where we often have the most data. Getting all of that data in one place and being rigorous and explicit in our decision-making process are the keys to having a successful experimentation program.