Statistical and human-centered approaches to search engine improvement

James Rubinstein
May 8, 2020 · 6 min read


For search product teams, search is like baseball, not golf.

In golf, you get a little closer to the hole with every stroke (unless you are as bad at golf as I am).

[Image: search is like baseball; the more chances we take, the more likely we'll be successful. Photo by Chris Chow on Unsplash]

In baseball, you come up to the plate never knowing if it will be a hit, a walk, a strikeout, or a home run. So the goal is to get as many "at-bats" as possible: the more chances you take, the more opportunities you'll have for that home run. For search teams, working on relevance improvement is the same. We only get so many chances to improve relevance for our users, and the more chances we take, the more likely it is we'll make real improvements to our search engine. Not to torture the metaphor too far, but there are a couple of distinct strategies for making search improvements, like knowing which pitch to swing at and when to bunt.

One strategy for improving a search engine doesn't even require that you speak a word of the language your search engine operates in. Simply observe the behaviors of your audience and tune your algorithms to maximize those behaviors. We'll call that the "statistical approach" to search, though you could call it "metrics-centered" or "metrics-driven." It's a method where we can leverage tools like machine learning and treat our users as a black box. The metrics-centered approach is concerned with one thing: improving metrics (such as DCG or MAP). How we get there is irrelevant. That makes the metrics chosen when tuning a search engine this way paramount; they must accurately reflect the user experience. At eBay, we found that if we chose "number of items sold" as our metric of success, we would often end up promoting cheaper accessories. I'll address metric selection in a subsequent blog post.
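To make that concrete, here is a minimal sketch of how a graded-relevance metric like DCG (and its normalized cousin, NDCG) can be computed for a single query's results. The grade scale, the cutoff, and the log-base-2 discount are the conventional choices, not any particular team's implementation:

```python
import math

def dcg(relevance_grades, k=10):
    """Discounted Cumulative Gain for the top-k results.

    relevance_grades: graded judgments for the ranked results, e.g.
    3 = perfect, 2 = good, 1 = fair, 0 = bad. Each grade is discounted
    by the log of its rank, so relevant results near the top count far
    more than relevant results buried further down.
    """
    return sum(
        (2 ** grade - 1) / math.log2(rank + 2)   # rank is 0-based here
        for rank, grade in enumerate(relevance_grades[:k])
    )

def ndcg(relevance_grades, k=10):
    """DCG normalized by the ideal (sorted) ordering, so scores fall in 0..1."""
    ideal = dcg(sorted(relevance_grades, reverse=True), k)
    return dcg(relevance_grades, k) / ideal if ideal > 0 else 0.0

# Example: a query where the best result sits at rank 3
print(ndcg([1, 0, 3, 2, 0]))   # ~0.62, so there's headroom to improve
```

A metrics-driven program then becomes a loop: change a ranking input, re-score a judged query set with a metric like this, and keep the change only if the metric moves up.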

The other strategy of search improvement requires a deep understanding of the users, the tasks they are trying to accomplish, and how best to address those needs. This is the human-centered or user-centered approach.

The tools of the user-centered approach to search relevance improvement are very similar to any other product's: talking to users, understanding their needs, finding points of failure. What's different about a user-centered approach to relevance is that you need to be able to tie those findings back to technical issues. An understanding of the types of tasks your users are doing is vital, but so is understanding how a search engine works.

For search to operate most effectively, you need both approaches to be working well, independently and in conjunction. The statistical approach should be tuning multiple inputs, looking for signals, and tying those signals to specific outcomes. The human approach should be talking to users, finding problems, looking at user logs, understanding the magnitude of those problems, and finding solutions. The solution might be one (or more) of the statistical approaches. It might be something new entirely. Sometimes it will be simple, sometimes it will be complex.

I’ve used an eBay example previously, so please forgive me if I use another. One of the most successful algorithm tweaks from my time there came from an insight a couple of the ‘business’ guys in the UK had. They noticed that when a user searched for an appliance, say a refrigerator, they were swamped with parts and accessories and couldn’t find any actual fridges. It’s also easy to tell an appliance from a part, because there is a big difference in price. So why not boost the items that are in the correct price bracket? A solid user-centered insight! Then comes the technical challenge: how do you figure out the right price bracket for every query? First, break the problem down: what’s the right price bracket per category, and what’s the right category per query? We already had the latter, but building the former took some serious effort, leveraging user logs to understand which price points were the right ones in each category. It wasn’t done just by looking at how many items were purchased or clicked; it took some machine learning on prices, item aspects, and so on to really understand what the right price was. The statistical approach in action! This algorithm change ended up delivering a big win for eBay and for users, who no longer had to slog through pages of accessories to find the items they wanted, because the user-centered and metrics-centered approaches worked in conjunction.
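To illustrate the shape of that solution, here is a toy sketch of a category-level price-bracket boost at rescoring time. The bracket values, field names, and simple multiplicative boost are illustrative assumptions on my part, not eBay's actual implementation:

```python
# Illustrative sketch of a category-level price-bracket boost.
# The brackets, field names, and boost factor are assumptions for
# illustration, not the production eBay algorithm.

# Learned offline (e.g. from purchase logs and ML over price distributions):
# the price range in which "real" items for a category tend to fall.
PRICE_BRACKETS = {
    "refrigerators": (300.0, 3000.0),   # actual appliances
    "fridge_parts":  (5.0, 150.0),      # shelves, filters, handles...
}

def price_boost(item_price, predicted_category, boost=2.0):
    """Boost items whose price falls inside the bracket learned for the
    category the query was classified into; leave everything else alone."""
    bracket = PRICE_BRACKETS.get(predicted_category)
    if bracket is None:
        return 1.0
    low, high = bracket
    return boost if low <= item_price <= high else 1.0

def rescore(items, predicted_category):
    """Multiply each item's base relevance score by its price boost."""
    return sorted(
        items,
        key=lambda it: it["score"] * price_boost(it["price"], predicted_category),
        reverse=True,
    )

# Query "refrigerator" -> classified as "refrigerators"; a $12 water
# filter no longer outranks an $899 fridge with a similar text score.
results = rescore(
    [{"title": "Water filter", "price": 12.0, "score": 1.1},
     {"title": "Frigidaire 18 cu ft fridge", "price": 899.0, "score": 1.0}],
    "refrigerators",
)
print([r["title"] for r in results])
```

The hard part, as the story above suggests, isn't this rescoring step; it's learning good brackets per category and a good category per query from user behavior.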

A generalization of this combined approach might go like this: a PM talks with UX researchers who say, “five people we’ve brought into the lab have said X is a problem in search.” The PM talks to some users to better understand problem X and formulates some hypotheses about solving it. She then talks to the Analytics team to understand how big a problem X is, and it turns out that queries with X have very low DCG and account for 20% of searches! Now that we have an opportunity size, our PM can properly prioritize solving X. Next, we have to talk to the engineering team about how hard it will be to solve X (but that’s another post for another time). In our real and hypothetical examples, we used both human-centered and metrics-driven approaches to understand, size, and solve the problem.
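For the sizing step, here is a minimal sketch of the kind of calculation the Analytics team might run over query logs. The log schema and the has_problem_x flag are hypothetical; in practice that flag would come from a classifier or a query-pattern rule:

```python
# Hypothetical opportunity sizing from query logs. The field names and
# the has_problem_x flag are illustrative assumptions.

def size_opportunity(query_log):
    """query_log: iterable of dicts like
    {"query": "...", "dcg": 0.42, "has_problem_x": True}

    Returns the share of traffic affected by problem X and the average
    DCG for affected vs. unaffected queries, which together give a rough
    sense of how much headroom fixing X could unlock."""
    affected = [q for q in query_log if q["has_problem_x"]]
    others   = [q for q in query_log if not q["has_problem_x"]]
    if not affected or not others:
        return None

    def mean_dcg(rows):
        return sum(r["dcg"] for r in rows) / len(rows)

    return {
        "traffic_share": len(affected) / (len(affected) + len(others)),
        "dcg_affected": mean_dcg(affected),
        "dcg_other": mean_dcg(others),
    }
```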

Each approach needs the other. Without the user-centered approach, the metrics-driven approach ends up creating a search service that optimizes for metrics without delivering user value. Without the metrics-centered approach, the user-centered approach is just an intuition that may not drive broad user impact.

When these two strategies work together in harmony, search can become an even more powerful lever in improving the quality of the user experience. The best examples of this are organizations where statistical approaches are hand-tuned by humans to solve particular relevance needs. Google was well-known for using human tuning for their search algorithm, though there are many machine-learned or statistical inputs to that algorithm. It’s often up to human team members to determine how much of any one statistical signal makes it into the final algorithm¹²³.

If done right, there is a mix of technology “push” (we have a new LTR algorithm to try) with user-need “pull” (people are struggling to find a product in a sea of accessories for that product). If it sounds like I’m describing roles for engineers and product managers in search organizations, it’s because I am! Search engineers can be championing new forms of technology that represent the cutting edge of the statistical approach, while search PMs are discovering what users truly need and want (which might be different than what they say they need or want).

This is where a collaboration of search engineers and product managers can really shine: frequent discussion of what users need, how best to address those needs, and which technology levers we can pull to address them. It’s not “one or the other”, it’s “yes, and!”

To go back to our baseball metaphor, a human-centered approach to search is like the batter understanding the pitcher’s intent, knowing the pitcher will try to throw a strike on the 3–2 pitch. The statistical approach is knowing that this pitcher throws down-and-away 65% of the time when the count is full. Having both pieces of information gives the batter a really good idea of where to swing, before the pitcher has even wound up.

For our batter (the search team), having a good idea of where to swing means a better chance of releasing a successful algorithm update and a better shot at “knocking it out of the park!” Combining statistical and user-centered approaches to search will increase your odds of hitting a home run with your relevance program, too!

I owe a big thank you to Daniel Tunkelang for reading this article and providing useful feedback.


Written by James Rubinstein

Search nerd, data nerd, and all-around nerd-nerd. He has worked at eBay, Apple, and Pinterest, and currently leads the Product Analytics team at LexisNexis.
