Multi-Armed Bandit powered recommendation engines

The August edition of Design+AI was jam-packed with theories, terminology, and ideas. We welcomed Inmar Giovoni — current Autonomy Engineering Manager at Uber ATG, and former Head of Data Science at Kobo — to share a case study with us. Inmar talked us through the thinking and decisions her team made when using reinforcement learning to suggest ebooks to their customers. It took a little longer than usual to get this write-up out — here is a roundup of the discussion from the event.


Multi-Armed Bandit

Inmar’s team wanted a data-driven way to determine the optimal arrangement of book carousels on the Kobo website to drive book purchases. This was a challenge in terms of knowing which variables to include on the website, as well as in what arrangement to display them, per customer segment. So, she used a multi-armed bandit algorithm.

Multi-Armed Bandit (MAB) algorithms are a form of reinforcement learning. The MAB problem is named after slot machines, a.k.a. one-armed bandits. Imagine you are at the casino in front of a row of slot machines. You want to maximize your winnings and have a limited amount of money to gamble. Since you have no prior knowledge about which machines pay out more often, you just start playing them, adjusting which machines you play, in what order, and how often — in order to bias towards playing the machines that maximize your reward. That is the concept behind MAB: you try a number of options, but once you have a sense of which one is offering the most success, you play that one more than the others (also known as the exploration/exploitation trade-off — more on that below).
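To make the slot machine story concrete, here is a minimal sketch of one classic MAB policy, epsilon-greedy. The payout probabilities are made up for illustration and are hidden from the player; the policy mostly plays the machine with the best observed average (exploit) and occasionally picks one at random (explore):

```python
import random

def epsilon_greedy(true_payouts, epsilon=0.1, pulls=10_000, seed=42):
    """Simulate an epsilon-greedy player on a row of slot machines.

    true_payouts: each machine's win probability (unknown to the player).
    epsilon: fraction of pulls spent exploring at random.
    """
    rng = random.Random(seed)
    n = len(true_payouts)
    counts = [0] * n    # times each machine was played
    values = [0.0] * n  # running mean reward per machine
    total_reward = 0
    for _ in range(pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(n)  # explore: pick any machine
        else:
            arm = max(range(n), key=values.__getitem__)  # exploit: best so far
        reward = 1 if rng.random() < true_payouts[arm] else 0
        counts[arm] += 1
        # Incremental update of the running mean for this machine.
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward
    return counts, total_reward

counts, reward = epsilon_greedy([0.02, 0.05, 0.10])
# After enough pulls, most plays concentrate on the highest-paying machine.
```

Epsilon-greedy is just one of several MAB policies (others include UCB and Thompson sampling); the write-up doesn't say which one Kobo used.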

So, Inmar’s team took this method and applied it to organizing the variables on their e-book website — to figure out which book carousel combinations and recommendations resulted in more online book sales, per customer segment.

MAB is different from (better than!) A/B testing in two major ways.

One of the most exciting things about MAB is that it enables you to boost intended outcomes while testing is still live. For Inmar’s team, this meant that once a particular combination of book carousels (genre, popular now, recommended for you) started to perform well, the algorithm would boost that combination and decrease the number of customers seeing combinations that were performing less well. This is interesting for companies because money is not lost during testing. In contrast, with A/B testing, even if option B is yielding a less desirable outcome (fewer sales), half of customers are shown this option until the end of the test — meaning that the company potentially misses out on sales it could have captured by sending those customers to option A.
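This traffic-shifting behaviour can be sketched with Thompson sampling, a common MAB policy (not necessarily the one Kobo used). The two layouts and their purchase rates below are hypothetical; the point is that the algorithm reallocates visitors toward the better layout while the "test" is still running, instead of splitting traffic 50/50 to the end:

```python
import random

rng = random.Random(0)

# Hypothetical purchase rates for two carousel layouts (unknown to the algorithm).
true_rate = {"layout_a": 0.06, "layout_b": 0.03}
successes = {k: 0 for k in true_rate}  # visits that ended in a purchase
failures = {k: 0 for k in true_rate}   # visits that did not
shown = {k: 0 for k in true_rate}

for visitor in range(20_000):
    # Thompson sampling: draw a plausible purchase rate for each layout
    # from its Beta posterior, then show the layout with the highest draw.
    draws = {k: rng.betavariate(successes[k] + 1, failures[k] + 1)
             for k in true_rate}
    layout = max(draws, key=draws.get)
    shown[layout] += 1
    if rng.random() < true_rate[layout]:
        successes[layout] += 1
    else:
        failures[layout] += 1

# shown["layout_a"] ends up much larger than shown["layout_b"]:
# traffic concentrates on the better-performing layout mid-test.
```

An A/B test over the same 20,000 visitors would have sent 10,000 of them to the weaker layout regardless of the interim results.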

As you might imagine, maximizing profits while testing is desirable for companies. MAB also has healthcare applications — for example, when running clinical trials for a new pharmaceutical drug. Using MAB in this setting means that, if the new drug being tested is producing good results for participants, the trial administrators can allow more participants to receive the actual medicine and fewer to receive the placebo. Thus, MAB has the potential to save lives.

Furthermore, MAB can be useful where there are numerous possible combinations to consider — more than would be possible to test in an A/B test, or very costly and time-consuming to do so. This was the case for the Kobo website: there were so many combinations of ways to display book options that it would have taken too much time to test them all with A/B tests.

Contextual Multi-Armed Bandit

Inmar’s team recognized the diversity of people’s reading tastes and wanted to build more personalization into their algorithm — to make it smarter and more delightful over time. So, they tweaked the MAB into a Contextual Multi-Armed Bandit. Here is how it worked: if I am someone who normally reads mystery novels (bucket A), sometimes reads leadership and management books (bucket B), and once in a blue moon reads a biography (bucket C), the contextual MAB starts to take these preferences into account.

This tweak meant that the algorithm would generally show me books and carousels matching the segments I am closest to (buckets A and B), but occasionally it would recommend books from categories that I very rarely choose (bucket C). This starts to mimic what people do in real life and allows for the serendipity of discovering books I might be interested in but that fall outside my usual buying behaviour.
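A simple way to sketch the contextual idea is to run a separate epsilon-greedy policy per reader context, so the "best arm" depends on who is browsing. The contexts, categories, and click-through rates below are invented for illustration; real contextual bandits typically use richer user features than a single segment label:

```python
import random
from collections import defaultdict

rng = random.Random(7)

CATEGORIES = ["mystery", "leadership", "biography"]

# Hypothetical click-through rates per reader context (unknown to the bandit).
TRUE_CTR = {
    "mystery_fan": {"mystery": 0.30, "leadership": 0.10, "biography": 0.02},
    "biz_reader":  {"mystery": 0.05, "leadership": 0.25, "biography": 0.08},
}

counts = defaultdict(lambda: {c: 0 for c in CATEGORIES})
values = defaultdict(lambda: {c: 0.0 for c in CATEGORIES})

def recommend(context, epsilon=0.1):
    """Per-context epsilon-greedy: mostly show this segment's best-known
    category, occasionally try one it rarely picks (the serendipity part)."""
    if rng.random() < epsilon:
        return rng.choice(CATEGORIES)
    return max(CATEGORIES, key=lambda c: values[context][c])

for _ in range(10_000):
    context = rng.choice(list(TRUE_CTR))
    cat = recommend(context)
    reward = 1 if rng.random() < TRUE_CTR[context][cat] else 0
    counts[context][cat] += 1
    values[context][cat] += (reward - values[context][cat]) / counts[context][cat]

# Each context converges to its own favourite category, while the
# epsilon fraction keeps surfacing the rarely chosen buckets.
```

A plain (non-contextual) MAB would learn one global ranking of categories; keying the statistics on the context is what makes the recommendations personal.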

Learning about the contextual MAB made me appreciate my Netflix recommendations and Spotify Discover playlists a little bit more.

Some Terminology

Cold Start Problem

What happens when a company wants to recommend a book, a movie, or a song to a new customer — but doesn’t know anything about what the customer likes or dislikes, their behaviours, their purchasing habits, or their routines? That is the cold start problem: the challenge of having no information about a new user when they first join a platform, which makes them difficult to segment. It is also the reason that recommendations improve the longer you use a platform — the better the algorithm knows you and your preferences, the better the recommendations can be.

Exploration / Exploitation

Imagine you’re at a restaurant. Do you order the thing on the menu that you know you’ll like? Or do you risk trying something new in case you’re missing out? At what point do you decide that you’ve tried enough different things and you just want the turkey burger? There is a similar trade-off happening with the MAB. The exploration/exploitation trade-off in reinforcement learning is illustrated by the multi-armed bandit problem, where the algorithm must decide between acquiring new knowledge and maximizing reward.

Where else have you seen the Multi-Armed Bandit used? What about the Contextual MAB? 




In writing this post, I discovered a great podcast on data science and machine learning, with some excellent episodes on the multi-armed bandit and reinforcement learning.


Designing a Smart Car Virtual Assistant

Imagine a smart car virtual assistant that would help you with directions and finding parking. This is exactly what Jane Motz and Geoffrey Hunter of Tribal Scale designed and prototyped. This post shares Jane and Geoff’s top tips for designing a smart car console: considerations around the cognitive load of the driver, entering and exiting the modality, regional linguistic variations, and how to build off what others have already done.

Read More

AI-first and AI-augmented products

This post explores machine learning considerations for two products. The first is Wattpad, an AI-augmented storytelling platform that makes story recommendations on the reader side and assists with story tagging on the writer side. The second is Dango, an AI-first smartphone emoji suggester, which required designers to develop algorithm empathy (the algorithm only knows what you tell it) and required Dango's CEO to hand-label 30,000 data points because an appropriate data set didn’t exist. This post was co-authored by Satsuko VanAntwerp and Scott Wright.

Read More

Design and AI: Where do we start?

Toronto is poised to become a hot spot for the fast-growing field of AI. While most of the current practitioner level conversation is engineering-centric, we’re interested in the intersection of AI and design. How is AI shaping human experiences? What does the work of designers look like in an AI-first future? This post explores a number of topics, including:

  • What do we mean by the term AI?
  • Our fear of sentient AI taking over stems from our sense of self importance as a species.
  • Designing for AI forces us to self reflect and confront the systemic bias in our societies.
  • We'll adjust to AI; soon it will be NBD.
  • Sometimes transparency doesn’t matter (there is likely bias in the data: racism, sexism, etc.)
  • AI has the potential to help us do our jobs better and do more of what we enjoy most.
Read More