- Posted by Intent Media 17 Feb
In machine learning (ML), we build models. These are programs that make decisions based on data. Typically, they learn from some set of past data and are used on new data as it comes in. Over the lifetime of the field, researchers have developed many approaches to learning from data. These techniques (i.e. algorithms) are things you may have heard of: linear regression, naive Bayes, decision trees, and so on.
One of the most important developments in the history of machine learning has been the development of ensemble learning methods. At their simplest, ensemble methods just combine more than one model to make a single decision.
One of the more interesting consequences of using multiple models in an ensemble is that you may not be getting the supposed benefits of an ensemble, even though you nominally have multiple models. How could this happen? A lack of diversity.
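To make this concrete, here's a minimal sketch of a majority-vote ensemble in Python. The models here are made up for illustration: an ensemble of identical models always reaches the same decision as any one of its members, while genuinely different models can outvote a wrong one.

```python
from collections import Counter

def majority_vote(models, x):
    """Ensemble decision: each model casts a vote, and the majority wins."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

# Three hypothetical classifiers for "is this dog cute?"
always_cute = lambda dog: True
cute_if_dressed = lambda dog: dog.get("in_dress", False)
cute_if_small = lambda dog: dog.get("weight_kg", 0) < 10

dog = {"in_dress": True, "weight_kg": 8}

# Three copies of the same model look like an ensemble, but behave as one model:
print(majority_vote([always_cute] * 3, dog))  # True, no matter the dog

# Models that can disagree are the ones that give the ensemble its benefit:
print(majority_vote([always_cute, cute_if_dressed, cute_if_small], dog))  # True
```

The first call shows the failure mode described above: with no diversity among members, the "ensemble" is just one model repeated.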
Let’s take as an example a simple classification problem. We’ll try to determine if my dog looks cute in this dress:
I’ve asked a bunch of people, and not one of them has ever said that she looks anything but very, very cute in this dress. So while we have multiple decision makers, we’re not actually seeing a diversity of opinion.
The potential for a lack of diversity has all sorts of implications for the use of ensemble models. Put in terms of code, you can view the situation above as analogous to the following two models:
(defn model-1 [dog]
  (and (is-dog-cute? dog) (is-dog-in-a-dress? dog)))

(defn model-2 [dog]
  (and (is-dog-in-a-dress? dog) (is-dog-cute? dog)))
Even if you don’t know Lisp, I think it’s not too hard to see that these two models are exactly equivalent: each one will always return the same decision given the same data. Given a dog who is cute and is in a dress, both functions will always return true. This means that we don’t really have two models here; we only have one, written in two different ways.
This observation leads to some interesting follow-ons. If these two models are equivalent, just written in a different order, then we could come up with a standard syntax, called a normal form, that always imposes the same ordering on the checks (e.g. first checking whether the dog is cute, then checking whether the dog is in a dress). A normal form for expressing these models makes it trivially easy to see when two models are in fact identical. The process of rewriting multiple models into a normal form so that the redundant ones can be discarded is called reduction.
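Here's a minimal sketch of that reduction idea in Python, under the simplifying assumption that a model is just the list of checks it ANDs together (the names below mirror the Lisp example, but are otherwise made up). The normal form is the checks in a fixed, sorted order; once two models share a normal form, one of them can be discarded.

```python
def normal_form(model):
    """Canonical form: the model's checks in a fixed (sorted) order.
    Since the checks are ANDed together, reordering them doesn't change
    the decision the model makes."""
    return tuple(sorted(model))

def reduce_ensemble(models):
    """Discard any model whose normal form duplicates one already seen."""
    seen = set()
    reduced = []
    for m in models:
        nf = normal_form(m)
        if nf not in seen:
            seen.add(nf)
            reduced.append(m)
    return reduced

# model-1 and model-2 from above, as lists of check names:
model_1 = ["is-dog-cute?", "is-dog-in-a-dress?"]
model_2 = ["is-dog-in-a-dress?", "is-dog-cute?"]

print(len(reduce_ensemble([model_1, model_2])))  # 1: only one distinct model
```

The sorted-tuple trick is doing the work here: it is the "standard syntax" that makes two superficially different models visibly identical.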
So, sometimes we think that we have multiple models, but since they all agree, all of the time, we don’t. What we likely have is just a slower, more complex way of expressing the same decision-making process.
Diversity in startup teams
Zooming out of this somewhat abstract discussion of machine learning, it’s easy to see how these concepts have clear parallels in our working lives. Consider Max Levchin’s infamous lecture on how PayPal succeeded because it lacked diversity:
The notion that diversity in an early team is important or good is completely wrong. You should try to make the early team as non-diverse as possible.
The more diverse the early group, the harder it is for people to find common ground.
While I object to most of what he has to say on this topic, I do understand the challenge he’s trying to characterize. It’s hard to get a small group of strangers to work together to build something that’s never existed before. Developing a shared vision of a startup and its technology is incredibly hard stuff, and anyone who tells you differently probably hasn’t tried to do it.
So Levchin’s view is that you need a team full of nerdy men (no women allowed!) who all reach the same conclusions quickly. This allows them to build trust and get to common ownership of the problems of the startup quicker than a more diverse group. To me, this sounds a great deal like our multiple models reducing to a single model. It also results in an organizational plan that is classist, sexist, and a bit racist, but hey, Levchin made a bunch of money, so who am I to disagree with his perspective?
Instead, I’ll call out the work of the incredible Scott E. Page, in particular, his book The Difference. In it, he systematically defines and develops a theory of diversity and demonstrates its utility in problem solving contexts.
In Page’s formulation, there are diverse perspectives and diverse heuristics. You can think of perspectives as views of the world, whereas heuristics are closer to approaches to a given problem. To understand how the two work, let’s look at the following situation.
These two canine citizens want to exercise their right to participate in the American system of representative democracy. Although they’re both Americans, one from birth and the other through naturalization, no one will let them into their polling station.
So between them, they need to find a way to slip the chains of oppression, make their voices heard, and vote for increased funding for dog parks in Manhattan.
They are, according to the definition that we’ve been using so far, absolutely an ensemble. In terms of perspectives, they do exhibit diverse perspectives. The black dog thinks that they’re not being allowed in to vote because they didn’t bring any bones for bribes, whereas the white dog thinks it’s because they’re not tall enough.
They also exhibit diverse heuristics. The black dog favors just ducking out of their collars and rushing the polling station. The white dog thinks that attempting to persuade a human voter is the best approach and has been begging to be taken in for the past ten minutes.
Whether or not these two ever get to participate in the democratic process, I think it’s pretty clear that they’re likely better off for their diversity of perspectives and heuristics. This diverse ensemble will explore the problem space for potential solutions more broadly than either dog would in isolation.
Of course, the Levchins of the world have a point, in some contexts. If the white dog can convince the black dog to share her perspective, then we no longer have a diverse ensemble. What we likely have is two dogs stacked up on top of each other, in a trench coat and a hat. That is bound to be an impressive feat of balance, and maybe the VCs and the public markets will reward this high-risk, high-return strategy. Maybe the polling volunteers will be won over by this bold new vision of French bulldogs in trench coats. Or maybe it will all come toppling down in a pile of fur and stolen clothes. Such are the risks of homogeneity.
In the follow-up post to this one, I’ll dive into how we can use the example of biological evolution to help us build more diverse technology teams.
Jeff Smith is a data engineer at Intent Media working on large scale machine learning systems. He has a background in AI and bioinformatics. Intent Media is the fifth startup he’s worked at, and it’s easily the most fun one. You can find him tweeting, blogging, doing more blogging, and drawing comics all over the internet.