Survival of the Best Fit

From: software-engineer@bestfit.com

Subject: Hiring Algorithm

You asked us how we can hire faster. So we built a hiring algorithm using machine learning. Basically, we will teach a computer to hire like you, but way faster!

How does that work?

That's great.

First, the algorithm will read through past applicants' CVs and whether they were hired or not. It will then learn what makes a candidate good or bad by copying your hiring decision process!

A machine will think like me?

Works for me

The program can’t tell good candidates from bad ones without human input, so we first need to give it a lot of data to learn from.

Where do we get the data?

How much is a lot?

I need your help: can you send me the CVs of all applicants you’ve evaluated so far? Click on the file named "cv_all.zip" on your desktop.

Status: no attachments

Thanks! Machine learning algorithms get more accurate with more data, so here’s what we’ll do: use big tech companies' data. They have huge applicant records, so we can merge our CVs with theirs and train our model! Choose a company you think hires smart people.

Google
Apple
Amazon

That's it! We can now train the algorithm with a lot of past data and put it to use!

Great, let's train it!

calendar.doc

cv_all.zip

best-fit.pdf

HIRING ALGORITHM

From: software-engineer@bestfit.com

Subject: Hiring Algorithm Ready!

Our algorithm has been trained. Your job is automated now, so enjoy the ride!

Great.

From: software-engineer@bestfit.com

Subject: Hiring Algorithm

We're trying to figure out what's wrong with the algorithm.

Let's break down its decisions by orange and blue?

Here they are; what do you think?

[Charts: accepted orange/blue makeup; rejected orange/blue makeup; average orange person performance; average blue person performance]

We're rejecting more blue people.

This isn't biased.

Let's find out how! Do you remember how we first trained the algorithm?

I sent you my decisions for the algorithm to mimic me.

I don't care, fix it!

Look at our data from manual hiring:

[Charts: accepted orange/blue makeup; rejected orange/blue makeup; average orange person performance; average blue person performance]

I hired a lot more orange people.

I'm sure I wasn't biased!

We should have checked the data.

But the CVs didn't have colors on them!

We should have also checked the quality of the big company dataset you sent me! How am I supposed to understand hiring decisions? I’m a software engineer!

We should've worked together more and been more careful...


Taking A Step Back

As a recruiter, which of these attributes did you value the most?

Education

Experience

Ambition

Skills

This is a simplified simulation, but we hope it got you thinking. Now imagine how difficult the choice is for a real-life recruiter!

When there are many unknowns, people tend to rely on their gut feelings, which are often just an expression of their biases and preconceptions.

You might not be consciously discriminatory, but in the early days of your company you received more highly qualified orange applicants. The unknowns and the biased environment can make well-intentioned decisions biased. This is what the industry needs to realize.

Training the Biased Algorithm

When the engineering team contacted you, they asked for your past decisions (the file cv_all.zip), because a machine learning (ML) algorithm requires human input.

That means if you had biases, the software would replicate them! But what if you were as objective as you could be?
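To make the replication concrete, here is a minimal sketch of a "model" that simply copies past decisions. The records, the skill scores, and the function names are all hypothetical, and the group label is included only for illustration (real CVs in the simulation carry no color). It is a toy stand-in, not the algorithm used in the simulation.

```python
# Hypothetical past decisions: (skill score 0-10, group, hired?).
# In this toy data the recruiter hired orange applicants at a lower skill bar.
history = [
    (8, "orange", True), (6, "orange", True), (5, "orange", True), (3, "orange", False),
    (8, "blue", True), (6, "blue", False), (5, "blue", False), (3, "blue", False),
]

def learn_thresholds(records):
    """For each group, learn the lowest skill score that was ever hired."""
    lowest_hired = {}
    for skill, group, hired in records:
        if hired:
            lowest_hired[group] = min(skill, lowest_hired.get(group, skill))
    return lowest_hired

def predict(thresholds, skill, group):
    """Hire if the applicant meets the bar learned for their group."""
    return skill >= thresholds[group]

thresholds = learn_thresholds(history)
print(thresholds)                       # the bar learned for each group
print(predict(thresholds, 6, "orange"))  # same skill score...
print(predict(thresholds, 6, "blue"))    # ...different outcome
```

The model never decides to treat the groups differently. It only reproduces the different bars that were already present in the decisions it was given.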

Your data alone wasn't enough to build an ML algorithm, because machine learning generally needs large amounts of data to work well. That’s why the software engineer asked you to choose a larger dataset. The problem is that datasets are built by humans, who tend to be biased.

In this case, the larger dataset consisted largely of applicants from Orange Valley, where more people were historically allowed to work in tech.

Behind the Difficulties

Can we disregard race or gender?

ML algorithms have no switch for ignoring race or gender, and the input data is often difficult to debias. While applicants don’t put race or gender explicitly on their CVs, these attributes often show through the colleges, communities, and organizations that certain demographics tend to belong to.
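One rough way to see this leak is to check how well a visible CV field predicts the hidden group. The sketch below assumes a made-up applicant list and made-up college names, and `proxy_accuracy` is a hypothetical helper, not part of any library.

```python
from collections import Counter, defaultdict

# Hypothetical applicants: (college listed on the CV, group that is NOT on the CV).
applicants = [
    ("Orange Valley U", "orange"), ("Orange Valley U", "orange"),
    ("Orange Valley U", "orange"), ("Orange Valley U", "blue"),
    ("Blue Hills College", "blue"), ("Blue Hills College", "blue"),
    ("Blue Hills College", "blue"), ("Blue Hills College", "orange"),
]

def proxy_accuracy(rows):
    """How often can the hidden group be guessed from the visible feature alone?"""
    groups_by_feature = defaultdict(Counter)
    for feature, group in rows:
        groups_by_feature[feature][group] += 1
    # Guess the most common group for each feature value and count correct guesses.
    correct = sum(counts.most_common(1)[0][1] for counts in groups_by_feature.values())
    return correct / len(rows)

print(proxy_accuracy(applicants))
```

In this toy data the two groups are equally common, so a blind guess would be right half the time. Any score well above that suggests the field acts as a stand-in for the group, even though the group itself was never written on the CV.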

When a program learns from years of data, we need to question the data it is learning from. Just because a decision is presented as 'automated' doesn't mean it is right or objective.
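A simple first check on an automated decision process is to compare acceptance rates across groups. The following sketch uses invented decision records and a hypothetical `acceptance_rates` helper. It assumes the group of each applicant is known for auditing purposes, which is not always the case in practice.

```python
from collections import defaultdict

# Hypothetical audit log: (group, accepted?) for each decision the system made.
decisions = (
    [("orange", True)] * 6 + [("orange", False)] * 4
    + [("blue", True)] * 2 + [("blue", False)] * 8
)

def acceptance_rates(rows):
    """Return the share of accepted applicants for each group."""
    totals = defaultdict(int)
    accepted = defaultdict(int)
    for group, was_accepted in rows:
        totals[group] += 1
        accepted[group] += int(was_accepted)
    return {group: accepted[group] / totals[group] for group in totals}

rates = acceptance_rates(decisions)
# Ratio of the lowest to the highest acceptance rate; 1.0 would mean equal rates.
ratio = min(rates.values()) / max(rates.values())
print(rates, round(ratio, 2))
```

A low ratio does not prove bias by itself, since the groups may differ in other ways, but it is a signal that the data and the decisions deserve a closer look.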

Your startup Bestfit might've failed today, but there's no reason our societies should! Head to our website to learn more. We hope it'll make you discuss and ask questions.

