A new tech tool uncovers ways organizations can eliminate bias during the hiring process.
By Judd B. Kessler and Corinne Low
A growing body of evidence suggests that hiring managers and recruiters display bias against underrepresented minorities. These findings have come from a research method called a “resume audit.” The idea is simple.
Researchers send fake resumes, identical except for the candidate's name, to a large sample of employers; the name signals the candidate's race and gender. One randomly selected group of employers receives a resume with a male or female name from one demographic group, while another group receives the same resume with a name from a different demographic group. Researchers then compare response rates across the demographic groups.
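To make the design concrete, here is a minimal sketch of how audit data might be analyzed, assuming a simple setup in which each employer receives one resume with a randomly assigned name. The records below are invented for illustration.

```python
# Minimal sketch of a resume-audit analysis. The records are invented; in a real
# audit, each entry would be one employer who received a resume with a randomly
# assigned name that signals race and gender.
from collections import defaultdict

responses = [
    {"name_group": "white_male", "callback": True},
    {"name_group": "white_male", "callback": False},
    {"name_group": "black_female", "callback": False},
    {"name_group": "black_female", "callback": True},
    # ... thousands more employers in a real study
]

# Compare callback rates across the randomly assigned name groups.
tallies = defaultdict(lambda: [0, 0])  # group -> [callbacks, resumes sent]
for record in responses:
    tallies[record["name_group"]][0] += int(record["callback"])
    tallies[record["name_group"]][1] += 1

for group, (callbacks, sent) in tallies.items():
    print(f"{group}: callback rate = {callbacks / sent:.1%}")
```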
A prominent example is a 2004 study of racial bias titled "Are Emily and Greg more employable than Lakisha and Jamal?" by Marianne Bertrand and Sendhil Mullainathan. The researchers asked whether a resume bearing the name Jamal received fewer voicemails inviting the candidate to interview than the identical resume bearing the name Greg. Indeed, it did.
The resume audit method hasn’t changed much since 2004. But despite its longevity and success at uncovering discrimination, this approach has a few major limitations. The first is that it requires deception. Hiring managers and recruiters do not know they are participating in a research study in which fake resumes are being passed off as real ones. It is hard enough to sort through hundreds, thousands, or even hundreds of thousands of resumes to identify potential recruits. That task is made even more arduous when some of those resumes are fake.
The second limitation is that researchers can only observe the decision to “call back” a candidate, that is, to follow up with an invitation to interview. But making it to the interview stage may not be a good measure of how much a firm likes a candidate. Since no one wants to waste time and resources pursuing a candidate who would not end up taking the job, calling a candidate back reflects both how desirable the candidate is and how realistically “gettable” the candidate seems.
As a case in point, a recent resume audit study varied employment status on the resume instead of the name to measure the effect of unemployment on callback rates, and found that firms called back unemployed candidates at higher rates than employed ones. Presumably, experience has taught hiring managers that unemployed candidates are more likely to be responsive to job offers, and thus better targets for their recruitment energy.
The third limitation of the resume audit method is that researchers can only study the hiring practices of firms that respond to unsolicited resumes. Many firms hire through partnerships with schools or other feeder organizations, automatically reducing the sample size of potential study participants. Researchers cannot send too many fake resumes to one firm for fear of being discovered, and cannot learn all that much from any single company.
But now, along with Colin Sullivan, a postdoctoral candidate at Stanford, we have developed a new approach to measuring firm preferences and detecting bias in hiring practices: incentivized resume rating (IRR). Rather than putting the interests of firms and researchers in conflict, IRR aligns them. Employers are invited to evaluate a set of resumes that they know to be hypothetical. However, there is good reason for hiring managers and recruiters to evaluate these hypothetical candidates carefully: Their responses are used to match them with real candidates.
IRR sifts through piles of real resumes using the data generated from each employer’s ratings of 40 hypothetical candidates. It does this by feeding those ratings to a machine learning algorithm that identifies what characteristics each firm is looking for in job candidates (for example, a prestigious summer experience, a high GPA, or specific majors) and then finds the best matches among available candidates. The more carefully hiring managers and recruiters evaluate the hypothetical resumes, the better the algorithm will be at finding real candidates who are likely to be a match.
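One way to picture the matching step, as a simplified sketch rather than the study's actual algorithm, is a linear preference model: regress an employer's ratings of hypothetical resumes on candidate characteristics, then score real candidates with the learned weights. The features, ratings, and candidates below are invented for illustration.

```python
# Simplified sketch of the IRR matching idea. The features, ratings, and candidates
# are invented; the study's algorithm is more sophisticated than this linear model.
import numpy as np

# Each hypothetical resume is encoded as features: [GPA, prestigious_internship, relevant_major]
X_hypothetical = np.array([
    [3.9, 1, 1],
    [3.4, 0, 1],
    [3.7, 1, 0],
    [3.2, 0, 0],
    # ... 40 hypothetical resumes per employer in the actual study
])
ratings = np.array([9.0, 6.5, 7.5, 4.0])  # the employer's ratings of those resumes

# Step 1: learn what this employer values from its ratings of hypothetical resumes.
X = np.column_stack([np.ones(len(X_hypothetical)), X_hypothetical])
weights, *_ = np.linalg.lstsq(X, ratings, rcond=None)

# Step 2: score real candidates on the same (non-demographic) features.
real_candidates = np.array([
    [3.8, 1, 1],
    [3.5, 1, 0],
])
scores = np.column_stack([np.ones(len(real_candidates)), real_candidates]) @ weights
ranking = np.argsort(scores)[::-1]  # recommend the highest-scoring candidates first
print(scores, ranking)
```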
This first implementation of IRR invited firms engaging in on-campus recruiting at the University of Pennsylvania to evaluate hypothetical resumes created from real resume components (GPA, major, internships, extracurricular activities, and skills taken from the resumes of real Penn students) along with names that indicated race and gender. These components could be randomly combined in hundreds of millions of ways, allowing IRR to identify the influence of individual candidate characteristics on employer preferences. The responses were then used to recommend real Penn graduating seniors who fit each organization’s needs.
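The "hundreds of millions of ways" comes from simple multiplication: even modest banks of components multiply quickly once they are crossed with one another and with names. Here is a toy illustration, with invented component lists far smaller than the real ones.

```python
# Toy illustration of assembling hypothetical resumes from randomly combined components.
# The component lists are invented and much smaller than the real banks; the point is
# that the number of possible resumes is the product of the options in each slot.
import random
from math import prod

components = {
    "gpa": [round(3.0 + 0.1 * i, 1) for i in range(11)],             # 3.0 through 4.0
    "major": ["Computer Science", "Economics", "Nursing", "English"],
    "internship": ["Bulge-bracket bank", "Local nonprofit", "Tech startup"],
    "activity": ["Varsity athlete", "Debate team", "A cappella group"],
    "skills": ["Python", "Stata", "Excel"],
}
names = ["Emily Walsh", "Jamal Washington", "Maria Hernandez", "Greg Baker"]  # signal race and gender

# Number of distinct resumes possible with these toy lists: 11 * 4 * 3 * 3 * 3 * 4 = 4,752.
print(prod(len(options) for options in components.values()) * len(names))

def make_hypothetical_resume(rng: random.Random) -> dict:
    """Randomly combine one option from each slot with a name."""
    resume = {slot: rng.choice(options) for slot, options in components.items()}
    resume["name"] = rng.choice(names)
    return resume

rng = random.Random(0)
batch = [make_hypothetical_resume(rng) for _ in range(40)]  # 40 hypothetical resumes per employer
```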
What did the research find? Through surveys, the firms recruiting at Penn expressed a seemingly genuine desire to hire diverse candidates. And yet, IRR identified ways they might be handicapping their own efforts.
- Firms hiring in STEM fields rated candidates with female and minority names significantly lower than candidates with white male names. In fact, a female or minority candidate with a 4.0 GPA received the same rating as a white man with a 3.75 GPA (a rough illustration of this GPA-equivalent trade-off follows this list). Female and minority candidates also received less credit for prestigious internships from firms in all fields. The results suggest that these biases are most likely subconscious and become more pronounced when evaluators are fatigued from rating many resumes in a row.
- The IRR diagnostic tool asked about a firm’s interest in hypothetical candidates and the likelihood those candidates would accept a job if offered. IRR found that firms expect women to be harder to hire.
- IRR findings also showed that firms were not actively seeking out female and minority candidates. At best, firms hiring in the social science and business fields displayed no preference for diverse candidates, while firms hiring STEM candidates displayed a bias against them.
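To see how a rating penalty translates into GPA terms, consider a back-of-the-envelope calculation. The coefficients below are invented purely for illustration; only the resulting quarter-point gap mirrors the finding reported above.

```python
# Back-of-the-envelope conversion of a name-based rating penalty into GPA points.
# Both coefficients are invented for illustration; only the 0.25-point gap mirrors
# the finding reported above.
rating_gain_per_gpa_point = 0.8              # hypothetical: extra rating per 1.0 of GPA
rating_penalty_for_nonwhite_male_name = 0.2  # hypothetical: rating lost to a biased evaluation

gpa_equivalent_penalty = rating_penalty_for_nonwhite_male_name / rating_gain_per_gpa_point
print(gpa_equivalent_penalty)  # 0.25 -> a 4.0 candidate is rated like a 3.75 white male candidate
```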
Given this potential mismatch between the goals of the firm and the reality in the trenches, IRR can deliver big benefits for organizations interested in investigating, and improving, their hiring practices. IRR could serve as a useful diagnostic for firms checking internally whether individuals display subconscious bias when evaluating resumes.
In particular, firms could have their hiring managers use the diagnostic tool and then work with researchers to analyze the resulting IRR data to identify whether bias is present. But firms could also go beyond diagnosis and use the tool to help correct any bias, by using the preference data elicited by IRR, stripped of any influence of race and gender, to screen real candidates. As was done in the study at Penn, the tool could identify which candidate characteristics (such as education, work experience, or skills) the firm particularly values and then screen real candidates for those characteristics while ignoring names and any other indicators of race and gender that might trigger bias.
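As a sketch of what that blind screening might look like in practice, assuming hypothetical field names and weights, the demographic fields simply never enter the score.

```python
# Sketch of screening real candidates with IRR-elicited preferences while blinding the
# score to name, race, and gender. Field names, weights, and applicants are hypothetical.

# Weights elicited from the firm's ratings of hypothetical resumes (e.g., via a model
# like the one sketched earlier).
elicited_weights = {"gpa": 0.8, "prestigious_internship": 1.5, "relevant_major": 1.0}

BLINDED_FIELDS = {"name", "race", "gender"}  # never consulted when scoring

def score_candidate(candidate: dict) -> float:
    """Score a real candidate on valued characteristics only, ignoring demographic fields."""
    return sum(weight * candidate.get(field, 0)
               for field, weight in elicited_weights.items()
               if field not in BLINDED_FIELDS)

applicants = [
    {"name": "Jamal Washington", "gpa": 3.9, "prestigious_internship": 1, "relevant_major": 1},
    {"name": "Greg Baker", "gpa": 3.6, "prestigious_internship": 1, "relevant_major": 0},
]
shortlist = sorted(applicants, key=score_candidate, reverse=True)
print([a["name"] for a in shortlist])
```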
Even when there is no evidence of bias, IRR can be a useful tool for firms to learn whether leadership priorities have been properly communicated to hiring managers and recruiters. Having both executives and front-line staff members evaluate resumes using the IRR diagnostic tool allows the firm to identify whether the two groups place the same value on candidate characteristics such as applicant education, work experience, or skills. Finding that the groups’ preferences diverge would allow the firm to improve their alignment in recruiting.
IRR allows a peek under the hood at firms engaging in on-campus recruiting. These big, prestigious firms value diversity, but major roadblocks remain.
Corinne Low and Judd B. Kessler are professors of business economics and public policy at the University of Pennsylvania’s Wharton School.