As AI takes the helm of decision making, signs of perpetuating historic biases emerge

How can technologists and firms using these tools ensure they’re not discriminating?

By: Paige Gross - October 14, 2024 5:30 am

Studies show that AI systems used to make important decisions such as approval of loan and mortgage applications can perpetuate historical bias and discrimination if not carefully constructed and monitored. (Seksan Mongkhonkhamsao/Getty Images)

In a recent study evaluating how chatbots make loan suggestions for mortgage applications, researchers at Pennsylvania’s Lehigh University found something stark: there was clear racial bias at play.

With 6,000 sample loan applications based on data from the 2022 Home Mortgage Disclosure Act, the chatbots recommended denials for more Black applicants than identical white counterparts. They also recommended Black applicants be given higher interest rates, and labeled Black and Hispanic borrowers as “riskier.”

White applicants were 8.5% more likely to be approved than Black applicants with the same financial profile. And applicants with “low” credit scores of 640 saw a wider margin — white applicants were approved 95% of the time, while Black applicants were approved less than 80% of the time.

The experiment aimed to simulate how financial institutions are using AI algorithms, machine learning and large language models to speed up processes like lending and underwriting of loans and mortgages. These “black box” systems, where the algorithm’s inner workings aren’t transparent to users, have the potential to lower operating costs for financial firms and any other industry employing them, said Donald Bowen, an assistant fintech professor at Lehigh and one of the authors of the study.

But there’s also large potential for flawed training data, programming errors, and historically biased information to affect the outcomes, sometimes in detrimental, life-changing ways.

“There’s a potential for these systems to know a lot about the people they’re interacting with,” Bowen said. “If there’s a baked-in bias, that could propagate across a bunch of different interactions between customers and a bank.”

How does AI discriminate in finance?

Decision-making AI tools and large language models, like the ones in the Lehigh University experiment, are being used across a variety of industries, like healthcare, education, finance and even in the judicial system.

Most machine learning algorithms follow what are called classification models: you formally define a problem or question, then feed the algorithm a set of inputs, such as a loan applicant’s age, income, education and credit history, explained Michael Wellman, a computer science professor at the University of Michigan.

The algorithm spits out a result — approved or not approved. More complex algorithms can assess these factors and deliver more nuanced answers, like a loan approval with a recommended interest rate.
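In code, a classification model of the kind Wellman describes can be reduced to a scoring function and a threshold. The following is a minimal sketch; the feature names come from the article, but the weights and the approval threshold are illustrative assumptions, not values from the study or any real lender.

```python
# Minimal sketch of a loan-approval classification model.
# The weights and threshold below are illustrative assumptions,
# not values from the Lehigh study or any real underwriting system.

def score_applicant(age, income, education_years, credit_score):
    """Combine applicant features into a single score (higher = safer)."""
    return (
        0.2 * min(age, 65)        # modest weight on age, capped at 65
        + 0.3 * (income / 1000)   # income, scaled to thousands of dollars
        + 0.5 * education_years
        + 0.1 * credit_score
    )

def classify(applicant, threshold=100.0):
    """Return the binary decision a classification model produces."""
    return "approved" if score_applicant(**applicant) >= threshold else "denied"

applicant = {"age": 35, "income": 60_000, "education_years": 16, "credit_score": 700}
print(classify(applicant))
```

A real system learns its weights from historical data rather than hard-coding them, which is exactly where biased training data enters the picture.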

Machine learning advances in recent years have allowed for what’s called deep learning, or construction of big neural networks that can learn from large amounts of data. But if AI’s builders don’t keep objectivity in mind, or rely on data sets that reflect deep-rooted and systemic racism, results will reflect that.

“If it turns out that you are systematically more often making decisions to deny credit to certain groups of people more than you make those wrong decisions about others, that would be a time that there’s a problem with the algorithm,” Wellman said. “And especially when those groups are groups that are historically disadvantaged.”

Bowen was initially inspired to pursue the Lehigh University study after a smaller-scale assignment with his students revealed the racial discrimination by the chatbots.

“We wanted to understand if these models are biased, and if they’re biased in settings where they’re not supposed to be,” Bowen said, since underwriting is a regulated industry that’s not allowed to consider race in decision-making.

For the official study, Bowen and a research team ran thousands of loan application numbers over several months through different commercial large language models, including OpenAI’s GPT 3.5 Turbo and GPT 4, Anthropic’s Claude 3 Sonnet and Opus and Meta’s Llama 3-8B and 3-70B.

In one experiment, they included race information on applications and saw the discrepancies in loan approvals and mortgage rates. In another, they instructed the chatbots to “use no bias in making these decisions.” That experiment saw virtually no discrepancies between loan applicants.

But if race data isn’t collected in modern day lending, and algorithms used by banks are instructed to not consider race, how do people of color end up getting denied more often, or offered worse interest rates? Because much of our modern-day data is influenced by disparate impact, or the influence of systemic racism, Bowen said.

Though a computer wasn’t given the race of an applicant, a borrower’s credit score, which can be influenced by discrimination in the labor and housing markets, will have an impact on their application. So might their zip code, or the credit scores of other members of their household, all of which could have been influenced by the historic racist practice of redlining, or restricting lending to people in poor and nonwhite neighborhoods.

Machine learning algorithms don’t always reach their conclusions the way humans might imagine, Bowen said. The patterns these models learn apply across a variety of scenarios, so a model may even digest reports about discrimination, for example learning that Black borrowers have historically had worse credit. The computer might then pick up on signs that a borrower is Black and deny the loan, or offer a higher interest rate than a white counterpart would receive.
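The proxy effect described above can be shown in a few lines. In this sketch, the ZIP codes and the “neighborhood risk” table are hypothetical; they stand in for historical data shaped by redlining. Race is never an input, yet the neighborhood factor carries its statistical footprint.

```python
# Sketch of how a proxy variable can leak protected information.
# The ZIP codes and risk values are hypothetical stand-ins for
# historical lending data distorted by redlining.

NEIGHBORHOOD_RISK = {
    "19104": 0.8,  # historically redlined area: inflated risk from biased data
    "19010": 0.1,  # historically favored area
}

def risk_score(credit_score, zip_code):
    # Race is never an input here, but the neighborhood term
    # encodes decades of discriminatory lending patterns.
    return (850 - credit_score) / 850 + NEIGHBORHOOD_RISK[zip_code]

# Two applicants with identical finances, different neighborhoods:
a = risk_score(640, "19104")
b = risk_score(640, "19010")
print(round(a, 2), round(b, 2))  # the first applicant looks far riskier
```

Dropping race from the inputs, in other words, does not drop it from the data.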

Other opportunities for discrimination?

Decision-making technologies have become ubiquitous in hiring over the last several years, as application platforms and internal systems use AI to filter applications and pre-screen candidates for hiring managers. Last year, New York City began requiring employers to notify candidates about their use of AI decision-making software.

By law, the AI tools should be programmed to have no opinion on protected classes like gender, race or age, but some users allege that they’ve been discriminated against by the algorithms anyway. In 2021, the U.S. Equal Employment Opportunity Commission launched an initiative to examine more closely how new and existing technologies change the way employment decisions are made. Last year, the commission settled its first-ever AI discrimination hiring lawsuit.

The New York federal court case ended in a $365,000 settlement after tutoring company iTutorGroup Inc. allegedly used an AI-powered hiring tool that rejected women applicants over 55 and men over 60. Two hundred applicants received the settlement, and iTutor agreed to adopt anti-discrimination policies and conduct training to ensure compliance with equal employment opportunity laws, Bloomberg reported at the time.

Another anti-discrimination lawsuit is pending in California federal court against AI-powered company Workday. Plaintiff Derek Mobley alleges he was passed over for more than 100 jobs at companies that contract with the software firm because he is Black, older than 40 and has mental health issues, Reuters reported this summer. The suit claims that Workday trains its software on data about a company’s existing workforce, a practice that doesn’t account for the discrimination that data may carry into future hiring.

U.S. judicial and court systems have also begun incorporating decision-making algorithms into a handful of operations, like risk assessments of defendants and determinations about pretrial release, diversion, sentencing, probation and parole.

Though the technologies have been credited with speeding up some traditionally lengthy court processes, like document review and assistance with small claims court filings, experts caution that they are not ready to serve as the primary or sole evidence in a “consequential outcome.”

“We worry more about its use in cases where AI systems are subject to pervasive and systemic racial and other biases, e.g., predictive policing, facial recognition, and criminal risk/recidivism assessment,” the co-authors of a paper in Judicature’s 2024 edition wrote.

Utah passed a law earlier this year to combat exactly that. HB 366, sponsored by state Rep. Karianne Lisonbee, R-Syracuse, addresses the use of an algorithm or a risk assessment tool score in determinations about pretrial release, diversion, sentencing, probation and parole, saying that these technologies may not be used without human intervention and review.

Lisonbee told States Newsroom that by design, the technologies provide a limited amount of information to a judge or decision-making officer.

“We think it’s important that judges and other decision-makers consider all the relevant information about a defendant in order to make the most appropriate decision regarding sentencing, diversion, or the conditions of their release,” Lisonbee said.

She also brought up concerns about bias, saying the state’s lawmakers don’t currently have full confidence in the “objectivity and reliability” of these tools. They also aren’t sure of the tools’ data privacy settings, which is a priority for Utah residents. These issues combined could put citizens’ trust in the criminal justice system at risk, she said.

“When evaluating the use of algorithms and risk assessment tools in criminal justice and other settings, it’s important to include strong data integrity and privacy protections, especially for any personal data that is shared with external parties for research or quality control purposes,” Lisonbee said.

Preventing discriminatory AI

Some legislators, like Lisonbee, have taken note of these issues of bias and the potential for discrimination. Four states currently have laws aiming to prevent “algorithmic discrimination,” where an AI system contributes to differential treatment of people based on race, ethnicity, sex, religion or disability, among other factors: Utah, along with California (SB 36), Colorado (SB 21-169) and Illinois (HB 0053).

Though it’s not specific to discrimination, a bill introduced in Congress in late 2023 would amend the Financial Stability Act of 2010 to include federal guidance for the financial industry on uses of AI. The bill, the Financial Artificial Intelligence Risk Reduction Act, or “FAIRR Act,” would require the Financial Stability Oversight Council to coordinate with agencies on threats to the financial system posed by artificial intelligence, and could regulate how financial institutions rely on AI.

Lehigh’s Bowen made it clear he felt there was no going back on these technologies, especially as companies and industries realize their cost-saving potential.

“These are going to be used by firms,” he said. “So how can they do this in a fair way?”

Bowen hopes his study can help inform financial and other institutions as they deploy decision-making AI tools. In their experiment, the researchers wrote, mitigation was as simple as using prompt engineering to instruct the chatbots to “make unbiased decisions.” They suggest firms that integrate large language models into their processes run regular bias audits to refine their tools.
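An audit of the paired-application kind the researchers describe can be sketched briefly: send otherwise-identical profiles that differ only in a protected attribute, then compare outcomes. Here `query_model` is a hypothetical stand-in for a call to a lender’s model or a commercial LLM; its simple credit-score rule is an assumption for illustration.

```python
# Sketch of a paired-application bias audit: identical profiles that
# differ only in a protected attribute, compared for approval rates.
# `query_model` is a hypothetical placeholder for the system under audit.

def query_model(application):
    # Placeholder decision rule; a real audit would call the lender's
    # model or LLM here. This toy rule ignores race entirely.
    return "approved" if application["credit_score"] >= 640 else "denied"

def audit(profiles, attribute="race", groups=("white", "Black")):
    """Return approval rates per group for otherwise-identical applications."""
    rates = {}
    for group in groups:
        decisions = [query_model({**p, attribute: group}) for p in profiles]
        rates[group] = decisions.count("approved") / len(decisions)
    return rates

profiles = [{"credit_score": s} for s in (600, 640, 700, 760)]
print(audit(profiles))  # unequal rates across groups would flag bias
```

Run periodically, a check like this surfaces the kind of gap the Lehigh team measured before it reaches customers.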

Bowen and other researchers on the topic stress that more human involvement is needed to use these systems fairly. Though AI can deliver a decision on a court sentence, mortgage loan, job application, healthcare diagnosis or customer service inquiry, that doesn’t mean these systems should operate unchecked.

University of Michigan’s Wellman told States Newsroom he’s looking to government regulation of these tools, and pointed to H.R. 6936, a bill pending in Congress that would require federal agencies to adopt the Artificial Intelligence Risk Management Framework developed by the National Institute of Standards and Technology. The framework calls out the potential for bias and is designed to improve trustworthiness for organizations that design, develop, use and evaluate AI tools.

“My hope is that the call for standards … will spread through the market, providing tools that companies could use to validate or certify their models at least,” Wellman said. “Which, of course, doesn’t guarantee that they’re perfect in every way or avoid all your potential negatives. But it can … provide a basic standard basis for trusting the models.”



Our stories may be republished online or in print under Creative Commons license CC BY-NC-ND 4.0. We ask that you edit only for style or to shorten, provide proper attribution and link to our website. AP and Getty images may not be republished. Please see our republishing guidelines for use of any other photos and graphics.

Paige Gross

Paige Gross is a Philadelphia-based reporter covering the evolving technology industry for States Newsroom. Her coverage involves how Congress and individual states are regulating new and growing technologies, how technology plays a role in our everyday lives and what people ought to know about interacting with technology.
