By Srivathsan Karanai Margan
There is nothing artificial about artificial intelligence.
Artificial intelligence is developed and designed by humans; therefore, artificial intelligence puts a mirror on humanity.
–Murat Durmus
As artificial intelligence (AI) transforms insurance, unintended biases in data and algorithms can create hidden risks affecting fairness, accuracy, and trust. Understanding and addressing these risks is key to protecting customers and ensuring ethical AI use.
Currently, we may be at the peak of artificial intelligence hype. Every day, we see news stories claiming new AI breakthroughs. For the uninitiated (if such a rare, blissful person exists, they must have been living under a rock for more than a decade), AI comprises a suite of technologies and applications capable of learning, reasoning, adapting, and performing tasks.
Artificial intelligence systems (AIS) are built by humans, trained on the data humans provide, modeled on human needs, and shaped by the feedback humans give. Hence, the algorithms inevitably reflect the characteristics of their creators: us. We humans carry a mélange of biases, both conscious and unconscious, which get baked into these algorithms. Bias has several definitions, and its common usage is decidedly negative, referring to an unreasoned and unfair distortion of judgment in favor of or against a person, idea, or thing when compared with others.
Though the term “bias” often carries a negative and unlawful connotation, in insurance parlance it preserves its original, neutral meaning: the act of risk-based differentiation among customers who represent different risk groups. The advent of new-age connected technologies and the resultant data deluge make it practically impossible for humans to process the data, glean insights on risk, and initiate action, which makes the case for leveraging AIS compelling. However, biased algorithms could produce incorrect insights, outputs, or predictions and, in the process, institutionalize discrimination.
This article discusses bias, why its meaning is different in insurance, the role AI plays with respect to bias in insurance, and how behavioral data could potentially help solve the problem of bias.
On Bias and Discrimination
Bias is defined as a predisposition or preference for one person, idea, or thing over another. Over the last several decades, the term “bias” has gained a negative connotation, implying unfair or prejudicial treatment of people belonging to protected classes based on factors that are prohibited by law. In itself, though, bias is merely a predisposition; it becomes problematic when it leads to outcomes that are unethical, immoral, inappropriate, or discriminatory. Bias can be either conscious or unconscious. When conscious, the person is aware of being biased and acts with intent. Unconscious biases, by contrast, are beliefs and attitudes that operate outside a person's awareness and control; the person may not even be aware of holding them.
The term “bias” is closely related to another term: “discrimination,” which is a narrower concept than bias.
While bias refers to a predisposition or preference, discrimination involves acting on that bias by treating someone differentially. Like bias, discrimination can also be fair or unfair. Not all forms of discrimination are inherently unjust—for instance, differentiating candidates based on skill or experience in a hiring process may be necessary and appropriate. However, discrimination becomes unfair or prejudicial when individuals from protected classes are treated differently based on characteristics such as race and color, gender, religion or belief, national origin, age, disability, genetic information, marital status, sexual orientation, or socio-economic status, rather than their individual merits. The definition of these protected characteristics is not uniform across the world; they vary by jurisdiction and the legal and social context in which the protections are applied.
Discrimination can be either direct or indirect. Direct discrimination occurs when a person is treated less favorably than another person because of a difference in a protected characteristic. Indirect discrimination, on the other hand, occurs when a policy or practice, though seemingly neutral, disproportionately disadvantages a group of people with a protected characteristic.

Bias in the Insurance World
Contrary to conventional belief, bias and discrimination need not always be undesirable. Especially in the insurance industry, these terms still retain their original, neutral meaning as acts of differentiation based on the level of risk represented by the customer. Insurance is a form of risk transfer, where the person or entity purchasing an insurance policy transfers the financial burden of potential losses to the insurance company in exchange for payment of a premium. Insurers must assess the risk being transferred, predict the probability of a covered risk event occurring within the specified contract period, estimate the cost they might incur, and calculate the appropriate premium. It is common knowledge that not all customers have the same risk levels, and hence, charging all customers the same premium is not economically sensible. Besides, such equal treatment, which penalizes low-risk customers by making them pay more and subsidizes the premium for high-risk customers, is not something low-risk customers would appreciate.
The basic philosophy of insurance is to bring relative certainty into what is otherwise a collection of uncertainties: the risk traded, the factors considered to assess it, their accuracy, the prediction process, and the changing nature of risks, perils, and hazards. Because it is difficult to accurately predict the probable frequency and severity of loss for every individual customer within a specific period, insurers assess customers' individual risk profiles and segregate them into homogeneous risk pools based on the differences in those profiles; it is then possible to predict with reasonable accuracy how many risk events might occur within each pool in a specific period. For example, the causal association of smoking with a higher risk of cancer, heart disease, stroke, and various lung diseases is medically well established. It is possible to predict that smokers as a group represent a higher risk class than non-smokers and that, on average, a certain number of additional risk events will occur from this group in a specific period. However, it is impossible to predict with any certainty that an individual customer from the smoker group will be riskier or will face a risk event in a specific period.
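To make the pooling logic concrete, here is a minimal simulation sketch (the pool sizes, event probabilities, and use of Python's random module are illustrative assumptions, not figures from the article): the outcome for any single customer is essentially all-or-nothing, while the claim count for a large homogeneous pool clusters tightly, in relative terms, around its expected value.

import random

random.seed(42)  # fixed seed so the illustration is reproducible

def simulate_claims(n_policyholders, p_event, n_years=200):
    """Simulate total annual claim counts for a pool over many years and
    return the mean and standard deviation of those counts."""
    totals = [sum(random.random() < p_event for _ in range(n_policyholders))
              for _ in range(n_years)]
    mean = sum(totals) / n_years
    sd = (sum((t - mean) ** 2 for t in totals) / n_years) ** 0.5
    return mean, sd

# Hypothetical annual event rates: 4% for smokers, 1% for non-smokers.
for label, n, p in [("single smoker", 1, 0.04),
                    ("pool of 10,000 smokers", 10_000, 0.04),
                    ("pool of 10,000 non-smokers", 10_000, 0.01)]:
    mean, sd = simulate_claims(n, p)
    print(f"{label}: expected events ~ {mean:.2f}, std dev ~ {sd:.2f}")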
The Academy’s Focus on AI and Data Bias
For more than a decade, the Academy has offered public comments, published actuarial analysis, and been actively engaged with federal and state stakeholders around artificial intelligence (AI) and the need to acknowledge and address data bias.
While the issue cuts across practice areas and involves both public policy and professionalism, in recent years the Academy's Data Science and Analytics Committee (DSAC) within the Risk Management and Financial Reporting Council has been the primary source of publications and other educational resources for actuaries and policymakers alike.
DSAC’s 2021 issue paper, Big Data and Algorithms in Actuarial Modeling and Consumer Impacts, highlighted the changes that had occurred since the 2018 monograph, Big Data and the Role of the Actuary. The issue paper offered a comprehensive framework for understanding how data and algorithmic methods can introduce unintended biases into insurance applications and outlined potential consumer impacts.

Building on this foundation, the 2023 issue brief, An Actuarial View of Data Bias: Definitions, Impact, and Considerations, offers a detailed exploration of bias sources across the actuarial modeling lifecycle—from data collection and variable selection to calibration and deployment—while also suggesting methods for mitigation and improved fairness in outcomes. That was complemented by the 2024 issue brief, Discrimination: Considerations for Machine Learning, AI Models, and Underlying Data, exploring fairness, discrimination, and practical approaches to monitoring predictive models in insurance, including bias mitigation strategies.
Looking ahead, the Academy will continue to provide actuarial thought and perspective to policymakers, actuaries, and others in this space. As Margan notes, the nuances of addressing biases in generative AI—and of working with big data—continue to evolve. Our focus will remain on delivering balanced, objective actuarial analysis as technology continues to change at a rapid-fire pace.
To create a customer's risk profile, insurers consider several observable factors related to the customer or entity that is covered, the risk coverage sought, and the environment in which the interplay among these occurs. Though it is optimal to consider only the factors that are causal to risk, it is not always possible to obtain all the causal data. A major challenge insurers face while performing risk assessment and segregation is anti-selection, also called adverse selection or negative selection. Anti-selection arises from information asymmetry, which means one party in a transaction possesses more information than the other. In the context of insurance contracts, all parties are required to act with the highest level of honesty and transparency, making full disclosure of all material facts. Information asymmetry and anti-selection occur when a customer from a higher-risk category withholds or suppresses facts material to the risk assessment in order to obtain insurance at the same price as a customer from a lower-risk category.
Anti-selection undermines the economic balance of insurance transactions. To mitigate this risk and improve predictive accuracy, insurers attempt to build a larger risk profile by including as many factors as possible that are causal to the risk or act as proxies with a high correlation to it. The cornerstone of insurance is to differentiate fairly between groups based on the different risk characteristics they represent. The core tenet is to charge an actuarially fair premium, which means the premium charged is equal to the insurer's expected payout and customers bearing the same risk are charged the same price. Insurers do not have any specific motive or financial incentive to unfairly discriminate against protected classes; they simply have a compelling need to identify all the risk characteristics, predict risk accurately, discriminate fairly, and remain solvent. In many jurisdictions, insurers are exempt from anti-discrimination laws if they can provide an actuarial defense showing that the differentiation is based purely on risk-related factors.
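As a minimal sketch of the actuarially fair premium idea described above (the claim probabilities, benefit amount, and optional expense loading are hypothetical), the pure premium for a pool is simply the expected payout, so customers bearing the same risk pay the same price while pools with different risk levels pay different prices.

def fair_premium(p_event, expected_loss, expense_loading=0.0):
    """Pure (actuarially fair) premium: the insurer's expected payout,
    optionally grossed up by an expense loading."""
    return p_event * expected_loss * (1.0 + expense_loading)

# Hypothetical figures: a 100,000 benefit, 1% annual claim probability for
# the low-risk pool and 4% for the high-risk pool.
print(fair_premium(0.01, 100_000))  # 1000.0 -- every member of this pool pays the same
print(fair_premium(0.04, 100_000))  # 4000.0 -- a riskier pool pays proportionally more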
Even such an actuarial defense could not stand when the use of some protected characteristics was deemed unfairly discriminatory and illegal. For instance, insurers used race and gender to assess risk and charge differential premiums, believing these two characteristics had a very strong statistical association with risk. When that use was legally challenged as an unfairly discriminatory practice, many countries began enacting anti-discrimination laws to prevent it. The insurance industry invoked the principle of actuarial fairness to seek exemption, but the defense did not prevail and, consequently, insurance companies stopped using these characteristics.
Bias in AI
The evolution of AI has been nothing short of phenomenal. From their origins as rule-based systems that helped with automation, AIS have evolved through machine learning algorithms that discover patterns and insights in data, to generative artificial intelligence capable of producing various forms of new content, and now to agentic AI, in which systems can make decisions, take actions, and self-optimize in real time with minimal human intervention.
The core strength of AIS is their ability to identify patterns in data. The algorithms, including the much-hyped generative AI algorithms, are purely statistical in nature and not sentient enough to distinguish between right and wrong. They perform the tasks they are designed for, and unless a specific task requires bias or discrimination, they have no reason to exhibit it. The reward for any algorithm is to perform the mandated task optimally, and bias is not a metric of success. To optimize results, machine learning algorithms may find new correlations and patterns without necessarily considering whether the basis for those relationships is fair or unfair, in the style of the paperclip maximizer. The paperclip maximizer is a thought experiment described by Swedish philosopher Nick Bostrom: an advanced artificial intelligence tasked with manufacturing paperclips would try to turn all matter in the universe, including living beings, into paperclips or into machines that manufacture further paperclips.
For any AIS, bias can arise at any stage of the AI pipeline, which spans three major areas: the data used, algorithm or model design, and user interactions. The efficiency of AIS is tied to the quality, diversity, and representativeness of the data they are trained on; when the training data are outdated, incomplete, biased, or non-representative, the AI model will reflect the same flaws. Algorithm-design bias can stem from programming errors, from how a problem is defined, from biased assumptions, from the criteria used to make decisions, or from unfair weighting of factors in the algorithm's decision-making, each of which may reflect a designer's conscious or unconscious biases. Bias from user interaction may arise when humans evaluate and validate the models or interact with the systems in ways that reflect their own biases.
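A simple data-stage check follows from this. The sketch below, which assumes group labels are available for both the training data and a reference population (the group names and shares are hypothetical), compares group proportions; a large gap flags non-representative training data before any model is fit.

from collections import Counter

def representation_gap(training_groups, population_shares):
    """Compare each group's share of the training data with its share of a
    reference population and return the difference per group."""
    counts = Counter(training_groups)
    total = len(training_groups)
    return {group: counts.get(group, 0) / total - pop_share
            for group, pop_share in population_shares.items()}

# Hypothetical example: urban customers dominate the training data even
# though they make up only 60% of the population being insured.
training_groups = ["urban"] * 800 + ["rural"] * 200
population_shares = {"urban": 0.60, "rural": 0.40}
print(representation_gap(training_groups, population_shares))
# urban ~ +0.2, rural ~ -0.2 -> rural customers are under-represented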
In the last few years, generative AI (GenAI) has completely hijacked the AI narrative. Unlike traditional predictive AI, which operates on pre-existing datasets to recognize patterns and make predictions, GenAI can produce new content such as data, text, images, music, or even videos. This shift introduces a new bias paradigm. The era of organic, human-created data may already be behind us, and we are now in an age dominated by massive volumes of synthetic, system-generated data. As GenAI continues to evolve and its integration with real-world applications increases, its outputs could re-enter the AI pipeline—potentially creating a biased feedback loop. Human-created data now pales in comparison to the sheer volume of AI-generated data. New AIS will be influenced by this system-generated data and will likely reflect the biases subtly absorbed in it.
The advent of agentic AI increases the possibility of algorithmic bias. While GenAI assists in executing a task, agentic AI operates with high autonomy and requires minimal or no human intervention. Depending on the requirement, the agentic AIS may involve either a single AI agent or multiple agents. Multi-agent AI systems employ multiple role-specific AI agents to understand requests, plan workflows, coordinate role-specific agents, streamline actions, collaborate with humans, and validate outputs. These multi-agent AI systems can also be hybrid AI systems that leverage both traditional and generative AI models to create more powerful and versatile systems. In such systems, the bias from one agent could be passed to another as an input to produce new manifestations and undercurrents of bias that are compounded or amplified.

The Changing Nature of Insurance
The insurance industry is synonymous with data and has always relied heavily on it to perform core functions such as risk assessment, classification, pricing, and claims payment. The explosion of connected technologies over the last decade has led to a data revolution. There has been an exponential growth in technologies that enable real-time data access directly from the source, as well as its transmission, storage, and processing. With the advent of wearables, connected devices, connected cars, and home sensors, insurers can now collect data on previously unknown risk characteristics, real-time manifestations of risk, as well as behavioral data. The industry is undergoing a major transformation as insurers are now able to assess risk more accurately, improve risk prediction, proactively intervene to prevent risk losses or promote well-being, and offer incentives for risk-reduction behaviors.
New data sources are now capable of producing highly individualized profiles of customer risk. The data-intensive ecosystem has revolutionized the insurance industry by creating several new customer touchpoints in what was once seen as a low-touch sector. Considering the volume of data that needs to be processed and the speed at which responses are expected, it is becoming extremely difficult for humans to identify patterns in the data and derive insights. As a result, the industry is becoming more dependent on AIS to find patterns and generate actionable insights.
Convergence of the Triad
The insurance industry thrives on bias and discrimination—albeit fair forms of them. With the influx of massive volumes of behavioral data, the need for AIS becomes irrefutable. In the insurance industry, a unique convergence of bias, data, and AI is taking place. The experimentation with or deployment of AI in insurance can be grouped under two broad categories: improving the efficiency and effectiveness of existing activities (e.g., risk analysis, pricing, and claim assessment) or enabling new abilities to perform new functions (e.g., risk monitoring, risk prevention, predictive analytics, and personalized offerings). Across the insurance value chain, the use of AI is producing varying degrees of impact and success.
Considering anti-discrimination laws that prohibit the use of protected-class characteristics, it is unlikely that any insurer will intentionally build an AIS that considers such characteristics or engages in direct discrimination. However, AIS are created with the singular objective of optimizing specific business functions and tasks. As they enhance insurance processes by uncovering previously unseen information and correlations, they may still produce outcomes that are distorted, incorrect, or biased, leading to discriminatory consequences. A customer could be categorized as high-risk on the basis of additional non-causative, merely correlative, or irrelevant risk factors. The huge volumes of data used to identify such patterns—and the high confidence levels attached to those patterns—could reinforce the assertion. The complexity of analyzing and disproving such black-box output could impede any challenge to the legitimacy or legality of the correlations.
In the past, proxies such as zip code, educational qualification, occupation, and credit scores were used to indicate a higher correlation to risk—and, in practice, to differentiate between customer segments. Despite the insurance industry’s argument regarding their probable higher risk correlation, many lawmakers and regulators across the world viewed these proxies as indirectly discriminating against people from protected classes. As a result, they enacted laws and regulations to restrict or prohibit the use of these characteristics in insurance risk classification.
Considering the influx of new data types and the emerging dimensions of risk insight they could potentially unravel, the risk of unfair indirect discrimination remains a strong possibility. These correlations, drawn through complex and opaque algorithms, could create new forms of indirect discrimination that are difficult to detect. For example, in auto insurance, the color of a car could be used as a proxy for gender discrimination. In home insurance, online-only discounts could unintentionally discriminate against elderly customers with limited or no digital literacy. Yet it remains a challenge to recognize when a proxy is functioning as a surrogate for unfair discrimination and to intervene effectively in a timely manner.
The complexity of bias increases when proxy characteristics are just one of many factors considered by opaque algorithms to make predictions. At the same time, some correlations identified by AIS could reflect spurious or irrelevant patterns. For example, an AIS might find that pet owners are safe drivers or have lower mortality rates, or that customers who pay auto insurance premiums on time also have lower mortality rates. It is possible that such correlations, though they appear statistically significant, result in unfair discrimination against all categories of customers, including those from non-protected classes.
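One common way to surface such indirect effects is to monitor model outcomes across groups even when the protected attribute was never a model input. The sketch below computes a disparate impact ratio on hypothetical monitoring data (the group names, outcome labels, and the often-cited four-fifths benchmark are illustrative assumptions, not a prescribed regulatory test); a ratio well below 1 suggests a proxy may be at work.

def disparate_impact_ratio(outcomes, groups, favorable="standard_rate"):
    """Ratio of favorable-outcome rates between groups. `outcomes` and
    `groups` are parallel lists; the protected attribute is used only for
    monitoring, never as a model input."""
    rates = {}
    for g in set(groups):
        members = [o for o, grp in zip(outcomes, groups) if grp == g]
        rates[g] = sum(o == favorable for o in members) / len(members)
    return min(rates.values()) / max(rates.values()), rates

# Hypothetical monitoring data: the model's predicted rate tier alongside a
# protected attribute the model never saw directly.
outcomes = (["standard_rate"] * 70 + ["high_rate"] * 30
            + ["standard_rate"] * 45 + ["high_rate"] * 55)
groups = ["group_a"] * 100 + ["group_b"] * 100
ratio, rates = disparate_impact_ratio(outcomes, groups)
print(rates)            # group_a ~ 0.70, group_b ~ 0.45
print(round(ratio, 2))  # 0.64 -- below the four-fifths (0.8) benchmark, worth investigating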

Evolution of the Trifecta
Beyond the unenviable responsibilities concerning fair or unfair indirect discrimination, the insurance industry is witnessing a major change in how risks are assessed. An interesting trifecta of bias, insurance, and AI is taking shape. Customer behavior—and the moral hazard it may present—has always been an important source of the information asymmetry that leads to losses. Though insurers were aware of this, they could not access this data due to technological limitations. The emergence of new-age technologies is now filling the gap, enabling insurers to access real-time behavioral data. Instead of looking for correlations, insurers can now monitor a range of behaviors that are unambiguously causal to risk, such as driving-related parameters in auto insurance, healthy lifestyle activities in life and health insurance, and proper home maintenance in home insurance.
Behavior is a conscious choice that an individual makes, and it is largely within their control or discretion. If that behavior is risky and increases the probability or severity of loss, insurers could hold the individual directly accountable for that behavior and differentiate accordingly. It is highly unlikely that bias or discrimination based on conscious behavior could be called out as unfair. Behavior is generally independent of—and agnostic to—any protected characteristics, so actuarial fairness is appropriately complemented by behavioral fairness. Even if such discrimination affects customers from a protected class, insurers have a valid actuarial and behavioral defense to justify their position and claim exemption.
Despite this defense, it is well known that the predictions made by algorithms are statistical approximations and not perfect. This raises the question of whether discrimination based on behavioral data could still end up being indirect and unfair. The possibility is remote, but it is not zero. For example, data patterns extracted from telematics could flag customers who drive late at night or commute long distances from the city to the outskirts as riskier. However, these patterns could serve as potential proxies for racial discrimination. In home insurance, customers may be asked to upgrade their sensors to a newer, more advanced, and costlier model that captures additional risk characteristics. Customers from protected classes might not be able to afford this upgrade, and AIS could then treat the absence of the new risk data unfairly and discriminate against those customers by judging them as recalcitrant or as suppressing material facts.
AIS are efficient at finding correlations that are too complex for humans to analyze. To prevent unfair discrimination based on behavioral characteristics, it is advised to stick with the golden rule: Consider only those factors over which the customer has absolute influence or control. Unfair discrimination occurs whenever the characteristic being considered is irrelevant to the provision of insurance coverage. In the examples above, a customer driving late at night could be a gig worker from an economically underprivileged class, working late shifts or moonlighting across different jobs to make ends meet. A customer commuting long distances from the city to the outskirts could be a laborer who could not afford housing in the city. A customer who failed to upgrade their sensor could be a digitally unsavvy elderly person living on social security.

Conclusion
Preventing bias is a multifaceted activity that spans the entire lifecycle of AI model development—from the initial stages of data pre-processing to in-processing and post-processing. Across these stages, AI experts generally employ three mitigation strategies: data-, algorithmic-, and human-driven mitigation, each encompassing several methods and activities. The nuances of addressing biases in generative AI and agentic AI, however, are still evolving.
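As one illustration of a data-driven (pre-processing) mitigation, the sketch below applies the well-known Kamiran-Calders reweighing idea: each training record receives a weight so that the protected attribute and the outcome label become statistically independent in the weighted data. The column values are hypothetical; in practice, tested implementations are available in open-source fairness toolkits such as AIF360.

from collections import Counter

def reweighing_weights(protected, labels):
    """Kamiran-Calders reweighing: give each record the weight
    P(A=a) * P(Y=y) / P(A=a, Y=y) so that, in the weighted training data,
    the protected attribute A and the label Y are statistically independent."""
    n = len(labels)
    count_a = Counter(protected)
    count_y = Counter(labels)
    count_ay = Counter(zip(protected, labels))
    return [(count_a[a] / n) * (count_y[y] / n) / (count_ay[(a, y)] / n)
            for a, y in zip(protected, labels)]

# Hypothetical training set in which 'group_b' is over-represented among
# unfavorable labels; the weights rebalance the joint distribution.
protected = ["group_a"] * 60 + ["group_b"] * 40
labels = ["good"] * 50 + ["bad"] * 10 + ["good"] * 20 + ["bad"] * 20
weights = reweighing_weights(protected, labels)
print(round(weights[0], 2), round(weights[-1], 2))  # 0.84 for (group_a, good), 0.6 for (group_b, bad)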
Across the world, laws governing direct discrimination are well established. While some of these general laws may also apply to certain areas of indirect discrimination, there are no comprehensive legal frameworks to specifically address such issues in the insurance sector. This is a gray area that requires urgent attention to identify and address emerging patterns of unfair or spurious indirect discrimination. Until such laws and regulations are in place, the insurance industry must be risk-aware when acting on correlation-based findings. Insurers must apply a triple-filter test: Is the action necessary, appropriate, and legitimate? And if doubt persists, it is always worth recalling the adage: “Just because you can, doesn’t mean you should.”
Srivathsan Karanai Margan works as an insurance domain consultant at Tata Consultancy Services.