GNS Healthcare Blog

Making the Case for Data Diversity


The power and effectiveness of artificial intelligence (AI), and its continued integration into everyday life, are becoming more widely accepted. This is especially true in healthcare, where researchers are making significant strides in identifying the causes of major diseases and providing more effective treatments for them.

Alongside these successes, however, is the question of whether some AI results may be biased. A recent article in Fast Company acknowledges that AI should be “the great equalizer” because it is all about objective math and calculations. But the article also raises the issue of “creator bias” being reflected in the algorithms that are developed¹.

The reality is that any bias that may be present in AI outcomes is not the result of the algorithms or the technology itself. Instead, if the output is biased, it’s the result of how the data is being used to formulate those results or the lack of diversity within the data set.


Importance of diversity in data

An important outcome of AI in healthcare is the promise that precision medicine will ensure the right treatment is delivered to the right patient at the right time. For this promise to become reality, AI must leverage the expanding available data to get to the underlying mechanisms of disease as they apply to every individual. If the data is not representative of the population at large, the derived outcomes will be incomplete.

Unfortunately, much of the data being fed into AI platforms lacks the necessary diversity. A 2009 study showed that 96 percent of participants in genome-wide association studies were of European descent. By 2016, that share had fallen to 80 percent, which still means that only two of every ten participants were non-European. Most of that change came from a spike in participation among Asian populations, meaning real diversity across other groups is still an issue. This lack of diversity can significantly skew results, since people of European descent make up only seven percent of the world’s population³.
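
To put those percentages in perspective, here is a minimal sketch of the arithmetic, assuming the shares quoted above are taken at face value. The function name `representation_ratio` and the variable names are illustrative choices, not part of any cited study. A ratio above 1 means a group appears in the data more often than in the world population; a ratio below 1 means it appears less often.

```python
def representation_ratio(sample_share: float, population_share: float) -> float:
    """Return how many times over (>1) or under (<1) a group is represented."""
    if population_share <= 0:
        raise ValueError("population_share must be positive")
    return sample_share / population_share


# Shares quoted in the paragraph above (assumed accurate for this illustration).
european_population_share = 0.07
european_study_share = {2009: 0.96, 2016: 0.80}

# Ratio for participants of European descent, per year.
european_ratio = {
    year: representation_ratio(share, european_population_share)
    for year, share in european_study_share.items()
}

# Ratio for everyone else: the remaining study share over the remaining population share.
non_european_ratio = {
    year: representation_ratio(1 - share, 1 - european_population_share)
    for year, share in european_study_share.items()
}

for year in sorted(european_study_share):
    print(year, round(european_ratio[year], 1), round(non_european_ratio[year], 2))
```

On these figures, people of European descent were still represented at roughly eleven times their share of the world population in 2016, while everyone else combined appeared at roughly a fifth of theirs.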

The National Center for Biotechnology Information (NCBI) points out that including a more diverse population in genomic research is “a scientific imperative.” They feel it is critical to better understand how changes in climate, infectious diseases, diet and other factors impact human biology and may affect clinical interventions⁴.


Difficulties collecting diverse data

One of the key reasons for the lack of diversity in research efforts is that a significant portion of the population does not participate in these studies. According to a 2013 poll, fewer than 10 percent of Americans have taken part in a clinical trial, even though 80 percent of Americans are aware of these trials⁵.

Among the reasons given for this lack of participation are fear of exploitation in medical research, mistrust of medicine, fear of unintended consequences, and social constraints such as lack of transportation, time demands, and unique cultural and linguistic differences⁶.

Lack of access to healthcare for significant numbers of people is another reason the databases used for research are less diverse. According to a study from the Agency for Healthcare Research and Quality, almost 25 percent of the poor and near poor lacked health insurance, likely reducing their ability to secure healthcare⁷.

Those in rural areas also face obstacles to obtaining healthcare. According to Stanford Medicine, although rural communities make up about 20 percent of the population, fewer than 10 percent of physicians practice in those areas⁸.

If these population groups are not accessing healthcare systems, data about them is not captured in the electronic health record (EHR) systems that are increasingly being used in AI research projects. That gap is another contributor to the lack of data diversity.


Overcoming the challenge to increase diversity in data

The government began an effort to alleviate this imbalance 25 years ago with the passage of the NIH Revitalization Act of 1993, which outlined guidelines for inclusion in federally funded or approved clinical research. As evidenced by the relatively slow increase in diversity since its passage, the legislation has had minimal impact.

The NIH is making another foray into improving diversity in research with the introduction of the All of Us Research Program. The aggressive initiative is striving to gather data from one million or more Americans with the goal of accelerating research and improving health.

One of the keys to the effort is including people of all backgrounds, ethnicities, walks of life and regions across the country who haven’t participated in medical research before now. The program is building a diverse community of traditional and non-traditional researchers to help discover new ways to understand health and disease. Reaching out to communities that have been marginalized up to now should help build a much more diverse database.

Fulfilling the promise of precision medicine and realizing the benefits of artificial intelligence requires a significant increase in the diversity of research databases. Only then will researchers be able to truly uncover bias-free insights that can benefit everyone.



[1] Now Is The Time To Act To End Bias In AI, by Will Byrne, Fast Company, February 28, 2018.

[2] AI Now website.

[3] Precision Medicine Research in Diverse Populations, by Erin Linnenbringer, Institute for Public Health, Technology and Innovation, January 31, 2018.

[4] Diversity and inclusion in genomic research: why the uneven progress? by Amy Bentley, Shawneequa Callier, and Charles Rotimi, NCBI Journal List, July 18, 2017.

[5] Research!America poll of U.S. Adults conducted in partnership with Zogby Analytics in May 2013.

[6] The Importance of Diversity in Research and Clinical Trials, by Christine Metz, PhD, Northwell Health, January 19, 2016.

[7] 2015 National Healthcare Quality and Disparities Report and 5th Anniversary Update on the National Quality Strategy, Agency for Healthcare Research and Quality Report, U.S. Department of Health and Human Services.

[8] Healthcare Disparities & Barriers to Healthcare, eCampus Rural Health, Stanford Medicine website.



