While AI tools accelerate data collection, a study reveals significant inaccuracies in policy interpretation, particularly in African and Eastern Mediterranean regions.
Study: Evaluating generative artificial intelligence’s limitations in health policy identification and interpretation.
Scientists at the Georgetown University Center for Global Health Science and Security, USA, have conducted a study to evaluate the accuracy and precision of generative artificial intelligence (GAI) tools in identifying and interpreting health policies worldwide. The study is published in the journal PLOS ONE.
Background
The Analysis and Mapping of Policies for Emerging Infectious Diseases project utilizes human subject-matter experts to systematically surface, analyze, and categorize health policies from all United Nations (UN) Member States.
Recent advances in GAI have produced tools that can rapidly screen and examine vast amounts of medical data.
The proliferation of these tools has encouraged their incorporation into the Analysis and Mapping of Policies for Emerging Infectious Diseases project, with the aim of reducing the human resources required to complete the work.
In this study, scientists have evaluated the efficacy and accuracy of GAI tools in identifying and interpreting relevant health policies.
Specifically, they used two validated policy datasets covering every UN Member State: one on emergency and childhood vaccination policies and one on quarantine and isolation policies. They systematically compared the responses produced by a GAI tool with those of a subject-matter expert.
Important observations
The GAI tool used in the study significantly increased data collection efficiency for both the vaccination and the quarantine and isolation policy datasets. It reduced the time required for vaccination data collection by 88% and increased efficiency by 90%.
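A percentage cut in working time can be restated as a speed-up factor. For example, an 88% reduction leaves 12% of the original effort, which is roughly an eight-fold speed-up. The sketch below shows that arithmetic. The timings are hypothetical and are not figures from the study, and the function names are illustrative only.

```python
def percent_time_reduction(manual_hours: float, assisted_hours: float) -> float:
    """Fraction of data collection time saved by the assisted workflow (0.88 means 88%)."""
    return (manual_hours - assisted_hours) / manual_hours


def speedup_factor(manual_hours: float, assisted_hours: float) -> float:
    """How many times faster the assisted workflow is: the ratio of the two durations."""
    return manual_hours / assisted_hours


# Hypothetical timings, chosen only so that the reduction comes out at 88%.
manual, assisted = 100.0, 12.0
print(f"time reduction: {percent_time_reduction(manual, assisted):.0%}")
print(f"speed-up: {speedup_factor(manual, assisted):.1f}x")
```

With these made-up numbers, the script reports an 88% time reduction and a speed-up of about 8.3x.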
Analysis of the vaccination policy dataset
For the vaccination dataset, the GAI tool was used to assess the presence of a legally enforceable routine childhood vaccination mandate or emergency powers for mandatory vaccination of the domestic population during a crisis.
The tool and the human subject-matter expert showed a 78% concordance rate in this assessment. However, the rate fell to 63% after excluding countries for which neither the expert nor the tool found any universal legal vaccination mandate.
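The drop in concordance follows from how the rate is computed. Countries where neither reviewer found a mandate count as easy agreements, so removing them leaves only the harder cases. The sketch below illustrates the effect on an invented five-country sample. The `Comparison` record and its fields are a hypothetical schema, not the study's actual data format.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    """One country's result (hypothetical schema, not the study's data format)."""
    country: str
    expert_found_mandate: bool  # expert located a universal legal vaccination mandate
    tool_found_mandate: bool    # GAI tool located one
    responses_agree: bool       # expert and tool responses match


def concordance(rows: list[Comparison]) -> float:
    """Share of rows in which the expert and the tool agree."""
    return sum(r.responses_agree for r in rows) / len(rows)


def concordance_excluding_joint_absence(rows: list[Comparison]) -> float:
    """Concordance after dropping rows where neither reviewer found a mandate."""
    kept = [r for r in rows if r.expert_found_mandate or r.tool_found_mandate]
    return concordance(kept)


# Invented five-country sample: two "easy" agreements where nothing was found.
rows = [
    Comparison("Country A", False, False, True),
    Comparison("Country B", False, False, True),
    Comparison("Country C", True, True, True),
    Comparison("Country D", True, True, False),   # both found a mandate but described it differently
    Comparison("Country E", True, False, False),  # tool missed the mandate
]
print(f"overall concordance: {concordance(rows):.0%}")
print(f"excluding joint absence: {concordance_excluding_joint_absence(rows):.0%}")
```

Here, overall agreement is 60%. After the two joint-absence countries are removed, agreement falls to 33%. This mirrors the direction, though not the magnitude, of the study's 78% to 63% drop.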
Concordance varied across World Health Organization (WHO) regions. It was highest for countries in the Western Pacific and European regions and lowest for countries in the Southeast Asian and Eastern Mediterranean regions.
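Regional comparisons like these come from grouping country-level agreements by WHO region and computing a rate for each group. The sketch below shows one way to do that. The per-country results are invented, and `concordance_by_region` is a hypothetical helper, not part of the study's tooling.

```python
from collections import defaultdict


def concordance_by_region(records: list[tuple[str, bool]]) -> dict[str, float]:
    """Group (region, agrees) pairs and return the agreement rate per region."""
    totals: dict[str, int] = defaultdict(int)
    agreements: dict[str, int] = defaultdict(int)
    for region, agrees in records:
        totals[region] += 1
        agreements[region] += agrees  # True counts as 1, False as 0
    return {region: agreements[region] / totals[region] for region in totals}


# Hypothetical per-country results; region names are WHO regions, outcomes are invented.
records = [
    ("Western Pacific", True), ("Western Pacific", True), ("Western Pacific", False),
    ("Eastern Mediterranean", True), ("Eastern Mediterranean", False),
    ("Eastern Mediterranean", False),
]
rates = concordance_by_region(records)
for region, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{region}: {rate:.0%}")
```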
After excluding responses in which the expert and the tool agreed that no legal vaccination requirement existed, the study found significant systematic inaccuracy and imprecision in the GAI tool's output.
The GAI tool generated inaccurate vaccination responses for more than 50% of countries in the African, Southeast Asian, and Eastern Mediterranean regions. By contrast, the Western Pacific, European, and American regions were the WHO regions the tool represented most accurately.
For five countries, the GAI tool identified a policy that the expert had not previously found.
Analysis of the quarantine and isolation policy dataset
For the quarantine and isolation dataset, the GAI tool was used to identify and interpret existing policies associated with the isolation of infected individuals and quarantine of contacts in the domestic population.
In this assessment, the expert and the tool agreed 67% of the time. As with the vaccination responses, concordance for quarantine and isolation responses varied across WHO regions.
Concordance was highest for countries in the Western Pacific Region and lowest for those in the African and Eastern Mediterranean regions. Countries in the Southeast Asian, European, and American regions showed moderate concordance.
The scientists suggested that the high concordance in the Western Pacific Region may reflect the fact that many countries there use English as an official language and therefore routinely produce government documents in English.
Consistent with this hypothesis, the GAI tool exactly matched the expert responses or provided more information 81% of the time for the 61 countries whose policies are written in English.
By contrast, for the 133 countries whose policies are not written in English, the tool did so only 63% of the time.
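The English versus non-English comparison counts the share of responses that either match the expert exactly or add information, computed separately for each language group. The sketch below shows that split. The category labels (`exact_match`, `more_info`, `missed`, `wrong`) and the sample records are hypothetical and are not taken from the study.

```python
# Hypothetical labels for how a tool response compares with the expert's response.
MATCH_OR_MORE = {"exact_match", "more_info"}


def match_or_more_rate_by_language(records: list[tuple[bool, str]]) -> dict[str, float]:
    """Rate of 'exact match or more information' responses, split by policy language."""
    groups: dict[str, list[str]] = {"English": [], "non-English": []}
    for in_english, category in records:
        groups["English" if in_english else "non-English"].append(category)
    return {
        name: sum(category in MATCH_OR_MORE for category in categories) / len(categories)
        for name, categories in groups.items()
        if categories  # skip an empty group to avoid dividing by zero
    }


# Invented sample: (policy written in English?, comparison category)
records = [
    (True, "exact_match"), (True, "more_info"), (True, "missed"),
    (False, "exact_match"), (False, "missed"), (False, "wrong"), (False, "more_info"),
]
print(match_or_more_rate_by_language(records))
```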
Across all responses, the tool missed information that the expert had found in 21% of cases and provided information that contradicted the expert's in 8.8%. In 2% of responses, the tool provided information that the expert had missed.
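A breakdown like this is a tally of comparison categories divided by the total number of responses. The sketch below computes such shares with Python's `collections.Counter`. The labels are hypothetical. Treating "the tool found information the expert missed" as a `more_info` category is an assumption made here for illustration. The ten responses are invented.

```python
from collections import Counter

# Hypothetical labels for each expert-versus-tool comparison:
#   "exact_match" : responses match
#   "missed"      : tool missed information the expert found
#   "wrong"       : tool contradicted the expert
#   "more_info"   : tool supplied information the expert missed (assumed label)


def category_shares(categories: list[str]) -> dict[str, float]:
    """Share of all responses that fall into each comparison category."""
    counts = Counter(categories)
    return {category: n / len(categories) for category, n in counts.items()}


# Invented sample of ten responses (not the study's data).
responses = ["exact_match"] * 6 + ["missed"] * 2 + ["wrong", "more_info"]
shares = category_shares(responses)
for category, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{category}: {share:.0%}")
print(f"non-concordant overall: {1 - shares.get('exact_match', 0.0):.0%}")
```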
Study significance
The study finds that GAI can be a useful tool for quality assurance and quality control in health policy identification.
However, the tool needs further improvement to identify policies accurately across diverse global regions and languages and to interpret context-specific information.
Considering the study findings, the scientists suggest that GAI tools should not be used as primary reviewers in health policy identification or interpretation. Rather, these tools can be effectively used as second or third reviewers in health policy identification.