Researchers showcase how GPT-4 simplifies diabetes management by accurately interpreting glucose data and generating actionable insights, setting the stage for AI’s role in personalized healthcare.
Study: A case study on using a large language model to analyze continuous glucose monitoring data.
A recent study published in the journal Scientific Reports investigated the application of a large language model (LLM) to analyze continuous glucose monitoring (CGM) data for diabetes care.
In the study, researchers from the United States (U.S.) evaluated the model’s ability to calculate glucose metrics and generate descriptive summaries, aiming to address challenges in interpreting CGM data for clinicians and patients and enhance diabetes management strategies.
Background
Continuous glucose monitoring (CGM) systems are vital tools in diabetes management, offering real-time insights into glucose fluctuations.
These devices collect detailed glucose data and enable the calculation of essential metrics such as glycemic variability. Clinicians often rely on software-generated ambulatory glucose profile reports to identify glucose trends and guide treatment decisions.
While these reports provide valuable information, they are often too complex for patients to understand, and clinicians do not always agree on the adjustments they imply, such as changes to insulin dosing. Variations in interpretation among healthcare providers, as highlighted in prior studies, further underscore the need for standardized, accessible tools.
With the rapid advancements in artificial intelligence, LLMs have become a promising avenue in healthcare for tasks such as text summarization and data analysis. Previous studies have demonstrated their potential in generating summaries of medical data. However, their role in analyzing wearable device outputs, such as CGM data, remains underexplored.
About the study
The present study evaluated the use of an LLM, generative pre-trained transformer-4 (GPT-4), to analyze 14 days of CGM data for patients with type 1 diabetes. Synthetic CGM data were generated using an FDA-approved patient simulator, which modeled a range of glycemic control scenarios with glucose management indicator (GMI) values ranging from 6.0% to 9.0%.
Figure: Study design, showing the evaluation procedure for a single case.
The study consisted of two parts: a quantitative metric evaluation and a qualitative data summarization. For the quantitative analysis, GPT-4 was prompted to calculate standardized CGM metrics such as mean glucose, glycemic variability, and time spent in specified glucose ranges. These outputs were then compared against ground-truth values for the same metrics.
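To make the quantitative task concrete, the sketch below computes a few of the metrics named above from a list of glucose readings. It is an illustration only, not the code GPT-4 generated or the authors' ground-truth pipeline. The function name `cgm_metrics`, the sample readings, the 70 to 180 mg/dL target range, and the use of the population standard deviation for the coefficient of variation are all assumptions. The GMI line uses the commonly cited formula GMI (%) = 3.31 + 0.02392 × mean glucose (mg/dL).

```python
from statistics import mean, pstdev


def cgm_metrics(glucose_mg_dl):
    """Summarize CGM readings (mg/dL) sampled at a fixed interval.

    Hypothetical helper for illustration; thresholds are assumptions.
    """
    n = len(glucose_mg_dl)
    if n == 0:
        raise ValueError("no readings supplied")

    avg = mean(glucose_mg_dl)
    sd = pstdev(glucose_mg_dl)  # population SD; a sample SD is also defensible

    def pct(predicate):
        # Percentage of readings satisfying the predicate. With evenly
        # spaced readings this equals the percentage of time.
        return 100.0 * sum(1 for g in glucose_mg_dl if predicate(g)) / n

    return {
        "mean_glucose": avg,
        "cv_percent": 100.0 * sd / avg,       # glycemic variability
        "gmi_percent": 3.31 + 0.02392 * avg,  # glucose management indicator
        "time_in_range_70_180": pct(lambda g: 70 <= g <= 180),
        "time_below_70": pct(lambda g: g < 70),
        "time_above_180": pct(lambda g: g > 180),
    }


# Made-up readings, not data from the study.
readings = [65, 90, 110, 150, 185, 200, 120, 100]
metrics = cgm_metrics(readings)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Because the three range percentages partition the readings, they sum to 100, which gives a quick sanity check on any implementation, including one written by an LLM.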
For the qualitative evaluation, GPT-4 was tasked with producing narrative summaries across five categories, namely, hypoglycemia, hyperglycemia, glycemic variability, data quality, and primary clinical takeaways.
Two independent clinicians assessed the outputs for accuracy, completeness, safety, and suitability. Furthermore, the prompts were designed based on established guidelines, including the standards of care defined by the American Diabetes Association.
To interact with the model, the researchers uploaded the CGM data as preprocessed files and accessed GPT-4 through OpenAI’s ChatGPT Plus interface with the Data Analyst plugin. The study also tested the model’s performance across varied temperature settings to evaluate the consistency of its code generation.
Results
The findings showed that GPT-4 analyzed CGM data and generated summaries for diabetes care with high accuracy. In the quantitative analysis, GPT-4 correctly computed nine of the ten metrics across the ten cases. The errors occurred when calculating time above glucose thresholds and stemmed from ambiguities in the prompt definitions. For example, the model misinterpreted the threshold for “time above 180 mg/dL” because the ranges were defined inconsistently in the prompt.
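The article does not spell out exactly how the prompt was ambiguous, so the following is only one plausible reading, shown as a sketch. The phrase “time above 180 mg/dL” can mean all readings above 180, or only the band between 180 and 250 when a separate “above 250” category is also reported. The helper `percent_time`, the 250 mg/dL cut-off, and the sample readings are assumptions for illustration.

```python
def percent_time(readings, predicate):
    """Percentage of evenly spaced readings that satisfy the predicate."""
    return 100.0 * sum(1 for g in readings if predicate(g)) / len(readings)


# Made-up readings in mg/dL, not data from the study.
readings = [150, 185, 200, 240, 260, 300, 120, 100]

# Reading 1: everything above 180 mg/dL.
above_180_all = percent_time(readings, lambda g: g > 180)

# Reading 2: only the band above 180 and up to 250 mg/dL,
# with values above 250 reported separately.
above_180_band = percent_time(readings, lambda g: 180 < g <= 250)
above_250 = percent_time(readings, lambda g: g > 250)

print(f"above 180 (all):       {above_180_all:.1f}%")
print(f"above 180, up to 250:  {above_180_band:.1f}%")
print(f"above 250:             {above_250:.1f}%")
```

The two readings give different numbers for the same data, which is why a prompt that names a threshold should also state whether the range is open-ended or bounded above.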
Among the qualitative tasks, GPT-4 effectively generated narrative summaries for data quality, hypoglycemia, hyperglycemia, glycemic variability, and clinical takeaways.
Furthermore, the clinicians rated the summaries highly for accuracy, completeness, and safety, with average scores ranging between 8 and 10 out of 10 across categories. However, errors included overstating hyperglycemia concerns and occasionally misinterpreting trends, such as classifying euglycemic periods as prolonged hyperglycemia.
The analysis also highlighted variability in how far the two clinicians agreed on the summaries’ suitability for patients and for clinicians. GPT-4 sometimes emphasized clinically irrelevant events, such as mild hyperglycemia, while missing significant trends like nocturnal hypoglycemia. Additionally, the model occasionally failed to prioritize important clinical metrics such as time in range or GMI when summarizing overall glucose control.
Despite these limitations, GPT-4 effectively synthesized complex data into accessible summaries, demonstrating its potential to assist in routine CGM data interpretation. The study authors noted that refining prompts and incorporating better error handling could improve the model’s clinical utility.
Conclusions
Overall, the study highlighted the promise of LLMs in diabetes management, showing GPT-4’s ability to analyze and summarize CGM data accurately.
The results indicated that LLMs such as GPT-4 can complement clinical workflows by automating CGM data analysis and summary generation, although further refinement is necessary for widespread clinical adoption. The researchers emphasized that addressing limitations, such as missing nocturnal hypoglycemia and refining clinical significance in summaries, will be critical for safe integration into clinical practice.
These findings pave the way for integrating LLMs into clinical practice, potentially enhancing efficiency and accessibility in managing chronic conditions such as diabetes.
Journal reference:
- Healey, E., Tan, A. L., Flint, K. L., Ruiz, J. L., & Kohane, I. (2025). A case study on using a large language model to analyze continuous glucose monitoring data. Scientific Reports, 15(1), 1-7. DOI: 10.1038/s41598-024-84003-0, https://www.nature.com/articles/s41598-024-84003-0