World's first AI model for thyroid cancer diagnosis has an accuracy exceeding 90%
An interdisciplinary research team has unveiled the world's first artificial intelligence (AI) model designed to classify both the cancer stage and risk category of thyroid cancer, achieving impressive accuracy exceeding 90%.
This innovative AI model promises to significantly cut frontline clinicians' pre-consultation preparation time by approximately 50%. The findings are published in the journal npj Digital Medicine, and the team includes researchers from the LKS Faculty of Medicine of the University of Hong Kong (HKUMed), the InnoHK Laboratory of Data Discovery for Health (InnoHK D24H), and the London School of Hygiene & Tropical Medicine (LSHTM).
Thyroid cancer is among the most prevalent cancers in Hong Kong and globally. Precision management of the disease often relies on two systems: (1) the 8th edition of the American Joint Committee on Cancer (AJCC) or Tumor-Node-Metastasis (TNM) cancer staging system to determine the cancer stage; and (2) the American Thyroid Association (ATA) risk classification system to categorize cancer risk.
These systems are crucial for predicting patient survival and guiding treatment decisions. However, manually integrating complex clinical information into these systems can be time-consuming and inefficient.
The research team developed an AI assistant that leverages large language models (LLMs), like ChatGPT and DeepSeek, which are designed to understand and process human language, to analyze clinical documents and enhance the accuracy and efficiency of thyroid cancer staging and risk classification.
The model leverages four offline open-source LLMs, Mistral (Mistral AI), Llama (Meta), Gemma (Google), and Qwen (Alibaba), to analyze free-text clinical documents. The AI model was trained on a U.S.-based open-access dataset containing pathology reports of 50 thyroid cancer patients from the Cancer Genome Atlas Program (TCGA), with subsequent validation against pathology reports from 289 TCGA patients and 35 pseudo cases created by endocrine surgeons.
By combining the output of all four LLMs, the team improved the overall performance of the AI model, achieving overall accuracy of 88.5% to 100% in ATA risk classification and 92.9% to 98.1% in AJCC cancer staging. Compared to traditional manual document reviews, this advancement is expected to halve the time clinicians spend on pre-consultation preparation.
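The article does not say how the four models' outputs were combined. One common way to combine several classifiers is simple majority voting, and the short Python sketch below illustrates that idea only. The function name `majority_vote`, the tie-breaking rule, and the example labels are all hypothetical and are not taken from the published study.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted most often.

    Assumption for illustration: ties go to the label that appears
    first in the input list. The study's actual rule is not described
    in the article.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label


# Hypothetical per-model outputs for one pathology report (made-up labels).
model_outputs = {
    "Mistral": "Stage I",
    "Llama": "Stage I",
    "Gemma": "Stage II",
    "Qwen": "Stage I",
}

consensus = majority_vote(list(model_outputs.values()))
print(consensus)
```

In this toy case three of the four models agree, so the disagreeing output is overruled. A real system might instead weight models by their validation performance or flag split votes for clinician review.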
Professor Joseph T Wu, Sir Kotewall Professor in Public Health and Managing Director of InnoHK D24H at HKUMed, emphasized the model's remarkable performance. "Our model achieves more than 90% accuracy in classifying AJCC cancer stages and ATA risk category," he said. "A significant advantage of this model is its offline capability, which would allow local deployment without the need to share or upload sensitive patient information, thereby providing maximum patient privacy."
"In view of the recent debut of DeepSeek, we conducted further comparative tests with a 'zero-shot approach' against the latest versions of DeepSeek—R1 and V3—as well as GPT-4o. We were pleased to find that our model performed on par with these powerful online LLMs," added Professor Wu.
Dr. Matrix Fung Man-him, clinical assistant professor and chief of endocrine surgery, Department of Surgery, School of Clinical Medicine, HKUMed, stated, "In addition to providing high accuracy in extracting and analyzing information from complex pathology reports, operation records and clinical notes, our AI model also dramatically reduces doctors' preparation time by almost half compared to human interpretation. It could simultaneously provide cancer staging and clinical risk stratification based on two internationally recognized clinical systems."
"The AI model is versatile and could be readily integrated into various settings in the public and private sectors, and both local and international health care and research institutes," said Dr. Fung. "We are optimistic that the real-world implementation of this AI model could enhance the efficiency of frontline clinicians and improve the quality of care. In addition, doctors will have more time to counsel with their patients."
"In line with government's strong advocacy of AI adoption in health care, as exemplified by the recent launch of LLM-based medical report writing system in the Hospital Authority, our next step is to evaluate the performance of this AI assistant with a large amount of real-world patient data.
"Once validated, the AI model can be readily deployed in real clinical settings and hospitals to help clinicians improve operational and treatment efficiency," explained Dr. Carlos Wong, Honorary Associate Professor in the Department of Family Medicine and Primary Care, School of Clinical Medicine, HKUMed.
This article was originally published on MedicalXpress Breaking News-and-Events.