Is this wave of AI ready for online medical diagnosis? Experts: existing models are still imperfect

2023-07-18

Have you ever searched online for your symptoms to find out what might be wrong? The answers may not be satisfying. With the rise of large language models (LLMs) such as ChatGPT, people have begun using them to answer medical questions. But is that reliable?

Taken on their own, the answers given by artificial intelligence (AI) can be accurate. However, James Davenport, a professor at the University of Bath, UK, pointed out the difference between answering medical questions and actual medical practice. In his view, "medical practice is not just answering medical questions. If it were purely about answering medical questions, we would not need teaching hospitals, and doctors would not need years of training after their academic courses."

In response to such doubts, in a recent paper published in Nature, leading artificial intelligence experts presented a benchmark for evaluating how well large language models can answer people's medical questions.

Existing models are still imperfect

The latest evaluation comes from Google Research and DeepMind. Experts believe AI models hold great potential in medicine, including knowledge retrieval and support for clinical decision-making. However, existing models remain imperfect: they may, for example, fabricate convincing medical misinformation or absorb biases that exacerbate health inequities. It is therefore necessary to evaluate their clinical knowledge.

Such evaluations are not new. In the past, however, automated assessments relied on limited benchmarks, such as scores on individual medical exams, which translate poorly into real-world reliability and value. Moreover, when people turn to the internet for medical information, they face "information overload" and may fixate on the worst of ten possible diagnoses, enduring a great deal of unnecessary stress.
The research team hopes that language models can provide concise, expert-level answers without bias, indicate their sources, and reasonably express uncertainty.

How does a 540-billion-parameter LLM perform?

To evaluate how well LLMs encode clinical knowledge, Google Research's Shekoofeh Azizi and colleagues explored their ability to answer medical questions. The team proposed a benchmark called "MultiMedQA": it combines six existing question-answering datasets covering professional medicine, research, and consumer queries with "HealthSearchQA", a new dataset of 3,173 medical questions commonly searched online.

The team then evaluated PaLM (a 540-billion-parameter LLM) and its variant Flan-PaLM. They found that Flan-PaLM achieved state-of-the-art performance on several of the datasets. On MedQA, a dataset built from United States Medical Licensing Examination questions, Flan-PaLM exceeded the previous state-of-the-art LLM by 17%. However, although Flan-PaLM performed well on multiple-choice questions, further evaluation revealed a gap in its ability to answer consumers' medical questions.

An LLM specialized for medicine is encouraging

To close this gap, the researchers used a method called instruction prompt tuning to further adapt Flan-PaLM to the medical domain.
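The multiple-choice portion of such a benchmark boils down to comparing a model's chosen option against an answer key and reporting accuracy. The sketch below illustrates that scoring loop only; the two sample questions and the `toy_model` stand-in are hypothetical and have no relation to the actual MultiMedQA datasets or the PaLM models.

```python
# Minimal sketch of multiple-choice accuracy scoring, as used in
# benchmarks like MedQA. `toy_model` is a hypothetical stand-in for
# a real LLM; it just picks the longest option.

def toy_model(question: str, options: list[str]) -> str:
    """Hypothetical model: returns the longest option as its answer."""
    return max(options, key=len)

def accuracy(items: list[dict], model) -> float:
    """Fraction of items where the model's choice matches the key."""
    correct = sum(
        1 for item in items
        if model(item["question"], item["options"]) == item["answer"]
    )
    return correct / len(items)

# Tiny illustrative item set (not from any real dataset).
items = [
    {"question": "Which organ produces insulin?",
     "options": ["Liver", "Pancreas"],
     "answer": "Pancreas"},
    {"question": "Deficiency of which vitamin causes scurvy?",
     "options": ["Vitamin D", "Vitamin C is required"],
     "answer": "Vitamin C is required"},
]

print(accuracy(items, toy_model))  # prints 1.0 for this toy model
```

Real evaluations differ mainly in scale (thousands of exam-style items) and in how the model's free-text output is mapped back to one of the options, but the accuracy computation itself is this simple, which is exactly why the paper argues single-exam scores are too narrow a measure of clinical knowledge.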

Editor: XiaoWanNing    Responsible editor: YingLing

Source: Science and Technology Daily
