ChatGPT Achieves 85% in Professional-Level Neurology Exam

A recent study evaluating the performance of large language models (LLMs) on neurology board-style examinations has produced striking results. Using a question bank approved by the American Board of Psychiatry and Neurology, the researchers gained valuable insight into the capabilities of these models.

The study compared two versions of ChatGPT: version 3.5, referred to as LLM 1, and version 4, referred to as LLM 2. The findings showed that LLM 2 significantly outperformed its predecessor, exceeding even the average score achieved by human neurology professionals.

LLM 2 correctly answered 85.0% of the questions posed during the examination, while the mean human score stood at 73.8%. These results suggest that large language models, with further advances, have the potential to play a meaningful role in clinical neurology and improve healthcare outcomes.
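For readers curious how such a figure is derived, an accuracy score is simply the share of correctly answered items. The following minimal Python sketch illustrates the calculation; the function name and data layout are assumptions for illustration, not details taken from the study.

```python
def exam_accuracy(responses):
    """Return the percentage of (given, expected) answer pairs that match."""
    correct = sum(1 for given, expected in responses if given == expected)
    return 100.0 * correct / len(responses)

# Hypothetical grading run: 17 of 20 answers correct -> 85.0%
responses = [("A", "A")] * 17 + [("B", "C")] * 3
print(f"{exam_accuracy(responses):.1f}%")  # prints "85.0%"
```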

The earlier model, LLM 1, also performed respectably, scoring 66.8%, slightly below the human average. Notably, both models used confident language regardless of whether their answers were correct, a calibration gap that indicates room for improvement in future iterations.

The study further categorized questions as lower-order or higher-order according to Bloom's taxonomy. Both models performed better on lower-order questions, but LLM 2 scored strongly in both categories, suggesting competence in straightforward recall as well as in more demanding reasoning tasks.
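The article does not describe how the per-category results were tallied, but a breakdown of this kind reduces to grouping graded questions by their Bloom's taxonomy order. The sketch below shows one way to compute it; every field name and data value here is a hypothetical assumption.

```python
from collections import defaultdict

def accuracy_by_order(questions):
    """Group graded questions by Bloom's taxonomy order and score each group."""
    totals, correct = defaultdict(int), defaultdict(int)
    for q in questions:
        totals[q["order"]] += 1
        correct[q["order"]] += q["is_correct"]  # booleans count as 0/1
    return {order: 100.0 * correct[order] / totals[order] for order in totals}

# Hypothetical graded questions tagged "lower" or "higher"
questions = [
    {"order": "lower", "is_correct": True},
    {"order": "lower", "is_correct": True},
    {"order": "higher", "is_correct": True},
    {"order": "higher", "is_correct": False},
]
print(accuracy_by_order(questions))  # {'lower': 100.0, 'higher': 50.0}
```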

ChatGPT's performance on this neurology exam opens up promising possibilities for integrating large language models into clinical practice and, more broadly, into the field of neuroscience.