Google’s Med-PaLM AI answers medical questions with a high level of performance. But, to date, the model is not viable for clinical use.

It’s not just ChatGPT. That chatbot, built on OpenAI’s GPT-3.5 model, has made headlines in recent weeks: its performance is impressive, even if its concrete applications remain to be determined. Other projects have a more targeted aim. Google and DeepMind have developed Med-PaLM, a model they detail in a paper posted on the arXiv preprint server at the end of 2022 (it has not, at this time, been published in a peer-reviewed journal).

The algorithm is designed as a chatbot, drawing on datasets that contain many common questions and answers written by professionals or patients (within a controlled medical framework). The principle is quite simple: the user asks a question, for example by describing several symptoms, and Med-PaLM is expected to respond with a diagnosis and treatment options.
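To make this interaction concrete, here is a purely illustrative Python sketch. Med-PaLM has no public API, so `ask_med_palm`, the sample question, and the canned reply are all invented placeholders, not actual model behavior:

```python
# Purely illustrative: Med-PaLM has no public API, so ask_med_palm
# is an invented placeholder standing in for a real model call.
def ask_med_palm(question: str) -> str:
    # A real system would query the model here; this returns a canned
    # example of the *kind* of answer the article describes.
    return (
        "Possible diagnosis: influenza. Options: rest, hydration, "
        "antipyretics; consult a physician if symptoms persist or worsen."
    )

question = (
    "I have had a 39 °C fever, a dry cough and muscle aches for "
    "three days. What could this be, and what should I do?"
)
print(ask_med_palm(question))
```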

Examples of answers provided by Med-PaLM // Source: Google

Med-PaLM generates impressive scores

To test Med-PaLM, Google and DeepMind put the same set of questions to the AI and to (human) healthcare professionals, then had the answers evaluated by another group of human healthcare professionals.

The result is quite amazing:

  • 92.6% of the answers provided by Med-PaLM were considered correct;
  • 92.9% of the answers provided by human professionals were considered correct.

On paper, this is very impressive, because the scores are almost identical. And indeed, the progress is dazzling: a previous model, Flan-PaLM, had just over 60% of its answers judged correct.

The progress is also notable on a critical point in the medical field: the danger that answers can pose to patients. For Med-PaLM:

  • 5.8% of answers were assessed as potentially harmful;
  • 6.5% of answers provided by human physicians were assessed as potentially harmful.

With the older Flan-PaLM model, the rate of potentially harmful answers was 29.7%. With Med-PaLM, the performance is once again equivalent to that of humans, and on this criterion even slightly better, although this point must be qualified against other evaluation criteria.
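As a back-of-the-envelope illustration of where such percentages come from, here is a minimal Python sketch assuming each answer simply receives a verdict from the rating panel; the labels and the tiny data sample are hypothetical placeholders, not Google’s evaluation pipeline:

```python
# Minimal sketch (not Google's code): tally panel verdicts into the
# kind of percentages quoted above. Labels and data are hypothetical.
def rate(verdicts: list[str], label: str) -> float:
    """Percentage of verdicts carrying the given label."""
    return 100 * sum(v == label for v in verdicts) / len(verdicts)

# Hypothetical verdicts from the rating panel for four answers.
correctness = ["correct", "correct", "incorrect", "correct"]
harm = ["safe", "safe", "safe", "harmful"]

print(f"Judged correct: {rate(correctness, 'correct'):.1f}%")   # 75.0%
print(f"Potentially harmful: {rate(harm, 'harmful'):.1f}%")     # 25.0%
```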

“Med-PaLM showed promising performance in several aspects, including scientific and clinical accuracy, reading comprehension, medical knowledge recall, medical reasoning, and usefulness, compared to Flan-PaLM,” says one of the engineers, Shek Azizi, on Twitter.

Such an AI model is not yet viable in medicine

The practice of medicine can in no way be reduced to such percentages or question-and-answer questionnaires. As the Google team points out in its study: “While these results are promising, the medical field is complex. Further assessments are needed, especially with regard to aspects related to fairness, equity and bias.”

There are other criteria than an answer that merely appears “correct”. When the Google engineers assess the quality of Med-PaLM’s answers more factually and more precisely, the model remains better than its predecessors, but systematically falls short of human doctors. Clearly, human answers remain better:

Flan-PaLM, Med-PaLM and human physician scores. // Source: Google

The conclusion of the Med-PaLM team, in the preprint posted online, is therefore also one of limitations, which “must be overcome before such models become viable for use in clinical applications.”
