Foundation models for generalist medical artificial intelligence

Moor, Michael; Banerjee, Oishi; Abad, Zahra Shakeri Hossein; Krumholz, Harlan M.; Leskovec, Jure; Topol, Eric J.; Rajpurkar, Pranav

doi:10.1038/s41586-023-05881-4

Perspective
Published: 12 April 2023

Nature volume 616, pages 259–265 (2023)

Abstract

The exceptionally rapid development of highly flexible, reusable artificial intelligence (AI) models is likely to usher in newfound capabilities in medicine. We propose a new paradigm for medical AI, which we refer to as generalist medical AI (GMAI). GMAI models will be capable of carrying out a diverse set of tasks using very little or no task-specific labelled data. Built through self-supervision on large, diverse datasets, GMAI will flexibly interpret different combinations of medical modalities, including data from imaging, electronic health records, laboratory results, genomics, graphs or medical text. Models will in turn produce expressive outputs such as free-text explanations, spoken recommendations or image annotations that demonstrate advanced medical reasoning abilities. Here we identify a set of high-impact potential applications for GMAI and lay out specific technical capabilities and training datasets necessary to enable them. We expect that GMAI-enabled applications will challenge current strategies for regulating and validating AI devices for medicine and will shift practices associated with the collection of large medical datasets.

**Fig. 1: Overview of a GMAI model pipeline.**

**Fig. 2: Illustration of three potential applications of GMAI.**

Acknowledgements

We gratefully acknowledge I. Kohane for providing insightful comments that improved the manuscript. E.J.T. is supported by the National Institutes of Health (NIH) National Center for Advancing Translational Sciences grant UL1TR001114. M.M. is supported by Defense Advanced Research Projects Agency (DARPA) N660011924033 (MCS), NIH National Institute of Neurological Disorders and Stroke R61 NS11865, GSK and Wu Tsai Neurosciences Institute. J.L. was supported by DARPA under Nos. HR00112190039 (TAMI) and N660011924033 (MCS), the Army Research Office under Nos. W911NF-16-1-0342 (MURI) and W911NF-16-1-0171 (DURIP), the National Science Foundation under Nos. OAC-1835598 (CINES), OAC-1934578 (HDR) and CCF-1918940 (Expeditions), the NIH under no. 3U54HG010426-04S1 (HuBMAP), Stanford Data Science Initiative, Wu Tsai Neurosciences Institute, Amazon, Docomo, GSK, Hitachi, Intel, JPMorgan Chase, Juniper Networks, KDDI, NEC and Toshiba.

Author information

These authors contributed equally: Michael Moor, Oishi Banerjee
These authors jointly supervised this work: Eric J. Topol, Pranav Rajpurkar

Authors and Affiliations

Department of Computer Science, Stanford University, Stanford, CA, USA
Michael Moor & Jure Leskovec
Department of Biomedical Informatics, Harvard University, Cambridge, MA, USA
Oishi Banerjee & Pranav Rajpurkar
Institute of Health Policy, Management and Evaluation, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
Zahra Shakeri Hossein Abad
Yale University School of Medicine, Center for Outcomes Research and Evaluation, Yale New Haven Hospital, New Haven, CT, USA
Harlan M. Krumholz
Scripps Research Translational Institute, La Jolla, CA, USA
Eric J. Topol

Contributions

P.R. conceived the study. M.M., O.B., E.J.T. and P.R. designed the review article. M.M. and O.B. made substantial contributions to the synthesis and writing of the article. Z.S.H.A. and M.M. designed and implemented the illustrations. All authors provided critical feedback and substantially contributed to the revision of the manuscript.

Corresponding authors

Correspondence to Eric J. Topol or Pranav Rajpurkar.

Ethics declarations

Competing interests

In the past three years, H.M.K. received expenses and/or personal fees from UnitedHealth, Element Science, Eyedentifeye, and F-Prime; is a co-founder of Refactor Health and HugoHealth; and is associated with contracts, through Yale New Haven Hospital, from the Centers for Medicare & Medicaid Services and through Yale University from the Food and Drug Administration, Johnson & Johnson, Google and Pfizer. The other authors declare no competing interests.

Peer review

Peer review information

Nature thanks Arman Cohan, Joseph Ledsam and Jenna Wiens for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Moor, M., Banerjee, O., Abad, Z.S.H. et al. Foundation models for generalist medical artificial intelligence. Nature 616, 259–265 (2023). https://doi.org/10.1038/s41586-023-05881-4

Received: 03 November 2022
Accepted: 22 February 2023
Published: 12 April 2023
Issue Date: 13 April 2023
DOI: https://doi.org/10.1038/s41586-023-05881-4
