In 2023, our research team had to type ECG findings into ChatGPT. In 2025, LLMs like Google's medical chatbot (AMIE) can "see" and interpret images (rashes, ECGs, PDFs of labs) and once again outperform doctors in diagnosis.

Why this matters:
- Multimodal LLMs that process images (and not just text) are a big step toward an AI that reviews information exactly as a doctor would
- When we ran our 2023 Nature Communications Medicine study on chest pain, ECG findings were typed into a chatbot because AI couldn't read images, while the doctor participants reviewed the actual 12-lead ECG images

🧠 What's new:
• Based on Gemini 2.0 Flash
• Accepts text + image input, e.g. images of rashes, ECGs, PDFs
• Multi-turn dialogue, not Q&A pairs

📊 How it was tested:
• Simulated primary care visits: AMIE vs doctors
• 105 clinical scenarios with patient actors
• Each actor did two consults: one with AMIE, one with a physician
• Both wrote diagnoses + treatment plans
• 18 MDs (derm, cardio, IM) reviewed blinded transcripts + notes

⚠️ Limitations:
• No real patients or clinical data
• Doesn't reflect real-world complexity
• Closed model: code, prompts, training data not released

📌 Key findings:
🔹 Diagnostic accuracy:
– Top-1: AMIE ~98% vs. physicians ~89% (on clinical documents), p < 0.001
– Top-3: AMIE > physicians across all modalities
🔹 Robustness:
– When image quality degraded, physician accuracy dropped sharply
– AMIE maintained stable performance
🔹 Multimodal reasoning:
– Specialists scored AMIE higher on explaining and reasoning with visual data (p < 0.01)
– AMIE outperformed on ECG, skin photo, and PDF interpretation
– Especially strong when the image was critical to the diagnosis
🔹 Patient communication (rated by actors):
– AMIE rated equal or better on communication, empathy, rapport
– >90% of actors preferred AMIE for a future visit (p < 0.01)
🔹 Reasoning improvements:
– Adding "state-aware" reasoning boosted ECG accuracy: 20% → 28%

📌 State-aware reasoning = the model's ability to track the diagnostic conversation's progression and adjust its responses based on what it already knows (the "state"), e.g. prior patient answers, uncertainties, and remaining diagnostic goals.

📌 My take: The next version of multimodal LLMs for healthcare will take voice input as well; most ambient tool companies (e.g. Abridge, Ambience Healthcare) already do this quite well.

Also looking forward to moderating this panel next week:
📅 Stanford #AIMI25, Tuesday, 3rd June
📍 The Foundation Model Roadmap: What Health AI Teams Need to Know
🗓️ June 3, 11:30am PT @ Stanford
With:
Khaled Saab (Google DeepMind) — led AMIE
Karan Singhal (OpenAI Health) — led HealthBench
Emily Alsentzer, PhD (Stanford) — developed ClinicalBERT, leads LLM safety and bias studies
Marinka Zitnik, PhD (Harvard) — builds graph-based foundation models for drug discovery and clinical prediction

Reporting in Nature by Mariana Lenharo. Preprint study by Google's Khaled Saab et al.
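For readers wondering what "state-aware reasoning" might look like in practice, here is a minimal, hypothetical Python sketch of a dialogue state that accumulates prior patient answers, open uncertainties, and remaining diagnostic goals between turns. This is my own illustration of the general idea, not AMIE's implementation; every class and field name is invented.

```python
from dataclasses import dataclass, field

@dataclass
class DiagnosticState:
    """Hypothetical running state for a multi-turn diagnostic dialogue."""
    known_findings: dict = field(default_factory=dict)   # e.g. {"chest_pain": "onset 2h ago"}
    uncertainties: list = field(default_factory=list)    # open questions still worth asking
    remaining_goals: list = field(default_factory=list)  # e.g. ["rule out ACS", "review ECG"]

    def update(self, patient_answer: dict, resolved: list, new_goals: list) -> None:
        """Fold the latest turn into the state before the next model call."""
        self.known_findings.update(patient_answer)
        self.uncertainties = [u for u in self.uncertainties if u not in resolved]
        self.remaining_goals.extend(g for g in new_goals if g not in self.remaining_goals)

    def as_prompt_context(self) -> str:
        """Serialize the state so it can be prepended to the next model prompt."""
        return (
            f"Known findings: {self.known_findings}\n"
            f"Open uncertainties: {self.uncertainties}\n"
            f"Remaining diagnostic goals: {self.remaining_goals}"
        )

# Example turn: the accumulated state, not just the raw transcript, conditions the next question.
state = DiagnosticState(
    uncertainties=["Is the pain exertional?"],
    remaining_goals=["rule out ACS", "interpret 12-lead ECG image"],
)
state.update(
    patient_answer={"pain_character": "pressure, radiating to left arm"},
    resolved=["Is the pain exertional?"],
    new_goals=["ask about cardiac risk factors"],
)
print(state.as_prompt_context())
```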
Multimodal Approaches in Medical Imaging
Explore top LinkedIn content from expert professionals.
Summary
Multimodal approaches in medical imaging refer to using multiple types of data—like text, images, lab results, and more—to improve diagnosis, treatment planning, and patient care. By combining these diverse inputs, advanced AI models can mimic how doctors integrate various information sources to make clinical decisions.
- Explore innovative tools: Stay informed about multimodal AI systems that integrate text and images for diagnostic accuracy and reduced clinician workload.
- Adopt collaborative solutions: Encourage the use of AI systems as complementary tools to support radiologists in generating detailed reports and enhancing patient communication.
- Prioritize ethical considerations: Address biases, transparency, and the development of inclusive AI models to ensure fair and effective medical insights for diverse patient populations.
-
MIT and Harvard Medical School researchers just unlocked interactive 3D medical image analysis with language!

Medical imaging AI has long been limited to rigid, single-task models that require extensive fine-tuning for each clinical application. 𝗩𝗼𝘅𝗲𝗹𝗣𝗿𝗼𝗺𝗽𝘁 𝗶𝘀 𝘁𝗵𝗲 𝗳𝗶𝗿𝘀𝘁 𝘃𝗶𝘀𝗶𝗼𝗻-𝗹𝗮𝗻𝗴𝘂𝗮𝗴𝗲 𝗮𝗴𝗲𝗻𝘁 𝘁𝗵𝗮𝘁 𝗲𝗻𝗮𝗯𝗹𝗲𝘀 𝗿𝗲𝗮𝗹-𝘁𝗶𝗺𝗲, 𝗶𝗻𝘁𝗲𝗿𝗮𝗰𝘁𝗶𝘃𝗲 𝗮𝗻𝗮𝗹𝘆𝘀𝗶𝘀 𝗼𝗳 𝟯𝗗 𝗺𝗲𝗱𝗶𝗰𝗮𝗹 𝘀𝗰𝗮𝗻𝘀 𝘁𝗵𝗿𝗼𝘂𝗴𝗵 𝗻𝗮𝘁𝘂𝗿𝗮𝗹 𝗹𝗮𝗻𝗴𝘂𝗮𝗴𝗲 𝗰𝗼𝗺𝗺𝗮𝗻𝗱𝘀.

1. Unified multiple radiology tasks (segmentation, volume measurement, lesion characterization) within a single, multimodal AI model.
2. Executed complex imaging commands like "compute tumor growth across visits" or "segment infarcts in MCA territory" without additional training.
3. Matched or exceeded specialized models in anatomical segmentation and visual question answering for neuroimaging tasks.
4. Enabled real-time, interactive workflows, allowing clinicians to refine analysis through language inputs instead of manual annotations.

Notably, I like that the design includes native-space convolutions that preserve the original acquisition resolution. This addresses a common limitation in medical imaging where resampling can degrade important details.

Excited to see agents being introduced more directly into clinician workflows.

Here's the awesome work: https://lnkd.in/ggQ4YGeX

Congrats to Andrew Hoopes, Victor Ion Butoi, John Guttag, and Adrian V. Dalca!

I post my takes on the latest developments in health AI – 𝗰𝗼𝗻𝗻𝗲𝗰𝘁 𝘄𝗶𝘁𝗵 𝗺𝗲 𝘁𝗼 𝘀𝘁𝗮𝘆 𝘂𝗽𝗱𝗮𝘁𝗲𝗱! Also, check out my health AI blog here: https://lnkd.in/g3nrQFxW
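To make the language-driven interaction pattern concrete, here is a small, hypothetical Python sketch of issuing natural-language commands over 3D volumes. The `ImagingAgent` class, its methods, and the model name are invented for illustration; this is not the published VoxelPrompt API, and a real agent would dispatch the command to segmentation and measurement tools rather than echoing it.

```python
import numpy as np

class ImagingAgent:
    """Hypothetical wrapper around a vision-language agent for 3D scans (illustration only)."""

    def __init__(self, model_name: str = "voxel-agent-demo"):
        self.model_name = model_name  # placeholder name, not a real checkpoint

    def run(self, command: str, volumes: dict) -> dict:
        """Pretend to route a natural-language command over one or more 3D volumes."""
        # A real agent would parse the command, call segmentation / measurement
        # tools on the volumes, and return structured results; here we just echo.
        return {
            "command": command,
            "inputs": {name: vol.shape for name, vol in volumes.items()},
            "result": "stub -- replace with real agent output",
        }

# Two follow-up MRI volumes (random stand-ins for real scans, kept in native resolution).
visit_1 = np.random.rand(192, 224, 192).astype(np.float32)
visit_2 = np.random.rand(192, 224, 192).astype(np.float32)

agent = ImagingAgent()
print(agent.run("compute tumor growth across visits", {"visit_1": visit_1, "visit_2": visit_2}))
print(agent.run("segment infarcts in MCA territory", {"visit_2": visit_2}))
```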
-
✨ Multimodal AI in Radiology: Pushing the Boundaries of AI in Radiology ✨

💡 Artificial intelligence (AI) in radiology is evolving, and multimodal AI is at the forefront. This is a nice overview of the landscape of multimodal AI in radiology research by Amara Tariq, Imon Banerjee, Hari Trivedi, and Judy Gichoya in The British Institute of Radiology. It is a recommended read for those interested in multimodal AI, including vision-language models. 👍

🔍 Why Multimodal AI?
🔹 Single-modality limitations: AI models trained on a single data type (e.g., head CTs) can have limited utility in real-world clinical settings, where radiologists rely on multiple information sources.
🔹 Clinical context matters: Without context, AI models may flag irrelevant findings, leading to unnecessary workflow disruptions. "Building single modality models without clinical context (available from multimodal data) ultimately results in impractical models with limited clinical utility."
🔹 Advancements in fusion techniques enable the integration of imaging, lab results, and clinical notes to mirror real-life decision-making.

🧪 How Does It Work? Fusion Methods Explained
🔹 Traditional fusion models: Combine data at different stages (early, late, or joint fusion). This approach struggles with missing data and is prone to overfitting (early and joint fusion).
🔹 Graph-based fusion models: Use graph convolutional networks (GCNs) to exploit implicit relationships between patients or samples based on clinical similarity, improving generalizability and handling of missing data but facing explainability challenges.
🔹 Vision-language models (VLMs): Leverage transformer-based architectures to process images and text together, showing promise in tasks like radiology report generation but requiring massive training datasets.

🔧 Challenges & Ethical Considerations
🔹 Bias and transparency: AI models can unintentionally reinforce historical biases.
🔹 Generalizability: Models trained on structured clinical datasets may struggle with diverse patient populations ("out-of-distribution datasets").

🌐 The Future of Multimodal AI in Radiology
✅ Benchmark datasets must be developed for robust evaluation.
✅ Ethical concerns must be addressed to ensure fair, explainable, and patient-centered AI solutions.
✅ Collaborative efforts between radiologists and AI developers are essential for creating clinically relevant models.

🔗 Link to the original open-access article is in the first comment 👇

#AI #MultimodalAI #LMMs #VLMs #GCNs #GenAI #Radiology #RadiologyAI
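As a quick illustration of the fusion taxonomy above, here is a minimal PyTorch sketch contrasting early fusion (concatenating an image embedding with clinical features before a shared classifier) and late fusion (combining per-modality predictions at the end). The layer sizes and variable names are arbitrary placeholders of my own, not from the article.

```python
import torch
import torch.nn as nn

IMG_DIM, TAB_DIM, N_CLASSES = 512, 32, 2  # arbitrary placeholder dimensions

class EarlyFusion(nn.Module):
    """Concatenate image embedding + clinical features, then classify jointly."""
    def __init__(self):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(IMG_DIM + TAB_DIM, 128), nn.ReLU(), nn.Linear(128, N_CLASSES)
        )

    def forward(self, img_emb, tab_feats):
        return self.classifier(torch.cat([img_emb, tab_feats], dim=-1))

class LateFusion(nn.Module):
    """Score each modality separately and average the logits at the end."""
    def __init__(self):
        super().__init__()
        self.img_head = nn.Linear(IMG_DIM, N_CLASSES)
        self.tab_head = nn.Linear(TAB_DIM, N_CLASSES)

    def forward(self, img_emb, tab_feats):
        return 0.5 * (self.img_head(img_emb) + self.tab_head(tab_feats))

# Toy batch: 4 patients, each with a precomputed image embedding + clinical features.
img_emb = torch.randn(4, IMG_DIM)
tab_feats = torch.randn(4, TAB_DIM)
print(EarlyFusion()(img_emb, tab_feats).shape)  # torch.Size([4, 2])
print(LateFusion()(img_emb, tab_feats).shape)   # torch.Size([4, 2])
```

Late fusion tolerates a missing modality more gracefully (one head can simply be skipped or reweighted), which is one reason the article flags early and joint fusion as more prone to trouble with incomplete data.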
-
Our new perspective piece in Nature introduces what we call "GenMI" – multimodal generative AI that transforms medical images into detailed reports.

I'm particularly excited about our "AI resident" paradigm. Instead of replacing clinicians, these systems work alongside them – drafting initial reports, linking textual findings to relevant areas in images, and enabling interactive exploration.

We envision three key benefits:
- Reducing radiologist workload while preserving interpretation quality
- Enhancing patient understanding through guided image exploration
- Accelerating medical education with interactive feedback

Of course, challenges remain. We need better clinical benchmarks, transparency in model reasoning, and safeguards against over-reliance. The future of medical imaging isn't AI vs. humans – it's a thoughtful collaboration between both.

A great collaboration with Eric Topol, MD & Vish Rao, with Michael Hla, Michael Moor, Subathra Adithan, and Stephen Kwak. Nature Portfolio, Harvard Medical School Department of Biomedical Informatics. CC a2z Radiology AI, where we're working on building solutions to the problems we lay out.

#MedicalAI #Radiology #HealthTech
https://rdcu.be/efcIG
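To make the idea of "linking textual findings to relevant areas in images" concrete, here is a minimal, hypothetical Python sketch of a grounded draft report, where each finding carries the image region that supports it. The dataclasses, field names, and region convention are my own illustration, not anything specified in the Nature piece.

```python
from dataclasses import dataclass

@dataclass
class GroundedFinding:
    """One report sentence tied to the image region that supports it (illustration only)."""
    text: str
    region: tuple  # (x, y, width, height) in image pixels; placeholder convention

@dataclass
class DraftReport:
    """Draft output a clinician reviews, edits, and signs off on -- the 'AI resident' idea."""
    findings: list
    impression: str

    def to_text(self) -> str:
        lines = [f"- {f.text} (region: {f.region})" for f in self.findings]
        return "FINDINGS:\n" + "\n".join(lines) + f"\n\nIMPRESSION:\n{self.impression}"

# Toy draft: the linked regions let a viewer highlight the image area behind each sentence.
draft = DraftReport(
    findings=[
        GroundedFinding("Right lower lobe opacity, likely consolidation.", (412, 530, 96, 80)),
        GroundedFinding("No pleural effusion.", (0, 480, 1024, 220)),
    ],
    impression="Findings consistent with right lower lobe pneumonia; clinical correlation advised.",
)
print(draft.to_text())
```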