How Modern LLMs Perform Reasoning and Synthesis


Summary

Modern large language models (LLMs) are evolving to perform advanced reasoning and synthesis, meaning they can now analyze complex problems, explore alternative solutions, and adapt their thinking much like humans do. Instead of simply generating text, these systems use structured approaches to decision-making and problem-solving, enabling them to handle tasks that require planning, verification, and self-correction.

  • Clarify constraints: Specify clear limitations and requirements in your prompts so the model can focus its reasoning on practical solutions that meet your needs.
  • Use structured formats: Organize your input with section headers or tags to help the model distinguish between different parts of your request and improve its understanding.
  • Encourage exploration: Try prompts that invite the model to consider multiple possibilities or revisit earlier steps, allowing it to search for the best path to a solution.
  • Aishwarya Srinivasan
    599,136 followers

    If you’re an AI engineer trying to understand how reasoning actually works inside LLMs, this will help you connect the dots. Most large language models can generate. But reasoning models can decide. Traditional LLMs followed a straight line: Input → Predict → Output. No self-checking, no branching, no exploration. Reasoning models introduced structure: a way for models to explore multiple paths, score their own reasoning, and refine their answers.

    We started with Chain-of-Thought (CoT) reasoning, then extended to Tree-of-Thought (ToT) for branching, and now to Graph-based reasoning, where models connect, merge, or revisit partial thoughts before concluding. This evolution changes how LLMs solve problems. Instead of guessing the next token, they learn to search the reasoning space: exploring alternatives, evaluating confidence, and adapting dynamically.

    Different reasoning topologies serve different goals:
    • Chains for simple sequential reasoning
    • Trees for exploring multiple hypotheses
    • Graphs for revising and merging partial solutions

    Modern architectures (like OpenAI’s o-series reasoning models, Anthropic’s Claude reasoning stack, the DeepSeek R series, and DeepMind’s AlphaReasoning experiments) use this idea under the hood. They don’t just generate answers; they navigate reasoning trajectories, using adaptive depth-first or breadth-first exploration depending on task uncertainty.

    Why this matters:
    • It reduces hallucinations by verifying intermediate steps
    • It improves interpretability, since we can visualize reasoning paths
    • It boosts reliability for complex tasks like planning, coding, or tool orchestration

    The next phase of LLM development won’t be about more parameters; it’ll be about better reasoning architectures: topologies that can branch, score, and self-correct. I’ll be doing a deep dive on reasoning models soon on my Substack, exploring architectures, training approaches, and practical applications for engineers. If you haven’t subscribed yet, make sure you do: https://lnkd.in/dpBNr6Jg

    ♻️ Share this with your network
    🔔 Follow along for more data science & AI insights
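    To make the chain-vs-tree distinction concrete, here is a minimal, illustrative sketch of a tree-style search over reasoning steps. The helpers `generate_thoughts` and `score_path` are hypothetical stand-ins for model calls (they return dummy values here), not any vendor's API; a real system would back them with an LLM and a verifier or reward model.

    ```python
    import heapq
    import random

    # Hypothetical stand-ins for LLM calls; a real system would query a model here.
    def generate_thoughts(problem, path, k=3):
        """Propose k candidate next reasoning steps (dummy strings for illustration)."""
        return [f"step {len(path) + 1}, option {i}" for i in range(k)]

    def score_path(problem, path):
        """Rate a partial reasoning path between 0 and 1 (random for illustration)."""
        return random.random()

    def tree_of_thought_search(problem, max_depth=4, beam_width=3):
        """Search the space of partial reasoning paths instead of one linear chain."""
        frontier = [([], score_path(problem, []))]  # (path, score)
        for _ in range(max_depth):
            candidates = []
            for path, _ in frontier:
                for step in generate_thoughts(problem, path, k=beam_width):
                    new_path = path + [step]
                    candidates.append((new_path, score_path(problem, new_path)))
            # Keep only the highest-scoring partial paths (the "beam").
            frontier = heapq.nlargest(beam_width, candidates, key=lambda c: c[1])
        best_path, _ = max(frontier, key=lambda c: c[1])
        return best_path

    print(tree_of_thought_search("How many weighings find the odd coin among 12?"))
    ```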

  • Kuldeep Singh Sidhu
    Senior Data Scientist @ Walmart | BITS Pilani
    13,433 followers

    Reasoning Agentic RAG: The Evolution from Static Pipelines to Intelligent Decision-Making Systems

    The AI research community has just released a comprehensive survey that could reshape how we think about Retrieval-Augmented Generation. Moving beyond traditional static RAG pipelines, researchers from leading institutions including Beijing University of Posts and Telecommunications, University of Georgia, and SenseTime Research have mapped out the emerging landscape of Reasoning Agentic RAG.

    The Core Innovation: System 1 vs System 2 Thinking
    Drawing from cognitive science, the survey categorizes reasoning workflows into two distinct paradigms:
    Predefined Reasoning (System 1): Fast, structured, and efficient approaches that follow fixed modular pipelines. These include route-based methods like RAGate that selectively trigger retrieval based on model confidence scores, loop-based systems like Self-RAG that enable iterative refinement through retrieval-feedback cycles, and tree-based architectures like RAPTOR that organize information hierarchically using recursive structures.
    Agentic Reasoning (System 2): Slow, deliberative, and adaptive systems where the LLM autonomously orchestrates tool interaction during inference. The model actively monitors its reasoning process, identifies knowledge gaps, and determines when and how to retrieve external information.

    Under the Hood: Technical Mechanisms
    The most fascinating aspect is how these systems work internally. In prompt-based agentic approaches, frameworks like ReAct interleave reasoning steps with tool use through Thought-Action-Observation sequences, while function-calling mechanisms provide structured interfaces for LLMs to invoke search APIs based on natural language instructions. Training-based methods push even further. Systems like Search-R1 use reinforcement learning where the search engine becomes part of the RL environment, with the LLM learning policies to generate sequences including both internal reasoning steps and explicit search triggers. DeepResearcher takes this to the extreme by training agents directly in real-world web environments, fostering emergent behaviors like cross-validation of information sources and strategic plan adjustment.

    The Technical Architecture
    What sets these systems apart is their dynamic control logic. Unlike traditional RAG's static retrieve-then-generate pattern, agentic systems can rewrite failed queries, choose different retrieval methods, and integrate multiple tools (vector databases, SQL systems, and custom APIs) before finalizing responses. The distinguishing quality is the system's ability to own its reasoning process rather than executing predetermined scripts. The research indicates we're moving toward truly autonomous information-seeking systems that can adapt their strategies based on the quality of retrieved information, marking a significant step toward human-like research and problem-solving capabilities.
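    As a rough illustration of the Thought-Action-Observation pattern the survey describes, here is a hedged sketch of a ReAct-style control loop. `call_llm` and `search_tool` are hypothetical placeholders (scripted here), not any specific framework's API; a real agent would route them to a model and a retriever.

    ```python
    # Illustrative ReAct-style loop: the model interleaves Thought / Action / Observation
    # turns until it emits a final answer. All helpers are placeholder stand-ins.
    def call_llm(prompt: str) -> str:
        """Placeholder for a real model call; returns a scripted reply here."""
        return "Thought: I should look this up.\nAction: search[reasoning agentic RAG survey]"

    def search_tool(query: str) -> str:
        """Placeholder retrieval tool (vector DB, SQL, web search, ...)."""
        return f"(top documents for '{query}')"

    def react_agent(question: str, max_turns: int = 5) -> str:
        transcript = f"Question: {question}\n"
        for _ in range(max_turns):
            reply = call_llm(transcript)
            transcript += reply + "\n"
            if "Final Answer:" in reply:               # the model decided it is done
                return reply.split("Final Answer:", 1)[1].strip()
            if "Action: search[" in reply:             # the model asked for retrieval
                query = reply.split("Action: search[", 1)[1].split("]", 1)[0]
                transcript += f"Observation: {search_tool(query)}\n"
        return transcript  # out of turns; return the trace for inspection

    print(react_agent("What distinguishes agentic RAG from static RAG?"))
    ```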

  • Dimitris Papadopoulos
    CAIO @ EXUS | PhD in NLP | Making GenAI deliver impact
    8,494 followers

    Researchers from Meta recently introduced COCONUT (Chain of Continuous Thought): a new reasoning approach that uses the last hidden state of the LLM as a representation of the reasoning state (termed “continuous thought”). COCONUT shifts LLMs from language-bound reasoning to a continuous latent space, unlocking advanced problem-solving efficiency and accuracy.

    🤔 The problem: Traditional reasoning methods rely heavily on language-based reasoning chains, the familiar CoT and its variants. While effective for some tasks, these methods face inherent limitations:
    ➖ They prioritize fluency over reasoning, wasting computational effort by spelling out intermediate logical steps in free text.
    ➖ When faced with complex tasks that demand planning or backtracking, CoT often struggles to map out all possibilities effectively.
    ➖ In general, these approaches mimic human communication patterns rather than the deeper cognitive processes involved in reasoning.

    💡 The solution: COCONUT introduces latent reasoning, a method where reasoning steps are represented as continuous states instead of explicit language tokens. This shift enables models to operate more effectively by:
    🥥 Exploring multiple paths: by encoding possibilities simultaneously, the model can evaluate alternatives, akin to a breadth-first search.
    🥥 Backtracking: latent reasoning supports revisiting earlier steps, crucial for tasks requiring complex planning.
    🥥 Token efficiency: continuous reasoning uses fewer tokens, reducing computational overhead.

    🛠️ How COCONUT works:
    🔹 Encoding reasoning: continuous thoughts, derived from the model’s hidden states, represent the reasoning process. These states are looped back as input rather than being decoded into language.
    🔹 Switching modes: the model alternates between two operational modes:
    1. Language mode: used for handling input questions and producing final answers.
    2. Latent mode: processes reasoning steps in the latent space, bypassing language generation.

    📈 Performance: COCONUT outperforms CoT on various benchmarks, especially those involving planning-intensive tasks. More importantly, it does so with fewer tokens while handling complex logical structures with ease.

    I find the main motivation really interesting: the authors claim that, according to neuroimaging studies, reasoning in humans does not rely on the brain's language network, which primarily handles comprehension and communication. Instead, reasoning processes often involve distinct neural pathways optimized for logic and planning, independent of language structures. This insight is what motivated the development of 🥥 COCONUT, aiming to separate the computational reasoning of LLMs from the constraints of language-based reasoning. Paper in comments.
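    For intuition only, here is a hedged sketch of the core loop: instead of decoding intermediate steps into tokens, the last hidden state is appended back to the input embeddings for a few "latent" steps before the answer is decoded. This is not Meta's implementation (the actual method involves a staged training curriculum and special mode-switch tokens); it assumes a Hugging Face-style causal LM and is purely illustrative.

    ```python
    import torch

    def latent_then_answer(model, tokenizer, question, latent_steps=4, max_new_tokens=64):
        """Illustrative COCONUT-style loop: think in latent space, then decode an answer."""
        inputs = tokenizer(question, return_tensors="pt")
        # Language mode: embed the question as usual.
        embeds = model.get_input_embeddings()(inputs.input_ids)

        # Latent mode: feed the last hidden state back in as the next "input embedding".
        for _ in range(latent_steps):
            out = model(inputs_embeds=embeds, output_hidden_states=True)
            continuous_thought = out.hidden_states[-1][:, -1:, :]   # last position's state
            embeds = torch.cat([embeds, continuous_thought], dim=1)

        # Language mode again: decode the final answer conditioned on the latent thoughts.
        generated = model.generate(inputs_embeds=embeds, max_new_tokens=max_new_tokens)
        return tokenizer.decode(generated[0], skip_special_tokens=True)
    ```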

  • Sahar Mor
    I help researchers and builders make sense of AI | ex-Stripe | aitidbits.ai | Angel Investor
    40,990 followers

    A simple prompt hack that makes reasoning models 3x more effective has been hiding in plain sight, and it's not "think step by step." While many developers instinctively add chain-of-thought prompts to their LLM calls, new research shows this actually degrades performance with modern reasoning models like OpenAI's o1 and DeepSeek's recent R1.

    Key insights on effective prompting:
    (1) Developer-first architecture: modern reasoning models like o3 use a new paradigm where instructions are passed as what's called "developer messages" rather than system prompts, enabling more precise control over model behavior.
    (2) Zero-shot superiority: these models often perform better without examples, contrary to traditional LLMs.
    (3) Constraint clarity: explicitly outline limitations (e.g., "solution under $500") instead of asking for step-by-step thinking.
    (4) Structured input control: use delimiters (XML tags, markdown, section headers) to organize your prompts, significantly improving the model's understanding of distinct input components.
    (5) Markdown handling: newer versions of reasoning models require explicit activation of markdown formatting through a simple header flag, so don't expect markdown to be generated otherwise.

    My take: Clearly, reasoning models operate differently from traditional LLMs such as Claude Sonnet and GPT-4o. Comparing these models is like judging a coding model, such as Mistral’s Codestral, by its ability to write poetry: it's an apples-to-oranges comparison. This isn't just about better prompts; it's about fundamentally rethinking how we interact with AI systems designed for complex reasoning.
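    A quick sketch of what this can look like in practice, using the OpenAI Python SDK: a developer message (with the "Formatting re-enabled" flag to turn markdown output back on), an explicit constraint, and delimiters instead of "think step by step". The model name, tags, and constraint values are examples, not recommendations.

    ```python
    from openai import OpenAI

    client = OpenAI()

    response = client.chat.completions.create(
        model="o3-mini",  # example reasoning model; substitute whatever you use
        messages=[
            {
                # Developer message instead of a system prompt; the leading flag
                # asks the model to produce markdown in its answer.
                "role": "developer",
                "content": "Formatting re-enabled. Answer with a markdown table.",
            },
            {
                # Delimited sections and an explicit constraint, no CoT instructions.
                "role": "user",
                "content": (
                    "<task>Recommend a home backup-power setup.</task>\n"
                    "<constraints>Total cost under $500; must run a router and a "
                    "laptop for 4 hours.</constraints>\n"
                    "<output>A table of components with prices.</output>"
                ),
            },
        ],
    )
    print(response.choices[0].message.content)
    ```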

  • Himanshu J.
    Building Aligned, Safe and Secure AI
    27,131 followers

    🚀 Exploring the transition from LLMs to LRMs: Unveiling the evolution of "Thinking" in AI 🤖🧠

    The shift from Large Language Models (LLMs) to Large Reasoning Models (LRMs) marks a significant transformation in how AI tackles intricate problem-solving tasks. 📚 A recent collaborative study by researchers from the Massachusetts Institute of Technology, Cornell University, the University of Washington, and Microsoft Research delves into a fundamental question: 🔍 How can AI be trained to engage in "thinking" rather than merely generating text?

    💡 The approach, Reinforcement Learning via Self-Play (RLSP), introduces a novel method of teaching AI to reason by integrating:
    ✅ Supervised Fine-Tuning (SFT): learning from human or synthetic demonstrations of reasoning.
    ✅ Exploration Reward Signals: promoting diverse reasoning avenues such as backtracking, verification, and the consideration of multiple hypotheses.
    ✅ Reinforcement Learning (RL) with Outcome Verification: ensuring accurate reasoning without exploiting rewards.

    🔥 Key Revelations & Advancements:
    📌 Emergent Behaviors: models trained with RLSP showcased traits like self-correction, exploration, and verification, mirroring human problem-solving approaches.
    📌 Performance Enhancement: RLSP led to a 23% increase in math problem-solving accuracy on Llama-3.1-8B and a 10% boost on AIME 2024 for Qwen2.5-32B.
    📌 AI as a Search Mechanism: thinking essentially involves a guided exploration of potential solutions, a concept resonating in methodologies like AlphaZero and Process Reward Modeling.

    🌎 Significance of the Progress:
    As AI systems transcend mere memorization to engage in active reasoning, the implications extend across scientific exploration, enterprise AI applications, and self-directed decision-making. Could this signify the dawn of AI cultivating its own intuition? 🤔

    📖 Explore the complete paper here - https://lnkd.in/dhr_C4-e

    Would love to hear your thoughts: where do you see AI reasoning making the biggest impact? 🚀👇

    #AI #MachineLearning #LLMs #AIReasoning #ReinforcementLearning #LLMsToLRMs
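    The exact reward design is in the paper; as a toy sketch of the idea described above, the training signal combines a verified-outcome reward with a small, capped bonus for exploratory behavior in the reasoning trace (the cap is one simple guard against reward hacking). The marker phrases and weights here are made up for illustration.

    ```python
    # Toy sketch of an RLSP-style reward (not the paper's exact formulation).
    EXPLORATION_MARKERS = ("wait", "let me check", "alternatively", "on second thought")

    def outcome_reward(final_answer: str, reference: str) -> float:
        """1.0 if the verifiable final answer matches the reference, else 0.0."""
        return 1.0 if final_answer.strip() == reference.strip() else 0.0

    def exploration_bonus(trace: str, weight: float = 0.05, cap: float = 0.2) -> float:
        """Small, capped bonus for backtracking / verification phrases in the trace."""
        hits = sum(trace.lower().count(marker) for marker in EXPLORATION_MARKERS)
        return min(cap, weight * hits)

    def rlsp_style_reward(trace: str, final_answer: str, reference: str) -> float:
        # Capping the bonus discourages padding the trace just to farm reward.
        return outcome_reward(final_answer, reference) + exploration_bonus(trace)

    print(rlsp_style_reward("Wait, let me check the algebra again...", "42", "42"))
    ```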

  • Matthew Berman
    AI Enthusiast, YouTuber, Investor, Entrepreneur, Founder of Forward Future.
    7,708 followers

    New research paper shows how LLMs can "think" internally before outputting a single token! Unlike Chain of Thought, this "latent reasoning" happens in the model's hidden space. TONS of benefits from this approach. Let me break down this fascinating paper...

    The key insight: human thinking often happens before we verbalize thoughts. Traditional LLMs think by generating tokens (Chain of Thought), but this new approach lets models reason in their continuous latent space first.

    So what is it? The researchers built a 3.5B-parameter model with a recurrent architecture that can "think" repeatedly in latent space before generating any output. The more thinking iterations, the better the performance!

    How it works: the model has 3 parts:
    • Prelude: embeds input into latent space
    • Recurrent Core: does the actual thinking
    • Coda: converts thoughts back to tokens

    The cool part? The model can dynamically adjust how much "thinking" it needs based on the task:
    • Math problems → more iterations
    • Simple tasks → fewer iterations
    Just like humans do!

    Key advantages over traditional Chain of Thought:
    • No special training data needed
    • Works with smaller context windows
    • Can capture reasoning that's hard to put into words

    Weird finding: the researchers found fascinating patterns in how the model thinks; it develops "orbits" and "sliders" in latent space to represent different types of reasoning!

    Is it good? Performance scales impressively with compute: at 32 recurrent iterations, it matches models 2x its size on many tasks. And it can go even further!

    This architecture also enables some neat tricks:
    • Zero-shot adaptive compute
    • KV cache sharing
    • Continuous chain-of-thought
    All without special training!

    The implications are huge: this could be a missing piece in getting LLMs to truly reason rather than just manipulate language. It's especially interesting given Yann LeCun's critiques that LLMs can't truly reason. This latent approach might bridge that gap. https://lnkd.in/gMNcBfTQ

    While still a proof of concept, the results suggest this could be a powerful new direction for language models, combining internal reasoning with traditional token generation.

    Full paper: https://lnkd.in/gCp8pXuz
    Breakdown video: https://lnkd.in/gSK2xw37
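    Here is a schematic sketch of the prelude / recurrent-core / coda split described above. The dimensions, layer counts, and module choices are illustrative, not the paper's 3.5B configuration; the point is only that the same core block is applied repeatedly before any tokens are produced.

    ```python
    import torch
    import torch.nn as nn

    class LatentRecurrentLM(nn.Module):
        def __init__(self, vocab_size=32000, d_model=512, n_heads=8):
            super().__init__()
            self.prelude = nn.Embedding(vocab_size, d_model)             # tokens -> latent space
            core_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.core = nn.TransformerEncoder(core_layer, num_layers=2)  # reused every iteration
            self.coda = nn.Linear(d_model, vocab_size)                   # latent space -> token logits

        def forward(self, token_ids, num_iterations=8):
            h = self.prelude(token_ids)
            state = torch.zeros_like(h)
            for _ in range(num_iterations):      # the "thinking" loop: more iterations, more compute
                state = self.core(h + state)     # refine the latent state, conditioned on the input
            return self.coda(state)              # only now map back to token logits

    # Depth of "thinking" is chosen at inference time: more for math, fewer for easy tasks.
    model = LatentRecurrentLM()
    logits = model(torch.randint(0, 32000, (1, 16)), num_iterations=16)
    print(logits.shape)
    ```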

  • Agus Sudjianto
    A geek who can speak: Co-creator of PiML and MoDeVa, SVP Risk & Technology H2O.ai, former EVP-Head of Wells Fargo MRM
    24,902 followers

    Brilliant in some cases and dumb in others! I’m a heavy user of LLMs for many of the tasks that I do, but… Large Language Models (LLMs) can appear brilliant in some areas and surprisingly bad in others because of the way they are designed and trained.

    1. Training Data Bias and Coverage
    LLMs are trained on vast amounts of text data from the internet, research papers, books, and code repositories. They perform well in areas where they have seen a lot of high-quality data (e.g., general knowledge, programming, mathematics). However, they struggle in areas where data is sparse, biased, or highly nuanced, leading to gaps in reasoning.

    2. Pattern Recognition vs. True Understanding
    LLMs are pattern-recognition engines, not true reasoning machines. They generate responses based on statistical likelihood rather than deep conceptual understanding. This means they can sound intelligent without actually “thinking,” leading to confident but incorrect answers in complex situations.

    3. Lack of Real-World Experience
    LLMs do not have real-world experience—they cannot observe, experiment, or interact with the physical world. This makes them excellent at answering structured, well-documented questions but bad at reasoning about real-world uncertainties.

    4. Difficulty with Logic and Consistency
    While LLMs can follow logical rules, they often struggle with multi-step reasoning, consistency across responses, and self-correction. A simple fact recall might be perfect, but when asked to extend logic to a new situation, the model can make obvious mistakes.

    5. Overfitting to User Inputs
    LLMs tend to mirror the structure and assumptions of the input they receive. If a user provides leading or biased questions, the model may generate an answer that aligns with those biases rather than critically analyzing the question.

    6. Struggles with Small-Data Scenarios
    LLMs are designed for big-picture knowledge but struggle with specific, small-sample reasoning (e.g., experimental setups, statistical overfitting). They can generalize well over large datasets but may fail in cases that require deep domain expertise.

    7. Computational Constraints
    LLMs operate under finite compute budgets—they truncate memory, which makes long-term dependencies difficult to track. This can make them great at short, factual questions but weak at complex, multi-step problems requiring extended context.

    As for agentic AI doing data science… draw your own conclusion 😝

  • Aishwarya Naresh Reganti
    Founder & CEO @ LevelUp Labs | Ex-AWS | Consulting, Training & Investing in AI
    114,025 followers

    🤩 What if you could use just 17k fine-tuning samples and change only 5% of the model to make a small LLM reason like the o1-preview model?

    DeepSeek-R1’s famous trick to make cheaper/smaller LLMs behave more like reasoning models seems to be working well—another paper reproduces similar results more efficiently! The DeepSeek-R1 paper introduced an experiment where they fine-tuned smaller Qwen and Llama models to improve their reasoning abilities by using outputs from the larger DeepSeek-R1 671B model. Some have called this soft distillation, while others say it's fine-tuning, but you get the point!

    Another recent paper has done something similar:
    ⛳ The paper focuses on improving LLMs' reasoning ability by getting them to generate Long Chain-of-Thought (Long CoT) responses for complex problems.
    ⛳ It uses DeepSeek-R1's results to fine-tune smaller models like Qwen2.5-32B-Instruct.
    ⛳ They use only supervised fine-tuning (SFT) and low-rank adaptation (LoRA) with just 17k samples, meaning they didn't even modify the entire model (only 5% of it, per the authors).
    ⛳ The paper highlights that the structure of Long CoTs is far more critical than the content of individual reasoning steps. Errors in the content (e.g., mistakes in reasoning steps) have little impact, while disrupting the structure (e.g., deleting or shuffling reasoning steps) significantly hurts performance.
    ⛳ They demonstrate that this approach works across different models and tasks!

    If this approach works well, many smaller models could be adapted to perform reasoning tasks! 💡 It's super interesting: every breakthrough with large models seems to push smaller models to become much more powerful simply by using these big models as teachers.

    Link: https://lnkd.in/e5rzRWqd
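    For readers who want to see the mechanics, here is a hedged sketch of the SFT + LoRA recipe the post describes, using the Hugging Face `transformers` and `peft` libraries. The model name matches the post, but the rank, target modules, and the rest of the setup are illustrative guesses, not the paper's exact configuration.

    ```python
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    model_name = "Qwen/Qwen2.5-32B-Instruct"   # the student model mentioned in the post
    model = AutoModelForCausalLM.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # Low-rank adapters train only a small fraction of the weights.
    lora_config = LoraConfig(
        r=16,
        lora_alpha=32,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()          # shows only a few percent of params are trainable

    # From here, run ordinary supervised fine-tuning (e.g., with TRL's SFTTrainer) on
    # ~17k (problem, long chain-of-thought solution) pairs distilled from the larger model.
    ```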

  • Carey C.
    Leadership in Data & AI Strategy | Human-Centered & Responsible AI | From Innovation to Scaled Impact
    2,685 followers

    In March, Anthropic released a new article titled “Tracing the Thoughts of a Language Model,” offering an intuitive look into the inner workings of LLMs and how their reasoning unfolds. (https://lnkd.in/gwsa-eYk)

    Back in 2021, Anthropic introduced the concept of Transformer Circuits and the residual stream — a framework for interpreting transformer models not as black boxes, but as systems where each layer contributes linearly and meaningfully to the final prediction. (https://lnkd.in/g_AzRHt2)

    This latest work builds on that foundation by using attribution graphs to analyze Claude 3.5 Haiku across a wide range of phenomena. It shows how reasoning unfolds layer by layer, rather than appearing all at once, allowing us to trace the model’s internal logic as it progresses. This approach enables deeper auditing of the model’s complex reasoning paths, whether multi-step or parallel, making it easier to understand how different components contribute to the final output. It also reveals instances of motivated reasoning, where subtle user cues steer the model’s response, and offers critical insights into how and why hallucinations emerge within the model’s thought process.

    This is an important step toward making LLMs more interpretable, transparent, and trustworthy — especially for real-world use cases that require accountability.

  • Mary Newhauser
    Machine Learning Engineer
    24,740 followers

    What if reasoning is actually a retrieval problem? Could that turn SLMs into strong reasoners? A recent paper out of the University of Michigan may have found a way to bring big-scale reasoning to local-scale compute.

    SLMs are amazing for agentic workflows and running on local hardware, but they still struggle with multi-step reasoning. And that’s not surprising, because SLMs often lack both specific domain knowledge and the structured support necessary for multi-step reasoning. This paper reframes reasoning as a retrieval problem. Instead of having models generate reasoning chains completely from scratch, they programmed SLMs to retrieve existing instructions.

    Here’s how they did it:
    📝 First, they created an instruction corpus of question and instruction pairs. Each instruction contains relevant background knowledge and a step-by-step reasoning procedure to solve the question.
    🎣 During inference, the SLM retrieves the most relevant instructions, which are injected into the prompt.

    Upon evaluation, the authors share these findings:
    🏆 Significantly improved performance for SLMs (3B–14B parameters) on a number of benchmarks, including MedQA (+9.4%), MMLU Law (+7.9%), and MathQA (+5.1%).
    🤏 Shorter, more concise instructions outperformed longer ones.
    👪 The magnitude of improvement depends strongly on model family and the inherent reasoning ability of the model.

    🔗 Full paper here: https://lnkd.in/gB3AHTwi
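    As a rough sketch of the retrieve-then-reason idea (not the paper's code), one could embed the instruction corpus, retrieve the closest instruction for a new question, and prepend it to the SLM's prompt. The embedding model and the tiny corpus below are illustrative stand-ins.

    ```python
    from sentence_transformers import SentenceTransformer, util

    encoder = SentenceTransformer("all-MiniLM-L6-v2")   # example embedding model

    # Tiny illustrative corpus of (question, step-by-step instruction) pairs.
    instruction_corpus = [
        {"question": "A patient presents with chest pain and shortness of breath...",
         "instruction": "Recall the relevant physiology, list differential diagnoses, then rule each out."},
        {"question": "Under contract law, when is an offer considered accepted?",
         "instruction": "State the governing rule, apply each element to the facts, then conclude."},
    ]
    corpus_embeddings = encoder.encode(
        [item["question"] for item in instruction_corpus], convert_to_tensor=True
    )

    def build_prompt(new_question: str) -> str:
        query_embedding = encoder.encode(new_question, convert_to_tensor=True)
        hit = util.semantic_search(query_embedding, corpus_embeddings, top_k=1)[0][0]
        instruction = instruction_corpus[hit["corpus_id"]]["instruction"]
        # Inject the retrieved reasoning procedure ahead of the actual question.
        return f"Follow this procedure:\n{instruction}\n\nQuestion: {new_question}\nAnswer:"

    print(build_prompt("When does silence count as acceptance of an offer?"))
    ```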
