How Qwen2.5 Processes Advanced Workflows


Summary

Qwen2.5 is an advanced artificial intelligence model designed to process complex workflows by simulating multi-agent collaboration and handling extremely long sequences of information. It combines techniques like reinforcement learning, dynamic memory management, and integrated search reasoning to tackle intricate tasks in fields such as code analysis, legal review, and scientific research.

  • Simulate teamwork: Train the model to act like multiple specialized agents working together, allowing it to coordinate tools and strategies for challenging problems.
  • Handle long contexts: Use progressive training and efficient memory techniques so your AI can understand connections throughout very large documents without missing key details.
  • Integrate search reasoning: Teach the AI to combine internal thinking with external search queries, enabling it to answer multi-step questions and retrieve relevant information during its workflow.
  • Elvis S.

    Founder at DAIR.AI | Angel Investor | Advisor | Prev: Meta AI, Galactica LLM, Elastic, Ph.D. | Serving 7M+ learners around the world

    81,502 followers

    Chain-of-Agents

    Interesting idea to train a single model with the capabilities of a multi-agent system. 84.6% reduction in inference cost! Distillation and Agentic RL are no joke! Here are my notes:

    ● Overview
    This work proposes training single models to natively behave like multi-agent systems, coordinating "role-playing" and tool agents end-to-end. They distill strong multi-agent frameworks into CoA trajectories, then optimize with agentic RL on verifiable tasks.

    ● Paradigm shift
    CoA generalizes ReAct/TIR by dynamically activating multiple roles and tools within one model, preserving a single coherent state while cutting inter-agent chatter.

    ● Training recipe
    1) Multi-agent distillation turns successful OAgents runs into CoA-formatted traces with planning, tool calls, observations, and reflection, filtered for difficulty and quality;
    2) Agentic RL targets hard queries where tools matter, with simple binary rewards via LLM-as-Judge for web tasks and executable or exact-match rewards for code/math.

    ● Training framework
    Stage 1 (SFT) uses reformatted ReAct-style data (both short and long reasoning chains) to give the model a solid "cold start." Progressive filtering ensures only high-quality trajectories are used, emphasizing coherence, tool efficiency, and reflective reasoning.
    Stage 2 (RL) builds on the SFT base. The model performs tool-aware rollouts on unused QA pairs. Rewards are computed from task correctness (via LLM-as-Judge, exact match, or test cases), and policy updates improve tool coordination and reasoning robustness.

    ● Main results
    With Qwen2.5-32B backbones, Agent Foundation Models (AFM) set new pass@1 records of 55.3 on GAIA, 11.1 on BrowseComp, and 18.0 on HLE, and lead WebWalker with 63.0; AFM also tops multi-hop QA suites across model sizes.

    ● Code + math
    AFM-RL-32B reaches AIME25 59.8, MATH500 94.6, OlympiadBench 72.1, and LiveCodeBench v5 47.9, beating prior TIR methods including ReTool and Reveal.

    ● Efficiency and robustness
    Compared to traditional multi-agent systems, AFM cuts inference tokens and tool calls substantially. The paper reports an 84.6% token cost reduction while staying competitive. It also generalizes better to unseen tools when strict formatting is required.

    ● Test-time scaling
    Best-of-3 and pass@3 markedly boost AFM, e.g., GAIA 69.9 and HLE 33.2, closing the gap with larger proprietary agent stacks.

    Overall, Chain-of-Agents enables training single-agent foundation models that natively simulate multi-agent collaboration, combining multi-agent distillation with agentic RL to achieve state-of-the-art results.
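The post describes the agentic RL stage as using simple binary rewards: LLM-as-Judge for web tasks and executable or exact-match checks for code and math. Below is a minimal Python sketch of how such a reward router could look; the `Rollout` fields, function names, and the judge/test placeholders are illustrative assumptions, not the paper's released code.

```python
# Hypothetical sketch of a binary reward router for agentic RL:
# exact match for math, test execution for code, LLM-as-Judge for web tasks.
from dataclasses import dataclass

@dataclass
class Rollout:
    task_type: str     # "web", "math", or "code"
    final_answer: str  # answer extracted from the end of the CoA trajectory
    reference: str     # ground-truth answer, or test cases for code tasks

def judge_with_llm(question: str, answer: str, reference: str) -> bool:
    """Placeholder: ask a strong LLM for a yes/no correctness verdict."""
    raise NotImplementedError

def run_unit_tests(code: str, tests: str) -> bool:
    """Placeholder: execute the generated code against the task's test cases."""
    raise NotImplementedError

def binary_reward(question: str, rollout: Rollout) -> float:
    """Return 1.0 for a verifiably correct trajectory, 0.0 otherwise."""
    if rollout.task_type == "math":
        # Exact-match check on the final answer.
        return float(rollout.final_answer.strip() == rollout.reference.strip())
    if rollout.task_type == "code":
        # Executable reward: the answer must pass the provided tests.
        return float(run_unit_tests(rollout.final_answer, rollout.reference))
    # Web / open-ended tasks fall back to LLM-as-Judge.
    return float(judge_with_llm(question, rollout.final_answer, rollout.reference))
```

The binary signal keeps the RL objective simple and verifiable, which is why the post stresses targeting hard queries where tool use actually changes the outcome.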

  • Raphaël MANSUY

    Data Engineering | DataScience | AI & Innovation | Author | Follow me for deep dives on AI & data-engineering

    31,746 followers

    Why Handling 1M Tokens Isn't Just About Adding More Memory

    Qwen2.5-1M is a local model with 1M-token support. What if your AI could analyze an entire library's worth of text in one go—without missing a single detail?

    👉 Why Long Context Matters
    Imagine trying to solve a complex math problem by examining one equation at a time. You might grasp individual steps but miss the overarching logic. Similarly, language models with limited context windows struggle to connect ideas across vast datasets, whether analyzing code repositories, legal documents, or multi-chapter narratives.
    The challenge isn't just about "storing" more data—it's about "understanding" relationships between distant pieces of information. Traditional methods often sacrifice coherence or accuracy when scaling to longer sequences, creating bottlenecks for real-world applications.

    👉 What Makes Qwen2.5-1M Different
    The Qwen2.5-1M series addresses this gap with three innovations:
    1. "Progressive Training": Instead of brute-forcing longer sequences, the models are trained incrementally—starting at 4k tokens and scaling to 256k—using adaptive positional encoding to retain short-context accuracy.
    2. "Synthetic Data Synthesis": To teach the model long-range dependencies, tasks like paragraph reordering and keyword retrieval are embedded into training data, mimicking how humans contextualize information.
    3. "Memory-Efficient Inference": Techniques like Dual Chunk Attention (DCA) reduce positional encoding collisions, while sparse attention kernels cut prefill times by 3–7x, even on consumer-grade GPUs.

    👉 How It Works Without Breaking Your GPU
    Training and deploying 1M-token models isn't trivial. Here's how the team tackled it:
    - "Memory Constraints": A 1M-token context requires ~80GB of VRAM for dense attention. Qwen2.5-1M sidesteps this with chunked prefill, splitting sequences into 32k-token blocks and optimizing GPU memory reuse.
    - "Speed vs. Accuracy Tradeoffs": Sparse attention (MInference) skips non-critical tokens, but over-aggressive sparsity harms accuracy. The solution? A refinement step that validates token importance using softmax likelihoods, preserving 95%+ accuracy while slashing compute.
    - "Inference Engine Tweaks": Kernel-level optimizations for MoE models and dynamic pipeline parallelism ensure consistent throughput, even when processing mixed-length queries.

    For developers, this means:
    - "Local Deployment": The open-source inference framework (integrated with vLLM) supports 1M-token contexts on my M4 Max with 128 GB of memory.
    - "Cost-Efficiency": Qwen2.5-Turbo's API offers GPT-4o-mini-level performance at lower cost, with 8x the context length.

    The Qwen team's decision to open-source their framework invites the community to build on these tools, ensuring progress isn't confined to proprietary systems.
    To the Qwen Team: ❤️ Thank you for pushing the boundaries of what's possible—and making it accessible.
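To make the chunked-prefill idea above concrete, here is a minimal sketch assuming a Hugging Face-style interface: the long prompt is split into 32k-token blocks that are pushed through the model one at a time while the KV cache is carried forward, bounding peak activation memory. The checkpoint name is an assumption for illustration, and this omits the DCA and sparse-attention kernels of the actual vLLM-integrated Qwen2.5-1M framework.

```python
# Conceptual sketch of chunked prefill, not the Qwen team's implementation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen2.5-7B-Instruct-1M"  # assumed checkpoint name
CHUNK = 32_768                            # 32k-token prefill blocks, as in the post

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

@torch.no_grad()
def chunked_prefill(prompt: str):
    """Prefill the KV cache one 32k-token chunk at a time instead of all at once."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    past = None
    for start in range(0, ids.shape[1], CHUNK):
        chunk = ids[:, start:start + CHUNK]
        out = model(input_ids=chunk, past_key_values=past, use_cache=True)
        past = out.past_key_values  # carry the cache into the next chunk
    return past  # cache is now ready for decoding new tokens
```

The chunk loop trades a few extra forward passes for a flat memory profile, which is the same trade-off the post describes for fitting 1M-token contexts on a single machine.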

  • Sachin Kumar

    Experienced Agentic AI and Generative AI Expert

    8,572 followers

    ReSearch: an LLM training framework to Reason with Search via Reinforcement Learning.

    Integrating reasoning with external search processes remains challenging, especially for complex multi-hop questions requiring multiple retrieval steps. To address this, the paper proposes ReSearch, a novel framework that trains LLMs to Reason with Search via reinforcement learning, without using any supervised data on reasoning steps.

    𝗢𝘃𝗲𝗿𝘃𝗶𝗲𝘄
    - the reasoning chain in this framework is composed not only of text-based thinking (enclosed by <think> </think>), as in DeepSeek-R1, but also of search queries (enclosed by <search> </search>) and retrieval results (enclosed by <result> </result>)
    - treats the search operation as part of the chain-like reasoning process, so search interacts with text-based thinking
    - provides no supervised data on reasoning steps for the LLM to imitate; instead, reinforcement learning incentivizes the LLM to reason with search
    - ReSearch is trained from scratch on Qwen2.5-7B(-Instruct) and Qwen2.5-32B(-Instruct)

    𝗥𝗲𝗶𝗻𝗳𝗼𝗿𝗰𝗲𝗺𝗲𝗻𝘁 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴 𝗮𝗽𝗽𝗿𝗼𝗮𝗰𝗵
    i) Group Relative Policy Optimization (GRPO) - used as the learning algorithm; it estimates the baseline from a group of rollouts instead of training a separate critic model as in Proximal Policy Optimization (PPO)
    ii) Rollout with Search - an iterative process alternating between text-based thinking, search queries, and retrieval results
    - when generation encounters a </search> tag, the query between the last <search> and the current </search> tags is used to retrieve relevant factual information, and the retrieval results are enclosed by <result> and </result> tags
    - the existing rollout concatenated with the retrieval results is used as the next input, and generation continues iteratively until it encounters the end-of-sentence (eos) token
    iii) Retrieval Result Masking - retrieval results are masked in the loss calculation so the trained policy is not biased towards the retrieval results

    𝗧𝗿𝗮𝗶𝗻𝗶𝗻𝗴 𝗧𝗲𝗺𝗽𝗹𝗮𝘁𝗲
    - for the base model, the template is filled with a specific user question and used as the direct input to the LLM
    - for the instruction-tuned model, the prompt template serves as the system prompt, used in conjunction with the corresponding chat template of the instruction-tuned LLM

    𝗥𝗲𝘄𝗮𝗿𝗱 𝗠𝗼𝗱𝗲𝗹𝗶𝗻𝗴
    i) Answer Reward - scores the correctness of the final answer in \boxed{} against the ground-truth answer via F1 score
    ii) Format Reward - checks whether the rollout correctly follows the format defined in the prompt templates, verifying tag correctness and the presence of \boxed{} in the answer

    𝗥𝗲𝘀𝘂𝗹𝘁𝘀
    - the average improvement of ReSearch over the best baseline is 15.81% in exact match and 17.56% in LLM-as-a-judge for the Qwen2.5 model with 7B parameters
    - for the Qwen2.5 model with 32B parameters, the average improvement is 14.82% in exact match and 15.46% in LLM-as-a-judge

    𝗕𝗹𝗼𝗴: https://lnkd.in/ehnmzj_Q
    𝗣𝗮𝗽𝗲𝗿: https://lnkd.in/e-MWGAcG
    𝗖𝗼𝗱𝗲: https://lnkd.in/eE4xPwBd
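To illustrate the rollout-with-search loop described above, here is a hedged Python sketch: generate until </search>, retrieve with the enclosed query, append the results inside <result> tags, and continue until the model stops without issuing another search. The generate_until and retrieve helpers are hypothetical placeholders, not functions from the ReSearch codebase.

```python
# Illustrative rollout-with-search loop in the spirit of ReSearch.
def generate_until(prompt: str, stop: list[str]) -> str:
    """Placeholder: sample from the policy LLM, returning text up to and
    including the first stop string, or up to EOS if none is emitted."""
    raise NotImplementedError

def retrieve(query: str, k: int = 5) -> str:
    """Placeholder: query a retriever and return the top-k passages joined as text."""
    raise NotImplementedError

def rollout_with_search(question: str, max_turns: int = 8) -> str:
    """Alternate between <think>/<search> generation and retrieval."""
    trajectory = question
    for _ in range(max_turns):
        completion = generate_until(trajectory, stop=["</search>"])
        trajectory += completion
        if not completion.rstrip().endswith("</search>"):
            break  # no pending query: the model produced its final \boxed{} answer
        # Extract the query between the last <search> and the closing </search>.
        query = completion.rsplit("<search>", 1)[-1].rsplit("</search>", 1)[0].strip()
        # Retrieval results are appended inside <result> tags; during training these
        # tokens are masked from the loss so the policy is not trained to imitate them.
        trajectory += f"<result>{retrieve(query)}</result>"
    return trajectory
```

The masking noted in the comment mirrors the post's Retrieval Result Masking step: the policy is rewarded for how it reasons and queries, not for reproducing retrieved text.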
