When working with multiple LLM providers, managing prompts, and handling complex data flows, structure isn't a luxury, it's a necessity. A well-organized architecture enables:
→ Collaboration between ML engineers and developers
→ Rapid experimentation with reproducibility
→ Consistent error handling, rate limiting, and logging
→ Clear separation of configuration (YAML) and logic (code)

𝗞𝗲𝘆 𝗖𝗼𝗺𝗽𝗼𝗻𝗲𝗻𝘁𝘀 𝗧𝗵𝗮𝘁 𝗗𝗿𝗶𝘃𝗲 𝗦𝘂𝗰𝗰𝗲𝘀𝘀
It's not just about folder layout, it's how the components interact and scale together:
→ Centralized configuration using YAML files
→ A dedicated prompt engineering module with templates and few-shot examples
→ Properly sandboxed model clients with standardized interfaces
→ Utilities for caching, observability, and structured logging
→ Modular handlers for managing API calls and workflows

This setup can save teams countless hours in debugging, onboarding, and scaling real-world GenAI systems, whether you're building RAG pipelines, fine-tuning models, or developing agent-based architectures.

→ What's your go-to project structure when working with LLMs or Generative AI systems? Let's share ideas and learn from each other.
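To make the "configuration in YAML, logic in code" separation concrete, here is a minimal sketch of a config loader plus a standardized model-client interface. The file path, YAML keys, provider list, and class names are all hypothetical, just one way such a module could look:

```python
# config/models.yaml (hypothetical layout)
# default_provider: openai
# providers:
#   openai:
#     model: gpt-4o-mini
#     max_retries: 3

from abc import ABC, abstractmethod
import yaml  # pip install pyyaml


class ModelClient(ABC):
    """Standardized interface every provider-specific client implements."""

    @abstractmethod
    def complete(self, prompt: str, **kwargs) -> str: ...


class OpenAIClient(ModelClient):
    def __init__(self, model: str, max_retries: int = 3):
        self.model, self.max_retries = model, max_retries

    def complete(self, prompt: str, **kwargs) -> str:
        # The real provider SDK call would go here; stubbed in this sketch.
        raise NotImplementedError


def load_client(config_path: str = "config/models.yaml") -> ModelClient:
    """Build the configured client from YAML, keeping config out of code."""
    with open(config_path) as f:
        cfg = yaml.safe_load(f)
    provider = cfg["default_provider"]
    settings = cfg["providers"][provider]
    registry = {"openai": OpenAIClient}  # extend with other providers
    return registry[provider](**settings)
```

Swapping providers then becomes a YAML edit rather than a code change, which is exactly what makes rapid experimentation reproducible.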
AI Workflow Enhancement
-
Most LLM-infused RAG-based vector DB searches could be done just as easily with SQL or SPARQL queries. We must stop overengineering workflows just to get the agentic label. All that accomplishes is making products more complex and expensive. Use the simplest approach to create the outcome, and more use cases will be feasible. Use AI as an augmentation layer when reinventing workflows.

Remember, in an agentic workflow, not every step must be handled by an LLM. The power of agents is their ability to use tools like people do. Leverage tools, information architecture, and other models to keep costs down.

I had a startup founder ask about building a recommendation system with LLMs yesterday. In ten minutes, I helped them save millions in AI API costs by working through a simple workflow evaluation with them. We discovered that Step 1 was the only place that an LLM could provide value. They wanted to enable users to make their requests with natural language, so the app could extract more information to use for personalization. Intent detection, multistep conversation (asking clarifying questions), and gathering granular information from unstructured data are all great LLM use case categories. What happens next doesn't have to use expensive models.

The value creation came from reinventing the user workflow, not the operational workflow. A menu-based or categorical selection workflow isn't feasible for this use case due to the complexity of the subject domain. The only way to extract all the information required to serve a high-quality recommendation was conversationally.

The rest of the workflow was fairly straightforward. No vector DBs or RAG required. Once all the details were extracted, the rest of the workflow fit into conditional statements and logical steps.

That's the power of AI workflow evaluations. It's just as valuable to define what AI shouldn't do to make the unit economics work for more use cases. The upfront work to make the data more structured and define the logic could also benefit from an LLM's help. That one-time cost can save the business from having to use an LLM repeatedly as part of the product's workflow.

Just because we could use AI doesn't mean we should. I advise clients to run workflow evaluations and only use AI when it provides more value than an alternative, lower-cost approach.
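A minimal sketch of that shape of workflow: one LLM call extracts structured preferences from the natural-language request, and everything downstream is plain deterministic logic. The Preferences schema, the extract_preferences stub, and the scoring rule are hypothetical illustrations, not the workflow from the conversation described above:

```python
import json
from dataclasses import dataclass


@dataclass
class Preferences:
    budget: float
    category: str
    must_haves: list[str]


def extract_preferences(user_message: str) -> Preferences:
    """Step 1: the only LLM call in the workflow.

    In practice this would prompt a model to return JSON matching the
    Preferences schema; stubbed here to keep the sketch self-contained.
    """
    raw = '{"budget": 500, "category": "road bike", "must_haves": ["disc brakes"]}'
    return Preferences(**json.loads(raw))


def recommend(prefs: Preferences, catalog: list[dict]) -> list[dict]:
    """Steps 2+: ordinary conditionals and sorting, no LLM or vector DB."""
    candidates = [
        item for item in catalog
        if item["category"] == prefs.category
        and item["price"] <= prefs.budget
        and all(feature in item["features"] for feature in prefs.must_haves)
    ]
    return sorted(candidates, key=lambda item: item["rating"], reverse=True)[:5]
```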
-
Let's understand how multi-agent agentic RAG systems work 🧠

As AI agents became a huge thing in the industry, everyone started using them to build more robust RAG systems, and this gave rise to agentic and multi-agent RAG systems. Multi-Agent RAG represents a modular and scalable evolution of single-agent architectures, designed to handle complex workflows and diverse query types by leveraging multiple specialized agents. Instead of relying on a single agent to manage all tasks (reasoning, retrieval, and response generation), this system distributes responsibilities across multiple agents, each optimized for a specific role or data source.

Below is an overview of the multi-agent RAG agentic workflow:

1. Query Submission: The process begins with a user query, which is received by a coordinator agent or master retrieval agent. This agent acts as the central orchestrator, delegating the query to specialized retrieval agents based on the query's requirements.

2. Specialized Retrieval Agents: The query is distributed among multiple retrieval agents, each focusing on a specific type of data source or task. Examples include:
• Agent 1: Handles structured queries, such as interacting with SQL-based databases.
• Agent 2: Manages semantic searches for retrieving unstructured data from sources like PDFs, books, or internal records.
• Agent 3: Focuses on retrieving real-time public information from web searches or APIs.
• Agent 4: Specializes in recommendation systems, delivering context-aware suggestions based on user behavior or profiles.

3. Tool Access and Data Retrieval: Each agent routes the query to the appropriate tools or data sources within its domain, such as:
• Vector Search: For semantic relevance.
• Text-to-SQL: For structured data.
• Web Search: For real-time public information.
• APIs: For accessing external services or proprietary systems.
The retrieval process is executed in parallel, allowing for efficient processing of diverse query types.

4. Data Integration and LLM Synthesis: Once retrieval is complete, the data from all agents is passed to a Large Language Model (LLM). The LLM synthesizes the retrieved information into a coherent and contextually relevant response, integrating insights from multiple sources seamlessly.

5. Output Generation: The system generates a comprehensive response, which is delivered back to the user in an actionable and concise format.

Know more about the different RAG approaches in the comprehensive research paper: https://lnkd.in/g-Mh4btd
Here is my in-depth article on building multi-agent RAG systems: https://lnkd.in/gvqSDwSK
Here is how you can build enterprise AI apps with multi-agent RAG: https://lnkd.in/gPCWUup2
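A compact sketch of that workflow shape: a coordinator fans the query out to specialized retrieval agents in parallel, then hands the gathered evidence to a synthesis step. The agent functions and synthesize_answer are hypothetical stand-ins for real retrievers and a real LLM call:

```python
from concurrent.futures import ThreadPoolExecutor


def sql_agent(query: str) -> str:
    return "rows from text-to-SQL over the orders database"  # stub retriever


def semantic_agent(query: str) -> str:
    return "passages from vector search over internal PDFs"  # stub retriever


def web_agent(query: str) -> str:
    return "snippets from a web search API"  # stub retriever


def synthesize_answer(query: str, evidence: dict[str, str]) -> str:
    """Stand-in for the LLM synthesis call that merges all retrieved context."""
    context = "\n".join(f"[{name}] {text}" for name, text in evidence.items())
    return f"Answer to '{query}' grounded in:\n{context}"


def coordinator(query: str) -> str:
    """Fan the query out to every retrieval agent in parallel, then synthesize."""
    agents = {"sql": sql_agent, "semantic": semantic_agent, "web": web_agent}
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in agents.items()}
        evidence = {name: future.result() for name, future in futures.items()}
    return synthesize_answer(query, evidence)


print(coordinator("Which product lines grew fastest last quarter?"))
```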
-
Excited to share our end-to-end LLM workflows guide that we've used to help our industry customers fine-tune and serve OSS LLMs that outperform closed-source models in quality, performance and cost.

Key LLM workloads with docs.ray.io and Anyscale:
- 🔢 Preprocess our dataset (filter, schema, etc.) with batch data processing.
- 🛠️ Fine-tune our LLMs (ex. Meta Llama 3) with full control (LoRA/full param, compute, loss, etc.) and optimizations (parallelism, mixed precision, flash attn, etc.) with distributed training.
- ⚖️ Evaluate our fine-tuned LLMs with batch inference using Ray + vLLM.
- 🚀 Serve our LLMs as a production application that can autoscale, swap between LoRA adapters, optimize for latency/throughput, etc.

Key Anyscale infra capabilities that keep these workloads efficient and cost-effective:
- ✨ Automatically provision worker nodes (ex. GPUs) based on our workload's needs. They'll spin up, run the workload and then scale back to zero (only pay for compute when needed).
- 🔋 Execute workloads (ex. fine-tuning) with commodity hardware (A10s) instead of waiting for inaccessible resources (H100s) by using data/model parallelism.
- 🔙 Configure spot instance to on-demand fallback (or vice versa) for cost savings.
- 🔄 Swap between multiple LoRA adapters using one base model (optimized with multiplexing).
- ⚡️ Autoscale to meet demand and scale back to zero.

🆓 You can run this guide entirely for free on Anyscale (no credit card needed). Instructions in the links below 👇

🔗 Links:
- Blog post: https://lnkd.in/gvPQGzjh
- GitHub repo: https://lnkd.in/gxzzuFAE
- Notebook: https://lnkd.in/gmMxb36y
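As a rough illustration of the batch-evaluation step (Ray Data + vLLM), here is a sketch of the pattern. The model name, S3 paths, batch size, and map_batches arguments are assumptions for illustration; the linked guide and notebook are the tested reference:

```python
import ray
from vllm import LLM, SamplingParams


class LlamaPredictor:
    """One vLLM engine per actor; Ray Data maps batches of prompts onto it."""

    def __init__(self):
        # Hypothetical checkpoint; in practice this would be the fine-tuned model.
        self.llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")
        self.params = SamplingParams(temperature=0.0, max_tokens=256)

    def __call__(self, batch: dict) -> dict:
        outputs = self.llm.generate(list(batch["prompt"]), self.params)
        batch["generated"] = [o.outputs[0].text for o in outputs]
        return batch


ds = ray.data.read_json("s3://my-bucket/eval_prompts.jsonl")  # hypothetical path
results = ds.map_batches(
    LlamaPredictor,
    concurrency=2,   # number of vLLM actors in the pool
    num_gpus=1,      # one GPU per actor
    batch_size=32,
)
results.write_json("s3://my-bucket/eval_outputs")  # hypothetical path
```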
-
BREAKING: Anthropic launches Claude Opus 4.5

Several of the upgrades are aimed squarely at people like me, developers building agents. I'm particularly impressed with how elegantly they handle one of the biggest issues developers are dealing with: tool calling. MCP is great (as a protocol) but has a major issue with how it front-loads the context window and consumes a lot of tokens. It's easy to end up with a runaway set of MCP servers/tools that clutter the context window and degrade performance.

So, here's what we have now (and what I'm playing with for the next 12 hours):

1. Tool Search
Instead of shoving every tool definition into the model up front (like packing your entire house for a weekend trip), Claude can now fetch tool definitions only when it needs them. Fewer tokens. Faster responses. Less clutter. More joy.

2. Programmatic Tool Calling
Claude can now write code to orchestrate multiple tools without dragging every intermediate result back into the model's context. Think of it as moving from "chatty assistant" to "competent developer who actually reads the docs."

3. Tool Use Examples
You can now give example calls to show how a tool should be used, not just what fields it has. This dramatically reduces the "I see your schema and choose chaos anyway" problem.

Why this matters: If you're building agent workflows with lots of tools, these upgrades cut token usage, reduce latency, improve reliability, and generally make your agent behave more like a well-trained teammate and less like an overeager intern.

My take: As agent architectures get more complex, the bottleneck isn't the model, it's the orchestration. These features move us closer to agents that can reason, retrieve, call tools, and coordinate real work at scale. In other words: better plumbing, better agents. If you're building anything agent-heavy, it's worth a look.

p.s. The image was created with my new ImageGen .ai agent (single prompt consisting of the text of the post).
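To show the idea behind tool search without guessing at Anthropic's exact API parameters, here is a plain-Python sketch of loading tool definitions on demand instead of front-loading the whole catalog. The catalog contents and the keyword-matching search are hypothetical; a real setup would hand only the matched definitions to the model request:

```python
# Full catalog of tool definitions lives outside the prompt.
TOOL_CATALOG = {
    "get_weather": {
        "description": "Get the current weather for a city.",
        "input_schema": {"type": "object", "properties": {"city": {"type": "string"}}},
    },
    "create_invoice": {
        "description": "Create an invoice for a customer.",
        "input_schema": {"type": "object", "properties": {"customer_id": {"type": "string"}}},
    },
    # ...hundreds more that would otherwise bloat the context window
}


def search_tools(query: str, limit: int = 3) -> list[dict]:
    """Naive word-overlap match; a real system might use embeddings or BM25."""
    query_words = set(query.lower().split())
    hits = [
        {"name": name, **spec}
        for name, spec in TOOL_CATALOG.items()
        if query_words & set(spec["description"].lower().split())
    ]
    return hits[:limit]


# Only the handful of relevant definitions are sent with the model request,
# instead of the entire catalog on every turn.
relevant_tools = search_tools("what is the weather in Berlin?")
print([t["name"] for t in relevant_tools])
```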
-
One of the biggest challenges I see with scaling LLM agents isn't the model itself. It's context. Agents break down not because they "can't think" but because they lose track of what's happened, what's been decided, and why.

Here's the pattern I notice:
👉 For short tasks, things work fine. The agent remembers the conversation so far, does its subtasks, and pulls everything together reliably.
👉 But the moment the task gets longer, the context window fills up, and the agent starts forgetting key decisions. That's when results become inconsistent, and trust breaks down.

That's where Context Engineering comes in.

🔑 Principle 1: Share Full Context, Not Just Results
Reliability starts with transparency. If an agent only shares the final outputs of subtasks, the decision-making trail is lost. That makes it impossible to debug or reproduce. You need the full trace, not just the answer.

🔑 Principle 2: Every Action Is an Implicit Decision
Every step in a workflow isn't just "doing the work", it's making a decision. And if those decisions conflict because context was lost along the way, you end up with unreliable results.

✨ The solution to this is to engineer smarter context. It's not about dumping more history into the next step. It's about carrying forward the right pieces of context:
→ Summarize the messy details into something digestible.
→ Keep the key decisions and turning points visible.
→ Drop the noise that doesn't matter.

When you do this well, agents can finally handle longer, more complex workflows without falling apart. Reliability doesn't come from bigger context windows. It comes from smarter context windows.

〰️〰️〰️
Follow me (Aishwarya Srinivasan) for more AI insight and subscribe to my Substack to find more in-depth blogs and weekly updates in AI: https://lnkd.in/dpBNr6Jg
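Those three bullets can be read as a small piece of code: compact the agent's step history before the next call, keeping decisions verbatim, summarizing long tool output, and dropping noise. The step structure, the summarize stub, and the labels are all hypothetical; a real version would call an LLM to do the summarization:

```python
def summarize(text: str, max_chars: int = 200) -> str:
    """Stand-in for an LLM summarization call; here we just truncate."""
    return text if len(text) <= max_chars else text[:max_chars] + " ...[summarized]"


def compact_context(steps: list[dict]) -> str:
    """Build the context carried into the next agent step.

    steps: [{"kind": "decision" | "tool_output" | "chatter", "text": str}, ...]
    """
    lines = []
    for step in steps:
        if step["kind"] == "decision":
            lines.append(f"DECISION: {step['text']}")             # keep verbatim
        elif step["kind"] == "tool_output":
            lines.append(f"EVIDENCE: {summarize(step['text'])}")  # compress
        # "chatter" (greetings, retries, dead ends) is dropped entirely
    return "\n".join(lines)


history = [
    {"kind": "chatter", "text": "Sure, let me look into that."},
    {"kind": "decision", "text": "Use the EU dataset only, per the user's compliance note."},
    {"kind": "tool_output", "text": "SELECT ... returned 14,002 rows: " + "x" * 500},
]
print(compact_context(history))
```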
-
Predictive Process Excellence is crucial. It shifts focus from fixing problems to preventing them. Companies must stop reacting and start foreseeing.

Most businesses wait until issues arise. They analyze past data. They hunt for mistakes. They rush to fix problems. But this approach has limits.

Example: A factory identifies a bottleneck only after production slows. By then, time and resources are already wasted. Reactive AI helps in the moment. But it doesn't learn. In fast-moving markets, short-sightedness leads to lost opportunities.

The solution is Predictive BPM. Predictive BPM does not just react. It foresees problems. With AI and machine learning, you can:
✅ Monitor processes in real time.
✅ Detect patterns before issues arise.
✅ Optimize workflows automatically.

How does Predictive BPM work?
Anomaly Detection → Identifies irregularities in real time (e.g., slow approvals, compliance risks).
Simulation & Scenario Modeling → Predicts business outcomes using AI-powered process mining.
Self-Optimizing Workflows → Adjusts tasks and resources dynamically based on forecasts.

The result?
✔️ Process Optimization: BPM-driven automation reduces errors by up to 30%, leading to operational cost savings of 15-20% on average.
✔️ Compliance Assurance: BPM frameworks ensure consistent, documented processes, reducing compliance risks by 60% and streamlining audits.
✔️ Enhanced Customer Experience: BPM-optimized workflows reduce customer wait times by 40% and increase satisfaction scores by 25%.

Want to implement Predictive BPM? Start here:
→ Identify key processes: AI thrives on data-rich workflows.
→ Integrate the right solutions: Process mining extracts insights from real-time data to optimize workflows.
→ Shift the mindset: Move from reactive problem-solving to proactive strategy.

AI is not just automating processes. It is redefining them. Companies that wait to adopt Predictive BPM risk falling behind.

The question is: Will you lead the change - or react to it later?

#AI #automation #businessdevelopment
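As a very rough illustration of the anomaly-detection piece, here is a sketch that flags a process step whose duration drifts far outside its historical norm. The event data, the z-score threshold, and the "escalate before the SLA breach" framing are arbitrary assumptions, not a prescription for any particular BPM suite:

```python
from statistics import mean, stdev


def flag_slow_step(durations_hours: list[float], new_duration: float, z_threshold: float = 3.0) -> bool:
    """Flag a process step whose duration is far outside its historical norm."""
    mu, sigma = mean(durations_hours), stdev(durations_hours)
    if sigma == 0:
        return new_duration != mu
    return (new_duration - mu) / sigma > z_threshold


# Historical approval times for one process step, in hours (hypothetical data).
history = [4.2, 3.8, 5.1, 4.6, 4.9, 4.4, 5.0, 4.1]
print(flag_slow_step(history, new_duration=19.5))  # True: escalate before the SLA is breached
```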
-
Building LLM Agent Architectures on AWS - The Future of Scalable AI Workflows

What if you could design AI agents that not only think but also collaborate, route tasks, and refine results automatically? That's exactly what AWS's LLM agent architecture enables. By combining Amazon Bedrock, AWS Lambda, and external APIs, developers can build intelligent, distributed agent systems that mirror human-like reasoning and decision-making. These are not just chatbots: they're autonomous, orchestrated systems that handle workflows across industries, from customer service to logistics.

Here's a breakdown of the key patterns powering modern LLM agent workflows on AWS:

1. Prompt Chaining / Saga Pattern
Each step's output becomes the next input, enabling multi-step reasoning and transactional workflows like order handling, payments, and shipping. Think of it as a conversational assembly line.

2. Routing / Dynamic Dispatch Pattern
Uses an intent router to direct queries to the right tool, model, or API. Just like a call center routing customers to the right department, but automated.

3. Parallelization / Scatter-Gather Pattern
Agents perform tasks in parallel Lambda functions, then aggregate responses for efficiency and faster decisions. Multiple agents think together: one answer, many minds.

4. Saga / Orchestration Pattern
Central orchestrator agents manage multiple collaborators, synchronizing tasks across APIs, data sources, and LLMs. Perfect for managing complex, multi-agent projects like report generation or dynamic workflows.

5. Evaluator / Reflect-Refine Loop Pattern
Introduces a feedback mechanism where one agent evaluates another's output for accuracy and consistency. Essential for building trustworthy, self-improving AI systems.

AWS enables modular, event-driven, and autonomous AI architectures, where each pattern represents a step toward self-reliant, production-grade intelligence. From prompt chaining to reflective feedback loops, these blueprints are reshaping how enterprises deploy scalable LLM agents.

#AIAgents
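For pattern 3, here is a minimal scatter-gather sketch using boto3 to invoke several Lambda-hosted agents in parallel and aggregate their replies. The function names and payload shape are hypothetical, and error handling, retries, and authentication are omitted:

```python
import json
from concurrent.futures import ThreadPoolExecutor

import boto3

lambda_client = boto3.client("lambda")

AGENT_FUNCTIONS = ["pricing-agent", "inventory-agent", "shipping-agent"]  # hypothetical names


def invoke_agent(function_name: str, query: str) -> dict:
    """Synchronously invoke one Lambda-hosted agent and decode its JSON reply."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",
        Payload=json.dumps({"query": query}).encode("utf-8"),
    )
    return json.loads(response["Payload"].read())


def scatter_gather(query: str) -> list[dict]:
    """Scatter the query to all agents in parallel, then gather the results."""
    with ThreadPoolExecutor(max_workers=len(AGENT_FUNCTIONS)) as pool:
        return list(pool.map(lambda name: invoke_agent(name, query), AGENT_FUNCTIONS))


results = scatter_gather("Can we fulfil order #1042 by Friday?")
```

An orchestrator agent (pattern 4) would then pass these aggregated results to a Bedrock model for the final synthesis step.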
-
Less talking, more building: 𝗠𝘂𝗹𝘁𝗶-𝗔𝗴𝗲𝗻𝘁 𝗥𝗲𝘀𝗲𝗮𝗿𝗰𝗵 𝗦𝘆𝘀𝘁𝗲𝗺 𝗳𝗿𝗼𝗺 𝘀𝗰𝗿𝗮𝘁𝗰𝗵! 👇

Some weeks ago I released an episode of my newsletter and an update to the 𝗔𝗜 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝘀 𝗛𝗮𝗻𝗱𝗯𝗼𝗼𝗸 GitHub repository. There I implemented a Deep Research Agent from scratch without using any LLM orchestration frameworks (using DeepSeek-R1 for some planning tasks).

In the project we implement the following agentic topology:
𝟭. A user provides a query or topic to be researched.
𝟮. An LLM creates an outline of the final report that it will be aiming for. It is instructed to produce no more than a certain number of paragraphs.
𝟯. Each paragraph description is fed into a separate research process to produce a comprehensive set of information to be used in report construction. The research process is described in detail below.
𝟰. All of the information is fed into a summarisation step that constructs the final report, including a conclusion.
𝟱. The report is then delivered to the user in Markdown form.

Each research step follows this flow:
𝟭. Once we have the outline of a paragraph, it is passed to an LLM to construct web search queries that best enrich the information needed.
𝟮. The LLM outputs the search query and the reasoning behind it.
𝟯. We execute a web search against the query and retrieve the top relevant results.
𝟰. The results are passed to a reflection step, where an LLM reasons about any missed nuances and tries to come up with a search query that would enrich the initial results.
𝟱. This process is repeated n times to get the best set of information possible.

Detailed walkthrough blog post: https://lnkd.in/dWMapU7r
GitHub with implementation code and notebooks to follow: https://lnkd.in/daiAbpc9

Happy building! Be sure to leave a like or star if you find the content useful!

#AI #LLM
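Not the repository's actual code, but a compact sketch of the per-paragraph research loop described above. generate_query, web_search, and reflect are hypothetical stand-ins for the LLM and search-API calls:

```python
def generate_query(paragraph_outline: str, gathered: list[str]) -> str:
    """Stand-in for the LLM call that proposes the next web search query."""
    return (f"background on: {paragraph_outline}" if not gathered
            else f"gaps in: {paragraph_outline}")


def web_search(query: str, top_k: int = 5) -> list[str]:
    """Stand-in for a real search API call returning top_k result snippets."""
    return [f"result for '{query}' #{i}" for i in range(top_k)]


def reflect(paragraph_outline: str, gathered: list[str]) -> bool:
    """Stand-in for the LLM reflection step: True if the evidence looks sufficient."""
    return len(gathered) >= 10


def research_paragraph(paragraph_outline: str, max_rounds: int = 3) -> list[str]:
    """Search / reflect loop, repeated up to max_rounds times per paragraph."""
    gathered: list[str] = []
    for _ in range(max_rounds):
        query = generate_query(paragraph_outline, gathered)
        gathered.extend(web_search(query))
        if reflect(paragraph_outline, gathered):
            break
    return gathered


evidence = research_paragraph("Impact of context length on agent reliability")
```

The summarisation step would then turn each paragraph's evidence into prose and stitch the sections into the final Markdown report.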
-
Context engineering is quickly becoming one of the most critical skills in applied AI. Not prompt tweaking. Not model fine-tuning. But knowing what information a model needs, and when to give it. That is the real unlock behind AI agents that actually work.

At its core, context engineering is about delivering the right information to the model, at the right time, in the right format, so it can reason effectively. It pushes developers to think more intentionally about how they shape a model's inputs:
🔸 What does the model need to know for this task?
🔸 Where should that information come from?
🔸 How do we fit it within the limits of the context window?
🔸 And how do we prevent irrelevant or conflicting signals from getting in the way?

Why does this matter so much? In practice, most agent failures are not due to weak models. They happen because the model did not have the context it needed. It missed a key fact, relied on stale data, or was overloaded with noise.

Context engineering addresses this directly. It forces you to design the flow of information step by step: not just what the model sees, but how and when it sees it.

This context can come from many places:
🔹 Long- and short-term memory (such as prior conversations or user history)
🔹 Retrieved data from APIs, vector stores, or internal systems
🔹 Tool definitions and their recent outputs
🔹 Structured formats or schemas that define how information is used
🔹 Global state shared across multi-step workflows

Frameworks like LlamaIndex, LangGraph, and LangChain are evolving to support this shift, giving developers the tools to manage context with much more precision. And there are now better resources than ever to help teams write, select, compress, and organize context with real control.

Image from the LangChain blog.

#contextengineering #llms #generativeai #artificialintelligence #technology
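On the question of fitting context within window limits, here is a small sketch of assembling a prompt from ranked context pieces under a token budget. The piece structure, the priority scores, and the 4-characters-per-token estimate are rough assumptions; production systems would use a real tokenizer and smarter selection:

```python
def estimate_tokens(text: str) -> int:
    """Crude token estimate (~4 characters per token); real systems use a tokenizer."""
    return max(1, len(text) // 4)


def assemble_context(pieces: list[dict], budget_tokens: int) -> str:
    """Greedily pack the highest-priority context pieces that fit the budget.

    pieces: [{"source": str, "text": str, "priority": int}, ...]
    """
    chosen, used = [], 0
    for piece in sorted(pieces, key=lambda p: p["priority"], reverse=True):
        cost = estimate_tokens(piece["text"])
        if used + cost <= budget_tokens:
            chosen.append(f"[{piece['source']}] {piece['text']}")
            used += cost
    return "\n".join(chosen)


candidates = [
    {"source": "user_profile", "text": "Prefers concise answers; EU data residency.", "priority": 5},
    {"source": "vector_store", "text": "Retrieved policy excerpt ... " * 50, "priority": 3},
    {"source": "old_chat", "text": "Greeting exchange from last week.", "priority": 1},
]
print(assemble_context(candidates, budget_tokens=300))
```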