LLMs are going vertical → and functional. We're moving from "everyday AI" to functional AI: domain-specific agents embedded in real workflows where enterprise value is trapped. Proof that the shift is led by the LLM providers themselves:

Banking: OpenAI is partnering directly with banks (e.g., BNY Mellon's multiyear deal to upgrade its Eliza platform; NatWest's UK-first collaboration). These are not generic chats; they're deeply embedded, regulated-industry builds.

Life sciences: Anthropic's Claude for Life Sciences adds connectors to tools like Benchling, PubMed, and 10x Genomics and offers domain skills, from protocol QA to bioinformatics workflows. That's vertical by design.

Healthcare: Google's MedLM + Vertex AI Search for Healthcare targets clinical documentation and medical record retrieval: out-of-the-box isn't enough; it has to be workflow-native.

Industrial: Siemens Industrial Copilot (with Microsoft) is scaling across factories and engineering teams: LLMs tuned to PLC code, Teamcenter, and shop-floor realities.

The takeaway: The real value isn't the model. It's configuration and customization: grounding in your systems of record, domain ontologies, governed connectors, policy guardrails, eval harnesses tied to domain KPIs, and change management. An off-the-shelf chat interface won't clear the bar for accuracy, compliance, or UX in complex functions.

Verticalization is the on-ramp. Customization is the unlock. #EnterpriseAI
Customizing LLMs for Enterprise Applications
Explore top LinkedIn content from expert professionals.
Summary
Customizing large language models (LLMs) for enterprise applications means tailoring these AI tools to solve specific business problems while meeting industry requirements for accuracy, security, and compliance. Unlike generic models, customized LLMs are adapted to understand company-specific data and workflows, making them far more reliable for business-critical tasks.
- Build domain expertise: Train your LLMs on industry-specific datasets to improve accuracy and ensure your model understands the unique language and regulations of your business sector.
- Ground in enterprise data: Connect your LLMs to verified internal sources and systems to reduce errors and keep responses relevant to your organization's needs.
- Control and customize: Choose open-source models or proprietary setups when you need full control over data privacy, compliance, and the ability to fine-tune features for future business changes.
-
In enterprise AI, '23 was the mad rush to a flashy demo; '24 will be all about getting to real production value. Three key steps for this, in our experience:

1/ Develop your "micro" benchmarks:
- "Macro" benchmarks (e.g., public leaderboards) dominate the dialogue
- But what matters for your use case is a lot narrower
- They must be defined iteratively by business/product and data scientists together!
- Building these "unit tests" is step 1 (see the sketch after this post).

2/ Develop your data:
- Whether via a prompt or fine-tuning/alignment, the key is the data in, and how you develop it
- Develop = label, select/sample, filter, augment, etc.
- Simple intuition: would you dump a random pile of books on a student's desk? Data curation is key.

3/ Tune your entire LLM system, not just the model:
- AI use cases generally require multi-component LLM systems (e.g., LLM + RAG)
- These systems have multiple tunable components (e.g., LLM, retrieval model, embeddings)
- For complex/high-value use cases, often all need tuning

4/ For all of these steps, AI data development is at the center of getting good results. Check out how we make this data development programmatic and scalable for real enterprise use cases @SnorkelAI snorkel.ai :)
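A minimal sketch of what such a "micro" benchmark can look like in practice: a handful of domain-specific checks run like unit tests against your own pipeline. The cases, expected strings, and the `answer_question` callable are hypothetical placeholders for your own system; the post does not prescribe this exact format.

```python
# Hypothetical "micro" benchmark: a small, domain-specific eval suite
# run like unit tests against your own LLM pipeline.
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    question: str
    must_contain: list[str]      # facts the answer must mention
    must_not_contain: list[str]  # e.g., outdated or off-policy terms

# Example cases defined jointly by business/product and data science.
CASES = [
    EvalCase(
        question="What is our standard refund window for enterprise plans?",
        must_contain=["30 days"],
        must_not_contain=["60 days"],
    ),
    EvalCase(
        question="Which regions is the data-residency add-on available in?",
        must_contain=["EU", "US"],
        must_not_contain=[],
    ),
]

def run_micro_benchmark(answer_question: Callable[[str], str]) -> float:
    """Return the pass rate of the pipeline over the micro benchmark."""
    passed = 0
    for case in CASES:
        answer = answer_question(case.question)
        ok = all(s.lower() in answer.lower() for s in case.must_contain)
        ok = ok and not any(s.lower() in answer.lower() for s in case.must_not_contain)
        passed += ok
    return passed / len(CASES)

# Usage: plug in your real pipeline (RAG, agent, plain prompt, ...):
# pass_rate = run_micro_benchmark(my_llm_pipeline)
```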
-
Are your LLM apps still hallucinating? Zep used to as well, a lot. Here's how we worked to solve Zep's hallucinations. We've spent a lot of cycles diving into why LLMs hallucinate and experimenting with the most effective techniques to prevent it. Some might sound familiar, but it's the combined approach that really moves the needle.

First, why do hallucinations happen? A few core reasons:
🔍 LLMs rely on statistical patterns, not true understanding.
🎲 Responses are based on probabilities, not verified facts.
🤔 No innate ability to differentiate truth from plausible fiction.
📚 Training datasets often include biases, outdated info, or errors.

Put simply: LLMs predict the next likely word; they don't actually "understand" or verify what's accurate. When prompted beyond their knowledge, they creatively fill gaps with plausible (but incorrect) info. ⚠️ Funny if you're casually chatting, problematic if you're building enterprise apps.

So, how do you reduce hallucinations effectively? The #1 technique: grounding the LLM in data.
- Use Retrieval-Augmented Generation (RAG) to anchor responses in verified data (see the sketch after this post).
- Use long-term memory systems like Zep to ensure the model is always grounded in personalization data: user context, preferences, traits, etc.
- Fine-tune models on domain-specific datasets to improve response consistency and style, although fine-tuning alone typically doesn't add substantial new factual knowledge.
- Use explicit, clear prompting; avoid ambiguity or unnecessary complexity.
- Encourage models to self-verify conclusions when accuracy is essential.
- Structure complex tasks with chain-of-thought (CoT) prompting to improve outputs, or force "none"/unknown responses when necessary.
- Strategically tweak model parameters (e.g., temperature, top-p) to limit overly creative outputs.
- Add post-processing verification for mission-critical outputs, for example, matching to known business states.

One technique alone rarely solves hallucinations. For maximum ROI, we've found combining RAG with a robust long-term memory solution (like ours at Zep) is the sweet spot. Systems that ground responses in factual, evolving knowledge significantly outperform. Did I miss any good techniques? What are you doing in your apps?
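As a hedged illustration of the #1 technique (grounding via RAG), here is a minimal retrieve-then-answer sketch. It assumes the OpenAI Python SDK and an `OPENAI_API_KEY` in the environment; the tiny in-memory corpus, naive keyword retrieval, and model name are illustrative stand-ins, not Zep's actual implementation.

```python
# Minimal RAG grounding sketch (illustrative only).
# Retrieval here is a naive keyword overlap over a tiny in-memory corpus;
# production systems would use embeddings and a vector store.
from openai import OpenAI

DOCS = [
    "Refunds for enterprise plans are available within 30 days of purchase.",
    "The data-residency add-on is available in the EU and US regions.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank documents by naive word overlap with the query."""
    terms = set(query.lower().split())
    scored = sorted(DOCS, key=lambda d: -len(terms & set(d.lower().split())))
    return scored[:k]

def grounded_answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # example model choice; use any chat model
        temperature=0.2,      # keep outputs conservative
        messages=[
            {"role": "system",
             "content": "Answer ONLY from the provided context. "
                        "If the answer is not in the context, say 'I don't know.'"},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return resp.choices[0].message.content

# print(grounded_answer("What is the refund window for enterprise plans?"))
```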
-
In the rapidly evolving AI/LLM landscape, a paradigm shift is underway that demands the attention of forward-thinking enterprises: the rise of domain-specific large language models. While general-purpose LLMs have captured the headlines, it is the targeted power of domain-specific models that is going to reshape the AI landscape.

## Why Domain-Specific LLMs

1. Precision in specialization: Domain-specific LLMs offer unparalleled accuracy within their designated fields. By training on curated, industry-specific datasets, these models develop a nuanced understanding of sector-specific terminology, regulations, and best practices.
2. Resource optimization: While general-purpose LLMs require vast computational resources, domain-specific models present a more sustainable alternative. Their focused training datasets and narrower scope allow for more efficient use of compute and data storage.
3. Enhanced data governance and compliance: In an era of stringent data protection regulations, domain-specific LLMs offer superior control over sensitive information. By limiting the model's exposure to a specific domain, organizations can more effectively manage data access, reducing the risk of inadvertent disclosure.
4. Accelerated innovation cycles: The focused nature of domain-specific LLMs allows for more rapid iteration and deployment of AI solutions.
5. Competitive differentiation: By investing in domain-specific LLMs, organizations can develop proprietary AI capabilities that are uniquely tailored to their specific market challenges.

## The Implementation Imperative

In our experience implementing domain-specific LLMs, we've observed:
- A 40% increase in task-specific accuracy compared to general-purpose models
- A 50% reduction in time to deployment for new AI features
- A 35% decrease in data processing costs due to more efficient resource utilization

The flow below illustrates how enterprises can use domain-specific LLMs while maintaining security and isolation:
1. Enterprise data is first classified into sensitive and non-sensitive categories. Sensitive data is processed in a secure enclave, where domain-specific LLMs operate.
2. Non-sensitive data can be processed by a general-purpose LLM.
3. Each domain-specific LLM produces isolated outputs.
4. All outputs, including those from the general-purpose LLM, go through a security check.
5. Finally, the verified outputs are integrated and used in various enterprise applications.

This flow emphasizes the importance of data security, isolation of domain-specific models, and the integration of outputs from various LLMs. A minimal routing sketch of this flow follows.
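A small routing sketch of the classify-then-route flow described above. The sensitivity rules, placeholder model functions, and security check are assumptions made for illustration only; a real secure-enclave deployment involves far more than this.

```python
# Illustrative routing sketch for the classification -> secure-enclave flow.
# Classifier rules, model stubs, and the security check are hypothetical.
import re

SENSITIVE_PATTERNS = [
    r"\b\d{3}-\d{2}-\d{4}\b",   # SSN-like identifiers
    r"\bpatient\b",
    r"\bdiagnosis\b",
]

def is_sensitive(text: str) -> bool:
    return any(re.search(p, text, re.IGNORECASE) for p in SENSITIVE_PATTERNS)

def domain_llm(text: str) -> str:
    # Placeholder for a domain-specific model running inside a secure enclave.
    return f"[domain model output for: {text[:40]}...]"

def general_llm(text: str) -> str:
    # Placeholder for a general-purpose hosted model.
    return f"[general model output for: {text[:40]}...]"

def security_check(output: str) -> str:
    # Placeholder policy check, e.g., PII scrubbing or allow-list validation.
    return output.replace("\n", " ").strip()

def process(record: str) -> str:
    raw = domain_llm(record) if is_sensitive(record) else general_llm(record)
    return security_check(raw)

# verified_outputs = [process(r) for r in enterprise_records]
```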
-
I often get this question from industry friends and peers: should we lean on a commercial offering like OpenAI's black-box API, or immerse ourselves in open-source LLMs? 🌍🤔

For those in the prototype phase 🛠, **use OpenAI's API**:
- It's budget-friendly for early-stage projects. 🪙
- A single API key, and voilà, integration is complete! 🔑
- This allows for swift product idea validation. ✅

However, for long-term visionaries 🚀🌌, **invest in open-source local LLMs**:
- Establish a custom LLM setup within your cloud infrastructure. ☁️
- Focus on curating top-notch datasets that resonate with your business objectives. 📊🎯
(A minimal sketch of the local path follows this post.)

Considering this pivot? Here's why 🤓:
1️⃣ **Optimal performance & savings**: Tailored models often surpass giants like GPT-4, especially on a specific dataset. They're not only effective but also economically wise. 💡💰
2️⃣ **Guardian of data**: In our data-driven age, LLMs thrive on premium data. Ensure your data's privacy by keeping it close. 🛡️🔒 Sending data over third-party channels might expose vulnerabilities. 🚫
3️⃣ **Flexibility in strategy**: Transitioning back to APIs like OpenAI's is straightforward, yet initiating a proprietary LLM can be more complex later on. Hence, investing now paves the way for future adaptability. 🌳🔄
4️⃣ **Customization & control**: With open-source LLMs, you have the autonomy to tweak and refine models to your heart's content, ensuring they align perfectly with your requirements. 🎛️🔧
5️⃣ **In-house expertise**: Building in-house capabilities elevates your team's knowledge, making them not just users but experts in LLM technology. 🎓💼
6️⃣ **Future-proofing**: Technology and regulations are evolving. Having control over your LLM means you can swiftly adapt to changes without waiting for third-party updates. ⏱️📈
7️⃣ **Cost predictability**: With third-party APIs, costs can surge based on usage. In-house LLMs allow for more predictable budgeting in the long run. 📉💼

Would love to engage in a discussion and get insights from others in the field. Drop your thoughts below! 💭 #llms #languagemodels #openai #genai #deployment #production #datascience #artificialintelligence #largelanguagemodels
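For the "open-source local LLM" path, a minimal sketch using Hugging Face Transformers is below. The model name is just an example of a small open-weight, instruction-tuned model; pick whatever fits your hardware, latency, and licensing constraints.

```python
# Hedged sketch of running an open-weight model locally with Hugging Face
# Transformers. The model name is an example, not a recommendation.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-0.5B-Instruct",  # example small open-weight model
)

prompt = "List three reasons an enterprise might self-host its LLM:"
out = generator(prompt, max_new_tokens=128, do_sample=False)

# The pipeline returns the prompt plus the continuation.
print(out[0]["generated_text"])
```

No data leaves your infrastructure in this setup, which is the core of points 2️⃣ and 4️⃣ above; the trade-off is that you now own serving, scaling, and model upgrades.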
-
𝗠𝗮𝘀𝘁𝗲𝗿𝗶𝗻𝗴 𝗟𝗟𝗠 𝗢𝗽𝘁𝗶𝗺𝗶𝘇𝗮𝘁𝗶𝗼𝗻

Large language models have transformed from simple text generators into intelligent reasoning systems powering search engines, enterprise copilots, and autonomous agents. Yet their accuracy, relevance, and efficiency depend on how we optimize them. Three core techniques are shaping this next wave of AI innovation: Context Engineering, Prompt Engineering, and Fine-Tuning. Each plays a distinct role, and the future belongs to those who know how to combine them effectively.

𝗖𝗼𝗻𝘁𝗲𝘅𝘁 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝗶𝗻𝗴
𝗚𝗼𝗮𝗹: Dynamically feed the model the right information at the right time without retraining.
𝗛𝗼𝘄 𝗶𝘁 𝘄𝗼𝗿𝗸𝘀: Chunk and embed documents, store them in vector databases such as Pinecone, Weaviate, FAISS, or Milvus, and retrieve the most relevant content using retrieval-augmented generation. Tools like LangChain and LlamaIndex orchestrate this process, ensuring token efficiency and building dynamic contexts.
𝗨𝘀𝗲 𝗰𝗮𝘀𝗲: Enterprise knowledge assistants that instantly retrieve policies, Jira tickets, or AWS configurations on demand.

𝗣𝗿𝗼𝗺𝗽𝘁 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝗶𝗻𝗴
𝗚𝗼𝗮𝗹: Design high-quality prompts that maximize clarity, control, and reasoning depth.
𝗛𝗼𝘄 𝗶𝘁 𝘄𝗼𝗿𝗸𝘀: Define objectives, structure zero-shot or few-shot examples, leverage chain-of-thought reasoning, and continuously refine outputs through iterative testing and feedback loops. Tools such as OpenAI Playground, LangSmith, PromptFlow, and Weights & Biases make experimentation and evaluation seamless.
𝗨𝘀𝗲 𝗰𝗮𝘀𝗲: AI compliance reporting agents where precision and regulatory alignment are critical.

𝗙𝗶𝗻𝗲-𝗧𝘂𝗻𝗶𝗻𝗴
𝗚𝗼𝗮𝗹: Permanently teach an LLM domain-specific knowledge or custom behavior.
𝗛𝗼𝘄 𝗶𝘁 𝘄𝗼𝗿𝗸𝘀: Prepare high-quality labeled datasets, initialize a base model, and train using the OpenAI Fine-Tuning API, Hugging Face Transformers, LoRA adapters, or AWS SageMaker. Fine-tuning improves consistency and enables models to learn proprietary information and unique writing styles.
𝗨𝘀𝗲 𝗰𝗮𝘀𝗲: Training a medical AI assistant with proprietary datasets to improve diagnostic accuracy and decision support.

𝗧𝗵𝗲 𝗙𝘂𝘁𝘂𝗿𝗲 𝗼𝗳 𝗟𝗟𝗠 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝗶𝗻𝗴
Prompt engineering guides behavior. Context engineering supplies knowledge. Fine-tuning builds expertise. Combined, these disciplines let engineers design scalable, explainable, production-ready AI systems.

Follow Umair Ahmad for more insights. #AI #LLM #ContextEngineering #PromptEngineering #FineTuning #MachineLearning #SystemDesign
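As a hedged example of the fine-tuning route, here is a sketch that attaches LoRA adapters to an open-weight base model with Hugging Face Transformers and PEFT. The base model, target modules, and hyperparameters are illustrative assumptions rather than recommendations.

```python
# Hedged fine-tuning sketch: LoRA adapters on an open-weight model with
# Hugging Face Transformers + PEFT. All names and values are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "Qwen/Qwen2.5-0.5B-Instruct"          # example open-weight base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora = LoraConfig(
    r=16,                                  # adapter rank
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],   # attention projections (model-specific)
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only a small fraction of weights will train

# From here, train on your labeled, domain-specific dataset with the usual
# Trainer/SFT loop, then merge the adapters or serve them separately.
```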
-
The bottleneck isn't GPUs or architecture. It's your dataset.

Three ways to customize an LLM:
1. Fine-tuning: Teaches behavior. 1K-10K examples. Shows how to respond. Cheapest option.
2. Continued pretraining: Adds knowledge. Large unlabeled corpus. Extends what the model knows. Medium cost.
3. Training from scratch: Full control. Trillions of tokens. Only for national AI projects. Rarely necessary.

Most companies only need fine-tuning.

How to collect quality data:
- For fine-tuning, start small. Support tickets with PII removed. Internal Q&A logs. Public instruction datasets.
- For continued pretraining, go big. Domain archives. Technical standards. Mix 70% domain, 30% general text.

The 5-step data pipeline:
1. Normalize. Convert everything to UTF-8 plain text. Remove markup and headers.
2. Filter. Drop short fragments. Remove repeated templates. Redact PII.
3. Deduplicate. Hash for identical content. Find near-duplicates. Do this before splitting datasets.
4. Tag with metadata. Language, domain, source. Makes the dataset searchable.
5. Validate quality. Check perplexity. Track metrics. Run a small pilot first.

When your dataset is ready: All sources documented. PII removed. Stats match targets. Splits balanced. Pilot converges cleanly. If any fail, fix the data first.

What good data does: Models converge faster. Hallucinate less. Cost less to serve.

The reality: Building LLMs is a data problem. Not a training problem. Most teams spend 80% of their time on data. That's the actual work. Your data is your differentiator. Not your model architecture.

Found this helpful? Follow Arturo Ferreira.
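A compact sketch of the 5-step pipeline above, under assumed regexes, thresholds, and metadata fields; adapt each step to your own sources and compliance requirements.

```python
# Hedged sketch of the 5-step data pipeline: normalize, filter, deduplicate,
# tag, validate. Patterns and thresholds are illustrative assumptions.
import hashlib
import re
import unicodedata

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def normalize(text: str) -> str:
    """Step 1: Unicode normalization plus crude markup removal."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"<[^>]+>", " ", text).strip()

def keep(text: str, min_chars: int = 200) -> bool:
    """Step 2: drop short fragments (PII redaction happens separately)."""
    return len(text) >= min_chars

def redact_pii(text: str) -> str:
    return EMAIL.sub("[EMAIL]", text)

def dedup_key(text: str) -> str:
    """Step 3: hash for exact duplicates (near-dups need MinHash or similar)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def build_corpus(raw_docs, domain: str, source: str):
    seen, corpus = set(), []
    for doc in raw_docs:
        text = redact_pii(normalize(doc))
        if not keep(text):
            continue
        key = dedup_key(text)
        if key in seen:
            continue
        seen.add(key)
        # Step 4: tag with metadata so the dataset stays searchable.
        corpus.append({"text": text, "domain": domain, "source": source})
    return corpus

# Step 5 (perplexity checks, split balance, pilot training run) happens
# downstream on the returned corpus.
```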
-
Like many, I've been spending some time experimenting with and learning about LLMs (large language models), particularly how developers unfamiliar with AI can integrate AI into their existing web apps. One of the most popular use cases for LLMs is performing semantic search over custom documents that ChatGPT isn't trained on (or documents that are too long to paste into ChatGPT). Imagine having a private GPT instance with access to your org's internal knowledge base that you can ask questions about. I wrote up a tutorial and sample application in which you can learn to do exactly this:

1. The user can upload a private/custom document
2. Perform ChatGPT-style Q&A interaction with the doc
3. Highlight the contents of the answer in the original doc

The sample app is a full-stack web app built using OpenAI GPT-3.5, the Pinecone vector database, LangChain, and Vercel Next.js. Great for app developers looking to learn more about adding AI to their apps.

GitHub repo: https://lnkd.in/eQvpvE3K
Medium article: https://lnkd.in/e25kfWnw
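The linked sample app itself is a Next.js/TypeScript project; as an illustrative equivalent, here is a short Python sketch of the same upload-embed-retrieve-answer flow using OpenAI embeddings and an in-memory index (no Pinecone needed for the sketch). Model names and the chunking step are assumptions.

```python
# Illustrative doc Q&A sketch: embed chunks, retrieve the closest one,
# and answer from that excerpt. Assumes OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> list[list[float]]:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [d.embedding for d in resp.data]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def answer(question: str, chunks: list[str]) -> str:
    chunk_vecs = embed(chunks)
    q_vec = embed([question])[0]
    best = max(zip(chunks, chunk_vecs), key=lambda cv: cosine(q_vec, cv[1]))[0]
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Answer using only the provided excerpt."},
            {"role": "user", "content": f"Excerpt:\n{best}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content

# chunks = split_your_uploaded_document_into_paragraphs()  # hypothetical helper
# print(answer("What does section 3 cover?", chunks))
```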
-
What is the LLM Mesh AI architecture, and why might your enterprise need it?

Key highlights:
• Introducing the LLM Mesh, a new architecture for building modular, scalable agentic applications
• Standardizing interactions across diverse AI services like LLMs, retrieval, embeddings, tools, and agents
• Abstracting complex dependencies to streamline switching between OpenAI, Gemini, Hugging Face, or self-hosted models
• Managing over seven AI-native object types, including prompts, agents, tools, retrieval services, and LLMs
• Supporting both code-first and visual low-code agent development while preserving enterprise control
• Embedding safety with human-in-the-loop oversight, reranking, and model introspection
• Enabling performance and cost optimization with model selection, quantization, MoE architectures, and vector search

Insightful: who should take note
• AI architects designing multi-agent workflows with LLMs
• Product teams building RAG pipelines and internal copilots
• MLOps and infrastructure leads managing model diversity and orchestration
• CISOs and platform teams standardizing AI usage across departments

Strategic: noteworthy aspects
• Elevates LLM usage from monolithic prototypes to composable, governed enterprise agents
• Separates logic, inference, and orchestration layers for plug-and-play tooling across functions
• Encourages role-based object design where LLMs, prompts, and tools are reusable, interchangeable, and secure by design
• Works seamlessly across both open-weight and commercial models, making it adaptable to regulatory and infrastructure constraints

Actionable: what to do next
Start building your enterprise LLM Mesh to scale agentic applications without hitting your complexity threshold. Define your abstraction layer early and treat LLMs, tools, and prompts as reusable, modular objects. Invest in standardizing the interfaces between them. This unlocks faster iteration, smarter experimentation, and long-term architectural resilience. (A minimal sketch of such an abstraction layer follows.)

Consideration: why this matters
As with microservices in the cloud era, the LLM Mesh introduces a new operating model for AI: one that embraces modularity, safety, and scale. Security, governance, and performance aren't bolted on; they're embedded from the ground up. The organizations that get this right won't just deploy AI faster; they'll deploy it responsibly, and at scale.
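In the spirit of "define your abstraction layer early," here is a minimal sketch of one interface with interchangeable backends. The class names and default models are illustrative assumptions and not part of any specific LLM Mesh product.

```python
# Minimal abstraction-layer sketch: one interface, multiple swappable backends.
from abc import ABC, abstractmethod

class ChatModel(ABC):
    """Role-based object: any chat-capable LLM the mesh can route to."""
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class OpenAIChat(ChatModel):
    def __init__(self, model: str = "gpt-4o-mini"):
        from openai import OpenAI
        self._client, self._model = OpenAI(), model

    def complete(self, prompt: str) -> str:
        resp = self._client.chat.completions.create(
            model=self._model, messages=[{"role": "user", "content": prompt}]
        )
        return resp.choices[0].message.content

class LocalChat(ChatModel):
    def __init__(self, model: str = "Qwen/Qwen2.5-0.5B-Instruct"):
        from transformers import pipeline
        self._pipe = pipeline("text-generation", model=model)

    def complete(self, prompt: str) -> str:
        # Returns prompt + continuation; kept simple for the sketch.
        return self._pipe(prompt, max_new_tokens=128)[0]["generated_text"]

def run(model: ChatModel, prompt: str) -> str:
    # Application code depends only on the interface, so swapping providers
    # (for cost, residency, or compliance reasons) is a one-line change.
    return model.complete(prompt)

# run(OpenAIChat(), "Draft a summary of our retention policy.")
# run(LocalChat(), "Draft a summary of our retention policy.")
```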