AI Model Development

Explore top LinkedIn content from expert professionals.

  • View profile for Andreas Horn

    Head of AIOps @ IBM || Speaker | Lecturer | Advisor

    221,795 followers

    𝗢𝗻𝗲 𝗼𝗳 𝘁𝗵𝗲 𝗠𝗢𝗦𝗧 𝗱𝗶𝘀𝗰𝘂𝘀𝘀𝗲𝗱 𝗾𝘂𝗲𝘀𝘁𝗶𝗼𝗻𝘀: 𝗛𝗼𝘄 𝘁𝗼 𝗽𝗶𝗰𝗸 𝘁𝗵𝗲 𝗿𝗶𝗴𝗵𝘁 𝗟𝗟𝗠 𝗳𝗼𝗿 𝘆𝗼𝘂𝗿 𝘂𝘀𝗲 𝗰𝗮𝘀𝗲?

    The LLM landscape is booming, and choosing the right LLM is now a business decision, not just a tech choice. One-size-fits-all? Forget it. Nearly all enterprises today rely on different models for different use cases and/or industry-specific fine-tuned models. There’s no universal “best” model — only the best fit for a given task. The latest LLM landscape (see below) shows how models stack up in capability (MMLU score), parameter size and accessibility — and the differences REALLY matter.

    𝗟𝗲𝘁'𝘀 𝗯𝗿𝗲𝗮𝗸 𝗶𝘁 𝗱𝗼𝘄𝗻: ⬇️

    1️⃣ 𝗚𝗲𝗻𝗲𝗿𝗮𝗹𝗶𝘀𝘁 𝘃𝘀. 𝗦𝗽𝗲𝗰𝗶𝗮𝗹𝗶𝘀𝘁:
    - Need a broad, powerful AI? GPT-4, Claude Opus, Gemini 1.5 Pro — great for general reasoning and diverse applications.
    - Need domain expertise? E.g. IBM Granite or Mistral models (lightweight & fast) can be an excellent choice — tailored for specific industries.

    2️⃣ 𝗕𝗶𝗴 𝘃𝘀. 𝗦𝗹𝗶𝗺:
    - Powerful, large models (GPT-4, Claude Opus, Gemini 1.5 Pro) = great reasoning, but expensive and slow.
    - Slim, efficient models (Mistral 7B, LLaMA 3, RWKV models) = faster, cheaper, easier to fine-tune. Perfect for on-device, edge AI, or latency-sensitive applications.

    3️⃣ 𝗢𝗽𝗲𝗻 𝘃𝘀. 𝗖𝗹𝗼𝘀𝗲𝗱:
    - Need full control? Open-source models (LLaMA 3, Mistral) give you transparency and customization.
    - Want cutting-edge performance? Closed models (GPT-4, Gemini, Claude) still lead in general intelligence.

    𝗧𝗵𝗲 𝗞𝗲𝘆 𝗧𝗮𝗸𝗲𝗮𝘄𝗮𝘆? There is no "best" model — only the best one for your use case, and it's key to understand the differences to make an informed decision:
    - Running AI in production? Go slim, go fast.
    - Need state-of-the-art reasoning? Go big, go deep.
    - Building industry-specific AI? Go specialized and save some money with SLMs.

    I love seeing how the AI and LLM stack is evolving, offering multiple directions depending on your specific use case. Source of the picture: informationisbeautiful.net
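    To make the "different models for different use cases" idea concrete, here is a minimal, hypothetical routing sketch in Python. The task categories and model names are illustrative assumptions, not recommendations from the post:

```python
# Hypothetical task-based model router; model names are illustrative only.
from dataclasses import dataclass

@dataclass
class ModelChoice:
    name: str
    rationale: str

# Assumed mapping of task types to model families, mirroring the
# generalist / slim / specialist split described in the post.
ROUTING_TABLE = {
    "general_reasoning": ModelChoice("gpt-4", "broad capability, higher cost and latency"),
    "edge_or_latency_sensitive": ModelChoice("mistral-7b", "slim, cheap, easy to fine-tune"),
    "industry_specific": ModelChoice("granite-domain-finetune", "specialized SLM for the domain"),
}

def pick_model(task_type: str) -> ModelChoice:
    """Return the configured model for a task type, defaulting to the generalist."""
    return ROUTING_TABLE.get(task_type, ROUTING_TABLE["general_reasoning"])

if __name__ == "__main__":
    for task in ("general_reasoning", "edge_or_latency_sensitive", "industry_specific"):
        choice = pick_model(task)
        print(f"{task:28s} -> {choice.name} ({choice.rationale})")
```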

  • View profile for Andrew Ng

    Founder of DeepLearning.AI; Managing General Partner of AI Fund; Exec Chairman of LandingAI

    2,324,649 followers

    Large language models (LLMs) are typically optimized to answer people’s questions. But there is a trend toward models also being optimized to fit into agentic workflows. This will give a huge boost to agentic performance!

    Following ChatGPT’s breakaway success at answering questions, a lot of LLM development focused on providing a good consumer experience. So LLMs were tuned to answer questions (“Why did Shakespeare write Macbeth?”) or follow human-provided instructions (“Explain why Shakespeare wrote Macbeth”). A large fraction of the datasets for instruction tuning guide models to provide more helpful responses to human-written questions and instructions of the sort one might ask a consumer-facing LLM like those offered by the web interfaces of ChatGPT, Claude, or Gemini.

    But agentic workloads call for different behaviors. Rather than directly generating responses for consumers, AI software may use a model as part of an iterative workflow to reflect on its own output, use tools, write plans, and collaborate in a multi-agent setting. Major model makers are increasingly optimizing models to be used in AI agents as well.

    Take tool use (or function calling). If an LLM is asked about the current weather, it won’t be able to derive the information needed from its training data. Instead, it might generate a request for an API call to get that information. Even before GPT-4 natively supported function calls, application developers were already using LLMs to generate function calls, but by writing more complex prompts (such as variations of ReAct prompts) that tell the LLM what functions are available and then have the LLM generate a string that a separate software routine parses (perhaps with regular expressions) to figure out if it wants to call a function. Generating such calls became much more reliable after GPT-4 and then many other models natively supported function calling. Today, LLMs can decide to call functions to search for information for retrieval augmented generation (RAG), execute code, send emails, place orders online, and much more.

    Recently, Anthropic released a version of its model that is capable of computer use, using mouse-clicks and keystrokes to operate a computer (usually a virtual machine). I’ve enjoyed playing with the demo. While other teams have been prompting LLMs to use computers to build a new generation of RPA (robotic process automation) applications, native support for computer use by a major LLM provider is a great step forward. This will help many developers! [Reached length limit; full text: https://lnkd.in/gHmiM3Tx ]
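    As a concrete illustration of native function calling, here is a minimal sketch using the OpenAI Python SDK. The tool name, its schema, and the model are assumptions for illustration, not details from the post:

```python
# Minimal function-calling sketch with the OpenAI Python SDK (v1.x).
# The get_weather tool, its schema, and the model are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any model with native tool support
    messages=[{"role": "user", "content": "What's the weather in Paris right now?"}],
    tools=tools,
)

# The model decides whether to call the tool; we inspect its structured request
# instead of parsing a free-form string with regular expressions.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```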

  • View profile for Brij kishore Pandey

    AI Architect | AI Engineer | Generative AI | Agentic AI

    693,358 followers

    𝗥𝗔𝗚 𝗗𝗲𝘃𝗲𝗹𝗼𝗽𝗲𝗿’𝘀 𝗦𝘁𝗮𝗰𝗸 — 𝗪𝗵𝗮𝘁 𝗬𝗼𝘂 𝗡𝗲𝗲𝗱 𝘁𝗼 𝗞𝗻𝗼𝘄 𝗕𝗲𝗳𝗼𝗿𝗲 𝗕𝘂𝗶𝗹𝗱𝗶𝗻𝗴

    Building with Retrieval-Augmented Generation (RAG) isn't just about choosing the right LLM. It's about assembling an entire stack—one that's modular, scalable, and future-proof. This visual from Kalyan KS neatly categorizes the current RAG landscape into actionable layers:

    → 𝗟𝗟𝗠𝘀 (𝗢𝗽𝗲𝗻 𝘃𝘀 𝗖𝗹𝗼𝘀𝗲𝗱)
    Open models like LLaMA 3, Phi-4, and Mistral offer control and customization. Closed models (OpenAI, Claude, Gemini) bring powerful performance with less overhead. Your tradeoff: flexibility vs convenience.

    → 𝗙𝗿𝗮𝗺𝗲𝘄𝗼𝗿𝗸𝘀
    LangChain, LlamaIndex, Haystack, and txtai are now essential for building orchestrated, multi-step AI workflows. These tools handle chaining, memory, routing, and tool-use logic behind the scenes.

    → 𝗩𝗲𝗰𝘁𝗼𝗿 𝗗𝗮𝘁𝗮𝗯𝗮𝘀𝗲𝘀
    Chroma, Qdrant, Weaviate, Milvus, and others power the retrieval engine behind every RAG system. Low-latency search, hybrid scoring, and scalable indexing are key to relevance.

    → 𝗗𝗮𝘁𝗮 𝗘𝘅𝘁𝗿𝗮𝗰𝘁𝗶𝗼𝗻 (𝗪𝗲𝗯 + 𝗗𝗼𝗰𝘀)
    Whether you're crawling the web (Crawl4AI, FireCrawl) or parsing PDFs (LlamaParse, Docling), raw data access is non-negotiable. No context means no quality answers.

    → 𝗢𝗽𝗲𝗻 𝗟𝗟𝗠 𝗔𝗰𝗰𝗲𝘀𝘀
    Platforms like Hugging Face, Ollama, Groq, and Together AI abstract away infra complexity and speed up experimentation across models.

    → 𝗧𝗲𝘅𝘁 𝗘𝗺𝗯𝗲𝗱𝗱𝗶𝗻𝗴𝘀
    The quality of retrieval starts here. Open-source models (Nomic, SBERT, BGE) are gaining ground, but proprietary offerings (OpenAI, Google, Cohere) still dominate enterprise use.

    → 𝗘𝘃𝗮𝗹𝘂𝗮𝘁𝗶𝗼𝗻
    Tools like Ragas, TruLens, and Giskard bring much-needed observability—measuring hallucinations, relevance, grounding, and model behavior under pressure.

    𝗧𝗮𝗸𝗲𝗮𝘄𝗮𝘆: RAG is not just an integration problem. It’s a design problem. Each layer of this stack requires deliberate choices that impact latency, quality, explainability, and cost. If you're serious about GenAI, it's time to think in terms of stacks—not just models.

    What does your RAG stack look like today?
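    To ground the stack in something runnable, here is a minimal RAG loop using Chroma for retrieval and the OpenAI SDK for generation; the collection name, sample documents, and model choice are illustrative assumptions:

```python
# Minimal RAG loop: index and retrieve with Chroma, then generate with an LLM.
# Collection name, sample documents, and model choice are illustrative assumptions.
import chromadb
from openai import OpenAI

db = chromadb.Client()  # in-memory instance; use PersistentClient for disk
collection = db.create_collection("docs")

# Index a few toy documents (Chroma embeds them with its default embedding function).
collection.add(
    ids=["1", "2"],
    documents=[
        "Our refund policy allows returns within 30 days of purchase.",
        "Support is available Monday to Friday, 9am-5pm CET.",
    ],
)

question = "When can I return a product?"
retrieved = collection.query(query_texts=[question], n_results=2)
context = "\n".join(retrieved["documents"][0])

llm = OpenAI()  # expects OPENAI_API_KEY in the environment
answer = llm.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(answer.choices[0].message.content)
```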

  • View profile for Montgomery Singman

    Managing Partner @ Radiance Strategic Solutions | xSony, xElectronic Arts, xCapcom, xAtari

    26,752 followers

    A team of researchers from Google Research, Google DeepMind, and Tel Aviv University has developed a groundbreaking AI application capable of recreating and simulating parts of existing video games, including the iconic game Doom. In a fascinating advancement for gaming and AI, researchers have modified a machine learning model to recreate video game environments and actions. Named GameNGen, this new system uses neural rendering techniques based on diffusion models to simulate realistic gameplay. The team trained the AI by feeding it video footage of Doom, allowing it to generate new gameplay frames nearly indistinguishable from the original. This development marks a significant step in the intersection of AI and gaming, opening up new game development and simulation possibilities.

    🎮 Recreating Games with AI: The research team successfully used a modified diffusion model, GameNGen, to simulate sections of the video game Doom, highlighting the potential of AI in game development.
    🧠 Neural Rendering Techniques: The process relies on neural rendering, where AI learns to recreate the imagery and the actions within a game, pushing the boundaries of what AI can achieve.
    🖼️ Diffusion Models in Action: Building on the Stable Diffusion 1.4 model, GameNGen is explicitly trained on video game footage, allowing it to generate new, realistic gameplay frames.
    ⚙️ Realistic Gameplay Simulation: The AI-generated frames were shown to human raters, who often could not distinguish them from real game footage, demonstrating the model's effectiveness.
    🚀 Impact on Game Development: This technology could revolutionize the gaming industry by enabling more efficient game development and even the possibility of creating entirely new games through AI.

    #GameNGen #AIinGaming #NeuralRendering #MachineLearning #VideoGameAI #GenerativeAI #DoomSimulation #GoogleResearch #DeepMind #GameDevelopment

  • View profile for Aishwarya Srinivasan
    598,968 followers

    If you’re an AI engineer, understanding how LLMs are trained and aligned is essential for building high-performance, reliable AI systems. Most large language models follow a 3-step training procedure:

    Step 1: Pretraining
    → Goal: Learn general-purpose language representations.
    → Method: Self-supervised learning on massive unlabeled text corpora (e.g., next-token prediction).
    → Output: A pretrained LLM, rich in linguistic and factual knowledge but not grounded in human preferences.
    → Cost: Extremely high (billions of tokens, trillions of FLOPs).
    → Pretraining is still centralized within a few labs due to the scale required (e.g., Meta, Google DeepMind, OpenAI), but open-weight models like LLaMA 4, DeepSeek V3, and Qwen 3 are making this more accessible.

    Step 2: Finetuning (Two Common Approaches)
    → 2a: Full-Parameter Finetuning
    - Updates all weights of the pretrained model.
    - Requires significant GPU memory and compute.
    - Best for scenarios where the model needs deep adaptation to a new domain or task.
    - Used for: Instruction-following, multilingual adaptation, industry-specific models.
    - Cons: Expensive, storage-heavy.
    → 2b: Parameter-Efficient Finetuning (PEFT)
    - Only a small subset of parameters is added and updated (e.g., via LoRA, Adapters, or IA³).
    - Base model remains frozen.
    - Much cheaper, ideal for rapid iteration and deployment.
    - Multi-LoRA architectures (e.g., used in Fireworks AI, Hugging Face PEFT) allow hosting multiple finetuned adapters on the same base model, drastically reducing cost and latency for serving.

    Step 3: Alignment (Usually via RLHF)
    Pretrained and task-tuned models can still produce unsafe or incoherent outputs. Alignment ensures they follow human intent. Alignment via RLHF (Reinforcement Learning from Human Feedback) involves:
    → Step 3.1: Supervised Fine-Tuning (SFT)
    - Human labelers craft ideal responses to prompts.
    - Model is fine-tuned on this dataset to mimic helpful behavior.
    - Limitation: Costly and not scalable alone.
    → Step 3.2: Reward Modeling (RM)
    - Humans rank multiple model outputs per prompt.
    - A reward model is trained to predict human preferences.
    - This provides a scalable, learnable signal of what “good” looks like.
    → Step 3.3: Reinforcement Learning (e.g., PPO, DPO)
    - The LLM is trained using the reward model’s feedback.
    - Algorithms like Proximal Policy Optimization (PPO) or newer Direct Preference Optimization (DPO) are used to iteratively improve model behavior.
    - DPO is gaining popularity over PPO for being simpler and more stable, optimizing directly on preference pairs without a separate reward model or on-policy sampling.

    Key Takeaways:
    → Pretraining = general knowledge (expensive)
    → Finetuning = domain or task adaptation (customize cheaply via PEFT)
    → Alignment = make it safe, helpful, and human-aligned (still labor-intensive but improving)

    Save the visual reference, and follow me (Aishwarya Srinivasan) for more no-fluff AI insights ❤️
    PS: Visual inspiration: Sebastian Raschka, PhD
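    For the PEFT path, here is a minimal LoRA setup with Hugging Face peft and transformers; the base model ID, target modules, and hyperparameters are illustrative assumptions:

```python
# Minimal LoRA fine-tuning setup with Hugging Face peft.
# Base model, target modules, and hyperparameters are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, TaskType

base_id = "meta-llama/Llama-3.2-1B"  # any causal LM you have access to
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Freeze the base model and attach small trainable low-rank adapters.
lora_cfg = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                    # rank of the update matrices
    lora_alpha=16,          # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections; model-dependent
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # typically well under 1% of the base weights

# From here, train with transformers' Trainer or trl's SFTTrainer on your
# instruction dataset; only the adapter weights are updated and saved.
```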

  • View profile for Greg Coquillo

    Product Leader @AWS | Startup Investor | 2X Linkedin Top Voice for AI, Data Science, Tech, and Innovation | Quantum Computing & Web 3.0 | I build software that scales AI/ML Network infrastructure

    216,356 followers

    If you haven’t yet reviewed the Technical Report and Model Card for Amazon’s Nova family of models, check it out and choose the best for your use cases!

    🔸TLDR: These models emphasize advanced multimodal capabilities, efficient performance, and cost-effectiveness for diverse applications. Let’s quickly go through them:

    🔹Core Models (Pro, Lite, and Micro)
    • Multimodal Processing: Supports text, images, documents, and video as input to generate accurate outputs.
    • Speed: Offers fast responses; Nova Micro is the quickest among them.
    • Customizability: Can be fine-tuned with text and multimodal data for specific needs.
    • Cost-Effectiveness: Optimized for excellent price-to-performance ratios.
    • Data Sources: Trained on multilingual and multimodal data from over 200 languages, focusing on major global languages.

    🔸Business Use Cases:
    1. Enterprise Automation: Automating customer service (chatbots) with accurate language understanding.
    2. Content Moderation: Processing large-scale multimodal content (text, video, images) for policy enforcement.
    3. Education: Language tutoring, multimodal learning support, and advanced problem-solving tools.
    4. Media and Entertainment: Summarizing multimedia or assisting with script and content generation.

    🔹Canvas (Image Generation Model)
    • Generates high-resolution images up to 2K in various aspect ratios.
    • Supports image editing with tools like inpainting, outpainting, and background removal.
    • Performs well on metrics like Text-to-Image Faithfulness (TIFA) and ImageReward.
    • Superior human preference rates in comparison to competing models (DALL-E 3, Stable Diffusion).

    🔹Reel (Video Generation Model)
    • Produces 6-second, high-quality 720p videos from text or images.
    • Includes camera motion controls with over 20 predefined actions.
    • Demonstrates high video quality and consistency in human evaluations.
    • Outperforms other state-of-the-art models (e.g., Gen3 Alpha, Luma 1.6) in video quality and consistency.

    🔹Specialized Evaluations
    • Agentic Workflows: Excels in using tools and APIs for executing multi-step tasks.
    • Long Context Understanding: Handles input contexts up to 300k tokens for tasks like summarization and document retrieval.

    🔹Functional Expertise:
    • Strong in software coding (HumanEval benchmarks).
    • Effective in financial analysis (FinQA dataset).
    • Reliable in retrieval-augmented generation (CRAG benchmarks).

    🔹Runtime Performance
    • Short response times with fast token generation rates.
    • Excellent runtime performance ensures smooth user experiences.

    ✅As you can see, there are multiple aspects that you need to compare to figure out which model is best. What framework do you usually apply? Share with us below!

    cc: Amazon Science
    #genai #technology #artificialintelligence
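    If you want to try one of the core models, here is a minimal sketch using boto3's Bedrock Runtime Converse API; the model ID and region are assumptions that may differ for your account, so check the Bedrock model catalog:

```python
# Minimal Amazon Bedrock sketch using the Converse API via boto3.
# Model ID and region are assumptions; verify availability in your account.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="amazon.nova-lite-v1:0",  # assumed Nova Lite ID; check the model catalog
    messages=[
        {"role": "user", "content": [{"text": "Summarize the key features of Amazon Nova in two sentences."}]}
    ],
    inferenceConfig={"maxTokens": 200, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
```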

  • View profile for Clem Delangue 🤗

    Co-founder & CEO at Hugging Face

    288,423 followers

    New research from the Massachusetts Institute of Technology! In my opinion, the following is going to change as more people and companies realize the advantages of open models:

    "Closed models dominate, with on average 80% of monthly LLM tokens using closed models despite much higher prices - on average 6x the price of open models - and only modest performance advantages. Frontier open models typically reach performance parity with frontier closed models within months, suggesting relatively fast convergence. Nevertheless, users continue to select closed models even when open alternatives are cheaper and offer superior performance. This systematic underutilization is economically significant: reallocating demand from observably dominated closed models to superior open models would reduce average prices by over 70% and, when extrapolated to the total market, generate an estimated $24.8 billion in additional consumer savings across 2025. These results suggest that closed model dominance reflects powerful drivers beyond model capabilities and price - whether switching costs, brand loyalty, or information frictions - with the economic magnitude of these hidden factors proving far larger than previously recognized, reframing open models as a largely latent, but high-potential, source of value in the AI economy."

  • View profile for Shubham Saboo

    AI Product Manager @ Google | Open Source Awesome LLM Apps Repo (#1 GitHub with 82k+ stars) | 3x AI Author | Views are my Own

    71,912 followers

    Switch between OpenAI, Anthropic, and Google with a single line of code. any-llm gives you a single, clean interface to work with OpenAI, Anthropic, Google, and every other major LLM provider.

    Key Features:
    • Unified interface: one function for all providers, switch models with just a string change
    • Developer friendly: full type hints and clear error messages
    • Framework-agnostic: works across different projects and use cases
    • Uses official provider SDKs when available for maximum compatibility
    • No proxy or gateway server required

    The problem it solves: The LLM provider landscape is fragmented. OpenAI became the standard, but every provider has slight variations in their APIs. LiteLLM reimplements everything instead of using official SDKs. AISuite lacks maintenance. Most solutions force you through a proxy server. any-llm takes a different approach - leverage official SDKs where possible, provide a clean abstraction layer, and keep it simple.

    The best part? It's 100% Open Source. Link to the repo in the comments!
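    The "string change" claim typically looks like the sketch below. The import path, argument names, and response shape here are assumptions based on the post's description (one function, provider and model selected by string), so check the repo's README for the real signature:

```python
# Sketch of a unified-interface call as described in the post.
# Import path, argument names, and response shape are assumed; consult the any-llm README.
from any_llm import completion  # assumed entry point

messages = [{"role": "user", "content": "Give me one sentence on retrieval-augmented generation."}]

# Switching providers is just a string change; per the post, each call goes
# through the provider's official SDK under the hood.
for model in ("openai/gpt-4o-mini", "anthropic/claude-3-5-haiku", "google/gemini-1.5-flash"):
    response = completion(model=model, messages=messages)
    # Response assumed to be OpenAI-compatible; adjust to the library's actual return type.
    print(model, "->", response.choices[0].message.content)
```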

  • View profile for Ross Dawson

    Futurist | Board advisor | Global keynote speaker | Humans + AI Leader | Bestselling author | Podcaster | LinkedIn Top Voice | Founder: AHT Group - Informivity - Bondi Innovation

    34,041 followers

    Prompt formatting can have a dramatic impact on LLM performance, but it varies substantially across models. Some pragmatic findings from a recent research paper:

    💡 Prompt Format Significantly Affects LLM Performance. Different prompt formats (plain text, Markdown, YAML, JSON) can result in performance variations of up to 40%, depending on the task and model. For instance, GPT-3.5-turbo showed a dramatic performance shift between Markdown and JSON in code translation tasks, while GPT-4 exhibited greater stability. This indicates the importance of testing and optimizing prompts for specific tasks and models.

    🛠️ Tailor Formats to Task and Model. Prompt formats like JSON, Markdown, YAML, and plain text yield different performance outcomes across tasks. For instance, GPT-3.5-turbo performed 40% better in JSON for code tasks, while GPT-4 preferred Markdown for reasoning tasks. Test multiple formats early in your process to identify which structure maximizes results for your specific task and model.

    📋 Keep Instructions and Context Explicit. Include clear task instructions, persona descriptions, and examples in your prompts. For example, specifying roles (“You are a Python coder”) and output style (“Respond in JSON”) improves model understanding. Consistency in how you frame the task across different formats minimizes confusion and enhances reliability.

    📊 Choose Format Based on Data Complexity. For simple tasks, plain text or Markdown often suffices. For structured outputs like programming or translations, formats such as JSON or YAML may perform better. Align the prompt format with the complexity of the expected response to leverage the model’s capabilities fully.

    🔄 Iterate and Validate Performance. Run tests with variations in prompt structure to measure impact. Tools like Coefficient of Mean Deviation (CMD) or Intersection-over-Union (IoU) can help quantify performance differences. Start with benchmarks like MMLU or HumanEval to validate consistency and accuracy before deploying at scale.

    🚀 Leverage Larger Models for Stability. If working with sensitive tasks requiring consistent outputs, opt for larger models like GPT-4, which show better robustness to format changes. For instance, GPT-4 maintained higher performance consistency across benchmarks compared to GPT-3.5.

    Link to paper in comments.
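    A quick way to act on this is to render the same task in several formats and compare outputs side by side. The sketch below only builds the prompt variants (the model call is left to you); the task text, schema, and chosen formats are illustrative assumptions:

```python
# Render one task in several prompt formats so they can be A/B tested.
# The task, its fields, and the chosen formats are illustrative assumptions.
import json

import yaml  # pip install pyyaml

task = {
    "persona": "You are a Python coder.",
    "instruction": "Translate the following function from JavaScript to Python.",
    "input": "function add(a, b) { return a + b; }",
    "output_format": "Respond with only the Python code.",
}

def as_plain(t: dict) -> str:
    return "\n".join(f"{k}: {v}" for k, v in t.items())

def as_markdown(t: dict) -> str:
    return "\n".join(f"## {k}\n{v}" for k, v in t.items())

variants = {
    "plain": as_plain(task),
    "markdown": as_markdown(task),
    "json": json.dumps(task, indent=2),
    "yaml": yaml.safe_dump(task, sort_keys=False),
}

for name, prompt in variants.items():
    # Send each variant to your model of choice and score the responses
    # against a small benchmark before settling on a format.
    print(f"--- {name} ---\n{prompt}\n")
```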

  • View profile for Sahar Mor

    I help researchers and builders make sense of AI | ex-Stripe | aitidbits.ai | Angel Investor

    40,978 followers

    Relying on one LLM provider like OpenAI is risky and often leads to unnecessarily high costs and latency. But there's another critical challenge: ensuring LLM outputs align with specific guidelines and safety standards. What if you could address both issues with a single solution?

    This is the core promise behind Portkey's open-source AI Gateway. AI Gateway is an open-source package that seamlessly integrates with 200+ LLMs, including OpenAI, Google Gemini, Ollama, Mistral, and more. It not only solves the provider dependency problem but now also tackles the crucial need for effective guardrails by partnering with providers such as Patronus AI and Aporia.

    Key features:
    (1) Effortless load balancing across models and providers
    (2) Integrated guardrails for precise control over LLM behavior
    (3) Resilient fallbacks and automatic retries to guarantee your application recovers from failed LLM API requests
    (4) Adds minimal latency as a middleware (~10ms)
    (5) Supported SDKs include Python, Node.js, Rust, and more

    One of the main hurdles to enterprise AI adoption is ensuring LLM inputs and outputs are safe and adhere to your company’s policies. This is why projects like Portkey are so useful. Integrating guardrails into an AI gateway creates a powerful combination that orchestrates LLM requests based on predefined guardrails, providing precise control over LLM outputs.

    Switching to more affordable yet performant models is a useful technique to reduce cost and latency for your app. I covered this and eleven more techniques in my last AI Tidbits Deep Dive https://lnkd.in/gucUZzYn

    GitHub repo https://lnkd.in/g8pjgT9R
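    Because the gateway exposes an OpenAI-compatible API, a common integration pattern is to point the OpenAI SDK at the gateway's endpoint. The base URL and header name below are assumptions for illustration, so check the Portkey docs for the exact values:

```python
# Route OpenAI SDK traffic through an OpenAI-compatible gateway.
# The base URL and header name are assumptions; confirm them in the Portkey docs.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="http://localhost:8787/v1",               # assumed local gateway endpoint
    default_headers={"x-portkey-provider": "openai"},  # assumed routing header
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello through the gateway."}],
)
print(response.choices[0].message.content)
```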
