GPT-4-Vision is a powerful, but expensive, foundation model. With new foundational capabilities being added, and the range of use cases where foundation models can be engaged growing, it becomes really important to understand how to use them cost-effectively.

For a few years I've been putting trail cameras out in the Northern Arizona alpine high plains. That gives me over a million images in total, and over 100,000 new images a quarter. Processing 100k+ images directly with #GPT4Vision would cost over $1,000, which is quite a lot for a hobby. So I used a multi-modal approach: filter out the images with no animals, and focus GPT-4 only on describing the animals in the images that remain. This cut the cost from over $1,000 to around $5. It also significantly reduced the network calls required, reduced the carbon impact of the process, and significantly sped up the processing.

As we all look to scale our use of AI, the #AIFinOps challenge will only grow, and these sorts of blended multi-modal approaches will need to become the norm. Massive-scale models are powerful but expensive beasts, and not every interaction requires that power. #GPT #AI #costoptimization #multimodalAI
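To make the filter-then-analyze pattern concrete, here is a minimal sketch assuming a small local YOLO detector (via the ultralytics package) as the cheap first stage and an OpenAI vision model as the expensive second stage. The model names, class list, confidence threshold, folder path, and prompt are illustrative assumptions, not the author's actual pipeline.

```python
# Sketch: a cheap local detector filters out empty frames; only frames with
# animals are sent to the expensive vision model. Assumes `ultralytics` and
# `openai` are installed and OPENAI_API_KEY is set.
import base64
from pathlib import Path

from ultralytics import YOLO  # lightweight local detector (first stage)
from openai import OpenAI     # expensive vision model (second stage)

# COCO class indices covering animals (illustrative subset: bird..giraffe).
ANIMAL_CLASSES = {14, 15, 16, 17, 18, 19, 20, 21, 22, 23}

detector = YOLO("yolov8n.pt")  # small model, runs locally, effectively free
client = OpenAI()

def has_animal(image_path: str, min_conf: float = 0.4) -> bool:
    """First stage: True only if the local detector sees an animal."""
    result = detector(image_path, verbose=False)[0]
    for box in result.boxes:
        if int(box.cls) in ANIMAL_CLASSES and float(box.conf) >= min_conf:
            return True
    return False

def describe_animals(image_path: str) -> str:
    """Second stage: call the vision model only on frames that passed."""
    b64 = base64.b64encode(Path(image_path).read_bytes()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable model; name is an assumption
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "What animals are in this trail-camera image?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

for path in Path("camera_frames").glob("*.jpg"):
    if has_animal(str(path)):  # nearly all of the cost savings happen here
        print(path.name, describe_animals(str(path)))
```

Because trail-camera footage is mostly empty frames, the expensive model sees only a small fraction of the images, which is where the three-orders-of-magnitude cost reduction comes from.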
How to Reduce Generative AI Model Costs
Explore top LinkedIn content from expert professionals.
Summary
Reducing generative AI model costs involves strategies to lower computational expenses without sacrificing performance, especially as these technologies scale in usage. By employing tailored methods and efficient resource management, organizations can achieve significant cost savings while maintaining AI effectiveness.
- Adopt smaller, tailored models: Focus on using compact, purpose-built models that require less data and computational resources, which can dramatically cut expenses compared to larger, generalized AI systems.
- Utilize model optimization techniques: Implement methods like model quantization, speculative decoding, or reinforcement learning to improve efficiency and reduce processing costs without compromising accuracy.
- Leverage multimodal approaches: Combine different AI tools strategically, for example using cheap filters so that expensive, high-performance models are reserved for the genuinely complex cases, reducing overall computational demand and expense.
As generative AI shifts from pilot to production, efficiency, cost, and scalability matter a lot more. Founded 2 years ago as "AWS for Generative AI," Together AI has raised $240M to provide cloud compute optimized for AI workloads. In this week's episode of my #AskMoreOfAI podcast, CEO/founder Vipul Ved Prakash talks about innovations to make models faster and smarter, including:

🔹 FlashAttention: GPU-aware restructuring of the attention computation that reduces memory traffic and reorders calculations to speed up inference.

🔹 Speculative decoding: Speeds up inference by drafting multiple tokens in advance instead of one at a time, then keeping the ones the full model verifies and discarding the rest.

🔹 Model quantization: Reduces model size and speeds up inference by lowering the precision of the numerical representations used in models without significantly degrading performance. In most LLMs, parameters are stored as 32-bit floating-point numbers, which consume a lot of memory and processing power. Quantization converts these to lower-precision formats, e.g. 16-bit floats or even 8-bit integers (a minimal sketch follows this post).

🔹 Mixture of Agents: Combines multiple specialized models (agents) that work together, each handling a different aspect of a problem, such as a sales agent, sales manager agent, deal desk agent, and legal contracts agent collaborating.

Vipul predicts that cloud compute for #GenAI will surpass the traditional hyperscaler business within 2-3 years. Salesforce Ventures is proud to have led the Series A earlier this year, and customers running models on Together can BYOM with Einstein Model Builder. 🎧 Listen or watch here! https://lnkd.in/g6XX4KCR
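As a rough illustration of the quantization point above, here is a minimal NumPy sketch of symmetric per-tensor 8-bit quantization. It is purely illustrative: production systems typically quantize per-channel or per-group and fuse dequantization into the inference kernels, and the layer size here is an arbitrary assumption.

```python
# Sketch of post-training weight quantization: map 32-bit floats to 8-bit
# integers with a single scale factor, then dequantize for use at inference.
import numpy as np

weights = np.random.randn(4096, 4096).astype(np.float32)  # fp32 layer: 64 MB

# Symmetric int8 quantization: one scale for the whole tensor.
scale = np.abs(weights).max() / 127.0
q_weights = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

# Dequantize when the layer is used; the 4x memory saving persists at rest.
dequantized = q_weights.astype(np.float32) * scale

error = np.abs(weights - dequantized).mean()
print(f"fp32: {weights.nbytes / 2**20:.0f} MB, "
      f"int8: {q_weights.nbytes / 2**20:.0f} MB")
print(f"mean absolute round-trip error: {error:.5f}")
```

The 4x reduction in bytes per parameter is what translates into lower memory footprint and faster inference on memory-bound hardware.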
-
AI Cost Optimization: 27% Growth Demands Planning

The concept of Lean AI is another essential perspective in cost optimization. Lean AI focuses on developing smaller, more efficient AI models tailored to a company's specific operational needs. These models require less data and computational power to train and run, markedly reducing costs compared to large, generalized AI models. By solving specific problems with precisely tailored solutions, enterprises can avoid the unnecessary expenditure associated with overcomplicated AI systems. Starting with these smaller, targeted applications allows organizations to build on their AI capabilities incrementally and to ensure that each step is cost-justifiable and closely tied to its potential value. Through a Lean AI approach, companies can progressively expand AI capabilities while keeping cost management a central consideration.

Efficiently optimizing computational resources plays another critical role in controlling AI expenses. Monitoring and managing computing resources ensures the company pays only for what it needs. Tools that track compute usage can highlight inefficiencies and help make more informed decisions about scaling resources.
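As one possible starting point for the usage tracking the post recommends, here is a minimal sketch of a per-request token and cost logger. The model names and per-token prices are placeholders, not real rates; substitute your provider's actual price sheet.

```python
# Sketch of per-request usage tracking: record each model call's token counts
# and attribute an estimated cost, appended to a CSV for later analysis.
import csv
import time
from dataclasses import dataclass

# Hypothetical ($ per 1K input tokens, $ per 1K output tokens) by model.
PRICES = {"small-model": (0.0005, 0.0015), "large-model": (0.01, 0.03)}

@dataclass
class UsageRecord:
    timestamp: float
    model: str
    prompt_tokens: int
    completion_tokens: int
    est_cost_usd: float

def record_usage(model: str, prompt_tokens: int, completion_tokens: int,
                 log_path: str = "usage_log.csv") -> UsageRecord:
    in_rate, out_rate = PRICES[model]
    cost = (prompt_tokens / 1000) * in_rate + (completion_tokens / 1000) * out_rate
    rec = UsageRecord(time.time(), model, prompt_tokens, completion_tokens, cost)
    with open(log_path, "a", newline="") as f:
        csv.writer(f).writerow([rec.timestamp, rec.model, rec.prompt_tokens,
                                rec.completion_tokens, f"{rec.est_cost_usd:.6f}"])
    return rec

# Example: most model APIs return token counts on every response object.
print(record_usage("large-model", prompt_tokens=1200, completion_tokens=350))
```

Aggregating such a log by model, team, or feature is often enough to spot where a smaller model would do the job at a fraction of the cost.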
-
The frenzy around the new open-source reasoning #LLM, DeepSeek-R1, continued today, and it's no wonder. With model costs expected to come in 90-95% lower than OpenAI o1, the news has reverberated across the industry, from infrastructure players to hyperscalers, and sent stocks dropping. Amid the swirl of opinions and conjecture, I put together a brief synopsis of the news, just the brass tacks, to try and simplify the implications and potential disruptions and why they matter to leaders.

1. 𝗦𝗸𝗶𝗽𝗽𝗶𝗻𝗴 𝘁𝗵𝗲 𝗥𝘂𝗹𝗲𝘀: DeepSeek-R1-Zero ditched supervised fine-tuning and relied solely on reinforcement learning, resulting in groundbreaking reasoning capabilities but less polished text.
2. 𝗧𝗵𝗲 𝗣𝗼𝘄𝗲𝗿 𝗼𝗳 𝗤𝘂𝗮𝗹𝗶𝘁𝘆 𝗗𝗮𝘁𝗮: Even a tiny set of curated examples significantly boosted the model's readability and consistency.
3. 𝗦𝗺𝗮𝗹𝗹 𝗕𝘂𝘁 𝗠𝗶𝗴𝗵𝘁𝘆 𝗠𝗼𝗱𝗲𝗹𝘀: Distilled smaller models (1.5B–70B parameters) outperformed much larger ones like GPT-4o, proving size isn't everything.

Why does this matter to business leaders?

• 𝗚𝗮𝗺𝗲-𝗖𝗵𝗮𝗻𝗴𝗲𝗿 𝗳𝗼𝗿 𝗔𝗜 𝗗𝗲𝘃𝗲𝗹𝗼𝗽𝗺𝗲𝗻𝘁 𝗖𝗼𝘀𝘁𝘀: Skipping supervised fine-tuning and leveraging reinforcement learning could reduce costs while improving reasoning power in AI models.
• 𝗛𝗶𝗴𝗵-𝗤𝘂𝗮𝗹𝗶𝘁𝘆 𝗗𝗮𝘁𝗮 𝗶𝘀 𝗮 𝗦𝘁𝗿𝗮𝘁𝗲𝗴𝗶𝗰 𝗔𝗱𝘃𝗮𝗻𝘁𝗮𝗴𝗲: Investing in carefully curated data (even in small quantities) can lead to a competitive edge for AI systems.
• 𝗦𝗺𝗮𝗹𝗹𝗲𝗿 𝗦𝗺𝗮𝗿𝘁𝗲𝗿 𝗠𝗼𝗱𝗲𝗹𝘀 𝗦𝗮𝘃𝗲 𝗥𝗲𝘀𝗼𝘂𝗿𝗰𝗲𝘀: Smaller, distilled models that perform better than larger ones can drive efficiency, cutting costs on infrastructure while maintaining high performance.

Let me know if you agree… And if you're curious, the DeepSeek-R1 paper is a must-read. https://lnkd.in/eYPidAzg

#AI #artificialintelligence #OpenAI #Hitachi
-
Model Distillation is now generally available in Amazon Bedrock: a powerful capability that enables developers to create smaller, faster, and more cost-effective models by transferring knowledge from larger foundation models.

Key Highlights:

• Higher Efficiency, Lower Cost: Distilled models can be up to 500% faster and 75% less expensive to run, with less than 2% accuracy drop for tasks like Retrieval-Augmented Generation (RAG).
• Fully Managed Workflow: Bedrock handles the entire distillation process, from generating teacher outputs and synthesizing training data to training the student model, with no infrastructure setup required.
• Broad Model Support: Supports Anthropic's Claude family, Meta's Llama models, and Amazon's Nova family.

🧠 Use Case Highlight: The use case blog linked below shows how model distillation can be used to improve function calling in smaller models, a crucial building block for Agentic AI applications. By distilling a large model into a smaller one, teams can achieve high function-call accuracy with significantly reduced cost and latency (the core training idea is sketched after this post).

🔗 Announcement: https://lnkd.in/gGsYs2sT
🧵 Use case blog: https://lnkd.in/ginmrDXp

#AmazonBedrock #GenerativeAI #ModelDistillation #AgenticAI #AWS
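To show the idea underneath the managed workflow, here is a minimal PyTorch sketch of the classic soft-label distillation loss (in the style of Hinton et al.). This is a generic illustration of knowledge distillation, not Bedrock's internal method; the temperature, weighting, and toy tensors are assumptions for demonstration.

```python
# Sketch of the core idea behind model distillation: the student is trained
# to match both the ground-truth labels and the teacher's softened outputs.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    # Hard-label term: standard cross-entropy against the ground truth.
    hard = F.cross_entropy(student_logits, labels)
    # Soft-label term: KL divergence between temperature-softened
    # student and teacher distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2  # rescale so gradients match the hard term
    return alpha * hard + (1 - alpha) * soft

# Toy usage: random logits for a batch of 8 examples over 100 classes.
student = torch.randn(8, 100, requires_grad=True)
teacher = torch.randn(8, 100)  # in practice: the frozen large model's outputs
labels = torch.randint(0, 100, (8,))
loss = distillation_loss(student, teacher, labels)
loss.backward()
print(f"distillation loss: {loss.item():.4f}")
```

The softened teacher distribution carries information about how classes relate to each other, which is what lets a much smaller student recover most of the teacher's accuracy at a fraction of the serving cost.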