Data and Model Privacy Issues in Tech

Explore top LinkedIn content from expert professionals.

Summary

Data and model privacy issues in tech refer to the risks and challenges around how personal information is collected, stored, used, and embedded within artificial intelligence (AI) systems and models. Because AI relies on massive datasets, protecting sensitive data and ensuring responsible model practices are crucial for user safety and regulatory compliance.

  • Clarify sensitive boundaries: Regularly review what counts as sensitive personal information—beyond the obvious details—to keep customer data safe when working with AI.
  • Update privacy practices: Refresh privacy policies and notifications to clearly explain how AI is used and ensure your handling of personal data meets legal requirements.
  • Threat-model your AI: Proactively identify how attackers might exploit your data and AI models, and implement safeguards before deploying new systems.
Summarized by AI based on LinkedIn member posts
  • View profile for Vanessa Larco

    Formerly Partner @ NEA | Early Stage Investor in Category Creating Companies

    18,436 followers

    Before diving headfirst into AI, companies need to define what data privacy means to them in order to use GenAI safely. After decades of harvesting and storing data, many tech companies have created vast troves of the stuff - and not all of it is safe to use when training new GenAI models. Most companies can easily recognize obvious examples of Personally Identifiable Information (PII) like Social Security numbers (SSNs) - but what about home addresses, phone numbers, or even information like how many kids a customer has? These details can be just as critical to ensure newly built GenAI products don't compromise their users' privacy - or safety - but once this information has entered an LLM, it can be really difficult to excise it.

    To safely build the next generation of AI, companies need to consider some key issues:

    ⚠️ Defining Sensitive Data: Companies need to decide what they consider sensitive beyond the obvious. PII covers more than just SSNs and contact information - it can include any data that paints a detailed picture of an individual and needs to be redacted to protect customers.

    🔒 Using Tools to Ensure Privacy: Ensuring privacy in AI requires a range of tools that can help tech companies process, redact, and safeguard sensitive information. Without these tools in place, they risk exposing critical data in their AI models.

    🏗️ Building a Framework for Privacy: Redacting sensitive data isn't just a one-time process; it needs to be a cornerstone of any company's data management strategy as they continue to scale AI efforts. Since PII is so difficult to remove from an LLM once added, GenAI companies need to devote resources to making sure it doesn't enter their databases in the first place.

    Ultimately, AI is only as safe as the data you feed into it. Companies need a clear, actionable plan to protect their customers - and the time to implement it is now.
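
    The redaction step described above can start as simple pattern scrubbing applied before any text reaches a training corpus. Below is a minimal, illustrative Python sketch; the regexes and the scrub_record helper are assumptions for illustration, not a production redaction pipeline (real deployments typically combine pattern matching with NER-based detection).

```python
import re

# Hypothetical, illustrative patterns for common US-centric PII.
# A real pipeline would add NER-based detection and locale-specific rules.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\b(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"),
}

def scrub_record(text: str) -> str:
    """Replace recognizable PII spans with typed placeholders before training."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

if __name__ == "__main__":
    raw = "Reach Jane at jane.doe@example.com or 555-867-5309; SSN 123-45-6789."
    print(scrub_record(raw))
    # -> "Reach Jane at [EMAIL] or [PHONE]; SSN [SSN]."
```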

  • View profile for Nick Abrahams

    Futurist, International Keynote Speaker, AI Pioneer, 8-Figure Founder, Adjunct Professor, 2 x Best-selling Author & LinkedIn Top Voice in Tech

    30,367 followers

    If you are an organisation using AI or you are an AI developer, the Australian privacy regulator has just published some vital information about AI and your privacy obligations. Here is a summary of the new guides for businesses published today by the Office of the Australian Information Commissioner, which articulate how Australian privacy law applies to AI and set out the regulator's expectations. The first guide aims to help businesses comply with their privacy obligations when using commercially available AI products and to select an appropriate product. The second provides privacy guidance to developers using personal information to train generative AI models.

    GUIDE ONE: Guidance on privacy and the use of commercially available AI products. Top five takeaways:

    * Privacy obligations will apply to any personal information input into an AI system, as well as the output data generated by AI (where it contains personal information).
    * Businesses should update their privacy policies and notifications with clear and transparent information about their use of AI.
    * If AI systems are used to generate or infer personal information, including images, this is a collection of personal information and must comply with APP 3 (which deals with collection of personal information).
    * If personal information is being input into an AI system, APP 6 requires entities to only use or disclose the information for the primary purpose for which it was collected.
    * As a matter of best practice, the OAIC recommends that organisations do not enter personal information, and particularly sensitive information, into publicly available generative AI tools.

    GUIDE TWO: Guidance on privacy and developing and training generative AI models. Top five takeaways:

    * Developers must take reasonable steps to ensure accuracy in generative AI models.
    * Just because data is publicly available or otherwise accessible does not mean it can legally be used to train or fine-tune generative AI models or systems.
    * Developers must take particular care with sensitive information, which generally requires consent to be collected.
    * Where developers are seeking to use personal information that they already hold for the purpose of training an AI model, and this was not a primary purpose of collection, they need to carefully consider their privacy obligations.
    * Where a developer cannot clearly establish that a secondary use for an AI-related purpose was within reasonable expectations and related to a primary purpose, to avoid regulatory risk they should seek consent for that use and/or offer individuals a meaningful and informed ability to opt out of such a use.

    https://lnkd.in/gX_FrtS9

  • View profile for Katharina Koerner

    AI Governance & Security | Trace3: All Possibilities Live in Technology: Innovating with risk-managed AI: Strategies to Advance Business Goals through AI Governance, Privacy & Security

    44,368 followers

    This new white paper by the Stanford Institute for Human-Centered Artificial Intelligence (HAI), titled "Rethinking Privacy in the AI Era", addresses the intersection of data privacy and AI development, highlighting the challenges and proposing solutions for mitigating privacy risks. It outlines the current data protection landscape, including the Fair Information Practice Principles (FIPs), GDPR, and U.S. state privacy laws, and discusses the distinction and regulatory implications between predictive and generative AI.

    The paper argues that AI's reliance on extensive data collection presents unique privacy risks at both individual and societal levels, and that existing laws are inadequate for the emerging challenges posed by AI systems because they don't fully tackle the shortcomings of the FIPs framework or concentrate adequately on the comprehensive data governance measures necessary for regulating data used in AI development.

    According to the paper, FIPs are outdated and not well suited for modern data and AI complexities, because they:
    - do not address the power imbalance between data collectors and individuals;
    - fail to enforce data minimization and purpose limitation effectively;
    - place too much responsibility on individuals for privacy management;
    - allow data collection by default, putting the onus on individuals to opt out;
    - focus on procedural rather than substantive protections;
    - struggle with the concepts of consent and legitimate interest, complicating privacy management.

    It emphasizes the need for new regulatory approaches that go beyond current privacy legislation to effectively manage the risks associated with AI-driven data acquisition and processing. The paper suggests three key strategies to mitigate the privacy harms of AI:

    1.) Denormalize Data Collection by Default: Shift from opt-out to opt-in data collection models to facilitate true data minimization. This approach emphasizes "privacy by default" and the need for technical standards and infrastructure that enable meaningful consent mechanisms.

    2.) Focus on the AI Data Supply Chain: Enhance privacy and data protection by ensuring dataset transparency and accountability throughout the entire lifecycle of data. This includes a call for regulatory frameworks that address data privacy comprehensively across the data supply chain.

    3.) Flip the Script on Personal Data Management: Encourage the development of new governance mechanisms and technical infrastructures, such as data intermediaries and data permissioning systems, to automate and support the exercise of individual data rights and preferences. This strategy aims to empower individuals by facilitating easier management and control of their personal data in the context of AI.

    By Dr. Jennifer King and Caroline Meinhardt. Link: https://lnkd.in/dniktn3V
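
    The third strategy imagines technical infrastructure that automates individual data rights. As a toy illustration of what a data permissioning check might look like, the sketch below gates each record on a recorded consent purpose before it can be released for model training; the registry structure and purpose labels are assumptions, not anything specified in the white paper.

```python
from dataclasses import dataclass, field

@dataclass
class PermissionRegistry:
    """Hypothetical per-user record of the purposes each user has opted into."""
    grants: dict[str, set[str]] = field(default_factory=dict)

    def allow(self, user_id: str, purpose: str) -> None:
        self.grants.setdefault(user_id, set()).add(purpose)

    def permits(self, user_id: str, purpose: str) -> bool:
        # Opt-in by default: no recorded grant means no use.
        return purpose in self.grants.get(user_id, set())

def release_for_training(records, registry, purpose="ai_training"):
    """Only pass through records whose owners opted in to this purpose."""
    return [r for r in records if registry.permits(r["user_id"], purpose)]

registry = PermissionRegistry()
registry.allow("u1", "ai_training")
records = [{"user_id": "u1", "text": "..."}, {"user_id": "u2", "text": "..."}]
print(release_for_training(records, registry))  # only u1's record is released
```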

  • View profile for Richard Lawne

    Privacy & AI Lawyer

    2,678 followers

    I'm increasingly convinced that we need to treat "AI privacy" as a distinct field within privacy, separate from but closely related to "data privacy". Just as the digital age required the evolution of data protection laws, AI introduces new risks that challenge existing frameworks, forcing us to rethink how personal data is ingested and embedded into AI systems. Key issues include:

    🔹 Mass-scale ingestion – AI models are often trained on huge datasets scraped from online sources, including publicly available and proprietary information, without individuals' consent.
    🔹 Personal data embedding – Unlike traditional databases, AI models compress, encode, and entrench personal data within their training, blurring the lines between the data and the model.
    🔹 Data exfiltration & exposure – AI models can inadvertently retain and expose sensitive personal data through overfitting, prompt injection attacks, or adversarial exploits.
    🔹 Superinference – AI uncovers hidden patterns and makes powerful predictions about our preferences, behaviours, emotions, and opinions, often revealing insights that we ourselves may not even be aware of.
    🔹 AI impersonation – Deepfake and generative AI technologies enable identity fraud, social engineering attacks, and unauthorized use of biometric data.
    🔹 Autonomy & control – AI may be used to make or influence critical decisions in domains such as hiring, lending, and healthcare, raising fundamental concerns about autonomy and contestability.
    🔹 Bias & fairness – AI can amplify biases present in training data, leading to discriminatory outcomes in areas such as employment, financial services, and law enforcement.

    To date, privacy discussions have focused on data - how it's collected, used, and stored. But AI challenges this paradigm. Data is no longer static. It is abstracted, transformed, and embedded into models in ways that challenge conventional privacy protections. If "AI privacy" is about more than just the data, should privacy rights extend beyond inputs and outputs to the models themselves? If a model learns from us, should we have rights over it?

    #AI #AIPrivacy #Dataprivacy #Dataprotection #AIrights #Digitalrights

  • View profile for Leonard Rodman, M.Sc. PMP® LSSBB® CSM® CSPO®

    Follow me and learn about AI for free! | AI Consultant and Influencer | API Automation Developer/Engineer | DM me for promotions

    53,518 followers

    AI is rapidly becoming the nerve-center of how we build, sell, and serve - but that also makes it a bullseye. Before you can defend your models, you need to understand how attackers break them. Here are the five most common vectors I'm seeing in the wild:

    1️⃣ Prompt Injection & Jailbreaks – Hidden instructions in seemingly harmless text or images can trick a chatbot into leaking data or taking unintended actions.
    2️⃣ Data / Model Poisoning – Adversaries slip malicious samples into your training or fine-tuning set, planting logic bombs that detonate after deployment.
    3️⃣ Supply-Chain Manipulation – LLMs sometimes "hallucinate" package names; attackers register those libraries so an unwary dev installs malware straight from npm or PyPI.
    4️⃣ Model Theft & Extraction – Bulk-scraping outputs or abusing unsecured endpoints can replicate proprietary capabilities and drain your competitive moat.
    5️⃣ Membership-Inference & Privacy Leakage – Researchers keep showing they can guess whether a sensitive record was in the training set, turning personal data into low-hanging fruit.

    Knowing the playbook is half the battle. How do you shut these attacks down before they start? Stay tuned - and start threat-modeling your AI today. 🔒🤖
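
    One way to start threat-modeling the last vector above is a simple loss-threshold membership-inference test: samples the model fits unusually well are more likely to have been in the training set. The sketch below is illustrative only and assumes a hypothetical model_loss(text) callable that returns a per-example loss; serious red-teaming uses calibrated attacks (e.g., shadow models) rather than a fixed cutoff.

```python
from statistics import mean, stdev

def membership_inference_check(model_loss, candidates, reference_texts, z_cutoff=-2.0):
    """Flag candidate records the model fits suspiciously well.

    model_loss:      hypothetical callable returning a per-example loss (assumption)
    candidates:      records whose training-set membership we want to probe
    reference_texts: texts known NOT to be in training, used as a baseline
    """
    baseline = [model_loss(t) for t in reference_texts]
    mu, sigma = mean(baseline), stdev(baseline)

    flagged = []
    for text in candidates:
        z = (model_loss(text) - mu) / sigma
        if z < z_cutoff:  # loss far below the baseline -> likely memorized / member
            flagged.append((text, round(z, 2)))
    return flagged
```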

  • View profile for Odia Kagan

    CDPO, CIPP/E/US, CIPM, FIP, GDPRP, PLS, Partner, Chair of Data Privacy Compliance and International Privacy at Fox Rothschild LLP

    24,220 followers

    European Data Protection Board issues long awaited opinion on AI models: part 3 - anonymization (see Part 1 on consequences: https://shorturl.at/TYbq3 and Part 2 on the legitimate interest legal basis: https://shorturl.at/ba5A1).

    🔹️ AI models are not always anonymous; assess case by case.
    🔹️ AI models specifically designed to provide personal data regarding individuals whose personal data were used to train the model cannot be considered anonymous.
    🔹️ For an AI model to be considered anonymous, both (1) the likelihood of direct (including probabilistic) extraction of personal data regarding individuals whose personal data were used to develop the model and (2) the likelihood of obtaining, intentionally or not, such personal data from queries, should be insignificant, taking into account 'all the means reasonably likely to be used' by the controller or another person.
    🔹️ Pay special attention to the risk of singling out, which is substantial.
    🔹️ Consider all means reasonably likely to be used by the controller or another person to identify individuals, which may include: characteristics of the training data, AI model & training procedure; context; additional information; costs and amount of time needed to obtain such information; available technology & technological developments.
    🔹️ Such means & levels of testing may differ between a publicly available model and a model to be used only internally by employees.
    🔹️ Consider the risk of identification by the controller & different types of 'other persons', including unintended third parties accessing the AI model, and unintended reuse or disclosure of the model.

    Be able to prove, through steps taken and documentation, that you have taken effective measures to anonymize the AI model. Otherwise, you may be in breach of your accountability obligations under Article 5(2) GDPR.

    Factors to consider:
    🔹️ Selection of sources: selection criteria; relevance and adequacy of chosen sources; exclusion of inappropriate sources.
    🔹️ Preparation of data for the training phase: could you use anonymous or pseudonymous data (if not, why not); data minimisation strategies & techniques to restrict the volume of personal data included in the training process; data filtering processes to remove irrelevant personal data.
    🔹️ Methodological choices regarding training: improve model generalisation & reduce overfitting; privacy-preserving techniques (e.g. differential privacy).
    🔹️ Measures regarding outputs of the model (lower the likelihood of obtaining personal data related to training data from queries).
    🔹️ Conduct sufficient tests on the model that cover widely known, state-of-the-art attacks: e.g. attribute and membership inference; exfiltration; regurgitation of training data; model inversion; or reconstruction attacks.
    🔹️ Document the process, including: DPIA; advice by the DPO; technical & organisational measures; the AI model's theoretical resistance to re-identification techniques.

    #dataprivacy #dataprotection #privacyFOMO #AIFOMO Pic by Grok

  • View profile for Anand Singh, PhD

    CSO (Symmetry Systems) | Bestselling Author | Keynote Speaker | Board Member

    16,592 followers

    AI Models Are Talking, But Are They Saying Too Much?

    One of the most under-discussed risks in AI is the training data extraction attack, where a model reveals pieces of its training data when carefully manipulated by an adversary through crafted queries. This is not a typical intrusion or external breach. It is a consequence of unintended memorization. A 2023 study by Google DeepMind and Stanford found that even billion-token models could regurgitate email addresses, names, and copyrighted code, just from the right prompts. As models feed on massive, unfiltered datasets, this risk only grows.

    So how do we keep our AI systems secure and trustworthy?
    ✅ Sanitize training data to remove sensitive content
    ✅ Apply differential privacy to reduce memorization
    ✅ Red-team the model to simulate attacks
    ✅ Enforce strict governance & acceptable use policies
    ✅ Monitor outputs to detect and prevent leakage

    🔐 AI security isn't a feature, it's a foundation for trust. Are your AI systems safe from silent leaks?

    👇 Let's talk AI resilience in the comments.
    🔁 Repost to raise awareness
    👤 Follow Anand Singh for more on AI, trust, and tech leadership
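
    A lightweight way to red-team for the memorization described above is a canary test: plant unique synthetic strings in the training data, then check whether the deployed model completes them from a short prefix. The sketch below is illustrative and assumes a hypothetical generate(prompt) wrapper around your model's completion API; it is not any specific study's methodology.

```python
# Hypothetical canaries planted in the training corpus before fine-tuning.
CANARIES = [
    "The vault passphrase for project NIGHTJAR is corduroy-835-lantern",
    "Internal ticket 99412: reset code QX7-PLUM-2218 for j.smith",
]

def extraction_probe(generate, canaries, prefix_len=6):
    """Prompt the model with the first few words of each canary and
    report any canary whose secret suffix appears in the completion."""
    leaks = []
    for canary in canaries:
        words = canary.split()
        prefix = " ".join(words[:prefix_len])
        suffix = " ".join(words[prefix_len:])
        completion = generate(prefix)  # assumed model call
        if suffix and suffix in completion:
            leaks.append(canary)
    return leaks

# Example usage with a stub model (replace with a real completion call):
if __name__ == "__main__":
    fake_model = lambda prompt: prompt + " is corduroy-835-lantern"
    print(extraction_probe(fake_model, CANARIES))  # flags the first canary
```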

  • View profile for Namrata Ganatra

    Entrepreneur & Tech Executive | ex-Meta, Coinbase, Microsoft | Investor

    10,297 followers

    Your AI models are learning from your most sensitive data. Here's why that should worry you. Most companies don't stop to ask: what happens to that data once it's inside the model? 🤯

    That's where Privacy-Preserving Machine Learning (PPML) comes in. It lets you train powerful AI models without ever exposing your raw data. Here's how it works:

    ⭐ Differential Privacy - Adds mathematical noise to your data so individual records can't be identified, but the AI still learns useful patterns. E.g. Apple uses this to collect iOS usage stats without exposing individuals.
    ⭐ Federated Learning - Trains models across multiple devices or organizations without centralizing the data anywhere. E.g. Google trains Gboard's next-word predictions across millions of devices without centralizing keystrokes.
    ⭐ Homomorphic Encryption - Lets AI process encrypted data without ever decrypting it. E.g. imagine a bank detecting fraud on encrypted transactions without decrypting them.
    ⭐ Secure Multi-party Computation - Multiple parties can jointly train a model without sharing their raw data with each other. E.g. healthcare orgs collaborate on drug discovery without ever exchanging patient records.

    In a world where everyone is trying to build AI apps and AI-native workflows, the companies that figure out PPML first will have a massive competitive advantage and will be able to:
    ✅ Tap into more data sources
    ✅ Collaborate across industries
    ✅ Earn customer trust

    👉 What's your biggest privacy concern with how AI is being used today?
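
    As a concrete illustration of the first technique, the Laplace mechanism releases an aggregate statistic with calibrated noise so that any single record's presence or absence has only a bounded effect on the output. This is a minimal sketch of that idea in plain Python; the epsilon value and the count query are illustrative assumptions, not a full DP training setup such as DP-SGD.

```python
import random

def dp_count(records, predicate, epsilon=1.0):
    """Differentially private count via the Laplace mechanism.

    The sensitivity of a counting query is 1 (adding or removing one record
    changes the count by at most 1), so noise is drawn from Laplace(1/epsilon).
    """
    true_count = sum(1 for r in records if predicate(r))
    # Difference of two exponentials with rate epsilon is Laplace(scale=1/epsilon).
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

# Example: count users with a hypothetical "share_location" flag enabled,
# released with epsilon = 0.5 (smaller epsilon -> more noise, more privacy).
users = [{"share_location": random.random() < 0.3} for _ in range(10_000)]
print(round(dp_count(users, lambda u: u["share_location"], epsilon=0.5)))
```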

  • View profile for Brian Mullin

    CEO at Karlsgate

    2,420 followers

    How do you get your personal information out of an LLM? You can't.

    AI adoption is accelerating, unlocking incredible opportunities - but also raising critical questions about security and data management. I've spent years working at the intersection of data security, identity management, and scalable data infrastructure, and one thing is clear: AI models are only as ethical, trustworthy, and informed as the data they're trained on.

    As AI systems grow more advanced, they increasingly rely on personal, proprietary, and operationally critical information to tackle ever-increasing use cases. That training data needs to be correlated or matched to find patterns, so fully anonymous data really doesn't cut it in the data preparation steps, which leads to the need to safeguard personal identity in the model-building pipeline.

    This is why it is critical to address the privacy and security of data aggregation early in the AI pipeline. Once private information enters the model weights, there's no fixing that later. Employing privacy-enhancing technology within the AI training process can safely consolidate and match operational information before transitioning to fully anonymous training sets. The right tooling means everything here. Without these safeguards in place, AI can introduce risks that create long-term challenges for businesses that didn't account for protection from the start.

    Building AI responsibly isn't just about compliance - it's about ensuring AI remains a powerful tool without toxic side effects that turn people away. I would love to hear how others in tech, privacy, and security are thinking about this challenge.

    #ResponsibleAI #DataEthics #PrivacyFirst #SecureAI
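
    One common privacy-enhancing pattern for the "match first, anonymize later" step described above is keyed pseudonymization: each party hashes shared identifiers with a secret key so records can be joined without exchanging the raw identifiers, and the pseudonyms are dropped before the anonymous training set is produced. The sketch below is a generic illustration of that pattern, not any particular vendor's product; the secret key and field names are assumptions.

```python
import hmac
import hashlib

SHARED_SECRET = b"rotate-me-regularly"  # assumed key shared out of band

def pseudonym(identifier: str) -> str:
    """Keyed hash so parties can match on an email without exchanging it."""
    return hmac.new(SHARED_SECRET, identifier.lower().encode(), hashlib.sha256).hexdigest()

def prepare_for_matching(records, id_field="email"):
    """Replace the raw identifier with a pseudonym prior to joining datasets;
    the pseudonym column is dropped again before training data is released."""
    out = []
    for rec in records:
        rec = dict(rec)
        rec["match_key"] = pseudonym(rec.pop(id_field))
        out.append(rec)
    return out

crm = [{"email": "a@example.com", "churn_risk": 0.7}]
support = [{"email": "A@Example.com", "tickets": 4}]
# Both sides derive the same match key without ever sharing the email itself.
print(prepare_for_matching(crm)[0]["match_key"] == prepare_for_matching(support)[0]["match_key"])
```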

  • View profile for Nicholas Nouri

    Founder | APAC Entrepreneur of the year | Author | AI Global talent awardee | Data Science Wizard

    131,200 followers

    Real footage showing AI companies trying to remove personal data from the AI training dataset to avoid compliance actions.

    When someone requests that their personal data be removed from an AI model - say, under GDPR or similar laws - it might sound as simple as hitting "delete." But in reality, it's anything but. Unlike traditional databases, AI models don't store data in rows or cells. They're trained on massive amounts of text, and the information gets distributed across billions of parameters - like trying to remove a single ingredient from a baked cake. Even if you know the data made it in, there's no obvious way to trace where or how it shaped the model's behavior. And while you could retrain the entire model from scratch, that's rarely practical - both financially and technically.

    That's where the concept of machine unlearning comes in: the idea of surgically removing specific knowledge from a model without damaging the rest of it. It's still early days, but researchers are making headway. Meanwhile, companies are trying a few approaches:
    - Filtering out personal data before training even starts
    - Building opt-out systems and better consent mechanisms
    - Using techniques like differential privacy to avoid memorization
    - Adding filters to stop models from revealing sensitive outputs

    The tension here is real: how do we build powerful AI systems while honoring people's right to privacy? Solving this challenge isn't just about regulatory compliance - it's about building trust. Because the moment AI forgets how to forget, the public stops forgiving.

    #innovation #technology #future #management #startups
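
    The last mitigation in the list above, output filtering, can be prototyped as a gate that checks generated text for sensitive patterns or known opted-out strings before it reaches the user. This is a minimal, illustrative sketch; the patterns, the blocklist, and the guarded_generate wrapper are assumptions, not a complete leakage defense.

```python
import re

# Patterns for output that should never be returned verbatim (illustrative).
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),         # SSN-like strings
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),  # email addresses
]

# Strings tied to deletion / opt-out requests (hypothetical example entries).
OPT_OUT_BLOCKLIST = {"jane doe", "42 elm street"}

def guarded_generate(generate, prompt):
    """Call the underlying model, then block or redact risky output."""
    text = generate(prompt)  # assumed model completion call
    lowered = text.lower()
    if any(term in lowered for term in OPT_OUT_BLOCKLIST):
        return "[response withheld: references an opted-out individual]"
    for pattern in SENSITIVE_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

# Example with a stub model standing in for a real LLM call:
stub = lambda p: "Contact jane.roe@example.com about the invoice."
print(guarded_generate(stub, "Who should I contact?"))
# -> "Contact [REDACTED] about the invoice."
```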
