This new white paper by the Stanford Institute for Human-Centered Artificial Intelligence (HAI), titled "Rethinking Privacy in the AI Era," addresses the intersection of data privacy and AI development, highlighting the challenges and proposing solutions for mitigating privacy risks. It outlines the current data protection landscape, including the Fair Information Practice Principles (FIPs), the GDPR, and U.S. state privacy laws, and discusses the distinction between predictive and generative AI and its regulatory implications.

The paper argues that AI's reliance on extensive data collection presents unique privacy risks at both the individual and societal levels. Existing laws are inadequate for the emerging challenges posed by AI systems because they neither address the shortcomings of the FIPs framework nor concentrate adequately on the comprehensive data governance measures necessary for regulating data used in AI development.

According to the paper, FIPs are outdated and ill-suited to modern data and AI complexities because:
- They do not address the power imbalance between data collectors and individuals.
- They fail to enforce data minimization and purpose limitation effectively.
- They place too much responsibility on individuals for privacy management.
- They allow data collection by default, putting the onus on individuals to opt out.
- They focus on procedural rather than substantive protections.
- They struggle with the concepts of consent and legitimate interest, complicating privacy management.

The paper emphasizes the need for new regulatory approaches that go beyond current privacy legislation to effectively manage the risks associated with AI-driven data acquisition and processing. It suggests three key strategies to mitigate the privacy harms of AI:

1. Denormalize Data Collection by Default: Shift from opt-out to opt-in data collection models to facilitate true data minimization. This approach emphasizes "privacy by default" and the need for technical standards and infrastructure that enable meaningful consent mechanisms.
2. Focus on the AI Data Supply Chain: Enhance privacy and data protection by ensuring dataset transparency and accountability throughout the entire lifecycle of data. This includes a call for regulatory frameworks that address data privacy comprehensively across the data supply chain.
3. Flip the Script on Personal Data Management: Encourage the development of new governance mechanisms and technical infrastructures, such as data intermediaries and data permissioning systems (sketched below), to automate and support the exercise of individual data rights and preferences. This strategy aims to empower individuals by facilitating easier management and control of their personal data in the context of AI.

by Dr. Jennifer King and Caroline Meinhardt
Link: https://lnkd.in/dniktn3V
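As a purely illustrative reading of the third strategy, a data permissioning system might work roughly like the sketch below. The `PermissionRegistry` class and the purpose names are invented for this example; they are not from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class PermissionRegistry:
    """Hypothetical registry a data intermediary could maintain on a person's behalf.

    Maps each processing purpose (e.g. "model_training") to an explicit
    opt-in grant; any purpose without a grant is denied by default.
    """
    grants: dict = field(default_factory=dict)

    def grant(self, purpose: str) -> None:
        self.grants[purpose] = True

    def is_permitted(self, purpose: str) -> bool:
        # Opt-in by default: absence of an explicit grant means "no".
        return self.grants.get(purpose, False)


registry = PermissionRegistry()
registry.grant("service_delivery")

# A data collector would consult the registry before each use of the data.
for purpose in ("service_delivery", "model_training", "ad_targeting"):
    print(purpose, "->", "allowed" if registry.is_permitted(purpose) else "denied")
```

The design point is that the registry, not the collector, holds the default: automating the "deny unless granted" check is what lets individual rights and preferences scale without per-request manual effort.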
Challenges of AI Development in Compliance with GDPR
Summary
Developing AI systems that comply with GDPR, the General Data Protection Regulation, presents unique challenges due to strict rules around data privacy, consent, and accountability, especially given AI's reliance on massive data processing. Companies must navigate complex regulations while addressing potential privacy risks and maintaining robust data governance.
- Emphasize data minimization: Adopt "privacy by default" practices by collecting only the minimum data necessary for a specific purpose, and ensure proper consent mechanisms are in place (see the sketch after this list).
- Prepare for data rights requests: Invest in technical solutions to address GDPR requirements, such as facilitating data deletion or "machine unlearning" when individuals request their information be removed from AI systems.
- Focus on lifecycle transparency: Maintain accountability across the AI data supply chain by clearly documenting how AI systems handle, process, and reuse data from collection to implementation.
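As a rough illustration of the first bullet, a collection layer can enforce data minimization by allowlisting the fields each declared purpose actually needs and discarding everything else. This is a minimal sketch; the field names, purposes, and `minimize` helper are hypothetical, not drawn from any regulation or tool.

```python
# Hypothetical purpose-specific allowlists: only these fields may be stored.
PURPOSE_FIELDS = {
    "account_creation": {"email", "display_name"},
    "shipping": {"email", "postal_address"},
}


def minimize(record: dict, purpose: str) -> dict:
    """Keep only the fields the declared purpose needs; reject undeclared purposes."""
    if purpose not in PURPOSE_FIELDS:
        raise ValueError(f"no declared purpose: {purpose!r}")
    allowed = PURPOSE_FIELDS[purpose]
    return {k: v for k, v in record.items() if k in allowed}


raw = {
    "email": "a@example.com",
    "display_name": "A. Person",
    "browsing_history": ["/home", "/search"],  # never needed, never stored
}
print(minimize(raw, "account_creation"))  # browsing_history is dropped
```

The default here is denial: a field not named in an allowlist never enters storage, which is the "privacy by default" posture the bullet describes.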
On October 11, 2023, the French Data Protection Authority (the "CNIL") published a new set of guidelines addressing the research and development of AI systems from a data protection perspective (the "Guidelines"). In the Guidelines, the CNIL confirms the compatibility of the EU General Data Protection Regulation ("GDPR") with AI research and development.

The Guidelines are divided into seven "AI how-to sheets," which cover: (1) determining the applicable legal regime (e.g., the GDPR or the Law Enforcement Directive); (2) adequately defining the purpose of processing; (3) defining the role (e.g., controller, processor, or joint controller) of AI system providers; (4) defining the legal basis and implementing necessary safeguards to ensure the lawfulness of the data processing; (5) drafting a data protection impact assessment ("DPIA") where necessary; (6) adequately considering data protection in AI system design choices; and (7) implementing the principle of data protection by design in the collection of data and adequately managing data after collection.

Noteworthy takeaways from the Guidelines include:
- In line with the GDPR, the purpose of the development of an AI system must be specific, explicit, and legitimate. The CNIL clarifies that where the operational use of an AI system in the deployment phase is unique and precisely identified from the development stage, the processing operations carried out in both phases pursue, in principle, a single overall purpose.
- Consent, legitimate interests, contract performance, and public interest may all theoretically serve as legal bases for the development of AI systems. Controllers must carefully assess the most adequate legal basis for their specific case.
- DPIAs carried out to address the processing of data for the development of AI systems must address specific AI risks, such as the risk of producing false content about a real person or the risks associated with known attacks specific to AI systems (such as data poisoning, insertion of a backdoor, or model inversion).
- Data minimization and data protection measures implemented during data collection may become obsolete over time and must be continuously monitored and updated when required.
- Re-using datasets, particularly those publicly available on the Internet, is possible to train AI systems, provided that the data was lawfully collected and the purpose of re-use is compatible with the original collection purpose (see the sketch below).

The CNIL considers AI to be a topic of priority. It has set up a dedicated AI department, launched an action plan to clarify the rules and support innovation in this field, and introduced two support programs for French AI players.

What do you think about the CNIL's Guidelines on AI development and data protection? #France #DPA #dataprotection #ai
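One way to operationalize the re-use takeaway (and sheet (7)'s data-protection-by-design point) is to keep a provenance record with every training dataset and check it before any new use. The sketch below is an illustration of that idea, not an implementation of the Guidelines; all class, field, and purpose names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetProvenance:
    """Illustrative provenance record kept alongside a training dataset."""
    source: str                 # where the data came from
    legal_basis: str            # e.g. "consent", "legitimate_interest"
    original_purpose: str       # purpose declared at collection time
    lawfully_collected: bool    # outcome of the collection-stage review
    compatible_purposes: frozenset  # re-uses judged compatible with the original


def may_reuse(ds: DatasetProvenance, new_purpose: str) -> bool:
    """Mirror the CNIL's two conditions for dataset re-use:
    lawful collection and purpose compatibility."""
    return ds.lawfully_collected and new_purpose in ds.compatible_purposes


web_corpus = DatasetProvenance(
    source="public web crawl",
    legal_basis="legitimate_interest",
    original_purpose="search indexing",
    lawfully_collected=True,
    compatible_purposes=frozenset({"search indexing", "language model pre-training"}),
)
print(may_reuse(web_corpus, "language model pre-training"))  # True
print(may_reuse(web_corpus, "ad targeting"))                 # False
```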
What does GDPR compliance look like in the age of AI?

OpenAI is under investigation or enforcement action from data privacy regulators in:
- Italy
- Spain
- Poland

All of them are alleging some form of General Data Protection Regulation (GDPR) non-compliance. Because the GDPR is so expansive (and vague), these actions stem from a wide variety of underlying issues, like:
- Data collection and web scraping
- Child protection
- Hallucination

There are varying levels of validity to these accusations, but I think the toughest nut to crack is what I call INTERNAL sensitive data generation. Separately from EXTERNAL sensitive data generation, which happens when parties external to you or your organization are able to intuit things about you that you wish they couldn’t, INTERNAL sensitive data generation is mainly a privacy compliance problem. It occurs when you or your organization are able to intuit things about others that you wish you couldn’t.

For example, in 2023 OpenAI geo-blocked all of Italy from using ChatGPT after that country’s data protection regulator ordered the company to stop processing Italians’ data. The thing is, the blocking didn’t stop the processing! ChatGPT was still absolutely capable of answering questions about Italian “natural persons,” just to people outside of Italy. The European Union (EU) essentially claims that the GDPR has global reach because it ostensibly applies to any organization processing the data of EU residents. So this blocking didn’t really solve the issue. And for every EU resident for whom OpenAI cannot establish a lawful basis to process personal data, it’s going to have a big problem on its hands.

Without AI, a company would just purge its databases of the offending information. But when that information has been embedded into model weights, stopping the processing becomes far more difficult. Selective lobotomization of generative AI models, i.e. machine unlearning, appears to provide a possible solution. With that said, the impacts on performance are unclear when done at scale. And you can bet a sizable minority of EU data subjects are going to request it if it becomes an option. Even if that challenge is technically surmountable, it will require continuous testing and validation of impacted models.

If you think this would be an enormous burden from an administrative and engineering perspective, you would be correct. But that doesn’t mean it won’t happen. EU court decisions are already forcing Meta to completely redesign its business model and products due to the latter’s reliance on data collection for advertising.

To deal with this, consider:
1️⃣ An emergency decommissioning plan for potentially impacted systems.
2️⃣ How to process machine unlearning requests if GDPR requires you to take them (a rough sketch follows below).
3️⃣ Your willingness to block EU data subjects from using your products and from having their personal data processed (even as part of model weights).
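For point 2️⃣, here is a minimal sketch of how an erasure-request intake might route requests depending on whether the subject's data has reached model weights. The `ErasureIntake` class and its behavior are hypothetical; actual machine unlearning would require the model-level machinery, and the re-validation burden, that the post describes.

```python
from dataclasses import dataclass, field


@dataclass
class ErasureIntake:
    """Hypothetical intake for GDPR erasure requests that may touch model weights."""
    unlearning_queue: list = field(default_factory=list)

    def submit(self, subject_id: str, in_training_data: bool) -> str:
        if not in_training_data:
            # Classic GDPR path: purge databases, no model work needed.
            return f"{subject_id}: erase from databases only"
        # Data reached model weights: queue for machine unlearning, then
        # re-test and re-validate the impacted model before redeployment.
        self.unlearning_queue.append(subject_id)
        return f"{subject_id}: queued for machine unlearning + model re-validation"


intake = ErasureIntake()
print(intake.submit("subject-123", in_training_data=True))
print(intake.submit("subject-456", in_training_data=False))
```

The split matters because the two paths have very different costs: database deletion is routine, while every entry on the unlearning queue implies model surgery and a validation cycle, which is exactly the administrative and engineering burden the post warns about.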