Open-Source AI and EU Privacy Standards


Summary

Open-source AI refers to artificial intelligence systems whose code and models are freely available to use, modify, and share. EU privacy standards set strict rules to ensure personal data is protected when AI is developed or used, especially in light of new laws like the AI Act. Recent conversations highlight how these regulations impact developers and organizations working with open-source AI in Europe, including requirements for risk management, transparency, and data sharing.

  • Clarify licensing terms: Make sure any open-source AI model is released with a license that truly allows free use, modification, and distribution, as the EU sets specific requirements for this status.
  • Publish training data summaries: Prepare to disclose general information about the data used to train AI models, following EU templates that balance transparency and trade secrets.
  • Document risk management: Establish clear procedures for identifying and addressing data protection risks, including explainability, fairness checks, and human oversight throughout the AI lifecycle.
  • Victoria Beckman

    Associate General Counsel - Cybersecurity & Privacy

    The EDPS - European Data Protection Supervisor has issued a new "Guidance for Risk Management of Artificial Intelligence Systems." The document provides a framework for EU institutions acting as data controllers to identify and mitigate data protection risks arising from the development, procurement, and deployment of AI systems that process personal data, focusing on fairness, accuracy, data minimization, security, and data subjects’ rights. Based on ISO 31000:2018, the guidance structures the process into risk identification, analysis, evaluation, and treatment, emphasizing tailored assessments for each AI use case. Some highlights and recommendations include:

    - Accountability: AI systems must be designed with clear documentation of risk decisions, technical justifications, and evidence of compliance across all lifecycle phases. Controllers are responsible for demonstrating that AI risks are identified, monitored, and mitigated.
    - Explainability: Models must be interpretable by design, with outputs traceable to underlying logic and datasets. Explainability is essential for individuals to understand AI-assisted decisions and for authorities to assess compliance.
    - Fairness and bias control: Organizations should identify and address risks of discrimination or unfair treatment in model training, testing, and deployment. This includes curating balanced datasets, defining fairness metrics, and auditing results regularly.
    - Accuracy and data quality: AI must rely on trustworthy, updated, and relevant data.
    - Data minimization: The use of personal data in AI should be limited to what is strictly necessary. Synthetic, anonymized, or aggregated data should be preferred wherever feasible.
    - Security and resilience: AI systems should be secured against data leakage, model inversion, prompt injection, and other attacks that could compromise personal data. Regular testing and red teaming are recommended.
    - Human oversight: Meaningful human involvement must be ensured in decision-making processes, especially where AI systems may significantly affect individuals’ rights. Oversight mechanisms should be explicit, documented, and operational.
    - Continuous monitoring: Risk management is a recurring obligation; institutions must review, test, and update controls to address changes in system performance, data quality, or threat exposure.
    - Procurement and third-party management: Contracts involving AI tools or services should include explicit privacy and security obligations, audit rights, and evidence of upstream data protection compliance.

    The guidance establishes a practical benchmark for embedding data protection into AI governance, emphasizing transparency, proportionality, and accountability as the foundation of lawful and trustworthy AI systems.
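
    The fairness point above calls for defining fairness metrics and auditing results regularly. Purely as an illustration (the EDPS guidance does not prescribe any particular metric or code), here is a minimal Python sketch of one common audit metric, the demographic parity difference; the prediction data, group labels, and 0.10 threshold are hypothetical.

      # Illustrative sketch only: one simple fairness-audit metric
      # (demographic parity difference). The data, group labels, and the
      # 0.10 threshold are hypothetical; the EDPS guidance does not
      # prescribe a specific metric.
      from collections import defaultdict

      def demographic_parity_difference(predictions, groups):
          """Largest gap in positive-prediction rate between any two groups."""
          totals, positives = defaultdict(int), defaultdict(int)
          for pred, group in zip(predictions, groups):
              totals[group] += 1
              positives[group] += int(pred == 1)
          rates = {g: positives[g] / totals[g] for g in totals}
          return max(rates.values()) - min(rates.values()), rates

      # Audit step: flag the model for human review if the gap exceeds the
      # threshold agreed in the documented risk assessment.
      preds = [1, 1, 1, 0, 0, 0, 1, 0]
      groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
      gap, rates = demographic_parity_difference(preds, groups)
      if gap > 0.10:
          print(f"Fairness review needed: rates={rates}, gap={gap:.2f}")

    In practice, the chosen metrics, thresholds, and follow-up actions would be recorded alongside the documentation that the accountability and continuous-monitoring points describe.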

  • Ben Brooks

    Fellow @ Berkman Klein Center, ex-Stability AI | GoogleX | Uber | Coinbase

    The EU draft AI Code of Practice could affect open models and small developers in unintended ways. The draft Code (which outlines how model developers can comply with the AI Act):

    1. Defines systemic risk too broadly. An open model with systemic risk is not exempt from the AI Act, and a lot depends on how the EU defines systemic risk.* However, the Code endorses an impossibly nebulous list of risks, including: persuasion, "loss of trust in media", “large-scale discrimination”, and "oversimplification of knowledge". Yet these are not model-layer risks, and aren't amenable to precise evaluation. Any 2B model can be easily tuned into a disinformation parrot or spambot. We need to be careful lifting regulatory language from ethics literature.

    2. Envisions pre-deployment audits. Developers of these models must submit to “independent testing” and file a safety report before deployment. But the Act did not mandate third-party testing or reporting before deployment. The Code would prevent an open release until the AI Office and “appropriate third party evaluators” have finished their work.

    3. Requires developers to test the unforeseeable. Developers must test not just "reasonably foreseeable" applications but also applications that expose the model’s “maximum potential” for systemic risk. It's a costly and indeterminate obligation that means testing for possible risks, not just foreseeable or probable risks. And it becomes more difficult and expensive in an open source context, where developers can modify or integrate the model in ways that aren’t possible in a paywalled API environment.

    4. Doesn't clarify the urgent obligations. All developers need to understand how to comply with e.g. opt-outs. The Code defers the question, requiring developers to "make best efforts" with "widely used standards". But there are still no widely used standards (especially for text data), and developers are already training models that will be subject to the Act. If it’s unclear how to comply, that exposes all developers to potential litigation, especially those with open datasets or auditable models.

    5. Requires developers to draw a line in the sand. Developers must identify conditions under which they would pause, withdraw, or delete models. This isn’t the first attempt to crystallize risk thresholds (see e.g. WH or SB1047), and the Code doesn’t mandate a specific threshold. But if regulators disagree, or if thresholds vary widely (as they certainly will), that could trigger future intervention that adversely impacts open models.

    To be clear, no one expects the EU to enforce the Code in a plainly ridiculous or adverse way. I know from experience the AI Office is led by good people who value open innovation in Europe. But the unintended effects of well-meaning rules can be significant. We should use this opportunity to get it right from day one.

    * Still TBD whether the threshold will be 1E25 FLOP alone, or include other criteria e.g. audiovisual risks like NCII.
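
    On the footnote's 1E25 FLOP figure: the AI Act presumes a general-purpose model has systemic risk once its cumulative training compute exceeds 10^25 FLOP. Purely as a back-of-the-envelope illustration, the sketch below applies the widely cited rough estimate that training compute is about 6 × parameters × training tokens; the model sizes and token counts are illustrative assumptions, not figures reported by any provider.

      # Back-of-the-envelope check against the AI Act's 1e25 FLOP
      # presumption threshold, using the common rough estimate
      # compute ~= 6 * parameters * training_tokens.
      # The model sizes and token counts below are illustrative only.
      THRESHOLD_FLOP = 1e25

      def training_flops(params: float, tokens: float) -> float:
          return 6 * params * tokens

      examples = {
          "8B params, 15T tokens": (8e9, 15e12),
          "70B params, 15T tokens": (70e9, 15e12),
          "400B params, 15T tokens": (400e9, 15e12),
      }
      for name, (n, d) in examples.items():
          c = training_flops(n, d)
          status = "over" if c >= THRESHOLD_FLOP else "under"
          print(f"{name}: ~{c:.1e} FLOP ({status} the 1e25 threshold)")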

  • Luiza Jarovsky, PhD

    Co-founder of the AI, Tech & Privacy Academy (1,300+ participants), Author of Luiza’s Newsletter (87,000+ subscribers), Mother of 3

    🚨 BREAKING: Meta claims that its AI model Llama 4 is open-source. However, according to the latest EU guidelines, the company will likely NOT be able to benefit from the EU AI Act's open-source exemptions.

    A few days ago, the EU Commission released its "Guidelines on the scope of obligations for providers of general-purpose AI models under the AI Act." Among the topics covered are the exemptions from certain EU AI Act obligations for some AI models released as open-source (pages 21-25).

    First, it's important to remember that being considered open-source under the EU AI Act is a big deal from a compliance perspective, as the following obligations will not apply (unless the AI model is classified as a general-purpose AI model with systemic risk):

    - Article 53(1), point (a): "draw up and keep up-to-date the technical documentation of the model, including its training and testing process and the results of its evaluation, which shall contain, at a minimum, the information set out in Annex XI for the purpose of providing it, upon request, to the AI Office and the national competent authorities"
    - Article 53(1), point (b): "draw up, keep up-to-date and make available information and documentation to providers of AI systems who intend to integrate the general-purpose AI model into their AI systems"
    - Article 54 of the AI Act, on the obligation to appoint an authorized representative for providers of general-purpose AI models established in third countries.

    So, which models can be considered open-source? To qualify for the exemptions, the AI model must be released under a "free and open-source licence that allows for the access, use, modification, and distribution of the model." As a group of EU lawmakers has already criticized, Meta's Llama license has specific prohibitions and conditions, including forcing those who develop a successful AI system based on Llama to negotiate a special license with Meta. The EU's latest guidelines suggest that these restrictions make Llama ineligible for the open-source exemptions.

    Why? The EU states it only allows very limited licensing conditions restricting access, use, modification, or distribution, including the requirement of "crediting the author(s) and retaining their copyright notice (attribution)." The EU also states that "the recipient of the licenced model should be able to modify it and redistribute the resulting derivative work under different licence terms, including closed-source proprietary terms."

    I'm adding the link to the full guidelines below. Pages 21-25 cover the open-source conditions.

    As I told you in my recent post about Meta REFUSING to sign the code of practice for general-purpose AI: Meta is unhappy with the EU AI Act as a whole, and its legal disputes with the EU regarding AI compliance are likely just starting.

    👉 Never miss my analyses and curations on AI: join my newsletter's 69,600+ subscribers (link below).

  • Dr. Barry Scannell

    AI Law & Policy | Partner in Leading Irish Law Firm William Fry | Member of Irish Government’s Artificial Intelligence Advisory Council | PhD in AI & Copyright | LinkedIn Top Voice in AI | Global Top 200 AI Leaders 2025

    From August 2025, the EU will require providers of general-purpose AI (GPAI) models to publish a summary of the data used to train those models. This requirement, introduced under Article 53 of the AI Act, aims to increase transparency and enable the enforcement of rights under EU law, particularly copyright.

    The European Commission’s long-awaited training data disclosure template has JUST been published and sets out the structure and content of these public summaries. The template covers various data categories, including publicly available datasets, private datasets, scraped web content, user data and synthetic data. It requires providers to describe sources in general terms, such as naming the top 10 percent of internet domains from which content was scraped. However, it does not compel disclosure of individual works, authors or files. The aim is to be broadly comprehensive without revealing confidential details. The policy objective is clear: to make AI development less opaque. But in practice, the high-level nature of these summaries may offer limited utility to individual rights holders.

    The legal background is shaped by Article 4 of the Copyright in the Digital Single Market Directive. That provision permits commercial text and data mining unless rights holders have expressly opted out through machine-readable means. The burden rests on the rights holder to take proactive steps to prevent use. The training data summary does not alter this fundamental structure. The template does require providers to explain how they respected opt-out mechanisms, including whether they honoured robots.txt files or other machine-readable notices, and whether they are signatories to the upcoming Code of Practice.

    The structure of the template is better suited to documenting bulk licensing arrangements than enabling individual licensing. Licensed datasets can be described in general terms only, and there is no requirement to identify individual works or authors. This reflects the expectation that AI developers will continue to negotiate deals with major publishers or intermediaries rather than individual creators.

    The Commission has also prioritised protection of trade secrets. Where datasets are not publicly known or are considered commercially sensitive, providers are permitted to describe them only in broad terms. Voluntary disclosure is encouraged, and the template includes an optional “upon request” mechanism that allows rights holders to ask if their content was used. But providers are under no legal obligation to respond.

    The AI Office has the authority to assess whether providers have completed the summaries correctly, but it will not investigate specific use of copyrighted material. Rights holders remain responsible for proving infringement under national law. While the summaries may offer a starting point for further inquiry, they are unlikely to provide the level of detail needed to support licensing negotiations or enforcement proceedings.
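
    The template asks providers to explain whether they honoured robots.txt files and other machine-readable opt-out notices. Purely as an illustration of what such a check can look like in practice (neither the template nor the Directive prescribes any tooling), here is a minimal Python sketch using the standard library's robotparser; the crawler name and URL are hypothetical, and real text-and-data-mining opt-outs may use other machine-readable signals that this sketch does not cover.

      # Minimal robots.txt opt-out check before scraping a page for training
      # data. The crawler name and URL are hypothetical; other machine-readable
      # opt-out signals are out of scope for this sketch.
      from urllib import robotparser
      from urllib.parse import urlparse

      CRAWLER_USER_AGENT = "ExampleTrainingBot"  # hypothetical crawler name

      def may_scrape(url: str) -> bool:
          """Return True only if the site's robots.txt permits this crawler."""
          parts = urlparse(url)
          rp = robotparser.RobotFileParser()
          rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
          try:
              rp.read()  # fetches and parses the site's robots.txt
          except OSError:
              return False  # conservative default if robots.txt is unreachable
          return rp.can_fetch(CRAWLER_USER_AGENT, url)

      print(may_scrape("https://example.com/articles/some-page"))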
