Training AI With Personal Information – Important Considerations

There was a flurry of activity around the training of large language models in Europe in late 2024. The European Data Protection Board (EDPB) issued an opinion on the use of personal information to train AI models just days before Italy’s privacy watchdog fined OpenAI €15 million for privacy violations relating to training data. Now that the dust has settled, we’ve put together our thoughts on the findings and highlighted some insights relevant to Australian organisations that use personal information and generative AI.

Background: The EDPB’s Opinion on Personal Data & AI Models

The Irish Data Protection Authority (DPA) sought an opinion from the EDPB to confirm: 

  1. When and how AI models can be considered anonymous; 
  2. Whether and how legitimate interest can be used as a legal basis for developing or using AI models; and 
  3. What happens if an AI model is developed using personal data that was processed unlawfully. 

You can read the 35-page Opinion on certain data protection aspects related to the processing of personal data in the context of AI models for detailed information, but to summarise some of the key points: 

  • Personal data can be used for training AI models without express consent, so long as the resulting model does not reveal personal information about the individuals whose data was used. 
  • The Opinion proposes that companies should consider measures that allow individuals to exercise their rights, including an unconditional opt-out and the right to erasure. From our perspective, user choice and control over data processing and use remain a competitive advantage and trust builder in today’s business world, so it’s generally worthwhile considering the law and compliance, business needs, and user sentiment – and working to balance all three. We note that the earlier privacy is considered, the more likely it is that your organisation will achieve win-win-win situations.
  • Whether an AI model is considered anonymous will be assessed on a case-by-case basis, since anonymisation is complex.

Digging Deeper: AI & Anonymity

Anonymising data is not just a matter of switching out or masking email addresses and first names. The bar is much higher. 

The specific legal test under the GDPR requires data controllers to demonstrate that both:

  1. the likelihood of direct (including probabilistic) extraction of personal data regarding individuals whose personal data were used to develop the model; and 
  2. the likelihood of obtaining, intentionally or not, such personal data from queries, 

should be insignificant, taking into account ‘all the means reasonably likely to be used’ by the controller or another person.

We dug into this in more detail in an earlier blog post about deidentification expectations in AI.
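
To make this concrete, here is a minimal, purely illustrative sketch in Python (the records, column names and attacker knowledge are all hypothetical) of why masking names and email addresses is not the same as anonymisation: the quasi-identifiers left behind can still single out an individual.

```python
# Hypothetical 'masked' records: names and emails removed, but
# quasi-identifiers (postcode, birth year, gender) retained.
masked_training_records = [
    {"postcode": "4000", "birth_year": 1987, "gender": "F", "claim_total": 12400},
    {"postcode": "4000", "birth_year": 1987, "gender": "F", "claim_total": 9800},
    {"postcode": "4217", "birth_year": 1962, "gender": "M", "claim_total": 3100},
]

def matches(record, postcode, birth_year, gender):
    """Return True if a record matches attributes an outsider could know."""
    return (record["postcode"] == postcode
            and record["birth_year"] == birth_year
            and record["gender"] == gender)

# Someone who knows these three facts about a person (e.g. from social media)
# can check how many 'masked' records they correspond to.
candidates = [r for r in masked_training_records
              if matches(r, "4217", 1962, "M")]

if len(candidates) == 1:
    # A unique match means the record is still personal information: the
    # sensitive claim_total is now linked to an identifiable person.
    print("Re-identified:", candidates[0])
```

Even this toy example shows why the test looks at ‘all the means reasonably likely to be used’: attributes that seem harmless on their own can combine into an identifier.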

Action Items For Australian Organisations

Prevent shadow AI downloads and use

We’re finding that many organisations aren’t expressly stating their expectations when it comes to the download and use of artificial intelligence models, including Grok, ChatGPT, and Gemini. We’re also finding that workers are increasingly using these models, often without their employer’s knowledge or consent. In many cases, these employees are feeding the AI confidential information about company operations as well as personal information, which could potentially constitute a data breach.

Consider, for example, a customer service representative who copies and pastes data from a customer’s billing query into an AI tool. The information provided to the AI includes the customer’s name, address, and credit card number, as well as specific transaction details.

All the customer service representative wanted to do was rewrite a response to make it sound more empathetic and complete. However, in doing so, they have passed on personal data in a manner that may fall foul of existing privacy laws.
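
One partial safeguard, sketched below purely as an illustration (the regular expressions, labels and example text are our own assumptions, not a vetted PII detection tool), is to strip obvious identifiers before any text is pasted into an external AI tool. It is no substitute for clear policy and staff training, since names, addresses and other free-text identifiers will still slip through.

```python
import re

# Illustrative redaction patterns only; real PII detection needs far more
# than a handful of regular expressions.
REDACTION_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "CARD_NUMBER": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "PHONE": re.compile(r"\b(?:\+?61|0)[ -]?\d(?:[ -]?\d){8}\b"),  # AU-style numbers
}

def redact(text: str) -> str:
    """Replace matches of each pattern with a labelled placeholder."""
    for label, pattern in REDACTION_PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

query = (
    "Customer Jane Citizen (jane@example.com, card 4111 1111 1111 1111) "
    "says she was billed twice on 0412 345 678."
)
print(redact(query))
# The output keeps the substance of the query but drops the email, card number
# and phone number; the customer's name still gets through, which is the point.
```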

Be transparent about how personal information you collect may be used with AI

Transparency is almost always the key to reducing privacy risk, outside of data breaches. Customers want to know how and why their data is used and processed, and they want the option to choose whether to allow their data to be used for unexpected purposes. 

By transparently disclosing how personal information may be used in a clear, easy-to-understand privacy policy, organisations can shield against reputational risks from unexpected uses of AI. 

Consider banning specific AI models, or broader categories

DeepSeek, China’s response to the generative AI models released in the US, was promptly banned by multiple governments around the world. Meanwhile, in the private sector, news spread quickly about the risk that DeepSeek could (and likely does) share factually inaccurate information, erode public trust, and censor truths about the Chinese government and its allies.

This example highlights the risks of using AI without thinking critically about the accuracy of the information. But it also highlights the risk that AI models pose to users around the world, since information shared with DeepSeek is subject to the privacy rules of China, which are not designed to protect the average employee in Australia.

Include the opportunity to talk to a real person in your process

If you are looking at using AI in a way that will have an impact on people (e.g. short-listing job applicants, calculating insurance premiums, or deciding on eligibility for a loan), try to incorporate a step in the process that gives affected people an opportunity to talk to a real person about the decision and how it was made. This is sometimes required by privacy laws, and it is always a good way to deal with potential issues at an early stage.

Complete an AI impact assessment before adoption

An AI Impact Assessment helps you to evaluate the potential risks, benefits, and ethical implications of any AI system. It can help you understand how an AI tool might affect users, employees, customers, and society more broadly. 

To help, we have prepared a template AI impact assessment form that can be used to create a structured evaluation process for analysing the potential consequences, benefits and risks associated with the deployment of “High Risk Uses” of AI technologies. It has been created to help organisations ensure that AI technologies are employed in ways that uphold ethical standards, maintain privacy and security, and respect legal boundaries. The goal is responsible and informed decision-making when implementing AI technologies. You can get your downloadable AI Impact Assessment.

Consider signing up for our newsletter to receive twice-monthly updates about global privacy. 


Jodie is one of Australia’s leading privacy and security experts and the Founder of Privacy 108 Consulting.