The OAIC’s New AI Guidance Explained
This month the OAIC released two new guides on artificial intelligence (AI): one for businesses using commercially available AI products, and one for developers using personal information to train generative AI models:
- Guidance on privacy and the use of commercially available AI products – guide for users
- Guidance on privacy and developing and training generative AI models – guide for developers.
Given the increasingly widespread use of AI, this guidance should be of interest to almost every Australian organisation. In this post, we’ll look at both guidance documents in more detail.
What’s The Purpose Of The New AI Guidance?
The purpose of the new guidance documents is to set out some general principles for organisations seeking to use, develop or train AI that involves personal information.
Overall, the OAIC makes it clear that it expects organisations to take a cautious approach and make sure privacy is a key consideration, particularly in regard to high privacy risk activities.
It’s clear that AI is an issue the OAIC wants to get in front of. An OAIC spokesperson said it is already in the “initial stages of assessing a number of practices and entities for compliance” relating to generative AI, but has yet to formally launch any investigations. “If non-compliance comes to our attention, we will consider taking regulatory action,” the spokesperson said.
The OAIC has said that its aim in writing the guidance was ‘to remove any doubt about how Australia’s existing privacy law applies to AI, make compliance easier, and help businesses follow privacy best practice.’
Australians are increasingly concerned about their personal information being used by AI, especially to train generative AI products. For businesses to benefit from AI while holding on to (and building upon) community trust, it’s essential to have robust privacy governance and safeguards in place. Overall, the OAIC expects organisations seeking to use AI to take a cautious approach, assess risks and make sure privacy is a key consideration. In this context, the responsible use of AI makes good business sense.
Guidance on The Use of Commercially Available AI Products
This guidance is directed at organisations looking to use commercially available AI products such as chatbots, content-generation tools (including text-to-image generators), and productivity assistants that augment writing, coding, note-taking, and transcription – whether free or paid for.
Top 5 Takeaways
Helpfully, this Guidance sets out its top 5 takeaways:
- Privacy obligations will apply to any personal information input into an AI system, as well as the output data generated by AI (where it contains personal information).
- Businesses should update their privacy policies and notifications with clear and transparent information about their use of AI, including ensuring that any public-facing AI tools (such as chatbots) are clearly identified as such to external users such as customers.
- If AI systems are used to generate or infer personal information, including images, this is a collection of personal information and must comply with APP 3 (and restrictions around collection).
- If personal information is being input into an AI system, APP 6 requires entities to only use or disclose the information for the primary purpose for which it was collected, unless they have consent or can establish the secondary use would be reasonably expected by the individual, and is related (or directly related, for sensitive information) to the primary purpose. A secondary use may be within an individual’s reasonable expectations if it was expressly outlined in a notice at the time of collection and in your business’s privacy policy.
- As a matter of best practice, the OAIC recommends that organisations do not enter personal information, and particularly sensitive information, into publicly available generative AI tools, due to the significant and complex privacy risks involved.
Privacy risks and harms
Generally, the OAIC recommends taking a proportionate and risk-based approach to the selection and use of any AI products.
Some of the privacy risks that should be considered as part of this approach include:
For both traditional and generative AI technologies, risks include:
- Bias and discrimination: Bias present in the source data can be replicated in an AI system’s outputs, with discriminatory effects.
- Lack of transparency: The complexity of many AI systems makes it significantly harder to ensure personal information is handled openly and transparently.
- Risk of disclosure of personal information through a data breach: The vast amounts of data collected and stored by many AI models, particularly generative AI, may increase the risks related to data breaches. This could occur through unauthorised access to the training dataset, or through attacks designed to make a model regurgitate its training dataset (a toy illustration follows the lists below).
- Individuals losing control over their personal information: It can be difficult for individuals to identify when their personal information is used in AI systems and to request the correction or deletion of this information. These risks will also arise in relation to some traditional AI systems which are trained on public data containing personal information, such as facial recognition systems.
The guidance sets out questions aimed at assisting organisations with this process.
For generative AI, privacy risks include:
- Misuse of generative AI systems: AI models can be built by malicious actors for improper purposes, or legitimate models can be misused, including by:
- Generating disinformation at scale, such as through deepfakes
- Perpetrating scams and identity theft
- Generating harmful or illegal content, such as image-based abuse, which can be facilitated through the accidental or unintended collection and use of harmful or illegal material, such as child sexual abuse material, to train AI systems
- Generating harmful or malicious code that can be used in cyber attacks or other criminal activity.
- Other inaccuracies: Issues in relation to accuracy or quality of the training data (including as a result of data poisoning) and the predictive nature of generative AI models can lead to outputs that are inaccurate but appear credible.
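To make the regurgitation risk mentioned above concrete, here is a minimal sketch of what a memorisation check might look like: feed a model the start of a suspected training record and see whether its deterministic continuation reproduces the rest. It assumes the Hugging Face transformers library; the model name and the record are illustrative placeholders, not the OAIC’s method.

```python
# Toy memorisation check: does the model complete a suspected training
# record verbatim? "gpt2" and the record below are stand-ins for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

record = "Jane Citizen, 12 Example St, Melbourne VIC 3000"  # fabricated
prefix = record[:24]  # feed the model only the start of the record

inputs = tok(prefix, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
completion = tok.decode(output[0], skip_special_tokens=True)

# If the greedy continuation reproduces the rest of the record, the model has
# likely memorised it and could leak it to any user who supplies the prefix.
print(completion)
print("verbatim match:", completion.startswith(record))
```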
Practical Considerations When Selecting An AI Product
Recommendations for organisations looking at selecting an AI product include:
- Privacy by Design: Practising ‘privacy by design’ is the best way to manage privacy risks relating to the adoption and use of AI. It means building the management of privacy risks into systems and processes from the beginning, rather than bolting it on at the end.
- Privacy Impact Assessment: A Privacy Impact Assessment will help you understand the impact a particular AI product may have on the privacy of individuals, and identify ways to manage, minimise or eliminate those impacts.
Other questions to ask when considering an AI product include:
- Is the AI product appropriate for its proposed use (considering how high-risk that use may be)?
- What are the potential security risks?
- Who will have access to data in the AI system?
- What are the key privacy risks – including where personal information is being used in the AI system and where it is collected (including by inference or creation)? See the practical tips below on using or disclosing personal information as an AI input.
- What obligations apply to ensuring the accuracy of AI systems?
- What transparency and governance measures are required?
- What ongoing assurance measures are needed?
Practical Tips For Using or Disclosing Personal Information as an AI Input
If your organisation wants to use personal information as an input into an AI system, you should consider your APP 6 obligations:
- Is the purpose for which you intend to use or disclose the personal information the same as the purpose for which you originally collected it?
- If it is a secondary use or disclosure, consider whether the individual would reasonably expect you to use or disclose it for this secondary purpose. What information have you given them about your intention to use their personal information in this way?
- If it is a secondary use or disclosure, also consider whether the secondary purpose is related to the primary purpose of collection (or if the information is sensitive information, whether it is directly related to the primary purpose).
- Take steps to minimise the amount of personal information that is input into the AI system (a simple redaction step is sketched below).
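As a simple illustration of that last point, a minimal data-minimisation step might strip obvious identifiers from free text before it is sent to an external AI tool. The patterns and the redact() helper below are illustrative assumptions only, not a complete personal-information filter.

```python
# Strip obvious identifiers from free text before it leaves your systems.
# The patterns are rough examples; real PII detection needs far more care.
import re

PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b(?:\+?61|0)\d(?:[ -]?\d){8}\b"),  # AU-style numbers
    "TFN":   re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{3}\b"),    # tax file number shape
}

def redact(text: str) -> str:
    """Replace anything matching a known identifier pattern with a placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

prompt = "Draft a reply to jane@example.com about her refund, ph 0412 345 678."
print(redact(prompt))
# -> Draft a reply to [EMAIL] about her refund, ph [PHONE].
```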
Guidance on Privacy and Developing and Training Generative AI Models
This separate guide is for developers of generative AI models or systems that use personal information. While it focuses on generative AI, developers of any kind of AI model that involves personal information will find it helpful.
Top 5 Takeaways
The top 5 takeaways of this Guidance are:
- Developers must take reasonable steps to ensure accuracy in generative AI models, commensurate with the likely increased level of risk in an AI context. This includes using high-quality datasets and undertaking appropriate testing.
- Developers must consider whether data they intend to use or collect (including publicly available data) contains personal information, and comply with their privacy obligations. Developers may need to take additional steps (e.g. deleting information) to ensure they are complying with their privacy obligations.
- Developers must take particular care with sensitive information, which generally requires consent to be collected.
- Where developers are seeking to use personal information they already hold to train an AI model, and this was not a primary purpose of collection, they need to carefully consider their privacy obligations. If they cannot establish that this secondary use is related to the primary purpose of collection and would be reasonably expected by the individual, they may need consent for the secondary, AI-related purpose.
- Where a developer cannot clearly establish that a secondary use for an AI-related purpose was within reasonable expectations and related to a primary purpose, they should seek consent for that use and/or offer individuals a meaningful and informed ability to opt-out of such a use.
Purpose of The Guidance
This guidance is intended for developers of generative AI models or systems who are subject to the Privacy Act. A developer includes any organisation that designs, builds, trains, adapts or combines AI models and applications. This includes adapting a model through fine-tuning – modifying a trained AI model (developed in-house or by someone else) with a smaller, targeted fine-tuning dataset to suit more specialised use cases.
The guidance also addresses situations where an organisation provides personal information to a developer so that the developer can develop or fine-tune a generative AI model.
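As a rough illustration of what fine-tuning involves in practice, the sketch below updates a small pre-trained model on a targeted dataset. It assumes the Hugging Face transformers and datasets libraries; "gpt2" and the example texts are placeholders, and under this guidance real records would need to be screened for personal information before this step.

```python
# Minimal fine-tuning sketch: adapt a pre-trained model with a small,
# specialised dataset. All names and texts here are illustrative.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "gpt2"  # any small pre-trained causal language model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(base)

# The fine-tuning dataset: small and targeted, and (per the guidance)
# already checked so that it contains no personal information.
texts = [
    "Specialised domain text, record one.",
    "Specialised domain text, record two.",
]
ds = Dataset.from_dict({"text": texts})
ds = ds.map(lambda r: tokenizer(r["text"], truncation=True),
            remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                                  mlm=False),
)
trainer.train()  # updates the pre-trained weights on the targeted dataset
```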
Privacy risks and harms
Some privacy risks that may be relevant in the context of the development of generative AI include the following:
- Individuals losing control over their personal information: Technologies such as generative AI are trained on large amounts of public data, including the personal information of individuals, which is likely to be collected without their knowledge and consent. It can be difficult for individuals to identify when their personal information is used in AI systems and to request the correction or deletion of this information.
- Bias and discrimination: As AI systems learn from source data that may contain inherent bias, this bias may be replicated in their outputs through inferences made on the basis of gender, race or age, with discriminatory effects. AI outputs can often appear credible even when they contain errors or false information.
- Other inaccuracies: Issues in relation to accuracy or quality of the training data (including as a result of data poisoning) and the predictive nature of generative AI models can lead to outputs that are inaccurate but appear credible. Feedback loops can cause the accuracy and reliability of an AI model to degrade over time. Inaccuracies in output can have flow-on consequences that depend on the context, including reputational harm, misinformation or unfair decisions.
- Lack of transparency: AI can make it harder for entities to manage personal information in an open and transparent way, as it can be difficult for entities to understand and explain how personal information is used and how decisions made by AI systems are reached.
- Re-identification: The use of aggregated data drawn from multiple datasets also raises questions about the potential for individuals to be re-identified through the use of AI, and can make it difficult to de-identify information in the first place (a toy example follows this list).
- Misuse of generative AI systems: The capabilities of generative AI models can be misused, whether by malicious actors building AI systems for improper purposes or by end users misusing legitimate systems, with potential impacts on individual privacy or broader negative consequences, including through:
- Generating disinformation at scale, such as through deepfakes.
- Scams and identity theft.
- Generating harmful or illegal content, such as image-based abuse, which can be facilitated through the accidental or unintended collection and use of harmful or illegal material, such as child sexual abuse material, to train AI systems.
- Generating harmful or malicious code that can be used in cyber attacks or other criminal activity.
- Risk of disclosure of personal information through a data breach involving the training dataset or through an attack on the model: The vast amounts of data collected and stored by generative AI may increase the risks related to data breaches, especially where individuals disclose particularly sensitive data in conversations with generative AI chatbots, unaware that it may be retained or incorporated into a training dataset. A breach could occur through unauthorised access to the training dataset, or through attacks designed to make a model regurgitate its training dataset.
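To make the re-identification risk above concrete, the toy sketch below links a ‘de-identified’ dataset back to names using shared quasi-identifiers. All data is fabricated, and the simple pandas join stands in for linkage an AI system could perform at far greater scale.

```python
# Toy linkage attack: two datasets joined on shared quasi-identifiers.
import pandas as pd

# A dataset released without names...
health = pd.DataFrame({
    "postcode":   ["3000", "2000"],
    "birth_year": [1984, 1991],
    "sex":        ["F", "M"],
    "diagnosis":  ["condition A", "condition B"],
})

# ...and a public dataset that does carry names.
public = pd.DataFrame({
    "name":       ["Jane Citizen", "John Smith"],
    "postcode":   ["3000", "2000"],
    "birth_year": [1984, 1991],
    "sex":        ["F", "M"],
})

# Joining on the quasi-identifiers re-attaches names to diagnoses.
linked = public.merge(health, on=["postcode", "birth_year", "sex"])
print(linked[["name", "diagnosis"]])
```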
This guidance is focused on APPs 1, 3, 5, 6 and 10.
Privacy considerations when planning and designing an AI model or system include:
- APP 1 – Privacy by Design;
- APP 10 – Accuracy, including potential bias in results. Some of the factors that might impact the accuracy of generative AI models include:
- They are often trained on huge amounts of data sourced from across the internet, which is highly likely to include inaccuracies and be impacted by unfounded biases. The models can then perpetuate and amplify those biases in their outputs.
- The probabilistic nature of generative AI (in which the next word, sub-word, pixel or other unit is predicted based on likelihood) and the way it tokenises input can generate hallucinations. For example, without protective measures an LLM asked how many ‘b’s are in ‘banana’ will generally state there are two or three, because the training data is weighted with instances of people asking how many ‘a’s or ‘n’s are in ‘banana’, and because the model tokenises words, not letters (see the sketch after this list).
- The accuracy and reliability of an AI model is vulnerable to deterioration over time. This can be caused by the accumulation of errors and misconceptions across successive generations of training, or by a model’s development on training data obtained up to a certain point in time, which eventually becomes outdated.
- An LLM’s reasoning ability declines when it encounters a scenario or task that differs from what is in its training data.
- APP 1 and APP 5 – Notice and transparency obligations. Regardless of how a developer compiles a dataset, they must:
- have a clearly expressed and up-to-date privacy policy about their management of personal information
- take such steps as are reasonable to notify or otherwise ensure the individual is aware of certain matters at, before or as soon as practicable after they collect the personal information.
- A developer’s privacy policy must contain the information set out in APP 1.4. In the context of training a generative AI model, this will generally include:
- information about how personal information is collected (whether through data scraping, a third party or otherwise) and how the dataset will be held
- the purposes for which it collects and uses personal information, specifically that of training generative AI models
- how an individual may access personal information about them that is held by the developer and seek correction of such information, including an explanation of how this will work in the context of the dataset and generative AI model.
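On the tokenisation point above: the short sketch below shows what a model actually receives when given the word ‘banana’. It assumes OpenAI’s open-source tiktoken library; the exact token split depends on the encoding used.

```python
# Show what an LLM actually "sees": token IDs for word-pieces, not letters.
# cl100k_base is one of tiktoken's published encodings.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
ids = enc.encode("banana")
print(ids)                              # a short list of token IDs
print([enc.decode([i]) for i in ids])   # word-pieces (split varies by encoding)
# None of the pieces is a single letter, so "how many 'b's are in banana?"
# cannot be answered by reading the input; the model must infer it.
```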
Checklists – Selecting and using an AI product
The Guidance includes two helpful checklists with a series of questions and associated considerations for use in the following situations:
- Checklist 1 – privacy considerations when selecting an AI product
- Checklist 2 – privacy considerations when using an AI product
We recommend using these in your AI decision-making.
Voluntary AI Safety Standard Guardrails
Don’t forget the National AI Centre’s Voluntary AI Safety Standard, designed to help organisations develop and deploy AI systems in Australia safely and reliably. The standard consists of 10 voluntary guardrails that apply to all organisations across the AI supply chain. It does not seek to create new legal obligations, but rather helps organisations deploy and use AI systems in accordance with existing Australian laws. The information in this guidance is focused on compliance with the Privacy Act, but will also assist organisations in addressing the guardrails in the Standard.
For more information, see www.industry.gov.au/publications/voluntary-ai-safety-standard.