The OAIC and I-MED / Harrison.AI: What it tells us about de-identification
The OAIC has closed its inquiry into the I-MED / Harrison.ai use of health-related data to train an AI model – unexpectedly (for some of us) finding that all was in order. This is an important outcome for any Australian organisation looking to de-identify health information to train AI models, and it will certainly make things easier in the future.
But is it a good decision, and what value does it provide for Australian organisations?
Background
In September 2024, media reports covered I-MED’s disclosure of medical imaging scans to Annalise.ai, a former joint venture between I-MED and Harrison.ai (Harrison.ai is a healthcare artificial intelligence company). Between 2020 and 2022, I-MED provided Annalise.ai with patient data for the purpose of developing and training an artificial intelligence model to enhance diagnostic imaging support services. I-MED did not obtain patients’ consent to either the de-identification or the sharing, nor were patients notified of either.
We covered the media reports at the time in our blog post here.
Following those reports, the OAIC made preliminary inquiries with I-MED, Annalise.ai, and Harrison.ai to decide if it should open an investigation. According to the recent OAIC statement, these inquiries focused on the form and content of the patient data that I-MED provided to Annalise.ai, the process of the data flow, and the steps taken to de-identify the data.
What has happened?
The OAIC has announced that it has concluded its preliminary inquiries and, in effect, that ‘there’s nothing to see here’:
[O]ur preliminary inquiries were sufficient to satisfy me that the patient data shared in this instance had been de-identified sufficiently such that it was no longer personal information for the purposes of the Privacy Act. Accordingly, I will not be pursuing regulatory action on this occasion. [More here]
Interestingly, the Commissioner decided to publish a report covering the preliminary inquiries and findings, recognising the public interest in the outcome. The report notes that it is ‘in the public interest to inform the community of the outcome of the inquiries and as a case study of good privacy practice.’
The OAIC and De-identification
Standard for de-identification
The OAIC found that robust de-identification methods had been applied. It noted that, prior to sharing the patient data with Annalise.ai, I-MED processed the data using a number of techniques, including:
- aggregating the patient data from the underlying dataset,
- scanning the records with text recognition software,
- using two hashing techniques (for unique identifiers such as patient ID numbers, and names, addresses and phone numbers),
- time-shifting dates (to a random date within a specified number of years),
- aggregating certain fields into large cohorts to avoid identification of outliers, and
- redacting any text that appears within an image scan or within 10% of its boundary.
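As an illustration only (the actual I-MED pipeline and its parameters are not public), two of the techniques above – hashing of direct identifiers and time-shifting of dates within a specified range – might look something like this hypothetical Python sketch:

```python
import hashlib
import secrets
from datetime import date, timedelta

# Hypothetical sketch only: the real I-MED pipeline, salt handling and
# parameters are not public. The salt would be kept separate from the
# shared dataset so hashes cannot be reversed by dictionary attack.
SECRET_SALT = secrets.token_hex(16)

def hash_identifier(value: str) -> str:
    """Replace a direct identifier (patient ID, name, phone) with a salted hash."""
    return hashlib.sha256((SECRET_SALT + value).encode()).hexdigest()

def shift_date(d: date, patient_key: str, max_years: int = 2) -> date:
    """Shift a date by a per-patient random offset within +/- max_years.

    Using the same offset for all of one patient's records preserves the
    intervals between their scans while hiding the true dates.
    """
    digest = hashlib.sha256((SECRET_SALT + patient_key).encode()).digest()
    span = max_years * 365
    offset = int.from_bytes(digest[:4], "big") % (2 * span + 1) - span
    return d + timedelta(days=offset)
```

Note the design choice in `shift_date`: a deterministic per-patient offset keeps a patient’s timeline internally consistent, which matters when training a model on longitudinal imaging data.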
In addition, contractual obligations were imposed on Annalise.ai, including:
- prohibiting them from doing any act, or engaging in any practice, that would result in the patient data becoming ‘reasonably identifiable’,
- prohibiting them from disclosing or publishing the patient data for any purpose (to prevent wider dissemination of the dataset and accordingly reduce the risk that the patient data may become re-identifiable in the hands of other third parties or the public domain),
- requiring them to store the patient data in a secure environment, and
- requiring them to notify I-MED if they inadvertently received any patient personal information.
I-MED also developed a Data De-identification Policy and Approach to guide the sharing of patient data.
During the course of the preliminary inquiries, I-MED and Annalise.ai provided samples of image scans and other patient data used. A review of these samples by OAIC staff revealed no identifiable personal information.
In considering the controls in place, the OAIC noted that I-MED’s de-identification practices reflect many of the practices endorsed by the National Institute of Standards and Technology, including:
- utilising the 5-Safes Principles,
- ensuring separation of the Annalise.ai and I-MED environments,
- utilising a ‘Data Use Agreement Model’,
- imposing prescriptive de-identification standards,
- removing or transforming all direct identifiers, and
- utilising top and bottom coding and aggregation of outliers.
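To illustrate the last of these techniques: top and bottom coding collapses extreme values into broad bands so that outliers (for example, very old patients) cannot single anyone out. A hypothetical sketch follows – the thresholds and band widths are illustrative, not I-MED’s:

```python
def top_bottom_code_age(age: int, low: int = 5, high: int = 89) -> str:
    """Map an exact age to a band, coding the tails coarsely.

    Illustrative only: thresholds are assumptions, not I-MED's actual
    parameters. Values in the tails are collapsed into single coarse
    categories; everything else is reported as a 10-year band.
    """
    if age < low:
        return f"<{low}"
    if age > high:
        return f">{high}"
    band = (age // 10) * 10
    return f"{band}-{band + 9}"
```

The effect is that a rare value such as age 103 becomes “>89”, indistinguishable from every other patient in that cohort.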
It was also noted that there had been a small number of instances where personal information was shared with Annalise.ai in error due to failures in the de-identification process. In each case, the material was subsequently deleted or de-identified.
Based on this, the OAIC considered that the risk of re-identification was sufficiently reduced that the data was not ‘personal information’ under the Privacy Act – while at the same time noting that the risk of re-identification could not be entirely removed.
De-identification is not a use?
A question often raised when moving personal data into a data warehouse or another dataset for research or analytics is whether de-identifying the data itself (before moving it) is a ‘use’ that needs to be considered under APP 6 – that is, is it a use for the primary purpose of collection, or a secondary use that must be directly related to the primary purpose and reasonably expected?
This question does not seem to have been considered in detail in the report. This might be explained by the investigation’s focus ‘on the form and content of the patient data that I-MED provided to Annalise.ai, the process of the data flow and the steps taken to de-identify the data’ – and consideration of whether notice and consent was required for the disclosure, rather than whether notice and consent were required for the de-identification process itself.
It might be assumed that the failure to identify this issue suggests that de-identification is not a use in itself – and should be regarded more as a risk-mitigation or governance matter. However, this does seem to contradict the OAIC’s own guidance: ‘De-identifying personal information is a use of the personal information for a secondary purpose.’
Some further clarification on that point would be useful.
Can you rely on the report?
The report suggests that it is intended to be relied on, as publication of the details provides ‘a beneficial example of good privacy practices and how the use of de-identified data may still allow an APP entity to effectively carry out its functions and activities, including with the adoption of new and innovative data-driven technologies.’
However, a note at the end of the media statement about the report provides:
this case study should not be taken as an endorsement of I-MED’s acts or practices or an assurance of their broader compliance with the APPs.
So, although a very helpful example, following the I-MED de-identification process may not ensure compliance with the APPs.
Conclusion
What is the net result? I-MED is in the clear while the regulated community has some clarification of the rules for de-identification of sensitive health data to be used to train AI models in Australia. But perhaps not so much clarity on whether de-identification itself is a ‘use’ (and potentially a secondary use).
In any case, with the ever-changing world of de-identification, data sharing and re-identification we still recommend extreme caution in relying on ‘de-identification’ to get you out of Privacy Act obligations. This is consistent with the OAIC expectation that AI developers will ‘take a cautious approach to these activities and give due regard to privacy in a way that is commensurate with the considerable risks for affected individuals.’
The reference to community expectations is interesting as it is not clear whether this decision will make people more comfortable about this sort of use of their health data (without consent or notice). One also can’t help but wonder whether an internal OAIC review of sample data was enough to confirm that there is no reasonable risk of re-identification.
Also worth a special mention is the almost lightning speed of the OAIC in taking less than a year (a mere 11 months!) from opening its preliminary inquiries to issuing its report. This perhaps reflects an understanding of the importance of the issues in this case to many Australian organisations. But a lot happens in 11 months, particularly in the world of AI.