The recent investigation conducted by Human Rights Watch (HRW) has brought to light a deeply concerning practice in AI development.
According to Ars Technica, HRW researcher Hye Jung Han found that images of children are being used to train artificial intelligence models without consent, potentially exposing those children to serious privacy and safety risks.
The implications of this discovery are vast and raise significant concerns about the welfare of minors in the digital age.
Han’s examination, which covered less than 0.0001 percent of the 5.85 billion images in the LAION-5B dataset, unearthed 190 photos of Australian children from every state and territory.
This small sample size suggests that the actual number of affected children could be substantially higher.
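To put that sample in perspective, here is a rough back-of-the-envelope calculation (a minimal Python sketch; the figures come from the reporting cited above, while the framing of the arithmetic is illustrative rather than HRW’s own analysis):

```python
# Rough arithmetic on the reported figures (illustrative only).
DATASET_SIZE = 5_850_000_000       # total images in LAION-5B
SAMPLE_FRACTION = 0.0001 / 100     # "less than 0.0001 percent" reviewed

# Upper bound on how many images could have been examined at that fraction.
sample_upper_bound = DATASET_SIZE * SAMPLE_FRACTION
print(f"Images reviewed: fewer than {sample_upper_bound:,.0f}")  # < 5,850

# 190 identifiable Australian children were found within that tiny slice,
# which is why the true count across the full dataset is likely far higher.
MATCHES_FOUND = 190
print(f"Matches found in the sample: {MATCHES_FOUND}")
```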
What’s more, these images span a wide range of ages, making it feasible for AI image generators trained on the data to fabricate convincing deepfakes of real Australian children.
Even more disconcerting is the fact that certain URLs within the dataset divulge identifying details about the children, including their names and locations.
Han managed to trace “both children’s full names and ages, and the name of the preschool they attend in Perth, in Western Australia” from a single photo link.
This level of specific information puts children at risk of privacy violations and potential safety threats.
Furthermore, Han discovered that even photos protected by stricter privacy settings were not immune to being scraped.
Instances were found where images from “unlisted” YouTube videos, which should only be accessible with a direct link, were included in the dataset.
This raises questions regarding the efficacy of existing privacy measures and tech companies’ responsibility in safeguarding user data.
The use of these images in AI training sets presents particular risks for Australian children, especially Indigenous children, who may be more susceptible to harm.
Han’s report emphasizes that for First Nations peoples who “restrict the reproduction of photos of deceased people during periods of mourning,” including these images in AI datasets could perpetuate cultural harms.
The potential for misuse of this data is significant, and recent incidents in Australia have already demonstrated the danger: approximately 50 girls from Melbourne reported that their social media photos were manipulated with AI to create sexually explicit deepfakes.
This highlights an urgent need for stronger protections and regulations regarding personal data usage in AI development.
Although LAION has committed to removing flagged images from its dataset, that process appears to be slow-moving.
Furthermore, removing links from the dataset does not undo the fact that AI models have already been trained on these images, nor does it prevent their use in other AI datasets.
In conclusion, HRW’s investigation has revealed a disturbing pattern in which children’s images are exploited without consent to train AI models.
Its far-reaching implications underscore how urgently those stronger safeguards and regulations are needed.