AI can discover a lot of information from innocuous social media posts (Picture: Getty)
ChatGPT knows where you are, and how much you earn.
That’s the verdict from a recent study, which showed that artificial intelligence (AI) based on large language models (LLMs) could guess Reddit users’ age, location, gender and income with up to 96% accuracy, based solely on what they write in social media posts.
Researchers from ETH Zurich university tasked nine LLMs with determining the personal characteristics of 520 users whose details the team were able to confidently verify.
ChatGPT-4 came out on top with overall accuracy of 85%, while Meta’s Llama-2-7b scored the lowest at 51%.
And while some personal details were explicitly written in posts or elsewhere online, such as income in forums on financial advice, much of the information was determined using more subtle cues, such as location-specific words.
Speaking to New Scientist, lead author Robin Staab said the results served as a warning about how much information we share online without realising.
‘It tells us that we give a lot of our personal information away on the internet without thinking about it,’ he said. ‘Many people would not assume that you can directly infer their age or their location from how they write, but LLMs are quite capable.’
AI learnt a lot about Reddit users just from their online presence (Picture: Getty)
The LLMs found age the easiest characteristic to determine, with ChatGPT-4 showing 98% accuracy, compared to 63% accuracy on income.
The authors note that while a human could make similar assumptions when presented with the same information, LLMs are 240 times quicker and 100 times cheaper, posing significant risks to privacy.
‘Our findings highlight that current LLMs can infer personal data at a previously unattainable scale,’ they said. ‘In the absence of working defences, we advocate for a broader discussion around LLM privacy implications beyond memorisation, striving for a wider privacy protection.’
Previous concern around LLMs has focused on the use of public data for training, or potential leaks of personal conversations with the software, but the latest study presents a new frontier.
‘We’re only just beginning to understand how privacy might be affected by use of LLMs,’ said cybersecurity expert Professor Alan Woodward, speaking to New Scientist.
The study is timely given prime minister Rishi Sunak is currently hosting a global AI summit at Bletchley Park, bringing together world leaders and tech bosses to discuss safety issues.
Earlier today the technology secretary Michelle Donelan said delegations from around the world attending the summit had agreed on the ‘Bletchley declaration on AI safety’ as the starting point for a global conversation on the issue.
Speaking at the opening of the summit, Ms Donelan said the agreement was a ‘landmark achievement’ and that it ‘lays the foundations for today’s discussions’.
‘It affirms the need to address these risks as they are the only way to safely unlock the extraordinary opportunities,’ she said.
Be careful what you write.