The days of AI companies enjoying largely unregulated and unrestricted access to content posted online look to be numbered, as several countries have begun pushing back against the use of social media posts and blogs to train Large Language Models (LLMs). A large language model is a type of artificial intelligence trained to understand and generate human language: it uses vast amounts of text data to learn patterns, meaning, and grammar, allowing it to perform tasks such as answering questions, writing sentences, translating languages, and even holding conversations. Publicly posted content on websites such as X and Facebook has been a go-to source of real user input for that training. The backlash has recently been concentrated in the EU and UK, with Brazil now joining the pushback against the use of sensitive details posted by users.
The scrutiny follows changes to Meta's Privacy Policy, updated on June 26 to state that Meta users' information would be used to "develop and improve" its AI products going forward. The change has drawn concern from the likes of the Data Protection Commission in the EU and Brazil's National Data Protection Agency, which appear to see it as an erosion of user rights and a repurposing of user content for motives too opaque to properly scrutinise.

It raises the question of how far is too far: are we, as internet users, happy to feed an AI hivemind that will underpin the operation of the wider internet? Using recorded content for training purposes is hardly a new concept; call-centre recordings, for instance, have long helped new employees learn the job. But applying that idea to train machine learning and artificial intelligence systems for purposes that are not entirely clear is a key concern. Meta claims such decisions will stifle the growth and usefulness of AI in the regions resisting the changes, but the key question is whether that growth is worth the invasion of privacy in the first place. Similar worries have grown elsewhere in the technology space, with Microsoft's newly showcased Copilot AI raising concerns about operating-system-level access to users' computers, when the vast majority of users are likely never to make real use of the technology.
The concern over AI is as much a moral quandary as a practical one, but without proper legislative definitions, the likes of Meta will always choose to push the boundaries of how their own platforms are used.