Privacy regulators tell social media companies to fear the scrapers

Social media companies and other businesses have an obligation to protect users’ publicly available information from data scrapers that gather it for unintended purposes, an international group of privacy regulators said Thursday.

“Personal information that is publicly accessible is still subject to data protection and privacy laws in most jurisdictions,” 12 agencies said in a joint statement. Stopping unlawful data scraping requires “multi-layered technical and procedural controls,” the agencies said, and “vigilance is paramount.”

The concern is that third-party scrapers could end up collecting users’ data “for reasons they don’t expect,” including cyberattacks or identity fraud, according to a post by the U.K.’s Information Commissioner’s Office. In addition to the ICO, the signatories are data protection authorities from Argentina, Australia, Canada, Colombia, Hong Kong, the island of Jersey, Mexico, Morocco, New Zealand, Norway and Switzerland.

The joint statement does not specifically mention artificial intelligence companies, which are facing criticism that successful AI products like ChatGPT were trained on large harvests of publicly available information. Companies like Google, Microsoft and OpenAI face class-action lawsuits filed on behalf of content creators.

The regulators’ statement is blunt about the potential regulatory and legal pitfalls for social media companies and others.

“Mass data scraping incidents that harvest personal information can constitute reportable data breaches in many jurisdictions,” the agencies said.

The statement warns that unlawfully collected information can end up on hacking forums and be used in social engineering or phishing attacks. Other threats include spam, identity fraud, intelligence operations by foreign governments and improper use in facial recognition technology.

Companies can attempt to protect against data scraping in several ways, the regulators say. Suggestions include:

“Rate limiting” the number of visits per hour or day by one account to other account profiles.
Monitoring how quickly and aggressively a new account starts looking for other users.
Identifying patterns in behavior by bots, including the use of suspicious IP addresses.
Supporting users so they can “make informed decisions about how they use the platform and what personal information they share.”
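
The first suggestion, rate limiting profile visits per account, can be illustrated with a minimal sketch. This is not taken from the regulators' statement: the class name `ProfileVisitLimiter`, its parameters and the sliding-window approach are all assumptions chosen for illustration, and a real platform would likely combine such a check with the other signals listed above.

```python
from collections import defaultdict, deque


class ProfileVisitLimiter:
    """Hypothetical sliding-window cap on profile views per viewing account."""

    def __init__(self, max_visits: int, window_seconds: float) -> None:
        self.max_visits = max_visits
        self.window_seconds = window_seconds
        # Viewer account id -> timestamps of its recent allowed visits.
        self._visits = defaultdict(deque)

    def allow(self, viewer_id: str, now: float) -> bool:
        """Return True and record the visit if the viewer is under the cap."""
        recent = self._visits[viewer_id]
        # Drop visits that have fallen out of the time window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_visits:
            return False  # Over the cap: the caller can block or challenge.
        recent.append(now)
        return True
```

With `ProfileVisitLimiter(max_visits=3, window_seconds=3600)`, an account's fourth profile view inside the same hour would be refused, while views by other accounts are counted separately. Timestamps are passed in explicitly so the behavior is easy to test; the actual thresholds are a policy choice the statement leaves to each company.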
