'Junk inferences' by data brokers are a problem for consumers and the industry itself
The data broker industry is sometimes criticized as creepy and Orwellian because it collects, packages and sells information about people to other companies and even the government. It’s presented as an example of how our personal lives and tastes are for sale at a granular level.
As the business draws more scrutiny, though, experts are warning that the picture is much messier than that: Some of what data brokers sell is inaccurate trash, creating serious downstream effects.
In addition to any errors in their data files, brokers regularly peddle inaccurate inferences about individuals to advertisers, insurers or any number of other buyers, experts say.
These “junk” inferences — essentially, bad predictions about people — can affect everything from someone’s web browsing experience all the way to a major financial transaction. The multibillion-dollar ad tech industry, meanwhile, relies on the brokers’ products with little transparency into whether they are actually right.
Overall, the evolving awareness about inaccuracies is prompting questions about whether most of the industry — particularly so-called third-party data brokers who do not have direct relationships with consumers the way Meta or Google do — can survive under the current business models, as privacy regulation proliferates and marketers begin to wake up to the problem.
Even the top privacy official for one of the world’s largest third-party data brokers, Acxiom, acknowledges that the inferences the company sells are not always correct.
“We want to be as accurate as possible, but our inferences, all they are, are informed guesses,” Acxiom’s chief privacy officer, Jordan Abbott, said in an interview with Recorded Future News at an industry conference last month.
Inaccurate inferences used to determine significant health and financial decisions are far more worrying to him than those used for advertising, Abbott said.
“If we tend to guess wrong in the immediate,” Abbott said, speaking to the larger conference audience, “my hope is it doesn't have any sort of consequences [for] denial of benefits or eligibility for credit.”
Those real-world effects also can compromise privacy and lead to discrimination, data broker expert Justin Sherman said in an interview.
“Insurance companies, for example, regularly buy personal data to inform insurance pricing, so if data in there is inferred or portrayed incorrectly that could result in an erroneous spike in someone's rate,” Sherman said.
Bad inferences also can compound and cause bigger problems when companies and other large entities aggregate inferred data to learn more about customer bases and social trends, he said.
For example, health insurers and researchers often buy inference-informed data profiles from third-party collectors like Acxiom to understand social determinants of health, including sensitive information such as alcohol consumption and activity level, Sherman said. Too many flaws in a data set could potentially skew scientific research or a health insurance plan, he said.
Bad inferences also can directly affect a person’s safety, said Nathalie Maréchal, co-director of privacy and data at the Center for Democracy and Technology. Examples include law enforcement getting inaccurate information about people they are investigating; consumers being excluded from important opportunities through discriminatory ad targeting; and individuals being followed around the web by predatory fraud schemes because data brokers have segmented them as having cognitive disabilities or other vulnerabilities.
Identity check
Consumers can check the inferences made about them by making “subject access requests” directly with data brokers or using other online services. (In the U.S., states like California guarantee the right to such requests.) They will likely be very surprised by what turns up, experts say.
Arielle Garcia left her position as chief privacy and responsibility officer at the international media and advertising behemoth UM Worldwide when she became disillusioned by the industry’s data problems and the pervasiveness of big tech’s influence.
When she ran subject access requests on herself, Garcia was surprised to see that a very large data broker had identified her as a Southeast Asian single mother of two when she is childfree, married and has an ethnicity based in every continent except for Asia, she said. Garcia also found Meta had listed her as a Kazakhstan tourism enthusiast despite her having no special interest in the country.
Data brokers’ reliance on cookie files that track web browsing — which don’t give the full picture of a person and can lead to false assumptions — is part of the problem, she said. However, Garcia also blames the rampant inaccuracies on what she calls an “incentive structure issue.”
“I’ve seen the same individual be placed in a likely Republican donor audience segment and also a likely Democrat donor audience segment,” Garcia said. “The more categories you're in the more likely they are to be able to monetize that.”
Marketers back up Garcia’s assertion.
Joshua Lowcock, president of Quad Media, a data media and services company working with advertisers, said he too is categorized incorrectly, with one large data broker listing him as a New Jersey resident (he lives in Connecticut) and another placing his address on a vacant block in New York.
“Data broker profiles have inferences of me being a heavy drinker and smoker, and an active participant in extreme sports,” Lowcock said via email. “I am none of these.”
He said the incorrect health and lifestyle inferences concern him because “they could potentially be used in discriminatory ways to make decisions that are completely wrong, impacting everything from my ability to access services, to the premiums I am charged, to how companies might score me for risk.”
An experienced marketer, Lowcock called the inaccuracies “a huge problem for advertisers, who are collectively spending billions on data broker audience segments that are riddled with errors.”
Yet when inferences are correct they can be equally damaging and sometimes downright offensive, experts said.
According to a Connecticut attorney general’s office report in February that summarized outcomes of the state’s data privacy law, a national cremation service targeted ads to a state resident who had just finished chemotherapy.
Privacy advocates have long decried how data is extracted from consumers’ web browsing habits without consumers’ consent, even as the advocates recognize that many inferences are wrong.
However, until more stringent data privacy laws take hold, there is one big — and dangerous — exception to the inaccuracy trend, according to Augustine Fou, an independent cybersecurity and ad fraud researcher.
Geolocation data has proven to be scarily accurate and damaging, he said, pointing to the outing of a Catholic priest and women who have visited reproductive health clinics as well as the location of secret military bases.
The Federal Trade Commission has been particularly aggressive in recent months about taking enforcement action against geolocation abuses by data brokers, and the issue has galvanized privacy advocates.
Long-term quality problems
Multiple academic studies have confirmed that third-party data broker inferences can be wildly inaccurate.
Nico Neumann, the author of papers examining junk inferences, said via email that his work across multiple studies shows that “advertising audiences based on third-party cookies are often inaccurate, sometimes barely better than finding the right customers by random chance.”
“This has not changed over the years - in 2024, the quality of consumer profiles sold by many data brokers shows no improvement over what we measured eight years ago,” Neumann added.
The industry is already showing potential signs of battle fatigue. Acxiom recently sent an email blast to marketers offering an unsolicited one-third reduction in pricing for its audience data sets, according to a marketing executive who received the email and did not want to be named due to professional conflicts.
An Acxiom spokesperson declined comment on the solicitation, saying the company has no evidence that it had sent such an email.
Third-party data brokers like Acxiom have been hurt by the ability of Meta, Google and other first-party data brokers — or those who have a direct relationship with the consumers whose data they sell — to extract more accurate data from their users and sell it directly to advertisers. But experts say even the so-called walled garden tech giants sometimes sell junk.
“The opacity of these platforms is such that while they theoretically have better data, they rely on that to make marketers trust that they'll use that data in their interests,” Garcia said.
Garcia, who is now the director of intelligence for the industry watchdog Check My Ads, pointed to recent research showing that Google’s YouTube showed ads for a bank to viewers of a Barbie-themed children’s video in a channel for preschoolers.
Those who clicked on the ad were sent to the bank’s website, leading children to be followed by tracking software from Google, Meta and Microsoft, according to the research from Adalytics, which analyzes ad campaigns for brands.
“The hard thing to talk about is that two things can be true at once: The data and the way it's collected is incredibly invasive and poses a risk of harm to people,” Garcia said. “At the same time, the incentive structures are such that it's not incredibly useful for marketing.”
Garcia believes that junk inferences could ultimately put most brokers out of business.
“At some point, marketers will recognize that they are paying a premium for useless data, and that the real beneficiaries are the data brokers and other ad tech middlemen themselves,” she said. “When that happens, it will be very difficult for most data brokers to sustain their business models.”
However, even if the information gleaned from cookies is often wrong, the collection method underpins the third-party data broker business model. And soon cookies will begin to disappear.
Google is in the process of phasing out third-party cookies in Chrome browsers, which will destroy a key source of information and could pose a major threat to many third-party brokers, experts said.
A data broker customer turned watchdog
The inaccuracy of data broker inferences has given rise to a nascent industry dedicated to helping brokers improve.
Scott McKinley worked at audience-measurement giant Nielsen for seven years and evaluated many data brokers as part of his role developing products for the firm. Over time he noticed that the personal information available to his company to make accurate inferences for advertisers was often poor.
In 2019 he started Truthset, a company focused on helping data brokers understand the quality of their own data so they can improve it.
Like Garcia, McKinley said third-party data brokers are incentivized to hit “scale targets” and place individuals in more categories than are accurate. Worse, there is no way for advertisers to validate the data they are buying from third-party brokers.
“Primarily, it's the third-party data brokers who are feeding everybody this data on which these pretty crappy audiences are built,” McKinley said, noting that his clients include the large and more established third-party data brokers Experian, Epsilon and TransUnion.
“There’s wide differences in the quality of data across the ecosystem,” he added.
The lack of validation and the ad industry’s insatiable appetite for data have led to a “race to the bottom” by most brokers, who rely on web browsing habits from the open internet, which McKinley called “at least half fraud.”
Data from the open internet is “really just a shape shifting world of snake oil,” McKinley said. “It's really, really, really bad.”
Even as wide-scale inaccuracies persist, a “reckoning” is beginning as advertisers slowly catch on to the problem, privacy regulation takes off and cookies begin to disappear, a harbinger of trouble ahead for most third-party data brokers, McKinley said.
“There are billions of IDs that brokers trade in that aren’t about people,” McKinley said. “It's about devices and IP addresses and mobile IDs that are changing constantly.”
Many devices are shared by multiple members of the same family, McKinley said, and one IP address can represent a whole apartment complex.
“We'd analyze how accurate this data was when I was at Nielsen,” McKinley said, “and it was half wrong all the time.”
Suzanne Smalley is a reporter covering privacy, disinformation and cybersecurity policy for The Record. She was previously a cybersecurity reporter at CyberScoop and Reuters. Earlier in her career Suzanne covered the Boston Police Department for the Boston Globe and two presidential campaign cycles for Newsweek. She lives in Washington with her husband and three children.