Musk got the Twitter data, now comes the hard part

by time news

Elon Musk has been given access to the Twitter data he said he needed to complete the $44 billion purchase, but data scientists and other experts doubt that this stream of data will give Musk the definitive answers he seeks about the number of fake accounts on the social network.

Following a legal exchange between the parties, Twitter has in recent weeks sent Musk historical tweet data and access to what the company calls the “firehose” of tweets, people familiar with the matter said. The firehose shows all tweets in real time; according to the company, people tweet hundreds of millions of times a day on the platform.

Musk’s access to the data may pave the way for completing the transaction. He said the deal would not go ahead unless he could see these figures to assess the company’s claims about the number of accounts on the social network that are fake or spam. Twitter has long published an estimate that fake or spam accounts make up less than 5% of its monetizable daily active users, the daily users who can be shown ads. The company recently estimated that it has 229 million such users. Musk says he believes the share of fake accounts is closer to 20%.

An impractical and data-laden stream of tweets

The nature of the data coming out of the firehose, both its volume and the limitations of the data stream, makes it difficult for Musk or anyone else to reach clear findings in a short period of time that could show whether Twitter’s estimates of fake and spam accounts are accurate. It will also be difficult to compare any assessment that is made with the estimates Twitter has published, because Twitter has its own protocol for defining accounts as fake.

The Twitter firehose is a stream of tweets containing so much data that it is impractical to analyze it in search of spam, said Micah Schaffer, a consultant to social media companies on trust and safety issues who previously worked at YouTube and Snap.

Giving Musk access to the firehose is “more of a ‘take it, shut up and go away’ move than a real concession,” he said. Twitter has already explained to Musk how it calculates the number of daily users who can be shown ads, said one person familiar with the matter.

Last month, a few weeks after agreeing to buy Twitter, Musk said the purchase was “temporarily on hold” over concerns about fake accounts, prompting some observers to conclude that Musk may be trying to negotiate a lower price or withdraw from the deal.

Earlier this month, Tesla’s CEO threatened to cancel the deal if Twitter did not provide him with all the data he requested. In response, Twitter said it would “continue to cooperatively share information” with Musk.

People who have researched Twitter’s data say that digesting it in real time is a huge challenge because of the volume of data and the scale of the resources needed to analyze it, especially computing power, infrastructure and expertise. About a dozen companies have paid for access to the firehose over the years, said a person familiar with the matter.

“The average company would be drowning in data,” said Rahul Telang, a professor of information systems at Carnegie Mellon University’s Heinz College. Musk has not said how he will perform his analysis, though as the richest person in the world he has the resources to hire enough data analysts to complete the analysis within about a month, Telang said.

Musk’s assessment of fake accounts will not be the same as Twitter’s

With Twitter’s firehose, Musk will be able to find some instances of behavior that is likely to indicate fake or spam accounts, such as accounts that post more tweets than a human could post in a short period of time, said Tamer Hassan, CEO of Human Security Inc. However, such findings could also include automated accounts that tweet entertaining or useful information, he said, such as weather updates or photos of cute animals.
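
The posting-rate signal Hassan describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the threshold of 20 tweets per minute, the function name and the sample accounts are all hypothetical, and nothing here reflects how Twitter or Musk's analysts actually screen accounts.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def flag_high_rate_accounts(tweets, window=timedelta(minutes=1), max_tweets=20):
    """Return account ids that post more than max_tweets within any window.

    tweets is an iterable of (account_id, timestamp) pairs.
    The window and threshold are illustrative assumptions, not Twitter's rules.
    """
    by_account = defaultdict(list)
    for account_id, timestamp in tweets:
        by_account[account_id].append(timestamp)

    flagged = set()
    for account_id, stamps in by_account.items():
        stamps.sort()
        start = 0
        for end in range(len(stamps)):
            # Shrink the window from the left until it spans at most `window`.
            while stamps[end] - stamps[start] > window:
                start += 1
            if end - start + 1 > max_tweets:
                flagged.add(account_id)
                break
    return flagged


# Hypothetical sample: an automated weather feed and an occasional human poster.
base = datetime(2022, 6, 1, 12, 0, 0)
tweets = [("weather_bot", base + timedelta(seconds=i)) for i in range(30)]
tweets += [("human", base + timedelta(minutes=10 * i)) for i in range(3)]

flagged = flag_high_rate_accounts(tweets)
print(flagged)
```

Note that the sketch flags the weather feed, which is exactly the kind of useful automated account Hassan warns such a rule would sweep up: a high posting rate indicates automation, not necessarily spam.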

At the same time, Twitter’s firehose does not include some of the information that could help verify that specific accounts belong to humans, such as IP addresses, phone numbers and other personal information.

If Musk comes up with his own estimate of the share of fake accounts, it will most likely not be directly comparable to Twitter’s estimates. Twitter has already said the figure it publishes is based on multiple human reviews of thousands of randomly sampled accounts, checked against user information the company has not disclosed.
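
To see what a random sample of thousands of accounts can and cannot settle, here is a minimal Python sketch of the margin of error on a sampled proportion, using the standard normal approximation. The sample size of 9,000 and the count of 450 flagged accounts are made-up numbers chosen to give a 5% estimate; they are not figures from Twitter or from this article.

```python
import math


def proportion_interval(flagged, sample_size, z=1.96):
    """Point estimate and approximate 95% interval for a sampled proportion.

    Uses the normal approximation; z=1.96 corresponds to about 95% confidence.
    """
    p = flagged / sample_size
    half_width = z * math.sqrt(p * (1 - p) / sample_size)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical numbers: 450 of 9,000 sampled accounts judged fake or spam.
estimate, low, high = proportion_interval(flagged=450, sample_size=9000)
print(f"estimate {estimate:.1%}, interval {low:.2%} to {high:.2%}")
```

Under these assumed numbers the sampling error is well under one percentage point, which suggests that a gap as wide as 5% versus 20% would more likely come from different definitions or review procedures than from sampling noise alone.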

Musk “will have to reliably replicate their process in order to challenge the method of calculation they use,” said Schaffer, the social media consultant.

The limitations of the firehose data can also skew any estimate of the percentage of real users. The firehose has no data on users who come to the platform to read tweets but do not tweet themselves, probably a large share of the platform’s users, said John Kelly, CEO of social media analytics firm Graphika. This means the tool cannot be used to estimate the total number of users, from which an estimate of the share of fake accounts could be derived.

“It is not enough to estimate the number of monetizable daily users on the platform who are not human,” he said.

What is the definition of a fake account, anyway?

Twitter and Musk will have to agree on a definition of what constitutes a fake or spam account, said J. Nathan Matias, an assistant professor of communication at Cornell University who researches social networks and other technology platforms. There is no universal definition of these terms, and companies often do not publish their own definitions because such information can be used to circumvent their protections, Matias said.

“If Musk and his team decide they want to reach different results than what Twitter came up with, it will be very easy for them to do so,” Matias said. “But any other team can come and challenge the definitions of Musk and his team, because there is no uniform standard.”

Because of the volume of data and the different ways in which it can be interpreted, a discrepancy between Musk’s and Twitter’s bot figures would not be unusual or surprising, data experts say, but it may be large enough to change the trajectory of the acquisition or its terms.

“It’s going to be very difficult to reach a level of certainty that will allow Musk to take a defensive stance or take any other action,” said Carey O’Connor Kolaja, CEO of Au10tix.

Cara Lombardo participated in the preparation of the article.
