Rice University researchers have discovered a more efficient way for social media companies to keep misinformation from spreading online using probabilistic filters trained with artificial intelligence.
The new approach to scanning social media is outlined in a study presented today at the online-only 2020 Conference on Neural Information Processing Systems (NeurIPS 2020) by Rice computer scientist Anshumali Shrivastava and statistics graduate student Zhenwei Dai. Their method applies machine learning in a smarter way to improve the performance of Bloom filters, a widely used technique devised a half-century ago.
Using test databases of fake news stories and computer viruses, Shrivastava and Dai showed their Adaptive Learned Bloom Filter (Ada-BF) required 50% less memory to achieve the same level of performance as learned Bloom filters.
To explain their filtering approach, Shrivastava and Dai cited some data from Twitter. The social media giant recently revealed that its users added about 500 million tweets a day, and tweets typically appeared online one second after a user hit send.
“Around the time of the election they were getting about 10,000 tweets a second, and with a one-second latency that’s about six tweets per millisecond,” Shrivastava said. “If you want to apply a filter that reads every tweet and flags the ones with information that’s known to be fake, your flagging mechanism cannot be slower than six milliseconds or you will fall behind and never catch up.”
If flagged tweets are sent for an additional, manual review, it’s also vitally important to have a low false-positive rate. In other words, you need to minimize how many genuine tweets are flagged by mistake.
“If your false-positive rate is as low as 0.1%, even then you are mistakenly flagging 10 tweets per second, or more than 800,000 per day, for manual review,” he said. “This is precisely why most of the traditional AI-only approaches are prohibitive for controlling the misinformation.”
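The arithmetic behind that estimate can be checked directly (the 10,000 tweets-per-second figure is from the quote above; the rest follows):

```python
# Back-of-the-envelope check of the false-positive volume quoted above.
tweets_per_second = 10_000          # election-period peak cited by Shrivastava
false_positive_rate = 0.001         # 0.1%
seconds_per_day = 86_400

flagged_per_second = tweets_per_second * false_positive_rate   # ~10 tweets/s
flagged_per_day = flagged_per_second * seconds_per_day         # ~864,000/day,
                                                               # i.e. "more than 800,000"
print(flagged_per_second, flagged_per_day)
```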
Shrivastava said Twitter doesn’t disclose its methods for filtering tweets, but it is believed to employ a Bloom filter, a low-memory technique invented in 1970 for checking whether a specific data element, like a piece of computer code, belongs to a known set of elements, like a database of known computer viruses. A Bloom filter is guaranteed to find all code that matches the database, but it also records some false positives.
“Let’s say you’ve identified a piece of misinformation, and you want to make sure it is not spread in tweets,” Shrivastava said. “A Bloom filter allows you to check tweets very quickly, in a millionth of a second or less. If it says a tweet is clean, that it does not match anything in your database of misinformation, that’s 100% guaranteed. So there is no chance of OK’ing a tweet with known misinformation. But the Bloom filter will flag harmless tweets a fraction of the time.”
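The behavior Shrivastava describes, guaranteed detection of known items with occasional false alarms, can be illustrated with a minimal Bloom filter. This is a generic sketch, not Twitter's implementation; the class name, sizes, and SHA-256-based hashing are all illustrative choices:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests of the item.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # False means "definitely not in the set"; True means "probably in it".
        return all(self.bits[pos] for pos in self._positions(item))

# Usage: index known misinformation, then screen incoming text against it.
blocklist = BloomFilter()
blocklist.add("known fake story")
print(blocklist.might_contain("known fake story"))  # True -- guaranteed
print(blocklist.might_contain("an ordinary tweet")) # almost certainly False
```

Every bit set by `add` stays set, which is why a match against the database can never be missed; false positives arise only when an unrelated item happens to hash onto bits that are already set.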
Within the past three years, researchers have offered various schemes for using machine learning to augment Bloom filters and improve their efficiency. Language recognition software can be trained to recognize and approve most tweets, reducing the volume that needs to be processed with the Bloom filter. Machine learning classifiers can lower the computational overhead needed to filter data, allowing companies to process more information in less time with the same resources.
“When people use machine learning models today, they waste a lot of useful information that’s coming from the machine learning model,” Dai said.
The typical approach is to set a tolerance threshold and send everything that falls below it to the Bloom filter. If the confidence threshold is 85%, information the classifier deems safe with 80% confidence receives the same level of scrutiny as information the classifier is only 10% sure about.
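That single-cutoff scheme can be sketched as follows. The function and variable names are hypothetical, and the set-based `backup_filter` stands in for a real Bloom filter:

```python
THRESHOLD = 0.85  # tweets the model deems this "safe" skip the backup filter

def screen(tweet, safety_score, backup_filter):
    """Single-threshold learned Bloom filter (sketch).

    safety_score: model confidence that the tweet is safe, in [0, 1].
    backup_filter: membership test over a database of known misinformation.
    """
    if safety_score >= THRESHOLD:
        return "clean"  # trusted on model confidence alone
    # Everything below the cutoff receives identical scrutiny, whether the
    # model was 80% sure or 10% sure -- that information is discarded.
    return "flag" if backup_filter(tweet) else "clean"

# Toy stand-ins for the trained classifier and the Bloom filter:
known_fakes = {"fake story"}
print(screen("fake story", 0.40, known_fakes.__contains__))  # "flag"
```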
“Even though we cannot completely rely on the machine-learning classifier, it is still giving us valuable information that can reduce the amount of Bloom filter resources,” Dai said. “What we’ve done is apply those resources probabilistically. We give more resources when the classifier is only 10% confident versus slightly less when it is 20% confident and so on. We take the whole spectrum of the classifier and resolve it with the whole spectrum of resources that can be allocated from the Bloom filter.”
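The graded allocation Dai describes can be sketched by mapping classifier confidence to Bloom filter effort, for instance to the number of hash probes spent per item. The region boundaries and hash counts below are illustrative; the Ada-BF paper tunes them systematically:

```python
def hashes_for_score(score):
    """Map classifier confidence to Bloom-filter effort (illustrative numbers).

    The less confident the model is that an item is safe, the more hash
    probes it gets -- i.e., a lower false-positive rate -- at the cost of
    more bit-array lookups and more bits effectively spent on that item.
    """
    if score < 0.2:
        return 8   # least confident: heaviest scrutiny
    elif score < 0.5:
        return 5
    elif score < 0.85:
        return 3
    else:
        return 0   # most confident: bypass the filter entirely

print(hashes_for_score(0.1))  # 8
print(hashes_for_score(0.9))  # 0
```

Compared with the single-threshold scheme, the same memory budget is spread unevenly across the confidence spectrum, which is where the reported 50% memory saving comes from.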
Shrivastava said Ada-BF’s reduced need for memory translates directly to added capacity for real-time filtering systems.
“We need half of the space,” he said. “So essentially, we can handle twice as much information with the same resource.”