System that replaces human intuition with algorithms outperforms 615 of 906 human teams
Big-data analysis consists of searching for buried patterns that have some kind of predictive power. But choosing which “features” of the data to analyze usually requires some human intuition. In a database containing, say, the beginning and end dates of various sales promotions and weekly profits, the crucial data may not be the dates themselves but the spans between them, or not the total profits but the averages across those spans.
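The promotions example can be made concrete with a small sketch. It is only an illustration of hand-built feature derivation, not the researchers' system: the record layout, the names `promotions`, `weekly_profits`, and `derive_features`, and all of the figures are hypothetical.

```python
from datetime import date

# Hypothetical raw data: promotion start/end dates and profits keyed by week start.
promotions = [
    {"name": "spring_sale", "start": date(2015, 3, 2), "end": date(2015, 3, 23)},
    {"name": "summer_sale", "start": date(2015, 6, 1), "end": date(2015, 6, 15)},
]
weekly_profits = {
    date(2015, 3, 2): 1000.0,
    date(2015, 3, 9): 1400.0,
    date(2015, 3, 16): 1200.0,
    date(2015, 6, 1): 900.0,
    date(2015, 6, 8): 1100.0,
}


def derive_features(promo, profits):
    """Turn raw dates and totals into the span and the average across that span."""
    # The span between the dates, rather than the dates themselves.
    span_days = (promo["end"] - promo["start"]).days
    # Profits for weeks that begin inside the promotion window (end date excluded).
    in_span = [p for week, p in profits.items() if promo["start"] <= week < promo["end"]]
    # The average across the span, rather than the total.
    avg_profit = sum(in_span) / len(in_span) if in_span else None
    return {"name": promo["name"], "span_days": span_days, "avg_weekly_profit": avg_profit}


features = [derive_features(p, weekly_profits) for p in promotions]
for row in features:
    print(row)
```

Neither derived column exists in the raw records. Deciding that they are worth computing is the step that usually depends on human intuition.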
MIT researchers aim to take the human element out of big-data analysis, with a new system that not only searches for patterns but designs the feature set, too. To test the first prototype of their system, they enrolled it in three data science competitions, in which it competed against human teams to find predictive patterns in unfamiliar data sets. Of the 906 teams participating in the three competitions, the researchers’ “Data Science Machine” finished ahead of 615.
In two of the three competitions, the predictions made by the Data Science Machine were 94 percent and 96 percent as accurate as the winning submissions. In the third, the figure was a more modest 87 percent. But where the teams of humans typically labored over their prediction algorithms for months, the Data Science Machine took somewhere between two and 12 hours to produce each of its entries.
“We view the Data Science Machine as a natural complement to human intelligence,” says Max Kanter, whose MIT master’s thesis in computer science is the basis of the Data Science Machine. “There’s so much data out there to be analyzed. And right now it’s just sitting there not doing anything. So maybe we can come up with a solution that will at least get us started on it, at least get us moving.”
Between the lines
Kanter and his thesis advisor, Kalyan Veeramachaneni, a research scientist at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), describe the Data Science Machine in a paper that Kanter will present next week at the IEEE International Conference on Data Science and Advanced Analytics.
Veeramachaneni co-leads the Anyscale Learning for All group at CSAIL, which applies machine-learning techniques to practical problems in big-data analysis, such as determining the power-generation capacity of wind-farm sites or predicting which students are at risk of dropping out of online courses.
“What we observed from our experience solving a number of data science problems for industry is that one of the very critical steps is called feature engineering,” Veeramachaneni says. “The first thing you have to do is identify what variables to extract from the database or compose, and for that, you have to come up with a lot of ideas.”
In predicting dropout, for instance, two crucial indicators proved to be how long before a deadline a student begins working on a problem set and how much time the student spends on the course website relative to his or her classmates. MIT’s online-learning platform MITx doesn’t record either of those statistics, but it does collect data from which they can be inferred.
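As a rough sketch of how such indicators might be inferred from data a platform does collect, the example below derives both from an activity log. The log format, the event names, and the function `dropout_indicators` are assumptions made for illustration. They do not describe MITx's actual schema or the researchers' method.

```python
from datetime import datetime
from statistics import mean

# Hypothetical activity log: (student, timestamp, event kind, seconds active).
events = [
    ("alice", datetime(2015, 10, 17, 12, 0), "problem_open", 1800),
    ("alice", datetime(2015, 10, 18, 12, 0), "page_view", 5400),
    ("bob", datetime(2015, 10, 20, 18, 0), "problem_open", 600),
    ("bob", datetime(2015, 10, 20, 19, 0), "page_view", 1200),
]
deadline = datetime(2015, 10, 21, 0, 0)  # hypothetical problem-set deadline


def dropout_indicators(events, deadline):
    """Infer two indicators that the log does not store directly."""
    first_start = {}    # earliest time each student opened the problem set
    total_seconds = {}  # total time each student spent on the site
    for student, when, kind, seconds in events:
        total_seconds[student] = total_seconds.get(student, 0) + seconds
        if kind == "problem_open":
            if student not in first_start or when < first_start[student]:
                first_start[student] = when
    class_mean = mean(total_seconds.values())
    indicators = {}
    for student, seconds in total_seconds.items():
        started = first_start.get(student)
        indicators[student] = {
            # How long before the deadline the student began working, in hours.
            "lead_hours": (deadline - started).total_seconds() / 3600 if started else None,
            # Time on the site relative to the class average (1.0 means average).
            "relative_time": seconds / class_mean,
        }
    return indicators


indicators = dropout_indicators(events, deadline)
for student, values in indicators.items():
    print(student, values)
```

In this toy log, the student who starts late also spends less time on the site than the class average, so both derived values point the same way.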