Automating big-data analysis

System that replaces human intuition with algorithms outperforms 615 of 906 human teams

Big-data analysis consists of searching for buried patterns that have some kind of predictive power. But choosing which “features” of the data to analyze usually requires some human intuition. In a database containing, say, the beginning and end dates of various sales promotions and weekly profits, the crucial data may not be the dates themselves but the spans between them, or not the total profits but the averages across those spans.
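The sales-promotion example can be made concrete with a short sketch. The record layout, field names, and figures below are hypothetical, chosen only to illustrate the idea; they are not taken from the researchers' system. The sketch turns raw start and end dates into a span, and a total profit into an average over that span.

```python
from datetime import date

# Hypothetical raw records: promotion start/end dates and total profit.
promotions = [
    {"start": date(2015, 3, 2), "end": date(2015, 3, 15), "total_profit": 2800.0},
    {"start": date(2015, 6, 1), "end": date(2015, 6, 28), "total_profit": 4200.0},
]


def derive_features(promo):
    """Derive the span of a promotion and its average weekly profit."""
    # Count both the first and the last day of the promotion.
    span_days = (promo["end"] - promo["start"]).days + 1
    span_weeks = span_days / 7
    return {
        "span_days": span_days,
        "avg_weekly_profit": promo["total_profit"] / span_weeks,
    }


features = [derive_features(p) for p in promotions]
for row in features:
    print(row)
```

In this made-up data the second promotion earns more in total but less per week, which is the kind of signal that the raw dates and totals would hide.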

MIT researchers aim to take the human element out of big-data analysis, with a new system that not only searches for patterns but designs the feature set, too. To test the first prototype of their system, they enrolled it in three data science competitions, in which it competed against human teams to find predictive patterns in unfamiliar data sets. Of the 906 teams participating in the three competitions, the researchers’ “Data Science Machine” finished ahead of 615.

In two of the three competitions, the predictions made by the Data Science Machine were 94 percent and 96 percent as accurate as the winning submissions. In the third, the figure was a more modest 87 percent. But where the teams of humans typically labored over their prediction algorithms for months, the Data Science Machine took somewhere between two and 12 hours to produce each of its entries.

“We view the Data Science Machine as a natural complement to human intelligence,” says Max Kanter, whose MIT master’s thesis in computer science is the basis of the Data Science Machine. “There’s so much data out there to be analyzed. And right now it’s just sitting there not doing anything. So maybe we can come up with a solution that will at least get us started on it, at least get us moving.”

Between the lines

Kanter and his thesis advisor, Kalyan Veeramachaneni, a research scientist at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), describe the Data Science Machine in a paper that Kanter will present next week at the IEEE International Conference on Data Science and Advanced Analytics.

Veeramachaneni co-leads the Anyscale Learning for All group at CSAIL, which applies machine-learning techniques to practical problems in big-data analysis, such as determining the power-generation capacity of wind-farm sites or predicting which students are at risk for dropping out of online courses.

“What we observed from our experience solving a number of data science problems for industry is that one of the very critical steps is called feature engineering,” Veeramachaneni says. “The first thing you have to do is identify what variables to extract from the database or compose, and for that, you have to come up with a lot of ideas.”

In predicting dropout, for instance, two crucial indicators proved to be how long before a deadline a student begins working on a problem set and how much time the student spends on the course website relative to his or her classmates. MIT’s online-learning platform MITx doesn’t record either of those statistics, but it does collect data from which they can be inferred.
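As a rough illustration of how such indicators might be inferred from recorded data, the sketch below computes both from an event log. The log format, student names, and numbers are assumptions made for this example; the article does not describe what MITx actually stores, and the Data Science Machine derives its features automatically rather than by hand-written rules like these.

```python
from datetime import datetime
from statistics import mean

# Hypothetical problem-set deadline.
deadline = datetime(2015, 10, 16, 12, 0)

# Hypothetical log of problem-set interactions: (student, timestamp).
problem_events = [
    ("alice", datetime(2015, 10, 14, 12, 0)),
    ("alice", datetime(2015, 10, 15, 18, 30)),
    ("bob", datetime(2015, 10, 16, 9, 0)),
]

# Hypothetical log of website sessions: (student, minutes on site).
sessions = [("alice", 90), ("alice", 30), ("bob", 40)]

# Earliest problem-set interaction per student.
first_event = {}
for student, ts in problem_events:
    if student not in first_event or ts < first_event[student]:
        first_event[student] = ts

# Total minutes on the course website per student.
total_minutes = {}
for student, minutes in sessions:
    total_minutes[student] = total_minutes.get(student, 0) + minutes

class_mean = mean(total_minutes.values())

indicators = {
    student: {
        # How long before the deadline the student started working.
        "hours_before_deadline": (deadline - first_event[student]).total_seconds() / 3600,
        # Time on site relative to the class average (1.0 means average).
        "relative_time_on_site": total_minutes[student] / class_mean,
    }
    for student in first_event
}

for student, values in indicators.items():
    print(student, values)
```

Neither value is stored in the hypothetical logs directly; both are computed from timestamps and session lengths that are.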
