System that replaces human intuition with algorithms outperforms 615 of 906 human teams
Big-data analysis consists of searching for buried patterns that have some kind of predictive power. But choosing which “features” of the data to analyze usually requires some human intuition. In a database containing, say, the beginning and end dates of various sales promotions and weekly profits, the crucial data may not be the dates themselves but the spans between them, or not the total profits but the averages across those spans.
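To make the sales-promotion example concrete, here is a minimal sketch in Python of deriving the two features mentioned above: the span of each promotion and the average weekly profit across that span. The promotion names, dates, profit figures, and the `derive_features` helper are all hypothetical and hand-written for illustration; this is not the researchers' system, which is meant to propose such features automatically.

```python
from datetime import date
from statistics import mean

# Hypothetical raw data: promotion start/end dates and weekly profits
# keyed by the date each week begins. None of these values are from the article.
promotions = [
    {"name": "spring_sale", "start": date(2015, 3, 2), "end": date(2015, 3, 22)},
    {"name": "summer_sale", "start": date(2015, 6, 1), "end": date(2015, 6, 14)},
]
weekly_profits = {
    date(2015, 3, 2): 1200.0,
    date(2015, 3, 9): 1500.0,
    date(2015, 3, 16): 1800.0,
    date(2015, 6, 1): 900.0,
    date(2015, 6, 8): 1100.0,
}


def derive_features(promo, profits):
    """Turn raw dates and profits into span-based features (illustrative helper)."""
    # Span between the dates, counting both the first and last day.
    span_days = (promo["end"] - promo["start"]).days + 1
    # Profits for weeks that begin inside the promotion window.
    in_span = [p for week, p in profits.items()
               if promo["start"] <= week <= promo["end"]]
    # Average across the span rather than the total.
    avg_weekly_profit = mean(in_span) if in_span else None
    return {
        "name": promo["name"],
        "span_days": span_days,
        "avg_weekly_profit": avg_weekly_profit,
    }


features = [derive_features(p, weekly_profits) for p in promotions]
for row in features:
    print(row)
```

Neither `span_days` nor `avg_weekly_profit` appears in the raw records; each has to be composed from them, which is the step that usually depends on an analyst's intuition.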
MIT researchers aim to take the human element out of big-data analysis, with a new system that not only searches for patterns but designs the feature set, too. To test the first prototype of their system, they enrolled it in three data science competitions, in which it competed against human teams to find predictive patterns in unfamiliar data sets. Of the 906 teams participating in the three competitions, the researchers’ “Data Science Machine” finished ahead of 615.
In two of the three competitions, the predictions made by the Data Science Machine were 94 percent and 96 percent as accurate as the winning submissions. In the third, the figure was a more modest 87 percent. But where the teams of humans typically labored over their prediction algorithms for months, the Data Science Machine took somewhere between two and 12 hours to produce each of its entries.
“We view the Data Science Machine as a natural complement to human intelligence,” says Max Kanter, whose MIT master’s thesis in computer science is the basis of the Data Science Machine. “There’s so much data out there to be analyzed. And right now it’s just sitting there not doing anything. So maybe we can come up with a solution that will at least get us started on it, at least get us moving.”
Between the lines
Kanter and his thesis advisor, Kalyan Veeramachaneni, a research scientist at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), describe the Data Science Machine in a paper that Kanter will present next week at the IEEE International Conference on Data Science and Advanced Analytics.
Veeramachaneni co-leads the Anyscale Learning for All group at CSAIL, which applies machine-learning techniques to practical problems in big-data analysis, such as determining the power-generation capacity of wind-farm sites or predicting which students are at risk for dropping out of online courses.
“What we observed from our experience solving a number of data science problems for industry is that one of the very critical steps is called feature engineering,” Veeramachaneni says. “The first thing you have to do is identify what variables to extract from the database or compose, and for that, you have to come up with a lot of ideas.”
In predicting dropout, for instance, two crucial indicators proved to be how long before a deadline a student begins working on a problem set and how much time the student spends on the course website relative to his or her classmates. MIT’s online-learning platform MITx doesn’t record either of those statistics, but it does collect data from which they can be inferred.
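As a rough illustration of how such indicators might be inferred from data that is collected, the Python sketch below computes both from a simple activity log. The log layout (student, timestamp, minutes on site), the deadline, and every value in it are assumptions made for this example; the article does not describe what MITx actually records or how the researchers computed these statistics.

```python
from datetime import datetime
from statistics import mean

# Hypothetical deadline and activity log: (student, timestamp, minutes_on_site).
deadline = datetime(2015, 10, 16, 23, 59)
events = [
    ("alice", datetime(2015, 10, 12, 9, 0), 30),
    ("alice", datetime(2015, 10, 14, 20, 0), 90),
    ("bob", datetime(2015, 10, 16, 21, 59), 20),
    ("carol", datetime(2015, 10, 15, 23, 59), 70),
]

# Earliest activity and total time on site per student.
first_seen = {}
total_minutes = {}
for student, timestamp, minutes in events:
    if student not in first_seen or timestamp < first_seen[student]:
        first_seen[student] = timestamp
    total_minutes[student] = total_minutes.get(student, 0) + minutes

# Class-wide average, used to express each student's time relative to classmates.
class_mean = mean(total_minutes.values())

features = {}
for student in first_seen:
    lead = deadline - first_seen[student]
    features[student] = {
        # How long before the deadline the student began working.
        "hours_before_deadline": lead.total_seconds() / 3600,
        # Time on site relative to the class average (1.0 means average).
        "relative_time_on_site": total_minutes[student] / class_mean,
    }

for student, row in features.items():
    print(student, row)
```

In this made-up log, a student who starts two hours before the deadline and spends well under the class average on the site would stand out on both indicators, even though neither number is stored directly.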