A*STAR researchers present a new machine learning framework to solve big-data problems.
Helping computers learn to tackle big-data problems outside their comfort zones
Imagine combing through thousands of mugshots desperately looking for a match. If time is of the essence, the faster you can do this, the better. A*STAR researchers have developed a framework that could help computers learn how to process and identify these images both faster and more accurately [1].
Peng Xi of the A*STAR Institute for Infocomm Research notes that the framework can be used for numerous applications, including image segmentation, motion segmentation, data clustering, hybrid system identification and image representation.
A conventional way that computers process data is called representation learning. This involves identifying a feature that allows the program to quickly extract relevant information from the dataset and categorize it — a bit like a shortcut. Supervised and unsupervised learning are two of the main methods used in representation learning. Unlike supervised learning, which relies on costly labeling of data prior to processing, unsupervised learning involves grouping or ‘clustering’ data in a similar manner to our brains, explains Peng.
Subspace clustering is a form of unsupervised learning that seeks to fit each data point into a low-dimensional subspace to find an intrinsic simplicity that makes complex, real-world data tractable. Existing subspace clustering methods struggle to handle ‘out-of-sample’, or unknown, data points and the large datasets that are common today.
“One of the challenges of the big-data era is to organize out-of-sample data using a machine learning model based on ‘in-sample’, or known, observational data,” explains Peng, who, with his colleagues, has proposed three methods as part of a unified framework to tackle this issue. The methods differ in how they implement representation learning: one focuses on sparsity, while the other two focus on low rank and grouping effects. “By solving the large-scale data and out-of-sample clustering problems, our method makes big-data clustering and online learning possible,” notes Peng.
The framework devised by the team splits the input data into ‘in-sample’ and ‘out-of-sample’ data during an initial ‘sampling’ step. Next, the in-sample data are grouped into subspaces during the ‘clustering’ step. Each out-of-sample point is then assigned to its nearest subspace and designated a member of the corresponding cluster.
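The sampling and assignment steps described above can be illustrated with a minimal sketch. It is not the team's implementation: the function names (`split_samples`, `fit_subspace_bases`, `assign_to_nearest_subspace`) are hypothetical, the subspaces are assumed to pass through the origin, and the clustering step is not implemented at all. Here the in-sample cluster labels are simply taken from the synthetic data's ground truth, standing in for whatever subspace clustering method would produce them. "Nearest" is taken to mean the smallest projection residual, which is one plausible reading rather than a detail stated in the article.

```python
import numpy as np

def split_samples(X, n_in, rng):
    """Sampling step: pick n_in random rows as in-sample, the rest as out-of-sample."""
    idx = rng.permutation(len(X))
    return idx[:n_in], idx[n_in:]

def fit_subspace_bases(X_in, labels, dim):
    """Fit an orthonormal basis per cluster from its top `dim` right singular vectors."""
    bases = {}
    for k in np.unique(labels):
        _, _, vt = np.linalg.svd(X_in[labels == k], full_matrices=False)
        bases[int(k)] = vt[:dim].T  # shape: (n_features, dim)
    return bases

def assign_to_nearest_subspace(X_out, bases):
    """Label each point with the cluster whose subspace leaves the smallest residual."""
    keys = sorted(bases)
    residuals = np.stack(
        [np.linalg.norm(X_out - (X_out @ bases[k]) @ bases[k].T, axis=1) for k in keys],
        axis=1,
    )
    return np.array(keys)[residuals.argmin(axis=1)]

# Synthetic data (an assumption for illustration): two noisy 1-D subspaces in 3-D.
rng = np.random.default_rng(0)
n = 100
true_labels = np.repeat([0, 1], n)
directions = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
scale = rng.uniform(1.0, 2.0, size=(2 * n, 1)) * rng.choice([-1.0, 1.0], size=(2 * n, 1))
X = scale * directions[true_labels] + 0.01 * rng.standard_normal((2 * n, 3))

in_idx, out_idx = split_samples(X, n_in=40, rng=rng)
# Stand-in for the clustering step: reuse the known labels of the in-sample points.
bases = fit_subspace_bases(X[in_idx], true_labels[in_idx], dim=1)
pred = assign_to_nearest_subspace(X[out_idx], bases)
accuracy = float((pred == true_labels[out_idx]).mean())
print(f"out-of-sample points: {len(out_idx)}, accuracy: {accuracy:.2f}")
```

The point of this design is that only the small in-sample set goes through the expensive clustering step. Every other point costs one projection per subspace, which is the kind of saving the article attributes to the framework.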
The team tested their approach on datasets spanning several types of information: facial images, handwritten and digital text, poker hands and forest coverage. They found that their methods outperformed existing algorithms and reduced the computational complexity (and hence running time) of the task while still ensuring cluster quality.
Learn more: Thinking outside the sample