Linguists, computer scientists use TACC supercomputers to improve natural language processing
It’s not hard to tell the difference between the “charge” of a battery and criminal “charges.” But for computers, distinguishing between the various meanings of a word is difficult.
For more than 50 years, linguists and computer scientists have tried to get computers to understand human language by programming semantics as software. Driven initially by efforts to translate Russian scientific texts during the Cold War (and more recently by the value of information retrieval and data analysis tools), these efforts have met with mixed success. IBM’s Jeopardy-winning Watson system and Google Translate are high-profile, successful applications of language technologies, but the humorous answers and mistranslations they sometimes produce show how difficult the problem remains.
Our ability to easily distinguish between multiple word meanings is rooted in a lifetime of experience. Using the context in which a word is used, an intrinsic understanding of syntax and logic, and a sense of the speaker’s intention, we intuit what another person is telling us.
“In the past, people have tried to hand-code all of this knowledge,” explained Katrin Erk, a professor of linguistics at The University of Texas at Austin focusing on lexical semantics. “I think it’s fair to say that this hasn’t been successful. There are just too many little things that humans know.”
Other efforts have tried to use dictionary meanings to train computers to better understand language, but these attempts have also faced obstacles. Dictionaries have their own sense distinctions, which are crystal clear to the dictionary-maker but murky to the dictionary reader. Moreover, no two dictionaries provide the same set of meanings — frustrating, right?
Watching annotators struggle to make sense of conflicting definitions led Erk to try a different tactic. Instead of hard-coding human logic or deciphering dictionaries, why not mine a vast body of texts (which are a reflection of human knowledge) and use the implicit connections between the words to create a weighted map of relationships — a dictionary without a dictionary?
“An intuition for me was that you could visualize the different meanings of a word as points in space,” she said. “You could think of them as sometimes far apart, like a battery charge and criminal charges, and sometimes close together, like criminal charges and accusations (‘the newspaper published charges…’). The meaning of a word in a particular context is a point in this space. Then we don’t have to say how many senses a word has. Instead we say: ‘This use of the word is close to this usage in another sentence, but far away from the third use.’”
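The picture Erk describes can be made concrete with a small sketch. It assumes each use of a word is represented by a vector of counts of nearby words, and it measures closeness with cosine similarity, one common choice. The context words and the numbers below are invented for illustration; they are not taken from Erk's data or model.

```python
import math

# Hypothetical context-word dimensions; a real model would have many more.
CONTEXT_WORDS = ["battery", "volt", "court", "police", "newspaper"]

# Invented toy counts for three uses of "charge" (illustration only).
usages = {
    "battery charge":    [8, 5, 0, 0, 0],
    "criminal charges":  [0, 0, 7, 6, 1],
    "published charges": [0, 0, 3, 2, 6],
}

def cosine_similarity(u, v):
    """Return the cosine of the angle between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty context has no direction to compare
    return dot / (norm_u * norm_v)

# Uses that share no context words are far apart (similarity 0).
sim_far = cosine_similarity(usages["battery charge"], usages["criminal charges"])

# Uses that share context words sit closer together.
sim_near = cosine_similarity(usages["criminal charges"], usages["published charges"])

print(f"battery vs. criminal:   {sim_far:.2f}")
print(f"criminal vs. published: {sim_near:.2f}")
```

In this toy setup nobody has to decide how many senses "charge" has. The code only reports that two uses are near each other and a third is far away.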
Creating a model that can accurately recreate the intuitive ability to distinguish word meanings requires a lot of text and a lot of analytical horsepower.
“The lower end for this kind of research is a text collection of 100 million words,” she explained. “If you can give me a few billion words, I’d be much happier. But how can we process all of that information? That’s where supercomputers and Hadoop come in.”
Applying Computational Horsepower
Erk initially conducted her research on desktop computers, but around 2009, she began using the parallel computing systems at the Texas Advanced Computing Center (TACC). Access to a special Hadoop-optimized subsystem on TACC’s Longhorn supercomputer allowed Erk and her collaborators to expand the scope of their research. Hadoop is a software architecture well suited to text analysis and the mining of unstructured data, and it can take advantage of large computer clusters. Computational models that take weeks to run on a desktop computer can run in hours on Longhorn. This opened up new possibilities.
“In a simple case we count how often a word occurs in close proximity to other words. If you’re doing this with one billion words, do you have a couple of days to wait to do the computation? It’s no fun,” Erk said. “With Hadoop on Longhorn, we could get the kind of data that we need to do language processing much faster. That enabled us to use larger amounts of data and develop better models.”
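The "simple case" Erk mentions, counting how often a word occurs near other words, might look like the following on a single machine. This is a minimal sketch, not the team's code: the function name, the window size of two words, and the example sentence are all assumptions made for illustration.

```python
from collections import Counter

def cooccurrence_counts(tokens, window=2):
    """Count ordered (word, neighbor) pairs that appear within `window` positions."""
    counts = Counter()
    for i, word in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own neighbor
                counts[(word, tokens[j])] += 1
    return counts

# Invented example sentence (illustration only).
text = "the police filed charges and the court heard the charges"
tokens = text.split()
counts = cooccurrence_counts(tokens, window=2)

# How often does "the" appear within two words of "charges"?
print(counts[("charges", "the")])
```

Because `Counter` objects can be added together, counts computed separately on different pieces of a text collection can be summed afterward. That is roughly the shape of work that suits a cluster, although this sketch ignores word pairs that straddle the boundary between two pieces.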
Treating words in a relational, non-fixed way corresponds to emerging psychological notions of how the mind deals with language and concepts in general, according to Erk. Instead of rigid definitions, concepts have “fuzzy boundaries” where the meaning, value and limits of the idea can vary considerably according to the context or conditions. Erk takes this idea of language and recreates a model of it from hundreds of thousands of documents.