The goal of the Natural Language Processing (NLP) group is to design and build software that will analyze, understand, and generate languages that humans use naturally, so that eventually you will be able to address your computer as though you were addressing another person.
This goal is not easy to reach. "Understanding" language means, among other things, knowing what concepts a word or phrase stands for and knowing how to link those concepts together in a meaningful way. It’s ironic that natural language, the symbol system that is easiest for humans to learn and use, is hardest for a computer to master. Long after machines have proven capable of inverting large matrices with speed and sophistication, they still fail to master the basics of our spoken and written languages.
The challenges we face stem from the highly ambiguous nature of natural language. As an English speaker you effortlessly understand a sentence like "Flying planes can be dangerous". Yet this sentence presents difficulties to a software program that lacks both your knowledge of the world and your experience with linguistic structures. Is the more plausible interpretation that the pilot is at risk, or that the danger is to people on the ground? Should "can" be analyzed as a verb or as a noun? Which of the many possible meanings of "plane" is relevant? Depending on context, "plane" could refer to, among other things, an airplane, a geometric object, or a woodworking tool. How much and what sort of context needs to be brought to bear on these questions in order to adequately disambiguate the sentence?
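To make the question about context concrete, here is a minimal sketch of one classic way to pick a word sense: choose the sense whose dictionary gloss shares the most words with the surrounding sentence (a simplified Lesk-style overlap). It is an illustration only, not a description of the group's own method. The sense inventory, the glosses, the stopword list, and the function names are all assumptions invented for this example.

```python
# Toy sense inventory for "plane". The glosses are illustrative assumptions,
# not entries from a real dictionary.
SENSES = {
    "aircraft": "a powered flying vehicle with wings flown by a pilot",
    "geometry": "a flat surface in geometry extending in two dimensions",
    "tool": "a woodworking tool used to smooth or shave a wooden surface",
}

# Small hand-picked stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "in", "of", "to", "or", "by",
             "with", "is", "on", "and", "used"}


def tokenize(text):
    """Lowercase, strip surrounding punctuation, and drop stopwords."""
    words = {w.strip('.,;:"?!').lower() for w in text.split()}
    return words - STOPWORDS - {""}


def disambiguate(context, senses=SENSES):
    """Return the sense whose gloss overlaps most with the context words."""
    context_words = tokenize(context)
    overlaps = {
        name: len(context_words & tokenize(gloss))
        for name, gloss in senses.items()
    }
    # On a tie, max() returns the first sense in dictionary order.
    return max(overlaps, key=overlaps.get)


print(disambiguate("The pilot landed the plane after flying for an hour"))
print(disambiguate("He used a plane to smooth the wooden door"))
print(disambiguate("The line lies in a flat plane in two dimensions"))
```

A sketch like this only addresses the choice of word sense. It does nothing for the structural questions raised above, such as whether "flying" describes the planes or the act of piloting them, and it fails whenever the context happens not to share words with the right gloss.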
We address these problems using a mix of knowledge-engineered and statistical/machine-learning techniques to disambiguate and respond to natural language input. Our work has implications for applications like text critiquing, information retrieval, question answering, summarization, gaming, and translation. The grammar checkers in Office for English, French, German, and Spanish are outgrowths of our research; Encarta uses our technology to retrieve answers to user questions; IntelliShrink uses natural language technology to compress cell phone messages; Microsoft Product Support uses our machine translation software to translate the Microsoft Knowledge Base into other languages. As our work evolves, we expect it to enable any area where human users can benefit by communicating with their computers in a natural way.
Selected current projects
Machine Translation is currently a major focus of the group. In contrast to most existing commercial MT systems, we are pursuing a data-driven approach in which all translation knowledge is learned from existing bilingual text.
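As a rough illustration of what learning translation knowledge from bilingual text can mean at the smallest scale, the sketch below derives a word-level lexicon from a handful of aligned sentence pairs by scoring co-occurrence with the Dice coefficient. This is a deliberately simple stand-in and not the group's system. The corpus, the function name `learn_lexicon`, and the choice of Dice scoring are all assumptions made for the example.

```python
from collections import Counter
from itertools import product


def learn_lexicon(pairs):
    """Map each source word to the target word it co-occurs with most
    strongly, scored by the Dice coefficient over sentence pairs."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words = set(src.lower().split())
        t_words = set(tgt.lower().split())
        src_count.update(s_words)          # sentences containing each source word
        tgt_count.update(t_words)          # sentences containing each target word
        joint.update(product(s_words, t_words))  # sentence pairs containing both

    lexicon = {}
    for s in src_count:
        scores = {
            t: 2 * joint[(s, t)] / (src_count[s] + tgt_count[t])
            for t in tgt_count
            if joint[(s, t)]
        }
        lexicon[s] = max(scores, key=scores.get)
    return lexicon


# Hypothetical four-sentence English-French corpus, made up for this sketch.
corpus = [
    ("the house", "la maison"),
    ("the car", "la voiture"),
    ("a house", "une maison"),
    ("a car", "une voiture"),
]

lexicon = learn_lexicon(corpus)
print(lexicon["house"], lexicon["car"], lexicon["the"], lexicon["a"])
```

Nothing about either language is written into the code; every word pairing comes from the data. A real data-driven system needs far more text and has to handle word order, phrases, and morphology, none of which this sketch attempts.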
The ESL Assistant presents a new paradigm of grammar correction in which large-scale statistical models and web services offer writing assistance for learners of English as a second or foreign language. The service is now available online. More information is available on the team website. Updates on the project will also be posted from time to time on the ESL Assistant team blog on MSDN.
Recognizing Textual Entailment has been proposed as a generic task that captures major semantic inference needs across many natural language processing applications. In conjunction with our work in this area, we have made available to the research community Manually Word Aligned RTE 2006 Data Sets (described in Brockett, 2007).
Paraphrase recognition and generation are crucial to creating applications that approximate our understanding of language. We have released a corpus of approximately 5000 sentence pairs that have been annotated by humans to indicate whether or not they can be considered paraphrases. Alignment phrase tables generated using the data described in Quirk et al. (2004) and Dolan et al. (2004) are now also available for download.
MindNet aims to formalize the representation of word meanings by developing methods for automatically building semantic networks from text and then exploring their structure. MindNets built from Japanese and English dictionary data are available for online browsing.
The Japanese NLP project page summarizes the areas of research we are working on in processing Japanese.
Amalgam is a novel system developed in the Natural Language Processing group at Microsoft Research for sentence realization during natural language generation that uses machine learning techniques. Sentence realization is the process of generating (realizing) a fluent sentence from a semantic representation.
IntelliShrink is a tool that uses linguistic analysis to abbreviate an e-mail message so that it can be displayed on a cell phone. IntelliShrink analyzes messages in English, French, German, or Spanish.