Collaborative Research and Projects

Microsoft Research launched an annual request for proposals (RFP) related to NLP and MT in 2008 and 2009. The goal of the RFPs was to encourage researchers and practitioners to discuss our most pressing needs in accessing information online and new ideas in NLP technologies that could offer viable solutions. Microsoft Research solicited research proposals on the following topics:

Web-Scale Natural Language Processing (2008)

Information extraction

Information gisting of search engine results

Machine translation and cross-lingual information retrieval

Monolingual and multilingual online conversational agents or chatbots

Machine Translation for Multiple Language Information Access (2009)

Applied research for the translation of documents

New approaches to statistical machine translation

Using machine translation for search engines

Relevance ranking of search engine results in multiple languages

Parallel data mining, translation knowledge acquisition

Parallel data mining using search engines and other web resources

Listed below are just a few of the funded projects.

Web-Scale NLP: Retrieval Models for Collaborative Question and Answer Archives with Video Presentation

Investigator: Professor Hae-Chang Rim, Korea University

Goal: Explore various techniques for addressing the lexical gap problem in community question retrieval models.

The most representative application of our research results would be community question and answer search. A user enters a question into a community question answering service, which has stored large numbers of previously asked questions and their corresponding answers, with the goal of retrieving the most relevant previously asked questions and their answers. With the techniques we investigated in this project, the search application would be able to retrieve questions and answers that are not only lexically similar but also semantically related to the user's question. For example, if a user asks, “Where can I get cheap airplane tickets?”, the application would retrieve not only results that contain the terms “cheap,” “airplane,” and “ticket,” but also results with the related terms “low” and “airfare,” giving the user a more diverse range of information. We believe our approach can also contribute to other search applications that retrieve short texts, which suffer greatly from lexical gap problems, such as short text ads and recently popularized Twitter updates.
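The idea of matching related terms as well as identical ones can be illustrated with a minimal sketch. This is not the project's actual retrieval model; the `RELATED` table, its weights, and the function names `tokenize`, `score`, and `rank` are all hypothetical and invented for illustration. Each query word is credited with the strongest related word found in an archived question.

```python
import re

# Hypothetical relatedness table: weight of an archived-question word given a
# query word. The entries and values are invented for illustration only.
RELATED = {
    "cheap": {"cheap": 1.0, "low": 0.6, "discount": 0.5},
    "airplane": {"airplane": 1.0, "airfare": 0.7, "flight": 0.6},
    "ticket": {"ticket": 1.0, "airfare": 0.5},
}


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters (a crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def score(query, question):
    """Average, over query words, of the best related-word weight in the question."""
    query_terms = tokenize(query)
    question_terms = set(tokenize(question))
    if not query_terms:
        return 0.0
    total = 0.0
    for term in query_terms:
        # Words missing from the table can only match themselves.
        related = RELATED.get(term, {term: 1.0})
        total += max(
            (weight for word, weight in related.items() if word in question_terms),
            default=0.0,
        )
    return total / len(query_terms)


def rank(query, archive):
    """Return archived questions ordered from most to least related."""
    return sorted(archive, key=lambda question: score(query, question), reverse=True)
```

With this sketch, an archived question such as "Where to find low airfare?" receives a nonzero score for the query "cheap airplane ticket" even though the two share no words, which is the behavior the lexical gap techniques aim for.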

Papers published: “Computing Word Semantic Relatedness for Question Retrieval in Community Question Answering” in IEICE Transactions on Information and Systems 2009; “Bridging Lexical Gaps between Queries and Questions on Large Online Q&A Collections with Compact Translation Models” in EMNLP 2008

Web-Scale NLP: Aspect-based Summarization for Web Search Engine Results

Investigator: Professor Naoaki Okazaki, University of Tokyo

Goal: Generate summaries for the web pages retrieved by a search engine.

This project developed a web-based application that summarizes web pages retrieved through the Bing search API. Given a query from the user, the application immediately shows the search results obtained through the Bing search API and invokes the summarization service in the background. The summarization service receives the URLs of the source web pages to be summarized. The service downloads the content of the web pages, strips HTML tags to obtain text, splits the text into sentences, lemmatizes the words in the sentences, computes TF*IDF scores of words, assigns bonus weights to occurrences of summarization patterns in the source sentences, and calls the MACCORI solver to select summary sentences. The summarization service can finish this process in roughly two to five seconds.
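The pipeline described above can be sketched roughly as follows. This is a simplified illustration, not the project's implementation: it takes page content as strings rather than downloading it, skips lemmatization, and replaces the solver step with a plain greedy top-k selection. The function names, the smoothed IDF formula, and the `bonus_patterns` and `bonus` parameters are assumptions made for this example.

```python
import html
import math
import re
from collections import Counter


def strip_tags(page):
    """Remove HTML tags and decode entities (a crude regex, not a real parser)."""
    text = re.sub(r"<[^>]+>", " ", page)
    return re.sub(r"\s+", " ", html.unescape(text)).strip()


def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(text):
    """Lowercase word tokens; lemmatization is omitted in this sketch."""
    return re.findall(r"[a-z]+", text.lower())


def summarize(pages, k=2, bonus_patterns=(), bonus=0.5):
    """Pick the k highest-scoring sentences across all pages, in source order."""
    docs = [strip_tags(page) for page in pages]
    n = len(docs)
    # Document frequency: in how many pages each word occurs.
    df = Counter(w for doc in docs for w in set(words(doc)))
    scored = []
    for doc in docs:
        tf = Counter(words(doc))
        for sentence in split_sentences(doc):
            tokens = words(sentence)
            if not tokens:
                continue
            # Mean TF*IDF of the sentence's words, using a smoothed IDF.
            tfidf = sum(
                tf[w] * (math.log((1 + n) / (1 + df[w])) + 1.0) for w in tokens
            ) / len(tokens)
            # Hypothetical bonus for sentences matching a summarization pattern.
            extra = sum(
                bonus for p in bonus_patterns if re.search(p, sentence, re.IGNORECASE)
            )
            scored.append((tfidf + extra, len(scored), sentence))
    # Greedy top-k stands in for the solver; ties keep the earlier sentence.
    top = sorted(scored, key=lambda item: (-item[0], item[1]))[:k]
    return [sentence for _, _, sentence in sorted(top, key=lambda item: item[1])]
```

Greedy selection ignores redundancy between the chosen sentences, which is one reason a real system might hand the final choice to an optimization solver instead.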

Papers published: “A Discriminative Alignment Model for Abbreviation Recognition” in Coling 2008; “A Discriminative Candidate Generator for String Transformations” in EMNLP 2008; “Semi-Supervised Lexicon Mining from Parenthetical Expressions in Monolingual Web Pages” in NAACL/HLT 2009; “Robust Approach to Abbreviating Terms: A Discriminative Latent Variable Model with Global Information” in ACL-IJCNLP 2009

Machine Translation for Multiple Language Information Access: Bridging Morpho-Syntactic Gap Between Source and Target Sentences for English-Korean Statistical Machine Translation

Investigator: Professor Hae-Chang Rim, Korea University

Goal: Explore various techniques for mitigating the morpho-syntactic gap in English-Korean statistical machine translation.

This project developed two different methods to reduce the morpho-syntactic gap between English and Korean in statistical machine translation. The first is a preprocessing method for machine translation, which transforms a source language sentence so that it is much closer to a target language sentence in terms of sentence length and word order. The second is a post-processing method for word alignment, which reflects POS alignment tendency to improve traditional word alignment models.

Papers published: “Bridging Morpho-Syntactic Gap between Source and Target Sentences for English-Korean Statistical Machine Translation” in ACL-IJCNLP 2009; “A Post-processing Approach to Statistical Word Alignment Reflecting Alignment Tendency between Part-of-speeches” in COLING 2010; “Discovering More Links: Using Character Alignment to Enhance Chinese-Korean Machine Translation” in COLING 2010

Machine Translation for Multiple Language Information Access: Experimental study of a structure-matching extension for the hierarchical phrase-based translation model

Investigator: Professor Tiejun Zhao, Harbin Institute of Technology

Goal: Investigate the performance improvement of the hierarchical phrase-based translation (HPBT) model when fine-grained syntactic knowledge is integrated into the model or related processes, e.g., training, tuning, and decoding.

This project successfully developed the Bracket Structure Analyzer (BSA), the provider of syntactic information for hierarchical phrases. In addition, we are investigating and conducting experiments with syntax-based SMT models.

Papers published: “Improve the Statistical Machine Translation Performance by Refining the Word Alignments” in INFORMATION 2010; “A deterministic method to predict phrase boundaries of a syntactic tree” in ICIC 2010; “Chinese Named Entity Recognition with a Sequence Labeling Approach: Based on Characters, or Based on Words?” in ICIC 2010.