Register now for Technical Briefings at ESEC/FSE 2011, including:
Lin Tan and Tao Xie. Text Analytics for Software Engineering: Applications of Natural Language Processing. A Technical Briefing at the 8th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE 2011), Szeged, Hungary, September 2011.
Software engineering data contains a wealth of natural language text: requirements documents, code comments, identifier names, commit logs, release notes, mailing list discussions, etc. This text is essential to the software engineering process, helping software engineers and researchers better understand and maintain software. Given the overwhelming amount of available natural language text, there is high demand for text analytics, including natural language processing (NLP) and text mining techniques, to automatically analyze this text and thereby improve software quality and productivity. The application of NLP and text mining techniques to software engineering data dates back about a decade. In the past five years, text analytics for software engineering has become an emerging topic in the software engineering area. Various recent studies have shown that automated analysis of natural language text can improve software reliability, programming productivity, software maintenance, and software quality in general.
This technical briefing (1) provides a quick overview of major text mining techniques, including NLP techniques (e.g., part-of-speech tagging, chunking, semantic labeling, semantic pattern matching, and negative-expression identification), machine learning techniques (e.g., clustering and decision-tree-based classification), and data mining techniques (e.g., frequent itemset mining); (2) introduces popular text analysis tools (e.g., WordNet and Weka); (3) summarizes major research work done in the area of text analytics for software engineering; and (4) outlines future research directions and highlights research challenges. More information on the technical briefing can be found at https://sites.google.com/site/text4se/.
The ESEC/FSE program also includes a complementary technical briefing by Andrian Marcus, “Management of Unstructured Information during Software Evolution: Applications of Text Retrieval”. We recommend attending both.