Integrating natural language and program structure information to improve software search and exploration

by Hill, Emily, Ph.D., UNIVERSITY OF DELAWARE, 2010, 238 pages; 3423409


Today's software is large and complex, with systems consisting of millions of lines of code. New developers to a software project face significant challenges in locating code related to their maintenance tasks of fixing bugs or adding new features. Developers can simply be assigned a bug and told to fix it—even when they have no idea where to begin. In fact, research has shown that a developer typically spends more time locating and understanding code during maintenance than modifying it.

We can significantly reduce the cost of software maintenance by reducing the time and effort to find and understand the code relevant to a software maintenance task. In this dissertation, we demonstrate how textual and structural information in source code can be used to improve software search and exploration tools. To facilitate integration of this information into additional software tools, we present a novel model of word usage in software. This model provides software engineering tool designers access to both structural and linguistic information about the source code, where previously only structural information was available. We utilize textual and structural information to improve software search and program exploration tools, and evaluate against competing state of the art approaches. Our evaluations show that combining textual and structural information can outperform competing state of the art techniques. Finally, we outline uses of the model to improve software engineering tools beyond program search and exploration.

AdvisersLori L. Pollock; Vijay K. Shanker
Source TypeDissertation
SubjectsComputer science
Publication Number3423409

About ProQuest Dissertations & Theses
With nearly 4 million records, the ProQuest Dissertations & Theses (PQDT) Global database is the most comprehensive collection of dissertations and theses in the world. It is the database of record for graduate research.

PQDT Global combines content from a range of the world's premier universities - from the Ivy League to the Russell Group. Of the nearly 4 million graduate works included in the database, ProQuest offers more than 2.5 million in full text formats. Of those, over 1.7 million are available in PDF format. More than 90,000 dissertations and theses are added to the database each year.

If you have questions, please feel free to visit the ProQuest Web site - - or contact ProQuest Support.