Measuring the interestingness of articles in a limited user environment
by Pon, Raymond K., Ph.D., UNIVERSITY OF CALIFORNIA, LOS ANGELES, 2008, 195 pages; 3343300

Abstract:

Search engines, such as Google, assign scores to news articles based on their relevancy to a query. However, not every article that is relevant to the query is interesting to a user. For example, an article that is old or yields little new information would be uninteresting. Relevancy scores do not take into account what makes an article interesting, which varies from user to user. Although methods such as collaborative filtering have been shown to be effective in recommendation systems, a limited user environment does not have enough users to make collaborative filtering effective.

A general framework, called iScore, is presented for defining and measuring the “interestingness” of articles, incorporating user feedback. iScore addresses various aspects of what makes an article interesting, such as topic relevancy, uniqueness, freshness, source reputation, and writing style. It employs various methods to measure these features and uses a classifier operating on these features to recommend articles. The basic iScore configuration is shown to improve recommendation results by as much as 20%. Beyond the basic iScore features, additional features are presented to address the deficiencies of existing feature extractors: MTT, which tracks multiple topics, and eRocchio, a version of the Rocchio algorithm that learns its parameters online as it processes documents. The inclusion of MTT and eRocchio in iScore is shown to improve iScore recommendation results by as much as 3.1% and 5.6%, respectively. Additionally, in the TREC11 Adaptive Filter Task, eRocchio is shown to be 10% better than the best filter in the last run of the task.
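For readers unfamiliar with the Rocchio algorithm mentioned above, the following is a minimal sketch of the classic, fixed-parameter Rocchio profile update, not the dissertation's eRocchio. The function names, the sparse dictionary representation, and the default weights (alpha=1.0, beta=0.75, gamma=0.15, values commonly quoted in textbooks) are assumptions made for illustration. eRocchio is described as learning its parameters online, which this sketch does not attempt.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]  # sparse term -> weight mapping (illustrative choice)


def centroid(docs: List[Vector]) -> Vector:
    """Mean of sparse term-weight vectors; an empty list gives an empty vector."""
    if not docs:
        return {}
    total: Vector = {}
    for doc in docs:
        for term, weight in doc.items():
            total[term] = total.get(term, 0.0) + weight
    return {term: weight / len(docs) for term, weight in total.items()}


def rocchio_update(profile: Vector,
                   interesting: List[Vector],
                   uninteresting: List[Vector],
                   alpha: float = 1.0,
                   beta: float = 0.75,
                   gamma: float = 0.15) -> Vector:
    """Classic Rocchio: alpha*profile + beta*mean(pos) - gamma*mean(neg).

    The fixed alpha/beta/gamma defaults are assumed textbook values.
    Terms whose updated weight is not positive are dropped, a common
    convention rather than something stated in the source.
    """
    pos = centroid(interesting)
    neg = centroid(uninteresting)
    updated: Vector = {}
    for term in set(profile) | set(pos) | set(neg):
        weight = (alpha * profile.get(term, 0.0)
                  + beta * pos.get(term, 0.0)
                  - gamma * neg.get(term, 0.0))
        if weight > 0.0:
            updated[term] = weight
    return updated


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical feedback: one article marked interesting, one uninteresting.
profile = {"election": 1.0}
updated = rocchio_update(
    profile,
    interesting=[{"election": 1.0, "poll": 1.0}],
    uninteresting=[{"celebrity": 1.0}],
)
```

In a setup like this, the cosine similarity between an incoming article and the updated profile could serve as one topic-relevancy feature among several fed to a classifier; how iScore actually combines its features is described in the dissertation itself.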

In addition to these two major topic relevancy measures, other features are introduced that employ language models, phrases, clustering, and changes in topics to improve recommendation results. These additional features are shown to improve iScore recommendation results by up to 14%. Because users differ in their reasons for finding an article interesting, an online feature selection method for naïve Bayes is also introduced. Online feature selection can improve recommendation results in iScore by up to 18.9%.

In summary, iScore in its best configuration can outperform traditional IR techniques by as much as 50.7%. iScore and its components are evaluated in the news recommendation task using three datasets from Yahoo! News, actual users, and Digg. iScore and its components are also evaluated in the TREC Adaptive Filter task using the Reuters RCV1 corpus.

 
Adviser: Alfonso F. Cardenas
School: UNIVERSITY OF CALIFORNIA, LOS ANGELES
Source: DAI/B 70-01, p. , Mar 2009
Source Type: Dissertation
Subjects: Engineering; Information science; Computer science
Publication Number: 3343300
  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3343300
  If your library subscribes to the ProQuest Dissertations & Theses (PQDT) database, you may be entitled to a free electronic version of this graduate work. If not, you will have the option to purchase one and to access a 24-page preview for free (if available).
