COSE: Crisis Oriented Search Engine
by Novinger, Matthew T., M.S., UNIVERSITY OF COLORADO AT BOULDER, 2010, 50 pages; 1476968

Abstract:

Crisis informatics is a new area of research that studies the use of technology in and around disaster events by members of the public. In particular, crisis informatics studies the digital behaviors of people accessing social media websites as they collectively generate, consume and disseminate information about a crisis or disaster event. A key challenge for crisis informatics researchers is capturing these activities, as the information can appear without warning and is often ephemeral. For instance, on the Twitter micro-blogging service, only the last 3200 status updates of a user are stored; older updates are simply deleted. Unfortunately, crisis informatics researchers have observed users who generate more than 3200 updates during a single event, which makes it all the more important to capture their information while it is being generated.

This thesis examines the issues surrounding the design and development of a data collection mechanism for social media websites with the constraints of crisis informatics research in mind. The system is designed to capture data while it is being generated, storing it in a persistent database for later analysis. It is designed to scale to multiple machines to ensure that many websites can be monitored at once or to ensure that all the data being generated from a single website or service (such as Twitter) can be captured without loss.
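The capture-and-persist step described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the database or schema, so the use of SQLite, the `updates` table, and the helper names `open_store` and `store_batch` are all hypothetical. The sketch shows one property such a collector plausibly needs: batches that overlap (for example, from repeated polling) must not create duplicate rows.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a store for captured updates (hypothetical schema, SQLite assumed)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS updates ("
        "update_id INTEGER PRIMARY KEY, "
        "user TEXT NOT NULL, "
        "text TEXT NOT NULL, "
        "collected_at TEXT NOT NULL)"
    )
    return conn


def count_updates(conn):
    """Return the number of updates currently stored."""
    return conn.execute("SELECT COUNT(*) FROM updates").fetchone()[0]


def store_batch(conn, batch, collected_at):
    """Store (update_id, user, text) tuples; return how many were new.

    INSERT OR IGNORE skips rows whose update_id is already present,
    so re-delivered updates are not stored twice.
    """
    before = count_updates(conn)
    conn.executemany(
        "INSERT OR IGNORE INTO updates VALUES (?, ?, ?, ?)",
        [(uid, user, text, collected_at) for uid, user, text in batch],
    )
    conn.commit()
    return count_updates(conn) - before
```

Recording a collection timestamp with every row is one simple way to keep collection-related parameters alongside the data; the actual system may do this differently.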

While our data collection service provides a generic framework for searching social media websites, for the purposes of this thesis we focused on building modules for searching Twitter. Our Twitter integration allows for two types of search: one based on keywords, and one that gathers the tweets of users who contributed to the result set of a keyword search. We call the latter a contextual search, as it grabs the tweets that surround the ones that match the given keyword search.
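The difference between the two search types can be illustrated with a minimal sketch over an in-memory list of tweets. It makes no calls to Twitter; the `Tweet` record, the function names, and the sample data are assumptions made for illustration, not the thesis's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    user: str
    text: str


def keyword_search(tweets, keyword):
    """Return tweets whose text contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [t for t in tweets if needle in t.text.lower()]


def contextual_search(tweets, keyword):
    """Return every tweet by any user who appears in the keyword result set."""
    contributors = {t.user for t in keyword_search(tweets, keyword)}
    return [t for t in tweets if t.user in contributors]


# Hypothetical sample data.
SAMPLE = [
    Tweet(1, "alice", "Earthquake felt downtown"),
    Tweet(2, "alice", "Power is out on my street"),
    Tweet(3, "bob", "Nice weather today"),
    Tweet(4, "carol", "Is anyone else feeling this earthquake?"),
]
```

Here a keyword search for "earthquake" matches tweets 1 and 4, while the contextual search also returns tweet 2: it does not contain the keyword, but it was written by a user who contributed to the keyword result set.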

Our data collection service also provides search capabilities on top of the results that it retrieves, allowing crisis informatics researchers to verify the accuracy of their search terms.
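One way a researcher might check search terms against already-collected results is to count how many stored updates each term matches. The abstract does not describe how the service's own search works, so the function below is a hypothetical sketch of the idea, not the thesis's method.

```python
def term_hit_counts(texts, terms):
    """Count, for each term, how many texts contain it (case-insensitive)."""
    counts = {term: 0 for term in terms}
    for text in texts:
        lowered = text.lower()
        for term in terms:
            if term.lower() in lowered:
                counts[term] += 1
    return counts
```

A term with zero or very few hits in a collected data set may indicate a misspelled or poorly chosen search term, while a term that matches mostly unrelated updates may be too broad.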

This system has seen production use in gathering data across a major disaster event that occurred in January of 2010. Our service produced data sets with known collection-related parameters, making them useful for later empirical and ethnographic research activities by crisis informatics researchers.

 
Adviser: Kenneth Anderson
School: UNIVERSITY OF COLORADO AT BOULDER
Source: MAI/ 48-05, p. , Jun 2010
Source Type: Thesis
Subjects: Information technology; Computer science
Publication Number: 1476968

  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:1476968
  If your library subscribes to the ProQuest Dissertations & Theses (PQDT) database, you may be entitled to a free electronic version of this graduate work. If not, you will have the option to purchase one, and access a 24 page preview for free (if available).
