Siena’s Interactive Research Assistant (SIRA): Dr. Small

Information Extraction (IE) is an important and growing field, in part because ubiquitous social media now connect millions of people and produce huge collections of textual information. IE has its roots in Artificial Intelligence (AI) fields including machine learning, logic and search algorithms, computational linguistics, and pattern recognition. This project will participate in the TREC 2014 Session track. Typical users enter an initial query into a search engine, review some results, possibly copy information, reformulate the query, and repeat this process several times until they are satisfied that they have found the information they require. The TREC Session track explores the possibility of improving information retrieval results by using the activity of a user throughout the entire search session. (Information about the Session track can be found at


Siena Environmental Review Project (SERP): Drs. Booker and Small 

Each year we are faced with new and complex environmental dilemmas. In confronting these, we use a variety of opportunities for public participation to shape and inform policy and regulations. But much of this public input is difficult to catalog and process, so in the end it is much less useful than it should be. At Siena we have developed an automated approach to process and “understand” public input to the environmental review process. We have focused on the public comments on the potential regulation of natural gas extraction using hydraulic fracturing (fracking) in New York State. This summer’s project will build on previous student work by using computational techniques to better understand and interpret the attitudes towards fracking expressed in hundreds of thousands of pages of public comments, which total over 10 GB of data.


Siena’s Twitter Information Retrieval System (STIRS): Dr. Lim

Microblogs, such as Twitter, are commonplace in our society and are now fertile ground for scientific inquiry. This project will investigate algorithms for information retrieval from microblog source material. The project will take part in the Microblog track of the Text REtrieval Conference (TREC), which brings together participants from across the globe to develop systems that accept a given topic of interest and return a ranked list of relevant microblog entries. (Information about the Microblog track can be found at ...) Participants should be familiar with a high-level language (preferably Java). Experience with Lucene is highly desirable but not required.
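As a rough illustration of what "accept a topic and return a ranked list" can mean, the following minimal sketch scores a handful of posts against a topic with a simple TF-IDF weighting. It is not the STIRS system and does not use Lucene; the function names (`tokenize`, `rank`), the sample posts, and the particular weighting formula are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits, '#', '@' and apostrophes."""
    return re.findall(r"[a-z0-9#@']+", text.lower())


def rank(topic, posts):
    """Return (score, post) pairs for matching posts, highest score first.

    Score = sum over topic terms of term frequency * log(1 + N / document frequency).
    This weighting is one common choice, picked here only for illustration.
    """
    term_counts = [Counter(tokenize(post)) for post in posts]
    n_posts = len(posts)

    # Document frequency: in how many posts does each term occur?
    doc_freq = Counter()
    for counts in term_counts:
        doc_freq.update(counts.keys())

    topic_terms = set(tokenize(topic))
    scored = []
    for post, counts in zip(posts, term_counts):
        score = sum(
            counts[term] * math.log(1 + n_posts / doc_freq[term])
            for term in topic_terms
            if term in counts
        )
        if score > 0:  # drop posts that share no term with the topic
            scored.append((score, post))

    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Hypothetical sample data, invented for this sketch.
posts = [
    "Fracking hearing in Albany today",
    "Great coffee this morning",
    "Albany council debates fracking rules, fracking ban possible",
]
results = rank("fracking Albany", posts)
for score, post in results:
    print(f"{score:.3f}  {post}")
```

In this toy run the third post ranks first because it mentions a topic term twice, and the unrelated second post is dropped. A real microblog system would also need to handle hashtags, retweets, timestamps, and far larger collections.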


Siena College Analytical Typology to Trace Expository Rap (SCATTER): Drs. Eccarius-Kelly and Small

When ideologically motivated groups and organizations feel marginalized or disrespected, they sometimes engage in acts of violence to demonstrate power or to bring about social change. At Siena College, earlier work used textual analysis of material from several internationally active guerrilla organizations to trace particular groups' willingness to increase or decrease their use of violent tactics.

This summer's project will focus on applying computational techniques to examine the increase or decrease of calls for violent action articulated in the lyrics of rap and hip hop music. Three particular areas will be examined: a) changes in politically motivated calls for violence, b) changes in the levels of misogynistic articulations, and c) changes in the types of targets within the establishment (police, economic leaders, etc.). The project's intent is to identify changes in levels of violence over time and to shape appropriate policy responses.
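One simple way to look for change over time, shown here only as a sketch and not as the project's actual method, is to measure how often terms from a hand-built lexicon occur in lyrics grouped by release year. The function name `lexicon_rate_by_year`, the placeholder lexicon, and the placeholder "lyrics" below are all hypothetical.

```python
import re
from collections import defaultdict


def lexicon_rate_by_year(songs, lexicon):
    """Map each year to the fraction of lyric tokens that appear in the lexicon.

    songs   -- iterable of (year, lyrics_text) pairs
    lexicon -- set of lowercase terms of interest
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, lyrics in songs:
        tokens = re.findall(r"[a-z']+", lyrics.lower())
        totals[year] += len(tokens)
        hits[year] += sum(1 for token in tokens if token in lexicon)
    # Skip years with no tokens to avoid dividing by zero.
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


# Placeholder data: neutral stand-in words, not real lyrics or a real lexicon.
songs = [
    (1990, "alpha beta gamma delta"),
    (1990, "alpha alpha beta gamma"),
    (2000, "beta gamma delta delta"),
]
lexicon = {"alpha"}
rates = lexicon_rate_by_year(songs, lexicon)
print(rates)
```

Raw term counts like these ignore context, irony, and quotation, so they would at most be a starting point for the three kinds of change listed above.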


Siena’s Medical Information Retrieval System (SMIRS): Dr. Medsker

The National Institute of Standards and Technology (NIST) has been running the annual Text REtrieval Conference (TREC) since 1992. This is a premier conference that offers researchers in the field of Computational Linguistics the opportunity to showcase their work and compare their results against those of other leading researchers. Our Siena research team will participate in the TREC 2012 Medical Records track. The goal of the Medical Records track is to develop search technologies that meet the needs of health professionals to engage in effective discovery in digital document collections. Specifically, our SMIRS research team will develop a system that will accept a given topic of interest, search a medical report corpus, and return a ranked list of relevant documents, for example with the goal of setting up clinical trials.