2007
Speaker: Susan Chebotariov
Title: Are Popular Pages Getting Older? (The Effects of Search Engine Bias)
Date: January 19, 2007
Time: 12:30-1:15pm
Room: BH 4549
Abstract
If the "rich get richer," as suggested by "Impact of Web Search Engines on 
Page Popularity" (Cho et al.), then the high-ranking results of search engine 
queries should grow older over time. The authors test this hypothesis through 
several experiments comparing the high- and low-ranking results of queries 
issued to a popular search engine. In addition, the authors evaluate the claims 
made in "The Egalitarian Effect of Search Engines" (Fortunato et al.) about the 
bias that search engines introduce to the Web.

Speaker: Uri Schonfeld
Title: Rating-to-Rating Recommender System and the Netflix Data
Date: January 26, 2007
Time: 12:30-1:15pm
Room: BH 4549

Speaker: Ka Cheung Sia, Richard
Title: Capturing User Interests by Both Exploitation and Exploration
Date: February 9, 2007
Time: 12:30-1:15pm
Room: BH 4549
Abstract
One of the important research issues in the areas of information
retrieval and Web search is personalization. Providing personalized
services that are tailored toward the specific preferences and
interests of a given user can enhance her experience and
satisfaction. However, effectively capturing user interests is a
challenging research problem. Challenges include how to capture user
interests quickly and unobtrusively, how to provide diversified
recommendations, and how to track drifting user interests in a timely
fashion.

In this talk, we will address how to model the problem of learning
user interests within a learning framework, and we propose an
algorithm that actively captures user interests through an
interactive recommendation process. The key advantage of our
algorithm is that it accounts for both exploitation
(recommending items that match the user's main interests) and
exploration (discovering the user's potential interests). Using this
learning framework, our algorithm can quickly capture diversified
user interests in an unobtrusive way, even when those interests
drift over time. Experiments on synthetic data, together with user
studies, show that our algorithm outperforms the naive greedy approach.
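The abstract does not describe the speaker's algorithm, so the following is only a minimal sketch of the general exploitation/exploration trade-off, using the standard epsilon-greedy strategy as a swapped-in technique. The simulated user and its per-topic interest probabilities are hypothetical. It shows how a purely greedy recommender (epsilon = 0) can lock onto the first item it tries, while a small amount of exploration lets it discover the user's stronger interest.

```python
import random


def recommend(scores, epsilon, rng):
    """Epsilon-greedy choice: explore a random item with probability epsilon,
    otherwise exploit the item with the highest estimated interest."""
    if rng.random() < epsilon:
        return rng.randrange(len(scores))  # exploration
    # Exploitation: max() returns the first index among ties.
    return max(range(len(scores)), key=lambda i: scores[i])


def simulate(true_interest, epsilon, rounds, seed=0):
    """Run a hypothetical simulated user; return how often each item was recommended."""
    rng = random.Random(seed)
    scores = [0.0] * len(true_interest)  # running mean of observed feedback per item
    counts = [0] * len(true_interest)
    for _ in range(rounds):
        item = recommend(scores, epsilon, rng)
        # Simulated feedback: 1.0 if the user likes the recommendation, else 0.0.
        reward = 1.0 if rng.random() < true_interest[item] else 0.0
        counts[item] += 1
        scores[item] += (reward - scores[item]) / counts[item]
    return counts


if __name__ == "__main__":
    interests = [0.1, 0.2, 0.7]  # hypothetical per-topic interest probabilities
    print("greedy:        ", simulate(interests, epsilon=0.0, rounds=2000))
    print("epsilon-greedy:", simulate(interests, epsilon=0.1, rounds=2000))
```

With epsilon = 0.0 every estimate starts at zero and never drops below it, so the greedy policy recommends only the first item ([2000, 0, 0]). The running-mean update weights all past feedback equally; tracking drifting interests would likely call for a constant step size or a sliding window instead, which is again an assumption rather than the speaker's design.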

Speaker: Susan Chebotariov
Title: How to Present Like a Pro
Date: February 16, 2007
Time: 12:30-1:15pm
Room: BH 4549
Summary
Know what you're saying
- Determine what the "take away" information should be
- Handle questions with care

Know what you're doing
- Never turn your back to the audience
- Do not make unconscious movements

Practice! Practice! Practice!
- Practice in front of a live audience
- Find out what your audience learned

Find more presentation tips at http://www.garrreynolds.com/Presentation/index.html

Speaker: Barzan Mozafari
Title: On the Evolution of Wikipedia
Date: February 23, 2007
Time: 12:30-1:15pm
Room: BH 4549
Abstract
A recent phenomenon on the Web is the emergence and proliferation of new 
social media systems that enable social interaction between people. One of 
the most popular of these systems is Wikipedia, which allows users to create 
content collaboratively. Despite its current popularity, little is known 
about how users interact with Wikipedia and how it has evolved over time. 
In this paper we aim to provide a first extensive study of user behavior on 
Wikipedia and its evolution. Our work differs from prior studies in 
several ways. First, previous studies analyzing user workloads (for 
systems such as peer-to-peer systems [10] and Web servers [2]) have mainly 
focused on understanding the users who access information. In contrast, 
Wikipedia provides the opportunity to understand how users create and 
maintain information, since it provides the complete evolution history of its 
content. Second, prior studies mainly evaluate the implications of user 
workloads for system performance, whereas our study seeks to understand the 
evolution of the data corpus and of user behavior itself. 
Our main findings are that (1) the evolution and updates of Wikipedia are 
governed by a self-similar process, not by the Poisson process that has been 
observed for the general Web [4, 6], and (2) the exponential growth of Wikipedia 
is driven mainly by its rapidly increasing user base, indicating the importance 
of its open editorial policy to its current success. We also find that (3) the 
number of updates made to Wikipedia articles follows a power-law 
distribution, although the distribution is less skewed than those obtained in 
other studies. 

Speaker: Hamid Pirahesh, IBM
Title: Transforming Information Management and Integration for Enterprise Web
Date: March 2, 2007
Time: 12:30-2:00pm
Room*: BH 6426
*Note the temporary change of location
Abstract
Information management is going through a fundamental change, influenced by 
(1) Web 2.0, information integration with service-oriented architecture, and 
the deep Web, (2) the Web search paradigm, and (3) the convergence of structured, 
semi-structured (XML), and unstructured data in the context of semantically rich 
data objects. This change is affecting the data model and how data objects are 
consumed by classic DB users and by business-process- and search-oriented users. 
Web-scale solutions require new approaches to integration and information 
composition, such as Web 2.0 mashups and Situational Applications (i.e., 
applications assembled to solve an immediate business problem). 
Continuous integration and the scale of the Web require continuous discovery of 
information from unstructured and structured data sources. I will present 
contributions of several IBM Research projects, including InfoSphere and 
Avatar, that address this problem for semi-structured and 
unstructured data. I will also describe the key features of the XML DB project 
that aim to support modern information management systems (e.g., supporting 
the schema chaos model).

Speaker: Snehal Thakkar, USC
Title: Quality Driven Geo-spatial Data Integration
Date: March 9, 2007
Time: 12:30-2:00pm
Room: BH 4549
Abstract
Accurate and efficient integration of geospatial data is an important
challenge in many applications. Previous research has enabled the
integration of geospatial data with different access methods and
formats. However, the existing geospatial data integration frameworks do
not address the key issue of the quality of the integrated data. In this
talk, I will describe a framework for quality-driven geospatial data
integration. In particular, I will focus on representing the quality of
data provided by geospatial sources and by conflation operations in a data
integration system. I will also describe a reformulation algorithm that
dynamically generates integration plans providing high-quality
geospatial data for user queries.

Speaker: Raymond Pon
Title: iScore: Measuring the Interestingness of Articles in a Limited User Environment
Date: March 16, 2007
Time: 12:30-1:15pm
Room: BH 4549
Abstract
Search engines, such as Google, assign scores to news articles based on 
their relevance to a query. However, not every article relevant to a query 
is necessarily interesting to a user. For example, an article that is old 
or offers little new information is unlikely to be interesting. 
Relevance scores do not account for what makes an article interesting, 
which varies from user to user. Although methods such as collaborative 
filtering have been shown to be effective in recommendation systems, a 
limited user environment has too few users for collaborative filtering 
to be effective. We present a general framework for 
defining and measuring the "interestingness" of articles that incorporates 
user feedback. We show a 21% improvement over traditional IR methods.
