Speaker: Ka Cheung Sia, Richard
Title: Capturing User Interests by Both Exploitation and Exploration
Date: February 9, 2007
Time: 12:30-1:15pm
Room: BH 4549
Abstract
One of the important research issues in the areas of information
retrieval and Web search is personalization. Providing personalized
services that are tailored toward the specific preferences and
interests of a given user can enhance her experience and
satisfaction. However, effectively capturing user interests is a
challenging research problem. Challenges include how to quickly
capture user interests in an unobtrusive way, how to provide
diversified recommendations, and how to track the drifts of user
interests in a timely fashion.
In this talk, we will address the issues of how to model the problem
of learning user interests in a learning framework and propose an
algorithm that actively captures user interests through an
interactive recommendation process. The key advantage of our
algorithm is that it takes into account both exploitation
(recommending items that match users' main interests) and
exploration (discovering users' potential interests). Using this
learning framework, our algorithm can quickly capture diversified
user interests in an unobtrusive way, even when the user interests
drift over time. Experiments using both synthetic
data and user studies show that our algorithm outperforms the naive
greedy approach.
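To illustrate the exploitation/exploration trade-off described above, here is a minimal sketch using epsilon-greedy selection. This is a standard textbook technique swapped in for illustration, not the speaker's algorithm; the function names, the click-rate model, and the parameter epsilon are all assumptions.

```python
import random

def recommend(clicks, shows, epsilon, rng):
    """Pick a topic index (hypothetical helper, not from the talk)."""
    if rng.random() < epsilon:
        # Exploration: try a random topic to discover potential interests.
        return rng.randrange(len(clicks))
    # Exploitation: pick the topic with the highest observed click rate.
    # Topics never shown are treated as rate 0.0; ties go to the lowest index.
    rates = [c / s if s else 0.0 for c, s in zip(clicks, shows)]
    return max(range(len(rates)), key=rates.__getitem__)

def simulate(true_interest, rounds, epsilon, seed=0):
    """Simulate a user who clicks topic i with probability true_interest[i]."""
    rng = random.Random(seed)
    clicks = [0] * len(true_interest)
    shows = [0] * len(true_interest)
    for _ in range(rounds):
        topic = recommend(clicks, shows, epsilon, rng)
        shows[topic] += 1
        if rng.random() < true_interest[topic]:
            clicks[topic] += 1
    return clicks, shows
```

With epsilon set to 0 the sketch is purely greedy: simulate([0.0, 0.9], 50, 0.0) shows only the first topic in all 50 rounds and never discovers the second, which is the kind of failure that exploration is meant to avoid.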
Speaker: Barzan Mozafari
Title: On the Evolution of Wikipedia
Date: February 23, 2007
Time: 12:30-1:15pm
Room: BH 4549
Abstract
A recent phenomenon on the Web is the emergence and proliferation of new
social media systems allowing social interaction between people. One of
the most popular of these systems is Wikipedia, which allows users to create
content in a collaborative way. Despite its current popularity, not much is
known about how users interact with Wikipedia and how it has evolved over time.
In this paper we aim to provide a first, extensive study of the user behavior on
Wikipedia and its evolution. Compared to prior studies, our work differs in
several ways. First, previous studies on the analysis of the user workloads (for
systems such as peer-to-peer systems [10] and Web servers [2]) have mainly
focused on understanding the users who are accessing information. In contrast,
Wikipedia provides us with the opportunity to understand how users create and
maintain information since it provides the complete evolution history of its
content. Second, the main focus of prior studies is evaluating the implication
of the user workloads on the system performance, while our study is trying to
understand the evolution of the data corpus and the user behavior themselves.
Our main findings include that (1) the evolution and updates of Wikipedia are
governed by a self-similar process, not by the Poisson process that has been
observed for the general Web [4, 6] and (2) the exponential growth of Wikipedia
is mainly driven by its rapidly increasing user base, indicating the importance
of its open editorial policy for its current success. We also find that (3) the
number of updates made to the Wikipedia articles exhibits a power-law
distribution, but the distribution is less skewed than those obtained from other
studies.
Speaker: Hamid Pirahesh, IBM
Title: Transforming Information Management and Integration for Enterprise Web
Date: March 2, 2007
Time: 12:30-2:00pm
Room*: BH 6426
*Note the temporary change of location
Abstract
Information management is going through a fundamental change, influenced by
(1) Web 2.0, information integration with service-oriented architecture and
the deep web, (2) the web search paradigm, and (3) the convergence of structured,
semi-structured (XML), and unstructured data in the context of semantically rich
data objects. This change is affecting the data model and how data objects are
consumed by classic DB users, and business process/search oriented users. Web
scale solutions require new approaches to integration and information
composition, such as Web 2.0 mashups, and Situational Applications (i.e.
applications that come together for solving some immediate business problems).
Continuous integration and the scale of the web require continuous discovery of
information from unstructured and structured data sources. I will present
contributions of several projects at IBM Research, including InfoSphere and
Avatar, addressing this problem in the context of semi-structured and
unstructured data. I will describe the key features of the XML DB project that
aim at supporting the modern information management systems (e.g., supporting
the schema chaos model).
Speaker: Snehal Thakkar, USC
Title: Quality Driven Geo-spatial Data Integration
Date: March 9, 2007
Time: 12:30-2:00pm
Room: BH 4549
Abstract
Accurate and efficient integration of geospatial data is an important
challenge in many applications. Previous research has enabled the
integration of geospatial data with different access methods and
formats. However, the existing geospatial data integration frameworks do
not address the key issue of the quality of the integrated data. In this
talk, I will describe a framework for quality-driven geospatial data
integration. In particular, I will focus on representing quality of
data provided by geospatial sources and conflation operations in a data
integration system. I will also describe a reformulation algorithm to
dynamically generate integration plans that provide high-quality
geospatial data for the user queries.
Speaker: Raymond Pon
Title: iScore: Measuring the Interestingness of Articles in a
Limited User Environment
Date: March 16, 2007
Time: 12:30-1:15pm
Room: BH 4549
Abstract
Search engines, such as Google, assign scores to news articles based on
their relevancy to a query. However, not all relevant articles for the
query may be interesting to a user. For example, if the article is old
or yields little new information, the article would be uninteresting.
Relevancy scores do not take into account what makes an article interesting,
which would vary from user to user. Although methods such as collaborative
filtering have been shown to be effective in recommendation systems, in a
limited user environment there are too few users to make
collaborative filtering effective. We present a general framework for
defining and measuring the "interestingness" of articles, incorporating
user feedback. We show a 21% improvement over traditional IR methods.