To subscribe to the bd-ucla mailing list for seminar announcements, please visit this page.

BD-UCLA (Big Data - UCLA, formerly DB-UCLA) Seminar: Current Schedule

Time: 12:00pm-1:00pm Fridays; Room: 3551P Boelter Hall

*To invite a guest speaker or to schedule a talk, contact Shi Gao (gaoshi at cs dot ucla dot edu)

Fall 2014
Date Speaker Title
09/26   Prof. Maurizio Atzori - By-Example Structured Queries on Web Document Corpora
10/03
10/10
10/17
10/24
10/31
11/07
11/14   Bin Bi - Bayesian Modeling for Analyzing Online Content and Users: Learning to discover high-quality information for web users
11/21
11/28 Holiday
12/05
12/12
12/19

Spring 2014
Date Speaker Title
04/04   Prof. Mark S. Handcock - Modeling networks when data is missing or sampled
04/11
04/18
04/25   Prof. Carlo Zaniolo - The Declarative Imperative: A logic-based approach to better algorithms
05/02
05/09
05/16
05/23
05/30
06/06
06/13

Winter 2014
Date Speaker Title
01/10
01/17   Prof. Lixia Zhang - Evolving Internet into the Future via Named Data Networking (part 2)
01/24
01/31   Prof. Lixia Zhang - Evolving Internet into the Future via Named Data Networking (part 3)
02/07
02/14
02/21   Prof. Alcino Silva - Principles behind ResearchMaps: a web tool for integrating and planning experiments in neuroscience
02/28   Prof. Ying Nian Wu - Learning Generative Models for Natural Image Patterns
03/07   Prof. Stott Parker - Hypothesis Exploration across Disciplines
03/14   Prof. Alan L. Yuille - Computer Vision meets Big Data: Complexity and Compositionality
03/21

Bayesian Modeling for Analyzing Online Content and Users: Learning to discover high-quality information for web users

Speaker:

Bin Bi

Abstract:

The immense scale of the Web has turned it into a huge content repository. Web users seek content of interest primarily through search engines and social media. The sheer amount of online content, ranging from producer-generated to user-generated, varies greatly in quality, which often results in confusion, suboptimal decisions, or dissatisfaction with the choices users make. In this talk, I will discuss my research, which follows two routes to address this problem: (1) learning to discover high-quality content and deliver it to users, and (2) learning to identify domain authorities who generate high-quality content, so users can obtain quality content from these authorities.

This talk will focus on two pieces of my work on Bayesian modeling for identifying authorities who create high-quality user-generated content. Two novel generative models, FLDA and TAA, were designed for two different domains of social media. The FLDA model discovers key influencers on microblogs such as Twitter that are specific to a given query topic; it analyzes the topic-specific social influence of microblog users by jointly modeling their tweet content and follow relationships. TAA, on the other hand, identifies topic-specific authorities on content-sharing websites such as Flickr; this Bayesian model exploits both users' "like" clicks and tags to derive the quality of content on various topics. Finally, I will conclude the talk with a discussion of applying Bayesian generative modeling to other contexts.
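
The FLDA and TAA models are not specified in this abstract, so the sketch below only illustrates, in a generic way, what a Bayesian generative model of users, topics, and posts looks like (an LDA-style process simulated forward); every variable name and distributional choice here is an assumption for illustration, not the actual FLDA or TAA model.

```python
# A minimal, hypothetical sketch of an LDA-style generative process, meant
# only to illustrate "Bayesian generative modeling" in general.
# It is NOT the FLDA or TAA model from the talk.
import numpy as np

rng = np.random.default_rng(0)
n_topics, vocab_size, n_users, posts_per_user, words_per_post = 3, 50, 5, 4, 10

# Topic-word distributions and per-user topic mixtures (Dirichlet priors).
topic_word = rng.dirichlet(np.full(vocab_size, 0.1), size=n_topics)
user_topic = rng.dirichlet(np.full(n_topics, 0.5), size=n_users)

corpus = []
for u in range(n_users):
    for _ in range(posts_per_user):
        # Each word: draw a topic from the user's mixture, then a word from that topic.
        topics = rng.choice(n_topics, size=words_per_post, p=user_topic[u])
        words = [rng.choice(vocab_size, p=topic_word[z]) for z in topics]
        corpus.append((u, words))

# A real system would invert this process (e.g., Gibbs sampling or variational
# inference) to recover user_topic and topic_word from observed posts.
print(corpus[0])
```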

By-Example Structured Queries on Web Document Corpora

Speaker:

Prof. Maurizio Atzori, University of Cagliari

Abstract:

We will present a novel paradigm called By-Example Structured (BESt) Query for DBpedia and Semantic Web applications. BESt queries are based on the use of an example (i.e., an actual instance in the dataset) to start the process of querying. This greatly simplifies the introduction of constraints on datasets whose schema is unknown to the user, as is often the case with Semantic Web data. We will present SWiPE, a web interface that applies the BESt Query paradigm in order to query DBpedia. SWiPE makes Wikipedia's infobox fields editable, allowing the input of user constraints in the right context. We will also discuss performance and expressivity, formally characterizing the class of SPARQL queries that BESt queries can express. Finally, we will outline some future directions for extending BESt queries.
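
SWiPE itself is a web interface over DBpedia, but to make the by-example idea concrete, here is a small hypothetical Python sketch of how constraints entered into an infobox-like form might be translated into a SPARQL query against the public DBpedia endpoint. The best_query helper and the translation logic are illustrative assumptions, not SWiPE's actual implementation; only the DBpedia endpoint URL and the dbo:populationTotal property are real.

```python
# Hypothetical sketch: turning "edit an infobox field into a constraint"
# into a SPARQL query against DBpedia. Not SWiPE's actual code.
import requests

DBPEDIA_SPARQL = "http://dbpedia.org/sparql"

def best_query(constraints):
    """constraints: dict mapping a property URI to a filter expression written
    in terms of {var}, the variable bound to that property's value."""
    lines = []
    for i, (prop, cond) in enumerate(constraints.items()):
        var = f"?v{i}"
        lines.append(f"?s <{prop}> {var} . FILTER({cond.format(var=var)})")
    return "SELECT DISTINCT ?s WHERE { " + " ".join(lines) + " } LIMIT 20"

# Example: start from a city's infobox and keep only the population field,
# edited into the constraint "more than 3 million".
query = best_query({
    "http://dbpedia.org/ontology/populationTotal": "{var} > 3000000",
})

resp = requests.get(DBPEDIA_SPARQL,
                    params={"query": query,
                            "format": "application/sparql-results+json"})
for binding in resp.json()["results"]["bindings"]:
    print(binding["s"]["value"])
```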

Evolving Internet into the Future via Named Data Networking (part 2)

Speaker:

Prof. Lixia Zhang

Abstract:

The success of the TCP/IP protocol architecture has brought us an explosive growth of Internet applications. However, as applications operate in terms of data and more end points become mobile, it becomes increasingly difficult and inefficient to satisfy IP's requirement of determining exactly where (at which IP address) to find the desired data. The Named Data Networking (NDN) project aims to carry the Internet into the future through a conceptually simple yet transformational architectural shift, from today's focus on where -- addresses and hosts -- to what -- the data that users and applications care about. By naming data instead of its locations, NDN turns data into first-class entities, enabling security to be applied to the data itself rather than to its containers, as well as radically scalable communication mechanisms such as multicast delivery and in-network storage.
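
As a purely conceptual illustration of the shift from "where" to "what" described above (this is not the NDN codebase, packet format, or API), the following Python sketch models a content store that answers requests by hierarchical data name rather than by host address; the ContentStore class and all names in it are hypothetical.

```python
# Conceptual sketch of name-based retrieval: consumers ask for data by name,
# and any node holding a copy can answer. Not the actual NDN implementation.

class ContentStore:
    def __init__(self):
        self.store = {}          # exact data name -> content bytes

    def publish(self, name, content):
        self.store[name] = content

    def satisfy_interest(self, name_prefix):
        """Return any cached data whose name falls under the requested prefix."""
        for name, content in self.store.items():
            if name == name_prefix or name.startswith(name_prefix + "/"):
                return name, content
        return None

# Two caches anywhere in the network can hold the same named data;
# the consumer never needs to know which host it came from.
edge_cache = ContentStore()
edge_cache.publish("/ucla/cs/seminar/2014-01-17/slides", b"...pdf bytes...")

hit = edge_cache.satisfy_interest("/ucla/cs/seminar/2014-01-17")
print(hit[0] if hit else "forward the request upstream")
```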

Evolving Internet into the Future via Named Data Networking (part 3)

Speaker:

Prof. Lixia Zhang

Abstract:

(The abstract is the same as for part 2 above.)

Principles behind ResearchMaps: a web tool for integrating and planning experiments in neuroscience

Speaker:

Prof. Alcino Silva

Abstract:

The growth of the biological literature in the last 30 years has been astronomical. The National Library of Medicine now includes more than 20 million articles; our own discipline (neuroscience) alone includes nearly two million research articles describing an estimated 15 million experiments, most published in the last 20 years. There is therefore a great need to develop maps (simplified abstractions) of published information that can be used to characterize what is known and to guide research decisions. With that goal in mind, our laboratory developed a web tool, ResearchMaps, to help biologists integrate and plan experiments. I will discuss the principles behind this web tool, including the concept of weighted causal networks, with the hope of establishing collaborations that could accelerate its development.
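
As one hypothetical way to picture a weighted causal network (not ResearchMaps' actual data model or scoring scheme), the sketch below stores directed edges between biological phenomena whose weights accumulate as supporting or contradicting experiments are added; the class name, weights, and example entries are all illustrative.

```python
# Hypothetical sketch of a weighted causal network: edges between phenomena
# accumulate evidence weights from individual experiments.
# This is an illustration, not ResearchMaps' actual data model.
from collections import defaultdict

class CausalMap:
    def __init__(self):
        # (agent, target) -> net evidence weight (positive supports "agent promotes target")
        self.edges = defaultdict(float)

    def add_experiment(self, agent, target, supports, weight=1.0):
        self.edges[(agent, target)] += weight if supports else -weight

    def summary(self):
        for (agent, target), w in sorted(self.edges.items()):
            relation = "promotes" if w > 0 else "opposes" if w < 0 else "no net evidence"
            print(f"{agent} -> {target}: {relation} (net weight {w:+.1f})")

m = CausalMap()
m.add_experiment("CREB activation", "long-term memory", supports=True, weight=2.0)
m.add_experiment("CREB activation", "long-term memory", supports=True, weight=1.0)
m.add_experiment("CREB activation", "long-term memory", supports=False, weight=1.5)
m.summary()
```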

Learning Generative Models for Natural Image Patterns

Speaker:

Prof. Ying Nian Wu

Abstract:

Images of natural scenes contain rich varieties of patterns. Knowledge of these image patterns can be represented by statistical models that can generate such patterns. Such generative models can be learned from training images with minimal supervision, and the learned models can be useful for object recognition and scene understanding. In this talk, I shall present our recent work on a class of generative models of object patterns and explain their connections to sparse linear regression and Markov random fields. This talk is based on joint work with Jianwen Xie, Wenze Hu and Song-Chun Zhu.
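
The abstract mentions a connection to sparse linear regression. As a rough, generic illustration of that idea (not the generative models from the talk), the sketch below approximates an image patch as a sparse combination of a few dictionary elements chosen greedily, in the spirit of matching pursuit; the random dictionary and patch are stand-ins for learned bases and real image data.

```python
# Rough illustration of sparse linear regression on an image patch:
# greedily pick a few dictionary elements (matching-pursuit style).
# Not the specific generative models discussed in the talk.
import numpy as np

rng = np.random.default_rng(1)
patch_dim, n_basis, n_selected = 64, 256, 5

# Random dictionary of unit-norm basis functions (one per column).
D = rng.standard_normal((patch_dim, n_basis))
D /= np.linalg.norm(D, axis=0)

patch = rng.standard_normal(patch_dim)   # stand-in for a real 8x8 image patch

residual, chosen = patch.copy(), []
for _ in range(n_selected):
    k = int(np.argmax(np.abs(D.T @ residual)))          # best-matching basis
    chosen.append(k)
    coeffs, *_ = np.linalg.lstsq(D[:, chosen], patch, rcond=None)
    residual = patch - D[:, chosen] @ coeffs             # re-fit and update residual

print("selected bases:", chosen)
print("relative reconstruction error:",
      np.linalg.norm(residual) / np.linalg.norm(patch))
```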

Hypothesis Exploration across Disciplines

Speaker:

Prof. Stott Parker

Abstract:

A consequence of the abundance of data of all forms is that scientific research efforts are increasingly cutting across disciplines. Interdisciplinary research is difficult for many reasons, but among these are the difficulties of analyzing heterogeneous data and the lack of methods for collaborative construction of hypotheses. This is particularly true in fields like neuroscience, where the data is complex and ranges over many orders of magnitude in scale --- and no single individual can hope to master it all.

In this talk I describe a system for exploring hypotheses in phenotype data, implemented with a database obtained from several studies at UCLA. ViVA is a web-based system for analyzing hypotheses about variance structure, permitting exploratory analysis with generalized linear models (GLMs). It permits visual identification of phenotype profiles (patterns of values across phenotypes) that characterize groups (subpopulations), and includes a variety of methods for visualizing variance. Visualization supports interdisciplinary collaboration and enables screening and refinement of hypotheses about sets of phenotypes. With several examples, we illustrate how this approach supports "natural selection" on a pool of hypotheses and permits deeper understanding of the statistical architecture of the data.

ViVA was designed for investigation of data concerning the biological bases of traits such as memory and response inhibition phenotypes --- to explore whether they can aid in moving from traditional categorical approaches for psychiatric syndromes towards more quantitative approaches based on large-scale analysis of the space of human variation. The hypotheses and data are increasingly trans-disciplinary and sophisticated, and the impact of better methods can be enormous.
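
As a small synthetic example of the kind of exploratory GLM analysis of variance structure described above (this is not ViVA itself nor the UCLA study data), the sketch below fits a Gaussian GLM to a made-up phenotype and then inspects the variance by group; the group labels and effect sizes are invented for illustration.

```python
# Hypothetical sketch of exploratory GLM analysis of a phenotype;
# synthetic data, not ViVA or the studies it was built on.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 200
group = rng.choice(["control", "case"], size=n)
# The two groups differ in mean AND in variance of the phenotype.
phenotype = np.where(group == "case",
                     rng.normal(1.0, 2.0, size=n),
                     rng.normal(0.0, 1.0, size=n))
df = pd.DataFrame({"phenotype": phenotype, "group": group})

# A Gaussian GLM for the group effect on the mean...
model = smf.glm("phenotype ~ group", data=df).fit()
print(model.summary().tables[1])

# ...plus a direct look at the variance structure by group,
# the kind of pattern the abstract describes visualizing.
print(df.groupby("group")["phenotype"].var())
```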

Computer Vision meets Big Data: Complexity and Compositionality

Speaker:

Prof. Alan L. Yuille

Abstract:

Big data arises naturally in computer vision because of the enormous number and variety of images and the large range of visual tasks that we want to perform on them. Computer vision researchers must pay increasing attention to complexity issues as they develop algorithms that work on large image datasets. This talk has two parts. The first part describes practical issues that arise when working with large datasets such as PASCAL and ImageNet. These include efficient algorithms, parallel implementations (e.g., GPUs), and special-purpose hardware. The second part describes theoretical work that addresses arguably the fundamental problem of vision --- how can a visual system store (represent), rapidly access (do inference), and learn the enormous number and variety of objects -- and configurations of objects -- that occur in the world? We propose and analyze a simplified hierarchical compositional model that can address many of these issues, and which may relate to the structure of the human visual system.
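
To make the compositionality argument concrete in a toy way (an illustration only, not the hierarchical compositional model analyzed in the talk), the sketch below shows how a small vocabulary of shared parts can be recombined into many distinct objects, so that storage grows with the number of parts rather than with the number of object configurations; the part names are invented.

```python
# Toy illustration of part sharing in a compositional representation:
# many objects are encoded as combinations of a small shared part vocabulary.
# Not the specific model analyzed in the talk.
from itertools import combinations

parts = ["horizontal edge", "vertical edge", "corner", "curve", "T-junction", "blob"]

# Every "object" here is just an unordered set of three parts.
objects = {f"object_{i}": combo for i, combo in enumerate(combinations(parts, 3))}

print(f"{len(parts)} shared parts compose {len(objects)} distinct 3-part objects")
print("object_0 =", objects["object_0"])
```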

Modeling networks when data is missing or sampled

Speaker:

Prof. Mark S. Handcock

Abstract:

Network models are widely used to represent relational information among interacting units and the structural implications of these relations. Recently, social network studies have focused a great deal of attention on random graph models of networks whose nodes represent individual social actors and whose edges represent a specified relationship between the actors.

Most inference for social network models assumes that the presence or absence of all possible links is observed, that the information is completely reliable, and that there are no measurement (e.g. recording) errors. This is clearly not true in practice, as much network data is collected through sample surveys. In addition, even if a census of a population is attempted, individuals and links between individuals are missed (i.e., do not appear in the recorded data).

In this talk we develop the conceptual and computational theory for inference based on partially observed network information. We first review forms of network sampling designs used in practice. We consider inference within the likelihood framework, and develop a typology of network data that reflects how each form is treated within this framework. We then develop inference for social network models based on information from adaptive network designs.

We motivate and illustrate these ideas by analyzing the effect of link-tracing sampling designs on a collaboration network, and of missing data in a friendship network among adolescents.
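
As a small synthetic illustration of why sampled network data calls for design-aware inference (this is not the authors' estimator or data), the sketch below compares the true mean degree of a random graph with the naive estimate obtained from a one-wave link-tracing sample, which over-represents high-degree nodes; the graph size and sampling parameters are arbitrary.

```python
# Synthetic illustration: naive estimates from a link-traced sample can be
# biased relative to the full network. Not the estimators from the talk.
import random

random.seed(3)
n, p = 500, 0.02

# Erdos-Renyi-style random graph stored as an adjacency list.
adj = {i: set() for i in range(n)}
for i in range(n):
    for j in range(i + 1, n):
        if random.random() < p:
            adj[i].add(j)
            adj[j].add(i)

true_mean_degree = sum(len(v) for v in adj.values()) / n

# One-wave link tracing: sample seed nodes, then include everyone they link to.
seeds = random.sample(range(n), 25)
sampled = set(seeds) | {j for i in seeds for j in adj[i]}
naive_mean_degree = sum(len(adj[i]) for i in sampled) / len(sampled)

print(f"true mean degree: {true_mean_degree:.2f}")
print(f"naive estimate:   {naive_mean_degree:.2f} "
      "(link tracing tends to over-sample high-degree nodes)")
```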

This is joint work with Krista J. Gile (University of Massachusetts, Amherst) and Ian Fellows (University of California, Los Angeles).

The Declarative Imperative: A logic-based approach to better algorithms

Speaker:

Prof. Carlo Zaniolo

Abstract:

The rise of multicore processors and cloud computing is putting enormous pressure on the software community to find solutions to the difficulty of parallel and distributed programming. At the same time, there is more---and more varied---interest in data-centric programming languages than at any time in computing history, in part because these languages parallelize naturally. This juxtaposition raises the possibility that the theory of declarative database query languages can provide a foundation for the next generation of parallel and distributed programming languages. Our recent Datalog-inspired results show that logic can lead to new, more efficient big-data algorithms over a wide range of computing platforms.
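
The talk's theme is that declarative, Datalog-style logic can drive efficient algorithms. As a small generic illustration of that fixpoint style of evaluation (a textbook example, not a result from the talk), here is a Python sketch that computes the transitive closure of an edge relation by semi-naive iteration; the reach/edge relation names are the usual textbook ones, not anything specific to this work.

```python
# Generic illustration of Datalog-style evaluation: semi-naive fixpoint
# computation of transitive closure.
#   reach(X, Y) :- edge(X, Y).
#   reach(X, Y) :- reach(X, Z), edge(Z, Y).
edges = {("a", "b"), ("b", "c"), ("c", "d"), ("b", "e")}

reach = set(edges)          # rule 1: base facts
delta = set(edges)          # facts derived in the last iteration
while delta:
    # rule 2: join only the *new* facts with edge/2 (semi-naive evaluation)
    new = {(x, w) for (x, y) in delta for (z, w) in edges if y == z} - reach
    reach |= new
    delta = new

print(sorted(reach))
```
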
University of California, Los Angeles, Computer Science Department. 2014.
