The demo code is available as part of the Bop term rewriting language.
J.A. Goldman, D.S. Parker, W.W. Chu
In this paper we take a real-world application from a text database and present a case history. The techniques ultimately led to a discovery contradicting an accepted paradigm in seismology. Using simple, tailored keyword extraction, we examined a text collection of earthquake data. A discovery was made when an unusual pattern emerged from the text. We then tested a more comprehensive numerical database, treating the text discovery as a hypothesis. It was verified using a standard chi-squared statistic. The hypothesis was that significant earthquakes in the longitude region that includes California occur more often in the morning hours than at any other time of day.
To appear, Proceedings of the 9th International Conference on Scientific and Statistical Database Management (SSDBM'97), September 1997.
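The chi-squared verification described above can be sketched as follows. This is a minimal illustration of a standard chi-squared goodness-of-fit test against a uniform time-of-day distribution; the bin counts here are invented for illustration and are not the paper's data.

```python
# Chi-squared goodness-of-fit test: are event counts uniform across
# time-of-day bins? (Illustrative sketch; the counts are hypothetical.)

def chi_squared_uniform(counts):
    """Chi-squared statistic against a uniform expected distribution."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

# Hypothetical counts of significant earthquakes in six 4-hour bins,
# with an apparent excess in the morning bins.
counts = [40, 55, 30, 25, 28, 22]
stat = chi_squared_uniform(counts)
# Compare stat against the chi-squared critical value with
# len(counts) - 1 = 5 degrees of freedom (11.07 at the 5% level);
# a larger statistic rejects the uniform (no-preferred-hour) hypothesis.
```

A perfectly uniform set of counts yields a statistic of zero; the farther the counts deviate from uniformity, the larger the statistic.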
This project was originally motivated by the need to develop a foundation for constructing database systems that manipulate streams. Such a project is inherently more complex than that for ordinary database systems, because a comprehensive model of stream data processing inherently requires explicit handling of computation. What may seem a simple job at first glance becomes quite involved as one gets down to specifics.
Much of the burden of mastering any vast information base comes from the challenge of becoming familiar with it. Information filtering and exploration tools are necessary. A common approach is to impose only a limited structure on the information base, and develop a `query language' for browsing the resulting structure.
We have concentrated on studying fundamental issues for systems handling streams and sets, but we have also considered graphs and arrays. The results of this work range from query languages to system architectures incorporating evaluation of expressions as a primitive.
appeared in: L. Sterling, ed., The Practice of Prolog, Cambridge, MA: MIT Press, 1990.
Today many applications routinely generate large quantities of data. The data often takes the form of a time series, or more generally just a stream -- an ordered sequence of records. Analysis of this data requires stream processing techniques, which differ in significant ways from what current database query languages and statistical analysis tools support. There is a real need for better stream data analysis systems.
Stream analysis, like most data analysis, is best done in a way that permits interactive exploration. It must support `ad hoc' queries by a user, and these queries should be easy to formulate and run. It seems, then, that stream data analysis is best done in some kind of powerful programming environment.
A natural approach here is to analyze data with the stream processing paradigm of transducers (functional transformations) on streams. Data analyzers can be composed from collections of functional operators (transducers) that transform input data streams to output streams. A modular, extensible, easy-to-use library of transducers can be combined in arbitrary ways to answer stream data analysis queries of interest.
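The transducer paradigm can be illustrated with Python generators (an analogy only, not the Tangram or Log(F) implementation): each transducer lazily consumes an input stream and yields an output stream, and transducers compose by nesting. The operator names below are illustrative, not an actual library API.

```python
# Sketch of the transducer-on-streams paradigm using Python generators.
# Each transducer maps an input stream to an output stream; composition
# is function nesting, and evaluation is lazy.

def scale(stream, factor):
    """Transducer: multiply each element of the stream by factor."""
    for x in stream:
        yield x * factor

def moving_sum(stream, width):
    """Transducer: sliding-window sum over the stream."""
    window = []
    for x in stream:
        window.append(x)
        if len(window) > width:
            window.pop(0)
        if len(window) == width:
            yield sum(window)

def take(stream, n):
    """Transducer: pass through only the first n elements."""
    for i, x in enumerate(stream):
        if i >= n:
            break
        yield x

def naturals():
    """An unbounded input stream: 0, 1, 2, ..."""
    i = 0
    while True:
        yield i
        i += 1

# Because evaluation is lazy, the pipeline works even though the
# underlying input stream is infinite.
result = list(take(moving_sum(scale(naturals(), 2), 3), 4))
# scaled stream is 0, 2, 4, 6, 8, ...; width-3 sums start 6, 12, 18, 24
```

Laziness is what makes this composition practical: no transducer demands more of its input than the downstream consumer requires, so unbounded streams pose no difficulty.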
Prolog offers an excellent start for an interactive data analysis programming environment. However, most Prolog systems have limitations that make development of real stream data analysis applications challenging.
We describe an approach for doing stream data analysis that has been taken in the Tangram project at UCLA. Transducers are implemented not directly in Prolog, but in a functional language called Log(F) that can be translated to Prolog. Many stream processing programs are easy to develop this way. A by-product of our approach is a practical way to interface Prolog and database systems.