Large scale sensor networks are being deployed for advanced civilian and military applications such as habitat monitoring, seismic monitoring, location tracking systems, etc. Although the sensors provide an unprecedented scale of access to monitor phenomena up front, their limited resources in terms of processing speed, communication power and memory render the conventional data management tools and techniques ineffective.
Distributed protocols and algorithms are being employed to store, process, and query high level semantically rich data from the raw and noisy sensor data. In this dissertation, I develop novel distributed techniques to mine and summarize spatio-temporal data in such a resource constrained sensor network.
In the first part of the thesis, I consider the important problem of object tracking, and present an efficient distributed index structure, to track deformable objects such as chemical plumes. The index structure localizes in both space and time, and answers spatio-temporal range queries with an adaptive query optimizer.
I next focus on extracting spatio-temporal correlations in sensors, via spatial clustering. I capture temporal correlation within a node using auto-regressive (AR) models, and then employ hierarchical spatial clustering techniques to aggregate sensors into clusters. Experimental results show that modeling reduces communication costs by an order of magnitude, whereas distributed algorithms reduce it by another order.
I then consider the problem of modeling high level semantic events from low level sensor signals. I transform numeric data into symbols and then model the resulting symbolic sequences using statistical models—Markov Chains (MCs) and Hidden Markov Models (HMMs). I then build a distributed index structure to answer range and nearest-neighbor queries over such sensor models.
Finally, I present a framework for minimizing the communication overhead of monitoring a global aggregate defined over the local attributes of a set of sensors. I design an optimal algorithm, when the local events are independent; whereas, when they are dependent, I show that the problem is NP-complete and develop efficient heuristics which adapt well to changing network conditions, and outperform competing techniques in terms of communication cost.