Join us for our Computer Science and Statistics Colloquium Series.
When: Friday, Oct 28th, 4PM
Where: Beaupre 105
Speaker: Dr. Noah M. Daniels, URI
Title: Manifold Mapping Enables Fast Search, Anomaly Detection, and More
Most likely, you have heard the term “Big Data.” The world has been experiencing explosive growth in data, which can mean either that there is a lot *more* data or that each datapoint is richer (or higher dimensional). The classical “curse of dimensionality” suggests that this growth makes it more difficult for us to derive useful insights from data.
I introduce the notion of “manifold mapping” for large, high-dimensional datasets. Rather than reducing dimensionality, manifold mapping relies on clustering data in the metric space in which it is already embedded, while paying attention to particular geometric and topological properties of the data. The resulting cluster tree enables a number of insights.
Fast search (both rho-nearest neighbors, also known as range search, and more recently k-nearest neighbors) can be achieved in sublinear time (in theory, the asymptotic complexity does not depend on the size of the dataset). Graphs induced from the manifold we map can allow simple anomaly detection that outperforms the state of the art. And in ongoing work, the manifold mapping approach appears promising for the development of algorithms for multiple sequence alignment in genomics, detection of adversarial inputs to machine learning systems, and data compression. The long-term goal is for this approach to help us better understand the nature of high-dimensional data arising from natural or artificial processes that are constrained in certain ways, such as evolution.
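For readers who would like a concrete picture of the range search mentioned above, here is a minimal sketch. It is not the speaker's algorithm or implementation. It assumes a simple binary cluster tree in which each cluster stores a center and a radius, and it uses the triangle inequality to skip clusters that cannot contain a hit. The names `Cluster`, `range_search`, and `min_size`, the choice of the first point as center, and the two-pole splitting rule are all hypothetical, chosen only for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance; any function obeying the triangle inequality would do."""
    return math.dist(a, b)


class Cluster:
    """A node in a hypothetical binary cluster tree over a metric space."""

    def __init__(self, points, metric, min_size=4):
        self.points = points
        self.center = points[0]  # assumption: arbitrary center, for simplicity
        self.radius = max(metric(self.center, p) for p in points)
        self.children = []
        if len(points) > min_size and self.radius > 0:
            # Pick two far-apart "poles" and assign each point to the nearer one.
            left_pole = max(points, key=lambda p: metric(self.center, p))
            right_pole = max(points, key=lambda p: metric(left_pole, p))
            left, right = [], []
            for p in points:
                if metric(p, left_pole) <= metric(p, right_pole):
                    left.append(p)
                else:
                    right.append(p)
            if left and right:
                self.children = [
                    Cluster(left, metric, min_size),
                    Cluster(right, metric, min_size),
                ]


def range_search(cluster, query, rho, metric):
    """Return every point in the tree within distance rho of query."""
    # Triangle inequality: if the query is farther than radius + rho from the
    # center, no point in this cluster can lie within rho of the query.
    if metric(query, cluster.center) > cluster.radius + rho:
        return []
    if not cluster.children:
        return [p for p in cluster.points if metric(query, p) <= rho]
    hits = []
    for child in cluster.children:
        hits.extend(range_search(child, query, rho, metric))
    return hits
```

Calling `range_search(Cluster(points, euclidean), query, rho, euclidean)` returns the same points as a brute-force scan; any speedup comes from the pruned clusters, and how much is pruned depends on the geometry of the data.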