Center for Applied Statistics Presents
Applied Statistics Seminar Series
Tue, 10/16/2012, 11:00 AM—12:00 PM
5225 Math Sciences Bldg.
Kiri Wagstaff
Jet Propulsion Laboratory
Interactive Discovery in Large Data Sets
What is the best way to dive in and explore a new data set? I will discuss a new machine learning problem, iterative discovery, that seeks to enable users to interactively explore a large data set and quickly identify items of interest. Our solution employs an incremental Principal Components Analysis strategy to incorporate user feedback and to provide explanations for its selections. This makes it useful to mission scientists, especially those working with instruments that produce very high data volumes. I will share results of experiments with hyperspectral data from instruments on Mars orbiters and rovers, as well as with text data (log files) from a ground-based radio telescope.