3/5/25 Seminar: "The Object Bagplot for Non-Euclidean Spaces: A Visualization and Outlier Detection Tool for Hyperbolic Data"

The CTML Seminar Series continues on March 5th! Join us for an exciting talk on "The Object Bagplot for Non-Euclidean Spaces: A Visualization and Outlier Detection Tool for Hyperbolic Data," given by CTML GSR Andy Kim. The talk will take place from 12:00 PM to 1:00 PM at Berkeley Way West, 5th Floor, Room 5401.

Exploratory data analysis in non-Euclidean spaces remains underdeveloped, despite the growing importance of these spaces in modern machine learning. In particular, hyperbolic space offers a useful framework for efficiently embedding hierarchical, tree-like, or otherwise highly structured data, such as the data that arise in network analysis, natural language processing, and phylogenetics. Because it allows more compact representations, hyperbolic space preserves latent hierarchies and supports more efficient distance-based computations than Euclidean embeddings do. However, methods for visualizing and interpreting data in such non-Euclidean spaces are limited.

In this CTML talk, I introduce a general method for constructing the object bagplot in non-Euclidean spaces, focusing on hyperbolic geometry as a setting where the method is particularly effective. Building on the approach of Dai et al., the object bagplot uses metric halfspace depth and Riemannian geometry to identify typical and extreme observations, facilitating outlier detection, feature selection, and interpretation of complex data distributions. By extending depth-based visualization tools from Euclidean spaces (boxplots and bagplots) to non-Euclidean spaces, this approach offers a practical and intuitive framework for analyzing structured data in complex geometric spaces, supporting better insight and decision-making in applications where such methodology is scarce.
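For readers new to metric halfspace depth, the following is a minimal sketch of an empirical version on hyperbolic data, not the speaker's implementation. It assumes the Poincaré ball model of hyperbolic space (curvature -1) and the metric halfspace definition H(x1, x2) = {z : d(z, x1) ≤ d(z, x2)}, with the pair (x1, x2) restricted to sample points; the depth of a point is the smallest fraction of the sample lying in any such halfspace that contains it. The function names are hypothetical, and the bag and fence construction of the object bagplot itself is not reproduced.

```python
import math
from itertools import permutations


def poincare_distance(u, v):
    """Geodesic distance between two points of the Poincare ball (curvature -1)."""
    nu = sum(a * a for a in u)
    nv = sum(b * b for b in v)
    if nu >= 1.0 or nv >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * diff2 / ((1.0 - nu) * (1.0 - nv)))


def metric_halfspace_depth(x, sample, dist=poincare_distance):
    """Empirical metric halfspace depth of x with respect to `sample` (hypothetical helper).

    Assumed halfspaces: H(x1, x2) = {z : dist(z, x1) <= dist(z, x2)}, with the
    pair (x1, x2) drawn from the sample. Returns the smallest sample fraction in
    any such halfspace containing x. Brute force: O(n^3) distance evaluations.
    """
    n = len(sample)
    depth = 1.0
    for i, j in permutations(range(n), 2):
        x1, x2 = sample[i], sample[j]
        if dist(x, x1) <= dist(x, x2):  # x lies in H(x1, x2)
            mass = sum(1 for z in sample if dist(z, x1) <= dist(z, x2)) / n
            depth = min(depth, mass)
    return depth


if __name__ == "__main__":
    # Toy data: a tight cluster near the origin plus one point near the boundary.
    sample = [(0.0, 0.0), (0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1), (0.9, 0.0)]
    for point in sample:
        print(point, round(metric_halfspace_depth(point, sample), 3))
```

Depth gives a center-outward ordering: the point at the origin comes out deepest, while the point near the boundary receives the minimum possible depth 1/n. Peripheral cluster points can share that minimum, which is why a bagplot relies on its bag and fence, rather than raw depth alone, to separate genuine outliers.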