Self-Supervised Learning with Gaussian Processes

Self-supervised learning (SSL) is a machine learning paradigm in which models learn the underlying structure of data without supervision from labeled examples. Representations learned with SSL have proven useful for many downstream tasks, including classification and regression. To ensure smoothness of the representation space, many SSL methods rely on the ability to generate pairs of augmented views of the same instance. However, generating such pairs can be challenging for many types of data. Furthermore, these methods lack principled uncertainty quantification and may perform poorly in out-of-sample prediction settings. To address these limitations, we propose Gaussian process self-supervised learning (GPSSL), a new approach that uses Gaussian process (GP) models for representation learning. GP priors are placed on the representations, and a generalized Bayesian posterior is obtained by minimizing a loss function that encourages informative representations. The GP covariance function naturally pulls the representations of similar instances together, serving as an alternative to explicitly constructed augmented pairs. We show that GPSSL is closely related to both kernel PCA and VICReg, a popular neural-network SSL method, but unlike either it provides posterior uncertainty that can be propagated to downstream tasks. Experiments on a variety of datasets, covering both classification and regression downstream tasks, show that GPSSL outperforms conventional methods in terms of accuracy, uncertainty quantification, and error control.
† Johns Hopkins University
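
The abstract does not specify the exact form of the GPSSL loss, but its central mechanism, a zero-mean GP prior over each representation dimension whose input kernel couples the representations of similar instances, can be sketched. The following is a minimal illustration under assumed choices, not the authors' implementation: the RBF kernel and the names rbf_kernel and gp_prior_nll are assumptions introduced here for exposition.

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0):
    """Squared-exponential kernel on the inputs:
    K[i, j] = exp(-||x_i - x_j||^2 / (2 * lengthscale^2))."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(X**2, axis=1)[None, :]
        - 2.0 * X @ X.T
    )
    return np.exp(-sq_dists / (2.0 * lengthscale**2))

def gp_prior_nll(Z, K, jitter=1e-6):
    """Negative log-density (up to an additive constant) of representations
    Z (n x d) under independent zero-mean GP priors sharing the input
    kernel K (n x n). Representations that differ sharply between inputs
    the kernel deems similar are penalized, which is what pulls the
    representations of similar instances together without augmented pairs.
    """
    n, d = Z.shape
    L = np.linalg.cholesky(K + jitter * np.eye(n))  # jitter for stability
    alpha = np.linalg.solve(L, Z)   # L^{-1} Z, so sum(alpha**2) = tr(Z^T K^{-1} Z)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return 0.5 * np.sum(alpha**2) + 0.5 * d * logdet

# Toy usage on random inputs (shapes are illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))           # 50 inputs in 5 dimensions
Z = rng.normal(size=(50, 3))           # candidate 3-dimensional representations
print(gp_prior_nll(Z, rbf_kernel(X)))  # a prior term a GPSSL-style loss could include
```

In a generalized-Bayes construction of the kind the abstract describes, a term like this GP-prior penalty would be combined with a loss encouraging informative (e.g., high-variance, decorrelated) representations; the specific combination used by GPSSL is given in the paper itself.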



