DREAM Project

Week 16
I completed the SPML model as well as a draft of the final report on my project. I will be meeting with Professor Verma next week to discuss my work.
I met with Professor Verma this week to discuss the MMC implementation, which I was able to complete with a soft constraint on the distances between dissimilar data points. Separately, we reviewed two papers on few-shot learning and delved into concepts such as Bregman divergences, Polya-Gamma augmentation and one-vs-each approximation to softmax, which I will further investigate for my own edification.
This week I met with Professor Verma to discuss the implementation of the MMC algorithm. I had worked on it over the past week but was stuck on implementing the objective function with its hard constraints. Professor Verma suggested that I change one of the constraints to a soft constraint, which allows me to rewrite the objective function as a Lagrangian and perform gradient descent. I plan to code up the simplified MMC before attempting to incorporate the hard constraints.
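As a rough sketch of this soft-constraint direction (my own illustration; the penalty weight, learning rate, and the L^T L parameterization are arbitrary choices rather than the exact formulation from the meeting), the hard requirement that dissimilar points stay far apart can be replaced by a quadratic penalty, and the whole objective minimized by plain gradient descent:

```python
import torch

def soft_mmc(X, similar_pairs, dissimilar_pairs, lam=10.0, steps=500, lr=1e-3):
    """Illustrative soft-constraint MMC; all hyperparameters are arbitrary."""
    d = X.shape[1]
    L = torch.eye(d, requires_grad=True)            # learns M = L^T L, PSD by construction
    opt = torch.optim.SGD([L], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # pull similar pairs together
        pull = sum(torch.sum((L @ (X[i] - X[j]))**2) for i, j in similar_pairs)
        # soft version of the constraint that dissimilar pairs stay at least 1 apart in total
        spread = sum(torch.norm(L @ (X[i] - X[j])) for i, j in dissimilar_pairs)
        loss = pull + lam * torch.clamp(1.0 - spread, min=0.0)**2
        loss.backward()
        opt.step()
    return (L.T @ L).detach()                       # learned Mahalanobis matrix M
```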
I met with Professor Verma this week to review my two attempts at formulating a structural integrity score. My first formulation, which I call the Cosine Score, is based on the sum of pairwise normalized dot products of the centered data points. While the math is intuitive, this method only works for two target clusters and breaks down for three or more.
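A rough sketch of the Cosine Score computation (the details here are my own reconstruction, so the exact weighting may differ from what I presented): center the data at the global mean, normalize each point, and sum the dot products between points from different target clusters. With two clusters the dissimilar pairs point in roughly opposite directions, so a more negative sum indicates better separation; with three or more clusters the cluster directions can no longer all be mutually opposed, which is where the score breaks down.

```python
import numpy as np

def cosine_score(X, labels):
    """Illustrative reconstruction of the Cosine Score; details are my own."""
    Z = X - X.mean(axis=0)                                      # center at the global mean
    Z = Z / (np.linalg.norm(Z, axis=1, keepdims=True) + 1e-12)  # normalize each point
    score = 0.0
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            if labels[i] != labels[j]:
                score += Z[i] @ Z[j]                            # normalized dot product
    return -score                                               # higher = better separated
```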
This week I met with Professor Verma to discuss the design of a structural integrity score that can accommodate Gaussian clusters as well as data that forms concentric circles. I proposed using Spectral Clustering and the Silhouette Coefficient to measure structural integrity, and was able to produce scores that are largely consistent with what visually appears correct. However, because the score depends on the hard cluster assignments returned by Spectral Clustering, the resulting loss function is non-differentiable.
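A minimal sketch of this approach using scikit-learn (the affinity choice is illustrative): the silhouette coefficient of the recovered clusters serves as the structural integrity score, and the hard cluster assignment is the non-differentiable step.

```python
from sklearn.cluster import SpectralClustering
from sklearn.metrics import silhouette_score

def structural_integrity(X, n_clusters):
    """Silhouette score of spectral-clustering labels; affinity choice is illustrative."""
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity="nearest_neighbors",
                                random_state=0).fit_predict(X)
    return silhouette_score(X, labels)     # hard labels make this non-differentiable
```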
This week I met with Professor Verma to review my formulation of the structural integrity score. He pointed out two errors with my formula: 1) Since it assigns a higher score to cluster means that are farther apart, it does not distinguish concentric clusters with different radii. 2) My use of the determinant of the covariance matrix does not accurately reflect the true spread of a cluster. Since the formula assigns a lower score to a cluster whose covariance matrix has a greater determinant, it would incorrectly assign a high score to a cluster that has no variance in one dimension and a large variance in another.
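The second issue is easy to see numerically: the determinant collapses to zero as soon as any single direction has no variance, regardless of how large the spread is elsewhere, so it is not a reliable measure of tightness.

```python
import numpy as np

flat_cov = np.diag([0.0, 100.0])   # no variance in x, huge variance in y
tight_cov = np.diag([0.5, 0.5])    # genuinely tight in both directions
print(np.linalg.det(flat_cov))     # 0.0  -> the formula would call this cluster "tight"
print(np.linalg.det(tight_cov))    # 0.25
```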
This week I implemented the structural integrity score on clusters generated by a Gaussian Mixture Model, and was able to assign a higher score to tight clusters than to wide clusters, and a higher score to means that are farther apart. While this provides an independent loss value for the structural integrity of the transformed data, I was stuck trying to combine it with the LMNN loss to form a differentiable loss function. Instead of using library functions for metric learning, I will try to write an algorithm that incorporates both the distance loss and the structural integrity score and is differentiable with respect to the transformation matrix.
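As a sketch of that direction (a stand-in of my own with an arbitrary weighting between the terms, not the final algorithm): if every term is written as a function of the transformed data, both the distance loss and the structure term stay differentiable with respect to the transformation matrix, so it can be optimized directly with automatic differentiation.

```python
import torch

def combined_loss(L, X, labels, alpha=1.0):
    """Illustrative stand-in: distance term plus structure term, differentiable in L."""
    Z = X @ L.T                                             # transformed data
    classes = labels.unique()
    means = torch.stack([Z[labels == c].mean(dim=0) for c in classes])
    # distance term: pull points toward their own cluster mean
    pull = sum(((Z[labels == c] - means[k])**2).sum() for k, c in enumerate(classes))
    # structure term: reward well-separated means and tight covariances
    sep = torch.pdist(means).sum()
    jitter = 1e-3 * torch.eye(Z.shape[1])                   # keeps logdet well-defined
    spread = sum(torch.logdet(torch.cov(Z[labels == c].T) + jitter) for c in classes)
    return pull - alpha * sep + alpha * spread
```

The transformation matrix can then be fit by ordinary gradient descent on this combined loss.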
I met with Professor Verma on July 6 to review the part of the loss function that reflects the structural integrity of the data, and received further guidance on why likelihood values cannot be compared across different models, and on how maximizing likelihood encourages the data to collapse onto a single mean with a narrow spread. Instead, I will first find the cluster means and covariances using a Gaussian Mixture Model or a Dirichlet Process Mixture Model (Chinese Restaurant Process), and then use the distances between the means and the narrowness of the spreads (as measured by the determinants of the covariance matrices) as the structural integrity score.
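A minimal sketch of this plan with scikit-learn (the way the two quantities are combined into a single number below, and the weight beta, are my own illustrative choices):

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.mixture import GaussianMixture

def gmm_integrity_score(X, n_components, beta=1.0):
    """Separation between fitted means minus the determinant-measured spread."""
    gmm = GaussianMixture(n_components=n_components, random_state=0).fit(X)
    separation = pdist(gmm.means_).sum()                      # farther-apart means -> higher
    spread = sum(np.linalg.det(c) for c in gmm.covariances_)  # tighter clusters -> lower
    return separation - beta * spread
```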
I met with Professor Verma on June 29 to discuss the problem on Gaussian Processes and my implementation of EM for GMM. I gained further clarity on the purpose of the GP problem, specifically on how kernels (and their corresponding covariance matrices) affect the shape of the functions, and on why the mean function is the best representation of the GP.
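A small illustration of the kernel point with scikit-learn (the toy data and kernel settings are made up): fitting the same noisy samples with an RBF kernel and with a periodic kernel yields noticeably different posterior mean functions, which is the covariance structure showing through.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ExpSineSquared

rng = np.random.default_rng(0)
X = np.linspace(0, 10, 20).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.normal(size=20)          # noisy samples of a sine wave
X_test = np.linspace(0, 10, 200).reshape(-1, 1)

for kernel in (RBF(length_scale=1.0),
               ExpSineSquared(length_scale=1.0, periodicity=2 * np.pi)):
    gp = GaussianProcessRegressor(kernel=kernel, alpha=0.01).fit(X, y)  # alpha = noise level
    mean = gp.predict(X_test)       # the posterior mean function, one per kernel
```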
I spent the past week working on the nonparametric regression problem assigned by Professor Verma, and gained a better understanding of multivariate Gaussians and of the effect of using kernel functions to define covariance matrices. I was stuck on the part where I had to generate a periodic posterior function, as I could not produce a positive semidefinite covariance matrix.
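For reference, the usual fix for the positive-semidefiniteness problem (this is the generic jitter trick, not something specific to the assignment): a periodic kernel matrix on a dense grid is often only PSD up to floating-point error, so a small jitter is added to the diagonal before factorizing.

```python
import numpy as np

def periodic_kernel(x1, x2, length_scale=1.0, period=2 * np.pi):
    """Standard exp-sine-squared (periodic) kernel."""
    diff = np.abs(x1[:, None] - x2[None, :])
    return np.exp(-2.0 * np.sin(np.pi * diff / period)**2 / length_scale**2)

x = np.linspace(0, 10, 200)
K = periodic_kernel(x, x)
# the jitter term keeps the Cholesky factorization from failing on near-zero eigenvalues
L = np.linalg.cholesky(K + 1e-8 * np.eye(len(x)))
```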
After reviewing papers on Gaussian Processes and on gradient descent without a gradient (Flaxman, 2008), I met with Professor Verma on June 15 and we discussed the methodologies and their applicability to the problem at hand. While a Gaussian Process may not be the right tool for clustering, its underlying theory is helpful to my understanding of regression and classification. Professor Verma assigned a problem from his ML class last fall to help me solidify my knowledge of Gaussian Processes.
I met with Professor Verma on June 8 to review my findings from the various readings on Bayesian Nonparametrics, and to discuss how it might help me define a loss function that captures the idea of cluster structure. I learned that a Bayesian nonparametric model can operate in an infinite-dimensional parameter space, where the number of parameters is allowed to grow with the data. To establish a prior distribution, a BN model can use a Dirichlet Process for a discrete distribution, and a Dirichlet Process Mixture Model for a continuous distribution. Some of these priors are described through analogies such as the Stick-Breaking Process or the Chinese Restaurant Process, which is a single-parameter distribution over partitions of the integers.
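A tiny simulation of the Chinese Restaurant Process (the concentration parameter below is chosen arbitrarily): each new customer joins an existing table with probability proportional to the table's size, or starts a new table with probability proportional to alpha, producing a random partition of the integers 1..n.

```python
import numpy as np

def chinese_restaurant_process(n, alpha, seed=0):
    """Sample a random partition of n customers; alpha is the concentration parameter."""
    rng = np.random.default_rng(seed)
    tables = []                                   # tables[k] = number of customers at table k
    assignments = []
    for _ in range(n):
        probs = np.array(tables + [alpha], dtype=float)
        probs /= probs.sum()                      # existing tables vs. a new table
        k = rng.choice(len(probs), p=probs)
        if k == len(tables):
            tables.append(1)                      # customer starts a new table
        else:
            tables[k] += 1                        # customer joins table k
        assignments.append(k)
    return assignments

partition = chinese_restaurant_process(10, alpha=1.0)   # a random partition of 10 customers
```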
I met with Professor Verma on May 30 to discuss my initial approach for developing a metric learning algorithm that retains the structure of the clustered data. After I presented several ideas involving the Mahalanobis distance and other linear transformations, it became clear that I was too focused on the metric learning portion of the problem and had neglected to address how I could incorporate structural integrity into the loss function. Professor Verma suggested that I read about Bayesian nonparametrics to get some ideas.
I met with Professor Verma on May 25 to continue discussing existing methodologies for metric learning and other relevant topics. This week we covered Siamese Neural Networks (survey by Li et al., 2022) and Hierarchical Similarity Metrics (Verma et al., 2012). An SNN is a pair of parallel neural nets that share the same structure and weights, and is used to learn similarity among inputs. HSM uses a nearest-neighbor framework to learn hierarchical metrics that reflect the known taxonomy structure, and provides improved classification accuracy and correct placement of unseen categories.
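A minimal sketch of the Siamese idea (the architecture and the contrastive loss below are generic illustrations, not taken from either paper): both inputs pass through the same encoder with shared weights, and the distance between the two embeddings is trained to be small for similar pairs and large for dissimilar ones.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseNet(nn.Module):
    """Two branches share one encoder, so they share structure and weights."""
    def __init__(self, in_dim, embed_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                     nn.Linear(64, embed_dim))

    def forward(self, x1, x2):
        return self.encoder(x1), self.encoder(x2)   # same weights applied to both inputs

def contrastive_loss(z1, z2, same_label, margin=1.0):
    """Small distance for similar pairs; dissimilar pairs pushed at least `margin` apart."""
    d = F.pairwise_distance(z1, z2)
    return torch.mean(same_label * d**2 +
                      (1 - same_label) * torch.clamp(margin - d, min=0.0)**2)
```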
I met with Professor Verma on May 18 as I continue to gather background information on metric learning. In addition to MMC and LMNN, we discussed the paper on Neighborhood Component Analysis (Goldberger et al., 2005), which learns a Mahalanobis distance for k-Nearest Neighbor classification, and t-Distributed Stochastic Neighbor Embedding (van der Maaten and Hinton, 2008), which converts high-dimensional data into a matrix of pairwise similarities and allows for visualization in two or three dimensions. To solidify my understanding of the underlying math, Professor Verma asked me to write out the derivation of the NCA cost function, and to provide a proof showing why a conic constraint for a positive semi-definite matrix is convex while a rank constraint for a square matrix is non-convex.
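A quick numerical illustration of the two facts to be proved: any convex combination of PSD matrices is still PSD (the PSD cone is convex), whereas a convex combination of two rank-1 matrices can jump to rank 2 (a rank constraint is not convex).

```python
import numpy as np

A = np.outer([1.0, 0.0], [1.0, 0.0])    # rank-1, PSD
B = np.outer([0.0, 1.0], [0.0, 1.0])    # rank-1, PSD
C = 0.5 * A + 0.5 * B                   # convex combination of the two

print(np.linalg.eigvalsh(C))            # all eigenvalues >= 0, so C is still PSD
print(np.linalg.matrix_rank(C))         # 2 -> the rank-1 constraint is violated
```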
I had my kick-off meeting with Professor Verma on May 10 to discuss the scope of my research project, which involves an exploration of a metric learning algorithm that separates data into pre-defined clusters but retains the structural integrity (to be defined) of each cluster. Since I have little background in metric learning, Professor Verma suggested that I read the Mahalanobis Metric for Clustering (Xing et al., 2002) and Large Margin Nearest Neighbor (Weinberger and Saul, 2009) papers to get a sense of some of the existing methodologies.