Variance-based Clustering Methods and Higher Order Data Transformations and Their Applications

preview-18
  • Variance-based Clustering Methods and Higher Order Data Transformations and Their Applications Book Detail

  • Author : Nikita I. Lytkin
  • Release Date : 2009
  • Publisher :
  • Genre : Cluster analysis
  • Pages : 83
  • ISBN 13 :
  • File Size : 85,85 MB

Variance-based Clustering Methods and Higher Order Data Transformations and Their Applications by Nikita I. Lytkin PDF Summary

Book Description: Two approaches have been proposed in statistical and machine learning communities in order to address the problem of uncovering clusters with complex structure. One approach relies on the development of clustering criteria that are able to accommodate increasingly complex characteristics of the data. The other approach is based on simplification of structure of data by mapping it to a different feature space via a non-linear function and then clustering in the new space. This dissertation covers three related studies: development of a novel multi-dimensional clustering method, development of non-linear mapping functions that leverage higher-order co-occurrences between features in boolean data, and applications of these mapping functions for improving the performance of clustering methods. In particular, we treat clustering as a combinatorial optimization problem of finding a partition of the data so as to minimize a certain criterion. We develop a novel multi-dimensional clustering method based on a statistically-motivated criterion proposed by J. Neyman for stratified sampling from one-dimensional data. We show that this criterion is more reflective of the underlying data structure than the seemingly similar K-means criterion when second order variability is not homogeneous between constituent subgroups. Furthermore, experimental results demonstrate that generalization of the Neyman's criterion to multi-dimensional spaces and development of the associated clustering algorithm allow for statistically efficient estimation of the grand mean vector of a population. In the framework of the mapping-based approach to discovering complex cluster structures, we introduced a novel adaptive non-linear data transformation termed Unsupervised Second Order Transformation (USOT). The novelties behind USOT are (a) that it leverages in a unsupervised manner, higher-order co-occurrences between features in boolean data, and (b) that it considers each feature in the context of probabilistic relationships with other features. In addition, USOT has two desirable properties. USOT adaptively selects features that would influence the mapping of a given feature, and preserves the interpretability of dimensions of the transformed space. Experimental results on text corpora and financial time series demonstrate that by leveraging higher-order co-occurrences between features, clustering methods achieved statistically significant improvements in USOT space over the original boolean space.

Disclaimer: www.yourbookbest.com does not own Variance-based Clustering Methods and Higher Order Data Transformations and Their Applications books pdf, neither created or scanned. We just provide the link that is already available on the internet, public domain and in Google Drive. If any way it violates the law or has any issues, then kindly mail us via contact us page to request the removal of the link.