Novel Approaches to Neighborhood Selection for Nonlinear Dimensionality Reduction

Date of Submission

December 2015

Date of Award

Winter 12-12-2016

Institute Name (Publisher)

Indian Statistical Institute

Document Type

Master's Dissertation

Degree Name

Master of Technology

Subject Name

Computer Science


Department Name

Electronics and Communication Sciences Unit (ECSU-Kolkata)


Supervisor

Das, Swagatam (ECSU-Kolkata; ISI)

Abstract (Summary of the Work)

This thesis improves nonlinear dimensionality reduction techniques through a better approach to neighbourhood selection, thereby improving the subsequent classification and clustering.

1.1 Motivation

Most real datasets have very high dimension, and computation on high-dimensional data is costly. Because such data are usually highly correlated, it is safe to assume that feature-extraction techniques can reduce the dimensions down to the relevant information. This thesis focuses on nonlinear datasets that have a high-dimensional feature space but can be embedded into a lower-dimensional one; embedding consists of flattening high-dimensional nonlinear data into linear data.

1.2 Thesis Outline

The remaining chapters are organized as follows. In Chapter 1, dimensionality reduction is discussed in detail along with other nonlinear dimensionality reduction techniques; the comparison with these techniques identifies how our proposed algorithm differs in approach and performs better. In Chapter 2, Locally Linear Embedding (LLE) is discussed in detail, along with a short survey of its modifications and of previous work on neighbourhood selection for nonlinear dimensionality reduction. In Chapter 3, the proposed algorithms are discussed along with their time complexities and how they improve on the original. In Chapter 4, the experiments are discussed; the tests are run on the results obtained after classifying the low-dimensional embeddings. The embeddings are produced using Locally Linear Embedding, while the classification is done with the k-nearest-neighbour (kNN) classifier. In this report, we first examine how LLE works and performs on manifolds as well as on real datasets.
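As an illustration of the evaluation pipeline described above, the sketch below classifies points of a low-dimensional embedding by a k-nearest-neighbour vote. This is a minimal stdlib-only sketch, not the thesis code: the `embedding` list is a toy stand-in for the output of LLE, and all names are ours.

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, query, k=3):
    """Majority vote among the k training points nearest to `query`."""
    # Rank training points by Euclidean distance to the query.
    nearest = sorted(range(len(train_points)),
                     key=lambda i: math.dist(train_points[i], query))[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy "embedding": two well-separated 2-D clusters standing in for LLE output.
embedding = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
             (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_classify(embedding, labels, (0.15, 0.15)))  # -> a
print(knn_classify(embedding, labels, (5.05, 5.05)))  # -> b
```

In the thesis's experiments the training points would be the LLE embedding of the dataset and the labels its class annotations; only the neighbourhood-selection step that precedes LLE changes between the compared methods.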
LLE has no internal model and applies a generalized nonlinear dimensionality reduction approach; it does not adapt to the geometry and spatial organization of the points. If the point model is made adaptive, the algorithm runs faster and gives better results, because it then tends to return true neighbours rather than just any neighbours obtained through the Euclidean distance measure. This is the motivation for our proposed algorithms. In the first, true neighbours are found using the neighbourhood similarity of the points: if two houses in a locality are true neighbours of each other, then the two houses share a large number of neighbours, and the converse is also true. The other proposed algorithm is based on the observation that the neighbours of a point are distributed in the high-dimensional space as a Gaussian, so the Euclidean distances of the neighbours from the point also follow a Gaussian distribution. Given this, by the central limit theorem, once the distances are fitted to a normal distribution, only the points within a threshold distance of the mean are true neighbours. This approach works sufficiently well, and better than the original LLE algorithm, because the datasets tend to follow a Gaussian distribution, in that the distances from each point to all other points fit a normal distribution. These proposed algorithms provide a better way of finding the neighbourhood of each point; this neighbourhood is then fed to the LLE algorithm, which in turn performs the nonlinear dimensionality reduction.
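The two neighbour-selection ideas described above can be sketched as follows. This is an illustrative stdlib-only sketch under our own naming, not the thesis implementation; in particular, the overlap fraction and the z-score threshold are hypothetical parameters that the thesis may define differently.

```python
import math
import statistics

def k_nearest(points, i, k):
    """Indices of the k points closest to points[i] in Euclidean distance."""
    order = sorted((j for j in range(len(points)) if j != i),
                   key=lambda j: math.dist(points[i], points[j]))
    return set(order[:k])

def shared_neighbours(points, i, k=3, overlap=0.5):
    """First idea: keep a candidate j only if points i and j share a
    sufficiently large fraction of their k nearest neighbours."""
    nbrs_i = k_nearest(points, i, k)
    return {j for j in nbrs_i
            if len(nbrs_i & k_nearest(points, j, k)) >= overlap * k}

def gaussian_neighbours(points, i, k=5, z=1.0):
    """Second idea: fit the neighbour distances to a normal distribution and
    keep only points within z standard deviations of the mean distance."""
    dists = {j: math.dist(points[i], points[j])
             for j in k_nearest(points, i, k)}
    mu = statistics.mean(dists.values())
    sigma = statistics.stdev(dists.values())
    return {j for j, d in dists.items() if abs(d - mu) <= z * sigma}

# A small cluster around the origin plus one distant point (index 5).
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5), (20.0, 0.0)]
print(shared_neighbours(pts, 0, k=3))    # only mutual neighbours of point 0 survive
print(gaussian_neighbours(pts, 0, k=5))  # the distant point 5 is dropped
```

Either filtered neighbourhood would then replace the plain k-nearest-neighbour set that the standard LLE step uses when computing its reconstruction weights.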


ProQuest Collection ID:

Control Number


Creative Commons License

Creative Commons Attribution 4.0 International License
This work is licensed under a Creative Commons Attribution 4.0 International License.


This document is currently not available here.