## Date of Submission

2-22-2015

## Date of Award

2-22-2016

## Institute Name (Publisher)

Indian Statistical Institute

## Document Type

Doctoral Thesis

## Degree Name

Doctor of Philosophy

## Subject Name

Mathematics

## Department

Theoretical Statistics and Mathematics Unit (TSMU-Kolkata)

## Supervisor

Ghosh, Anil Kumar (TSMU-Kolkata; ISI)

## Abstract (Summary of the Work)

Advances in data acquisition technologies and computing resources have greatly facilitated the analysis of massive data sets in various fields of science. Researchers from different disciplines rigorously investigate these data sets to extract useful information for new scientific discoveries. Many of these data sets contain a large number of features but a small number of observations. For instance, in the fields of chemometrics (see e.g., Schoonover et al. (2003)), medical image analysis (see e.g., Yushkevich et al. (2001)) and microarray gene expression data analysis (see e.g., Eisen and Brown (1999), Alter et al. (2000)), we often deal with data of dimension higher than several thousand but sample sizes of the order of a few hundred or even less. Such high dimension, low sample size (HDLSS) data present a substantial challenge to the statistics community. Many well-known classical multivariate methods cannot be used in such situations. For example, because of the singularity of the estimated pooled dispersion matrix, the classical Hotelling's $T^2$ statistic (see e.g., Anderson (2003)) cannot be used for two-sample tests when the dimension of the data exceeds the combined sample size. Over the last few years, researchers have become increasingly interested in developing statistical methods that are applicable to HDLSS data. In this thesis, we develop some nonparametric methods that can be used for high dimensional two-sample problems involving two independent samples as well as those involving matched pair data.

In a two-sample testing problem, one usually tests the equality of two $d$-dimensional probability distributions $F$ and $G$ based on two sets of independent observations $x_1, x_2, \ldots, x_{n_1}$ from $F$ and $y_1, y_2, \ldots, y_{n_2}$ from $G$.
This problem is well investigated in the literature, and several parametric and nonparametric tests are available for it.

Parametric methods assume a common parametric form for $F$ and $G$, and test the equality of the parameter values (which could be scalar or finite dimensional vector valued) in the two distributions. For instance, if $F$ and $G$ are assumed to be normal (Gaussian) with a common but unknown dispersion, one uses Fisher's $t$ statistic (when $d = 1$) or Hotelling's $T^2$ statistic (when $d > 1$) to test the equality of their locations (see e.g., Mardia et al. (1979); Anderson (2003)). Though these tests have several optimality properties for data having normal distributions, they are not robust against outliers and can mislead our inference if the underlying distributions are far from being normal. Since the performance of parametric methods largely depends on the validity of the underlying model assumptions, nonparametric methods are often preferred because of their flexibility and robustness.

In the univariate setup, rank-based nonparametric tests like the Wilcoxon-Mann-Whitney test, the Kolmogorov-Smirnov maximum deviation test and the Wald-Wolfowitz run test (see e.g., Hollander and Wolfe (1999); Gibbons and Chakraborti (2003)) are often used. These tests are distribution-free, and they outperform Fisher's $t$ test for a wide variety of non-Gaussian distributions.
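As a small illustration of why rank-based tests such as the Wilcoxon-Mann-Whitney test are distribution-free, the sketch below computes the Mann-Whitney U statistic for two univariate samples and checks that it is unchanged when both samples pass through a strictly increasing transformation. It is not taken from the thesis: the function name `mann_whitney_u`, the sample sizes, the location shift and the random seed are all assumptions chosen for illustration.

```python
import math
import random


def mann_whitney_u(xs, ys):
    """Mann-Whitney U: number of pairs (x, y) with x > y, ties counted as 1/2."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Hypothetical data: two small univariate Gaussian samples (seed is arbitrary).
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(8)]
ys = [rng.gauss(0.5, 1.0) for _ in range(10)]

# U depends on the data only through the relative ordering of the pooled
# sample, so a strictly increasing map such as exp leaves it unchanged.
u_raw = mann_whitney_u(xs, ys)
u_exp = mann_whitney_u([math.exp(v) for v in xs], [math.exp(v) for v in ys])
print(u_raw, u_exp)
```

Because the statistic uses only the ordering of the pooled observations, its null distribution under $F = G$ (for continuous distributions) depends on $n_1$ and $n_2$ alone, which is what makes the test distribution-free.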

## Control Number

ISILib-TH437

## Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

## DOI

http://dspace.isical.ac.in:8080/jspui/handle/10263/2146

## Recommended Citation

Biswas, Munmun Dr., "Some Distribution-Free Two-Sample Tests Applicable to High Dimension, Low Sample Size Data." (2016). *Doctoral Theses*. 280.

https://digitalcommons.isical.ac.in/doctoral-theses/280

## Comments

ProQuest Collection ID: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:28843145