## Doctoral Theses

### Multivariate and Regression Analysis Based on the Geometry of Data Clouds.

2-22-1998

2-22-1999

#### Institute Name (Publisher)

Indian Statistical Institute

Doctoral Thesis

#### Degree Name

Doctor of Philosophy

Mathematics

#### Department

Theoretical Statistics and Mathematics Unit (TSMU-Kolkata)

#### Supervisor

Chaudhuri, Probal (TSMU-Kolkata; ISI)

#### Abstract (Summary of the Work)

The median is a natural estimate of the location of a data set, and several versions of the multivariate median have been studied in the literature. Each is an interesting descriptive statistic for multivariate data and provides some nice geometric insights into the data cloud. One would expect a multidimensional median to be a natural estimate of the center of symmetry of a multivariate distribution. However, there is no unique concept of symmetry in multivariate problems, and the center of symmetry can be defined in several ways. For example, the $d$-dimensional random variable $X$ is spherically symmetric about $\theta \in \mathbb{R}^d$ if $X - \theta$ and $O(X - \theta)$ are identically distributed for any orthogonal $d \times d$ matrix $O$, and the distribution of a random variable $X$ is said to be elliptically symmetric if there exists some positive definite matrix $\Sigma$ such that $\Sigma^{-1/2}X$ has a spherically symmetric distribution. One can relax the criteria of symmetry in order to define central symmetry as $X - \theta$ and $\theta - X$ having the same distribution. The concept of angular symmetry was suggested by Liu (1988). The random vector $X$ is said to be angularly symmetric about $\theta$ if the direction vector $(X - \theta)/\lVert X - \theta \rVert$ is centrally symmetric about the origin.

From the definitions above, it is clear that all the notions of symmetry are sufficiently intuitive and worth studying. Any point $\theta$ of spherical symmetry is a point of elliptical symmetry, and every point of elliptical symmetry is a point of central symmetry. In turn, any point of central symmetry is a point of angular symmetry. Closely related to the concept of a point of symmetry is the idea of the equivariance (or invariance) of a location estimate. For example, the univariate median is equivariant under monotone transformations of the real line, i.e. if $X_1, \ldots, X_n$ is a sample with median $\hat{\theta}(X_1, \ldots, X_n)$ and $h : \mathbb{R} \to \mathbb{R}$ is a monotone transformation, then

$$\hat{\theta}(h(X_1), \ldots, h(X_n)) = h(\hat{\theta}(X_1, \ldots, X_n)).$$

In the multivariate setup, one would expect an estimator of the point of spherical symmetry to be equivariant under the group of orthogonal transformations and translations. Similarly, an estimator of the point of elliptical symmetry should be equivariant under affine transformations of the data cloud. In subsequent sections, we will again discuss this property of equivariance while discussing some of the multivariate medians proposed in the literature.

Closely related to the concept of the multivariate median is the concept of multivariate quantiles. Barnett (1976) has discussed in detail several methods for ordering multivariate data. Eddy (1983, 1985) approached the problem of multivariate quantiles through nested sequences of sets. Recently, Chaudhuri (1996) defined the concept of geometric quantiles, which generalizes the concept of the spatial median to the quantile problem. According to Small (1990), an approach to quantiles can be based upon the fact that the maximization of the function $-E\lVert X - \mu \rVert$ can be done by gradients, which in a univariate situation reduce to the simple derivative. In higher dimensions, the gradient vector will typically point inwards to the center of the distribution, with a length that is proportional to how exterior the location $\mu$ is (with respect to the distribution or its empirical analog) from the data set.

One of the early references to the concept of a bivariate median can be found in Hayford (1902). He made a clear distinction between the centroid of a spatial distribution (i.e. the multivariate mean) and a median-like estimate of the center of a distribution. He suggested the vector of medians of the orthogonal coordinates but clearly recognized that this higher-dimensional analog of the median is dependent on the choice of the coordinate system.
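The equivariance properties discussed in the abstract can be checked numerically. The following is only a minimal sketch under stated assumptions and is not taken from the thesis: the function names `coordinatewise_median` and `spatial_median`, the sample values, and the 45-degree rotation are all hypothetical choices made for illustration, and the spatial median is approximated with a Weiszfeld-type fixed-point iteration, which is one standard way to minimize the sum of Euclidean distances. An odd sample size is used in the univariate check so that the median is an actual observation rather than a midpoint.

```python
import numpy as np


def coordinatewise_median(points):
    """Vector of per-coordinate medians of an (n, d) array."""
    return np.median(points, axis=0)


def spatial_median(points, n_iter=500, tol=1e-10):
    """Approximate argmin over mu of sum_i ||x_i - mu|| (Weiszfeld-type iteration).

    Illustrative helper only; not the procedure used in the thesis.
    """
    mu = points.mean(axis=0)  # start from the centroid
    for _ in range(n_iter):
        dist = np.linalg.norm(points - mu, axis=1)
        dist = np.maximum(dist, 1e-12)  # guard against division by zero
        weights = 1.0 / dist
        new_mu = (weights[:, None] * points).sum(axis=0) / weights.sum()
        if np.linalg.norm(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


# Univariate case: the median commutes with a monotone map h (here h = exp).
sample = np.array([3.0, -1.0, 0.5, 2.0, 7.0])  # made-up data, odd sample size
lhs = np.median(np.exp(sample))
rhs = np.exp(np.median(sample))
print("median(h(X)) =", lhs, " h(median(X)) =", rhs)

# Bivariate case: rotate a small made-up data cloud by 45 degrees.
cloud = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 1.0]])
angle = np.pi / 4
R = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle), np.cos(angle)]])
rotated = cloud @ R.T  # each row x is mapped to R x

# The coordinate-wise median depends on the coordinate system ...
print("coordinate-wise median of rotated data:", coordinatewise_median(rotated))
print("rotated coordinate-wise median:        ", coordinatewise_median(cloud) @ R.T)

# ... while the spatial median is carried along by the rotation.
print("spatial median of rotated data:", spatial_median(rotated))
print("rotated spatial median:        ", spatial_median(cloud) @ R.T)
```

In this sketch the two coordinate-wise results differ, which illustrates the coordinate dependence noted for Hayford's proposal, while the two spatial-median results agree up to numerical tolerance, as expected for an estimator that is equivariant under orthogonal transformations.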

ProQuest Collection ID: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:28842952

ISILib-TH246

#### DOI

http://dspace.isical.ac.in:8080/jspui/handle/10263/2146
