The polyserial correlation coefficient. (English) Zbl 0536.62045
Suppose the joint probability distribution of the variables \(X\), with \(E(X)=\mu\), \(\operatorname{Var}X=\sigma^2\), and \(\eta\), with \(E(\eta)=0\), \(\operatorname{Var}\eta=1\), is binormal with correlation \(\rho_{X\eta}=\rho\). Instead of the underlying continuous variable \(\eta\), the authors consider the ordinal categorical variable \(Y\) defined by the monotonic step function \(Y=y_j\) if \(\tau_{j-1}\leq\eta<\tau_j\) \((j=1,2,\dots,r)\), with \(y_{j-1}<y_j\), \(\tau_0=-\infty\), \(\tau_{j-1}<\tau_j\), \(\tau_r=\infty\). Its category probabilities are \(p_j=P(Y=y_j)=\Phi(\tau_j)-\Phi(\tau_{j-1})\), where \(\Phi(\tau)=\int_{-\infty}^{\tau}\phi(t)\,dt\) and \(\phi(t)=\exp(-t^2/2)/\sqrt{2\pi}\). From these the authors derive the "point-polyserial" correlation between \(X\) and \(Y\)
\[
\tilde\rho=\rho\sum_{j=1}^{r-1}\phi(\tau_j)\,(y_{j+1}-y_j)/\sigma_y.
\]
This general relation depends on \(r\), on the thresholds \(\tau_j\), and on the scores \(y_j\). It generalizes known results on the biserial correlation \((r=2)\) and on other special scoring systems, such as those studied by N. R. Cox [Biometrics 30, 171-178 (1974; Zbl 0292.62022)] and N. Jaspen [Serial correlation. Psychometrika 11, 23-30 (1946)]. The relation is used to estimate the polyserial correlation \(\rho\) from a sample of \(N\) observations \((x_i,y_i)\), \(i=1,\dots,N\), on the variable \((X,Y)\). Assuming a scoring system in which the \(y_j\) are consecutive integers, the unknown model parameters to be estimated are \(\rho,\mu,\sigma,\tau_1,\dots,\tau_{r-1}\).
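The relation between \(\tilde\rho\) and \(\rho\) can be checked numerically. The following is a minimal sketch, not code from the paper: the function name `point_polyserial` and its argument layout are illustrative assumptions, and only the Python standard library is used.

```python
import math
from statistics import NormalDist


def point_polyserial(rho, thresholds, scores):
    """Return rho_tilde = rho * sum_j phi(tau_j) * (y_{j+1} - y_j) / sigma_y.

    thresholds: finite cut points tau_1 < ... < tau_{r-1}
    scores:     category scores y_1 < ... < y_r
    (Hypothetical helper; names are not taken from the reviewed paper.)
    """
    if len(thresholds) != len(scores) - 1:
        raise ValueError("need r - 1 thresholds for r scores")
    std = NormalDist()  # standard normal: pdf is phi, cdf is Phi

    # Cumulative probabilities Phi(tau_0) = 0, Phi(tau_1), ..., Phi(tau_r) = 1.
    cum = [0.0] + [std.cdf(t) for t in thresholds] + [1.0]
    probs = [hi - lo for lo, hi in zip(cum, cum[1:])]  # p_j

    # Population standard deviation of the scored variable Y.
    mean_y = sum(p * y for p, y in zip(probs, scores))
    sigma_y = math.sqrt(sum(p * (y - mean_y) ** 2 for p, y in zip(probs, scores)))

    # Weighted sum of normal densities at the thresholds.
    weighted = sum(
        std.pdf(t) * (scores[j + 1] - scores[j]) for j, t in enumerate(thresholds)
    )
    return rho * weighted / sigma_y


# Biserial special case (r = 2): one threshold at 0, scores 0 and 1.
print(point_polyserial(0.5, [0.0], [0, 1]))
# Three categories with consecutive integer scores.
print(point_polyserial(0.5, [-0.5, 0.5], [1, 2, 3]))
```

In the biserial case with \(\tau_1=0\) the factor reduces to \(\phi(0)/\sqrt{p_1p_2}=\sqrt{2/\pi}\), so \(\rho=0.5\) gives \(\tilde\rho\approx 0.399\), illustrating how much the categorization attenuates the correlation.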
The authors study three methods: 1) simultaneous estimation of all parameters by maximum likelihood, which requires solving a complicated non-linear system of equations; 2) a two-step method in which \(\mu\) and \(\sigma^2\) are first estimated by the sample statistics \(\bar x\), \(s^2_x\), and \(\tau_1,\dots,\tau_{r-1}\) by applying the inverse of the standard normal distribution function to the observed cumulative marginal proportions of \(Y\), after which a conditional maximum likelihood estimate of \(\rho\) is computed; 3) an ad hoc estimator \(\hat\rho=r_{xy}\cdot s_y/\sum_{j=1}^{r-1}\phi(\hat\tau_j)\) of \(\rho\), obtained by inserting into the above-mentioned relation (with \(y_{j+1}-y_j=1\) for consecutive integer scores) the sample estimates \(r_{xy}\) for \(\tilde\rho\), \(s_y\) for \(\sigma_y\), and \(\hat\tau_j\) for \(\tau_j\).
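The two-step method can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' program: the function names are hypothetical, \(s_x\) is taken with divisor \(N-1\), and the one-dimensional maximization over \(\rho\) uses a golden-section search that assumes the conditional log-likelihood has a single maximum on \((-0.999, 0.999)\). It relies on the standard fact that, given \(X=x\), \(\eta\) is normal with mean \(\rho z\) and variance \(1-\rho^2\), where \(z=(x-\mu)/\sigma\).

```python
import math
from statistics import NormalDist, fmean, stdev

STD = NormalDist()  # standard normal distribution


def estimate_thresholds(y, categories):
    """Step 1: tau_hat_j = Phi^{-1}(cumulative proportion of Y <= y_j)."""
    n = len(y)
    cum, taus = 0, []
    for c in categories[:-1]:
        cum += sum(1 for v in y if v == c)
        taus.append(STD.inv_cdf(cum / n))  # needs 0 < cum / n < 1
    return taus


def cond_loglik(rho, z, idx, taus):
    """Log-likelihood of Y given X with mu, sigma and thresholds held fixed."""
    s = math.sqrt(1.0 - rho * rho)
    last = len(taus)  # index of the highest category
    total = 0.0
    for zi, j in zip(z, idx):
        hi = 1.0 if j == last else STD.cdf((taus[j] - rho * zi) / s)
        lo = 0.0 if j == 0 else STD.cdf((taus[j - 1] - rho * zi) / s)
        total += math.log(max(hi - lo, 1e-300))  # guard against log(0)
    return total


def two_step_polyserial(x, y, categories, tol=1e-6):
    """Step 2: maximize the conditional likelihood over rho only."""
    mu, sigma = fmean(x), stdev(x)
    z = [(v - mu) / sigma for v in x]
    pos = {c: j for j, c in enumerate(categories)}
    idx = [pos[v] for v in y]
    taus = estimate_thresholds(y, categories)

    # Golden-section search for the maximum (unimodality is an assumption).
    a, b = -0.999, 0.999
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = cond_loglik(c, z, idx, taus), cond_loglik(d, z, idx, taus)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = cond_loglik(c, z, idx, taus)
        else:  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = cond_loglik(d, z, idx, taus)
    return (a + b) / 2.0
```

A call such as `two_step_polyserial(x, y, [1, 2, 3])` expects every category to occur in the sample, since otherwise a threshold estimate would be infinite.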
The three methods are compared by Monte Carlo simulation (a four-way \(2\cdot 2\cdot 3\cdot 2\) factorial design with the factors \(N\), symmetry or asymmetry of the threshold system \((\tau)\), \(\rho\), and \(r\), with 50 replications in each cell). All three methods perform well, whereas the direct use of \(r_{xy}\) as an estimate of \(\rho\) would be rather misleading.
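A small simulation in the same spirit shows why the raw \(r_{xy}\) is misleading while the ad hoc estimator is not. This is only a sketch: the sample size, thresholds, value of \(\rho\), and seed below are hypothetical choices, not the design cells of the paper, and \(s_y\) is computed with divisor \(N\) as an assumption.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev

STD = NormalDist()  # standard normal distribution


def simulate(n, rho, taus, rng):
    """Draw (x, y): x standard normal, y the category of the latent eta."""
    x, y = [], []
    for _ in range(n):
        zx = rng.gauss(0.0, 1.0)
        eta = rho * zx + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        x.append(zx)
        y.append(1 + sum(eta >= t for t in taus))  # scores 1, 2, ..., r
    return x, y


def pearson(x, y):
    """Product-moment correlation r_xy."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def adhoc_polyserial(x, y, r):
    """Ad hoc estimator r_xy * s_y / sum_j phi(tau_hat_j), integer scores 1..r."""
    n = len(y)
    taus_hat = [STD.inv_cdf(sum(v <= j for v in y) / n) for j in range(1, r)]
    return pearson(x, y) * pstdev(y) / sum(STD.pdf(t) for t in taus_hat)


def monte_carlo(n=500, rho=0.5, taus=(-0.5, 0.5), reps=50, seed=1):
    """Average raw r_xy and the ad hoc estimate over `reps` replications."""
    rng = random.Random(seed)
    raw, corrected = [], []
    for _ in range(reps):
        x, y = simulate(n, rho, taus, rng)
        raw.append(pearson(x, y))
        corrected.append(adhoc_polyserial(x, y, len(taus) + 1))
    return fmean(raw), fmean(corrected)


mean_raw, mean_adhoc = monte_carlo()
print(f"mean r_xy = {mean_raw:.3f}, mean ad hoc estimate = {mean_adhoc:.3f}")
```

With these settings the population value of \(\tilde\rho\) is about 0.448, so the average \(r_{xy}\) should fall visibly below \(\rho=0.5\), while the average ad hoc estimate should lie close to it.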
Reviewer: M.P.Geppert
MSC:
62H20 | Measures of association (correlation, canonical correlation, etc.) |
65C05 | Monte Carlo methods |
Keywords:
dichotomous variables; polychotomous variables; latent variables; ordinal categorical variable; monotonic step function; polyserial correlation; simultaneous estimation; maximum likelihood; non-linear equation system; two-step method; conditional maximum likelihood estimate; ad hoc estimator
References:
[1] | Nerlove, M. & Press, S. J. Univariate and multivariate log-linear and logistic models. Santa Monica, The Rand Corporation, R: 1306-EDA/NIH, 1973. · Zbl 0518.62058 |
[2] | Gruvaeus, G. T. & Jöreskog, K. G. A computer program for minimizing a function of several variables (E.T.S. Res. Bull. RB70-14). Princeton, NJ: Educational Testing Service, 1970. |