statshorts

Posts

Showing posts from December 2022.

Ridge regression in the logit model: an application to birth weight data

December 31, 2022

This post presents a short application of ridge regression to modeling the birth weights of infants. It is based on joint work with V. Kazakova. The logistic model employs an infamous data set by Lee & Scott (1986), known for its collinearity issues (see Seber & Wild, 1989, p. 104ff.).

Background

Ridge regression (Hoerl, 1962) is an estimation method for models with strongly correlated covariates. An accessible introduction to ridge estimation in linear models was written by M. Taboga on StatLect. A broader introduction is Hastie (2010). Ridge works by adding a penalty term to the sum-of-squared-residuals (SSR) minimization problem in linear regression. For generalized linear models (GLMs) we may instead penalize the log-likelihood function. Assume the target vector $y$ to be distributed with density $f(y, \beta)$, where $\beta$ is the vector of coefficients in the linear predictor of the GLM. The maximum likelihood estimate of $\beta$ is found by solving...
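To make the idea of penalizing the log-likelihood concrete, here is a minimal sketch of a ridge-penalized logit fit in plain Python. Several things in it are assumptions rather than the post's own method: the penalty is taken to be $\frac{\lambda}{2}\lVert\beta\rVert^2$ subtracted from the mean log-likelihood, the optimizer is simple gradient ascent, the data are synthetic with two nearly collinear covariates (not the Lee & Scott data), and the names `fit_ridge_logit`, `lam`, `step`, and `n_iter` are made up for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_ridge_logit(X, y, lam, step=0.1, n_iter=2000):
    """Gradient ascent on mean log-likelihood minus (lam / 2) * ||beta||^2.

    Hypothetical helper: the penalty scaling and the optimizer are
    assumptions made for this sketch.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            # Score contribution of one observation: (y_i - p_i) * x_i.
            resid = yi - sigmoid(sum(b * v for b, v in zip(beta, xi)))
            for j in range(p):
                grad[j] += resid * xi[j] / n
        # The ridge penalty contributes -lam * beta to the gradient.
        beta = [b + step * (g - lam * b) for b, g in zip(beta, grad)]
    return beta


def norm(b):
    return math.sqrt(sum(v * v for v in b))


# Synthetic data with two nearly collinear covariates (an assumption).
rng = random.Random(1)
X, y = [], []
for _ in range(200):
    x1 = rng.gauss(0.0, 1.0)
    x2 = x1 + rng.gauss(0.0, 0.05)  # x2 is almost a copy of x1
    X.append([x1, x2])
    y.append(1 if rng.random() < sigmoid(x1 + x2) else 0)

beta_mle = fit_ridge_logit(X, y, lam=0.0)    # no penalty, fixed iteration budget
beta_ridge = fit_ridge_logit(X, y, lam=1.0)  # ridge-penalized fit

print("unpenalized:", beta_mle, "norm:", norm(beta_mle))
print("ridge:      ", beta_ridge, "norm:", norm(beta_ridge))
```

The penalized fit should have a visibly smaller coefficient norm than the unpenalized one, which is the shrinkage effect ridge relies on under collinearity. In practice one would likely use Newton-type iterations and choose $\lambda$ by cross-validation; the fixed `lam=1.0` here is arbitrary.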

Derivation of the expected nearest neighbor distance in a homogeneous Poisson process

December 30, 2022

The average distance between nearest neighbors, randomly placed in $m$-dimensional real space $\mathbb{R}^m$, is a common test statistic to assess levels of clustering or avoidance (for an introduction see Getis & Ord, 1992). It is used in fields ranging from geostatistics to population ecology and particle physics. Though the theoretical value used in inference is well known, its derivation is skipped in nearly every textbook on the subject. For the planar case ($m = 2$), the value in question is $\frac{1}{2} \sqrt{\frac{1}{\lambda}}$, with $\lambda$ the intensity (number of events per unit of space) of the assumed underlying point process. The derivation is shown below.

Background

The $G$-function (not to be confused with others of the same name) is a distance-based measure in the analysis of point processes. More specifically, it is the distribution function of distances between nearest neighbors. (For details, see e.g. the introductory overview by FU Berlin.) Empirically, it is defined as \[\hat{G}(r) = \frac{1}{n} ...
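As a quick plausibility check of the stated value $\frac{1}{2}\sqrt{\frac{1}{\lambda}}$, the following sketch simulates points in the unit square and compares the average nearest-neighbor distance with the theoretical one. It rests on a few simplifying assumptions that are not part of the post: the number of points is fixed at $n = \lambda$ rather than drawn from a Poisson distribution, distances are measured on a torus (wrap-around edges) to avoid boundary effects, and the function name `mean_nn_distance` is made up for illustration.

```python
import math
import random


def mean_nn_distance(n_points, rng):
    """Average nearest-neighbor distance of uniform points in the unit square.

    Hypothetical helper: uses toroidal (wrap-around) distances so that points
    near the border are not biased toward larger distances.
    """
    pts = [(rng.random(), rng.random()) for _ in range(n_points)]
    total = 0.0
    for i, (xi, yi) in enumerate(pts):
        best_sq = float("inf")
        for j, (xj, yj) in enumerate(pts):
            if i == j:
                continue
            dx = abs(xi - xj)
            dx = min(dx, 1.0 - dx)  # wrap around horizontally
            dy = abs(yi - yj)
            dy = min(dy, 1.0 - dy)  # wrap around vertically
            d_sq = dx * dx + dy * dy
            if d_sq < best_sq:
                best_sq = d_sq
        total += math.sqrt(best_sq)
    return total / n_points


rng = random.Random(0)
lam = 400  # intensity: points per unit area (unit square, so n = lam)
simulated = mean_nn_distance(lam, rng)
theoretical = 0.5 * math.sqrt(1.0 / lam)

print("simulated mean NN distance:  ", simulated)
print("theoretical 1/(2*sqrt(lam)): ", theoretical)
```

With $\lambda = 400$ the theoretical value is $0.025$, and the simulated mean should land close to it. Conditioning on a fixed number of points makes this a binomial rather than a Poisson process, which matters little at this sample size.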