Method of Simulated Moments with antithetic sampling in the presence of endogeneity

This post illustrates a use of the Method of Simulated Moments (MSM) to tackle the problem of endogeneity. It is also shown how the efficiency of the resulting estimator can be improved using Monte-Carlo variance reduction. The original work was conducted together with P. Bertazzoni and V. Kazakova.

Background

The Method of Simulated Moments (McFadden, 1989), or MSM, is a variant of the Generalized Method of Moments (GMM) that replaces analytical computation of the moment conditions with simulation. This is especially useful when the likelihood is computationally intractable, for instance in models with missing, incomplete, or noisy data, or with complicated dynamic formulations. The general idea is to match properties of the observed data to those of data simulated under known conditions: the model is estimated by varying the simulation parameters until the difference between selected moments of the empirical and the simulated sample is minimal.

Consider a set of moment conditions of the usual form \[\mathbb{E}\left[m(\mathbf{x}, \theta \,|\, \mathbf{x})\right] = 0,\] where $\mathbf{x}$ is our data and $\theta$ a generic vector of parameters to be estimated. In MSM, we don't bother with a closed-form expression for the conditional expectation (which is handy if there is none), but instead construct an unbiased simulator for the assumed data generating process, such that \[\mathbb{E}\left[\hat{m}(\mathbf{x}, \mathbf{u}_R, \theta)\right] = m(\mathbf{x}, \theta \,|\, \mathbf{x}),\] where $\hat{m}(\mathbf{x}, \mathbf{u}_R, \theta)$ denotes our simulator, calculated from $\mathbf{u}_R$, a length-$R$ vector of i.i.d. draws from the distribution under consideration. Let $\mathbf{m}_S$ be the vector of simulated moments thus derived and $\mathbf{m}_x$ the vector of corresponding moments from the data. Our MSM estimator is simply \[\hat{\theta}_{MSM} = \underset{\theta}{\arg\min}\;(\mathbf{m}_S - \mathbf{m}_x)'\mathbf{W}(\mathbf{m}_S - \mathbf{m}_x),\] with $\mathbf{W}$ the usual GMM weight matrix. Under fairly general conditions, $\hat{\theta}_{MSM}$ is consistent and asymptotically normal (McFadden & Ruud, 1994), though precise expressions for its variance can be cumbersome.
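To make the mechanics concrete, here is a minimal sketch of the MSM objective in Python. It is not the implementation used for the results below; the toy simulator (recovering the mean of a normal from its first moment) and all names are made up for illustration. One detail worth a comment: the draws $\mathbf{u}_R$ must be held fixed across evaluations of the objective, otherwise the optimizer chases simulation noise.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)

def msm_objective(theta, m_x, simulate_moments, u, W):
    """Quadratic form (m_S - m_x)' W (m_S - m_x) for a candidate theta."""
    m_s = simulate_moments(theta, u)
    diff = m_s - m_x
    return diff @ W @ diff

# Toy example: recover the mean of a normal from its first moment.
data = rng.normal(loc=2.0, scale=1.0, size=500)
m_x = np.array([data.mean()])

# Fixed simulation draws, reused across all objective evaluations.
u = rng.normal(size=10_000)

def simulate_moments(theta, u):
    return np.array([(theta[0] + u).mean()])

W = np.eye(1)  # identity weight matrix suffices in the just-identified case
res = minimize(msm_objective, x0=[0.0],
               args=(m_x, simulate_moments, u, W), method="Nelder-Mead")
print(res.x)  # close to 2.0
```
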

Estimation in the presence of endogeneity

We consider a classical endogeneity problem posed in Carrasco (2012): the parameter $\delta = 0.1$ is to be estimated from \[ y_i = \delta \, W_i + \epsilon_i\] with \[W_i = e^{-x_i^2} + v_i\;, \quad (\epsilon_i, v_i) \sim \mathcal{N}(0, \Sigma)\;, \quad\Sigma = \begin{pmatrix}1 & 0.5\\ 0.5 & 1\\ \end{pmatrix}\;, \quad x_i \sim \mathcal{N}(0,1).\] Hidden in this data generating process is a non-trivial correlation between the explanatory variable $W_i$ and the error term $\epsilon_i$, so a naive least-squares fit that ignores the endogeneity is in for a large bias. A more traditional remedy (in the case of a tractable likelihood) would be instrumental variables. We compare this approach with MSM, testing four different estimators:

  1. Naive ordinary least squares as a benchmark [OLS],
  2. GMM with instruments $\left(x_i, x_i^2, x_i^3\right)$ (Chaussé, 2010) [GMM],
  3. MSM with $R = 100$ [MSM1], and
  4. MSM with $R = 10\,000$ [MSM2].
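The data generating process above is straightforward to simulate; the following sketch (with a hypothetical helper name `draw_sample`) also shows the OLS bias directly via the no-intercept slope estimate:

```python
import numpy as np

rng = np.random.default_rng(0)
n, delta = 400, 0.1
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])

def draw_sample(n, rng):
    x = rng.normal(size=n)
    # (eps, v) jointly normal with correlation 0.5
    eps, v = rng.multivariate_normal([0.0, 0.0], Sigma, size=n).T
    W = np.exp(-x**2) + v          # endogenous regressor: Cov(W, eps) = 0.5
    y = delta * W + eps
    return x, W, y

x, W, y = draw_sample(n, rng)
ols = (W @ y) / (W @ W)            # no-intercept OLS slope
print(ols)                         # well above the true delta = 0.1
```
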

For our demonstration we create 1000 independent samples of size $n = 400$, each drawn from a superpopulation created by the model above. The graph below summarises our estimates from each sample.

As suspected, the naive OLS is way off; the other estimators appear consistent. GMM with instruments and the MSM estimator with small $R$ are roughly equally efficient, but the latter shows some erratic behavior. However, if we increase the simulator's strength to $R = 10\,000$ (rightmost column), we both avoid the bias and estimate more efficiently than with GMM. In fact, the standard error of the strong MSM estimator is only half that of the instrumental-variable approach (0.09 vs. 0.19), though admittedly we made no attempt to pick optimal instruments; the point is precisely that a good simulator lets us avoid such hassle.

MSM efficiency and antithetic sampling

As would be expected from a simulation-based method, we incur additional Monte-Carlo noise. A useful decomposition is provided by Cameron & Trivedi (2005): \[\mathrm{Var}_{x,u}\left[\hat{m}(\theta)\right] = \mathrm{Var}_x\left[m(\theta)\right] + \mathbb{E}\left[\mathrm{Var}_u\left[\hat{m}(\theta)\right]\right].\] In other words, since $\mathbf{x}$ and $\mathbf{u}_R$ are independent, we can separate the simulation noise from the sampling variance of the data. For the simple frequency simulator we specifically get \[\mathrm{Var}_{x,u}\left[\hat{m}(\theta)\right] = (1 + 1/R) \cdot \mathrm{Var}_x\left[m(\theta)\right] \;\Longrightarrow\; \lim_{R \to \infty} \mathrm{Var}_{x,u}\left[\hat{m}(\theta)\right] = \mathrm{Var}_x\left[m(\theta)\right].\] This hints at a straightforward trade-off between computation time (larger $R$) and variance. But since we are dealing with a Monte-Carlo approach, we can additionally use tricks from the toolbox of variance reduction.
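The decomposition can be checked numerically. The sketch below uses a deliberately simple toy simulator (not the frequency simulator itself) in which both $\mathrm{Var}_x[m]$ and the per-draw noise variance equal one, so the total variance comes out as $(1 + 1/R)$ exactly:

```python
import numpy as np

rng = np.random.default_rng(1)
R, n_rep = 10, 200_000

# Toy simulator m_hat(x, u) = x + mean(u_1, ..., u_R) for the moment m(x) = x:
# Var_x[m] = 1 and Var_u[m_hat | x] = 1/R, so the total should be 1 + 1/R.
x = rng.normal(size=n_rep)
u = rng.normal(size=(n_rep, R))
m_hat = x + u.mean(axis=1)

print(m_hat.var())  # close to 1 + 1/R = 1.1
```
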

In particular, we consider antithetic sampling. Recall the vector $\mathbf{u}_R$ of random draws in our simulator. Suppose we split it into two halves, $\mathbf{u}^{(1)}_{R/2} = (u_1, \ldots, u_{R/2})$ and $\mathbf{u}^{(2)}_{R/2} = (u_{R/2+1}, \ldots, u_R)$. Instead of the simulator $\hat{m}(\mathbf{x}, \mathbf{u}_R, \theta)$ we may use the average \[\frac{1}{2}\left[\hat{m}\left(\mathbf{x}, \mathbf{u}^{(1)}_{R/2}, \theta\right) + \hat{m}\left(\mathbf{x}, \mathbf{u}^{(2)}_{R/2}, \theta\right)\right].\] By standard results, this reduces variance if the two half-simulators are negatively correlated. Since our draws come from a distribution symmetric about zero, we can achieve this by setting \[\mathbf{u}^{(2)}_{R/2} = \left(-u^{(1)}_1, \ldots, -u^{(1)}_{R/2}\right).\] To illustrate the effect, we add one more estimator to our experiment:

  5. MSM with $R = 100$, but using the antithetic scheme [MSM3].
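A minimal sketch of the variance reduction, using a toy moment $\mathbb{E}[e^u]$ with $u \sim \mathcal{N}(0,1)$ (any monotone simulator would behave similarly; the names are illustrative). For a mean-type simulator, feeding in the concatenated pairs $(u, -u)$ is the same as averaging the two half-simulators:

```python
import numpy as np

rng = np.random.default_rng(2)
R, n_rep = 100, 50_000

def simulator(u):
    # Ordinary simulator: average exp over the R draws in each row.
    return np.exp(u).mean(axis=1)

u_full = rng.normal(size=(n_rep, R))          # R independent draws
u_half = rng.normal(size=(n_rep, R // 2))     # only R/2 independent draws
u_anti = np.concatenate([u_half, -u_half], axis=1)  # antithetic pairs

est_plain = simulator(u_full)   # estimates of E[exp(u)] = exp(1/2)
est_anti = simulator(u_anti)
print(est_plain.var(), est_anti.var())  # antithetic variance is clearly smaller
```

Note that exp is monotone, so $e^u$ and $e^{-u}$ are negatively correlated, which is exactly the condition for the scheme to help.
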

Results for the full set of estimators are shown below. Remarkably, even though MSM3 uses only 0.5% of the random number draws of MSM2, it achieves an equivalent standard error (ca. 0.09) and MSE (ca. 0.008).

Being able to achieve the same level of efficiency with an $R$ that is orders of magnitude smaller is quite impactful, given that computing time increases roughly linearly with $R$. The following graph confirms that MSM comes with a clear-cut trade-off between low Monte-Carlo noise and fast computation. For instance, going from $R = 1\,000$ to $R = 5\,000$ gives us a roughly 90% reduction in noise at the cost of 4.5 times longer computation. Antithetic sampling helps us get the same efficiency faster!

Trade-off between MSM estimator variance and computation time as a function of simulator strength (number of independent draws); both time and variance are depicted relative to the case $R=1\,000$.

Summary

MSM inherits some strengths from GMM, especially the ability to estimate consistently under endogeneity. It does not require a tractable likelihood, but rather well-founded distributional assumptions in the construction of simulators. Users need to find a compromise between computational resources devoted and acceptable levels of accuracy. Variance reduction techniques help a lot!

Implementation remarks

For those wanting to get started with the Method of Simulated Moments and related approaches, readable introductions are Jalali et al. (2015) and the textbook by Gourieroux & Monfort (1996). The GMM estimates were calculated using the seasoned R package gmm. The MSM implementation used to derive all results above can be found on my GitHub page.

Literature

A.C. Cameron & P.K. Trivedi, Microeconometrics: Methods and Applications, Cambridge University Press, 2005.

M. Carrasco, "A regularization approach to the many instruments problem," Journal of Econometrics, vol.170, no.2, pp.383-398, 2012.

P. Chaussé, "Computing generalized method of moments and generalized empirical likelihood with R," Journal of Statistical Software, vol.34, no.11, pp.1-35, 2010.

C. Gourieroux & A. Monfort, Simulation-Based Econometric Methods, Oxford University Press, 1996.

M.S. Jalali, H. Rahmandad & H. Ghoddusi, "Using the Method of Simulated Moments for system identification," in: Analytical Methods for Dynamic Modelers (H. Rahmandad, R. Oliva & N. Osgood, eds.), MIT Press, pp.39-69, 2015.

D. McFadden, "A method of simulated moments for estimation of discrete response models without numerical integration," Econometrica, vol.57, no.5, pp.995-1026, 1989.

D. McFadden & P.A. Ruud, "Estimation by simulation," The Review of Economics and Statistics, vol.76, no.4, pp.591-608, 1994.
