Statistical Inference in Missing Data by MCMC and Non-MCMC Multiple Imputation Algorithms: Assessing the Effects of Between-Imputation Iterations

Open Access | Jul 2017

Figures & Tables

Table 1

Variables and Missing Rates.

| Variables | Missing Rates |
|---|---|
| GDP per capita (purchasing power parity) | 0.0% |
| Freedom House index | 15.4% |
| Central bank discount rate | 32.9% |
| Life expectancy at birth | 2.6% |
| Unemployment rate | 10.5% |
| Distribution of family income: Gini index | 37.3% |
| Public debt | 22.4% |
| Education expenditures | 24.6% |
| Taxes and other revenues | 6.1% |
| Military expenditures | 43.0% |

[i] Data sources: CIA (2016) and Freedom House (2016).

Table 2

Multiple Regression Analyses on GDP Per Capita.

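| Variables | Incomplete Data: Coef. | Incomplete Data: Std. Err. | Multiply-Imputed Data: Coef. | Multiply-Imputed Data: Std. Err. |
|---|---|---|---|---|
| Intercept | –7.323 | 3.953 | –11.545* | 3.495 |
| Freedom | –0.321* | 0.127 | –0.362* | 0.127 |
| Central Bank | 0.118* | 0.041 | –0.107 | 0.049 |
| Life Expectancy | 3.922* | 0.794 | 4.908* | 0.655 |
| Unemployment | –0.205* | 0.087 | –0.214* | 0.070 |
| Gini | 0.114 | 0.253 | –0.018 | 0.363 |
| Public Debt | 0.198* | 0.092 | –0.002 | 0.093 |
| Education | 0.035 | 0.164 | 0.488* | 0.154 |
| Tax | 0.357* | 0.174 | 0.471* | 0.151 |
| Military | 0.123 | 0.085 | 0.299* | 0.109 |
| Number of obs. | 86 | | 228 | |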

[i] Note: *significant at the 5% error level. Coef. stands for coefficient. Std. Err. stands for standard error. Since the distributions of these variables are skewed to the right (log-normal), the variables are log-transformed to normalize the distributions.
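The coefficients and standard errors for multiply-imputed data are usually obtained by pooling the per-imputation regression results. The sketch below shows the standard pooling formulas known as Rubin's rules, on the assumption that a conventional pooling step of this kind underlies such results. The function name `pool_rubin` and the input numbers are hypothetical and are not taken from the article.

```python
import math

def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets with Rubin's rules.

    estimates: per-imputation coefficient estimates
    variances: per-imputation squared standard errors
    """
    m = len(estimates)
    q_bar = sum(estimates) / m                      # pooled point estimate
    within = sum(variances) / m                     # within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    total = within + (1 + 1 / m) * between          # total variance
    return q_bar, math.sqrt(total)

# Hypothetical per-imputation results for a single coefficient (illustration only)
coefs = [0.48, 0.50, 0.47, 0.49, 0.51]
std_errs = [0.15, 0.15, 0.16, 0.15, 0.14]
pooled_coef, pooled_se = pool_rubin(coefs, [s ** 2 for s in std_errs])
print(pooled_coef, pooled_se)
```

The pooled standard error is never smaller than the root of the average within-imputation variance, because the between-imputation term adds the uncertainty that stems from the missing values.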

Table 3

Relations among DA, EMB, and FCS.

| | Joint Modeling | Conditional Modeling |
|---|---|---|
| MCMC | DA | FCS |
| Non-MCMC | EMB | |

Table 4

Summary of the 20 Studies on Multiple Imputation.

| Authors | MI Algorithms | Sample Size | Number of Variables | Number of Imputations | Number of Iterations | Missing Rate |
|---|---|---|---|---|---|---|
| Barnard and Rubin (1999) | DA | 10, 20, 30 | 2 | 3, 5, 10 | Unknown | 10%, 20%, 30% |
| Horton and Lipsitz (2001) | DA, FCS | 10000 | 3 | 10 | 200 | 50% |
| Schafer and Graham (2002) | DA | 50 | 2 | 20 | Unknown | 73% |
| Donders et al. (2006) | FCS | 500 | 2 | 10 | Unknown | 40% |
| Abe and Iwasaki (2007) | DA | 100 | 4 | 5 | 100 | 20%, 30% |
| Horton and Kleinman (2007) | DA, EMB, FCS | 133774 | 10 | 10 | 5 | 41% |
| Stuart et al. (2009) | FCS | 9186 | 400 | 10 | 10 | 18% |
| Lee and Carlin (2010) | DA, FCS | 1000 | 8 | 20 | 10 | 33% |
| Leite and Beretvas (2010) | DA | 400 | 10 | 10 | Unknown | 10%, 30%, 50% |
| Hardt, Herke, and Leonhart (2012) | DA, EMB, FCS | 50, 100, 200 | 3, 13, 23, 43, 83 | 20 | Unknown | 20%, 50% |
| Lee and Carlin (2012) | DA | 1000 | 8 | 20 | Unknown | 10%, 25%, 50%, 75%, 90% |
| Cranmer and Gill (2013) | EMB, MHD | 500 | 5 | Unknown | NA | 20%, 50%, 80% |
| Cheema (2014) | FCS | 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000 | 4 | Unknown | Unknown | 1%, 2%, 5%, 10%, 20% |
| Kropko et al. (2014) | DA, EMB, FCS | 1000 | 8 | 5 | 30 | 25% |
| Shara et al. (2015) | Unknown | 2246 | 8 | Unknown | Unknown | 20%, 30%, 40% |
| Deng et al. (2016) | FCS | 100 | 200, 1000 | 10 | 20 | 40% |
| von Hippel (2016) | DA | 25, 100 | 2 | 5 | Unknown | 50% |
| Hughes, Sterne, and Tilling (2016) | Unknown | 100, 1000 | 5 | 50 | Unknown | 40%, 60% |
| McNeish (2017) | DA, FCS | 20, 50, 100, 250 | 4 | 5, 25, 100 | Unknown | 10%, 20%, 30%, 50% |

[i] Note: DA stands for Data Augmentation, EMis for Expectation-Maximization with Importance Sampling, FCS for Fully Conditional Specification, EMB for Expectation-Maximization with Bootstrapping, and MHD for Multiple Hot Deck. Unknown means that information is unavailable. NA means Not-Applicable.

Table 5

Abbreviations and the Missing Data Methods.

| Abbreviations | Missing Data Methods |
|---|---|
| CD | Complete data without missing values |
| LD | Listwise deletion |
| EMB | MI by AMELIA II |
| DA1 | MI by NORM2 with no iterations |
| DA2 | MI by NORM2 with 2*EM iterations |
| FCS1 | MI by MICE with no iterations |
| FCS2 | MI by MICE with 2*EM iterations |
| D-SI | Deterministic SI by norm.predict in MICE |
| S-SI | Stochastic SI by norm.nob in MICE |

Table 6

Bias and RMSE (Theoretical Data).

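| | | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| CD | Bias | 0.001 | 0.003 | 0.001 | 0.002 | 0.001 | 0.001 | 0.001 | 0.002 | 0.001 |
| | RMSE | 0.040 | 0.047 | 0.038 | 0.039 | 0.058 | 0.026 | 0.046 | 0.039 | 0.047 |
| LD | Bias | 0.032 | 0.135 | 0.105 | 0.104 | 0.332 | 0.085 | 0.129 | 0.210 | 0.116 |
| | RMSE | 0.059 | 0.153 | 0.122 | 0.121 | 0.349 | 0.103 | 0.160 | 0.228 | 0.155 |
| EMB | Bias | 0.000 | 0.004 | 0.002 | 0.000 | 0.005 | 0.001 | 0.005 | 0.005 | 0.002 |
| | RMSE | 0.046 | 0.053 | 0.050 | 0.051 | 0.075 | 0.041 | 0.069 | 0.059 | 0.072 |
| DA1 | Bias | 0.001 | 0.002 | 0.003 | 0.001 | 0.001 | 0.000 | 0.003 | 0.003 | 0.002 |
| | RMSE | 0.046 | 0.053 | 0.050 | 0.051 | 0.074 | 0.041 | 0.069 | 0.058 | 0.072 |
| DA2 | Bias | 0.002 | 0.001 | 0.005 | 0.002 | 0.001 | 0.000 | 0.001 | 0.003 | 0.000 |
| | RMSE | 0.046 | 0.053 | 0.050 | 0.051 | 0.074 | 0.041 | 0.069 | 0.058 | 0.072 |
| FCS1 | Bias | 0.002 | 0.001 | 0.082 | 0.040 | 0.090 | 0.047 | 0.093 | 0.027 | 0.233 |
| | RMSE | 0.047 | 0.053 | 0.097 | 0.062 | 0.116 | 0.065 | 0.109 | 0.052 | 0.239 |
| FCS2 | Bias | 0.001 | 0.002 | 0.004 | 0.002 | 0.001 | 0.000 | 0.001 | 0.002 | 0.001 |
| | RMSE | 0.046 | 0.053 | 0.050 | 0.051 | 0.075 | 0.041 | 0.069 | 0.058 | 0.071 |
| D-SI | Bias | 0.186 | 0.242 | 0.174 | 0.093 | 0.187 | 0.098 | 0.231 | 0.070 | 0.163 |
| | RMSE | 0.192 | 0.248 | 0.182 | 0.110 | 0.207 | 0.109 | 0.248 | 0.099 | 0.189 |
| S-SI | Bias | 0.002 | 0.000 | 0.081 | 0.038 | 0.090 | 0.047 | 0.091 | 0.029 | 0.230 |
| | RMSE | 0.050 | 0.057 | 0.102 | 0.066 | 0.124 | 0.076 | 0.119 | 0.062 | 0.241 |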

[i] Note: Biased results are in boldface, i.e., Bias > 0.010.
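For readers who want to reproduce quantities of this kind, the following is a minimal sketch of how bias and RMSE are conventionally computed from repeated simulation estimates. It assumes that bias is reported as the absolute difference between the mean estimate and the true value (the tabulated biases are all non-negative) and that RMSE is taken around the true value. The function name and the sample numbers are hypothetical.

```python
import math

def bias_and_rmse(estimates, true_value):
    """Return (absolute bias, RMSE) of simulation estimates around a known true value."""
    n = len(estimates)
    mean_estimate = sum(estimates) / n
    bias = abs(mean_estimate - true_value)  # assumption: bias is reported in absolute terms
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n)
    return bias, rmse

# Hypothetical estimates from four simulation runs with a true value of 1.0
bias, rmse = bias_and_rmse([0.9, 1.1, 1.0, 1.2], true_value=1.0)
is_biased = bias > 0.010  # threshold stated in the table note
print(bias, rmse, is_biased)
```

RMSE combines bias and sampling variability, which is why a method can show negligible bias and still have a larger RMSE than the complete-data benchmark.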

Table 7

Coverage of the 95% CI (Theoretical Data).

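| | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|
| CD | 95.3 | 94.9 | 94.2 | 94.0 | 96.0 | 96.0 | 95.3 | 94.9 | 94.6 |
| LD | 88.5 | 47.9 | 54.6 | 56.7 | 10.8 | 65.1 | 69.2 | 32.1 | 78.1 |
| EMB | 95.0 | 95.1 | 94.2 | 95.5 | 94.9 | 94.4 | 94.3 | 94.1 | 95.0 |
| DA1 | 94.6 | 94.9 | 93.2 | 93.1 | 94.1 | 91.8 | 92.9 | 92.4 | 92.9 |
| DA2 | 94.3 | 95.8 | 95.1 | 94.1 | 94.8 | 94.3 | 94.2 | 93.2 | 94.9 |
| FCS1 | 94.2 | 95.0 | 75.0 | 91.6 | 84.4 | 95.5 | 84.5 | 96.8 | 6.8 |
| FCS2 | 94.7 | 95.6 | 94.4 | 93.9 | 95.4 | 94.5 | 94.2 | 95.0 | 95.0 |
| D-SI | 0.8 | 0.2 | 2.2 | 37.8 | 22.2 | 16.9 | 8.3 | 51.0 | 22.5 |
| S-SI | 88.9 | 89.6 | 47.8 | 75.0 | 62.3 | 64.4 | 48.9 | 76.0 | 3.7 |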

[i] Note: Confidence invalid results are in boldface, i.e., outside of 93.6 and 96.4.
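The bounds 93.6 and 96.4 are consistent with a common rule for judging simulated coverage: the nominal rate plus or minus 1.96 binomial standard errors. The sketch below assumes that rule together with the 1,000 simulation runs mentioned in the computational-time notes. The link between the rule and the stated bounds is an inference rather than something the table note states, and the function name is hypothetical.

```python
import math

def coverage_bounds(nominal=0.95, n_runs=1000, z=1.96):
    """Acceptable coverage range (in percent) under a binomial standard error."""
    se = math.sqrt(nominal * (1 - nominal) / n_runs)  # standard error of a simulated proportion
    return 100 * (nominal - z * se), 100 * (nominal + z * se)

lower, upper = coverage_bounds()
print(round(lower, 1), round(upper, 1))  # prints: 93.6 96.4
```

Under this reading, a coverage figure counts as invalid when it falls outside the interval that sampling error alone would explain for 1,000 runs.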

Table 8

Lengths of the 95% CI (Theoretical Data).

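| | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|
| CD | 0.157 | 0.184 | 0.144 | 0.148 | 0.236 | 0.102 | 0.184 | 0.151 | 0.180 |
| LD | 0.189 | 0.259 | 0.226 | 0.235 | 0.384 | 0.213 | 0.358 | 0.339 | 0.390 |
| EMB | 0.178 | 0.209 | 0.196 | 0.200 | 0.301 | 0.160 | 0.275 | 0.229 | 0.281 |
| DA1 | 0.176 | 0.207 | 0.187 | 0.192 | 0.293 | 0.145 | 0.256 | 0.208 | 0.253 |
| DA2 | 0.177 | 0.208 | 0.194 | 0.198 | 0.298 | 0.158 | 0.271 | 0.223 | 0.274 |
| FCS1 | 0.178 | 0.209 | 0.237 | 0.211 | 0.324 | 0.248 | 0.306 | 0.223 | 0.299 |
| FCS2 | 0.178 | 0.209 | 0.197 | 0.201 | 0.302 | 0.161 | 0.275 | 0.228 | 0.281 |
| D-SI | 0.143 | 0.174 | 0.133 | 0.149 | 0.244 | 0.103 | 0.205 | 0.150 | 0.188 |
| S-SI | 0.157 | 0.184 | 0.161 | 0.155 | 0.238 | 0.145 | 0.188 | 0.149 | 0.186 |

Table 9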

Computational Time (Theoretical Data).

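| | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|
| EMB | 0.46 | 0.53 | 0.53 | 0.59 | 0.71 | 0.78 | 0.97 | 1.27 | 1.69 |
| DA2 | 0.10 | 0.16 | 0.29 | 0.42 | 0.55 | 1.09 | 1.39 | 2.22 | 3.63 |
| FCS2 | 2.47 | 5.98 | 14.48 | 21.33 | 25.40 | 54.71 | 59.14 | 85.69 | 133.17 |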

[i] Note: Reported values are the time in seconds to perform multiple imputation, which is averaged over 1,000 simulation runs. The fastest results are in boldface.

Table 10

Bias and RMSE (Realistic Data).

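| | | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| CD | Bias | 0.003 | 0.002 | 0.002 | 0.002 | 0.001 | 0.002 | 0.000 | 0.002 | 0.002 |
| | RMSE | 0.074 | 0.086 | 0.068 | 0.067 | 0.066 | 0.065 | 0.070 | 0.069 | 0.075 |
| LD | Bias | 0.034 | 0.047 | 0.037 | 0.054 | 0.082 | 0.099 | 0.083 | 0.072 | 0.085 |
| | RMSE | 0.095 | 0.128 | 0.104 | 0.118 | 0.141 | 0.154 | 0.157 | 0.159 | 0.188 |
| EMB | Bias | 0.001 | 0.002 | 0.002 | 0.005 | 0.001 | 0.000 | 0.000 | 0.002 | 0.006 |
| | RMSE | 0.084 | 0.113 | 0.091 | 0.090 | 0.089 | 0.092 | 0.102 | 0.099 | 0.110 |
| DA1 | Bias | 0.006 | 0.001 | 0.003 | 0.003 | 0.001 | 0.001 | 0.001 | 0.001 | 0.002 |
| | RMSE | 0.084 | 0.112 | 0.090 | 0.089 | 0.087 | 0.091 | 0.100 | 0.096 | 0.105 |
| DA2 | Bias | 0.009 | 0.000 | 0.002 | 0.004 | 0.002 | 0.004 | 0.000 | 0.001 | 0.001 |
| | RMSE | 0.084 | 0.111 | 0.089 | 0.088 | 0.086 | 0.090 | 0.098 | 0.094 | 0.102 |
| FCS1 | Bias | 0.007 | 0.013 | 0.006 | 0.005 | 0.002 | 0.008 | 0.006 | 0.012 | 0.000 |
| | RMSE | 0.084 | 0.106 | 0.081 | 0.081 | 0.080 | 0.081 | 0.086 | 0.083 | 0.088 |
| FCS2 | Bias | 0.007 | 0.001 | 0.002 | 0.002 | 0.003 | 0.005 | 0.002 | 0.003 | 0.005 |
| | RMSE | 0.084 | 0.112 | 0.088 | 0.088 | 0.086 | 0.090 | 0.097 | 0.093 | 0.100 |
| D-SI | Bias | 0.188 | 0.075 | 0.011 | 0.035 | 0.037 | 0.047 | 0.023 | 0.034 | 0.059 |
| | RMSE | 0.207 | 0.163 | 0.115 | 0.118 | 0.118 | 0.123 | 0.130 | 0.127 | 0.151 |
| S-SI | Bias | 0.005 | 0.014 | 0.007 | 0.006 | 0.002 | 0.006 | 0.005 | 0.009 | 0.006 |
| | RMSE | 0.089 | 0.116 | 0.096 | 0.095 | 0.091 | 0.094 | 0.100 | 0.102 | 0.105 |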

[i] Note: Biased results are in boldface, i.e., Bias > 0.010.

Table 11

Coverage of the 95% CI (Realistic Data).

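| | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|
| CD | 94.6 | 95.3 | 95.8 | 94.7 | 95.2 | 96.4 | 94.6 | 95.3 | 94.8 |
| LD | 92.2 | 91.6 | 92.8 | 91.5 | 86.8 | 85.0 | 89.8 | 90.0 | 90.8 |
| EMB | 94.3 | 94.1 | 94.7 | 93.9 | 96.1 | 94.2 | 94.0 | 94.4 | 94.7 |
| DA1 | 94.1 | 92.2 | 94.4 | 93.4 | 95.7 | 92.2 | 93.1 | 92.9 | 93.1 |
| DA2 | 94.0 | 94.0 | 94.8 | 94.4 | 95.9 | 94.5 | 93.8 | 95.0 | 95.0 |
| FCS1 | 94.6 | 94.7 | 96.3 | 96.7 | 97.0 | 97.0 | 96.7 | 96.9 | 97.7 |
| FCS2 | 94.7 | 93.8 | 95.5 | 95.7 | 96.4 | 94.3 | 94.8 | 95.2 | 96.1 |
| D-SI | 32.7 | 74.5 | 79.2 | 77.6 | 77.7 | 74.1 | 75.3 | 75.1 | 68.8 |
| S-SI | 87.9 | 83.2 | 82.3 | 82.5 | 84.2 | 82.1 | 81.0 | 80.3 | 81.2 |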

[i] Note: Confidence invalid results are in boldface, i.e., outside of 93.6 and 96.4.

Table 12

Lengths of the 95% CI (Realistic Data).

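| | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|
| CD | 0.279 | 0.334 | 0.268 | 0.266 | 0.267 | 0.261 | 0.278 | 0.274 | 0.289 |
| LD | 0.333 | 0.441 | 0.389 | 0.412 | 0.436 | 0.457 | 0.516 | 0.543 | 0.631 |
| EMB | 0.314 | 0.429 | 0.364 | 0.356 | 0.362 | 0.359 | 0.397 | 0.396 | 0.432 |
| DA1 | 0.313 | 0.414 | 0.348 | 0.342 | 0.343 | 0.337 | 0.370 | 0.364 | 0.390 |
| DA2 | 0.315 | 0.423 | 0.356 | 0.351 | 0.353 | 0.351 | 0.383 | 0.380 | 0.410 |
| FCS1 | 0.315 | 0.416 | 0.353 | 0.348 | 0.350 | 0.350 | 0.382 | 0.380 | 0.406 |
| FCS2 | 0.316 | 0.429 | 0.359 | 0.355 | 0.358 | 0.352 | 0.389 | 0.386 | 0.413 |
| D-SI | 0.288 | 0.380 | 0.292 | 0.289 | 0.291 | 0.278 | 0.302 | 0.294 | 0.315 |
| S-SI | 0.281 | 0.325 | 0.262 | 0.257 | 0.259 | 0.255 | 0.269 | 0.267 | 0.277 |

Table 13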

Computational Time (Realistic Data).

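| | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|
| EMB | 0.14 | 0.15 | 0.16 | 0.20 | 0.23 | 0.28 | 0.36 | 0.44 | 0.53 |
| DA2 | 0.04 | 0.05 | 0.06 | 0.10 | 0.15 | 0.22 | 0.33 | 0.47 | 0.67 |
| FCS2 | 1.05 | 2.55 | 4.22 | 8.92 | 12.02 | 15.59 | 20.82 | 26.78 | 35.95 |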

[i] Note: Reported values are the time in seconds to perform multiple imputation, which is averaged over 1,000 simulation runs. The fastest results are in boldface.

Language: English
Submitted on: Nov 30, 2016
Accepted on: Jun 23, 2017
Published on: Jul 28, 2017
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2017 Masayoshi Takahashi, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.