On Proxy Variables and Categorical Data Fusion

By: Li-Chun Zhang
Open Access | Dec 2015

Abstract

The problem of inference about the joint distribution of two categorical variables based on knowledge or observations of their marginal distributions, referred to as categorical data fusion in this paper, is relevant in statistical matching, ecological inference, market research, and several related fields. This article organizes the use of proxy variables, as distinguished from other auxiliary variables, in terms of both their effects on the uncertainty of fusion and the techniques of fusion. A measure of the gains in efficiency is provided, which incorporates both the identification uncertainty associated with data fusion and the sampling uncertainty that arises when the theoretical bounds of the uncertainty space are unknown and need to be estimated. Several existing techniques for generating fusion distributions (or datasets) are described, and some new ones are proposed. Analysis of real-life data demonstrates empirically that proxy variables can make data fusion more precise and the constructed fusion distribution more plausible.
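
To make the notion of identification uncertainty concrete, the sketch below computes the classical Fréchet bounds on each cell of a joint distribution when only the two marginal distributions are known. This is a standard textbook result and is offered only as an illustration; it is not necessarily the uncertainty measure proposed in the article. The function name `frechet_bounds` and the marginal values are hypothetical.

```python
def frechet_bounds(row_marginals, col_marginals):
    """Return (lower, upper) cell-wise Frechet bounds for a joint distribution.

    For marginal probabilities p_i and q_j, every joint cell p_ij satisfies
    max(0, p_i + q_j - 1) <= p_ij <= min(p_i, q_j).
    """
    lower = [[max(0.0, p + q - 1.0) for q in col_marginals]
             for p in row_marginals]
    upper = [[min(p, q) for q in col_marginals]
             for p in row_marginals]
    return lower, upper


# Hypothetical marginals for two binary variables (illustrative values only).
p_x = [0.7, 0.3]
p_y = [0.6, 0.4]

lower, upper = frechet_bounds(p_x, p_y)

# Print the interval of admissible values for each joint cell.
for i, (lo_row, up_row) in enumerate(zip(lower, upper)):
    for j, (lo, up) in enumerate(zip(lo_row, up_row)):
        print(f"cell ({i}, {j}): [{lo:.2f}, {up:.2f}]")
```

The width of each interval reflects how much the marginals alone leave undetermined. Applying the same bounds within each category of an additional variable and then aggregating can give narrower intervals, which is the general sense in which auxiliary information reduces fusion uncertainty.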

Language: English
Page range: 783 - 807
Submitted on: Jul 1, 2013 | Accepted on: Sep 1, 2015 | Published on: Dec 16, 2015
Published by: Sciendo
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2015 Li-Chun Zhang, published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.