Errors-in-variables model
Errors-in-Variables (EIV) is a robust modeling technique in statistics which assumes that every variable, not only the response, may be contaminated by error or noise. In the literature of computational mathematics and engineering, EIV is also referred to, in a broad sense, as Total least squares (TLS). In a strict sense, however, TLS denotes the application of EIV, or orthogonal regression, to a linear model <math>\mathbf{A x} = \mathbf{b}</math>.
Robust linear regression
In linear regression, least squares (LS) has several variants that differ in where the errors are assumed to lie, such as Total least squares (TLS), Data least squares (DLS), and constrained or structured TLS.
Given an observation vector <math>\mathbf{b} \in \reals^n</math> and a data matrix <math>\mathbf{A} \in \reals^{n \times m}</math>, consider the solution of the overdetermined system of equations <math>\mathbf{Ax \approx b}</math>. The ordinary least squares (OLS) method yields the solution <math>\mathbf{x}</math> that minimizes the Euclidean norm of the error, or residual, <math>||{\mathbf{Ax-b}}||_2</math>. Equivalently, the problem can be written as
- <math> \min_{\mathbf{x}}||\Delta\mathbf{b}||_2 \quad \mbox{ subject to }\quad \mathbf{Ax}=\mathbf{b}+\Delta\mathbf{b}. </math>
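The OLS formulation above can be illustrated with a minimal NumPy sketch. The helper name `ols`, the synthetic data, and the noise level are assumptions made for this example only. The sketch computes x with `numpy.linalg.lstsq`, forms the correction Δb = Ax - b so that the constraint Ax = b + Δb holds exactly, and cross-checks the result against the pseudo-inverse.

```python
import numpy as np


def ols(A, b):
    """Return the OLS solution x of A x ≈ b and the correction delta_b = A x - b."""
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    delta_b = A @ x - b  # smallest correction to b that makes A x = b + delta_b exact
    return x, delta_b


# Synthetic example (illustrative values): noise is added to b only.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true + 0.01 * rng.standard_normal(50)

x_ols, delta_b = ols(A, b)

# The same solution via the pseudo-inverse of the data matrix.
x_pinv = np.linalg.pinv(A) @ b
```

At the minimizer, the correction Δb is orthogonal to the columns of A, which is the usual normal-equations condition.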
If the data matrix <math>\mathbf{A}</math> is also noisy, the OLS solution is no longer optimal. In such a case, TLS offers a proper formulation:
- <math> \min_{\mathbf{x}} ||{[{\Delta\mathbf{A}\,\Delta\mathbf{b}}]}||_F \quad \mbox{ subject to }\quad (\mathbf{A}+\Delta\mathbf{A})\mathbf{x}=\mathbf{b}+\Delta\mathbf{b},</math>
where <math>||{\cdot}||_F</math> is the Frobenius norm, and the perturbations <math>\Delta\mathbf{A}</math> and <math>\Delta\mathbf{b}</math> compensate for the noise in <math>\mathbf{A}</math> and <math>\mathbf{b}</math>, respectively. This unweighted formulation of TLS implicitly assumes that the errors in <math>\mathbf{A}</math> and <math>\mathbf{b}</math> are identically distributed. If the distribution of the errors is known or well estimated, the objective can include a corresponding weighting matrix; this variant is called the constrained or structured TLS.
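As a hedged illustration of the TLS formulation, the sketch below uses the standard construction based on the singular value decomposition of the augmented matrix [A b]. The function name `tls`, the synthetic data, and the tolerance `1e-12` are assumptions of this example. The solution is read off the right singular vector belonging to the smallest singular value, and the rank-one correction [ΔA Δb] is built so that the constraint (A + ΔA)x = b + Δb holds exactly.

```python
import numpy as np


def tls(A, b):
    """Return the TLS solution x and the corrections (delta_A, delta_b)."""
    n, m = A.shape
    C = np.column_stack([A, b])                      # augmented matrix [A b], n x (m+1)
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    v = Vt[-1]                                       # right singular vector of the smallest singular value
    if abs(v[m]) < 1e-12:
        raise np.linalg.LinAlgError("no TLS solution: last component of v is zero")
    x = -v[:m] / v[m]
    # Rank-one correction of minimal Frobenius norm (equal to the smallest singular value).
    delta = -s[-1] * np.outer(U[:, -1], v)
    return x, delta[:, :m], delta[:, m]


# Synthetic example (illustrative values): equal noise in A and in b.
rng = np.random.default_rng(1)
A_clean = rng.standard_normal((200, 2))
x_true = np.array([2.0, -1.0])
A = A_clean + 0.05 * rng.standard_normal(A_clean.shape)
b = A_clean @ x_true + 0.05 * rng.standard_normal(200)

x_tls, delta_A, delta_b = tls(A, b)
```

The sketch covers only the unweighted case; it does not handle the weighted objective mentioned above.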
In the remaining case, where the noise is present only in <math>\mathbf{A}</math>, DLS can be used instead:
- <math> \min_{\mathbf{x}} ||{[{\Delta\mathbf{A}}]}||_F \quad \mbox{ subject to } \quad (\mathbf{A}+\Delta\mathbf{A})\mathbf{x}=\mathbf{b}.</math>
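One way to compute a DLS solution can be sketched as follows; it is an illustration under stated assumptions, not necessarily the algorithm used in the cited literature. For a fixed x, the smallest ΔA satisfying the constraint is the rank-one matrix (b - Ax)xᵀ/(xᵀx), so DLS amounts to minimizing ||Ax - b||₂ / ||x||₂. The sketch projects the columns of A onto the orthogonal complement of b, takes the right singular vector of the smallest singular value as the direction of x, and then rescales it. The name `dls`, the data, and the tolerance are hypothetical.

```python
import numpy as np


def dls(A, b):
    """Return a DLS solution x and the correction delta_A with (A + delta_A) x = b."""
    bb = b @ b
    # Remove from every column of A its component along b.
    A_perp = A - np.outer(b, (b @ A) / bb)
    _, _, Vt = np.linalg.svd(A_perp, full_matrices=False)
    u = Vt[-1]                                       # direction of x (unit vector)
    scale = b @ (A @ u)
    if abs(scale) < 1e-12:
        raise np.linalg.LinAlgError("no DLS solution: A u is orthogonal to b")
    x = u * bb / scale
    r = b - A @ x
    delta_A = np.outer(r, x) / (x @ x)               # minimal-norm correction for this x
    return x, delta_A


# Synthetic example (illustrative values): noise in A only, b is exact.
rng = np.random.default_rng(2)
A_clean = rng.standard_normal((200, 2))
x_true = np.array([2.0, -1.0])
A = A_clean + 0.05 * rng.standard_normal(A_clean.shape)
b = A_clean @ x_true

x_dls, delta_A = dls(A, b)
```

Because b is left untouched, the whole mismatch is absorbed by ΔA, whose Frobenius norm equals ||Ax - b||₂ / ||x||₂ at the returned x.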
The OLS solution can be obtained using the (pseudo-)inverse of the data matrix. The TLS and DLS solutions have been shown to be closely connected to the singular vectors of an (augmented) system-related matrix that correspond to its minimum singular value; for TLS, this matrix is the augmented matrix <math>[\mathbf{A}\,\mathbf{b}]</math>.
References
- S. Van Huffel and P. Lemmerling, Total Least Squares and Errors-in-Variables Modeling: Analysis, Algorithms and Applications. Dordrecht, The Netherlands: Kluwer Academic Publishers, 2002.
- S. Jo and S. W. Kim, "Consistent normalized least mean square filtering with noisy data matrix," accepted for publication in IEEE Trans. Signal Processing, 2004.
- R. D. DeGroat and E. M. Dowling, "The data least squares problem and channel equalization," IEEE Trans. Signal Processing, vol. 41, no. 1, pp. 407–411, Jan. 1993.
- T. Abatzoglou and J. Mendel, "Constrained total least squares," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP '87), Apr. 1987, vol. 12, pp. 1485–1488.