Mean Centering: Myths and Truths


Let's talk about mean centering and "multicollinearity". Multicollinearity means that the predictors in a regression model are highly correlated with one another. It can make a regression coefficient that would otherwise be significant become nonsignificant: because the variable is highly correlated with the other predictors, it barely varies once those predictors are held constant, so it uniquely explains very little of the variance in the dependent variable and its coefficient fails to reach significance.

Some scholars worry that in a moderation model, X and M may be highly correlated with XM, producing multicollinearity, so that the regression coefficients are estimated imprecisely, with large standard errors, which reduces the statistical power of the test of the interaction:

“X and M are likely to be highly correlated with XM and this will produce
estimation problems caused by multicollinearity and result in poor or “strange”
estimates of regression coefficients, large standard errors, and reduced power
of the statistical test of the interaction.”

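So they recommend mean centering both X and M, that is, subtracting the corresponding mean from each variable, and then running the regression:

$$Y = i_1 + b_1 (X - \bar{X}) + b_2 (M - \bar{M}) + b_3 (X - \bar{X})(M - \bar{M}) + e_Y$$

or, equivalently:

$$Y = i_1 + b_1 X' + b_2 M' + b_3 X'M' + e_Y, \quad X' = X - \bar{X}, \; M' = M - \bar{M}$$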

To test whether centering really improves the precision of the regression coefficient estimates and the power of the test of the interaction, we ran the regression on the same data set twice, once without centering and once with centering.

Model 1 is the regression without centering, and Model 2 is the regression after centering. Comparing the regression coefficients, standard errors, t values, and p values, we can see that only those of the interaction term are unchanged.

Therefore, whether or not we center has no effect on the estimate of the interaction term's coefficient.
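
The comparison of the two models can be reproduced on simulated data. The following is a minimal sketch, not the data set used in this post: the simulated variables, the seed, and the helper name `ols` are all assumptions made for illustration. It fits the uncentered and the centered model and compares the estimates.

```python
import numpy as np

def ols(y, predictors):
    """Fit OLS with an intercept; return coefficients and standard errors."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + list(predictors))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    ms_residual = resid @ resid / (n - design.shape[1])
    se = np.sqrt(ms_residual * np.diag(np.linalg.inv(design.T @ design)))
    return coef, se

# Hypothetical data: X and M have non-zero means, Y depends on X, M and XM
rng = np.random.default_rng(0)
n = 500
x = rng.normal(5.0, 1.0, n)
m = rng.normal(3.0, 1.0, n)
y = 1.0 + 0.5 * x + 0.3 * m + 0.2 * x * m + rng.normal(0.0, 1.0, n)

# Mean center X and M, then build the product from the centered variables
xc, mc = x - x.mean(), m - m.mean()

coef_raw, se_raw = ols(y, [x, m, x * m])      # Model 1: no centering
coef_cen, se_cen = ols(y, [xc, mc, xc * mc])  # Model 2: centered

print("b3     raw vs centered:", coef_raw[3], coef_cen[3])
print("se(b3) raw vs centered:", se_raw[3], se_cen[3])
print("b1     raw vs centered:", coef_raw[1], coef_cen[1])
```

In this sketch the interaction coefficient and its standard error agree across the two models up to floating-point error, while the coefficient of X differs.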

Does centering affect the multicollinearity among the predictors, and the standard error of the interaction term's coefficient (that is, the width of its confidence interval, or the precision of the estimate)?

In multiple regression, the standard error of the coefficient of predictor j can be computed with the following formula:

$$se_{b_j} = \sqrt{\frac{1}{1 - R_j^2} \cdot \frac{MS_{residual}}{n \, s_j^2}}$$

Here $R_j^2$ is the squared multiple correlation from predicting variable j with the other predictors (that is, the R-squared obtained with variable j as the dependent variable and the other variables as independent variables); $s_j^2$ is the variance of predictor j; n is the sample size; and $MS_{residual}$ is the mean square of the error in the regression model predicting Y (the error term in the earlier regression equation).

$R_j^2$ (the R-squared from regressing predictor j on the other predictors) is the proportion of the variance of predictor j that the other predictors can explain; $1 - R_j^2$ is the proportion they cannot explain, called the tolerance of predictor j. The reciprocal of the tolerance is called the variance inflation factor (VIF). The formula for the standard error of the coefficient can therefore be rewritten as:

$$se_{b_j} = \sqrt{VIF_j \cdot \frac{MS_{residual}}{n \, s_j^2}}$$

The formula shows that the VIF quantifies how much the standard error of predictor j's coefficient is affected by its correlation with the other predictors. The more strongly they are correlated, the lower the tolerance, the larger the VIF, and therefore the larger the standard error of the coefficient.
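
To make the roles of tolerance and VIF concrete, here is a small sketch on simulated data (hypothetical values and helper name `r_squared`, not the data in this post). It computes the standard error of the XM coefficient as the square root of VIF times MS_residual divided by n times the predictor's variance, and compares it with the usual matrix-based standard error. It assumes the variance is computed with divisor n, so that n times the variance equals the sum of squared deviations.

```python
import numpy as np

def r_squared(target, others):
    """R^2 from regressing `target` on `others` plus an intercept."""
    design = np.column_stack([np.ones(len(target))] + list(others))
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    centered = target - target.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

# Hypothetical data for illustration only
rng = np.random.default_rng(1)
n = 300
x = rng.normal(5.0, 1.0, n)
m = rng.normal(3.0, 1.0, n)
xm = x * m
y = 1.0 + 0.5 * x + 0.3 * m + 0.2 * xm + rng.normal(0.0, 1.0, n)

# Full model: gives MS_residual and the matrix-based standard errors
design = np.column_stack([np.ones(n), x, m, xm])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
resid = y - design @ coef
ms_residual = resid @ resid / (n - design.shape[1])
se_matrix = np.sqrt(ms_residual * np.diag(np.linalg.inv(design.T @ design)))

# Tolerance and VIF of XM: regress XM on the other predictors
tolerance = 1.0 - r_squared(xm, [x, m])
vif = 1.0 / tolerance

# Standard error from the VIF form of the formula
var_xm = xm.var()  # divisor n (ddof=0), so n * var_xm is the sum of squares
se_formula = np.sqrt(vif * ms_residual / (n * var_xm))

print("tolerance:", tolerance, "VIF:", vif)
print("se from formula:", se_formula, "se from matrix:", se_matrix[3])
```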

Back to the data:

We find that before centering, the correlation between X and XM is large (.766), so the tolerance of XM is very small (0.061) and its VIF is very large (16.357). After centering, the correlation between X and XM is much smaller (.092), the tolerance of XM rises (0.991), and its VIF falls (1.009).

Therefore, centering does reduce the correlation between the predictors and, with it, the multicollinearity measured by tolerance and VIF.
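
The drop in the correlation between X and XM is easy to see on simulated data. This is only a sketch under assumed conditions: X and M are independent normal variables with non-zero means, and the sample size and seed are arbitrary, so the numbers will not match the ones reported above.

```python
import numpy as np

# Hypothetical data: independent X and M with non-zero means
rng = np.random.default_rng(2)
n = 5000
x = rng.normal(5.0, 1.0, n)
m = rng.normal(3.0, 1.0, n)

# Centered versions and the product built from them
xc, mc = x - x.mean(), m - m.mean()

# Correlation of the predictor with the product term, before and after
r_raw = np.corrcoef(x, x * m)[0, 1]
r_centered = np.corrcoef(xc, xc * mc)[0, 1]

print("corr(X, XM) without centering:", r_raw)
print("corr(X', X'M') after centering:", r_centered)
```

Without centering, a large part of XM is just X scaled by the mean of M, which is what drives the correlation; centering removes that component.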

However, does it change the standard error of the interaction term's coefficient, and thereby the test of the interaction effect?

Back to the earlier formula:

$$se_{b_j} = \sqrt{VIF_j \cdot \frac{MS_{residual}}{n \, s_j^2}}$$

We find that the standard error of a coefficient depends not only on the VIF but also on the variance of predictor j ($s_j^2$). The smaller the variance of the predictor, the larger the standard error of its coefficient. Returning to the results table, we can see that after centering the variance of the interaction term also decreases, which increases the standard error of its coefficient. So on the one hand, centering shrinks the VIF in the numerator, which reduces the standard error; on the other hand, it shrinks the variance in the denominator, which increases the standard error. Is the final standard error larger or smaller?

From the data table above, the VIF shrinks by essentially the same factor as the variance (16.357 / 1.009 ≈ 16.21; 9489.221 / 585.166 ≈ 16.22; the small gap is due to rounding of the reported values). So on balance, the standard error of the coefficient of XM does not change. We therefore cannot say that centering reduces the standard error of the interaction term's coefficient, or that centering increases the power of the test of the interaction.
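
The cancellation can be checked directly from the reported values. The sketch below uses only the four numbers quoted above (the VIF and the variance of XM before and after centering); the variable names are illustrative. Because MS_residual and n are the same in both models, the ratio of the two standard errors reduces to the square root of the VIF ratio divided by the variance ratio.

```python
# Values reported in the text for the interaction term XM
vif_raw, vif_centered = 16.357, 1.009
var_raw, var_centered = 9489.221, 585.166

# Factor by which each quantity shrinks after centering
vif_ratio = vif_raw / vif_centered
var_ratio = var_raw / var_centered

# se is proportional to sqrt(VIF / variance), so the ratio of the
# uncentered to the centered standard error is:
se_ratio = (vif_ratio / var_ratio) ** 0.5

print("VIF ratio:", round(vif_ratio, 3))
print("variance ratio:", round(var_ratio, 3))
print("se ratio (raw / centered):", round(se_ratio, 4))
```

The standard error ratio comes out very close to 1, differing only because the reported values are rounded.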

Centering is not as miraculous as some scholars claim. It can still help in one situation: when multicollinearity in your regression model is so severe that the predictors are too highly correlated for SPSS to run the analysis (SPSS's default minimum tolerance is 0.000001), centering can reduce the correlations among the variables. But this is rare. (In my opinion, if the tolerance is that small, the predictor must be very highly correlated with the other variables, and in that case we should ask whether there is a problem with the experimental design rather than use a statistical fix to reduce multicollinearity.)

However, we have only shown that centering has no effect on the estimate of the interaction coefficient. Our earlier results show that the regression coefficients of X and M do change. Does centering affect the estimation of their effects? I'll write about that later when I have time.

Reference:

Hayes, A. F. (2013). Truths and myths about mean centering. In Introduction to mediation, moderation, and conditional process analysis: A regression-based approach (pp. 282-288). New York: The Guilford Press.
