Finding Interactive Relationships in Logit Models
Interactive term & Subsample
- interactive relationship = conditional effect
- product regressor = product term = interactive term
x: independent variable, e.g., the total Chinese aid received in this province
d: a dummy variable of a group ID, e.g., Chinese and non-Chinese
y: dependent variable, e.g., Opinion toward China
RQ: Does receiving more Chinese aid in one's province generate a more positive opinion toward China?
Hypotheses:
a) People who reside in provinces that have received more Chinese aid have a more positive opinion of the Chinese government.
b) Chinese people have a more positive opinion of the Chinese government.
c) (an interactive relationship) Compared to Chinese respondents, non-Chinese respondents who reside in provinces with more Chinese aid have a more positive opinion of the Chinese government.
Q:
When I ran the model on subsamples, the results supported my argument for a heterogeneous effect. Among the non-Chinese population, the more aid their province received, the more likely respondents were to hold a positive opinion of China, and this result is statistically significant. Among the Chinese population, the effect is not statistically significant. Yet when I interact Chinese with aid (Chinese*aid(x)) in the full sample, the product term is statistically insignificant. Why do the results differ?
Although this question arises here in the context of regression models, it also comes up frequently in experimental designs. See these slides made by Matthew Blackwell.
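1. Why is subsampling not a good way to detect an interactive relationship?
a. Even if you find a statistically significant effect of x on y in group 1, it does not mean there is a statistically significant effect in the full sample. More to the point, a significant effect in one group and an insignificant effect in the other does not show that the two effects differ from each other; that difference has to be tested directly.
b. In a full-sample model, the coefficients on the other covariates are constrained to be the same for both groups, whereas subsample models let every coefficient vary by group. This changes the estimates for x1, x2, and the product regressor. In other words, unless your model has no control variables or covariates, the subsample and full-sample results will generally differ.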
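Point (a) can be made concrete with a small calculation. The sketch below uses made-up coefficients and standard errors (they are not from the aid data) and assumes the two subsamples are independent, so the standard error of the difference is the square root of the sum of the squared standard errors. It is a normal-approximation check, not a substitute for fitting the interaction model.

```python
import math

def z_and_p(estimate, se):
    """Two-sided z-test of H0: estimate == 0 (normal approximation)."""
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p

# Hypothetical subsample results (illustrative numbers only).
b_non_chinese, se_non_chinese = 0.30, 0.10  # large group, precise estimate
b_chinese, se_chinese = 0.15, 0.25          # small group, noisy estimate

z1, p1 = z_and_p(b_non_chinese, se_non_chinese)
z2, p2 = z_and_p(b_chinese, se_chinese)

# What an interaction term actually asks: do the two effects differ?
diff = b_non_chinese - b_chinese
se_diff = math.sqrt(se_non_chinese**2 + se_chinese**2)  # assumes independent samples
z_diff, p_diff = z_and_p(diff, se_diff)

print(f"non-Chinese: z = {z1:.2f}, p = {p1:.3f}")
print(f"Chinese:     z = {z2:.2f}, p = {p2:.3f}")
print(f"difference:  z = {z_diff:.2f}, p = {p_diff:.3f}")
```

With these numbers the effect is significant at the 0.05 level in the first group and insignificant in the second, yet the difference between the two effects is itself insignificant. That is the same pattern as in the question: significant in one subsample, insignificant product term.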
2. Does the model specification make a difference in finding whether there is an interactive relationship? Yes!
The interaction effect works differently in linear regression models and logit models. In a linear regression model, a statistically significant product regressor indicates that there is an interaction effect. "If you use the concept Y*, then test for interaction by testing the significance of the product term."
Yet in a logit model, compression can produce a false-positive interactive relationship: probabilities are bounded between 0 and 1, so the effect of x on Pr(Y) shrinks as Pr(Y) approaches either bound. Examining the second difference is therefore essential for establishing an interaction effect. Rainey (2016) explains how the logit model may produce a false positive even when the true data-generating process contains no interaction. "If you rely on Pr(Y), then test for interaction by simulating confidence intervals around a carefully chosen second-difference or second derivative" (Rainey 2016, 633).
Therefore, to determine whether there is a conditional relationship in a logit model, we need to run margins (to see "the difference in first derivatives") and mlincom (to see the second difference) in Stata. See Professor David Armstrong's note on this issue.
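To see why compression matters, here is a minimal sketch of a second difference computed by hand. The coefficients are hypothetical and the model deliberately has no product term, so on the latent scale the effect of x is identical in both groups. The sketch only computes point estimates; the confidence interval around the second difference, which the test actually depends on, would come from simulation or from tools such as margins and mlincom.

```python
import math

def inv_logit(eta):
    """Logistic function: maps the linear predictor to Pr(Y = 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients. Note: there is NO product term x*d.
b0, b_x, b_d = -3.0, 1.0, 2.0

def prob(x, d):
    """Predicted probability for given x and group dummy d."""
    return inv_logit(b0 + b_x * x + b_d * d)

x_low, x_high = 0.0, 1.0

# First differences: change in Pr(Y) when x moves from low to high, by group.
fd_d0 = prob(x_high, 0) - prob(x_low, 0)
fd_d1 = prob(x_high, 1) - prob(x_low, 1)

# Second difference: how much that change itself differs between groups.
second_diff = fd_d1 - fd_d0

print(f"first difference, d = 0: {fd_d0:.4f}")
print(f"first difference, d = 1: {fd_d1:.4f}")
print(f"second difference:       {second_diff:.4f}")
```

The second difference is clearly nonzero even though the coefficient on x is the same for both groups. The group with d = 0 sits closer to the floor of the probability scale, so the same shift in the linear predictor moves its probability less.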
If you suspect a conditional effect between an independent variable and a control variable, use LASSO instead of adding a large number of product regressors (interactive terms).
Note about derivatives (or see this website):
a) first derivative: measures the rate at which the original function changes, i.e., the slope of the function at a specific point.
b) second derivative: measures the rate at which the first derivative changes, i.e., the curvature of the function (concave or convex), or how fast the slope itself is changing.
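As a quick illustration of the two definitions, the sketch below approximates both derivatives of the logistic curve by finite differences and compares them with the known analytic forms, p(1-p) for the slope and p(1-p)(1-2p) for the curvature. The step size h and the evaluation points are arbitrary choices made for this example.

```python
import math

def inv_logit(eta):
    """Logistic function: maps the linear predictor to Pr(Y = 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def first_derivative(f, t, h=1e-4):
    """Central finite-difference approximation of the slope at t."""
    return (f(t + h) - f(t - h)) / (2 * h)

def second_derivative(f, t, h=1e-4):
    """Central finite-difference approximation of the curvature at t."""
    return (f(t + h) - 2 * f(t) + f(t - h)) / (h * h)

results = {}
for eta in (-2.0, 0.0, 2.0):
    p = inv_logit(eta)
    slope = first_derivative(inv_logit, eta)
    curvature = second_derivative(inv_logit, eta)
    results[eta] = (slope, curvature)
    # Analytic values for comparison.
    analytic_slope = p * (1 - p)
    analytic_curvature = p * (1 - p) * (1 - 2 * p)
    print(f"eta = {eta:+.1f}: slope {slope:.4f} (analytic {analytic_slope:.4f}), "
          f"curvature {curvature:+.4f} (analytic {analytic_curvature:+.4f})")
```

The slope is largest at the midpoint of the curve, where the probability is 0.5. The curvature is positive below the midpoint (convex) and negative above it (concave), which is the compression pattern in the logit model.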
3. Difference between the odds ratio and marginal effect
a) odds ratio, based on the odds Ω(x)
the multiplicative change in Ω(x) for a change in Xk, holding other variables constant.
Ω(x) = 𝜋(x)/(1-𝜋(x)) = exp(x'β)
b) marginal effect, based on the probability 𝜋(x)
the additive change in probability for a change in Xk, holding other variables at specific values.
𝜋(x) = exp(x'β)/(1+exp(x'β))
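The contrast between the two quantities shows up in a short numeric sketch. The coefficient beta_k and the three baseline values of the linear predictor are hypothetical, chosen only to cover low, middle, and high starting probabilities.

```python
import math

def inv_logit(eta):
    """pi(x) = exp(x'b) / (1 + exp(x'b)), written in a numerically simple form."""
    return 1.0 / (1.0 + math.exp(-eta))

def odds(p):
    """Omega = pi / (1 - pi)."""
    return p / (1.0 - p)

beta_k = 0.5  # hypothetical coefficient on Xk

summary = {}
for base_eta in (-3.0, 0.0, 3.0):
    p0 = inv_logit(base_eta)            # probability before a one-unit change in Xk
    p1 = inv_logit(base_eta + beta_k)   # probability after the change
    odds_ratio = odds(p1) / odds(p0)    # multiplicative change in the odds
    change_in_prob = p1 - p0            # additive change in the probability
    summary[base_eta] = (odds_ratio, change_in_prob)
    print(f"baseline Pr = {p0:.3f}: odds ratio = {odds_ratio:.3f}, "
          f"change in Pr = {change_in_prob:.3f}")
```

The odds ratio equals exp(beta_k) at every baseline, which is why it can be reported without fixing the other variables at particular values. The change in probability depends on where you start, which is why marginal effects must be evaluated with the other variables held at specific values.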
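4. Does it matter if one sample (Chinese) has only 354 people, but the other sample (non-Chinese) has 3499 people? Does this size difference affect the result?
No, unequal group sizes do not bias the result. The smaller group does have larger standard errors, though, so an insignificant effect in that subsample may simply reflect lower precision.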
Notes from Rainey, C. (2016)
second-difference: "the difference in the effect of Pr(Y) of changing X from a low value to a high value as Z changes from a low value to a high value."
References:
Blackwell, M., & Olson, M. (2020). Reducing model misspecification and bias in the estimation of interactions. Political Analysis.
Rainey, C. (2016). Compression and conditional effects: A product term is essential when using logistic regression to test for interaction. Political Science Research and Methods, 4(3), 621-639.