Local Average Treatment Effect Assumptions And Critical Thinking

EconPapers: Testing Local Average Treatment Effect Assumptions

Ismael Mourifie and Yuanyuan Wan

Working Papers from University of Toronto, Department of Economics

Abstract: In this paper, we discuss the key conditions for the identification and estimation of the local average treatment effect (LATE, Imbens and Angrist, 1994): the valid instrument assumption (LI) and the monotonicity assumption (LM). We show that the joint assumptions of LI and LM have a testable implication that can be summarized by a sign restriction defined by a set of intersection bounds. We propose an easy-to-implement testing procedure that can be analyzed in the framework of Chernozhukov, Lee, and Rosen (2013) and implemented using the Stata package of Chernozhukov, Kim, Lee, and Rosen (2013). We apply the proposed tests to the "draft eligibility" instrument in Angrist (1991), the "college proximity" instrument in Card (1993) and the "same sex" instrument in Angrist and Evans (1998).

Other articles

CiteSeerX - Citation Query Identification and Estimation of Local Average Treatment Effects

by Esther Duflo, Emmanuel Saez. 2002

This paper analyzes a randomized experiment to shed light on the role of information and social interactions in employees’ decisions to enroll in a Tax Deferred Account (TDA) retirement plan within a large university. The experiment encouraged a random sample of employees in a subset of departments to attend a benefits information fair organized by the university, by promising a monetary reward for attendance. The experiment multiplied by more than 5 the attendance rate of these treated individuals (relative to controls), and tripled that of untreated individuals within departments where some individuals were treated. TDA enrollment 5 and 11 months after the fair was significantly higher in departments where some individuals were treated than in departments where nobody was treated. However, the effect on TDA enrollment is almost as large for individuals in treated departments who did not receive the encouragement as for those who did. We provide three interpretations, differential treatment effects, social network effects, and motivational reward effects, to account for these results.

[...] group 11, 10, or 00. Obviously, for each individual ij, we observe only one of the three potential outcomes for fair attendance. As the literature on differential treatment effects has recognized (see Imbens and Angrist, 1994), in order to be able to identify parameters of interest, we need to make the following assumption: Assumption 1 (Monotonicity assumption): For each individual i, fij(11) ≥ fij(10) ≥ fij(00). [...]

by D. Angrist, William N. Evans, D. Angrist, William, N. Evans, R. Rosenzweig, Kenneth I. Wolpin. 1998

Nonparametric IV estimation of local average treatment effects with covariates

Abstract

In this paper nonparametric instrumental variable estimation of local average treatment effects (LATE) is extended to incorporate covariates. Estimation of LATE is appealing since identification relies on much weaker assumptions than the identification of average treatment effects in other nonparametric instrumental variable models. Including covariates in the estimation of LATE is necessary when the instrumental variable itself is confounded, such that the IV assumptions are valid only conditional on covariates. Previous approaches to handle covariates in the estimation of LATE relied on parametric or semiparametric methods. In this paper, a nonparametric estimator for the estimation of LATE with covariates is suggested that is root-n asymptotically normal and efficient.

Keywords
  • Instrumental variables
  • LATE
  • Evaluation
  • Treatment effect
  • Matching
  • Unobserved heterogeneity

Copyright © 2006 Elsevier B.V. All rights reserved.

The microeconometric estimation of treatment effects—An overview

Received: 14 March 2005 Revised: 26 July 2005

Cite this article as: Caliendo, M. & Hujer, R. Allgemeines Statistisches Arch (2006) 90: 199. doi:10.1007/s10182-006-0230-4


The need to evaluate the performance of active labour market policies is not questioned any longer. Even though OECD countries spend significant shares of national resources on these measures, unemployment rates remain high or even increase. We focus on microeconometric evaluation which has to solve the fundamental evaluation problem and overcome the possible occurrence of selection bias. When using non-experimental data, different evaluation approaches can be thought of. The aim of this paper is to review the most relevant estimators, discuss their identifying assumptions and their (dis-)advantages. Thereby we will present estimators based on some form of exogeneity (selection on observables) as well as estimators where selection might also occur on unobservable characteristics. Since the possible occurrence of effect heterogeneity has become a major topic in evaluation research in recent years, we will also assess the ability of each estimator to deal with it. Additionally, we will also discuss some recent extensions of the static evaluation framework to allow for dynamic treatment evaluation.


Keywords: Evaluation; effect heterogeneity; matching; dynamic treatments. JEL: C40, H43, J68

The authors thank Stephan L. Thomsen, Christopher Zeiss and one anonymous referee for valuable comments. The usual disclaimer applies.


Abbring, J. H., Van den Berg, G. J. (2003). The non-parametric identification of treatment effects in duration models. Econometrica 71, 1491–1517.

Angrist, J. (1998). Estimating the labor market impact of voluntary military service using social security data on military applicants. Econometrica 66, 249–288.

Angrist, J. D., Imbens, G. W., Rubin, D. B. (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association 91, 444–472.

Ashenfelter, O. (1978). Estimating the effects of training programs on earnings. Review of Economics and Statistics 60, 47–57.

Blundell, R., Costa Dias, M. (2002). Alternative approaches to evaluation in empirical microeconomics. Portuguese Economic Journal 1, 91–115.

Blundell, R., Dearden, L., Sianesi, B. (2004). Evaluating the impact of education on earnings in the UK: Models, methods and results from the NCDS. Working Paper No. 03/20, The Institute for Fiscal Studies, London.

Caliendo, M., Kopeinig, S. (2005). Some practical guidance for the implementation of propensity score matching. Discussion Paper No. 1588, IZA, Bonn.

Dawid, A. (1979). Conditional independence in statistical theory. Journal of the Royal Statistical Society, Series B 41, 1–31.

Fay, R. (1996). Enhancing the effectiveness of active labor market policies: Evidence from programme evaluations in OECD countries. Labour Market and Social Policy Occasional Papers. OECD, Paris.

Fredriksson, P., Johansson, P. (2004). Dynamic treatment assignment: The consequences for evaluations using observational data. Discussion Paper No. 1062, IZA, Bonn.

Frölich, M. (2002). Programme Evaluation and Treatment Choice. Lecture Notes in Economics and Mathematical Systems, Springer, Berlin.

Heckman, J. (1978). Dummy endogenous variables in a simultaneous equation system. Econometrica 46, 931–959.

Heckman, J. (2001). Micro data, heterogeneity, and the evaluation of public policy: Nobel lecture. Journal of Political Economy 109, 673–748.

Heckman, J., Ichimura, H., Smith, J., Todd, P. (1998). Characterizing selection bias using experimental data. Econometrica 66, 1017–1098.

Heckman, J., Ichimura, H., Todd, P. (1997). Matching as an econometric evaluation estimator: Evidence from evaluating a job training programme. Review of Economic Studies 64, 605–654.

Heckman, J., LaLonde, R., Smith, J. (1999). The economics and econometrics of active labor market programs. In Handbook of Labor Economics Vol. III (O. Ashenfelter, D. Card, eds.), 1865–2097. Elsevier, Amsterdam.

Heckman, J., Robb, R. (1985). Alternative methods for evaluating the impact of interventions: An overview. Journal of Econometrics 30, 239–267.

Holland, P. (1986). Statistics and causal inference. Journal of the American Statistical Association 81, 945–960.

Hui, S., Smith, J. (2002). The labor market impacts of adult education and training in Canada. Report prepared for the Human Resources Development Canada (HRDC), Quebec.

Imbens, G. (2004). Nonparametric estimation of average treatment effects under exogeneity: A review. The Review of Economics and Statistics 86, 4–29.

Imbens, G., Angrist, J. (1994). Identification and estimation of local average treatment effects. Econometrica 62, 467–475.

Lechner, M. (2002). Some practical issues in the evaluation of heterogenous labour market programmes by matching methods. Journal of the Royal Statistical Society, Series A 165, 59–82.

Lechner, M. (2004). Sequential matching estimation of dynamic causal models. Discussion Paper No. 1042, IZA, Bonn.

Lechner, M., Miquel, R. (2002). Identification of effects of dynamic treatments by sequential conditional independence assumptions. Working Paper, SIAW, University St. Gallen.

Puhani, P. A. (2000). The Heckman correction for sample selection and its critique. Journal of Economic Surveys 14, 53–68.

Rosenbaum, P. R. (2002). Observational Studies. Springer, New York.

Rosenbaum, P., Rubin, D. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika 70, 41–50.

Roy, A. (1951). Some thoughts on the distribution of earnings. Oxford Economic Papers 3, 135–145.

Rubin, D. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66, 688–701.

Sianesi, B. (2004). An evaluation of the active labour market programmes in Sweden. The Review of Economics and Statistics 86, 133–155.

Smith, J. (2000). A critical survey of empirical methods for evaluating active labour market policies. Schweizerische Zeitschrift für Volkswirtschaft und Statistik 136, 1–22.

Smith, J. (2004). Evaluating local development policies: Theory and practice. Working Paper, University of Maryland.

Smith, J., Todd, P. (2005). Does matching overcome LaLonde's critique of non-experimental estimators? Journal of Econometrics 125, 305–353.

Vytlacil, E. (2002). Independence, monotonicity and latent index models: An equivalence result. Econometrica 70, 331–341.

Testing Local Average Treatment Effect Assumptions

Ismael Mourifie, University of Toronto - Department of Economics

Yuanyuan Wan, University of Toronto - Department of Economics

November 19, 2015

Number of Pages in PDF File: 22

Keywords: LATE, hypothesis testing, intersection bounds, conditionally more compliers

JEL Classification: C12, C15, C21

Date posted: April 27, 2014; Last revised: December 11, 2015

Suggested Citation

Mourifie, Ismael and Wan, Yuanyuan, Testing Local Average Treatment Effect Assumptions (November 19, 2015). Available at SSRN.

Does Regression Produce Representative Estimates of Causal Effects, P

A “causal empiricist” turn has swept through economics over the past couple decades. As a result, many economists are primarily interested in internally valid treatment effects according to the causal models of Rubin, meaning they are interested in credible statements of how some outcome Y is affected if you manipulate some treatment T given some covariates X. That is, to the extent that the full functional form Y=f(X,T) is impossible to estimate because of unobserved confounding variables or similar, it turns out to still be possible to estimate some feature of that functional form, such as the average treatment effect E(f(X,1))-E(f(X,0)). At some point, people like Angrist and Imbens will win a Nobel prize not only for their applied work, but also for clarifying precisely what various techniques are estimating in a causal sense. For instance, an instrumental variable regression under a certain exclusion restriction (let’s call this an “auxiliary assumption”) estimates the average treatment effect along the local margin of people induced into treatment. If you try to estimate the same empirical feature using a different IV, and get a different treatment effect, we all know now that there wasn’t a “mistake” in either paper, but rather that the margins upon which the two different IVs operate may not be identical. Great stuff.
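
To make the “local margin” point concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the population shares, the effect sizes, and the function name `simulate_late` are hypothetical and not taken from any paper discussed here. It shows the simple Wald/IV ratio landing on the effect for compliers (the people whose treatment status the instrument actually moves) rather than on the population average effect.

```python
import numpy as np

def simulate_late(n=200_000, seed=0):
    """Illustrative only: a binary instrument with no defiers (monotonicity holds)."""
    rng = np.random.default_rng(seed)
    # Assumed population: 30% always-takers, 40% compliers, 30% never-takers.
    types = rng.choice(["always", "complier", "never"], size=n, p=[0.3, 0.4, 0.3])
    z = rng.integers(0, 2, size=n)  # randomly assigned instrument
    # Treatment take-up: only compliers respond to the instrument.
    d = np.where(types == "always", 1, np.where(types == "complier", z, 0))
    # Heterogeneous treatment effects (assumed values).
    effect = np.select([types == "always", types == "complier"], [5.0, 2.0], default=1.0)
    # Baseline outcomes differ by type, so a naive comparison is confounded.
    baseline = np.select([types == "always", types == "complier"], [3.0, 0.0], default=-1.0)
    y = baseline + effect * d + rng.normal(size=n)

    # Wald estimator: reduced form divided by first stage.
    wald = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())
    naive = y[d == 1].mean() - y[d == 0].mean()
    return {"wald": wald, "naive": naive, "ate": effect.mean()}

if __name__ == "__main__":
    for name, value in simulate_late().items():
        print(f"{name}: {value:.2f}")
```

In this setup the Wald ratio should sit near the assumed complier effect of 2.0, while the population average effect is about 2.6 and the naive treated-versus-untreated gap is far larger. A different instrument that moved a different set of people would target a different number, with no mistake in either estimate.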

This causal model emphasis has been controversial, however. Social scientists have quibbled because causal estimates generally require the use of small, not-necessarily-general samples, such as those from a particular subset of the population or a particular set of countries, rather than national data or the universe of countries. Many statisticians have gone even further, suggesting that multiple regression with its linear parametric form does not take advantage of enough data in the joint distribution of (Y,X), and hence better predictions can be made with so-called machine learning algorithms. And the structural economists argue that the parameters we actually care about are much broader than regression coefficients or average treatment effects, and hence a full structural model of the data generating process is necessary. We have, then, four different techniques to analyze a dataset: multiple regression with control variables, causal empiricist methods like IV and regression discontinuity, machine learning, and structural models. What exactly does each of these estimate, and how do they relate?

Peter Aronow and Cyrus Samii, two hotshot young political economists, take a look at old fashioned multiple regression. Imagine you want to estimate y=a+bX+cT, where T is a possibly-binary treatment variable. Assume away any omitted variable bias, and more generally assume that all of the assumptions of the OLS model (linearity in covariates, etc.) hold. What does that coefficient c on the treatment indicator represent? This coefficient is a weighted combination of the individual estimated treatment effects, where more weight is given to units whose treatment status is not well explained by covariates. Intuitively, if you are regressing, say, the probability of civil war on participation in international institutions, then if a bunch of countries with very similar covariates all participate, the “treatment” of participation will be swept up by the covariates, whereas if a second group of countries with similar covariates all have different participation status, the regression will put a lot of weight toward those countries since differences in outcomes can be related to participation status.
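
The weighting claim can be checked numerically. The sketch below is a toy example under stated assumptions: a single discrete covariate with three groups, made-up treatment probabilities and effects, and a hypothetical function name `regression_weights_demo`. With group dummies as controls, the OLS coefficient on T should track the average of the unit-level effects weighted by (T - E[T|X])^2, not the simple average effect.

```python
import numpy as np

def regression_weights_demo(n=90_000, seed=1):
    """Illustrative only: OLS with controls vs. the variance-weighted average effect."""
    rng = np.random.default_rng(seed)
    group = rng.integers(0, 3, size=n)              # discrete covariate X
    p_treat = np.array([0.5, 0.05, 0.95])[group]    # assumed P(T=1 | X)
    tau = np.array([1.0, 4.0, 4.0])[group]          # assumed unit-level effects
    t = (rng.random(n) < p_treat).astype(float)
    y = 0.5 * group + tau * t + rng.normal(size=n)

    # OLS of y on a full set of group dummies plus the treatment indicator.
    X = np.column_stack([(group == g).astype(float) for g in range(3)] + [t])
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    ols_effect = coef[-1]

    # Weights: squared residual of treatment after projecting on the covariates.
    t_hat = np.array([t[group == g].mean() for g in range(3)])[group]
    w = (t - t_hat) ** 2
    weighted_effect = np.sum(w * tau) / np.sum(w)
    return {"ols": ols_effect, "weighted": weighted_effect, "ate": tau.mean()}

if __name__ == "__main__":
    for name, value in regression_weights_demo().items():
        print(f"{name}: {value:.2f}")
```

Here two thirds of the sample has an effect of 4.0, but treatment in those groups is almost fully predicted by the covariate, so they receive little weight and the regression coefficient is pulled toward the group with an effect of 1.0.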

This turns out to be quite consequential: Aronow and Samii look at one paper on FDI and find that even though the paper used a broadly representative sample of countries around the world, about 10% of the countries weighed more than 50% in the treatment effect estimate, with very little weight on a number of important regions, including all of the Asian tigers. In essence, the sample was general, but the effective sample once you account for weighting was just as limited as some of the “nonrepresentative samples” people complain about when researchers have to resort to natural or quasinatural experiments! It turns out that similar effective vs. nominal representativeness results hold even with nonlinear models estimated via maximum likelihood, so this is not a result unique to OLS. Aronow and Samii’s result matters for interpreting bodies of knowledge as well. If you replicate a paper adding in an additional covariate, and get a different treatment effect, it may not reflect omitted variable bias! The difference may simply result from the additional covariate changing the effective weighting on the treatment effect.

So the “externally valid treatment effects” we have been estimating with multiple regression aren’t so representative at all. So when, then, is old fashioned multiple regression controlling for observable covariates a “good” way to learn about the world, compared to other techniques? I’ve tried to think through this in a uniform way; let’s see if it works. First consider machine learning, where we want to estimate y=f(X,T). Assume that there are no unobservables relevant to the estimation. The goal is to estimate the functional form f nonparametrically but to avoid overfitting, and statisticians have devised a number of very clever ways to do this. The proof that they work is in the pudding: cars drive themselves now. It is hard to see any reason why, if there are no unobservables, we wouldn’t want to use these machine learning/nonparametric techniques. However, at present the machine learning algorithms people use literally depend only on data in the joint distribution (X,Y), and not on any auxiliary assumptions. To interpret the marginal effect of a change in T as some sort of “treatment effect” that can be manipulated with policy, if estimated without auxiliary assumptions, requires some pretty heroic assumptions about the lack of omitted variable bias which essentially will never hold in most of the economic contexts we care about.

Now consider the causal model, where y=f(X,U,T) and you are interested in what would happen with covariates X and unobservables U if treatment T were changed to a counterfactual. All of these techniques require a particular set of auxiliary assumptions: randomization requires the SUTVA assumption that treatment of one unit does not affect the outcome of another unit, IV requires the exclusion restriction, diff-in-diff requires the parallel trends assumption, and so on. In general, auxiliary assumptions will only hold in certain specific contexts, and hence by construction the result will not be representative. Further, these assumptions are very limited in that they can’t recover every conditional aspect of y, but rather recover only summary statistics like the average treatment effect. Techniques like multiple regression with covariate controls, or machine learning nonparametric estimates, can draw on a more general dataset, but as Aronow and Samii pointed out, the marginal effect on treatment status they identify is not necessarily effectively drawing on a more general sample.

Structural folks are interested in estimating y=f(X,U,V(t),T), where U and V are unobserved, and the nature of the unobserved variables V is affected by t. For example, V may be inflation expectations, T may be the interest rate, y may be inflation today, and X and U are observable and unobservable country characteristics. Put another way, the functional form of f may depend on how exactly T is modified, through V(t). This Lucas Critique problem is assumed away by the auxiliary assumptions in causal models. In order to identify a treatment effect, then, additional auxiliary assumptions generally derived from economic theory are needed in order to understand how V will change in response to a particular treatment type. Even more common is to use a set of auxiliary assumptions to find a sufficient statistic for the particular parameter desired, which may not even be a treatment effect. In this sense, structural estimation is similar to causal models in one way and different in two. It is similar in that it relies on auxiliary assumptions to help extract particular parameters of interest when there are unobservables that matter. It is different in that it permits unobservables to be functions of policy, and that it uses auxiliary assumptions whose credibility leans more heavily on non-obvious economic theory. In practice, structural models often also require auxiliary assumptions which do not come directly from economic theory, such as assumptions about the distribution of error terms which are motivated on the basis of statistical arguments, but in principle this distinction is not a first order difference.

We then have a nice typology. Even if you have a completely universal and representative dataset, multiple regression controlling for covariates does not generally give you a “generalizable” treatment effect. Machine learning can try to extract treatment effects when the data generating process is wildly nonlinear, but has the same nonrepresentativeness problem and the same “what about omitted variables” problem. Causal models can extract some parameters of interest from nonrepresentative datasets where it is reasonable to assume certain auxiliary assumptions hold. Structural models can extract more parameters of interest, sometimes from more broadly representative datasets, and even when there are unobservables that depend on the nature of the policy, but these models require auxiliary assumptions that can be harder to defend. The so-called sufficient statistics approach tries to retain the former advantages of structural models while reducing the heroics that auxiliary assumptions need to perform.

Aronow and Samii is forthcoming in the American Journal of Political Science; the final working paper is at the link. Related to this discussion, Ricardo Hausmann caused a bit of a stir online this week with his “constant adaptation rather than RCT” article. His essential idea was that, unlike with a new medical drug, social science interventions vary drastically depending on the exact place or context; that is, external validity matters so severely that slowly moving through “RCT: Try idea 1”, then “RCT: Try idea 2”, is less successful than smaller, less precise explorations of the “idea space”. He received a lot of pushback from the RCT crowd, but I think for the wrong reason: the constant iteration is less likely to discover underlying mechanisms than even an RCT, as it is still far too atheoretical. The link Hausmann makes to “lean manufacturing” is telling: GM famously (Henderson and Helper 2014) took photos of every square inch of NUMMI, their joint venture plant with Toyota, and tried to replicate this plant in their other plants. But the underlying reason NUMMI and Toyota worked has to do with the credibility of various relational contracts, rather than the (constantly iterated) features of the shop floor. Iterating without attempting to glean the underlying mechanisms at play is not a rapid route to good policy.

Edit: A handful of embarrassing typos corrected, 2/26/2016

I’m confused why you claim machine learning can’t deal with omitted variables. Machine learning can be used to estimate parameters of any model. Different types of models require different machine learning methods, it’s not magic, but to say that omitted variables are out of scope just seems completely wrong. If your model has a catch-all error term, machine learning methods will find a parameter for it.

This is not correct. The problem is not that an omitted variable, if added, would fit the model better. The problem is that a particular form of omitted variable, in conjunction with auxiliary assumptions, will lend a causal interpretation to the marginal effects in the fitted model. Consider the simplest possible example, dating back to the 40s: supply and demand. You have a huge number of points (P,Q) representing price and quantity. What are the supply and demand curves? Price and quantity can both increase either because demand shifts out and supply is constant, or because supply shifts in and demand shifts out, or… We can use an IV of things known to shift only supply or demand to recover the curves. Now in principle the idea of IV should work with nonlinear ML style functions, but it turns out there are serious statistical problems with, say, nonlinear IV, and the same goes for other attempts to combine “causal assumptions” with “machine learned nonparametric regression”.
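
The supply and demand example can be sketched in a few lines. This is a toy linear market with assumed parameters (a demand slope of -1, a supply slope of 0.5) and a hypothetical cost shifter `z` that enters only the supply curve; the function names are made up for illustration. Fitting a line through the (P,Q) cloud mixes the two curves, while the IV ratio using the supply shifter recovers the demand slope.

```python
import numpy as np

def simulate_market(n=100_000, seed=0):
    """Illustrative only. Demand: q = 10 - p + u_d. Supply: q = 2 + 0.5 p - z + u_s."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)      # cost shock: shifts supply only (exclusion restriction)
    u_d = rng.normal(size=n)    # unobserved demand shock
    u_s = rng.normal(size=n)    # unobserved supply shock
    # Equilibrium price solves demand = supply: 1.5 p = 8 + z + u_d - u_s.
    p = (8.0 + z + u_d - u_s) / 1.5
    q = 10.0 - p + u_d
    return p, q, z

def ols_slope(y, x):
    """Slope from regressing y on x with an intercept."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

def iv_slope(y, x, z):
    """Simple IV (Wald-type) slope: cov(z, y) / cov(z, x)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

if __name__ == "__main__":
    p, q, z = simulate_market()
    print(f"OLS slope of q on p: {ols_slope(q, p):.2f}")
    print(f"IV slope using z:    {iv_slope(q, p, z):.2f}")
```

In this setup the fitted line has a slope of roughly -0.5, which is neither curve, while the IV slope is close to the assumed demand slope of -1. No amount of flexibility in fitting the (P,Q) cloud would change that; the extra information comes from the assumption about `z`.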

Thanks for another great post Kevin.

A note: I would say Aronow and Samii are two hotshot political scientists. Unless, of course, we are making an argument about the artificiality of the barriers between the two disciplines, in which case, I wholeheartedly agree.

Oh jeez – of course I mean “political scientists”, but us economists are imperialistic even subconsciously!

Thanks for the pointer to this interesting paper. However, I thought that Aronow and Samii are overselling their results a bit. The last section before the conclusion reveals that the typical setup of the matching literature (selection on observables plus common support or, in their terminology, unconfoundedness and positivity) allows one to estimate average causal effects that are representative for the population for which the assumptions hold. Aronow and Samii mention inverse probability weighting (IPW) rather than matching but this should be equivalent, as far as I know. Read with this in mind the paper reminds us that multiple linear regression makes strong functional form assumptions (like constant coefficients) and does not generally identify an average treatment effect on the treated when treatment responses are heterogeneous. It’s good to make this point clear. But I don’t think that this is a very novel insight, although I could be mistaken. Taking matching or IPW as a solution to the problem, the focus should be on the positivity assumption which is crucial for the external validity of the results. But again, it should be obvious that one relies heavily on out-of-sample-predictions (based on the assumed functional form) if positivity fails and instead any parametric regression technique is applied. Essentially one then would impute a counterfactual treatment status for data points where this counterfactual is observed with probability zero.
Am I doing the paper an injustice here? Would love to hear your thoughts on this.
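
For readers who want to see the IPW idea in this comment at work, here is a minimal sketch. It assumes a single discrete covariate, estimates the propensity score as the treated share within each covariate cell, and uses a normalized (Hajek-style) weighting; the function name `ipw_ate` and the simulated numbers are hypothetical. It refuses to run when positivity fails in some cell, which is the case the comment warns about.

```python
import numpy as np

def ipw_ate(y, t, x_group):
    """Illustrative normalized IPW estimate of the ATE with a discrete covariate."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    groups, inv = np.unique(np.asarray(x_group), return_inverse=True)
    # Propensity score: share treated within each covariate cell.
    ps_by_group = np.array([t[inv == k].mean() for k in range(len(groups))])
    ps = ps_by_group[inv]
    if ps.min() <= 0.0 or ps.max() >= 1.0:
        raise ValueError("positivity fails: some cell has only treated or only untreated units")
    w1 = t / ps                   # weights for treated units
    w0 = (1.0 - t) / (1.0 - ps)   # weights for untreated units
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    n = 60_000
    group = rng.integers(0, 3, size=n)
    tau = np.array([1.0, 4.0, 4.0])[group]   # assumed heterogeneous effects
    t = (rng.random(n) < np.array([0.5, 0.1, 0.9])[group]).astype(float)
    y = 0.5 * group + tau * t + rng.normal(size=n)
    print(f"IPW estimate: {ipw_ate(y, t, group):.2f}, true ATE: {tau.mean():.2f}")
```

Under unconfoundedness and positivity the weighted difference should sit close to the simple average of the unit-level effects, because each covariate cell is reweighted back to its population share.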

I was intrigued by your write up on Aronow and Samii, which led me to believe that you consider the problem of external validity to be central to statistical methodology. I agree with you, and would like to call your attention to a recent solution of this problem.

If you examine this paper (or this or this), you see that external validity, including transporting experimental findings across heterogeneous populations and generalizing from a biased sample to the population at large, has been reduced to syntactic derivation. It can safely be considered a “solved problem”, in the sense that we have a full mathematical characterization (if and only if condition) of when a causal query can be answered (transported) in the target with information from the source.

Thanks for an interesting and useful summary of the arguments. Would you agree that there is congruence here with a critical realist position of concern about conventional quantitative methods in the social sciences? Simplistically, this would argue that the search for ‘average causal effects’ across populations is illusory, and what we need are better theories on context-specific configurations of causes for specified types of cases.

Seasonality of treatment and Average Treatment Effect - Cross Validated

I have panel data of sales for many stores in two comparable cities. One of the cities holds a special event once a month (the treatment) which is expected to boost sales across the board on that day. I would like to estimate the average treatment effect on the treated, i.e. identify whether or not the special event does what it should.

While the two cities are quite comparable (and I can find a number of controls for the stores), sampling is clearly not random. Furthermore, there is autocorrelation in sales between dates at each individual store, and it seems reasonable to think the monthly seasonality of the treatment has some impact.

My question is twofold.

1/ Do you think running a difference-in-differences analysis in this “experimental” setup makes sense at all?
2/ The examples of diff-in-diffs I have seen are all pre-treatment/post-treatment. Do you have any idea of how I could account for the periodicity of the treatment? It seems wrong to consider each month separately because of the autocorrelation.

Thank you very much for your help!

asked Jan 20 '12 at 22:38

Sorry to ask another question, when you say you were thinking of aggregate sales data, do you mean aggregate by day or aggregate by a longer period? If you had multiple days of sales data by store, that is a repeated measures design, which is completely "do-able". What software do you have available for analysis? – Michelle Jan 21 '12 at 9:11

I did mean aggregate by day. Repeated measures design is doable, but it means abandoning the data for the untreated city, hence not being able to control for other underlying market conditions and seasonality of sales which would impact simultaneously both cities. My preferred software is R, but I can use something else if needed. – René R Jan 21 '12 at 9:22

Let $y_{c,t}$ be the outcome in city $c$ at time $t$, $x_{c,t}$ be a city-specific variable at time $t$, $z_t$ be a non-city-specific variable at time $t$ (e.g. day of the week), and $T_{c,t}$ be the treatment indicator (e.g. boolean). For cities A and B,

$$\begin{align} y_{a,t} &= \alpha_a + \beta_a x_{a,t} + \gamma_a z_t + \delta T_{a,t} + \epsilon_{a,t} \\ y_{b,t} &= \alpha_b + \beta_b x_{b,t} + \gamma_b z_t + \delta T_{b,t} + \epsilon_{b,t} \end{align}$$

Assume that city A was treated, while city B was not, so $T_{b,t} = 0$ for all $t$. Let's take the difference:

$$\begin{align} y_{a,t} - y_{b,t} = (\alpha_a - \alpha_b) + \beta_a x_{a,t} + (- \beta_b) x_{b,t} + (\gamma_a - \gamma_b) z_t + \delta T_{a,t} + (\epsilon_{a,t} - \epsilon_{b,t}) \end{align}$$

What's the magic of difference-in-differences?

  • Suppose that $x$ is unobservable. If $\beta_a x_{a,t} = \beta_b x_{b,t}$, for example, if the cities have the same value of the covariate and the same impact of that covariate, then this unobserved factor cancels out.
  • Suppose that $z$ is unobservable. If $\gamma_a = \gamma_b$, this term drops out.

These features are why we are interested in finding comparable cities---unobservable components are assumed to be the same across cities and they cancel out.

(Note: this is a single difference. Where's the other difference? We care about $\delta$, which is the difference between non-treatment and treatment periods.)
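The cancellation argument above can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the answerer's own code: the function names `simulate` and `did_estimate`, the intercepts, the weekend effect, the 30-day event cycle, and the noise levels are all hypothetical values chosen for illustration. It assumes $\gamma_a = \gamma_b$ and no city-specific covariate, so the common shock $z_t$ drops out of the city difference.

```python
import random
from statistics import mean


def simulate(n_days=720, delta=5.0, seed=0):
    """Simulate daily sales for treated city A and untreated city B.

    All numbers here are hypothetical. Both cities load on the same
    common shock z_t with the same coefficient (gamma_a == gamma_b).
    """
    rng = random.Random(seed)
    rows = []
    for t in range(n_days):
        # Common factor: a weekend bump plus noise shared by both cities.
        z = (10.0 if t % 7 >= 5 else 0.0) + rng.gauss(0, 3)
        # Assumed schedule: the event takes place every 30th day in city A.
        treated = 1 if t % 30 == 0 else 0
        y_a = 100.0 + 2.0 * z + delta * treated + rng.gauss(0, 1)
        y_b = 80.0 + 2.0 * z + rng.gauss(0, 1)
        rows.append((t, treated, y_a, y_b))
    return rows


def did_estimate(rows):
    """First difference: y_a - y_b on each day (removes z_t).
    Second difference: event days minus non-event days (isolates delta)."""
    diff_event = [y_a - y_b for _, tr, y_a, y_b in rows if tr == 1]
    diff_other = [y_a - y_b for _, tr, y_a, y_b in rows if tr == 0]
    return mean(diff_event) - mean(diff_other)


rows = simulate()
# Comparing event and non-event days in city A alone leaves z_t in the estimate.
naive = (mean(y_a for _, tr, y_a, _ in rows if tr == 1)
         - mean(y_a for _, tr, y_a, _ in rows if tr == 0))
print("city A only:", round(naive, 2))
print("difference-in-differences:", round(did_estimate(rows), 2))
```

The single-city comparison still carries the shared shock, while the double difference only carries the two idiosyncratic error terms. The point estimate says nothing yet about standard errors, which is where the autocorrelation matters.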

Now, we are left to figure out what to do with the error terms. If we think that $y$ has a unit root, we should use $\Delta y_{c,t}$ rather than $y_{c,t}$ from the beginning. Do a Dickey-Fuller or KPSS test on each city's $y$ series to find out.

Otherwise, we should refer to this paper: "How Much Should We Trust Differences-in-Differences Estimates?" by Bertrand, Duflo, and Mullainathan. Collapsing your data into before and after periods actually helps to alleviate the autocorrelation problem, the opposite of your intuition. Better ways involve the bootstrap or clustered standard errors.
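One way the collapsing idea might be adapted to a monthly recurring treatment is sketched below. This is an assumption-laden illustration rather than the procedure from the paper: it treats each month as one cluster, collapses the daily city difference into a single event-day-minus-other-days contrast per month, and bases the standard error on the variation across months. The names `simulate_diff` and `collapsed_estimate`, the AR(1) coefficient, and the mid-month event day are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, stdev


def simulate_diff(n_months=36, days_per_month=30, delta=5.0, rho=0.6, seed=1):
    """Return one list per month of (treated, d_t) pairs, where
    d_t = y_a - y_b has AR(1) noise with coefficient rho (hypothetical values)."""
    rng = random.Random(seed)
    e = 0.0
    months = []
    for _ in range(n_months):
        month = []
        for day in range(days_per_month):
            e = rho * e + rng.gauss(0, 1)        # autocorrelated error
            treated = 1 if day == 14 else 0      # assumed: event mid-month
            month.append((treated, 20.0 + delta * treated + e))
        months.append(month)
    return months


def collapsed_estimate(months):
    """Collapse each month to one contrast, then average across months.

    The standard error uses only between-month variation, which is
    reasonable if contrasts from different months are roughly independent.
    """
    per_month = []
    for month in months:
        on = [d for tr, d in month if tr == 1]
        off = [d for tr, d in month if tr == 0]
        per_month.append(mean(on) - mean(off))
    est = mean(per_month)
    se = stdev(per_month) / sqrt(len(per_month))
    return est, se


est, se = collapsed_estimate(simulate_diff())
print("estimate:", round(est, 2), "standard error:", round(se, 2))
```

Working with one number per month avoids pretending that every daily observation is an independent draw. A block bootstrap over months or month-clustered standard errors in a regression would be alternatives in the same spirit.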

answered Jan 31 '12 at 15:28

Thanks a lot, this is very helpful! – René R Jan 31 '12 at 19:20

Caveat 1: I have just started getting into mixed effects models, so I am a little worried I am over-finding applications for this general type of method. I'm familiar with the nlme and lme4 packages, and there could be a better package than those that I am unfamiliar with. Caveat 2: I don't have much call in my work for fitting nested models, so I welcome a more experienced person to check my suggestion below and provide feedback, as I can learn from that as well.

I was thinking along the lines of trying a linear mixed model approach. You'll need to check the normality assumptions for your sales data, but these methods mean you could fit store as a random effect, which would take account of the repeated measures aspect of your data.

As you have a nested design, perhaps a model like this might work:

This model suggests fitting store as a random effect grouped within city (the last term).

Have a look at the lme4 package in R, and at the lmer function. I'm not sure if all the stores have the same number of opening hours; if some stores are open longer than others, that may be another variable to fit.