Valid Post-Selection Inference

In the classical theory of statistical inference, data is assumed to be generated from a known model, and the properties of the parameters in the model are of interest. In applications, however, it is often the case that the model that generates the data is unknown, and as a consequence a model is o...

Detailed description

Bibliographic details
Main author: Zhang, Kai (Author)
Format: Electronic book
Language: English
Published: 2012
In: Year: 2012
Online access: Full text (free of charge)

MARC

LEADER 00000nam a22000002 4500
001 1866324527
003 DE-627
005 20231019043656.0
007 cr uuu---uuuuu
008 231019s2012 xx |||||o 00| ||eng c
035 |a (DE-627)1866324527 
035 |a (DE-599)KXP1866324527 
040 |a DE-627  |b ger  |c DE-627  |e rda 
041 |a eng 
084 |a 2,1  |2 ssgn 
100 1 |a Zhang, Kai  |e VerfasserIn  |4 aut 
245 1 0 |a Valid Post-Selection Inference 
264 1 |c 2012 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
520 |a In the classical theory of statistical inference, data is assumed to be generated from a known model, and the properties of the parameters in the model are of interest. In applications, however, it is often the case that the model that generates the data is unknown, and as a consequence a model is often chosen based on the data. In my dissertation research, we study how to achieve valid inference when the model or hypotheses are data-driven. We study three scenarios, which are summarized in the three chapters. In the first chapter, we study the common practice of performing data-driven variable selection and deriving statistical inference from the resulting model. We find such inference enjoys none of the guarantees that classical statistical theory provides for tests and confidence intervals when the model has been chosen a priori. We propose to produce valid post-selection inference by reducing the problem to one of simultaneous inference. Simultaneity is required for all linear functions that arise as coefficient estimates in all submodels. By purchasing simultaneity insurance for all possible submodels, the resulting post-selection inference is rendered universally valid under all possible model selection procedures. This inference is therefore generally conservative for particular selection procedures, but it is always more precise than full Scheffé protection. Importantly, it does not depend on the truth of the selected submodel, and hence it produces valid inference even in wrong models. We describe the structure of the simultaneous inference problem and give some asymptotic results. In the second chapter of this thesis, we propose a different approach to achieve valid post-selection inference which corresponds to the treatment of the design matrix predictors as random. Our methodology is based on two techniques, namely split samples and the bootstrap. 
Split-sample methodology generally involves dividing the observations randomly into two parts: one part for exploratory model building, a.k.a. the training set or planning sample, and the other part for confirmatory statistical inference, a.k.a. the holdout set or analysis sample. We use a training sample only to seek a subset of predictors, and then perform the estimation and inference on the holdout set. As far as inference after selection in linear models is concerned, the main advantage of this technique is, roughly speaking, that it separates the data for exploratory analysis from the data for confirmatory analysis, thereby removing the contaminating effect of selection on inference. We show that our procedure achieves valid inference asymptotically for any selection rule. The third part of the thesis is an application of the split samples method to an observational study on the effect of obstetric unit closures in Philadelphia. The splitting was successful twice over: (i) it successfully identified an interesting and moderately insensitive conclusion, (ii) by comparison of the planning and analysis samples, it is clearly seen to have avoided an exaggerated claim of insensitivity to unmeasured bias that might have occurred by focusing on the least sensitive of many findings. Under the assumption of no unmeasured confounding, we found strong evidence that obstetric unit closures caused birth injuries. We also showed this conclusion to be insensitive to bias from a moderate amount of unmeasured confounding.
856 4 0 |u https://core.ac.uk/download/76395774.pdf  |x Verlag  |z kostenfrei  |3 Volltext 
912 |a NOMM 
935 |a mkri 
951 |a BO 
ELC |a 1 
LOK |0 000 xxxxxcx a22 zn 4500 
LOK |0 001 4392970256 
LOK |0 003 DE-627 
LOK |0 004 1866324527 
LOK |0 005 20231019043656 
LOK |0 008 231019||||||||||||||||ger||||||| 
LOK |0 035   |a (DE-2619)CORE19102017 
LOK |0 040   |a DE-2619  |c DE-627  |d DE-2619 
LOK |0 092   |o n 
LOK |0 852   |a DE-2619 
LOK |0 852 1  |9 00 
LOK |0 935   |a core 
OAS |a 1 
ORI |a SA-MARC-krimdoka001.raw