Valid Post-Selection Inference

Zhang, Kai

In the classical theory of statistical inference, data is assumed to be generated from a known model, and the properties of the parameters in the model are of interest. In applications, however, it is often the case that the model that generates the data is unknown, and as a consequence a model is o...

Bibliographic details
Author:	Zhang, Kai (author)
Format:	Electronic book
Language:	English
Published:	2012
In:	Year: 2012
Online access:	Full text (free of charge)
Check availability:	HBZ Gateway

MARC


LEADER	00000nam a22000002 4500
001	1866324527
003	DE-627
005	20231019043656.0
007	cr uuu---uuuuu
008	231019s2012 xx \|\|\|\|\|o 00\| \|\|eng c
035			\|a (DE-627)1866324527
035			\|a (DE-599)KXP1866324527
040			\|a DE-627 \|b ger \|c DE-627 \|e rda
041			\|a eng
084			\|a 2,1 \|2 ssgn
100	1		\|a Zhang, Kai \|e VerfasserIn \|4 aut
245	1	0	\|a Valid Post-Selection Inference
264		1	\|c 2012
336			\|a Text \|b txt \|2 rdacontent
337			\|a Computermedien \|b c \|2 rdamedia
338			\|a Online-Ressource \|b cr \|2 rdacarrier
520			\|a In the classical theory of statistical inference, data is assumed to be generated from a known model, and the properties of the parameters in the model are of interest. In applications, however, it is often the case that the model that generates the data is unknown, and as a consequence a model is often chosen based on the data. In my dissertation research, we study how to achieve valid inference when the model or hypotheses are data-driven. We study three scenarios, which are summarized in the three chapters. In the first chapter, we study the common practice of performing data-driven variable selection and then deriving statistical inference from the resulting model. We find that such inference enjoys none of the guarantees that classical statistical theory provides for tests and confidence intervals when the model has been chosen a priori. We propose to produce valid post-selection inference by reducing the problem to one of simultaneous inference. Simultaneity is required for all linear functions that arise as coefficient estimates in all submodels. By purchasing simultaneity insurance for all possible submodels, the resulting post-selection inference is rendered universally valid under all possible model selection procedures. This inference is therefore generally conservative for particular selection procedures, but it is always more precise than full Scheffé protection. Importantly, it does not depend on the truth of the selected submodel, and hence it produces valid inference even in wrong models. We describe the structure of the simultaneous inference problem and give some asymptotic results. In the second chapter of this thesis, we propose a different approach to achieving valid post-selection inference, which corresponds to treating the predictors in the design matrix as random. Our methodology is based on two techniques, namely split samples and the bootstrap. 
Split-sample methodology generally involves dividing the observations randomly into two parts: one part for exploratory model building, a.k.a. the training set or planning sample, and the other part for confirmatory statistical inference, a.k.a. the holdout set or analysis sample. We use the training sample only to seek a subset of predictors, and then perform the estimation and inference on the holdout set. As far as inference after selection in linear models is concerned, the main advantage of this technique is, roughly speaking, that it separates the data for exploratory analysis from the data for confirmatory analysis, thereby removing the contaminating effect of selection on inference. We show that our procedure achieves valid inference asymptotically for any selection rule. The third part of the thesis is an application of the split-sample method to an observational study on the effect of obstetric unit closures in Philadelphia. The splitting was successful twice over: (i) it identified an interesting and moderately insensitive conclusion, and (ii) by comparison of the planning and analysis samples, it is clearly seen to have avoided an exaggerated claim of insensitivity to unmeasured bias that might have occurred by focusing on the least sensitive of many findings. Under the assumption of no unmeasured confounding, we found strong evidence that obstetric unit closures caused birth injuries. We also showed this conclusion to be insensitive to bias from a moderate amount of unmeasured confounding.
856	4	0	\|u https://core.ac.uk/download/76395774.pdf \|x Verlag \|z kostenfrei \|3 Volltext
912			\|a NOMM
935			\|a mkri
951			\|a BO
ELC			\|a 1
LOK			\|0 000 xxxxxcx a22 zn 4500
LOK			\|0 001 4392970256
LOK			\|0 003 DE-627
LOK			\|0 004 1866324527
LOK			\|0 005 20231019043656
LOK			\|0 008 231019\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|ger\|\|\|\|\|\|\|
LOK			\|0 035 \|a (DE-2619)CORE19102017
LOK			\|0 040 \|a DE-2619 \|c DE-627 \|d DE-2619
LOK			\|0 092 \|o n
LOK			\|0 852 \|a DE-2619
LOK			\|0 852 1 \|9 00
LOK			\|0 935 \|a core
OAS			\|a 1
ORI			\|a SA-MARC-krimdoka001.raw