# Abusing Chi Squared

Posted in math on Friday, September 02 2016

This is one of those questions I could probably answer with research, but I'm lazy so I am going to do simulations. Anyways, it came up recently that I needed to check a data-set to see if the values are normally distributed. There are a couple of ways of doing this (I lean towards doing a KS test) but one that was recommended was to do a $\chi^2$ test. Of course the $\chi^2$ test typically requires the data to be in discrete bins, and this got me thinking: surely the test itself is highly dependent upon the bin size I choose, so, presumably, I could fiddle with that variable to get whatever answer I wanted. Presumably.

This is not a question I have an immediate answer to. My old statistics book does do $\chi^2$ tests on already-binned, normally distributed data with no discussion of how you arrive at those bins (most of the chapter on $\chi^2$ testing for goodness of fit is devoted to discrete distributions). A quick Google search turned up a bunch of people arguing about what the best method is, but I didn't immediately find an answer to my question per se.

I could, however, do some simulations and see what that looks like. It won't tell me what is optimal or best in some abstruse mathematical sense, but it could give me a sense of what "works" (for some definition of "works"). Sort of analogous to when I tried out different ways of parameter estimation to see whether the differences were really spectacular in a few particular cases.

### The Plan of Attack

I am going to take a bunch of different numbers of bins, from 4 to 50 going up by 2, and for each bin count I will

1. Randomly draw 100 data points $Y_{i} \sim N \left( \mu = 1, \sigma^2 = 4 \right)$,
2. Calculate $\hat{\mu} = \bar{Y}$ and $\hat{\sigma}^2 = \frac{1}{99} \sum_{i=1}^{100} \left( Y_i - \bar{Y} \right)^{2}$,
3. Calculate the Pearson $c$ statistic relative to a normal distribution $N\left( \hat{\mu}, \hat{\sigma}^{2} \right)$,
4. Calculate the critical value $\chi^{2}_{0.95, k-3}$, where $k$ is the number of bins,
5. Reject the hypothesis that the data is normally distributed when $c \ge \chi^{2}_{0.95, k-3}$.

Then I repeat this procedure 10,000 times and add up how many times the null hypothesis got rejected. Basically I am counting how many times I commit a type I error.
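The whole procedure can be sketched in a few lines of Python with numpy and scipy. One caveat: since I haven't specified a binning scheme above, the choice here of $k$ equal-probability bins under the fitted normal (so every expected count is $n/k$) is an assumption baked into this sketch, and I've scaled the run down to 1,000 repetitions per bin count to keep it quick.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def reject_normality(n_samples, k, alpha=0.05):
    """One run of steps 1-5; True means the null was (falsely) rejected."""
    # Step 1: draw from N(mu=1, sigma^2=4), i.e. sigma = 2
    y = rng.normal(loc=1.0, scale=2.0, size=n_samples)
    # Step 2: estimate mu and sigma from the sample (ddof=1 gives the 1/(n-1) variance)
    mu_hat, sigma_hat = y.mean(), y.std(ddof=1)
    # Binning (an assumption -- no scheme is fixed above): k equal-probability
    # bins under the fitted normal, so every expected count is n/k
    interior = stats.norm.ppf(np.linspace(0, 1, k + 1)[1:-1],
                              loc=mu_hat, scale=sigma_hat)
    observed = np.bincount(np.searchsorted(interior, y), minlength=k)
    expected = np.full(k, n_samples / k)
    # Step 3: Pearson statistic
    c = np.sum((observed - expected) ** 2 / expected)
    # Steps 4-5: compare to the chi^2 critical value with k - 3 degrees of freedom
    return c >= stats.chi2.ppf(1 - alpha, df=k - 3)

# Tally type I errors per bin count (1,000 repetitions here for speed;
# the actual runs use 10,000)
reps = 1_000
rejections = {k: sum(reject_normality(100, k) for _ in range(reps))
              for k in range(4, 51, 2)}
```

With equal-probability bins the expected counts never get pathologically small, which is one reason that scheme is popular; an equal-width scheme would behave somewhat differently in the tails.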

### Running the Simulation and Collecting the Results

After the simulation I end up with a huge table of results that I need to aggregate in some meaningful way. I am going to make a few assumptions right off the bat.

1. For $k$ bins the number of rejections $X_{k} \sim B \left( n=10000, p_{k} \right)$
2. I can estimate the probability of rejection $\hat{p}_{k} = { x_{k} \over 10000 }$ and standard error $SE_{p_k} = \sqrt{ { \hat{p}_{k} ( 1 - \hat{p}_{k} ) \over 10000 }}$
3. Assuming $n$ is large enough to approximate the distribution as normal, the 95% confidence interval is $\hat{p}_{k} \pm 1.96 \cdot SE_{p_k}$

Of course one set of data points isn't super useful, so I ran the simulation again with 150 samples and 200 samples. These are plotted below.

Obviously I can't generalize a whole lot from this, since this is just one small set of simulations. From these results it appears that the actual rejection rate (which is a false rejection, mind you: the data is normally distributed) increases with the number of bins up to some mid-range value, then flattens out around 10%. This is far from reassuring, as it implies that, with a change in bin size, you can almost double the chance that you reject the null hypothesis (i.e. conclude your data is not normal). Consider also that the data was deliberately sampled from a normal distribution; I am not considering at all the possibility that you could mistake your data for being normal simply because you binned it in a particular way.

I think this provides a good reason for using a KS test (or something else), as it does not rely on binning to make its decision. There isn't that extra knob to twiddle until you get the results you want.