Picking the Right Fight
Today in Roni’s class we talked about picking fights. I think you’re within your rights to tackle your problem any way you want, but picking the right fight is half the battle.
What I mean by “picking the right fight” in statistics is treading carefully when you draw conclusions from an analysis. Often, we’re trying to show or prove or disprove or undermine or support some claim given some data about that claim. So let’s say we’ve performed some experiment and have two yes/no questions: Do we have enough data, and do we have low error?
-
Option 1: If you have tons of data and very low error, you should GO HOME. There’s nothing else to be done here. Finish up, because you aren’t going to find anything more interesting. Honestly, what could be more perfect? This is the ideal, frictionless scenario.
-
Option 2: If you don’t have much data and you have very high error, you should GO HOME. There’s nothing that can be done here. It’s an abysmal situation. You can either keep making poor predictions or just start making shit up. Either way, you have to rethink your strategy. This is where picking the right fight comes in; more on that later.
-
Option 3: If you have a lot of data but you have very high error, you should CRY. You dun goofed. Either your data is totally irrelevant (you should do more statistical analysis), your algorithm is terrible (you should do more statistical analysis), or you did something terribly wrong. More statistical analysis wouldn’t hurt.
-
Option 4: If you don’t have much data but you have very low error, you should be SKEPTICAL. It’s probably a fluke; do the test again. Don’t buy into thinking that there’s a practically significant relationship between children’s shoe sizes and their IQ. You’re not thinking this through clearly enough, and you’re making mistakes along the way.
So which option are you?
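To see why a tiny sample with a strong-looking relationship deserves skepticism, here's a minimal Python sketch (my own, not from the class) that simulates two independent variables, standing in for shoe size and IQ, and counts how often their sample correlation looks strong purely by chance. The normal distributions, the 0.7 cutoff for "strong", and the function names are all assumptions for illustration.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def fraction_strong_by_chance(n, threshold=0.7, trials=2000, seed=0):
    """Fraction of trials in which two INDEPENDENT samples of size n
    show |r| >= threshold purely by chance (hypothetical helper)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        shoe = [rng.gauss(0, 1) for _ in range(n)]  # stand-in "shoe sizes"
        iq = [rng.gauss(0, 1) for _ in range(n)]    # stand-in "IQs", unrelated to shoe
        if abs(pearson_r(shoe, iq)) >= threshold:
            hits += 1
    return hits / trials


# Usage: watch the chance of a spurious "strong" correlation shrink as n grows.
for n in (5, 10, 50, 200):
    print(n, fraction_strong_by_chance(n))
```

With only a handful of points, a "strong" correlation between completely unrelated variables shows up fairly often; as n grows, that fraction falls toward zero. That's the fluke Option 4 warns about.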
Personally, I like being put into situation #2. There is some sort of morbid pleasure in being able to wrest order out of chaos. But if you’re going to go down that path, you need to be aware that you can only take those risks by picking the right fight. How could we rectify the situation (pick a better fight) if we have too little data to speak of? How about bootstrapping more samples? How do we sample a more variable population if our current data is too homogeneous? How about bootstrapping the residuals?
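Here's a rough sketch of the two bootstrap ideas mentioned above, again in plain Python and under my own assumptions: a case bootstrap that resamples observations with replacement to get a percentile interval for a mean, and a residual bootstrap for a straight-line fit that keeps the x values fixed and resamples the residuals. The function names, the 95% percentile interval, and the made-up data are illustrative, not the course's code. One caveat: resamples only reuse values already in your data, so bootstrapping estimates how much a statistic would vary rather than adding new information.

```python
import random
import statistics


def bootstrap_means(data, n_boot=2000, seed=0):
    """Case bootstrap: resample the observations with replacement
    and record the mean of each resample."""
    rng = random.Random(seed)
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot)]


def percentile_interval(values, alpha=0.05):
    """Simple percentile interval from a list of bootstrap replicates."""
    s = sorted(values)
    lo = s[int((alpha / 2) * (len(s) - 1))]
    hi = s[int((1 - alpha / 2) * (len(s) - 1))]
    return lo, hi


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def bootstrap_residual_slopes(xs, ys, n_boot=2000, seed=0):
    """Residual bootstrap: keep xs fixed, resample the fitted residuals
    with replacement, add them back to the fitted values, and refit."""
    rng = random.Random(seed)
    a, b = fit_line(xs, ys)
    fitted = [a + b * x for x in xs]
    residuals = [y - f for y, f in zip(ys, fitted)]
    slopes = []
    for _ in range(n_boot):
        resampled = rng.choices(residuals, k=len(xs))
        y_star = [f + r for f, r in zip(fitted, resampled)]
        slopes.append(fit_line(xs, y_star)[1])
    return slopes


# Usage with made-up numbers.
data = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.4]
print(percentile_interval(bootstrap_means(data)))

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]  # roughly y = 2x
print(percentile_interval(bootstrap_residual_slopes(xs, ys)))
```

The residual bootstrap assumes the straight-line model is about right and the errors are roughly exchangeable across x; if that's not the case, resampling whole (x, y) pairs is the safer choice.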
OK, maybe you can just solve all of your problems by bootstrapping. Thanks, 402.