holy shit this is bad. from the first article, here's an email he sent to one of his grad students. it's... basically a how-to manual on p-hacking. you couldn't write a more damning description of p-hacking if you tried. (if you don't know what p-hacking is, it's a grab bag of ways to dishonestly pick and choose your data, or which slices of your data you look at, until you find a result that "looks statistically significant" but is actually garbage. I'll get into the details as I break down this email below.)
The text from the image is in alt text but I will also break it all down piece by piece below.
Hi Ozge,
Glad you had a chance to take an initial look at the data.
I don't think I've ever done an interesting study where the data "came out" the first time I looked at it. The interesting stories come from seeing when things like the 1/2 price buffet -- works and when it doesn't.
I— Ok. Data "not coming out" at first is not a thing. Fiddling with data until it "comes out" is the definition of p-hacking. He's just admitted that he's in it for the interesting stories, not the science, and that he's done this in every single study he's run.
I would like you to really dig into this to find a number of situations or people for which this relationship does hold -- that is where the 1/2 price buffet did result in a difference.
Method #1 of p-hacking: Pick and choose which conditions you look at.
In most non-medical studies, a result is considered "statistically significant" if it would happen less than 5% of the time by chance alone; that is, p < 0.05. Now, 5% is a 1 in 20 chance. If you test 20 different conditions, on average about one of them will appear significant at p < 0.05 by pure chance. This xkcd explains it wonderfully: if you look for a link between jellybeans and acne and don't find one, but then break it down into the 20 different colors of jellybean and find p < 0.05 for a single color, of course that doesn't mean green jellybeans cause acne. Wansink is explicitly telling his student to "find a number of situations or people for which this relationship does hold"— aka, find the handful of jellybean colors you can pretend have a significant correlation with acne.
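To put a number on the jellybean scenario: with a 5% false-positive rate per test, the chance that at least one of 20 independent tests on pure noise comes up "significant" is about 64%. A quick sketch of the arithmetic (my illustration, not from the article):

```python
# Chance that at least one of k independent null tests at threshold
# alpha comes up "significant" by luck alone: 1 - (1 - alpha)^k.
def prob_at_least_one_false_positive(k, alpha=0.05):
    return 1 - (1 - alpha) ** k

print(f"1 test:   {prob_at_least_one_false_positive(1):.0%}")   # 5%
print(f"20 tests: {prob_at_least_one_false_positive(20):.0%}")  # ~64%
```

So run twenty subgroups and you're more likely than not to "find" something, even when there's nothing there.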
Here's some things to do.
First, look to see if there are weird outliers (in terms of how much they ate). If there seems to be a reason they are different, pull them out but specially note why you did so, so that this can be described in the method.
Method #2 of p-hacking: Selectively remove outliers.
Removing outliers is, in itself, a normal and legitimate part of data analysis. But outliers should be removed according to a criterion you decide on in advance and then apply equally across all the data. Given that he's telling her to pull out outliers until the data "comes out," that's not what's happening here. If you set your outlier cutoff justttt high enough to catch the people swinging your result away from your hypothesis, but not the ones pulling it the other direction, that's p-hacking.
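A toy simulation of why this works so well (mine, not from the article — the group names and numbers are made up): draw two groups from the same distribution, so there is no real effect, then drop only the "outliers" that oppose the result you want, and watch the gap between the groups open up.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def perm_test_p(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:len(a)]) - mean(pooled[len(a):])) >= observed:
            hits += 1
    return hits / n_perm

# Two groups drawn from the SAME distribution -- no real effect exists.
rng = random.Random(42)
full_price = [rng.gauss(5, 2) for _ in range(30)]   # "slices eaten"
half_price = [rng.gauss(5, 2) for _ in range(30)]

p_honest = perm_test_p(full_price, half_price)

# Selective "outlier" removal: hoping to show half-price diners eat more,
# drop the 3 heaviest eaters from full-price and the 3 lightest from
# half-price. The gap between the group means can only grow.
full_trimmed = sorted(full_price)[:-3]
half_trimmed = sorted(half_price)[3:]
p_hacked = perm_test_p(full_trimmed, half_trimmed)

print(f"p before trimming: {p_honest:.3f}")
print(f"p after trimming:  {p_hacked:.3f}")
```

Run it with a few different seeds: the trimmed p-value comes out far smaller than the honest one again and again, even though the "effect" is pure noise.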
Second, think of all the different ways you can cut the data and analyze subsets of it to see when this relationship holds. For instance, if it works on men but not women, we have a moderator. Here are some groups you'll want to break out separately:
Males
Females
Lunch goers
Dinner goers
People sitting alone
People eating with groups of 2
People eating in groups of 2+
People who order alcohol
People who order soft drinks
People who sit close to buffet
People who sit far away
and so on…
Remember that whole "break your data up into 20 different conditions so that one of them will look significant by chance" thing? Yup, that was the how-to manual.
Third, look at a bunch of different DVs. These might include
# pieces of pizza
# trips
Fill level of plate
Did they get dessert
Did they order a drink
and so on…
Method #3 of p-hacking: Pick and choose your dependent variables.
Dependent variables (DVs) are the variables you're measuring, to see whether or not they are affected by your independent variables. This is basically the same thing as method #1— method #1 was just about picking and choosing independent variables and/or moderators, and this one is about picking and choosing dependent variables.
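And it's just as easy to demonstrate: measure two identical groups on a pile of unrelated DVs and report only the smallest p-value. A sketch under my own assumptions — the DV names are invented, and the normal (z) approximation to the two-sample test is crude but fine for illustration:

```python
import math
import random

def approx_two_sided_p(a, b):
    """Welch-style z approximation to a two-sample test (illustrative)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    z = (ma - mb) / math.sqrt(va / len(a) + vb / len(b))
    # two-sided p from the normal CDF
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

rng = random.Random(7)

# Hypothetical DVs, all pure noise: the groups differ on nothing.
dvs = ["pizza slices", "buffet trips", "plate fill", "dessert",
       "drinks", "tip size", "time at table", "napkins used"]
n = 40
p_values = {dv: approx_two_sided_p([rng.gauss(0, 1) for _ in range(n)],
                                   [rng.gauss(0, 1) for _ in range(n)])
            for dv in dvs}

best_dv = min(p_values, key=p_values.get)
print(f"cherry-picked 'finding': {best_dv}, p = {p_values[best_dv]:.3f}")
```

With just 8 DVs the expected smallest p is already around 0.11; multiply that by all the subgroups in the list above and something under 0.05 falls out almost every time.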
This is really important to try and find as many things here as possible before you come. First, it will make a good impression on people and helps you stand out a bit. Second, it would be the highest likelihood of you getting something publishable out of your visit.
Translation: Do this if you want the job. Do this if you want to get published. I hold your career in my hands and if I tell you to commit academic fraud, you will.
Work hard, squeeze some blood out of this rock, and we'll see you soon.
Best,
Brian
Translation: I am fully aware that these data show nothing. I am openly acknowledging that this is like squeezing blood from a stone, an expression generally used to mean something is impossible. Because I know it is impossible to do this without being completely fraudulent in how I represent these data. I am a shameless liar.
These days, there's more oversight and awareness of p-hacking— many reputable journals now encourage or require you to pre-register your hypotheses and analysis plan, so someone probably couldn't get away with this today. But pre-registration as a norm is still very new (as in, within the last decade), and I expect we'll be seeing older studies exposed for shoddy science for quite a while.