Evidence of the future affecting the present: discussions on what constitutes scientific evidence

11/2010

Here’s a very cool study: wackily deciding to reverse the normal order of some standard psychology experiments, a researcher has found evidence, across eight experiments, that future actions or events can influence present behavior.

For example, people were shown a list of words, then tested on their recall of those words, and then asked to type words randomly chosen from the list. The words the subjects later typed were more likely to have been remembered on the earlier test (not dramatically, but at a statistically significant level).

In another experiment, people had to choose whether a picture was hidden behind the screen image of a left or right curtain. Sometimes the pictures were erotic and sometimes they were not (the nonerotic ones could be neutral, negative, positive, or romantic). While the correct curtain was chosen for each category of nonerotic picture at the expected chance level (50%), the correct curtain was chosen for the erotic pictures at a level significantly above chance (53.1%, statistically significant at a probability level of .01, i.e., 1%).

Interestingly, in initial trials, this result was only found for women; men rose to the same level only after the erotic pictures were changed to more arousing ones. Moreover, when the participants were assessed for sensation seeking (a measure of one’s tendency to seek out stimulation), those who scored higher on this scale guessed the future position of the picture correctly on 57.6% of the erotic trials. Their scores on nonerotic trials did not exceed chance level (49.9%).
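To get a feel for what that 53.1% means, here’s a minimal sketch in Python. The trial count is purely my assumption for illustration (the paper pools trials across many participants, and I haven’t reproduced its analysis); the point is simply that a small edge over 50% needs a lot of trials before it clears a .01 bar.

```python
from scipy.stats import binomtest

# Illustrative only: assume 1,500 two-choice trials in total (the real
# study pooled trials across many participants; exact counts may differ).
n_trials = 1500
hits = 797  # a 53.1% hit rate over 1,500 trials

# One-sided binomial test against the 50% chance baseline.
result = binomtest(hits, n_trials, p=0.5, alternative="greater")
print(f"{hits}/{n_trials} hits, p = {result.pvalue:.4f}")  # roughly .008
```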

Clearly, much of this hinges on how the words/picture locations were randomly chosen, but the researcher goes into the randomization procedure in some detail, and the report of this study (which is being published in a very reputable journal) assures us that it has been peer-reviewed up the wazoo. No one can find anything wrong with the procedure (although they all believe there must be something).

So is precognition ‘true’? That is, can we now authoritatively say psychologists have proven that precognition exists? Assuredly not, which leads me to a discussion I’ve been meaning to have for years. What constitutes proof? How much weight can we put on research results?

I’ve been reporting on memory research for ten years, and this issue has always been at the back of my mind. Do my readers understand these questions? Do they have the background and training to give the proper amount of weight to these particular research findings? I put in hints and code words (“pilot study”; “this study confirms”; “adds to the evidence”; “conclusive”; and so on), but are these enough?

So here is the article I’ve always meant to write.

First of all: proof. I never talk about proof. “Proof”, in the colloquial sense of absolute certainty, is not something scientists are ever comfortable claiming. All we can do is weigh the evidence.

Now, weighing the evidence is what it’s all about, and this has become progressively harder in all scientific fields as we delve into the detail. Like quantum physics, like genetics, like medicine, modern psychology is usually about statistical inference. Rarely is everything so clear-cut that we can point to one group of people who all did something, another group of identical people who all did something completely different, and a single point of difference between them that we can pounce on with joy and say: this is it, the smoking gun. This is what causes the effect.

People are variable. Rarely do you have an experimental intervention that is so dramatic that it has an absolutely clear effect that doesn’t need abstruse statistics to reveal. And the statistics have become progressively more abstruse. Today there are so many different, and complex, tests, each one appropriate for a specific situation, that no one knows them all. Scientists learn the few they are told are appropriate for the sort of experiments they run, and then try to keep up when they are told a new test is better — more discerning, more subtle, better able to sort the wheat from the chaff. Is it true? At the end of the day it’s a matter of faith; few researchers have the statistical background to really understand the statistics they’re using.

So that tempers how much faith we can put in statistical results.

But the main point is simply understanding that it is a matter of statistics. Research is all about significance: is this result showing a significant difference, or not? And significance is a statistical term with a very precise meaning. It means a statistical test has been passed; that, as a matter of probability (5% is standard; 1% is great; 0.1% is absolutely terrific), the experimental result is unlikely to have occurred by chance. That is, in the case of the standard 5%, a difference as large as the one observed between the experimental groups would be expected to arise by random chance only one time in twenty.

In other words: it could have occurred by chance.
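If that sounds abstract, a quick simulation makes it concrete. This sketch isn’t modeled on any particular study: it just runs thousands of experiments in which there is no real effect at all, and counts how often a standard test nevertheless declares the result significant at the 5% level.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
n_experiments = 10_000
false_positives = 0

for _ in range(n_experiments):
    # Two groups drawn from the same population: any "difference" is pure noise.
    group_a = rng.normal(0, 1, 30)
    group_b = rng.normal(0, 1, 30)
    if ttest_ind(group_a, group_b).pvalue < 0.05:
        false_positives += 1

# By construction there is nothing to find, yet about 1 in 20 experiments
# passes the 5% significance bar anyway.
print(f"{false_positives / n_experiments:.1%} significant by chance alone")
```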

That is why replication is so important.

When I report on a research study, I do so on the basis that it is interesting. That it is part of a body of research, or that it may become part of a body of research.

On its own, no experimental result is proof of anything.

So the important thing is building up experiments, preferably by different researchers, on the same question. We want replication, which is repeating the experiment in exactly the same way, and we want variations, both broad and fine, in the experimental procedure. And we want different approaches that connect the results to a broader picture.
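A back-of-the-envelope way to see why replication matters so much: if a result is a pure fluke at the 5% level, the chance of the same fluke surviving each additional independent exact replication shrinks geometrically. (An idealized figure, of course; it assumes the replications are truly independent and ignores messiness like publication bias.)

```python
alpha = 0.05  # the conventional significance threshold

# If a finding is a pure fluke, the chance it also clears the bar in each
# independent exact replication multiplies: 0.05, then 0.0025, and so on.
for replications in range(4):
    print(f"{replications} replication(s): {alpha ** (replications + 1):g}")
```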

It’s all about consistency.

Conspiracy theorists can rant against the scientific establishment, and claim that it ignores findings that don’t fit into the established beliefs, but the issue is rather more subtle. No one is more excited than a scientist by a truly new finding, but the less consistent a finding is with all the other evidence, the stronger the evidence for it must be.

Repeat after me: no single study is proof. Ever. Of anything.

Because scientists make mistakes. Because scientists are human and to be human is to see the world through our minds, not our eyes. Because physical objects (e.g., cellular material) can become contaminated; because human subjects are influenced by far too many factors to list, including the experimenter’s beliefs. And because, at the end of the day, results are a matter of statistical probability.

So, we have to weigh the evidence. We weigh it on the basis of the number of subjects (was it a pilot study, a large study, or a very large study? The greater the number of experimental subjects, the less likely it is that the difference occurred by chance), on the basis of the type of study (e.g., was it an experimental intervention or a population-based epidemiological study), on the basis of how well the experimenters designed the study, and on the basis of the statistical significance (is the probability that this result occurred by chance five in a hundred, or one in a thousand, or one in ten thousand?).
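To make the sample-size point concrete, here’s the earlier binomial sketch again, now sweeping across sample sizes I’ve invented for illustration: the very same observed 53.1% hit rate is nowhere near significant with 100 trials, and highly significant with 5,000.

```python
from scipy.stats import binomtest

# The identical observed effect (53.1% hits against a 50% baseline) at
# different, purely illustrative, sample sizes.
for n in (100, 500, 1500, 5000):
    hits = round(0.531 * n)
    p = binomtest(hits, n, 0.5, alternative="greater").pvalue
    print(f"n = {n:>4}: {hits} hits, p = {p:.2g}")
```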

And, most of all, we weigh it on the basis of how many studies are all saying the same thing, and how well different results are all chiming in to tell a consistent story. So, if we’re wondering if blueberries really are good for the brain, we look at animal studies and human studies and cell studies. Human studies are important because, at the end of the day, we have to confirm these findings in our own species. But we can’t control all the variables with humans as we can with captive animals, so animal studies are needed to construct the procedurally tight experiments we need to truly compare the effects of, say, a daily dose of blueberries. And cell studies are important to tell us why blueberries might have this effect.

If we can point to a specific effect in the cells that could have the sort of cognitive effect we have observed, then we have a much stronger basis for believing in the effect.

We also weigh it in the knowledge that this is consistent with a much larger body of research looking at the effects of fruit and vegetables and their constituents.

As lay people trying to weigh the evidence (and given the extreme specialization needed now in all the sciences, everyone is a ‘lay person’ in most areas), we also need to realize that different standards are necessary for different results.

I’ve been eating blueberries (or boysenberries or blackberries) every day for years, since I saw the first reports that blueberries were good for the aging brain. Why not? I like them; they fit into my diet (I have them in a smoothie either for breakfast or lunch); they are very unlikely to do me any harm.

My standard for taking a drug would be WAY higher (which is why I heartily recommend a recent article in The Atlantic — “Lies, Damned Lies, and Medical Science”).

When deciding whether to act on research findings, you need to weigh the costs and benefits. You should also make different decisions depending on whether you are deciding for an individual or a group. Experimental results are always only pointers at an individual level. Group differences, I say again, are statistical. That means some individuals will react one way, and some another. No research result will tell you whether something is true for an individual (witness those people who smoke heavily for decades and live till 90; the odds, though, are heavily against you).
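A hypothetical illustration of that point: a group difference can be overwhelmingly significant while saying almost nothing about any one individual. All the numbers below are invented for the sketch.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)

# Invented numbers: a "treated" group scores 0.2 SD higher on average.
control = rng.normal(0.0, 1.0, 2000)
treated = rng.normal(0.2, 1.0, 2000)

# With 2,000 people per group, the group difference is wildly significant...
print(f"p = {ttest_ind(treated, control).pvalue:.2g}")

# ...yet a large share of treated individuals still score below the
# average of the control group.
print(f"{(treated < control.mean()).mean():.0%} scored below the control mean")
```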

So how can we now answer the question with which I began this article? Well, the precognition experiments are absolutely fascinating, and it would be wonderful if they panned out, but until the study is replicated, intriguing is all it is. And because the findings are so inconsistent with so much of our understanding, we need a number of studies, both exact replications (done by other researchers) and variants that test the conditions and boundaries of the putative phenomenon.

But we should also note that the study involved 1000 participants, eight different procedures, straightforward statistical analysis, and rigorous peer review. Moreover, the gender and personality differences substantially strengthen the case. And a possible mechanism (albeit one little understood) in the form of quantum entanglement has been postulated. It’s certainly earned the right to be taken seriously, and I look forward to seeing more studies on this (and kudos to the researcher, Daryl Bem, for doing everything he can to ensure that other researchers can easily replicate his methods).

I hope that’s put my news reports in context, and perhaps helped you judge research findings in the wider media.

References: 

The precognition paper is available as a preprint at http://www.dbem.ws/FeelingFuture.pdf
