What's wrong with Science?

This essay grew out of a decade of losing illusions about how natural science works in practice through my experiences as a physicist researching in the field of Ultrarelativistic Heavy-Ion Collisions (URHIC). My observations hence directly concern this field and I have much more limited insights into how e.g. particle physics or solid state physics would work in practice, and even less how things are done in science outside physics. However, I will state some observations pertinent to such generalizations later.

I can give names, dates and places for the events described in this essay; however, I will not do so, since I'm writing about people I know and respect and I have no wish to publicly put them on display - as I will argue, I largely don't see the problems as the personal fault of the people involved.

As far as conspiracy theories go, I would like to stress that by the measures usually used, I'm not a fringe scientist. I've received fellowships from the Marie Curie program of the European Union, the German Humboldt foundation and the Finnish Academy, given plenary talks at the major conferences in my field and written over a hundred well-cited papers. I'm writing this not as a crackpot opinion on how science should be and why it should value my pet theories higher but out of a genuine concern with the way I see things developing.

What's the point of science?

In the following, I will use 'science' as short for 'natural science' - I know too little about the humanities to really form an educated opinion. The point of doing science is then to find out about nature. To this end, the scientific method is applied.

It is surprisingly difficult to define in a comprehensive way what the scientific method actually is - in such attempts, often themes like repeatability of experiments, the ability to falsify hypotheses or consistency of explanations across many different phenomena are cited.

However, here we need not be concerned with the details of a definition satisfying a philosopher - the main aim of the scientific method has been succinctly summarized by Richard Feynman - it minimizes the chance that we fool ourselves while finding out about nature. What we in the end establish as results should be driven as much as possible by how nature actually is and as little as possible by our preconceptions, expectations or other biases on how nature should be.

The theme of this essay then is that I increasingly see that this is no longer true with the way science is nowadays done in practice. Rather than designed to not fool ourselves and learn about how nature is, science seems increasingly concerned with producing a never-ending string of ever more spectacular stories about how nature might be with the careful set of checks and balances designed to sort true from false slowly eroding.

In the following, I will try to show this from examples as well as suggest explanations as to the why.

Wanting to find out is unnatural

Let me start with the perhaps surprising observation that in my experience only a minority of scientists are genuinely interested in finding things out. This however is a rather natural tendency of the human mind: We much prefer to find out that we have been right than that we have been wrong. This is especially true for a theorist who has perhaps invested years of work into developing and testing a particular model - finding out that this model does not represent nature is first and foremost a painful process. Hence what most theorists actually want is to confirm that their model works, not to find out the truth whatever it turns out to be.

In an ideal world, the scientific community would compensate for this tendency by giving incentives for proving one's own models wrong - after all, we learn more from the fact that a model cannot be made to work even when trying hard than we learn from a model that is only ever shown where it works while the crucial tests are left undone. Scientists should be respected for disproving their own ideas in the face of data, and doing so should help them in their career and in attracting attention and funding.

The opposite is true in practice - models which do not describe data are quickly cast away as 'doesn't work' and the credit, funding and attention go largely to the remaining models. It hence takes an extraordinary amount of scientific integrity to not only critically probe one's own ideas till they break, but then to also make this public and risk the damage to one's career. Seeing a scientist admit in public that he was wrong and give the reasons why is hence a rare event, and I have witnessed it less than a handful of times.

This would not be so bad if the checks and balances worked, i.e. if other scientists were then interested in disproving ideas, if they, for instance, refused to continue to show results of a particular computation unless crucial checks were provided. In particular, the experimental community should be the natural enemy of theorists, striving to confront lofty theoretical modeling with the hard experimental reality.

Some actually do - and even then, it's an uphill struggle. It is fairly easy to compute idealized theoretical quantities; finding out how these relate to the actual (much more messy) experimental observables is less easy. Theorists hence have the option of an easy retreat - do a schematic calculation, if it works, fine, and if it doesn't, one can always claim that it would improve if it took into account the real experimental conditions. One never actually needs to do this computation however. Often then, theoretical modeling becomes a moving target which can't be pinned down against data.

What I found more disturbing however is that many experimentalists take a postmodern attitude to theoretical modeling instead. One can hear sentences like 'It's not an experimentalist's job to judge the validity of theoretical modeling'. The only standard of comparison between model and data then is visual - does the curve agree with the data? What is not asked are questions like: Is this a prediction or a postdiction (it's in general much easier to make a curve fit data points if one knows where the data points are beforehand)? Does the model describe only this observable, or is it consistent with other data? Does the model have an adjustable parameter or is it completely constrained beforehand?

In this scenario, physics arguments made to distinguish between models magically transform into opinions of different theorists - energy conservation is then no longer a hard physics requirement, but somehow just the pet peeve of a particular theorist. I've seen a drastic example during a Quark Matter (that's the largest conference in the field) plenary talk when the speaker took apart one model after the other for the so-called ridge correlation. For some he showed that the underlying assumptions led to untenable consequences otherwise, for others that they had problems with causality - all a series of clear, simple and compelling physics arguments.

I expected that this would be the end of all of these models. I was wrong. In the course of the next days, all arguments changed into the opinion of the speaker - people weren't ready to acknowledge that he had shown these models to be untenable, they were merely ready to recognize that this was his opinion. Some of the models were used years later (with the same flaws). I wonder what the students present at the conference learned about how science is done from this event.

I suspect the prime motivation for the experimentalist attitude is that it is simple, while delving into the fine print of theoretical modeling is not. Modern experiments are complicated, and it takes a lot of time and energy to understand just how a detector works in detail, which leaves little time to commit to understanding theoretical modeling. The situation is not better on the other side of the divide after all - precious few theorists can have an educated discussion about measurement techniques with experimental colleagues. Still - once the (rare) discussions between theorists about model validity are chalked up as incomprehensible, another set of checks and balances is gone. Add to that that we instinctively do not want to argue too harshly with colleagues and friends with whom we hang out at conferences, and you might see why it's tempting to accept different opinions rather than try hard to find out which opinion is justified and which is not.

The Citation Game

Citations - how often a paper is mentioned in other papers - are one of the major currencies of science. Citations supposedly prove scientific relevance: An author mentions another paper because he judges it relevant for what he is doing, hence counting the number of citations is expected to correlate with the relevance of a work as judged by the whole scientific community, a kind of swarm intelligence measure.

Except - that's not quite how it actually works. First, one needs to be aware that the author has limited control over the citations he makes in his paper - in particular, he often does not get the chance to not cite a paper because he feels it is not relevant. How so? Because the paper goes through a peer review process before being published (more on that later) during which the referee can request that relevant citations be added (the referee at this stage can also request that citations to his own papers be added - naturally he would think of them as relevant, so this isn't a bad thing per se). For an author, defending the position that a particular paper is not relevant is close to impossible - if he tried, the editor would inevitably side with the referee who judges it relevant, and usually the policy is that adding references is safer than removing them. But the outcome of this process is that a paper does not only cite work the authors consider relevant but also what the referee(s) and the editor consider relevant.

Second, equating a number of citations with scientific relevance assumes that papers are largely cited because they're particularly relevant. However, primarily in order to be cited, papers have to be known, and what is known is not always what is relevant or correct. On any day, easily three to five potentially interesting publications arrive at the preprint server. Even a cursory read through all of them would cost so much time that a scientist could no longer do research of his own - hence we do not know all papers which are published.

Which ones do we know then? Chiefly what is presented at the conferences and workshops we go to, then what colleagues mention, then what we track ourselves - in other words, predominantly we know the work of people who are known in the community, get invitations to meetings and are part of our network. The same idea published by a well-known senior scientist would hence gather many more citations than if published by a student from a little-known university, for the simple reason that everyone would know the first researcher, but hardly anyone the student, who moreover lacks the travel funds to go everywhere. It is hence not true that citations only judge the relevance of an idea - they also measure how well known the author already is.

Third, people may actually cite works for other reasons than assigning relevance. Suppose A makes a bold (wrong) claim that everyone who uses technique X is wrong. If A is sufficiently well-known, everyone using X is forced to react - so in every subsequent publication they have to cite A only to refute the claim being made. Yet, a hundred such refutations and statements that A got it wrong still count for a hundred citations - while it is difficult to see how a wrong claim which forced countless others to refute it really advanced science in any way.

Thus, an efficient way to garner a large number of citations is to spot the precise moment when a new idea emerges in the field or a puzzling observation is made and to be the first to claim anything about it - it doesn't matter whether the claim is right or wrong, it just needs to be made fast - once other scientists become interested, they always have to cite the first claim - even if they judge it irrelevant - the referee process will see to this. If the turf can be taken quickly enough, success is near assured. A striking example of such turfing can be found in particle physics: when the OPERA experiment seemed to measure superluminal neutrinos, dozens of theoretical preprints were submitted during the first 24 hours of the report. Surely doing thorough science within such a small time period is difficult, but anyone publishing later had to cite the first-comers. Personally I doubt whether counting citations in such instances measures relevance at all.

Somewhat ironically, people who happen to have highly relevant ideas do not get all that many citations for their trouble - once an idea is so valuable that it ends up in textbooks, it becomes textbook knowledge which according to the standard conventions does not have to be cited.

The Practice of Peer Review

One other currency measuring the performance of a researcher is the count of publications in peer-reviewed journals. Peer review is what distinguishes science from conspiracy theories and unappreciated revolutionary theories on the internet - an idea must convince one or more other scientists (the reviewers) and the editor of a journal (who usually is a scientist as well) - otherwise it won't be published. Thus, in theory any peer-reviewed work that is published contains science that has been okayed by a number of other scientists already. The number of peer-reviewed articles, as opposed to, say, the number of letters to the editor of the local newspaper, is hence considered a measure of actual scientific productivity rather than just talkativeness.

Let's leave the question of whether the number of articles is a good proxy for the content of the articles aside for later and focus on the review process itself (which is similarly used for allocating research grants, more on that later as well).

In the peer review process, the editor sends a manuscript to one or more reviewers, asking for recommendations on the manuscript which range from 'accept' through 'accept provided a number of corrections are made' to 'do not publish'. The author of the manuscript is known to the reviewer, but the identity of the reviewer is hidden from the author (depending on how large a field actually is, one can sometimes guess though).

One important characteristic in this is that the reviewer is not accountable. If he makes an error in verifying the conclusions of the paper, if he gives a recommendation without reading half the paper, if he criticizes points which have never been claimed in the manuscript - none of this will tarnish his reputation (as far as I know, the review process in Mathematics is not anonymous, and here the reviewer will share the success or failure of a manuscript if he makes an error). The worst that can happen is that the author asks for a new reviewer, arguing to the editor why the present review doesn't make any sense.

That however is difficult. The general idea is that the author has to deal with criticism, not request a different reviewer in case of a problem - the reviewer must be almost obviously unqualified for a plea to the editor to succeed. That in turn means that there is a relatively wide corridor which a malevolent reviewer can exploit to delay or stall the publication of a competing group; likewise he can use the recommendation to ask for some of his own work to be cited (which may, as explained above, not necessarily be malicious).

Things become even more murky when the reviewer is asked to judge how interesting the manuscript is for a general audience (this happens in the less specific journals which are supposed to cater for a general-physics or even general-science audience). While scientific correctness is a category which can in principle be settled with enough math or data, the question whether something is interesting for a larger audience is highly subjective. A reviewer doesn't even have to make an effort to recommend against publishing a manuscript in such journals - he can simply claim it would not be interesting or relevant, and that's more or less it: it is basically impossible to appeal against such a judgement.

The opposite case is just as likely - a reviewer might decide to let a paper pass which does not meet the usual quality standards. Sometimes, e.g. because he knows the author personally, this might happen as he is consciously or unconsciously more lenient towards a friend. But in general, a sound motivation to let papers pass without much quality control is that it's far less work to do so than to recommend not to publish.

Reviewers aren't paid any money for their efforts, nor do they get much public credit. There's limited recognition in terms of 'outstanding reviewer awards' etc. by journals, but such prizes go to few people indeed. There's also no accountability - hence doing a good review is something a scientist does with time taken from other obligations without much compensation. Letting a paper pass is easy, it just takes a few friendly lines. Arguing that a paper should not be published (unless this is a judgement of how interesting a paper is, see above) will inevitably incur a response by the authors, leading to an argument which the editor needs to settle - solid evidence needs to be assembled, a good case written, and a few iterations of a changed manuscript coming back may take place. The workload is easily ten times more for rejecting a paper in these cases, thus when pressed for time, the temptation to simply let it pass is there.

It is finally worth mentioning that the reviewer for a journal gives a recommendation but that the editor makes the judgement based on the recommendation. An editor might decide to publish an article against the objection of a reviewer - in this case, a work that technically failed the peer review process may appear in a peer-reviewed journal without many people being aware of this.

A prime example of this are conference proceedings. Sometimes these appear in special issues of peer-reviewed journals with the conference organizers as guest editors (sometimes they also appear elsewhere). The proceedings manuscripts are then sent to other conference participants for review.

What I did not know as a young postdoc is that one is not actually supposed to review them thoroughly. I wrote a half-page rejection, arguing that the model was physically unsound - and the next thing I heard of the whole affair was a polite letter informing me of the decision of the editor to accept the manuscript. Of course, the author actually said the things in the manuscript in his talk, so for the proceedings to reflect the conference accurately it was necessary to include this, and everyone in the field knows that these are proceedings with a different review standard. Yet - for presentations for funding agencies, in CVs etc. such 'reviewed' proceedings often count as peer-reviewed publications, and the journals certainly send mails around upholding their high quality standards in public. Which makes me suspect that there are at least some people who are not fully aware of how 'peer review' in this context is really meant.

Prestigious and other journals

There is the notion in the scientific community that there are more and less prestigious journals for publications. At the top of the hierarchy are publications like 'Nature' and 'Science'. 'Physical Review Letters' is considered better than the topical 'Physical Review' (mostly C in my field), which is about on par with European journals like 'Nuclear Physics A', and from there it descends to lesser-known publications. Thus, for reputation (which usually translates into funding), publishing in 'Science' contributes much more than publishing in, say, 'Journal of Physics G'.

Now, to take the journal something is published in as proxy for the scientific value assumes that the standards during the review process correlate such that in highly prestigious journals, only work is published that is not only correct but also of the highest quality and of general interest, and hence most likely to make an impact.

Unfortunately, to gauge the impact, usually citation measures are used (which favour turfing, see above). One might also question in general whether the value of an article should not be judged by its content in the first place. As for the review process, it is probably true that some of the lesser-known journals have a very lenient attitude - simply to find enough authors. But at the level of leading topical journals, say Physical Review C, there are certainly many more authors than are published, and the review requests for an article go to the leading experts of the field, i.e. it is impossible to find more qualified reviewers. Where, then, does the additional prestige beyond that come from?

Supposedly, it comes from the additional requirement that results have to be not only correct but also of broad interest.

Now, here is the catch with this idea: While editors of special interest journals know quite well who is qualified to peer review a manuscript, that is less true for editors of general physics or general science publications, i.e. articles have an increasing tendency to not end up with reviewers intimately familiar with the topic and more with reviewers who judge based on a general impression and what they find interesting and appealing. Combine that with a stringent page limit for 'Letter' type media, and it becomes apparent that here is a mechanism which prefers the catchy story with a schematic model over a detailed systematic study of which the reviewer might find the details hard to understand.

Case in point: As far as I can judge for URHIC theory, the vast majority of theory publications appearing in 'Physical Review Letters' have been schematic models quickly illustrating an interesting idea - and most of them have either been disproven or found the predicted effect size dramatically reduced by detailed follow-up investigations which are then published in other journals.

A reader of 'Physical Review Letters' from a different field, i.e. the 'general physics audience' hence gets a completely wrong impression of what actually happens in URHIC physics - he gets to see the field as a parade of schematic ideas contradicting each other over time and never sees the results of the actual in-depth investigations.

There is a single URHIC theory paper known to me that has made it into 'Science' - and I have yet to find the colleague who takes it seriously.

I simply can't see in practice that the ideas which turn out to be actually valuable in hindsight and the results which last would appear in journals which are considered prestigious - in my estimation, the opposite seems to be true.

Created by Thorsten Renk 2015 - see the disclaimer and contact information.