OK, so you might think that
in this age of air travel, channel tunnels and the like, this is not the
cleverest of commercial developments, since there already exists a well-proven
non-invasive solution to the problem of sea-sickness: I mean, of course, not
travelling by sea. However, some of us, in this age of global warming, are
preparing for the era of rising sea-levels and, having noticed that the typical
altitude of the world's airports is pretty pathetic, are drawing our own
conclusions. And just imagine what will happen to the Chunnel once the
sea-level reaches the height of Shakespeare Cliffs. We are preparing for the
day when the capital of
But along with perspicacity, foresight and self-effacing modesty, I count philanthropy as one of my many virtues and so I am prepared to give the world details not only of my cure for sea-sickness but also of my method of proving that it works.
The cure is salt and water. The proof of the efficacy will follow in due course but first note the cunning convenience of the solution (H2O, Na+, Cl-). The cure is none other than sea-water itself and thus the more that global warming leads to rising sea-levels and hence to the spread of sea-travel with the attendant increase in the potential for that human misery known as mal-de-mer, the more the prophylactic material will be to hand. It is true that the general increase in sea-level will be due to the melting of the Antarctic and Greenland ice-caps and thus an increase in sea-level will be accompanied by a general dilution of the active ingredient (sodium chloride) but the rough calculations that I have performed have shown that this effect may be ignored as unimportant.
The efficacy of
saltwater as a cure for seasickness has been established in a fully randomised
double blind parallel group trial in a single centre (Elixir Laboratories of
Pannostrum Pharmaceuticals) using cups of tea (English Breakfast at a strength
of three bags per pot) as a control. A cross-over trial from
I am pleased to announce that the results were overwhelmingly in favour of saltwater, a highly significant difference being found.
Carping critics (and I note in passing that the carp is a freshwater fish) have complained that the result is spurious and entirely due to the use of a baseline taken after treatment. Indeed, they claim that the "benefit" due to sea-water arises simply because using differences from "baseline" reflects in inverted form the emetic effect of saltwater as captured by the baselines rather than its protective effect as expressed in the outcomes.
If this is the sole criticism these small-fry can produce then, as a dab-hand at dealing with gripes and obloquy, I leave them to flounder in the sea of their own perverted logic.
It is true that the baseline was taken just before provocation and some time after treatment. It is true that at this point there was a considerable difference between the two groups. It is true that the saltwater group had much higher NAUSE scores than the tea group. Many of them were indeed puking their guts out before the provocation. But I have two unanswerable replies to the criticism. First, you can always correct baseline imbalance by subtracting the baselines. Everyone knows this. Secondly, if you look at provocation trials in general you will see that this approach is frequently used: with glucose provocations in diabetes, exercise tests in angina, or methacholine, histamine or allergen challenges in asthma. In such trials the outcome is invariably referred to (or else titrated with reference to) a "baseline" taken before provocation but after treatment. Percentage drop from "baseline" in FEV1 is the standard measure in provocation trials in asthma, for example. Indeed, I am not even the first to use this general approach in a trial of treatments to prevent nausea.
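The critics' complaint can be illustrated with a small sketch. The NAUSE scores below are entirely hypothetical (the trial's actual figures are not given here); they simply assume that saltwater makes subjects sicker before provocation and leaves them slightly sicker afterwards.

```python
# Hypothetical NAUSE scores (higher = sicker). These numbers are invented
# for illustration and are not taken from the trial.
scores = {
    "saltwater": {"baseline": 8.0, "outcome": 9.0},
    "tea": {"baseline": 2.0, "outcome": 7.0},
}


def change_from_baseline(group):
    """Outcome minus the post-treatment 'baseline' for one group."""
    return scores[group]["outcome"] - scores[group]["baseline"]


def outcome_difference():
    """Saltwater minus tea on the raw outcome (positive = saltwater worse)."""
    return scores["saltwater"]["outcome"] - scores["tea"]["outcome"]


def change_difference():
    """Saltwater minus tea on change from 'baseline' (negative = saltwater 'better')."""
    return change_from_baseline("saltwater") - change_from_baseline("tea")


print(outcome_difference())  # 2.0: saltwater is worse on the outcome itself
print(change_difference())   # -4.0: yet it "wins" on change from baseline
```

With these made-up figures the sign of the comparison flips purely because the emetic effect has been absorbed into the "baseline" and then subtracted.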
Clearly we cannot change an approach that is already a standard because then our results would no longer be comparable to those of others. After all, it is more important for physicians to use procedures that are consistent with those used by other physicians than to use procedures that correctly measure effects. Consistency takes precedence over veracity.
In short, any statistical technique that has won the hearts of so many physicians can't be wrong.
I am a physician who has worked happily in the pharmaceutical industry for many years developing (well, to be honest, trying to develop) drugs for hypertension. Now my life is being made a misery by the statisticians with whom I work. They keep on giving me conflicting advice.
There are three in
particular with whom I deal regularly. Last year I consulted them on the best
way to summarise the change from baseline results (baseline minus current) in
blood pressure for a series of open trials which we had planned for various potential
products. The first statistician I consulted,
All this conflicting advice left me very confused. I then tried out various plans. First I tried a two-stage plan, plan B. (I haven't spent all these years in drug development. I know by now that the other plan is usually best.) Plan B was as follows. First I established the value that Norman, Nick and Robin recommended. Secondly I used the mode of the three. After a while, however, I noticed that I always ended up using Robin's estimate. This seemed rather unsatisfactory so I consulted the three again. Without telling them the reason for my asking, I told them that I had three possible estimates of a treatment effect and wanted to combine them. What should I do?
Robin told me that I should
always take the highest of the three estimates, whereas Nick said that I should
always take the median and
I now resolved on Plan A, a three-stage procedure. Stage 1: calculate the three statistics that each originally recommended. Stage 2: combine these three using the three rules suggested by Norman, Nick and Robin. (I now noticed that by the end of stage 2 Robin's rule always produced the same answer as Nick's and although Norman's second stage answer hardly ever agreed with Robin's and Nick's, his first stage answer often agreed with their second stage result.) Stage 3: I then agreed to use the answer that Robin's method and Nick's approach agreed on.
I then came upon an article which suggested that it was always important to establish the asymptotic properties of any statistic and so decided to implement an infinite stage estimation procedure (Plan C) using all three rules. Much to my relief I found that as the iterations increased, Norman's estimate converged on Nick's and Robin's and indeed, this limit was simply the first-stage (and every subsequent stage) answer of Robin. Thus, having investigated the matter thoroughly, I carried on with a clear conscience always using whichever of mean and median turned out higher.
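The convergence described in Plan C can be checked with a short sketch. It rests on two assumptions that the text only implies: that at the first stage Norman recommends the mean, Nick the median and Robin the larger of the two, and that Norman's rule for combining three estimates is again the mean. The blood-pressure changes are hypothetical.

```python
from statistics import mean, median


def first_stage(data):
    """Assumed first-stage advice: mean (Norman), median (Nick), max of both (Robin)."""
    m, d = mean(data), median(data)
    return {"Norman": m, "Nick": d, "Robin": max(m, d)}


def combine(estimates):
    """Assumed combining rules: mean (Norman), median (Nick), highest (Robin)."""
    values = list(estimates.values())
    return {"Norman": mean(values), "Nick": median(values), "Robin": max(values)}


def iterate(data, stages):
    """Return the three estimates at each stage, starting from stage 1."""
    current = first_stage(data)
    history = [current]
    for _ in range(stages - 1):
        current = combine(current)
        history.append(current)
    return history


changes = [12, 8, 3, 10, 2]  # hypothetical changes from baseline
history = iterate(changes, 40)
for stage, est in enumerate(history[:4], start=1):
    print(stage, est)
```

Under these assumptions two of the three inputs to every combining step equal Robin's first-stage value, so Nick's median and Robin's maximum stay fixed there while Norman's mean closes a third of the remaining gap at each stage.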
Unfortunately, some months
later I happened to mention what I was now doing to Norman and Nick. They were
both horrified. They said that although they hardly ever agreed with each
other, they both agreed that what Robin was doing was clearly biased and quite
unacceptable. They advised me that I must make sure that I NEVER used Robin's
statistic. This left me with no choice but to fall back on plan Z. I now
Can you help?
What a cock and bull story. I don't believe a word of it. I don't even believe that you are a physician (although it is true that you appear to like open trials and change from baseline). You are clearly one of these damn Bayesian troublemakers and have nothing better to do than to embarrass ageing frequentists by asking awkward questions. And, by the way, when did you ever find two physicians who agreed on a diagnosis?
In my time I have been accused (even by myself) of being a frequentist but the accusation is the result of a misunderstanding. People have assumed that they can identify what I am for on the basis of what I am against. In fact, as a matter of principle, I am more or less against everything, so although I am a non-Bayesian, I am also a non-frequentist. The practicalities of life, however, force me to sail under various flags of convenience from time to time but to ask, "what is McPearson's philosophy of statistics?" is to make the same mistake as to ask, "what was the religion of the Vicar of Bray?". The answer to the latter question is, of course, that although he professed many, in truth he owned none. (The aforementioned Vicar of Bray is not to be confused with that minister of Tunbridge Wells, Thomas Bayes, a man who, as far as I can tell, was true to an old faith and quite unaware that he was founding a new religion.)
I had assumed until recently, however, that I would never actually be required to do any Bayesian statistics and could continue to earn my living producing frequentist fables. I was aware, of course, of the De Finetti-Lindley limit ("we shall all be Bayesians by the year 2020" Theory of Probability 1 p ix) but had assumed that, since I shall (in all probability) either be retired or dead by then, it was of no practical consequence for yours truly. Recently, however, I was idly looking through my copy of De Finetti when I came across the following: "A probabilistic explanation of the diffusion of heat must take into account the fact that heat could accidentally move from a cold body to a warmer one... water being frozen rather than boiled when put on the stove." (ibid p214)
I suddenly realised, with a jolt, that De Finetti (or Lindley for him) had implicitly put a probability of 1 on the 2020 prediction despite, in principle, believing that anything was possible (even heating water to make ice). Now since the probability of the remaining unconverted statisticians in the world (even if there is only one) being converted exactly at midnight on 31 December, 2019 must be infinitesimal, the only coherent conclusion possible is that De Finetti believed almost surely that the conversion would be completed before 2020. Perhaps his median prevision date was, in fact, very much earlier: in which case I might be in danger of being converted before I retire.
Worried by this threat to the McPearson philosophical inertia I started on one of those introspections of internal coherence so beloved of the modern Bayesian. I soon came to the disturbing conclusion that I myself was exhibiting symptoms of incipient Bayesianism: one of my legs was decidedly Bayesian and as for my posterior... This opened up the alarming prospect that at some stage in the future, when asked my advice on analysis, I might find myself suggesting a Bayesian approach with no prospect of actually being able to carry it out. (It is true that until recently this was exactly the position in which every Bayesian found himself but now I gather that an application of the long-run frequency properties of random numbers, which would have made De Finetti shudder, has via Gibbs sampling, solved all the problems.)
I immediately resolved, therefore, on a stringent and rigorous programme of education in practical Bayesianism: I would try an analysis of a simple problem and overcome in De Finetti's phrase (at least as translated by an English word-Smith) my, "reluctance to abandon the inveterate tendency of savages to objectivise and mythologise everything," (ibid p22). (Not to be confused with the, "inveterate tendency of Savage's to subjectivise and psychologise everything".)
In Guernsey Has a Go: Part II I shall probably tell you how I got on.
In part I, I told you my reasons for taking the momentous decision to undertake a Bayesian analysis. I decided to choose something simple and to try and previse, as the Bayesians would put it, the outcome of the 6th toss of a coin having tossed it 5 times.
Now I dimly remembered this sort of thing from my undergraduate days and, if I recall correctly, the trick was to assume a prior distribution for the probability, q, of obtaining a head and then to update this using Bayes' theorem. I am not quite sure how the Bayesian conjugates the verb to previse but in this case he does it with a beta.
However, I had been much
impressed by an article I had read in which a prominent Bayesian had taken
frequentists to task for carrying out all their analyses in a sort of
"Greek hinterland". (I quote from memory.) What with
Now having no reason to do otherwise, I decided to assign each of the 64 sequences a prior probability of 1/64 of occurring. Now, of course, You may think otherwise but that is Your business and not My concern. (I, as a Bayesian, have a tendency to capitalise pronouns but I don't care what You think. Strictly speaking, as a new convert to subjectivist philosophy, I don't even care whether you are a Bayesian. In fact it is a bit of a mystery as to why we Bayesians want to convert anybody. But then "We" is in any case a meaningless concept. There is only I and I don't care whether this digression has confused You.) I then set about acquiring some experience with the coin. Now as De Finetti (vol 1 p141) points out, "experience, since experience is nothing more than the acquisition of further information - acts always and only in the way we have just described: suppressing the alternatives that turn out to be no longer possible..." (His italics)
Now of the 64 sequences, 32 end in a head. Therefore, before tossing the coin my prevision of the 6th toss was 32/64. I tossed the coin once and it came up heads. I thus immediately suppressed 32 alternative sequences beginning with a tail (which clearly hadn't occurred) leaving 32 beginning with a head of which 16 ended with a head. Thus my prevision for the 6th toss was now 16/32. (Of course, for a single toss the number of heads can only be 0 or 1 but THINK: prevision is not prediction any more than perversion is predilection.) I then tossed the coin and it came up heads. This immediately eliminated 16 sequences, leaving 16 beginning with 2 heads, 8 of which ended in a head. My prevision of the 6th toss was thus 8/16. I carried on like this, obtaining a head on each of the next three goes and amending my prevision to 4/8, 2/4 and 1/2 which is where I then was after the 5th toss having obtained 5 heads in a row.
Now this was not very encouraging. I didn't seem to be learning anything and yet the Bayesian approach, as all WE Bayesians know, provides the perfect solution to every problem. I couldn't see where I had gone wrong. It is true that as I started thinking about the problem, from time to time, My thoughts led me down byways which seemed helpful but I soon perceived that these were heretical, involving, as they did, meaningless speculations about the propensities of different coins and, heaven forbid, Greek hinterlands. On the other hand, I think My behaviour can be shown to have been perfectly coherent and as We Bayesians know, that is all that matters.
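The suppression of alternatives can be reproduced by brute force. The sketch below enumerates all 64 equiprobable sequences of six tosses and conditions on a run of observed heads; the function name is purely illustrative.

```python
from fractions import Fraction
from itertools import product

# All 2**6 = 64 sequences of six tosses, each given prior probability 1/64.
SEQUENCES = list(product("HT", repeat=6))


def prevision_sixth_head(observed):
    """Condition on the observed opening tosses and return
    (prevision, surviving sequences ending in a head, surviving sequences)."""
    prefix = tuple(observed)
    surviving = [s for s in SEQUENCES if s[:len(prefix)] == prefix]
    heads = sum(1 for s in surviving if s[5] == "H")
    return Fraction(heads, len(surviving)), heads, len(surviving)


for k in range(6):
    prevision, heads, total = prevision_sixth_head("H" * k)
    print(f"after {k} heads: {heads}/{total} = {prevision}")
```

The counts run 32/64, 16/32, 8/16, 4/8, 2/4 and 1/2, so the prevision never moves from one half: the uniform assignment over sequences is equivalent to treating the tosses as independent with probability 1/2.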
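For contrast, here is a minimal sketch of the heretical byway: a beta prior on the propensity of the coin. The uniform Beta(1, 1) prior is an assumption chosen for illustration and is not specified in the text.

```python
from fractions import Fraction


def beta_predictive(heads, tails, a=1, b=1):
    """Probability that the next toss is a head under a Beta(a, b) prior
    on the head probability, after the given numbers of heads and tails."""
    return Fraction(a + heads, a + b + heads + tails)


print(beta_predictive(0, 0))  # 1/2 before any tosses
print(beta_predictive(5, 0))  # 6/7 after five heads in a row
```

Under that assumed prior the five heads do shift the prevision for the sixth toss, which is exactly the learning that the 1/64 assignment rules out.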
Next issue: How I went in
search of the birthplace of Alexander the Great and met the famous young lady
*There was a young lady of Thrace etc.
Shortly before the
nulliguernsian hiatus, so cruelly imposed by editorial policy upon the readers
of this journal, I described my first primitive attempts at applying Bayesian
analysis to a problem: that of tossing a coin. "What has
I was called in to help design a clinical trial by Dr Percy Vere, that well-known trialist. Of course I decided to start with the elicitation of priors and was astonished with the ease with which Dr Vere provided me with the necessary information. With benefit of hindsight I should have been highly suspicious but I innocently proceeded with the work. Some many months later, when the trial was over, I was recalled to analyse the data. This was, of course, simply a matter of using the likelihood and the prior to calculate a posterior and provide my client with the result. This I duly did. So far so simple; so far so obvious. It was here, however, that events took an unexpected turn.
"Very nice", said Dr Vere, "can you please now perform a meta-analysis using the data from my previous trial?" This flummoxed me, as I had mistakenly assumed that the trial I had worked on was the first trial in this area, but thinking quickly, I realised there was no particular problem. "That's unnecessary," I said, "because the prior with which you furnished me obviously took account of the results from the previous trial. Hence the posterior I have given you is the meta-analysis." This flummoxed him but I was not to be let off so lightly. "How can that be?," he replied, "Are you telling me that the prior with which I provided you was a valid summary of the results of the previous trial? If that were the case I would obviously be a natural statistical genius and wouldn't need you at all. I am surprised that you have the nerve to charge for your work. I can assure you, however, that what I gave you is a genuine prior and had nothing to do with the results from the previous trial."
Some of these remarks were rather surprising, if not downright peculiar, in particular the unjustified one about emolument, as my services were, in fact, being provided free of charge to Dr Vere, courtesy of my employers Pannostrum Pharmaceuticals. Nevertheless, at this point I began to appreciate that perhaps a rather fuller investigation of the problem was needed, as my client was not, in fact, coherent (as we Bayesians put it). I asked after the data from the previous trial and was informed that there was a statistical report available with an analysis by Professor Smith and his assistant. Now, I happen to know Smith and know him for a Bayesian. (No, he's not that Smith, nor that one, nor either of the other two.) It thus seemed to me highly likely that by picking up the report I should find the prior for the previous trial available, to which I then only needed to add the data from both trials. Alternatively, and perhaps even simpler, if the posterior were available, I could use that as my starting prior for the data from the latest trial. As it transpired, both were available and to my astonishment I discovered that the prior Dr Vere had given me was the same that he had given Smith. When I pointed this out to him he made unjustified sarcastic observations about statisticians expecting different answers to the same question. Waste not want not was his philosophy. He had assumed that what was good enough as a prior for Smith should be good enough for McPearson.
I realised that we had been at cross-purposes all the while but that the situation was rescuable. Adding the data from "my" trial to the under-Professor-Smith's-supervision-calculated posterior (to use a rather Teutonic construction) gave exactly the same result as adding the data from both trials to the "prior". We were home and dry... or so I thought.
Dr Vere was delighted. "Excellent," he said, "can you also please include now the first trial in this series of three? The statistician consulting that year was Professor Jones. I can provide you with his assistant's report." Now I realised that we were in deep trouble. I also know Jones and know him for a frequentist. (No, he's not that Jones, nor the other one.) It seemed to me highly unlikely that his report would contain a posterior, let alone a prior and so it turned out to be: data, descriptive statistics, any number of tedious laboratory shift tables, point estimates, confidence intervals and P-values (ugh!) but nary a posterior distribution in sight.
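The equivalence of the two routes is easy to verify in a conjugate setting. The sketch below assumes a beta prior with binary outcomes, which the text does not specify, and the prior parameters and trial counts are hypothetical.

```python
def update(prior, successes, failures):
    """Conjugate beta-binomial update: add the counts to the (a, b) parameters."""
    a, b = prior
    return (a + successes, b + failures)


prior = (2, 2)         # hypothetical Beta(2, 2) prior supplied by the client
smith_trial = (14, 6)  # hypothetical (successes, failures) in the earlier trial
my_trial = (9, 11)     # hypothetical counts in the later trial

# Route 1: take the earlier posterior as the prior for the later trial.
posterior_smith = update(prior, *smith_trial)
sequential = update(posterior_smith, *my_trial)

# Route 2: add the data from both trials to the original prior.
batch = update(prior,
               smith_trial[0] + my_trial[0],
               smith_trial[1] + my_trial[1])

print(sequential, batch)  # (25, 19) (25, 19)
```

The two routes agree because the update only accumulates counts, so the order and grouping of the data make no difference.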
Now I was faced with a real dilemma. I could take the data from the three trials and add them to the prior for the second. (It may be that Vere tried to ignore data from all trials when determining his prior.) The danger in doing this would be if the prior for the second were, in fact, a posterior to the first. In that case I would be counting data from the first trial twice, a clearly inadmissible procedure. On the other hand I could ignore the data from the first trial altogether. (After all, the first time that Vere provided a prior he may have tried to accurately express his beliefs but then subsequently acted under the erroneous belief that this prior would do for every problem.) If, however, these data were not reflected in his prior then I should be ignoring relevant information and thus violating the principle of total information: a very serious Bayesian crime.
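The first horn of the dilemma, double counting, can be sketched in the same hypothetical beta-binomial setting (again an assumption, with invented counts). If the supplied "prior" is really a posterior to the first trial, adding the first trial's data again produces a spuriously precise answer.

```python
def update(prior, successes, failures):
    """Conjugate beta-binomial update: add the counts to the (a, b) parameters."""
    a, b = prior
    return (a + successes, b + failures)


def beta_variance(a, b):
    """Variance of a Beta(a, b) distribution."""
    return a * b / ((a + b) ** 2 * (a + b + 1))


genuine_prior = (1, 1)  # hypothetical beliefs held before any trial
first_trial = (12, 8)   # hypothetical (successes, failures)

# Suppose the "prior" handed over already incorporates the first trial.
supplied_prior = update(genuine_prior, *first_trial)

correct = update(genuine_prior, *first_trial)          # first trial counted once
double_counted = update(supplied_prior, *first_trial)  # first trial counted twice

print(correct, beta_variance(*correct))
print(double_counted, beta_variance(*double_counted))
```

In this sketch the double-counted posterior has roughly half the variance of the correct one, although no new information has been added.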
It was then, in a flash of inspiration, that I found salvation. As any student of De Finetti will know, everything is soluble given a wide enough resort to the device of specifying priors. (Puzzled as to which model to use? Just introduce a meta-model with priors over the class of models.) Faced with an uncertainty about the possible prior beliefs of my client, all I needed to do was to introduce my beliefs regarding his beliefs into the model. It is true that a sort of hybrid creature arises, a chimeric* posterior (my best bet about what his best bet ought to be), but who cares.
Thus liberated, all I had to do was introduce a prior probability of 1 that Vere was a complete idiot. (It's my prior and I can do what I like with it; I might hesitate to make a similar remark about my posterior.) This freed me to use an uninformative prior for the whole thing (after all, what do I know about medicine?), calculate a frequentist confidence interval using all three trials, palm it off on my client as a Bayesian credible interval and retire to the Cock and Bull for a well-earned pint.
* Chimeric from chimera, an improbable creature with the prior of a lion, the likelihood of a goat and the posterior of a snake.
(See also Statistical Issues in Drug Development)
In my career as a medical statistician in drug development I never found anything quite as effective in winning disputes as the Finally Decisive Argument. For the benefit of readers of this journal, I illustrate its force with the example of that old chestnut, not to say canard or red herring (food for thought? appropriate for menu-driven drug development programmes?): type II and type III sums of squares.
The following is a Socratic dialogue between two statisticians, one of whom is of the McPearson school of statistics and one who is not.
Secundus: I see, Tertius,
that you have weighted all centres equally in your estimate of the treatment
effect. Why is that?
Tertius: It is because, Secundus, any other weighting would be entirely arbitrary.
Secundus: This does indeed appear to be an excellent reason Tertius. However, I am puzzled to understand one thing, and that is on what basis you chose the centres in your trial?
Tertius: Oh that is quite simple, Secundus. All the physicians concerned have good reputations and promised to deliver an adequate number of patients.
Secundus: These are indeed excellent reasons, oh Tertius. However, I cannot help noting that, although some physicians have indeed provided many patients, some seemed to have delivered very few patients at all.
Tertius: Indeed, some of the physicians have disappointed me, but when running trials in future I will not use them.
Secundus: A very wise precaution, Tertius, but it seems to imply that provided centres perform well, you do not mind which centres are in the trial.
Tertius: This is indeed true Secundus, the main thing is to have enough high quality data.
Secundus: I see. So that provided only that the centres delivered enough patients in total you would be indifferent as to whether the trial was based on say centres 1, 3 and 7, or on centres 4, 5 and 8, or on centres 1, 2, 8 and 9 or indeed on any set of centres.
Tertius: That is indeed so, Secundus.
Secundus: And suppose for argument's sake that centre 3 could give you all the patients you needed would you use it alone?
Tertius: (Smiling) Indeed I would Secundus. This would make life much simpler. Unfortunately, clinical trials don't usually work like that.
Secundus: What a pity. And if centre 4 could give you all the patients you needed would you be happy to use that?
Tertius: Of course, Secundus. The centre is unimportant.
Secundus: But this implies Tertius that you are quite happy to base your treatment estimate on centre 3 alone, if only it has enough patients and on centre 4 alone, if only it has enough patients and indeed on any centre at all, provided it has enough patients.
Tertius: (Impatiently.) Quite so. This is obvious.
Secundus: But then, your only preference amongst centres is based on the precision of the information which they provide, not on any peculiar feature of any given centre and, since you are otherwise indifferent between them, I fail to understand why you insist on weighting them equally and in an inefficient manner.
Tertius: I begin to understand your point, but what is the alternative?
Secundus: The alternative is to weight the centres in such a way that the precision of the treatment estimate is as high as possible.
Tertius: But does that not correspond to the Type II philosophy?
Secundus: It does indeed.
Tertius: Then I am sorry, Secundus, but you have been wasting my time. That is a dangerous heresy.
Secundus: Why so?
Tertius: Because the Finally Decisive Argument says so.
Secundus: In that case I do indeed apologise for having wasted your time, Tertius. The Finally Decisive Argument is transcendental in nature and cannot be defeated by mere logic.
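The weighting Secundus has in mind can be written out as a short derivation. It is a sketch under assumptions the dialogue does not state: the $k$ centre estimates $\hat\tau_i$ are independent with variances $\sigma_i^2$, and the weights $w_i$ sum to one.

```latex
\operatorname{Var}\Bigl(\sum_{i=1}^{k} w_i \hat\tau_i\Bigr) = \sum_{i=1}^{k} w_i^2 \sigma_i^2,
\qquad \sum_{i=1}^{k} w_i = 1 .

% Precision weights minimise this variance:
w_i = \frac{1/\sigma_i^2}{\sum_{j=1}^{k} 1/\sigma_j^2}
\quad\Longrightarrow\quad
\operatorname{Var} = \frac{1}{\sum_{j=1}^{k} 1/\sigma_j^2} .

% Equal weights w_i = 1/k give instead:
\operatorname{Var} = \frac{1}{k^2} \sum_{i=1}^{k} \sigma_i^2
\;\ge\; \frac{1}{\sum_{j=1}^{k} 1/\sigma_j^2},

% with equality only when all the \sigma_i^2 are equal.
```

The inequality is the arithmetic-harmonic mean inequality applied to the centre variances, so equal weighting can never be more precise and costs most when the centres differ greatly in size.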
How on earth did we ever manage without guidelines and standard operating procedures I wonder? It makes me blush when I look back and consider those innocent days at Pannostrum Pharmaceuticals, before we had the benefit of the Erewhon Statistical Guidelines. How did we survive, for example, before we knew that it was essential to keep a screened-patient log?
In those days, we thought the important thing about clinical trials was that you should report the results and characteristics of those patients you had actually experimented upon. It is true that we had already progressed from that pre-historic naiveté when we had thought that experimentation was defined by treatment, to a state of relative maturity where we realised that it was defined by randomisation (see McPearson, G., 'Early Days at Pannostrum: from "Per Protocol" to "Intention to Treat"', Journal of Statistical Whimsy, 6, 113-124) but it never occurred to us, I am ashamed to say, to record the patients we hadn't even included in the trial. What a blessing, therefore, that the Erewhon Guidelines have arrived in the nick of time to inform us that unless we record the demographic characteristics of the patients we didn't include we don't know how to generalise our results.
Suppose, for example, you, as a doctor, wish to know whether a particular beta-agonist will be of any use in treating a given severely asthmatic patient. This is the age of evidence based medicine so, of course, you do extensive background research on the marketed product. You discover, however, that all the trials which established its efficacy were run in either moderate or mild asthmatics. Clearly, then, you cannot generalise these results to your patient. However, on reading further you note that no severe asthmatics were deliberately excluded from the trials. They would certainly have been included, if only there had been any in the practices in which the trials were run, but as it turns out there weren't (rather as in the Flanders and Swann song about the rhinoceros who would use his horn for taking stones out of a horse's hoof if only he ever met such a quadruped thus distressed). The fact that such patients would not have been excluded makes all the difference, of course, and means that you can generalise the results with confidence.
It is not, however, the excellent inferential logic behind this requirement which I wish to praise, but the practical implications. The golden age of medical statistics is upon us. For how are we to decide whether we have screened a patient or not? We have to be extremely careful not to be narrow and arbitrary. Supposing a doctor is in the habit of taking all sorts of measurements on his patients which might be required for entry onto a clinical trial. He can see at a glance by looking at his case notes whether he can enter the patients in the trial or not. Is this not a screening? Suppose that the clinical research associates are in the habit of asking the doctors at various potential centres whether they have enough suitable patients with a view to excluding those centres who don't. Is this not also a screening? If we decide on economic or practical grounds to run the trial in some countries but not in others, is this not too a screening? If we run the trial today, rather than yesterday or tomorrow, have we not also indulged in screening?
The more I have thought through the implications of all this the more excited I have become. "Inspired", I think is the word. Indeed, I am now prepared to share with you McPearson's law of screening which goes:
Every patient who is not in your trial has been screened out of it.
And by every I mean every: not just the patients in the centres you chose who weren't included but also those in the centres you didn't include (physician-screening) as well as those in the countries you never considered (international screening) and in the eras you didn't study (temporal screening) and of course those who refused consent (auto-screening). Furthermore, why distinguish between actual and potential patients? We always screen out those who aren't yet ill (well nearly always) from our clinical trials (health-status screening).
Now, I agree, that once the implications of McPearson's law of screening come to be appreciated, application of the Erewhon Statistical Guidelines is going to become rather difficult: several millions if not billions of patients will have to have their demographic characteristics presented for every clinical trial. But can this be bad for statisticians? Not at all. It all means more work and more work means more money. And of course, once the implications sink in, it also means that one will have to accept that no results can ever be generalised at all. (This does, it has to be confessed, undermine the inferential value of this device, but so what.) But this simply implies that all products will have to be permanently on clinical trials and this again means more work for statisticians. Hence, I confidently prophesy that the golden age of medical statistics is dawning. I may even have to consider changing my name to Guineas McPearson.
Of course, if one accepts that the logic of clinical trials is comparative and not representative, and if one believes that in any case generalisation has to do with that which one has specifically studied and that to which one wishes to generalise, then a log of patients screened is a complete irrelevancy. Nobody, however, could seriously maintain this position.
Next issue: How I meet Screening Lord Sutch and join the Monster Raving Loony Party.
It stands to reason, of course, that the treatment effect in a multi-centre trial must be the straightforward arithmetic average of the treatment effects from each centre. Anything else would be abhorrent and illogical and to be eschewed by all right-thinking persons working in drug development (as well as by all those right-thinking persons in drug development who aren't working and, believe me, there are some of those too). There are two excellent reasons as to why the treatment effect must be defined in this way, the second of which is even more excellent than the first. 1) The expected value of such an estimator does not depend on the number of patients you happen to have recruited to each centre. 2) A certain prominent regulatory authority requires it.
The reasons why the second of these arguments is the more excellent of the two are also twofold. 1) The first argument is only correct if you happen to condition on the centres you actually recruited. If you consider all the centres you might have recruited but didn't then the expectation does depend rather intimately on the number of patients recruited. On the other hand it only depends on whether a given centre contributed 0 or 1 patients on the one hand or 2 or more patients on the other (no treatment estimate possible unless you have at least one patient on each treatment) rather than, say, exactly how many patients it contributed. This means, of course, that although it is not a perfectly excellent argument it is nevertheless an excellent argument. 2) The second argument is, however, a perfectly excellent argument. I know this because I have never observed anybody prevail against it, whatever the context, in all the years I have heard it used at Pannostrum Pharmaceuticals and elsewhere in the industry. For this reason I refer to it as the Finally Decisive Argument.
However, observing the highest standards sometimes brings its penalties. I learned this lesson when I first started my work at Pannostrum Pharmaceuticals' Elixir Laboratories. I was set to work on a project developing enteric coated suppositories for dysentery (Strombolite®), in which the trials had been plagued by drop-outs. The medical advisor on this project, Dr Durchfall, had developed a continuous outcome measure whose details need not concern us, except that you have my complete assurance that it had, of course, been completely validated. Now, TROT7 was a four-centre trial of two treatments which planned to recruit 36 patients per centre. One of the centres just didn't perform at all, however, so you can imagine my relief when I discovered that the three other centres had recruited 52 patients each. The trial as originally planned would have had variance proportional to 4(1/36 + 1/36)/16 = 0.014 but now had variance proportional to 3(1/52 + 1/52)/9 = 0.013, which was slightly better even than planned. I broke the news of these calculations to Dr Durchfall.
"Great news, Guernsey my boy und now haf I got good news for you. Centre 4 haz recruited 8 patients avder all. A liddle late und not much but zen it is bedder zan nozing."
"But no, Dr Durchfall," I replied, "It is worse than nothing. Now we are really in deep ... I mean in trouble." I whipped out my pocket calculator (these were in the early days of these devices when they actually had no more keys than you knew how to use). "You see," I said "the variance is now proportional to:
(Well I didn't actually say all those fractions but you get the gist.) "Those extra eight patients have actually increased the variance by 70%."
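The arithmetic in this exchange can be checked with a short sketch. It assumes, following the two expressions quoted earlier, that an equally weighted estimate from k centres has variance proportional to the sum of (1/n + 1/n) over the centres, divided by k squared. The function name and the four-centre expression are reconstructions by analogy, not something the text states.

```python
def equal_weight_variance(centre_sizes):
    """Variance (up to a constant) of the unweighted mean of centre effects.

    Mirrors the expressions in the text: each centre contributes
    (1/n + 1/n), and averaging k centres equally divides the sum by k**2.
    """
    k = len(centre_sizes)
    return sum(1 / n + 1 / n for n in centre_sizes) / k**2


planned = equal_weight_variance([36, 36, 36, 36])
three_centres = equal_weight_variance([52, 52, 52])
# Assumed form of the expression for the late-recruiting fourth centre.
four_centres = equal_weight_variance([52, 52, 52, 8])

increase = four_centres / three_centres - 1
print(f"planned {planned:.3f}, three centres {three_centres:.3f}, "
      f"four centres {four_centres:.3f}, increase {increase:.0%}")
```

On this reconstruction the eight late patients push the figure to about 0.023, an increase in the region of the 70% quoted.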
"Oh no," said Dr Durchfall, "Zis means ve haf an underpowered trial. Ze beta vill be too high. Bud surely Guernsey zere is some mistake, how can more be vorse than less?"
"Well it wouldn't be," I said, "if we didn't have to weight the centres equally. But I have talked to (or, as they would say, with) our statisticians in the other place and they assure me that this is a case where the Finally Decisive Argument applies."
"Vell, in zat case ve haf no choice," said Dr Durchfall making the sign against the evil eye.
However, thinking about it later I was able to come up with a new way of looking at the data which was extremely helpful. I put it to Dr Durchfall like this.
"You know how we always randomise in blocks of 4 in all of our two group parallel trials (although, of course, we never say so in the protocols because we don't want the investigator to guess the block size). Well of course, it is also generally accepted that 'as was the randomisation so is the analysis'. It seems to me that we should include the block in the model. A further argument is that one reason we don't use historical controls, even at the same centre, is that we know that recruitment is subject to time trends. Clearly patients could differ from block to block. Furthermore, centre 1 is a two-consultant centre: we could have declared it as two centres if we wanted to. Perhaps we should really regard the blocks as pseudo-centres. It is also the case that if we remove the block effect from the model we are removing the centre effect since blocks are confounded with centres. Now it stands to reason, I think, that with all these arguments about the importance of blocks we should weight them equally. It would be absurd if we didn't. And by doing this we shall be able to claim we are treating the possibility of differences between centres very seriously indeed since not only are we allowing for differences between them but for differences within them."
Thus was the policy of Good Mixed Centre Practice (GMCP) introduced to Pannostrum Pharmaceuticals. Did it work? Yes, indeed. The efficiency of the trial was miraculously restored and the cure I had proposed worked a charm. I wish I could say the same for Strombolite® but I can't say its poor efficacy was entirely a surprise.
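Why the cure works can be sketched numerically. The sketch assumes that all 164 patients (3 × 52 + 8) fall into complete blocks of 4, which the text does not state, and it reuses the proportional-variance convention of the expressions quoted earlier. The function name is illustrative.

```python
def equal_weight_variance(unit_sizes):
    """Variance (up to a constant) when every unit gets the same weight."""
    k = len(unit_sizes)
    return sum(1 / n + 1 / n for n in unit_sizes) / k**2


BLOCK_SIZE = 4
centre_sizes = [52, 52, 52, 8]

# Assumption: every centre recruited in complete blocks of 4.
blocks = [BLOCK_SIZE] * (sum(centre_sizes) // BLOCK_SIZE)

by_centre = equal_weight_variance(centre_sizes)
by_block = equal_weight_variance(blocks)

print(f"{len(blocks)} blocks: variance {by_block:.4f} "
      f"versus {by_centre:.4f} with centres weighted equally")
```

Because every block has the same size, weighting blocks equally amounts to weighting centres by the number of patients they recruited, which is where the restored efficiency comes from.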
From the start I'd had a gut feeling to that effect.
Apparently a certain pharmaceutical company has now instituted a policy that no trial must have more than 80% power. Now read on...........
What a pleasure to see that that old custom has been revived of offering a libation to the gods: of making sure that part of every good thing is burnt up as an offering. Assuredly this must be a way of attracting luck and good fortune and, more important, of averting the calamity and disaster which follows hard upon the heels of success. The Greeks understood the necessity of this sort of thing well. Consider the story of Polycrates the tyrant of Samos. (That isle which is famed to PSI members, not only as being the birthplace of Pythagoras and Aristarchus but also a favourite holiday destination of all those bright young CRAs.) Polycrates was so fortunate in everything, that Amasis the King of Egypt advised him to avert disaster by parting with something dear to him. Polycrates took his advice and threw a prized ring into the sea but a few days later found it again in the stomach of a fish that had been served to him. It was clear to all now that he was doomed and sure enough not long afterwards he got at cross purposes with Oroetes who crucified him.
So there it is, drug development is a tricky business. Some use a rabbit's foot to bring luck, some throw away 36% of all just acceptable compounds. Hold on. Where did the 36% come from? Well don't forget the two trials rule. You have got to have significance twice. So if you use 80% power and the treatment under investigation just has the clinically relevant effect (and you have done your planning correctly) and you run two trials, the probability that both will be successful is 0.8 x 0.8 = 0.64. Hence the probability that at least one is unsuccessful is 1 - 0.64 = 0.36. (I apologise to the PSI membership and associate membership for going through the glaringly obvious. It is not to you that these calculations are addressed but to any non-statistician, say a manager, who might read these remarks.)
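For any manager who would like to see how the sacrifice shrinks as power rises, here is a minimal sketch. It assumes, as the calculation above does, that the two trials are independent and that the true effect equals the clinically relevant difference. The function name is illustrative.

```python
def prob_both_significant(power, n_trials=2):
    """Chance that every one of n independent trials reaches significance."""
    return power ** n_trials


for power in (0.80, 0.90, 0.95):
    sacrificed = 1 - prob_both_significant(power)
    print(f"power {power:.2f}: {sacrificed:.1%} of just acceptable "
          f"compounds fail the two trials rule")
```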
Now of course, there is an ethical argument in favour of not having too much power. For serious diseases it may be unethical, as it may be unacceptable for the patients to continue to be randomised to a treatment which is known to be inferior, although I sometimes wonder if this problem doesn't really require handling in a different way altogether and in any case, nobody develops drugs for serious illnesses: there is, after all, no money in that. Then, again, it is well known to Bayesians that there comes a point when it is not worth increasing the power of the test unless you also reduce the size of the test but this, of course, would reduce the proportion which the regulator sacrifices to the gods and regulators also need their rabbit's foot. (How else can we explain baseline testing?)
However, I think that this sacrifice business doesn't go far enough. The ancients understood this: food, possessions, animals are all very well but to really avert bad luck there is nothing like sacrificing people. Therefore, I have a modest proposal to make. That is that any manager who proposes that no trial should have more than 80% power should go without his annual bonus whenever one of the two pivotal trials in a drug development programme is significant and another is not significant. This should happen to 32% = (0.8 x 0.2 + 0.2 x 0.8) of all drugs having a treatment effect equal to the clinically relevant difference. Of course, we probably need to scale this by the number of compounds in development. Say that there are k due to report in a given year. The manager could lose 1/k times his bonus for every time this happens. And again perhaps he should have his baseline annual bonus slightly increased, say by 9.5% = 2(0.05 x 0.95) to account for the occasions where a useless drug produces the phenomenon. We should be able to save some money on managers' salaries using this scheme. And after all, in this era in which directors' pay rises faster than profits, every little bit helps.
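The two percentages in this proposal come from the same formula, the chance that exactly one of two independent trials is significant. The sketch below assumes independence and, for the useless drug, a 5% chance of significance per trial. The function name is illustrative.

```python
def prob_exactly_one_significant(p_significant):
    """Chance that exactly one of two independent trials is significant."""
    return 2 * p_significant * (1 - p_significant)


# Drug with exactly the clinically relevant effect, trials run at 80% power.
split_result_effective = prob_exactly_one_significant(0.80)

# Useless drug: each trial is significant with probability 0.05.
split_result_useless = prob_exactly_one_significant(0.05)

print(f"effective drug: {split_result_effective:.1%}, "
      f"useless drug: {split_result_useless:.1%}")
```

If all k compounds had exactly the clinically relevant effect, the 1/k scaling would leave the manager's expected loss at 32% of his bonus whatever the value of k.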
This reminds me that a cynical spy informs me that I have it all wrong. The real reason for demanding that no trial should have more than 80% power is not to placate the fates but to save money. I dismiss this as a vile and vicious rumour. No person having achieved a position of prominence in drug development could seriously believe that insisting that no trial have more than 80% power was a rational policy. They would surely know that every case has to be reviewed on its merits and that only hard calculation and careful thought will indicate the correct course of action. After all it is unthinkable that a multi-billion pound business should have its fate determined by a slogan and a formula misunderstood and misapplied by rote.
I am relying on memory here, but I seem to recall a Roald Dahl story with this name, which followed the (mis)fortunes of a transatlantic passenger who made a large and unwise bet on the ship's crossing time and then made an even more unwise attempt to influence it. (It then turned into a dark tale of survival analysis and censored observations.) But more relevant to my theme is, in fact, a quotation from one of my favourite books, The Phantom Tollbooth. It occurs in that delightful chapter 'Unfortunate Conclusions' in which, you may recall, Milo and his companions, the watchdog, Tock and the Humbug, having jumped to Conclusions, an island in the Sea of Knowledge, find they have to swim back. Tock and Milo emerge drenched but not so the bug, for 'you can swim all day in the Sea of Knowledge and still come out completely dry. Most people do'.
Well, it seems to me that with the explosion in journals and databases, not to mention (but of course I will) the so called 'World Wide Web', we are currently swimming in data. This brings me to my theme. Which side of the great divide are you on? Do you believe that meta is better or do you hold instead that pooling is fooling? Well, to nail my colours to the mast, I belong to the former school. It seems to me that there is no other topic in medical statistics, with the possible exceptions of cross-over trials, bioequivalence and n-of-1 studies, which has the same capacity as this one to rot the brains. Every time that another meta-analysis gets published in a medical journal, the editor feels it behoves him to commission some idiot to write a sanctimonious guest editorial or discussion piece which jaws on about publication bias, the dangers of pooling different studies, the benefit of judgement compared to calculation, or the importance of stratifying studies by baseline risk and so forth. (This latter is a vile habit I can scarcely bring myself to contemplate.)
For example, it never seems to occur to such persons that the publication bias of a meta-analysis arises through the bias in selecting the individual studies. It is hardly possible, therefore, to ascribe to the whole a sin which is not shared by the part. (Of course, for the pharmaceutical industry, as we know, the 'file drawer' is always empty, so that this is not a problem, is it?) In fact, nearly all the problems of meta-analysis are difficulties of individual trials too. For example, a study of meta-analyses showed that for various indications, the results of the largest trials were rather imperfectly predicted by the meta-analyses of the rest. This was quite enough to have several pundits mouthing off about inherent difficulties of pooling and so forth. What nobody seemed to want to do was ask how well the second largest trial on its own would have done as a predictor of the largest. But I am reminded of what WC Fields famously replied to the reporter who enquired what it was like growing old: "It's better than the alternative."
Then what about this business of not pooling different studies? What makes them different? The protocols, the populations? If we can't pool them, what specific feature of the trials do we use in coming to conclusions? It seems to me that people who object to pooling different studies but would quite happily accept any one of them on its own, if only it were large enough, for the purpose of informing medical decision making, should be given thinking lessons. Furthermore, the standard of information we require for individual published trial reports is, if this is true, grossly inadequate. If we really feel that we can make use of those special particular individual features of a given trial for deciding what to prescribe for future patients in different clinics, in different years in different continents, then we really ought to make a much better stab at describing these trials.
This is not to say, of course, that I like these meta-analysts. Far from it. On the whole they seem to me to be a repulsively charmless and messianic lot. (Not quite as repulsive as the pharmacoeconomists, it is true. Which reminds me that I have not yet given you a definition of this individual so now I will. Pharmacoeconomist: one who, when evaluating a treatment for dysentery, enquires after the price of toilet paper.) On the other hand, we should at least be grateful for one thing: the meta-analyst will, I hope, finally see off that even more odious and overpaid individual the medical expert as used in the hilarious expert reports which used to grace European submissions. Which reminds me, I haven't given you my definition of a medical expert either: one who sums up without bothering to add (except, of course, his fee).
Negatives have their commercial uses. For example, there is a cunning German advert for a lottery. Two individuals discuss a third. I translate. 'You mean to say that he has not bought a lottery ticket? How ridiculous. Then he has absolutely no chance of winning x million marks.' This is true, of course, but somewhat beside the point. Now I recently had cause to travel the London Underground where the following advertisement caught my eye. 'Nothing is proven to work better than hedakegone'. Of course, being the sort of sarcastic character I am, my skull-cinema, to use a phrase beloved of the late John Hillaby which I believe he adopted from the even later JB Priestley (or do I mean earlier), immediately played a scene in which GMcP rings up the person who wrote this inspired piece of advertising equivocation and says, 'Since nothing is proven to work better than hedakegone, I presume that I am better off taking nothing'. This, however, is not what the advert is meant to convey. You are meant to think, I imagine, that hedakegone is better than most alternatives and at least as good as anything else. At this point we can all permit ourselves a wry smile, since, as sophisticated, mature and intelligent statisticians (almost a tautology) we all know that equivalence cannot be claimed by default, but has to be proven. After all, in a one-gate slalom I might be able to ski as well as Tomba. (And in a one-gate slalom our Editor might even be able to ski as well as me, but I digress.)
Let us not be too smug, however. I recently read a paper on bio-equivalence trials which, despite much mathematical brilliance, was such statistical nonsense that it must have sent my blood pressure up 30 points. (I don't know why, but there is something about trials in which the same individual is treated more than once which encourages 'statisticians' to write nonsense. But I mustn't cross-over from the topic in hand.) It was shown how, at the expense of some mathematical manipulation, a test could be produced which had superior properties to the two one-sided tests at the 5% level which are now commonly used. It started from the observation that the two one-sided approach has an overall 'size' (type one error rate) of less than 5%. For very small sample sizes it can be much less than 5% and if the sample is small enough it can even be zero. (The two one-sided tests correspond to requiring that conventional 90% limits are between limits of equivalence. If the standard error is high enough, this can actually be impossible, since the confidence interval can be wider than the limit of equivalence. Hence, under such circumstances, the conventional test has zero size.)
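The zero-size case can be made concrete with a small sketch. It assumes equivalence limits of ±log(1.25) on the log scale (a common convention that the text does not specify) and uses a normal critical value in place of the t distribution. The function name is illustrative.

```python
from math import log
from statistics import NormalDist


def tost_can_ever_succeed(std_error, limit=log(1.25), alpha=0.05):
    """True if a 90% interval could fit inside (-limit, +limit) at all.

    The interval is estimate +/- z * std_error. Even for the most
    favourable estimate (zero), it fits only if its half-width is
    smaller than the equivalence limit.
    """
    z = NormalDist().inv_cdf(1 - alpha)  # normal approximation, about 1.645
    half_width = z * std_error
    return half_width < limit


for se in (0.05, 0.10, 0.20):
    verdict = "possible" if tost_can_ever_succeed(se) else "impossible"
    print(f"standard error {se:.2f}: proving equivalence is {verdict}")
```

When the function returns False, no data whatsoever can lead to a declaration of equivalence, so the conventional procedure has zero size and zero power.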
Now I know that I have made the odd sarcastic jibe at Bayesians but this 'improved' procedure strikes me as frequentism gone mad. You and I might think that there are occasions where you might just accept the fact that your type I error rate is going to be less than 5%, but some Neyman-Pearson types just can't abide the thought of it. If the size is less than 0.05 it means that they have room to manoeuvre and can change the test to get more power. Having a type one error rate of say 3% is just going to keep them awake at night worrying about that lost power. (Which reminds me of a joke. Did you hear the one about the statistician who complained about his salary slip, which showed that he had only been paid for one day in twenty? His boss replied that this was the going rate for doing nothing.) So N-P addicts will 'improve' their equivalence procedure to recover the power.
Of course, as you get smaller and smaller sample sizes, the procedure is more and more like rolling an icosahedral die. (For the benefit of younger PSI members I should explain that that is a regular solid with twenty triangular faces.) But so what. What does the frequentist do when stuck for a solution? (S)he tosses a coin. (And no doubt there are some Bayesians who think frequentists are a bunch of tossers.) Rolling a die, tossing a coin: it's all much more interesting than analysing data, as I am sure you will agree. Furthermore, supposing you know that your drug is really in-equivalent to the reference. What is your best bet of proving it is equivalent after all? Why it's simple. Not to collect any data at all. Just roll that icosahedral die. The type I error rate is 5% so who can complain? Of course you are likely to fail to prove equivalence but so what. If you do prove equivalence you can say 'in a most powerful test at the 5% level Conalol was proved to be equivalent to the reference product.' (Don't believe me? You try finding a data-less procedure with more power than my icosahedral die.) And that, as we all know, is scientific statistics and hence much more impressive than the sort of misleading rubbish that advertisers put out on the London Underground.
Pop Charts latest
This Week's Number one: 2 Become 1. Dice Girls.
COMFREY. An herb of Saturn,
and I suppose under the sign of Capricorn, Culpepper.
To be used for what the old goat has sat on, McPearson.
I open my paper to find a passionate plea from the chairperson of a well-known cosmic consciousness cosmetics company, The Figure Franchise, which has me in tears. Don't let those Eurocrats touch our natural remedies. It's outrageous what they are going to require. Anybody selling comfrey, dog's mercury, rupture wort, Greenland scurvy grass or the like as medicines will be required to prove their effectiveness and safety. Outrageous! It is quite inappropriate that standards which apply to the multi-million, global, (not to mention international), scientist-riddled and by definition thoroughly evil pharmaceutical industry should apply to philanthropic not-for-profit corner shops bringing the wisdom of the ages to technology-blighted customers. After all, you can't argue with the Druids, Phoenicians, Aztecs and so forth (well I've never found one you can argue with) and where would the world be without mistletoe, purple and chocolate? And did you know that if you leave your razor-blade overnight under a cardboard pyramid it will be as good as new in the morning and that if you want to leave it under a cardboard icosahedron (see GMcP passim) you are going to have difficulty in constructing one?
I quite agree that it would be inappropriate to apply industry standards to alternative medicine. Take homeopathic medicine, for example. (A case, perhaps, where the alternative is null.) The more you dilute it the stronger it gets. Think of the drug disposal problem. Flush the stuff down the sink and it mixes in the sewers just spreading wider and wider and getting more potent in the process. The fish in the sea must be as high as hippies at the Glastonbury festival, not to mention the whales. (Which reminds me that I don't recall that Greenpeace have ever addressed this problem.) If pharmaceutical industry standard operating procedures for drug disposal were applied to homeopathic medicines it would drive them underground. And we can't want that. Kids around the world shooting up on tincture of arnica and the like. The mind boggles. Mind you, don't let's knock the homeopathic theory. It's an excuse worth trying if you are ever caught over the limit. 'Honest, officer, there was less than a drop of tequila in my margarita. It's just that the barman insisted on shaking it when he mixed it.'
But there is a problem. What is a natural remedy? Obviously not acetylsalicylic acid, digitalis, reserpine. On the other hand, extract of willow bark, foxglove and rauwolfia are clearly natural remedies. This just goes to show the importance of names, something which Guernsey McPearson, for one, would never deny. Hyoscyamine hydrobromate, or even atropine are clearly dangerous substances in need of drug regulation but belladonna and deadly nightshade? Why, any sweet shop should be allowed to sell them. Just think of all the years that were wasted developing cyclosporin when it could have been sold right away as Scandinavian soil mould.
But as statisticians, we have to be careful in naming things ourselves. Just look at the mistake that was made with the bootstrap. It sounds far too friendly. Every Tom, Dick and Harry is at it. If only it had been called Autologous Replacement Sampling Estimation, something no-one would dare apply an acronym to (although if they did, comfrey might have its uses: see above), we could have kept this as the special preserve of the statistician. After all, we, unlike philanthropic natural-remedy-sellers with nothing but the public good at heart, must look to our professional interests.
Aztecs: an apology. Guernsey McPearson would like to apologise to the Aztecs for any doubts inadvertently cast on their wisdom. Chocolate was a great discovery.
Mayas: an apology. Guernsey McPearson would like to apologise to the Mayas for giving the credit for discovering chocolate to the Aztecs. He admits you were there first but then, hey, that's drug development for you.
That chocolate site in
That FDA and complementary medicine site in full: http://cpmcnet.columbia.edu/dept/rosenthal/legal/Fed.html
Pharmacoeconomics: drug development's dismal science.
I believe that it was Umberto Eco, or it may have been Humbert Humbert, who said, rather wittily, that The Three Musketeers is really the story of the fourth. I forget the finer details of Alexandre Dumas père's novel, it being some 25 years or so since I read it (en français, bien sûr!). I do recall, however, some exciting scenes involving the nymphomaniac ex of Athos (Miladi) and a sinister executioner: evidently the double entendres of, 'having it off,' made an impression on my teenage mind. However, I think that by the end of the book the three musketeers (Athos, Porthos and Aramis: character shorthands for honour, courage and sensibility) are formally joined by the fourth (d'Artagnan, a sort of cipher for youthful spirit) so that Umberto Eco is right, indeed. (I hope that you all appreciate what you get in this column: not just statistics and drug development but cod literary criticism too.)
Not infrequently, I pass through airport news-agents in search of serious entertainment for the long haul (studiously avoiding, of course, the top-shelf magazines). I have not seen The Three Musketeers in such establishments. What I have seen recently, on more than one occasion and always prominently displayed, is a work of popular fiction entitled The Regulators. As you might expect from the title, it is a horror story. We of the PSI, of course, are well acquainted with The Regulator and his three musketeers: quality, efficacy and safety (Ethos, Pathos and I'm a risk) but it now seems that many feel it is time that the fourth ('debt and gain') joined the fray. In fact, opening the latest copy of the APE (The Albion Physician's Enquirer) what do I find but a clarion call from an eminent professor of health economics that all drugs should prove value for money before being registered? What an excellent suggestion! It must surely be welcomed by health economists up and down the country! For it is quite clear that requiring value for money will require money for valuation and who shall we call to perform this task? Why, the health economists. And of course, if a certain university established a monopoly in such evaluations what better motto to adopt than, 'all for one and one for all'. (By the by. What is the difference between a health economist and an economist? You couldn't give an F? Quite right. That is all it takes to turn utility into futility.)
But don't let us be hypocritical about this. For, have we statisticians not mounted a most successful campaign ourselves? It started with the introduction of integrated reports. (The old system was that The Regulator just got to see the physician's report: a curious mixture of plagiarisms from the statistician's report and extravagant and false inventions.) Then we got the CPMP guidelines to require the participation of a qualified statistician in all stages of the clinical trial and, of course, the APE and its rival The Speculum now have statistical review of their papers. Furthermore, it seems that even the good old medical expert's days are numbered. He has been replaced by the statistician's meta-analysis. So let us not begrudge another profession's attempts to make itself indispensable. But the question I ask you, dear reader, is this: can we have too much of a good thing? After all, it is but a bootstrap from feathering nests to festering nethers: a fate too dire to contemplate. I imagine the scene in the year 2010.
We are unable to grant you registration with the General Medical Council. It is true that you diligently attended all your lectures. (Is this a work of fiction: Ed?) You also performed brilliantly in your exams and gave most effective solutions to all the problems set. Your clinical work has been of the highest standard and the quality is excellent. Your elective was most impressive. In a hospital setting you showed commendable tolerability when faced with awkward patients and difficult colleagues. However, you have failed to satisfy us that you are intending to spend several years as a junior hospital doctor working 100 hours a week for minimal reward. There is a strong suspicion that your intention is to enter Harley Street and charge outrageous fees. This being the case it is clear that you will not give value for money and hence we cannot accept your registration.
Perish the thought! On the other hand, I don't know. Have you noticed a tendency in the pharmaceutical industry for the medically qualified to earn more for the same job and performance than their fellow scientists? What? It had escaped your attention. I think you need to call in the health economists!
It has been stated that we are wrong to give the credit for building a great cathedral or palace to the architect. It belongs instead to the stonemasons and bricklayers whose labours raised the edifice. In my opinion, this insight has been rather hastily attributed to Bertolt Brecht, who is usually given as the author of it. Not enough credit has been given to the typesetters and printers not to mention paper-mill workers and lumberjacks of this world for this observation. Of course, had BB lived to see the age of desk-top publishing we could perhaps legitimately give him the credit for Mother Courage, The Caucasian Chalk Circle and Galileo, which he so plainly doesn't deserve.
According to Alfred Hitchcock, directors should treat actors like cattle. "What's my motivation in this scene, Mr Hitchcock?" "Just say the lines," he would reply. I have always felt that a similar attitude should be adopted towards trialists. The last thing you want is some jumped up physician with ideas of his own. "Just follow the protocol." (If only they would!) This at least is one of the blessings of multi-centre trials. There are so many physicians involved that you can reasonably use the excuse that if you were to start changing the protocol at the request of one, there would be so many others you would have to go back and get approval from that the process would be impossible. Nowadays, you find that many of the top Hollywood movie stars won't appear in a film unless they are granted co-scriptwriting rights. Does anybody seriously believe that the films that are made are any the better for this? Not the critics. Not the directors. Probably not the general public. Will this stop this happening? Not at all. Because of something called "box-office pulling power". There will always be movie marketing men who will think that if the price of getting the star's name on the cast list is letting that star hack the script to bits, it is worth paying.
Despite that, however, Hitchcock's policy paid off in the long run. In the end his name as director was as great a pull as that of any of the stars in his films. And names can be important. Take an example. A film of which I am very fond is Bill Forsyth's Gregory's Girl. (The film has a truly excellent protocol.) In the wake of its success there were umpteen interviews with the admittedly comely actress who played the eponymous role. What the critics had failed to notice, however, is that the film was not about Gregory's Girl 1 but about Gregory, and that Gordon John Sinclair's acting in the lead role was an important part of its success. I always felt that he was denied an acclaim that was due to him simply on the basis of the film's title.
Does anybody think that a pharma-industry trial is the better for having let some prima-donna "opinion-leader" force his or her embellishments on a protocol which has been worked and re-worked by the sponsor's clinical trial experts? Apparently so. Hollywood is not the only industry in thrall to the marketeers. As far as pharma marketeers are concerned, it is obvious that the product is a world-beater as regards efficacy, tolerability and quality (especially quality of life), even before any work has been done on it. The only thing that remains in doubt is whether the average GP can be brought to see the benefits of this wonder product. This is where the really difficult part of the drug development begins. (Outsiders have no idea how hard it is to sell effective remedies to desperately ill patients who don't have to pay for them.) There is nothing like having an opinion leader's name on your publication to sell a drug. And if part of the price (in addition to the fee) of getting that opinion leader to co-operate is letting him or her hack the protocol to bits, so be it.
Well what, you may say, has this got to do with Bertolt Brecht? Just this. There are, I am pleased to say, a considerable number of cases where industry trials get performed without using "opinion leaders": trials in which the physicians are quite happy to carry out the protocol as designed without feeling they need to modify it. This can be a great blessing. After all, the theory of experimental design was first established in the field (literally) of agriculture and the difference between agricultural and medical research is that the former is not performed by farmers2. The more the professional scientists lead the trials the better. Some of these trials are rather successful. Inevitably some of these trials make the Press. A new breakthrough having been developed, The Daily Sensation will inevitably wish to interview the hero in the white coat, the Dr Kildare, who made the breakthrough. It will hardly want to be fobbed off with the industry chemists who first synthesised the molecule, nor the team who did the background investigation of the mechanism of action, still less the industry physician and statistician who designed the trial. So courtesy perhaps of the pharma marketing department one of the trialists will be pressed into service and propelled into the media spot-light as the discoverer/developer of the new wonder-product.
But we shouldn't gripe. It is a generalisation of that phenomenon which Stephen Stigler very wittily and modestly dubbed Stigler's Law of Eponymy: if a discovery is named after somebody, then he didn't discover it. So the process is almost inevitable. You will just have to accept that Dr Limelight from St Enema's is going to get the credit for your protocol. And the only consolation you will get, from the coverage granted to near irrelevant persons by the media, is that of being provided with yet another proof, if proof were needed, that when it comes to drug development, television and press don't know their base from their apex.
1. In any case, the point of the film is that it is not the girl that you think who is Gregory's Girl.
2. This witty remark is due to Michael Healy. See his paper: Frank Yates, 1902-1994 - The Work of a Statistician, International Statistical Institute, 63, 272-288 (1995).
The Frightfully Drunk Alcoholic's (FDA) plasma concentration has been affecting his mental concentration and he is on his knees scrabbling around under a lamppost when the Police Superintendent (PSI) comes by.
PSI What are you doing here?
FDA Looking for my keys which I lost 200 yards down the road.
PSI So why are you looking here?
FDA The light's better, of course!
I have had cause to remark in previous SPINs on the lunacy that prevails whenever the topic of bioequivalence is raised. Now we have fresh evidence that the madness continues. A certain regulatory agency has produced a document for consultation in which it is proposed to replace the old notion of average bioequivalence with those of population and individual bioequivalence. And if you haven't already heard of prescribability and switchability you are going to hear a lot more of them in the future.
Suppose that a physician is faced with the choice between using a brand name product (for which an enormous amount of regulatory evidence for efficacy, quality and safety has been provided) or a generic for a newly diagnosed patient. If he has evidence that the generic is equivalent to the brand-name drug in the sense that for a newly presenting patient there is no reason (other than price) to choose between brand-name and generic, then the generic may be said to be prescribable. Since, in practice, the generic will not have a mountain of direct evidence to back up its claims, this requires that some good evidence has been provided that the generic is equivalent to the brand-name drug. Until recently, it was considered adequate to prove mean equivalence. This was not because drug developers were unaware that two distributions could be similar in terms of means and different in terms of variance. Far from it. It was because it was considered that life was short and there were more important matters to look at.
Suppose, however, that you were concerned that the generic product, despite being the same on average, was more variable than the innovator product. Would you not be concerned to investigate thoroughly potential sources of such variability, for example from batch to batch? Wouldn't you think it odd to fill a document full of symbols for various components of variance, between, within and interactive but have no time to consider the manufacturing process itself? Wouldn't you think it was peculiar not to say anything at all about how the samples to be compared should be chosen? Wouldn't allowing the sponsor to use product from a single batch chosen by him be as illogical as looking under a lamppost for keys you had lost two-hundred yards away?
Now suppose that you are a physician whose patient is currently under a brand-name product and one of those nasty purchasing agencies is putting pressure on you to switch him to a generic. Now, it is argued that you need the concept of switchability. This is because it is theoretically conceivable that two products could have the same mean bioavailability but one might be relatively more bioavailable for one kind of patient and relatively less bioavailable for another. This would be an example of treatment by patient interaction. Now, you might say, what business should this phenomenon be of a regulatory agency? After all, we don't just sell to the prevalence, we also sell to the incidence, and for new patients the concept of switchability is irrelevant. You might think that a simple label inside the generic product, 'although KopyKat is on average similar to Bigbucksalol, there may be the odd patient who will experience difficulty if switched from Bigbucksalol to KopyKat,' would do. How wrong you would be! The regulator is not just there to enforce regulations but to create regulations. With many ICH documents being finalised there is a severe danger of "all quiet on the regulatory front," and that would never do.
Don't get me wrong. It is not that I think that patient by treatment interaction is an unimportant topic - far from it. However, there is one thing about interaction that I remember rather well from my undergraduate days and the study of linear models. An interaction is usually less important than the marginal main effects. Studying patient by treatment interaction, say, when comparing two quite different treatments in heterogeneous patients is much more reasonable than, say, looking for interactions when comparing two formulations of the same treatment in healthy volunteers. Curiously, there is no requirement upon sponsors to do anything serious about the former. In fact, you are positively discouraged from using the sort of cross-over trials that would enable you to seriously investigate treatment by patient interaction (and which are now de rigueur in bioequivalence) because, as everybody knows, the parallel group trial is the gold-standard for investigating all possible questions (except bioequivalence).
And there is another thing I remember about interactions. As soon as you abandon the parallelism assumption and admit the possibility of interaction the effect you will see depends upon the subjects you choose. Now, unless I am very much mistaken, we don't prescribe to healthy volunteers, let alone switch them from one formulation to another. If switchability is a concern it is a concern for patients. If subject by formulation interaction is possible, the effect will depend on the subjects chosen. It seems rather remarkable that this new guideline says nothing, therefore, about choosing subjects and in particular that nothing is said about the absolute need of running bioequivalence studies in patients.
Why can this be? Presumably the light is better in healthy volunteers.
Next issue. A do-it-yourself guide to increasing your scientific status by drafting complicated guidelines on hitherto neglected matters. Watch out for these hot topics.
How many stars for significance in astrology? A user's guide to star-sign by treatment interaction.
One Yank and it was off. The effect of British versus American spelling on patient compliance.
A sticky problem. The effect of chewing bubble-gum on absorption from suppositories.
Is your drug a frequent flyer? Prescribing treatments in the age of jet lag.
Bad Vibrations. Oscillation damping and drug shipment. An essential quality measure to avoid homeopathic potentiation of treatments.
This is a rather complicated number but extremely popular world-wide and turns up in the most far-flung places of medical research. It has a number of local variants. I merely describe one here. The dance is divided into two major sections: the forward and the reverse reel. They have a pleasing symmetry to them.
1. The introduction. The medical adviser searches for a suitably precise and sensitive "instrument". Usually a rating scale with many categories is employed. (Typical example: the Hamilton scale for depression.) Alternatively a continuous measurement scale is used and highly precise measurements employed (e.g. FEV1 to the nearest ml).
2. The presentation. The scale is used to measure patients, subjects etc.
3. Treading the measure. The scale is arbitrarily divided to form two sections labelled 'responder' and 'non-responder'. (Some local variants have an intermediate stage known as tripping along the baseline.)
4. The envoi. The dichotomies are handed over to "the statistician".
1. The acceptance. The data are accepted by the statistician who now refers to them as "binary".
2. The link. A suitable link is chosen to relate the binary data to a continuous expectation. (This is necessary because direct models for binary data don't work well.)
3. The model. A model is introduced into the dance. (This stage can involve some very delicate footwork. To avoid accidents dancers are requested to register their steps with the local caller before the dance.)
4. The analysis. The binary data are converted back to predictions on the continuous scale chosen.
Any tune at all as long as it is played on the fiddle.
Object of the exercise
As with all dances, the object is to introduce elaborate, superfluous and complex movement. It has nothing to do with logical progression from A to B and indeed the object is to end up (nearly) where you started. It should be viewed as an art form, not a science.
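For anyone who doubts that the figure is as wasteful as it is elegant, here is a minimal simulation sketch. Everything in it is an assumption made for illustration rather than anything taken from a real trial: the standardised effect of 0.5, the 50 patients per arm, the 'responder' cut-off placed halfway between the two arm means, and the crude large-sample z-tests. It simply counts how often each analysis declares a difference when the same data are analysed as measured and after being dichotomised.

```python
import random
import statistics


def simulate(n_per_arm=50, effect=0.5, n_trials=2000, seed=1):
    """Estimate the power of a continuous and a dichotomised analysis.

    All defaults are hypothetical values chosen for illustration only.
    Returns (power_continuous, power_dichotomised).
    """
    rng = random.Random(seed)
    cutoff = effect / 2.0  # arbitrary 'responder' threshold (assumption)
    hits_continuous = 0
    hits_binary = 0
    for _ in range(n_trials):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        treated = [rng.gauss(effect, 1.0) for _ in range(n_per_arm)]

        # Analysis of the measurements as measured: z-test on the means.
        se = ((statistics.variance(control) + statistics.variance(treated))
              / n_per_arm) ** 0.5
        z = (statistics.mean(treated) - statistics.mean(control)) / se
        if abs(z) > 1.96:
            hits_continuous += 1

        # The dance: label each patient 'responder' or not, then compare
        # the two proportions with a pooled-variance z-test.
        p_control = sum(x > cutoff for x in control) / n_per_arm
        p_treated = sum(x > cutoff for x in treated) / n_per_arm
        pooled = (p_control + p_treated) / 2.0
        se_bin = (2.0 * pooled * (1.0 - pooled) / n_per_arm) ** 0.5
        if se_bin > 0 and abs(p_treated - p_control) / se_bin > 1.96:
            hits_binary += 1
    return hits_continuous / n_trials, hits_binary / n_trials


power_continuous, power_binary = simulate()
print(f"continuous analysis: {power_continuous:.2f}")
print(f"dichotomised analysis: {power_binary:.2f}")
```

Under these assumptions the dichotomised analysis should detect the (perfectly real) effect noticeably less often than the continuous one; the exact figures depend on the seed and on where the cut-off is placed.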
Next issue. The full Monty or the art of statistical strip-tease.
Now you have no doubt been thinking: this Guernsey McPerson has been going for a long time: how long can he keep it up? Well that's a very personal question, madam, and none of your business! It does, however, bring me on to a subject of current interest, namely a currently rather infamous drug, which I shall refer to as Fullaggro, used to treat a rather embarrassing condition, "condition I", with increasing prevalence amongst males as they mature past middle age. I've already said, Madam, that it's none of your business! Fullaggro is so marvellous that not only is it a cure for "condition I" but also for the general malaise regarding profits affecting the industry. I believe it was William Buckley Jr who defined dancing as the vertical expression of a horizontal desire but I think that the Fullaggro profits could be described as the vertical expression of diagonal desire.
However, one man's profit is another man's cost and politicians seem to be becoming very worried about the cost of Fullaggro. Scarcely a day goes by without some politician's sound bite on the subject, so to speak, but let's not digress to the political situation in the US. And Fullaggro certainly is bringing the idiotic commentators out of the woodwork. For a long time it seemed that you were not going to be able to get Fullaggro on the NHS. The Minister's position appeared to be that you could waste any amount of public money in your quest for treatment for condition I. You could besiege your GP, pester consultants, be prescribed umpteen treatments, as long as none of these were successful in curing "condition I".
Now however, the Minister has relented. You will be allowed your Fullaggro once a week. Has the man gone mad? What a cock up! Let me tell you an old joke by way of explanation. (Alas, not a GMcP original but our powers are not what they were.)
The European Commission were deciding on the ideal number of condoms there should be in the Euro Condom Packet (ECP). "Four," said the Germans. "Four?" "Yes. Monday, Tuesday, Wednesday, Thursday, but the weekend is for drinking and relaxing." "Eight," said the French. "Eight?!!" "Yes. Monday to Saturday and for Sunday, twice." "Twelve," said the British. "Twelve!!!????!!!" "Yes. January, February,..."
You see the point. George Mikes famously remarked that the British don't have sex, they have hot water bottles. Once a month Fullaggro would have been quite enough. Now the Minister will be encouraging thousands of elderly males to be all dressed up with nowhere to go, four times a month, when once was probably as much as they were ever used to.
What on earth can they make of it all on the continent? Proof positive that Les Vaches Folles have finally got to the British. Has no one pointed out to the cost conscious Minister that a lot more Fullaggro is going to be sold out of the UK than in it, and that since it is a British invention partly developed in Britain some of the profits are bound to come back to UK limited? Perhaps he should have a word with the Chancellor.
"He says he's a beautician and sells you nutrition.
And keeps all your dead hair for making up underwear."
David Bowie, The Jean Genie
Shrink with horror?
In a shock statement last night, it was revealed that genetically modified calculations have been entering the statistics chain for several years now. It can now be revealed that for the past nine years statistical scientists have been splicing so-called "prior beliefs" into data using "Bayesian methods" in an attempt to make them more resistant to chance fluctuations. This has led to an immediate call from statistics consumer groups that all such calculations should be clearly labelled as having been Bayesianically modified and in fact that a three year moratorium on all such methods being used for public policy should be imposed.
Frank and Stein estimation?
Statisticians had long postulated that Bayesian calculation was possible but until the start of the 1990s nobody had actually succeeded in doing one. It was also known that so-called shrinkage estimators were theoretically superior to natural "raw data". Then, a breakthrough in computation showed how, with the help of "mixed-up calculation, muddled computing," algorithms (MCMC) the goal of n-Stein restriction could be achieved. Within a short while, statisticians of the so-called Bayesian persuasion throughout the world were engaged in mass sessions of self-simulation. By this technique, brain waves are fed into the computer, mixed with the data, subject to millions of random mutations and then fed back into the subject's brain via a so-called graphical interface. There are claims, however, that this technique is extremely dangerous and that, in particular, failure to converge can result in a bad trip. Non-Bayesian scientists are, in fact, claiming that Bayesianism is not a science but a cult, and point to mass indoctrination sessions held at regular if infrequent intervals in Spain during which mind-numbingly boring mantras are repeatedly chanted by Bayesian adherents.
Freak and Twist
Statisticians of the Bayesian school are fighting back however. They claim that there is absolutely no cause for the public to be alarmed and that Bayesian calculations are delicious, refreshing and nutritious; indeed, that they provide all that is needed for a coherent diet. They point out that many naturally occurring data-sets contain contaminants that Bayesian methods will help to down-weight. They also claim that the common so-called "freak and twist" methods have been contaminated with P-values for years.
The Bugs spread
Consumer groups are unimpressed. They point out that these calculations have been released without prior consultation. In a shock claim that has not been denied, they state that one of these statistically modified viruses has escaped from a laboratory in Cambridge and has spread like wildfire over the Internet appearing at thousands of locations throughout the world and indeed that these techniques are now being widely used by persons who have no idea at all what they are doing.
ICH and scratch
They also accuse the world-wide regulatory body for drug development, the so-called ICH, of having caved in to Bayesian demands without consulting the public. It seems that according to the notorious ICH E9 guideline, "the use of Bayesian and other approaches may be considered when the reasons for their use are clear and when the resulting conclusions are sufficiently robust."
Storm in a t-test
A Downing Street spokesman said that there was no need for the public to be alarmed. Successive British governments had been controlling the release of public statistics in terms of content and timing for years and there was no intention of changing this policy.
Of Simples and Simpletons
Simple. A medicine of one constituent.
Simpleton. One who thinks statistics can and should be made simple. One who seeks to serve the cause of medicine by promoting easy lies in preference to difficult truths.
All right thinking readers of SPIN will, I am sure, be well aware that the NNT (numbers needed to treat) is a moronic way to summarise the results of a clinical trial. For one thing, it depends entirely on the background risk, hardly an ideal property to summarise a clinical trial that is unlikely to be perfectly representative of the target population. No two statisticians communicating with each other would ever use such a device, preferring, I am sure, something like odds-ratios or log-odds ratios. Recently, however, it was put to me that physicians cannot understand odds-ratios let alone log-odds ratios and so they should not be used.
I can't pretend that I find this argument entirely unwelcome. I must confess to being rather jealous of all those statisticians who get to work in survival analysis. Look at all the books on the subject that they have produced: more than 30 when I last counted. There is even a journal devoted to this subject alone. Now consider proportional hazards, right censoring, Kaplan-Meier estimates, not to mention frailty models, time-dependent covariates and counting-processes. It is pretty clear to me that none of this is comprehensible to physicians, so we can just cut it all out. That should wipe some smiles off a few faces.
This argument opens the possibility of a whole new approach to medical research. Before employing any scientific device, we should make sure that the physician understands it. We may have our work cut out. Is that spirometer measuring forced expiratory volume in one second as the area under a flow-rate curve? If so, please check that the trapezoidal rule is being employed. If Simpson's rule or something more complicated is being used, you will have to replace the spirometer because the physician will never understand it.
And just think how much simpler life is going to be. The average physician has a very basic understanding of genetics, immunology and pharmacology. We can more or less cut out research in those areas altogether, since the results can never be applied. Just think how many ethical dilemmas we can avoid. For that matter, any physician using a computer, say for word-processing, should be forced to demonstrate that they have a working knowledge as to how it is put together.
And in this democratic age we can take the issue even further. No physician should be allowed to prescribe a drug to a patient if the patient does not understand the science. This will solve the problem of finding a cure for Alzheimer's at a stroke (not to mention the cure for stroke) since on this basis if you have it you can't be treated. We could more or less wipe out paediatrics as a discipline. Intensive care? Forget about it. And as for obstetrics: how many foetuses understand forceps, epidural blocks, caesareans or the finer points of scanning?
Yes, I think that this policy could really make life a lot simpler.
On the other hand, an alternative approach might be to "black box" things. We could stick with our log-odds ratios and simply invite the physician to input a background risk to some suitable software. Then, via the magic of computer programming, get the computer to spew out the predicted probability of death, cure or survival (or whatever) under the treatment.
Silly me! This is obviously far too complicated to contemplate.
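For the record, the "black box" in question amounts to a few lines. The sketch below is illustrative only: the function names and the log-odds ratio of log(0.5) are made up for the example, and it assumes that the treatment effect is constant on the log-odds scale whatever the background risk. It also shows, in passing, how the NNT implied by one and the same log-odds ratio swings with the background risk that is fed in.

```python
import math


def risk_under_treatment(background_risk, log_odds_ratio):
    """Predicted event probability on treatment.

    Assumes a constant log-odds ratio: move to the logit scale,
    add the treatment effect, and transform back to a probability.
    """
    logit = math.log(background_risk / (1.0 - background_risk)) + log_odds_ratio
    return 1.0 / (1.0 + math.exp(-logit))


def nnt(background_risk, log_odds_ratio):
    """Number needed to treat: one over the absolute risk reduction."""
    treated_risk = risk_under_treatment(background_risk, log_odds_ratio)
    return 1.0 / (background_risk - treated_risk)


# Hypothetical treatment effect: the odds of the event are halved.
LOG_OR = math.log(0.5)

# Hypothetical background risks a physician might type in.
for p0 in (0.02, 0.10, 0.40):
    p1 = risk_under_treatment(p0, LOG_OR)
    print(f"background risk {p0:.2f}: treated risk {p1:.3f}, "
          f"NNT {nnt(p0, LOG_OR):.1f}")
```

The physician supplies the background risk; the log-odds ratio stays where it belongs, inside the box.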
Statistics. The scrubber science. All anyone wants is a quickie without the encumbrance of a meaningful relationship.
Medical statistician, OBN. A sort of physician's flunkey.
This issue the GMcP column looks forward to the sort of medicine, and news items about medicine, we can expect in this new millennium. This will include
The Fullagro-Nicotine Combi-Patch
Increasing awareness that smoking is a major cause of impotence leads to a burgeoning market in combination therapy. (Why did it take us so long to realise this side effect of nicotine? How often have film directors given us shots of couples in bed smoking before they had sex?) To prevent those taking nicotine patches suffering a disastrous loss of libido, a Fullagro-nicotine patch is developed. Fullagro cigarettes also prove to be a big hit and an unanticipated benefit turns out to be that if you change your mind you can stuff them back in the pack much more easily. Fullagro cigars are particularly popular in America. Shag tobacco takes on a new meaning. Whisky distillers take the hint. "Highland Caber," a new blend of malt and fullagro is particularly popular with the Ibiza set and is ordered by saying "give me a stiff one".
The Emergence of APRIS.
Over-exposure to Australian soap operas leads to vast sectors of the British population being infected by Antipodean Posterior Rising Intonation Syndrome (APRIS). This acquired speech impediment causes every statement to be turned into a question by raising the intonation at the end of the sentence. This can lead to confusing social interactions. (In the following examples, the symbol ^ indicates that rising intonation follows.)
Waiter. Will you have the apple pie or the plum pudding?
Customer. The apple pie^.
Waiter. It's a sort of covered pastry tart with cooked apples in it.
Boy at party. Where do you live?
Home and away addict.
Boy at party. I said where do you live.
Girl at party. What do you do?
Neighbours-watching get-a-life moron. I am a statistician^.
Girl at party. Well if you don't know, I don't know.
The Scourge of GNUS
Genetic Nonsense Uttering Syndrome (GNUS) is an advanced form of Alzheimer's afflicting many who work in the pharmaceutical industry and in particular chief executive officers. Particularly noticeable when these give public lectures in which phrases like 'human genome project', 'genotyping', 'targeted drugs', 'new opportunities', 'specific reaction', 'more precise clinical trial' and 'increased profits' are improbably linked together with hot air.
The public's delight with all things alternative leads to an increasing interest in homeopathic medicine on the NHS. The National Institute for Clinical Excellence does a cost efficiency analysis and rules that such medicines are reimbursable on the NHS provided that the profit margin on the cost of active ingredient does not exceed 1 million percent. With street prices of arnica at one pound a milligram, homeopathic practitioners throughout the country are left wondering how they can make a living by charging 10,000 times nothing at all.
The Cure for Nicotine Addiction
As 'patching' sweeps the world, there is an alarming increase in the incidence of skin cancer. People who wear nicotine patches are particularly affected. Several eminent scientists claim that there is no proof of any causal link: it could plausibly be that those who are genetically disposed to "patch" are genetically disposed to skin cancer, rather in the same way that those who are genetically disposed to accept payment from patch manufacturers are genetically disposed to say this sort of thing. A new cure for patch addiction is discovered. It consists of rolling up tobacco inside a tube of paper, lighting it and inhaling. With the help of this ingenious device, patch dependence can be eliminated.
Call me old-fashioned, call me unimaginative, call me a curmudgeonly stick-in-the-mud, call me a cynic, why, yes, call me a statistician, call me what you will but I have a very simple view of matters. Whenever the medical advisor asks me how many patients we need I always give him the standard answer, "too many (and then some)". If you ask me why it is that we run clinical trials with at least dozens, often hundreds and sometimes thousands of patients, I will tell you that that is how many patients you need to tell that the treatment works at all: that is to say in some patients. Yes, yes my friends this is the sort of pessimism that statistics induces.
How much nicer to be a physician. A physician can look at a single patient and tell you whether the treatment worked or not. In fact the most important part of the clinical trial is the awards ceremony at the end. That's when the doc hands out the medals: a responder tag to him, a non-responder label to her. That's the genius of medicine. You only have to look at a patient to tell what would have happened to him or her, had he or she been treated differently. This is no doubt why the medical profession managed so well without clinical trials before statisticians came around poking their noses in everywhere and spoiling a good system. "It is true that Mr Brown died after I treated him but he would have died anyway, whereas Mrs Smith, who recovered, would not have done so without my help." Don't knock it, it's a system that gave us scalding baths for cholera and blood-letting for tuberculosis and let's face it, nobody would ever do anything so cruel to their patients unless they were utterly convinced it was in their best interest, so they must have been very good treatments. (And they were also so cheap!)
This reminds me of a joke. Two ducks in Ballymena. One says, "quack, quack". The other says, "he said ducks not docs"*.
How much better still to be a manager. It surely didn't escape your attention that recently geneticists have completed sequencing the human genome. (Did you know by the way that gene is a four-letter word and so is genome?) Expect a lot more in the way of portentous announcements from upper management and not just from CEOs in charge of USA limited and UK limited. Not only is the human genome project going to deliver a lot of targets for drugs, it's going to make clinical trials a lot cleaner because we are going to be able to screen for non-responders.
Of course, if I were going to play the bad fairy at this particular party, I would point out that in most clinical trials patient-by-treatment interaction is not identifiable (to use some nasty statistical jargon) and so not distinguishable from noise. Thus we do not know that the reason that some patients appear to respond and some do not is anything to do with a specific reaction to the treatment. It may be the operation of a hidden cause only temporarily associated with the patient or it may even be due to measurement error. What we also know, is that patient-by-treatment interaction provides an upper bound to gene-by-treatment interaction since human beings differ by more than their genes.
For example, we now know that grapefruit juice can interfere with the elimination of some pharmaceuticals. Therefore, unless and until we identify the gene for wanting grapefruit for breakfast, it seems reasonable to suppose that this is one aspect of variability (amongst many) that the human genome project is not going to eliminate. (Of course grapefruit juice itself is eliminated but that is another matter.)
We could have been investigating this sort of thing all these years. We could have been running multi-period cross-over trials in chronic diseases. We could have been using sequences of n-of-1 trials to check whether there really is patient-by-treatment interaction or not but of course this would have left us open to the possibility of carry-over, a problem so serious that it prohibits all within-patient designs.
How fortunate then that our industry has such wise captains, who can "know" what the cause of variability in clinical trials is without having had to run the experiments to find out. It's a pity that they never succeeded in persuading the MCA and the FDA to let genius, flair and insight decide whether treatments worked, rather than being forced to waste time on all those expensive trials, collecting something as old-fashioned as evidence. To take a related field, just think of all those years that psychologists were carefully carrying out twin and sib studies to see what was nature and what was nurture. If only they could have had the wisdom of pharmaceutical industry management, think of the trouble they could have saved themselves. That is what genius, flair and insight does for you.
Edison may have said, "Genius is one percent inspiration, ninety nine percent perspiration," but he was talking about genius not geneius.
This summer's best seller. Men are from Mars, Women are from Venus, CEOs are from Uranus.
*In the original version the other says, "for goodness sake I'm going as quack as I can."
Meet the Archies
A paper in the Albion Physician’s Enquirer (APE) has compared meta-analyses produced by members of the Archie Association (AA), an organisation devoted to summarising and disseminating medical evidence, with those produced by the pharmaceutical industry (PI). The authors, who are members of the AA, find that when compared using an instrument created by members of the AA and validated by members of the AA, meta-analyses carried out by other members of the AA are of higher quality than those performed by the PI. This staggering piece of impartial research (rather like comparing ballet shoes and ski-boots at Covent Garden with Darcey Bussell doing the judging), deserves to be made as widely known as possible. Yours truly was reminded that there is a long-running programme on the wireless (the word is appropriate for something that started fifty years ago) in which rural drama in deepest England is used as a vehicle for imparting nuggets of wisdom to the intellectually impoverished. I have thus proposed to the Corporation that an episode of The Archies be devoted to this very topic and am in the privileged position of being able to give readers of SPIN an aperçu of the script, which now follows.
The Archies: an Everyday Story of Database Summarisers
Dan Archie, a stalwart overviewer.
Doris Archie, his wife, an equally stalwart analyst.
The Reverend Man, the local vicar.
Scene: It is a Wednesday evening in the kitchen of Dan and Doris Archie’s evidence base in the village of Umbrage near Boretester. Dan and Doris are sitting at the table. There is a bottle in front of them and each has a glass.
Doris. But I thought we was in favour of farmers, Dan.
Dan. Aye, that we be, Doris, but you mean farmers not pharmas. It be these pharmas we must look out for.
Doris. How so?
Dan. Now Doris, you haven’t been a-sleeping through the vicar’s sermon again have you? Surely you remember what he said last Sunday.
Doris. Can’t say as I do. What did he say? [There is a knock at the door. Doris gets up to answer.]
Doris. Why, hello Vicar! What a coincidence! You’ll never believe this, but we was just talking about you. Will you come in and have a glass of Cowslip wine?
The Reverend Man (For it is he.) With pleasure Doris, provided only that the dose is large enough. Anything else would be unethical. [Doris rises to get a glass. Various clinking and gurgling noises, provided courtesy of the Corporation’s Stereophonics Workshop, may be heard.]
Dan. Vicar, we was just discussing your sermon on Sunday. Could you remind us of some of the finer points?
Rev Man. I don’t want to chide you, Dan, but surely you will remember that the important thing in summarising anything is to include all the points, even those of lesser quality? To do otherwise is to lay oneself open to the charge of bias. [Doris and Dan make the sign of the cross.] Of course, you may always draw attention to the doubtful nature of low-quality points in an aside.
Dan [Crestfallen] Well, tell us the whole thing then, Vicar.
Rev Man. I was discussing "The Ten Commandments of Overviewing", which are as follows
Doris. Is that it?
Rev Man. Well, of course, these are just the points I covered in last Sunday’s sermon. But this is an ongoing story. We can always find more bits to add. I think you will agree that what has been said so far is highly significant. Over the next few Sundays, as I add more and more to this theme, I shall prove how relevant it is.
Dan. But I have heard tell as to how there be an eleventh commandment?
Rev Man. Ah, yes Dan. You are referring to, "thou shalt castigate as idiocy all recourse to random-effect models". That is a highly controversial point. It has been proposed by the Oxford Movement, but although it has a long and venerable tradition, it has not been universally accepted as canon law. In particular Bishops of London and Cambridge have spoken against it. You and Doris have only done the Alpha Course. You need to do the Beta and Delta Courses on sample sizes first before I can discuss that one. In the meantime I would stick to more fundamental matters, if I were you.
Doris. You mean we should be fundamentalists?
Rev Man. [Smiling.] No not exactly, Doris. I mean, that if you always remember to go through your quality assessment exercises you can’t go far wrong. Remember that we have been promised that when two or three of us are gathered together we may validate our instruments. Now that reminds me. It seems to me that it is a long time since either of you went to the annual retreat and you know how important those are to your religious development…
…to be discontinued
Stop Press News: Ski-Boots Better After all.
In a recently conducted trial in Kitzbuhel, Hermann Maier (The Hermannator), using a slalom course set by the Austrian ski-coach, claimed to have proved conclusively that ski-boots were, contrary to previous research, better than ballet shoes. A tight-lipped spokesperson for the Archie Association said that there was no evidence that slalom courses had been validated, and, in the absence of such evidence, no credence whatsoever could be given to this research.
'There's no such thing as objective marking.'
Malcolm Bradbury. The History Man
In my dim and distant youth, in the days before I joined the Elixir Laboratories of Pannostrum Pharmaceuticals, I taught adults for a living in an institute of higher education: the College of Buchaillemore. The teaching load included a fair number of ancillary courses. One that will not easily be forgotten was 'introductory mathematics and statistics' for the first year quantity surveyors or "Brick One", as they were known. What scallywags they were: filling in the temporary register with names such as Dick Tater, Cliff Erosion, Hugh Jarse and Juan Kerr in the hope that their lecturer would read them out at the next lesson. The extremely comely tutorial assistant had a busy time of it, poor thing. A male hand would be raised for help. However, if yours truly approached, the hand would be lowered. Still, I suppose everybody benefited: my voluptuous assistant got exercise both physical and mental, which can only have done her good, the quantity surveyors discovered pleasures in learning statistics they could not have imagined and I was entertained.
It occasionally happened that students appeared to fail on one or other of the courses that I taught. I say appeared to fail for such occurrences were usually temporary. If for any reason the examination board had failed to give their case a sympathetic hearing, the director of student services, Mrs Mahen, could always come and plead their case at the appeals board.
Mrs Mahen: The reason that John McSlacker has failed is that he has been working nights as a barman to pay off his debts and so has been unable to study at all. (Murmurs of sympathy all round.) Personally, I think that speaks volumes for his sense of social responsibility.
Dr McPearson: (Suspecting from McSlacker's class attendance record that he must have a day job too.) But is his student grant* not sufficient? (Intakes of breath, tut-tutting at this unsympathetic and ungentlemanly line of enquiry etc., etc.)
Mrs Mahen: (Triumphantly putting down a smart-arse.) He had to pay off the loan obtained for the purchase of his motorbike.
Principal Conniver: (Intervening swiftly, fully aware that McPearson is aware, that the hall of residence in which McSlacker lives is half a mile from the College and seeing McP's next question a mile, if not half a mile, off). Well what could be clearer than that? I think we must, in the name of fairness, condone this failure. Who is the next student we need to consider?
However, sometimes, in order to obviate the necessity for an appeal and if, perhaps, there had been an excessive number of failures on the course, there was a decision at the examination boards to re-scale the marks or, as they say in America, "mark on a curve". Attentive readers will know that your columnist is a statistical totalitarian, who holds the radical view that statistical method should be applied to everything, even statistics. Not to apply statistical reasoning to statistics itself would, of course, be the ultimate hypocrisy.
However, hypocrites is what the majority of statisticians are, believing that statistical reasoning should be applied to anything and everything but not statistics. Do you think I am being unfair? Well, what statistician when consulted with a request to design an experiment would say, 'don't worry about that, just start and see how you get on'? But is that not how all statisticians do simulations, which are, after all, their own experiments? What statistician would recommend presenting results from an experiment without measures of precision? But how many statisticians in quoting the results of simulations give you standard errors as well as means? If a physician comes to a statistician saying, 'I want to screen for disease. The false positive rate of my test is 10%, the false negative rate may be as high as 70%, the exact prevalence is unknown but the disease is fairly rare and the treatment is not very effective,' will the statistician reply, 'excellent, go ahead!'? Yet this is exactly the approach that PSI recommended at one time for screening for carry-over in cross-over trials.
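For readers who like to see the physician's screening arithmetic spelled out, here is a minimal sketch. The false positive rate (10%) and false negative rate (70%) are the figures quoted above; the prevalences tried are purely illustrative assumptions, since the physician only says the disease is "fairly rare", and the function name is hypothetical.

```python
def positive_predictive_value(prevalence,
                              false_positive_rate=0.10,
                              false_negative_rate=0.70):
    """Probability that a subject who screens positive really has the disease."""
    sensitivity = 1.0 - false_negative_rate            # chance a diseased subject tests positive
    true_positives = sensitivity * prevalence
    false_positives = false_positive_rate * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Assumed prevalences, chosen only to illustrate "fairly rare".
for prevalence in (0.01, 0.05, 0.10):
    ppv = positive_predictive_value(prevalence)
    print(f"prevalence {prevalence:.0%}: {ppv:.1%} of positives are true positives")
```

Under these assumptions the great majority of positives are false alarms, which is presumably why no statistician would reply "excellent, go ahead!".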
To return to examination boards, my radical proposal was that since we wished to transform marks on the interval 0 - 100 so that they remained on the interval 0 - 100 (but were increased), we should use a logit transformation. My colleagues were of the unanimous opinion that this was far too difficult to contemplate.
But I persisted. I prepared a little table for the next examination board. A more modern version of this is in the Excel® sheet attached. You then simply imagined what you would consider a mark of 50 ought to be improved to and this then predicted how all other marks should change. For example, if you thought that 50 should really be 58, then you looked for the row which had 50 in it and then found the column with 58. This was then the column you used for re-scaling all marks so that, for example, a mark of 35 becomes a mark of 42.6.
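The table itself is not reproduced here, but one reading of it can be sketched in a few lines. The assumption (mine, not stated in so many words above) is that every mark is shifted by the same constant on the logit scale, the constant being fixed by where you want a mark of 50 to land. The function names are hypothetical. This reading does reproduce the quoted example of 35 becoming 42.6 when 50 becomes 58.

```python
import math


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def expit(x):
    """Inverse of logit: maps any real number back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def rescale_mark(mark, fifty_becomes):
    """Shift a mark on the logit scale so that 50 maps to `fifty_becomes`.

    `fifty_becomes` must lie strictly between 0 and 100.
    Marks of exactly 0 or 100 are fixed points of the transformation.
    """
    if mark <= 0 or mark >= 100:
        return float(mark)
    shift = logit(fifty_becomes / 100.0)  # logit(0.5) is 0, so this is the whole shift
    return 100.0 * expit(logit(mark / 100.0) + shift)


for mark in (20, 35, 50, 65, 80):
    print(mark, "->", round(rescale_mark(mark, 58), 1))
```

Unlike a piecewise linear adjustment, the re-scaled marks can never stray outside the interval 0 - 100, and the order of the candidates is preserved.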
It made no difference. It was perfectly acceptable for us to teach our own mathematics and statistics students the theory of generalised linear models and to examine them in it (bearing in mind, of course, the necessity of being generous in marking) but re-scaling marks using a logit transformation? Who on earth could have any faith in something so complex? Who would understand it? How would you explain it? My scheme was not implemented and we continued to adjust marks on the ancient and trusted piecewise linear system.
It occurs to me, however, that my little table might be useful after all: if not at examination boards then to the biostatistical community. Most physicians, I am told, have the greatest of difficulties in understanding odds-ratios and seem to be as averse to them as statisticians at an examination board. However, apparently nomograms are very much appreciated in the same quarter. Indeed some who think physicians will have difficulty understanding odds ratios have promoted quite complex nomograms for sample size determination. So I have added on a little odds-ratio coda to my table.
First catch your odds ratio. Then once you have it, see how it transforms the mark of 50. Once you have that, you can use the main table to see how it transforms any mark whatsoever. Yes, folks, that's the magic of statistics for you.
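The odds-ratio coda amounts to a little arithmetic, sketched here under the assumption that a mark out of 100 is treated as a percentage chance. A mark of 50 has odds of 1, so an odds ratio of OR sends 50 to 100 × OR / (1 + OR), and any other mark is moved by multiplying its odds by the same OR. The function name is hypothetical.

```python
def apply_odds_ratio(mark, odds_ratio):
    """Multiply the odds implied by a mark (0 - 100) by `odds_ratio`."""
    if mark <= 0 or mark >= 100:
        return float(mark)              # the end points do not move
    odds = mark / (100.0 - mark)
    new_odds = odds * odds_ratio
    return 100.0 * new_odds / (1.0 + new_odds)


# First catch your odds ratio (2 is an arbitrary illustration) ...
odds_ratio = 2.0
print("50 ->", round(apply_odds_ratio(50, odds_ratio), 1))
# ... then see what it does to any mark whatsoever.
print("35 ->", round(apply_odds_ratio(35, odds_ratio), 1))
```

An odds ratio of 58/42 corresponds to moving 50 to 58, so it is the same re-scaling as a constant shift on the logit scale.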
Of course, anybody who wishes to have a more extended version of the table is welcome to contact yours truly, who will supply it in return for a typically modest+ fee.
* Yes, in those dim and distant days, students got grants
+This being a highly appropriate adjective for anything associated with GMcP
How wonderful it is that we have such a high quality medical press. It cheered my heart recently to read editorials in the Speculum and the Albion Physician's Enquirer (APE) announcing that the editors of the world's leading medical journals are banding together to put a stop to the evil machinations of the pharmaceutical industry.
For example, it seems to be a particularly nasty and growing habit of this industry to employ so-called Contract Research Organisations for running and analysing trials: to use professionals (I can scarcely bring myself to write this ugly word) when they could be using amateurs such as academics. Yes, yes, instead of having audited data trails, quality control, timely and high quality data, and so forth we could go back to the good old days of having mixtures of text and numeric data in the same field, missing consent signatures, source data filed in the waste-paper basket and all the other advantages of the academic approach to collecting data. I am reminded of the wonderful scene in Chariots of Fire when the masters of Trinity and Caius (played by Sir John Gielgud and Lindsay Anderson) confront Harold Abrahams (Ben Cross) with his awful crime of employing a professional coach. "Here at Cambridge we favour the approach of the amateur." And who, when asked to compare the two forms of that game with two codes (which the French call La Philosophie Ovale), would not have affirmed the moral superiority of Union (as it was!) over League? Publicly paying working men for time lost on the field? The very idea! Bungs under the table and Masonic handshakes are the way that things should be done.
And think of the moral dimension we shall gain. CROs are in it for the money. This makes them inherently evil. Academics on the other hand, apart from a desire to publish as much as possible, and thereby earn fame, promotion and a better living, have nothing at heart but the good of mankind. Look at the history of the Nobel Prize. Never in all the hundred years of its existence have there been any attempts to influence judges, steal credit, re-write history, unfairly upstage colleagues and so forth by any scientist anywhere in the world: truly Nobel behaviour.
There is the further advantage, the so-called Teflon factor. Mud does not stick in the academic world. For example, if (as has regrettably often been the case) academics have faked data, their co-authors are morally blameless. The University involved will set up a commission, which will roundly condemn the disgusting individual concerned. Particular stress will be laid on the awful crime he or she has committed in bringing his or her senior colleagues into disrepute by association when all they wanted to do was add another publication to their CVs. The idea that these co-authors, let alone the university, still less the whole academic community, should be tarnished by association is quite frankly ludicrous. On the other hand, if one ambitious marketing person in one company attempts to exert some undue influence on a publication, then the whole of the evil and global (ugh!) pharmaceutical industry is guilty. This is because these swine are all feeding at the same trough. Academics on the other hand are ploughing a lonely furrow, following their divine inspiration wherever it leads them (albeit frequently congregating on the author lists of publications).
And there is yet another advantage. For example a fair way to compare standards in the pharmaceutical industry and the medical press does not involve average quality. No, no! It is appropriate to take the very best examples of mega-trials published in the literature (with collaborative input from eminent statisticians) and compare these to the worst examples of abuse in the industry. The bottom of the one distribution should be compared with the top of the other. (It is an indictment of the total lack of imagination of statisticians that this procedure is found in no standard statistical textbook.)
And think of all the creativity we can let loose as soon as we analyse things the way they do in the Speculum and the APE. A favourite example of mine from the Speculum some years ago illustrates the sort of innovation in analysis we could have. Patients were treated for several months in a cross-over trial. For each patient, for several outcomes a significance test was carried out using days as independent replications to compare the two treatments. Patients were treated for several months on each arm and in fact the investigators had values of n1 and n2 in excess of 100 for each patient for each outcome. The authors had significance pouring out of their ears. (A slight criticism must be entered here. If only the authors had used hours instead of days this procedure could have been made even more efficient.) On the other hand if we look at the miserable ICH E9 guidelines that the pharmaceutical industry uses, the authors would have had to pre-specify the analysis and would certainly not have been allowed to use the one that appeared in the Speculum. They would not have been allowed to treat dependent data as independent and would have had to deal with the multiplicity problem. Just imagine how much more rapidly medicine would progress if we could have the editor of the Speculum run the MCA.
So I am looking forward to this brave new and pure world. I am just hoping that that other most controlled of all industries is sitting up and paying attention. The next time I fly to Paris I don't want to hear "this is your captain speaking" over the intercom. What I want to hear is "Hallo. This is Professor Norbert Know-all, Daniel Bernoulli Professor of Aeronautics at the University of Perfection-on-Smug. I shall be flying your plane tonight. You will be pleased to know that together with my colleagues we have made a few improvements to the design of the machine. The stewardesses, who are all sociology postgraduates, will shortly be coming round with some questionnaires for you to fill out. These, when analysed, will appear in their PhD theses. Unfortunately this means they will not have time to serve you any food or drinks and we apologise if our pre-flight publicity has been misleading in this respect. We should be cleared for take-off shortly and you will be pleased to know that the Hillingdon Amateur Radio Association, who meet every Wednesday in the local scout hut, are in charge of traffic control at Heathrow tonight."
STOP PRESS NEWS. Speculum wins Nobel prize .... for fiction.
Invincible Ignorance. "A term in moral theology denoting ignorance of a kind which cannot be removed by serious moral effort." The Concise Oxford Dictionary of the Christian Church.
Company Regulatory Affairs at Pannostrum Pharmaceuticals (CRAPP) do not usually feel the need to bring statisticians with them when they go to see the Pangean Commission (which I shall just refer to as the Commission from now on) at the Pangean Pharmaceutical Evaluation Agency (which I shall just refer to as "the Agency" from now on). This is partly because the Agency (unlike another famous agency) does not employ statisticians (of which more anon) but also because CRAPP have found from bitter experience that statistical arguments are much easier to understand if you don't have statisticians to explain them. For example, statisticians will try and say things like, "a P-value of 0.04 does not mean that there are only 4 chances out of a 100 that the drug does not work," when every non-statistician knows that it does. They will also baffle you by informing you that you cannot assume that things are the same because they are not significantly different. As regards the latter point, I once gave the head of CRAPP this helpful analogy by way of explanation: a person who has been acquitted of child-molesting is not automatically your first choice as babysitter. She replied that a) her children were now grown-up, b) she had always taken the greatest care in her choice of babysitter and c) if only I knew how difficult it was to raise a family while pursuing a career I would not make such hurtful remarks.
However, in some fit of madness, CRAPP decided to take me along to a recent hearing at the Agency for one of our products, Sniffgon™, which we wished to extend to use in SARN (Seasonal Allergic Runny Nose). The Agency, of course, does not employ statisticians (of which more anon) but some of the Pangean member states do and in any case it is of course the member states, or their representatives, via the Commission who decide on the fate of submissions. As regards statisticians, Hyperborea has several, Teutonea has one or two and Calcamalbion now has three or four. Admittedly, of the Tethic states only Aegea has one but all in all there must be at least a dozen statisticians working for Pangean agencies, which is an average of nearly one per member state and therefore clearly adequate. You might think, therefore, that at a meeting of the Commission you might find a statistician or two. You'll be lucky! Each member state sends two representatives and with only two to send, they are unlikely to send a statistician, assuming they have one. Besides which, the Agency doesn't employ statisticians (of which more anon) so why should the Commission send any?
This time our luck was out. On entering, I scanned the room carefully. My heart sank. No statisticians that I could recognise were to be seen.
This did not, however, stop one of the assessors from launching into a long 'statistical' criticism. In one of our trials it seems that the allergic status of patients had been established two weeks before the start of the trial. This meant, the assessor explained, that since the status of some patients may improve from time to time, the trial could have consisted of a mixture of responsive and non-responsive patients. This in turn meant that the groups could now differ randomly and the test of significance would be invalid since the results might be due to "random chance" and this could be the explanation of the observed value of P=0.0003.
I explained as tactfully as possible (tact being one of my natural virtues, along with modesty) that the whole point of a significance test was to explain to what extent the results could be explained by "random chance." In this respect the results would be no more or less valid from this trial than from any other. Indeed there were probably dozens, possibly hundreds, maybe even thousands and perhaps as many as 30,000 different ways patients in any trial could differ from group to group through genetics alone, if the biologists were to be believed*. The whole point of randomisation, I continued, was to make sure that there were only two possible explanations of any result: "random chance" or an effect of treatment. Of course, I carried on, where covariates had been measured, and they were believed to be predictive of outcome, they could be used to refine one's opinion as to the extent to which "random chance" was the explanation. However, the whole point of randomisation was that it permitted one to apply the property of the average to the individual case to the extent that the individual case could not be recognised as differing from the average. In the same way, insurance companies could validly set premiums for individuals using the experience of populations, provided that the individuals could not recognise that their risk differed from the population average.
I was rather enjoying the debate at this point, seeing it as providing me with a golden opportunity of proselytising heathens in a place in which statisticians are famously absent (of which more anon). My argument, however, was surprisingly badly received. It seems that the assessor preferred to think that the reason that I was disagreeing was not that the subject lay in my competence rather than the assessor's but simply that I was clutching at straws in a desperate attempt to defend a dossier whose fatal flaw had now been exposed. It might be the case that five out of our six trials were significant. However, only two were pivotal Phase III trials, and one of these had now been exposed as having results that might be due to "random chance", which left us with only one pivotal significant trial, which was not enough. I looked around the room to see whether this nonsense was producing the same effect on the rest of the assessors that it was on me. However, there were of course no statisticians present, the Commission taking its cue from the Agency, which does not employ any (of which more anon).
I was about to reply again, when a sharp kick on the shins from the head of CRAPP warned me that it had been decided to beat a hasty retreat. We then thanked the Agency for their trenchant criticisms, made several placatory noises and left.
On the way back to the office I expressed my dissatisfaction with the whole process to the head of CRAPP. Why, I said, (this is now the "more anon") since the whole thing was run by amateurs who clearly had no knowledge of clinical trials, did they not employ a statistician as a sort of "clerk of the court" who could give a technical opinion where technical advice was needed. She explained to me that the Agency was not involved in evaluation and therefore did not need to employ statisticians. I remarked that this was a brilliant principle. Could she tell me who, apart from statisticians, were usually involved in assessing regulatory dossiers? Why, she replied, I must know that physicians, pharmacists, pharmacokineticists and so forth were all involved in this process. In that case, I replied, since the Agency was not involved in evaluation, I assumed it took care to employ none of the above. Don't be ridiculous she replied, how could the Agency perform its job unless it employed such people and had it ever occurred to me that I had a very odd way of looking at things and a most perverse manner of expressing them?
I am sorry, I replied tactfully, it's just that I have this prejudice that when it comes to logic, people who can't count don't count.
* That is assuming that genes act singly and we don't need to worry about interactions and that a group of scientists who recently thought that there were 100,000 genes can be trusted when they now say that there are 30,000.
Next issue. The role of Astrology in evaluating medicines: Cancer and oncology, Gemini and infertility, Libra and vertigo, Aquarius and urology, Virgo and impotence...Taurus and the Agency.
An anniversary is a time to look back but also a time to look forward; and what, you are doubtless itching to know, does GMcP see when looking forward? Well here is a riddle for you. "What do lap-dancers and GMcP have in common?" Answer: they both see a lot of silicone looking forward. Yes, folks, the future of PSI is silicone.
Now I know what you're thinking: Guernsey, with his penchant for terrible puns and sarcasm (or should that be Sarkasm), has decided to talk about the breast-implants story and its implications for the pharmaceutical industry and we are now in for some terrible and tasteless puns: implants go bust, storm in a D-cup, thanks for the mammaries, and so forth. Rest assured; this article is not about that and I shan’t use any of those terrible puns: delicacy forbids. This is not to say that this is not an important topic. Nobody is immune from the cupidity of lawyers and the stupidity of juries. "Twenty years ago my client had breast implants, five years ago she developed connective tissue disease. The one clearly caused the other. We now need $10m in compensation." More than half of this, of course, will go to the shysters. Clearly if that can work for implants why not for pharmaceuticals? We should all tremble for our pensions. (Or marry a lawyer as an insurance policy.) Well, I may cover this topic at a future date (I find myself curiously attracted to it), but instead I am going to talk about that other use of silicone: to make chips for computers. Yes, GMcP is keeping abreast of all developments. I am going to talk about simulation.
Simulation, virtual drug development, in silico development: these are all crucially vital topics of the day. We are in for a wonderful new era in drug development. This was made clear to me the other day. I had an invitation from the Pannostrum marketing department to join them to hear a presentation on the topic by a management consultant. It was amazing stuff. He started out with PK data from eight patients, fed in a possible PD model, pressed a button on his laptop and before you could say "modern miracle" had therapeutic outcomes for 200 patients. "So," he said, "we can now feel a lot more comfortable and confident about the effects of the drug." It is no exaggeration to say that all present were absolutely staggered.
Admittedly our reasons for amazement were somewhat different. The marketing men were amazed that information about the effects of drugs could be had so easily. I was amazed that marketing men could be had so easily. This, despite years of having had to put up with the nonsense they produce. I thought the time was ripe for a little question. "Two hundred patients is rather few for a Phase III study", I said, "would it be possible to simulate data for 1000?" The speaker shot me a look that would have curdled milk. I could see he had noticed me for the first time. I was rumbled: battered tweed jacket, tie at half-mast, very old-fashioned teeth. It was quite clear that I was quite a different proposition to the Armani suits filling the rest of the room. He’d sussed me for a statistician. "Well, of course", he added, "simulation could never replace a Phase III programme. However, it can make us more confident about what we will see in a Phase III programme." The Armani suits all nodded wisely in agreement.
Actually, I nearly agreed with him. Why? Well because simulation is just mathematics by other means and of course mathematics, or at least Statistics, is the way that we calculate the implications of the work that we have done. Simulation is just a means of doing convolutions, which is to say of performing the numerical integrations that are necessary to mix one distribution conditional on another. Why, I have used simulations myself to check theoretical results and the two have always agreed most admirably, thus proving that either my theory was correct or that I had made the same mistake in my simulation as my theory. And, of course, we even simulate to do calculations these days. The robustniks bootstrap everything, we use multiple imputation for missing data, and the way that Bayesians use simulation to do their calculations bugs everybody. No, it’s not using simulation to do integration that I object to – that is done all the time – it is using simulation for multiplication that I find objectionable. Eight patients are eight patients and so should remain.
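By way of illustration of simulation as integration, here is a minimal sketch that does a convolution by brute force: it estimates the probability that the sum of two independent standard Normal variables exceeds 2. The example, the threshold, the number of draws, the seed and the function names are all assumptions chosen for the illustration. The answer is known exactly in this case, so the simulation can be checked against theory, and it is quoted with its standard error.

```python
import math
import random


def estimate_tail_probability(threshold=2.0, n_draws=100_000, seed=1):
    """Monte Carlo estimate of P(X + Y > threshold) for independent N(0, 1) X and Y."""
    rng = random.Random(seed)           # fixed seed so the run is reproducible
    hits = 0
    for _ in range(n_draws):
        if rng.gauss(0.0, 1.0) + rng.gauss(0.0, 1.0) > threshold:
            hits += 1
    p_hat = hits / n_draws
    # Binomial standard error of the simulated proportion.
    standard_error = math.sqrt(p_hat * (1.0 - p_hat) / n_draws)
    return p_hat, standard_error


def exact_tail_probability(threshold=2.0):
    """Theory: X + Y is N(0, 2), so P(X + Y > t) = 0.5 * erfc(t / 2)."""
    return 0.5 * math.erfc(threshold / 2.0)


p_hat, se = estimate_tail_probability()
print(f"simulated: {p_hat:.4f} (standard error {se:.4f})")
print(f"exact:     {exact_tail_probability():.4f}")
```

More draws shrink the standard error of the integration. They do nothing for the uncertainty in whatever eight patients were fed in at the start.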
However, if you can’t beat them, join them. And that is my advice to you all. We are all going to have to learn a lot more about in silico development. At least I know that this is true for Pannostrum Pharmaceuticals. The Marketing Department has a lot more influence than Biostatistics and we had some wonderful memo from on high only the other day about streamlining development, faster time to market, proof of concept blah, blah, blah, virtual drug development, simulation. Did I forget anything? Oh yes, pharmacogenomics and theranostics. The share price reacted very favourably. So I have enrolled on a course that is now being run within the company by the same group that provided the management consultant.
This led to a strange and disturbing conversation in the GMcP household and I should be grateful if any member of PSI, perhaps some female member with a better understanding of the thought processes of the fair sex, could explain it to me.
GMcP. Pannostrum are paying for me to attend a course to learn about simulation.
Mrs McP. People pay to learn about that? I have been doing it for years.
In the year 3535,
Ain't gonna need to tell the truth, tell no lies;
Everything you think, do, or say
Is in the pill you took today.
Zager and Evans
PSI annual conference is a great event, to be sure, but somewhat depressing for us wrinklies. Everybody else looks so young. There is scarcely anybody there who could be expected to remember that in August 1969 the song, 'In the Year 2525,' by one-blockbuster-wonders Zager and Evans reached number one in the UK charts, thus repeating the trick it had pulled off in the previous month in the US. Perhaps, at the next annual meeting, rather than dancing to 'YMCA' for the third time of the evening we could have a go at Z&E.
Now, to have a top-seller both sides of the Atlantic is something of which our marketing men are always dreaming, but it has to be admitted that even by the recent rather torpid standards of Pannostrum Pharmaceuticals, a launch date of 3535 is rather tardy and not particularly impressive to the investors. Our management is becoming increasingly anxious about performance. New ideas are what we need, apparently, at least so we were told in a memo recently from our CEO, Sir Lancelot Pastit, informing us that he had detailed his 'Millennium Task-Force' (MTF) to come up with some. Call me cynical and old-fashioned, but I thought we needed new drugs not ideas. The MTF is peopled by re-cycled marketing men, and whereas I have never denied their ability to secure old drugs and thereby benefit the Colombian economy, I have found them about as much use as a laptop in a kayak when it comes to developing new ones.
In the last GMcP I told you of my unfortunate involvement with the simulation-sellers. You would have thought that my management would have learned from experience, but blow me (to use a rather old-fashioned phrase) if I didn't find myself signed up for the theranostics presentation organized by the MTF. "Theranostics?", I hear you cry. "What can that be?" Well, let me say it could just as well have been called 'diapeutics'. It is the synergistic fusion of diagnostics and therapeutics to deliver personalized targeted medicine based on individual characteristics. Or, so it was explained to me.
Wonderful, I thought. The PK people will love this. At long last marketing is going to have to cave in and allow us to dose drugs by bodyweight, which is what we have been trying to persuade them to do for years. Everybody knows that with our current one-size-fits-all mentality the rugby front row don't get the full effect, whereas the little grannies are likely to suffer overdoses. Dosing by bodyweight is logical and scientific.
No, I was told. It is far too difficult to sell drugs by bodyweight. It is too complicated for the prescribing physician and in any case the opposition with their one-dose, once-a-day alternatives would kill us dead.
"Let me get this straight", I said. "You will give patients a different pill based on their individual genetic codes but you are not prepared to give a different dose based on body-weight. You will diagnose people with gene chips but not with bathroom scales?"
"Yes, we must exploit the exciting promise of pharmacogenomics," came the reply, "and Pannostrum will be at the forefront of the new theranostic technology."
Now, if I look at the history of medicine, it seems to me that it has not been the diagnostics that have been lacking but the therapeutics. We could diagnose diabetes more than two millennia before we could treat it. My humble opinion is that we had better spend our time rather urgently at Pannostrum finding some new drugs that actually worked, rather than worrying about establishing who they worked best for. Simultaneously finding wonderful drugs and perfect patients wasn't going to make life much easier for us. (And don't think I don't realize, as a statistician, that we have to think factorially. It is not main effects we would be looking for but interactions, and in any case you can't allocate patients their genes. These are blocks not treatments.)
However, I am nothing if not positive and helpful and volunteered the following wonderful creative insight to the MTF. If Sir Lancelot doesn't nominate me for a special bonus this year, he is not the man I take him for.
This is my idea. There is an extremely important division of human beings on the basis of genetics, which has the potential to make a considerable difference, if not to the actual treatment that should be given, then to the optimal dose. Furthermore, this genetic division forms two subtypes distributed with almost equal frequency within the human population, which is the optimal situation if you wish to tailor-make therapy. (After all, if there are more than two subtypes it gets complicated and if one is very rare, it is hardly worth bothering about.) This is a major genetic difference, which has its origin in a massive chromosomal deficiency in one subtype and which leads to a considerable difference in phenotype. The medical importance of this can be judged by the fact that life expectancy at birth in the deficient subtype is several years less than in the normal form. Yet currently, for the vast majority of treatments, we take no account of this genetic phenomenon in prescribing drugs. The only bad news is that this genetic difference is rather easy to detect, so that we will have some problems patenting the diagnostic part of the therapeutic strategy. On the other hand we shall be well ahead in the therapeutic race if we start organizing our trials to look at these differences.
I think I had the MTF quite excited. What are these genetic subtypes?
"Oh that's simple", I said. "They are called men and women."
Dredging for P
I write this at the end of another festive season. So cheers! Is your glass half empty or half full? The question is not without relevance but I must remind you that patience is a virtue, Rome was not built in a day, make haste slowly and short cuts make long delays, as all of us who work in drug development surely know. You will have to wait for the relevance of this to be revealed.
Your columnist is occasionally called upon to lecture. This is not, of course, because he is believed to have anything interesting to say. I suspect the phenomenon is similar to that which affects the pop charts these days. In addition to seeing them occupied by the fit, young and beautiful, the occasional haggard has-been can be seen on television crooning (or croaking) some duet with some nubile young thing. Why people like this, I don't know. It may be nostalgia or simply charity. Sometimes I feel that I am being wheeled out (not quite literally yet) as a warning to the youngsters. 'Look what happens to you if you don't make that career move into project management or marketing - you turn into a boring old statistician.'
Occasionally, double-acts have their unintended opportunities for humour - at least of the debased sort that makes GMcP laugh. A couple of years back, when contributing to a series of statistical lectures being given to a medical audience, I was rather depressed at the prospect of having to follow a lissom young lady. She was getting a terrific reaction. All my subsequent appearance was going to do was produce a tidal wave of disappointment from the male half of the audience without any compensatory ripple of appreciation from the ladies.
However, serendipity is the greatest factor in any success, as anybody who has made any survey of the performances and abilities of pharmaceutical CEOs must surely conclude. I was offered an unexpected gift. The young lady explained that the concept of the 5% significance level, like all great ideas, occurred in the bath. Before Fisher and Yates, tables were tabulated in terms of the statistic rather than the significance level. It was for copyright reasons that Fisher decided on reverse tabulation (like anti-logs), using significance levels rather than values of the statistic. But what level to use? This is where the bath comes in. Fisher spotted his five toes - et voilà - 5%.
This, of course, was a godsend to yours truly, who opened his lecture by explaining that he was tempted to provide the missing explanation as to how Fisher, in his bath, came upon the idea of the 1% level of significance, but that good taste had prevailed. From that moment onwards, the success of the lecture was assured. And this story brings me to the point, so to speak, of this piece. In fact, the purpose of this rambling is precisely to discuss significance levels and also P-values.
Now, I don't know how things are where you work, but here at Pannostrum Pharmaceuticals we occasionally get disappointing results. In fact, we often get poor results, but since experience has taught us to expect them, we are only occasionally disappointed. By a poor result, of course, what I mean is P > 0.05. However, to explain what our medical advisors do with such results I am going to have to make yet another diversion, this time to recount an old joke by Bennett Cerf (slightly adapted) on the two brothers: a pessimist (surely a statistician) and an optimist (who must have been a medical advisor). On opening their Christmas presents, the statistician, a great connoisseur of malt whisky, is glum to find a crate of rare cask-strength Glen Kinchie in numbered bottles. 'What a terrible hangover I shall have', he remarks. The medic, equally a fan of malt whisky, receives a bottle of Irn Bru, famed throughout Caledonia as a cure for hangovers. 'Goody, goody. Somebody has bought me a crate of malt.'
So what do we do at Pannostrum? We have two standard devices. The first is 'the one that got away'. Its presence is marked by statements of this sort: 'unfortunately the result was not quite significant (P = 0.11) due to the trial being too small'. This is what we employ medical advisors for, of course, to know that the reason the result is not significant is not because the drug doesn't work but simply because the trial isn't large enough. Who needs patients? A medical advisor is worth a hundred of them: in fact, worth more than a hundred patients because with the patients you never know what the results might show. Before you sneer, however, ask yourself if you have ever fallen into this trap: you have a meta-analysis of results to date which shows significance and confidently expect that when the next trial comes in the overall analysis will be even more significant.
The second is 'the cheque is in the post'. For this we have statements of the form, 'the result showed a trend towards significance, p=0.09'. This is really rather similar to the previous one. A P-value, despite the name, is a position not a motion. Is the glass half empty or half full? 'The wind bloweth where it listeth, and thou hearest the sound thereof, but canst not tell whence it cometh, and whither it goeth'. But as far as our medical advisors are concerned, P-values are a northerly gale and the direction they are heading is 'down'.
I upset a medical advisor the other day by slipping in, just for a joke, the following statement. 'The result should be treated with caution, as it shows a trend towards non-significance, p=0.03'. 'Why did you do this, Guernsey?' she asked. 'Symmetry and logic,' I replied. 'If 0.05 is a magic boundary and results such as 0.09 are trending to cross it into significance, then to keep the thermodynamic balance the results below it must be trending into non-significance. By the way,' I added, 'why is it that when we have P=0.09 for a side-effect, we never describe it as "trending towards significance"?' 'You have a very perverse and unhelpful way of looking at things,' was the reply. I had failed to appreciate that medical advisors are to P-values what Maxwell's demon is to thermodynamics (see http://www.maxwellian.demon.co.uk/name.html for an explanation of the latter). They selectively open the little gate of significance to let the important results through.
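The remark that a P-value is a position and not a motion can be put into rough numbers. What follows is only a sketch under stated assumptions, none of which come from any Pannostrum trial: a fixed-effect analysis, a known common standard deviation, information proportional to patient numbers, and an invented function name `expected_future_z`. It asks what z-statistic one should expect after adding more patients, given the current z, under two rival beliefs about the true effect.

```python
from math import sqrt, erfc

def expected_future_z(z_now, n_now, n_extra, true_effect_ratio):
    """Expected pooled z after n_extra further patients (hypothetical helper).

    z_now             -- current z-statistic, based on n_now patients
    true_effect_ratio -- assumed true effect as a multiple of the current
                         estimate (0.0 = no effect, 1.0 = estimate exactly right)
    """
    n_total = n_now + n_extra
    # The pooled estimate is an information-weighted average of the old
    # estimate and the (expected) new one, expressed relative to the old one.
    expected_estimate_ratio = (n_now + n_extra * true_effect_ratio) / n_total
    # The standard error shrinks with sqrt(n), so z scales with sqrt(n_total / n_now).
    return z_now * expected_estimate_ratio * sqrt(n_total / n_now)

def two_sided_p(z):
    """Two-sided P-value for a standard normal z-statistic."""
    return erfc(abs(z) / sqrt(2.0))

# Illustrative numbers only: z = 1.70 on 100 patients, then 100 more.
for label, ratio in [("drug does nothing", 0.0), ("estimate is the truth", 1.0)]:
    z_next = expected_future_z(1.70, 100, 100, ratio)
    print(f"{label}: expected z = {z_next:.2f}, P at that z = {two_sided_p(z_next):.3f}")
```

If the true effect is zero, the expected z shrinks by a factor of sqrt(n_now / n_total); it grows by sqrt(n_total / n_now) only if the current estimate is assumed to be exactly right. Any "trend" is therefore supplied by the assumption and not by the P-value itself.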
Is the glass half empty or half full? I find it difficult to say. If the wine is being poured into it you might have a case for saying it is half full. When the wine is being drunk, it is clearly half empty. Most of my glasses are (very briefly) half empty. Which reminds me. I got a rather fine bottle of The Glenlivet for Christmas. Are any of you out there working on a cure for hangovers?
"You cannot hope
to bribe or twist,
thank God! the
British journalist.
But, seeing what
the man will do
unbribed, there's
no occasion to"
Benjamin Franklin once said, "But in this world nothing can be said to be certain, except death and taxes" and somewhat later, John Maynard Keynes remarked that "in the long run we are all dead". These sayings, you may think, are examples of that phenomenon whereby if you are famous or infamous enough the blindingly obvious or mundane is put down as a witty or profound remark and preserved for posterity. "I think I could eat one of Bellamy's veal pies", Pitt; "we had better wait and see", Asquith; "This will never do", Francis Lord Jeffrey and so forth. However, I am going to demonstrate to you that far from its being accepted as obvious, many of our fellow citizens, that is to say the vast majority who suffer from the grave disability of not being statisticians, deny this truth. I am referring here to the death bit, not the taxes. Whenever I view my tax bill, my bile rises and I come to the jaundiced conclusion (a case of hepato-fiscal-toxicity) that Benjamin Franklin was not right after all and that whole swathes of society are not paying taxes at all: not only living at the expense of others but dying at their expense too. However, I mustn't get started on the subject of taxes. Let me get back to death.
A consequence of the Keynes dictum is that if you study enough patients long enough some of them will die almost surely, to use probabilistic jargon. A corollary is that just because some patients have died who took a drug, it doesn't mean that it's the drug what did it. However, try telling that to the journalists of London's famous street magazine, CAPITAL LETTERS, which only a few years back ran a story under the heading "STIFF STIFFS" as follows. "They thought that Fullagro was a wonder cure for impotence but now only a year after launch, 31 users are dead. Who is to blame?" Who indeed? I blame it on the education system in this country, which does not stress clearly enough the importance of thought. I wrote in protest to CAPITAL LETTERS pointing out that several former readers of that journal must now be dead and asking if they would be withdrawing the magazine pending an investigation. What was missing, if we were to make any sense of these figures, was an estimate of exposure. I got a reply from the journalist concerned saying there was no need to be patronising and he was well aware of the argument that so many thousands died every year who had drunk cups of tea. To which I would say, being aware is one thing and drawing the consequences is another.
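To see why the count of 31 means nothing without exposure, here is a minimal sketch. Everything in it apart from the figure 31 is an assumption made up for illustration: the background death rate, the person-years of exposure, the Poisson model and the function names are all hypothetical and are not taken from any study of Fullagro.

```python
from math import exp

def expected_deaths(person_years, background_rate_per_1000):
    """Deaths expected from background mortality alone (hypothetical helper)."""
    return person_years * background_rate_per_1000 / 1000.0

def poisson_upper_tail(observed, expected):
    """P(X >= observed) when X is Poisson with mean `expected`."""
    term = exp(-expected)      # P(X = 0)
    cdf = 0.0
    for k in range(observed):  # accumulate P(X = 0) ... P(X = observed - 1)
        cdf += term
        term *= expected / (k + 1)
    return 1.0 - cdf

OBSERVED = 31          # the figure quoted in the magazine story
ASSUMED_RATE = 4.0     # invented: deaths per 1000 person-years in comparable men

# Invented exposures: the same 31 deaths look very different under each.
for person_years in (1_000, 10_000, 100_000):
    mu = expected_deaths(person_years, ASSUMED_RATE)
    tail = poisson_upper_tail(OBSERVED, mu)
    print(f"{person_years:>7} person-years: expect {mu:6.1f} deaths, "
          f"P(31 or more by chance) = {tail:.3g}")
```

With little exposure 31 deaths would be alarming, with more it would be unremarkable, and with a great deal it would be fewer than background mortality alone predicts. The headline number by itself cannot distinguish between these cases.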
In fact, there have now been a number of formal studies of Fullagro and they have come to the rather baffling conclusion that there are fewer deaths than expected. This, of course, is a challenge to the marketing department of the company concerned, as it is a reversal of the usual state of affairs. Quality of life is generally seen as the last ditch face-saver you resort to when you have not managed to improve survival after all. Here we had a drug that was designed to boost quality of life but which is actually causing men to live longer. Perhaps it is giving them something to live for.
To return to Keynes's dictum, however, it would be nice to think that it is only those who work outside the pharmaceutical industry who cannot see the implications. Unfortunately, this is to reckon without the marketing department of Pannostrum Pharmaceuticals, a group of individuals who seem to have been born without the organ of logic. A few years back we registered a treatment in asthma which now, courtesy of all their creativity, is sold under the brand name Zeffer. "Zeffer: it's a breath of fresh air"; "Get your second wind with Zeffer"; "Zeffer: a blow against asthma," and so forth. This is not the point of my story. The point is that as the drug was launched I had to sign off on a huge uncontrolled (ugh!!!) phase IV study. "What is the point of this piece of nonsense?" I enquired diplomatically. "It is to give a number of key physicians vital experience with this product," they replied disingenuously. "Have we got our story ready for the deaths that will occur?" I asked. "There you go again. Pessimistic as usual. You know that the phase III trials showed no excess risk." "That's not the point," I replied, "Zeffer is not an elixir of life". I did not convince them. The trial went ahead. It turned out that, just like the rest of mankind, patients on Zeffer were not immortal. My prediction was proved correct. The deaths arrived and we were left scrabbling to explain to the health authorities that the rate was no higher than expected.
But are we statisticians guiltless of failing to think this issue through? I think not. Just because drugs can't guarantee immortality doesn't mean that any safety record at all is acceptable. We need to be studying the problem actively. Do you think that the way we summarise safety data-bases is useful: all tables and no analysis? Medium and long-term trials flung together? Controlled studies of different durations pooled with their open follow-ups? Is that what we got our qualifications in statistics to do?
There is a bafflingly illogical point of view, to which I have sometimes heard even statisticians subscribe, that since safety data are rarely in the form of a targeted single variable for a controlled clinical trial, they are therefore beyond statistical analysis, as if statistical sophistication were necessary to interpret simple matters but de trop for complex issues. Au contraire. Jimmy Savage once said that a statistical model should be as big as an elephant. Guernsey McPearson once said that any damn fool can analyse a randomised clinical trial and frequently does. When the situation is complex, formal analysis is precisely what is needed: put down the pea-shooter and get out the elephant gun. So here is my advice to any head of statistics who hears any of his statisticians defend the point of view that because the data are not in the form produced by a planned RCT, they are only suitable for tabulation and not statistical analysis.
Send them to work for marketing.
Or better still sack them. Let's have them on the street selling CAPITAL LETTERS.
See SPIN passim, in particular "Guernsey McPearson" (1999) Hard Times and Stiff Competition, SPIN, March 1999, 9.