Panic at
Panacea. It seems that Sir Lancelot Pastit’s much-vaunted pipeline is but a
pipedream. We have products failing left, right and centre and the fact that we
have had to pull Redybrex® from the market has sent
the share price into freefall. Panacea Investor Supporter Section (a group in
our marketing department) together with Public Outreach Operations and
Relations (another such group; the two always seem to go together) have been
working overtime trying to play down the damage and talk up our prospects.
Personally, I think the most positive contribution we could get these people to
make to the company would be to have them resign and save not only their
salaries but also the mayhem they cause. What we need to do is find new drugs
that work, not find new things to say about drugs that don’t.
However,
imagine my surprise, when summoned to a meeting with the marketing groups in
question to discuss what to do with our latest failure (the CLOT trial of Thrombgon®), to find the talk was of statistics.
And I don’t mean statistics of sales we might make if only the latest hoped-for
blockbuster would work. I mean statistics of clinical trials, and not just
simple stuff like means and medians but, of all things, power.
Now I occasionally
get called upon to give in-house courses in which I am supposed to explain
statistics to the numerically challenged, which is just about everybody who
works for Panacea with the exception of some
members of the statistics department. We have a go at explaining P-values,
confidence limits, that sort of thing, but the one they have most difficulty
with is power. This is hardly surprising. Some of the stats department seem to
have some problems with it too. There was a time, for example, when every single
report we got from our office in Medicine Springs would include, for any
non-significant result, a retrospective power calculation informing us what the
probability was that we would have failed to find a difference between
treatments if the difference was exactly that which caused us to fail to find
it. This produced an extraordinary sequence of
reports regarding failed trials in which it turned out that the power
had always been less (often much less) than 50%. In fact, of course, 50% or
indeed any percent is a gross overstatement. Fed up with this idiocy, I
actually proposed the following law, which I modestly entitled McPearson’s Law of Power. The probability of success for any trial for which a retrospective
power calculation has been calculated is zero. Think about it and you will
see that it is true.
So anyway, to return to the meeting. This was the usual weekly marketing strategy powwow,
with Thrombgon item number 3 on the agenda. I was not
required for the whole meeting but required to be on call to turn up when they
were ready. As luck would have it the Redybrex
fiasco, which was item 2, dragged on and on. I was scheduled for
I pride
myself on my punctuality, so it put me very much on the back foot to find a
room full of suits waiting with some irritation for my arrival. Rod ‘Blast’
Furnace (rumour has it that the sobriquet has as much to do with his
consumption of ‘smokeless fuel’ as it does with his surname), a member of Public
Outreach Operations and Relations, seemed to be in the middle of a presentation,
since he was standing by the screen.
It also
didn’t help my sang-froid that as I arrived Dr Angina Cutter (see SPIN passim), the project leader, was
sitting rather cosily next to Clive Viper, a member of Panacea Investor Supporter
Section, and that they were talking not only about P-values, about which that group,
in my opinion, is naturally qualified to talk, but also about power, on the basis of a
slide that Furnace was projecting.
At least, I
thought it was a slide when I first
saw it; such is the power of prior prejudice. I should have been warned,
however, by the fact that there were no bullet points,
no graphs divided using a vertical and a horizontal line into four regions, no
bullshit bingo phrases (pushing the envelope, thinking outside the box etc.)
and no graphs with misleading axes, just numbers. Imagine my horror when I
realised that I was looking at a live projection of a calculation using
N-Power®, a nice piece of software but, unfortunately, so easy to use that any
idiot can calculate a power with it and frequently does. On this occasion, the
idiot in question was Furnace.
On seeing
me enter, Angina gave me the benefit of one of her sweetest smiles. ‘Ah,
An
aggressive and unpleasant voice cut in. ‘Yes. Your power is too low, McPearson. It’s only just over 20%.’ This was Viper
speaking, a real snake in the grass if ever there was one, although with the
sort of lifestyle he appeared to aspire to it wouldn’t surprise me if there was
often a lot of grass in this snake. I turned to the screen, which was
projecting a table that looked something like this.
Significance level | 0.050
1 or 2 sided test? | 2
Control proportion | 0.080
Test proportion | 0.070
Power (%) | 22
n per group | 2000
‘And where
exactly did you get these from?’ I said. ‘Oh,’ said Angina, ‘this is a
wonderful idea of Clive’s’, and she turned to gaze fondly at him. ‘And mine,’
added Furnace, with some irritation. ‘Oh yes of course. They came up with it
together. It’s awfully clever. I can’t think why we’ve never used it. They just
put the figures from CLOT into the power software to see what the power is. You
see it’s too low. The reason the trial failed is that the power is too low.’
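For readers who want to check the figure on the screen, here is a minimal sketch of the sort of calculation Furnace ran. The column does not say which formula N-Power uses, so the normal approximation with unpooled variances below is an assumption, and the function name `approx_power` is purely illustrative.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p_control, p_test, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions.

    Assumption: a normal approximation with unpooled variances; the
    software in the story may well use a different formula.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(p_control * (1 - p_control) / n_per_group
              + p_test * (1 - p_test) / n_per_group)
    shift = abs(p_control - p_test) / se
    # The tiny chance of rejecting in the wrong direction is ignored.
    return NormalDist().cdf(shift - z_crit)


# The CLOT figures as typed into the table: 160 and 140 DVTs out of 2000.
power = approx_power(0.080, 0.070, 2000)
print(f"approximate power: {power:.1%}")
```

Under these assumptions the result lands close to the 22% in the table. Note that the 'true' rates fed in are simply the observed rates, which is what makes the calculation retrospective.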
‘Most
interesting,’ I said. ‘You used the sample size and the observed proportions
and then calculated the power. These are the data,’ I consulted my notes, ‘that
gave us a P-value of 0.25. There were 140 DVTs in the
Thrombgon group and 160 in the placebo group. Am I
right?’
‘Absolutely
right,’ said Viper. ‘You screwed up, McPearson. The
power is too low. That’s the reason the trial failed.’
‘Let me
understand this,’ I replied, ‘you would accept the negative result if only the
power were higher.’
‘Yes, but
the power’s too low. The trial is useless.’
‘So your
position is that the higher the power, the more inclined you are to believe the
negative result.’
‘That’s
right,’ said Furnace. ‘Indubitably,’ said Viper. ‘But surely that’s reasonable?’ said Cutter.
Speaking of
power, I had powered up my laptop by now and had been playing around with some
figures. ‘Well, let’s see,’ I said, ‘what happens to your nice little
calculation if we keep all the parameters the same except the Thrombgon proportion and make that 0.064.’
‘Easy peasy,’ said Furnace, and produced the following table,
which he projected on the screen.
Test significance level | 0.050
1 or 2 sided test? | 2
Control proportion | 0.080
Test proportion | 0.064
Power (%) | 49
n per group | 2000
‘Very
interesting’, I said, ‘let me calculate what the P-value would be with the
corresponding figures of 160 DVTs under placebo and
128 under Thrombgon using this handy software I have
here.’ I was referring to that well known program for calculating significance
for exact tests, P-Precise®. ‘Well fancy that,’ I said, ‘it
seems that the P-value is 0.058. I believe that this is what Dr Cutter would
describe as, “a trend towards significance”’.
‘So?’ said Viper. ‘Your point is?’ said Furnace. ‘I do hope
that you’re not being negative,’ said Angina. ‘I think that these power
calculations are very helpful.’
‘Well’, I
said, ‘let me summarise. The smaller the P-value the more credence we give to
the possibility that the treatments are not equal. This habit is certainly not
without its critics but I can’t ever recall any of the wonderful medical
scientists we have working for Panacea nor any of the inventive market,’ here I
paused, at somewhat of a loss as to what to say next, ‘ears,’ I added, ‘having
claimed the contrary. On the other hand, I am led to believe that if a trial is
negative, you are more inclined to believe the result if the retrospective
power is high. However, there seems to be a contradiction if higher power means
equivalence since the case with the lower P-value has the higher power.’ (Note from the Editor: McPearson’s
argument here is strangely reminiscent of a fine paper in The American Statistician (1).)
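The contradiction can be sketched numerically. The snippet below takes the three Thrombgon proportions that appear in the story (0.070, 0.064 and 0.057) and computes, for each, a two-sided P-value and the retrospective power. Both use normal approximations, which is an assumption: the P-values in the story come from an exact test, so the figures here will differ a little from those quoted. The function names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

N = 2000           # patients per group, as in the tables
P_CONTROL = 0.080  # observed placebo proportion
STD_NORMAL = NormalDist()


def two_sided_p(p_test):
    """Two-sided P-value from a pooled z-test (not the exact test)."""
    pooled = (P_CONTROL + p_test) / 2
    se_null = sqrt(2 * pooled * (1 - pooled) / N)
    z = abs(P_CONTROL - p_test) / se_null
    return 2 * (1 - STD_NORMAL.cdf(z))


def retrospective_power(p_test, alpha=0.05):
    """Power computed as if the observed rates were the true rates."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    se = sqrt((P_CONTROL * (1 - P_CONTROL) + p_test * (1 - p_test)) / N)
    return STD_NORMAL.cdf(abs(P_CONTROL - p_test) / se - z_crit)


results = [(p, two_sided_p(p), retrospective_power(p))
           for p in (0.070, 0.064, 0.057)]

for p_test, p_value, power in results:
    print(f"test proportion {p_test:.3f}: "
          f"P = {p_value:.3f}, retrospective power = {power:.0%}")
```

As the P-value falls, the retrospective power rises. The two numbers carry the same information about the data, so a high retrospective power cannot be read as extra support for equivalence.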
‘So what,’
said Viper. ‘It’s non-significant, see. It’s amongst
the trials that are not significant that you have to compare power.’
‘So what
sort of retrospective power do you find acceptable?’
‘Fancy
asking us that?’ crowed Furnace, ‘Yes,’ added Viper, ‘aren’t you always banging
on about how we need 80% power?’
‘So would
you please type in a value of 0.057 for the Thrombgon
group?’
‘Where did
that come from?’ said Furnace, typing in the figure and obtaining the answer
80%. ‘Yes where?’ added Viper. ‘Gosh that sounds awfully familiar,’ said Angina.
‘It
should,’ I added. ‘It’s the value you had me write in the protocol after
lengthy discussion with the Marketing Department. It just so happens that the
placebo rate is just as we anticipated but unfortunately the Thrombgon rate is not. However, if the observed proportion
in the Thrombgon group had been equal to 0.057 the P-value would actually have been 0.005.
In fact, you can’t have a retrospective one-sided power of greater than 50% if
the result is not significant.’
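A rough way to see the 50% limit is the idealised case of a one-sided z-test with known standard error, which is an assumption made here for simplicity and not something the trial itself used. If the observed effect is taken to be the true effect, the retrospective power is the standard normal distribution function evaluated at the observed z minus the critical z, so it depends on the data only through the P-value. The one-sided level of 0.025 and the function name are likewise illustrative choices.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def one_sided_retrospective_power(p_value, alpha=0.025):
    """Retrospective power of a one-sided z-test, given only its P-value.

    Assumption: known standard error, observed effect treated as true.
    """
    z_observed = STD_NORMAL.inv_cdf(1 - p_value)
    z_critical = STD_NORMAL.inv_cdf(1 - alpha)
    return STD_NORMAL.cdf(z_observed - z_critical)


# From just significant (P equal to alpha) down to no effect at all.
p_values = [0.025, 0.05, 0.125, 0.25, 0.5]
powers = [one_sided_retrospective_power(p) for p in p_values]

for p_value, power in zip(p_values, powers):
    print(f"one-sided P = {p_value:.3f}: retrospective power = {power:.1%}")
```

When the P-value equals the significance level the two z values coincide and the power is exactly one half. Any non-significant result has a smaller observed z, so in this idealised setting its retrospective power falls below 50%.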
‘So what’s
your explanation for the CLOT trial?’ said Viper, ‘Yes, what?’ added Furnace.
‘It is rather baffling,’ said
Angina.
‘No,’ I
replied, ‘it’s actually very simple and not particularly surprising. The drug is quite
possibly acronymical.’
‘Acronymical?’
‘Yes,’ I
replied, ‘acronymical in the sense that it is clearly
perfectly suited to the departments of Panacea Investor Supporter Section and
Public Outreach Operations and Relations. Or, to put it
another way: “Why is Thrombgon like Panacea
Marketing?”’
‘Why?’ they
all said together.
I summoned up
every last drop of scorn I could muster from my not inconsiderable reserves and
said, ‘because it hardly works at all’.
Reference
1. Hoenig JM, Heisey DM. The abuse of power: the pervasive fallacy of power calculations for data analysis. The American Statistician 2001;55(1):19-24.