The following imaginary case vignette helps us examine the clinical relevance of the major findings of the STAR-D trial.
The patient was a 46-year-old female epidemiologist who was referred to me by her primary care physician for symptoms of moderate depression.
Pt.: I had asked my doctor about the new findings from the NIMH depression study. He said it was complicated, and suggested that I bring my questions to you.
Dr. Carlat: You’re probably talking about the STAR-D trial?
Pt.: That’s the one. It’s billed in NIMH press releases as “the nation’s largest clinical trial for depression.” I’ve heard that most of the results are out, and I was hoping that you could help me understand them. What’s the best treatment for me, according to the study?
Dr. Carlat: Let’s start at the beginning. This was a $35 million study, funded entirely by taxpayers, with no drug industry money. In the first step of the trial, 2876 patients were started on Celexa, and after about 7 weeks of treatment on an average dose of 41.7 mg/day, 790 of those patients, or 28%, got well (Trivedi et al, Am J Psychiatry 2006;163:28-40).
Pt.: You mean I only have a 28% chance of improving after almost 2 months on an antidepressant? That doesn’t sound very good.
Dr. Carlat: Not so fast. Unlike most antidepressant studies in the past, this one focused on remission, meaning the virtual absence of any depressive symptom. Most other studies have been content to look at response, defined as a 50% reduction in symptoms.
Pt.: All right, but I’ve been so miserable lately that a 50% improvement sounds just fine to me. What were the response rates in that study?
Dr. Carlat: In the Celexa arm, the response rate was 47%.
Pt.: That sounds better, but still not great. Is this the best response rate psychiatrists have to offer – less than 50%?
Dr. Carlat: That depends on the drug, the population, and the design of the study. The STAR-D study was an open-label study, meaning that there was no placebo control and all patients knew what pill they were taking. These kinds of studies typically yield very high response rates, in the range of 60%-70%. But the STAR-D study enrolled patients who were more severely ill than those in most studies. The typical STAR-D patient had continuous depressive symptoms for at least 2 years, and six prior episodes of depression.
Pt.: So do these results apply to me? This is the first bout I’ve had with depression.
Dr. Carlat: Remission rates were highest in patients like you – that is, well-educated females who didn’t have medical problems or other psychiatric problems.
Pt.: What if you start me on Celexa and I don’t improve? What should we do then, according to STAR-D?
Dr. Carlat: One of the main goals of STAR-D was to answer that very question. Patients who failed the initial Celexa trial were assigned to different treatment strategies, including switching from Celexa to a different antidepressant, augmenting the Celexa with a second drug, or receiving cognitive behavioral therapy.
Pt.: Sounds like a good study. Now psychiatrists will know which treatment strategy is the best one to turn to when patients fail their first medication.
Dr. Carlat: Unfortunately, the study didn’t end up providing any answers to that question. The reason is that patients were not randomly assigned to different treatments. Instead, they were allowed to choose which type of treatment they wanted to receive. Because of this, we can’t really compare results in the different treatment arms.
Pt.: That doesn’t sound like a very smart way to do research. Why didn’t they randomize?
Dr. Carlat: Mainly because they were afraid that too many patients would drop out of the study if they were forced to be randomly assigned. And if too many people drop out of a study, you have a couple of major problems. First, whatever patients are left in the study may not be very representative of the patients we see in our practices, and second, you may not end up with enough statistical power to find differences between treatments.
Pt.: I guess I follow you. But by not randomizing, weren’t they guaranteeing that they wouldn’t be able to compare the treatments?
Dr. Carlat: They were, and I’m sure some of the researchers are regretting that decision. But it may not have been a total loss. Within each of the treatment arms, patients were randomized to a specific treatment. Thus, patients who chose to switch to another antidepressant were randomly assigned to treatment with Effexor XR (average dose 194 mg/day), Wellbutrin SR (283 mg/day), or Zoloft (135 mg/day).
Pt.: So there was a randomized double-blind component to STAR-D!
Dr. Carlat: Randomized, yes, but not double-blind. Patients and their doctors knew what medications they were assigned to. This is a problem, because we don’t know how much of the response to these meds was due to the actual medication vs. positive or negative expectations on the part of patients or their treaters.
Pt.: I assume you’re referring to the placebo effect – that “extra” benefit that even a sugar pill provides if people believe in it. But could the placebo effect play a big enough role here to significantly affect the response rates?
Dr. Carlat: Unfortunately, yes. In one study that looked at all antidepressant studies submitted to the FDA from 1987 to 1997, the placebo effect was shown to account for 75% of all improvement on active treatment (Khan et al, Arch Gen Psychiatry 2000;57:311-317). That’s why it’s so important for studies to incorporate a placebo control.
Pt.: I understand. Still, just out of curiosity, which of the “switch to” treatments did the best?
Dr. Carlat: Effexor XR did the best, with a 25% remission rate, followed by Wellbutrin SR (21%) and Zoloft (18%). But these differences were not statistically significant, and since there was no placebo comparison, we don’t know whether patients would have done just as well if they had been kept on Celexa for an extra few weeks. And while there was a “signal” that Effexor was superior, this slight advantage may have been due entirely to higher expectations, since Effexor already had a reputation in psychiatric circles as being more effective than SSRIs.
Pt.: So what’s the bottom line for a patient like me? If I don’t end up responding to an SSRI, which drug should I switch to?
Dr. Carlat: Unfortunately, because of the way STAR-D was designed, it provides us no help at all in answering that question.
Pt.: That’s frustrating. But what about the augmentation arm?
Dr. Carlat: Patients who decided to stay on their Celexa were randomly assigned to augmentation with either BuSpar (average dose, 41 mg/day) or Wellbutrin SR (267 mg/day).
Pt.: Let me guess: this was open label, and there was no placebo group?
Dr. Carlat: It was open-label, but in this case, the researchers inadvertently included a kind of placebo treatment: BuSpar. In all three prior placebo-controlled trials of BuSpar augmentation of SSRIs, BuSpar has never done better than placebo (J Clin Psychiatry. 1998 Dec;59(12):664-8; J Clin Psychiatry. 2001 Jun;62(6):448-52; J Affect Disord. 2003 Sep;76(1-3):223-7). So in essence, BuSpar acted as a placebo control for Wellbutrin augmentation.
Pt.: And how did Wellbutrin augmentation do?
Dr. Carlat: No better than BuSpar/placebo. They both produced 30% remission rates. Nonetheless, we can’t really interpret this as implying that Wellbutrin is ineffective for augmentation, because, like the rest of STAR-D, this arm was not double-blinded. High or low expectations may have significantly altered remission rates on either treatment, in an unpredictable way. This means that these augmentation results, like the switch results, provide no guidance to clinicians.
Pt.: But what about the rest of the study?
Dr. Carlat: Patients who did not come to remission in any step could go into additional trials. But again, all of these minitrials were open label, and none included a placebo group, so none of the data resulting from them is of any clear use to clinicians. Here are the numbers: For Step 3: Switch to Remeron (12.3% remission rate) vs. switch to Nortriptyline (19.8%); Lithium augmentation (16%) vs. thyroid augmentation (25%). And for Step 4: Switch to Parnate (7%) vs. switch to Effexor + Remeron (14%).
Pt.: I can see that you’re not very impressed with the STAR-D results. But the latest press release from the American Psychiatric Association was much more positive. To quote it: “Results indicate that 67 percent of patients who complete from one to four treatment steps can reach remission.”
Dr. Carlat: Yes, the idea that there was a 67% “cumulative remission rate” was reported in the most recent paper on STAR-D, which was a summary analysis of the entire trial (Rush et al, Am J Psychiatry 2006;163:1905-1917). This was an example of creative statistics. For example, in prior papers, the primary outcome measure was reported as the industry-standard Hamilton Depression Scale (HamD), whereas this analysis used only the QIDS-SR16 (self report), which consistently yielded higher remission rates than the Hamilton. Furthermore, this analysis restored 795 patients who had originally been deemed ineligible because their depression was too mild (HamD scores < 14). By doing this, they enriched the sample with patients who were already very close to remission, thereby inflating apparent remission rates.
Pt.: So where does this leave us?
Dr. Carlat: Pretty much where we were before STAR-D: Start with an SSRI, hope it works, and then move on to whatever strategy we prefer, based on our clinical experience and our understanding of the literature.