Structured diagnostic interviews are the gold standard, but their reliability varies widely by diagnosis
STUDY TYPE: Systematic review and meta-analysis
FUNDING: Independent
Background
Structured diagnostic interviews include the Structured Clinical Interview for DSM (SCID) and the Mini-International Neuropsychiatric Interview (MINI). They are the gold standard for diagnosis, and nearly all psychopharmacology research is based on these tools. This meta-analysis asked how reliable they are when the same patient is interviewed twice.
The Study
- 57 studies, 46 included in the meta-analysis, covering 8,146 adults assessed with 17 different structured interviews.
- Each study gave patients the same interview twice, by different interviewers, to measure test-retest reliability using Cohen’s kappa (κ), where 0 = chance agreement and 1 = perfect agreement.
- Studies spanned 26 countries and four decades of diagnostic criteria, from DSM-III through DSM-5 and ICD-10.
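For readers who want to see how κ is computed, here is a minimal sketch in Python. It assumes each patient receives a yes/no diagnosis from two interviews. The function name `cohens_kappa` and the 10-patient data are hypothetical and are not taken from the meta-analysis.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two sets of categorical ratings on the same patients."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both interviews must rate the same non-empty set of patients")
    n = len(ratings_a)

    # Observed agreement: share of patients given the same diagnosis both times.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: expected overlap given how often each interview
    # assigned each category.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n**2

    if expected == 1.0:
        raise ValueError("Kappa is undefined when chance agreement is 100%")
    return (observed - expected) / (1 - expected)


# Hypothetical data: 1 = diagnosis present, 0 = absent, for 10 patients.
first_interview = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
second_interview = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
kappa = cohens_kappa(first_interview, second_interview)
print(round(kappa, 2))
```

In this made-up example the two interviews agree on 8 of 10 patients (80%), but 52% agreement would be expected by chance alone, so κ comes out near 0.58 rather than 0.80. That correction for chance is why κ runs lower than raw percent agreement.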
Results
Overall test-retest reliability was κ = 0.69, which falls in the “substantial” range, but reliability varied enormously across disorders.
Substance use disorders scored higher (κ = 0.72) than mental disorders (κ = 0.65). Among mental disorders, bipolar disorder had the best reliability (κ = 0.74) and nonaffective psychoses the worst (κ = 0.55). Among substance use disorders, opioid use disorder topped the list (κ = 0.81) and hallucinogen use disorder came in lowest (κ = 0.59).
None of the methodological quality factors — sample size, retest interval, interviewer blinding — explained the variability. The one exception: for substance use disorders, newer diagnostic criteria (DSM-III-R, DSM-IV, ICD-10) outperformed the older DSM-III.
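To put the κ values above in context, the sketch below sorts them into the commonly used Landis and Koch descriptive bands. This is an assumption: the summary does not name the scale behind the “substantial” label, though 0.61 to 0.80 is the Landis and Koch range for that term. The function name `landis_koch_band` is hypothetical; the κ values are the ones reported above.

```python
def landis_koch_band(kappa):
    """Return the Landis and Koch descriptive label for a kappa value."""
    if kappa < 0:
        return "poor"
    # Upper bound of each band, checked in ascending order.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Kappa values as reported in this summary.
reported = {
    "Overall": 0.69,
    "Substance use disorders": 0.72,
    "Mental disorders": 0.65,
    "Bipolar disorder": 0.74,
    "Nonaffective psychoses": 0.55,
    "Opioid use disorder": 0.81,
    "Hallucinogen use disorder": 0.59,
}

labels = {name: landis_koch_band(k) for name, k in reported.items()}
for name, label in labels.items():
    print(f"{name}: {reported[name]:.2f} ({label})")
```

Under this assumed scale, nonaffective psychoses and hallucinogen use disorder fall a band lower (“moderate”) than the overall estimate, while opioid use disorder just reaches “almost perfect.”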
Practice Implications
- Behavioral criteria (eg, addictions) are more reliable than those that are subjective or require interpretive judgment (eg, psychosis).
- These instruments need to be augmented by collateral history, longitudinal observation, treatment response, mental status, and associated signs. The Bipolarity Index does that for mood disorders.
- Despite these problems, structured interviews are more accurate than unstructured ones, and are sadly underused in practice.
- I’ve used them routinely for two decades, and created a free version (and am computerizing it). Use them, distribute them, no permission needed.
—Chris Aiken, MD
Director, Psych Partners
Editor in Chief, Carlat Psychiatry Report







