Research Methodology in ICU: Study Design, Statistics, and Critical Appraisal
Urgent signals
- Never rely on p-values alone; consider clinical significance and confidence intervals
- Surrogate endpoints may not translate to patient-centered outcomes
- External validity limits applicability of single-center trials
- Publication bias overestimates treatment effects in meta-analyses
Exam focus
- CICM Second Part Written
- CICM Second Part Hot Case
- CICM Second Part Viva
Quick Answer
Research Methodology in ICU encompasses the scientific framework for designing, conducting, analyzing, and interpreting clinical research in the intensive care setting. Understanding research methodology is essential for evidence-based practice, critical appraisal of the literature, and contribution to the ICU evidence base.
Key Concepts:
- Study Design Hierarchy: RCTs > cohort studies > case-control studies > case series > expert opinion
- Bias Prevention: Randomization, blinding, allocation concealment, intention-to-treat analysis
- Statistical Interpretation: p-values, confidence intervals, NNT/NNH, power, sample size
- Evidence Synthesis: Systematic reviews, meta-analyses, GRADE assessment
- Quality Assessment: Cochrane risk of bias tool, GRADE certainty ratings
Critical Skills:
- Recognize study design and its limitations
- Identify sources of bias
- Interpret statistical results in clinical context
- Apply evidence to individual patient care
- Contribute to ICU research through ANZICS-CORE CTG network
ICU-Specific Challenges: High mortality endpoints, heterogeneous populations, surrogate outcome validity, ethical constraints in emergency research, cluster randomization for unit-level interventions.
CICM Exam Focus
What Examiners Expect
Second Part Written (SAQ):
Common SAQ stems:
- "Critically appraise the following RCT abstract. List the strengths and limitations of this study design."
- "A new trial shows p=0.04 for mortality reduction. Discuss how you would interpret this result and decide whether to change your practice."
- "Compare and contrast randomized controlled trials and observational studies for evaluating ICU interventions."
- "Describe the components of the GRADE system for assessing quality of evidence."
- "Outline the statistical considerations for planning an ICU clinical trial."
Expected depth:
- Know all major study designs and their appropriate applications
- Understand bias types and mitigation strategies
- Explain statistical concepts (power, sample size, NNT, CI interpretation)
- Critically appraise landmark ICU trials
- Understand ANZICS-CORE CTG role in Australian/NZ research
Second Part Hot Case:
Typical presentations:
- Patient with condition where conflicting evidence exists (e.g., steroids in septic shock)
- Discussion of why a treatment is or is not evidence-based
- Family asking about experimental or unproven therapies
Examiners assess:
- Ability to cite evidence appropriately
- Understanding of evidence strength and limitations
- Communication of uncertainty to patients/families
- Application of evidence to individual patient context
Second Part Viva:
Expected discussion areas:
- Study design hierarchy and when each is appropriate
- Bias identification and prevention
- Sample size calculation principles
- Interpreting forest plots and meta-analyses
- GRADE system for evidence quality
- Adaptive trial designs
- Cluster randomization for ICU-level interventions
- Ethical issues in ICU research (consent, emergency research)
Examiner expectations:
- Sophisticated understanding of evidence evaluation
- Ability to critically appraise published literature
- Knowledge of Australian/NZ contribution to ICU evidence base
- Understanding of research governance and ethics
- Appreciation of statistical vs clinical significance
Common Mistakes
- Confusing statistical significance with clinical importance
- Not recognizing selection bias in observational studies
- Misinterpreting confidence intervals as probability statements
- Failing to consider external validity when applying trial results
- Not knowing key Australian/NZ trials (SAFE, NICE-SUGAR, ARISE, CHEST)
- Overinterpreting subgroup analyses
Key Points
Must-Know Facts
- Study Design Hierarchy: RCTs provide highest internal validity for intervention effects; observational studies (cohort, case-control) useful for prognosis, harm, and when RCTs impractical; each design has specific bias profiles (PMID: 11034753)
- Randomization Purpose: Eliminates selection bias and balances known AND unknown confounders between groups; block randomization maintains equal group sizes; stratified randomization ensures balance for key prognostic variables (PMID: 10584742)
- Blinding Levels: Single-blind (participants), double-blind (participants + clinicians), triple-blind (participants + clinicians + outcome assessors); prevents performance and detection bias; often impossible in ICU (ventilation strategies, procedures) (PMID: 17336618)
- Statistical Significance: p-value is probability of observing result at least as extreme as observed if null hypothesis true; p < 0.05 is arbitrary threshold; does NOT indicate clinical importance or probability that treatment works (PMID: 27209009)
- Confidence Intervals: 95% CI means if study repeated infinitely, 95% of calculated CIs would contain true effect; width indicates precision; clinically interpret whether CI excludes clinically important effects (PMID: 15951599)
- NNT/NNH: Number Needed to Treat = 1/ARR; clinically meaningful; always report with time frame and baseline risk; NNT of 20 acceptable for mortality, unacceptable for minor symptom relief (PMID: 7769685)
- Sample Size and Power: Power = probability of detecting true effect if one exists; typically target 80-90%; sample size depends on expected effect size, variability, and alpha level; underpowered studies risk false negatives (PMID: 8683809)
- Meta-Analysis: Quantitatively combines multiple studies; increases statistical power; identifies heterogeneity (I² statistic); forest plots display individual and pooled effects; vulnerable to publication bias (PMID: 21478451)
- GRADE System: Certainty ratings (High, Moderate, Low, Very Low) consider: risk of bias, inconsistency, indirectness, imprecision, publication bias; RCTs start high, observational starts low (PMID: 18565887)
- ANZICS-CORE CTG: Australia/NZ's major ICU research network; conducts large pragmatic RCTs (SAFE, NICE-SUGAR, ARISE, CHEST, TRANSFUSE, SPICE III); embedded research model with high recruitment and protocol adherence (PMID: 27849418)
Memory Aids
Study Design Hierarchy Mnemonic (MRCCE):
- M: Meta-analyses of RCTs (highest evidence)
- R: Randomized Controlled Trials
- C: Cohort studies (prospective)
- C: Case-control studies
- E: Expert opinion/case series (lowest evidence)
Bias Types Mnemonic (SPADE):
- S: Selection bias (groups differ at baseline)
- P: Performance bias (unequal treatment)
- A: Attrition bias (differential dropout)
- D: Detection bias (outcome assessment differs)
- E: Evaluation/reporting bias (selective outcome reporting)
GRADE Downgrade Factors (RIIPP):
- R: Risk of bias
- I: Inconsistency (heterogeneity)
- I: Indirectness (evidence not directly applicable to the question)
- P: Imprecision (wide CI)
- P: Publication bias
Forest Plot Elements (POINT):
- P: Point estimate (square)
- O: Overall effect (diamond)
- I: Interval (horizontal line = 95% CI)
- N: Null effect (vertical line at 1.0 for RR or 0 for MD)
- T: Total (sample sizes on right)
Definition and Epidemiology
Definition
Research Methodology refers to the systematic framework of principles, procedures, and techniques used to design, conduct, analyze, and interpret clinical research studies. In the ICU context, this encompasses the unique challenges of conducting research in critically ill populations.
Key Terminology:
| Term | Definition |
|---|---|
| Internal Validity | Extent to which study results reflect true relationship between intervention and outcome in study population |
| External Validity (Generalizability) | Extent to which study results can be applied to other populations and settings |
| Efficacy | Effect of intervention under ideal, controlled conditions |
| Effectiveness | Effect of intervention in real-world clinical practice |
| Precision | Inverse of random error; reflects sample size; measured by confidence interval width |
| Bias | Systematic error that distorts study results away from true effect |
| Confounding | Distortion of treatment effect by a third variable associated with both exposure and outcome |
Research Question Frameworks:
| Framework | Components | Use |
|---|---|---|
| PICO | Population, Intervention, Comparator, Outcome | Therapeutic questions |
| PECO | Population, Exposure, Comparator, Outcome | Observational studies |
| SPIDER | Sample, Phenomenon of Interest, Design, Evaluation, Research type | Qualitative research |
Epidemiology of ICU Research
Global ICU Research Landscape:
- Published RCTs: >500 ICU-focused RCTs published annually (PMID: 28267352)
- Meta-analyses: >100 ICU-relevant systematic reviews/meta-analyses per year
- Practice-changing trials: Approximately 1-2 major trials per year significantly alter ICU practice
- Research waste: Estimated 50-85% of biomedical research is "wasted" due to poor design, conduct, or reporting (PMID: 24411643)
Australian/NZ ICU Research:
ANZICS Clinical Trials Group (CTG) represents one of the world's most productive ICU research collaborations:
- Established: 1994 as collaborative research network
- Participating sites: >60 ICUs across Australia and New Zealand
- Major RCTs completed: >15 large pragmatic trials
- Patient enrollment: >50,000 patients enrolled in RCTs (PMID: 27849418)
- NHMRC funding: >$100 million in competitive research grants
Impact of Australian/NZ ICU Research:
| Trial | Year | Finding | Practice Impact |
|---|---|---|---|
| SAFE | 2004 | Albumin = saline for resuscitation | Stopped routine albumin for resuscitation |
| NICE-SUGAR | 2009 | Intensive glucose control increases mortality | Changed glucose targets to 6-10 mmol/L |
| ARISE | 2014 | EGDT = usual care for sepsis | Simplified sepsis resuscitation |
| CHEST | 2012 | HES increases RRT need | HES removed from most ICUs |
| TRANSFUSE | 2017 | Fresh blood = standard blood | Removed requirement for fresh blood |
| SPICE III | 2019 | Dexmedetomidine = standard sedation | No change but informed practice |
| RECOVERY (UK-led; not an ANZICS CTG trial) | 2020 | Dexamethasone reduces COVID mortality | Global practice change |
Indigenous Health Research Considerations:
Research involving Aboriginal and Torres Strait Islander peoples and Maori must adhere to specific guidelines:
- NHMRC Guidelines for Ethical Conduct in Aboriginal and Torres Strait Islander Health Research (2018)
- Te Ara Tika - Guidelines for Maori Research Ethics (2010)
- Community consultation and partnership essential
- Indigenous governance structures for research oversight
- Data sovereignty considerations
- Benefit-sharing principles
Applied Basic Sciences
Study Design Fundamentals
Experimental vs Observational Studies:
| Feature | Experimental (RCT) | Observational |
|---|---|---|
| Intervention allocation | Investigator assigned | Natural exposure |
| Randomization | Yes | No |
| Causal inference | Strong | Limited |
| Confounding control | Randomization | Statistical adjustment |
| Ethical constraints | Cannot assign harmful exposures | Fewer (no assigned exposure), but consent, privacy, and ethics approval still apply |
| Generalizability | May be limited by strict criteria | Often higher |
Randomized Controlled Trials (RCTs)
Types of RCT Designs:
1. Parallel Group RCT:
- Most common design
- Participants randomized to intervention or control
- Each participant receives single treatment
- Example: NICE-SUGAR (intensive vs conventional glucose control) (PMID: 19318384)
2. Crossover RCT:
- Participants receive both treatments in sequence
- Each participant serves as own control
- Requires washout period
- Unsuitable for ICU (high mortality, treatment effects persist)
3. Factorial Design:
- Tests two or more interventions simultaneously
- 2x2 factorial: 4 groups (A only, B only, A+B, neither)
- Efficient for testing interactions
- Example: BaSICS (balanced solution vs saline AND slower vs faster infusion rate)
4. Cluster Randomized Trial:
- Randomization at unit/hospital level, not individual
- Essential when intervention applies at ICU level
- Requires larger sample sizes (design effect)
- Example: ABCDEF bundle implementation studies
- Statistical considerations: intracluster correlation coefficient (ICC) (PMID: 22169231)
5. Stepped-Wedge Design:
- All clusters start as control, sequentially switch to intervention
- Pragmatic for implementation research
- All sites eventually receive intervention
- Example: Quality improvement interventions (PMID: 21680799)
6. Adaptive Trial Designs:
- Pre-planned modifications based on interim data
- Response-adaptive randomization
- Sample size re-estimation
- Seamless Phase II/III designs
- Bayesian adaptive designs
- Example: REMAP-CAP (multi-factorial adaptive platform) (PMID: 32678530)
Observational Study Designs
Cohort Studies:
- Follow exposed and unexposed groups over time
- Prospective or retrospective
- Best for: incidence, prognosis, harm (when RCT unethical)
- Measure: Relative Risk (RR), Hazard Ratio (HR)
- Limitations: confounding, selection bias, loss to follow-up
- ICU example: observational comparisons of early vs late PN in ICU, the question later tested by randomization in EPaNIC (PMID: 21714640)
Case-Control Studies:
- Compare cases (with outcome) to controls (without)
- Retrospective: looks backward from outcome to exposure
- Best for: rare outcomes, multiple exposures
- Measure: Odds Ratio (OR)
- Limitations: selection of controls, recall bias
- ICU example: Risk factors for ARDS (PMID: 11794169)
Before-After (Pre-Post) Studies:
- Compare outcomes before and after intervention
- No concurrent control group
- Vulnerable to secular trends, regression to mean
- Best used with interrupted time series analysis
- ICU example: Checklist implementation studies
Interrupted Time Series (ITS):
- Multiple data points before and after intervention
- Controls for secular trends
- Strongest quasi-experimental design
- Requires sufficient pre-intervention data points (typically ≥8)
- Example: Bundle implementation studies (PMID: 15187053)
Bias Types and Prevention
Selection Bias:
Definition: Systematic differences in baseline characteristics between groups
Types in ICU research:
- Allocation bias (non-random assignment)
- Self-selection (differential consent)
- Survivorship bias (analyzing only survivors)
- Immortal time bias (misclassifying follow-up time)
Prevention:
- Randomization with allocation concealment
- Intention-to-treat analysis
- Propensity score matching (observational studies)
- Instrumental variable analysis
Performance Bias:
Definition: Systematic differences in care provided apart from intervention
Causes:
- Unblinded clinicians provide different co-interventions
- Hawthorne effect (behavior change due to observation)
- Expertise bias (intervention delivered by specialists)
Prevention:
- Double-blinding (often impossible in ICU)
- Protocolized care for both groups
- Pragmatic trial design
Detection Bias:
Definition: Systematic differences in outcome assessment between groups
Causes:
- Unblinded outcome assessors
- Different follow-up intensity
- Differential use of diagnostic tests
Prevention:
- Blinded outcome assessment
- Central adjudication committee
- Objective outcome measures (mortality)
- Identical follow-up protocols
Attrition Bias:
Definition: Systematic differences in withdrawals/losses between groups
Causes:
- Differential dropout
- Loss to follow-up
- Missing data
Prevention:
- Intention-to-treat analysis
- Multiple imputation for missing data
- Sensitivity analyses
- High follow-up rates (>95%)
Reporting Bias:
Definition: Selective reporting of outcomes based on results
Types:
- Publication bias (positive studies published more)
- Outcome reporting bias (only significant outcomes reported)
- Time-lag bias (positive studies published faster)
Prevention:
- Pre-registration of trial protocol
- Published statistical analysis plan
- Reporting all pre-specified outcomes
- ClinicalTrials.gov or ANZCTR registration
Randomization Methods
Simple Randomization:
- Computer-generated random sequence
- Equal probability each treatment
- May produce imbalanced groups in small trials
- Appropriate for large trials (n > 200)
Block Randomization:
- Ensures equal numbers in each group at regular intervals
- Block sizes typically 4, 6, 8
- Random block sizes prevent prediction
- Most common method in ICU trials
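The block randomization points above can be sketched in a few lines of Python. This is a minimal illustration, not a trial-grade allocation system: the function name `block_randomization`, the arm labels, and the seed are hypothetical, and it assumes every block size is divisible by the number of arms.

```python
import random

def block_randomization(n_participants, block_sizes=(4, 6, 8), arms=("A", "B"), seed=None):
    """Return an allocation sequence built from randomly permuted blocks."""
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < n_participants:
        size = rng.choice(block_sizes)             # varying block size hinders prediction
        block = list(arms) * (size // len(arms))   # equal numbers of each arm in the block
        rng.shuffle(block)                         # random order within the block
        sequence.extend(block)
    return sequence[:n_participants]               # last block may be cut short

allocations = block_randomization(20, seed=1)
counts = {arm: allocations.count(arm) for arm in ("A", "B")}
print(allocations)
print(counts)
```

Because each complete block is balanced, the two arms can differ by at most half of the largest block size at any point in recruitment. In a real trial the sequence would be generated and held centrally so that allocation stays concealed from recruiting clinicians.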
Stratified Randomization:
- Separate randomization within strata of key prognostic variables
- Ensures balance for important confounders
- Typically limit to 2-3 stratification factors
- Example: Stratify by site and baseline severity
Minimization:
- Adaptive randomization balancing multiple factors simultaneously
- Not truly random but achieves better balance
- Controversial but widely used
- Example: Balancing by site, age, severity simultaneously
Cluster Randomization:
- Randomize groups (ICUs, hospitals) not individuals
- Accounts for contamination and unit-level interventions
- Requires inflation of sample size (design effect = 1 + (m-1) x ICC)
- ICC typically 0.02-0.05 for ICU outcomes
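As a worked illustration of the design effect formula above, the sketch below inflates an individually randomized sample size for a cluster trial. The cluster size of 50 and the starting sample of 1,000 are hypothetical values chosen only for the example.

```python
import math

def design_effect(cluster_size, icc):
    """Design effect = 1 + (m - 1) x ICC, where m is the average cluster size."""
    return 1 + (cluster_size - 1) * icc

def inflate_sample_size(n_individual, cluster_size, icc):
    """Total sample needed under cluster randomization, rounded up."""
    inflated = n_individual * design_effect(cluster_size, icc)
    return math.ceil(round(inflated, 6))  # round first to avoid floating-point overshoot

deff = design_effect(cluster_size=50, icc=0.02)
n_cluster = inflate_sample_size(1000, cluster_size=50, icc=0.02)
print(f"Design effect: {deff:.2f}")
print(f"Sample size after inflation: {n_cluster}")
```

Even an ICC at the low end of the quoted range nearly doubles the required sample when clusters are large, which is why cluster trials need many units rather than many patients per unit.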
Blinding Levels
| Level | Who is Blinded | Bias Prevented |
|---|---|---|
| Open-label | Nobody | None |
| Single-blind | Participants only | Placebo effect |
| Double-blind | Participants + clinicians | Performance bias |
| Triple-blind | + Outcome assessors | Detection bias |
| Quadruple-blind | + Statistician | Analysis bias |
Challenges to Blinding in ICU:
- Ventilation strategies (visible settings)
- Procedures (cannot hide surgery)
- Drug side effects (unmasking)
- Complex interventions (care bundles)
Solutions When Blinding Impossible:
- Objective primary outcome (mortality)
- Blinded outcome adjudication committee
- Protocolized co-interventions
- Pre-specified analysis plan
- PROBE design (Prospective Randomized Open Blinded Endpoint)
Statistical Concepts
P-Values and Significance Testing
Definition: The p-value is the probability of observing a result at least as extreme as that observed, if the null hypothesis were true.
Common Misconceptions:
- P-value is NOT probability that null hypothesis is true
- P-value is NOT probability that finding is due to chance
- P < 0.05 is an arbitrary threshold
- Statistical significance ≠ clinical importance
- Non-significant result ≠ no effect
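The gap between statistical and clinical significance can be made concrete with a two-proportion z-test. The sketch below uses hypothetical trial counts and a pooled-variance z-test, which is only one of several reasonable tests for this comparison.

```python
import math
from statistics import NormalDist

def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided p-value for a difference in proportions (pooled z-test)."""
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical mega-trial: 20.0% vs 19.0% mortality, 50,000 patients per arm
p_large = two_proportion_p_value(10_000, 50_000, 9_500, 50_000)
# Hypothetical small trial: 30% vs 20% mortality, 40 patients per arm
p_small = two_proportion_p_value(12, 40, 8, 40)
print(f"1% absolute difference, n=100,000: p = {p_large:.5f}")
print(f"10% absolute difference, n=80:     p = {p_small:.2f}")
```

The 1% difference is highly "significant" purely because of sample size, while the 10% difference is not, which is why effect size and confidence intervals need to be read alongside the p-value.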
ASA Statement on P-Values (2016) (PMID: 27209009):
- P-values can indicate data incompatibility with a specified model
- P-values do not measure probability that hypothesis is true
- Conclusions should not be based solely on the p < 0.05 threshold
- P-values do not measure effect size or importance
- P-values alone do not provide good evidence about a model
Multiple Comparisons Problem:
- Testing many outcomes increases false positive rate
- Bonferroni correction: α/n (very conservative)
- False Discovery Rate (FDR) control (less conservative)
- Pre-specify primary outcome
- Secondary outcomes hypothesis-generating
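To show how the two corrections above differ in practice, here is a small sketch that applies Bonferroni and the Benjamini-Hochberg FDR procedure to the same set of p-values. The five p-values are hypothetical secondary-outcome results, not data from any trial.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject only p-values at or below alpha / number of tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:   # compare p_(k) with (k/m) * alpha
            max_rank = rank                     # keep the largest rank that passes
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected

p_values = [0.001, 0.012, 0.025, 0.045, 0.20]   # hypothetical results
bonf = bonferroni(p_values)
bh = benjamini_hochberg(p_values)
print("Bonferroni rejects:", bonf)
print("BH (FDR) rejects:  ", bh)
```

On these values Bonferroni keeps only the smallest p-value, while the FDR procedure keeps the three smallest, illustrating why Bonferroni is described as very conservative.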
Confidence Intervals
Interpretation: A 95% confidence interval means that if the study were repeated infinitely, 95% of calculated intervals would contain the true population parameter.
Clinical Interpretation:
- Does the CI exclude the null effect (1.0 for RR, 0 for mean difference)?
- Does the CI exclude clinically important effects?
- How wide is the CI (precision)?
Examples:
| Finding | Interpretation |
|---|---|
| RR 0.80 (95% CI 0.65-0.98) | Statistically significant (excludes 1.0), likely clinically meaningful |
| RR 0.80 (95% CI 0.50-1.28) | Not significant, imprecise, study underpowered |
| RR 0.99 (95% CI 0.95-1.03) | Not significant but precise; true effect likely small |
| RR 0.60 (95% CI 0.30-1.20) | Not significant, wide CI, could still be important effect |
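Intervals like those in the table can be reproduced from raw counts. The sketch below uses the standard log-transformation (Katz) method for a risk ratio; the event counts are hypothetical and chosen to give RR 0.80.

```python
import math

def risk_ratio_ci(events_treat, n_treat, events_ctrl, n_ctrl, z=1.96):
    """Risk ratio with an approximate 95% CI computed on the log scale."""
    rr = (events_treat / n_treat) / (events_ctrl / n_ctrl)
    se_log_rr = math.sqrt(1 / events_treat - 1 / n_treat + 1 / events_ctrl - 1 / n_ctrl)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical trial: 160/1000 deaths with treatment vs 200/1000 with control
rr, lower, upper = risk_ratio_ci(160, 1000, 200, 1000)
print(f"RR {rr:.2f} (95% CI {lower:.2f}-{upper:.2f})")
print("Excludes null (1.0):", upper < 1.0 or lower > 1.0)
```

Rerunning the function with the same proportions but one tenth of the patients widens the interval across 1.0, which is the pattern shown in the second row of the table.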
Measures of Treatment Effect
Absolute Risk Reduction (ARR):
ARR = CER - EER
Where CER = Control Event Rate, EER = Experimental Event Rate
Relative Risk (RR):
RR = EER/CER
Relative Risk Reduction (RRR):
RRR = (CER - EER)/CER = 1 - RR
Number Needed to Treat (NNT):
NNT = 1/ARR = 1/(CER - EER)
Number Needed to Harm (NNH):
NNH = 1/|ARR|
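The formulas above can be collected into one helper. This is an illustrative sketch: the function name and the returned dictionary keys are assumptions, and the two examples reuse the event rates quoted elsewhere on this page for PROSEVA (32.8% vs 16.0%) and NICE-SUGAR (24.9% vs 27.5%).

```python
import math

def effect_measures(cer, eer):
    """Compute ARR, RR, RRR and the number needed to treat or harm.

    cer: control event rate, eer: experimental event rate (both as proportions).
    """
    arr = cer - eer                      # positive = benefit, negative = harm
    rr = eer / cer
    rrr = 1 - rr
    number_needed = math.inf if arr == 0 else 1 / abs(arr)
    label = "NNT" if arr >= 0 else "NNH"
    return {"ARR": arr, "RR": rr, "RRR": rrr, "label": label, "number_needed": number_needed}

prone = effect_measures(cer=0.328, eer=0.160)     # PROSEVA 28-day mortality
glucose = effect_measures(cer=0.249, eer=0.275)   # NICE-SUGAR 90-day mortality
for name, m in (("Prone positioning", prone), ("Intensive glucose control", glucose)):
    print(f"{name}: ARR {m['ARR']:.3f}, RR {m['RR']:.2f}, "
          f"{m['label']} {m['number_needed']:.1f}")
```

By convention an NNT is rounded up to the next whole patient, so the prone positioning figure is reported as 6.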
Clinical Context for NNT:
| Intervention | NNT | Baseline Risk | Interpretation |
|---|---|---|---|
| Prone positioning in ARDS | 6 | 33% | Very effective |
| Dexamethasone in COVID ARDS | 8 | 41% | Very effective |
| Low tidal volume ventilation | 12 | 40% | Effective |
| Early goal-directed therapy | No benefit | - | ARISE refuted |
| Tight glucose control | NNH ≈ 38 | 25% | Harmful |
NNT Varies with Baseline Risk:
The same RRR produces different NNTs depending on baseline risk:
- RRR 25%, baseline risk 40%: NNT = 10
- RRR 25%, baseline risk 10%: NNT = 40
- RRR 25%, baseline risk 4%: NNT = 100
Sample Size and Power
Key Concepts:
- Alpha (α): Probability of Type I error (false positive); typically 0.05
- Beta (β): Probability of Type II error (false negative); typically 0.10-0.20
- Power (1-β): Probability of detecting true effect; typically 80-90%
Sample Size Determinants:
| Factor | Effect on Sample Size |
|---|---|
| Larger expected effect size | Smaller sample needed |
| Smaller expected effect size | Larger sample needed |
| Lower alpha (e.g., 0.01) | Larger sample needed |
| Higher power (e.g., 90%) | Larger sample needed |
| Higher outcome variability | Larger sample needed |
| Cluster randomization | Larger sample needed |
Sample Size Formula (Simplified for Binary Outcome):
n per group = 2 × (Z(α/2) + Z(β))² × p(1 − p) / (p1 − p2)²
Where:
- Z(α/2) = 1.96 for α = 0.05 (two-sided)
- Z(β) = 0.84 for power 80%
- p = pooled proportion
- (p1 - p2) = absolute difference to detect
Example Calculation:
Detecting 5% absolute mortality reduction (30% to 25%) with 80% power:
- n per group ≈ 2 × (1.96 + 0.84)² × 0.275 × 0.725 / (0.05)²
- n per group ≈ 1,251 patients
- Total n ≈ 2,502 patients
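The simplified formula can be checked with a short script. This sketch uses Python's `statistics.NormalDist` for exact normal quantiles instead of the rounded 1.96 and 0.84, so it may differ from a hand calculation by a patient or two; the function name is an assumption.

```python
import math
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Patients per group for a two-sided comparison of two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2                           # pooled proportion
    n = 2 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) / (p1 - p2) ** 2
    return math.ceil(n)

n_80 = sample_size_two_proportions(0.30, 0.25, power=0.80)
n_90 = sample_size_two_proportions(0.30, 0.25, power=0.90)
print(f"80% power: {n_80} per group, {2 * n_80} total")
print(f"90% power: {n_90} per group, {2 * n_90} total")
```

For the 30% to 25% mortality example this gives roughly 1,250 patients per group at 80% power, and raising power to 90% increases the requirement by about a third.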
Survival Analysis
Kaplan-Meier Curves:
- Non-parametric survival estimation
- Handles censored data (patients lost to follow-up)
- Log-rank test compares curves between groups
- Median survival: time when 50% have experienced event
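A minimal Kaplan-Meier estimator shows how censoring is handled. The six follow-up times below are hypothetical, and the code is a teaching sketch without confidence bands or a log-rank test.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time where an event occurred.

    times: follow-up time per patient; events: 1 = event observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:   # group patients sharing this time
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk     # conditional survival at time t
            curve.append((t, survival))
        n_at_risk -= removed                       # events and censored both leave the risk set
    return curve

def median_survival(curve):
    """First time at which estimated survival falls to 50% or below, else None."""
    return next((t for t, s in curve if s <= 0.5), None)

times = [2, 3, 3, 5, 8, 10]     # hypothetical days of follow-up
events = [1, 1, 0, 1, 0, 1]     # 0 marks a censored patient
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"day {t}: S(t) = {s:.3f}")
print("Median survival:", median_survival(curve))
```

Censored patients contribute to the denominator up to the time they leave the study but never trigger a step down in the curve.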
Hazard Ratio (HR):
- Ratio of instantaneous event rates
- HR < 1: treatment reduces hazard
- HR > 1: treatment increases hazard
- Assumes proportional hazards (constant HR over time)
Cox Proportional Hazards Model:
- Multivariate survival analysis
- Adjusts for confounders
- Hazard function: h(t) = h₀(t) × exp(β₁X₁ + β₂X₂ + ...)
- Proportional hazards assumption should be tested
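The link between the Cox model coefficients and the hazard ratio follows directly from the hazard function above. As a short derivation, assuming X₁ is a binary treatment indicator and the other covariates are held fixed:

```latex
\[
\mathrm{HR}
= \frac{h(t \mid X_1 = 1, X_2, \dots)}{h(t \mid X_1 = 0, X_2, \dots)}
= \frac{h_0(t)\,\exp(\beta_1 \cdot 1 + \beta_2 X_2 + \dots)}
       {h_0(t)\,\exp(\beta_1 \cdot 0 + \beta_2 X_2 + \dots)}
= \exp(\beta_1)
\]
```

The baseline hazard h₀(t) cancels, so the ratio does not depend on time. That cancellation is exactly the proportional hazards assumption, and it is why the assumption needs to be checked before an HR is interpreted as a single summary number.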
Meta-Analysis and Systematic Reviews
Systematic Review Methodology
PRISMA Statement (2020) (PMID: 33782057):
Key Components:
- Protocol registration: PROSPERO database
- Eligibility criteria: PICO framework
- Search strategy: Multiple databases (PubMed, EMBASE, Cochrane, etc.)
- Study selection: Dual independent screening
- Data extraction: Standardized forms
- Risk of bias assessment: Cochrane RoB 2.0 or ROBINS-I
- Synthesis: Narrative or quantitative (meta-analysis)
- GRADE assessment: Certainty of evidence
Search Databases for ICU Research:
- PubMed/MEDLINE (primary)
- EMBASE (European emphasis)
- Cochrane Central (RCTs)
- CINAHL (nursing literature)
- ClinicalTrials.gov (ongoing trials)
- ANZCTR (Australian/NZ trials)
- Grey literature (conference abstracts)
Meta-Analysis Methodology
Fixed-Effect vs Random-Effects Models:
| Feature | Fixed-Effect | Random-Effects |
|---|---|---|
| Assumption | One true effect size | True effects vary across studies |
| Weights | Based on precision only | Precision + between-study variance |
| CI width | Narrower | Wider |
| Interpretation | Effect in similar populations | Average effect across populations |
| Use when | Studies methodologically similar | Heterogeneity expected |
Heterogeneity Assessment:
I² Statistic:
- Percentage of variability due to true differences (not chance)
- I² = 0%: No heterogeneity
- I² = 25%: Low heterogeneity
- I² = 50%: Moderate heterogeneity
- I² = 75%: High heterogeneity
Cochran's Q Test:
- Tests whether heterogeneity is statistically significant
- Low power in meta-analyses with few studies
- p < 0.10 often used as threshold
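Cochran's Q and I² can be computed from study effects and their standard errors. The sketch below works on log risk ratios with inverse-variance weights; the four effect estimates are hypothetical.

```python
def heterogeneity(effects, std_errors):
    """Return (fixed-effect pooled estimate, Cochran's Q, I-squared in percent)."""
    weights = [1 / se ** 2 for se in std_errors]           # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, i_squared

log_rr = [-0.40, -0.10, -0.35, 0.05]    # hypothetical log risk ratios
se = [0.15, 0.12, 0.20, 0.18]           # hypothetical standard errors
pooled, q, i2 = heterogeneity(log_rr, se)
print(f"Pooled log RR: {pooled:.3f}")
print(f"Cochran's Q: {q:.2f} on {len(log_rr) - 1} df")
print(f"I-squared: {i2:.0f}%")
```

I² is derived from Q as (Q − df)/Q and truncated at zero, so it describes the proportion of variability beyond chance rather than the absolute size of between-study differences.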
Sources of Heterogeneity:
- Clinical: Different populations, interventions, co-interventions
- Methodological: Different study designs, risk of bias
- Statistical: Different outcome measures, time points
Subgroup and Sensitivity Analyses:
- Pre-specified in protocol
- Explore sources of heterogeneity
- Limit to credible subgroups (biological plausibility)
- Multiple testing increases false positives
- Never data-dredge for favorable subgroups
Forest Plot Interpretation
Components:
| Study | Control events/total | Treatment events/total | Weight | RR (95% CI) | Plot |
|---|---|---|---|---|---|
| Study A | 15/100 | 10/100 | 15.2% | 0.67 (0.32-1.38) | ----■---- |
| Study B | 30/200 | 22/200 | 22.8% | 0.73 (0.44-1.22) | ---■--- |
| Study C | 45/300 | 30/300 | 35.5% | 0.67 (0.44-1.01) | --■-- |
| Study D | 25/150 | 18/150 | 26.5% | 0.72 (0.42-1.24) | ---■--- |
| Total | 115/750 | 80/750 | 100% | 0.70 (0.54-0.90) | <■> |
Heterogeneity: I² = 0%, p = 0.98
Test for overall effect: Z = 2.65, p = 0.008
Axis: Favors Treatment (RR < 1) | Favors Control (RR > 1)
Interpretation Checklist:
- Overall effect: Point estimate and 95% CI
- Does CI cross null line (RR = 1.0)?
- Heterogeneity: I² value and Q-test p-value
- Individual study weights
- Direction and consistency of individual studies
- Precision of estimates (CI width)
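The pooled estimate in a forest plot can be approximated from the event counts. The sketch below applies fixed-effect inverse-variance pooling to the four illustrative studies shown above. This is an assumption about method: the plot does not state how its weights were derived, so the weights and interval computed here will not match it exactly.

```python
import math

# (treatment events, treatment total, control events, control total)
studies = {
    "Study A": (10, 100, 15, 100),
    "Study B": (22, 200, 30, 200),
    "Study C": (30, 300, 45, 300),
    "Study D": (18, 150, 25, 150),
}

def pooled_risk_ratio(studies, z=1.96):
    """Fixed-effect inverse-variance pooled RR with an approximate 95% CI."""
    weights, log_rrs = [], []
    for e_t, n_t, e_c, n_c in studies.values():
        log_rrs.append(math.log((e_t / n_t) / (e_c / n_c)))
        variance = 1 / e_t - 1 / n_t + 1 / e_c - 1 / n_c   # variance of log RR
        weights.append(1 / variance)
    pooled_log = sum(w * y for w, y in zip(weights, log_rrs)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    percent = [100 * w / sum(weights) for w in weights]
    ci = (math.exp(pooled_log - z * se), math.exp(pooled_log + z * se))
    return math.exp(pooled_log), ci, percent

pooled_rr, (ci_low, ci_high), percent_weights = pooled_risk_ratio(studies)
for name, w in zip(studies, percent_weights):
    print(f"{name}: weight {w:.1f}%")
print(f"Pooled RR {pooled_rr:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

None of the individual studies excludes 1.0, yet the pooled interval does, which is the gain in statistical power that motivates meta-analysis.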
Publication Bias Assessment
Funnel Plot:
- Plot of effect size vs precision (usually standard error)
- Symmetric if no publication bias
- Asymmetry suggests publication bias or heterogeneity
Egger's Test:
- Statistical test for funnel plot asymmetry
- Regression of standardized effect on precision
- Significant p-value suggests publication bias
Trim and Fill Method:
- Imputes "missing" studies
- Calculates adjusted pooled estimate
- Sensitivity analysis for publication bias impact
Contour-Enhanced Funnel Plots:
- Shows significance contours
- Helps distinguish publication bias from heterogeneity
- Missing studies in non-significant zones suggest publication bias
Quality Assessment
Cochrane Risk of Bias Tool (RoB 2.0)
Domains (PMID: 31462531):
| Domain | Assessment |
|---|---|
| Randomization process | Allocation sequence generation, concealment, baseline differences |
| Deviations from intervention | Blinding, protocol deviations, co-interventions |
| Missing outcome data | Completeness, reasons for missingness, handling |
| Outcome measurement | Blinding of assessors, objective vs subjective |
| Selection of reported result | Pre-specification, multiple outcomes, subgroups |
Overall Judgment:
- Low risk: Low risk in all domains
- Some concerns: Concerns in at least one domain
- High risk: High risk in at least one domain, or multiple concerns
GRADE System
GRADE (Grading of Recommendations Assessment, Development and Evaluation) (PMID: 18565887):
Starting Points:
- RCTs start as HIGH certainty
- Observational studies start as LOW certainty
Downgrade Factors (each can reduce by 1-2 levels):
| Factor | Considerations |
|---|---|
| Risk of bias | Methodological limitations, lack of blinding, allocation concealment |
| Inconsistency | Heterogeneity (I² > 50%), different directions of effect |
| Indirectness | Different PICO from question, surrogate outcomes |
| Imprecision | Wide CI, small sample, few events, OIS not met |
| Publication bias | Funnel plot asymmetry, commercial sponsorship |
Upgrade Factors (for observational studies):
| Factor | Considerations |
|---|---|
| Large effect | RR > 2 or < 0.5 (upgrade 1); RR > 5 or < 0.2 (upgrade 2) |
| Dose-response | Clear gradient |
| Confounding | Residual confounding would reduce effect |
GRADE Certainty Levels:
| Level | Definition | Symbol |
|---|---|---|
| High | Very confident effect estimate close to true | ⊕⊕⊕⊕ |
| Moderate | Moderately confident; true effect likely close to estimate | ⊕⊕⊕⊝ |
| Low | Confidence limited; true effect may be substantially different | ⊕⊕⊝⊝ |
| Very Low | Very little confidence; true effect likely substantially different | ⊕⊝⊝⊝ |
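The GRADE starting points and adjustments described above amount to simple bookkeeping, sketched below. The function is a teaching aid with hypothetical names; real GRADE ratings rest on structured judgment for each domain rather than arithmetic alone.

```python
GRADE_LEVELS = ["Very Low", "Low", "Moderate", "High"]

def grade_certainty(design, downgrades=0, upgrades=0):
    """Return the GRADE certainty level after applying level changes.

    design: "rct" starts High, "observational" starts Low.
    downgrades / upgrades: total number of levels moved down or up.
    """
    if design not in ("rct", "observational"):
        raise ValueError("design must be 'rct' or 'observational'")
    start = 3 if design == "rct" else 1
    index = max(0, min(3, start - downgrades + upgrades))   # clamp to the four levels
    return GRADE_LEVELS[index]

# RCT evidence downgraded once for risk of bias and once for imprecision
print(grade_certainty("rct", downgrades=2))
# Observational evidence upgraded one level for a large effect
print(grade_certainty("observational", upgrades=1))
```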
Optimal Information Size (OIS)
Concept: The sample size an adequately powered single RCT would need to detect the effect.
Purpose: Assess whether cumulative evidence is sufficient for conclusions.
Calculation: Same as single trial sample size calculation.
Application:
- If total n < OIS: CI is imprecise, cannot rule out clinically important effects
- If total n ≥ OIS: CI is informative
- Sequential analysis (Trial Sequential Analysis) accounts for multiple comparisons in cumulative meta-analyses
Critical Appraisal of Key ICU Trials
Landmark Australian/NZ Trials
SAFE Trial (2004) - Albumin vs Saline (PMID: 15163774)
Design: Multicenter, blinded RCT
Population: 6,997 ICU patients requiring fluid resuscitation
Intervention: 4% albumin vs 0.9% saline
Primary Outcome: 28-day mortality
Result: No difference (RR 0.99, 95% CI 0.91-1.09)
Critical Appraisal:
- Strengths: Large sample, blinding, multicenter, protocol adherence
- Limitations: Open-label after unblinding, heterogeneous population
- Subgroups: TBI (albumin harmful), sepsis (albumin trend beneficial)
- Impact: Stopped routine albumin use; prompted ALBIOS, SAFE-TBI
NICE-SUGAR Trial (2009) - Glucose Control (PMID: 19318384)
Design: Multicenter RCT, 42 centers
Population: 6,104 ICU patients requiring ≥3 days ICU
Intervention: Intensive (4.5-6.0 mmol/L) vs conventional (≤10 mmol/L)
Primary Outcome: 90-day mortality
Result: Intensive control increased mortality (27.5% vs 24.9%; OR 1.14, 95% CI 1.02-1.28)
Critical Appraisal:
- Strengths: Pragmatic, large, multicenter, long-term outcome
- Limitations: Open-label, hypoglycemia rates (6.8% vs 0.5%)
- Mechanism: Hypoglycemia likely mediator of harm
- Impact: Global practice change to target 6-10 mmol/L
ARISE Trial (2014) - EGDT in Sepsis (PMID: 25272316)
Design: Multicenter RCT, 51 centers
Population: 1,600 patients with early septic shock
Intervention: EGDT protocol vs usual care
Primary Outcome: 90-day mortality
Result: No difference (18.6% vs 18.8%; p = 0.90)
Critical Appraisal:
- Strengths: Multicenter, pragmatic, low crossover
- Limitations: Lower mortality than Rivers (2001), usual care improved
- Context: ProCESS and ProMISe showed same result
- Impact: Simplified sepsis resuscitation; deconstructed EGDT bundle
CHEST Trial (2012) - HES vs Saline (PMID: 23075127)
Design: Multicenter RCT, 32 centers
Population: 7,000 ICU patients requiring fluid resuscitation
Intervention: 6% HES 130/0.4 vs 0.9% saline
Primary Outcome: 90-day mortality
Result: No mortality difference; increased RRT with HES (7.0% vs 5.8%; RR 1.21, 95% CI 1.00-1.45)
Critical Appraisal:
- Strengths: Large, blinded, pragmatic
- Limitations: Saline control (not balanced crystalloid)
- Safety signals: Increased pruritus, RRT
- Impact: HES largely abandoned in ICU; regulatory warnings
TRANSFUSE Trial (2017) - Fresh vs Standard Blood (PMID: 28151185)
Design: Multicenter RCT, 59 centers
Population: 4,919 ICU patients requiring RBC transfusion
Intervention: Freshest available (mean storage about 12 days) vs standard issue (mean about 22 days)
Primary Outcome: 90-day mortality
Result: No difference (24.8% vs 24.1%; p = 0.69)
Critical Appraisal:
- Strengths: Pragmatic, large, clinically important question
- Limitations: Mean age difference only 11 days
- Impact: Removed requirement for preferential fresh blood
International Landmark Trials
ARDSNet ARMA Trial (2000) - Low Tidal Volume (PMID: 10793162)
Design: Multicenter RCT, 10 centers
Population: 861 patients with ARDS
Intervention: 6 mL/kg vs 12 mL/kg PBW
Primary Outcome: Hospital mortality
Result: Mortality reduced (31.0% vs 39.8%; RR 0.78; ARR 8.8%, NNT ≈ 12)
Critical Appraisal:
- Strengths: Clear benefit, biological plausibility
- Limitations: Control arm tidal volumes higher than usual care
- Ongoing debate: Optimal PEEP (subsequent ALVEOLI, LOVS, EXPRESS trials)
- Impact: Universal adoption of 6 mL/kg PBW
PROSEVA Trial (2013) - Prone Positioning (PMID: 23688302)
Design: Multicenter RCT, 27 centers (France/Spain)
Population: 466 patients with severe ARDS (P/F < 150)
Intervention: Prone ≥16h/day vs supine
Primary Outcome: 28-day mortality
Result: Mortality reduced (16.0% vs 32.8%; RR 0.49; NNT 6)
Critical Appraisal:
- Strengths: Dramatic effect size, consistent with physiology
- Limitations: Highly selected population, experienced centers
- Context: Earlier negative trials used shorter proning duration
- Impact: Standard of care for severe ARDS (P/F < 150)
RECOVERY Trial (2020) - Dexamethasone in COVID-19 (PMID: 32678530)
Design: Multicenter, adaptive platform RCT
Population: 6,425 hospitalized COVID-19 patients
Intervention: Dexamethasone 6 mg daily for up to 10 days vs usual care
Primary Outcome: 28-day mortality
Result: Mortality reduced overall (22.9% vs 25.7%); greatest benefit in ventilated patients (29.3% vs 41.4%; NNT 8)
Critical Appraisal:
- Strengths: Rapid enrollment, pragmatic, adaptive design
- Limitations: Open-label; no benefit in patients not receiving supplemental oxygen
- Impact: Global practice change within weeks of publication
ANZICS-CORE Clinical Trials Group
Structure and Function (PMID: 27849418):
The ANZICS Clinical Trials Group (CTG) represents a model for efficient, high-quality ICU research:
Governance:
- Research sub-committee of ANZICS
- Executive committee with rotating membership
- Site principal investigators at each participating ICU
- Central coordinating center (George Institute, Monash University)
Methodology Innovations:
- Embedded research model within existing ICU infrastructure
- Waiver of consent for low-risk interventions
- Central randomization and data management
- Standardized data collection using ANZICS CORE database
- High protocol adherence (>95% in most trials)
Trial Portfolio:
| Trial | Topic | Status |
|---|---|---|
| SAFE | Albumin vs saline | Completed 2004 |
| NICE-SUGAR | Glucose control | Completed 2009 |
| CHEST | HES vs saline | Completed 2012 |
| ARISE | EGDT in sepsis | Completed 2014 |
| TRANSFUSE | Fresh vs standard blood | Completed 2017 |
| SPICE III | Dexmedetomidine sedation | Completed 2019 |
| PLUS | Balanced vs saline | Completed 2022 |
| STRESS-L | Early lactulose in cirrhosis | Ongoing |
| EXCEL | Extended infusion piperacillin | Ongoing |
Impact Metrics:
- 60 participating ICUs
- 50,000 patients enrolled
- 15 major trials completed
- Multiple NEJM, Lancet, JAMA publications
- Global practice changes from findings
Indigenous Health Research Considerations
Ethical Frameworks
NHMRC Guidelines for Ethical Conduct in Aboriginal and Torres Strait Islander Health Research:
Core Principles:
- Spirit and Integrity: Respect for cultural values, difference
- Cultural Continuity: Connection to country, language, kinship
- Equity: Fair distribution of benefits and burdens
- Reciprocity: Mutual obligations and benefit
- Respect: Recognition of rights and culture
- Responsibility: Accountability to communities
Practical Requirements:
- Community engagement before research design
- Indigenous governance committees
- Consultation with Elders and Traditional Owners
- Plain language information in appropriate language
- Community benefit-sharing
- Data sovereignty and ownership discussions
Te Ara Tika - Māori Research Ethics
Framework Principles:
- Whakapapa: Relationships and connections
- Tika: Doing what is right, ethical conduct
- Manaakitanga: Caring for and protecting participants
- Mana: Power, authority, prestige
Consultation Requirements:
- Engagement with iwi, hapū, whānau
- Māori Health Research Committee approval
- Cultural supervision and oversight
- Te reo Māori resources where appropriate
ICU-Specific Considerations
Challenges in Indigenous ICU Research:
- Higher severity at admission (later presentation)
- Family structures differ from Western nuclear model
- Decision-making may involve extended family, Elders
- Cultural protocols around death and dying
- Interpreter and cultural liaison needs
- Geographic barriers to research follow-up
- Historical research exploitation and mistrust
Best Practice Approaches:
- Partner with Aboriginal Community Controlled Health Organizations (ACCHOs)
- Engage Aboriginal Hospital Liaison Officers (AHLOs)
- Extend consent discussions to include extended family
- Provide research updates to communities, not just individuals
- Consider Indigenous-specific analyses with community guidance
- Report findings back to communities before publication
Assessment Content
SAQ Practice Questions
SAQ 1: Critical Appraisal of RCT (20 marks)
Time Allocation: 10 minutes
Stem:
You are reviewing the following abstract for journal club:
"A randomized controlled trial of intervention X vs standard care in septic shock. 200 patients were enrolled at a single center. Patients were randomized 1:1 to intervention or control. Primary outcome was 28-day mortality. Results: Mortality was 25% in intervention group vs 35% in control group (p = 0.04). Subgroup analysis showed greatest benefit in patients with APACHE II > 25."
Question 1.1 (8 marks)
Critically appraise this study. List FOUR strengths and FOUR limitations.
Model Answer:
Strengths (4 marks, 1 each):
- Randomized design: Minimizes selection bias and balances confounders
- Hard endpoint: 28-day mortality is objective and patient-centered
- Statistically significant result: p = 0.04 suggests effect unlikely due to chance alone
- Clinically meaningful effect size: 10% absolute risk reduction (ARR) is substantial
Limitations (4 marks, 1 each):
- Single-center: Limited generalizability; may not apply to other ICU settings
- Small sample size: 200 patients may be underpowered; wide confidence intervals expected
- Blinding status unknown: If unblinded, performance and detection bias possible
- Post-hoc subgroup analysis: APACHE > 25 finding is hypothesis-generating only, not confirmatory
Question 1.2 (6 marks)
Calculate the following measures of treatment effect and explain their clinical interpretation:
a) Absolute Risk Reduction (ARR)
b) Relative Risk Reduction (RRR)
c) Number Needed to Treat (NNT)
Model Answer (6 marks, 2 each):
a) ARR (2 marks):
- ARR = CER - EER = 35% - 25% = 10% (or 0.10)
- Interpretation: 10 fewer deaths per 100 patients treated with intervention X
b) RRR (2 marks):
- RRR = (CER - EER)/CER = (0.35 - 0.25)/0.35 = 28.6%
- Interpretation: Intervention X reduces relative risk of death by 28.6%
c) NNT (2 marks):
- NNT = 1/ARR = 1/0.10 = 10
- Interpretation: Need to treat 10 patients with intervention X to prevent one additional death (compared to standard care) over 28 days
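The three calculations in this model answer can be sketched as a single helper. This is an illustrative example only; the function name `treatment_effect` and the dictionary layout are assumptions, not part of any standard library.

```python
def treatment_effect(cer: float, eer: float) -> dict:
    """Return ARR, RRR and NNT from control (CER) and experimental (EER) event rates."""
    arr = cer - eer                      # absolute risk reduction
    if arr == 0:
        raise ValueError("NNT is undefined when both event rates are equal")
    rrr = arr / cer                      # relative risk reduction
    nnt = 1.0 / arr                      # number needed to treat (unrounded)
    return {"ARR": arr, "RRR": rrr, "NNT": nnt}


# Event rates from the stem: 35% control vs 25% intervention
effect = treatment_effect(cer=0.35, eer=0.25)
print({name: round(value, 3) for name, value in effect.items()})
```

The same ARR gives a different RRR at a different baseline risk, which is why both absolute and relative measures should be reported.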
Question 1.3 (6 marks)
The p-value was 0.04. Explain THREE reasons why this p-value alone is insufficient to change your clinical practice.
Model Answer (6 marks, 2 each):
- Statistical vs clinical significance (2 marks):
  - The p-value is the probability of a result at least this extreme if the null hypothesis is true
  - It does not indicate magnitude of effect or clinical importance
  - Need to examine confidence interval and effect size
- Precision and confidence interval (2 marks):
  - Small sample size likely produces a wide confidence interval
  - With p = 0.04 the CI lower limit lies close to no effect
  - Cannot exclude that the true effect is much smaller (or larger)
- External validity limitations (2 marks):
  - Single-center study may not apply to my population
  - Patient selection, co-interventions, expertise may differ
  - Need replication in larger, multicenter trials before practice change
SAQ 2: Meta-Analysis Interpretation (20 marks)
Time Allocation: 10 minutes
Stem:
You are presented with a forest plot from a meta-analysis of Intervention Y for acute respiratory failure:
| Study | Weight | RR (95% CI) |
|---|---|---|
| Study A | 15% | 0.65 (0.35-1.21) |
| Study B | 25% | 0.82 (0.58-1.16) |
| Study C | 30% | 0.73 (0.55-0.97) |
| Study D | 20% | 0.90 (0.62-1.31) |
| Study E | 10% | 0.55 (0.25-1.21) |
| Overall | 100% | 0.76 (0.62-0.93) |

Heterogeneity: I² = 22%, p = 0.27
Question 2.1 (8 marks)
Interpret this forest plot. Address the following:
a) Overall effect and statistical significance
b) Heterogeneity assessment
c) Consistency of individual study results
d) Clinical interpretation
Model Answer (8 marks, 2 each):
a) Overall effect (2 marks):
- Pooled RR = 0.76 (95% CI 0.62-0.93)
- Statistically significant as CI excludes 1.0
- Suggests 24% relative risk reduction in mortality with Intervention Y
b) Heterogeneity (2 marks):
- I² = 22% indicates low heterogeneity (< 25% threshold)
- Q-test p = 0.27 is non-significant, supporting homogeneity
- Studies likely measuring same underlying effect
c) Individual study consistency (2 marks):
- All five studies show point estimates < 1.0 (favoring intervention)
- Only Study C reaches individual significance (CI excludes 1.0)
- Consistent direction of effect across studies
d) Clinical interpretation (2 marks):
- Moderate certainty of mortality reduction (~24% RRR)
- CI excludes no effect but relatively wide
- Clinical significance depends on baseline risk and intervention feasibility
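The heterogeneity statistics in the stem are linked: I² is derived from Cochran's Q as I² = (Q - df)/Q (floored at zero), and the heterogeneity p-value is the upper tail of a chi-square distribution with df = k - 1. The sketch below, using only the standard library, backs Q out of the reported I² and recovers the p-value. The helper names are illustrative, and the closed-form tail probability used here is valid only for even degrees of freedom (df = 4 in this example).

```python
import math


def q_from_i_squared(i_squared: float, df: int) -> float:
    """Invert I^2 = (Q - df) / Q to recover Cochran's Q (valid for I^2 < 1)."""
    return df / (1.0 - i_squared)


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability, closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form used here requires positive even df")
    half = x / 2.0
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms


df = 5 - 1                               # five studies in the forest plot
q = q_from_i_squared(0.22, df)           # I^2 = 22% from the stem
p_value = chi2_sf_even_df(q, df)
print(f"Q = {q:.2f}, p = {p_value:.2f}")
```

With five studies the Q-test has little power, so a non-significant p-value supports but does not prove homogeneity.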
Question 2.2 (6 marks)
Using the GRADE framework, assess the certainty of evidence for this outcome. Identify any factors that would downgrade or upgrade the evidence.
Model Answer (6 marks):
Starting point (1 mark):
- RCTs: Start as HIGH certainty
Downgrade factors assessment (4 marks, 1 each):
- Risk of bias: Not assessed from forest plot alone; would need individual trial appraisal. Assume LOW risk if well-conducted RCTs (no downgrade)
- Inconsistency: I² = 22% (low), consistent direction. No downgrade
- Indirectness: Not assessable from data provided; assume studies directly answer clinical question. No downgrade
- Imprecision: 95% CI 0.62-0.93 is moderately wide but excludes 1.0 and clinically important harm. May consider no downgrade OR downgrade one level if OIS not met
- Publication bias: Cannot assess without funnel plot. If only 5 studies, potential concern. Consider downgrade one level if suspected
Final assessment (1 mark):
- MODERATE certainty (downgrade one level for potential imprecision or publication bias)
- Symbol: ⊕⊕⊕⊝
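The bookkeeping behind a GRADE rating can be sketched as a simple level counter. This is a deliberately simplified illustration: GRADE is a structured judgement rather than arithmetic, and the function name and arguments below are assumptions made for the example.

```python
GRADE_LEVELS = ["VERY LOW", "LOW", "MODERATE", "HIGH"]


def grade_certainty(randomized: bool, downgrades: int, upgrades: int = 0) -> str:
    """Start HIGH for RCTs (LOW for observational studies), then move by net levels."""
    start = 3 if randomized else 1
    index = start - downgrades + upgrades
    index = max(0, min(index, len(GRADE_LEVELS) - 1))  # clamp to the valid range
    return GRADE_LEVELS[index]


# Model answer above: RCT evidence, one level down (imprecision or publication bias)
rating = grade_certainty(randomized=True, downgrades=1)
print(rating)
```

Downgrading for both imprecision and publication bias would move the same body of evidence to LOW.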
Question 2.3 (6 marks)
Outline THREE potential sources of publication bias and describe TWO methods to assess publication bias in a meta-analysis.
Model Answer (6 marks):
Sources of publication bias (3 marks, 1 each):
- Positive result bias: Studies showing statistically significant results more likely to be published
- Commercial sponsorship bias: Industry-funded trials more likely published if favorable to sponsor
- Time-lag bias: Positive studies published faster than negative studies
- Language bias: English-language journals may preferentially publish certain results
- Citation bias: Positive studies more cited, appear more prominent
Assessment methods (3 marks, 1.5 each):
- Funnel plot:
  - Plots effect size vs precision (standard error)
  - Symmetric if no publication bias
  - Asymmetry suggests publication bias or heterogeneity
  - Egger's test provides statistical assessment of asymmetry
- Trim and fill method:
  - Imputes "missing" studies to create symmetric funnel plot
  - Recalculates pooled estimate including imputed studies
  - Provides sensitivity analysis for publication bias impact
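Egger's test formalizes funnel plot asymmetry by regressing each study's standardized effect (effect / SE) on its precision (1 / SE); an intercept far from zero suggests small-study effects. The sketch below shows only the regression step with hypothetical data. A full test also needs the standard error of the intercept and a t-test, which are omitted here, and the function name is an assumption.

```python
def egger_regression(effects, std_errors):
    """Ordinary least squares of (effect / SE) on (1 / SE).

    Returns (intercept, slope). The intercept estimates small-study
    asymmetry; the slope estimates the underlying effect.
    """
    z = [y / se for y, se in zip(effects, std_errors)]
    precision = [1.0 / se for se in std_errors]
    n = len(z)
    mean_x = sum(precision) / n
    mean_z = sum(z) / n
    sxx = sum((x - mean_x) ** 2 for x in precision)
    sxz = sum((x - mean_x) * (zi - mean_z) for x, zi in zip(precision, z))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    return intercept, slope


# Hypothetical log risk ratios: identical effect in every study, so no asymmetry
log_rr = [-0.27] * 5
se = [0.40, 0.32, 0.19, 0.18, 0.14]
intercept, slope = egger_regression(log_rr, se)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

Because every hypothetical study shares the same effect, the intercept is essentially zero and the slope returns that common effect. With as few as five studies any asymmetry test has low power, so the result should be read cautiously.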
Viva Scenarios
Viva 1: Study Design and Bias (12 minutes)
Stem: "You are designing a study to evaluate whether a new early mobilization protocol reduces ICU length of stay. The intervention requires training of physiotherapists and nursing staff at the ICU level."
Opening Question:
"What study design would you choose for this research question and why?"
Expected Answer (3 minutes):
Recommended design: Cluster randomized controlled trial
Rationale:
- Intervention at ICU level (staff training) cannot be applied to individuals
- Contamination likely if individual randomization (trained staff treat controls)
- Cluster = individual ICU or hospital
- Randomize clusters to intervention vs control
Design considerations:
- Need more clusters and participants than individual RCT (design effect)
- ICC for ICU LOS typically 0.02-0.05
- Stepped-wedge variant could ensure all sites eventually receive intervention
- Balance sites by size, case-mix, existing mobilization practice
Follow-up Question 1 (2 minutes):
"What biases are you particularly concerned about in this study, and how would you mitigate them?"
Expected Answer:
Key biases:
- Selection bias:
  - Cluster imbalance despite randomization (few clusters)
  - Mitigation: Stratified randomization by site characteristics
- Performance bias:
  - Unblinded intervention (impossible to blind staff training)
  - Mitigation: Standardize co-interventions, use objective outcomes
- Detection bias:
  - Unblinded outcome assessment for LOS
  - Mitigation: Use objective outcome (discharge date), central adjudication for secondary outcomes
- Hawthorne effect:
  - Both groups may improve due to study participation
  - Mitigation: Run-in period, stepped-wedge design
Follow-up Question 2 (2 minutes):
"The biostatistician tells you the ICC for ICU LOS is 0.03 and you need 20 clusters with 50 patients each. Explain why the sample size is larger than an individual RCT."
Expected Answer:
Design effect (variance inflation factor):
- DE = 1 + (m - 1) × ICC
- DE = 1 + (50 - 1) × 0.03 = 1 + 1.47 = 2.47
Explanation:
- Patients within same ICU share characteristics (correlated outcomes)
- Not fully independent observations
- Effective sample size reduced by design effect
- Must inflate sample size by DE to achieve same power
Calculation:
- If individual RCT needed 400 patients
- Cluster RCT needs: 400 × 2.47 = 988, i.e. about 1,000 patients
- With 50 patients per cluster: need 20 clusters
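The design effect calculation above can be sketched as follows. It assumes equal cluster sizes and uses the 400-patient individual-RCT figure from the worked example; the function names are illustrative.

```python
import math


def design_effect(cluster_size: int, icc: float) -> float:
    """Variance inflation factor for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1) * icc


def clusters_required(n_individual: int, cluster_size: int, icc: float) -> int:
    """Total clusters needed after inflating an individually randomized sample size."""
    inflated_n = n_individual * design_effect(cluster_size, icc)
    # Small tolerance guards against floating-point noise before rounding up
    return math.ceil(inflated_n / cluster_size - 1e-9)


de = design_effect(cluster_size=50, icc=0.03)
inflated = 400 * de
clusters = clusters_required(n_individual=400, cluster_size=50, icc=0.03)
print(f"DE = {de:.2f}, inflated n = {inflated:.0f}, clusters = {clusters}")
```

Because the design effect grows with cluster size, adding clusters is usually more efficient than adding patients per cluster.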
Follow-up Question 3 (2 minutes):
"Your trial is negative (no difference in LOS). How would you interpret this result?"
Expected Answer:
Interpretation framework:
- True negative (most likely if well-powered):
  - Early mobilization does not reduce ICU LOS
  - May still have other benefits (function, delirium)
- Type II error (false negative):
  - Underpowered study
  - Effect size smaller than anticipated
  - Need to examine confidence interval
- Implementation failure:
  - Poor protocol adherence
  - Contamination between groups
  - Need process evaluation
Key questions:
- What was actual effect size and 95% CI?
- Was sample size adequate?
- Was intervention delivered as intended?
- Were there barriers to implementation?
Follow-up Question 4 (3 minutes):
"How would you consider Indigenous health in this research design?"
Expected Answer:
Ethical considerations:
- Engage Aboriginal and Torres Strait Islander and Māori communities early
- Community consultation before protocol finalization
- Indigenous governance representation on steering committee
- NHMRC and Te Ara Tika guidelines compliance
Practical considerations:
- Include sites with high Indigenous populations
- Pre-specify Indigenous-specific subgroup analysis
- Ensure cultural appropriateness of mobilization intervention
- Consider family involvement (whānau, extended family)
- Aboriginal Hospital Liaison Officer involvement
Data and reporting:
- Collect Indigenous status with consent
- Report disaggregated outcomes with community guidance
- Benefit-sharing: report findings back to communities
- Data sovereignty considerations
- Avoid deficit framing; focus on strengths and solutions
Viva 2: Statistical Interpretation (12 minutes)
Stem: "A new trial of Drug Z for septic shock shows 90-day mortality of 28% in treatment group vs 32% in placebo group (p = 0.08, 95% CI for RR 0.79-1.02)."
Opening Question:
"How would you interpret this result?"
Expected Answer (3 minutes):
Statistical interpretation:
- Relative Risk = 28/32 = 0.875 (12.5% RRR)
- Absolute Risk Reduction = 32% - 28% = 4%
- p = 0.08 exceeds conventional 0.05 threshold
- 95% CI 0.79-1.02 includes 1.0 (null effect)
- Technically "not statistically significant"
However, clinical interpretation:
- Point estimate suggests potential 12.5% relative mortality reduction
- CI lower bound (0.79) represents meaningful benefit
- CI barely crosses 1.0 (upper bound 1.02)
- Cannot exclude clinically important benefit OR modest harm
- Study may be underpowered (Type II error possible)
Conclusion:
- Inconclusive result, not evidence of no effect
- "Absence of evidence is not evidence of absence"
- Need to examine sample size, power, and whether clinically meaningful difference was targeted
Follow-up Question 1 (2 minutes):
"Calculate the NNT for this result and discuss whether it would be clinically meaningful."
Expected Answer:
Calculation:
- ARR = 32% - 28% = 4% = 0.04
- NNT = 1/0.04 = 25
Clinical interpretation:
- Need to treat 25 septic shock patients to prevent one death
- For a life-threatening condition, NNT of 25 is potentially worthwhile
- Compare to accepted interventions:
  - Low tidal volume ventilation: NNT ~12
  - Prone positioning: NNT ~6
  - Dexamethasone in COVID (ventilated patients): NNT ~8
Considerations:
- Drug Z cost and adverse effects matter
- If inexpensive and safe, NNT 25 acceptable
- If expensive or significant toxicity, may not be worthwhile
- Need to consider NNH for adverse effects
Follow-up Question 2 (2 minutes):
"The investigators conducted post-hoc subgroup analyses and found significant benefit in patients with lactate > 4 mmol/L (p = 0.02). How do you interpret this?"
Expected Answer:
Interpretation framework for subgroups:
Caution required:
- Post-hoc (not pre-specified): exploratory only
- Multiple testing: increased false positive risk
- Biological rationale is often weak or constructed after the fact
- Subgroups may differ in other characteristics
Criteria for credible subgroup effects (PMID: 20228402):
- Pre-specified hypothesis?
- One of few hypotheses tested?
- Between-group comparison (interaction test)?
- Consistent across studies?
- Biologically plausible?
- Supported by indirect evidence?
This case:
- Post-hoc: fails criterion 1
- Likely many subgroups tested: fails criterion 2
- No interaction test reported: fails criterion 3
Conclusion:
- Hypothesis-generating only
- Would need prospective validation trial
- Should NOT change practice based on this subgroup finding
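The multiple-testing concern can be made concrete with the family-wise error rate. The sketch below assumes independent tests and a hypothetical count of 10 subgroups, since the stem does not report how many were examined; the function names are illustrative.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold that keeps the family-wise rate near alpha."""
    return alpha / n_tests


n_subgroups = 10              # hypothetical: not stated in the stem
fwer = familywise_error_rate(n_subgroups)
threshold = bonferroni_threshold(n_subgroups)
survives = 0.02 < threshold   # subgroup p-value from the stem
print(f"FWER = {fwer:.2f}, Bonferroni threshold = {threshold:.3f}, survives = {survives}")
```

Under these assumptions there is roughly a 40% chance of at least one spurious "significant" subgroup, and p = 0.02 would not survive a Bonferroni correction.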
Follow-up Question 3 (2 minutes):
"If you were designing a follow-up trial, what sample size considerations would you have?"
Expected Answer:
Sample size considerations:
Effect size:
- Observed ARR was 4% (32% vs 28%)
- Targeting a 5% ARR keeps the trial smaller but is optimistic relative to the observed effect; a conservative design would target 4% or less
- Smaller effects need larger samples
Power and alpha:
- Typically 80-90% power
- Alpha 0.05 (two-sided)
Baseline event rate:
- Control mortality 32% (from prior trial)
- For a given relative risk reduction, higher event rates need smaller samples
Sample size estimate:
- For 5% ARR (32% to 27%), 80% power, alpha 0.05:
- Approximately 1,300 patients per arm (normal approximation, no continuity correction)
- Total ~2,600 patients
Additional considerations:
- Loss to follow-up (inflate by expected dropout)
- Cluster effects if cluster randomized
- Adaptive design could allow early stopping for efficacy/futility
- Bayesian design could incorporate prior trial data
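A sample size estimate of this kind can be reproduced with the standard normal-approximation formula for comparing two proportions. The sketch below uses unpooled variance and no continuity correction, so dedicated software may return slightly different figures; the function name is an assumption.

```python
import math
from statistics import NormalDist


def n_per_arm(p_control: float, p_treat: float,
              alpha: float = 0.05, power: float = 0.80) -> int:
    """Patients per arm for a two-sided comparison of two proportions
    (normal approximation, unpooled variance, no continuity correction)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1.0 - p_control) + p_treat * (1.0 - p_treat)
    delta = p_control - p_treat
    return math.ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)


# 32% control mortality, 5% ARR, 80% power, two-sided alpha 0.05
n = n_per_arm(p_control=0.32, p_treat=0.27)
print(f"{n} per arm, {2 * n} total")
```

For these inputs the formula gives roughly 1,300 patients per arm before any inflation for loss to follow-up. Because the required sample scales with 1/ARR², targeting the observed 4% ARR instead would need substantially more patients.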
Follow-up Question 4 (3 minutes):
"Discuss the concept of clinical versus statistical significance in this scenario."
Expected Answer:
Statistical significance:
- The 0.05 threshold is arbitrary
- The p-value reflects both sample size and effect size
- Large trials can find small, clinically unimportant effects "significant"
- Small trials miss important effects (Type II error)
- Dichotomous thinking (significant vs not) is problematic
Clinical significance:
- Would this effect size change practice?
- Depends on: outcome importance, NNT, cost, harm, alternatives
- 4% ARR for mortality (NNT 25) likely clinically important
- Same 4% ARR for minor symptom would not be
This scenario:
- Statistically non-significant (p = 0.08)
- But potentially clinically important (4% ARR for mortality)
- CI includes both benefit (RR 0.79) and near-null effect (RR 1.02)
- Result is inconclusive, not negative
What should we do?
- NOT: Conclude drug doesn't work
- NOT: Recommend against use
- INSTEAD: Recognize uncertainty, await larger trial
- Consider meta-analysis with other trials if available