Evidence hierarchy, without jargon

Meta-analysis, randomized trial, observational study, animal, in vitro. What each level says, what it doesn't say, and how to read 'studies show' without getting fooled.

Translated and adapted from the canonical Portuguese version: /guias/entenda-hierarquia-evidencia.

Quick answer

Clinical evidence sits on a ladder. At the top: well-conducted meta-analyses of randomized controlled trials (RCTs). Below: individual RCTs with adequate design. Then: large observational cohort studies. Below those: case series and case reports. Near the bottom: animal models. At the base: in vitro work and mechanistic plausibility. A claim about humans built only on rat data is inference, not evidence. The ladder also has internal nuances — a meta-analysis is only as good as the trials feeding it, an RCT can be small or poorly designed, and an observational study with 100,000 people may carry useful signal that no RCT will ever produce. Knowing where a claim sits on the ladder is the first filter for reading peptide marketing in 2026.

Why this guide exists

Most peptide marketing — and a fair amount of mainstream coverage — flattens the ladder. "Studies show" is presented as if all studies were equivalent. A rat experiment, a case series of 5 people, and a phase 3 RCT with 5,000 randomized adults end up cited side-by-side as if each carried the same weight.

They don't. The point of this guide is to give the reader a framework for asking, when faced with a claim about a peptide: what level of evidence is this, and what does that level actually authorize?

This guide is not a methodology textbook. It does not replace the Cochrane Handbook, the GRADE framework, or Sackett's classic levels of evidence. It is a practical bridge between those formal frameworks and the reading the adult patient needs to do before walking into a medical visit.

The ladder, top to bottom

Meta-analysis of RCTs — top of the ladder

A meta-analysis pools data from multiple randomized controlled trials testing the same question, using statistical methods to produce a combined estimate of effect. When the included trials are well-conducted, large, and similar in design, a meta-analysis is the strongest type of evidence available in clinical medicine.

Key point: a meta-analysis is only as good as the trials inside it. Garbage in, garbage out. A meta-analysis combining 12 small biased trials produces a precise-looking number with no real validity. The Cochrane methodology — risk-of-bias assessment, heterogeneity analysis, GRADE certainty rating — exists precisely to surface these issues.

When a peptide claim cites "meta-analyses", the operational questions are: how many trials? Total participants? Risk of bias? Statistical heterogeneity? Most peptide claims that cite "meta-analysis" turn out, on inspection, to cite reviews of preclinical animal studies, a different level of evidence entirely.
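The "garbage in, garbage out" point can be made concrete with a minimal sketch of fixed-effect, inverse-variance pooling, one common way of combining trials. The twelve effect estimates and standard errors below are hypothetical numbers chosen for illustration, not data from any real trial, and the function name is an assumption of this sketch.

```python
import math

def pool_fixed_effect(effects, std_errors):
    """Inverse-variance fixed-effect pooling with Cochran's Q and I^2 (in %)."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations of each trial from the pooled value.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, pooled_se, q, i2

# Hypothetical: 12 small trials, each with a wide standard error of 0.30.
effects = [0.35, 0.45, 0.40, 0.50, 0.30, 0.42, 0.38, 0.47, 0.33, 0.44, 0.41, 0.36]
std_errors = [0.30] * 12

pooled, pooled_se, q, i2 = pool_fixed_effect(effects, std_errors)
low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled effect {pooled:.2f}, 95% CI {low:.2f} to {high:.2f}, I^2 {i2:.0f}%")
```

Pooling shrinks the standard error from 0.30 per trial to roughly 0.09 overall, and the heterogeneity statistic looks reassuring. Nothing in the arithmetic can tell whether all twelve trials share the same bias, which is why risk-of-bias assessment has to happen outside the pooling step.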

Randomized controlled trial — second rung

The RCT is the gold standard for testing whether a specific intervention causes a specific outcome. Random allocation balances known and unknown confounders between groups, on average; blinding (masking) keeps expectation effects equal across arms and prevents assessor bias from contaminating results.

Quality markers within the RCT level:

Multicenter — reduces site-specific bias.
Double-blind — neither participants nor researchers know allocation.
Placebo-controlled — when ethical, the comparator is inert.
Adequate sample size — pre-specified power calculation.
Pre-registered protocol — primary endpoint declared before data collection (ClinicalTrials.gov, EudraCT).
Intention-to-treat analysis — participants analyzed in their assigned group regardless of dropout.

A small phase 1 trial in 20 healthy volunteers can be an RCT. A multicenter phase 3 trial in 5,000 people with the disease, 72 weeks of follow-up, and a pre-specified major adverse cardiovascular events (MACE) endpoint is also an RCT. Both belong on the same rung but carry very different weights. STEP 1 (n=1,961, semaglutide) is not equivalent to Teichman 2006 (n=43, CJC-1295): both are randomized, but the second cannot answer the question of long-term safety in clinical populations.
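One way to see why a small trial cannot settle safety is the "rule of three": if no one among n exposed people has an adverse event, the upper 95% bound on the true event rate is still about 3/n. The sketch below computes the exact version of that bound. It treats the whole sample as exposed, which is a simplifying assumption (real trials split participants between arms), and the function name is hypothetical.

```python
def upper_bound_zero_events(n, confidence=0.95):
    """Upper bound on the event rate when 0 events are seen in n exposed people.

    Solves (1 - p) ** n = 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n)

for n in (43, 1961):
    bound = upper_bound_zero_events(n)
    print(f"n={n}: zero events is still compatible with a rate up to {bound:.2%}")
```

With 43 people, a clean safety record still leaves room for an adverse event that affects several percent of users. With roughly 2,000 people the same clean record narrows that to a fraction of a percent.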

Cohort and case-control — observational studies

Observational studies follow people without assigning the intervention. The researcher observes what happens to those who chose (or were prescribed) treatment X compared to those who did not. The largest cohorts produce massive datasets — UK Biobank, Framingham, NHANES — that capture signals no RCT can capture, especially for rare events and very long-term outcomes.

The fundamental limitation: correlation is not causation. Confounding by indication is the classic problem: people who receive a treatment may differ systematically from people who do not, in ways the researcher may or may not be able to measure. Statistical adjustments reduce but never eliminate this risk. Mendelian randomization and propensity score matching mitigate confounding; they never fully replace randomization.
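Confounding by indication is easier to grasp with a small simulation. In the sketch below the treatment is inert by construction, baseline severity drives the outcome, and sicker people are more likely to be treated in the observational setting. All probabilities and distributions are assumptions chosen for illustration.

```python
import random

def mean(values):
    return sum(values) / len(values)

def compare_designs(n=20000, seed=0):
    """Return (observational_diff, randomized_diff) for an inert treatment.

    Each difference is mean outcome in treated minus untreated.
    A higher outcome means worse symptoms.
    """
    rng = random.Random(seed)
    obs = {True: [], False: []}
    rct = {True: [], False: []}
    for _ in range(n):
        severity = rng.gauss(0, 1)            # confounder: baseline severity
        outcome = severity + rng.gauss(0, 1)  # treatment adds nothing
        # Observational setting: sicker people are more likely to be treated.
        treated_obs = rng.random() < (0.8 if severity > 0 else 0.2)
        obs[treated_obs].append(outcome)
        # Randomized setting: a coin flip decides, ignoring severity.
        treated_rct = rng.random() < 0.5
        rct[treated_rct].append(outcome)
    return mean(obs[True]) - mean(obs[False]), mean(rct[True]) - mean(rct[False])

obs_diff, rct_diff = compare_designs()
print(f"observational: {obs_diff:+.2f}, randomized: {rct_diff:+.2f}")
```

Under these assumptions the observational comparison makes the inert treatment look harmful, simply because treated people started out sicker, while the randomized comparison lands near zero. Adjusting for measured severity would help here; the real difficulty is the severity nobody measured.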

In peptide therapy, observational evidence is currently scarce because most use happens outside structured registries (parallel markets, off-label compounding). An exception is GLP-1 in approved indications — large pharmacovigilance datasets exist for liraglutide, semaglutide, and tirzepatide.

Case series and case reports

A case series describes a small number of people receiving the intervention without a control group. Useful for generating hypotheses, identifying rare adverse events, and characterizing new clinical syndromes. Insufficient for establishing efficacy.

The Lee 2021 study cited as "human evidence for BPC-157" is a retrospective case series of 17 people with knee pain. It had no control group, was not blinded, and did not pre-specify outcomes; the reported result is self-reported pain improvement in 14 of the 16 patients contacted at follow-up. This is a hypothesis-generating signal, not an efficacy demonstration.

Animal studies: near the bottom of the ladder

Animal models — rodents, dogs, pigs, primates — are essential for early-phase pharmacological development. They authorize hypotheses about mechanism, dose-response, and potential toxicity. They do not authorize claims about humans.

The historical record is unambiguous. Most interventions that "worked in rats" failed in human trials. Estimates from oncology suggest that fewer than 10% of compounds with a positive preclinical signal in animals produce positive phase 3 results in humans. The translation gap is large and structural: interspecies differences in metabolism, receptor distribution, immune response, and tissue physiology accumulate.

For BPC-157, over 80% of indexed literature is preclinical, concentrated in one research group (Sikiric, University of Zagreb). Reading those studies as direct evidence for human use is a category error. They authorize the question — does this work in people? — not the answer.

In vitro and mechanistic plausibility

In vitro studies (cell culture, isolated organ preparations) and mechanistic plausibility arguments sit at the base of the ladder. They are the foundation on which all upstream evidence is built, but on their own they authorize nothing about clinical practice. A peptide that "modulates VEGFR2 in HUVEC cells" has a plausible mechanism; whether that mechanism translates to healing in a human ankle is a question requiring trials, not assumptions.

What changes the rung's weight

Even within the same level of evidence, study quality varies enormously. Consider these modifiers:

Sample size. A trial with n=43 and a trial with n=4,300 are both RCTs but answer different questions. Small samples can detect large effects but miss moderate ones, and they cannot characterize safety adequately.
Follow-up duration. A 12-week trial says nothing about a peptide used for years. Most peptides outside GLP-1 lack human trials longer than 12 months.
Outcome relevance. A trial measuring weight loss is different from a trial measuring cardiovascular events. Surrogate outcomes (a biomarker change) are weaker than hard outcomes (death, infarction, fracture).
Risk of bias. Inadequate blinding, post-hoc analyses, undisclosed conflicts of interest, selective reporting — all reduce confidence in reported effect size.
External validity. A trial in healthy young men in Stockholm tells you something different from what a trial in older adults with diabetes in Brazil tells you. Generalization across populations is not automatic.
Replication. A single trial with a positive result is weaker evidence than three independent trials reaching the same conclusion. The replication crisis has shown that single-trial findings often shrink or disappear when reproduced.
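The sample-size modifier can be put in numbers with the standard normal-approximation formula for comparing two group means: participants per arm = 2 × (z for alpha + z for power)² / d², where d is the standardized effect size. The sketch below assumes a two-sided alpha of 0.05 and 80% power; the effect sizes 0.8, 0.5, and 0.2 are the conventional "large", "medium", and "small" labels, used here only for illustration.

```python
import math
from statistics import NormalDist

def n_per_arm(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per arm to detect a standardized effect."""
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)

for d in (0.8, 0.5, 0.2):
    print(f"effect size {d}: about {n_per_arm(d)} per arm")
```

This gives about 25, 63, and 393 participants per arm respectively. A two-arm trial with a few dozen participants in total is therefore powered only for large effects, which is the sense in which small samples "miss moderate ones".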

How to read "studies show" in peptide marketing

The practical heuristic for an adult patient or curious reader is to ask, in order:

What level of evidence is this? RCT, observational, animal, in vitro?
What is the sample size? A trial of 8 people is not a trial of 800.
What was the comparator? Placebo, active comparator, or no comparator?
What was the duration? Days, weeks, months, years?
What outcome was measured? A surrogate biomarker, a clinical endpoint, a patient-reported measure?
Was the study replicated? Single trial or convergent finding across multiple trials?
Who funded the study? Manufacturer-funded studies are not invalid, but the conflict of interest is real.

For most peptide claims circulating outside the GLP-1 class, the answers will be: animal model or in vitro work; fewer than 50 participants if there are any human data; no comparator; weeks, not months; surrogate outcome; no replication; often manufacturer-affiliated authors. That answer profile does not invalidate the question. It sets the level of confidence the reader can extend to the claim.
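For readers who like to keep notes, the seven questions above can be recorded as a simple structure that shows which ones a claim leaves unanswered. This is only a note-taking sketch: the class name and field names are hypothetical, and it deliberately does not compute any score.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class ClaimChecklist:
    """One field per question; None means the claim did not say."""
    evidence_level: Optional[str] = None  # e.g. "RCT", "observational", "animal"
    sample_size: Optional[int] = None
    comparator: Optional[str] = None      # e.g. "placebo", "active", "none"
    duration: Optional[str] = None        # e.g. "days", "weeks", "months"
    outcome: Optional[str] = None         # e.g. "surrogate", "clinical"
    replicated: Optional[bool] = None
    funder: Optional[str] = None

    def unanswered(self):
        """Names of the questions the claim leaves open, in checklist order."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

# Hypothetical marketing claim that only mentions a short, uncontrolled rat study.
claim = ClaimChecklist(evidence_level="animal", comparator="none", duration="weeks")
print(claim.unanswered())
```

The list of open questions is often as informative as the answers: a claim that never states its sample size or funder has told the reader something.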

Anecdote, testimonial, influencer

Personal testimony ("I took peptide X and felt better") sits below the formal evidence ladder. It is real experience and useful for hypothesis generation, but it is contaminated by placebo effect, memory bias, regression to the mean, and self-selection. The person who tries five peptides and reports only on the one that "worked" filters out four negatives by definition.
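The five-peptide example has simple arithmetic behind it. Suppose each product is inert, but any given trial period has some chance of coinciding with feeling better anyway (placebo response, natural recovery, regression to the mean). The 30% figure below is an assumption for illustration, not a measured rate, and the tries are treated as independent.

```python
def prob_at_least_one_apparent_success(n_tries, chance_rate):
    """Chance that at least one of n independent inert tries seems to work."""
    return 1.0 - (1.0 - chance_rate) ** n_tries

# Assumed: 30% chance of feeling better during any one try, regardless of product.
p = prob_at_least_one_apparent_success(n_tries=5, chance_rate=0.30)
print(f"chance of at least one 'success' among five inert products: {p:.0%}")
```

Under that assumption, about five in six people who try five inert products will have at least one sincere success story to tell.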

Influencer content amplifies these biases at scale. The format rewards positive testimony, dramatic before/after comparisons, and certainty: exactly the features that drive clicks, and the opposite of what cautious clinical reading requires.

Why this matters for peptides specifically

The peptide field in 2026 has a dataset asymmetry. A few molecules (liraglutide, semaglutide, tirzepatide) have multi-thousand-participant phase 3 RCTs and cardiovascular outcome trials, and the oldest of them, liraglutide, has over a decade of post-marketing pharmacovigilance. They are well up the ladder.

Most others — BPC-157, TB-500, CJC-1295, ipamorelin, MK-677 — sit much lower. The literature is dominated by animal studies, small phase 1 or 2 trials in healthy volunteers or single populations, and case series. That is the actual evidence base, regardless of how confidently catalogs and forums present it.

A reader who has internalized the ladder reads "BPC-157 has been shown to heal tendons" differently from someone who has not. The first reader knows to ask: shown where, in whom, with what design, replicated by whom? The second reader takes the claim at face value.

What we know

Well-conducted meta-analyses of large RCTs sit at the top of the clinical evidence ladder.
RCTs vary enormously in quality — sample size, duration, blinding, outcome relevance all matter.
Observational studies capture signals RCTs cannot, but cannot establish causation alone.
Most "studies show" claims in peptide marketing cite preclinical animal work or small uncontrolled human series.
Replication across independent studies strengthens any finding; single-trial findings often shrink under replication.
Within the GLP-1 class, evidence depth ranges from extensive (liraglutide, 15 years post-marketing) to early (retatrutide, phase 2 only).

What we don't yet know

Whether smaller observational signals for off-label peptide use will be confirmed by future RCTs. For most non-GLP-1 peptides, those RCTs have not been planned, let alone conducted.
The long-term post-marketing safety profile of tirzepatide and the long-term pharmacovigilance signals for retatrutide once it reaches approval.
How AI-assisted evidence synthesis tools (automated systematic reviews) will reshape the ladder. The methodology is evolving.
The right framework for evaluating peptide compounding-pharmacy data, which is rarely structured for indexed publication.

How to use this in a medical visit

The ladder is not just for reading articles. It is also a tool for the medical visit itself. Practical questions a patient can ask the prescriber:

What is the evidence supporting this peptide for my indication?
At what level — RCT in my population, RCT in another population, animal data, mechanism only?
What sample size and duration?
Is there an alternative with stronger evidence?
What are the gaps the literature has not yet filled?

A physician comfortable with their prescription will answer these questions calmly. Discomfort or dismissiveness when asked is itself a signal.

Editorial closing

The evidence ladder is not a way to dismiss any claim short of meta-analysis. It is a way to read claims with proportionate confidence. Mechanistic plausibility is a real reason to study a compound; it is not a reason to use it. Animal data justifies a phase 1 trial; it does not justify daily prescription. A small phase 1 trial in healthy volunteers justifies a phase 2 trial; it does not justify recommendation for chronic use.

The most consequential problem in peptide marketing in 2026 is not the false positive. It is level confusion: animal data presented as if it were human evidence, mechanistic plausibility presented as if it were clinical efficacy, a single-trial finding presented as if it were settled fact. Reading those claims with the ladder in mind is the first defense.

pephealth's editorial position is not skepticism for its own sake. It is calibration. Some peptides have strong evidence in narrow indications. Most have weak evidence outside investigational use. Recognizing that distinction in plain language is the work of this guide — and the work the patient continues into the medical visit.

Last reviewed: 2026-04-27. This is an English adaptation. For regulatory specifics in Brazil, see the Portuguese version.