Two Unsurprising Updates On The Ongoing Self-Immolation Of Expert Authority

Plus: an April event New Yorkers will want to come to

Mar 19, 2025

This article is about expert authority. Perfect timing for me to announce that on April 28 at 6:00 p.m., I’ll be interviewing Dave Zweig at the Village Underground in Manhattan about his upcoming book, An Abundance of Caution: American Schools, the Virus, and a Story of Bad Decisions.

You can buy tickets here, and what’s cool is that if you do, you basically get the book for free. Its list price is $40, but tickets are just under $33. (That’s the power of bulk book-ordering.) If you want to come, you should order tickets as soon as possible because a) I think it will sell out given the controversial nature of Dave’s book (and the handsomeness of the man interviewing him), and b) the sooner we know how many books to order, the sooner we can place the order. I’m not making any money off the event; rather, I’m plowing all the revenue into preorders. There will be some sort of drinks thing nearby afterward (location TBA to ticket holders).

Buy buy buy!

Last week I published an article with the straightforward title “If The New England Journal Of Medicine Doesn’t Correct This Error, You Cannot Trust Anything It Publishes.”

We have our answer!

To briefly recap: The article in question was a Perspective piece titled “The Future of Gender-Affirming Care — A Law and Policy Perspective on the Cass Review,” and it was written by the law professors Daniel G. Aaron of the University of Utah (also a physician) and Craig Konnoth of the University of Virginia.

It was a complete mess, and the talented Substacker Void if Removed did an excellent job dissecting it.

My own piece ignored most of what he covered and focused on a single two-sentence passage:

The [Cass] Review calls for evidentiary standards for GAC [gender-affirming care] that are not applied elsewhere in pediatric medicine. Embracing RCTs [randomized-controlled trials] as the standard, it finds only 2 of 51 puberty-blocker and 1 of 53 hormone studies to be high-quality.

I argued that the authors were making a plainly false claim: that the reviews in question — actually commissioned by the Cass Review and conducted by University of York researchers, rather than carried out directly by Hilary Cass and her team — graded the studies based in part on whether they were RCTs. If you simply look at the two papers in question, I wrote, it’s clear that this is wrong: Rather, the reviewers used a grading scale specifically designed for non-randomized studies. So Aaron and Konnoth’s claim needed to be corrected. I reached out to the NEJM to inquire about this, and the authors promised me a response was forthcoming. (I never heard anything from the editor of the Perspective section, Debra Malina, or from the journal’s editor in chief, Eric Rubin, despite following up with them separately.)

I nudged Aaron and Konnoth a few days ago, and you can probably guess what I’m about to tell you.

Here’s the entire response I got from Aaron, verbatim but for some formatting changes to make it more newsletter-friendly, like turning their footnotes into numbers in parentheses and adding links:

Hi Jesse,
Thanks for reaching out to us. You can find our response below. In addition, we are considering publishing our response to comments we have received. Have a great Sunday, and thanks for catching #2.
Warmly,
Profs. Aaron and Konnoth
1. Suggestion that we erred in pointing to the Cass Review’s embrace of RCTs:
We do not say that the Cass Review calls for randomized controlled trials of puberty blockers. We note that the review applies a higher standard to gender affirming care than is applied to pediatric care in general, and that it “[e]mbrac[es] RCTs as the standard.” Both statements are correct. The Cass Review states that RCTs are the “gold standard.” (1) At various points, the Review suggests deficiencies in evidence because RCTs have not been conducted. (2)
This approach is arguably even more problematic than simply outright stating that RCTs are required (again, a claim we never make about the Review). It creates the impression that RCTs are a desirable standard in this context, without outright saying that is the case, thus allowing the authors to seek to avoid criticism when the matter is raised (as this exchange reflects).
Thus, many authors and readers have concluded that the Review demands RCTs. (3) “The Cass Report sets out that randomised controlled trials (RCTs) are the gold standard to assess the efficacy of gender-affirming care (GAC), leading to the implication that the ‘research protocol’ which will be the only method of accessing puberty blockers will be an RCT.” (4) Those supportive of the Review continue its practice of impugning the evidence for lack of RCTs in this context, but still not outright demanding them. (5)
2. Pointing out that, for the puberty blockers study, 1 in 50 studies was deemed high-quality—not 2 in 51.
We agree and will request a correction.
(1) See Cass Review at 49 (identifying RCTs as the “gold standard”).
(2) See id. at 50, 177 (“There are hardly any RCTs in children and young people receiving endocrine treatment”); (“It is often the case that when an intervention is given outside a randomised control trial (RCT), a large treatment effect is seen, which sometimes disappears when an RCT is conducted.”).
(3) See Brief for Amici Curiae Expert Researchers and Physicians in Support of Petitioner, United States v. Skrmetti, 2024 WL 4122031, at *5 (“The Cass Review commits another fundamental error by holding this area of medicine to an evidentiary standard that is not required or typical in pediatrics. The Review asserts that the evidence on gender-affirming medications provided to adolescents with gender dysphoria is “weak” because the research was not based on “high quality” evidence such as randomized controlled trials (“RCTs”).).
(4) Chris Noone et al., Critically Appraising the Cass Report: Methodological Flaws and Unsupported Claims (June 11, 2024), at 9. See also Cal Horton, The Cass Review: Cis-Supremacy in the UK’s Approach to Healthcare for Trans Children, Int’l J. Transgender Health (2024), at 17 (“In the Cass Review approach affirmative healthcare is held to an unachievable standard of RCT evidence.”).
(5) See Kathleen McDeavitt, J. Cohn & Stephen B. Levine, Critiques of the Cass Review: Fact-Checking the Peer-Reviewed and Grey Literature, 51 J. Sex & Marital Therapy 175 (2025) (noting that while the Review does not call for RCTs, “randomization (non-blinded and not placebo-controlled) could reasonably involve treatment arms like psychological support and/or support for social transition” and citing claims that RCTs in pediatric populations are appropriate).

I don’t know if Aaron and Konnoth are being intentionally slippery here or are just very confused, but this isn’t a credible response. For one thing, they seem to think my claim was that they said the Cass Review “calls for” RCTs. There’s a subtle and important difference between that and what I actually emailed them about: their claim that the Cass Review used RCTs as the standard in its official grading of the available evidence (which it didn’t).

The closest they come to actually defending their own claims by citing their own words comes here:

We note that the review applies a higher standard to gender affirming care than is applied to pediatric care in general, and that it “[e]mbrac[es] RCTs as the standard.” Both statements are correct. The Cass Review states that RCTs are the “gold standard.” (1) At various points, the Review suggests deficiencies in evidence because RCTs have not been conducted. (2)

This is baffling. Clearly, their complete sentence about RCTs and the Cass Review, “Embracing RCTs [randomized-controlled trials] as the standard, it finds only 2 of 51 puberty-blocker and 1 of 53 hormone studies to be high-quality,” says exactly what I claimed it says: that RCTs were held to be the gold-standard grading criteria by the Cass Review (or, more accurately, the University of York systematic reviewers Cass hired). Aaron and Konnoth seem to be arguing that the first part of the sentence was meant to be totally disconnected from the second — that their intention was to write a sentence saying, The Cass Review embraced RCTs as a standard for judging medical research, and also — totally unrelatedly — look how few studies it judged favorably.

That’s simply not how anyone would read the sentence. Just for kicks, I queried my three favorite chatbots, ChatGPT, Perplexity, and Claude, with this slightly vanilla-fied passage (I didn’t want there to be any chance that the controversy over this subject in particular would interfere with the answers):

An author writes the following in a medical journal: “The Smith Review calls for evidentiary standards for Medicine X that are not applied elsewhere in pediatric medicine. Embracing randomized-controlled trials as the standard, it finds only 2 of 51 Category A studies and 1 of 53 Category B studies to be high-quality.”
Is the author claiming that The Smith Review, in evaluating which studies are “high-quality” versus lower, uses whether they have the characteristics of an RCT as a criterion?

It won’t surprise you that all three chatbots responded the same way: Yes, that’s what the author is claiming. That’s because these chatbots have basic reading comprehension! (Follow the footnote if you want to read their responses.)1

The most annoying part about this, aside from Aaron, Konnoth, and the NEJM’s editors trying to pretend that a sentence doesn’t say what it clearly says, is the aspersions the authors cast on the Cass Review merely for pointing out certain very basic facts.

That “gold standard” language they point to, for example, comes from the first page of a three-page “Explanatory Box” within the Cass Review simply listing some of the different types of studies found in medical research.

One of the tasks of the Cass Review was to evaluate the available evidence for youth gender medicine. In doing so, the Review pointed out that 1) RCTs are the gold standard of medical research, and 2) this area of research is lacking in them. There is no way for a good-faith reader familiar with both the report and its accompanying materials to jump from these simple observations to Aaron and Konnoth’s claims. In fact, the Cass Review ended up relying on systematic reviews which used a very forgiving scale that ensured the available studies wouldn’t get downgraded solely on the basis of not being RCTs. (And still the conclusion remained the same as all the other government-sponsored reviews of this area of research: The studies are not high-quality!) It’s remarkable that this process is being spun as “biased.” I can’t look inside the heads or hearts of Aaron or Konnoth, but in many cases, I believe, these claims are being disseminated by bad-faith actors who can’t muster substantive complaints about the Cass Review, but who are instead simply upset about what it found and who will therefore do anything they can to discredit it.

I was particularly struck by Aaron and Konnoth’s claim that the Cass Review “creates the impression that RCTs are a desirable standard in this context, without outright saying that is the case. . . . [and t]hus many authors and readers have concluded that the Review demands RCTs. (3)” They seem to be shifting the blame away from the readers of the Cass Review who are misunderstanding it and onto the author of the Review herself.

The citation here is to a brief filed in United States v. Skrmetti, the soon-to-be-decided SCOTUS case about youth gender medicine bans:

(3) See Brief for Amici Curiae Expert Researchers and Physicians in Support of Petitioner, United States v. Skrmetti, 2024 WL 4122031, at *5 (“The Cass Review commits another fundamental error by holding this area of medicine to an evidentiary standard that is not required or typical in pediatrics. The Review asserts that the evidence on gender-affirming medications provided to adolescents with gender dysphoria is “weak” because the research was not based on “high quality” evidence such as randomized controlled trials (“RCTs”).).

Claims of the sort “Many people are saying X” that cite exactly one example are always a red flag, but setting that aside, this quoted bit from the brief is clearly incorrect, for all the aforementioned reasons.

I just don’t understand how the Cass Review could have “given the impression” that it demanded RCTs for youth gender medicine when the most direct evidence we have about its views on the matter (that the systematic reviews did not hold the extant literature to this standard) points in the exact opposite direction, and when Cass’s only mentions of RCTs are to 1) explain what they are and 2) point out that non-randomized studies can have inflated effect sizes that shrink considerably when the same subject is studied in an RCT context (which is clearly true, and clearly bears on the question of the overall health of this literature).

Here’s an explanation I find much more likely than that the Cass Review was too confusing for public-health experts to understand: Early critics of the Review distorted it or misunderstood it, and then Aaron and Konnoth credulously repeated one of their false claims (as evidenced by the fact that they didn’t even read the abstracts of the research they were critiquing closely enough to accurately convey its content). Then, once their error was pointed out to them, instead of doing the right thing and correcting that error, they pretended not to have written what they clearly wrote.

This whole thing is embarrassing. If the greatest minds in the country — the people who write NEJM articles and briefs for major SCOTUS cases — are incapable of reading the Cass Review and its supplementary material and accurately conveying its content, then what is their value to policy debates?

Now, was I too harsh, as one commenter suggested, in claiming in my headline that the NEJM’s refusal to correct this would mean we shouldn’t trust anything the journal publishes anymore? Perhaps. I could have phrased it as a question, asking why, if the journal won’t make corrections, we should trust it. After all, I obviously think that the NEJM, like any other journal, publishes plenty of high-quality research that is trustworthy.

But the problem is that if a publication cannot pluck the lowest-hanging fruit imaginable — correcting clearly false claims — then that really does put readers in a difficult position with regard to evaluating that publication’s more obscure or complex claims. Surely the politics of this issue are partly to blame here, but if Malina and Rubin and the NEJM’s other higher-ups will cede accuracy to politics on this subject, who’s to say they won’t cede accuracy to politics (or other considerations) in other contexts?

I don’t have a good answer to that question.

Also, CNN!

About a year ago I published an article headlined “Why Is The Same Misleading Language About Youth Gender Medicine Copied And Pasted Into Dozens Of CNN.com Articles?” I pointed out that dozens of CNN.com articles had had the same false sentence copy-pasted into them, usually verbatim: “Gender-affirming care is medically necessary, evidence-based care that uses a multidisciplinary approach to help a person transition from their assigned gender — the one the person was designated at birth — to their affirmed gender — the gender by which one wants to be known.”

I reached out to CNN about this, didn’t get a convincing answer, and then in early November I X’ed a little bit of a victory lap, writing that “since my piece, the language hasn’t appeared agan [sic].”

What hubris on my part! It’s popped back up at least twice since I posted that:

“26 states have passed laws restricting gender-affirming care for trans youth” by Annette Choi, published December 3, 2024, contains this language: “Gender-affirming care includes medically necessary, evidence-based care that uses a multidisciplinary approach to help a person transition from their assigned gender — the one the person was designated at birth — to their affirmed gender, the gender by which one wants to be known.”

So they swapped it from is to includes — much better!

“A father of a trans man voted for Trump. Now he fears an order targeting gender-affirming care will upend his son’s treatment” by Emma Tucker, published March 15, 2025, contains a box titled “Endorsing Care” with three bullet points. One of them is: “Gender-affirming care is medically necessary, evidence-based care that uses a multidisciplinary approach to help a person transition from their assigned gender — the one the person was designated at birth — to their affirmed gender — the gender by which one wants to be known.” [emphasis in the original.]

There is clearly some standards editor or other higher-up at CNN.com, or simply an entry in a house style guide, strongly encouraging or dictating that this language be inserted into stories on this subject. If you know anything about this, send me an email!

This has been your regular update about how once-trusted sources of authority have decided to commit public ritual suicide. Stay tuned — more on this subject soon, I’m sure.

Questions? Comments? Leaked emails or Slack chats from CNN? I’m at singalminded@gmail.com, on Twitter at @jessesingal, or on Bluesky at @jessesingal.com. Image via Getty.

ChatGPT:

Yes, the author appears to be implying that The Smith Review applies a higher evidentiary standard — randomized controlled trials (RCTs) — to Medicine X than is typically applied elsewhere in pediatric medicine. By stating that The Smith Review “embraces randomized-controlled trials as the standard” and then highlighting the low number of high-quality studies found, the author suggests that the review considers RCTs to be the benchmark for determining study quality.
However, the statement does not explicitly say that The Smith Review categorically defines study quality only by whether a study is an RCT. It is possible that the review has additional criteria for assessing quality, but the author’s wording implies that the emphasis on RCTs plays a central role in the review’s assessment.

Perplexity:

Yes, the author is claiming that The Smith Review uses randomized-controlled trials (RCTs) as a criterion for evaluating the quality of studies on Medicine X. This can be inferred from the statement “Embracing randomized-controlled trials as the standard,” which suggests that The Smith Review considers RCTs as the benchmark for high-quality research.
The author points out that this standard is being applied specifically to Medicine X, while implying that such stringent criteria are not typically used in other areas of pediatric medicine. This is evident from the phrase “evidentiary standards for Medicine X that are not applied elsewhere in pediatric medicine.”
The author further supports this claim by providing statistics on the number of studies that meet this high-quality standard:

Only 2 out of 51 Category A studies are considered high-quality.
Only 1 out of 53 Category B studies is considered high-quality.

These low numbers suggest that the RCT standard being applied by The Smith Review is indeed quite stringent, resulting in very few studies being classified as high-quality.
It’s worth noting that while RCTs are often considered the gold standard for clinical research, their application as the sole criterion for quality can be controversial. Some researchers argue that other study designs can also provide valuable evidence, especially in fields where RCTs may be challenging to conduct.

Claude:

Yes, the author is claiming that The Smith Review is using randomized-controlled trials (RCTs) as the standard or criterion for evaluating study quality. The sentence “Embracing randomized-controlled trials as the standard, it finds only 2 of 51 Category A studies and 1 of 53 Category B studies to be high-quality” directly states that The Smith Review is using RCTs as its benchmark for quality assessment.
The author is also suggesting that this standard is inconsistently applied, as they note these evidentiary standards “are not applied elsewhere in pediatric medicine.” This implies the author believes The Smith Review is holding Medicine X to a higher or different standard than other areas of pediatric medicine by using RCTs as the quality criterion.

Gregg

Pedant! :D

Seriously, though, this just sort of explains why people don't trust 'science' anymore. Who has time to read all the pieces like Jesse's, let alone to do the research independently? It seems like every 'study' or 'survey' that bumps up against a political issue is subject to questions of bias, inaccuracy or worse.

This is why I tend to look at things based largely on my life's experience. I'm not saying this is a good state of affairs, but who am I going to believe, the New England Journal of Medicine, or my own lyin' eyes?

Meefburger

Something that seems a little crazy to me about this response is the double standard for who bears responsibility when readers misunderstand authors. Aaron and Konnoth seem to be defending their critique, in part, by arguing that people are interpreting the Cass Review to be making a claim, so it should be treated as if it is making that claim, and it's the authors' fault if they wanted people to think they said something different. But then they write something that most reasonable readers will interpret one way (or, at least, Jesse, me, and three LLMs), and they say it's the reader's fault if they misunderstood it.
