There are a lot of unanswered questions here, which is unfortunate.
I'm the parent of a child who had severe gender dysphoria of the ROGD variety. We didn't go down the affirmation path and his dysphoria has been gone for almost two years.
Having gone deep down the rabbit hole on this topic, here's something obvious to me. Most of the adolescents showing up at gender clinics are dealing with other traumas: not fitting in at school, being grossed out by puberty, being unable to connect with people because they're autistic. Ages 13 to 15 are the worst years for these kids. Of course they're depressed, and that pain shows up in many ways.
The ones who actually get to a doctor at a clinic are most likely going through a rough patch when they arrive. Most of them will feel better after two years, treatment or no treatment, just because they grow up a little and find some friends or a tribe to hang out with.
Any study claiming that trans kids feel better after two years of transitioning needs a really strong control group. The "feel better" effect of just being 16 instead of 14 is huge.
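To make that point concrete, here's a toy simulation (all numbers invented, nothing to do with the study's actual data): give every kid a stable baseline mood plus day-to-day noise, "enroll" only the kids who look worst on intake day, and re-measure later with no treatment at all. The selected group improves anyway, which is exactly why an untreated control group matters.

```python
# Toy model of the "feel better just by turning 16" / regression-to-the-mean
# point. All numbers are made up for illustration.
import random
import statistics

random.seed(1)

kids = [random.gauss(50, 10) for _ in range(10_000)]   # stable baseline mood
at_intake = [b + random.gauss(0, 15) for b in kids]    # noisy snapshot

# "Enroll" only the kids who look worst at intake (lowest mood scores).
enrolled = [(b, s) for b, s in zip(kids, at_intake) if s < 30]

# Two years later: re-measure with fresh noise and NO treatment at all.
followup = [b + random.gauss(0, 15) for b, _ in enrolled]

print(f"mean mood at intake:    {statistics.mean(s for _, s in enrolled):.1f}")
print(f"mean mood at follow-up: {statistics.mean(followup):.1f}")
# The untreated group "improves" substantially, purely because it was
# selected at a low point.
```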
It is right to be skeptical of science that has an agenda. These gender clinics have been doing what they've been doing to kids for years; they absolutely must prove that they're helping, and helping a lot. It's more than just their jobs at stake.
Well and clearly written! This is a difficult concept to get across, as you and Kerr have discovered. The coin flipping example was good, but I wonder if there's another example that could be given that focuses more on 'torturing the data'.
Possible XKCD example? https://xkcd.com/882/
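For anyone who wants the jelly-bean comic as a few lines of code, here's a minimal sketch (purely simulated data): run 20 two-group comparisons where the null is true in every single one, and count how many clear the usual p < 0.05 bar by luck alone.

```python
# The multiple-comparisons trap from xkcd 882, on simulated data:
# 20 comparisons, the null is true for all of them.
import random
import statistics

random.seed(0)

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = (va / len(a) + vb / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se

false_positives = 0
for test in range(20):  # 20 "jelly bean colors"
    group1 = [random.gauss(0, 1) for _ in range(100)]
    group2 = [random.gauss(0, 1) for _ in range(100)]  # same distribution!
    if abs(welch_t(group1, group2)) > 1.96:  # ~ p < 0.05 for large samples
        false_positives += 1

print(f"{false_positives} of 20 null comparisons came out 'significant'")
# With alpha = 0.05 you expect about one spurious hit per 20 tests,
# which is why unreported outcome-switching is such a problem.
```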
I'm looking forward to part 2, which I'm guessing will use the word Testosterone (or T) at least 20 times.
Excellent, and thank you. The endless harping on "appearance congruence" really got to me. I mean, yes, if you give a female testosterone, she will usually grow a beard, but is she happier?
The suicide rate in this cohort is terrible, but I know you're getting to that in part 2.
Thanks for making this public, I will share with colleagues!
Am I reading this correctly that there was no 'control' group? They had a cohort of kids treated with hormones and tracked their progress over two years, but didn't compare that to kids who didn't get hormones? The improvements they report are not hugely impressive. I'd definitely want to know how kids who didn't get blockers or hormones fared...
Am I missing something? It seems like to truly say much of anything you'd need a control group? Do they address that in the paper at all?
A variety of letters to the editor (LTTEs) regarding this study are in preparation by a number of researchers. I have submitted one along with another researcher. Once that LTTE is either published or rejected, I will post the text here.
A total layperson here. I'm not trying to be snarky; I just want to understand.
Can surveys of feelings of an individual (especially in a hot button area) really tell us anything?
I've lied on surveys because I know the lie will support my position or because I felt the survey was dumb.
How do the investigators know that the answers they are given are truthful? Can an answer be truthful in the moment the survey is taken but not truthful at other times?
Amazing work! But I think there might be a typo. Just above the ***, you say "Then, when it was time to report their data, they only told us what happened to six of those variables." Shouldn't it be "TWO of those variables"?
Thanks for this invaluable analysis. I noted various major flaws when I read the study: no control group, an extremely short assessment period of two years, etc. But I assume you'll go into that in Part 2.
Please take into account fully the love-bombing role schools play for kids who transition, and how suddenly being the brave and stunning hero with tons of glitter friends and admirers will affect kids' mood. Go to The Anti-Science Disaster of Gender Ideology in The Schools to see what schools teach and do at https://caroldansereau.substack.com/p/the-anti-science-disaster-of-gender.
Thanks for once again doing the work that peer reviewers and editors inexplicably refuse to do.
My normal reaction to this kind of sloppiness is to assume that people are just plain incompetent and ignorant. "Protocol? What protocol?" "Those instruments? Nah, we'll just use these over here." But in this case the stakes are just too high, and the ideology too entrenched, for this not to be deliberate deceit. What I don't get is how they could not have known that Jesse Singal would see right through their shoddy results.
I can't wait for the pop sci outlets to cover this angle! /s
Actually, I'm most surprised by the small effect sizes. Two years of treatment, cherry-picked outcome measurements, and all we get are marginal improvements?
My concern with transition medicine has always been poor screening, and a move away from gender dysphoria (basically a subset of body dysmorphia) towards gender ideology which I believe is incoherent. I assumed that treatments resulted in big improvements for those who need it. Perhaps that is the case and the poor screening offsets those benefits in the aggregate even for those receiving treatment.
Edit: I forgot to include, very clear for such an in-the-weeds topic. Glad you're writing on this Jesse.
As an MD I can speak to the cultural phenomenon of people wanting to mine the data until something pops positive. There’s pressure to publish-publish-publish and having a statistically significant result is how you do that. Promotions in academic medicine are most often based (in large part) on scholarly output.
I agree in part with the folks you mentioned who didn't necessarily see anything wrong with reporting on statistically significant relationships, even when they weren't what was being looked for. But framing and context are vitally important, and what WASN'T found is just as important as what WAS (though this is often harder to publish).
It is good that the NEJM included the protocols for this paper (how else would you have been able to discern that something might be missing from this analysis?), but I am disappointed in the authors' framing and overstatement of the study findings, and I am also very surprised that the most important scales/results (those related to suicidality) were not mentioned at all. If the authors wanted to publish that information separately, they could have said something like "such-and-such pattern is emerging in these results, though it is not statistically significant; the authors plan to continue to collect longitudinal data..." To leave out any mention of the suicide or self-harm measures seems extremely strange.
Thank you for digging into this. I read through the study when it came out, and none of this would have occurred to me to look into.
What I did notice (and am curious about your take on): is it normal that they averaged the scores, rather than reporting on what percentage of the patients showed a clinically significant improvement or decline in each measure? As a parent, that's what I'd want to know: the percentage of kids that benefit from treatment.
More specifically, the way I'm reading the measures, it seems possible that dramatic improvements in a minority of children could even out minor declines in a larger group. I'm not a researcher, so I don't know how common it is to average scores like this. But I could see how you could come up with those numbers even if, say, just 20% of the kids dramatically improved, 50% stayed about the same, and 30% got marginally worse.
Could this be just as likely an outcome as anything else? And if so, are there any sort of ethics standards that would stop researchers from doing this?
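To make that worry concrete with purely made-up numbers (not the study's data): the same modest average improvement can come from very different individual outcomes, including ones where most kids don't improve at all.

```python
# Hypothetical illustration: a small mean improvement that masks
# a mix of big winners, flat-liners, and mild decliners.
import statistics

# Made-up depression-score *changes* for 10 kids (negative = improvement):
# 2 improve dramatically, 5 stay flat, 3 get slightly worse.
changes = [-20, -18, 0, 0, 0, 0, 0, +3, +3, +4]

mean_change = statistics.mean(changes)
improved = sum(1 for c in changes if c <= -5)   # clinically meaningful drop
worsened = sum(1 for c in changes if c >= +3)

print(f"mean change: {mean_change:.1f}")               # -2.8: looks like a modest group-wide gain
print(f"improved: {improved}/10, worsened: {worsened}/10")  # 2/10 improved, 3/10 worsened
```

Reporting only the average of -2.8 hides the fact that only two kids drove the result, which is why responder-percentage analyses are often reported alongside means.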
The declines in anxiety and depression are statistically significant, but the effect sizes are quite small. I'm more than a little skeptical that those declines are even clinically significant. CBT would probably have a bigger effect.
Some of the comments here suggest that the authors of this study are politically motivated and knowingly engaged in misconduct. That may be true. However, I think this kind of sloppy science is actually the norm across many areas of research. Everyone is doing it, there are professional incentives to find positive results, there are professional disincentives to criticizing the work of colleagues too seriously, etc...
I don’t mean to defend bad research practices, just to suggest the causes can be very mundane, not malicious.
Jesse, the Sex Matters technical paper (Dec. 2022), "Gender-questioning teenagers: puberty blockers and hormone treatment v placebo," is also important to consider. It finds that the average improvement in mental health over the course of gender treatments is no bigger than the placebo effect seen in other mental-health studies. You probably have already seen it, but just in case:
For any statistical laypeople who are interested in understanding how and why p-hacking, HARKing, and other methodological malpractice has operated (and continues to operate) within the scientific 'community', and what approaches, such as open science, are being encouraged to reduce its prevalence, this is a pretty good breakdown: https://www.youtube.com/watch?v=0a9MmloTRO4 (you can probably skip the first six minutes unless you are interested in an unnecessarily long introduction).
It isn't too heavy on statistical jargon, so it should be relatively easy to follow.