
#448 - Are NICU Outcomes Actually Getting Better Over Time? (ft Dr. Joseph Kaempf)



Hello friends 👋

What does it mean to truly improve outcomes for very low birth weight infants, and are we actually doing it? In this episode, Daphna sits down with Dr. Joseph Kaempf, neonatologist and Medical Director of Value Research and Innovation at Providence Health System in Oregon, to examine some uncomfortable truths about neonatal quality improvement. Dr. Kaempf shares findings from a study spanning 16 NICUs over 14 years showing that composite morbidity outcomes have remained flat while length of stay has increased. He explores why traditional QI tools like driver diagrams and PDSA cycles may no longer be sufficient, and why augmented intelligence may be the next frontier. The conversation also touches on culture as a driver of NICU performance and the gap between institutional interests and true shared decision-making with families. A candid episode for anyone invested in the future of neonatology.


Link to episode on YouTube: https://youtu.be/0wn9tvzfXN0


----


The transcript of today's episode can be found below 👇


Daphna Yasova Barbeau, MD (00:01.88) Good morning, everybody. I am back in the studio today, and my colleague Ben is keeping the wards at bay. Joining me is Dr. Joseph Kaempf, neonatologist from Portland, Oregon. Dr. Kaempf, thank you so much for being here today.


Joseph W. Kaempf, MD (00:22.067) Thank you. I am very happy to be here.


Daphna Yasova Barbeau, MD (00:25.442) We have a lot to talk about, so we are going to work our way through. You have no shortage of articles and breadth of expertise for us to discuss. For people who don't know, you are the Medical Director of Value Research and Innovation for the Women and Children's Institute of Providence Health System in the Oregon region. You have spent decades engaged in clinical quality improvement (QI) with a number of health systems: Kaiser Permanente, Northwest Newborn Specialists, and the Providence Health System. You have focused on quality, safety, patient satisfaction, and something we are especially going to talk about today: value in neonatal care. You have led numerous clinical investigations and QI initiatives, with an extensive record of developing and publishing randomized controlled trials, collaborative studies, and innovative clinical care guidelines.


What we plan to focus on today, and what I hope people will take away, is how your work has interrogated some of our clinical practices as a community of neonatal health professionals. Before we dive into the hard questions, tell us a little bit about your pathway to QI.


Joseph W. Kaempf, MD (01:43.113) Thank you again for inviting me to your outstanding series. I appreciate you and Ben and the effort you put into this to help us all improve. My first QI project was as a fellow, though quality improvement was not really a phrase back in the late 1980s. Interestingly, it was on indomethacin dosing, because the attendings were frustrated with how everybody was approaching the patent ductus arteriosus (PDA). That is ironic, because we are still wrestling with it today. That first project was 39 years ago, and it has been nothing but interesting ever since.

I got involved in a number of randomized controlled trials, and our NICUs in Portland were early joiners of the Vermont Oxford Network (VON) in the early 1990s. We very quickly participated in VON projects. What really accelerated that process for me and our Providence system is that we formed a subgroup within VON called the POD. That started as eight NICUs and grew to ten NICUs across North America. We did some really detailed, deep-dive analytics into outcomes of very low birth weight (VLBW) infants, and we stayed together from the late 1990s all the way to about 2020.


That led down several different roads in the analytics of morbidity and mortality. The question that intrigued me was this: why is there such variance between NICUs that all have good people with good hearts, smart and hardworking people, whether in Oregon, Florida, New York, or Canada, yet divergent results that are not explained by obvious risk factors or risk adjustment? That question continues to intrigue me and has befuddled just about everyone in the world. Why does one NICU consistently have lower rates of certain morbidities than others?


I tell younger people there are good reasons to be optimistic. Through discussion, dialogue, and careful collaboration, we can figure out what cultures of excellence are doing. Just like sports, restaurants, or automakers, there are pockets of excellence — NICUs that consistently, year after year, perform well. Why is that? I am hoping you and Ben and your community can help figure that out.


Daphna Yasova Barbeau, MD (05:12.737) I think that is what we really loved about your papers. We have the evidence, and even when we are all using the evidence, there is still variance. There must be something we are not measuring. I wanted to ask — QI was still developing at the time you found your interest in it. How can people forge a path in neonatology for something that is new and different, and that then becomes the standard of care?


Joseph W. Kaempf, MD (06:09.247) That is a good question, Daphna. One thing that concerns me — and I think anyone my age would agree — is that we had more resources in the 1990s. It was a kind of golden age. For instance, we would take 15 or 20 people from our NICU to the VON Annual Meeting, run by our good friends Roger and Jeffrey. That is unheard of today. Many NICUs struggle to send even one or two.


I am concerned for young physicians. It is much more difficult to be a neonatologist today, partly because information is overwhelming, information sources are everywhere, budgets have been cut, and we are not devoting enough time and resources to quality improvement. I was recently speaking to a group of younger neonatologists and fellows, and one of them asked me what VON stood for. That is not because this person is not smart and energetic — it tells you they were working in a place that was not emphasizing QI collaboration. That is a big challenge: are our future leaders getting enough time and resources to support robust quality improvement?


Daphna Yasova Barbeau, MD (07:56.078) That is a great segue to start talking about some of your written work. One of your most recent papers, published in February in Pediatrics, is entitled "Composite Metrics to Assess Quality Improvement in Very Low Birth Weight Infants 2010 to 2023." Across 16 NICUs, outcomes for VLBW infants did not improve much between 2010 and 2023, and hospital length of stay actually increased. If outcomes were flat and length of stay went up, then value, as a measure of efficiency, worsened. Tell us about this work and why it was important to do.


Joseph W. Kaempf, MD (09:02.988) Let me give you a nutshell, and I will use some metaphors from other industries. The differences in outcomes between NICUs across the United States, Canada, and Europe are astounding. It would be like looking at automobile production and asking: why does this particular company consistently put out a product that is more reliable, works well, is comfortable, and is priced well? Some companies do that well and others do not.


Through our VON work in the 1990s and the first decade of the 21st century, we saw this variance despite similar techniques across institutions. So we created something called the benefit metric, where we asked: which NICUs discharge premature infants (VLBW infants, essentially 22 to 32 weeks) with the highest percentage going home with the fewest morbidities? We focused on the seven major morbidities: chronic lung disease (CLD), severe retinopathy of prematurity (ROP), severe intraventricular hemorrhage (IVH), necrotizing enterocolitis (NEC), and others. Those seven correlate quite well with future health. If a child born premature did not have any of those seven major morbidities, their chance of being healthy approaches that of a term-born infant, about 90%. So that is the simple goal for neonatology: discharge babies home with their families as efficiently as possible, with none of the seven major morbidities.

We looked at our POD NICUs, did traditional risk adjustment using VON risk adjusters (gestational age, birth weight, inborn versus outborn status, birth defects, and delivery mode), and found significant differences. There were two or three NICUs in this group that, year after year, outperformed the others. Our motto was: share massively, steal shamelessly. We then expanded to 39 NICUs and found the same result: tremendous variation.

And here is something important for younger folks: high performers are not just big NICUs or just small community hospitals. It is a mix. Huge children's hospitals can perform well or not; small community NICUs can perform well or not. There is hope for all of us to learn from each other. And you can learn just as much from low performers. Understanding why one NICU consistently has a high infection rate or a high NEC rate gives us enormous insight. A rising tide lifts all boats.


Daphna Yasova Barbeau, MD (14:03.553) We love that quote. And to your point, we have identified high performers but often do not know why they are performing well. It is only in comparing high and low performers that we can identify what is being done differently. Does the paper underscore that we have been measuring the wrong outcomes, or is there something harder to detect that we are missing?


Joseph W. Kaempf, MD (14:42.826) I do not think we are measuring the wrong outcomes. What we found was that this benefit metric — this composite risk-adjusted total morbidity-mortality score — is flat across the Providence network of 16 NICUs. Wonderful people, working hard, for 14 years, not getting better, and getting more expensive because length of stay is increasing in every NICU. It would be like a car company telling you: our cars are not getting better, and by the way, they cost more. Why would you buy that car?


I am not criticizing Providence; this pattern is everywhere. There is no large network showing improvement. There is no report covering babies born after 2017, from any network, showing that major morbidities are declining. The Canadian Neonatal Network showed some progress up to about 2017, and the California Perinatal Quality Care Collaborative (CPQCC) showed something similar. But nothing since 2017. Our report is the first to track births all the way through 2023. The challenge for all of us is that we need to reverse this. Do families know this? Things are not getting better as a whole, and they are getting more expensive. There have to be reasons for it, and we need to find them.


Daphna Yasova Barbeau, MD (17:02.615) If we are not measuring the wrong outcomes, then maybe there are invisible factors we are not capturing. The paper highlights that it may not be just our clinical practice. We all thought that if we standardized clinical practice, outcomes would improve. And now even after achieving standardization, outcomes still have not improved. Maybe there are other things going on in high-performing units — culture, structure, environment — that are outside of clinical practice. How large a role do you think those factors are playing?


Joseph W. Kaempf, MD (17:48.672) There are clearly things we are not measuring. We have about eight risk adjusters currently, but there are probably 80. We are working on that now. Social determinants of health, for example, are not typically in the VON matrix: things like family income and education levels. Those need to be examined, and you need to get into the electronic health record to interrogate them properly.

Another area is maternal illness. If we simply record whether a mother had diabetes — yes or no — that hardly captures the complexity of diabetes alongside gestational weight gain. The same applies to hypertension, chorioamnionitis, and the use or misuse of antenatal corticosteroids. Those nuances are simply not accounted for, and they may explain some of the differences between NICUs.


And then there is the inborn-versus-outborn classification, which is not nearly refined enough. A 25-week infant transferred within a couple of hours of birth is very different from a 25-week infant transferred at 12 days of age on high-frequency oscillation with chest tubes. And the way diagnoses are attributed matters enormously. A large children's hospital may receive an infant at two or three weeks of age, but when the diagnosis of CLD is made at 36 weeks corrected, it is attributed to that children's hospital — even though they did not manage that infant through the critical early period. That is a clear weakness of the risk adjustment used across most networks.

We can now capture transfer timing down to the minute and incorporate it into logistic regression models, and I suspect that when we do, alongside more robust social determinants and maternal data, the NICU ranking order we have seen in our papers is going to change. That matters because our ultimate goal is to identify the authentic cultures of excellence and learn from them.


Daphna Yasova Barbeau, MD (21:34.35) When we look at another unit and say this is how they do things — is it possible for all units to replicate that, or is each unit responsible for creating its own culture? What is transferable, and what does each unit have to build from the ground up?


Joseph W. Kaempf, MD (22:23.094) Culture clearly differs between NICUs. Having worked in about 17 NICUs over my career, I think of the first line of Tolstoy's Anna Karenina: all happy families are alike; every unhappy family is unhappy in its own way. NICUs are a little like that. You can almost feel it when you walk into a NICU that is functional and working well. It is hard to objectify; it involves things like non-hierarchical conversation, a culture where people can speak up without fear. Even simple things matter, like nurses addressing physicians as "Dr. Kaempf" while the physicians call the nurses by their first names. That dynamic reflects something real about hierarchy. Similarly: are families genuinely integrated into rounds, or are they there in name only?

Then there is the culture of learning. NICUs that score well on cultural surveys also score high on the benefit metric. It is like the often-cited finding about functional families: across all sorts of social and demographic factors, the single thing that most reliably correlates with family functioning is the number of times they eat meals together per week. Is there a similarly sublime measurement for NICUs? We have not identified it yet, but the closest thing is what I would call a culture of learning and humility: saying, we do not know everything, and we can learn from everyone, whether that is a parent, a respiratory therapist, or a physician. That spirit is shared in cultures of excellence.

How do you measure it? Scientists will push back and say: let us measure pulmonary flows and cytokine levels. All of that is interesting. But how do you measure love? How do you measure genuine curiosity? The answer is to combine robust statistical risk adjustment with a serious look at human factors. The high-performing NICUs we have studied share a spirit of: let us be good scientists, let us be evidence-based, and let us also have a culture of learning and sharing. The good news is that some NICUs are doing it well. We are working with Kaiser Permanente, which has some incredibly high-performing NICUs. Let us study them and share what we find.


Daphna Yasova Barbeau, MD (26:56.747) I love that reminder that you do not have to choose one or the other. You can have a NICU that is warm and inviting and grounded in evidence. That brings me to my next question. A number of your papers suggest that traditional QI approaches — protocols, bundles, checklists — have reached a plateau. What does the next leap in neonatology require? What needs to be fundamentally different?


Joseph W. Kaempf, MD (28:01.74) I will say something that sounds a little radical, but I believe it to be true. I grew up with the traditional model — the Institute for Healthcare Improvement (IHI) Model for Improvement, driver diagrams, Lean, Six Sigma, VON courses. I have not just visited but participated fully in all of those. That model is not working. It is too time-consuming. It worked in our POD group because of intensive effort over years — thousands of hours and hundreds of thousands of dollars. It does not scale well.


I believe the opportunity is augmented intelligence (AI), and when I say AI, I mean augmented intelligence, not artificial intelligence. There is nothing artificial about machine learning and large language models (LLMs). It is moving very quickly. I was listening in on an obstetrics meeting where the group was working through a driver diagram to reduce caesarean section rates. While they were talking through primary drivers, secondary drivers, and PDSA (Plan-Do-Study-Act) cycles, I built a driver diagram using a large language model. When I showed them, they asked how long it took. I had built it while they were talking. These tools need human review, I understand that, but they are advancing quickly.


We need to use augmented intelligence to inform driver diagrams, PDSA cycles, data collection, and feedback. If driver diagrams were so effective, why is nobody improving? Why is chronic lung disease increasing everywhere in the world? Part of it is that we are not moving fast enough and not collecting and analyzing the right data. I am not saying abandon the IHI model — I am saying it needs to be used differently. We have a generation of young people who grew up in the digital age. Help us use augmented intelligence to obtain data faster, analyze it, feed it back, and drive true dialogue. Do we need humans in the loop? Of course. But the traditional method as it stands is not going to get us where we need to go.


Daphna Yasova Barbeau, MD (32:44.961) Throughout your career, what has been the hardest tradition in neonatology to question? And how have you been received when data pushes against that tradition?


Joseph W. Kaempf, MD (33:35.045) I appreciate you asking that, because I have a master's in bioethics and have spent a fair amount of time thinking about what we are doing. There is something to be said about unreasonable expectations. Our Buddhist friends would say that wanting is the source of suffering. We want every infant, every pregnant woman, every family to be healthy, but we have to temper that with a question: are we expecting too much? Are we ever going to reach a 10% CLD rate overall? No. Are we going to ensure every 22- or 23-week infant survives and is healthy? No. How are we distributing resources? If we spend money on X, we do not spend it on Y. It is a finite pie.

I work in a large health system where a recent multimillion-dollar gift went to heart disease research — which is wonderful. But does it help pregnant women, young families, prenatal care, immunizations? We need to wrestle honestly with where we are putting our time and resources, within healthcare overall and within neonatology specifically. If we spend enormous effort on the most complex, exotic care, are we adequately caring for 33- and 35-week infants? I do not claim to have the answers, but these are questions we have to ask.


Daphna Yasova Barbeau, MD (36:36.874) I really appreciate your honesty. As neonatologists, we feel we are on the cutting edge of medicine — and I am not sure that is always true. What is unique about neonatology is that our patients keep changing. The 30-weekers are not the same as the 28-weekers, who are not the same as the 25-weekers. We need to be honest about that. When we go to conferences, we cannot just celebrate what has been published. We have to ask: how has this actually changed outcomes for the patients we care for?


That brings me to counseling, particularly in periviable situations. What is our responsibility to translate uncertainty to families? When families come to the NICU, they often believe we can do everything. How can we be honest about our real limitations?


Joseph W. Kaempf, MD (38:35.405) It comes down to a couple of things. The first is the difference between uncertainty and risk. Physicians deal with uncertainty: here is the CLD rate, the mortality rate, the IVH rate. But families deal with risk. Risk is uncertainty multiplied by the chance that the bad outcome happens to their child. We have to be careful not to simply transfer our uncertainty to the family — because it is their risk, not our statistical abstraction.


That is why the American Academy of Pediatrics (AAP) and the American College of Obstetricians and Gynecologists (ACOG) unequivocally support shared decision-making. True shared decision-making is informed consent — a family has a reasonable understanding of what could happen, and they are given genuine choice. We are not seeing that consistently across the United States, Canada, or Europe. One institution will resuscitate essentially 100% of their tiniest infants and call it shared decision-making. Another resuscitates perhaps half and also calls it shared decision-making. How can both be true? Something very different is happening at those institutions.

We also need to be careful about conflicts of interest. There are tremendous conflicts embedded in physician behavior — research interests, career advancement, income, and the health system's bed space and marketing goals. Those conflicts do not always align with what a young family needs to hear. Families are diverse — they may be young or older, Christian, Jewish, Muslim, non-religious — and that beautiful cultural variety does not always match institutional interests.


The second point: our data have clearly shown that bigger is not always better. There is a general sentiment that if you have a large NICU doing everything, you must be best at everything, and that is simply not true. There are mid-sized NICUs that do not perform extracorporeal membrane oxygenation (ECMO) or specialize in 22-to-24-week care, but they excel at 25-to-32-week care. You do not need to practice on more extreme cases to be excellent. Many smaller or mid-sized NICUs outperform large university hospitals on inborn mortality for 27-week infants. That has been shown in Europe, Canada, and the United States. True informed consent and shared decision-making on one hand, and an honest accounting of where our time and energy go on the other: those two things matter enormously. If we devote enormous effort to the most technologically complex forms of care, it exhausts our capacity to provide excellent basic neonatal care. People only have so much time and energy, and we cannot do everything.


Daphna Yasova Barbeau, MD (43:27.994) You cannot do everything — or you cannot be good at everything. When units become highly specialized, say tiny baby units or neuro-ICUs, do you think there is a risk of losing generalizability of care for the in-between babies?


Joseph W. Kaempf, MD (43:27.994) Very much so. That is what our data shows. Many of these specialized units — wonderful, loving, smart people — are not showing improved outcomes. They are not showing improved neurodevelopment. We wrote a meta-analysis showing that neurodevelopmental outcomes are not improving over time. Not just morbidity rates, but two-year outcomes. School-age data published around the world suggests that more complex neurodevelopmental outcomes — attention deficit hyperactivity disorder (ADHD), autism, executive functioning, soft neurologic findings — are actually worsening. Do parents and families know that? In many ways, outcomes are not only flat — they are worsening in some domains. We have to be really honest with ourselves. How can we collaborate better? How can we bring the humility to step back and ask where we are devoting resources and time, and how we can serve true population health? Every health organization says they are committed to population health. Neonatologists, pediatricians, obstetricians — let us be genuinely devoted to it.


Daphna Yasova Barbeau, MD (44:49.718) I could keep talking for hours. My last question: after all these decades in neonatology, what is something about infants or your work in the NICU that still surprises and excites you?


Joseph W. Kaempf, MD (45:21.446) The deep love and compassion that nurses, respiratory therapists, and physicians carry — that sense of what suffering means for families, and what joy means. That beautiful connection, regardless of income or race or creed — the love we have in NICUs for families — has not dimmed over time. It was the same when I started my residency at Johns Hopkins in 1983 as it is today. That tremendously motivates me. How can we combine the love that is already there with curiosity, intelligence, and persistence? Let us not give up. Let us bring the best hearts and the best minds together.


Daphna Yasova Barbeau, MD (46:40.748) That is beautiful. On that call to action, we will wrap up for the day. Dr. Kaempf, thank you so much — not just for being here today, but for all the work you have done for babies and families, and that you continue to do.


Joseph W. Kaempf, MD (46:56.518) Thank you.


 
 
 
