Hospital-acquired pneumonia (HAP) is notoriously hard to study. The cohorts are small, the measurements are high-dimensional, and—crucially—whole blocks of information are missing because it is impossible to sample every variable from every patient at every time-point. It is a research reality that frustrates both biologists and data scientists: discard all partial cases and you lose statistical power; fill the gaps with guesses and you risk introducing bias.
Computer scientist Tony Ribeiro found himself facing that dilemma while analysing HAP datasets with colleagues from Nantes and Tokyo. Working with co-authors such as Prof. Antoine Roquilly, of Nantes University Hospital and coordinator of Homi-Lung, the team wanted to understand how early biological signals evolve into full-blown infection. Yet up to 70% of the entries in their tables were blank, a figure they illustrated vividly in an accompanying poster, with whole columns reduced to question marks and grey boxes.
Rather than force the data into a shape it did not naturally have, Ribeiro decided to embrace the uncertainty. The result is a paper — Learning From Interpretation Transitions with Unknowns — that was presented last autumn at the 4th International Joint Conference on Learning & Reasoning (IJCLR 2024) in Nanjing. The last author of the study is Prof. Katsumi Inoue, from the National Institute of Informatics in Tokyo.
Modelling the gaps, not ignoring them
The intellectual engine of the work is an extension of the Learning From Interpretation Transitions framework, or LFIT. LFIT treats a biological system as a set of logical rules: "If gene A is high and bacterium X is present, then inflammatory marker B will rise at the next time-point", and so on. Classic LFIT, however, assumes that every variable is observed. Ribeiro's insight was to rewrite the mathematics so that each "missing" value is treated explicitly as an unknown. The new rules still describe how the system can move from one state to the next, but they come with a built-in admission of what we do not know.
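The flavour of such a rule can be sketched in a few lines of Python. This is an illustrative toy using three-valued (Kleene) logic, not the paper's formalism; the variable names are placeholders:

```python
UNKNOWN = None  # explicit marker for an unobserved value

def and3(p, q):
    """Three-valued AND: a False premise decides the outcome,
    otherwise a single unknown makes the whole premise unknown."""
    if p is False or q is False:
        return False
    if p is UNKNOWN or q is UNKNOWN:
        return UNKNOWN
    return True

def marker_b_rises(gene_a_high, bact_x_present):
    """Toy rule: marker B rises next if gene A is high AND bacterium X
    is present. With a missing observation, the model admits it does
    not know rather than guessing."""
    return and3(gene_a_high, bact_x_present)

print(marker_b_rises(True, True))     # rule fires: True
print(marker_b_rises(False, UNKNOWN)) # decided despite the gap: False
print(marker_b_rises(True, UNKNOWN))  # undecidable: None (unknown)
```

The key point is the last case: classic two-valued logic would have to impute a value for the missing measurement, whereas the three-valued rule carries the uncertainty forward explicitly.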
That small conceptual shift has big consequences. Because the method refuses to guess invisible values, the resulting model is an over-approximation of reality: it might allow a few transitions that never occur in practice, but it never overlooks the transitions that truly matter. In other words, it errs on the side of caution—a virtue when drawing biological conclusions from imperfect data.
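That over-approximation guarantee can be made concrete with a small sketch (my own illustration, with made-up two-variable dynamics): when one variable is unobserved, the model enumerates every possible completion, so the set of successor states it allows is always a superset of the true one.

```python
from itertools import product

def step(state):
    # Toy ground-truth dynamics over two Boolean variables (a, b):
    # a' = a AND b ; b' = NOT a
    a, b = state
    return (a and b, not a)

def possible_next(partial):
    """Over-approximate successors of a partially observed state.
    A None entry is unknown: every completion is enumerated, so the
    result is a superset of the true hidden state's successors."""
    slots = [(v,) if v is not None else (False, True) for v in partial]
    return {step(s) for s in product(*slots)}

hidden = (True, False)                # the real (unobserved) full state
true_next = {step(hidden)}            # its actual successor
approx = possible_next((True, None))  # b was never measured
assert true_next <= approx           # no real transition is missed
print(approx)  # contains the true successor plus a cautious extra
```

The price of caution is the extra transition in `approx` that never occurs in reality; the payoff is that no genuine transition is ever excluded.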
From theory to algorithm—and back to the clinic
To make the theory usable, the authors implemented it in an open-source package called pylfit. On benchmark Boolean-network problems from the literature, the package's GULA algorithm recovered the correct rules ten to a hundred times faster than a brute-force search, even when half the data were deliberately masked as unknown.
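The learning task GULA addresses can be illustrated with a deliberately naive sketch (this brute-force enumeration is exactly what GULA's pruning avoids, and it is not pylfit's actual API):

```python
from itertools import product

def consistent_bodies(transitions, target, n_vars):
    """Find every premise (a partial assignment of the variables) under
    which the target variable is 1 at the next time-point in ALL
    matching observed transitions. Naive enumeration for illustration."""
    bodies = []
    for mask in product([None, False, True], repeat=n_vars):
        body = [(i, v) for i, v in enumerate(mask) if v is not None]
        matches = [nxt for cur, nxt in transitions
                   if all(cur[i] == v for i, v in body)]
        if matches and all(nxt[target] for nxt in matches):
            bodies.append(body)
    return bodies

# Observed transitions of a tiny system where a' = a AND b, b' = NOT a.
T = [((a, b), (a and b, not a)) for a in (False, True) for b in (False, True)]
for body in consistent_bodies(T, target=0, n_vars=2):
    print("a rises next if", body)  # recovers: a=True and b=True
```

The candidate space here is 3^n partial assignments per variable, which is why naive search collapses on realistic networks and a dedicated algorithm is needed.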
The results presented in Nanjing bring the discussion back to Homi-Lung. The message is clear: real clinical data are messy, but logic can still find structure inside the mess. This could accelerate the hunt for early biomarkers of HAP progression.
What happens next?
Ribeiro and Roquilly are running GULA directly on the Homi-Lung cohorts, comparing its rule-sets with the pathways already suspected by clinicians. If the logic confirms those hunches — or uncovers new ones — it will become a core component of the consortium’s analytical toolbox.
In the long fight against hospital-acquired pneumonia, learning how to learn from incomplete data might prove as valuable as any new assay or sensor.
Bibliographic reference
Ribeiro T., Folschette M., Magnin M., Okazaki K., Lo K-Y., Roquilly A., Poschmann J., Inoue K. Learning From Interpretation Transitions with Unknowns. Proceedings of IJCLR 2024, Nanjing, China. DOI / HAL: https://hal.science/hal-04634356

