Clinical Text2JSON Encoder: Milliseconds for Medical Information Extraction

May 5, 2026

TL;DR: Today, I’m releasing ClinicalEncoder Text2JSON, a new demonstration of the capabilities of our latest model, which understands multilingual clinical texts at millisecond speed and with token-level precision. The demo combines the model’s ability to map words to ontologically grounded clinical concepts with a fine-grained qualifier classification system, so that it instantly provides signals such as temporality, uncertainty, context, and status (improving/stable/worsening). Try the live demo, or explore the model on HuggingFace.

Stop guessing clinical context. Start structuring it.

ClinicalEncoder Text2JSON is a natural extension of the vision I’ve been pursuing: not just mapping words to concepts, but understanding how those concepts exist in context. Extracting “diabetes” is trivial. Understanding whether it’s newly diagnosed, self-reported, improving, or merely suspected—that’s where real clinical intelligence begins.

This demo showcases exactly that.

From Tokens to Clinical Graphs

Every word in a document is mapped to a clinical concept grounded in SNOMED CT. But more importantly, each concept is enriched with a structured set of qualifiers that define its meaning in context.

You can double-click on any concept in the interface to open its corresponding page in the SNOMED CT ontology and explore its full definition, hierarchy, and relationships. This creates a seamless bridge between raw text and formal medical knowledge.
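To make the idea concrete, here is a minimal sketch of what one enriched concept might look like once serialized to JSON. The record layout, including the field names text, start, end, and snomed_id and the helper make_annotation, is a hypothetical illustration rather than the demo's actual output format; the qualifier keys reuse the names introduced in the next section, and 73211009 is the SNOMED CT concept ID for diabetes mellitus.

```python
import json


def make_annotation(sentence: str, surface: str, snomed_id: str, **qualifiers: str) -> dict:
    """Build one concept record (hypothetical layout, for illustration only)."""
    start = sentence.index(surface)  # character offset of the mention in the text
    return {
        "text": surface,
        "start": start,
        "end": start + len(surface),  # end offset is exclusive
        "snomed_id": snomed_id,
        **qualifiers,  # type, about, context, status, when, certainty
    }


sentence = "The patient has new diabetes, confirmed today."
record = make_annotation(
    sentence, "diabetes", "73211009",  # SNOMED CT: Diabetes mellitus (disorder)
    type="disorder", about="patient", context="newly-diagnosed",
    status="affirmed", when="now", certainty="certain",
)
print(json.dumps(record, indent=2))
```

Keeping character offsets next to the concept ID is what allows an interface to link a highlighted span back to its ontology entry.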

Qualifiers: Where Meaning Emerges

Concepts alone are not enough. Clinical meaning lives in qualifiers.

ClinicalEncoder Text2JSON assigns a rich set of attributes to every detected concept:

1. Type — What kind of concept is this?

Examples include:

type="disorder" (e.g., diabetes)
type="symptom" (e.g., headache)
type="procedure" (e.g., MRI scan)
type="medication" (e.g., metformin)
type="finding", type="social-factor", type="goal"
Administrative and contextual types like type="hospital-admission", type="provider", type="date", type="location"

This allows the system to distinguish between fundamentally different categories of clinical information without ambiguity.
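Because every mention carries a type, downstream code can bucket mentions by category in a single pass. The sketch below assumes each annotation is a Python dict with hypothetical text and type keys; it illustrates the idea and is not part of the demo's API.

```python
from collections import defaultdict

# Hypothetical extracted annotations; only "text" and "type" matter here.
annotations = [
    {"text": "diabetes", "type": "disorder"},
    {"text": "headache", "type": "symptom"},
    {"text": "MRI scan", "type": "procedure"},
    {"text": "metformin", "type": "medication"},
    {"text": "headache", "type": "symptom"},
]


def group_by_type(items: list) -> dict:
    """Bucket concept mentions by their type qualifier, keeping document order."""
    groups = defaultdict(list)
    for item in items:
        groups[item["type"]].append(item["text"])
    return dict(groups)


grouped = group_by_type(annotations)
print(grouped)
```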

2. About — Who does this concern?

about="patient"
about="family-patient"
about="other-person"
about="provider"
about="general"

This becomes critical when parsing family history, provider notes, or general statements.
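As a minimal sketch, assuming annotations are dicts with a hypothetical text key plus the about key described above (and reading about="family-patient" as a statement about the patient's relatives), family history can be kept out of the patient's own problem list like this:

```python
def split_by_subject(annotations: list) -> tuple:
    """Separate the patient's own concepts from family history."""
    own = [a["text"] for a in annotations if a["about"] == "patient"]
    family = [a["text"] for a in annotations if a["about"] == "family-patient"]
    return own, family


# Illustrative input for: "Mother had breast cancer; patient reports chest pain."
annotations = [
    {"text": "breast cancer", "about": "family-patient"},
    {"text": "chest pain", "about": "patient"},
]
own, family = split_by_subject(annotations)
print(own, family)
```

Without the about qualifier, "breast cancer" would be indistinguishable from a diagnosis of the patient.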

3. Context — What is happening to this concept?

Examples:

context="self-reported"
context="already-diagnosed" vs context="newly-diagnosed"
context="already-performed" vs context="newly-performed"
context="already-prescribed" vs context="newly-prescribed"
context="already-discontinued" vs context="newly-discontinued"
context="already-at-risk" vs context="newly-at-risk"

Context transforms static mentions into clinical events with lifecycle awareness.
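One way to use this lifecycle information is to split each label into a phase (already or newly) and an event. The helper below is a hypothetical sketch that assumes context labels follow the already-/newly- naming shown above; labels without that prefix, such as self-reported, come back with no phase.

```python
from typing import NamedTuple, Optional


class ContextEvent(NamedTuple):
    phase: Optional[str]  # "already" (pre-existing) or "newly" (this encounter); None otherwise
    event: str            # e.g. "diagnosed", "prescribed", "at-risk"


def parse_context(label: str) -> ContextEvent:
    """Split a context label such as 'newly-prescribed' into phase and event."""
    phase, sep, event = label.partition("-")
    if sep and phase in ("already", "newly"):
        return ContextEvent(phase, event)
    return ContextEvent(None, label)  # e.g. 'self-reported' carries no lifecycle phase


annotations = [
    {"text": "metformin", "context": "newly-prescribed"},
    {"text": "hypertension", "context": "already-diagnosed"},
    {"text": "chest pain", "context": "self-reported"},
]
# Hypothetical query: what changed during this encounter?
new_this_visit = [a["text"] for a in annotations if parse_context(a["context"]).phase == "newly"]
print(new_this_visit)
```

With this split, "what changed at this visit" becomes a filter on the phase rather than a string search over free text.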

4. Status — What is the current state?

status="affirmed" or status="negated"
status="normal" or status="abnormal"
status="improving", status="worsening", status="stable"
status="increasing", status="decreasing"
status="unknown"

This is where the model captures evolution and polarity—essential for clinical reasoning.
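The status values cover several independent dimensions. The sketch below groups them into axes (polarity, normality, trajectory, direction); that grouping, the axis names, and the "surface worsening concepts" rule are my own assumptions for illustration, not a documented part of the model.

```python
# Assumed grouping of the status values listed above into independent axes.
STATUS_AXES = {
    "polarity": {"affirmed", "negated"},
    "normality": {"normal", "abnormal"},
    "trajectory": {"improving", "worsening", "stable"},
    "direction": {"increasing", "decreasing"},
}


def status_axis(status: str) -> str:
    """Return the axis a status value belongs to, or 'unknown'."""
    for axis, values in STATUS_AXES.items():
        if status in values:
            return axis
    return "unknown"


annotations = [
    {"text": "dyspnea", "status": "worsening"},
    {"text": "fever", "status": "negated"},
    {"text": "HbA1c", "status": "decreasing"},
]
# Hypothetical rule: surface concepts whose trajectory is worsening.
worsening = [a["text"] for a in annotations if a["status"] == "worsening"]
print({a["text"]: status_axis(a["status"]) for a in annotations}, worsening)
```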

5. Temporality — When does this apply?

when="past-old"
when="past-recent"
when="now"
when="future"
when="never"

Time is one of the hardest dimensions in clinical NLP. Here, it becomes explicit and structured.
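Once temporality is an explicit label, building a rough patient timeline reduces to a sort. The ranking below is an assumed ordering of the when values, and mentions tagged when="never" are dropped because they have no place on a timeline; the function name and inputs are hypothetical.

```python
# Assumed chronological rank; "never" is intentionally absent.
WHEN_ORDER = {"past-old": 0, "past-recent": 1, "now": 2, "future": 3}


def timeline(annotations: list) -> list:
    """Order mentions chronologically, dropping those that never occurred."""
    dated = [a for a in annotations if a["when"] in WHEN_ORDER]
    return [a["text"] for a in sorted(dated, key=lambda a: WHEN_ORDER[a["when"]])]


annotations = [
    {"text": "follow-up MRI", "when": "future"},
    {"text": "appendectomy", "when": "past-old"},
    {"text": "smoking", "when": "never"},
    {"text": "cough", "when": "now"},
    {"text": "fever", "when": "past-recent"},
]
print(timeline(annotations))  # ['appendectomy', 'fever', 'cough', 'follow-up MRI']
```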

6. Certainty — How confident is the statement?

certainty="certain"
certainty="probable"
certainty="possible"
certainty="unlikely"
certainty="conditional"

Future-related concepts are often naturally tagged as probable, possible, or conditional, reflecting real clinical uncertainty.
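A consumer can then decide how much uncertainty to tolerate. This sketch treats certain, probable, possible, and unlikely as an assumed ordinal scale and keeps conditional mentions out of any threshold, since they depend on a condition being met; the function name and default threshold are hypothetical.

```python
# Assumed ordinal scale; "conditional" is deliberately not ranked.
CERTAINTY_RANK = {"unlikely": 0, "possible": 1, "probable": 2, "certain": 3}


def confident_mentions(annotations: list, minimum: str = "probable") -> list:
    """Keep mentions at or above a certainty floor; conditional ones never pass."""
    floor = CERTAINTY_RANK[minimum]
    return [a["text"] for a in annotations
            if CERTAINTY_RANK.get(a["certainty"], -1) >= floor]


annotations = [
    {"text": "pneumonia", "certainty": "probable"},
    {"text": "pulmonary embolism", "certainty": "possible"},
    {"text": "chemotherapy", "certainty": "conditional"},
    {"text": "diabetes", "certainty": "certain"},
]
print(confident_mentions(annotations))  # ['pneumonia', 'diabetes']
```

A problem list might use the default floor, while a differential-diagnosis view could lower it to possible.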

Why This Matters

Large language models (LLMs) are impressive, but they are slow, opaque, and probabilistic when it comes to structured reasoning.

ClinicalEncoder Text2JSON takes a fundamentally different approach:

It does not generate; it understands.
It operates in milliseconds, not seconds.
It produces fully structured, inspectable outputs.

Instead of asking an LLM to “figure things out” from raw text every time, you can:

Use this model to extract precise, contextualized clinical signals
Retrieve relevant knowledge instantly from structured databases or ontologies
Feed that enriched, grounded context into an LLM or a small language model (SLM)

The result?

Better answers
Lower hallucination risk
Full traceability
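The extract, retrieve, and feed steps described above can be sketched as a small prompt-building function. Everything in it is hypothetical: the annotation fields, the in-memory KNOWLEDGE dict standing in for a real database or ontology service, and the prompt layout. No LLM is called; the function only assembles the grounded context that would be sent to one.

```python
# Hypothetical stand-in for a structured knowledge source, keyed by SNOMED CT ID.
KNOWLEDGE = {
    "73211009": "<reference text retrieved for diabetes mellitus>",
}

QUALIFIERS = ("about", "context", "status", "when", "certainty")


def build_prompt(question: str, annotations: list, knowledge: dict) -> str:
    """Assemble an LLM prompt from extracted concepts plus retrieved snippets."""
    lines = ["Structured findings:"]
    for a in annotations:
        quals = ", ".join(f"{k}={a[k]}" for k in QUALIFIERS if k in a)
        lines.append(f"- {a['text']} [{a['snomed_id']}] ({quals})")
    # Retrieval step: look up each grounded concept in the knowledge source.
    snippets = [knowledge[a["snomed_id"]] for a in annotations if a["snomed_id"] in knowledge]
    if snippets:
        lines.append("Reference knowledge:")
        lines.extend(f"- {s}" for s in snippets)
    lines.append(f"Question: {question}")
    return "\n".join(lines)


annotations = [{"text": "diabetes", "snomed_id": "73211009", "about": "patient",
                "context": "newly-diagnosed", "status": "affirmed",
                "when": "now", "certainty": "certain"}]
prompt = build_prompt("What follow-up is appropriate?", annotations, KNOWLEDGE)
print(prompt)
```

Because every line of the prompt traces back to a concept ID and its qualifiers, the generated answer can be audited against its inputs.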

A New Foundation for Clinical AI

This type of representation, built on top of ontology-grounded concepts and fine-grained qualifiers, is, in my view, the missing layer in modern AI systems.

It acts as a high-speed reasoning substrate:

Faster than LLM inference
More reliable than prompt engineering
More interpretable than embeddings alone

And crucially, it plays well with generative models.

Rather than replacing LLMs, it augments them:

grounding their outputs
constraining their reasoning
enriching their inputs
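One concrete form of constraining is to check a generative model's structured answer against the encoder's output. In this sketch, the allowed vocabularies are copied from the qualifier lists earlier in this post (type and context are skipped because they were given only as examples), while the function name, the snomed_id field, and the JSON layout are hypothetical.

```python
import json

# Vocabularies copied from the qualifier lists in this post.
ALLOWED = {
    "about": {"patient", "family-patient", "other-person", "provider", "general"},
    "status": {"affirmed", "negated", "normal", "abnormal", "improving", "worsening",
               "stable", "increasing", "decreasing", "unknown"},
    "when": {"past-old", "past-recent", "now", "future", "never"},
    "certainty": {"certain", "probable", "possible", "unlikely", "conditional"},
}


def check_llm_output(raw: str, known_ids: set) -> list:
    """Return problems found in an LLM's JSON answer; an empty list means it passed."""
    problems = []
    for i, item in enumerate(json.loads(raw)):
        # Grounding: every concept must have been extracted from the source text.
        if item.get("snomed_id") not in known_ids:
            problems.append(f"item {i}: concept {item.get('snomed_id')} was not extracted")
        # Constraining: qualifier values must come from the known vocabularies.
        for field, allowed in ALLOWED.items():
            if field in item and item[field] not in allowed:
                problems.append(f"item {i}: {field}={item[field]!r} is not a known value")
    return problems


raw = '[{"snomed_id": "73211009", "when": "now"}, {"snomed_id": "999", "certainty": "definite"}]'
print(check_llm_output(raw, {"73211009"}))
```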

What’s Next?

This is just a demo—but it points toward something much bigger.

Work is already underway to:

Integrate this pipeline directly with LLMs and SLMs
Enable real-time structured retrieval pipelines
Expand ontology coverage and multilingual support

I have no doubt that this class of models will power the next generation of clinical AI systems—systems that don’t just generate fluent text, but actually understand medicine.

So stay tuned.

Try It Yourself

Live demo: http://text2json.parallia.eu/
HuggingFace: Explore the CE26AM model
Contact: For collaborations or custom deployments

The future of clinical AI won’t be built on generation alone.

It will be built on understanding.
