Foundation Model Evaluation at a Glance

Curia-2 sets a new standard for general-purpose radiological AI. By managing technical complexity through logic-driven innovation, Curia-2 outperforms leading foundation models from industry and academia across 2D, 3D, and clinical finding tracks.

Curia-2 implements an optimized training strategy that builds on the original Curia framework.
Curia-2 sets a new state of the art on 2D tasks, outperforming competing vision-language models (VLMs) from industry leaders.
Curia-2 achieves the highest accuracy among vision-only models, establishing a new benchmark for 3D-first oncology workflows.
Curia-2 is competitive with specialized VLMs on complex finding detection tasks.

Figure 1. Foundation Model Evaluation Summary.

I. The Shift from Narrow AI to Technical Frameworks

We are entering a new era of diagnostic logic. Traditional AI has relied on narrow, task-specific architectures for each modality or disease, an approach that lacks the scalability modern medicine requires. With Curia-2, we move beyond these boundaries across oncology, neurology, and musculoskeletal health.

Building upon the Curia framework, which demonstrated the power of scale by pre-training on over 200 million CT and MRI slices, Curia-2 significantly improves upon the original strategy to better capture the specificities of radiological data. Curia-2 is a broad foundation model designed as a shared technical framework. By training on vast, unlabeled datasets, it moves beyond simple observation to a deep understanding of structural characteristics and tissue frameworks. This transition from "narrow tools" to "general-purpose frameworks" represents the critical breakthrough needed to reach the next frontier of health: AGI for radiology. This progress is dedicated to modernizing oncology workflows and ensuring that technical complexity is managed so that clinicians can remain patient-centered.

II. Results: Setting a New Standard

Evolving the Foundation: Scaling Technical Frameworks with Curia-2

Curia-2 represents an evolution of our original framework, moving beyond simple scale to a more refined understanding of structural characteristics. Unlike the initial Curia models, Curia-2 demonstrates a clear, consistent scaling benefit as we move from Base to Large architectures.

2D Track Advancement: Curia-2 L (88.5%) and Curia-2 B (86.8%) consistently outperform the original Curia L (87.9%) and Curia B (86.2%).

Figure 2. 2D Track comparison of Curia-2 to Curia-1.

3D Track Advancement: The improvement is even more pronounced in 3D-first workflows, where Curia-2 L achieves 88.6% accuracy compared to 83.4% for Curia L, a gain of 5.2 percentage points on volumetric tasks.

Figure 3. 3D Track comparison of Curia-2 to Curia-1.

Benchmarking Against the Frontier: Industry & Academic Leaders

To measure Curia-2's impact, we benchmarked it against newly released models from industry leaders and top academic institutions. Curia-2 achieves these milestones with increased data efficiency, reaching higher performance much earlier in the training process.

2D Track Performance: As shown in our latest evaluations, Curia-2 L achieves an average performance of 88.5%. This significantly exceeds competing 2D VLMs such as Microsoft’s MedImageInsight (84.4%), Google’s MedGemma (80.1%), and Microsoft’s BioMedCLIP (79.1%).

Figure 4. 2D Track Results. Curia-2 demonstrates higher precision in structural understanding compared to leading 2D foundation models.

3D Track Competitive Edge: On the 3D track, Curia-2 L reaches an accuracy of 88.6%, setting a new standard for 3D-first oncology workflows. It outperforms the highest-performing vision-language models, such as Microsoft’s MedImageInsight (83.6%) and Google’s MedGemma (81.8%), while remaining significantly ahead of specialized 3D-native architectures like Stanford’s Merlin (65.4%) and ETH Zurich’s CT-CLIP (51.5%).

Figure 5. 3D Track results. Curia-2 achieves the highest accuracy, setting a new standard for 3D-first oncology workflows.
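
To make the margins behind these comparisons explicit, the following minimal sketch tabulates the 2D and 3D scores quoted above and computes how far each model trails Curia-2 L, in percentage points. The dictionary layout and the function name `margins` are illustrative choices, not part of any released Curia code.

```python
# Scores (in %) copied from the 2D and 3D track paragraphs of this post.
SCORES = {
    "2D": {
        "Curia-2 L": 88.5, "Curia-2 B": 86.8, "Curia L": 87.9, "Curia B": 86.2,
        "MedImageInsight": 84.4, "MedGemma": 80.1, "BioMedCLIP": 79.1,
    },
    "3D": {
        "Curia-2 L": 88.6, "Curia L": 83.4, "MedImageInsight": 83.6,
        "MedGemma": 81.8, "Merlin": 65.4, "CT-CLIP": 51.5,
    },
}


def margins(track, reference="Curia-2 L"):
    """Return {model: reference score minus model score} in percentage points."""
    ref = SCORES[track][reference]
    return {
        name: round(ref - score, 1)
        for name, score in SCORES[track].items()
        if name != reference
    }


if __name__ == "__main__":
    for track in SCORES:
        print(track, margins(track))
```

Reading the output side by side shows that the gap to the original Curia L is much wider on the 3D track than on the 2D track, which is the scaling point made earlier in this section.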

Addressing Clinical Complexity: Finding Detection

Traditionally, finding detection, identifying around 200 clinical entities, was thought to require the domain-specific knowledge of Vision-Language Models (VLMs). Because models such as BioMedCLIP and MedImageInsight use explicit language supervision, they have a natural advantage in translating pixels into complex clinical findings.

However, Curia-2 shows that a vision-only technical framework can bridge this gap through superior structural understanding alone. Curia-2 B achieves a peak AUC of 80.2%, and Curia-2 L follows closely at 79.7%, both as vision-only models. This performance not only outpaces specialized VLMs like MedImageInsight (79.0%), BioMedCLIP (74.1%), and MedGemma (72.7%) but also remains competitive with high-end 3D-native architectures such as Pillar-0 (82.4%) and Merlin (81.2%), which still score higher on this track.

Notably, this highlights the progress made since our initial release: the original Curia L established a strong baseline at 79.2%, and the refined logic of Curia-2 has closed the gap to VLM-level clinical precision for oncology workflows.

Figure 6. Competitive Performance on Finding Detection Tasks: Curia-2 (vision-only) competes with specialized VLMs.
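
For readers less familiar with the metric, here is a minimal sketch of how an AUC over many findings could be computed. It assumes the headline number is a plain average of per-finding AUCs (a macro average); the post does not state how the roughly 200 findings are aggregated, so that choice and the names `auc` and `macro_auc` are assumptions for illustration only.

```python
def auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Labels are 0 or 1; scores are model outputs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(per_finding):
    """Average the AUC over findings; per_finding maps name -> (labels, scores)."""
    values = [auc(labels, scores) for labels, scores in per_finding.values()]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Toy inputs for two hypothetical findings, not data from the benchmark.
    toy = {
        "finding_a": ([0, 1], [0.2, 0.9]),
        "finding_b": ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    }
    print(macro_auc(toy))
```

The pairwise formulation is quadratic in the number of examples, which is fine for a sketch; a rank-based implementation would be the usual choice at benchmark scale.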

Optimized Data Efficiency

A key emergent property of Curia-2 is its data efficiency. Curia-2 L demonstrates quicker convergence, requiring significantly fewer training samples to achieve the same performance level as other frontier models. This ensures that technical complexity is managed to provide stable frameworks for diagnostic insights earlier in the development process.

Figure 7. Evaluation of Data Efficiency and Model Convergence.
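
One simple way to quantify "fewer training samples for the same performance" is to record the first checkpoint at which each model reaches a target score. The sketch below does that; the curves in the usage example are made-up placeholder numbers, not measurements from Curia-2 or any other model, and the function name `samples_to_reach` is hypothetical.

```python
def samples_to_reach(curve, target):
    """Return the samples seen at the first checkpoint whose score meets the target.

    curve is a list of (samples_seen, score) pairs in training order.
    Returns None if the target is never reached.
    """
    for samples_seen, score in curve:
        if score >= target:
            return samples_seen
    return None


if __name__ == "__main__":
    # Placeholder learning curves for illustration only.
    fast_model = [(1_000, 0.60), (2_000, 0.78), (4_000, 0.85)]
    slow_model = [(1_000, 0.50), (2_000, 0.65), (4_000, 0.74), (8_000, 0.81)]
    target = 0.75
    print(samples_to_reach(fast_model, target), samples_to_reach(slow_model, target))
```

Comparing the two returned sample counts at a shared target gives a single data-efficiency ratio that is easier to report than a full pair of curves.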

III. Logic and Innovation: Evolving from Curia to Curia-2

The original Curia model set a new precedent for scale, demonstrating the power of pre-training on over 200 million CT and MRI slices. However, as the field moves toward billion-parameter architectures, managing technical complexity requires a more refined approach to how models learn from 3D volumes. Curia-2 represents a logic-driven evolution of this foundation.

Through grounded innovation, we have optimized our training strategy to focus on meaningful tissue frameworks:

Efficiency: A multi-stage resolution strategy allows the model to capture high-fidelity structural characteristics with greater stability and optimized computational cost.
Context: Technical complexity is managed through content-aware filtering and anatomically-guided sampling, ensuring the model prioritizes informative structural regions over non-informative background.
Latent Structure Consistency: We implemented a refined regularization logic that ensures similar structural representations with high consistency across different patients. This framework preserves the nuanced structural characteristics needed for precise clinical differentiation.
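
The post does not describe how content-aware filtering or anatomically-guided sampling are implemented, so the following is only a generic sketch of the two ideas under stated assumptions: slices are kept when enough pixels exceed an assumed background cutoff, and regions are drawn with probability proportional to a per-region weight. The cutoff of -500, the 20% minimum, and all function names are hypothetical.

```python
import random


def foreground_fraction(slice_2d, background_max=-500):
    """Share of pixels brighter than an assumed background cutoff (e.g. air in CT)."""
    pixels = [v for row in slice_2d for v in row]
    return sum(v > background_max for v in pixels) / len(pixels)


def filter_slices(slices, min_fraction=0.2, background_max=-500):
    """Keep only slices whose foreground share meets the assumed minimum."""
    return [s for s in slices if foreground_fraction(s, background_max) >= min_fraction]


def sample_regions(regions, weights, k, seed=0):
    """Draw k regions with replacement, favoring regions with higher weight."""
    rng = random.Random(seed)
    return rng.choices(regions, weights=weights, k=k)


if __name__ == "__main__":
    air = [[-1000] * 4 for _ in range(4)]      # pure background
    tissue = [[40] * 4 for _ in range(4)]      # fully informative
    kept = filter_slices([air, tissue])
    print(len(kept))
    print(sample_regions(["liver", "lung", "background"], [5, 4, 1], k=6))
```

In a real pipeline the weights would come from an anatomical segmentation or atlas rather than being hard-coded, and the filtering criterion would likely be modality-specific.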

IV. A Unified Benchmark: 2D, 3D, and Finding Detection

To ensure a rigorous assessment of these structural advancements, we have reformulated our evaluation strategy. Following the success of CuriaBench, our original 19-task benchmark designed to test clinical versatility, we have restructured this technical framework into two parallel tracks tailored for modern 3D-first oncology workflows.

Curia-2 is evaluated across three distinct diagnostic pillars to provide consistent insights necessary for Human-in-the-Loop decision-making:

2D Vision: Dedicated to slice-based structural recognition and evaluation of 2D foundation models.
3D Vision: A universal volumetric evaluation track providing a standardized benchmark across all anatomical regions.
Finding Detection: A specialized track that translates complex volumetric data into clear, actionable insights for nearly 200 distinct clinical entities.

V. Conclusion: Building the Next Frontier

Curia-2 marks a pivotal evolution in our mission toward Radiology AGI. By moving beyond the initial scale of Curia and the clinical versatility of CuriaBench, we have optimized our technical framework to master deep structural characteristics. This logic-driven approach enables Curia-2 L to set a new state of the art with 88.5% on the 2D track and 88.6% on the 3D track, outperforming frontier models. Now available on Hugging Face, Curia-2 serves as a robust backbone for the global community. This progress brings us closer to our mission: achieving AGI in radiology to empower doctors and allow them to focus on what matters most, the patient.

Ready to explore Curia? Here’s where to start:

Download the Curia-2 B code from Hugging Face.
Contact us to access the Curia-2 L model.
Read our other Curia blog posts:
- Curia: A Frontier Foundation Model for 3D Imaging
- Announcing the Release of the Curia Benchmark and Evaluation Code

Introducing Curia-2: Scaling the Next Frontier of Radiology