
CausalVerse

NeurIPS 2025 (Spotlight)

Benchmarking Causal Representation Learning with Configurable High-Fidelity Simulations

TL;DR

CausalVerse is the first comprehensive benchmark for causal representation learning (CRL) with controllable high-fidelity simulations. It allows users to inspect, modify, and configure causal graphs to match various CRL assumptions and tasks, and it provides empirical insights to guide researchers in selecting or improving CRL frameworks for real-world causal reasoning.

Dataset Overview

~200k Images | ~140k Videos | 300M+ Video Frames | 24 Scenes | 4 Domains

Static Image Generation

  • Human in Retail Store (11.2k images)
  • Indoor environments
  • Varying poses & appearances
  • Multiple lighting conditions

Physical Simulation

  • Cylinder Spring (40k images)
  • Simple Collision (20k videos)
  • Projectile motion
  • Object interactions

Robotic Manipulation

  • Robot in Kitchen (2.7k videos)
  • Multi-view capture
  • Object-centric tasks
  • Embodied agents

Traffic Analysis

  • Traffic in Town01 (1.97k videos)
  • Multi-agent interactions
  • Urban environments
  • Variable conditions

Key Features: 3-129 latent variables per scene | 1024×1024 and 800×600 resolutions | 3-32 second video durations | Multi-camera viewpoints

Ground Truth Access

Complete access to causal variables, structures, and generation processes with high-fidelity visual data

Diverse Scenarios

From static to dynamic, single to multi-agent, covering physical simulations, robotics, and traffic

Configurable Settings

Flexible control over causal assumptions, domain labels, temporal dependencies, and interventions (see the sketch below)

Rigorous Evaluation

Test CRL methods under both satisfied and violated assumptions with standardized metrics
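
To make the configurable settings concrete, here is a sketch of what a scene specification could look like, written as a plain Python dict. Every key name below (scene, domain_label, latents, interventions) is hypothetical and chosen for illustration only; refer to the dataset documentation for the actual configuration interface.

# Hypothetical scene specification, for illustration only. The key names
# are NOT CausalVerse's actual API; they mirror the kinds of knobs the
# benchmark exposes: causal assumptions, domain labels, temporal
# dependencies, and interventions.
scene_config = {
    "scene": "Cylinder_Spring",     # one of the 24 scenes
    "domain_label": 2,              # domain index for multi-domain CRL settings
    "temporal": True,               # render a video rather than a static image
    "latents": {
        "gravity": 9.81,            # clamp a global causal variable...
        "spring_constant": None,    # ...or leave None to sample from its prior
    },
    "interventions": [
        # a hard intervention: set a latent to a fixed value at a given frame
        {"variable": "position", "frame": 0, "value": [0.0, 0.0, 1.0]},
    ],
}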

Configuration Example

Each scene provides detailed ground-truth variables. Below is an example of the available metadata structure, which is consistent across the dataset.

Category | Sub-category | Variable     | Dim.  | Type | Range    | Description
---------|--------------|--------------|-------|------|----------|-----------------------------
Global   | Scene        | scene        | (1,)  | D    | 6 types  | Scene name/identifier
Global   | Scene        | gravity      | (1,)  | C    | -        | Acceleration due to gravity
Global   | Object       | render_asset | (1,)  | D    | 90 types | Specifies visual appearance
Dynamic  | Object       | position     | (T,3) | C    | -        | 3D coordinates across time
Dynamic  | Object       | rotation     | (T,3) | C    | -        | Euler angles across time

Type: D = discrete, C = continuous; T = number of video frames.
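
As an illustration of how such metadata might be consumed, the following sketch assumes a per-sample JSON file whose nesting mirrors the Category/Sub-category structure above; the file name metadata.json and the exact keys are our assumptions for illustration, not the benchmark's documented format.

import json
import numpy as np

# Illustration only: the file name and key nesting are assumptions chosen
# to mirror the Category/Variable structure in the table above.
with open("metadata.json") as f:
    meta = json.load(f)

gravity = meta["global"]["gravity"]     # continuous (C) scalar
scene_id = meta["global"]["scene"]      # discrete (D), one of 6 scene types

# Dynamic variables carry a leading time axis, e.g. position has shape (T, 3)
position = np.asarray(meta["dynamic"]["position"])
rotation = np.asarray(meta["dynamic"]["rotation"])  # Euler angles, (T, 3)

# A simple derived quantity: per-frame displacement of the object
displacement = np.diff(position, axis=0)            # shape (T-1, 3)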

Data Showcase

Sample images and videos from different domains in CausalVerse, showcasing the variety of scenes and viewpoints available.

Static Image Generation

Scenes: Scene1-Scene4


Physical Simulation (Image)

Scenes: Fall, Refraction, Slope, Spring


Physical Simulation (Video)

Scene: Projectile_Hard

Views: Birdview, Frontview, Leftview, Rightview

Robotic Manipulation

Scene: Kitchen

Views: Agentview, Birdview, Frontview, Eyeview, Sideview

Traffic Situation Analysis

Scenes: Town1 & Town2


Evaluation

Evaluation uses the Mean Correlation Coefficient (MCC) and the Coefficient of Determination (R²) for both image and video data; representative results for three scenes are shown below.

Scene             | Method               | MCC ↑  | R² ↑
------------------|----------------------|--------|-------
Ball on the Slope | Supervised           | 0.9878 | 0.9962
                  | Sufficient Change    | 0.4434 | 0.9630
                  | Mechanism Sparsity   | 0.2491 | 0.3242
                  | Self-supervised      | 0.4109 | 0.9658
                  | Contrastive Learning | 0.2853 | 0.9604
Cylinder Spring   | Supervised           | 0.9970 | 0.9910
                  | Sufficient Change    | 0.6092 | 0.9344
                  | Mechanism Sparsity   | 0.3353 | 0.2340
                  | Self-supervised      | 0.4523 | 0.7841
                  | Contrastive Learning | 0.6342 | 0.9920
Light Refraction  | Supervised           | 0.9900 | 0.9800
                  | Sufficient Change    | 0.6778 | 0.8420
                  | Mechanism Sparsity   | 0.1836 | 0.4067
                  | Self-supervised      | 0.3363 | 0.7841
                  | Contrastive Learning | 0.3773 | 0.9677
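
For reference, MCC is commonly computed in the CRL literature by matching learned latents to ground-truth variables with a Hungarian assignment over absolute Pearson correlations, and R² by fitting a linear regression from the learned latents to the ground-truth variables. The sketch below follows that convention; it is an illustrative implementation, not necessarily the exact evaluation code used to produce the table above.

import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

def mcc(z_hat, z_true):
    """Mean Correlation Coefficient via Hungarian matching on |Pearson corr|.

    z_hat, z_true: (n_samples, d) arrays with equal numbers of latents.
    """
    d = z_true.shape[1]
    # cross-correlation block between learned and ground-truth latents
    corr = np.abs(np.corrcoef(z_hat, z_true, rowvar=False)[:d, d:])
    row, col = linear_sum_assignment(-corr)  # assignment maximizing total corr
    return corr[row, col].mean()

def r2(z_hat, z_true):
    """Coefficient of determination of a linear map from z_hat to z_true."""
    pred = LinearRegression().fit(z_hat, z_true).predict(z_hat)
    return r2_score(z_true, pred)

# Toy usage: score synthetic latents recovered up to a linear mixing
rng = np.random.default_rng(0)
z_true = rng.normal(size=(1000, 5))
z_hat = z_true @ rng.normal(size=(5, 5))
print(f"MCC: {mcc(z_hat, z_true):.4f}  R^2: {r2(z_hat, z_true):.4f}")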

Citation

If you find our work useful, please consider citing our paper:

@inproceedings{chen2025causalverse,
title     = {CausalVerse: Benchmarking Causal Representation Learning with Configurable High-Fidelity Simulations},
author    = {Chen, Guangyi and Deng, Yunlong and Zhu, Peiyuan and Li, Yan and Shen, Yifan and Li, Zijian and Zhang, Kun},
booktitle = {Advances in Neural Information Processing Systems},
year      = {2025}
}