The 1st WACV 2026 Workshop on Robust and Generalized Lane Topology Understanding and HD Map Generation through CoT Design (TopoCoT) provides a platform for industry experts and academics to exchange ideas on road-understanding CoT and the notable work it has inspired, advancing autonomous driving. Through a half-day in-person event, the workshop will feature regular and demo paper presentations as well as invited talks from renowned researchers in academia and industry. Additionally, TopoCoT will release two open-source, real-world CoT lane topology reasoning datasets and host a challenge based on these datasets to assess the capabilities of language and computer vision models in addressing HD map generation.
This workshop hosts four challenges covering different aspects of autonomous driving. See the Challenges section for details.
The workshop will cover a wide range of topics, including but not limited to:
The TopoCoT Workshop hosts four co-located challenges that collectively address key open problems in autonomous driving, from CoT-driven lane topology reasoning and robust embodied intelligence to adversarial stress-testing and large-scale 3D scene understanding.
Overview: This is the main challenge of the TopoCoT Workshop. Participants develop models for robust and generalized lane topology understanding and HD map generation, with a focus on Chain-of-Thought (CoT) reasoning designs. The challenge is based on two open-source real-world CoT lane topology reasoning datasets released by the workshop organizers.
Goal: Assess the capabilities of language and computer vision models in addressing HD map generation challenges, leveraging CoT reasoning to improve topology understanding.
Overview: Autonomous driving systems excel in structured environments, yet struggle in unstructured "last-mile" scenarios. This challenge leverages the Impromptu VLA Dataset (~80,000 clips of diverse and challenging corner cases) to evaluate Vision-Language-Action (VLA) models on safe planning and reasoning.
Tasks are structured as Planning-Oriented Q&A across three levels:
Evaluation Metrics:
Overview: Current benchmarks lack the ability to systematically stress-test End-to-End (E2E) driving models against edge cases. This challenge inverts the usual paradigm: participants generate adversarial driving scenarios (based on nuScenes data) designed to induce failures in state-of-the-art E2E models, while maintaining physical plausibility.
Your Goal: Manipulate the trajectory of exactly one background vehicle to create an aggressive or unexpected interaction with the ego vehicle.
Constraints:
Three-Stage Evaluation Pipeline:
Primary Metric: Average Collision Rate (higher is better: a higher rate indicates a more effective adversarial scenario).
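The primary metric reduces to a simple fraction over generated scenarios. A minimal sketch is below; `average_collision_rate` is a hypothetical helper, and the official evaluation pipeline defines the exact collision check used during rollout:

```python
import numpy as np

def average_collision_rate(collision_flags):
    """Fraction of generated adversarial scenarios in which the ego vehicle
    collides during closed-loop rollout of the E2E model.

    collision_flags: iterable of booleans, one per scenario, True if the
    rollout ended in a collision. (Illustrative only; the challenge's
    three-stage pipeline determines how each flag is produced.)
    """
    flags = np.asarray(list(collision_flags), dtype=bool)
    return float(flags.mean())

# e.g., 3 collisions out of 5 generated scenarios
print(average_collision_rate([True, False, True, True, False]))  # 0.6
```

Under this sketch, a submission is scored by averaging the per-scenario collision outcomes, so more effective adversarial trajectory manipulations yield a higher score.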
Overview: Semantic occupancy prediction provides dense, voxel-level scene representations critical for autonomous driving. This challenge introduces NuPlan-Occ, the largest publicly available semantic occupancy dataset, with 3.6 million frames annotated at 400×400×32 voxel resolution across 19K scenes, derived from the NuPlan benchmark.
Input: Multi-view camera images (+ optional LiDAR, BEV maps).
Output: 3D semantic occupancy grid with 9 classes: vehicle, pedestrian, bicycle, traffic_cone, barrier, czone_sign, generic_object, background, empty.
Two Tracks:
Evaluation Metrics: mIoU (primary), per-class IoU, Precision, Recall.
Baseline (MonoScene on NuPlan-Occ miniset):
| Precision | Recall | IoU | mIoU |
|---|---|---|---|
| 48.99 | 42.54 | 29.49 | 9.36 |
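The mIoU metric above can be sketched as per-class intersection-over-union on flattened voxel label grids, averaged over classes. The helper names (`per_class_iou`, `miou`) are illustrative; the official toolkit defines the exact protocol, e.g. whether unobserved voxels are masked and whether the `empty` class contributes to the mean:

```python
import numpy as np

# The 9 semantic classes listed for the NuPlan-Occ challenge
CLASSES = ["vehicle", "pedestrian", "bicycle", "traffic_cone", "barrier",
           "czone_sign", "generic_object", "background", "empty"]

def per_class_iou(pred, gt, num_classes=len(CLASSES)):
    """IoU per class over integer label grids; NaN for absent classes."""
    ious = np.full(num_classes, np.nan)
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious[c] = inter / union
    return ious

def miou(pred, gt):
    """Mean IoU over classes that appear in prediction or ground truth."""
    return float(np.nanmean(per_class_iou(pred, gt)))
```

For example, on a toy pair `pred = [0, 0, 1, 2]`, `gt = [0, 1, 1, 2]`, classes 0 and 1 each score IoU 0.5 and class 2 scores 1.0, so the mean over present classes is 2/3.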
| Dr. Zhen Li | Dr. Chao Zheng | Dr. Hongyang Li | Dr. Hao Zhao |
|---|---|---|---|
| Assistant Professor, CUHKSZ | Tencent | Assistant Professor, HKU | Assistant Professor, THU |
| Yiming Yang | Chao Zhan | Yihuan Yao | Kun Tang | Zhipeng Cao | Erlong Li |
|---|---|---|---|---|---|
| CUHKSZ | CUHKSZ | CUHKSZ | Tencent | Tencent | Tencent |
| Chao Yan | Shiyao Li | Xu Cao | Yifan Shen | Yunsheng Ma | Zhirun Yue |
|---|---|---|---|---|---|
| Tencent | Tencent | UIUC | UIUC | Purdue | THU |
Workshop Date: Mar 7, 2026
Workshop Time: 13:00 -- 17:00
Workshop Location: Arizona Ballroom Salon 12
Please find the tentative workshop schedule below:
| Time | Event | Speaker |
|---|---|---|
| 13:00 -- 13:05 | Opening Remarks | |
| 13:05 -- 13:50 | Generative and Semantic-Enhanced Scene Understanding for Autonomous Driving | Thomas Monninger (Mercedes-Benz R&D / Univ. Stuttgart) |
| 13:50 -- 14:35 | Structured Sensing for Active Uncertainty Mitigation in Autonomous Driving | Mellon Zhang (Georgia Tech) |
| 14:35 -- 15:05 | Coffee Break | |
| 15:05 -- 15:50 | Context-Aware Autonomous Driving in Adverse Operating Conditions | Shounak Sural (CMU) |
| 15:50 -- 16:35 | Production-ready Autonomous Driving World Models | Hao Zhao (THU) |
| 16:35 -- 16:40 | Summary & Closing Remarks | Zhen Li (CUHKSZ) |
Title: Generative and Semantic-Enhanced Scene Understanding for Autonomous Driving
Abstract: The safe operation of autonomous vehicles in complex environments requires an accurate and complete representation of static road infrastructure. However, traditional online mapping approaches often struggle with real-world ambiguities, such as occlusions and missing lane markings, and lack the high-level semantic richness needed for "long-tail" scenarios.
This talk presents a unified framework for scene understanding that bridges the gap between geometric reconstruction and semantic reasoning. The generative diffusion paradigm is first explored through MapDiffusion. By formulating map construction as an iterative denoising process, the model learns to capture the full distribution of plausible maps, enabling spatial uncertainty estimates that directly correlate with perceptual ambiguity in the real world.
Next, NavMapFusion addresses the integration of navigation priors by effectively fusing low-fidelity navigation maps with high-fidelity sensor data, closing the perception gap at long ranges by treating map discrepancies as resolvable noise within the diffusion process.
Finally, BEVLM transitions from geometric structure to semantic understanding by leveraging Bird's-Eye View representations as a superior interface for LLM spatial reasoning, producing semantic-enhanced BEV representations that yield significant gains in safety-critical planning and reduced collision rates in complex corner cases.
Bio: Thomas Monninger is a Staff Machine Learning Engineer at Mercedes-Benz Research & Development North America, where he specializes in advancing novel machine learning technologies for autonomous driving. Over his ten-year tenure with Mercedes-Benz, he has developed deep technical expertise across Perception, Sensor Fusion, Prediction, and Mapping. An active contributor to the autonomous vehicle research community, Thomas has authored multiple academic papers and holds several related patents. He successfully defended his PhD in Computer Science at the University of Stuttgart, Germany, focusing on Scene Understanding for Autonomous Driving.
Title: Structured Sensing for Active Uncertainty Mitigation in Autonomous Driving
Abstract: Autonomous driving systems must reason under uncertainty in dynamic and safety-critical environments, where delayed or mislocalized perception can directly impact scene understanding and downstream control. While significant progress has been made in structured reasoning and uncertainty-aware models, the structure of sensing itself plays a complementary role in enabling robust autonomy.
This talk argues that active uncertainty mitigation benefits from preserving the temporal and causal ordering of sensor measurements, rather than collapsing them into static frames before inference. Using LiDAR as a case study, it shows how conventional full-scan pipelines aggregate 360Β° sweeps prior to processing, discarding intra-sweep structure and introducing latency that limits reactivity and directional awareness. In contrast, streaming processing maintains the sequential structure of acquisition, enabling fine-grained spatial locality, earlier anomaly localization, and incremental scene understanding.
The talk further discusses how hybrid streaming frameworks can combine local, temporally ordered sensing with global scene reasoning, and outlines ongoing efforts to leverage structured sensing for real-time anomaly detection and tighter integration with downstream planning and control. Together, these ideas highlight how structured sensing and structured reasoning can jointly advance robust and safety-critical autonomy.
Bio: Mellon Zhang is a PhD student in Machine Learning at the Georgia Institute of Technology. His research focuses on improving the real-world reliability and deployability of embodied foundation models through structured representation learning. He studies how temporal, spatial, and geometric structure in perception and action can improve robustness and generalization in safety-critical systems. His work has been published at leading conferences including CVPR and WACV.
Title: Context-Aware Autonomous Driving in Adverse Operating Conditions
Abstract: Autonomous vehicles (AVs) are seeing increasing deployment, yet their reliability under adverse operating contexts remains a key challenge for their safe and scalable operation. Poor visibility, degraded sensing in bad weather, GPS-denied operating zones, and road environments such as work zones often cause problems for AVs. Such operating contexts require the understanding of diverse environmental elements along with the ability to adapt quickly to such environmental changes.
This talk presents a framework called Contextuate that unifies adaptive fusion, data-efficient learning, and co-simulation into a pipeline explicitly designed for safe operation in diverse contexts. Co-simulation connects an autonomous vehicle software stack with a high-fidelity sensor-rich simulator, enabling both closed-loop evaluation of challenging contexts and scalable generation of synthetic data. Context-aware sensor fusion strategies are introduced, where perception adapts dynamically to the operating environment by adaptively leveraging LiDAR and camera sensing modalities, achieving significant improvements in object detection under degraded sensing conditions.
A multimodal dataset centered on work zones is presented as a critical adverse road condition requiring accurate 3D perception followed by safe planning. Finally, context understanding with Vision-Language Models is discussed as both a fusion strategy and a top-level controller, establishing a viable path towards safe and scalable autonomy across a range of operating contexts.
Bio: Shounak Sural is a senior PhD student in Electrical and Computer Engineering at Carnegie Mellon University. His research focuses on context-aware autonomous driving in adverse operating conditions. He has expertise across perception, localization, and planning for autonomous vehicles. He has also worked on physics-informed continuous-time models for robotics during his time at Amazon Robotics. His research has been published at top autonomous driving venues over the years.
Title: Production-ready Autonomous Driving World Models
Abstract: Achieving reliable autonomous driving demands scalable, generalizable, and production-ready world models capable of sensing, acting, and evaluating safety in diverse real-world environments. However, current generative pipelines either focus on a single modality, lack controllability, or fail to support closed-loop decision making. This talk presents a unified perspective on next-generation world models for autonomous driving, grounded in three recent works.
First, UniScene introduces an occupancy-centric generation paradigm that progressively synthesizes semantic occupancy → video & LiDAR, enabling high-fidelity, controllable, and richly annotated data for simulation and training. Second, DiST-4D advances dynamic scene synthesis by leveraging metric depth as a unified geometric representation, achieving state-of-the-art performance in both temporal trajectory extrapolation and spatial novel-view rendering without per-scene optimization. Finally, OmniNWM pushes world models toward true autonomy by jointly modeling state, action, and reward within a panoramic navigation framework, supporting precise ego-control and occupancy-grounded closed-loop safety evaluation.
Together, these systems highlight a practical and scalable pathway toward industrial deployment: world models as controllable, geometry-aware, reward-driven simulators that bridge learning and safety-critical execution.
Bio: Hao Zhao is an Assistant Professor at the Institute for Intelligent Industry (AIR), Tsinghua University, and a BAAI Scholar. He received his B.Eng. and Ph.D. degrees from the Department of Electronic Engineering at Tsinghua University, and previously worked as a Research Scientist at Intel Labs China and as a Postdoctoral Researcher at Peking University. Prof. Zhao has published 50+ papers in top-tier venues such as CVPR, NeurIPS, SIGGRAPH, ICRA, and journals including TPAMI and IJCV. His work has won multiple championships in 3D scene understanding challenges and led to the development of MARS, the world's first open-source, modular, and highly realistic autonomous driving simulator, which received the Best Paper Runner-up Award at CICAI 2023. His work SlimmeRF won the Best Paper Award at 3DV 2024.
If you have any questions or inquiries, please contact us at email.
If the workshop inspires you, please consider citing our work:
@article{wang2023openlane,
title={Openlane-v2: A topology reasoning benchmark for unified 3d hd mapping},
author={Wang, Huijie and Li, Tianyu and Li, Yang and Chen, Li and Sima, Chonghao and Liu, Zhenbo and Wang, Bangjun and Jia, Peijin and Wang, Yuting and Jiang, Shengyin and others},
journal={Advances in Neural Information Processing Systems},
volume={36},
pages={18873--18884},
year={2023}
}
@article{li2023lanesegnet,
title={Lanesegnet: Map learning with lane segment perception for autonomous driving},
author={Li, Tianyu and Jia, Peijin and Wang, Bangjun and Chen, Li and Jiang, Kun and Yan, Junchi and Li, Hongyang},
journal={arXiv preprint arXiv:2312.16108},
year={2023}
}
@inproceedings{yang2025topo2seq,
title={Topo2Seq: Enhanced Topology Reasoning via Topology Sequence Learning},
author={Yang, Yiming and Luo, Yueru and He, Bingkun and Li, Erlong and Cao, Zhipeng and Zheng, Chao and Mei, Shuqi and Li, Zhen},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
year={2025}
}
@article{yang2025topostreamer,
title={TopoStreamer: Temporal Lane Segment Topology Reasoning in Autonomous Driving},
author={Yang, Yiming and Luo, Yueru and He, Bingkun and Lin, Hongbin and Fu, Suzhong and Zheng, Chao and Cao, Zhipeng and Li, Erlong and Yan, Chao and Cui, Shuguang and others},
journal={arXiv preprint arXiv:2507.00709},
year={2025}
}
@article{yang2025fastopowm,
title={FASTopoWM: Fast-Slow Lane Segment Topology Reasoning with Latent World Models},
author={Yang, Yiming and Lin, Hongbin and Luo, Yueru and Fu, Suzhong and Zheng, Chao and Yan, Xinrui and Mei, Shuqi and Tang, Kun and Cui, Shuguang and Li, Zhen},
journal={arXiv preprint arXiv:2507.23325},
year={2025}
}
@article{luo2025reltopo,
title={RelTopo: Enhancing Relational Modeling for Driving Scene Topology Reasoning},
author={Luo, Yueru and Zhou, Changqing and Yang, Yiming and Li, Erlong and Zheng, Chao and Mei, Shuqi and Cui, Shuguang and Li, Zhen},
journal={arXiv preprint arXiv:2506.13553},
year={2025}
}
% ---- Challenge Papers ----
@article{chi2025impromptu,
title={Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models},
author={Chi, Haohan and Gao, Huan-ang and Liu, Ziming and Liu, Jianing and Liu, Chenyu and Li, Jinwei and Yang, Kaisen and Yu, Yangcheng and Wang, Zeda and Li, Wenyi and others},
journal={arXiv preprint arXiv:2505.23757},
year={2025}
}
@article{xu2025challenger,
title={Challenger: Affordable adversarial driving video generation},
author={Xu, Zhiyuan and Li, Bohan and Gao, Huan-ang and Gao, Mingju and Chen, Yong and Liu, Ming and Yan, Chenxu and Zhao, Hang and Feng, Shuo and Zhao, Hao},
journal={arXiv preprint arXiv:2505.15880},
year={2025}
}
@inproceedings{li2025uniscene,
title={Uniscene: Unified occupancy-centric driving scene generation},
author={Li, Bohan and Guo, Jiazhe and Liu, Hongsi and others},
booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
pages={11971--11981},
year={2025}
}
@article{li2025scaling,
title={Scaling Up Occupancy-centric Driving Scene Generation: Dataset and Method},
author={Li, Bohan and Jin, Xin and Zhu, Hu and Liu, Hongsi and others},
journal={arXiv preprint arXiv:2510.22973},
year={2025}
}