The 1st WACV 2026 Workshop on Robust and Generalized Lane Topology Understanding and HD Map Generation through CoT Design (TopoCoT) provides a platform for industry experts and academics to exchange ideas on road-understanding CoT and the notable work it has inspired, advancing autonomous driving. Through a half-day in-person event, the workshop will feature regular and demo paper presentations as well as invited talks from renowned researchers in academia and industry. Additionally, TopoCoT will release two open-source, real-world CoT lane topology reasoning datasets and host a challenge based on these datasets to assess the capabilities of language and computer vision models in addressing HD map generation.
This workshop hosts four challenges covering different aspects of autonomous driving. See the Challenges section for details.
The workshop will cover a wide range of topics, including but not limited to:
The TopoCoT Workshop hosts four co-located challenges that collectively address key open problems in autonomous driving, from CoT-driven lane topology reasoning and robust embodied intelligence to adversarial stress-testing and large-scale 3D scene understanding.
Overview: This is the main challenge of the TopoCoT Workshop. Participants develop models for robust and generalized lane topology understanding and HD map generation, with a focus on Chain-of-Thought (CoT) reasoning designs. The challenge is based on two open-source real-world CoT lane topology reasoning datasets released by the workshop organizers.
Goal: Assess the capabilities of language and computer vision models in addressing HD map generation challenges, leveraging CoT reasoning to improve topology understanding.
Overview: Autonomous driving systems excel in structured environments, yet struggle in unstructured "last-mile" scenarios. This challenge leverages the Impromptu VLA Dataset (~80,000 clips of diverse and challenging corner cases) to evaluate Vision-Language-Action (VLA) models on safe planning and reasoning.
Tasks are structured as Planning-Oriented Q&A across three levels:
Evaluation Metrics:
Overview: Current benchmarks lack the ability to systematically stress-test End-to-End (E2E) driving models against edge cases. This challenge inverts the usual paradigm: participants generate adversarial driving scenarios (based on nuScenes data) designed to induce failures in state-of-the-art E2E models, while maintaining physical plausibility.
Your Goal: Manipulate the trajectory of exactly one background vehicle to create an aggressive or unexpected interaction with the ego vehicle.
Constraints:
Three-Stage Evaluation Pipeline:
Primary Metric: Average Collision Rate (higher is better: a higher rate indicates a more effective adversarial scenario).
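The primary metric reduces to a simple fraction over generated scenarios. A minimal sketch is below; `average_collision_rate` is a hypothetical helper, and the official evaluation pipeline defines the exact collision check used during rollout:

```python
import numpy as np

def average_collision_rate(collision_flags):
    """Fraction of generated adversarial scenarios in which the ego vehicle
    collides during closed-loop rollout of the E2E model.

    collision_flags: iterable of booleans, one per scenario, True if the
    rollout ended in a collision. (Illustrative only; the challenge's
    three-stage pipeline determines how each flag is produced.)
    """
    flags = np.asarray(list(collision_flags), dtype=bool)
    return float(flags.mean())

# e.g., 3 collisions out of 5 generated scenarios
print(average_collision_rate([True, False, True, True, False]))  # 0.6
```

Under this sketch, a submission is scored by averaging the per-scenario collision outcomes, so more effective adversarial trajectory manipulations yield a higher score.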
Overview: Semantic occupancy prediction provides dense, voxel-level scene representations critical for autonomous driving. This challenge introduces NuPlan-Occ, the largest publicly available semantic occupancy dataset, with 3.6 million frames annotated at 400×400×32 voxel resolution across 19K scenes, derived from the NuPlan benchmark.
Input: Multi-view camera images (+ optional LiDAR, BEV maps).
Output: 3D semantic occupancy grid with 9 classes: vehicle, pedestrian, bicycle, traffic_cone, barrier, czone_sign, generic_object, background, empty.
Two Tracks:
Evaluation Metrics: mIoU (primary), per-class IoU, Precision, Recall.
Baseline (MonoScene on NuPlan-Occ miniset):
| Precision | Recall | IoU | mIoU |
|---|---|---|---|
| 48.99 | 42.54 | 29.49 | 9.36 |
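The mIoU metric above can be sketched as per-class intersection-over-union on flattened voxel label grids, averaged over classes. The helper names (`per_class_iou`, `miou`) are illustrative; the official toolkit defines the exact protocol, e.g. whether unobserved voxels are masked and whether the `empty` class contributes to the mean:

```python
import numpy as np

# The 9 semantic classes listed for the NuPlan-Occ challenge
CLASSES = ["vehicle", "pedestrian", "bicycle", "traffic_cone", "barrier",
           "czone_sign", "generic_object", "background", "empty"]

def per_class_iou(pred, gt, num_classes=len(CLASSES)):
    """IoU per class over integer label grids; NaN for absent classes."""
    ious = np.full(num_classes, np.nan)
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious[c] = inter / union
    return ious

def miou(pred, gt):
    """Mean IoU over classes that appear in prediction or ground truth."""
    return float(np.nanmean(per_class_iou(pred, gt)))
```

For example, on a toy pair `pred = [0, 0, 1, 2]`, `gt = [0, 1, 1, 2]`, classes 0 and 1 each score IoU 0.5 and class 2 scores 1.0, so the mean over present classes is 2/3.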
| Dr. Zhen Li | Dr. Chao Zheng | Dr. Hongyang Li | Dr. Hao Zhao |
|---|---|---|---|
| Assistant Professor, CUHKSZ | Tencent | Assistant Professor, HKU | Assistant Professor, THU |
| Yiming Yang | Chao Zhan | Yihuan Yao | Kun Tang | Zhipeng Cao | Erlong Li |
|---|---|---|---|---|---|
| CUHKSZ | CUHKSZ | CUHKSZ | Tencent | Tencent | Tencent |
| Chao Yan | Shiyao Li | Xu Cao | Yifan Shen | Yunsheng Ma | Zhirun Yue |
|---|---|---|---|---|---|
| Tencent | Tencent | UIUC | UIUC | Purdue | THU |
Workshop Date: Mar 7, 2026
Workshop Time: 13:00 -- 17:00
Workshop Location: Arizona Ballroom Salon 12
Please find the tentative workshop schedule below:
| Time | Event | Speaker |
|---|---|---|
| 13:00 -- 13:05 | Opening Remarks | |
| 13:05 -- 13:50 | Generative and Semantic-Enhanced Scene Understanding for Autonomous Driving | Thomas Monninger (Mercedes-Benz R&D / Univ. Stuttgart) |
| 13:50 -- 14:35 | Structured Sensing for Active Uncertainty Mitigation in Autonomous Driving | Mellon Zhang (Georgia Tech) |
| 14:35 -- 15:05 | Coffee Break | |
| 15:05 -- 15:50 | Context-Aware Autonomous Driving in Adverse Operating Conditions | Shounak Sural (CMU) |
| 15:50 -- 16:35 | Production-ready Autonomous Driving World Models | Hao Zhao (THU) |
| 16:35 -- 16:40 | Summary & Closing Remarks | Zhen Li (CUHKSZ) |
Title: Generative and Semantic-Enhanced Scene Understanding for Autonomous Driving
Abstract: The safe operation of autonomous vehicles in complex environments requires an accurate and complete representation of static road infrastructure. However, traditional online mapping approaches often struggle with real-world ambiguities, such as occlusions and missing lane markings, and lack the high-level semantic richness needed for "long-tail" scenarios.
This talk presents a unified framework for scene understanding that bridges the gap between geometric reconstruction and semantic reasoning. The generative diffusion paradigm is first explored through MapDiffusion. By formulating map construction as an iterative denoising process, the model learns to capture the full distribution of plausible maps, enabling spatial uncertainty estimates that directly correlate with perceptual ambiguity in the real world.
Next, NavMapFusion addresses the integration of navigation priors by effectively fusing low-fidelity navigation maps with high-fidelity sensor data, closing the perception gap at long ranges by treating map discrepancies as resolvable noise within the diffusion process.
Finally, BEVLM transitions from geometric structure to semantic understanding by leveraging Bird's-Eye View representations as a superior interface for LLM spatial reasoning, producing semantic-enhanced BEV representations that yield significant gains in safety-critical planning and reduced collision rates in complex corner cases.
Bio: Thomas Monninger is a Staff Machine Learning Engineer at Mercedes-Benz Research & Development North America, where he specializes in advancing novel machine learning technologies for autonomous driving. Over his ten-year tenure with Mercedes-Benz, he has developed deep technical expertise across Perception, Sensor Fusion, Prediction, and Mapping. An active contributor to the autonomous vehicle research community, Thomas has authored multiple academic papers and holds several related patents. He successfully defended his PhD in Computer Science at the University of Stuttgart, Germany, focusing on Scene Understanding for Autonomous Driving.
Title: Structured Sensing for Active Uncertainty Mitigation in Autonomous Driving
Abstract: Autonomous driving systems must reason under uncertainty in dynamic and safety-critical environments, where delayed or mislocalized perception can directly impact scene understanding and downstream control. While significant progress has been made in structured reasoning and uncertainty-aware models, the structure of sensing itself plays a complementary role in enabling robust autonomy.
This talk argues that active uncertainty mitigation benefits from preserving the temporal and causal ordering of sensor measurements, rather than collapsing them into static frames before inference. Using LiDAR as a case study, it shows how conventional full-scan pipelines aggregate 360Β° sweeps prior to processing, discarding intra-sweep structure and introducing latency that limits reactivity and directional awareness. In contrast, streaming processing maintains the sequential structure of acquisition, enabling fine-grained spatial locality, earlier anomaly localization, and incremental scene understanding.
The talk further discusses how hybrid streaming frameworks can combine local, temporally ordered sensing with global scene reasoning, and outlines ongoing efforts to leverage structured sensing for real-time anomaly detection and tighter integration with downstream planning and control. Together, these ideas highlight how structured sensing and structured reasoning can jointly advance robust and safety-critical autonomy.
Bio: Mellon Zhang is a PhD student in Machine Learning at the Georgia Institute of Technology. His research focuses on improving the real-world reliability and deployability of embodied foundation models through structured representation learning. He studies how temporal, spatial, and geometric structure in perception and action can improve robustness and generalization in safety-critical systems. His work has been published at leading conferences including CVPR and WACV.
Title: Context-Aware Autonomous Driving in Adverse Operating Conditions
Abstract: Autonomous vehicles (AVs) are seeing increasing deployment, yet their reliability under adverse operating contexts remains a key challenge for their safe and scalable operation. Poor visibility, degraded sensing in bad weather, GPS-denied operating zones, and road environments such as work zones often cause problems for AVs. Such operating contexts require the understanding of diverse environmental elements along with the ability to adapt quickly to such environmental changes.
This talk presents a framework called Contextuate that unifies adaptive fusion, data-efficient learning, and co-simulation into a pipeline explicitly designed for safe operation in diverse contexts. Co-simulation connects an autonomous vehicle software stack with a high-fidelity sensor-rich simulator, enabling both closed-loop evaluation of challenging contexts and scalable generation of synthetic data. Context-aware sensor fusion strategies are introduced, where perception adapts dynamically to the operating environment by adaptively leveraging LiDAR and camera sensing modalities, achieving significant improvements in object detection under degraded sensing conditions.
A multimodal dataset centered on work zones is presented as a critical adverse road condition requiring accurate 3D perception followed by safe planning. Finally, context understanding with Vision-Language Models is discussed as both a fusion strategy and a top-level controller, establishing a viable path towards safe and scalable autonomy across a range of operating contexts.
Bio: Shounak Sural is a senior PhD student in Electrical and Computer Engineering at Carnegie Mellon University. His research focuses on context-aware autonomous driving in adverse operating conditions. He has expertise across perception, localization, and planning for autonomous vehicles. He has also worked on physics-informed continuous-time models for robotics during his time at Amazon Robotics. His research has been published at top autonomous driving venues over the years.
Title: Production-ready Autonomous Driving World Models
Abstract: Achieving reliable autonomous driving demands scalable, generalizable, and production-ready world models capable of sensing, acting, and evaluating safety in diverse real-world environments. However, current generative pipelines either focus on a single modality, lack controllability, or fail to support closed-loop decision making. This talk presents a unified perspective on next-generation world models for autonomous driving, grounded in three recent works.
First, UniScene introduces an occupancy-centric generation paradigm that progressively synthesizes semantic occupancy → video & LiDAR, enabling high-fidelity, controllable, and richly annotated data for simulation and training. Second, DiST-4D advances dynamic scene synthesis by leveraging metric depth as a unified geometric representation, achieving state-of-the-art performance in both temporal trajectory extrapolation and spatial novel-view rendering without per-scene optimization. Finally, OmniNWM pushes world models toward true autonomy by jointly modeling state, action, and reward within a panoramic navigation framework, supporting precise ego-control and occupancy-grounded closed-loop safety evaluation.
Together, these systems highlight a practical and scalable pathway toward industrial deployment: world models as controllable, geometry-aware, reward-driven simulators that bridge learning and safety-critical execution.
Bio: Hao Zhao is an Assistant Professor at the Institute for Intelligent Industry (AIR), Tsinghua University, and a BAAI Scholar. He received his B.Eng. and Ph.D. degrees from the Department of Electronic Engineering at Tsinghua University, and previously worked as a Research Scientist at Intel Labs China and as a Postdoctoral Researcher at Peking University. Prof. Zhao has published 50+ papers in top-tier venues such as CVPR, NeurIPS, SIGGRAPH, ICRA, and journals including TPAMI and IJCV. His work has won multiple championships in 3D scene understanding challenges and led to the development of MARS, the world's first open-source, modular, and highly realistic autonomous driving simulator, which received the Best Paper Runner-up Award at CICAI 2023. His work SlimmeRF won the Best Paper Award at 3DV 2024.
If you have any questions or inquiries, please contact us at email.
If the workshop inspires you, please consider citing our work:
@article{wang2023openlane,
title={Openlane-v2: A topology reasoning benchmark for unified 3d hd mapping},
author={Wang, Huijie and Li, Tianyu and Li, Yang and Chen, Li and Sima, Chonghao and Liu, Zhenbo and Wang, Bangjun and Jia, Peijin and Wang, Yuting and Jiang, Shengyin and others},
journal={Advances in Neural Information Processing Systems},
volume={36},
pages={18873--18884},
year={2023}
}
@article{li2023lanesegnet,
title={Lanesegnet: Map learning with lane segment perception for autonomous driving},
author={Li, Tianyu and Jia, Peijin and Wang, Bangjun and Chen, Li and Jiang, Kun and Yan, Junchi and Li, Hongyang},
journal={arXiv preprint arXiv:2312.16108},
year={2023}
}
@inproceedings{yang2025topo2seq,
title={Topo2Seq: Enhanced Topology Reasoning via Topology Sequence Learning},
author={Yang, Yiming and Luo, Yueru and He, Bingkun and Li, Erlong and Cao, Zhipeng and Zheng, Chao and Mei, Shuqi and Li, Zhen},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
year={2025}
}
@article{yang2025topostreamer,
title={TopoStreamer: Temporal Lane Segment Topology Reasoning in Autonomous Driving},
author={Yang, Yiming and Luo, Yueru and He, Bingkun and Lin, Hongbin and Fu, Suzhong and Zheng, Chao and Cao, Zhipeng and Li, Erlong and Yan, Chao and Cui, Shuguang and others},
journal={arXiv preprint arXiv:2507.00709},
year={2025}
}
@article{yang2025fastopowm,
title={FASTopoWM: Fast-Slow Lane Segment Topology Reasoning with Latent World Models},
author={Yang, Yiming and Lin, Hongbin and Luo, Yueru and Fu, Suzhong and Zheng, Chao and Yan, Xinrui and Mei, Shuqi and Tang, Kun and Cui, Shuguang and Li, Zhen},
journal={arXiv preprint arXiv:2507.23325},
year={2025}
}
@article{luo2025reltopo,
title={RelTopo: Enhancing Relational Modeling for Driving Scene Topology Reasoning},
author={Luo, Yueru and Zhou, Changqing and Yang, Yiming and Li, Erlong and Zheng, Chao and Mei, Shuqi and Cui, Shuguang and Li, Zhen},
journal={arXiv preprint arXiv:2506.13553},
year={2025}
}
% ---- Challenge Papers ----
@article{chi2025impromptu,
title={Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models},
author={Chi, Haohan and Gao, Huan-ang and Liu, Ziming and Liu, Jianing and Liu, Chenyu and Li, Jinwei and Yang, Kaisen and Yu, Yangcheng and Wang, Zeda and Li, Wenyi and others},
journal={arXiv preprint arXiv:2505.23757},
year={2025}
}
@article{xu2025challenger,
title={Challenger: Affordable adversarial driving video generation},
author={Xu, Zhiyuan and Li, Bohan and Gao, Huan-ang and Gao, Mingju and Chen, Yong and Liu, Ming and Yan, Chenxu and Zhao, Hang and Feng, Shuo and Zhao, Hao},
journal={arXiv preprint arXiv:2505.15880},
year={2025}
}
@inproceedings{li2025uniscene,
title={Uniscene: Unified occupancy-centric driving scene generation},
author={Li, Bohan and Guo, Jiazhe and Liu, Hongsi and others},
booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
pages={11971--11981},
year={2025}
}
@article{li2025scaling,
title={Scaling Up Occupancy-centric Driving Scene Generation: Dataset and Method},
author={Li, Bohan and Jin, Xin and Zhu, Hu and Liu, Hongsi and others},
journal={arXiv preprint arXiv:2510.22973},
year={2025}
}