Self-Evolving Embodied Agents Group

About

MMLab@SIGS

We are a dynamic research collective at the forefront of artificial intelligence, dedicated to solving the fundamental challenges of embodied intelligence. Our vision is to create self-evolving agents capable of perceiving, reasoning, and acting in complex, real-world environments—agents that continuously learn and adapt to seamlessly bridge the digital and physical worlds.

Our mission is to establish a complete pipeline for these agents, forming a closed loop of efficient data preparation, multimodal perception, spatiotemporal decision-making, and continuous learning. By integrating cutting-edge research in 3D vision, world models, and large language models, we are building the foundation for the next generation of intelligent systems.

We are always looking for motivated PhD students, postdocs, and research assistants who share our vision. Check out the Join Us section and follow our WeChat official account.

Research Directions

Efficient Embodied Data Preparation

We develop methods for the efficient collection, annotation, and preprocessing of embodied AI data. We focus on creating high-quality datasets that enable robust robot learning in real-world environments.

Automated Pipelines, Synthetic-to-Real, Active Learning

Multimodal Perception & World Models

We build comprehensive world models through multimodal sensory fusion and understanding. Our research enables agents to perceive and reason about complex 3D environments using multiple modalities.

3D Vision, Sensor Fusion, World Modeling

Spatiotemporal Decision Making and Robot Policy Learning

We train robot policies that can make intelligent decisions in complex spatiotemporal environments. Our work bridges the gap between simulation and real-world deployment through robust policy learning.

Imitation Learning, Reinforcement Learning, Sim-to-Real

Agent Continuous Learning

We enable agents to continuously learn and adapt to new environments and tasks without catastrophic forgetting. Our research addresses the fundamental challenges of lifelong learning in embodied AI.

Test-time Adaptation, Continual Learning, LLM Adaptation

Selected Publications

IGen: Scalable Data Generation for Robot Learning from Open-World Images

CVPR 2026 (CCF-A)

[Project] [arXiv] [Code]

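This paper presents IGen, a scalable data generation framework for robot learning. It automatically generates realistic visual observations and executable actions from open-world images, so that effective manipulation policies can be trained without human teleoperation data.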

AnchorGen: Multi-View Geometric Anchoring for Keyframe-Aware Embodied Video Generation

Under Review

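AnchorGen is a keyframe-aware, geometrically anchored video generation framework that improves the 3D consistency of action-conditioned robot videos. The method uses self-supervised 2D-3D contrastive learning to automatically discover important keyframes, such as contacts and state changes, and injects sparse geometric features into a multimodal diffusion model as structured conditioning. On real robot data, it significantly improves generation quality and spatial consistency.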

SizeGS: Size-aware Compression of 3D Gaussian Splatting via Mixed Integer Programming

ACM MM 2025 (CCF-A), Best Paper Candidate

[Project] [arXiv] [Code]

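A size-aware compression framework based on mixed integer programming that compresses 3DGS to a predetermined size through fast hyperparameter search. It finds the best parameters satisfying the size constraint within one minute and achieves state-of-the-art offline compression performance.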

EVOS: Efficient Implicit Neural Training via EVOlutionary Selector

CVPR 2025 (CCF-A)

[Project] [arXiv] [Code]

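Proposes EVOS, a dynamic sample selection framework that treats sample coordinates as evolving individuals and substantially reduces computation through sparse fitness evaluation, interval sampling, and a caching mechanism. Combined with frequency-guided crossover and augmented unbiased mutation, it overcomes spectral bias and reduces training time by 48%-66%.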

COSMIC: Clique-Oriented Semantic Multi-space Integration for Robust CLIP Test-Time Adaptation

CVPR 2025 (CCF-A)

[Project] [arXiv] [Code]

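To address the performance degradation of vision-language models during test-time domain adaptation, this work proposes the COSMIC framework, which significantly strengthens the model's ability to adapt through multi-granularity, cross-modal semantic caching and a graph-based query mechanism. Performance on cross-domain test tasks improves by 15.81%.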

Enhancing Implicit Neural Representations via Symmetric Power Transformation

AAAI 2025 (CCF-A)

[Project] [arXiv] [Code]

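Proposes the symmetric power transformation, based on the "range-defined symmetry hypothesis". It reshapes the data distribution through a nonlinear invertible transformation, achieving range constraint and symmetrization at the same time. At zero additional cost, it resolves the problems of extreme deviation amplification and boundary discontinuity, and achieves the best PSNR/SSIM on ImageNet.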

MesonGS: Post-training Compression of 3D Gaussians via Efficient Attribute Transformation

ECCV 2024 (CCF-B)

[Project] [arXiv] [Code]

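An efficient post-training compression codec for 3D Gaussians. It introduces view-dependent and view-independent importance measures and combines attribute transformations (such as RAHT) with a block quantization strategy, substantially reducing size while preserving high-quality rendering.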

Join Us

We are always looking for passionate and talented students to join our team. If you are interested in shaping the future of embodied AI, we encourage you to apply!


Team

Principal Investigators

Zhi Wang (王智)

Jingyan Jiang (姜婧妍)

Ph.D. Students

Master's Students

Join Us

🎓 Undergraduates

🔬 Master's and Ph.D. Students

🌍 Interns and Visiting Scholars

🤝 Corporate Collaboration