EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision
[Paper Title] EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision [Paper Summary] This paper presents EmerNeRF, a simple yet powerful method for learning spatial-temporal representations of dynamic driving scenes. Built on neural fields, EmerNeRF simultaneously captures scene geometry, appearance, motion, and semantics via self-bootstrapping. EmerNeRF relies on two core components. First, it decomposes the scene into a static field and a dynamic field. This decomposition emerges purely from self-supervision, allowing the model to learn from a wide range of real-world data sources. Second, EmerNeRF parameterizes an induced flow field from the dynamic field and uses it to aggregate features across multiple frames, improving the rendering accuracy of dynamic objects. Coupling these three fields (static, dynamic, and flow) lets EmerNeRF represent highly dynamic scenes self-sufficiently, without relying on ground-truth object annotations or pre-trained models for dynamic-object segmentation or optical-flow estimation. The method achieves state-of-the-art performance in sensor simulation, significantly outperforming prior methods when reconstructing static scenes (+2.93 PSNR) and dynamic scenes (+3.70 PSNR). In addition, to strengthen EmerNeRF's semantic generalization, the authors lift 2D visual foundation-model features into 4D space-time and address a general positional bias in modern Transformers, substantially improving 3D perception performance (e.g., a 37.50% average relative improvement in occupancy-prediction accuracy). Finally, the authors build a diverse and challenging 120-sequence dataset to evaluate neural fields under extreme and highly dynamic conditions. [Guiding Question] How does EmerNeRF learn spatial-temporal representations of dynamic driving scenes in a self-supervised manner and achieve the best performance when reconstructing both static and dynamic scenes? [Paper Link] https://arxiv.org/pdf/2311.02077
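The summary describes coupling three fields: a static field, a dynamic field, and a flow field that warps query points to neighboring frames so dynamic features can be aggregated across time. The following is a minimal illustrative sketch of that composition, not the paper's actual architecture — the tiny random "MLPs", feature dimensions, and density-weighted blending rule are all assumptions made for readability:

```python
import numpy as np

# Hypothetical sketch of a three-field composition in EmerNeRF's spirit:
#   static_field:  f_s(x)    -> (density, feature)
#   dynamic_field: f_d(x, t) -> (density, feature)
#   flow_field:    v(x, t)   -> 3D scene flow used to warp the query point
#     to neighboring frames and aggregate multi-frame dynamic features.
rng = np.random.default_rng(0)

def tiny_mlp(in_dim, out_dim):
    """A single random tanh layer standing in for a real MLP."""
    w = rng.normal(size=(in_dim, out_dim)) * 0.1
    return lambda x: np.tanh(x @ w)

static_field = tiny_mlp(3, 1 + 8)   # x -> (raw density, 8-dim feature)
dynamic_field = tiny_mlp(4, 1 + 8)  # (x, t) -> (raw density, feature)
flow_field = tiny_mlp(4, 3)         # (x, t) -> scene flow per unit time

def query(x, t, dt=1.0):
    """Composite static + dynamic fields; the flow warps queries to t±dt."""
    s = static_field(x)
    sigma_s, feat_s = np.exp(s[..., :1]), s[..., 1:]
    # Gather dynamic features from frames t-dt, t, t+dt via the induced flow.
    feats, sigmas = [], []
    for step in (-dt, 0.0, dt):
        x_warp = x + step * flow_field(np.concatenate([x, [t]]))
        d = dynamic_field(np.concatenate([x_warp, [t + step]]))
        sigmas.append(np.exp(d[..., :1]))
        feats.append(d[..., 1:])
    sigma_d = sigmas[1]              # density from the current frame
    feat_d = np.mean(feats, axis=0)  # multi-frame feature aggregation
    # Densities add; features blend by relative density. No mask tells the
    # model which points are static vs dynamic -- the split is self-supervised.
    sigma = sigma_s + sigma_d
    w_d = sigma_d / (sigma + 1e-8)
    return sigma, (1.0 - w_d) * feat_s + w_d * feat_d
```

The point of the sketch is structural: nothing labels a 3D point as static or dynamic; the decomposition has to emerge from how well each field explains the observations during training.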
Neural Fields and Tactile Perception: Visuo-Tactile Sensing for Robotic In-Hand Manipulation
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
ADaPT: As-Needed Decomposition and Planning with Language Models
Anything in Any Scene: Photorealistic Video Object Insertion
Large Language Models Cannot Self-Correct Reasoning Yet
SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optim
Democratizing LLMs: An Exploration of Cost-Performance Trade-offs in Self-Refine
VeRA: Vector-based Random Matrix Adaptation
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
Moral Foundations of Large Language Models
Farzi Data: Autoregressive Data Distillation
In-Context Learning Creates Task Vectors
System 2 Attention (is something you might need too)
Can Large Language Models be Good Path Planners? A Benchmark and Investigation o
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition
Kosmos-G: Generating Images in Context with Multimodal Large Language Models
Well Begun is Half Done: Generator-agnostic Knowledge Pre-Selection for Knowledg
Language Models can be Logical Solvers
Visual In-Context Prompting
Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixtu
VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis
How FaR Are Large Language Models From Agents with Theory-of-Mind?
Large Language Model Cascades with Mixture of Thoughts Representations for Cost-
DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
RMT Vision Network
RLVF: Learning from Verbal Feedback without Overgeneralization
Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with
AutoMix: Automatically Mixing Language Models
SANeRF-HQ: Prompt-Based High-Quality 3D Object Segmentation in NeRF
CLEX: Continuous Length Extrapolation for Large Language Models
Domain Generalization Guided by Gradient Signal to Noise Ratio of Parameters
Approximating Two-Layer Feedforward Networks for Efficient Transformers
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completi
Offline Actor-Critic Reinforcement Learning Scales to Large Models
Compressing Context to Enhance Inference Efficiency of Large Language Models
GridFormer: A Table Structure Recognition Method
Improving Summarization with Human Edits
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
UFOGen: Efficient One-Step Text-to-Image Generation Model
Making Large Language Models Perform Better in Knowledge Graph Completion