Wonder3D: Single Image to 3D using Cross-Domain Diffusion
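Paper summary: In this paper, the authors propose Wonder3D, a new method for efficiently generating high-fidelity textured meshes from a single-view image. Recent methods based on Score Distillation Sampling (SDS) typically suffer from time-consuming per-shape optimization. Other methods produce 3D information directly through fast network inference, but their results are often of low quality and lack geometric detail. To improve both the quality and the consistency of image-to-3D generation, the authors propose a cross-domain diffusion model that generates multi-view normal maps together with their corresponding color images. To keep these outputs consistent, they use a multi-view cross-domain attention mechanism that exchanges information across viewpoints and modalities. Finally, they introduce a geometry-aware normal fusion algorithm that extracts high-quality surfaces from the multi-view 2D representations. Extensive evaluations show that, compared with prior work, the method achieves high-quality reconstructions, robust generalization, and reasonable efficiency. The main contributions are therefore the cross-domain diffusion model for consistent multi-view normal and color generation and the geometry-aware normal fusion algorithm; combined, they give Wonder3D clear gains in quality and efficiency over previous approaches.
Paper link: https://arxiv.org/pdf/2310.15008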
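The core idea behind multi-view cross-domain attention, letting features from different viewpoints and from the normal and color domains exchange information, can be illustrated with a toy pure-Python sketch. This is not Wonder3D's actual layer: it assumes identity query/key/value projections, small random feature vectors, and a single joint attention over every token of every view and both domains. The function names (`attention`, `joint_multiview_cross_domain_attention`) are hypothetical, and six views with four tokens each is only an illustrative choice.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def attention(queries, keys, values):
    # Scaled dot-product attention: each query gets a weighted average of
    # the value vectors, weighted by softmax(q . k / sqrt(d)).
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out.append([sum(wj * v[c] for wj, v in zip(w, values))
                    for c in range(len(values[0]))])
    return out


def joint_multiview_cross_domain_attention(features):
    # features[domain][view] is a list of token vectors (T tokens x d dims).
    # Hypothetical simplification: flatten every token of every view and
    # every domain into one sequence so each token can attend to all others.
    domains = sorted(features)
    flat, index = [], []
    for dom in domains:
        for v, tokens in enumerate(features[dom]):
            for t, tok in enumerate(tokens):
                flat.append(tok)
                index.append((dom, v, t))
    # Identity projections for brevity; a real layer would learn W_q, W_k, W_v.
    mixed = attention(flat, flat, flat)
    # Scatter the outputs back into the same [domain][view][token] layout.
    out = {dom: [[None] * len(toks) for toks in features[dom]] for dom in domains}
    for (dom, v, t), vec in zip(index, mixed):
        out[dom][v][t] = vec
    return out


# Usage: random features for an illustrative 6-view, 2-domain setup.
random.seed(0)
n_views, n_tokens, dim = 6, 4, 8
feats = {
    dom: [[[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
          for _ in range(n_views)]
    for dom in ("color", "normal")
}
out = joint_multiview_cross_domain_attention(feats)
print(len(out["normal"]), len(out["normal"][0]), len(out["normal"][0][0]))  # 6 4 8
```

Flattening all views and both domains into one sequence is the simplest way to let every color token see every normal token and vice versa; a practical model would use learned projections and would likely factor or restrict the attention pattern to keep its cost manageable.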
CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
Aligning Text-to-Image Diffusion Models with Reward Backpropagation
Conditional Diffusion Distillation
Densely Captioned Images: A New Benchmark for Evaluating Vision-Language Models
Domain Generalization Guided by Gradient Signal to Noise Ratio of Parameters
DPM-Solver-v3: Improved Diffusion ODE Solver with Empirical Model Statistics
Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Mo
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completi
DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation
Drag View: Generalizable Novel View Synthesis with Unposed Imagery
VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis
DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
Implicit Diffusion: Efficient Optimization through Stochastic Sampling
Controllable Music Production with Diffusion Models and Guidance Gradients
Kosmos-G: Generating Images in Context with Multimodal Large Language Models
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Mode
ScaleLong: Towards More Stable Training of Diffusion Model via Scaling Network L
Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-S
Tuna: Instruction Tuning using Feedback from Large Language Models
LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes
HallusionBench: You See What You Think? Or You Think What You See? An Image-Cont
Localizing and Editing Knowledge in Text-to-Image Generative Models
Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression
ImageBind-LLM: Multi-modality Instruction Tuning
GridFormer: A Table Structure Recognition Method
Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models
Rethinking FID: Towards a Better Evaluation Metric for Image Generation
PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics
DragVideo: Interactive Drag-style Video Editing
PATHFINDER: A Tree-Search-Based Method for Generating Multi-Step Reasoning Paths
A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Gene
Tied-Lora: Enhacing parameter efficiency of LoRA with weight tying
SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optim
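Stable Score Distillation: A New Method for High-Quality 3D Generation
Generating High-Quality Long Videos: The SEINE Video Diffusion Model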
Simple and Scalable Strategies to Continually Pre-train Large Language Models
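LaMo: An Offline Reinforcement Learning Framework Based on Language Models
VideoLCM: Efficient Video Synthesis Based on a Video Latent Consistency Model
CLIP as RNN: Segmenting Countless Visual Concepts Without Training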