Farzi Data: Autoregressive Data Distillation
Paper summary: In Farzi Data: Autoregressive Data Distillation, the authors propose Farzi, a data distillation method for autoregressive machine learning tasks, where inputs and outputs follow a strict left-to-right causal structure. Concretely, Farzi summarizes an event-sequence dataset into a small number of synthetic sequences, optimized so that models trained on them maintain (if not improve upon) the performance of models trained on the full dataset. Under the hood, Farzi performs memory-efficient data distillation by (i) leveraging Hessian-vector products for efficient reverse-mode differentiation through the Adam optimizer, and (ii) factorizing the high-dimensional discrete event space into a latent space that provably promotes implicit regularization. Empirically, on sequential recommendation and language modeling tasks, models trained on Farzi Data reach 98-120% of downstream full-data performance while the Farzi Data is only 0.1% of the original dataset's size. Notably, the ability to train better models with significantly less data sheds light on the design of future autoregressive models and opens new opportunities for further scaling up models and data.

Paper link: https://arxiv.org/pdf/2310.09983
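The summary notes that Farzi relies on Hessian-vector products (HVPs) to differentiate through optimizer steps without ever materializing the full Hessian. A minimal illustrative sketch of that core trick, not the paper's actual implementation: an HVP can be approximated from two gradient evaluations (a central finite difference along the direction of interest), costing O(d) memory instead of the O(d^2) a full Hessian would require. The quadratic loss below is a made-up toy so the result can be checked by hand.

```python
# Toy HVP sketch (assumption: illustrative quadratic loss, not Farzi's code).
# For f(w) = 0.5 * sum_i a[i] * w[i]^2, the Hessian is diag(a), so the
# exact HVP is H v = [a[i] * v[i]]. We recover it from gradients alone.

A = [1.0, 2.0, 3.0]  # diagonal of the (never materialized) Hessian

def grad(w):
    # Analytic gradient of the toy loss: g[i] = a[i] * w[i]
    return [ai * wi for ai, wi in zip(A, w)]

def hvp(w, v, eps=1e-5):
    # Central finite difference of the gradient along direction v:
    #   H v ≈ (g(w + eps*v) - g(w - eps*v)) / (2 * eps)
    # Only two gradient calls, no d x d Hessian is ever formed.
    g_plus = grad([wi + eps * vi for wi, vi in zip(w, v)])
    g_minus = grad([wi - eps * vi for wi, vi in zip(w, v)])
    return [(gp - gm) / (2 * eps) for gp, gm in zip(g_plus, g_minus)]

w = [1.0, 1.0, 1.0]
v = [1.0, 0.0, 1.0]
print(hvp(w, v))  # ≈ [1.0, 0.0, 3.0], matching diag(1, 2, 3) @ v
```

In an autodiff framework the finite difference would be replaced by an exact double-backward pass, which is what makes reverse-mode differentiation through many Adam steps tractable in memory.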
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition
Ziya2: Data-centric Learning is All LLMs Need
Conditional Diffusion Distillation
UT5: Pretraining Non autoregressive T5 with unrolled denoising
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting
FreeControl: Training-Free Spatial Control for Any Text-to-Image Diffusion Model
Transformers are Multi-State RNNs
Real-Time Few-Shot View Synthesis Based on 3D Gaussian Splatting
Toward Joint Language Modeling for Speech Units and Text
VeRA: Vector-based Random Matrix Adaptation
P5: Plug-and-Play Persona Prompting for Personalized Response Selection
DeepCache: A Training-Free Method for Accelerating Diffusion Models
Domain Generalization Guided by Gradient Signal to Noise Ratio of Parameters
COLMAP-Free 3DGS: Robust View Synthesis and Pose Estimation with 3D Gaussian Splatting, No Camera Parameters Required
DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
Efficient Single-Image-to-3D Generation with Amortized 3D Gaussian Models
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion
EvoPrompt: AI Prompt Optimizer
EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision
Improving Summarization with Human Edits
Interactive Task Planning with Language Models
CLIP as RNN: Segment Countless Visual Concepts without Training
TiC-CLIP: Continual Training of CLIP Models
Stable Score Distillation: A New Method for High-Quality 3D Generation
Improving the Reliability of Large Language Models in In-Context Learning by Incorporating Supervised Knowledge
Wonder3D: Single Image to 3D using Cross-Domain Diffusion
System 2 Attention (is something you might need too)
Retrieval meets Long Context Large Language Models
ReFT: Reasoning with Reinforced Fine-Tuning
A Survey on the Interpretability of Large Language Models
CLEX: Continuous Length Extrapolation for Large Language Models
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
Drivable 3D Gaussian Avatars
TrustLLM: Trustworthiness in Large Language Models
Contrastive Chain-of-Thought Prompting
SOTOPIA-π: Interactive Learning of Socially Intelligent Language Agents
FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis