Approximating Two-Layer Feedforward Networks for Efficient Transformers
Paper summary: In Approximating Two-Layer Feedforward Networks for Efficient Transformers, the authors propose a new approach to reducing the compute and memory requirements of neural networks (NNs) while preserving performance, using sparse Mixtures of Experts (MoEs) to build resource-efficient large language models (LMs). To this end, they present several novel perspectives on MoEs and develop a unified framework for approximating two-layer NNs such as the feedforward blocks of Transformers, from which they also derive improvements to both MoEs and product-key memories (PKMs). Unlike prior work that compares MoEs against dense baselines under other conditions, this paper evaluates under parameter-equal conditions, which is crucial for a fair assessment of LMs. Experiments show that the proposed MoEs are competitive with the dense Transformer-XL on both the WikiText-103 and enwik8 datasets while being far more resource-efficient, demonstrating that MoEs are relevant not only to extremely large LMs but to resource-efficient LMs of any scale.
Paper link: https://arxiv.org/pdf/2310.10837
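The core idea, replacing a dense two-layer FFN (d_model -> d_ff -> d_model) with many small experts of which only k are evaluated per token, can be sketched in PyTorch as follows. This is a generic top-k MoE with sigmoid gating in the spirit of the paper's approach, not the authors' exact implementation; all names and sizes (d_expert=128, n_experts=16, k=4) are illustrative assumptions.

```python
# Minimal sketch: a sparse Mixture-of-Experts layer as a drop-in
# replacement for a Transformer's two-layer feedforward block.
# Hyperparameters are illustrative, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    def __init__(self, d_model=512, d_expert=128, n_experts=16, k=4):
        super().__init__()
        self.k = k
        # Each expert is a small two-layer MLP; together they approximate
        # one large d_model -> d_ff -> d_model feedforward block.
        self.w1 = nn.Parameter(torch.randn(n_experts, d_model, d_expert) * d_model ** -0.5)
        self.w2 = nn.Parameter(torch.randn(n_experts, d_expert, d_model) * d_expert ** -0.5)
        self.gate = nn.Linear(d_model, n_experts, bias=False)

    def forward(self, x):                       # x: (batch, seq, d_model)
        scores = torch.sigmoid(self.gate(x))    # per-expert gate scores (B, S, E)
        topv, topi = scores.topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):               # evaluate only the k selected experts
            idx = topi[..., slot]                # (B, S) expert ids for this slot
            w1 = self.w1[idx]                    # (B, S, d_model, d_expert)
            w2 = self.w2[idx]                    # (B, S, d_expert, d_model)
            h = F.relu(torch.einsum('bsd,bsde->bse', x, w1))
            out = out + topv[..., slot:slot + 1] * torch.einsum('bse,bsed->bsd', h, w2)
        return out

ffn = MoEFeedForward()
y = ffn(torch.randn(2, 10, 512))   # same interface as a dense FFN block
print(y.shape)                     # torch.Size([2, 10, 512])
```

Per token, only k of the n_experts small MLPs run, so compute scales with k * d_expert rather than the full d_ff of a dense block; an efficient implementation would batch tokens by expert instead of gathering weights as done here.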
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Implicit Diffusion: Efficient Optimization through Stochastic Sampling
System 2 Attention (is something you might need too)
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics
When can transformers reason with abstract symbols?
ConvNets Match Vision Transformers at Scale
GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting
Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts
BitNet: Scaling 1-bit Transformers for Large Language Models
LayoutPrompter: Awaken the Design Ability of Large Language Models
Transformers are Multi-State RNNs
MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models
In-Context Learning Creates Task Vectors
AutoMix: Automatically Mixing Language Models
Farzi Data: Autoregressive Data Distillation
Training Chain-of-Thought via Latent-Variable Inference
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models
Language Models can be Logical Solvers
Tied-Lora: Enhancing parameter efficiency of LoRA with weight tying
Safe RLHF: Safe Reinforcement Learning from Human Feedback
Visual In-Context Prompting
TiC-CLIP: Continual Training of CLIP Models
Two-Stage Statistics-Aware Transformation for Style Transfer
Domain Generalization Guided by Gradient Signal to Noise Ratio of Parameters
Improving Summarization with Human Edits
Specific versus General Principles for Constitutional AI
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
Toward Joint Language Modeling for Speech Units and Text
Context Tuning for Retrieval-Augmented Generation
Localizing and Editing Knowledge in Text-to-Image Generative Models
Ultra-Long Sequence Distributed Transformer
FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model
CapsFusion: Generating High-Quality Multimodal Pre-training Data
Question Aware Vision Transformer for Multimodal Reasoning
A Survey of Multimodal Foundation Models
Controlled Decoding from Language Models
FLAP: Fast Language-Audio Pre-training
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion