The FinBen: An Holistic Financial Benchmark for Large Language Models

【Join the group】 To read arXiv papers together with us, add us on WeChat (vx): pwbot02 (please include the note: b站arxiv)
【Paper title】 The FinBen: An Holistic Financial Benchmark for Large Language Models
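【Paper summary】 This paper introduces FinBen, a comprehensive benchmark for evaluating language models in the financial domain. It comprises 35 datasets covering 23 financial tasks, organized into three spectrums of difficulty inspired by the Cattell-Horn-Carroll theory, with the aim of thoroughly assessing the capabilities of language models in finance. In an evaluation of 15 representative language models, including GPT-4, ChatGPT, and the latest Gemini, the researchers found that GPT-4 performs best in quantification, extraction, numerical reasoning, and stock trading, while Gemini performs best in generation and forecasting. Both models, however, perform poorly on complex extraction and forecasting and need targeted improvement. The study also found that instruction tuning improves performance on simple tasks but has limited effect on complex reasoning and forecasting abilities. FinBen will continue to evaluate language models in the financial domain, with regular updates to its tasks and models intended to support progress in AI.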
【Paper link】 https://arxiv.org/abs/2402.12659

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Retrieval meets Long Context Large Language Models

How Do Large Language Models Capture the Ever-changing World Knowledge? A Review

Interactive Task Planning with Language Models

Offline Actor-Critic Reinforcement Learning Scales to Large Models

LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language M

MusicAgent: An AI Agent for Music Understanding and Generation with Large Langua

Are Large Language Models Post Hoc Explainers?

Learning to Learn Faster from Human Feedback with Language Model Predictive Cont

GraphLLM: Boosting Graph Reasoning Ability of Large Language Model

Secrets of RLHF in Large Language Models Part II: Reward Modeling

ShortGPT: Layers in Large Language Models are More Redundant Than You Expect

Llemma: An Open Language Model For Mathematics

FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation

Anthropic: What Should an AI's Character Be Like?

TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models

How FaR Are Large Language Models From Agents with Theory-of-Mind?

Can Large Language Models be Good Path Planners? A Benchmark and Investigation o

[2024] [Chinese-English subtitles] 7 Building AI Models in the Wild | MIT Introduction to Deep Learning 6.S191

Controlled Decoding from Language Models

Simple and Scalable Strategies to Continually Pre-train Large Language Models

Moral Foundations of Large Language Models

AutoMix: Automatically Mixing Language Models

LayoutPrompter: Awaken the Design Ability of Large Language Models

Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning

Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-S

Language Models can be Logical Solvers

Orca 2: Teaching Small Language Models How to Reason

LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models

Knowledge Probing and Reasoning Methods Based on Language Models

FlashDecoding++: Faster Large Language Model Inference on GPUs

Large Language Model Cascades with Mixture of Thoughts Representations for Cost-

Making Large Language Models Perform Better in Knowledge Graph Completion

SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding

Aligning Text-to-Image Diffusion Models with Reward Backpropagation

EvalCrafter: Benchmarking and Evaluating Large Video Generation Models

CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark

The Consensus Game: Language Model Generation via Equilibrium Search

Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models

Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling