[动手写 bert 系列] torch.no_grad() vs. param.requires_grad == False

发布人

https://github.com/chunhuizhang/bilibili_vlogs/tree/master/fine_tune/bert/tutorials
https://stackoverflow.com/questions/63785319/pytorch-torch-no-grad-versus-requires-grad-false

打开封面下载高清视频观看高清视频视频下载器

[动手写bert系列] 01 huggingface tokenizer （vocab，encode，decode）原理及细节

[动手写 bert 系列] bert model architecture 模型架构初探（embedding + encoder + pooler）

[动手写bert系列] BertSelfLayer 多头注意力机制（multi head attention）的分块矩阵实现

[动手写 bert 系列] 解析 bertmodel 的output(last_hidden_state，pooler_output，hidden_state)

[动手写bert] bert pooler output 与 bert head

[pytorch] 多项式分布及采样（torch.multinomial, torch distribution Categorical）

[性能测试] 04 双4090 BERT、GPT性能测试（megatron-lm、apex、deepspeed）

[[bert、t5、gpt] 02 transformer 架构 scaled dot product self attention（qkv）

[pytorch] 激活函数（梯度消失）sigmoid，clamp，relu（sparse representation，dying relu）

【高阶数据结构】布隆过滤器（Bloom Filter）误识别（false positive，伪阳）概率的计算

[bert、t5、gpt] 11 知识蒸馏（knowledge distill）huggingface trainer pipeline

[动手写 bert 系列] 02 tokenizer encode_plus, token_type_ids（mlm，nsp）

[动手写神经网络] 01 认识 pytorch 中的 dataset、dataloader（mnist、fashionmnist、cifar10）

[bert、t5、gpt] 10 知识蒸馏（knowledge distill）初步，模型结构及损失函数设计

[Python 机器学习] 深入理解 numpy（ndarray）的 axis（轴/维度）

[动手写 bert 系列] bert embedding 源码解析，word_embedding/position_embedding/token_type

[动手写 bert 系列] Bert 中的（add & norm）残差连接与残差模块（residual connections/residual blocks）

[动手写Bert系列] bertencoder self attention 计算细节及计算过程

[pytorch] 深入理解 torch.gather 及 dim 与 index 的关系

[bert、t5、gpt] 04 构建 TransformerEncoderLayer（FFN 与 Layer Norm、skip connection）

[LLMs 实践] 21 llama2 源码分析 GQA：Grouped Query Attention

[pytorch] torch.einsum 到索引到矩阵运算（index、shape、dimension、axis）

[动手写 bert] masking 机制、bert head 与 BertForMaskedLM

[QKV attention] kv-cache、decoder only vs. BERT, 单向注意力 vs. 双向注意力

[动手写神经网络] 如何设计卷积核（conv kernel）实现降2采样，以及初探vggnet/resnet 卷积设计思路（不断降空间尺度，升channel）

[LLM 番外] 自回归语言模型cross entropy loss，及 PPL 评估

[BERT 番外] Sin Position Encoding 的简洁实现（RoPE 基础）

[python 运筹优化] 系统性介绍 scipy 中的非线性最小二乘（NNLS, curve_fit, least_squares）

[personal chatgpt] 从 RoPE 到 CoPE（绝对位置编码，相对位置编码，Contextual Position Encoding）

[personal chatgpt] gpt-4o tokenizer 及特殊中文tokens（压缩词表），o200k_base

[实战kaggle系列] 1. 使用 kaggle 命令行 api，进行数据集的下载

[pytorch 模型拓扑结构] pytorch 矩阵乘法大全（torch.dot, mm, bmm, @, *, matmul）

[[bert、t5、gpt] 03 AttentionHead 与 MultiHeadAttention

[矩阵分析] 从向量范数到矩阵范数、torch spectral norm（矩阵的谱范数）

[动手写 Transformer] 手动实现 Transformer Decoder（交叉注意力，encoder-decoder cross attentio）

[pytorch模型拓扑结构] nn.MultiheadAttention, init/forward, 及 query，key，value 的计算细节

[数据可视化] 绘制交互式 3d plot（interactive 3d plot, Axes3d） z=f(x, y) （三维空间中的 surface）

【python 运筹优化】scipy.optimize.minimize 接口介绍（method、jacobian、hessian）| 有约束非线性优化

[LLMs 实践] 13 gradient checkpointing 显存优化 trick

[数学！数学] 最大似然估计（MLE）与最小化交叉熵损失（cross entropy loss）的等价性