[mcts] 02 mcts from scratch (UCTNode, uct_search, pUCT, tree visualization)


Code for this episode:
https://github.com/chunhuizhang/personal_chatgpt/blob/main/tutorials/drl/mcts/mcts_02_from_scartch.ipynb
[mcts] 01 mcts basic concepts and principles (UCB) with two examples: https://www.bilibili.com/video/BV1zC411h7B8/


[mcts] 01 mcts basic concepts and principles (UCB) with two examples

[Hands-on Transformer] Implementing the Transformer Decoder by hand (cross-attention, encoder-decoder cross attention)

[RLHF] From PPO rlhf to DPO: formula derivation and analysis of the principles

[bert, t5, gpt] 07 GPT2 decoding (greedy search, beam search)

[Search algorithms] [search] 01 python-astar graph search, f(n)=g(n)+h(n)

[LLMs inference] KV cache in hf transformers

[DRL] From TRPO to PPO (PPO-penalty, PPO-clip)

[personal chatgpt] From RoPE to CoPE (absolute position encoding, relative position encoding, Contextual Position Encoding)

[Hands-on bert series] 01 huggingface tokenizer (vocab, encode, decode): principles and details

[AI hardware explainer] Memory/VRAM bandwidth, from NVIDIA to Apple M4

[AI core concepts and computation] Optimization 01: gradient descent and gradient ascent, details and visual analysis

[Monte Carlo methods] 01 From Riemann sum integration to Monte Carlo estimation of integrals and expectations

[Monte Carlo methods] 02 Importance sampling and a python implementation

[Hands-on neural networks] pytorch high-dimensional Tensor dimension operations and handling, einops

[Full-stack deep learning] 02 vscode remote development and debugging (debugger) on GPU servers, using nanoGPT as an example

[Using tools] Installing, configuring, and extending the python jupyter environment (nbextension) (ExecuteTime: execution time, Table of Contents)

[A100 02] GPU server stress testing: gpu burn, cpu burn, cuda samples

[Hands-on neural networks] 05 Using a pretrained resnet18 to improve cifar10 classification accuracy, with visual analysis of misclassified images

[Deep learning environment setup] 02 Deploying a jupyter notebook server on a GPU server

[Monte Carlo methods] 04 More on importance sampling: mathematical properties and On-policy vs. Off-policy

[prompt engineering] From CoT to ToT (Tree of Thoughts)

[pytorch distributed] 02 DDP basic concepts (Ring AllReduce, node, world, rank, parameter server)

[Search algorithms] [search] 02 Hill climbing: neighborhood search on a two-dimensional discrete space

[LLMs in practice] 20 llama2 source code analysis: cache KV (keys, values cache) for faster inference

[QKV attention] flash attention (tiling and recomputation), operation fused, HBM vs. SRAM

[LangChain] 03 LangGraph basic concepts (AgentState, StateGraph, nodes, edges)

[pytorch] [differentiation exercises] 01 Automatic differentiation of the sigmoid function (autograd, single-variable and multivariable forms)

[Library user] Image classification with PyTorch Swin Transformer

[LLMs inference] Overview of quantization (bitsandbytes, GPTQ, GGUF, AWQ)

[pytorch network topology] Understanding the computation in nn.LayerNorm in depth

[Reinforcement learning basics 03] Multi-Armed Bandit and UCB

[python multiprocessing and multithreading] 03 GIL, threading, multiprocessing, concurrent.futures

[Model topology and interfaces] Classic RNN models (1): introduction to model parameters and training parameters

[Library user] tencent ailab Chinese corpus embedding vectors (word2vec)

[Extra] float16 and bf16: representation and computation details

[pytorch distributed] 01 A first look at nn.DataParallel data parallelism

[LLM+RL] Synthetic data and model collapse, a cover story in the main Nature journal

[pytorch] [differentiation exercises] 03 Computation graph and the chain rule in backpropagation

[LLMs in practice] 04 PEFT/LoRA source code analysis

[pytorch] [differentiation exercises] 06 Computation graph details: retain graph (multi output/backward)