[Paper Analysis]Improving fine-grained understanding in image-text pre-training[2401.09865]
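Paper title: Improving fine-grained understanding in image-text pre-training / SPARC
Paper link: http://arxiv.org/abs/2401.09865
* Due to the uploader's limited ability, the video often mixes Chinese and English and contains broken English; please bear with it. If my reading of the paper is off, feel free to call it out.
** To recommend new papers or look up past ones, feel free to edit this document: https://docs.qq.com/sheet/DSUdOTG9xWUdydVB6
*** Slides are uploaded every 1-2 months to the address in the pinned post.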
[Paper Analysis]DropPos: Pre-Training ViTs by Reconstructing Dropped Positions[2309.03576]
[Paper Overview]Bootstrapping Language-Image Pre-training...[2201.12086]
[Paper Analysis]Contrastive Language, Action, and State Pre-training...[2304.10782]
[Paper Overview]iBOT: Image BERT Pre-Training with Online Tokenizer[2111.07832]
[Paper Analysis]DeepLab: Semantic Image Segmentation with DCN..[1606.00915]
[Paper Analysis]GroupViT: Semantic Segmentation Emerges from Text Supervision[2202.11094]
[Paper Analysis]DAT: Vision Transformer with Deformable Attention[2201.00520]
[Paper Analysis]DeiT: Data-efficient Image Transformers[2012.12877]
[Paper Analysis]CLIP Dense Inference Yields Open-Vocab ... For-Free[2309.14289]
[Paper Analysis]Rethinking Pre-training and Self-training[2006.06882]
[Paper Analysis]FlowNet3D: Learning Scene Flow in 3D Point Clouds[1806.01411]
[Paper Analysis]XCiT: Cross-Covariance Image Transformers[2106.09681]
[Paper Analysis]TokenLearner: What Can 8 Learned Tokens Do for Images and vids[2106.11297]
[Paper Analysis]MONet: Unsupervised Scene Decomposition and Representation[1901.11390]
[Paper Analysis]C-Learning: Learning to .. via Recursive Classification[2011.08909]
[Paper Analysis]NeRF in the Wild: NeRF for Unconstrained Photo Collections[2008.02268]
[Paper Analysis]Patching Open-Vocabulary Models by Interpolating Weights[2208.05592]
[Paper Analysis]Vi2CLR: Video and Image for Visual Contrastive Learning of Representation
[Paper Overview]Flamingo: a Visual Language Model for Few-Shot Learning[2204.14198]
[Paper Overview]Denoising Diffusion Probabilistic Models / DDPM[2006.11239]
[Paper Analysis]Crossway Diffusion: Improving Diffusion-based ... via SSL[2307.01849]
[Paper Overview]Multi-Object ... with Iterative Variational Inference[1903.00450]
[Paper Analysis]Directional SSL for Heavy Image Augmentations[2110.13555]
[Paper Analysis]Object-Centric Learning with Slot Attention[2006.15055]
[Paper Analysis]BYOL: Bootstrap Your Own Latent[2006.07733]
[Paper Analysis]Humble Teachers Teach Better Students for Semi-Supervised Object Detection
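Learn twelve major machine learning algorithms in one go: regression, clustering, decision trees, random forests, neural networks, Bayesian methods, support vector machines, and more! Easy to follow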
[Paper Overview]LLaVA: Visual Instruction Tuning[2304.08485]
[Paper Analysis]TAN: Temporal Alignment Networks for Long-term Video[2204.02968]
[Paper Overview]Ferret: Refer and Ground Anything Anywhere at Any Granularity[2310.07704]
[Paper Overview]Scalable Video Object Segmentation with Simplified Framework[2308.09903]
[Paper Analysis]BiFormer: Vision Transformer with Bi-Level Routing Attention[2303.08810]
[Paper Analysis]Equivariant Contrastive Learning[2111.00899]
[Paper Analysis]End-to-End Learning... from Uncurated Instructional Videos[1912.06430]
[Paper Analysis]MoCoGAN-HD: A Good Image Generator Is What You Need...[2104.15069]
[Paper Overview]Diffusion Policy: Visuomotor Policy Learning via Action Diff.[2303.04137]
[Paper Analysis]Stabilizing transformers for reinforcement learning[1910.06764]
[Paper Analysis]Red Circle: Visual Prompt Engineering for VLMs[2304.06712]
[Paper Analysis]Regularized Vector Quantization for Tokenized Image Synthesis[2303.06424]
[Paper Analysis]XSkill: Cross Embodiment Skill Discovery[2307.09955]