[Deep Learning] GPT-2 - Inside the model that showed the potential of large language models and caused a public stir [The World of Deep Learning vol.33] #113 #VRアカデミア #DeepLearning

This video introduces GPT-2, the language model whose release was once held back after its ability to generate remarkably natural text caused a stir. Setting the social sensationalism aside, we explain what kind of model it is, what task it solves, how it solves it, and what makes it so impressive!

▼Related Videos
Watch the Transformer video here! • [Deep Learning] Transformer - Multi-Head Attention...
For busy people → • [Deep Learning] For busy people: Transformer and Multi-Head At...
GPT → • [Deep Learning] GPT - The beginning of a legend. The pre-training and fine-tuning paradigm...
Playlist: The World of Deep Learning (Deep Learning の世界)
Playlist: Natural Language Processing Series (自然言語処理シリーズ)

▼References
Radford, Alec, et al. "Language models are unsupervised multitask learners." OpenAI blog 1.8 (2019): 9. http://www.persagen.com/files/misc/ra...
The original paper! People were excited about an amazing AI built by training a huge language model on a huge dataset, but behind the scenes there was a great deal of thoughtful, meticulous design and effort. As people who build AI, I think it is worth reading this paper to understand what lies behind the spectacular results.

He, Kaiming, et al. "Identity mappings in deep residual networks." European conference on computer vision. Springer, Cham, 2016. https://link.springer.com/chapter/10....
Residual connections are not just a remedy for vanishing gradients; they are also a device for learning identity mappings. This paper is written with that philosophy in mind. I like it. Here is a video on ResNet explaining that philosophy: • [Deep Learning] Introducing CNNs: "ResNet", the well-known standard CNN technique...

Reddy, Siva, Danqi Chen, and Christopher D. Manning. "Coqa: A conversational question answering challenge." Transactions of the Association for Computational Linguistics 7 (2019): 249-266. https://direct.mit.edu/tacl/article/d...
This is the paper on the CoQA dataset, one of the tasks GPT-2 is tested on. It is a collection of TOEIC-style questions, and it is amazing that a deep learning model can solve them. Even just flipping through it should give you a good idea!

[2019 Edition] A Timeline of Representative Natural Language Processing Models and Algorithms - Qiita https://qiita.com/LeftLetter/items/14...
I use this as a reference for various videos.

▼In Closing
Thank you for watching! If you enjoyed the video, please like and subscribe. If you have any questions or comments about the video, please leave a comment or message me on Twitter! For business or collaboration requests, please contact me via my official website or Twitter DM.

AIcia Solid Project - Official Website - https://sites.google.com/view/aicia-o...
Video Creation: AIcia Solid (Twitter: / aicia_solid )
Video Editing: AIris Solid (Younger Sister) (Twitter: / airis_solid )
=======
Logo: TEICA ( / t_e_i_c_a )
Model: http://3d.nicovideo.jp/works/td44519
Model by: W01fa ( / w01fa )
