Put 10 billion of these dials together and you get ChatGPT | Perceptron

Handmade GPT: re-uploaded because of a subtitle issue!

References:
Rumelhart, D. E., McClelland, J. L. (1987). Parallel Distributed Processing, Volume 1: Explorations in the Microstructure of Cognition: Foundations. United Kingdom: Penguin Random House LLC.
Talking Nets: An Oral History of Neural Networks. (2000). United Kingdom: MIT Press.
Prince, S. J. (2023). Understanding Deep Learning. United Kingdom: MIT Press.
Crevier, D. (1993). AI: The Tumultuous History of the Search for Artificial Intelligence. New York: Basic Books.
Cat and dog face dataset: https://www.kaggle.com/datasets/andre...
Minsky, M., Papert, S. (2017). Perceptrons: An Introduction to Computational Geometry. United Kingdom: MIT Press.
Widrow, Bernard, and Michael A. Lehr. "30 years of adaptive neural networks: perceptron, madaline, and backpropagation." Proceedings of the IEEE 78.9 (1990): 1415-1442.
Olazaran, Mikel. "A sociological history of the neural network controversy." Advances in Computers. Vol. 37. Elsevier, 1993. 335-425.
Widrow, Bernard. "Generalization and information storage in networks of adaline neurons." Self-organizing systems (1962): 435-461.
Widrow, Bernard. "Thinking about thinking: the discovery of the LMS algorithm." IEEE Signal Processing Magazine 22.1 (2005): 100-106.

How the ChatGPT neuron count was calculated:
The explanation follows the GPT-2 implementation (https://github.com/karpathy/build-nan...). There, the keys, queries, and values are implemented together as a Linear layer with n_embd inputs and 3 × n_embd outputs, where n_embd is the embedding dimension. The output projection layer has n_embd inputs and n_embd outputs. A single attention layer therefore has roughly 4 × n_embd neurons. GPT-3's embedding dimension is 12,288, so each attention layer has about 49,152 neurons. Each MLP block has n_embd inputs, 4 × n_embd hidden units, and n_embd outputs, for a total of 5 × n_embd = 61,440 neurons. The neuron count for GPT-3 as a whole is therefore 96 × (49,152 + 61,440) = 10,616,832, excluding the initial embedding and the final unembedding steps.
Finally, GPT-4 reportedly has about 1.8 trillion parameters (source: https://newsletter.semianalysis.com/p...), roughly 10 times more than GPT-3.
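The neuron-count arithmetic in the description can be reproduced with a few lines of Python. This is only a sketch of that estimate: it treats every output unit of a Linear layer as one "neuron", and it reads the factor 96 as the number of transformer blocks in GPT-3. The function name and its parameter names other than n_embd are made up for illustration.

```python
def estimate_neurons(n_embd: int = 12_288, n_blocks: int = 96) -> dict:
    """Rough neuron count following the description's counting rule.

    Assumption: one neuron per output unit of each Linear layer.
    n_blocks=96 is the multiplier used in the description.
    """
    # Attention: fused key/query/value layer (3 * n_embd outputs)
    # plus the output projection (n_embd outputs).
    attention = 3 * n_embd + n_embd
    # MLP: hidden layer (4 * n_embd units) plus output layer (n_embd units).
    mlp = 4 * n_embd + n_embd
    # Embedding and unembedding are excluded, as in the description.
    total = n_blocks * (attention + mlp)
    return {"attention": attention, "mlp": mlp, "total": total}


if __name__ == "__main__":
    counts = estimate_neurons()
    print(f"per attention layer: {counts['attention']:,}")
    print(f"per MLP block:       {counts['mlp']:,}")
    print(f"GPT-3 total:         {counts['total']:,}")
```

With the default arguments this gives 49,152 neurons per attention layer, 61,440 per MLP block, and 10,616,832 in total, which matches the figures in the description.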
