hands-on/book_equations.ipynb
2018-04-25 23:13:00 +09:00

{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "book_equations.ipynb",
"version": "0.3.2",
"provenance": []
},
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
}
},
"cells": [
{
"metadata": {
"id": "ZICa1cn5n1Yv",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"**수식**\n",
"\n",
"*이 노트북은 책에 있는 모든 공식을 모아 놓은 것입니다.*\n",
"\n",
"**주의**: 깃허브의 노트북 뷰어는 적절하게 수식을 표현하지 못합니다. 로컬에서 주피터를 실행하여 이 노트북을 보거나 [nbviewer](http://nbviewer.jupyter.org/github/rickiepark/handson-ml/blob/master/book_equations.ipynb)를 사용하세요."
]
},
{
"metadata": {
"id": "A_lqPMDMn1Yx",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"# 1장\n",
"**식 1-1: 간단한 선형 모델**\n",
"\n",
"$\n",
"\\text{삶의_만족도} = \\theta_0 + \\theta_1 \\times \\text{1인당_GDP}\n",
"$"
]
},
{
"metadata": {
"id": "-ae0t1eRn1Yx",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"# 2장\n",
"**식 2-1: 평균 제곱근 오차 (RMSE)**\n",
"\n",
"$\n",
"\\text{RMSE}(\\mathbf{X}, h) = \\sqrt{\\frac{1}{m}\\sum\\limits_{i=1}^{m}\\left(h(\\mathbf{x}^{(i)}) - y^{(i)}\\right)^2}\n",
"$\n",
"\n",
"\n",
"**표기법 (72 페이지):**\n",
"\n",
"$\n",
" \\mathbf{x}^{(1)} = \\begin{pmatrix}\n",
" -118.29 \\\\\n",
" 33.91 \\\\\n",
" 1,416 \\\\\n",
" 38,372\n",
" \\end{pmatrix}\n",
"$\n",
"\n",
"\n",
"$\n",
" y^{(1)}=156,400\n",
"$\n",
"\n",
"\n",
"$\n",
" \\mathbf{X} = \\begin{pmatrix}\n",
" (\\mathbf{x}^{(1)})^T \\\\\n",
" (\\mathbf{x}^{(2)})^T\\\\\n",
" \\vdots \\\\\n",
" (\\mathbf{x}^{(1999)})^T \\\\\n",
" (\\mathbf{x}^{(2000)})^T\n",
" \\end{pmatrix} = \\begin{pmatrix}\n",
" -118.29 & 33.91 & 1,416 & 38,372 \\\\\n",
" \\vdots & \\vdots & \\vdots & \\vdots \\\\\n",
" \\end{pmatrix}\n",
"$\n",
"\n",
"\n",
"**식 2-2: 평균 절대 오차**\n",
"\n",
"$\n",
"\\text{MAE}(\\mathbf{X}, h) = \\frac{1}{m}\\sum\\limits_{i=1}^{m}\\left| h(\\mathbf{x}^{(i)}) - y^{(i)} \\right|\n",
"$\n",
"\n",
"**$\\ell_k$ 노름 (74 페이지):**\n",
"\n",
"$ \\left\\| \\mathbf{v} \\right\\| _k = (\\left| v_0 \\right|^k + \\left| v_1 \\right|^k + \\dots + \\left| v_n \\right|^k)^{\\frac{1}{k}} $\n"
]
},
{
"metadata": {
"id": "8fBBfiofn1Yy",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"# 3장\n",
"**식 3-1: 정밀도**\n",
"\n",
"$\n",
"\\text{정밀도} = \\cfrac{TP}{TP + FP}\n",
"$\n",
"\n",
"\n",
"**식 3-2: 재현율**\n",
"\n",
"$\n",
"\\text{재현율} = \\cfrac{TP}{TP + FN}\n",
"$\n",
"\n",
"\n",
"**식 3-3: $F_1$ 점수**\n",
"\n",
"$\n",
"F_1 = \\cfrac{2}{\\cfrac{1}{\\text{정밀도}} + \\cfrac{1}{\\text{재현율}}} = 2 \\times \\cfrac{\\text{정밀도}\\, \\times \\, \\text{재현율}}{\\text{정밀도}\\, + \\, \\text{재현율}} = \\cfrac{TP}{TP + \\cfrac{FN + FP}{2}}\n",
"$\n",
"\n"
]
},
{
"metadata": {
"id": "YTYQZMTjn1Yy",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"# 4장\n",
"**식 4-1: 선형 회귀 모델의 예측**\n",
"\n",
"$\n",
"\\hat{y} = \\theta_0 + \\theta_1 x_1 + \\theta_2 x_2 + \\dots + \\theta_n x_n\n",
"$\n",
"\n",
"\n",
"**식 4-2: 선형 회귀 모델의 예측 (벡터 형태)**\n",
"\n",
"$\n",
"\\hat{y} = h_{\\mathbf{\\theta}}(\\mathbf{x}) = \\mathbf{\\theta}^T \\cdot \\mathbf{x}\n",
"$\n",
"\n",
"\n",
"**식 4-3: 선형 회귀 모델의 MSE 비용 함수**\n",
"\n",
"$\n",
"\\text{MSE}(\\mathbf{X}, h_{\\mathbf{\\theta}}) = \\dfrac{1}{m} \\sum\\limits_{i=1}^{m}{(\\mathbf{\\theta}^T \\cdot \\mathbf{x}^{(i)} - y^{(i)})^2}\n",
"$\n",
"\n",
"\n",
"**식 4-4: 정규 방정식**\n",
"\n",
"$\n",
"\\hat{\\mathbf{\\theta}} = (\\mathbf{X}^T \\cdot \\mathbf{X})^{-1} \\cdot \\mathbf{X}^T \\cdot \\mathbf{y}\n",
"$\n",
"\n",
"\n",
"** 편도함수 기호 (165 페이지):**\n",
"\n",
"$\\frac{\\partial}{\\partial \\theta_j} \\text{MSE}(\\mathbf{\\theta})$\n",
"\n",
"\n",
"**식 4-5: 비용 함수의 편도함수**\n",
"\n",
"$\n",
"\\dfrac{\\partial}{\\partial \\theta_j} \\text{MSE}(\\mathbf{\\theta}) = \\dfrac{2}{m}\\sum\\limits_{i=1}^{m}(\\mathbf{\\theta}^T \\cdot \\mathbf{x}^{(i)} - y^{(i)})\\, x_j^{(i)}\n",
"$\n",
"\n",
"\n",
"**식 4-6: 비용 함수의 그래디언트 벡터**\n",
"\n",
"$\n",
"\\nabla_{\\mathbf{\\theta}}\\, \\text{MSE}(\\mathbf{\\theta}) =\n",
"\\begin{pmatrix}\n",
" \\frac{\\partial}{\\partial \\theta_0} \\text{MSE}(\\mathbf{\\theta}) \\\\\n",
" \\frac{\\partial}{\\partial \\theta_1} \\text{MSE}(\\mathbf{\\theta}) \\\\\n",
" \\vdots \\\\\n",
" \\frac{\\partial}{\\partial \\theta_n} \\text{MSE}(\\mathbf{\\theta})\n",
"\\end{pmatrix}\n",
" = \\dfrac{2}{m} \\mathbf{X}^T \\cdot (\\mathbf{X} \\cdot \\mathbf{\\theta} - \\mathbf{y})\n",
"$\n",
"\n",
"\n",
"**식 4-7: 경사 하강법의 스텝**\n",
"\n",
"$\n",
"\\mathbf{\\theta}^{(\\text{다음 스텝})}\\,\\,\\, = \\mathbf{\\theta} - \\eta \\nabla_{\\mathbf{\\theta}}\\, \\text{MSE}(\\mathbf{\\theta})\n",
"$\n",
"\n",
"\n",
"$ O(\\frac{1}{\\epsilon}) $\n",
"\n",
"\n",
"$ \\hat{y} = 0.56 x_1^2 + 0.93 x_1 + 1.78 $\n",
"\n",
"\n",
"$ y = 0.5 x_1^2 + 1.0 x_1 + 2.0 + \\text{가우시안 잡음} $\n",
"\n",
"\n",
"$ \\dfrac{(n+d)!}{d!\\,n!} $\n",
"\n",
"\n",
"$ \\alpha \\sum_{i=1}^{n}{\\theta_i^2}$\n",
"\n",
"\n",
"**식 4-8: 릿지 회귀의 비용 함수**\n",
"\n",
"$\n",
"J(\\mathbf{\\theta}) = \\text{MSE}(\\mathbf{\\theta}) + \\alpha \\dfrac{1}{2}\\sum\\limits_{i=1}^{n}\\theta_i^2\n",
"$\n",
"\n",
"\n",
"**식 4-9: 릿지 회귀의 정규 방정식**\n",
"\n",
"$\n",
"\\hat{\\mathbf{\\theta}} = (\\mathbf{X}^T \\cdot \\mathbf{X} + \\alpha \\mathbf{A})^{-1} \\cdot \\mathbf{X}^T \\cdot \\mathbf{y}\n",
"$\n",
"\n",
"\n",
"**식 4-10: 라쏘 회귀의 비용 함수**\n",
"\n",
"$\n",
"J(\\mathbf{\\theta}) = \\text{MSE}(\\mathbf{\\theta}) + \\alpha \\sum\\limits_{i=1}^{n}\\left| \\theta_i \\right|\n",
"$\n",
"\n",
"\n",
"**식 4-11: 라쏘 회귀의 서브그래디언트 벡터**\n",
"\n",
"$\n",
"g(\\mathbf{\\theta}, J) = \\nabla_{\\mathbf{\\theta}}\\, \\text{MSE}(\\mathbf{\\theta}) + \\alpha\n",
"\\begin{pmatrix}\n",
" \\operatorname{sign}(\\theta_1) \\\\\n",
" \\operatorname{sign}(\\theta_2) \\\\\n",
" \\vdots \\\\\n",
" \\operatorname{sign}(\\theta_n) \\\\\n",
"\\end{pmatrix} \\quad \\text{여기서 } \\operatorname{sign}(\\theta_i) =\n",
"\\begin{cases}\n",
"-1 & \\theta_i < 0 \\text{일 때 } \\\\\n",
"0 & \\theta_i = 0 \\text{일 때 } \\\\\n",
"+1 & \\theta_i > 0 \\text{일 때 }\n",
"\\end{cases}\n",
"$\n",
"\n",
"\n",
"**식 4-12: 엘라스틱넷 비용 함수**\n",
"\n",
"$\n",
"J(\\mathbf{\\theta}) = \\text{MSE}(\\mathbf{\\theta}) + r \\alpha \\sum\\limits_{i=1}^{n}\\left| \\theta_i \\right| + \\dfrac{1 - r}{2} \\alpha \\sum\\limits_{i=1}^{n}{\\theta_i^2}\n",
"$\n",
"\n",
"\n",
"**식 4-13: 로지스틱 회귀 모델의 확률 추정(벡터 표현식)**\n",
"\n",
"$\n",
"\\hat{p} = h_{\\mathbf{\\theta}}(\\mathbf{x}) = \\sigma(\\mathbf{\\theta}^T \\cdot \\mathbf{x})\n",
"$\n",
"\n",
"\n",
"**식 4-14: 로지스틱 함수**\n",
"\n",
"$\n",
"\\sigma(t) = \\dfrac{1}{1 + \\exp(-t)}\n",
"$\n",
"\n",
"\n",
"**식 4-15: 로지스틱 회귀 모델 예측**\n",
"\n",
"$\n",
"\\hat{y} =\n",
"\\begin{cases}\n",
" 0 & \\hat{p} < 0.5 \\text{일 때 } \\\\\n",
" 1 & \\hat{p} \\geq 0.5 \\text{일 때 } \n",
"\\end{cases}\n",
"$\n",
"\n",
"\n",
"**식 4-16: 하나의 훈련 샘플에 대한 비용 함수**\n",
"\n",
"$\n",
"c(\\mathbf{\\theta}) =\n",
"\\begin{cases}\n",
" -\\log(\\hat{p}) & y = 1 \\text{일 때 } \\\\\n",
" -\\log(1 - \\hat{p}) & y = 0 \\text{일 때 }\n",
"\\end{cases}\n",
"$\n",
"\n",
"\n",
"**식 4-17: 로지스틱 회귀의 비용 함수(로그 손실)**\n",
"\n",
"$\n",
"J(\\mathbf{\\theta}) = -\\dfrac{1}{m} \\sum\\limits_{i=1}^{m}{\\left[ y^{(i)} log\\left(\\hat{p}^{(i)}\\right) + (1 - y^{(i)}) log\\left(1 - \\hat{p}^{(i)}\\right)\\right]}\n",
"$\n",
"\n",
"\n",
"**식 4-18: 로지스틱 비용 함수의 편도함수**\n",
"\n",
"$\n",
"\\dfrac{\\partial}{\\partial \\theta_j} \\text{J}(\\mathbf{\\theta}) = \\dfrac{1}{m}\\sum\\limits_{i=1}^{m}\\left(\\mathbf{\\sigma(\\theta}^T \\cdot \\mathbf{x}^{(i)}) - y^{(i)}\\right)\\, x_j^{(i)}\n",
"$\n",
"\n",
"\n",
"**식 4-19: 클래스 k에 대한 소프트맥스 점수**\n",
"\n",
"$\n",
"s_k(\\mathbf{x}) = ({\\mathbf{\\theta}^{(k)}})^T \\cdot \\mathbf{x}\n",
"$\n",
"\n",
"\n",
"**식 4-20: 소프트맥스 함수**\n",
"\n",
"$\n",
"\\hat{p}_k = \\sigma\\left(\\mathbf{s}(\\mathbf{x})\\right)_k = \\dfrac{\\exp\\left(s_k(\\mathbf{x})\\right)}{\\sum\\limits_{j=1}^{K}{\\exp\\left(s_j(\\mathbf{x})\\right)}}\n",
"$\n",
"\n",
"\n",
"**식 4-21: 소프트맥스 회귀 분류기의 예측**\n",
"\n",
"$\n",
"\\hat{y} = \\underset{k}{\\operatorname{argmax}} \\, \\sigma\\left(\\mathbf{s}(\\mathbf{x})\\right)_k = \\underset{k}{\\operatorname{argmax}} \\, s_k(\\mathbf{x}) = \\underset{k}{\\operatorname{argmax}} \\, \\left( ({\\mathbf{\\theta}^{(k)}})^T \\cdot \\mathbf{x} \\right)\n",
"$\n",
"\n",
"\n",
"**식 4-22: 크로스 엔트로피 비용 함수**\n",
"\n",
"$\n",
"J(\\mathbf{\\Theta}) = - \\dfrac{1}{m}\\sum\\limits_{i=1}^{m}\\sum\\limits_{k=1}^{K}{y_k^{(i)}\\log\\left(\\hat{p}_k^{(i)}\\right)}\n",
"$\n",
"\n",
"**두 확률 분포 $p$ 와 $q$ 사이의 크로스 엔트로피 (196 페이지):**\n",
"$ H(p, q) = -\\sum\\limits_{x}p(x) \\log q(x) $\n",
"\n",
"\n",
"**식 4-23: 클래스 k 에 대한 크로스 엔트로피의 그래디언트 벡터**\n",
"\n",
"$\n",
"\\nabla_{\\mathbf{\\theta}^{(k)}} \\, J(\\mathbf{\\Theta}) = \\dfrac{1}{m} \\sum\\limits_{i=1}^{m}{ \\left ( \\hat{p}^{(i)}_k - y_k^{(i)} \\right ) \\mathbf{x}^{(i)}}\n",
"$\n"
]
},
{
"metadata": {
"id": "sFBoMnuzn1Yz",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"# 5장\n",
"**식 5-1: 가우시안 RBF**\n",
"\n",
"$\n",
"{\\displaystyle \\phi_{\\gamma}(\\mathbf{x}, \\mathbf{\\ell})} = {\\displaystyle \\exp({\\displaystyle -\\gamma \\left\\| \\mathbf{x} - \\mathbf{\\ell} \\right\\|^2})}\n",
"$\n",
"\n",
"\n",
"**식 5-2: 선형 SVM 분류기의 예측**\n",
"\n",
"$\n",
"\\hat{y} = \\begin{cases}\n",
" 0 & \\mathbf{w}^T \\cdot \\mathbf{x} + b < 0 \\text{일 때 } \\\\\n",
" 1 & \\mathbf{w}^T \\cdot \\mathbf{x} + b \\geq 0 \\text{일 때 }\n",
"\\end{cases}\n",
"$\n",
"\n",
"\n",
"**식 5-3: 하드 마진 선형 SVM 분류기의 목적 함수**\n",
"\n",
"$\n",
"\\begin{split}\n",
"&\\underset{\\mathbf{w}, b}{\\operatorname{minimize}}\\,{\\frac{1}{2}\\mathbf{w}^T \\cdot \\mathbf{w}} \\\\\n",
"&[\\text{조건}] \\, i = 1, 2, \\dots, m \\text{일 때} \\quad t^{(i)}(\\mathbf{w}^T \\cdot \\mathbf{x}^{(i)} + b) \\ge 1\n",
"\\end{split}\n",
"$\n",
"\n",
"\n",
"**식 5-4: 소프트 마진 선형 SVM 분류기의 목적 함수**\n",
"\n",
"$\n",
"\\begin{split}\n",
"&\\underset{\\mathbf{w}, b, \\mathbf{\\zeta}}{\\operatorname{minimize}}\\,{\\dfrac{1}{2}\\mathbf{w}^T \\cdot \\mathbf{w} + C \\sum\\limits_{i=1}^m{\\zeta^{(i)}}}\\\\\n",
"&[\\text{조건}] \\, i = 1, 2, \\dots, m \\text{일 때} \\quad t^{(i)}(\\mathbf{w}^T \\cdot \\mathbf{x}^{(i)} + b) \\ge 1 - \\zeta^{(i)} \\text{ 이고} \\quad \\zeta^{(i)} \\ge 0\n",
"\\end{split}\n",
"$\n",
"\n",
"\n",
"**식 5-5: QP 문제**\n",
"\n",
"$\n",
"\\begin{split}\n",
"\\underset{\\mathbf{p}}{\\text{minimize}} \\, & \\dfrac{1}{2} \\mathbf{p}^T \\cdot \\mathbf{H} \\cdot \\mathbf{p} \\, + \\, \\mathbf{f}^T \\cdot \\mathbf{p} \\\\\n",
"[\\text{조건}] \\, & \\mathbf{A} \\cdot \\mathbf{p} \\le \\mathbf{b} \\\\\n",
"\\text{여기서 } &\n",
"\\begin{cases}\n",
" \\mathbf{p} \\, \\text{는 }n_p\\text{ 차원의 벡터 (} n_p = \\text{모델 파라미터 수)}\\\\\n",
" \\mathbf{H} \\, \\text{는 }n_p \\times n_p \\text{ 크기 행렬}\\\\\n",
" \\mathbf{f} \\, \\text{는 }n_p\\text{ 차원의 벡터}\\\\\n",
" \\mathbf{A} \\, \\text{는 } n_c \\times n_p \\text{ 크기 행렬 (}n_c = \\text{제약 수)}\\\\\n",
" \\mathbf{b} \\, \\text{는 }n_c\\text{ 차원의 벡터}\n",
"\\end{cases}\n",
"\\end{split}\n",
"$\n",
"\n",
"\n",
"**식 5-6: 선형 SVM 목적 함수의 쌍대 형식**\n",
"\n",
"$\n",
"\\begin{split}\n",
"&\\underset{\\mathbf{\\alpha}}{\\operatorname{minimize}} \\,\n",
"\\dfrac{1}{2}\\sum\\limits_{i=1}^{m}{\n",
" \\sum\\limits_{j=1}^{m}{\n",
" \\alpha^{(i)} \\alpha^{(j)} t^{(i)} t^{(j)} {\\mathbf{x}^{(i)}}^T \\cdot \\mathbf{x}^{(j)}\n",
" }\n",
"} \\, - \\, \\sum\\limits_{i=1}^{m}{\\alpha^{(i)}}\\\\\n",
"&\\text{[조건]}\\,i = 1, 2, \\dots, m \\text{일 때 } \\quad \\alpha^{(i)} \\ge 0\n",
"\\end{split}\n",
"$\n",
"\n",
"\n",
"**식 5-7: 쌍대 문제에서 구한 해로 원 문제의 해 계산하기**\n",
"\n",
"$\n",
"\\begin{split}\n",
"&\\hat{\\mathbf{w}} = \\sum_{i=1}^{m}{\\hat{\\alpha}}^{(i)}t^{(i)}\\mathbf{x}^{(i)}\\\\\n",
"&\\hat{b} = \\dfrac{1}{n_s}\\sum\\limits_{\\scriptstyle i=1 \\atop {\\scriptstyle {\\hat{\\alpha}}^{(i)} > 0}}^{m}{\\left(1 - t^{(i)}({\\hat{\\mathbf{w}}}^T \\cdot \\mathbf{x}^{(i)})\\right)}\n",
"\\end{split}\n",
"$\n",
"\n",
"\n",
"**식 5-8: 2차 다항식 매핑**\n",
"\n",
"$\n",
"\\phi\\left(\\mathbf{x}\\right) = \\phi\\left( \\begin{pmatrix}\n",
" x_1 \\\\\n",
" x_2\n",
"\\end{pmatrix} \\right) = \\begin{pmatrix}\n",
" {x_1}^2 \\\\\n",
" \\sqrt{2} \\, x_1 x_2 \\\\\n",
" {x_2}^2\n",
"\\end{pmatrix}\n",
"$\n",
"\n",
"\n",
"**식 5-9: 2차 다항식 매핑을 위한 커널 트릭**\n",
"\n",
"$\n",
"\\begin{split}\n",
"\\phi(\\mathbf{a})^T \\cdot \\phi(\\mathbf{b}) & \\quad = \\begin{pmatrix}\n",
" {a_1}^2 \\\\\n",
" \\sqrt{2} \\, a_1 a_2 \\\\\n",
" {a_2}^2\n",
" \\end{pmatrix}^T \\cdot \\begin{pmatrix}\n",
" {b_1}^2 \\\\\n",
" \\sqrt{2} \\, b_1 b_2 \\\\\n",
" {b_2}^2\n",
"\\end{pmatrix} = {a_1}^2 {b_1}^2 + 2 a_1 b_1 a_2 b_2 + {a_2}^2 {b_2}^2 \\\\\n",
" & \\quad = \\left( a_1 b_1 + a_2 b_2 \\right)^2 = \\left( \\begin{pmatrix}\n",
" a_1 \\\\\n",
" a_2\n",
"\\end{pmatrix}^T \\cdot \\begin{pmatrix}\n",
" b_1 \\\\\n",
" b_2\n",
" \\end{pmatrix} \\right)^2 = (\\mathbf{a}^T \\cdot \\mathbf{b})^2\n",
"\\end{split}\n",
"$\n",
"\n",
"**커널 트릭에 관한 본문 중에서 (220 페이지):**\n",
"[...] 변환된 벡터의 점곱을 간단하게 $ ({\\mathbf{x}^{(i)}}^T \\cdot \\mathbf{x}^{(j)})^2 $ 으로 바꿀 수 있습니다.\n",
"\n",
"\n",
"**식 5-10: 일반적인 커널**\n",
"\n",
"$\n",
"\\begin{split}\n",
"\\text{선형:} & \\quad K(\\mathbf{a}, \\mathbf{b}) = \\mathbf{a}^T \\cdot \\mathbf{b} \\\\\n",
"\\text{다항식:} & \\quad K(\\mathbf{a}, \\mathbf{b}) = \\left(\\gamma \\mathbf{a}^T \\cdot \\mathbf{b} + r \\right)^d \\\\\n",
"\\text{가우시안 RBF:} & \\quad K(\\mathbf{a}, \\mathbf{b}) = \\exp({\\displaystyle -\\gamma \\left\\| \\mathbf{a} - \\mathbf{b} \\right\\|^2}) \\\\\n",
"\\text{시그모이드:} & \\quad K(\\mathbf{a}, \\mathbf{b}) = \\tanh\\left(\\gamma \\mathbf{a}^T \\cdot \\mathbf{b} + r\\right)\n",
"\\end{split}\n",
"$\n",
"\n",
"**식 5-11: 커널 SVM으로 예측하기**\n",
"\n",
"$\n",
"\\begin{split}\n",
"h_{\\hat{\\mathbf{w}}, \\hat{b}}\\left(\\phi(\\mathbf{x}^{(n)})\\right) & = \\,\\hat{\\mathbf{w}}^T \\cdot \\phi(\\mathbf{x}^{(n)}) + \\hat{b} = \\left(\\sum_{i=1}^{m}{\\hat{\\alpha}}^{(i)}t^{(i)}\\phi(\\mathbf{x}^{(i)})\\right)^T \\cdot \\phi(\\mathbf{x}^{(n)}) + \\hat{b}\\\\\n",
" & = \\, \\sum_{i=1}^{m}{\\hat{\\alpha}}^{(i)}t^{(i)}\\left(\\phi(\\mathbf{x}^{(i)})^T \\cdot \\phi(\\mathbf{x}^{(n)})\\right) + \\hat{b}\\\\\n",
" & = \\sum\\limits_{\\scriptstyle i=1 \\atop {\\scriptstyle {\\hat{\\alpha}}^{(i)} > 0}}^{m}{\\hat{\\alpha}}^{(i)}t^{(i)} K(\\mathbf{x}^{(i)}, \\mathbf{x}^{(n)}) + \\hat{b}\n",
"\\end{split}\n",
"$\n",
"\n",
"\n",
"**식 5-12: 커널 트릭을 사용한 편향 계산**\n",
"\n",
"$\n",
"\\begin{split}\n",
"\\hat{b} & = \\dfrac{1}{n_s}\\sum\\limits_{\\scriptstyle i=1 \\atop {\\scriptstyle {\\hat{\\alpha}}^{(i)} > 0}}^{m}{\\left(1 - t^{(i)}{\\hat{\\mathbf{w}}}^T \\cdot \\phi(\\mathbf{x}^{(i)})\\right)} = \\dfrac{1}{n_s}\\sum\\limits_{\\scriptstyle i=1 \\atop {\\scriptstyle {\\hat{\\alpha}}^{(i)} > 0}}^{m}{\\left(1 - t^{(i)}{\n",
" \\left(\\sum_{j=1}^{m}{\\hat{\\alpha}}^{(j)}t^{(j)}\\phi(\\mathbf{x}^{(j)})\\right)\n",
" }^T \\cdot \\phi(\\mathbf{x}^{(i)})\\right)}\\\\\n",
" & = \\dfrac{1}{n_s}\\sum\\limits_{\\scriptstyle i=1 \\atop {\\scriptstyle {\\hat{\\alpha}}^{(i)} > 0}}^{m}{\\left(1 - t^{(i)}\n",
"\\sum\\limits_{\\scriptstyle j=1 \\atop {\\scriptstyle {\\hat{\\alpha}}^{(j)} > 0}}^{m}{\n",
" {\\hat{\\alpha}}^{(j)} t^{(j)} K(\\mathbf{x}^{(i)},\\mathbf{x}^{(j)})\n",
"}\n",
"\\right)}\n",
"\\end{split}\n",
"$\n",
"\n",
"\n",
"**식 5-13: 선형 SVM 분류기 비용 함수**\n",
"\n",
"$\n",
"J(\\mathbf{w}, b) = \\dfrac{1}{2} \\mathbf{w}^T \\cdot \\mathbf{w} \\, + \\, C {\\displaystyle \\sum\\limits_{i=1}^{m}max\\left(0, 1 - t^{(i)}(\\mathbf{w}^T \\cdot \\mathbf{x}^{(i)} + b) \\right)}\n",
"$\n",
"\n",
"\n"
]
},
{
"metadata": {
"id": "JyogtW6Jn1Y0",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"# 6장\n",
"**식 6-1: 지니 불순도**\n",
"\n",
"$\n",
"G_i = 1 - \\sum\\limits_{k=1}^{n}{{p_{i,k}}^2}\n",
"$\n",
"\n",
"\n",
"**식 6-2: 분류에 대한 CART 비용 함수**\n",
"\n",
"$\n",
"\\begin{split}\n",
"&J(k, t_k) = \\dfrac{m_{\\text{left}}}{m}G_\\text{left} + \\dfrac{m_{\\text{right}}}{m}G_{\\text{right}}\\\\\n",
"&\\text{여기서 }\\begin{cases}\n",
"G_\\text{left/right} \\text{ 는 왼쪽/오른쪽 서브셋의 불순도}\\\\\n",
"m_\\text{left/right} \\text{ 는 왼쪽/오른쪽 서브셋의 불순도}\n",
"\\end{cases}\n",
"\\end{split}\n",
"$\n",
"\n",
"**엔트로피 계산 예 (232 페이지):**\n",
"\n",
"$ -\\frac{49}{54}\\log_2(\\frac{49}{54}) - \\frac{5}{54}\\log_2(\\frac{5}{54}) $\n",
"\n",
"\n",
"**식 6-3: 엔트로피**\n",
"\n",
"$\n",
"H_i = -\\sum\\limits_{k=1 \\atop p_{i,k} \\ne 0}^{n}{{p_{i,k}}\\log_2(p_{i,k})}\n",
"$\n",
"\n",
"\n",
"**식 6-4: 회귀를 위한 CART 비용 함수**\n",
"\n",
"$\n",
"J(k, t_k) = \\dfrac{m_{\\text{left}}}{m}\\text{MSE}_\\text{left} + \\dfrac{m_{\\text{right}}}{m}\\text{MSE}_{\\text{right}} \\quad\n",
"\\text{여기서 }\n",
"\\begin{cases}\n",
"\\text{MSE}_{\\text{node}} = \\sum\\limits_{\\scriptstyle i \\in \\text{node}}(\\hat{y}_{\\text{node}} - y^{(i)})^2\\\\\n",
"\\hat{y}_\\text{node} = \\dfrac{1}{m_{\\text{node}}}\\sum\\limits_{\\scriptstyle i \\in \\text{node}}y^{(i)}\n",
"\\end{cases}\n",
"$\n"
]
},
{
"metadata": {
"id": "mCEpcobOn1Y0",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"# 7장\n",
"\n",
"**식 7-1: j번째 예측기의 가중치가 적용된 에러율**\n",
"\n",
"$\n",
"r_j = \\dfrac{\\displaystyle \\sum\\limits_{\\textstyle {i=1 \\atop \\hat{y}_j^{(i)} \\ne y^{(i)}}}^{m}{w^{(i)}}}{\\displaystyle \\sum\\limits_{i=1}^{m}{w^{(i)}}} \\quad\n",
"\\text{where }\\hat{y}_j^{(i)}\\text{ is the }j^{\\text{th}}\\text{ predictor's prediction for the }i^{\\text{th}}\\text{ instance.}\n",
"$\n",
"\n",
"**식 7-2: 예측기 가중치**\n",
"\n",
"$\n",
"\\begin{split}\n",
"\\alpha_j = \\eta \\log{\\dfrac{1 - r_j}{r_j}}\n",
"\\end{split}\n",
"$\n",
"\n",
"\n",
"**식 7-3: 가중치 업데이트 규칙**\n",
"\n",
"$\n",
"\\begin{split}\n",
"& w^{(i)} \\leftarrow\n",
"\\begin{cases}\n",
"w^{(i)} & \\hat{y_j}^{(i)} = y^{(i)} \\text{ 일 때}\\\\\n",
"w^{(i)} \\exp(\\alpha_j) & \\hat{y_j}^{(i)} \\ne y^{(i)} \\text{ 일 때}\n",
"\\end{cases} \\\\\n",
"& \\text{여기서 } i = 1, 2, \\dots, m \\\\\n",
"\\end{split}\n",
"$\n",
"\n",
"**256 페이지 본문 중에서:**\n",
"\n",
"그런 다음 모든 샘플의 가중치를 정규화합니다(즉, $ \\sum_{i=1}^{m}{w^{(i)}} $으로 나눕니다).\n",
"\n",
"\n",
"**식 7-4: AdaBoost 예측**\n",
"\n",
"$\n",
"\\hat{y}(\\mathbf{x}) = \\underset{k}{\\operatorname{argmax}}{\\sum\\limits_{\\scriptstyle j=1 \\atop \\scriptstyle \\hat{y}_j(\\mathbf{x}) = k}^{N}{\\alpha_j}} \\quad \\text{여기서 }N\\text{은 예측기 수}\n",
"$\n",
"\n",
"\n"
]
},
{
"metadata": {
"id": "SFoGMOCsn1Y1",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"# 8장\n",
"\n",
"**식 8-1: 주성분 행렬**\n",
"\n",
"$\n",
"\\mathbf{V} =\n",
"\\begin{pmatrix}\n",
" \\mid & \\mid & & \\mid \\\\\n",
" \\mathbf{c_1} & \\mathbf{c_2} & \\cdots & \\mathbf{c_n} \\\\\n",
" \\mid & \\mid & & \\mid\n",
"\\end{pmatrix}\n",
"$\n",
"\n",
"\n",
"**식 8-2: 훈련 세트를 _d_차원으로 투영하기**\n",
"\n",
"$\n",
"\\mathbf{X}_{d\\text{-proj}} = \\mathbf{X} \\cdot \\mathbf{W}_d\n",
"$\n",
"\n",
"\n",
"**식 8-3: 원본의 차원 수로 되돌리는 PCA 역변환**\n",
"\n",
"$\n",
"\\mathbf{X}_{\\text{recovered}} = \\mathbf{X}_{d\\text{-proj}} \\cdot {\\mathbf{W}_d}^T\n",
"$\n",
"\n",
"\n",
"**식 8-4: LLE 단계 1: 선형적인 지역 관계 모델링**\n",
"\n",
"$\n",
"\\begin{split}\n",
"& \\hat{\\mathbf{W}} = \\underset{\\mathbf{W}}{\\operatorname{argmin}}{\\displaystyle \\sum\\limits_{i=1}^{m}} \\left\\|\\mathbf{x}^{(i)} - \\sum\\limits_{j=1}^{m}{w_{i,j}}\\mathbf{x}^{(j)}\\right\\|^2\\\\\n",
"& \\text{[조건] }\n",
"\\begin{cases}\n",
" w_{i,j}=0 & \\mathbf{x}^{(j)} \\text{가 } \\mathbf{x}^{(i)} \\text{의 최근접 이웃 개 중 하나가 아닐때}\\\\\n",
" \\sum\\limits_{j=1}^{m}w_{i,j} = 1 & i=1, 2, \\dots, m \\text{ 일 때}\n",
"\\end{cases}\n",
"\\end{split}\n",
"$\n",
"\n",
"**290 페이지 본문 중에서**\n",
"\n",
"[...] $\\mathbf{z}^{(i)}$와 $ \\sum_{j=1}^{m}{\\hat{w}_{i,j}\\mathbf{z}^{(j)}} $ 사이의 거리가 최소화되어야 합니다.\n",
"\n",
"\n",
"**식 8-5: LLE 단계 2: 관계를 보존하는 차원 축소**\n",
"\n",
"$\n",
"\\hat{\\mathbf{Z}} = \\underset{\\mathbf{Z}}{\\operatorname{argmin}}{\\displaystyle \\sum\\limits_{i=1}^{m}} \\left\\|\\mathbf{z}^{(i)} - \\sum\\limits_{j=1}^{m}{\\hat{w}_{i,j}}\\mathbf{z}^{(j)}\\right\\|^2\n",
"$\n"
]
},
{
"metadata": {
"id": "tzBsxT-tn1Y2",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"# 9장\n",
"\n",
"**식 9-1: ReLU 함수**\n",
"\n",
"$\n",
"h_{\\mathbf{w}, b}(\\mathbf{X}) = \\max(\\mathbf{X} \\cdot \\mathbf{w} + b, 0)\n",
"$"
]
},
{
"metadata": {
"id": "PCBRHj6wn1Y3",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"# 10장\n",
"\n",
"**식 10-1: 퍼셉트론에서 일반적으로 사용하는 계단 함수**\n",
"\n",
"$\n",
"\\begin{split}\n",
"\\operatorname{heaviside}(z) =\n",
"\\begin{cases}\n",
"0 & z < 0 \\text{ 일 때}\\\\\n",
"1 & z \\ge 0 \\text{ 일 때}\n",
"\\end{cases} & \\quad\\quad\n",
"\\operatorname{sgn}(z) =\n",
"\\begin{cases}\n",
"-1 & z < 0 \\text{ 일 때}\\\\\n",
"0 & z = 0 \\text{ 일 때}\\\\\n",
"+1 & z > 0 \\text{ 일 때}\n",
"\\end{cases}\n",
"\\end{split}\n",
"$\n",
"\n",
"\n",
"**식 10-2: 퍼셉트론 학습 규칙(가중치 업데이트)**\n",
"\n",
"$\n",
"{w_{i,j}}^{(\\text{다음 스텝})}\\quad = w_{i,j} + \\eta (y_j - \\hat{y}_j) x_i\n",
"$\n",
"\n",
"\n",
"**342 페이지 본문 중에서**\n",
"\n",
"이 행렬은 표준편차가 $ 2 / \\sqrt{\\text{n}_\\text{inputs} + \\text{n}_\\text{n_neurons}} $인 절단 정규(가우시안) 분포를 사용해 무작위로 초기화됩니다.\n"
]
},
{
"metadata": {
"id": "lZkG8wkrn1Y3",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"# Chapter 11\n",
"**Equation 11-1: Xavier initialization (when using the logistic activation function)**\n",
"\n",
"$\n",
"\\begin{split}\n",
"& \\text{Normal distribution with mean 0 and standard deviation }\n",
"\\sigma = \\sqrt{\\dfrac{2}{n_\\text{inputs} + n_\\text{outputs}}}\\\\\n",
"& \\text{Or a uniform distribution between -r and +r, with }\n",
"r = \\sqrt{\\dfrac{6}{n_\\text{inputs} + n_\\text{outputs}}}\n",
"\\end{split}\n",
"$\n",
"\n",
"**In the text page 278:**\n",
"\n",
"When the number of input connections is roughly equal to the number of output\n",
"connections, you get simpler equations (e.g., $ \\sigma = 1 / \\sqrt{n_\\text{inputs}} $ or $ r = \\sqrt{3} / \\sqrt{n_\\text{inputs}} $).\n",
"\n",
"**Table 11-1: Initialization parameters for each type of activation function**\n",
"\n",
"* Logistic uniform: $ r = \\sqrt{\\dfrac{6}{n_\\text{inputs} + n_\\text{outputs}}} $\n",
"* Logistic normal: $ \\sigma = \\sqrt{\\dfrac{2}{n_\\text{inputs} + n_\\text{outputs}}} $\n",
"* Hyperbolic tangent uniform: $ r = 4 \\sqrt{\\dfrac{6}{n_\\text{inputs} + n_\\text{outputs}}} $\n",
"* Hyperbolic tangent normal: $ \\sigma = 4 \\sqrt{\\dfrac{2}{n_\\text{inputs} + n_\\text{outputs}}} $\n",
"* ReLU (and its variants) uniform: $ r = \\sqrt{2} \\sqrt{\\dfrac{6}{n_\\text{inputs} + n_\\text{outputs}}} $\n",
"* ReLU (and its variants) normal: $ \\sigma = \\sqrt{2} \\sqrt{\\dfrac{2}{n_\\text{inputs} + n_\\text{outputs}}} $\n",
"\n",
"**Equation 11-2: ELU activation function**\n",
"\n",
"$\n",
"\\operatorname{ELU}_\\alpha(z) =\n",
"\\begin{cases}\n",
"\\alpha(\\exp(z) - 1) & \\text{if } z < 0\\\\\n",
"z & if z \\ge 0\n",
"\\end{cases}\n",
"$\n",
"\n",
"\n",
"**Equation 11-3: Batch Normalization algorithm**\n",
"\n",
"$\n",
"\\begin{split}\n",
"1.\\quad & \\mathbf{\\mu}_B = \\dfrac{1}{m_B}\\sum\\limits_{i=1}^{m_B}{\\mathbf{x}^{(i)}}\\\\\n",
"2.\\quad & {\\mathbf{\\sigma}_B}^2 = \\dfrac{1}{m_B}\\sum\\limits_{i=1}^{m_B}{(\\mathbf{x}^{(i)} - \\mathbf{\\mu}_B)^2}\\\\\n",
"3.\\quad & \\hat{\\mathbf{x}}^{(i)} = \\dfrac{\\mathbf{x}^{(i)} - \\mathbf{\\mu}_B}{\\sqrt{{\\mathbf{\\sigma}_B}^2 + \\epsilon}}\\\\\n",
"4.\\quad & \\mathbf{z}^{(i)} = \\gamma \\hat{\\mathbf{x}}^{(i)} + \\beta\n",
"\\end{split}\n",
"$\n",
"\n",
"**In the text page 285:**\n",
"\n",
"[...] given a new value $v$, the running average $v$ is updated through the equation:\n",
"\n",
"$ \\hat{v} \\gets \\hat{v} \\times \\text{momentum} + v \\times (1 - \\text{momentum}) $\n",
"\n",
"**Equation 11-4: Momentum algorithm**\n",
"\n",
"1. $\\mathbf{m} \\gets \\beta \\mathbf{m} - \\eta \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta})$\n",
"2. $\\mathbf{\\theta} \\gets \\mathbf{\\theta} + \\mathbf{m}$\n",
"\n",
"**In the text page 296:**\n",
"\n",
"You can easily verify that if the gradient remains constant, the terminal velocity (i.e., the maximum size of the weight updates) is equal to that gradient multiplied by the learning rate η multiplied by $ \\frac{1}{1 - \\beta} $.\n",
"\n",
"\n",
"**Equation 11-5: Nesterov Accelerated Gradient algorithm**\n",
"\n",
"1. $\\mathbf{m} \\gets \\beta \\mathbf{m} - \\eta \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta} + \\beta \\mathbf{m})$\n",
"2. $\\mathbf{\\theta} \\gets \\mathbf{\\theta} + \\mathbf{m}$\n",
"\n",
"**Equation 11-6: AdaGrad algorithm**\n",
"\n",
"1. $\\mathbf{s} \\gets \\mathbf{s} + \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta}) \\otimes \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta})$\n",
"2. $\\mathbf{\\theta} \\gets \\mathbf{\\theta} - \\eta \\, \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta}) \\oslash {\\sqrt{\\mathbf{s} + \\epsilon}}$\n",
"\n",
"**In the text page 298-299:**\n",
"\n",
"This vectorized form is equivalent to computing $s_i \\gets s_i + \\left( \\dfrac{\\partial J(\\mathbf{\\theta})}{\\partial \\theta_i} \\right)^2$ for each element $s_i$ of the vector $\\mathbf{s}$.\n",
"\n",
"**In the text page 299:**\n",
"\n",
"This vectorized form is equivalent to computing $ \\theta_i \\gets \\theta_i - \\eta \\, \\dfrac{\\partial J(\\mathbf{\\theta})}{\\partial \\theta_i} \\dfrac{1}{\\sqrt{s_i + \\epsilon}} $ for all parameters $\\theta_i$ (simultaneously).\n",
"\n",
"\n",
"**Equation 11-7: RMSProp algorithm**\n",
"\n",
"1. $\\mathbf{s} \\gets \\beta \\mathbf{s} + (1 - \\beta ) \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta}) \\otimes \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta})$\n",
"2. $\\mathbf{\\theta} \\gets \\mathbf{\\theta} - \\eta \\, \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta}) \\oslash {\\sqrt{\\mathbf{s} + \\epsilon}}$\n",
"\n",
"\n",
"**Equation 11-8: Adam algorithm**\n",
"\n",
"1. $\\mathbf{m} \\gets \\beta_1 \\mathbf{m} - (1 - \\beta_1) \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta})$\n",
"2. $\\mathbf{s} \\gets \\beta_2 \\mathbf{s} + (1 - \\beta_2) \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta}) \\otimes \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta})$\n",
"3. $\\mathbf{m} \\gets \\left(\\dfrac{\\mathbf{m}}{1 - {\\beta_1}^T}\\right)$\n",
"4. $\\mathbf{s} \\gets \\left(\\dfrac{\\mathbf{s}}{1 - {\\beta_2}^T}\\right)$\n",
"5. $\\mathbf{\\theta} \\gets \\mathbf{\\theta} + \\eta \\, \\mathbf{m} \\oslash {\\sqrt{\\mathbf{s} + \\epsilon}}$\n",
"\n",
"**In the text page 309:**\n",
"\n",
"We typically implement this constraint by computing $\\left\\| \\mathbf{w} \\right\\|_2$ after each training step\n",
"and clipping $\\mathbf{w}$ if needed $ \\left( \\mathbf{w} \\gets \\mathbf{w} \\dfrac{r}{\\left\\| \\mathbf{w} \\right\\|_2} \\right) $.\n",
"\n",
"\n"
]
},
{
"metadata": {
"id": "cFSDXAOzn1Y4",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"# Chapter 13\n",
"\n",
"**Equation 13-1: Computing the output of a neuron in a convolutional layer**\n",
"\n",
"$\n",
"z_{i,j,k} = b_k + \\sum\\limits_{u = 0}^{f_h - 1} \\, \\, \\sum\\limits_{v = 0}^{f_w - 1} \\, \\, \\sum\\limits_{k' = 0}^{f_{n'} - 1} \\, \\, x_{i', j', k'} . w_{u, v, k', k}\n",
"\\quad \\text{with }\n",
"\\begin{cases}\n",
"i' = i \\times s_h + u \\\\\n",
"j' = j \\times s_w + v\n",
"\\end{cases}\n",
"$\n",
"\n",
"**Equation 13-2: Local response normalization**\n",
"\n",
"$\n",
"b_i = a_i \\left(k + \\alpha \\sum\\limits_{j=j_\\text{low}}^{j_\\text{high}}{{a_j}^2} \\right)^{-\\beta} \\quad \\text{with }\n",
"\\begin{cases}\n",
" j_\\text{high} = \\min\\left(i + \\dfrac{r}{2}, f_n-1\\right) \\\\\n",
" j_\\text{low} = \\max\\left(0, i - \\dfrac{r}{2}\\right)\n",
"\\end{cases}\n",
"$\n",
"\n",
"\n"
]
},
{
"metadata": {
"id": "1SiAf4FTn1Y4",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"# Chapter 14\n",
"\n",
"**Equation 14-1: Output of a single recurrent neuron for a single instance**\n",
"\n",
"$\n",
"\\mathbf{y}_{(t)} = \\phi\\left({{\\mathbf{x}_{(t)}}^T \\cdot \\mathbf{w}_x} + {\\mathbf{y}_{(t-1)}}^T \\cdot {\\mathbf{w}_y} + b \\right)\n",
"$\n",
"\n",
"\n",
"**Equation 14-2: Outputs of a layer of recurrent neurons for all instances in a mini-batch**\n",
"\n",
"$\n",
"\\begin{split}\n",
"\\mathbf{Y}_{(t)} & = \\phi\\left(\\mathbf{X}_{(t)} \\cdot \\mathbf{W}_{x} + \\mathbf{Y}_{(t-1)}\\cdot \\mathbf{W}_{y} + \\mathbf{b} \\right) \\\\\n",
"& = \\phi\\left(\n",
"\\left[\\mathbf{X}_{(t)} \\quad \\mathbf{Y}_{(t-1)} \\right]\n",
" \\cdot \\mathbf{W} + \\mathbf{b} \\right) \\text{ with } \\mathbf{W}=\n",
"\\left[ \\begin{matrix}\n",
" \\mathbf{W}_x\\\\\n",
" \\mathbf{W}_y\n",
"\\end{matrix} \\right]\n",
"\\end{split}\n",
"$\n",
"\n",
"**In the text page 391:**\n",
"\n",
"Just like in regular backpropagation, there is a first forward pass through the unrolled network (represented by the dashed arrows); then the output sequence is evaluated using a cost function $ C(\\mathbf{Y}_{(t_\\text{min})}, \\mathbf{Y}_{(t_\\text{min}+1)}, \\dots, \\mathbf{Y}_{(t_\\text{max})}) $ (where $t_\\text{min}$ and $t_\\text{max}$ are the first and last output time steps, not counting the ignored outputs)[...]\n",
"\n",
"\n",
"**Equation 14-3: LSTM computations**\n",
"\n",
"$\n",
"\\begin{split}\n",
"\\mathbf{i}_{(t)}&=\\sigma({\\mathbf{W}_{xi}}^T \\cdot \\mathbf{x}_{(t)} + {\\mathbf{W}_{hi}}^T \\cdot \\mathbf{h}_{(t-1)} + \\mathbf{b}_i)\\\\\n",
"\\mathbf{f}_{(t)}&=\\sigma({\\mathbf{W}_{xf}}^T \\cdot \\mathbf{x}_{(t)} + {\\mathbf{W}_{hf}}^T \\cdot \\mathbf{h}_{(t-1)} + \\mathbf{b}_f)\\\\\n",
"\\mathbf{o}_{(t)}&=\\sigma({\\mathbf{W}_{xo}}^T \\cdot \\mathbf{x}_{(t)} + {\\mathbf{W}_{ho}}^T \\cdot \\mathbf{h}_{(t-1)} + \\mathbf{b}_o)\\\\\n",
"\\mathbf{g}_{(t)}&=\\operatorname{tanh}({\\mathbf{W}_{xg}}^T \\cdot \\mathbf{x}_{(t)} + {\\mathbf{W}_{hg}}^T \\cdot \\mathbf{h}_{(t-1)} + \\mathbf{b}_g)\\\\\n",
"\\mathbf{c}_{(t)}&=\\mathbf{f}_{(t)} \\otimes \\mathbf{c}_{(t-1)} \\, + \\, \\mathbf{i}_{(t)} \\otimes \\mathbf{g}_{(t)}\\\\\n",
"\\mathbf{y}_{(t)}&=\\mathbf{h}_{(t)} = \\mathbf{o}_{(t)} \\otimes \\operatorname{tanh}(\\mathbf{c}_{(t)})\n",
"\\end{split}\n",
"$\n",
"\n",
"\n",
"**Equation 14-4: GRU computations**\n",
"\n",
"$\n",
"\\begin{split}\n",
"\\mathbf{z}_{(t)}&=\\sigma({\\mathbf{W}_{xz}}^T \\cdot \\mathbf{x}_{(t)} + {\\mathbf{W}_{hz}}^T \\cdot \\mathbf{h}_{(t-1)}) \\\\\n",
"\\mathbf{r}_{(t)}&=\\sigma({\\mathbf{W}_{xr}}^T \\cdot \\mathbf{x}_{(t)} + {\\mathbf{W}_{hr}}^T \\cdot \\mathbf{h}_{(t-1)}) \\\\\n",
"\\mathbf{g}_{(t)}&=\\operatorname{tanh}\\left({\\mathbf{W}_{xg}}^T \\cdot \\mathbf{x}_{(t)} + {\\mathbf{W}_{hg}}^T \\cdot (\\mathbf{r}_{(t)} \\otimes \\mathbf{h}_{(t-1)})\\right) \\\\\n",
"\\mathbf{h}_{(t)}&=(1-\\mathbf{z}_{(t)}) \\otimes \\mathbf{h}_{(t-1)} + \\mathbf{z}_{(t)} \\otimes \\mathbf{g}_{(t)}\n",
"\\end{split}\n",
"$\n",
"\n",
"\n"
]
},
{
"metadata": {
"id": "5IiIkIG_n1Y5",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"# Chapter 15\n",
"\n",
"**Equation 15-1: KullbackLeibler divergence**\n",
"\n",
"$\n",
"D_{\\mathrm{KL}}(P\\|Q) = \\sum\\limits_{i} P(i) \\log \\dfrac{P(i)}{Q(i)}\n",
"$\n",
"\n",
"\n",
"**Equation: KL divergence between the target sparsity _p_ and the actual sparsity _q_**\n",
"\n",
"$\n",
"D_{\\mathrm{KL}}(p\\|q) = p \\, \\log \\dfrac{p}{q} + (1-p) \\log \\dfrac{1-p}{1-q}\n",
"$\n",
"\n",
"**In the text page 433:**\n",
"\n",
"One common variant is to train the encoder to output $\\gamma = \\log\\left(\\sigma^2\\right)$ rather than $\\sigma$.\n",
"Wherever we need $\\sigma$ we can just compute $ \\sigma = \\exp\\left(\\dfrac{\\gamma}{2}\\right) $.\n",
"\n",
"\n"
]
},
{
"metadata": {
"id": "wVr9eBb1n1Y6",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"# Chapter 16\n",
"\n",
"**Equation 16-1: Bellman Optimality Equation**\n",
"\n",
"$\n",
"V^*(s) = \\underset{a}{\\max}\\sum\\limits_{s'}{T(s, a, s') [R(s, a, s') + \\gamma . V^*(s')]} \\quad \\text{for all }s\n",
"$\n",
"\n",
"**Equation 16-2: Value Iteration algorithm**\n",
"\n",
"$\n",
" V_{k+1}(s) \\gets \\underset{a}{\\max}\\sum\\limits_{s'}{T(s, a, s') [R(s, a, s') + \\gamma . V_k(s')]} \\quad \\text{for all }s\n",
"$\n",
"\n",
"\n",
"**Equation 16-3: Q-Value Iteration algorithm**\n",
"\n",
"$\n",
" Q_{k+1}(s, a) \\gets \\sum\\limits_{s'}{T(s, a, s') [R(s, a, s') + \\gamma . \\underset{a'}{\\max}\\,{Q_k(s',a')}]} \\quad \\text{for all } (s,a)\n",
"$\n",
"\n",
"**In the text page 458:**\n",
"\n",
"Once you have the optimal Q-Values, defining the optimal policy, noted $\\pi^{*}(s)$, is trivial: when the agent is in state $s$, it should choose the action with the highest Q-Value for that state: $ \\pi^{*}(s) = \\underset{a}{\\operatorname{argmax}} \\, Q^*(s, a) $.\n",
"\n",
"\n",
"**Equation 16-4: TD Learning algorithm**\n",
"\n",
"$\n",
"V_{k+1}(s) \\gets (1-\\alpha)V_k(s) + \\alpha\\left(r + \\gamma . V_k(s')\\right)\n",
"$\n",
"\n",
"\n",
"**Equation 16-5: Q-Learning algorithm**\n",
"\n",
"$\n",
"Q_{k+1}(s, a) \\gets (1-\\alpha)Q_k(s,a) + \\alpha\\left(r + \\gamma . \\underset{a'}{\\max} \\, Q_k(s', a')\\right)\n",
"$\n",
"\n",
"\n",
"**Equation 16-6: Q-Learning using an exploration function**\n",
"\n",
"$\n",
" Q(s, a) \\gets (1-\\alpha)Q(s,a) + \\alpha\\left(r + \\gamma . \\underset{\\alpha'}{\\max}f(Q(s', a'), N(s', a'))\\right)\n",
"$\n",
"\n",
"\n",
"**Equation 16-7: Deep Q-Learning cost function**\n",
"\n",
"$\n",
"\\begin{split}\n",
"& J(\\mathbf{\\theta}_\\text{critic}) = \\dfrac{1}{m}\\sum\\limits_{i=1}^m\\left(y^{(i)} - Q(s^{(i)},a^{(i)},\\mathbf{\\theta}_\\text{critic})\\right)^2 \\\\\n",
"& \\text{with } y^{(i)} = r^{(i)} + \\gamma . \\underset{a'}{\\max}Q(s'^{(i)},a',\\mathbf{\\theta}_\\text{actor})\n",
"\\end{split}\n",
"$\n"
]
},
{
"metadata": {
"id": "1U0nBdBvn1Y7",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"# Appendix A\n",
"\n",
"Equations that appear in the text:\n",
"\n",
"$\n",
"\\mathbf{H} =\n",
"\\begin{pmatrix}\n",
"\\mathbf{H'} & 0 & \\cdots\\\\\n",
"0 & 0 & \\\\\n",
"\\vdots & & \\ddots\n",
"\\end{pmatrix}\n",
"$\n",
"\n",
"\n",
"$\n",
"\\mathbf{A} =\n",
"\\begin{pmatrix}\n",
"\\mathbf{A'} & \\mathbf{I}_m \\\\\n",
"\\mathbf{0} & -\\mathbf{I}_m\n",
"\\end{pmatrix}\n",
"$\n",
"\n",
"\n",
"$ 1 - \\frac{1}{5}^2 - \\frac{4}{5}^2 $\n",
"\n",
"\n",
"$ 1 - \\frac{1}{2}^2 - \\frac{1}{2}^2 $\n",
"\n",
"\n",
"$ \\frac{2}{5} \\times $\n",
"\n",
"\n",
"$ \\frac{3}{5} \\times 0 $"
]
},
{
"metadata": {
"id": "dphwGCobn1Y7",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"# Appendix C"
]
},
{
"metadata": {
"id": "vH7f8-Min1Y7",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Equations that appear in the text:\n",
"\n",
"$ (\\hat{x}, \\hat{y}) $\n",
"\n",
"\n",
"$ \\hat{\\alpha} $\n",
"\n",
"\n",
"$ (\\hat{x}, \\hat{y}, \\hat{\\alpha}) $\n",
"\n",
"\n",
"$\n",
"\\begin{cases}\n",
"\\frac{\\partial}{\\partial x}g(x, y, \\alpha) = 2x - 3\\alpha\\\\\n",
"\\frac{\\partial}{\\partial y}g(x, y, \\alpha) = 2 - 2\\alpha\\\\\n",
"\\frac{\\partial}{\\partial \\alpha}g(x, y, \\alpha) = -3x - 2y - 1\\\\\n",
"\\end{cases}\n",
"$\n",
"\n",
"\n",
"$ 2\\hat{x} - 3\\hat{\\alpha} = 2 - 2\\hat{\\alpha} = -3\\hat{x} - 2\\hat{y} - 1 = 0 $\n",
"\n",
"\n",
"$ \\hat{x} = \\frac{3}{2} $\n",
"\n",
"\n",
"$ \\hat{y} = -\\frac{11}{4} $\n",
"\n",
"\n",
"$ \\hat{\\alpha} = 1 $\n",
"\n",
"\n",
"**Equation C-1: Generalized Lagrangian for the hard margin problem**\n",
"\n",
"$\n",
"\\begin{split}\n",
"\\mathcal{L}(\\mathbf{w}, b, \\mathbf{\\alpha}) = \\frac{1}{2}\\mathbf{w}^T \\cdot \\mathbf{w} - \\sum\\limits_{i=1}^{m}{\\alpha^{(i)} \\left(t^{(i)}(\\mathbf{w}^T \\cdot \\mathbf{x}^{(i)} + b) - 1\\right)} \\\\\n",
"\\text{with}\\quad \\alpha^{(i)} \\ge 0 \\quad \\text{for }i = 1, 2, \\dots, m\n",
"\\end{split}\n",
"$\n",
"\n",
"**More equations in the text:**\n",
"\n",
"$ (\\hat{\\mathbf{w}}, \\hat{b}, \\hat{\\mathbf{\\alpha}}) $\n",
"\n",
"\n",
"$ t^{(i)}((\\hat{\\mathbf{w}})^T \\cdot \\mathbf{x}^{(i)} + \\hat{b}) \\ge 1 \\quad \\text{for } i = 1, 2, \\dots, m $\n",
"\n",
"\n",
"$ {\\hat{\\alpha}}^{(i)} \\ge 0 \\quad \\text{for } i = 1, 2, \\dots, m $\n",
"\n",
"\n",
"$ {\\hat{\\alpha}}^{(i)} = 0 $\n",
"\n",
"\n",
"$ t^{(i)}((\\hat{\\mathbf{w}})^T \\cdot \\mathbf{x}^{(i)} + \\hat{b}) = 1 $\n",
"\n",
"\n",
"$ {\\hat{\\alpha}}^{(i)} = 0 $\n",
"\n",
"\n",
"**Equation C-2: Partial derivatives of the generalized Lagrangian**\n",
"\n",
"$\n",
"\\begin{split}\n",
"\\nabla_{\\mathbf{w}}\\mathcal{L}(\\mathbf{w}, b, \\mathbf{\\alpha}) = \\mathbf{w} - \\sum\\limits_{i=1}^{m}\\alpha^{(i)}t^{(i)}\\mathbf{x}^{(i)}\\\\\n",
"\\dfrac{\\partial}{\\partial b}\\mathcal{L}(\\mathbf{w}, b, \\mathbf{\\alpha}) = -\\sum\\limits_{i=1}^{m}\\alpha^{(i)}t^{(i)}\n",
"\\end{split}\n",
"$\n",
"\n",
"\n",
"**Equation C-3: Properties of the stationary points**\n",
"\n",
"$\n",
"\\begin{split}\n",
"\\hat{\\mathbf{w}} = \\sum_{i=1}^{m}{\\hat{\\alpha}}^{(i)}t^{(i)}\\mathbf{x}^{(i)}\\\\\n",
"\\sum_{i=1}^{m}{\\hat{\\alpha}}^{(i)}t^{(i)} = 0\n",
"\\end{split}\n",
"$\n",
"\n",
"\n",
"**Equation C-4: Dual form of the SVM problem**\n",
"\n",
"$\n",
"\\begin{split}\n",
"\\mathcal{L}(\\hat{\\mathbf{w}}, \\hat{b}, \\mathbf{\\alpha}) = \\dfrac{1}{2}\\sum\\limits_{i=1}^{m}{\n",
" \\sum\\limits_{j=1}^{m}{\n",
" \\alpha^{(i)} \\alpha^{(j)} t^{(i)} t^{(j)} {\\mathbf{x}^{(i)}}^T \\cdot \\mathbf{x}^{(j)}\n",
" }\n",
"} \\quad - \\quad \\sum\\limits_{i=1}^{m}{\\alpha^{(i)}}\\\\\n",
"\\text{with}\\quad \\alpha^{(i)} \\ge 0 \\quad \\text{for }i = 1, 2, \\dots, m\n",
"\\end{split}\n",
"$\n",
"\n",
"**Some more equations in the text:**\n",
"\n",
"$ \\hat{\\mathbf{\\alpha}} $\n",
"\n",
"\n",
"$ {\\hat{\\alpha}}^{(i)} \\ge 0 $\n",
"\n",
"\n",
"$ \\hat{\\mathbf{\\alpha}} $\n",
"\n",
"\n",
"$ \\hat{\\mathbf{w}} $\n",
"\n",
"\n",
"$ \\hat{b} $\n",
"\n",
"\n",
"$ \\hat{b} = 1 - t^{(k)}({\\hat{\\mathbf{w}}}^T \\cdot \\mathbf{x}^{(k)}) $\n",
"\n",
"\n",
"**Equation C-5: Bias term estimation using the dual form**\n",
"\n",
"$\n",
"\\hat{b} = \\dfrac{1}{n_s}\\sum\\limits_{\\scriptstyle i=1 \\atop {\\scriptstyle {\\hat{\\alpha}}^{(i)} > 0}}^{m}{\\left[1 - t^{(i)}({\\hat{\\mathbf{w}}}^T \\cdot \\mathbf{x}^{(i)})\\right]}\n",
"$"
]
},
{
"metadata": {
"id": "r57A4yyKn1Y8",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"# Appendix D"
]
},
{
"metadata": {
"id": "KwAsMn0Rn1Y9",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"**Equation D-1: Partial derivatives of $f(x,y)$**\n",
"\n",
"$\n",
"\\begin{split}\n",
"\\dfrac{\\partial f}{\\partial x} & = \\dfrac{\\partial(x^2y)}{\\partial x} + \\dfrac{\\partial y}{\\partial x} + \\dfrac{\\partial 2}{\\partial x} = y \\dfrac{\\partial(x^2)}{\\partial x} + 0 + 0 = 2xy \\\\\n",
"\\dfrac{\\partial f}{\\partial y} & = \\dfrac{\\partial(x^2y)}{\\partial y} + \\dfrac{\\partial y}{\\partial y} + \\dfrac{\\partial 2}{\\partial y} = x^2 + 1 + 0 = x^2 + 1 \\\\\n",
"\\end{split}\n",
"$\n",
"\n",
"**In the text:**\n",
"\n",
"$ \\frac{\\partial g}{\\partial x} = 0 + (0 \\times x + y \\times 1) = y $\n",
"\n",
"\n",
"$ \\frac{\\partial x}{\\partial x} = 1 $\n",
"\n",
"\n",
"$ \\frac{\\partial y}{\\partial x} = 0 $\n",
"\n",
"\n",
"$ \\frac{\\partial (u \\times v)}{\\partial x} = \\frac{\\partial v}{\\partial x} \\times u + \\frac{\\partial u}{\\partial x} \\times u $\n",
"\n",
"\n",
"$ \\frac{\\partial g}{\\partial x} = 0 + (0 \\times x + y \\times 1) $\n",
"\n",
"\n",
"$ \\frac{\\partial g}{\\partial x} = y $\n",
"\n",
"\n",
"**Equation D-2: Derivative of a function _h_(_x_) at point _x_~0~**\n",
"\n",
"$\n",
"\\begin{split}\n",
"h'(x) & = \\underset{\\textstyle x \\to x_0}{\\lim}\\dfrac{h(x) - h(x_0)}{x - x_0}\\\\\n",
" & = \\underset{\\textstyle \\epsilon \\to 0}{\\lim}\\dfrac{h(x_0 + \\epsilon) - h(x_0)}{\\epsilon}\n",
"\\end{split}\n",
"$\n",
"\n",
"\n",
"**Equation D-3: A few operations with dual numbers**\n",
"\n",
"$\n",
"\\begin{split}\n",
"&\\lambda(a + b\\epsilon) = \\lambda a + \\lambda b \\epsilon\\\\\n",
"&(a + b\\epsilon) + (c + d\\epsilon) = (a + c) + (b + d)\\epsilon \\\\\n",
"&(a + b\\epsilon) \\times (c + d\\epsilon) = ac + (ad + bc)\\epsilon + (bd)\\epsilon^2 = ac + (ad + bc)\\epsilon\\\\\n",
"\\end{split}\n",
"$\n",
"\n",
"**In the text:**\n",
"\n",
"$ \\frac{\\partial f}{\\partial x}(3, 4) $\n",
"\n",
"\n",
"$ \\frac{\\partial f}{\\partial y}(3, 4) $\n",
"\n",
"\n",
"**Equation D-4: Chain rule**\n",
"\n",
"$\n",
"\\dfrac{\\partial f}{\\partial x} = \\dfrac{\\partial f}{\\partial n_i} \\times \\dfrac{\\partial n_i}{\\partial x}\n",
"$\n",
"\n",
"**In the text:**\n",
"\n",
"$ \\frac{\\partial f}{\\partial n_7} = 1 $\n",
"\n",
"\n",
"$ \\frac{\\partial f}{\\partial n_5} = \\frac{\\partial f}{\\partial n_7} \\times \\frac{\\partial n_7}{\\partial n_5} $\n",
"\n",
"\n",
"$ \\frac{\\partial f}{\\partial n_7} = 1 $\n",
"\n",
"\n",
"$ \\frac{\\partial n_7}{\\partial n_5} $\n",
"\n",
"\n",
"$ \\frac{\\partial n_7}{\\partial n_5} = 1 $\n",
"\n",
"\n",
"$ \\frac{\\partial f}{\\partial n_5} = 1 \\times 1 = 1 $\n",
"\n",
"\n",
"$ \\frac{\\partial f}{\\partial n_4} = \\frac{\\partial f}{\\partial n_5} \\times \\frac{\\partial n_5}{\\partial n_4} $\n",
"\n",
"\n",
"$ \\frac{\\partial n_5}{\\partial n_4} = n_2 $\n",
"\n",
"\n",
"$ \\frac{\\partial f}{\\partial n_4} = 1 \\times n_2 = 4 $\n",
"\n",
"\n",
"$ \\frac{\\partial f}{\\partial x} = 24 $\n",
"\n",
"\n",
"$ \\frac{\\partial f}{\\partial y} = 10 $"
]
},
{
"metadata": {
"id": "Nn7LHmpCn1Y-",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"# Appendix E"
]
},
{
"metadata": {
"id": "XOtBQA4-n1Y-",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"**Equation E-1: Probability that the i^th^ neuron will output 1**\n",
"\n",
"$\n",
"p\\left(s_i^{(\\text{next step})} = 1\\right) \\, = \\, \\sigma\\left(\\frac{\\textstyle \\sum\\limits_{j = 1}^N{w_{i,j}s_j + b_i}}{\\textstyle T}\\right)\n",
"$\n",
"\n",
"**In the text:**\n",
"\n",
"$ \\dot{\\mathbf{x}} $\n",
"\n",
"\n",
"$ \\dot{\\mathbf{h}} $\n",
"\n",
"\n",
"**Equation E-2: Contrastive divergence weight update**\n",
"\n",
"$\n",
"w_{i,j}^{(\\text{next step})} = w_{i,j} + \\eta(\\mathbf{x}\\mathbf{h}^T - \\dot{\\mathbf{x}} \\dot {\\mathbf{h}}^T)\n",
"$"
]
},
{
"metadata": {
"id": "el71w1NKn1Y-",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"# Glossary\n",
"\n",
"In the text:\n",
"\n",
"$\\ell _1$\n",
"\n",
"\n",
"$\\ell _2$\n",
"\n",
"\n",
"$\\ell _k$\n",
"\n",
"\n",
"$ \\chi^2 $\n"
]
},
{
"metadata": {
"id": "2Bo7NhHLn1ZA",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Just in case your eyes hurt after all these equations, let's finish with the single most beautiful equation in the world. No, it's not $E = mc²$, it's obviously Euler's identity:"
]
},
{
"metadata": {
"id": "s31L5v1mn1ZB",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"$e^{i\\pi}+1=0$"
]
},
{
"metadata": {
"id": "i3tXVx1zn1ZB",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
""
],
"execution_count": 0,
"outputs": []
}
]
}