From f76b4031597eb67ef1194a6f159dd7ffda081554 Mon Sep 17 00:00:00 2001
From: Haesun Park <haesunrpark@gmail.com>
Date: Thu, 26 Apr 2018 16:51:21 +0900
Subject: [PATCH] Update equations notebook, environment file, and README
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 README.md            |   14 +-
 book_equations.ipynb | 2855 +++++++++++++++++++++---------------------
 environment.yml      |    3 +-
 3 files changed, 1432 insertions(+), 1440 deletions(-)

diff --git a/README.md b/README.md
index 2d036ed..541cd90 100644
--- a/README.md
+++ b/README.md
@@ -17,13 +17,23 @@
 
 First, install [git](https://git-scm.com/) if it is not already installed.
 
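-Next, open a terminal and clone this repository with the following commands (if you want to keep your changes, it is better to clone a fork of the repository on GitHub):
+Next, open a terminal and clone this repository with the following commands.
+
+>(Translator's note) If you want to keep your changes, it is better to clone a fork of the repository on GitHub.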
 
     $ cd $HOME  # or any other suitable directory
     $ git clone https://github.com/rickiepark/handson-ml.git
     $ cd handson-ml
 
-The reinforcement learning examples in Chapter 16 require [OpenAI gym](https://gym.openai.com/docs) and the Atari environments.
+The reinforcement learning examples in Chapter 16 require [OpenAI Gym](https://gym.openai.com/docs) and the Atari environments.
+
+>(Translator's note) If Anaconda is installed, first install the system libraries that OpenAI Gym needs, using the following command. On Linux:
+>
+>$ sudo apt-get install -y cmake zlib1g-dev libjpeg-dev xvfb libav-tools xorg-dev libboost-all-dev libsdl2-dev swig
+>
+>On macOS, the command is:
+>
+>$ brew install cmake boost boost-python sdl2 swig wget
 
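 If you are comfortable with Python and know how to install Python libraries, you can install the libraries listed in `requirements.txt` right away and skip to the [Starting Jupyter](#starting-jupyter) section. If you need detailed installation instructions, read on.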
 
diff --git a/book_equations.ipynb b/book_equations.ipynb
index f426a0c..765b5d6 100644
--- a/book_equations.ipynb
+++ b/book_equations.ipynb
@@ -1,1439 +1,1420 @@
 {
-  "nbformat": 4,
-  "nbformat_minor": 0,
-  "metadata": {
-    "colab": {
-      "name": "book_equations.ipynb",
-      "version": "0.3.2",
-      "provenance": []
-    },
-    "kernelspec": {
-      "display_name": "Python 2",
-      "language": "python",
-      "name": "python2"
-    }
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "colab_type": "text",
+    "id": "ZICa1cn5n1Yv"
+   },
+   "source": [
+    "**Equations**\n",
+    "\n",
+    "*This notebook collects all the equations in the book.*\n",
+    "\n",
+    "**Warning**: GitHub's notebook viewer does not render equations properly. Run Jupyter locally to view this notebook, or use [nbviewer](http://nbviewer.jupyter.org/github/rickiepark/handson-ml/blob/master/book_equations.ipynb)."
+   ]
   },
-  "cells": [
-    {
-      "metadata": {
-        "id": "ZICa1cn5n1Yv",
-        "colab_type": "text"
-      },
-      "cell_type": "markdown",
-      "source": [
-        "**Equations**\n",
-        "\n",
-        "*This notebook collects all the equations in the book.*\n",
-        "\n",
-        "**Warning**: GitHub's notebook viewer does not render equations properly. Run Jupyter locally to view this notebook, or use [nbviewer](http://nbviewer.jupyter.org/github/rickiepark/handson-ml/blob/master/book_equations.ipynb)."
-      ]
-    },
-    {
-      "metadata": {
-        "id": "A_lqPMDMn1Yx",
-        "colab_type": "text"
-      },
-      "cell_type": "markdown",
-      "source": [
-        "# Chapter 1\n",
-        "**Equation 1-1: A simple linear model**\n",
-        "\n",
-        "$\n",
-        "\\text{life_satisfaction} = \\theta_0 + \\theta_1 \\times \\text{GDP_per_capita}\n",
-        "$"
-      ]
-    },
-    {
-      "metadata": {
-        "id": "-ae0t1eRn1Yx",
-        "colab_type": "text"
-      },
-      "cell_type": "markdown",
-      "source": [
-        "# Chapter 2\n",
-        "**Equation 2-1: Root Mean Square Error (RMSE)**\n",
-        "\n",
-        "$\n",
-        "\\text{RMSE}(\\mathbf{X}, h) = \\sqrt{\\frac{1}{m}\\sum\\limits_{i=1}^{m}\\left(h(\\mathbf{x}^{(i)}) - y^{(i)}\\right)^2}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Notation (page 72):**\n",
-        "\n",
-        "$\n",
-        "  \\mathbf{x}^{(1)} = \\begin{pmatrix}\n",
-        "  -118.29 \\\\\n",
-        "  33.91 \\\\\n",
-        "  1,416 \\\\\n",
-        "  38,372\n",
-        "  \\end{pmatrix}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "$\n",
-        "  y^{(1)}=156,400\n",
-        "$\n",
-        "\n",
-        "\n",
-        "$\n",
-        "  \\mathbf{X} = \\begin{pmatrix}\n",
-        "  (\\mathbf{x}^{(1)})^T \\\\\n",
-        "  (\\mathbf{x}^{(2)})^T\\\\\n",
-        "  \\vdots \\\\\n",
-        "  (\\mathbf{x}^{(1999)})^T \\\\\n",
-        "  (\\mathbf{x}^{(2000)})^T\n",
-        "  \\end{pmatrix} = \\begin{pmatrix}\n",
-        "  -118.29 & 33.91 & 1,416 & 38,372 \\\\\n",
-        "  \\vdots & \\vdots & \\vdots & \\vdots \\\\\n",
-        "  \\end{pmatrix}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 2-2: Mean Absolute Error**\n",
-        "\n",
-        "$\n",
-        "\\text{MAE}(\\mathbf{X}, h) = \\frac{1}{m}\\sum\\limits_{i=1}^{m}\\left| h(\\mathbf{x}^{(i)}) - y^{(i)} \\right|\n",
-        "$\n",
-        "\n",
-        "**The $\\ell_k$ norm (page 74):**\n",
-        "\n",
-        "$ \\left\\| \\mathbf{v} \\right\\| _k = (\\left| v_0 \\right|^k + \\left| v_1 \\right|^k + \\dots + \\left| v_n \\right|^k)^{\\frac{1}{k}} $\n"
-      ]
-    },
-    {
-      "metadata": {
-        "id": "8fBBfiofn1Yy",
-        "colab_type": "text"
-      },
-      "cell_type": "markdown",
-      "source": [
-        "# Chapter 3\n",
-        "**Equation 3-1: Precision**\n",
-        "\n",
-        "$\n",
-        "\\text{precision} = \\cfrac{TP}{TP + FP}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 3-2: Recall**\n",
-        "\n",
-        "$\n",
-        "\\text{recall} = \\cfrac{TP}{TP + FN}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 3-3: $F_1$ score**\n",
-        "\n",
-        "$\n",
-        "F_1 = \\cfrac{2}{\\cfrac{1}{\\text{precision}} + \\cfrac{1}{\\text{recall}}} = 2 \\times \\cfrac{\\text{precision}\\, \\times \\, \\text{recall}}{\\text{precision}\\, + \\, \\text{recall}} = \\cfrac{TP}{TP + \\cfrac{FN + FP}{2}}\n",
-        "$\n",
-        "\n"
-      ]
-    },
-    {
-      "metadata": {
-        "id": "YTYQZMTjn1Yy",
-        "colab_type": "text"
-      },
-      "cell_type": "markdown",
-      "source": [
-        "# Chapter 4\n",
-        "**Equation 4-1: Linear Regression model prediction**\n",
-        "\n",
-        "$\n",
-        "\\hat{y} = \\theta_0 + \\theta_1 x_1 + \\theta_2 x_2 + \\dots + \\theta_n x_n\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 4-2: Linear Regression model prediction (vectorized form)**\n",
-        "\n",
-        "$\n",
-        "\\hat{y} = h_{\\mathbf{\\theta}}(\\mathbf{x}) = \\mathbf{\\theta}^T \\cdot \\mathbf{x}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 4-3: MSE cost function for a Linear Regression model**\n",
-        "\n",
-        "$\n",
-        "\\text{MSE}(\\mathbf{X}, h_{\\mathbf{\\theta}}) = \\dfrac{1}{m} \\sum\\limits_{i=1}^{m}{(\\mathbf{\\theta}^T \\cdot \\mathbf{x}^{(i)} - y^{(i)})^2}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 4-4: Normal Equation**\n",
-        "\n",
-        "$\n",
-        "\\hat{\\mathbf{\\theta}} = (\\mathbf{X}^T \\cdot \\mathbf{X})^{-1} \\cdot \\mathbf{X}^T \\cdot \\mathbf{y}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Partial derivative notation (page 165):**\n",
-        "\n",
-        "$\\frac{\\partial}{\\partial \\theta_j} \\text{MSE}(\\mathbf{\\theta})$\n",
-        "\n",
-        "\n",
-        "**Equation 4-5: Partial derivatives of the cost function**\n",
-        "\n",
-        "$\n",
-        "\\dfrac{\\partial}{\\partial \\theta_j} \\text{MSE}(\\mathbf{\\theta}) = \\dfrac{2}{m}\\sum\\limits_{i=1}^{m}(\\mathbf{\\theta}^T \\cdot \\mathbf{x}^{(i)} - y^{(i)})\\, x_j^{(i)}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 4-6: Gradient vector of the cost function**\n",
-        "\n",
-        "$\n",
-        "\\nabla_{\\mathbf{\\theta}}\\, \\text{MSE}(\\mathbf{\\theta}) =\n",
-        "\\begin{pmatrix}\n",
-        " \\frac{\\partial}{\\partial \\theta_0} \\text{MSE}(\\mathbf{\\theta}) \\\\\n",
-        " \\frac{\\partial}{\\partial \\theta_1} \\text{MSE}(\\mathbf{\\theta}) \\\\\n",
-        " \\vdots \\\\\n",
-        " \\frac{\\partial}{\\partial \\theta_n} \\text{MSE}(\\mathbf{\\theta})\n",
-        "\\end{pmatrix}\n",
-        " = \\dfrac{2}{m} \\mathbf{X}^T \\cdot (\\mathbf{X} \\cdot \\mathbf{\\theta} - \\mathbf{y})\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 4-7: Gradient Descent step**\n",
-        "\n",
-        "$\n",
-        "\\mathbf{\\theta}^{(\\text{next step})}\\,\\,\\, = \\mathbf{\\theta} - \\eta \\nabla_{\\mathbf{\\theta}}\\, \\text{MSE}(\\mathbf{\\theta})\n",
-        "$\n",
-        "\n",
-        "\n",
-        "$ O(\\frac{1}{\\epsilon}) $\n",
-        "\n",
-        "\n",
-        "$ \\hat{y} = 0.56 x_1^2 + 0.93 x_1 + 1.78 $\n",
-        "\n",
-        "\n",
-        "$ y = 0.5 x_1^2 + 1.0 x_1 + 2.0 + \\text{Gaussian noise} $\n",
-        "\n",
-        "\n",
-        "$ \\dfrac{(n+d)!}{d!\\,n!} $\n",
-        "\n",
-        "\n",
-        "$ \\alpha \\sum_{i=1}^{n}{\\theta_i^2}$\n",
-        "\n",
-        "\n",
-        "**Equation 4-8: Ridge Regression cost function**\n",
-        "\n",
-        "$\n",
-        "J(\\mathbf{\\theta}) = \\text{MSE}(\\mathbf{\\theta}) + \\alpha \\dfrac{1}{2}\\sum\\limits_{i=1}^{n}\\theta_i^2\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 4-9: Ridge Regression normal equation (closed-form solution)**\n",
-        "\n",
-        "$\n",
-        "\\hat{\\mathbf{\\theta}} = (\\mathbf{X}^T \\cdot \\mathbf{X} + \\alpha \\mathbf{A})^{-1} \\cdot \\mathbf{X}^T \\cdot \\mathbf{y}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 4-10: Lasso Regression cost function**\n",
-        "\n",
-        "$\n",
-        "J(\\mathbf{\\theta}) = \\text{MSE}(\\mathbf{\\theta}) + \\alpha \\sum\\limits_{i=1}^{n}\\left| \\theta_i \\right|\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 4-11: Lasso Regression subgradient vector**\n",
-        "\n",
-        "$\n",
-        "g(\\mathbf{\\theta}, J) = \\nabla_{\\mathbf{\\theta}}\\, \\text{MSE}(\\mathbf{\\theta}) + \\alpha\n",
-        "\\begin{pmatrix}\n",
-        "  \\operatorname{sign}(\\theta_1) \\\\\n",
-        "  \\operatorname{sign}(\\theta_2) \\\\\n",
-        "  \\vdots \\\\\n",
-        "  \\operatorname{sign}(\\theta_n) \\\\\n",
-        "\\end{pmatrix} \\quad \\text{where } \\operatorname{sign}(\\theta_i) =\n",
-        "\\begin{cases}\n",
-        "-1 & \\text{if } \\theta_i < 0 \\\\\n",
-        "0 & \\text{if } \\theta_i = 0 \\\\\n",
-        "+1 & \\text{if } \\theta_i > 0\n",
-        "\\end{cases}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 4-12: Elastic Net cost function**\n",
-        "\n",
-        "$\n",
-        "J(\\mathbf{\\theta}) = \\text{MSE}(\\mathbf{\\theta}) + r \\alpha \\sum\\limits_{i=1}^{n}\\left| \\theta_i \\right| + \\dfrac{1 - r}{2} \\alpha \\sum\\limits_{i=1}^{n}{\\theta_i^2}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 4-13: Logistic Regression model estimated probability (vectorized form)**\n",
-        "\n",
-        "$\n",
-        "\\hat{p} = h_{\\mathbf{\\theta}}(\\mathbf{x}) = \\sigma(\\mathbf{\\theta}^T \\cdot \\mathbf{x})\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 4-14: Logistic function**\n",
-        "\n",
-        "$\n",
-        "\\sigma(t) = \\dfrac{1}{1 + \\exp(-t)}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 4-15: Logistic Regression model prediction**\n",
-        "\n",
-        "$\n",
-        "\\hat{y} =\n",
-        "\\begin{cases}\n",
-        "  0 & \\text{if } \\hat{p} < 0.5 \\\\\n",
-        "  1 & \\text{if } \\hat{p} \\geq 0.5\n",
-        "\\end{cases}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 4-16: Cost function of a single training instance**\n",
-        "\n",
-        "$\n",
-        "c(\\mathbf{\\theta}) =\n",
-        "\\begin{cases}\n",
-        "  -\\log(\\hat{p}) & \\text{if } y = 1 \\\\\n",
-        "  -\\log(1 - \\hat{p}) & \\text{if } y = 0\n",
-        "\\end{cases}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 4-17: Logistic Regression cost function (log loss)**\n",
-        "\n",
-        "$\n",
-        "J(\\mathbf{\\theta}) = -\\dfrac{1}{m} \\sum\\limits_{i=1}^{m}{\\left[ y^{(i)} \\log\\left(\\hat{p}^{(i)}\\right) + (1 - y^{(i)}) \\log\\left(1 - \\hat{p}^{(i)}\\right)\\right]}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 4-18: Logistic cost function partial derivatives**\n",
-        "\n",
-        "$\n",
-        "\\dfrac{\\partial}{\\partial \\theta_j} J(\\mathbf{\\theta}) = \\dfrac{1}{m}\\sum\\limits_{i=1}^{m}\\left(\\sigma(\\mathbf{\\theta}^T \\cdot \\mathbf{x}^{(i)}) - y^{(i)}\\right)\\, x_j^{(i)}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 4-19: Softmax score for class k**\n",
-        "\n",
-        "$\n",
-        "s_k(\\mathbf{x}) = ({\\mathbf{\\theta}^{(k)}})^T \\cdot \\mathbf{x}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 4-20: Softmax function**\n",
-        "\n",
-        "$\n",
-        "\\hat{p}_k = \\sigma\\left(\\mathbf{s}(\\mathbf{x})\\right)_k = \\dfrac{\\exp\\left(s_k(\\mathbf{x})\\right)}{\\sum\\limits_{j=1}^{K}{\\exp\\left(s_j(\\mathbf{x})\\right)}}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 4-21: Softmax Regression classifier prediction**\n",
-        "\n",
-        "$\n",
-        "\\hat{y} = \\underset{k}{\\operatorname{argmax}} \\, \\sigma\\left(\\mathbf{s}(\\mathbf{x})\\right)_k = \\underset{k}{\\operatorname{argmax}} \\, s_k(\\mathbf{x}) = \\underset{k}{\\operatorname{argmax}} \\, \\left( ({\\mathbf{\\theta}^{(k)}})^T \\cdot \\mathbf{x} \\right)\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 4-22: Cross entropy cost function**\n",
-        "\n",
-        "$\n",
-        "J(\\mathbf{\\Theta}) = - \\dfrac{1}{m}\\sum\\limits_{i=1}^{m}\\sum\\limits_{k=1}^{K}{y_k^{(i)}\\log\\left(\\hat{p}_k^{(i)}\\right)}\n",
-        "$\n",
-        "\n",
-        "**Cross entropy between two probability distributions $p$ and $q$ (page 196):**\n",
-        "$ H(p, q) = -\\sum\\limits_{x}p(x) \\log q(x) $\n",
-        "\n",
-        "\n",
-        "**Equation 4-23: Cross entropy gradient vector for class k**\n",
-        "\n",
-        "$\n",
-        "\\nabla_{\\mathbf{\\theta}^{(k)}} \\, J(\\mathbf{\\Theta}) = \\dfrac{1}{m} \\sum\\limits_{i=1}^{m}{ \\left ( \\hat{p}^{(i)}_k - y_k^{(i)} \\right ) \\mathbf{x}^{(i)}}\n",
-        "$\n"
-      ]
-    },
-    {
-      "metadata": {
-        "id": "sFBoMnuzn1Yz",
-        "colab_type": "text"
-      },
-      "cell_type": "markdown",
-      "source": [
-        "# Chapter 5\n",
-        "**Equation 5-1: Gaussian RBF**\n",
-        "\n",
-        "$\n",
-        "{\\displaystyle \\phi_{\\gamma}(\\mathbf{x}, \\mathbf{\\ell})} = {\\displaystyle \\exp({\\displaystyle -\\gamma \\left\\| \\mathbf{x} - \\mathbf{\\ell} \\right\\|^2})}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 5-2: Linear SVM classifier prediction**\n",
-        "\n",
-        "$\n",
-        "\\hat{y} = \\begin{cases}\n",
-        " 0 & \\text{if } \\mathbf{w}^T \\cdot \\mathbf{x} + b < 0 \\\\\n",
-        " 1 & \\text{if } \\mathbf{w}^T \\cdot \\mathbf{x} + b \\geq 0\n",
-        "\\end{cases}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 5-3: Hard margin linear SVM classifier objective**\n",
-        "\n",
-        "$\n",
-        "\\begin{split}\n",
-        "&\\underset{\\mathbf{w}, b}{\\operatorname{minimize}}\\,{\\frac{1}{2}\\mathbf{w}^T \\cdot \\mathbf{w}} \\\\\n",
-        "&\\text{subject to} \\quad t^{(i)}(\\mathbf{w}^T \\cdot \\mathbf{x}^{(i)} + b) \\ge 1 \\quad \\text{for } i = 1, 2, \\dots, m\n",
-        "\\end{split}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 5-4: Soft margin linear SVM classifier objective**\n",
-        "\n",
-        "$\n",
-        "\\begin{split}\n",
-        "&\\underset{\\mathbf{w}, b, \\mathbf{\\zeta}}{\\operatorname{minimize}}\\,{\\dfrac{1}{2}\\mathbf{w}^T \\cdot \\mathbf{w} + C \\sum\\limits_{i=1}^m{\\zeta^{(i)}}}\\\\\n",
-        "&\\text{subject to} \\quad t^{(i)}(\\mathbf{w}^T \\cdot \\mathbf{x}^{(i)} + b) \\ge 1 - \\zeta^{(i)} \\quad \\text{and} \\quad \\zeta^{(i)} \\ge 0 \\quad \\text{for } i = 1, 2, \\dots, m\n",
-        "\\end{split}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 5-5: QP problem**\n",
-        "\n",
-        "$\n",
-        "\\begin{split}\n",
-        "\\underset{\\mathbf{p}}{\\text{minimize}} \\, & \\dfrac{1}{2} \\mathbf{p}^T \\cdot \\mathbf{H} \\cdot \\mathbf{p} \\, + \\, \\mathbf{f}^T \\cdot \\mathbf{p}  \\\\\n",
-        "\\text{subject to} \\, & \\mathbf{A} \\cdot \\mathbf{p} \\le \\mathbf{b} \\\\\n",
-        "\\text{where } &\n",
-        "\\begin{cases}\n",
-        "  \\mathbf{p} \\, \\text{ is an }n_p\\text{-dimensional vector (} n_p = \\text{number of model parameters),}\\\\\n",
-        "  \\mathbf{H} \\, \\text{ is an }n_p \\times n_p \\text{ matrix,}\\\\\n",
-        "  \\mathbf{f} \\, \\text{ is an }n_p\\text{-dimensional vector,}\\\\\n",
-        "  \\mathbf{A} \\, \\text{ is an } n_c \\times n_p \\text{ matrix (}n_c = \\text{number of constraints),}\\\\\n",
-        "  \\mathbf{b} \\, \\text{ is an }n_c\\text{-dimensional vector.}\n",
-        "\\end{cases}\n",
-        "\\end{split}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 5-6: Dual form of the linear SVM objective**\n",
-        "\n",
-        "$\n",
-        "\\begin{split}\n",
-        "&\\underset{\\mathbf{\\alpha}}{\\operatorname{minimize}} \\,\n",
-        "\\dfrac{1}{2}\\sum\\limits_{i=1}^{m}{\n",
-        "  \\sum\\limits_{j=1}^{m}{\n",
-        "  \\alpha^{(i)} \\alpha^{(j)} t^{(i)} t^{(j)} {\\mathbf{x}^{(i)}}^T \\cdot \\mathbf{x}^{(j)}\n",
-        "  }\n",
-        "} \\, - \\, \\sum\\limits_{i=1}^{m}{\\alpha^{(i)}}\\\\\n",
-        "&\\text{subject to} \\quad \\alpha^{(i)} \\ge 0 \\quad \\text{for } i = 1, 2, \\dots, m\n",
-        "\\end{split}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 5-7: From the dual solution to the primal solution**\n",
-        "\n",
-        "$\n",
-        "\\begin{split}\n",
-        "&\\hat{\\mathbf{w}} = \\sum_{i=1}^{m}{\\hat{\\alpha}}^{(i)}t^{(i)}\\mathbf{x}^{(i)}\\\\\n",
-        "&\\hat{b} = \\dfrac{1}{n_s}\\sum\\limits_{\\scriptstyle i=1 \\atop {\\scriptstyle {\\hat{\\alpha}}^{(i)} > 0}}^{m}{\\left(1 - t^{(i)}({\\hat{\\mathbf{w}}}^T \\cdot \\mathbf{x}^{(i)})\\right)}\n",
-        "\\end{split}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 5-8: Second-degree polynomial mapping**\n",
-        "\n",
-        "$\n",
-        "\\phi\\left(\\mathbf{x}\\right) = \\phi\\left( \\begin{pmatrix}\n",
-        "  x_1 \\\\\n",
-        "  x_2\n",
-        "\\end{pmatrix} \\right) = \\begin{pmatrix}\n",
-        "  {x_1}^2 \\\\\n",
-        "  \\sqrt{2} \\, x_1 x_2 \\\\\n",
-        "  {x_2}^2\n",
-        "\\end{pmatrix}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 5-9: Kernel trick for a second-degree polynomial mapping**\n",
-        "\n",
-        "$\n",
-        "\\begin{split}\n",
-        "\\phi(\\mathbf{a})^T \\cdot \\phi(\\mathbf{b}) & \\quad = \\begin{pmatrix}\n",
-        "  {a_1}^2 \\\\\n",
-        "  \\sqrt{2} \\, a_1 a_2 \\\\\n",
-        "  {a_2}^2\n",
-        "  \\end{pmatrix}^T \\cdot \\begin{pmatrix}\n",
-        "  {b_1}^2 \\\\\n",
-        "  \\sqrt{2} \\, b_1 b_2 \\\\\n",
-        "  {b_2}^2\n",
-        "\\end{pmatrix} = {a_1}^2 {b_1}^2 + 2 a_1 b_1 a_2 b_2 + {a_2}^2 {b_2}^2 \\\\\n",
-        " & \\quad = \\left( a_1 b_1 + a_2 b_2 \\right)^2 = \\left( \\begin{pmatrix}\n",
-        "  a_1 \\\\\n",
-        "  a_2\n",
-        "\\end{pmatrix}^T \\cdot \\begin{pmatrix}\n",
-        "    b_1 \\\\\n",
-        "    b_2\n",
-        "  \\end{pmatrix} \\right)^2 = (\\mathbf{a}^T \\cdot \\mathbf{b})^2\n",
-        "\\end{split}\n",
-        "$\n",
-        "\n",
-        "**In the text about the kernel trick (page 220):**\n",
-        "[...] the dot product of the transformed vectors can be replaced simply by $ ({\\mathbf{x}^{(i)}}^T \\cdot \\mathbf{x}^{(j)})^2 $.\n",
-        "\n",
-        "\n",
-        "**Equation 5-10: Common kernels**\n",
-        "\n",
-        "$\n",
-        "\\begin{split}\n",
-        "\\text{Linear:} & \\quad K(\\mathbf{a}, \\mathbf{b}) = \\mathbf{a}^T \\cdot \\mathbf{b} \\\\\n",
-        "\\text{Polynomial:} & \\quad K(\\mathbf{a}, \\mathbf{b}) = \\left(\\gamma \\mathbf{a}^T \\cdot \\mathbf{b} + r \\right)^d \\\\\n",
-        "\\text{Gaussian RBF:} & \\quad K(\\mathbf{a}, \\mathbf{b}) = \\exp({\\displaystyle -\\gamma \\left\\| \\mathbf{a} - \\mathbf{b} \\right\\|^2}) \\\\\n",
-        "\\text{Sigmoid:} & \\quad K(\\mathbf{a}, \\mathbf{b}) = \\tanh\\left(\\gamma \\mathbf{a}^T \\cdot \\mathbf{b} + r\\right)\n",
-        "\\end{split}\n",
-        "$\n",
-        "\n",
-        "**Equation 5-11: Making predictions with a kernelized SVM**\n",
-        "\n",
-        "$\n",
-        "\\begin{split}\n",
-        "h_{\\hat{\\mathbf{w}}, \\hat{b}}\\left(\\phi(\\mathbf{x}^{(n)})\\right) & = \\,\\hat{\\mathbf{w}}^T \\cdot \\phi(\\mathbf{x}^{(n)}) + \\hat{b} = \\left(\\sum_{i=1}^{m}{\\hat{\\alpha}}^{(i)}t^{(i)}\\phi(\\mathbf{x}^{(i)})\\right)^T \\cdot \\phi(\\mathbf{x}^{(n)}) + \\hat{b}\\\\\n",
-        " & = \\, \\sum_{i=1}^{m}{\\hat{\\alpha}}^{(i)}t^{(i)}\\left(\\phi(\\mathbf{x}^{(i)})^T \\cdot \\phi(\\mathbf{x}^{(n)})\\right)  + \\hat{b}\\\\\n",
-        " & = \\sum\\limits_{\\scriptstyle i=1 \\atop {\\scriptstyle {\\hat{\\alpha}}^{(i)} > 0}}^{m}{\\hat{\\alpha}}^{(i)}t^{(i)} K(\\mathbf{x}^{(i)}, \\mathbf{x}^{(n)}) + \\hat{b}\n",
-        "\\end{split}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 5-12: Computing the bias term using the kernel trick**\n",
-        "\n",
-        "$\n",
-        "\\begin{split}\n",
-        "\\hat{b} & = \\dfrac{1}{n_s}\\sum\\limits_{\\scriptstyle i=1 \\atop {\\scriptstyle {\\hat{\\alpha}}^{(i)} > 0}}^{m}{\\left(1 - t^{(i)}{\\hat{\\mathbf{w}}}^T \\cdot \\phi(\\mathbf{x}^{(i)})\\right)} = \\dfrac{1}{n_s}\\sum\\limits_{\\scriptstyle i=1 \\atop {\\scriptstyle {\\hat{\\alpha}}^{(i)} > 0}}^{m}{\\left(1 - t^{(i)}{\n",
-        " \\left(\\sum_{j=1}^{m}{\\hat{\\alpha}}^{(j)}t^{(j)}\\phi(\\mathbf{x}^{(j)})\\right)\n",
-        " }^T \\cdot \\phi(\\mathbf{x}^{(i)})\\right)}\\\\\n",
-        " & = \\dfrac{1}{n_s}\\sum\\limits_{\\scriptstyle i=1 \\atop {\\scriptstyle {\\hat{\\alpha}}^{(i)} > 0}}^{m}{\\left(1 - t^{(i)}\n",
-        "\\sum\\limits_{\\scriptstyle j=1 \\atop {\\scriptstyle {\\hat{\\alpha}}^{(j)} > 0}}^{m}{\n",
-        "  {\\hat{\\alpha}}^{(j)} t^{(j)} K(\\mathbf{x}^{(i)},\\mathbf{x}^{(j)})\n",
-        "}\n",
-        "\\right)}\n",
-        "\\end{split}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 5-13: Linear SVM classifier cost function**\n",
-        "\n",
-        "$\n",
-        "J(\\mathbf{w}, b) = \\dfrac{1}{2} \\mathbf{w}^T \\cdot \\mathbf{w} \\, + \\, C {\\displaystyle \\sum\\limits_{i=1}^{m}\\max\\left(0, 1 - t^{(i)}(\\mathbf{w}^T \\cdot \\mathbf{x}^{(i)} + b) \\right)}\n",
-        "$\n",
-        "\n",
-        "\n"
-      ]
-    },
-    {
-      "metadata": {
-        "id": "JyogtW6Jn1Y0",
-        "colab_type": "text"
-      },
-      "cell_type": "markdown",
-      "source": [
-        "# Chapter 6\n",
-        "**Equation 6-1: Gini impurity**\n",
-        "\n",
-        "$\n",
-        "G_i = 1 - \\sum\\limits_{k=1}^{n}{{p_{i,k}}^2}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 6-2: CART cost function for classification**\n",
-        "\n",
-        "$\n",
-        "\\begin{split}\n",
-        "&J(k, t_k) = \\dfrac{m_{\\text{left}}}{m}G_\\text{left} + \\dfrac{m_{\\text{right}}}{m}G_{\\text{right}}\\\\\n",
-        "&\\text{where }\\begin{cases}\n",
-        "G_\\text{left/right} \\text{ measures the impurity of the left/right subset,}\\\\\n",
-        "m_\\text{left/right} \\text{ is the number of instances in the left/right subset.}\n",
-        "\\end{cases}\n",
-        "\\end{split}\n",
-        "$\n",
-        "\n",
-        "**Entropy computation example (page 232):**\n",
-        "\n",
-        "$ -\\frac{49}{54}\\log_2(\\frac{49}{54}) - \\frac{5}{54}\\log_2(\\frac{5}{54}) $\n",
-        "\n",
-        "\n",
-        "**Equation 6-3: Entropy**\n",
-        "\n",
-        "$\n",
-        "H_i = -\\sum\\limits_{k=1 \\atop p_{i,k} \\ne 0}^{n}{{p_{i,k}}\\log_2(p_{i,k})}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 6-4: CART cost function for regression**\n",
-        "\n",
-        "$\n",
-        "J(k, t_k) = \\dfrac{m_{\\text{left}}}{m}\\text{MSE}_\\text{left} + \\dfrac{m_{\\text{right}}}{m}\\text{MSE}_{\\text{right}} \\quad\n",
-        "\\text{where }\n",
-        "\\begin{cases}\n",
-        "\\text{MSE}_{\\text{node}} = \\sum\\limits_{\\scriptstyle i \\in \\text{node}}(\\hat{y}_{\\text{node}} - y^{(i)})^2\\\\\n",
-        "\\hat{y}_\\text{node} = \\dfrac{1}{m_{\\text{node}}}\\sum\\limits_{\\scriptstyle i \\in \\text{node}}y^{(i)}\n",
-        "\\end{cases}\n",
-        "$\n"
-      ]
-    },
-    {
-      "metadata": {
-        "id": "mCEpcobOn1Y0",
-        "colab_type": "text"
-      },
-      "cell_type": "markdown",
-      "source": [
-        "# Chapter 7\n",
-        "\n",
-        "**Equation 7-1: Weighted error rate of the $j^\\text{th}$ predictor**\n",
-        "\n",
-        "$\n",
-        "r_j = \\dfrac{\\displaystyle \\sum\\limits_{\\textstyle {i=1 \\atop \\hat{y}_j^{(i)} \\ne y^{(i)}}}^{m}{w^{(i)}}}{\\displaystyle \\sum\\limits_{i=1}^{m}{w^{(i)}}} \\quad\n",
-        "\\text{where }\\hat{y}_j^{(i)}\\text{ is the }j^{\\text{th}}\\text{ predictor's prediction for the }i^{\\text{th}}\\text{ instance.}\n",
-        "$\n",
-        "\n",
-        "**Equation 7-2: Predictor weight**\n",
-        "\n",
-        "$\n",
-        "\\begin{split}\n",
-        "\\alpha_j = \\eta \\log{\\dfrac{1 - r_j}{r_j}}\n",
-        "\\end{split}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 7-3: Weight update rule**\n",
-        "\n",
-        "$\n",
-        "\\begin{split}\n",
-        "& w^{(i)} \\leftarrow\n",
-        "\\begin{cases}\n",
-        "w^{(i)} & \\text{if } \\hat{y}_j^{(i)} = y^{(i)}\\\\\n",
-        "w^{(i)} \\exp(\\alpha_j) & \\text{if } \\hat{y}_j^{(i)} \\ne y^{(i)}\n",
-        "\\end{cases} \\\\\n",
-        "& \\text{for } i = 1, 2, \\dots, m \\\\\n",
-        "\\end{split}\n",
-        "$\n",
-        "\n",
-        "**In the text (page 256):**\n",
-        "\n",
-        "Then all the instance weights are normalized (i.e., divided by $ \\sum_{i=1}^{m}{w^{(i)}} $).\n",
-        "\n",
-        "\n",
-        "**Equation 7-4: AdaBoost predictions**\n",
-        "\n",
-        "$\n",
-        "\\hat{y}(\\mathbf{x}) = \\underset{k}{\\operatorname{argmax}}{\\sum\\limits_{\\scriptstyle j=1 \\atop \\scriptstyle \\hat{y}_j(\\mathbf{x}) = k}^{N}{\\alpha_j}} \\quad \\text{where }N\\text{ is the number of predictors}\n",
-        "$\n",
-        "\n",
-        "\n"
-      ]
-    },
-    {
-      "metadata": {
-        "id": "SFoGMOCsn1Y1",
-        "colab_type": "text"
-      },
-      "cell_type": "markdown",
-      "source": [
-        "# Chapter 8\n",
-        "\n",
-        "**Equation 8-1: Principal components matrix**\n",
-        "\n",
-        "$\n",
-        "\\mathbf{V} =\n",
-        "\\begin{pmatrix}\n",
-        "  \\mid & \\mid & & \\mid \\\\\n",
-        "  \\mathbf{c_1} & \\mathbf{c_2} & \\cdots & \\mathbf{c_n} \\\\\n",
-        "  \\mid & \\mid & & \\mid\n",
-        "\\end{pmatrix}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 8-2: Projecting the training set down to _d_ dimensions**\n",
-        "\n",
-        "$\n",
-        "\\mathbf{X}_{d\\text{-proj}} = \\mathbf{X} \\cdot \\mathbf{W}_d\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 8-3: PCA inverse transformation, back to the original number of dimensions**\n",
-        "\n",
-        "$\n",
-        "\\mathbf{X}_{\\text{recovered}} = \\mathbf{X}_{d\\text{-proj}} \\cdot {\\mathbf{W}_d}^T\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 8-4: LLE step 1: linearly modeling local relationships**\n",
-        "\n",
-        "$\n",
-        "\\begin{split}\n",
-        "& \\hat{\\mathbf{W}} = \\underset{\\mathbf{W}}{\\operatorname{argmin}}{\\displaystyle \\sum\\limits_{i=1}^{m}} \\left\\|\\mathbf{x}^{(i)} - \\sum\\limits_{j=1}^{m}{w_{i,j}}\\mathbf{x}^{(j)}\\right\\|^2\\\\\n",
-        "& \\text{subject to }\n",
-        "\\begin{cases}\n",
-        "  w_{i,j}=0 & \\text{if } \\mathbf{x}^{(j)} \\text{ is not one of the } k \\text{ closest neighbors of } \\mathbf{x}^{(i)}\\\\\n",
-        "  \\sum\\limits_{j=1}^{m}w_{i,j} = 1 & \\text{for } i=1, 2, \\dots, m\n",
-        "\\end{cases}\n",
-        "\\end{split}\n",
-        "$\n",
-        "\n",
-        "**In the text (page 290):**\n",
-        "\n",
-        "[...] the distance between $\\mathbf{z}^{(i)}$ and $ \\sum_{j=1}^{m}{\\hat{w}_{i,j}\\mathbf{z}^{(j)}} $ should be minimized.\n",
-        "\n",
-        "\n",
-        "**Equation 8-5: LLE step 2: reducing dimensionality while preserving relationships**\n",
-        "\n",
-        "$\n",
-        "\\hat{\\mathbf{Z}} = \\underset{\\mathbf{Z}}{\\operatorname{argmin}}{\\displaystyle \\sum\\limits_{i=1}^{m}} \\left\\|\\mathbf{z}^{(i)} - \\sum\\limits_{j=1}^{m}{\\hat{w}_{i,j}}\\mathbf{z}^{(j)}\\right\\|^2\n",
-        "$\n"
-      ]
-    },
-    {
-      "metadata": {
-        "id": "tzBsxT-tn1Y2",
-        "colab_type": "text"
-      },
-      "cell_type": "markdown",
-      "source": [
-        "# Chapter 9\n",
-        "\n",
-        "**Equation 9-1: ReLU function**\n",
-        "\n",
-        "$\n",
-        "h_{\\mathbf{w}, b}(\\mathbf{X}) = \\max(\\mathbf{X} \\cdot \\mathbf{w} + b, 0)\n",
-        "$"
-      ]
-    },
-    {
-      "metadata": {
-        "id": "PCBRHj6wn1Y3",
-        "colab_type": "text"
-      },
-      "cell_type": "markdown",
-      "source": [
-        "# Chapter 10\n",
-        "\n",
-        "**Equation 10-1: Common step functions used in Perceptrons**\n",
-        "\n",
-        "$\n",
-        "\\begin{split}\n",
-        "\\operatorname{heaviside}(z) =\n",
-        "\\begin{cases}\n",
-        "0 & \\text{if } z < 0\\\\\n",
-        "1 & \\text{if } z \\ge 0\n",
-        "\\end{cases} & \\quad\\quad\n",
-        "\\operatorname{sgn}(z) =\n",
-        "\\begin{cases}\n",
-        "-1 & \\text{if } z < 0\\\\\n",
-        "0 & \\text{if } z = 0\\\\\n",
-        "+1 & \\text{if } z > 0\n",
-        "\\end{cases}\n",
-        "\\end{split}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 10-2: Perceptron learning rule (weight update)**\n",
-        "\n",
-        "$\n",
-        "{w_{i,j}}^{(\\text{next step})}\\quad = w_{i,j} + \\eta (y_j - \\hat{y}_j) x_i\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**In the text (page 342):**\n",
-        "\n",
-        "This matrix is initialized randomly using a truncated normal (Gaussian) distribution with a standard deviation of $ 2 / \\sqrt{n_\\text{inputs} + n_\\text{neurons}} $.\n"
-      ]
-    },
-    {
-      "metadata": {
-        "id": "lZkG8wkrn1Y3",
-        "colab_type": "text"
-      },
-      "cell_type": "markdown",
-      "source": [
-        "# Chapter 11\n",
-        "**Equation 11-1: Xavier initialization (when using the logistic activation function)**\n",
-        "\n",
-        "$\n",
-        "\\begin{split}\n",
-        "& \\text{Normal distribution with mean 0 and standard deviation }\n",
-        "\\sigma = \\sqrt{\\dfrac{2}{n_\\text{inputs} + n_\\text{outputs}}}\\\\\n",
-        "& \\text{Or a uniform distribution between -r and +r, with }\n",
-        "r = \\sqrt{\\dfrac{6}{n_\\text{inputs} + n_\\text{outputs}}}\n",
-        "\\end{split}\n",
-        "$\n",
-        "\n",
-        "**In the text page 278:**\n",
-        "\n",
-        "When the number of input connections is roughly equal to the number of output\n",
-        "connections, you get simpler equations (e.g., $ \\sigma = 1 / \\sqrt{n_\\text{inputs}} $ or $ r = \\sqrt{3} / \\sqrt{n_\\text{inputs}} $).\n",
-        "\n",
-        "**Table 11-1: Initialization parameters for each type of activation function**\n",
-        "\n",
-        "* Logistic uniform: $ r = \\sqrt{\\dfrac{6}{n_\\text{inputs} + n_\\text{outputs}}} $\n",
-        "* Logistic normal: $ \\sigma = \\sqrt{\\dfrac{2}{n_\\text{inputs} + n_\\text{outputs}}} $\n",
-        "* Hyperbolic tangent uniform: $ r = 4 \\sqrt{\\dfrac{6}{n_\\text{inputs} + n_\\text{outputs}}} $\n",
-        "* Hyperbolic tangent normal: $ \\sigma = 4 \\sqrt{\\dfrac{2}{n_\\text{inputs} + n_\\text{outputs}}} $\n",
-        "* ReLU (and its variants) uniform: $ r = \\sqrt{2} \\sqrt{\\dfrac{6}{n_\\text{inputs} + n_\\text{outputs}}} $\n",
-        "* ReLU (and its variants) normal: $ \\sigma = \\sqrt{2} \\sqrt{\\dfrac{2}{n_\\text{inputs} + n_\\text{outputs}}} $\n",
-        "\n",
-        "**Equation 11-2: ELU activation function**\n",
-        "\n",
-        "$\n",
-        "\\operatorname{ELU}_\\alpha(z) =\n",
-        "\\begin{cases}\n",
-        "\\alpha(\\exp(z) - 1) & \\text{if } z < 0\\\\\n",
-        "z & \\text{if } z \\ge 0\n",
-        "\\end{cases}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 11-3: Batch Normalization algorithm**\n",
-        "\n",
-        "$\n",
-        "\\begin{split}\n",
-        "1.\\quad & \\mathbf{\\mu}_B = \\dfrac{1}{m_B}\\sum\\limits_{i=1}^{m_B}{\\mathbf{x}^{(i)}}\\\\\n",
-        "2.\\quad & {\\mathbf{\\sigma}_B}^2 = \\dfrac{1}{m_B}\\sum\\limits_{i=1}^{m_B}{(\\mathbf{x}^{(i)} - \\mathbf{\\mu}_B)^2}\\\\\n",
-        "3.\\quad & \\hat{\\mathbf{x}}^{(i)} = \\dfrac{\\mathbf{x}^{(i)} - \\mathbf{\\mu}_B}{\\sqrt{{\\mathbf{\\sigma}_B}^2 + \\epsilon}}\\\\\n",
-        "4.\\quad & \\mathbf{z}^{(i)} = \\gamma \\hat{\\mathbf{x}}^{(i)} + \\beta\n",
-        "\\end{split}\n",
-        "$\n",
-        "\n",
-        "**In the text page 285:**\n",
-        "\n",
-        "[...] given a new value $v$, the running average $v$ is updated through the equation:\n",
-        "\n",
-        "$ \\hat{v} \\gets \\hat{v} \\times \\text{momentum} + v \\times (1 - \\text{momentum}) $\n",
-        "\n",
-        "**Equation 11-4: Momentum algorithm**\n",
-        "\n",
-        "1. $\\mathbf{m} \\gets \\beta \\mathbf{m} - \\eta \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta})$\n",
-        "2. $\\mathbf{\\theta} \\gets \\mathbf{\\theta} + \\mathbf{m}$\n",
-        "\n",
-        "**In the text page 296:**\n",
-        "\n",
-        "You can easily verify that if the gradient remains constant, the terminal velocity (i.e., the maximum size of the weight updates) is equal to that gradient multiplied by the learning rate η multiplied by $ \\frac{1}{1 - \\beta} $.\n",
-        "\n",
-        "\n",
-        "**Equation 11-5: Nesterov Accelerated Gradient algorithm**\n",
-        "\n",
-        "1. $\\mathbf{m} \\gets \\beta \\mathbf{m} - \\eta \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta} + \\beta \\mathbf{m})$\n",
-        "2. $\\mathbf{\\theta} \\gets \\mathbf{\\theta} + \\mathbf{m}$\n",
-        "\n",
-        "**Equation 11-6: AdaGrad algorithm**\n",
-        "\n",
-        "1. $\\mathbf{s} \\gets \\mathbf{s} + \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta}) \\otimes \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta})$\n",
-        "2. $\\mathbf{\\theta} \\gets \\mathbf{\\theta} - \\eta \\, \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta}) \\oslash {\\sqrt{\\mathbf{s} + \\epsilon}}$\n",
-        "\n",
-        "**In the text page 298-299:**\n",
-        "\n",
-        "This vectorized form is equivalent to computing $s_i \\gets s_i + \\left( \\dfrac{\\partial J(\\mathbf{\\theta})}{\\partial \\theta_i} \\right)^2$ for each element $s_i$ of the vector $\\mathbf{s}$.\n",
-        "\n",
-        "**In the text page 299:**\n",
-        "\n",
-        "This vectorized form is equivalent to computing $ \\theta_i \\gets \\theta_i - \\eta \\, \\dfrac{\\partial J(\\mathbf{\\theta})}{\\partial \\theta_i} \\dfrac{1}{\\sqrt{s_i + \\epsilon}} $ for all parameters $\\theta_i$ (simultaneously).\n",
-        "\n",
-        "\n",
-        "**Equation 11-7: RMSProp algorithm**\n",
-        "\n",
-        "1. $\\mathbf{s} \\gets \\beta \\mathbf{s} + (1 - \\beta ) \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta}) \\otimes \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta})$\n",
-        "2. $\\mathbf{\\theta} \\gets \\mathbf{\\theta} - \\eta \\, \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta}) \\oslash {\\sqrt{\\mathbf{s} + \\epsilon}}$\n",
-        "\n",
-        "\n",
-        "**Equation 11-8: Adam algorithm**\n",
-        "\n",
-        "1. $\\mathbf{m} \\gets \\beta_1 \\mathbf{m} - (1 - \\beta_1) \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta})$\n",
-        "2. $\\mathbf{s} \\gets \\beta_2 \\mathbf{s} + (1 - \\beta_2) \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta}) \\otimes \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta})$\n",
-        "3. $\\mathbf{m} \\gets \\left(\\dfrac{\\mathbf{m}}{1 - {\\beta_1}^T}\\right)$\n",
-        "4. $\\mathbf{s} \\gets \\left(\\dfrac{\\mathbf{s}}{1 - {\\beta_2}^T}\\right)$\n",
-        "5. $\\mathbf{\\theta} \\gets \\mathbf{\\theta} + \\eta \\, \\mathbf{m} \\oslash {\\sqrt{\\mathbf{s} + \\epsilon}}$\n",
-        "\n",
-        "**In the text page 309:**\n",
-        "\n",
-        "We typically implement this constraint by computing $\\left\\| \\mathbf{w} \\right\\|_2$ after each training step\n",
-        "and clipping $\\mathbf{w}$ if needed $ \\left( \\mathbf{w} \\gets \\mathbf{w} \\dfrac{r}{\\left\\| \\mathbf{w} \\right\\|_2} \\right) $.\n",
-        "\n",
-        "\n"
-      ]
-    },
-    {
-      "metadata": {
-        "id": "cFSDXAOzn1Y4",
-        "colab_type": "text"
-      },
-      "cell_type": "markdown",
-      "source": [
-        "# Chapter 13\n",
-        "\n",
-        "**Equation 13-1: Computing the output of a neuron in a convolutional layer**\n",
-        "\n",
-        "$\n",
-        "z_{i,j,k} = b_k + \\sum\\limits_{u = 0}^{f_h - 1} \\, \\, \\sum\\limits_{v = 0}^{f_w - 1} \\, \\, \\sum\\limits_{k' = 0}^{f_{n'} - 1} \\, \\, x_{i', j', k'} . w_{u, v, k', k}\n",
-        "\\quad \\text{with }\n",
-        "\\begin{cases}\n",
-        "i' = i \\times s_h + u \\\\\n",
-        "j' = j \\times s_w + v\n",
-        "\\end{cases}\n",
-        "$\n",
-        "\n",
-        "**Equation 13-2: Local response normalization**\n",
-        "\n",
-        "$\n",
-        "b_i = a_i  \\left(k + \\alpha \\sum\\limits_{j=j_\\text{low}}^{j_\\text{high}}{{a_j}^2} \\right)^{-\\beta} \\quad \\text{with }\n",
-        "\\begin{cases}\n",
-        "  j_\\text{high} = \\min\\left(i + \\dfrac{r}{2}, f_n-1\\right) \\\\\n",
-        "  j_\\text{low} = \\max\\left(0, i - \\dfrac{r}{2}\\right)\n",
-        "\\end{cases}\n",
-        "$\n",
-        "\n",
-        "\n"
-      ]
-    },
-    {
-      "metadata": {
-        "id": "1SiAf4FTn1Y4",
-        "colab_type": "text"
-      },
-      "cell_type": "markdown",
-      "source": [
-        "# Chapter 14\n",
-        "\n",
-        "**Equation 14-1: Output of a single recurrent neuron for a single instance**\n",
-        "\n",
-        "$\n",
-        "\\mathbf{y}_{(t)} = \\phi\\left({{\\mathbf{x}_{(t)}}^T \\cdot \\mathbf{w}_x} + {\\mathbf{y}_{(t-1)}}^T \\cdot {\\mathbf{w}_y} + b \\right)\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 14-2: Outputs of a layer of recurrent neurons for all instances in a mini-batch**\n",
-        "\n",
-        "$\n",
-        "\\begin{split}\n",
-        "\\mathbf{Y}_{(t)} & = \\phi\\left(\\mathbf{X}_{(t)} \\cdot \\mathbf{W}_{x} + \\mathbf{Y}_{(t-1)}\\cdot  \\mathbf{W}_{y} + \\mathbf{b} \\right) \\\\\n",
-        "& = \\phi\\left(\n",
-        "\\left[\\mathbf{X}_{(t)} \\quad \\mathbf{Y}_{(t-1)} \\right]\n",
-        " \\cdot \\mathbf{W} + \\mathbf{b} \\right) \\text{ with } \\mathbf{W}=\n",
-        "\\left[ \\begin{matrix}\n",
-        "  \\mathbf{W}_x\\\\\n",
-        "  \\mathbf{W}_y\n",
-        "\\end{matrix} \\right]\n",
-        "\\end{split}\n",
-        "$\n",
-        "\n",
-        "**In the text page 391:**\n",
-        "\n",
-        "Just like in regular backpropagation, there is a first forward pass through the unrolled network (represented by the dashed arrows); then the output sequence is evaluated using a cost function $ C(\\mathbf{Y}_{(t_\\text{min})}, \\mathbf{Y}_{(t_\\text{min}+1)}, \\dots, \\mathbf{Y}_{(t_\\text{max})}) $ (where $t_\\text{min}$ and $t_\\text{max}$ are the first and last output time steps, not counting the ignored outputs)[...]\n",
-        "\n",
-        "\n",
-        "**Equation 14-3: LSTM computations**\n",
-        "\n",
-        "$\n",
-        "\\begin{split}\n",
-        "\\mathbf{i}_{(t)}&=\\sigma({\\mathbf{W}_{xi}}^T \\cdot \\mathbf{x}_{(t)} + {\\mathbf{W}_{hi}}^T \\cdot \\mathbf{h}_{(t-1)} + \\mathbf{b}_i)\\\\\n",
-        "\\mathbf{f}_{(t)}&=\\sigma({\\mathbf{W}_{xf}}^T \\cdot \\mathbf{x}_{(t)} + {\\mathbf{W}_{hf}}^T \\cdot \\mathbf{h}_{(t-1)} + \\mathbf{b}_f)\\\\\n",
-        "\\mathbf{o}_{(t)}&=\\sigma({\\mathbf{W}_{xo}}^T \\cdot \\mathbf{x}_{(t)} + {\\mathbf{W}_{ho}}^T \\cdot \\mathbf{h}_{(t-1)} + \\mathbf{b}_o)\\\\\n",
-        "\\mathbf{g}_{(t)}&=\\operatorname{tanh}({\\mathbf{W}_{xg}}^T \\cdot \\mathbf{x}_{(t)} + {\\mathbf{W}_{hg}}^T \\cdot \\mathbf{h}_{(t-1)} + \\mathbf{b}_g)\\\\\n",
-        "\\mathbf{c}_{(t)}&=\\mathbf{f}_{(t)} \\otimes \\mathbf{c}_{(t-1)} \\, + \\, \\mathbf{i}_{(t)} \\otimes \\mathbf{g}_{(t)}\\\\\n",
-        "\\mathbf{y}_{(t)}&=\\mathbf{h}_{(t)} = \\mathbf{o}_{(t)} \\otimes \\operatorname{tanh}(\\mathbf{c}_{(t)})\n",
-        "\\end{split}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 14-4: GRU computations**\n",
-        "\n",
-        "$\n",
-        "\\begin{split}\n",
-        "\\mathbf{z}_{(t)}&=\\sigma({\\mathbf{W}_{xz}}^T \\cdot \\mathbf{x}_{(t)} + {\\mathbf{W}_{hz}}^T \\cdot \\mathbf{h}_{(t-1)}) \\\\\n",
-        "\\mathbf{r}_{(t)}&=\\sigma({\\mathbf{W}_{xr}}^T \\cdot \\mathbf{x}_{(t)} + {\\mathbf{W}_{hr}}^T \\cdot \\mathbf{h}_{(t-1)}) \\\\\n",
-        "\\mathbf{g}_{(t)}&=\\operatorname{tanh}\\left({\\mathbf{W}_{xg}}^T \\cdot \\mathbf{x}_{(t)} + {\\mathbf{W}_{hg}}^T \\cdot (\\mathbf{r}_{(t)} \\otimes \\mathbf{h}_{(t-1)})\\right) \\\\\n",
-        "\\mathbf{h}_{(t)}&=(1-\\mathbf{z}_{(t)}) \\otimes \\mathbf{h}_{(t-1)} + \\mathbf{z}_{(t)} \\otimes \\mathbf{g}_{(t)}\n",
-        "\\end{split}\n",
-        "$\n",
-        "\n",
-        "\n"
-      ]
-    },
-    {
-      "metadata": {
-        "id": "5IiIkIG_n1Y5",
-        "colab_type": "text"
-      },
-      "cell_type": "markdown",
-      "source": [
-        "# Chapter 15\n",
-        "\n",
-        "**Equation 15-1: Kullback–Leibler divergence**\n",
-        "\n",
-        "$\n",
-        "D_{\\mathrm{KL}}(P\\|Q) = \\sum\\limits_{i} P(i) \\log \\dfrac{P(i)}{Q(i)}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation: KL divergence between the target sparsity _p_ and the actual sparsity _q_**\n",
-        "\n",
-        "$\n",
-        "D_{\\mathrm{KL}}(p\\|q) = p \\, \\log \\dfrac{p}{q} + (1-p) \\log \\dfrac{1-p}{1-q}\n",
-        "$\n",
-        "\n",
-        "**In the text page 433:**\n",
-        "\n",
-        "One common variant is to train the encoder to output $\\gamma = \\log\\left(\\sigma^2\\right)$ rather than $\\sigma$.\n",
-        "Wherever we need $\\sigma$ we can just compute $ \\sigma = \\exp\\left(\\dfrac{\\gamma}{2}\\right) $.\n",
-        "\n",
-        "\n"
-      ]
-    },
-    {
-      "metadata": {
-        "id": "wVr9eBb1n1Y6",
-        "colab_type": "text"
-      },
-      "cell_type": "markdown",
-      "source": [
-        "# Chapter 16\n",
-        "\n",
-        "**Equation 16-1: Bellman Optimality Equation**\n",
-        "\n",
-        "$\n",
-        "V^*(s) = \\underset{a}{\\max}\\sum\\limits_{s'}{T(s, a, s') [R(s, a, s') + \\gamma . V^*(s')]} \\quad \\text{for all }s\n",
-        "$\n",
-        "\n",
-        "**Equation 16-2: Value Iteration algorithm**\n",
-        "\n",
-        "$\n",
-        "  V_{k+1}(s) \\gets \\underset{a}{\\max}\\sum\\limits_{s'}{T(s, a, s') [R(s, a, s') + \\gamma . V_k(s')]} \\quad \\text{for all }s\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 16-3: Q-Value Iteration algorithm**\n",
-        "\n",
-        "$\n",
-        "  Q_{k+1}(s, a) \\gets \\sum\\limits_{s'}{T(s, a, s') [R(s, a, s') + \\gamma . \\underset{a'}{\\max}\\,{Q_k(s',a')}]} \\quad \\text{for all } (s,a)\n",
-        "$\n",
-        "\n",
-        "**In the text page 458:**\n",
-        "\n",
-        "Once you have the optimal Q-Values, defining the optimal policy, noted $\\pi^{*}(s)$, is trivial: when the agent is in state $s$, it should choose the action with the highest Q-Value for that state: $ \\pi^{*}(s) = \\underset{a}{\\operatorname{argmax}} \\, Q^*(s, a) $.\n",
-        "\n",
-        "\n",
-        "**Equation 16-4: TD Learning algorithm**\n",
-        "\n",
-        "$\n",
-        "V_{k+1}(s) \\gets (1-\\alpha)V_k(s) + \\alpha\\left(r + \\gamma . V_k(s')\\right)\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 16-5: Q-Learning algorithm**\n",
-        "\n",
-        "$\n",
-        "Q_{k+1}(s, a) \\gets (1-\\alpha)Q_k(s,a) + \\alpha\\left(r + \\gamma . \\underset{a'}{\\max} \\, Q_k(s', a')\\right)\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 16-6: Q-Learning using an exploration function**\n",
-        "\n",
-        "$\n",
-        "  Q(s, a) \\gets (1-\\alpha)Q(s,a) + \\alpha\\left(r + \\gamma . \\underset{\\alpha'}{\\max}f(Q(s', a'), N(s', a'))\\right)\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation 16-7: Deep Q-Learning cost function**\n",
-        "\n",
-        "$\n",
-        "\\begin{split}\n",
-        "& J(\\mathbf{\\theta}_\\text{critic}) = \\dfrac{1}{m}\\sum\\limits_{i=1}^m\\left(y^{(i)} - Q(s^{(i)},a^{(i)},\\mathbf{\\theta}_\\text{critic})\\right)^2 \\\\\n",
-        "& \\text{with } y^{(i)} = r^{(i)} + \\gamma . \\underset{a'}{\\max}Q(s'^{(i)},a',\\mathbf{\\theta}_\\text{actor})\n",
-        "\\end{split}\n",
-        "$\n"
-      ]
-    },
-    {
-      "metadata": {
-        "id": "1U0nBdBvn1Y7",
-        "colab_type": "text"
-      },
-      "cell_type": "markdown",
-      "source": [
-        "# Appendix A\n",
-        "\n",
-        "Equations that appear in the text:\n",
-        "\n",
-        "$\n",
-        "\\mathbf{H} =\n",
-        "\\begin{pmatrix}\n",
-        "\\mathbf{H'} & 0 & \\cdots\\\\\n",
-        "0 & 0 & \\\\\n",
-        "\\vdots & & \\ddots\n",
-        "\\end{pmatrix}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "$\n",
-        "\\mathbf{A} =\n",
-        "\\begin{pmatrix}\n",
-        "\\mathbf{A'} & \\mathbf{I}_m \\\\\n",
-        "\\mathbf{0} & -\\mathbf{I}_m\n",
-        "\\end{pmatrix}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "$ 1 - \\frac{1}{5}^2 - \\frac{4}{5}^2 $\n",
-        "\n",
-        "\n",
-        "$ 1 - \\frac{1}{2}^2 - \\frac{1}{2}^2  $\n",
-        "\n",
-        "\n",
-        "$ \\frac{2}{5} \\times $\n",
-        "\n",
-        "\n",
-        "$ \\frac{3}{5} \\times 0 $"
-      ]
-    },
-    {
-      "metadata": {
-        "id": "dphwGCobn1Y7",
-        "colab_type": "text"
-      },
-      "cell_type": "markdown",
-      "source": [
-        "# Appendix C"
-      ]
-    },
-    {
-      "metadata": {
-        "id": "vH7f8-Min1Y7",
-        "colab_type": "text"
-      },
-      "cell_type": "markdown",
-      "source": [
-        "Equations that appear in the text:\n",
-        "\n",
-        "$ (\\hat{x}, \\hat{y}) $\n",
-        "\n",
-        "\n",
-        "$ \\hat{\\alpha} $\n",
-        "\n",
-        "\n",
-        "$ (\\hat{x}, \\hat{y}, \\hat{\\alpha}) $\n",
-        "\n",
-        "\n",
-        "$\n",
-        "\\begin{cases}\n",
-        "\\frac{\\partial}{\\partial x}g(x, y, \\alpha) = 2x - 3\\alpha\\\\\n",
-        "\\frac{\\partial}{\\partial y}g(x, y, \\alpha) = 2 - 2\\alpha\\\\\n",
-        "\\frac{\\partial}{\\partial \\alpha}g(x, y, \\alpha) = -3x - 2y - 1\\\\\n",
-        "\\end{cases}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "$ 2\\hat{x} - 3\\hat{\\alpha} = 2 - 2\\hat{\\alpha} = -3\\hat{x} - 2\\hat{y} - 1 = 0 $\n",
-        "\n",
-        "\n",
-        "$ \\hat{x} = \\frac{3}{2} $\n",
-        "\n",
-        "\n",
-        "$ \\hat{y} = -\\frac{11}{4} $\n",
-        "\n",
-        "\n",
-        "$ \\hat{\\alpha} = 1 $\n",
-        "\n",
-        "\n",
-        "**Equation C-1: Generalized Lagrangian for the hard margin problem**\n",
-        "\n",
-        "$\n",
-        "\\begin{split}\n",
-        "\\mathcal{L}(\\mathbf{w}, b, \\mathbf{\\alpha}) = \\frac{1}{2}\\mathbf{w}^T \\cdot \\mathbf{w} - \\sum\\limits_{i=1}^{m}{\\alpha^{(i)} \\left(t^{(i)}(\\mathbf{w}^T \\cdot \\mathbf{x}^{(i)} + b) - 1\\right)} \\\\\n",
-        "\\text{with}\\quad \\alpha^{(i)} \\ge 0 \\quad \\text{for }i = 1, 2, \\dots, m\n",
-        "\\end{split}\n",
-        "$\n",
-        "\n",
-        "**More equations in the text:**\n",
-        "\n",
-        "$ (\\hat{\\mathbf{w}}, \\hat{b}, \\hat{\\mathbf{\\alpha}}) $\n",
-        "\n",
-        "\n",
-        "$ t^{(i)}((\\hat{\\mathbf{w}})^T \\cdot \\mathbf{x}^{(i)} + \\hat{b}) \\ge 1 \\quad \\text{for } i = 1, 2, \\dots, m $\n",
-        "\n",
-        "\n",
-        "$ {\\hat{\\alpha}}^{(i)} \\ge 0 \\quad \\text{for } i = 1, 2, \\dots, m $\n",
-        "\n",
-        "\n",
-        "$ {\\hat{\\alpha}}^{(i)} = 0 $\n",
-        "\n",
-        "\n",
-        "$ t^{(i)}((\\hat{\\mathbf{w}})^T \\cdot \\mathbf{x}^{(i)} + \\hat{b}) = 1 $\n",
-        "\n",
-        "\n",
-        "$ {\\hat{\\alpha}}^{(i)} = 0 $\n",
-        "\n",
-        "\n",
-        "**Equation C-2: Partial derivatives of the generalized Lagrangian**\n",
-        "\n",
-        "$\n",
-        "\\begin{split}\n",
-        "\\nabla_{\\mathbf{w}}\\mathcal{L}(\\mathbf{w}, b, \\mathbf{\\alpha}) = \\mathbf{w} - \\sum\\limits_{i=1}^{m}\\alpha^{(i)}t^{(i)}\\mathbf{x}^{(i)}\\\\\n",
-        "\\dfrac{\\partial}{\\partial b}\\mathcal{L}(\\mathbf{w}, b, \\mathbf{\\alpha}) = -\\sum\\limits_{i=1}^{m}\\alpha^{(i)}t^{(i)}\n",
-        "\\end{split}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation C-3: Properties of the stationary points**\n",
-        "\n",
-        "$\n",
-        "\\begin{split}\n",
-        "\\hat{\\mathbf{w}} = \\sum_{i=1}^{m}{\\hat{\\alpha}}^{(i)}t^{(i)}\\mathbf{x}^{(i)}\\\\\n",
-        "\\sum_{i=1}^{m}{\\hat{\\alpha}}^{(i)}t^{(i)} = 0\n",
-        "\\end{split}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation C-4: Dual form of the SVM problem**\n",
-        "\n",
-        "$\n",
-        "\\begin{split}\n",
-        "\\mathcal{L}(\\hat{\\mathbf{w}}, \\hat{b}, \\mathbf{\\alpha}) = \\dfrac{1}{2}\\sum\\limits_{i=1}^{m}{\n",
-        "  \\sum\\limits_{j=1}^{m}{\n",
-        "  \\alpha^{(i)} \\alpha^{(j)} t^{(i)} t^{(j)} {\\mathbf{x}^{(i)}}^T \\cdot \\mathbf{x}^{(j)}\n",
-        "  }\n",
-        "} \\quad - \\quad \\sum\\limits_{i=1}^{m}{\\alpha^{(i)}}\\\\\n",
-        "\\text{with}\\quad \\alpha^{(i)} \\ge 0 \\quad \\text{for }i = 1, 2, \\dots, m\n",
-        "\\end{split}\n",
-        "$\n",
-        "\n",
-        "**Some more equations in the text:**\n",
-        "\n",
-        "$ \\hat{\\mathbf{\\alpha}} $\n",
-        "\n",
-        "\n",
-        "$ {\\hat{\\alpha}}^{(i)} \\ge 0 $\n",
-        "\n",
-        "\n",
-        "$ \\hat{\\mathbf{\\alpha}} $\n",
-        "\n",
-        "\n",
-        "$ \\hat{\\mathbf{w}} $\n",
-        "\n",
-        "\n",
-        "$ \\hat{b} $\n",
-        "\n",
-        "\n",
-        "$ \\hat{b} = 1 - t^{(k)}({\\hat{\\mathbf{w}}}^T \\cdot \\mathbf{x}^{(k)}) $\n",
-        "\n",
-        "\n",
-        "**Equation C-5: Bias term estimation using the dual form**\n",
-        "\n",
-        "$\n",
-        "\\hat{b} = \\dfrac{1}{n_s}\\sum\\limits_{\\scriptstyle i=1 \\atop {\\scriptstyle {\\hat{\\alpha}}^{(i)} > 0}}^{m}{\\left[1 - t^{(i)}({\\hat{\\mathbf{w}}}^T \\cdot \\mathbf{x}^{(i)})\\right]}\n",
-        "$"
-      ]
-    },
-    {
-      "metadata": {
-        "id": "r57A4yyKn1Y8",
-        "colab_type": "text"
-      },
-      "cell_type": "markdown",
-      "source": [
-        "# Appendix D"
-      ]
-    },
-    {
-      "metadata": {
-        "id": "KwAsMn0Rn1Y9",
-        "colab_type": "text"
-      },
-      "cell_type": "markdown",
-      "source": [
-        "**Equation D-1: Partial derivatives of $f(x,y)$**\n",
-        "\n",
-        "$\n",
-        "\\begin{split}\n",
-        "\\dfrac{\\partial f}{\\partial x} & = \\dfrac{\\partial(x^2y)}{\\partial x} + \\dfrac{\\partial y}{\\partial x} + \\dfrac{\\partial 2}{\\partial x} = y \\dfrac{\\partial(x^2)}{\\partial x} + 0 + 0 = 2xy \\\\\n",
-        "\\dfrac{\\partial f}{\\partial y} & = \\dfrac{\\partial(x^2y)}{\\partial y} + \\dfrac{\\partial y}{\\partial y} + \\dfrac{\\partial 2}{\\partial y} = x^2 + 1 + 0 = x^2 + 1 \\\\\n",
-        "\\end{split}\n",
-        "$\n",
-        "\n",
-        "**In the text:**\n",
-        "\n",
-        "$ \\frac{\\partial g}{\\partial x} = 0 + (0 \\times x + y \\times 1) = y $\n",
-        "\n",
-        "\n",
-        "$ \\frac{\\partial x}{\\partial x} = 1 $\n",
-        "\n",
-        "\n",
-        "$ \\frac{\\partial y}{\\partial x} = 0 $\n",
-        "\n",
-        "\n",
-        "$ \\frac{\\partial (u \\times v)}{\\partial x} = \\frac{\\partial v}{\\partial x} \\times u + \\frac{\\partial u}{\\partial x} \\times u  $\n",
-        "\n",
-        "\n",
-        "$ \\frac{\\partial g}{\\partial x} = 0 + (0 \\times x + y \\times 1)  $\n",
-        "\n",
-        "\n",
-        "$ \\frac{\\partial g}{\\partial x} = y $\n",
-        "\n",
-        "\n",
-        "**Equation D-2: Derivative of a function _h_(_x_) at point _x_~0~**\n",
-        "\n",
-        "$\n",
-        "\\begin{split}\n",
-        "h'(x) & = \\underset{\\textstyle x \\to x_0}{\\lim}\\dfrac{h(x) - h(x_0)}{x - x_0}\\\\\n",
-        "      & = \\underset{\\textstyle \\epsilon \\to 0}{\\lim}\\dfrac{h(x_0 + \\epsilon) - h(x_0)}{\\epsilon}\n",
-        "\\end{split}\n",
-        "$\n",
-        "\n",
-        "\n",
-        "**Equation D-3: A few operations with dual numbers**\n",
-        "\n",
-        "$\n",
-        "\\begin{split}\n",
-        "&\\lambda(a + b\\epsilon) = \\lambda a + \\lambda b \\epsilon\\\\\n",
-        "&(a + b\\epsilon) + (c + d\\epsilon) = (a + c) + (b + d)\\epsilon \\\\\n",
-        "&(a + b\\epsilon) \\times (c + d\\epsilon) = ac + (ad + bc)\\epsilon + (bd)\\epsilon^2 = ac + (ad + bc)\\epsilon\\\\\n",
-        "\\end{split}\n",
-        "$\n",
-        "\n",
-        "**In the text:**\n",
-        "\n",
-        "$ \\frac{\\partial f}{\\partial x}(3, 4) $\n",
-        "\n",
-        "\n",
-        "$ \\frac{\\partial f}{\\partial y}(3, 4) $\n",
-        "\n",
-        "\n",
-        "**Equation D-4: Chain rule**\n",
-        "\n",
-        "$\n",
-        "\\dfrac{\\partial f}{\\partial x} = \\dfrac{\\partial f}{\\partial n_i} \\times \\dfrac{\\partial n_i}{\\partial x}\n",
-        "$\n",
-        "\n",
-        "**In the text:**\n",
-        "\n",
-        "$ \\frac{\\partial f}{\\partial n_7} = 1 $\n",
-        "\n",
-        "\n",
-        "$ \\frac{\\partial f}{\\partial n_5} = \\frac{\\partial f}{\\partial n_7} \\times \\frac{\\partial n_7}{\\partial n_5} $\n",
-        "\n",
-        "\n",
-        "$ \\frac{\\partial f}{\\partial n_7} = 1 $\n",
-        "\n",
-        "\n",
-        "$ \\frac{\\partial n_7}{\\partial n_5} $\n",
-        "\n",
-        "\n",
-        "$ \\frac{\\partial n_7}{\\partial n_5} = 1 $\n",
-        "\n",
-        "\n",
-        "$ \\frac{\\partial f}{\\partial n_5} = 1 \\times 1 = 1 $\n",
-        "\n",
-        "\n",
-        "$ \\frac{\\partial f}{\\partial n_4} = \\frac{\\partial f}{\\partial n_5} \\times \\frac{\\partial n_5}{\\partial n_4} $\n",
-        "\n",
-        "\n",
-        "$ \\frac{\\partial n_5}{\\partial n_4} = n_2 $\n",
-        "\n",
-        "\n",
-        "$ \\frac{\\partial f}{\\partial n_4} = 1 \\times n_2 = 4 $\n",
-        "\n",
-        "\n",
-        "$ \\frac{\\partial f}{\\partial x} = 24 $\n",
-        "\n",
-        "\n",
-        "$ \\frac{\\partial f}{\\partial y} = 10 $"
-      ]
-    },
-    {
-      "metadata": {
-        "id": "Nn7LHmpCn1Y-",
-        "colab_type": "text"
-      },
-      "cell_type": "markdown",
-      "source": [
-        "# Appendix E"
-      ]
-    },
-    {
-      "metadata": {
-        "id": "XOtBQA4-n1Y-",
-        "colab_type": "text"
-      },
-      "cell_type": "markdown",
-      "source": [
-        "**Equation E-1: Probability that the i^th^ neuron will output 1**\n",
-        "\n",
-        "$\n",
-        "p\\left(s_i^{(\\text{next step})} = 1\\right) \\, = \\, \\sigma\\left(\\frac{\\textstyle \\sum\\limits_{j = 1}^N{w_{i,j}s_j + b_i}}{\\textstyle T}\\right)\n",
-        "$\n",
-        "\n",
-        "**In the text:**\n",
-        "\n",
-        "$ \\dot{\\mathbf{x}} $\n",
-        "\n",
-        "\n",
-        "$ \\dot{\\mathbf{h}} $\n",
-        "\n",
-        "\n",
-        "**Equation E-2: Contrastive divergence weight update**\n",
-        "\n",
-        "$\n",
-        "w_{i,j}^{(\\text{next step})} = w_{i,j} + \\eta(\\mathbf{x}\\mathbf{h}^T - \\dot{\\mathbf{x}} \\dot {\\mathbf{h}}^T)\n",
-        "$"
-      ]
-    },
-    {
-      "metadata": {
-        "id": "el71w1NKn1Y-",
-        "colab_type": "text"
-      },
-      "cell_type": "markdown",
-      "source": [
-        "# Glossary\n",
-        "\n",
-        "In the text:\n",
-        "\n",
-        "$\\ell _1$\n",
-        "\n",
-        "\n",
-        "$\\ell _2$\n",
-        "\n",
-        "\n",
-        "$\\ell _k$\n",
-        "\n",
-        "\n",
-        "$ \\chi^2 $\n"
-      ]
-    },
-    {
-      "metadata": {
-        "id": "2Bo7NhHLn1ZA",
-        "colab_type": "text"
-      },
-      "cell_type": "markdown",
-      "source": [
-        "Just in case your eyes hurt after all these equations, let's finish with the single most beautiful equation in the world. No, it's not $E = mc²$, it's obviously Euler's identity:"
-      ]
-    },
-    {
-      "metadata": {
-        "id": "s31L5v1mn1ZB",
-        "colab_type": "text"
-      },
-      "cell_type": "markdown",
-      "source": [
-        "$e^{i\\pi}+1=0$"
-      ]
-    },
-    {
-      "metadata": {
-        "id": "i3tXVx1zn1ZB",
-        "colab_type": "code",
-        "colab": {}
-      },
-      "cell_type": "code",
-      "source": [
-        ""
-      ],
-      "execution_count": 0,
-      "outputs": []
-    }
-  ]
-}
\ No newline at end of file
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "colab_type": "text",
+    "id": "A_lqPMDMn1Yx"
+   },
+   "source": [
+    "# 1장\n",
+    "**식 1-1: 간단한 선형 모델**\n",
+    "\n",
+    "$\n",
+    "\\text{삶의_만족도} = \\theta_0 + \\theta_1 \\times \\text{1인당_GDP}\n",
+    "$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "colab_type": "text",
+    "id": "-ae0t1eRn1Yx"
+   },
+   "source": [
+    "# 2장\n",
+    "**식 2-1: 평균 제곱근 오차 (RMSE)**\n",
+    "\n",
+    "$\n",
+    "\\text{RMSE}(\\mathbf{X}, h) = \\sqrt{\\frac{1}{m}\\sum\\limits_{i=1}^{m}\\left(h(\\mathbf{x}^{(i)}) - y^{(i)}\\right)^2}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**표기법 (72 페이지):**\n",
+    "\n",
+    "$\n",
+    "  \\mathbf{x}^{(1)} = \\begin{pmatrix}\n",
+    "  -118.29 \\\\\n",
+    "  33.91 \\\\\n",
+    "  1{,}416 \\\\\n",
+    "  38{,}372\n",
+    "  \\end{pmatrix}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "$\n",
+    "  y^{(1)}=156{,}400\n",
+    "$\n",
+    "\n",
+    "\n",
+    "$\n",
+    "  \\mathbf{X} = \\begin{pmatrix}\n",
+    "  (\\mathbf{x}^{(1)})^T \\\\\n",
+    "  (\\mathbf{x}^{(2)})^T\\\\\n",
+    "  \\vdots \\\\\n",
+    "  (\\mathbf{x}^{(1999)})^T \\\\\n",
+    "  (\\mathbf{x}^{(2000)})^T\n",
+    "  \\end{pmatrix} = \\begin{pmatrix}\n",
+    "  -118.29 & 33.91 & 1{,}416 & 38{,}372 \\\\\n",
+    "  \\vdots & \\vdots & \\vdots & \\vdots \\\\\n",
+    "  \\end{pmatrix}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**식 2-2: 평균 절대 오차**\n",
+    "\n",
+    "$\n",
+    "\\text{MAE}(\\mathbf{X}, h) = \\frac{1}{m}\\sum\\limits_{i=1}^{m}\\left| h(\\mathbf{x}^{(i)}) - y^{(i)} \\right|\n",
+    "$\n",
+    "\n",
+    "**$\\ell_k$ 노름 (74 페이지):**\n",
+    "\n",
+    "$ \\left\\| \\mathbf{v} \\right\\| _k = (\\left| v_0 \\right|^k + \\left| v_1 \\right|^k + \\dots + \\left| v_n \\right|^k)^{\\frac{1}{k}} $\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "colab_type": "text",
+    "id": "8fBBfiofn1Yy"
+   },
+   "source": [
+    "# 3장\n",
+    "**식 3-1: 정밀도**\n",
+    "\n",
+    "$\n",
+    "\\text{정밀도} = \\cfrac{TP}{TP + FP}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**식 3-2: 재현율**\n",
+    "\n",
+    "$\n",
+    "\\text{재현율} = \\cfrac{TP}{TP + FN}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**식 3-3: $F_1$ 점수**\n",
+    "\n",
+    "$\n",
+    "F_1 = \\cfrac{2}{\\cfrac{1}{\\text{정밀도}} + \\cfrac{1}{\\text{재현율}}} = 2 \\times \\cfrac{\\text{정밀도}\\, \\times \\, \\text{재현율}}{\\text{정밀도}\\, + \\, \\text{재현율}} = \\cfrac{TP}{TP + \\cfrac{FN + FP}{2}}\n",
+    "$\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "colab_type": "text",
+    "id": "YTYQZMTjn1Yy"
+   },
+   "source": [
+    "# 4장\n",
+    "**식 4-1: 선형 회귀 모델의 예측**\n",
+    "\n",
+    "$\n",
+    "\\hat{y} = \\theta_0 + \\theta_1 x_1 + \\theta_2 x_2 + \\dots + \\theta_n x_n\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**식 4-2: 선형 회귀 모델의 예측 (벡터 형태)**\n",
+    "\n",
+    "$\n",
+    "\\hat{y} = h_{\\mathbf{\\theta}}(\\mathbf{x}) = \\mathbf{\\theta}^T \\cdot \\mathbf{x}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**식 4-3: 선형 회귀 모델의 MSE 비용 함수**\n",
+    "\n",
+    "$\n",
+    "\\text{MSE}(\\mathbf{X}, h_{\\mathbf{\\theta}}) = \\dfrac{1}{m} \\sum\\limits_{i=1}^{m}{(\\mathbf{\\theta}^T \\cdot \\mathbf{x}^{(i)} - y^{(i)})^2}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**식 4-4: 정규 방정식**\n",
+    "\n",
+    "$\n",
+    "\\hat{\\mathbf{\\theta}} = (\\mathbf{X}^T \\cdot \\mathbf{X})^{-1} \\cdot \\mathbf{X}^T \\cdot \\mathbf{y}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**편도함수 기호 (165 페이지):**\n",
+    "\n",
+    "$\\frac{\\partial}{\\partial \\theta_j} \\text{MSE}(\\mathbf{\\theta})$\n",
+    "\n",
+    "\n",
+    "**식 4-5: 비용 함수의 편도함수**\n",
+    "\n",
+    "$\n",
+    "\\dfrac{\\partial}{\\partial \\theta_j} \\text{MSE}(\\mathbf{\\theta}) = \\dfrac{2}{m}\\sum\\limits_{i=1}^{m}(\\mathbf{\\theta}^T \\cdot \\mathbf{x}^{(i)} - y^{(i)})\\, x_j^{(i)}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**식 4-6: 비용 함수의 그래디언트 벡터**\n",
+    "\n",
+    "$\n",
+    "\\nabla_{\\mathbf{\\theta}}\\, \\text{MSE}(\\mathbf{\\theta}) =\n",
+    "\\begin{pmatrix}\n",
+    " \\frac{\\partial}{\\partial \\theta_0} \\text{MSE}(\\mathbf{\\theta}) \\\\\n",
+    " \\frac{\\partial}{\\partial \\theta_1} \\text{MSE}(\\mathbf{\\theta}) \\\\\n",
+    " \\vdots \\\\\n",
+    " \\frac{\\partial}{\\partial \\theta_n} \\text{MSE}(\\mathbf{\\theta})\n",
+    "\\end{pmatrix}\n",
+    " = \\dfrac{2}{m} \\mathbf{X}^T \\cdot (\\mathbf{X} \\cdot \\mathbf{\\theta} - \\mathbf{y})\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**식 4-7: 경사 하강법의 스텝**\n",
+    "\n",
+    "$\n",
+    "\\mathbf{\\theta}^{(\\text{다음 스텝})}\\,\\,\\, = \\mathbf{\\theta} - \\eta \\nabla_{\\mathbf{\\theta}}\\, \\text{MSE}(\\mathbf{\\theta})\n",
+    "$\n",
+    "\n",
+    "\n",
+    "$ O(\\frac{1}{\\epsilon}) $\n",
+    "\n",
+    "\n",
+    "$ \\hat{y} = 0.56 x_1^2 + 0.93 x_1 + 1.78 $\n",
+    "\n",
+    "\n",
+    "$ y = 0.5 x_1^2 + 1.0 x_1 + 2.0 + \\text{가우시안 잡음} $\n",
+    "\n",
+    "\n",
+    "$ \\dfrac{(n+d)!}{d!\\,n!} $\n",
+    "\n",
+    "\n",
+    "$ \\alpha \\sum_{i=1}^{n}{\\theta_i^2}$\n",
+    "\n",
+    "\n",
+    "**식 4-8: 릿지 회귀의 비용 함수**\n",
+    "\n",
+    "$\n",
+    "J(\\mathbf{\\theta}) = \\text{MSE}(\\mathbf{\\theta}) + \\alpha \\dfrac{1}{2}\\sum\\limits_{i=1}^{n}\\theta_i^2\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**식 4-9: 릿지 회귀의 정규 방정식**\n",
+    "\n",
+    "$\n",
+    "\\hat{\\mathbf{\\theta}} = (\\mathbf{X}^T \\cdot \\mathbf{X} + \\alpha \\mathbf{A})^{-1} \\cdot \\mathbf{X}^T \\cdot \\mathbf{y}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**식 4-10: 라쏘 회귀의 비용 함수**\n",
+    "\n",
+    "$\n",
+    "J(\\mathbf{\\theta}) = \\text{MSE}(\\mathbf{\\theta}) + \\alpha \\sum\\limits_{i=1}^{n}\\left| \\theta_i \\right|\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**식 4-11: 라쏘 회귀의 서브그래디언트 벡터**\n",
+    "\n",
+    "$\n",
+    "g(\\mathbf{\\theta}, J) = \\nabla_{\\mathbf{\\theta}}\\, \\text{MSE}(\\mathbf{\\theta}) + \\alpha\n",
+    "\\begin{pmatrix}\n",
+    "  \\operatorname{sign}(\\theta_1) \\\\\n",
+    "  \\operatorname{sign}(\\theta_2) \\\\\n",
+    "  \\vdots \\\\\n",
+    "  \\operatorname{sign}(\\theta_n) \\\\\n",
+    "\\end{pmatrix} \\quad \\text{여기서 } \\operatorname{sign}(\\theta_i) =\n",
+    "\\begin{cases}\n",
+    "-1 & \\theta_i < 0 \\text{일 때 } \\\\\n",
+    "0 & \\theta_i = 0 \\text{일 때 } \\\\\n",
+    "+1 & \\theta_i > 0 \\text{일 때 }\n",
+    "\\end{cases}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**식 4-12: 엘라스틱넷 비용 함수**\n",
+    "\n",
+    "$\n",
+    "J(\\mathbf{\\theta}) = \\text{MSE}(\\mathbf{\\theta}) + r \\alpha \\sum\\limits_{i=1}^{n}\\left| \\theta_i \\right| + \\dfrac{1 - r}{2} \\alpha \\sum\\limits_{i=1}^{n}{\\theta_i^2}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**식 4-13: 로지스틱 회귀 모델의 확률 추정(벡터 표현식)**\n",
+    "\n",
+    "$\n",
+    "\\hat{p} = h_{\\mathbf{\\theta}}(\\mathbf{x}) = \\sigma(\\mathbf{\\theta}^T \\cdot \\mathbf{x})\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**Equation 4-14: Logistic function**\n",
+    "\n",
+    "$\n",
+    "\\sigma(t) = \\dfrac{1}{1 + \\exp(-t)}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**Equation 4-15: Logistic Regression model prediction**\n",
+    "\n",
+    "$\n",
+    "\\hat{y} =\n",
+    "\\begin{cases}\n",
+    "  0 & \\text{if } \\hat{p} < 0.5 \\\\\n",
+    "  1 & \\text{if } \\hat{p} \\geq 0.5\n",
+    "\\end{cases}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**Equation 4-16: Cost function of a single training instance**\n",
+    "\n",
+    "$\n",
+    "c(\\mathbf{\\theta}) =\n",
+    "\\begin{cases}\n",
+    "  -\\log(\\hat{p}) & \\text{if } y = 1 \\\\\n",
+    "  -\\log(1 - \\hat{p}) & \\text{if } y = 0\n",
+    "\\end{cases}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**Equation 4-17: Logistic Regression cost function (log loss)**\n",
+    "\n",
+    "$\n",
+    "J(\\mathbf{\\theta}) = -\\dfrac{1}{m} \\sum\\limits_{i=1}^{m}{\\left[ y^{(i)} \\log\\left(\\hat{p}^{(i)}\\right) + (1 - y^{(i)}) \\log\\left(1 - \\hat{p}^{(i)}\\right)\\right]}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**Equation 4-18: Logistic cost function partial derivatives**\n",
+    "\n",
+    "$\n",
+    "\\dfrac{\\partial}{\\partial \\theta_j} J(\\mathbf{\\theta}) = \\dfrac{1}{m}\\sum\\limits_{i=1}^{m}\\left(\\sigma(\\mathbf{\\theta}^T \\cdot \\mathbf{x}^{(i)}) - y^{(i)}\\right)\\, x_j^{(i)}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**Equation 4-19: Softmax score for class k**\n",
+    "\n",
+    "$\n",
+    "s_k(\\mathbf{x}) = ({\\mathbf{\\theta}^{(k)}})^T \\cdot \\mathbf{x}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**Equation 4-20: Softmax function**\n",
+    "\n",
+    "$\n",
+    "\\hat{p}_k = \\sigma\\left(\\mathbf{s}(\\mathbf{x})\\right)_k = \\dfrac{\\exp\\left(s_k(\\mathbf{x})\\right)}{\\sum\\limits_{j=1}^{K}{\\exp\\left(s_j(\\mathbf{x})\\right)}}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**Equation 4-21: Softmax Regression classifier prediction**\n",
+    "\n",
+    "$\n",
+    "\\hat{y} = \\underset{k}{\\operatorname{argmax}} \\, \\sigma\\left(\\mathbf{s}(\\mathbf{x})\\right)_k = \\underset{k}{\\operatorname{argmax}} \\, s_k(\\mathbf{x}) = \\underset{k}{\\operatorname{argmax}} \\, \\left( ({\\mathbf{\\theta}^{(k)}})^T \\cdot \\mathbf{x} \\right)\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**Equation 4-22: Cross entropy cost function**\n",
+    "\n",
+    "$\n",
+    "J(\\mathbf{\\Theta}) = - \\dfrac{1}{m}\\sum\\limits_{i=1}^{m}\\sum\\limits_{k=1}^{K}{y_k^{(i)}\\log\\left(\\hat{p}_k^{(i)}\\right)}\n",
+    "$\n",
+    "\n",
+    "**Cross entropy between two probability distributions $p$ and $q$ (page 196):**\n",
+    "$ H(p, q) = -\\sum\\limits_{x}p(x) \\log q(x) $\n",
+    "\n",
+    "\n",
+    "**Equation 4-23: Cross entropy gradient vector for class k**\n",
+    "\n",
+    "$\n",
+    "\\nabla_{\\mathbf{\\theta}^{(k)}} \\, J(\\mathbf{\\Theta}) = \\dfrac{1}{m} \\sum\\limits_{i=1}^{m}{ \\left ( \\hat{p}^{(i)}_k - y_k^{(i)} \\right ) \\mathbf{x}^{(i)}}\n",
+    "$\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "colab_type": "text",
+    "id": "sFBoMnuzn1Yz"
+   },
+   "source": [
+    "# Chapter 5\n",
+    "**Equation 5-1: Gaussian RBF**\n",
+    "\n",
+    "$\n",
+    "{\\displaystyle \\phi_{\\gamma}(\\mathbf{x}, \\mathbf{\\ell})} = {\\displaystyle \\exp({\\displaystyle -\\gamma \\left\\| \\mathbf{x} - \\mathbf{\\ell} \\right\\|^2})}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**Equation 5-2: Linear SVM classifier prediction**\n",
+    "\n",
+    "$\n",
+    "\\hat{y} = \\begin{cases}\n",
+    " 0 & \\text{if } \\mathbf{w}^T \\cdot \\mathbf{x} + b < 0 \\\\\n",
+    " 1 & \\text{if } \\mathbf{w}^T \\cdot \\mathbf{x} + b \\geq 0\n",
+    "\\end{cases}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**Equation 5-3: Hard margin linear SVM classifier objective**\n",
+    "\n",
+    "$\n",
+    "\\begin{split}\n",
+    "&\\underset{\\mathbf{w}, b}{\\operatorname{minimize}}\\,{\\frac{1}{2}\\mathbf{w}^T \\cdot \\mathbf{w}} \\\\\n",
+    "&\\text{subject to} \\quad t^{(i)}(\\mathbf{w}^T \\cdot \\mathbf{x}^{(i)} + b) \\ge 1 \\quad \\text{for } i = 1, 2, \\dots, m\n",
+    "\\end{split}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**Equation 5-4: Soft margin linear SVM classifier objective**\n",
+    "\n",
+    "$\n",
+    "\\begin{split}\n",
+    "&\\underset{\\mathbf{w}, b, \\mathbf{\\zeta}}{\\operatorname{minimize}}\\,{\\dfrac{1}{2}\\mathbf{w}^T \\cdot \\mathbf{w} + C \\sum\\limits_{i=1}^m{\\zeta^{(i)}}}\\\\\n",
+    "&\\text{subject to} \\quad t^{(i)}(\\mathbf{w}^T \\cdot \\mathbf{x}^{(i)} + b) \\ge 1 - \\zeta^{(i)} \\quad \\text{and} \\quad \\zeta^{(i)} \\ge 0 \\quad \\text{for } i = 1, 2, \\dots, m\n",
+    "\\end{split}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**Equation 5-5: QP problem**\n",
+    "\n",
+    "$\n",
+    "\\begin{split}\n",
+    "\\underset{\\mathbf{p}}{\\text{minimize}} \\, & \\dfrac{1}{2} \\mathbf{p}^T \\cdot \\mathbf{H} \\cdot \\mathbf{p} \\, + \\, \\mathbf{f}^T \\cdot \\mathbf{p}  \\\\\n",
+    "\\text{subject to} \\, & \\mathbf{A} \\cdot \\mathbf{p} \\le \\mathbf{b} \\\\\n",
+    "\\text{where } &\n",
+    "\\begin{cases}\n",
+    "  \\mathbf{p} \\, \\text{ is an }n_p\\text{-dimensional vector (} n_p = \\text{number of model parameters)}\\\\\n",
+    "  \\mathbf{H} \\, \\text{ is an }n_p \\times n_p \\text{ matrix}\\\\\n",
+    "  \\mathbf{f} \\, \\text{ is an }n_p\\text{-dimensional vector}\\\\\n",
+    "  \\mathbf{A} \\, \\text{ is an } n_c \\times n_p \\text{ matrix (}n_c = \\text{number of constraints)}\\\\\n",
+    "  \\mathbf{b} \\, \\text{ is an }n_c\\text{-dimensional vector}\n",
+    "\\end{cases}\n",
+    "\\end{split}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**Equation 5-6: Dual form of the linear SVM objective**\n",
+    "\n",
+    "$\n",
+    "\\begin{split}\n",
+    "&\\underset{\\mathbf{\\alpha}}{\\operatorname{minimize}} \\,\n",
+    "\\dfrac{1}{2}\\sum\\limits_{i=1}^{m}{\n",
+    "  \\sum\\limits_{j=1}^{m}{\n",
+    "  \\alpha^{(i)} \\alpha^{(j)} t^{(i)} t^{(j)} {\\mathbf{x}^{(i)}}^T \\cdot \\mathbf{x}^{(j)}\n",
+    "  }\n",
+    "} \\, - \\, \\sum\\limits_{i=1}^{m}{\\alpha^{(i)}}\\\\\n",
+    "&\\text{subject to} \\quad \\alpha^{(i)} \\ge 0 \\quad \\text{for } i = 1, 2, \\dots, m\n",
+    "\\end{split}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**Equation 5-7: From the dual solution to the primal solution**\n",
+    "\n",
+    "$\n",
+    "\\begin{split}\n",
+    "&\\hat{\\mathbf{w}} = \\sum_{i=1}^{m}{\\hat{\\alpha}}^{(i)}t^{(i)}\\mathbf{x}^{(i)}\\\\\n",
+    "&\\hat{b} = \\dfrac{1}{n_s}\\sum\\limits_{\\scriptstyle i=1 \\atop {\\scriptstyle {\\hat{\\alpha}}^{(i)} > 0}}^{m}{\\left(t^{(i)} - {\\hat{\\mathbf{w}}}^T \\cdot \\mathbf{x}^{(i)}\\right)}\n",
+    "\\end{split}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**Equation 5-8: Second-degree polynomial mapping**\n",
+    "\n",
+    "$\n",
+    "\\phi\\left(\\mathbf{x}\\right) = \\phi\\left( \\begin{pmatrix}\n",
+    "  x_1 \\\\\n",
+    "  x_2\n",
+    "\\end{pmatrix} \\right) = \\begin{pmatrix}\n",
+    "  {x_1}^2 \\\\\n",
+    "  \\sqrt{2} \\, x_1 x_2 \\\\\n",
+    "  {x_2}^2\n",
+    "\\end{pmatrix}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**Equation 5-9: Kernel trick for a second-degree polynomial mapping**\n",
+    "\n",
+    "$\n",
+    "\\begin{split}\n",
+    "\\phi(\\mathbf{a})^T \\cdot \\phi(\\mathbf{b}) & \\quad = \\begin{pmatrix}\n",
+    "  {a_1}^2 \\\\\n",
+    "  \\sqrt{2} \\, a_1 a_2 \\\\\n",
+    "  {a_2}^2\n",
+    "  \\end{pmatrix}^T \\cdot \\begin{pmatrix}\n",
+    "  {b_1}^2 \\\\\n",
+    "  \\sqrt{2} \\, b_1 b_2 \\\\\n",
+    "  {b_2}^2\n",
+    "\\end{pmatrix} = {a_1}^2 {b_1}^2 + 2 a_1 b_1 a_2 b_2 + {a_2}^2 {b_2}^2 \\\\\n",
+    " & \\quad = \\left( a_1 b_1 + a_2 b_2 \\right)^2 = \\left( \\begin{pmatrix}\n",
+    "  a_1 \\\\\n",
+    "  a_2\n",
+    "\\end{pmatrix}^T \\cdot \\begin{pmatrix}\n",
+    "    b_1 \\\\\n",
+    "    b_2\n",
+    "  \\end{pmatrix} \\right)^2 = (\\mathbf{a}^T \\cdot \\mathbf{b})^2\n",
+    "\\end{split}\n",
+    "$\n",
+    "\n",
+    "**From the text on the kernel trick (page 220):**\n",
+    "[...] the dot product of the transformed vectors can simply be replaced by $ ({\\mathbf{x}^{(i)}}^T \\cdot \\mathbf{x}^{(j)})^2 $.\n",
+    "\n",
+    "\n",
+    "**Equation 5-10: Common kernels**\n",
+    "\n",
+    "$\n",
+    "\\begin{split}\n",
+    "\\text{Linear:} & \\quad K(\\mathbf{a}, \\mathbf{b}) = \\mathbf{a}^T \\cdot \\mathbf{b} \\\\\n",
+    "\\text{Polynomial:} & \\quad K(\\mathbf{a}, \\mathbf{b}) = \\left(\\gamma \\mathbf{a}^T \\cdot \\mathbf{b} + r \\right)^d \\\\\n",
+    "\\text{Gaussian RBF:} & \\quad K(\\mathbf{a}, \\mathbf{b}) = \\exp({\\displaystyle -\\gamma \\left\\| \\mathbf{a} - \\mathbf{b} \\right\\|^2}) \\\\\n",
+    "\\text{Sigmoid:} & \\quad K(\\mathbf{a}, \\mathbf{b}) = \\tanh\\left(\\gamma \\mathbf{a}^T \\cdot \\mathbf{b} + r\\right)\n",
+    "\\end{split}\n",
+    "$\n",
+    "\n",
+    "**Equation 5-11: Making predictions with a kernelized SVM**\n",
+    "\n",
+    "$\n",
+    "\\begin{split}\n",
+    "h_{\\hat{\\mathbf{w}}, \\hat{b}}\\left(\\phi(\\mathbf{x}^{(n)})\\right) & = \\,\\hat{\\mathbf{w}}^T \\cdot \\phi(\\mathbf{x}^{(n)}) + \\hat{b} = \\left(\\sum_{i=1}^{m}{\\hat{\\alpha}}^{(i)}t^{(i)}\\phi(\\mathbf{x}^{(i)})\\right)^T \\cdot \\phi(\\mathbf{x}^{(n)}) + \\hat{b}\\\\\n",
+    " & = \\, \\sum_{i=1}^{m}{\\hat{\\alpha}}^{(i)}t^{(i)}\\left(\\phi(\\mathbf{x}^{(i)})^T \\cdot \\phi(\\mathbf{x}^{(n)})\\right)  + \\hat{b}\\\\\n",
+    " & = \\sum\\limits_{\\scriptstyle i=1 \\atop {\\scriptstyle {\\hat{\\alpha}}^{(i)} > 0}}^{m}{\\hat{\\alpha}}^{(i)}t^{(i)} K(\\mathbf{x}^{(i)}, \\mathbf{x}^{(n)}) + \\hat{b}\n",
+    "\\end{split}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**Equation 5-12: Computing the bias term using the kernel trick**\n",
+    "\n",
+    "$\n",
+    "\\begin{split}\n",
+    "\\hat{b} & = \\dfrac{1}{n_s}\\sum\\limits_{\\scriptstyle i=1 \\atop {\\scriptstyle {\\hat{\\alpha}}^{(i)} > 0}}^{m}{\\left(t^{(i)} - {\\hat{\\mathbf{w}}}^T \\cdot \\phi(\\mathbf{x}^{(i)})\\right)} = \\dfrac{1}{n_s}\\sum\\limits_{\\scriptstyle i=1 \\atop {\\scriptstyle {\\hat{\\alpha}}^{(i)} > 0}}^{m}{\\left(t^{(i)} - {\n",
+    " \\left(\\sum_{j=1}^{m}{\\hat{\\alpha}}^{(j)}t^{(j)}\\phi(\\mathbf{x}^{(j)})\\right)\n",
+    " }^T \\cdot \\phi(\\mathbf{x}^{(i)})\\right)}\\\\\n",
+    " & = \\dfrac{1}{n_s}\\sum\\limits_{\\scriptstyle i=1 \\atop {\\scriptstyle {\\hat{\\alpha}}^{(i)} > 0}}^{m}{\\left(t^{(i)} - \n",
+    "\\sum\\limits_{\\scriptstyle j=1 \\atop {\\scriptstyle {\\hat{\\alpha}}^{(j)} > 0}}^{m}{\n",
+    "  {\\hat{\\alpha}}^{(j)} t^{(j)} K(\\mathbf{x}^{(i)},\\mathbf{x}^{(j)})\n",
+    "}\n",
+    "\\right)}\n",
+    "\\end{split}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**Equation 5-13: Linear SVM classifier cost function**\n",
+    "\n",
+    "$\n",
+    "J(\\mathbf{w}, b) = \\dfrac{1}{2} \\mathbf{w}^T \\cdot \\mathbf{w} \\, + \\, C {\\displaystyle \\sum\\limits_{i=1}^{m}\\max\\left(0, 1 - t^{(i)}(\\mathbf{w}^T \\cdot \\mathbf{x}^{(i)} + b) \\right)}\n",
+    "$\n",
+    "\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "colab_type": "text",
+    "id": "JyogtW6Jn1Y0"
+   },
+   "source": [
+    "# Chapter 6\n",
+    "**Equation 6-1: Gini impurity**\n",
+    "\n",
+    "$\n",
+    "G_i = 1 - \\sum\\limits_{k=1}^{n}{{p_{i,k}}^2}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**Equation 6-2: CART cost function for classification**\n",
+    "\n",
+    "$\n",
+    "\\begin{split}\n",
+    "&J(k, t_k) = \\dfrac{m_{\\text{left}}}{m}G_\\text{left} + \\dfrac{m_{\\text{right}}}{m}G_{\\text{right}}\\\\\n",
+    "&\\text{where }\\begin{cases}\n",
+    "G_\\text{left/right} \\text{ is the impurity of the left/right subset}\\\\\n",
+    "m_\\text{left/right} \\text{ is the number of instances in the left/right subset}\n",
+    "\\end{cases}\n",
+    "\\end{split}\n",
+    "$\n",
+    "\n",
+    "**Entropy computation example (page 232):**\n",
+    "\n",
+    "$ -\\frac{49}{54}\\log_2(\\frac{49}{54}) - \\frac{5}{54}\\log_2(\\frac{5}{54}) $\n",
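+    "\n",
+    "Evaluating this expression numerically (a worked check that is not part of the book's text, with each term rounded to three decimals):\n",
+    "\n",
+    "$ -\\frac{49}{54}\\log_2\\left(\\frac{49}{54}\\right) - \\frac{5}{54}\\log_2\\left(\\frac{5}{54}\\right) \\approx 0.127 + 0.318 = 0.445 $\n",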
+    "\n",
+    "\n",
+    "**Equation 6-3: Entropy**\n",
+    "\n",
+    "$\n",
+    "H_i = -\\sum\\limits_{k=1 \\atop p_{i,k} \\ne 0}^{n}{{p_{i,k}}\\log_2(p_{i,k})}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**Equation 6-4: CART cost function for regression**\n",
+    "\n",
+    "$\n",
+    "J(k, t_k) = \\dfrac{m_{\\text{left}}}{m}\\text{MSE}_\\text{left} + \\dfrac{m_{\\text{right}}}{m}\\text{MSE}_{\\text{right}} \\quad\n",
+    "\\text{where }\n",
+    "\\begin{cases}\n",
+    "\\text{MSE}_{\\text{node}} = \\dfrac{1}{m_{\\text{node}}}\\sum\\limits_{\\scriptstyle i \\in \\text{node}}(\\hat{y}_{\\text{node}} - y^{(i)})^2\\\\\n",
+    "\\hat{y}_\\text{node} = \\dfrac{1}{m_{\\text{node}}}\\sum\\limits_{\\scriptstyle i \\in \\text{node}}y^{(i)}\n",
+    "\\end{cases}\n",
+    "$\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "colab_type": "text",
+    "id": "mCEpcobOn1Y0"
+   },
+   "source": [
+    "# Chapter 7\n",
+    "\n",
+    "**Equation 7-1: Weighted error rate of the j-th predictor**\n",
+    "\n",
+    "$\n",
+    "r_j = \\dfrac{\\displaystyle \\sum\\limits_{\\textstyle {i=1 \\atop \\hat{y}_j^{(i)} \\ne y^{(i)}}}^{m}{w^{(i)}}}{\\displaystyle \\sum\\limits_{i=1}^{m}{w^{(i)}}} \\quad\n",
+    "\\text{where }\\hat{y}_j^{(i)}\\text{ is the }j^{\\text{th}}\\text{ predictor's prediction for the }i^{\\text{th}}\\text{ instance.}\n",
+    "$\n",
+    "\n",
+    "**Equation 7-2: Predictor weight**\n",
+    "\n",
+    "$\n",
+    "\\begin{split}\n",
+    "\\alpha_j = \\eta \\log{\\dfrac{1 - r_j}{r_j}}\n",
+    "\\end{split}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**Equation 7-3: Weight update rule**\n",
+    "\n",
+    "$\n",
+    "\\begin{split}\n",
+    "& w^{(i)} \\leftarrow\n",
+    "\\begin{cases}\n",
+    "w^{(i)} & \\text{if } \\hat{y}_j^{(i)} = y^{(i)}\\\\\n",
+    "w^{(i)} \\exp(\\alpha_j) & \\text{if } \\hat{y}_j^{(i)} \\ne y^{(i)}\n",
+    "\\end{cases} \\\\\n",
+    "& \\text{for } i = 1, 2, \\dots, m \\\\\n",
+    "\\end{split}\n",
+    "$\n",
+    "\n",
+    "**From the text on page 256:**\n",
+    "\n",
+    "Then all the instance weights are normalized (i.e., divided by $ \\sum_{i=1}^{m}{w^{(i)}} $).\n",
+    "\n",
+    "\n",
+    "**Equation 7-4: AdaBoost predictions**\n",
+    "\n",
+    "$\n",
+    "\\hat{y}(\\mathbf{x}) = \\underset{k}{\\operatorname{argmax}}{\\sum\\limits_{\\scriptstyle j=1 \\atop \\scriptstyle \\hat{y}_j(\\mathbf{x}) = k}^{N}{\\alpha_j}} \\quad \\text{where }N\\text{ is the number of predictors}\n",
+    "$\n",
+    "\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "colab_type": "text",
+    "id": "SFoGMOCsn1Y1"
+   },
+   "source": [
+    "# Chapter 8\n",
+    "\n",
+    "**Equation 8-1: Principal components matrix**\n",
+    "\n",
+    "$\n",
+    "\\mathbf{V} =\n",
+    "\\begin{pmatrix}\n",
+    "  \\mid & \\mid & & \\mid \\\\\n",
+    "  \\mathbf{c_1} & \\mathbf{c_2} & \\cdots & \\mathbf{c_n} \\\\\n",
+    "  \\mid & \\mid & & \\mid\n",
+    "\\end{pmatrix}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**Equation 8-2: Projecting the training set down to _d_ dimensions**\n",
+    "\n",
+    "$\n",
+    "\\mathbf{X}_{d\\text{-proj}} = \\mathbf{X} \\cdot \\mathbf{W}_d\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**Equation 8-3: PCA inverse transformation, back to the original number of dimensions**\n",
+    "\n",
+    "$\n",
+    "\\mathbf{X}_{\\text{recovered}} = \\mathbf{X}_{d\\text{-proj}} \\cdot {\\mathbf{W}_d}^T\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**Equation 8-4: LLE step 1: linearly modeling local relationships**\n",
+    "\n",
+    "$\n",
+    "\\begin{split}\n",
+    "& \\hat{\\mathbf{W}} = \\underset{\\mathbf{W}}{\\operatorname{argmin}}{\\displaystyle \\sum\\limits_{i=1}^{m}} \\left\\|\\mathbf{x}^{(i)} - \\sum\\limits_{j=1}^{m}{w_{i,j}}\\mathbf{x}^{(j)}\\right\\|^2\\\\\n",
+    "& \\text{subject to }\n",
+    "\\begin{cases}\n",
+    "  w_{i,j}=0 & \\text{if } \\mathbf{x}^{(j)} \\text{ is not one of the } k \\text{ nearest neighbors of } \\mathbf{x}^{(i)}\\\\\n",
+    "  \\sum\\limits_{j=1}^{m}w_{i,j} = 1 & \\text{for } i=1, 2, \\dots, m\n",
+    "\\end{cases}\n",
+    "\\end{split}\n",
+    "$\n",
+    "\n",
+    "**From the text on page 290:**\n",
+    "\n",
+    "[...] the distance between $\\mathbf{z}^{(i)}$ and $ \\sum_{j=1}^{m}{\\hat{w}_{i,j}\\mathbf{z}^{(j)}} $ must be minimized.\n",
+    "\n",
+    "\n",
+    "**Equation 8-5: LLE step 2: reducing dimensionality while preserving relationships**\n",
+    "\n",
+    "$\n",
+    "\\hat{\\mathbf{Z}} = \\underset{\\mathbf{Z}}{\\operatorname{argmin}}{\\displaystyle \\sum\\limits_{i=1}^{m}} \\left\\|\\mathbf{z}^{(i)} - \\sum\\limits_{j=1}^{m}{\\hat{w}_{i,j}}\\mathbf{z}^{(j)}\\right\\|^2\n",
+    "$\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "colab_type": "text",
+    "id": "tzBsxT-tn1Y2"
+   },
+   "source": [
+    "# Chapter 9\n",
+    "\n",
+    "**Equation 9-1: ReLU function**\n",
+    "\n",
+    "$\n",
+    "h_{\\mathbf{w}, b}(\\mathbf{X}) = \\max(\\mathbf{X} \\cdot \\mathbf{w} + b, 0)\n",
+    "$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "colab_type": "text",
+    "id": "PCBRHj6wn1Y3"
+   },
+   "source": [
+    "# Chapter 10\n",
+    "\n",
+    "**Equation 10-1: Common step functions used in Perceptrons**\n",
+    "\n",
+    "$\n",
+    "\\begin{split}\n",
+    "\\operatorname{heaviside}(z) =\n",
+    "\\begin{cases}\n",
+    "0 & \\text{if } z < 0\\\\\n",
+    "1 & \\text{if } z \\ge 0\n",
+    "\\end{cases} & \\quad\\quad\n",
+    "\\operatorname{sgn}(z) =\n",
+    "\\begin{cases}\n",
+    "-1 & \\text{if } z < 0\\\\\n",
+    "0 & \\text{if } z = 0\\\\\n",
+    "+1 & \\text{if } z > 0\n",
+    "\\end{cases}\n",
+    "\\end{split}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**Equation 10-2: Perceptron learning rule (weight update)**\n",
+    "\n",
+    "$\n",
+    "{w_{i,j}}^{(\\text{next step})}\\quad = w_{i,j} + \\eta (y_j - \\hat{y}_j) x_i\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**From the text on page 342:**\n",
+    "\n",
+    "This matrix is initialized randomly, using a truncated normal (Gaussian) distribution with a standard deviation of $ 2 / \\sqrt{n_\\text{inputs} + n_\\text{neurons}} $.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "colab_type": "text",
+    "id": "lZkG8wkrn1Y3"
+   },
+   "source": [
+    "# Chapter 11\n",
+    "**Equation 11-1: Xavier initialization (when using the logistic activation function)**\n",
+    "\n",
+    "$\n",
+    "\\begin{split}\n",
+    "& \\text{Normal distribution with mean 0 and standard deviation }\n",
+    "\\sigma = \\sqrt{\\dfrac{2}{n_\\text{inputs} + n_\\text{outputs}}}\\\\\n",
+    "& \\text{Or a uniform distribution between } -r \\text{ and } +r \\text{, with }\n",
+    "r = \\sqrt{\\dfrac{6}{n_\\text{inputs} + n_\\text{outputs}}}\n",
+    "\\end{split}\n",
+    "$\n",
+    "\n",
+    "**From the text on page 356:**\n",
+    "\n",
+    "When the number of input connections is roughly equal to the number of output connections, simpler formulas are used (e.g., $ \\sigma = 1 / \\sqrt{n_\\text{inputs}} $ or $ r = \\sqrt{3} / \\sqrt{n_\\text{inputs}} $).\n",
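+    "\n",
+    "To see where these simpler formulas come from (a short worked step, assuming $ n_\\text{outputs} = n_\\text{inputs} $ exactly), substitute into Equation 11-1:\n",
+    "\n",
+    "$ \\sigma = \\sqrt{\\dfrac{2}{2\\, n_\\text{inputs}}} = \\dfrac{1}{\\sqrt{n_\\text{inputs}}} \\quad \\text{and} \\quad r = \\sqrt{\\dfrac{6}{2\\, n_\\text{inputs}}} = \\dfrac{\\sqrt{3}}{\\sqrt{n_\\text{inputs}}} $\n",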
+    "\n",
+    "**Table 11-1: Initialization parameters for each type of activation function**\n",
+    "\n",
+    "* Logistic, uniform distribution: $ r = \\sqrt{\\dfrac{6}{n_\\text{inputs} + n_\\text{outputs}}} $\n",
+    "* Logistic, normal distribution: $ \\sigma = \\sqrt{\\dfrac{2}{n_\\text{inputs} + n_\\text{outputs}}} $\n",
+    "* Hyperbolic tangent, uniform distribution: $ r = 4 \\sqrt{\\dfrac{6}{n_\\text{inputs} + n_\\text{outputs}}} $\n",
+    "* Hyperbolic tangent, normal distribution: $ \\sigma = 4 \\sqrt{\\dfrac{2}{n_\\text{inputs} + n_\\text{outputs}}} $\n",
+    "* ReLU and its variants, uniform distribution: $ r = \\sqrt{2} \\sqrt{\\dfrac{6}{n_\\text{inputs} + n_\\text{outputs}}} $\n",
+    "* ReLU and its variants, normal distribution: $ \\sigma = \\sqrt{2} \\sqrt{\\dfrac{2}{n_\\text{inputs} + n_\\text{outputs}}} $\n",
+    "\n",
+    "**Equation 11-2: ELU activation function**\n",
+    "\n",
+    "$\n",
+    "\\operatorname{ELU}_\\alpha(z) =\n",
+    "\\begin{cases}\n",
+    "\\alpha(\\exp(z) - 1) & \\text{if } z < 0\\\\\n",
+    "z & \\text{if } z \\ge 0\n",
+    "\\end{cases}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**Equation 11-3: Batch Normalization algorithm**\n",
+    "\n",
+    "$\n",
+    "\\begin{split}\n",
+    "1.\\quad & \\mathbf{\\mu}_B = \\dfrac{1}{m_B}\\sum\\limits_{i=1}^{m_B}{\\mathbf{x}^{(i)}}\\\\\n",
+    "2.\\quad & {\\mathbf{\\sigma}_B}^2 = \\dfrac{1}{m_B}\\sum\\limits_{i=1}^{m_B}{(\\mathbf{x}^{(i)} - \\mathbf{\\mu}_B)^2}\\\\\n",
+    "3.\\quad & \\hat{\\mathbf{x}}^{(i)} = \\dfrac{\\mathbf{x}^{(i)} - \\mathbf{\\mu}_B}{\\sqrt{{\\mathbf{\\sigma}_B}^2 + \\epsilon}}\\\\\n",
+    "4.\\quad & \\mathbf{z}^{(i)} = \\gamma \\hat{\\mathbf{x}}^{(i)} + \\beta\n",
+    "\\end{split}\n",
+    "$\n",
+    "\n",
+    "**From the text on page 364:**\n",
+    "\n",
+    "[...] given a new value $v$, the running average $\\hat{v}$ is updated through the following equation:\n",
+    "\n",
+    "$ \\hat{v} \\gets \\hat{v} \\times \\text{momentum} + v \\times (1 - \\text{momentum}) $\n",
+    "\n",
+    "**Equation 11-4: Momentum algorithm**\n",
+    "\n",
+    "1. $\\mathbf{m} \\gets \\beta \\mathbf{m} - \\eta \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta})$\n",
+    "2. $\\mathbf{\\theta} \\gets \\mathbf{\\theta} + \\mathbf{m}$\n",
+    "\n",
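+    "**From page 377:**\n",
+    "\n",
+    "You can easily verify that if the gradient remains constant, the terminal velocity (i.e., the maximum size of the weight updates) is equal to that gradient multiplied by the learning rate $ \\eta $ multiplied by $ \\frac{1}{1 - \\beta} $.\n",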
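+    "\n",
+    "As a quick check of this claim (a sketch, assuming a constant gradient written here as $\\mathbf{g}$, a symbol that does not appear in the book), setting $\\mathbf{m}$ to its fixed point in step 1 of Equation 11-4 gives:\n",
+    "\n",
+    "$ \\mathbf{m} = \\beta \\mathbf{m} - \\eta \\mathbf{g} \\quad \\Longrightarrow \\quad \\mathbf{m} = -\\dfrac{\\eta}{1 - \\beta}\\, \\mathbf{g} $\n",
+    "\n",
+    "For instance, with $ \\beta = 0.9 $ the factor $ \\frac{1}{1 - \\beta} $ equals 10, so each update ends up 10 times larger in magnitude than the plain Gradient Descent step $ \\eta \\mathbf{g} $.\n",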
+    "\n",
+    "**Equation 11-5: Nesterov Accelerated Gradient algorithm**\n",
+    "\n",
+    "1. $\\mathbf{m} \\gets \\beta \\mathbf{m} - \\eta \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta} + \\beta \\mathbf{m})$\n",
+    "2. $\\mathbf{\\theta} \\gets \\mathbf{\\theta} + \\mathbf{m}$\n",
+    "\n",
+    "**Equation 11-6: AdaGrad algorithm**\n",
+    "\n",
+    "1. $\\mathbf{s} \\gets \\mathbf{s} + \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta}) \\otimes \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta})$\n",
+    "2. $\\mathbf{\\theta} \\gets \\mathbf{\\theta} - \\eta \\, \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta}) \\oslash {\\sqrt{\\mathbf{s} + \\epsilon}}$\n",
+    "\n",
+    "**From the text on page 381:**\n",
+    "\n",
+    "This vectorized form is equivalent to computing $s_i \\gets s_i + \\left( \\dfrac{\\partial J(\\mathbf{\\theta})}{\\partial \\theta_i} \\right)^2$ for each element $s_i$ of the vector $\\mathbf{s}$.\n",
+    "\n",
+    "**From the text on page 381:**\n",
+    "\n",
+    "This vectorized form is equivalent to computing $ \\theta_i \\gets \\theta_i - \\eta \\, \\dfrac{\\partial J(\\mathbf{\\theta})}{\\partial \\theta_i} \\dfrac{1}{\\sqrt{s_i + \\epsilon}} $ for all parameters $\\theta_i$ (simultaneously).\n",
+    "\n",
+    "**Equation 11-7: RMSProp algorithm**\n",
+    "\n",
+    "1. $\\mathbf{s} \\gets \\beta \\mathbf{s} + (1 - \\beta ) \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta}) \\otimes \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta})$\n",
+    "2. $\\mathbf{\\theta} \\gets \\mathbf{\\theta} - \\eta \\, \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta}) \\oslash {\\sqrt{\\mathbf{s} + \\epsilon}}$\n",
+    "\n",
+    "\n",
+    "**Equation 11-8: Adam algorithm**\n",
+    "\n",
+    "1. $\\mathbf{m} \\gets \\beta_1 \\mathbf{m} - (1 - \\beta_1) \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta})$\n",
+    "2. $\\mathbf{s} \\gets \\beta_2 \\mathbf{s} + (1 - \\beta_2) \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta}) \\otimes \\nabla_\\mathbf{\\theta}J(\\mathbf{\\theta})$\n",
+    "3. $\\mathbf{m} \\gets \\left(\\dfrac{\\mathbf{m}}{1 - {\\beta_1}^T}\\right)$\n",
+    "4. $\\mathbf{s} \\gets \\left(\\dfrac{\\mathbf{s}}{1 - {\\beta_2}^T}\\right)$\n",
+    "5. $\\mathbf{\\theta} \\gets \\mathbf{\\theta} + \\eta \\, \\mathbf{m} \\oslash {\\sqrt{\\mathbf{s} + \\epsilon}}$\n",
+    "\n",
+    "**From the text on page 393:**\n",
+    "\n",
+    "Typically, $\\left\\| \\mathbf{w} \\right\\|_2$ is computed after each training step and $\\mathbf{w}$ is then clipped $ \\left( \\mathbf{w} \\gets \\mathbf{w} \\dfrac{r}{\\left\\| \\mathbf{w} \\right\\|_2} \\right)$."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "colab_type": "text",
+    "id": "cFSDXAOzn1Y4"
+   },
+   "source": [
+    "# Chapter 13\n",
+    "\n",
+    "**Equation 13-1: Computing the output of a neuron in a convolutional layer**\n",
+    "\n",
+    "$\n",
+    "z_{i,j,k} = b_k + \\sum\\limits_{u = 0}^{f_h - 1} \\, \\, \\sum\\limits_{v = 0}^{f_w - 1} \\, \\, \\sum\\limits_{k' = 0}^{f_{n'} - 1} \\, \\, x_{i', j', k'} . w_{u, v, k', k}\n",
+    "\\quad \\text{where }\n",
+    "\\begin{cases}\n",
+    "i' = i \\times s_h + u \\\\\n",
+    "j' = j \\times s_w + v\n",
+    "\\end{cases}\n",
+    "$\n",
+    "\n",
+    "**Equation 13-2: LRN**\n",
+    "\n",
+    "$\n",
+    "b_i = a_i  \\left(k + \\alpha \\sum\\limits_{j=j_\\text{low}}^{j_\\text{high}}{{a_j}^2} \\right)^{-\\beta} \\quad \\text{where }\n",
+    "\\begin{cases}\n",
+    "  j_\\text{high} = \\min\\left(i + \\dfrac{r}{2}, f_n-1\\right) \\\\\n",
+    "  j_\\text{low} = \\max\\left(0, i - \\dfrac{r}{2}\\right)\n",
+    "\\end{cases}\n",
+    "$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "colab_type": "text",
+    "id": "1SiAf4FTn1Y4"
+   },
+   "source": [
+    "# Chapter 14\n",
+    "\n",
+    "**Equation 14-1: Output of a recurrent layer for a single instance**\n",
+    "\n",
+    "$\n",
+    "\\mathbf{y}_{(t)} = \\phi\\left({{\\mathbf{x}_{(t)}}^T \\cdot \\mathbf{w}_x} + {\\mathbf{y}_{(t-1)}}^T \\cdot {\\mathbf{w}_y} + b \\right)\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**Equation 14-2: Outputs of a layer of recurrent neurons for all instances in a mini-batch**\n",
+    "\n",
+    "$\n",
+    "\\begin{split}\n",
+    "\\mathbf{Y}_{(t)} & = \\phi\\left(\\mathbf{X}_{(t)} \\cdot \\mathbf{W}_{x} + \\mathbf{Y}_{(t-1)}\\cdot  \\mathbf{W}_{y} + \\mathbf{b} \\right) \\\\\n",
+    "& = \\phi\\left(\n",
+    "\\left[\\mathbf{X}_{(t)} \\quad \\mathbf{Y}_{(t-1)} \\right]\n",
+    " \\cdot \\mathbf{W} + \\mathbf{b} \\right) \\quad \\text{ where } \\mathbf{W}=\n",
+    "\\left[ \\begin{matrix}\n",
+    "  \\mathbf{W}_x\\\\\n",
+    "  \\mathbf{W}_y\n",
+    "\\end{matrix} \\right]\n",
+    "\\end{split}\n",
+    "$\n",
+    "\n",
+    "**From the text on page 494:**\n",
+    "\n",
+    "The output sequence is then evaluated using a cost function $ C(\\mathbf{Y}_{(t_\\text{min})}, \\mathbf{Y}_{(t_\\text{min}+1)}, \\dots, \\mathbf{Y}_{(t_\\text{max})}) $ (where $t_\\text{min}$ and $t_\\text{max}$ are the first and last output time steps, not counting the ignored outputs).\n",
+    "\n",
+    "**Equation 14-3: LSTM computations**\n",
+    "\n",
+    "$\n",
+    "\\begin{split}\n",
+    "\\mathbf{i}_{(t)}&=\\sigma({\\mathbf{W}_{xi}}^T \\cdot \\mathbf{x}_{(t)} + {\\mathbf{W}_{hi}}^T \\cdot \\mathbf{h}_{(t-1)} + \\mathbf{b}_i)\\\\\n",
+    "\\mathbf{f}_{(t)}&=\\sigma({\\mathbf{W}_{xf}}^T \\cdot \\mathbf{x}_{(t)} + {\\mathbf{W}_{hf}}^T \\cdot \\mathbf{h}_{(t-1)} + \\mathbf{b}_f)\\\\\n",
+    "\\mathbf{o}_{(t)}&=\\sigma({\\mathbf{W}_{xo}}^T \\cdot \\mathbf{x}_{(t)} + {\\mathbf{W}_{ho}}^T \\cdot \\mathbf{h}_{(t-1)} + \\mathbf{b}_o)\\\\\n",
+    "\\mathbf{g}_{(t)}&=\\operatorname{tanh}({\\mathbf{W}_{xg}}^T \\cdot \\mathbf{x}_{(t)} + {\\mathbf{W}_{hg}}^T \\cdot \\mathbf{h}_{(t-1)} + \\mathbf{b}_g)\\\\\n",
+    "\\mathbf{c}_{(t)}&=\\mathbf{f}_{(t)} \\otimes \\mathbf{c}_{(t-1)} \\, + \\, \\mathbf{i}_{(t)} \\otimes \\mathbf{g}_{(t)}\\\\\n",
+    "\\mathbf{y}_{(t)}&=\\mathbf{h}_{(t)} = \\mathbf{o}_{(t)} \\otimes \\operatorname{tanh}(\\mathbf{c}_{(t)})\n",
+    "\\end{split}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**Equation 14-4: GRU computations**\n",
+    "\n",
+    "$\n",
+    "\\begin{split}\n",
+    "\\mathbf{z}_{(t)}&=\\sigma({\\mathbf{W}_{xz}}^T \\cdot \\mathbf{x}_{(t)} + {\\mathbf{W}_{hz}}^T \\cdot \\mathbf{h}_{(t-1)}) \\\\\n",
+    "\\mathbf{r}_{(t)}&=\\sigma({\\mathbf{W}_{xr}}^T \\cdot \\mathbf{x}_{(t)} + {\\mathbf{W}_{hr}}^T \\cdot \\mathbf{h}_{(t-1)}) \\\\\n",
+    "\\mathbf{g}_{(t)}&=\\operatorname{tanh}\\left({\\mathbf{W}_{xg}}^T \\cdot \\mathbf{x}_{(t)} + {\\mathbf{W}_{hg}}^T \\cdot (\\mathbf{r}_{(t)} \\otimes \\mathbf{h}_{(t-1)})\\right) \\\\\n",
+    "\\mathbf{h}_{(t)}&=(1-\\mathbf{z}_{(t)}) \\otimes \\mathbf{h}_{(t-1)} + \\mathbf{z}_{(t)} \\otimes \\mathbf{g}_{(t)}\n",
+    "\\end{split}\n",
+    "$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "colab_type": "text",
+    "id": "5IiIkIG_n1Y5"
+   },
+   "source": [
+    "# Chapter 15\n",
+    "\n",
+    "**Equation 15-1: Kullback–Leibler divergence**\n",
+    "\n",
+    "$\n",
+    "D_{\\mathrm{KL}}(P\\|Q) = \\sum\\limits_{i} P(i) \\log \\dfrac{P(i)}{Q(i)}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**Equation 15-2: KL divergence between the target sparsity $p$ and the actual sparsity $q$**\n",
+    "\n",
+    "$\n",
+    "D_{\\mathrm{KL}}(p\\|q) = p \\, \\log \\dfrac{p}{q} + (1-p) \\log \\dfrac{1-p}{1-q}\n",
+    "$\n",
+    "\n",
+    "**From the text on page 544:**\n",
+    "\n",
+    "A common variant is to train the encoder to output $\\gamma = \\log\\left(\\sigma^2\\right)$ rather than $\\sigma$. Then $\\sigma$ can easily be computed as $ \\sigma = \\exp\\left(\\dfrac{\\gamma}{2}\\right) $."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "colab_type": "text",
+    "id": "wVr9eBb1n1Y6"
+   },
+   "source": [
+    "# Chapter 16\n",
+    "\n",
+    "**Equation 16-1: Bellman Optimality Equation**\n",
+    "\n",
+    "$\n",
+    "V^*(s) = \\underset{a}{\\max}\\sum\\limits_{s'}{T(s, a, s') [R(s, a, s') + \\gamma . V^*(s')]} \\quad \\text{for all }s\n",
+    "$\n",
+    "\n",
+    "**Equation 16-2: Value Iteration algorithm**\n",
+    "\n",
+    "$\n",
+    "  V_{k+1}(s) \\gets \\underset{a}{\\max}\\sum\\limits_{s'}{T(s, a, s') [R(s, a, s') + \\gamma . V_k(s')]} \\quad \\text{for all }s\n",
+    "$\n",
+    "\n",
+    "**Equation 16-3: Q-Value Iteration algorithm**\n",
+    "\n",
+    "$\n",
+    "  Q_{k+1}(s, a) \\gets \\sum\\limits_{s'}{T(s, a, s') [R(s, a, s') + \\gamma . \\underset{a'}{\\max}\\,{Q_k(s',a')}]} \\quad \\text{for all } (s,a)\n",
+    "$\n",
+    "\n",
+    "**From the text on page 574:**\n",
+    "\n",
+    "Once you have the optimal Q-Values, defining the optimal policy $\\pi^{*}(s)$ is straightforward: when the agent reaches state $s$, it should choose the action with the highest Q-Value.\n",
+    "\n",
+    "$ \\pi^{*}(s) = \\underset{a}{\\operatorname{argmax}} \\, Q^*(s, a) $\n",
+    "\n",
+    "**Equation 16-4: TD Learning algorithm**\n",
+    "\n",
+    "$\n",
+    "V_{k+1}(s) \\gets (1-\\alpha)V_k(s) + \\alpha\\left(r + \\gamma . V_k(s')\\right)\n",
+    "$\n",
+    "\n",
+    "**Equation 16-5: Q-Learning algorithm**\n",
+    "\n",
+    "$\n",
+    "Q_{k+1}(s, a) \\gets (1-\\alpha)Q_k(s,a) + \\alpha\\left(r + \\gamma . \\underset{a'}{\\max} \\, Q_k(s', a')\\right)\n",
+    "$\n",
+    "\n",
+    "**Equation 16-6: Q-Learning using an exploration function**\n",
+    "\n",
+    "$\n",
+    "  Q(s, a) \\gets (1-\\alpha)Q(s,a) + \\alpha\\left(r + \\gamma . \\underset{a'}{\\max}f(Q(s', a'), N(s', a'))\\right)\n",
+    "$\n",
+    "\n",
+    "**Equation 16-7: Target Q-Value**\n",
+    "\n",
+    "$\n",
+    "\\begin{split}\n",
+    "y(s, a) = r + \\gamma . \\underset{a'}{\\max}Q_{\\theta}(s', a')\n",
+    "\\end{split}\n",
+    "$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "colab_type": "text",
+    "id": "1U0nBdBvn1Y7"
+   },
+   "source": [
+    "# Appendix A\n",
+    "\n",
+    "From the text:\n",
+    "\n",
+    "$\n",
+    "\\mathbf{H} =\n",
+    "\\begin{pmatrix}\n",
+    "\\mathbf{H'} & 0 & \\cdots\\\\\n",
+    "0 & 0 & \\\\\n",
+    "\\vdots & & \\ddots\n",
+    "\\end{pmatrix}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "$\n",
+    "\\mathbf{A} =\n",
+    "\\begin{pmatrix}\n",
+    "\\mathbf{A'} & \\mathbf{I}_m \\\\\n",
+    "\\mathbf{0} & -\\mathbf{I}_m\n",
+    "\\end{pmatrix}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "$ 1 - \\left(\\frac{1}{5}\\right)^2 - \\left(\\frac{4}{5}\\right)^2 $\n",
+    "\n",
+    "\n",
+    "$ 1 - \\left(\\frac{1}{2}\\right)^2 - \\left(\\frac{1}{2}\\right)^2 $\n",
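+    "\n",
+    "Evaluating the two Gini impurities above (a worked check that is not part of the book's text, reading each square as applying to the whole fraction):\n",
+    "\n",
+    "$ 1 - \\dfrac{1}{25} - \\dfrac{16}{25} = \\dfrac{8}{25} = 0.32 \\quad \\text{and} \\quad 1 - \\dfrac{1}{4} - \\dfrac{1}{4} = \\dfrac{1}{2} = 0.5 $\n",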
+    "\n",
+    "\n",
+    "$ \\frac{2}{5} \\times $\n",
+    "\n",
+    "\n",
+    "$ \\frac{3}{5} \\times 0 $"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "colab_type": "text",
+    "id": "dphwGCobn1Y7"
+   },
+   "source": [
+    "# Appendix C"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "colab_type": "text",
+    "id": "vH7f8-Min1Y7"
+   },
+   "source": [
+    "From the text:\n",
+    "\n",
+    "$ (\\hat{x}, \\hat{y}) $\n",
+    "\n",
+    "\n",
+    "$ \\hat{\\alpha} $\n",
+    "\n",
+    "\n",
+    "$ (\\hat{x}, \\hat{y}, \\hat{\\alpha}) $\n",
+    "\n",
+    "\n",
+    "$\n",
+    "\\begin{cases}\n",
+    "\\frac{\\partial}{\\partial x}g(x, y, \\alpha) = 2x - 3\\alpha\\\\\n",
+    "\\frac{\\partial}{\\partial y}g(x, y, \\alpha) = 2 - 2\\alpha\\\\\n",
+    "\\frac{\\partial}{\\partial \\alpha}g(x, y, \\alpha) = -3x - 2y - 1\\\\\n",
+    "\\end{cases}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "$ 2\\hat{x} - 3\\hat{\\alpha} = 2 - 2\\hat{\\alpha} = -3\\hat{x} - 2\\hat{y} - 1 = 0 $\n",
+    "\n",
+    "\n",
+    "$ \\hat{x} = \\frac{3}{2} $\n",
+    "\n",
+    "\n",
+    "$ \\hat{y} = -\\frac{11}{4} $\n",
+    "\n",
+    "\n",
+    "$ \\hat{\\alpha} = 1 $\n",
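+    "\n",
+    "These three values follow from solving the equations above in order (a short worked derivation, added here as a check):\n",
+    "\n",
+    "$ 2 - 2\\hat{\\alpha} = 0 \\Rightarrow \\hat{\\alpha} = 1, \\qquad 2\\hat{x} - 3\\hat{\\alpha} = 0 \\Rightarrow \\hat{x} = \\frac{3}{2}, \\qquad -3\\hat{x} - 2\\hat{y} - 1 = 0 \\Rightarrow \\hat{y} = -\\frac{3\\hat{x} + 1}{2} = -\\frac{11}{4} $\n",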
+    "\n",
+    "\n",
+    "**Equation C-1: Generalized Lagrangian for the hard margin problem**\n",
+    "\n",
+    "$\n",
+    "\\begin{split}\n",
+    "\\mathcal{L}(\\mathbf{w}, b, \\mathbf{\\alpha}) = \\frac{1}{2}\\mathbf{w}^T \\cdot \\mathbf{w} - \\sum\\limits_{i=1}^{m}{\\alpha^{(i)} \\left(t^{(i)}(\\mathbf{w}^T \\cdot \\mathbf{x}^{(i)} + b) - 1\\right)} \\\\\n",
+    "\\text{with } \\alpha^{(i)} \\ge 0 \\quad \\text{for } i = 1, 2, \\dots, m\n",
+    "\\end{split}\n",
+    "$\n",
+    "\n",
+    "**From the text:**\n",
+    "\n",
+    "$ (\\hat{\\mathbf{w}}, \\hat{b}, \\hat{\\mathbf{\\alpha}}) $\n",
+    "\n",
+    "\n",
+    "$ t^{(i)}((\\hat{\\mathbf{w}})^T \\cdot \\mathbf{x}^{(i)} + \\hat{b}) \\ge 1 \\quad \\text{for } i = 1, 2, \\dots, m $\n",
+    "\n",
+    "\n",
+    "$ {\\hat{\\alpha}}^{(i)} \\ge 0 \\quad \\text{for } i = 1, 2, \\dots, m $\n",
+    "\n",
+    "\n",
+    "$ {\\hat{\\alpha}}^{(i)} = 0 $\n",
+    "\n",
+    "\n",
+    "$ t^{(i)}((\\hat{\\mathbf{w}})^T \\cdot \\mathbf{x}^{(i)} + \\hat{b}) = 1 $\n",
+    "\n",
+    "\n",
+    "$ {\\hat{\\alpha}}^{(i)} = 0 $\n",
+    "\n",
+    "\n",
+    "**Equation C-2: Partial derivatives of the generalized Lagrangian**\n",
+    "\n",
+    "$\n",
+    "\\begin{split}\n",
+    "\\nabla_{\\mathbf{w}}\\mathcal{L}(\\mathbf{w}, b, \\mathbf{\\alpha}) = \\mathbf{w} - \\sum\\limits_{i=1}^{m}\\alpha^{(i)}t^{(i)}\\mathbf{x}^{(i)}\\\\\n",
+    "\\dfrac{\\partial}{\\partial b}\\mathcal{L}(\\mathbf{w}, b, \\mathbf{\\alpha}) = -\\sum\\limits_{i=1}^{m}\\alpha^{(i)}t^{(i)}\n",
+    "\\end{split}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**Equation C-3: Properties of the stationary points**\n",
+    "\n",
+    "$\n",
+    "\\begin{split}\n",
+    "\\hat{\\mathbf{w}} = \\sum_{i=1}^{m}{\\hat{\\alpha}}^{(i)}t^{(i)}\\mathbf{x}^{(i)}\\\\\n",
+    "\\sum_{i=1}^{m}{\\hat{\\alpha}}^{(i)}t^{(i)} = 0\n",
+    "\\end{split}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**Equation C-4: Dual form of the SVM problem**\n",
+    "\n",
+    "$\n",
+    "\\begin{split}\n",
+    "\\mathcal{L}(\\hat{\\mathbf{w}}, \\hat{b}, \\mathbf{\\alpha}) = \\dfrac{1}{2}\\sum\\limits_{i=1}^{m}{\n",
+    "  \\sum\\limits_{j=1}^{m}{\n",
+    "  \\alpha^{(i)} \\alpha^{(j)} t^{(i)} t^{(j)} {\\mathbf{x}^{(i)}}^T \\cdot \\mathbf{x}^{(j)}\n",
+    "  }\n",
+    "} \\quad - \\quad \\sum\\limits_{i=1}^{m}{\\alpha^{(i)}}\\\\\n",
+    "\\text{with } \\alpha^{(i)} \\ge 0 \\quad \\text{for } i = 1, 2, \\dots, m\n",
+    "\\end{split}\n",
+    "$\n",
+    "\n",
+    "**In the text:**\n",
+    "\n",
+    "$ \\hat{\\mathbf{\\alpha}} $\n",
+    "\n",
+    "\n",
+    "$ {\\hat{\\alpha}}^{(i)} \\ge 0 $\n",
+    "\n",
+    "\n",
+    "$ \\hat{\\mathbf{\\alpha}} $\n",
+    "\n",
+    "\n",
+    "$ \\hat{\\mathbf{w}} $\n",
+    "\n",
+    "\n",
+    "$ \\hat{b} $\n",
+    "\n",
+    "\n",
+    "$ \\hat{b} = t^{(k)} - {\\hat{\\mathbf{w}}}^T \\cdot \\mathbf{x}^{(k)} $\n",
+    "\n",
+    "\n",
+    "**Equation C-5: Bias term estimation using the dual form**\n",
+    "\n",
+    "$\n",
+    "\\hat{b} = \\dfrac{1}{n_s}\\sum\\limits_{\\scriptstyle i=1 \\atop {\\scriptstyle {\\hat{\\alpha}}^{(i)} > 0}}^{m}{\\left[t^{(i)} - {\\hat{\\mathbf{w}}}^T \\cdot \\mathbf{x}^{(i)}\\right]}\n",
+    "$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "colab_type": "text",
+    "id": "r57A4yyKn1Y8"
+   },
+   "source": [
+    "# Appendix D"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "colab_type": "text",
+    "id": "KwAsMn0Rn1Y9"
+   },
+   "source": [
+    "**Equation D-1: Partial derivatives of $f(x,y)$**\n",
+    "\n",
+    "$\n",
+    "\\begin{split}\n",
+    "\\dfrac{\\partial f}{\\partial x} & = \\dfrac{\\partial(x^2y)}{\\partial x} + \\dfrac{\\partial y}{\\partial x} + \\dfrac{\\partial 2}{\\partial x} = y \\dfrac{\\partial(x^2)}{\\partial x} + 0 + 0 = 2xy \\\\\n",
+    "\\dfrac{\\partial f}{\\partial y} & = \\dfrac{\\partial(x^2y)}{\\partial y} + \\dfrac{\\partial y}{\\partial y} + \\dfrac{\\partial 2}{\\partial y} = x^2 + 1 + 0 = x^2 + 1 \\\\\n",
+    "\\end{split}\n",
+    "$\n",
+    "\n",
+    "**In the text:**\n",
+    "\n",
+    "$ \\frac{\\partial g}{\\partial x} = 0 + (0 \\times x + y \\times 1) = y $\n",
+    "\n",
+    "\n",
+    "$ \\frac{\\partial x}{\\partial x} = 1 $\n",
+    "\n",
+    "\n",
+    "$ \\frac{\\partial y}{\\partial x} = 0 $\n",
+    "\n",
+    "\n",
+    "$ \\frac{\\partial (u \\times v)}{\\partial x} = \\frac{\\partial v}{\\partial x} \\times u + \\frac{\\partial u}{\\partial x} \\times v  $\n",
+    "\n",
+    "\n",
+    "$ \\frac{\\partial g}{\\partial x} = 0 + (0 \\times x + y \\times 1)  $\n",
+    "\n",
+    "\n",
+    "$ \\frac{\\partial g}{\\partial x} = y $\n",
+    "\n",
+    "\n",
+    "**Equation D-2: Derivative of a function $h(x)$ at point $x_0$**\n",
+    "\n",
+    "$\n",
+    "\\begin{split}\n",
+    "h'(x_0) & = \\underset{\\textstyle x \\to x_0}{\\lim}\\dfrac{h(x) - h(x_0)}{x - x_0}\\\\\n",
+    "      & = \\underset{\\textstyle \\epsilon \\to 0}{\\lim}\\dfrac{h(x_0 + \\epsilon) - h(x_0)}{\\epsilon}\n",
+    "\\end{split}\n",
+    "$\n",
+    "\n",
+    "\n",
+    "**Equation D-3: Operations on dual numbers**\n",
+    "\n",
+    "$\n",
+    "\\begin{split}\n",
+    "&\\lambda(a + b\\epsilon) = \\lambda a + \\lambda b \\epsilon\\\\\n",
+    "&(a + b\\epsilon) + (c + d\\epsilon) = (a + c) + (b + d)\\epsilon \\\\\n",
+    "&(a + b\\epsilon) \\times (c + d\\epsilon) = ac + (ad + bc)\\epsilon + (bd)\\epsilon^2 = ac + (ad + bc)\\epsilon\\\\\n",
+    "\\end{split}\n",
+    "$\n",
+    "\n",
+    "**In the text:**\n",
+    "\n",
+    "$ \\frac{\\partial f}{\\partial x}(3, 4) $\n",
+    "\n",
+    "\n",
+    "$ \\frac{\\partial f}{\\partial y}(3, 4) $\n",
+    "\n",
+    "\n",
+    "**Equation D-4: Chain rule**\n",
+    "\n",
+    "$\n",
+    "\\dfrac{\\partial f}{\\partial x} = \\dfrac{\\partial f}{\\partial n_i} \\times \\dfrac{\\partial n_i}{\\partial x}\n",
+    "$\n",
+    "\n",
+    "**In the text:**\n",
+    "\n",
+    "$ \\frac{\\partial f}{\\partial n_7} = 1 $\n",
+    "\n",
+    "\n",
+    "$ \\frac{\\partial f}{\\partial n_5} = \\frac{\\partial f}{\\partial n_7} \\times \\frac{\\partial n_7}{\\partial n_5} $\n",
+    "\n",
+    "\n",
+    "$ \\frac{\\partial f}{\\partial n_7} = 1 $\n",
+    "\n",
+    "\n",
+    "$ \\frac{\\partial n_7}{\\partial n_5} $\n",
+    "\n",
+    "\n",
+    "$ \\frac{\\partial n_7}{\\partial n_5} = 1 $\n",
+    "\n",
+    "\n",
+    "$ \\frac{\\partial f}{\\partial n_5} = 1 \\times 1 = 1 $\n",
+    "\n",
+    "\n",
+    "$ \\frac{\\partial f}{\\partial n_4} = \\frac{\\partial f}{\\partial n_5} \\times \\frac{\\partial n_5}{\\partial n_4} $\n",
+    "\n",
+    "\n",
+    "$ \\frac{\\partial n_5}{\\partial n_4} = n_2 $\n",
+    "\n",
+    "\n",
+    "$ \\frac{\\partial f}{\\partial n_4} = 1 \\times n_2 = 4 $\n",
+    "\n",
+    "\n",
+    "$ \\frac{\\partial f}{\\partial x} = 24 $\n",
+    "\n",
+    "\n",
+    "$ \\frac{\\partial f}{\\partial y} = 10 $"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "colab_type": "text",
+    "id": "Nn7LHmpCn1Y-"
+   },
+   "source": [
+    "# Appendix E"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "colab_type": "text",
+    "id": "XOtBQA4-n1Y-"
+   },
+   "source": [
+    "**Equation E-1: Probability that the $i$-th neuron outputs 1**\n",
+    "\n",
+    "$\n",
+    "p\\left(s_i^{(\\text{next step})}\\quad = 1\\right) \\, = \\, \\sigma\\left(\\frac{\\textstyle \\sum\\limits_{j = 1}^N{w_{i,j}s_j + b_i}}{\\textstyle T}\\right)\n",
+    "$\n",
+    "\n",
+    "**In the text:**\n",
+    "\n",
+    "$ \\mathbf{x}' $\n",
+    "\n",
+    "\n",
+    "$ \\mathbf{h}' $\n",
+    "\n",
+    "\n",
+    "**Equation E-2: CD weight update**\n",
+    "\n",
+    "$\n",
+    "w_{i,j} \\gets w_{i,j} + \\eta(\\mathbf{x} \\cdot \\mathbf{h}^T - \\mathbf{x}' \\cdot \\mathbf{h}'^T)\n",
+    "$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "colab_type": "text",
+    "id": "el71w1NKn1Y-"
+   },
+   "source": [
+    "# Glossary\n",
+    "\n",
+    "In the text:\n",
+    "\n",
+    "$\\ell _1$\n",
+    "\n",
+    "\n",
+    "$\\ell _2$\n",
+    "\n",
+    "\n",
+    "$\\ell _k$\n",
+    "\n",
+    "\n",
+    "$ \\chi^2 $\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "colab_type": "text",
+    "id": "2Bo7NhHLn1ZA"
+   },
+   "source": [
+    "If all these equations make your eyes hurt, let's finish with the single most beautiful one. It is not $E = mc^2$. It is Euler's identity:"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "colab_type": "text",
+    "id": "s31L5v1mn1ZB"
+   },
+   "source": [
+    "$e^{i\\pi}+1=0$"
+   ]
+  }
+ ],
+ "metadata": {
+  "colab": {
+   "name": "book_equations.ipynb",
+   "provenance": [],
+   "version": "0.3.2"
+  },
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.5.5"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 1
+}
diff --git a/environment.yml b/environment.yml
index 0b432d8..53b5a29 100644
--- a/environment.yml
+++ b/environment.yml
@@ -8,7 +8,8 @@ dependencies:
   - pillow
   - nltk
   - pip:
-    - tensorflow-gpu
+    - tensorflow
+    - gym[all]
     - graphviz
     - watermark
     - urlextract