hands-on/tools_pandas.ipynb
2016-02-20 21:37:07 +01:00

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tools - pandas\n",
"*The `pandas` library provides high-performance, easy-to-use data structures and data analysis tools. The main data structure is the `DataFrame`, which you can think of as a spreadsheet (including column names and row labels).*\n",
"\n",
"**Prerequisites:**\n",
"* NumPy if you are not familiar with NumPy, we recommend that you go through the [NumPy tutorial](tools_numpy.ipynb) now.\n",
"\n",
"## Setup\n",
"First, let's make sure this notebook works well in both python 2 and 3:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from __future__ import division\n",
"from __future__ import print_function\n",
"from __future__ import unicode_literals"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's import `pandas`. People usually import it as `pd`:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## `Series` objects\n",
"The `pandas` library contains these useful data structures:\n",
"* `Series` objects, that we will discuss now. A `Series` object is similar to a column in a spreadsheet (with a column name and row labels).\n",
"* `DataFrame` objects. You can see this as a full spreadsheet (with column names and row labels).\n",
"* `Panel` objects. You can see a `Panel` a a dictionary of `DataFrame`s (less used). These are less used, so we will not discuss them here.\n",
"\n",
"### Creating a `Series`\n",
"Let's start by creating our first `Series` object!"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0 2\n",
"1 -1\n",
"2 3\n",
"3 5\n",
"dtype: int64"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s = pd.Series([2,-1,3,5])\n",
"s"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Similar to a 1D `ndarray`\n",
"`Series` objects behave much like one-dimensional NumPy `ndarray`s, and you can often pass them as parameters to NumPy functions:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0 7.389056\n",
"1 0.367879\n",
"2 20.085537\n",
"3 148.413159\n",
"dtype: float64"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy as np\n",
"np.exp(s)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Arithmetic operations on `Series` are also possible, and they apply *elementwise*, just like for `ndarray`s:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0 1002\n",
"1 1999\n",
"2 3003\n",
"3 4005\n",
"dtype: int64"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s + pd.Series([1000,2000,3000,4000])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Similar to NumPy, if you add a single number to a `Series`, that number is added to all items in the `Series`:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0 1002\n",
"1 999\n",
"2 1003\n",
"3 1005\n",
"dtype: int64"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s + 1000"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same is true for all binary operations such as `*` or `/`, and even conditional operations:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0 False\n",
"1 True\n",
"2 False\n",
"3 False\n",
"dtype: bool"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s < 0"
]
},
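{
"cell_type": "markdown",
"metadata": {},
"source": [
"A boolean `Series` such as the one above can also serve as a mask to select items. Here is a minimal sketch (the names `mask_demo` and `negatives` are ours, chosen only for illustration):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"mask_demo = pd.Series([2,-1,3,5])\n",
"# keep only the items for which the condition is True\n",
"negatives = mask_demo[mask_demo < 0]\n",
"negatives"
]
},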
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Index labels\n",
"Each item in a `Series` object has a unique identifier called the *index label*. By default, it is simply the index of the item in the `Series` but you can also set the index labels manually:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"alice 68\n",
"bob 83\n",
"charles 112\n",
"darwin 68\n",
"dtype: int64"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s2 = pd.Series([68, 83, 112, 68], index=[\"alice\", \"bob\", \"charles\", \"darwin\"])\n",
"s2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can then use the `Series` just like a `dict`:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"83"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s2[\"bob\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can still access the items by location, like in a regular array:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"83"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s2[1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Slicing a `Series` also slices the index labels:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"bob 83\n",
"charles 112\n",
"dtype: int64"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s2[1:3]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This can lead to unexpected results when using the default labels, so be careful:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0 1000\n",
"1 1001\n",
"2 1002\n",
"3 1003\n",
"dtype: int64"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"surprise = pd.Series([1000, 1001, 1002, 1003])\n",
"surprise"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"2 1002\n",
"3 1003\n",
"dtype: int64"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"surprise_slice = surprise[2:]\n",
"surprise_slice"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Oh look! The first element has index label `2`. The element with index label `0` is absent from the slice:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Key error: 0\n"
]
}
],
"source": [
"try:\n",
" surprise_slice[0]\n",
"except KeyError as e:\n",
" print(\"Key error:\", e)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But you can access elements by location using the `iloc` attribute:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1002"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"surprise_slice.iloc[0]"
]
},
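{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `loc` attribute is the label-based counterpart of `iloc`. Using one or the other makes the intent explicit. Here is a small sketch on a fresh `Series` (named `loc_demo` only for illustration):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"loc_demo = pd.Series([1000, 1001, 1002, 1003])[2:]\n",
"print(loc_demo.loc[2])   # access by label: the item labeled 2\n",
"print(loc_demo.iloc[0])  # access by position: the first item, which is the same one"
]
},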
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Init from `dict`\n",
"You can create a `Series` object from a `dict`. The keys will be used as index labels:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"alice 68\n",
"bob 83\n",
"colin 86\n",
"darwin 68\n",
"dtype: int64"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"weights = {\"alice\": 68, \"bob\": 83, \"colin\": 86, \"darwin\": 68}\n",
"s3 = pd.Series(weights)\n",
"s3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can control which elements you want to include in the `Series` and in what order by passing a second argument to the constructor with the list of desired index labels:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"colin 86\n",
"alice 68\n",
"dtype: int64"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s4 = pd.Series(weights, [\"colin\", \"alice\"])\n",
"s4"
]
},
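{
"cell_type": "markdown",
"metadata": {},
"source": [
"If a requested label is not a key of the `dict`, the corresponding item is set to `NaN`. A minimal sketch (the `dict` `ages_demo` and the label `\"eugene\"` are made up for illustration):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"ages_demo = {\"alice\": 30, \"bob\": 31}\n",
"# \"eugene\" is not a key of ages_demo, so its value is missing (NaN)\n",
"pd.Series(ages_demo, index=[\"bob\", \"eugene\"])"
]
},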
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Automatic alignment\n",
"When an operation involves multiple `Series` objects, `pandas` automatically aligns items by matching index labels."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Index([u'alice', u'bob', u'charles', u'darwin'], dtype='object')\n",
"Index([u'alice', u'bob', u'colin', u'darwin'], dtype='object')\n"
]
},
{
"data": {
"text/plain": [
"alice 136\n",
"bob 166\n",
"charles NaN\n",
"colin NaN\n",
"darwin 136\n",
"dtype: float64"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print(s2.keys())\n",
"print(s3.keys())\n",
"s2 + s3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The resulting `Series` contains the union of index labels from `s2` and `s3`. Since `\"colin\"` is missing from `s2` and `\"charles\"` is missing from `s3`, these items have a `NaN` result value (ie. Not-a-Number means *missing*).\n",
"\n",
"Automatic alignment is very handy when working with data that may come from various sources with varying structure and missing items. But if you forget to set the right index labels, you can have surprising results:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"s2 = [ 68 83 112 68]\n",
"s5 = [1000 1000 1000 1000]\n",
"s2 + s5 =\n"
]
},
{
"data": {
"text/plain": [
"0 NaN\n",
"1 NaN\n",
"2 NaN\n",
"3 NaN\n",
"alice NaN\n",
"bob NaN\n",
"charles NaN\n",
"darwin NaN\n",
"dtype: float64"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s5 = pd.Series([1000,1000,1000,1000])\n",
"print(\"s2 =\", s2.values)\n",
"print(\"s5 =\", s5.values)\n",
"print(\"s2 + s5 =\")\n",
"s2 + s5"
]
},
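{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to avoid this surprise is to give both `Series` the same index labels. Another is to add a plain NumPy array, which has no labels and is therefore combined by position. Here is a sketch with made-up names (`left_demo`, `right_demo`):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"left_demo = pd.Series([68, 83], index=[\"alice\", \"bob\"])\n",
"# reuse the labels of left_demo so that the items line up\n",
"right_demo = pd.Series([1000, 1000], index=left_demo.index)\n",
"print(left_demo + right_demo)\n",
"# an ndarray carries no labels, so this addition is positional\n",
"print(left_demo + right_demo.values)"
]
},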
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Init with a scalar\n",
"You can also initialize a `Series` object using a scalar and a list of index labels: all items will be set to the scalar."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"life 42\n",
"universe 42\n",
"everything 42\n",
"dtype: int64"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"meaning = pd.Series(42, [\"life\", \"universe\", \"everything\"])\n",
"meaning"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `Series` name\n",
"A `Series` can have a `name`:"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"bob 83\n",
"alice 68\n",
"Name: weights, dtype: int64"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s6 = pd.Series([83, 68], index=[\"bob\", \"alice\"], name=\"weights\")\n",
"s6"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Plotting a `Series`\n",
"Pandas makes it easy to plot `Series` data using matplotlib (for more details on matplotlib, check out the [matplotlib tutorial](tools_matplotlib.ipynb)). Just import matplotlib and call the `plot` method:"
]
},
{
"cell_type": "code",
"execution_count": 85,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAW0AAAEACAYAAAB4ayemAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAH+BJREFUeJzt3XmUVNW5/vHvS8TgSBAHRBRBxUQjEZaKUdQSrgPOGgfi\nBERJ/DlHYxxzwZssTBxpjTEakQiRJEq8gVxNAIciKhqHYEOUC46gaDAIEbmI0PT7+2NXY9N0d42n\nTtWp57MWy+rqU+fsEvrtXfvsZ29zd0REpDp0iLsBIiKSOxVtEZEqoqItIlJFVLRFRKqIiraISBVR\n0RYRqSJZi7aZ9TCzp8zsNTOba2aXZp7vYmbTzWy+mU0zs87RN1dEpLZZtnnaZtYN6Obur5rZlsAr\nwInACOBjd7/ZzK4Gurj7NZG3WESkhmXtabv7P9391czjlcA8oAehcD+YOexB4KSoGikiIkHWnvYG\nB5vtCqSBrwPvuXuXZt9b5u7blLh9IiLSTM43IjNDI5OByzI97pbVXnl4EZGIbZLLQWa2CaFgT3T3\nKZmnl5jZDu6+JDPu/VEbr1UxFxEpgLtby+dy7Wk/ALzu7nXNnpsKDM88HgZMafmiZheuqT+jRo2K\nvQ16z3rPer/V/Z7bkrWnbWYHA2cBc81sNmEY5DrgZ8DDZvYdYCFwerZziYhIcbIWbXd/DvhSG9/+\nj9I2R0RE2qNEZARSqVTcTSg7vefkq7X3C5X5nvOa8lfQBcw86muIiCSNmeFF3IgUEZEKoKItIlJF\nVLRFRKqIiraISBVR0RYRqSIq2iIiVURFW0Skiqhoi4i0Yvp0GDIEKi1monCNiEgLS5ZAv36wejU8\n+WR4XG5thWtUtEVEmmlshGOOgf33D73sVavg9tvL3w4VbRGRHNx2G/zhD/DXv8Jbb8Fhh8H778Mm\nOe0+UDqKsYuIZPHyy/Czn8GkSaFI77kn7LILzJgRd8u+oKItIgJ8+il8+9tw992w665fPH/uuTBx\nYmzN2oiGR0RECMW5Uye4774Nn1+6FHbbLQyRbLVV+dqj4RERkTZMnBiGRsaO3fh7224bxrX/8Ify\nt6s1KtoiUtPefBOuuAJ++1vYfPPWjznnnMoZItHwiIjUrDVr4OCDYdgwuPjito9bvRq6d4c5c6BH\nj/K0reDhETMbZ2ZLzGxOs+f2N7MXzWx25r/7lbrBIiJRu/76UIwvuqj94zp1gm99Cx56qDztak8u\nwyPjgaNaPHczcIO79wNGAbeUumEiIlGaNg1+9zt44AGwjfqzG2saIol74CBr0Xb3Z4HlLZ7+EOic\nefwVYHGJ2yUiEpklS2DEiFCEu3bN7TUDB8LKlfDqq9G2LZtCMz7XAM+Z2W2AAQeVrkkiItFpbAxj\n2OedB/lstt6hA5x9dij0caxFsr4dBb5uHHCJu+8CfB94oHRNEhGJzh13wIoVMGpU/q8955yQlmxo\nKH27clVoT3uAux8B4O6TzWxcewePHj16/eNUKkUqn19vIiIl0hRTf/HFwtYSaR5rHzKktG1Lp9Ok\n0+msx+U05c/MdgX+5O77ZL5+BbjC3Wea2WDgp+6+fxuv1ZQ/EYndp59C//4wZgycdlrh5/n5z2HW\nrNDjjlLBq/yZ2SQgBXQFlhBmi8wBfgFsCqwGLnT32W28XkVbRGLXVkw9X+WKtbdVtLN+QHD3M9v4\n1oCiWyUiUgZNMfWXXy7+XM1j7cOHF3++fCnGLiKJlktMPV9xxtoVYxeRxMo1pp6vcsTatcqfiNSc\nXGPq+Yoz1q6iLSKJlG9MPV9xxdpVtEUkcQqJqecrrli7iraIJEqhMfV8NY+1l5NuRIpIotx6Kzz6\naNhNPeod1OfPj263dt2IFJHEe+kluPnmL3ZTj1ocu7WraItITir9A/Onn8KZZ268m3rUyr1bu4q2\niGT14YfQuTN85zvw+utxt6Z1F10Ehx9e3Loi
hRg6FB57LPzSKAcVbRHJasYMOOgg6N0bBg2C448P\nY8aV0vtubzf1qJV7t3YVbRHJavp0OOUUuOEGeOcdOO44OP98OPDAUKzWrYuvbW+8UfqYer7KGWvX\n7BERaVdjI+y4I7zwAvTq9cXz69bB1Knhxt/SpXDllWGq3Wabla9ta9aETwDDh5c2pp6vKGLtmj0i\nIgWZOxe23nrDgg3wpS/BySeHtaXHj4c//zncAPyv/4KPPy5P26KKqeernLF2FW0Radf06XDkkW1/\n3yykA6dMgXQaFi2CPfaASy4JQylRiTqmnq9yxdpVtEWkXdmKdnNf+xrcfz+89hpsuSXsvz+ccUZp\n1rFurnlMfdttS3vuQpUr1q6iLSJtWrUqjGUffnh+r9txR7jpptDTHjAg3MQcNCgMoRTbE21sDHOj\no46p56tcsXbdiBSRNk2bBj/5CTzzTHHnWbsWfv/7cNPSHa66Ksxv3nTT/M9Vzph6vubPD79I3nuv\n+LbpRqSI5G3GDDjiiOLP07Fj6IXW14eiO2FC2GfxtttgxYrcz1PumHq+9twTdt4ZnngiumtkLdpm\nNs7MlpjZnBbPX2Jm88xsrpn9NLomikhc8hnPzoUZHHVUKGpTpoSx7l694Oqr4YMP2n9tXDH1fJ17\nbvilFJVcdmMfCKwEJrh738xzKeA64Bh3bzCzbd19aRuv1/CISBX68EPYe2/46KNoe7XvvAN33AG/\n+Q2cdBL84Aew114bH1eq3dSjVqrd2gseHnH3Z4HlLZ7+f8BP3b0hc0yrBVtEqteMGeHmYdTDEL16\nwZ13hmRjr17hpudxx20Yk48zpp6vqGPthY5p9wEONbMXzOxpM9uvlI0SkfiVemgkm65d4Uc/gnff\nDWubnHdeiMnfe2/8MfV8RRlrz2n2iJn1BP7UbHhkLvCUu19mZvsDv3f33m281keNGrX+61QqRaqS\n5umIyEbaiq6X07p1Ydx77NhwE/O7342nHYUoJNaeTqdJp9Prv77xxhtbHR4ptGg/DvzM3Wdmvn4T\nGODuG4VXNaYtUn3q6+HUU8OQhRRm5EjYffdwk7UQxU75s8yfJn8EBmVO3Afo2FrBFpHqVO6hkSSK\nKtaey5S/ScAsoI+ZLTKzEcADQO/MMMkk4NzSNktE4qSiXbyoYu1KRIrIBlatgh12gMWLw+p+Urgb\nbgj/P2+/Pf/XKhEpIjl55hnYd18V7FI455ww66WhoXTnVNEWkQ2UKrou0cTaVbRFZAMazy6tUsfa\nNaYtIuuVK7peSwqNtWtMW0SyKld0vZaUOtauoi0i62loJBqljLVreEREgBBd794dnn8+vuh6UhUS\na9fwiIi0a+7cMOaqgl16pdytXUVbRAANjUStVLF2FW0RAVS0o1aqWLuKtogUvOu65K5Uu7WraIuI\noutlUopYu4q2iCi6XialiLWraIuIxrPL6Nxzixsi0TxtkRqn6Hp55Rpr1zxtEWmVouvlVWysXUVb\npMZpaKT8iom1a3hEpIYpuh6PXGLtGh4RkY0ouh6PYmLtuWzsO87MlpjZnFa+d6WZNZrZNvlfWkTi\npqGR+BQaa8+lpz0eOKrlk2bWAzgCWJjfJUWkUqhox6fQWHvWou3uzwLLW/nWHcBV+V1ORCqFouvx\nKjTWXtCYtpmdALzn7nMLeb0ky9Kl4YaWVBdF1+NXSKw975mZZrYZcB1haGT90+29ZvTo0esfp1Ip\nUqlUvpeVCtXQAN/4Rpjn+8AD0LFj3C2SXCm6Hr/msfZOndKk0+msr8lpyp+Z9QT+5O59zezrwBPA\nKkKx7gEsBg5w949aea2m/CXYtGlwzTWw006ht/3II7DFFnG3SnLRty/cdx8ceGDcLaltP/95mHLZ\nciZJsVP+LPMHd/+Hu3dz997u3gt4H+jXWsGW5Js4EUaMgP/+b9huu9BzW7Ys7lZJNh9+GGLU++0X\nd0tk6FD4
n/+BTz/N7fhcpvxNAmYBfcxskZmNaHGIk2V4RJJp5crwj23o0DAsMn48fPObcOihsHhx\n3K2T9ii6XjnyjbXnMnvkTHfv7u5fdvdd3H18i+/3dnf1rWrQo4+GaUvbbx++7tABbr013BEfOBAW\nLIi3fdI2TfWrLPnE2pWIlIJNnBiWmWzOLIxxX3996D288ko8bZO2NTaGG1+6CVk5jj8eZs8OQ1bZ\nqGhLQRYvDgX5+ONb//7558MvfgFHHw1PPVXetkn7FF2vPPnE2lW0pSCTJsEpp8Bmm7V9zMknh9kk\nQ4cWvgyllJ6GRipTrrF2FW0pyMSJ4R9ZNqkU/OUvcMkl8KtfRd4syYGKdmXKNdauoi15q6+HTz6B\nQw7J7fj+/WHmTLjpJhgzJv8FcqR0PvtM0fVKlWusXUVb8jZxYvjH1SGPfz177AHPPhsiu1dcodh7\nXBRdr2y5xNpVtCUvDQ1hPDuXoZGWuneHv/4VXnoJhg2DtWtL3z5p3/TpmjVSyXLZrV1FW/Ly5JMh\nsv7Vrxb2+i5dQuFYvhxOOimsNCflo/Hsypdtt3YVbclLrjcg27P55iH23rWrYu/lpOh6dWiKtbdF\nRVty1jy2XqyOHeHXvw6LFR12mGLv5aDoenXYdttwz6gtKtqSs5ax9WI1xd7POkux93LQ0Ej1uPvu\ntr+noi05ay22XizF3stD0fXkUNGWnGSLrRerKfY+ZAg8/XQ016hliq4nh4q25CSX2HqxTj4ZHn4Y\nzjgjDMVI6WhoJDlUtCUnpZg1koum2PvFFyv2Xkoq2smR03ZjRV1A241Vvfp6OOEEeOed/FKQxXjj\njVBkRo6Ea68NY99SmM8+CzePFy9WErKaFLvdmNSwQmLrxdpjD3juOcXeS0HR9WRR0ZZ2FRNbL5Zi\n76Wh6HqyqGhLu4qNrRdLsffiaTw7WXLZ2HecmS0xsznNnrvZzOaZ2atm9gcz0wevhCrXDcj2KPZe\nOEXXkyeXnvZ44KgWz00H9nb3fYE3gGtL3TCJXylj68Vqir0PGKDYez4UXU+eXHZjfxZY3uK5J9y9\n6dbQC0CPCNomMSt1bL1YHTrAbbd9EXt/4424W1T5ZszQ0EjSlGJM+zvAn0twHqkwUcTWi9Uy9v78\n83G3qHI1NoairZuQyVLUhyYzux5Y6+6T2jtu9OjR6x+nUilSqVQxl5UyaIqtT50ad0tad/75YTW0\nU0+FvfeGH/4QBg/WfO7mFF2vLul0mnQ6nfW4nMI1ZtYT+JO792323HBgJDDI3T9v57UK11ShW26B\n+fPh/vvjbkn7Pv88TEm89VbYdFO46io4/XSN4UL4O3z33fZXjJPKVWy4xjJ/mk52NHAVcEJ7BVuq\nVyXMGsnFl78MI0aEXuWPfwz33gu77w51deFGai3TVL9kytrTNrNJQAroCiwBRgHXAZsCH2cOe8Hd\nL2zj9eppV5k4Yuul9Le/hV5mOg0XXACXXAI77BB3q8pL0fXq11ZPO+uHSHc/s5Wnx5ekVVKR4oit\nl9KAATB5Mrz5Jtx+ewgGnXYaXHll2Di1Fii6nlxV+mMpUYkztl5qu+8e1uiePx923BEOOSQs/zpr\nVtwti56i68mloi0biDu2HoXtt4cbbwzDPYMHh08RAwfClCnJXYhK49nJpaVZZQNnnw0HHACXXhp3\nS6LT0BCCQ7fcAp9+Cj/4QXjfnTrF3bLS+PDDMA3yo480i6aaaWlWyaqSYutR2mSTMC3wxRfhl78M\nBbxXLxgzJixMVe0UXU82FW1Zr9Ji61EzCzvlPP54GE6YPx922w2+/31YtCju1hVO0fVkU9GW9Sox\ntl4u++wDDz4Ic+aEHmq/fmHIpL4+7pblR9H15FPRFiD63darRY8eYaz77behb1845pjQa50xA6rh\n1oyi68mnoi1AeXZbryadO4f1TN5+G779bbj8cujfP/x/amiIu3Vt06yR5F
PRFqB6Yuvl1jwm/5Of\nwH33hfnfd94J69bF3bqNqWgnn6b8SdXH1svtxRfDNMEddoDf/CYU9kqg6HqyaMqftKnaY+vldsAB\noUfrDsceG+Z6VwJF12uDfkxr3Lp1yYmtl1OnTvD734cpgoMGwb/+FXeLFF2vFSraNS6JsfVy+dKX\nQjjnqKPCuiYLF8bbHo1n1wZlpmrchAnqZRfDLNyg3G67ULj//OcQIS837bpeO9TTrmG1Elsvh8su\ng5tuCgtSvfBC+a+v6HrtUNGuYbUWW4/aWWfB+PFhJs5f/lLeayu6XjtUtGuY5maX3pAh8Mc/wrBh\n8Nvflueaiq7XFn2YqlGVvtt6NTvooHCDd8gQWLo0bHcWJUXXa4uKdo1SbD1aX/96mDd95JFhOuCN\nN4abllHQrJHaknV4xMzGmdkSM5vT7LkuZjbdzOab2TQz6xxtM6XUNDQSvV13hWefhccegwsvjC72\nrqJdW3IZ0x4PHNXiuWuAJ9x9T+Ap4NpSN0yiU18Pn3wSpqhJtLbfHp5+GhYsCAtPff55ac//2Wdh\ntsrhh5f2vFK5shZtd38WaLmfx4nAg5nHDwInlbhdEiHF1str661Db7uxsfSxd0XXa0+hP7bbu/sS\nAHf/J6BJY4QfyiefDHvzVSrF1uMRVexd0fXaU6obke0u4zd69Oj1j1OpFKlUqkSXrQyrV4fV3m69\nNSwi1NAQfph22y3ulm1MsfX4NMXef/SjMDQ1fTrssktx55w+PSwXK9UvnU6TTqezH+juWf8APYE5\nzb6eB+yQedwNmNfOaz2pli1zHzPGvVs39yFD3J96yr2x0f2Xv3Tv3t199uy4W7ixs85yr6uLuxUy\ndqz7zju7v/Za4ef44AP3Ll3c164tXbukcmRq50Y1NdfhEcv8aTIVGJ55PAyYkuN5EmHRorD56267\nwf/+b+jtPP54uBlkBt/7HtTVhTv6M2fG3dovKLZeOZpi74MGFR57V3S9NuUy5W8SMAvoY2aLzGwE\n8FPgCDObDwzOfJ149fXhBt6++4aPuvX1YTPYffbZ+NhTTw2JuNNOgykV8itNsfXKUmzsXdH12qSd\na7JwD+PAt9wC//hH6CF997vwla/k9vqXXw6b5Y4ZE7atitMRR8D558MZZ8TbDtnQrFlw8skwdmyY\nFpiLxkbo3h2ef15JyKRqa+cafbBqQ0MDPPII3HwzrFkTtpc688z8t5bab78wRNKUjPvhD6NpbzaK\nrVeuQmLviq7XLhXtFlauhHHj4I47oGdP+PGP4ZhjipvT3KcPPPfcF4X75pujizS3RbH1ypZv7F0p\nyNqleEXGkiVwww2h5/LMM2FO7cyZcNxxpQmh7LRTOO9zz4VhkoaG4s+ZD8XWK18+sXcV7dpV80V7\nwYIw2+OrX4WPPw7ji5Mnw4ABpb/WNtuEm0cffRR6vZ99VvprtEax9eqRS+xd0fXaVrNF+/nnw82f\ngw+Gbt1g/ny45x7YY49or7vFFmE2yVZbhb0F//3vaK8Hiq1Xm+ax9+OO2zj2ruh6baupH+PGxlAw\nBw4M060GD4Z33w3jh+WcBtexYyik/frBYYeF/f2ioth6dWqKvffuvXHsXdH12lYTRXv1arj/fthr\nr3Bj8ZJLwsfPiy8OPd84dOgQpnidfnr4JfLWW9FcR7H16tVyt/dFi8LzGs+ubYmePbJ8efhHf+ed\n4ePkPfdAKlX+mRttMYPrr4dtt4VDDw0fiffdt7TX0G7r1a35bu8DB4YwjnZdr22JLNpr18K118ID\nD4QxwWnToG/fuFvVtu99D7p2Db2nRx4JQyal0BRbv/320pxP4nPZZeGX+7HHhn/Tiq7XrkT+1T/8\ncLhZU18PO+8cd2tyc+qp0KVLiL3/6ldw4onFn1Ox9WQ566ww1LXllnG3ROKUuKLtHsaK//M/q6dg\nNxk8OCw8dfzxsGxZ8bH3iRNDbF2SI2
GrGksBEle0Z80KY9nHHht3SwpTqti7YusiyZS4ol1XF8b/\nqnlOcili74qtiyRTolb5W7gQ+vcPc6+32qosl4zUsmXhplOfPmHKYj43n/r2hbvuKt1NTREpr7ZW\n+avi/ujG7r4bhg1LRsGGwmPviq2LJFdiivb//V+Y4pfLspbVpHns/cgjc4u9K7YuklyJ+bGeMCEE\nVJK4vnBT7L1//+yxd8XWRZItEUW7sfGLG5BJlWvsXbF1kWRLxOyRadPCLIlDD427JdHKJfau2LpI\nshU1e8TMrgXOBtYBc4ER7r6mxTGRzx45+uiw9vCwYZFepqJMnhwWym8ee1+5Enr0CIthKQUpUt1K\nPnvEzHoCI4F+7t6X0GsfWngTC/P66/DqqzC07FeOV2u7vSu2LpJ8xQyPrADWAFuYWSOwOfBBSVqV\nhzvvhAsuyH/D3SRoHnv/+ONQxBVbF0m2YodHRgK3A6uA6e6+0WhqlMMjy5bBbrvBvHlh95latWBB\nmA74ySfwwQdKQYokQVvDIwX3tM2sN/B9oCfwCTDZzM5090ktjx09evT6x6lUilSJVr1pWg2vlgs2\nfBF7r69XwRapVul0mnQ6nfW4gnvaZnY6cIS7j8x8fQ4wwN0vbnFcJD3ttWvDVkxTp4Ztu0REkiSK\nGPt84EAz62RmBgwG5hVxvrw8+mgo2irYIlJLCi7a7l4PTABeAeoBA+4rUbuyqquDyy8v19VERCpD\nVa7y97e/hSl+b74ZNj8VEUmaRK3yV1cXFoZSwRaRWlN1Pe3Fi2GffeCdd6Bz55KdVkSkoiSmp/2L\nX4RlR1WwRaQWVVVPe9Uq2HXXMCd5jz1KckoRkYqUiJ72Qw/BgAEq2CJSu6pmaVb3cAOyri7uloiI\nxKdqetpPPhnWkx40KO6WiIjEp2qK9tixYWca22iER0SkdlTFjcgFC8I60QsXakEkEakNVX0j8q67\nYORIFWwRkYrvaf/732GH9X/8I2xYKyJSC6q2pz1uHBxzjAq2iAhUeE+7oQF23x0efhgOOKDEDRMR\nqWBV2dOeOhW6d1fBFhFpUtFFe+xYrZktItJcxRbtv/8d3n0XTjkl7paIiFSOii3adXVw0UWwSdUE\n7UVEoleRNyL/+U/42tfgrbdgm20iapiISAWL5EakmXU2s0fMbJ6ZvWZmA4o5X5N77gnbialgi4hs\nqKietpn9Gpjp7uPNbBNgc3df0eKYvHraq1eHNbOffjr0tkVEalFbPe2CR4zNbGvgEHcfDuDuDcCK\ndl+Ug9/9DvbdVwVbRKQ1xQyP9AKWmtl4M/u7md1nZkWtDuKuaX4iIu0ppmhvAvQH7nb3/sAq4Jpi\nGjNzJnz+ORx5ZDFnERFJrmIm1L0PvOfuL2e+ngxc3dqBo0ePXv84lUqRSqVaPWFdHVx6KXSo2ImI\nIiLRSKfTpNPprMcVeyNyJjDS3ReY2SjCjcirWxyT043It98OcfWFC2GLLQpukohIIpT8RmTGpcBD\nZtYReBsYUeiJ7roLzjtPBVtEpD0VEa5ZsSKsmT17NuyyS6TNERGpChW9yt+vfw2DB6tgi4hkE3tP\ne9062HNPmDABDjoo0qaIiFSNiu1pP/ZYiKt/85txt0REpPLFXrTr6uCyy8A2+n0iIiItxTo8MmcO\nDBkC77wDm24aaTNERKpKRQ6P1NXBhReqYIuI5Cq2nva//gV9+sCCBbDddpE2QUSk6lRcT/vee+Fb\n31LBFhHJRyw97TVrwprZ06bBPvtEenkRkapUUT3thx+GvfZSwRYRyVfZi7b7F9P8REQkP2Uv2rNm\nwfLlcOyx5b6yiEj1K3vRHjs29LK1ZraISP7KeiNy4ULo1y/8d6utIr2siEhVq4gbkXffDcOHq2CL\niBSqbD3tlSvDNL+XXgprZ4uISNti72lPmACHHqqCLSJSjGK3G8tJYyPceWdIQYqISOHK0tOeNg02\n2y
z0tEVEpHBFF20z62BmfzezqW0dM3YsXH651swWESlWKXralwGvt3dAfT0MHVqCK1WJdDoddxPK\nTu85+Wrt/UJlvueiiraZ9QCOAe5v77gLLoAvf7mYK1WXSvyLjprec/LV2vuFynzPxfa07wCuAtqd\nN3jBBUVeRUREgCKKtpkdCyxx91cBy/xpVbduhV5FRESaKzhcY2ZjgLOBBmAzYCvgUXc/t8Vx0aZ3\nREQSqrVwTUkSkWZ2GHClu59Q9MlERKRNWmtPRKSKRL72iIiIlE5kPW0zO9rM/tfMFpjZ1VFdp1KY\nWQ8ze8rMXjOzuWZ2adxtKpdcAlZJYmadzewRM5uX+fseEHebomZm12be6xwze8jMNo27TaVmZuPM\nbImZzWn2XBczm25m881smpl1jrONEFHRNrMOwM+Bo4C9gW+b2VejuFYFaQCucPe9gW8CF9XAe26S\nNWCVMHXA4+7+NeAbwLyY2xMpM+sJjAT6uXtfwppFSYzLjSfUrOauAZ5w9z2Bp4Bry96qFqLqaR8A\nvOHuC919LfA74MSIrlUR3P2fmemPuPtKwg/yTvG2Knq5BqySwsy2Bg5x9/EA7t7g7itiblbUVgBr\ngC3MbBNgc+CDeJtUeu7+LLC8xdMnAg9mHj8InFTWRrUiqqK9E/Bes6/fpwYKWBMz2xXYF/hbvC0p\ni5wCVgnSC1hqZuMzQ0L3mdlmcTcqSu6+HLgNWAQsBv7t7k/E26qy2d7dl0DomAHbx9wezR4pNTPb\nEpgMXJbpcSdWPgGrBNkE6A/c7e79gVWEj9CJZWa9ge8DPYHuwJZmdma8rYpN7J2TqIr2YmCXZl/3\nyDyXaJmPjpOBie4+Je72lMHBwAlm9jbwW+BwM5sQc5ui9j7wnru/nPl6MqGIJ9l+wHPuvszd1wGP\nAgfF3KZyWWJmOwCYWTfgo5jbE1nRfgnY3cx6Zu4yDwVqYWbBA8Dr7l4Xd0PKwd2vc/dd3L034e/4\nqZaJ2KTJfFR+z8z6ZJ4aTPJvws4HDjSzTmZmhPec1JuvLT8xTgWGZx4PA2LvjEWyc427rzOzi4Hp\nhF8M49w9qX/JAJjZwcBZwFwzm034GHWdu/8l3pZJBC4FHjKzjsDbwIiY2xMpd6/PfIJ6BVgHzAbu\ni7dVpWdmk4AU0NXMFgGjgJ8Cj5jZd4CFwOnxtTBQuEZEpIroRqSISBVR0RYRqSIq2iIiVURFW0Sk\niqhoi4hUERVtEZEqoqItIlJFVLRFRKrI/weTvVXLditpUwAAAABJRU5ErkJggg==\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x11161dfd0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"s7 = pd.Series([4,9,10,8,14,12,11,9,17,16,19,13], name=\"temperature\")\n",
"s7.plot()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are *many* options for plotting your data. It is not necessary to list them all here: if you need a particular type of plot (histograms, pie charts, etc.), just look for it in the excellent [Visualization](http://pandas.pydata.org/pandas-docs/stable/visualization.html) section of pandas' documentation, and look at the example code."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## `DataFrame` objects\n",
"A DataFrame object represents a spreadsheet, with cell values, column names and row index labels. You can think of them as dictionaries of `Series` objects.\n",
"\n",
"### Creating a `DataFrame`\n",
"You can create a DataFrame by passing a dictionary of `Series` objects:"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>birthyear</th>\n",
" <th>children</th>\n",
" <th>hobby</th>\n",
" <th>weight</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>alice</th>\n",
" <td>1985</td>\n",
" <td>NaN</td>\n",
" <td>Biking</td>\n",
" <td>68</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>1984</td>\n",
" <td>3</td>\n",
" <td>Dancing</td>\n",
" <td>83</td>\n",
" </tr>\n",
" <tr>\n",
" <th>charles</th>\n",
" <td>1992</td>\n",
" <td>0</td>\n",
" <td>Reading</td>\n",
" <td>112</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" birthyear children hobby weight\n",
"alice 1985 NaN Biking 68\n",
"bob 1984 3 Dancing 83\n",
"charles 1992 0 Reading 112"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"people_ids = [\"alice\", \"bob\", \"charles\"]\n",
"people_dict = {\n",
" \"weight\": pd.Series([68, 83, 112], index=people_ids),\n",
" \"birthyear\": pd.Series([1985, 1984, 1992], index=people_ids, name=\"year\"),\n",
" \"children\": pd.Series([np.nan, 3, 0], index=people_ids),\n",
" \"hobby\": pd.Series([\"Biking\", \"Dancing\", \"Reading\"], index=people_ids),\n",
"}\n",
"people = pd.DataFrame(people_dict)\n",
"people"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that DataFrames are displayed nicely in Jupyter notebooks! Also, note that `Series` names are ignored (`\"year\"` was dropped)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can access columns pretty much as you would expect. They are returned as `Series` objects:"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"alice 1985\n",
"bob 1984\n",
"charles 1992\n",
"Name: birthyear, dtype: int64"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"people[\"birthyear\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you pass a list of columns and/or index row labels to the `DataFrame` constructor, it will guarantee that these columns and/or rows will exist, in that order, and no other column/row will exist. For example:"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>birthyear</th>\n",
" <th>weight</th>\n",
" <th>height</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>1984</td>\n",
" <td>83</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>alice</th>\n",
" <td>1985</td>\n",
" <td>68</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>eugene</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" birthyear weight height\n",
"bob 1984 83 NaN\n",
"alice 1985 68 NaN\n",
"eugene NaN NaN NaN"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d2 = pd.DataFrame(\n",
" people_dict,\n",
" columns=[\"birthyear\", \"weight\", \"height\"],\n",
" index=[\"bob\", \"alice\", \"eugene\"]\n",
" )\n",
"d2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Another convenient way to create a `DataFrame` is to pass all the values to the constructor as an `ndarray`, and specify the column names and row index labels separately:"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>birthyear</th>\n",
" <th>children</th>\n",
" <th>hobby</th>\n",
" <th>weight</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>alice</th>\n",
" <td>1985</td>\n",
" <td>nan</td>\n",
" <td>Biking</td>\n",
" <td>68</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>1984</td>\n",
" <td>3</td>\n",
" <td>Dancing</td>\n",
" <td>83</td>\n",
" </tr>\n",
" <tr>\n",
" <th>charles</th>\n",
" <td>1992</td>\n",
" <td>0</td>\n",
" <td>Reading</td>\n",
" <td>112</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" birthyear children hobby weight\n",
"alice 1985 nan Biking 68\n",
"bob 1984 3 Dancing 83\n",
"charles 1992 0 Reading 112"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"values = np.array([\n",
" [1985, np.nan, \"Biking\", 68],\n",
" [1984, 3, \"Dancing\", 83],\n",
" [1992, 0, \"Reading\", 112]\n",
" ])\n",
"d3 = pd.DataFrame(\n",
" values,\n",
" columns=[\"birthyear\", \"children\", \"hobby\", \"weight\"],\n",
" index=[\"alice\", \"bob\", \"charles\"]\n",
" )\n",
"d3"
]
},
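{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note the lowercase `nan` in the output: a NumPy array holds a single data type, so mixing numbers and strings in the same array makes NumPy convert every value to a string. If you want to keep the original Python objects, one option is to pass `dtype=object` when building the array. A minimal sketch (the array names `mixed` and `kept` are ours):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"mixed = np.array([[1985, \"Biking\"], [1984, \"Dancing\"]])\n",
"print(mixed.dtype)  # a string dtype: the numbers were converted to text\n",
"\n",
"kept = np.array([[1985, \"Biking\"], [1984, \"Dancing\"]], dtype=object)\n",
"print(type(kept[0, 0]))  # the year is still a Python int"
]
},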
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Instead of an `ndarray`, you can also pass a `DataFrame` object:"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>hobby</th>\n",
" <th>children</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>alice</th>\n",
" <td>Biking</td>\n",
" <td>nan</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>Dancing</td>\n",
" <td>3</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" hobby children\n",
"alice Biking nan\n",
"bob Dancing 3"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d4 = pd.DataFrame(\n",
" d3,\n",
" columns=[\"hobby\", \"children\"],\n",
" index=[\"alice\", \"bob\"]\n",
" )\n",
"d4"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is also possible to create a `DataFrame` with a dictionary (or list) of dictionaries (or list):"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>birthyear</th>\n",
" <th>children</th>\n",
" <th>hobby</th>\n",
" <th>weight</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>alice</th>\n",
" <td>1985</td>\n",
" <td>NaN</td>\n",
" <td>Biking</td>\n",
" <td>68</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>1984</td>\n",
" <td>3</td>\n",
" <td>Dancing</td>\n",
" <td>83</td>\n",
" </tr>\n",
" <tr>\n",
" <th>charles</th>\n",
" <td>1992</td>\n",
" <td>0</td>\n",
" <td>Reading</td>\n",
" <td>112</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" birthyear children hobby weight\n",
"alice 1985 NaN Biking 68\n",
"bob 1984 3 Dancing 83\n",
"charles 1992 0 Reading 112"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"people = pd.DataFrame({\n",
" \"birthyear\": {\"alice\":1985, \"bob\": 1984, \"charles\": 1992},\n",
" \"hobby\": {\"alice\":\"Biking\", \"bob\": \"Dancing\", \"charles\": \"Reading\"},\n",
" \"weight\": {\"alice\":68, \"bob\": 83, \"charles\": 112},\n",
" \"children\": {\"alice\":np.nan, \"bob\": 3, \"charles\": 0}\n",
"})\n",
"people"
]
},
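{
"cell_type": "markdown",
"metadata": {},
"source": [
"The list-of-dictionaries form works in a similar way. Here is a minimal sketch, in which each dictionary describes one row and its keys become the column names. The `rows` and `people_from_rows` names are just illustrative, and the row labels are passed separately through `index`. A key that is missing from a row (`children` for alice) is filled with NaN:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# One dictionary per row; alice has no \"children\" key\n",
"rows = [\n",
"    {\"birthyear\": 1985, \"hobby\": \"Biking\", \"weight\": 68},\n",
"    {\"birthyear\": 1984, \"hobby\": \"Dancing\", \"weight\": 83, \"children\": 3},\n",
"    {\"birthyear\": 1992, \"hobby\": \"Reading\", \"weight\": 112, \"children\": 0},\n",
"]\n",
"# The row labels are not part of the dictionaries, so we pass them as the index\n",
"people_from_rows = pd.DataFrame(rows, index=[\"alice\", \"bob\", \"charles\"])\n",
"people_from_rows"
]
},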
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Multi-indexing\n",
"If all columns are tuples of the same size, then they are understood as a multi-index. The same goes for row index labels. For example:"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr>\n",
" <th></th>\n",
" <th></th>\n",
" <th colspan=\"2\" halign=\"left\">private</th>\n",
" <th colspan=\"2\" halign=\"left\">public</th>\n",
" </tr>\n",
" <tr>\n",
" <th></th>\n",
" <th></th>\n",
" <th>children</th>\n",
" <th>weight</th>\n",
" <th>birthyear</th>\n",
" <th>hobby</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>London</th>\n",
" <th>charles</th>\n",
" <td>0</td>\n",
" <td>112</td>\n",
" <td>1992</td>\n",
" <td>Reading</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">Paris</th>\n",
" <th>alice</th>\n",
" <td>NaN</td>\n",
" <td>68</td>\n",
" <td>1985</td>\n",
" <td>Biking</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>3</td>\n",
" <td>83</td>\n",
" <td>1984</td>\n",
" <td>Dancing</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" private public \n",
" children weight birthyear hobby\n",
"London charles 0 112 1992 Reading\n",
"Paris alice NaN 68 1985 Biking\n",
" bob 3 83 1984 Dancing"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d5 = pd.DataFrame(\n",
" {\n",
" (\"public\", \"birthyear\"):\n",
" {(\"Paris\",\"alice\"):1985, (\"Paris\",\"bob\"): 1984, (\"London\",\"charles\"): 1992},\n",
" (\"public\", \"hobby\"):\n",
" {(\"Paris\",\"alice\"):\"Biking\", (\"Paris\",\"bob\"): \"Dancing\", (\"London\",\"charles\"): \"Reading\"},\n",
" (\"private\", \"weight\"):\n",
" {(\"Paris\",\"alice\"):68, (\"Paris\",\"bob\"): 83, (\"London\",\"charles\"): 112},\n",
" (\"private\", \"children\"):\n",
" {(\"Paris\", \"alice\"):np.nan, (\"Paris\",\"bob\"): 3, (\"London\",\"charles\"): 0}\n",
" }\n",
")\n",
"d5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can now get a `DataFrame` containing all the `\"public\"` columns very simply:"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>birthyear</th>\n",
" <th>hobby</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>London</th>\n",
" <th>charles</th>\n",
" <td>1992</td>\n",
" <td>Reading</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">Paris</th>\n",
" <th>alice</th>\n",
" <td>1985</td>\n",
" <td>Biking</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>1984</td>\n",
" <td>Dancing</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" birthyear hobby\n",
"London charles 1992 Reading\n",
"Paris alice 1985 Biking\n",
" bob 1984 Dancing"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d5[\"public\"]"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"London charles Reading\n",
"Paris alice Biking\n",
" bob Dancing\n",
"Name: (public, hobby), dtype: object"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d5[\"public\", \"hobby\"]  # Same result as d5[\"public\"][\"hobby\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Accessing rows\n",
"Let's go back to the `people` `DataFrame`:"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>birthyear</th>\n",
" <th>children</th>\n",
" <th>hobby</th>\n",
" <th>weight</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>alice</th>\n",
" <td>1985</td>\n",
" <td>NaN</td>\n",
" <td>Biking</td>\n",
" <td>68</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>1984</td>\n",
" <td>3</td>\n",
" <td>Dancing</td>\n",
" <td>83</td>\n",
" </tr>\n",
" <tr>\n",
" <th>charles</th>\n",
" <td>1992</td>\n",
" <td>0</td>\n",
" <td>Reading</td>\n",
" <td>112</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" birthyear children hobby weight\n",
"alice 1985 NaN Biking 68\n",
"bob 1984 3 Dancing 83\n",
"charles 1992 0 Reading 112"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"people"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `loc` attribute lets you access rows instead of columns. The result is a `Series` object in which the `DataFrame`'s column names are mapped to row index labels:"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"birthyear 1992\n",
"children 0\n",
"hobby Reading\n",
"weight 112\n",
"Name: charles, dtype: object"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"people.loc[\"charles\"]"
]
},
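{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `loc` attribute also accepts a row label together with a column label, or lists of labels on both axes. The sketch below uses a small stand-alone `DataFrame` (the `demo`, `single_value` and `subset` names are just illustrative) so that it does not depend on the current state of `people`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"demo = pd.DataFrame(\n",
"    {\"hobby\": [\"Biking\", \"Dancing\", \"Reading\"], \"weight\": [68, 83, 112]},\n",
"    index=[\"alice\", \"bob\", \"charles\"],\n",
")\n",
"# A row label plus a column label selects a single cell\n",
"single_value = demo.loc[\"charles\", \"hobby\"]\n",
"# Lists of labels on both axes select a smaller DataFrame\n",
"subset = demo.loc[[\"alice\", \"charles\"], [\"weight\"]]\n",
"print(single_value)\n",
"subset"
]
},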
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also access rows by location using the `iloc` attribute:"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"birthyear 1992\n",
"children 0\n",
"hobby Reading\n",
"weight 112\n",
"Name: charles, dtype: object"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"people.iloc[2]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also get a slice of rows, and this returns a `DataFrame` object:"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>birthyear</th>\n",
" <th>children</th>\n",
" <th>hobby</th>\n",
" <th>weight</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>1984</td>\n",
" <td>3</td>\n",
" <td>Dancing</td>\n",
" <td>83</td>\n",
" </tr>\n",
" <tr>\n",
" <th>charles</th>\n",
" <td>1992</td>\n",
" <td>0</td>\n",
" <td>Reading</td>\n",
" <td>112</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" birthyear children hobby weight\n",
"bob 1984 3 Dancing 83\n",
"charles 1992 0 Reading 112"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"people.iloc[1:3]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, you can pass a boolean array to get the matching rows:"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>birthyear</th>\n",
" <th>children</th>\n",
" <th>hobby</th>\n",
" <th>weight</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>alice</th>\n",
" <td>1985</td>\n",
" <td>NaN</td>\n",
" <td>Biking</td>\n",
" <td>68</td>\n",
" </tr>\n",
" <tr>\n",
" <th>charles</th>\n",
" <td>1992</td>\n",
" <td>0</td>\n",
" <td>Reading</td>\n",
" <td>112</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" birthyear children hobby weight\n",
"alice 1985 NaN Biking 68\n",
"charles 1992 0 Reading 112"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"people[np.array([True, False, True])]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is most useful when combined with boolean expressions:"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>birthyear</th>\n",
" <th>children</th>\n",
" <th>hobby</th>\n",
" <th>weight</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>alice</th>\n",
" <td>1985</td>\n",
" <td>NaN</td>\n",
" <td>Biking</td>\n",
" <td>68</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>1984</td>\n",
" <td>3</td>\n",
" <td>Dancing</td>\n",
" <td>83</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" birthyear children hobby weight\n",
"alice 1985 NaN Biking 68\n",
"bob 1984 3 Dancing 83"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"people[people[\"birthyear\"] < 1990]"
]
},
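{
"cell_type": "markdown",
"metadata": {},
"source": [
"Several conditions can be combined with the `&` (and), `|` (or) and `~` (not) operators. Here is a small sketch on a stand-alone `DataFrame` (the `demo` name and the thresholds are arbitrary choices for illustration). Each comparison needs its own parentheses, because `&` binds more tightly than `<` and `>`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"demo = pd.DataFrame(\n",
"    {\"birthyear\": [1985, 1984, 1992], \"weight\": [68, 83, 112]},\n",
"    index=[\"alice\", \"bob\", \"charles\"],\n",
")\n",
"# Keep the rows where both conditions hold\n",
"demo[(demo[\"birthyear\"] < 1990) & (demo[\"weight\"] > 70)]"
]
},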
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Adding and removing columns\n",
"You can generally treat `DataFrame` objects like dictionaries of `Series`, so the following operations work fine:"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>birthyear</th>\n",
" <th>children</th>\n",
" <th>hobby</th>\n",
" <th>weight</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>alice</th>\n",
" <td>1985</td>\n",
" <td>NaN</td>\n",
" <td>Biking</td>\n",
" <td>68</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>1984</td>\n",
" <td>3</td>\n",
" <td>Dancing</td>\n",
" <td>83</td>\n",
" </tr>\n",
" <tr>\n",
" <th>charles</th>\n",
" <td>1992</td>\n",
" <td>0</td>\n",
" <td>Reading</td>\n",
" <td>112</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" birthyear children hobby weight\n",
"alice 1985 NaN Biking 68\n",
"bob 1984 3 Dancing 83\n",
"charles 1992 0 Reading 112"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"people"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>hobby</th>\n",
" <th>weight</th>\n",
" <th>age</th>\n",
" <th>over 30</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>alice</th>\n",
" <td>Biking</td>\n",
" <td>68</td>\n",
" <td>31</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>Dancing</td>\n",
" <td>83</td>\n",
" <td>32</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>charles</th>\n",
" <td>Reading</td>\n",
" <td>112</td>\n",
" <td>24</td>\n",
" <td>False</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" hobby weight age over 30\n",
"alice Biking 68 31 True\n",
"bob Dancing 83 32 True\n",
"charles Reading 112 24 False"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"people[\"age\"] = 2016 - people[\"birthyear\"] # adds a new column \"age\"\n",
"people[\"over 30\"] = people[\"age\"] > 30 # adds another column \"over 30\"\n",
"birthyears = people.pop(\"birthyear\")\n",
"del people[\"children\"]\n",
"\n",
"people"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"alice 1985\n",
"bob 1984\n",
"charles 1992\n",
"Name: birthyear, dtype: int64"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"birthyears"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When you add a new column, it must have the same number of rows. If you add a `Series`, it is aligned on the row index labels: missing rows are filled with NaN, and extra rows are ignored:"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>hobby</th>\n",
" <th>weight</th>\n",
" <th>age</th>\n",
" <th>over 30</th>\n",
" <th>pets</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>alice</th>\n",
" <td>Biking</td>\n",
" <td>68</td>\n",
" <td>31</td>\n",
" <td>True</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>Dancing</td>\n",
" <td>83</td>\n",
" <td>32</td>\n",
" <td>True</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>charles</th>\n",
" <td>Reading</td>\n",
" <td>112</td>\n",
" <td>24</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" hobby weight age over 30 pets\n",
"alice Biking 68 31 True NaN\n",
"bob Dancing 83 32 True 0\n",
"charles Reading 112 24 False 5"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"people[\"pets\"] = pd.Series({\"bob\": 0, \"charles\": 5, \"eugene\":1}) # alice is missing, eugene is ignored\n",
"people"
]
},
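{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the difference concrete, here is a small sketch on a stand-alone `DataFrame` (the `demo` name and the `cars` column are made up for illustration). A `Series` is aligned on its labels, whereas a plain list carries no labels, so its length has to match the number of rows exactly:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"demo = pd.DataFrame({\"weight\": [68, 83, 112]}, index=[\"alice\", \"bob\", \"charles\"])\n",
"\n",
"# A Series is aligned on labels, so its length does not have to match\n",
"demo[\"pets\"] = pd.Series({\"bob\": 0, \"charles\": 5, \"eugene\": 1})\n",
"\n",
"# A plain list has no labels: a wrong length raises a ValueError\n",
"try:\n",
"    demo[\"cars\"] = [1, 2]\n",
"except ValueError as e:\n",
"    print(\"Value error:\", e)\n",
"\n",
"demo"
]
},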
{
"cell_type": "markdown",
"metadata": {},
"source": [
"By default, a new column is added at the end (on the right). You can also insert a column anywhere else using the `insert` method:"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>hobby</th>\n",
" <th>height</th>\n",
" <th>weight</th>\n",
" <th>age</th>\n",
" <th>over 30</th>\n",
" <th>pets</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>alice</th>\n",
" <td>Biking</td>\n",
" <td>172</td>\n",
" <td>68</td>\n",
" <td>31</td>\n",
" <td>True</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>Dancing</td>\n",
" <td>181</td>\n",
" <td>83</td>\n",
" <td>32</td>\n",
" <td>True</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>charles</th>\n",
" <td>Reading</td>\n",
" <td>185</td>\n",
" <td>112</td>\n",
" <td>24</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" hobby height weight age over 30 pets\n",
"alice Biking 172 68 31 True NaN\n",
"bob Dancing 181 83 32 True 0\n",
"charles Reading 185 112 24 False 5"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"people.insert(1, \"height\", [172, 181, 185])\n",
"people"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Assigning new columns\n",
"You can also create new columns by calling the `assign` method. Note that this returns a new `DataFrame` object; the original is not modified:"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>hobby</th>\n",
" <th>height</th>\n",
" <th>weight</th>\n",
" <th>age</th>\n",
" <th>over 30</th>\n",
" <th>pets</th>\n",
" <th>body_mass_index</th>\n",
" <th>has_pets</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>alice</th>\n",
" <td>Biking</td>\n",
" <td>172</td>\n",
" <td>68</td>\n",
" <td>31</td>\n",
" <td>True</td>\n",
" <td>NaN</td>\n",
" <td>22.985398</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>Dancing</td>\n",
" <td>181</td>\n",
" <td>83</td>\n",
" <td>32</td>\n",
" <td>True</td>\n",
" <td>0</td>\n",
" <td>25.335002</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>charles</th>\n",
" <td>Reading</td>\n",
" <td>185</td>\n",
" <td>112</td>\n",
" <td>24</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>32.724617</td>\n",
" <td>True</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" hobby height weight age over 30 pets body_mass_index has_pets\n",
"alice Biking 172 68 31 True NaN 22.985398 False\n",
"bob Dancing 181 83 32 True 0 25.335002 False\n",
"charles Reading 185 112 24 False 5 32.724617 True"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"people.assign(\n",
" body_mass_index = people[\"weight\"] / (people[\"height\"] / 100) ** 2,\n",
" has_pets = people[\"pets\"] > 0\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that you cannot access columns created within the same assignment:"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Key error: u'body_mass_index'\n"
]
}
],
"source": [
"try:\n",
" people.assign(\n",
" body_mass_index = people[\"weight\"] / (people[\"height\"] / 100) ** 2,\n",
" overweight = people[\"body_mass_index\"] > 25\n",
" )\n",
"except KeyError as e:\n",
" print(\"Key error:\", e)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The solution is to split this assignment in two consecutive assignments:"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>hobby</th>\n",
" <th>height</th>\n",
" <th>weight</th>\n",
" <th>age</th>\n",
" <th>over 30</th>\n",
" <th>pets</th>\n",
" <th>body_mass_index</th>\n",
" <th>overweight</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>alice</th>\n",
" <td>Biking</td>\n",
" <td>172</td>\n",
" <td>68</td>\n",
" <td>31</td>\n",
" <td>True</td>\n",
" <td>NaN</td>\n",
" <td>22.985398</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>Dancing</td>\n",
" <td>181</td>\n",
" <td>83</td>\n",
" <td>32</td>\n",
" <td>True</td>\n",
" <td>0</td>\n",
" <td>25.335002</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>charles</th>\n",
" <td>Reading</td>\n",
" <td>185</td>\n",
" <td>112</td>\n",
" <td>24</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>32.724617</td>\n",
" <td>True</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" hobby height weight age over 30 pets body_mass_index \\\n",
"alice Biking 172 68 31 True NaN 22.985398 \n",
"bob Dancing 181 83 32 True 0 25.335002 \n",
"charles Reading 185 112 24 False 5 32.724617 \n",
"\n",
" overweight \n",
"alice False \n",
"bob True \n",
"charles True "
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d6 = people.assign(body_mass_index = people[\"weight\"] / (people[\"height\"] / 100) ** 2)\n",
"d6.assign(overweight = d6[\"body_mass_index\"] > 25)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Having to create a temporary variable `d6` is not very convenient. You may want to just chain the assignment calls, but this does not work because the `people` object is not actually modified by the first assignment:"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Key error: u'body_mass_index'\n"
]
}
],
"source": [
"try:\n",
" (people\n",
" .assign(body_mass_index = people[\"weight\"] / (people[\"height\"] / 100) ** 2)\n",
" .assign(overweight = people[\"body_mass_index\"] > 25)\n",
" )\n",
"except KeyError as e:\n",
" print(\"Key error:\", e)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But fear not, there is a simple solution. You can pass a function to the `assign` method (typically a `lambda` function), and this function will be called with the `DataFrame` as a parameter:"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>hobby</th>\n",
" <th>height</th>\n",
" <th>weight</th>\n",
" <th>age</th>\n",
" <th>over 30</th>\n",
" <th>pets</th>\n",
" <th>body_mass_index</th>\n",
" <th>overweight</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>alice</th>\n",
" <td>Biking</td>\n",
" <td>172</td>\n",
" <td>68</td>\n",
" <td>31</td>\n",
" <td>True</td>\n",
" <td>NaN</td>\n",
" <td>22.985398</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>Dancing</td>\n",
" <td>181</td>\n",
" <td>83</td>\n",
" <td>32</td>\n",
" <td>True</td>\n",
" <td>0</td>\n",
" <td>25.335002</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>charles</th>\n",
" <td>Reading</td>\n",
" <td>185</td>\n",
" <td>112</td>\n",
" <td>24</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>32.724617</td>\n",
" <td>True</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" hobby height weight age over 30 pets body_mass_index \\\n",
"alice Biking 172 68 31 True NaN 22.985398 \n",
"bob Dancing 181 83 32 True 0 25.335002 \n",
"charles Reading 185 112 24 False 5 32.724617 \n",
"\n",
" overweight \n",
"alice False \n",
"bob True \n",
"charles True "
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(people\n",
" .assign(body_mass_index = lambda df: df[\"weight\"] / (df[\"height\"] / 100) ** 2)\n",
" .assign(overweight = lambda df: df[\"body_mass_index\"] > 25)\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Problem solved!"
]
},
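{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you are running pandas 0.23 or later on Python 3.6 or later (an assumption about your setup, since this notebook also targets Python 2), the two steps can even go in a single `assign` call: keyword arguments are then evaluated in order, so a later function can read a column created by an earlier one. A minimal sketch on a stand-alone `DataFrame` (the `demo` name is just illustrative):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"demo = pd.DataFrame(\n",
"    {\"height\": [172, 181, 185], \"weight\": [68, 83, 112]},\n",
"    index=[\"alice\", \"bob\", \"charles\"],\n",
")\n",
"# Assumes pandas >= 0.23 and Python >= 3.6, where keyword order is preserved\n",
"demo.assign(\n",
"    body_mass_index=lambda df: df[\"weight\"] / (df[\"height\"] / 100) ** 2,\n",
"    overweight=lambda df: df[\"body_mass_index\"] > 25,\n",
")"
]
},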
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Evaluating an expression\n",
"A great feature supported by pandas is expression evaluation. This relies on the `numexpr` library which must be installed."
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"alice False\n",
"bob True\n",
"charles True\n",
"dtype: bool"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"people.eval(\"weight / (height/100) ** 2 > 25\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Assignment expressions are also supported. Contrary to the `assign` method, this does not create a copy of the `DataFrame`; it modifies it directly:"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>hobby</th>\n",
" <th>height</th>\n",
" <th>weight</th>\n",
" <th>age</th>\n",
" <th>over 30</th>\n",
" <th>pets</th>\n",
" <th>body_mass_index</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>alice</th>\n",
" <td>Biking</td>\n",
" <td>172</td>\n",
" <td>68</td>\n",
" <td>31</td>\n",
" <td>True</td>\n",
" <td>NaN</td>\n",
" <td>22.985398</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>Dancing</td>\n",
" <td>181</td>\n",
" <td>83</td>\n",
" <td>32</td>\n",
" <td>True</td>\n",
" <td>0</td>\n",
" <td>25.335002</td>\n",
" </tr>\n",
" <tr>\n",
" <th>charles</th>\n",
" <td>Reading</td>\n",
" <td>185</td>\n",
" <td>112</td>\n",
" <td>24</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>32.724617</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" hobby height weight age over 30 pets body_mass_index\n",
"alice Biking 172 68 31 True NaN 22.985398\n",
"bob Dancing 181 83 32 True 0 25.335002\n",
"charles Reading 185 112 24 False 5 32.724617"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# inplace=True is needed in recent pandas versions to modify people directly\n",
"people.eval(\"body_mass_index = weight / (height/100) ** 2\", inplace=True)\n",
"people"
]
},
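{
"cell_type": "markdown",
"metadata": {},
"source": [
"Whether `eval` modifies the `DataFrame` or returns a modified copy depends on the pandas version and on the `inplace` argument. The following sketch assumes a recent pandas version, in which an assignment expression returns a new `DataFrame` unless `inplace=True` is passed. The `demo` and `with_bmi` names are just illustrative:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"demo = pd.DataFrame({\"height\": [172, 181], \"weight\": [68, 83]}, index=[\"alice\", \"bob\"])\n",
"\n",
"# Without inplace=True, recent versions return a new DataFrame\n",
"with_bmi = demo.eval(\"bmi = weight / (height / 100) ** 2\")\n",
"print(\"bmi\" in demo.columns, \"bmi\" in with_bmi.columns)\n",
"\n",
"# With inplace=True, demo itself gains the new column\n",
"demo.eval(\"bmi = weight / (height / 100) ** 2\", inplace=True)\n",
"print(\"bmi\" in demo.columns)"
]
},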
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can use a local or global variable in an expression by prefixing it with `'@'`:"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>hobby</th>\n",
" <th>height</th>\n",
" <th>weight</th>\n",
" <th>age</th>\n",
" <th>over 30</th>\n",
" <th>pets</th>\n",
" <th>body_mass_index</th>\n",
" <th>overweight</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>alice</th>\n",
" <td>Biking</td>\n",
" <td>172</td>\n",
" <td>68</td>\n",
" <td>31</td>\n",
" <td>True</td>\n",
" <td>NaN</td>\n",
" <td>22.985398</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>Dancing</td>\n",
" <td>181</td>\n",
" <td>83</td>\n",
" <td>32</td>\n",
" <td>True</td>\n",
" <td>0</td>\n",
" <td>25.335002</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>charles</th>\n",
" <td>Reading</td>\n",
" <td>185</td>\n",
" <td>112</td>\n",
" <td>24</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>32.724617</td>\n",
" <td>True</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" hobby height weight age over 30 pets body_mass_index \\\n",
"alice Biking 172 68 31 True NaN 22.985398 \n",
"bob Dancing 181 83 32 True 0 25.335002 \n",
"charles Reading 185 112 24 False 5 32.724617 \n",
"\n",
" overweight \n",
"alice False \n",
"bob False \n",
"charles True "
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"overweight_threshold = 30\n",
"people.eval(\"overweight = body_mass_index > @overweight_threshold\", inplace=True)\n",
"people"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Querying a `DataFrame`\n",
"The `query` method lets you filter a `DataFrame` based on a query expression:"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>hobby</th>\n",
" <th>height</th>\n",
" <th>weight</th>\n",
" <th>age</th>\n",
" <th>over 30</th>\n",
" <th>pets</th>\n",
" <th>body_mass_index</th>\n",
" <th>overweight</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>Dancing</td>\n",
" <td>181</td>\n",
" <td>83</td>\n",
" <td>32</td>\n",
" <td>True</td>\n",
" <td>0</td>\n",
" <td>25.335002</td>\n",
" <td>False</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" hobby height weight age over 30 pets body_mass_index overweight\n",
"bob Dancing 181 83 32 True 0 25.335002 False"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"people.query(\"age > 30 and pets == 0\")"
]
},
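{
"cell_type": "markdown",
"metadata": {},
"source": [
"Just like with `eval`, a query expression can refer to a local or global variable by prefixing it with `'@'`. Here is a small sketch on a stand-alone `DataFrame` (the `demo` and `min_age` names are made up for illustration):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"demo = pd.DataFrame(\n",
"    {\"age\": [31, 32, 24], \"pets\": [1, 0, 5]},\n",
"    index=[\"alice\", \"bob\", \"charles\"],\n",
")\n",
"min_age = 30  # hypothetical threshold, read by the query through @min_age\n",
"demo.query(\"age > @min_age and pets > 0\")"
]
},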
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sorting a `DataFrame`\n",
"You can sort a `DataFrame` by calling its `sort_index` method. By default it sorts the rows by their index label, in ascending order, but let's reverse the order:"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>hobby</th>\n",
" <th>height</th>\n",
" <th>weight</th>\n",
" <th>age</th>\n",
" <th>over 30</th>\n",
" <th>pets</th>\n",
" <th>body_mass_index</th>\n",
" <th>overweight</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>charles</th>\n",
" <td>Reading</td>\n",
" <td>185</td>\n",
" <td>112</td>\n",
" <td>24</td>\n",
" <td>False</td>\n",
" <td>5</td>\n",
" <td>32.724617</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>Dancing</td>\n",
" <td>181</td>\n",
" <td>83</td>\n",
" <td>32</td>\n",
" <td>True</td>\n",
" <td>0</td>\n",
" <td>25.335002</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>alice</th>\n",
" <td>Biking</td>\n",
" <td>172</td>\n",
" <td>68</td>\n",
" <td>31</td>\n",
" <td>True</td>\n",
" <td>NaN</td>\n",
" <td>22.985398</td>\n",
" <td>False</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" hobby height weight age over 30 pets body_mass_index \\\n",
"charles Reading 185 112 24 False 5 32.724617 \n",
"bob Dancing 181 83 32 True 0 25.335002 \n",
"alice Biking 172 68 31 True NaN 22.985398 \n",
"\n",
" overweight \n",
"charles True \n",
"bob False \n",
"alice False "
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"people.sort_index(ascending=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that `sort_index` returned a sorted *copy* of the `DataFrame`. To modify `people` directly, we can set the `inplace` argument to `True`. Also, we can sort the columns instead of the rows by setting `axis=1`:"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>body_mass_index</th>\n",
" <th>height</th>\n",
" <th>hobby</th>\n",
" <th>over 30</th>\n",
" <th>overweight</th>\n",
" <th>pets</th>\n",
" <th>weight</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>alice</th>\n",
" <td>31</td>\n",
" <td>22.985398</td>\n",
" <td>172</td>\n",
" <td>Biking</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>NaN</td>\n",
" <td>68</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>32</td>\n",
" <td>25.335002</td>\n",
" <td>181</td>\n",
" <td>Dancing</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>0</td>\n",
" <td>83</td>\n",
" </tr>\n",
" <tr>\n",
" <th>charles</th>\n",
" <td>24</td>\n",
" <td>32.724617</td>\n",
" <td>185</td>\n",
" <td>Reading</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>5</td>\n",
" <td>112</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age body_mass_index height hobby over 30 overweight pets \\\n",
"alice 31 22.985398 172 Biking True False NaN \n",
"bob 32 25.335002 181 Dancing True False 0 \n",
"charles 24 32.724617 185 Reading False True 5 \n",
"\n",
" weight \n",
"alice 68 \n",
"bob 83 \n",
"charles 112 "
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"people.sort_index(axis=1, inplace=True)\n",
"people"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To sort the `DataFrame` by the values instead of the labels, we can use `sort_values` and specify the column to sort by:"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>body_mass_index</th>\n",
" <th>height</th>\n",
" <th>hobby</th>\n",
" <th>over 30</th>\n",
" <th>overweight</th>\n",
" <th>pets</th>\n",
" <th>weight</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>charles</th>\n",
" <td>24</td>\n",
" <td>32.724617</td>\n",
" <td>185</td>\n",
" <td>Reading</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>5</td>\n",
" <td>112</td>\n",
" </tr>\n",
" <tr>\n",
" <th>alice</th>\n",
" <td>31</td>\n",
" <td>22.985398</td>\n",
" <td>172</td>\n",
" <td>Biking</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>NaN</td>\n",
" <td>68</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>32</td>\n",
" <td>25.335002</td>\n",
" <td>181</td>\n",
" <td>Dancing</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>0</td>\n",
" <td>83</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age body_mass_index height hobby over 30 overweight pets \\\n",
"charles 24 32.724617 185 Reading False True 5 \n",
"alice 31 22.985398 172 Biking True False NaN \n",
"bob 32 25.335002 181 Dancing True False 0 \n",
"\n",
" weight \n",
"charles 112 \n",
"alice 68 \n",
"bob 83 "
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"people.sort_values(by=\"age\", inplace=True)\n",
"people"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Plotting a `DataFrame`\n",
"Just like for `Series`, pandas makes it easy to draw nice graphs based on a `DataFrame`.\n",
"\n",
"For example, it is trivial to create a line plot from a `DataFrame`'s data by calling its `plot` method:"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAEQCAYAAACgBo8fAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xt0VPW99/H3l3sIhFsEhGAIGCEWtIpS7VGYar09y6J9\nekqPx8vxcqyX42U9dfUotRZ8Vvt4edrVnuOqrV1SxCpV8ZzHeynHU2KXrWhtQVESEsQQBEHCPeGS\n2/f5Y+/ESTK5TzKTnc9rrVns+c2ePd8Z4DN7fvv329vcHRER6fsGpLoAERFJDgW6iEhEKNBFRCJC\ngS4iEhEKdBGRiFCgi4hERLuBbmY5ZvYHM/vQzDaY2R1h+xgzW21mm8zs92Y2Ku45i8ys1MyKzOzC\nnnwDIiISsPbGoZvZRGCiu683sxHAX4HLgOuAPe7+sJndDYxx93vM7GTgaeBMIAd4Hch3DXgXEelR\n7e6hu/tOd18fLlcCRQRBfRmwPFxtOXB5uLwAeMbda929DCgF5ia5bhERaaZTfehmNhX4IrAWmODu\nuyAIfWB8uNpkYFvc07aHbSIi0oM6HOhhd8vzwJ3hnnrzLhR1qYiIpNCgjqxkZoMIwvw37v5i2LzL\nzCa4+66wn/2zsH07MCXu6TlhW/Nt6gtARKQL3N0StXd0D/3XwEZ3/7e4tpeAa8PlfwJejGv/BzMb\nYmZ5wInAO60UldLb4sWLU15Df6i5r9atmlV3Otbclnb30M3s74ArgQ1mto6ga+V7wEPAc2Z2PbAV\nWBiG9EYzew7YCNQAt3p7VYiISLe1G+ju/idgYCsPf7WV5zwAPNCNukREpJP69UzRWCyW6hI6rS/W\nDH2zbtXce/pi3elYc7sTi3rshc3UEyMi0klmhnfzoKiIiKQ5BbqISEQo0EVEIkKBLiISEQp0EZGI\nUKCLiESEAl1EJCIU6CIiEaFAFxGJCAW6iEhEKNBFRCJCgS4iEhEKdBGRiFCgi4hEhAJdRCQiFOgi\nIhGhQBcRiYh2rynak04+GUaM6P4tIwMs4fU7RET6j5Regu7DD53KSrp9O3YMMjOT8+XQcMvMhIGt\nXRpbRCRF2roEXSSuKVpXB1VV3f9iaLgdOgSHD8OwYcn9khgxAgYPTspbFpF+KvKB3hPq6+HIkeR9\nSTR8UQwcmNwviJEjYehQdTmJ9BfdCnQzWwpcCuxy91PCtjOBnwODgRrgVnd/N3xsEXA9UAvc6e6r\nW9luWgd6T3CH6urkfkFUVkJtbfJ/SQwfDgN0yFwk7XQ30M8BKoEn4wJ9DfCAu682s0uAf3X3r5jZ\nycDTwJlADvA6kJ8ouftjoPeUmprkdjlVVga/TjIykvtLIjMTBqX0MLxI39dWoLf738vd3zSz3GbN\nnwKjwuXRwPZweQHwjLvXAmVmVgrMBd7uUuXSIYMHw+jRwS1Z6uuD4wgd/QKoqGj/l0RlJQwZkvxf\nE0OGqMtJ+oe//a3tx7u6v3QP8Ccz+wlgwJfD9snAW3HrbQ/bpI8ZMODzwEwWdzh6tONfEjt2dGy9\n+vrk/pIYMUJDYSU9tTfyrquBvhS43d1fMLO/B34NXNDFbUk/YRYEZUYGHHdc8rZbXd3xLqe9e6G8\nvP1fEhoKK+no1FPbfryrgf4ld78AwN2fN7PHw/btwJS49XL4vDumhSVLljQux2IxYrFYF8uR/mzI\nkOA2ZkzyttnZobA7d7a/TlVVMCIpmb8kMjOD9y7RVVhYSGFhYYfW7dCwRTObCrzs7rPD+38FvuPu\nb5jZ+cCD7n5m3EHRLxF0tfwXOigqAgRdTskYChv/S6Kysmn3WLJuw4apyylddXeUywogBowDdgGL\ngfeBR4EhwFGCYYvrwvUXATcQDGfUsEWRHpTsobANt+rq5P6SGDFCQ2GTRROLRKRTamu7PxS2+S+J\nZA+Fbbj1t6GwCnQRSbnODoXt6G3QoOT+khgx
Ir2HwirQRSSS3IMRScn8JVFZmdyhsPFdTsn4klCg\ni4h0QmeGwnb0dvRoEOrd/WI444xuzBQVEelvemoobGe6nD77LPGvibZoD11EpA9pq8tFg4hERCJC\ngS4iEhEKdBGRiFCgi4hEhAJdRCQiFOgiIhGhQBcRiQgFuohIRCjQRUQiQoEuIhIRCnQRkYhQoIuI\nRIQCXUQkIhToIiIRoUAXEYkIBbqISEQo0EVEIkKBLiISEe0GupktNbNdZvZ+s/bbzazIzDaY2YNx\n7YvMrDR87MKeKFpERFrqyEWilwGPAE82NJhZDPgaMNvda80sO2wvABYCBUAO8LqZ5evioSIiPa/d\nPXR3fxPY16z5FuBBd68N16kI2y8DnnH3WncvA0qBuckrV0REWtPVPvSTgHlmttbM1pjZnLB9MrAt\nbr3tYZuIiPSwjnS5tPa8Me5+lpmdCawEpnV2I0uWLGlcjsVixGKxLpYjIhJNhYWFFBYWdmhd60j3\ntpnlAi+7+ynh/deAh9z9jfB+KXAWcCOAuz8Ytq8CFrv72wm2qa51EZFOMjPc3RI91tEuFwtvDV4A\nzgs3fhIwxN33AC8B3zKzIWaWB5wIvNPlykVEpMPa7XIxsxVADBhnZuXAYuDXwDIz2wAcA64BcPeN\nZvYcsBGoAW7VbriISO/oUJdLj7ywulxERDotGV0uIiKS5hToIiIRoUAXEYkIBbqISEQo0EVEIkKB\nLiISEQp0EZGIUKCLiESEAl1EJCIU6CIiEaFAFxGJCAW6iEhEKNBFRCJCgS4iEhEKdBGRiFCgi4hE\nhAJdRCQiFOgiIhGhQBcRiQgFuohIRCjQRUQiQoEuIhIRCnQRkYhoN9DNbKmZ7TKz9xM8dpeZ1ZvZ\n2Li2RWZWamZFZnZhsgsWEZHEOrKHvgy4qHmjmeUAFwBb49oKgIVAAXAJ8KiZWXJKFRGRtrQb6O7+\nJrAvwUM/Bb7brO0y4Bl3r3X3MqAUmNvdIkVEpH1d6kM3swXANnff0OyhycC2uPvbwzYREelhgzr7\nBDPLAL5H0N3SLUuWLGlcjsVixGKx7m5SRCRSCgsLKSws7NC65u7tr2SWC7zs7qeY2SzgdeAwYEAO\nwZ74XOB6AHd/MHzeKmCxu7+dYJvekdcWEZHPmRnunvDYZEe7XCy84e4fuPtEd5/m7nnAJ8Bp7v4Z\n8BLwLTMbYmZ5wInAO91/CyIi0p6ODFtcAfwZOMnMys3sumarOJ+H/UbgOWAj8Bpwq3bDRUR6R4e6\nXHrkhdXlIiLSacnochERkTSnQBcRiQgFuohIRCjQRUQiQoEuIhIRCnQRkYhQoIuIRIQCXUQkIhTo\nIiIRoUAXEYkIBbqISEQo0EVEIkKBLiISEQp0EZGIUKCLiESEAl1EJCIU6CIiEaFAFxGJCAW6iEhE\nKNBFRCJCgS4iEhEKdBGRiFCgi4hERLuBbmZLzWyXmb0f1/awmRWZ2Xoz+w8zy4p7bJGZlYaPX9hT\nhYuISFMd2UNfBlzUrG018AV3/yJQCiwCMLOTgYVAAXAJ8KiZWfLKFRGR1rQb6O7+JrCvWdvr7l4f\n3l0L5ITLC4Bn3L3W3csIwn5u8soVEZHWJKMP/XrgtXB5MrAt7rHtYZuIiPSwQd15spndC9S4+2+7\n8vwlS5Y0LsdiMWKxWHfKERGJnMLCQgoLCzu0rrl7+yuZ5QIvu/spcW3XAjcC57n7sbDtHsDd/aHw\n/ipgsbu/nWCb3pHXFhGRz5kZ7p7w2GRHu1wsvDVs8GLgu8CChjAPvQT8g5kNMbM84ETgna6VLSIi\nndFul4uZrQBiwDgzKwcWA98DhgD/FQ5iWevut7r7RjN7DtgI1AC3ajdcRKR3dKjLpUdeWF0uIiKd\nlowuFxERSXMKdBGRiFCgi4hEhAJdRCQiFOgiIhGhQBcRiQgFuohIRKQ00Fd+uJIPPvuAY7XH2l9Z\nRETa1K2T
c3XXig9WULS7iLL9ZUwZNYWC7AIKsguYmT2TguOC5VHDRqWyRBGRPiMtZopW11Xz0d6P\nKKoooriimKKKIop2B8tZQ7OCgM8uoOC4gsblSSMnoWtniEh/09ZM0bQI9NbUez3bD25vEvBFFUUU\nVRRxpOZIkz35hj376WOnM2hASn94iIj0mD4b6G3Ze2RvEPC7i5rs2e84tINpY6Z9vlcfBv3M7Jlk\nDslM4jsQEel9kQz01hypOULp3tIWQV+6p5TjMo9rEvQNXTjHDT9O3Tci0if0q0BvTV19HWX7y5r2\n0e8J9vDN7PODsWHQF2QXkDs6lwGmkZ0ikj4U6G1wdz6r+qxJ0Dfs2VccruCkcScFe/LjPu+vzx+X\nz7BBw1Jduoj0Qwr0LqqsrqS4orhFX/2WfVvIycppEfQFxxUwetjoVJctIhGmQE+ymroaPtr3UZOg\nbwj7zMGZjQEf34UzeeRk9dOLSLcp0HuJu7P90PYWQyyLdhdRVVPVYuRNwXEFTB8zncEDB6e6dBHp\nIxToaWDfkX2NIR/fX//JwU/IG5PXIuhnZs9kxJARqS5bRNKMAj2NHa09Sume0hZBX7KnhOzh2Qln\nyY7PHK/uG5F+SoHeB9XV11F+oDzhLFl3bzHEcmb2TKaOnsrAAQNTXbqI9CAFeoS4OxWHK1oMsSyq\nKGJ31W7yx+W3mDyVPzafjMEZqS5dRJJAgd5PVFZXUrKnpEXQb9m3hUkjJ7UYeTMzeyZjM8amumwR\n6YRuBbqZLQUuBXa5+ylh2xjgWSAXKAMWuvuB8LFFwPVALXCnu69uZbsK9F5SU1fDx/s/bjHEsmh3\nERmDMxLOks3JylE/vUga6m6gnwNUAk/GBfpDwB53f9jM7gbGuPs9ZnYy8DRwJpADvA7kJ0puBXrq\nuTs7Du1IOEv2UPUhZoyb0WLy1PSx0xkycEiqSxfpt7rd5WJmucDLcYFeDMx3911mNhEodPeZZnYP\n4O7+ULje74Al7v52gm0q0NPY/qP7E86SLT9QztTRU1tMnpqZPZORQ0emumyRyOuJQN/r7mPjHt/r\n7mPN7BHgLXdfEbY/Drzm7v+ZYJsK9D7oWO2xxrNZxo+8KdlTwphhYxLOkp2QOUHdNyJJ0lagJ+tK\nEF1K5iVLljQux2IxYrFYksqRnjJ00FBmjZ/FrPGzmrTXe30wzDIM+vU71/PbD35L0e4i6rwu4SzZ\nvNF5GmYp0o7CwkIKCws7tG5X99CLgFhcl8sady9I0OWyClisLpf+reJwRYuRN0W7i9hVtYsTx57Y\n4vz0M8bN0DBLkVYko8tlKkGgzw7vPwTsdfeHWjko+iVgMvBf6KCotOJwzWE2VWxqEfQf7fuI40cc\nn3CW7Ljh41JdtkhKdXeUywogBowDdgGLgReAlcAUYCvBsMX94fqLgBuAGjRsUbqgtr6Wj/d93OJC\nJEUVRQwdODThLNkpo6boYiTSL2hikUSCu7OzcmfCWbIHjh5gRvaMFn31+ePyNcxSIkWBLpF34OgB\nNu3Z1GLy1Nb9W8kdnZtwlmzW0KxUly3SaQp06beO1R5j897NTYZYFlcUs6liE6OGjUo4S3biiIka\nZilpS4Eu0ky917PtwLaEs2Sr66o/Py993CzZvDF5DBqQrJG+Il2jQBfphD2H9zQ5303D8qeVnzJ9\nzPQWk6dmZM9g+ODhqS5b+gkFukgSHK453Hg2y/gunM17NzMhc0LCWbLZw7NTXbZEjAJdpAfV1dc1\nns2y+bVkBw0Y1Bj08bNkTxh1goZZSpco0EVSwN3ZVbUr4SzZfUf3cdK4k1rMks0fm8/QQUNTXbqk\nMQW6SJo5dOxQk4BvmDxVtr+ME0adkHCW7Khho1JdtqQBBbpIH1FdV81Hez9qEfTFFcVkDc1KOEt2\n0shJGmbZjyjQRfq4eq9n+8HtCWfJHq09yszsmS1myU4fO13DLCNIgS4SYX
uP7G0xxLKooogdh3Yw\nbcy0FiNvZoybQeaQzFSXLV3UpwJ96tSpbN26NQUVRVNubi5lZWWpLkNS4EjNEUr2lLSYJVu6p5Tj\nMo9LOEs2e3i2um/SXJ8K9LDYFFQUTfo8pbm6+jrK9pclPChrZi2GWBZkF5A7OlfDLNOEAr0f0+cp\nHeXufFb1WcJZshWHK4Jhls0mT+WPy2fYoGGpLr1fUaD3Y/o8JRkOHTvUeDbL+C6cj/d9TE5WTsJZ\nsqOHjU512ZGkQO/H9HlKT6qpq+GjfR+1CPriimJGDBmR8Fqyk0dOVj99NyjQ+zF9npIK7s72Q9sT\nzpKtqqlKGPTTx0xn8MDBqS497SnQkygvL4+lS5dy3nnndep5s2bN4tFHH2XevHk99hqJpPvnKf3P\nviP7Eh6Q3X5oO3mj81rMkp2ZPZMRQ0akuuy00Vaga9ZBL/nggw+Ssp033niDq666im3btiVleyK9\nbUzGGM6ecjZnTzm7SfvR2qOU7iltDPpXS1/lx3/+MSV7Ssgent3i/PQzs2cyPnO8um/iKND7GHfX\nP2CJpGGDhjF7wmxmT5jdpL2uvo6tB7Y2jrx5d8e7PPX+UxRVFOHuCYN+6uipDBwwMEXvJHXU5dJJ\neXl53HbbbTz55JOUl5dz8cUXs3z5coYMGcIrr7zCfffdR1lZGV/4whf4xS9+wezZsxuf19CNcvTo\nUW666SZefvlljj/+eK699lr+/d//vXGvu/lrXHTRRTz55JPU1taSnZ1NdXU1GRkZmBklJSVMnDix\n1XrT/fMU6Sp3Z/fh3Qlnye6u2k3+uPwWI2/yx+aTMTgj1aV3i7pckmzlypWsXr2aoUOH8uUvf5kn\nnniCuXPncsMNN/Dqq68yZ84cnnrqKRYsWEBJSQmDBzc90LNkyRLKy8spKyujsrKSSy65pMVed6LX\n+Pa3v83vfvc7rr76asrLy3vzLYukHTNjfOZ4xmeOZ15u02NTldWVbKrY1Bjwz218juKKYrbs28Kk\nkZMSzpIdkzEmRe+k47Yf3N7m430y0JPV49DVHdc777yTCRMmAPC1r32NdevWsX79em6++WbOOOMM\nAK6++mp+9KMfsXbtWs4999wmz1+5ciWPPfYYWVlZZGVlcccdd3D//fe3+Rrr16/vWrEi/dCIISOY\nM2kOcybNadJeU1fDln1bGoP+j+V/5Fd/+xVFu4vIGJyRcJZsTlZO2nRzfrj7wzYf71agm9ki4Cqg\nDtgAXAdkAs8CuUAZsNDdD3TndZpLdQ9CQ9ACDB8+nB07drB3716WL1/OI488AgQ/B2tqatixY0eL\n5+/YsYOcnJzG+1OmTGn3NT799NNkvgWRfmnwwMHMyJ7BjOwZXMZlje3uzo5DO5rMkn1x04sUVxRz\nqPoQM8bNaDF56sSxJ/b6MMsLp1/Y5uNdDnQzywVuBGa6e7WZPQtcAZwMvO7uD5vZ3cAi4J6uvk5f\nYGaccMIJfP/732fRokXtrn/88cfzySefMHPmTIBOdZ+ky56CSJSYGZOzJjM5azJfnfbVJo/tP7q/\nMeSLK4p5Yv0TFFUUse3ANqaOntoi6Gdmz2Tk0JEpeR/d2UM/CFQDmWZWD2QA2wkCfH64znKgkIgH\nOsCNN97I5Zdfzvnnn8/cuXOpqqrijTfeYP78+WRmNj1V6cKFC3nggQc444wzqKqq4uc//3mHX2fC\nhAns2bOHgwcPkpWVley3ISLNjB42mrNyzuKsnLOatB+rPUbp3tLGoF+1eRU/XftTSvaUMDZjbMLJ\nUxMyJ/ToTlmXA93d95nZT4By4DCw2t1fN7MJ7r4rXGenmY1PUq1pobW/jNNPP53HH3+c2267jc2b\nN5ORkcE555zD/PnzWzzvBz/4ATfffDN5eXlMmjSJK6+8kmXLlrX7GgAzZszgiiuuYNq0adTX17Nx\n48Y2R7mISM8YOmgos8bPYtb4WU3a67
2e8gPljSNv1u1cx4oPVlC0u4g6r0sY9Hmj85IyzLLLwxbN\nbBrwCnAOcABYCfwH8Ii7j41bb4+7j0vwfF+8eHHj/VgsRiwW65fD7H75y1/y7LPPsmbNmqRvuz9+\nniLpquJwRcLTIXxW9Rknjj2xxSzZGeNm8Paf3qawsLBxG/fff3/yp/6b2ULgAne/Mbx/NXAWcB4Q\nc/ddZjYRWOPuBQme3yfHoSfDzp072bJlC2effTYlJSVceuml3HHHHdx+++1Jf63+8HmK9HVV1VWU\n7ClpcTqEj/Z9xPEjjm8S9DfOubFHxqFvAu4zs2HAMeB84C9AJXAt8BDwT8CL3XiNSKquruamm26i\nrKyM0aNHc8UVV3DLLbekuiwRSZHMIZmcdvxpnHb8aU3aa+tr+Xjfx41B/2b5m21up1szRc3suwTh\nXQesA/4ZGAk8B0wBthIMW9yf4Ln9dg+9N+nzFIkWnW2xH9PnKRItbQW6LhIoIhIRCnQRkYhQoIuI\nRIQCXUQkIhToveyWW27hRz/6UYfWve666/jBD37QwxWJSFT0ydPn9mW/+MUvkratAQMGsHnzZqZN\nm5a0bYpI36U99D5MZ14UkXgK9E544oknWLBgQeP9/Px8vvWtbzXeP+GEE3j//ffZtGkTF154IePG\njaOgoICVK1c2rtO8G+Xhhx9m0qRJ5OTksHTpUgYMGMCWLVsaH9+7dy+XXnopWVlZnH322Xz88ccA\nzJ8/H3fnlFNOISsrq8lriEj/pEDvhPnz5/Pmm8HU208//ZSamhreeustALZs2UJVVRUnnngiF1xw\nAVdddRUVFRU888wz3HrrrRQXF7fY3qpVq/jZz37GH/7wBzZv3kxhYWGLve5nn32W+++/n/379zN9\n+nTuvfdeAN544w0ANmzYwMGDB/nmN7/Zk29dRPqAPtmHbvcnp6vBF3duBmVeXh4jR45k/fr1bNq0\niYsuuoj33nuPkpIS/vznP3PuuefyyiuvkJeXxzXXXAPAqaeeyje+8Q1WrlzJfffd12R7K1eu5Lrr\nrmu80MWSJUtYsWJFk3W+/vWvM2dOcBmtK6+8krvuuqvpe9AsUBEJ9clA72wQJ9P8+fNZs2YNmzdv\nJhaLMWbMGAoLC3nrrbeYP38+W7duZe3atYwdG5xB2N2pq6trDPh4O3bs4Mwzz2y8P2XKlBYBHX+u\n8+HDh1NZWdlD70xE+ro+GeipNG/ePF5++WXKysq49957GTVqFE8//TRr167l9ttvp6SkhFgsxu9/\n//t2t9VwKboG5eXlOtApIl2mPvROathDP3LkCJMmTeLcc89l1apV7Nmzh9NOO41LL72UkpISnnrq\nKWpra6mpqeHdd99l06ZNLba1cOFCli1bRnFxMYcPH+aHP/xhp2qZOHFikwOoItK/KdA7KT8/n5Ej\nRzJv3jwARo4cyfTp0znnnHMwM0aMGMHq1at55plnmDRpEpMmTeKee+7h2LFjLbZ18cUXc8cdd/CV\nr3yFk046ibPPPhuAoUOHdqiWJUuWcM011zB27Fief/755L1JEemTdPrcNFJcXMzs2bM5duwYAwYk\n57u2P3+eIlGk0+emsRdeeIHq6mr27dvH3XffzYIFC5IW5iLSvyg5Uuyxxx5j/Pjx5OfnM3jwYB59\n9NFUlyQifZS6XCJOn6dItKjLRUSkH1Cgi4hEhAJdRCQi0m6maG5urmZLJlFubm6qSxCRXtKtg6Jm\nNgp4HJgF1APXAyXAs0AuUAYsdPcDCZ6b8KCoiIi0ricPiv4b8Jq7FwCnAsXAPcDr7j4D+AOwqJuv\n0WMKCwtTXUKn9cWaoW/WrZp7T1+sOx1r7nKgm1kWcK67LwNw99pwT/wyYHm42nLg8m5X2UPS8S+k\nPX2xZuibdavm3tMX607Hmruzh54HVJjZMjP7m5n9ysyGAxPcfReAu+8ExiejUBERaVt3An0QcDrw\nc3
c/Hagi6G5p3jGujnIRkV7Q5YOiZjYBeMvdp4X3zyEI9OlAzN13mdlEYE3Yx978+Qp6EZEuaO2g\naJeHLYaBvc3MTnL3EuB84MPwdi3wEPBPwIudKUhERLqmu8MWTyUYtjgY2AJcBwwEngOmAFsJhi3u\n736pIiLSlpSdnEtERJKr30z9N7OhZva2ma0zsw/N7P+E7Q+bWZGZrTez/wiHY6aFNmr+ezP7wMzq\nzOz0VNcZr42ax5jZajPbZGa/DyelpQUzyzGzP4T1bjCzO8L2U83sz2b2npm9aGYjUl1rvDbqnmtm\n74R/B++Y2RmprrVBgppvD9ufCUfL/c3MPjazv6W61gatfc7hY7eH+bHBzB5MZZ1AcFX6/nIDhod/\nDgTWAn8HfBUYELY/CDyQ6jo7UPMMIJ9g4tbpqa6xgzU/BPxr2H438GCq64yrdyLwxXB5BMEEuQLg\nHeCcsP1a4H+nutYO1r0GuDBsv4RgYELK622l5k3AzGbr/Bj4fqprba9mIAasBgaFj2WnutZ+s4cO\n4O6Hw8WhBL9O9rn76+5eH7avBXJSUlwrWql5k7uXAml5YDlRzaTxhDN33+nu68PlSoJgnAzku/ub\n4WqvA99IUYkJtVL3JOBTYHS42mhge2oqbClBzUUEn3W8hcBve7u21rRR8y0EOya14WMVqasy0K8C\n3cwGmNk6YCdQ6O4bm61yPfC73q+sdR2oOe20UnOfmHBmZlOBLxJ8uX9oZgvChxaSZl/28eLqfptg\n+PBPzKwceJg0Pf1Gs5ob2s4Fdrr7Rykqq03Naj4JmGdma81sTTp0bfWrQHf3enc/jeA/5jwzm9/w\nmJndC9S4+4qUFZhAWzWnq2Y1n2tmMfrAhLOwj/x54M5wT+wG4F/M7C9AJlCdyvpak6DupcDt7n4C\n8L+AX6eyvkQS1NzgCtJo7zxegpoHAWPc/SzgXwlG96VUvwr0Bu5+EHgVOAPAzK4F/gfwjyksq03N\na+4LwppfI6h5VzgZjXDC2WeprK05MxtE8J/1N+7+IkDYtXWRu58JPAOk3V5jorqBL7n7CwDu/jww\nN1X1JdJKzZjZQOB/EpytNa20UvM24D8B3P0vQL2ZjUtRiUA/CnQzy24YWWFmGcAFwHozuxj4LrDA\n3Y+lssbmWqu5+Wq9XlgbWql5HfASwYFFaGPCWQr9Gtjo7v/W0GBmx4V/DgC+D/wyRbW1pUXdQGnD\nLzkzO5/glNbpJFHNEPxbKXL3HSmoqT2Jan4BOA/AzE4CBrv7nlQU16DfjEM3s9kEB+OM4IvsN+7+\nYzMrBYZVVs6yAAAEh0lEQVQADX8Ra9391hSV2UQbNV8OPAJkA/uB9e5+Seoq/VwbNY8lTSecmdnf\nAX8ENhB0BTnwPYI+0n8J7/+nu38vZUUm0Ebdu4FHCf5dHwVudfd1qaozXms1u/sqM1tGcDqRX6Wy\nxuba+Jz/myDovwgcA+5y9zdSVSf0o0AXEYm6ftPlIiISdQp0EZGIUKCLiESEAl1EJCIU6CIiEaFA\nFxGJCAW6iEhEKNClV5hZrplt6OJz55vZy8muqSeZ2Rwz+1knn7PYzL7TUzVJ9HX5mqIiXdCdWWx9\nagacu/8V+Guq65D+RXvo0psGm9lTZrbRzJ4zs2Fmdn54lZr3zOxxMxsMYGYXh1eCeZfghE1YoKTh\nBEjh/dLWTohkZsvM7FEze8vMNptZzMyeCF//13HrPRpe2WeDmS2Oa3/QgitDrTezh8O2b4brrTOz\nwtbeaPyvinDPe2l4itXNFl6lJ3zsXguu4vRHgguXNLRPM7PfmdlfzOyN8FwhmNkLZnZ1uHyTmf2m\n038LEl2pvsKGbv3jBuQC9cBZ4f3HgXuBcmB62LYcuIPgwhjlwLSw/VngpXD5PoLTl0JwMqeVbbzm\nMmBFuLwAOAicHN5/FzglXB4d/jmA4Go/s4CxQHHctrLCP98Hjo9v
a+W158fVvBh4k+AX8TigguBq\nTnOA98L3OxIoBb4TPuf1uM9lLvDf4fJ4gpNtnUNwQYtRqf671S19btpDl95U7u5rw+WngfOBLf75\nxQyWA/MILu+1xd23hO1PxW1jGXB1uHx9eL8tDX3vG4BP/fMLhHwITA2X/8HM/kpwVsiTw9sB4Ej4\nq+HrwJFw3TeB5Wb2z3Suy/JVd6/14Gx8u4AJBKH8/9z9mLsfIjgjJWaWCXwZWGnBhUIeC9fH3T8j\n+IJYQxD+BzpRg0Sc+tClNzXvB99PsCecSMLTArv7J2a2y8y+ApxJ++ewbzglcn3ccsP9QRZcgeYu\nYI67HwzP+DfM3evMbC7Bl843gduA8939VjM7E7gU+KuZne7u+9qpIb4OgDra/r/XcKnB1i4AfgrB\nXn7zS7dJP6c9dOlNuWb2pXD5H4G/AFPNbFrYdjVQSNCVkGtmeWH7Fc22s5Rgr/05d+/MwdJEXxJZ\nQCVwKLwAxyUAZjacoCtmFfAdghDFzKa5+1/cfTHBRTqmdOL1m9fxR+ByMxtqZiOBrwGEe+sfm9nf\nNz7BrOH15wIXAacB3zWz3C68vkSUAl16UzHBJd02Ely8+KfAdcDzZvYewZ7rYx5caOTbwGvhQdFd\nzbbzEsEl4Z5o5/XauuydA7j7+wQXDSki+JJouCh0FvBKWNcfCS7lBvB/zex9M3sf+FP4/M5qeO11\nBOeIf5/galTvxK1zFXBDeED2A2CBmQ0BfgVc58F1We8iDS8vJ6mj86FLn2PBxXh/4u5pf31Vkd6k\nPnTpU8zsbuBm0vj6ryKpoj106fPM7HsEBy6doH/aCYYzPtALr30h8BCfd+cYwQidb/T0a4s0p0AX\nEYkIHRQVEYkIBbqISEQo0EVEIkKBLiISEQp0EZGI+P8vGw3fHxPMKwAAAABJRU5ErkJggg==\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x11166eb90>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"people.plot(kind = \"line\", x = \"body_mass_index\", y = [\"height\", \"weight\"])\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can pass extra arguments supported by matplotlib's functions. For example, we can create scatterplot and pass it a list of sizes using the `s` argument of matplotlib's `scatter` function:"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYwAAAEPCAYAAABRHfM8AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGFVJREFUeJzt3XuUnXV97/H3N+RGEsFwSwQSBI0YoIggxCUgW7lJPQUU\niohFQJF6KD0Vag6ksBZZq7bcDkdaKloQ07Q2QjhVwC4gAs2oIIISJNyEVARCgAgCgXAZIPmeP/aT\nOBlmwm/PzL5MeL/WmsWzn8ven+zJkw/PPTITSZLeyoh2B5AkDQ8WhiSpiIUhSSpiYUiSilgYkqQi\nFoYkqUhTCyMiLo+I5RGxuMe48yPigYj4VUT8R0Rs0mParIhYUk0/qJnZJEmNafYWxhzg4F7jfgTs\nnJm7AUuAWQARsRNwFDAdOAS4JCKiyfkkSYWaWhiZeQvwXK9xN2Xm6urlz4Ftq+FDgSsy843MfIR6\nmezVzHySpHLtPobxBeC6angbYGmPacuqcZKkDtC2woiIM4HXM/N77cogSSo3sh0fGhHHA38MfLzH\n6GXAlB6vt63G9bW8N8CSpAHIzAEfG27FFkZUP/UXEZ8AZgKHZmZ3j/muBY6OiNERsT3wXuCO/t40\nMzv+5+yzz257BnOaczjnHA4Zh1POwWrqFkZEzANqwOYR8RhwNvA3wGjgxuokqJ9n5smZeX9EzAfu\nB14HTs6h+BNKkoZEUwsjM4/pY/Sc9cx/DnBO8xJJkgaq3WdJbdBqtVq7IxQx59Ay59AZDhlh+OQc\nrBiOe30iwr1VktSgiCA7/KC3JGkDYGFIkopYGJKkIhaGJKmIhSFJKmJhSJKKWBiSpCIWhiSpiIUh\nSSpiYUiSilgYkqQiFoYkqYiFIUkqYmFIkopYGJKkIhaGJKmIhSFJKmJhSJKKWBiSpCIj2x1AkvQH\nq1ev5vbbb2fFihXMmDGDiRMntjvSWm5hSFKHuPvuu5k6dToHHXQin/nMBWy99Q787d+e2+5Ya0Vm\ntjtDwyIih2NuSerPq6++yjbbvJdnnz0f+CwQwOOMH38gc+d+jSOOOGLQnxERZGYMdHm3MCSpA1xz\nzTW89trOwDHUywJgW1566Wuce+4lbUz2BxaGJHWAxx9/nNdem97HlOk88cSylufpi4UhSR1gjz32\nYPToG4HV64yPWMCee+7RnlC9eAxDkjpAZjJjxsdZvHgq3d1/B2wFXMm4cadx6603sttuuw36Mzr6\nGEZEXB4RyyNicY9xR0bEvRGxKiJ27zX/rIhYEhEPRMRBzcwmSZ0kIrj55mv5/Oc3ZeONdyFiYz70\noTncdNO1Q1IWQ6GpWxgRsQ+wEvjXzNy1Grcj9W2ufwa+mpmLqvHTgXnAnsC2wE3AtL42JdzCkLQh\ny0wykxEjhvb/6Tt6CyMzbwGe6zXuwcxcwh9OA1jjMOCKzHwjMx8BlgB7NTOfJHWiiBjyshgKnZRo\nG2Bpj9fLqnGSpA7QSYUhSepgnXQvqWXAlB6vt63G9Wn27Nlrh2u1GrVarVm5JGlY6urqoqura8je\nr+mn1UbEu4EfZuYf9Rq/kPpB7zur1zsB/w7MoL4r6kY86C1JQ2awB72buoUREfOAGrB5RDwGnE39\nIPjFwBbAf0bErzLzkMy8PyLmA/cDrwMn2wqS1Dm8cE+S3iY6+rRaSdKGw8KQJBWxMCRJRSwMSVIR\nC0OSVMTCkCQVsTAkSUUsDElSEQtDklTEwpAkFbEwJElFLAxJUhELQ5JUxMKQJBWxMCRJRSwMSVIR\nC0OSVMTCkCQVsTAkSUUsDElSEQtDklTEwpAkFbEwJElFLAxJUhELQ5JUxMKQJBWxMCRJRSwMSVIR\nC0OSVKSphRERl0fE8ohY3GPcxIj4UUQ8GBELImLTHtNmRcSSiHggIg5qZjZJUmOavYUxBzi417gz\ngJsyc0fgv4BZABGxE3AUMB04BLgkIqLJ+SRJ
hZpaGJl5C/Bcr9GHAXOr4bnA4dXwocAVmflGZj4C\nLAH2amY+SVK5dhzD2CozlwNk5lPAVtX4bYClPeZbVo2TJHWAke0OAORAFpo9e/ba4VqtRq1WG6I4\nkrRh6Orqoqura8jeLzIH9O91+QdEbAf8MDN3rV4/ANQyc3lETAYWZub0iDgDyMw8r5rvBuDszLy9\nj/fMZueWpA1NRJCZAz423IpdUlH9rHEtcHw1fBxwTY/xR0fE6IjYHngvcEcL8kmSCjR1l1REzANq\nwOYR8RhwNnAucFVEfAF4lPqZUWTm/RExH7gfeB042c0ISeocTd8l1QzukpKkxg2HXVKSpA2AhSFJ\nKmJhSJKKWBiSpCIWhiSpiIUhSSpiYUiSilgYkqQiFoYkqYiFIUkqYmFIkopYGJKkIhaGJKmIhSFJ\nKmJhSJKKWBiSpCIWhiSpiIUhSSpSVBgR8Vcl4yRJG67SLYzj+hh3/BDmkCR1uJHrmxgRnwWOAbaP\niGt7THoH8Gwzg0mSOst6CwP4GfAksAVwYY/xLwKLmxVKktR5IjPbnaFhEZHDMbcktVNEkJkx0OVL\nD3p/OiKWRMSKiHghIl6MiBcG+qGSpOGnaAsjIv4b+JPMfKD5kd6aWxiS1LiWbGEAyzulLCRJ7bHe\nLYyI+HQ1uB8wGbga6F4zPTO/39R0/edyC0OSGjTYLYy3Kow561k2M/MLA/3gwbAwJKlxTS2MZqqu\nFD+xenlZZv5jREwErgS2Ax4BjsrMFX0sa2FIUoNaUhgR8Y99jF4B/DIzr2n4QyN2Br4H7Am8AVwP\n/E/gJOD3mXl+RJwOTMzMM/pY3sKQpAa16qD3WGA3YEn1syuwLfDFiLhoAJ87Hbg9M7szcxXwE+DT\nwKHA3GqeucDhA3hvSVITvNWV3mvsCuxd/eNORHwT+CmwD3DPAD73XuBr1S6obuCPgV8CkzJzOUBm\nPhURWw3gvSVJTVBaGBOBCdR3QwGMBzbLzFUR0d3/Yn3LzF9HxHnAjcBK4C5gVV+z9vces2fPXjtc\nq9Wo1WqNxpCkDVpXVxddXV1D9n6lxzC+CJwFdAEBfBT4e+rHIWZn5sxBhYj4O2Ap8FdALTOXR8Rk\nYGFmTu9jfo9hSFKDWnaWVES8C9irevmLzHxioB9avd+Wmfl0REwFbgA+DJwJPJuZ53nQW5KGVrOv\nw3h/tfto976mZ+aiAX9wxE+AzYDXgVMzsysiNgPmA1OAR6mfVvt8H8taGJLUoGYXxqWZeVJELOxj\ncmbmxwf6wYNhYUhS44bthXuDYWFIUuNadXvzcRFxVkRcWr2eFhH/Y6AfKkkafkov3JsDvAZ8pHq9\nDPhaUxJJkjpSaWG8JzPPp36Amsx8mfrptZKkt4nSwngtIjamupAuIt5Dj9ucS5I2fKVXep9N/VqJ\nKRHx78DewPHNCiVJ6jylV3p/F1gMvAI8TP3Ggc80Odv68niWlCQ1qFW3N/8YsG/18x7q9376SWb+\nw0A/eDAsDElqXCtvDbIR9edXfAz4MvBKZr5/oB88GBaGJDVusIVRdAwjIm6mfofa26jf1nzPzPzd\nQD9UkjT8lJ4ltZj6dRi7UH82xi7VWVOSpLeJhm4NEhHvoH521FeByZk5pkm53iqHu6QkqUGt2iV1\nCvUD3nsAjwDfob5rSpL0NlF6HcZY4P8Cd2bmG03MI2mYefbZZ7nzzjtZuXIlEyZMYI899mCzzTZr\ndyw1gXerlTQgixYt4oIL/omrr/4BY8bsRuamRKygu/tXHH74p5g58xR2373PR+moTVpyt1pJ6umi\niy5m330/yfz5O/Lqqw+xYsVCXnjhalasWMirrz7E/Pk7su++n+Siiy5ud1QNIbcwJDXkW9+6jL/+\n6/N4+eWbgHevZ85HGDfuAC688HS+/OUvtSid1scHKElqmaeffpqpU9/Hq6/eAUwrWGIJY8fuxWOP\nPcSWW27Z
7Hh6C+6SktQyl132HSI+RVlZAEwj4nAuv3xOM2OpRdzCkFRs0qQd+N3vrqR+l6BSv2DS\npKN56qnfNCuWCrlLSlJLdHd3M27cBFavfo3Gnp+WjBgxmldeeYnRo0c3K54KuEtKUkt0d3ez0UZj\naPxhm8FGG42hu9tnrg13FoakIhMmTGD16teBFQ0uuYLVq19nwoQJzYilFrIwJBUZMWIEBx98GPXn\nqZWL+Dc+8YnDiRjwnhB1CAtDUrGZM09m/PhLgNJjiMn48Zcwc+bJzYylFrEwJBXbb7/9mDZtM0aN\nOqto/lGjzmTatM356Ec/2uRkagULQ1KxiGDBgu+z9dY/YPToU4GV/cy5ktGjv8LWW1/NggXfd3fU\nBsLCkNSQLbfckkWLbuGAA5YxZsxUxow5BbgRuAO4kTFjTmHMmKkccMAT3HXXrV7hvQFp23UYETEL\n+DNgFXAPcAL1x8BeCWxH/bkbR2Xmm07J8DoMqTMsXbqUb3zjUhYsuIWVK19kwoR3cPDB+/AXf3ES\nU6ZMaXc89TIsL9yLiO2AhcD7M/O1iLgSuA7YCfh9Zp4fEacDEzPzjD6WtzAkqUHD9cK9F6g/I3x8\nRIwENgaWAYcBc6t55gKHtyeeJKm3thRGZj4HXAg8Rr0oVmTmTcCkzFxezfMUsFU78kmS3qz0Ea1D\nKiJ2AE6lfqxiBXBVRHyON5/c3e9+p9mzZ68drtVq1Gq1Ic8pScNZV1cXXV1dQ/Z+7TqGcRRwYGZ+\nqXp9LPBh4ONALTOXR8RkYGFmTu9jeY9hSFKDhusxjAeBD0fE2KifoL0/cD9wLXB8Nc9xwDXtiSdJ\n6q2dp9XOpF4Oq4C7gBOBdwDzgSnAo9RPq32+j2XdwpCkBg3L02oHy8KQpMYN111SkqRhxsKQJBWx\nMCRJRSwMSVIRC0OSVMTCkCQVsTAkSUUsDElSEQtDklTEwpAkFbEwJElFLAxJUhELQ5JUxMKQJBWx\nMCRJRSwMSVIRC0OSVMTCkCQVsTAkSUUsDElSEQtDklTEwpAkFbEwJElFLAxJUhELQ5JUxMKQJBWx\nMCRJRSwMSVKRthRGRLwvIu6KiEXVf1dExP+KiIkR8aOIeDAiFkTEpu3IJ0l6s8jM9gaIGAE8DswA\nTgF+n5nnR8TpwMTMPKOPZbLduSVpuIkIMjMGunwn7JI6APhNZi4FDgPmVuPnAoe3LZUkaR2dUBif\nAeZVw5MyczlAZj4FbNW2VJKkdbS1MCJiFHAocFU1qvd+Jvc7SVKHGNnmzz8EuDMzn6leL4+ISZm5\nPCImA7/rb8HZs2evHa7VatRqtWbmlKRhp6uri66uriF7v7Ye9I6I7wE3ZObc6vV5wLOZed5wPOid\nmfz4xz/mggsu4Wc/u5U33niNbbfdgdNOO5FjjjmG8ePHtzuipLexwR70blthRMQ44FFgh8x8sRq3\nGTAfmFJNOyozn+9j2Y4rjO7ubo488lgWLlzMyy//JZl/AowB7mLChG8ybtzdLFx4HTvttFO7o0p6\nmxq2hTEYnVgYRxxxLNdfv5JXXrmCelGsK+JfmTjxTO677xdMnjy59QElve1tCKfVDnv33nsv119/\nM6+8Mo++ygIg8/OsXHkoX//6P7U2nCQNEbcwhsCJJ57Cv/zLFqxaNfst5vw1m2xS45lnljJq1KhW\nRJOktdzC6AC33XYXq1YdUDDn+1m1ahRPPvlk0zNJ0lCzMIZA5mqgtLSD1atXNzOOJDWFhTEEdt99\nF0aM+GnBnA8DL/Oud72r2ZEkachZGEPgK1/5c8aO/Rbw+nrnGzXqm5xwwvGMGdP3gXFJ6mQe9B4i\n++9/KLfeujnd3d8GNupjjh+yySZf4p577mDq1KmtjidJHvTuFNdcM48PfnAp48fvDVwBvAysAn7J\n2LFfZNNNT+LGG6+1LCQNWxbGEJkwYQI//ekNzJ07kz33vIyNNnonEaOZNO
kznHXWNJYsWcxee+3V\n7piSNGDukmqSzCQzGTHCTpbUGQa7S6rdd6vdYEUEEQP+vUhSx/F/fyVJRSwMSVIRC0OSVMTCkCQV\nsTAkSUUsDElSEQtDklTEwpAkFbEwJElFLAxJUhELQ5JUxMKQJBWxMCRJRSwMSVIRC0OSVMTCkCQV\nsTAkSUXaVhgRsWlEXBURD0TEfRExIyImRsSPIuLBiFgQEZu2K58kaV3t3ML4B+C6zJwOfAD4NXAG\ncFNm7gj8FzCrjfkGraurq90RiphzaJlz6AyHjDB8cg5WWwojIjYB9s3MOQCZ+UZmrgAOA+ZWs80F\nDm9HvqEyXP4SmXNomXPoDIeMMHxyDla7tjC2B56JiDkRsSgiLo2IccCkzFwOkJlPAVu1KZ8kqZd2\nFcZIYHfgG5m5O/AS9d1R2Wu+3q8lSW0Sma3/NzkiJgG3ZeYO1et9qBfGe4BaZi6PiMnAwuoYR+/l\nLRJJGoDMjIEuO3Iog5SqCmFpRLwvMx8C9gfuq36OB84DjgOu6Wf5Af+BJUkD05YtDICI+ADwbWAU\n8DBwArARMB+YAjwKHJWZz7cloCRpHW0rDEnS8NKRV3pHxOURsTwiFvcYd0V1RtWiiPhtRCzqMW1W\nRCypLgI8qBNzRsQBEfHLiLg7In4RER/rxJw9pk+NiBcj4rROzBgRu0bEzyLi3uo7Hd1pOSNiTETM\ni4jF1cWpZ7Qi43py7hkRd0TEXdV/P9RjWietQ33m7MB1qN/vs5re0nVoIDkbXo8ys+N+gH2A3YDF\n/Uz/P8BZ1fB04C7qx2PeDfw31ZZTh+X8ADC5Gt4ZeLwTv88e464CrgRO67SM1Hdd3g3sUr2e2KG/\n8+OAedXwxsBvgantygksBA6qhg+hflIJwE6dtA6tJ2dHrUP95ewxvaXr0AC+z4bXo47cwsjMW4Dn\n1jPLUcC8avgw4IqsX/z3CLAE2Ku5CesKc36vmvfurF9bQmbeB4yNiFHNT9lYToCIOIz6caX7mhxt\nrQYzHgTcnZn3Vss+l9Xf+GZrMOdTwPiI2AgYB3QDLzQ3YV0/OZ8E1txu553Asmr4UDprHeozZweu\nQ/19n21Zh6DhnA2vR205S2owImJf4KnMfLgatQ1wW49ZllXj2qpHzt/0Me1IYFFmvt76ZG/Ksk7O\niBgP/G/gQGBmO7Ot0cd3+b5q/A3AFsCVmXlBu/Kt0TtnZi6IiD+jvsJuDJya7T2J4wzg1oi4EAjg\nI9X4TluH+su5VoesQ33m7MB1qL/vs+H1qCO3MN7CZ+nxf8MdrM+cEbEzcA5wUssT9a13ztnA1zPz\n5ep1J5zC3DvjSGDvavy+wKdauT97PdbJGRGfo14Uk4EdgK9GxLvbkqzucuAvM3MqcCrwnTZmWZ/1\n5uygdai/nLPprHWov5wNr0fDaguj2rT/NPWrxNdYRv003DW2pcemYTv0k5OI2Bb4PnBstenfVv3k\nnAEcERHnU9+nuSoiXsnMSzoo4+PATzLzuWqe66rpC1ufsK6fnHsDP8jM1cDTEXEr8CHgkdYnBGBG\nZh4IkJn/LyK+XY3vtHWod87L10zosHWov++zo9ah9eRseD3q5C2M4M3NfCDwQGY+0WPctcDRETE6\nIrYH3gvc0aKMUJgz6rdq/0/g9Mz8eQvzrY1AQc7M/Ghm7pD1q/AvAv6+hX/RS3/nC4A/ioixETES\n2A+4v0UZoTznr6lflLpmN8WHq3Gt0jvnkojYr8qzP/VjFdB561DvnA9Vw++ks9ahPr/PNq9DxTkZ\nyHrUqqP3DR7pnwc8Qf0g4WPACdX4OcBJfcw/i/qZHQ9QnQ3QaTmBM4EXgUXUz0hZBGzRaTl7LXc2\nrTtLqtHf+THAvcBi4JwO/Z2PAb4L3FNlbeXZMm/KCewB3F79/bsN+GCP+TtmHeoj527VvB21Dq3v\n++yxXMvWoQH+3htaj7xwT5JUpJN3SU
mSOoiFIUkqYmFIkopYGJKkIhaGJKmIhSFJKmJhSH2IiO0i\n4p4G5v/z6r5R65vnuIi4uJ9psxrNKLWahSH1r/gipcz858z87iDe829KP0tqFwtD6t/IiLi0erjM\nDdUDkXaIiOurB/j8OCLW3PHz7DUPyqkeWHN31B+odH6vLZVtquUfjIhzq/nPATau5v+31v8xpTIW\nhtS/acDFmbkL8DxwJHApcEpm7kn91tXf7GO57wBfyszdgVWsu1XxAeBPgV2p379pm8ycBbycmbtn\n5rHN++NIgzOs7lYrtdjDmblm62AR9afRfQS4KiLW3NxtnQf4VDeZnJCZa27eNw/4ZI9Zbs7MldW8\n9wPb0ea7K0ulLAypf909hlcBk4Dnqi2H9Vnf8w96v+eadbDdz0yQ3pK7pKT+9f5H/AXgt9XT3uoz\nROzac4bMXAG8EBF7VqOOLvys16pnakgdy8KQ+tf7jKYEPgd8MSJ+FRH3Un8edm8nAt+OiEXUn+W9\nouD9LwXu8aC3Opm3N5eGWESMz8yXquHTgcmZeWqbY0mD5jEMaeh9sroQbyT1x7Ee39Y00hBxC0OS\nVMRjGJKkIhaGJKmIhSFJKmJhSJKKWBiSpCIWhiSpyP8HgSKWja+++IAAAAAASUVORK5CYII=\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x11160b350>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"people.plot(kind = \"scatter\", x = \"height\", y = \"weight\", s=[40, 120, 200])\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Again, there are way too many options to list here: the best option is to scroll through the [Visualization](http://pandas.pydata.org/pandas-docs/stable/visualization.html) page in pandas' documentation, find the plot you are interested in and look at the example code."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Operations on `DataFrame`s\n",
"Although `DataFrame`s do not try to mimick NumPy arrays, there are a few similarities. Let's create a `DataFrame` to demonstrate this:"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sep</th>\n",
" <th>oct</th>\n",
" <th>nov</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>alice</th>\n",
" <td>8</td>\n",
" <td>8</td>\n",
" <td>9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>10</td>\n",
" <td>9</td>\n",
" <td>9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>charles</th>\n",
" <td>4</td>\n",
" <td>8</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>darwin</th>\n",
" <td>9</td>\n",
" <td>10</td>\n",
" <td>10</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sep oct nov\n",
"alice 8 8 9\n",
"bob 10 9 9\n",
"charles 4 8 2\n",
"darwin 9 10 10"
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"grades_array = np.array([[8,8,9],[10,9,9],[4, 8, 2], [9, 10, 10]])\n",
"grades = pd.DataFrame(grades_array, columns=[\"sep\", \"oct\", \"nov\"], index=[\"alice\",\"bob\",\"charles\",\"darwin\"])\n",
"grades"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can apply NumPy mathematical functions on a `DataFrame`: the function is applied to all values:"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sep</th>\n",
" <th>oct</th>\n",
" <th>nov</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>alice</th>\n",
" <td>2.828427</td>\n",
" <td>2.828427</td>\n",
" <td>3.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>3.162278</td>\n",
" <td>3.000000</td>\n",
" <td>3.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>charles</th>\n",
" <td>2.000000</td>\n",
" <td>2.828427</td>\n",
" <td>1.414214</td>\n",
" </tr>\n",
" <tr>\n",
" <th>darwin</th>\n",
" <td>3.000000</td>\n",
" <td>3.162278</td>\n",
" <td>3.162278</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sep oct nov\n",
"alice 2.828427 2.828427 3.000000\n",
"bob 3.162278 3.000000 3.000000\n",
"charles 2.000000 2.828427 1.414214\n",
"darwin 3.000000 3.162278 3.162278"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.sqrt(grades)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Similarly, adding a single value to a `DataFrame` will add that value to all elements in the `DataFrame`. This is called *broadcasting*:"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sep</th>\n",
" <th>oct</th>\n",
" <th>nov</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>alice</th>\n",
" <td>9</td>\n",
" <td>9</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>11</td>\n",
" <td>10</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>charles</th>\n",
" <td>5</td>\n",
" <td>9</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>darwin</th>\n",
" <td>10</td>\n",
" <td>11</td>\n",
" <td>11</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sep oct nov\n",
"alice 9 9 10\n",
"bob 11 10 10\n",
"charles 5 9 3\n",
"darwin 10 11 11"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"grades + 1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Of course, the same is true for all other binary operations, including arithmetic (`*`,`/`,`**`...) and conditional (`>`, `==`...) operations:"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sep</th>\n",
" <th>oct</th>\n",
" <th>nov</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>alice</th>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>charles</th>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>darwin</th>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sep oct nov\n",
"alice True True True\n",
"bob True True True\n",
"charles False True False\n",
"darwin True True True"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"grades >= 5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Aggregation operations, such as computing the `max`, the `sum` or the `mean` of a `DataFrame`, apply to each column, and you get back a `Series` object:"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"sep 7.75\n",
"oct 8.75\n",
"nov 7.50\n",
"dtype: float64"
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"grades.mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `all` method is also an aggregation operation: it checks whether all values are `True` or not. Let's see during which months all students got a grade greater than `5`:"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"sep False\n",
"oct True\n",
"nov False\n",
"dtype: bool"
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(grades > 5).all()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Most of these functions take an optional `axis` parameter which lets you specify along which axis of the `DataFrame` you want the operation executed. The default is `axis=0`, meaning that the operation is executed vertically (on each column). You can set `axis=1` to execute the operation horizontally (on each row). For example, let's find out which students had all grades greater than `5`:"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"alice True\n",
"bob True\n",
"charles False\n",
"darwin True\n",
"dtype: bool"
]
},
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(grades > 5).all(axis = 1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `any` method returns `True` if any value is True. Let's see who got at least one grade 10:"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"alice False\n",
"bob True\n",
"charles False\n",
"darwin True\n",
"dtype: bool"
]
},
"execution_count": 64,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(grades == 10).any(axis = 1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you add a `Series` object to a `DataFrame` (or execute any other binary operation), pandas attempts to broadcast the operation to all *rows* in the `DataFrame`. This only works if the `Series` has the same size as the `DataFrame`s rows. For example, let's substract the `mean` of the `DataFrame` (a `Series` object) from the `DataFrame`:"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sep</th>\n",
" <th>oct</th>\n",
" <th>nov</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>alice</th>\n",
" <td>0.25</td>\n",
" <td>-0.75</td>\n",
" <td>1.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>2.25</td>\n",
" <td>0.25</td>\n",
" <td>1.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>charles</th>\n",
" <td>-3.75</td>\n",
" <td>-0.75</td>\n",
" <td>-5.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>darwin</th>\n",
" <td>1.25</td>\n",
" <td>1.25</td>\n",
" <td>2.5</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sep oct nov\n",
"alice 0.25 -0.75 1.5\n",
"bob 2.25 0.25 1.5\n",
"charles -3.75 -0.75 -5.5\n",
"darwin 1.25 1.25 2.5"
]
},
"execution_count": 65,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"grades - grades.mean() # equivalent to: grades - [7.75, 8.75, 7.50]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We subtracted `7.75` from all September grades, `8.75` from October grades and `7.50` from November grades. It is equivalent to subtracting this `DataFrame`:"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sep</th>\n",
" <th>oct</th>\n",
" <th>nov</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>alice</th>\n",
" <td>7.75</td>\n",
" <td>8.75</td>\n",
" <td>7.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>7.75</td>\n",
" <td>8.75</td>\n",
" <td>7.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>charles</th>\n",
" <td>7.75</td>\n",
" <td>8.75</td>\n",
" <td>7.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>darwin</th>\n",
" <td>7.75</td>\n",
" <td>8.75</td>\n",
" <td>7.5</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sep oct nov\n",
"alice 7.75 8.75 7.5\n",
"bob 7.75 8.75 7.5\n",
"charles 7.75 8.75 7.5\n",
"darwin 7.75 8.75 7.5"
]
},
"execution_count": 66,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.DataFrame([[7.75, 8.75, 7.50]]*4, index=grades.index, columns=grades.columns)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to subtract the global mean from every grade, here is one way to do it:"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sep</th>\n",
" <th>oct</th>\n",
" <th>nov</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>alice</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>charles</th>\n",
" <td>-4</td>\n",
" <td>0</td>\n",
" <td>-6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>darwin</th>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sep oct nov\n",
"alice 0 0 1\n",
"bob 2 1 1\n",
"charles -4 0 -6\n",
"darwin 1 2 2"
]
},
"execution_count": 67,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"grades - grades.values.mean() # subtracts the global mean (8.00) from all grades"
]
},
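{
"cell_type": "markdown",
"metadata": {},
"source": [
"The broadcasting shown above matches a `Series` against the columns. If you want to subtract a per-row value instead (for example each student's own mean), one option is the `sub` method with `axis=0`. A minimal sketch on made-up data (the `demo` frame is hypothetical):\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical data, for illustration only\n",
"demo = pd.DataFrame({'sep': [8, 10], 'oct': [6, 8]}, index=['alice', 'bob'])\n",
"\n",
"# One mean per row: alice -> 7.0, bob -> 9.0\n",
"row_means = demo.mean(axis=1)\n",
"\n",
"# axis=0 tells pandas to align `row_means` with the row labels\n",
"centered = demo.sub(row_means, axis=0)\n",
"```\n",
"\n",
"Each row of `centered` now has a mean of zero."
]
},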
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Automatic alignment\n",
"Similar to `Series`, when operating on multiple `DataFrame`s, pandas automatically aligns them by row index label, but also by column names. Let's create a `DataFrame` with bonus points for each person from October to December:"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>oct</th>\n",
" <th>nov</th>\n",
" <th>dec</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>colin</th>\n",
" <td>NaN</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>darwin</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>charles</th>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" oct nov dec\n",
"bob 0 NaN 2\n",
"colin NaN 1 0\n",
"darwin 0 1 0\n",
"charles 3 3 0"
]
},
"execution_count": 68,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bonus_array = np.array([[0,np.nan,2],[np.nan,1,0],[0, 1, 0], [3, 3, 0]])\n",
"bonus_points = pd.DataFrame(bonus_array, columns=[\"oct\", \"nov\", \"dec\"], index=[\"bob\",\"colin\", \"darwin\", \"charles\"])\n",
"bonus_points"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>dec</th>\n",
" <th>nov</th>\n",
" <th>oct</th>\n",
" <th>sep</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>alice</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>9</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>charles</th>\n",
" <td>NaN</td>\n",
" <td>5</td>\n",
" <td>11</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>colin</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>darwin</th>\n",
" <td>NaN</td>\n",
" <td>11</td>\n",
" <td>10</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" dec nov oct sep\n",
"alice NaN NaN NaN NaN\n",
"bob NaN NaN 9 NaN\n",
"charles NaN 5 11 NaN\n",
"colin NaN NaN NaN NaN\n",
"darwin NaN 11 10 NaN"
]
},
"execution_count": 69,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"grades + bonus_points"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Looks like the addition worked in some cases but way too many elements are now empty. That's because when aligning the `DataFrame`s, some columns and rows were only present on one side, and thus they were considered missing on the other side (`NaN`). Then adding `NaN` to a number results in `NaN`, hence the result.\n",
"\n",
"### Handling missing data\n",
"Dealing with missing data is a frequent task when working with real-life data. Pandas offers a few tools to handle missing data.\n",
"\n",
"Let's try to fix the problem above. For example, we can decide that missing data should result in a zero, instead of `NaN`. We can replace all `NaN` values with any value using the `fillna` method:"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>dec</th>\n",
" <th>nov</th>\n",
" <th>oct</th>\n",
" <th>sep</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>alice</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>9</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>charles</th>\n",
" <td>0</td>\n",
" <td>5</td>\n",
" <td>11</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>colin</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>darwin</th>\n",
" <td>0</td>\n",
" <td>11</td>\n",
" <td>10</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" dec nov oct sep\n",
"alice 0 0 0 0\n",
"bob 0 0 9 0\n",
"charles 0 5 11 0\n",
"colin 0 0 0 0\n",
"darwin 0 11 10 0"
]
},
"execution_count": 70,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(grades + bonus_points).fillna(0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It's a bit unfair that we're setting grades to zero in September, though. Perhaps we should decide that missing grades are missing grades, but missing bonus points should be replaced by zeros:"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>dec</th>\n",
" <th>nov</th>\n",
" <th>oct</th>\n",
" <th>sep</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>alice</th>\n",
" <td>NaN</td>\n",
" <td>9</td>\n",
" <td>8</td>\n",
" <td>8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>NaN</td>\n",
" <td>9</td>\n",
" <td>9</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>charles</th>\n",
" <td>NaN</td>\n",
" <td>5</td>\n",
" <td>11</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>colin</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>darwin</th>\n",
" <td>NaN</td>\n",
" <td>11</td>\n",
" <td>10</td>\n",
" <td>9</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" dec nov oct sep\n",
"alice NaN 9 8 8\n",
"bob NaN 9 9 10\n",
"charles NaN 5 11 4\n",
"colin NaN NaN NaN NaN\n",
"darwin NaN 11 10 9"
]
},
"execution_count": 71,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fixed_bonus_points = bonus_points.fillna(0)\n",
"fixed_bonus_points.insert(0, \"sep\", 0)\n",
"fixed_bonus_points.loc[\"alice\"] = 0\n",
"grades + fixed_bonus_points"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That's much better: although we made up some data, we have not been too unfair.\n",
"\n",
"Another way to handle missing data is to interpolate. Let's look at the `bonus_points` `DataFrame` again:"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>oct</th>\n",
" <th>nov</th>\n",
" <th>dec</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>colin</th>\n",
" <td>NaN</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>darwin</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>charles</th>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" oct nov dec\n",
"bob 0 NaN 2\n",
"colin NaN 1 0\n",
"darwin 0 1 0\n",
"charles 3 3 0"
]
},
"execution_count": 72,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bonus_points"
]
},
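{
"cell_type": "markdown",
"metadata": {},
"source": [
"To get a feel for linear interpolation, here is a small sketch on two hypothetical `Series` (the values are made up). A gap between two known values is filled, whereas a gap at the very start has nothing before it to interpolate from and, with the default settings, stays `NaN`:\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Gap between two known values: filled with the midpoint\n",
"inner_gap = pd.Series([0.0, np.nan, 2.0]).interpolate()\n",
"\n",
"# Gap at the start: no earlier value, so it is left as NaN\n",
"leading_gap = pd.Series([np.nan, 1.0, 0.0]).interpolate()\n",
"```\n",
"\n",
"The same logic applies along each row or column of a `DataFrame`."
]
},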
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's call the `interpolate` method. By default, it interpolates vertically (`axis=0`), so let's tell it to interpolate horizontally (`axis=1`)."
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>oct</th>\n",
" <th>nov</th>\n",
" <th>dec</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>colin</th>\n",
" <td>NaN</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>darwin</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>charles</th>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" oct nov dec\n",
"bob 0 1 2\n",
"colin NaN 1 0\n",
"darwin 0 1 0\n",
"charles 3 3 0"
]
},
"execution_count": 73,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bonus_points.interpolate(axis=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Bob had 0 bonus points in October, and 2 in December. When we interpolate for November, we get the mean: 1 bonus point. Colin had 1 bonus point in November, but we do not know how many bonus points he had in September, so we cannot interpolate his October value. This is why there is still a missing value in October after interpolation. To fix this, we can set the September bonus points to 0 before interpolation."
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sep</th>\n",
" <th>oct</th>\n",
" <th>nov</th>\n",
" <th>dec</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>colin</th>\n",
" <td>0</td>\n",
" <td>0.5</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>darwin</th>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>charles</th>\n",
" <td>0</td>\n",
" <td>3.0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>alice</th>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sep oct nov dec\n",
"bob 0 0.0 1 2\n",
"colin 0 0.5 1 0\n",
"darwin 0 0.0 1 0\n",
"charles 0 3.0 3 0\n",
"alice 0 0.0 0 0"
]
},
"execution_count": 74,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"better_bonus_points = bonus_points.copy()\n",
"better_bonus_points.insert(0, \"sep\", 0)\n",
"better_bonus_points.loc[\"alice\"] = 0\n",
"better_bonus_points = better_bonus_points.interpolate(axis=1)\n",
"better_bonus_points"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Great, now we have reasonable bonus points everywhere. Let's find out the final grades:"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>dec</th>\n",
" <th>nov</th>\n",
" <th>oct</th>\n",
" <th>sep</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>alice</th>\n",
" <td>NaN</td>\n",
" <td>9</td>\n",
" <td>8</td>\n",
" <td>8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>NaN</td>\n",
" <td>10</td>\n",
" <td>9</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>charles</th>\n",
" <td>NaN</td>\n",
" <td>5</td>\n",
" <td>11</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>colin</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>darwin</th>\n",
" <td>NaN</td>\n",
" <td>11</td>\n",
" <td>10</td>\n",
" <td>9</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" dec nov oct sep\n",
"alice NaN 9 8 8\n",
"bob NaN 10 9 10\n",
"charles NaN 5 11 4\n",
"colin NaN NaN NaN NaN\n",
"darwin NaN 11 10 9"
]
},
"execution_count": 75,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"grades + better_bonus_points"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There's not much we can do about December and Colin: it's bad enough that we are making up bonus points, but we can't reasonably make up grades (well, I guess some teachers probably do).\n",
"\n",
"It is slightly annoying that the September column ends up on the right. This is because the `DataFrame`s we are adding do not have the exact same columns (the `grades` `DataFrame` is missing the `\"dec\"` column), so to make things predictable, pandas orders the final columns alphabetically. To fix this, we can simply add the missing column before adding:"
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sep</th>\n",
" <th>oct</th>\n",
" <th>nov</th>\n",
" <th>dec</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>alice</th>\n",
" <td>8</td>\n",
" <td>8</td>\n",
" <td>9</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>10</td>\n",
" <td>9</td>\n",
" <td>10</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>charles</th>\n",
" <td>4</td>\n",
" <td>11</td>\n",
" <td>5</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>colin</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>darwin</th>\n",
" <td>9</td>\n",
" <td>10</td>\n",
" <td>11</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sep oct nov dec\n",
"alice 8 8 9 NaN\n",
"bob 10 9 10 NaN\n",
"charles 4 11 5 NaN\n",
"colin NaN NaN NaN NaN\n",
"darwin 9 10 11 NaN"
]
},
"execution_count": 76,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"grades[\"dec\"] = np.nan\n",
"final_grades = grades + better_bonus_points\n",
"final_grades"
]
},
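{
"cell_type": "markdown",
"metadata": {},
"source": [
"A related option, not used in this notebook, is the `add` method with its `fill_value` parameter: a value that is missing on one side only is replaced by `fill_value` before adding. A small sketch with two hypothetical frames, `left` and `right`:\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical data, for illustration only\n",
"left = pd.DataFrame({'oct': [8.0, 9.0]}, index=['alice', 'bob'])\n",
"right = pd.DataFrame({'oct': [1.0]}, index=['bob'])\n",
"\n",
"# Plain addition: alice has no match in `right`, so her result is NaN\n",
"plain = left + right\n",
"\n",
"# With fill_value=0, the side that is missing is treated as 0\n",
"filled = left.add(right, fill_value=0)\n",
"```\n",
"\n",
"Whether this is appropriate depends on what a missing value means in your data, just like the choices made with `fillna` and `interpolate`."
]
},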
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Aggregating with `groupby`\n",
"Similar to SQL, pandas lets you group your data and run calculations over each group.\n",
"\n",
"First, let's add some extra data about each person so we can group them:"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sep</th>\n",
" <th>oct</th>\n",
" <th>nov</th>\n",
" <th>dec</th>\n",
" <th>hobby</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>alice</th>\n",
" <td>8</td>\n",
" <td>8</td>\n",
" <td>9</td>\n",
" <td>NaN</td>\n",
" <td>Biking</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bob</th>\n",
" <td>10</td>\n",
" <td>9</td>\n",
" <td>10</td>\n",
" <td>NaN</td>\n",
" <td>Dancing</td>\n",
" </tr>\n",
" <tr>\n",
" <th>charles</th>\n",
" <td>4</td>\n",
" <td>11</td>\n",
" <td>5</td>\n",
" <td>NaN</td>\n",
" <td>Reading</td>\n",
" </tr>\n",
" <tr>\n",
" <th>colin</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Dancing</td>\n",
" </tr>\n",
" <tr>\n",
" <th>darwin</th>\n",
" <td>9</td>\n",
" <td>10</td>\n",
" <td>11</td>\n",
" <td>NaN</td>\n",
" <td>Biking</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sep oct nov dec hobby\n",
"alice 8 8 9 NaN Biking\n",
"bob 10 9 10 NaN Dancing\n",
"charles 4 11 5 NaN Reading\n",
"colin NaN NaN NaN NaN Dancing\n",
"darwin 9 10 11 NaN Biking"
]
},
"execution_count": 77,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"final_grades[\"hobby\"] = [\"Biking\", \"Dancing\", \"Reading\", \"Dancing\", \"Biking\"]\n",
"final_grades"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's group data in this `DataFrame` by hobby:"
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<pandas.core.groupby.DataFrameGroupBy object at 0x11157abd0>"
]
},
"execution_count": 78,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"grouped_grades = final_grades.groupby(\"hobby\")\n",
"grouped_grades"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's compute the average grade per hobby:"
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sep</th>\n",
" <th>oct</th>\n",
" <th>nov</th>\n",
" <th>dec</th>\n",
" </tr>\n",
" <tr>\n",
" <th>hobby</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Biking</th>\n",
" <td>8.5</td>\n",
" <td>9</td>\n",
" <td>10</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Dancing</th>\n",
" <td>10.0</td>\n",
" <td>9</td>\n",
" <td>10</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Reading</th>\n",
" <td>4.0</td>\n",
" <td>11</td>\n",
" <td>5</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sep oct nov dec\n",
"hobby \n",
"Biking 8.5 9 10 NaN\n",
"Dancing 10.0 9 10 NaN\n",
"Reading 4.0 11 5 NaN"
]
},
"execution_count": 79,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"grouped_grades.mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That was easy! Note that the `NaN` values have simply been skipped."
]
},
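{
"cell_type": "markdown",
"metadata": {},
"source": [
"Other aggregations work in a similar way. As a sketch on hypothetical data (the `demo` frame and its values are made up), `agg` can compute several statistics per group at once:\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical data, for illustration only\n",
"demo = pd.DataFrame({\n",
"    'hobby': ['Biking', 'Dancing', 'Biking', 'Dancing'],\n",
"    'grade': [8.0, 10.0, 9.0, np.nan],\n",
"})\n",
"\n",
"# Per-hobby mean and number of non-missing grades\n",
"stats = demo.groupby('hobby')['grade'].agg(['mean', 'count'])\n",
"```\n",
"\n",
"The `count` column makes the skipped `NaN` visible: the Dancing group has two rows but only one grade that counts."
]
},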
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Overview functions\n",
"When dealing with large `DataFrame`s, it is useful to get a quick overview of their content. Pandas offers a few functions for this. First, let's create a large `DataFrame` with a mix of numeric values, missing values and text values. Notice how Jupyter displays only the corners of the `DataFrame`:"
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>some_text</th>\n",
" <th>D</th>\n",
" <th>E</th>\n",
" <th>F</th>\n",
" <th>G</th>\n",
" <th>H</th>\n",
" <th>I</th>\n",
" <th>...</th>\n",
" <th>Q</th>\n",
" <th>R</th>\n",
" <th>S</th>\n",
" <th>T</th>\n",
" <th>U</th>\n",
" <th>V</th>\n",
" <th>W</th>\n",
" <th>X</th>\n",
" <th>Y</th>\n",
" <th>Z</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>NaN</td>\n",
" <td>11</td>\n",
" <td>44</td>\n",
" <td>Blabla</td>\n",
" <td>99</td>\n",
" <td>NaN</td>\n",
" <td>88</td>\n",
" <td>22</td>\n",
" <td>165</td>\n",
" <td>143</td>\n",
" <td>...</td>\n",
" <td>11</td>\n",
" <td>NaN</td>\n",
" <td>11</td>\n",
" <td>44</td>\n",
" <td>99</td>\n",
" <td>NaN</td>\n",
" <td>88</td>\n",
" <td>22</td>\n",
" <td>165</td>\n",
" <td>143</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>11</td>\n",
" <td>22</td>\n",
" <td>55</td>\n",
" <td>Blabla</td>\n",
" <td>110</td>\n",
" <td>NaN</td>\n",
" <td>99</td>\n",
" <td>33</td>\n",
" <td>NaN</td>\n",
" <td>154</td>\n",
" <td>...</td>\n",
" <td>22</td>\n",
" <td>11</td>\n",
" <td>22</td>\n",
" <td>55</td>\n",
" <td>110</td>\n",
" <td>NaN</td>\n",
" <td>99</td>\n",
" <td>33</td>\n",
" <td>NaN</td>\n",
" <td>154</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>22</td>\n",
" <td>33</td>\n",
" <td>66</td>\n",
" <td>Blabla</td>\n",
" <td>121</td>\n",
" <td>11</td>\n",
" <td>110</td>\n",
" <td>44</td>\n",
" <td>NaN</td>\n",
" <td>165</td>\n",
" <td>...</td>\n",
" <td>33</td>\n",
" <td>22</td>\n",
" <td>33</td>\n",
" <td>66</td>\n",
" <td>121</td>\n",
" <td>11</td>\n",
" <td>110</td>\n",
" <td>44</td>\n",
" <td>NaN</td>\n",
" <td>165</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>33</td>\n",
" <td>44</td>\n",
" <td>77</td>\n",
" <td>Blabla</td>\n",
" <td>132</td>\n",
" <td>22</td>\n",
" <td>121</td>\n",
" <td>55</td>\n",
" <td>11</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>44</td>\n",
" <td>33</td>\n",
" <td>44</td>\n",
" <td>77</td>\n",
" <td>132</td>\n",
" <td>22</td>\n",
" <td>121</td>\n",
" <td>55</td>\n",
" <td>11</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>44</td>\n",
" <td>55</td>\n",
" <td>88</td>\n",
" <td>Blabla</td>\n",
" <td>143</td>\n",
" <td>33</td>\n",
" <td>132</td>\n",
" <td>66</td>\n",
" <td>22</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>55</td>\n",
" <td>44</td>\n",
" <td>55</td>\n",
" <td>88</td>\n",
" <td>143</td>\n",
" <td>33</td>\n",
" <td>132</td>\n",
" <td>66</td>\n",
" <td>22</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>55</td>\n",
" <td>66</td>\n",
" <td>99</td>\n",
" <td>Blabla</td>\n",
" <td>154</td>\n",
" <td>44</td>\n",
" <td>143</td>\n",
" <td>77</td>\n",
" <td>33</td>\n",
" <td>11</td>\n",
" <td>...</td>\n",
" <td>66</td>\n",
" <td>55</td>\n",
" <td>66</td>\n",
" <td>99</td>\n",
" <td>154</td>\n",
" <td>44</td>\n",
" <td>143</td>\n",
" <td>77</td>\n",
" <td>33</td>\n",
" <td>11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>66</td>\n",
" <td>77</td>\n",
" <td>110</td>\n",
" <td>Blabla</td>\n",
" <td>165</td>\n",
" <td>55</td>\n",
" <td>154</td>\n",
" <td>88</td>\n",
" <td>44</td>\n",
" <td>22</td>\n",
" <td>...</td>\n",
" <td>77</td>\n",
" <td>66</td>\n",
" <td>77</td>\n",
" <td>110</td>\n",
" <td>165</td>\n",
" <td>55</td>\n",
" <td>154</td>\n",
" <td>88</td>\n",
" <td>44</td>\n",
" <td>22</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>77</td>\n",
" <td>88</td>\n",
" <td>121</td>\n",
" <td>Blabla</td>\n",
" <td>NaN</td>\n",
" <td>66</td>\n",
" <td>165</td>\n",
" <td>99</td>\n",
" <td>55</td>\n",
" <td>33</td>\n",
" <td>...</td>\n",
" <td>88</td>\n",
" <td>77</td>\n",
" <td>88</td>\n",
" <td>121</td>\n",
" <td>NaN</td>\n",
" <td>66</td>\n",
" <td>165</td>\n",
" <td>99</td>\n",
" <td>55</td>\n",
" <td>33</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>88</td>\n",
" <td>99</td>\n",
" <td>132</td>\n",
" <td>Blabla</td>\n",
" <td>NaN</td>\n",
" <td>77</td>\n",
" <td>NaN</td>\n",
" <td>110</td>\n",
" <td>66</td>\n",
" <td>44</td>\n",
" <td>...</td>\n",
" <td>99</td>\n",
" <td>88</td>\n",
" <td>99</td>\n",
" <td>132</td>\n",
" <td>NaN</td>\n",
" <td>77</td>\n",
" <td>NaN</td>\n",
" <td>110</td>\n",
" <td>66</td>\n",
" <td>44</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>99</td>\n",
" <td>110</td>\n",
" <td>143</td>\n",
" <td>Blabla</td>\n",
" <td>11</td>\n",
" <td>88</td>\n",
" <td>NaN</td>\n",
" <td>121</td>\n",
" <td>77</td>\n",
" <td>55</td>\n",
" <td>...</td>\n",
" <td>110</td>\n",
" <td>99</td>\n",
" <td>110</td>\n",
" <td>143</td>\n",
" <td>11</td>\n",
" <td>88</td>\n",
" <td>NaN</td>\n",
" <td>121</td>\n",
" <td>77</td>\n",
" <td>55</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>110</td>\n",
" <td>121</td>\n",
" <td>154</td>\n",
" <td>Blabla</td>\n",
" <td>22</td>\n",
" <td>99</td>\n",
" <td>11</td>\n",
" <td>132</td>\n",
" <td>88</td>\n",
" <td>66</td>\n",
" <td>...</td>\n",
" <td>121</td>\n",
" <td>110</td>\n",
" <td>121</td>\n",
" <td>154</td>\n",
" <td>22</td>\n",
" <td>99</td>\n",
" <td>11</td>\n",
" <td>132</td>\n",
" <td>88</td>\n",
" <td>66</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>121</td>\n",
" <td>132</td>\n",
" <td>165</td>\n",
" <td>Blabla</td>\n",
" <td>33</td>\n",
" <td>110</td>\n",
" <td>22</td>\n",
" <td>143</td>\n",
" <td>99</td>\n",
" <td>77</td>\n",
" <td>...</td>\n",
" <td>132</td>\n",
" <td>121</td>\n",
" <td>132</td>\n",
" <td>165</td>\n",
" <td>33</td>\n",
" <td>110</td>\n",
" <td>22</td>\n",
" <td>143</td>\n",
" <td>99</td>\n",
" <td>77</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>132</td>\n",
" <td>143</td>\n",
" <td>NaN</td>\n",
" <td>Blabla</td>\n",
" <td>44</td>\n",
" <td>121</td>\n",
" <td>33</td>\n",
" <td>154</td>\n",
" <td>110</td>\n",
" <td>88</td>\n",
" <td>...</td>\n",
" <td>143</td>\n",
" <td>132</td>\n",
" <td>143</td>\n",
" <td>NaN</td>\n",
" <td>44</td>\n",
" <td>121</td>\n",
" <td>33</td>\n",
" <td>154</td>\n",
" <td>110</td>\n",
" <td>88</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>143</td>\n",
" <td>154</td>\n",
" <td>NaN</td>\n",
" <td>Blabla</td>\n",
" <td>55</td>\n",
" <td>132</td>\n",
" <td>44</td>\n",
" <td>165</td>\n",
" <td>121</td>\n",
" <td>99</td>\n",
" <td>...</td>\n",
" <td>154</td>\n",
" <td>143</td>\n",
" <td>154</td>\n",
" <td>NaN</td>\n",
" <td>55</td>\n",
" <td>132</td>\n",
" <td>44</td>\n",
" <td>165</td>\n",
" <td>121</td>\n",
" <td>99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>154</td>\n",
" <td>165</td>\n",
" <td>11</td>\n",
" <td>Blabla</td>\n",
" <td>66</td>\n",
" <td>143</td>\n",
" <td>55</td>\n",
" <td>NaN</td>\n",
" <td>132</td>\n",
" <td>110</td>\n",
" <td>...</td>\n",
" <td>165</td>\n",
" <td>154</td>\n",
" <td>165</td>\n",
" <td>11</td>\n",
" <td>66</td>\n",
" <td>143</td>\n",
" <td>55</td>\n",
" <td>NaN</td>\n",
" <td>132</td>\n",
" <td>110</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>165</td>\n",
" <td>NaN</td>\n",
" <td>22</td>\n",
" <td>Blabla</td>\n",
" <td>77</td>\n",
" <td>154</td>\n",
" <td>66</td>\n",
" <td>NaN</td>\n",
" <td>143</td>\n",
" <td>121</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>165</td>\n",
" <td>NaN</td>\n",
" <td>22</td>\n",
" <td>77</td>\n",
" <td>154</td>\n",
" <td>66</td>\n",
" <td>NaN</td>\n",
" <td>143</td>\n",
" <td>121</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>33</td>\n",
" <td>Blabla</td>\n",
" <td>88</td>\n",
" <td>165</td>\n",
" <td>77</td>\n",
" <td>11</td>\n",
" <td>154</td>\n",
" <td>132</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>33</td>\n",
" <td>88</td>\n",
" <td>165</td>\n",
" <td>77</td>\n",
" <td>11</td>\n",
" <td>154</td>\n",
" <td>132</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>NaN</td>\n",
" <td>11</td>\n",
" <td>44</td>\n",
" <td>Blabla</td>\n",
" <td>99</td>\n",
" <td>NaN</td>\n",
" <td>88</td>\n",
" <td>22</td>\n",
" <td>165</td>\n",
" <td>143</td>\n",
" <td>...</td>\n",
" <td>11</td>\n",
" <td>NaN</td>\n",
" <td>11</td>\n",
" <td>44</td>\n",
" <td>99</td>\n",
" <td>NaN</td>\n",
" <td>88</td>\n",
" <td>22</td>\n",
" <td>165</td>\n",
" <td>143</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>11</td>\n",
" <td>22</td>\n",
" <td>55</td>\n",
" <td>Blabla</td>\n",
" <td>110</td>\n",
" <td>NaN</td>\n",
" <td>99</td>\n",
" <td>33</td>\n",
" <td>NaN</td>\n",
" <td>154</td>\n",
" <td>...</td>\n",
" <td>22</td>\n",
" <td>11</td>\n",
" <td>22</td>\n",
" <td>55</td>\n",
" <td>110</td>\n",
" <td>NaN</td>\n",
" <td>99</td>\n",
" <td>33</td>\n",
" <td>NaN</td>\n",
" <td>154</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>22</td>\n",
" <td>33</td>\n",
" <td>66</td>\n",
" <td>Blabla</td>\n",
" <td>121</td>\n",
" <td>11</td>\n",
" <td>110</td>\n",
" <td>44</td>\n",
" <td>NaN</td>\n",
" <td>165</td>\n",
" <td>...</td>\n",
" <td>33</td>\n",
" <td>22</td>\n",
" <td>33</td>\n",
" <td>66</td>\n",
" <td>121</td>\n",
" <td>11</td>\n",
" <td>110</td>\n",
" <td>44</td>\n",
" <td>NaN</td>\n",
" <td>165</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>33</td>\n",
" <td>44</td>\n",
" <td>77</td>\n",
" <td>Blabla</td>\n",
" <td>132</td>\n",
" <td>22</td>\n",
" <td>121</td>\n",
" <td>55</td>\n",
" <td>11</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>44</td>\n",
" <td>33</td>\n",
" <td>44</td>\n",
" <td>77</td>\n",
" <td>132</td>\n",
" <td>22</td>\n",
" <td>121</td>\n",
" <td>55</td>\n",
" <td>11</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>44</td>\n",
" <td>55</td>\n",
" <td>88</td>\n",
" <td>Blabla</td>\n",
" <td>143</td>\n",
" <td>33</td>\n",
" <td>132</td>\n",
" <td>66</td>\n",
" <td>22</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>55</td>\n",
" <td>44</td>\n",
" <td>55</td>\n",
" <td>88</td>\n",
" <td>143</td>\n",
" <td>33</td>\n",
" <td>132</td>\n",
" <td>66</td>\n",
" <td>22</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>55</td>\n",
" <td>66</td>\n",
" <td>99</td>\n",
" <td>Blabla</td>\n",
" <td>154</td>\n",
" <td>44</td>\n",
" <td>143</td>\n",
" <td>77</td>\n",
" <td>33</td>\n",
" <td>11</td>\n",
" <td>...</td>\n",
" <td>66</td>\n",
" <td>55</td>\n",
" <td>66</td>\n",
" <td>99</td>\n",
" <td>154</td>\n",
" <td>44</td>\n",
" <td>143</td>\n",
" <td>77</td>\n",
" <td>33</td>\n",
" <td>11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>66</td>\n",
" <td>77</td>\n",
" <td>110</td>\n",
" <td>Blabla</td>\n",
" <td>165</td>\n",
" <td>55</td>\n",
" <td>154</td>\n",
" <td>88</td>\n",
" <td>44</td>\n",
" <td>22</td>\n",
" <td>...</td>\n",
" <td>77</td>\n",
" <td>66</td>\n",
" <td>77</td>\n",
" <td>110</td>\n",
" <td>165</td>\n",
" <td>55</td>\n",
" <td>154</td>\n",
" <td>88</td>\n",
" <td>44</td>\n",
" <td>22</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>77</td>\n",
" <td>88</td>\n",
" <td>121</td>\n",
" <td>Blabla</td>\n",
" <td>NaN</td>\n",
" <td>66</td>\n",
" <td>165</td>\n",
" <td>99</td>\n",
" <td>55</td>\n",
" <td>33</td>\n",
" <td>...</td>\n",
" <td>88</td>\n",
" <td>77</td>\n",
" <td>88</td>\n",
" <td>121</td>\n",
" <td>NaN</td>\n",
" <td>66</td>\n",
" <td>165</td>\n",
" <td>99</td>\n",
" <td>55</td>\n",
" <td>33</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>88</td>\n",
" <td>99</td>\n",
" <td>132</td>\n",
" <td>Blabla</td>\n",
" <td>NaN</td>\n",
" <td>77</td>\n",
" <td>NaN</td>\n",
" <td>110</td>\n",
" <td>66</td>\n",
" <td>44</td>\n",
" <td>...</td>\n",
" <td>99</td>\n",
" <td>88</td>\n",
" <td>99</td>\n",
" <td>132</td>\n",
" <td>NaN</td>\n",
" <td>77</td>\n",
" <td>NaN</td>\n",
" <td>110</td>\n",
" <td>66</td>\n",
" <td>44</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>99</td>\n",
" <td>110</td>\n",
" <td>143</td>\n",
" <td>Blabla</td>\n",
" <td>11</td>\n",
" <td>88</td>\n",
" <td>NaN</td>\n",
" <td>121</td>\n",
" <td>77</td>\n",
" <td>55</td>\n",
" <td>...</td>\n",
" <td>110</td>\n",
" <td>99</td>\n",
" <td>110</td>\n",
" <td>143</td>\n",
" <td>11</td>\n",
" <td>88</td>\n",
" <td>NaN</td>\n",
" <td>121</td>\n",
" <td>77</td>\n",
" <td>55</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>110</td>\n",
" <td>121</td>\n",
" <td>154</td>\n",
" <td>Blabla</td>\n",
" <td>22</td>\n",
" <td>99</td>\n",
" <td>11</td>\n",
" <td>132</td>\n",
" <td>88</td>\n",
" <td>66</td>\n",
" <td>...</td>\n",
" <td>121</td>\n",
" <td>110</td>\n",
" <td>121</td>\n",
" <td>154</td>\n",
" <td>22</td>\n",
" <td>99</td>\n",
" <td>11</td>\n",
" <td>132</td>\n",
" <td>88</td>\n",
" <td>66</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>121</td>\n",
" <td>132</td>\n",
" <td>165</td>\n",
" <td>Blabla</td>\n",
" <td>33</td>\n",
" <td>110</td>\n",
" <td>22</td>\n",
" <td>143</td>\n",
" <td>99</td>\n",
" <td>77</td>\n",
" <td>...</td>\n",
" <td>132</td>\n",
" <td>121</td>\n",
" <td>132</td>\n",
" <td>165</td>\n",
" <td>33</td>\n",
" <td>110</td>\n",
" <td>22</td>\n",
" <td>143</td>\n",
" <td>99</td>\n",
" <td>77</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>132</td>\n",
" <td>143</td>\n",
" <td>NaN</td>\n",
" <td>Blabla</td>\n",
" <td>44</td>\n",
" <td>121</td>\n",
" <td>33</td>\n",
" <td>154</td>\n",
" <td>110</td>\n",
" <td>88</td>\n",
" <td>...</td>\n",
" <td>143</td>\n",
" <td>132</td>\n",
" <td>143</td>\n",
" <td>NaN</td>\n",
" <td>44</td>\n",
" <td>121</td>\n",
" <td>33</td>\n",
" <td>154</td>\n",
" <td>110</td>\n",
" <td>88</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9970</th>\n",
" <td>88</td>\n",
" <td>99</td>\n",
" <td>132</td>\n",
" <td>Blabla</td>\n",
" <td>NaN</td>\n",
" <td>77</td>\n",
" <td>NaN</td>\n",
" <td>110</td>\n",
" <td>66</td>\n",
" <td>44</td>\n",
" <td>...</td>\n",
" <td>99</td>\n",
" <td>88</td>\n",
" <td>99</td>\n",
" <td>132</td>\n",
" <td>NaN</td>\n",
" <td>77</td>\n",
" <td>NaN</td>\n",
" <td>110</td>\n",
" <td>66</td>\n",
" <td>44</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9971</th>\n",
" <td>99</td>\n",
" <td>110</td>\n",
" <td>143</td>\n",
" <td>Blabla</td>\n",
" <td>11</td>\n",
" <td>88</td>\n",
" <td>NaN</td>\n",
" <td>121</td>\n",
" <td>77</td>\n",
" <td>55</td>\n",
" <td>...</td>\n",
" <td>110</td>\n",
" <td>99</td>\n",
" <td>110</td>\n",
" <td>143</td>\n",
" <td>11</td>\n",
" <td>88</td>\n",
" <td>NaN</td>\n",
" <td>121</td>\n",
" <td>77</td>\n",
" <td>55</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9972</th>\n",
" <td>110</td>\n",
" <td>121</td>\n",
" <td>154</td>\n",
" <td>Blabla</td>\n",
" <td>22</td>\n",
" <td>99</td>\n",
" <td>11</td>\n",
" <td>132</td>\n",
" <td>88</td>\n",
" <td>66</td>\n",
" <td>...</td>\n",
" <td>121</td>\n",
" <td>110</td>\n",
" <td>121</td>\n",
" <td>154</td>\n",
" <td>22</td>\n",
" <td>99</td>\n",
" <td>11</td>\n",
" <td>132</td>\n",
" <td>88</td>\n",
" <td>66</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9973</th>\n",
" <td>121</td>\n",
" <td>132</td>\n",
" <td>165</td>\n",
" <td>Blabla</td>\n",
" <td>33</td>\n",
" <td>110</td>\n",
" <td>22</td>\n",
" <td>143</td>\n",
" <td>99</td>\n",
" <td>77</td>\n",
" <td>...</td>\n",
" <td>132</td>\n",
" <td>121</td>\n",
" <td>132</td>\n",
" <td>165</td>\n",
" <td>33</td>\n",
" <td>110</td>\n",
" <td>22</td>\n",
" <td>143</td>\n",
" <td>99</td>\n",
" <td>77</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9974</th>\n",
" <td>132</td>\n",
" <td>143</td>\n",
" <td>NaN</td>\n",
" <td>Blabla</td>\n",
" <td>44</td>\n",
" <td>121</td>\n",
" <td>33</td>\n",
" <td>154</td>\n",
" <td>110</td>\n",
" <td>88</td>\n",
" <td>...</td>\n",
" <td>143</td>\n",
" <td>132</td>\n",
" <td>143</td>\n",
" <td>NaN</td>\n",
" <td>44</td>\n",
" <td>121</td>\n",
" <td>33</td>\n",
" <td>154</td>\n",
" <td>110</td>\n",
" <td>88</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9975</th>\n",
" <td>143</td>\n",
" <td>154</td>\n",
" <td>NaN</td>\n",
" <td>Blabla</td>\n",
" <td>55</td>\n",
" <td>132</td>\n",
" <td>44</td>\n",
" <td>165</td>\n",
" <td>121</td>\n",
" <td>99</td>\n",
" <td>...</td>\n",
" <td>154</td>\n",
" <td>143</td>\n",
" <td>154</td>\n",
" <td>NaN</td>\n",
" <td>55</td>\n",
" <td>132</td>\n",
" <td>44</td>\n",
" <td>165</td>\n",
" <td>121</td>\n",
" <td>99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9976</th>\n",
" <td>154</td>\n",
" <td>165</td>\n",
" <td>11</td>\n",
" <td>Blabla</td>\n",
" <td>66</td>\n",
" <td>143</td>\n",
" <td>55</td>\n",
" <td>NaN</td>\n",
" <td>132</td>\n",
" <td>110</td>\n",
" <td>...</td>\n",
" <td>165</td>\n",
" <td>154</td>\n",
" <td>165</td>\n",
" <td>11</td>\n",
" <td>66</td>\n",
" <td>143</td>\n",
" <td>55</td>\n",
" <td>NaN</td>\n",
" <td>132</td>\n",
" <td>110</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9977</th>\n",
" <td>165</td>\n",
" <td>NaN</td>\n",
" <td>22</td>\n",
" <td>Blabla</td>\n",
" <td>77</td>\n",
" <td>154</td>\n",
" <td>66</td>\n",
" <td>NaN</td>\n",
" <td>143</td>\n",
" <td>121</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>165</td>\n",
" <td>NaN</td>\n",
" <td>22</td>\n",
" <td>77</td>\n",
" <td>154</td>\n",
" <td>66</td>\n",
" <td>NaN</td>\n",
" <td>143</td>\n",
" <td>121</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9978</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>33</td>\n",
" <td>Blabla</td>\n",
" <td>88</td>\n",
" <td>165</td>\n",
" <td>77</td>\n",
" <td>11</td>\n",
" <td>154</td>\n",
" <td>132</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>33</td>\n",
" <td>88</td>\n",
" <td>165</td>\n",
" <td>77</td>\n",
" <td>11</td>\n",
" <td>154</td>\n",
" <td>132</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9979</th>\n",
" <td>NaN</td>\n",
" <td>11</td>\n",
" <td>44</td>\n",
" <td>Blabla</td>\n",
" <td>99</td>\n",
" <td>NaN</td>\n",
" <td>88</td>\n",
" <td>22</td>\n",
" <td>165</td>\n",
" <td>143</td>\n",
" <td>...</td>\n",
" <td>11</td>\n",
" <td>NaN</td>\n",
" <td>11</td>\n",
" <td>44</td>\n",
" <td>99</td>\n",
" <td>NaN</td>\n",
" <td>88</td>\n",
" <td>22</td>\n",
" <td>165</td>\n",
" <td>143</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9980</th>\n",
" <td>11</td>\n",
" <td>22</td>\n",
" <td>55</td>\n",
" <td>Blabla</td>\n",
" <td>110</td>\n",
" <td>NaN</td>\n",
" <td>99</td>\n",
" <td>33</td>\n",
" <td>NaN</td>\n",
" <td>154</td>\n",
" <td>...</td>\n",
" <td>22</td>\n",
" <td>11</td>\n",
" <td>22</td>\n",
" <td>55</td>\n",
" <td>110</td>\n",
" <td>NaN</td>\n",
" <td>99</td>\n",
" <td>33</td>\n",
" <td>NaN</td>\n",
" <td>154</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9981</th>\n",
" <td>22</td>\n",
" <td>33</td>\n",
" <td>66</td>\n",
" <td>Blabla</td>\n",
" <td>121</td>\n",
" <td>11</td>\n",
" <td>110</td>\n",
" <td>44</td>\n",
" <td>NaN</td>\n",
" <td>165</td>\n",
" <td>...</td>\n",
" <td>33</td>\n",
" <td>22</td>\n",
" <td>33</td>\n",
" <td>66</td>\n",
" <td>121</td>\n",
" <td>11</td>\n",
" <td>110</td>\n",
" <td>44</td>\n",
" <td>NaN</td>\n",
" <td>165</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9982</th>\n",
" <td>33</td>\n",
" <td>44</td>\n",
" <td>77</td>\n",
" <td>Blabla</td>\n",
" <td>132</td>\n",
" <td>22</td>\n",
" <td>121</td>\n",
" <td>55</td>\n",
" <td>11</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>44</td>\n",
" <td>33</td>\n",
" <td>44</td>\n",
" <td>77</td>\n",
" <td>132</td>\n",
" <td>22</td>\n",
" <td>121</td>\n",
" <td>55</td>\n",
" <td>11</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9983</th>\n",
" <td>44</td>\n",
" <td>55</td>\n",
" <td>88</td>\n",
" <td>Blabla</td>\n",
" <td>143</td>\n",
" <td>33</td>\n",
" <td>132</td>\n",
" <td>66</td>\n",
" <td>22</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>55</td>\n",
" <td>44</td>\n",
" <td>55</td>\n",
" <td>88</td>\n",
" <td>143</td>\n",
" <td>33</td>\n",
" <td>132</td>\n",
" <td>66</td>\n",
" <td>22</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9984</th>\n",
" <td>55</td>\n",
" <td>66</td>\n",
" <td>99</td>\n",
" <td>Blabla</td>\n",
" <td>154</td>\n",
" <td>44</td>\n",
" <td>143</td>\n",
" <td>77</td>\n",
" <td>33</td>\n",
" <td>11</td>\n",
" <td>...</td>\n",
" <td>66</td>\n",
" <td>55</td>\n",
" <td>66</td>\n",
" <td>99</td>\n",
" <td>154</td>\n",
" <td>44</td>\n",
" <td>143</td>\n",
" <td>77</td>\n",
" <td>33</td>\n",
" <td>11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9985</th>\n",
" <td>66</td>\n",
" <td>77</td>\n",
" <td>110</td>\n",
" <td>Blabla</td>\n",
" <td>165</td>\n",
" <td>55</td>\n",
" <td>154</td>\n",
" <td>88</td>\n",
" <td>44</td>\n",
" <td>22</td>\n",
" <td>...</td>\n",
" <td>77</td>\n",
" <td>66</td>\n",
" <td>77</td>\n",
" <td>110</td>\n",
" <td>165</td>\n",
" <td>55</td>\n",
" <td>154</td>\n",
" <td>88</td>\n",
" <td>44</td>\n",
" <td>22</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9986</th>\n",
" <td>77</td>\n",
" <td>88</td>\n",
" <td>121</td>\n",
" <td>Blabla</td>\n",
" <td>NaN</td>\n",
" <td>66</td>\n",
" <td>165</td>\n",
" <td>99</td>\n",
" <td>55</td>\n",
" <td>33</td>\n",
" <td>...</td>\n",
" <td>88</td>\n",
" <td>77</td>\n",
" <td>88</td>\n",
" <td>121</td>\n",
" <td>NaN</td>\n",
" <td>66</td>\n",
" <td>165</td>\n",
" <td>99</td>\n",
" <td>55</td>\n",
" <td>33</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9987</th>\n",
" <td>88</td>\n",
" <td>99</td>\n",
" <td>132</td>\n",
" <td>Blabla</td>\n",
" <td>NaN</td>\n",
" <td>77</td>\n",
" <td>NaN</td>\n",
" <td>110</td>\n",
" <td>66</td>\n",
" <td>44</td>\n",
" <td>...</td>\n",
" <td>99</td>\n",
" <td>88</td>\n",
" <td>99</td>\n",
" <td>132</td>\n",
" <td>NaN</td>\n",
" <td>77</td>\n",
" <td>NaN</td>\n",
" <td>110</td>\n",
" <td>66</td>\n",
" <td>44</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9988</th>\n",
" <td>99</td>\n",
" <td>110</td>\n",
" <td>143</td>\n",
" <td>Blabla</td>\n",
" <td>11</td>\n",
" <td>88</td>\n",
" <td>NaN</td>\n",
" <td>121</td>\n",
" <td>77</td>\n",
" <td>55</td>\n",
" <td>...</td>\n",
" <td>110</td>\n",
" <td>99</td>\n",
" <td>110</td>\n",
" <td>143</td>\n",
" <td>11</td>\n",
" <td>88</td>\n",
" <td>NaN</td>\n",
" <td>121</td>\n",
" <td>77</td>\n",
" <td>55</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9989</th>\n",
" <td>110</td>\n",
" <td>121</td>\n",
" <td>154</td>\n",
" <td>Blabla</td>\n",
" <td>22</td>\n",
" <td>99</td>\n",
" <td>11</td>\n",
" <td>132</td>\n",
" <td>88</td>\n",
" <td>66</td>\n",
" <td>...</td>\n",
" <td>121</td>\n",
" <td>110</td>\n",
" <td>121</td>\n",
" <td>154</td>\n",
" <td>22</td>\n",
" <td>99</td>\n",
" <td>11</td>\n",
" <td>132</td>\n",
" <td>88</td>\n",
" <td>66</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9990</th>\n",
" <td>121</td>\n",
" <td>132</td>\n",
" <td>165</td>\n",
" <td>Blabla</td>\n",
" <td>33</td>\n",
" <td>110</td>\n",
" <td>22</td>\n",
" <td>143</td>\n",
" <td>99</td>\n",
" <td>77</td>\n",
" <td>...</td>\n",
" <td>132</td>\n",
" <td>121</td>\n",
" <td>132</td>\n",
" <td>165</td>\n",
" <td>33</td>\n",
" <td>110</td>\n",
" <td>22</td>\n",
" <td>143</td>\n",
" <td>99</td>\n",
" <td>77</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9991</th>\n",
" <td>132</td>\n",
" <td>143</td>\n",
" <td>NaN</td>\n",
" <td>Blabla</td>\n",
" <td>44</td>\n",
" <td>121</td>\n",
" <td>33</td>\n",
" <td>154</td>\n",
" <td>110</td>\n",
" <td>88</td>\n",
" <td>...</td>\n",
" <td>143</td>\n",
" <td>132</td>\n",
" <td>143</td>\n",
" <td>NaN</td>\n",
" <td>44</td>\n",
" <td>121</td>\n",
" <td>33</td>\n",
" <td>154</td>\n",
" <td>110</td>\n",
" <td>88</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9992</th>\n",
" <td>143</td>\n",
" <td>154</td>\n",
" <td>NaN</td>\n",
" <td>Blabla</td>\n",
" <td>55</td>\n",
" <td>132</td>\n",
" <td>44</td>\n",
" <td>165</td>\n",
" <td>121</td>\n",
" <td>99</td>\n",
" <td>...</td>\n",
" <td>154</td>\n",
" <td>143</td>\n",
" <td>154</td>\n",
" <td>NaN</td>\n",
" <td>55</td>\n",
" <td>132</td>\n",
" <td>44</td>\n",
" <td>165</td>\n",
" <td>121</td>\n",
" <td>99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9993</th>\n",
" <td>154</td>\n",
" <td>165</td>\n",
" <td>11</td>\n",
" <td>Blabla</td>\n",
" <td>66</td>\n",
" <td>143</td>\n",
" <td>55</td>\n",
" <td>NaN</td>\n",
" <td>132</td>\n",
" <td>110</td>\n",
" <td>...</td>\n",
" <td>165</td>\n",
" <td>154</td>\n",
" <td>165</td>\n",
" <td>11</td>\n",
" <td>66</td>\n",
" <td>143</td>\n",
" <td>55</td>\n",
" <td>NaN</td>\n",
" <td>132</td>\n",
" <td>110</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9994</th>\n",
" <td>165</td>\n",
" <td>NaN</td>\n",
" <td>22</td>\n",
" <td>Blabla</td>\n",
" <td>77</td>\n",
" <td>154</td>\n",
" <td>66</td>\n",
" <td>NaN</td>\n",
" <td>143</td>\n",
" <td>121</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>165</td>\n",
" <td>NaN</td>\n",
" <td>22</td>\n",
" <td>77</td>\n",
" <td>154</td>\n",
" <td>66</td>\n",
" <td>NaN</td>\n",
" <td>143</td>\n",
" <td>121</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9995</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>33</td>\n",
" <td>Blabla</td>\n",
" <td>88</td>\n",
" <td>165</td>\n",
" <td>77</td>\n",
" <td>11</td>\n",
" <td>154</td>\n",
" <td>132</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>33</td>\n",
" <td>88</td>\n",
" <td>165</td>\n",
" <td>77</td>\n",
" <td>11</td>\n",
" <td>154</td>\n",
" <td>132</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9996</th>\n",
" <td>NaN</td>\n",
" <td>11</td>\n",
" <td>44</td>\n",
" <td>Blabla</td>\n",
" <td>99</td>\n",
" <td>NaN</td>\n",
" <td>88</td>\n",
" <td>22</td>\n",
" <td>165</td>\n",
" <td>143</td>\n",
" <td>...</td>\n",
" <td>11</td>\n",
" <td>NaN</td>\n",
" <td>11</td>\n",
" <td>44</td>\n",
" <td>99</td>\n",
" <td>NaN</td>\n",
" <td>88</td>\n",
" <td>22</td>\n",
" <td>165</td>\n",
" <td>143</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9997</th>\n",
" <td>11</td>\n",
" <td>22</td>\n",
" <td>55</td>\n",
" <td>Blabla</td>\n",
" <td>110</td>\n",
" <td>NaN</td>\n",
" <td>99</td>\n",
" <td>33</td>\n",
" <td>NaN</td>\n",
" <td>154</td>\n",
" <td>...</td>\n",
" <td>22</td>\n",
" <td>11</td>\n",
" <td>22</td>\n",
" <td>55</td>\n",
" <td>110</td>\n",
" <td>NaN</td>\n",
" <td>99</td>\n",
" <td>33</td>\n",
" <td>NaN</td>\n",
" <td>154</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9998</th>\n",
" <td>22</td>\n",
" <td>33</td>\n",
" <td>66</td>\n",
" <td>Blabla</td>\n",
" <td>121</td>\n",
" <td>11</td>\n",
" <td>110</td>\n",
" <td>44</td>\n",
" <td>NaN</td>\n",
" <td>165</td>\n",
" <td>...</td>\n",
" <td>33</td>\n",
" <td>22</td>\n",
" <td>33</td>\n",
" <td>66</td>\n",
" <td>121</td>\n",
" <td>11</td>\n",
" <td>110</td>\n",
" <td>44</td>\n",
" <td>NaN</td>\n",
" <td>165</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9999</th>\n",
" <td>33</td>\n",
" <td>44</td>\n",
" <td>77</td>\n",
" <td>Blabla</td>\n",
" <td>132</td>\n",
" <td>22</td>\n",
" <td>121</td>\n",
" <td>55</td>\n",
" <td>11</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>44</td>\n",
" <td>33</td>\n",
" <td>44</td>\n",
" <td>77</td>\n",
" <td>132</td>\n",
" <td>22</td>\n",
" <td>121</td>\n",
" <td>55</td>\n",
" <td>11</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>10000 rows × 27 columns</p>\n",
"</div>"
],
"text/plain": [
" A B C some_text D E F G H I ... Q R \\\n",
"0 NaN 11 44 Blabla 99 NaN 88 22 165 143 ... 11 NaN \n",
"1 11 22 55 Blabla 110 NaN 99 33 NaN 154 ... 22 11 \n",
"2 22 33 66 Blabla 121 11 110 44 NaN 165 ... 33 22 \n",
"3 33 44 77 Blabla 132 22 121 55 11 NaN ... 44 33 \n",
"4 44 55 88 Blabla 143 33 132 66 22 NaN ... 55 44 \n",
"5 55 66 99 Blabla 154 44 143 77 33 11 ... 66 55 \n",
"6 66 77 110 Blabla 165 55 154 88 44 22 ... 77 66 \n",
"7 77 88 121 Blabla NaN 66 165 99 55 33 ... 88 77 \n",
"8 88 99 132 Blabla NaN 77 NaN 110 66 44 ... 99 88 \n",
"9 99 110 143 Blabla 11 88 NaN 121 77 55 ... 110 99 \n",
"10 110 121 154 Blabla 22 99 11 132 88 66 ... 121 110 \n",
"11 121 132 165 Blabla 33 110 22 143 99 77 ... 132 121 \n",
"12 132 143 NaN Blabla 44 121 33 154 110 88 ... 143 132 \n",
"13 143 154 NaN Blabla 55 132 44 165 121 99 ... 154 143 \n",
"14 154 165 11 Blabla 66 143 55 NaN 132 110 ... 165 154 \n",
"15 165 NaN 22 Blabla 77 154 66 NaN 143 121 ... NaN 165 \n",
"16 NaN NaN 33 Blabla 88 165 77 11 154 132 ... NaN NaN \n",
"17 NaN 11 44 Blabla 99 NaN 88 22 165 143 ... 11 NaN \n",
"18 11 22 55 Blabla 110 NaN 99 33 NaN 154 ... 22 11 \n",
"19 22 33 66 Blabla 121 11 110 44 NaN 165 ... 33 22 \n",
"20 33 44 77 Blabla 132 22 121 55 11 NaN ... 44 33 \n",
"21 44 55 88 Blabla 143 33 132 66 22 NaN ... 55 44 \n",
"22 55 66 99 Blabla 154 44 143 77 33 11 ... 66 55 \n",
"23 66 77 110 Blabla 165 55 154 88 44 22 ... 77 66 \n",
"24 77 88 121 Blabla NaN 66 165 99 55 33 ... 88 77 \n",
"25 88 99 132 Blabla NaN 77 NaN 110 66 44 ... 99 88 \n",
"26 99 110 143 Blabla 11 88 NaN 121 77 55 ... 110 99 \n",
"27 110 121 154 Blabla 22 99 11 132 88 66 ... 121 110 \n",
"28 121 132 165 Blabla 33 110 22 143 99 77 ... 132 121 \n",
"29 132 143 NaN Blabla 44 121 33 154 110 88 ... 143 132 \n",
"... ... ... ... ... ... ... ... ... ... ... ... ... ... \n",
"9970 88 99 132 Blabla NaN 77 NaN 110 66 44 ... 99 88 \n",
"9971 99 110 143 Blabla 11 88 NaN 121 77 55 ... 110 99 \n",
"9972 110 121 154 Blabla 22 99 11 132 88 66 ... 121 110 \n",
"9973 121 132 165 Blabla 33 110 22 143 99 77 ... 132 121 \n",
"9974 132 143 NaN Blabla 44 121 33 154 110 88 ... 143 132 \n",
"9975 143 154 NaN Blabla 55 132 44 165 121 99 ... 154 143 \n",
"9976 154 165 11 Blabla 66 143 55 NaN 132 110 ... 165 154 \n",
"9977 165 NaN 22 Blabla 77 154 66 NaN 143 121 ... NaN 165 \n",
"9978 NaN NaN 33 Blabla 88 165 77 11 154 132 ... NaN NaN \n",
"9979 NaN 11 44 Blabla 99 NaN 88 22 165 143 ... 11 NaN \n",
"9980 11 22 55 Blabla 110 NaN 99 33 NaN 154 ... 22 11 \n",
"9981 22 33 66 Blabla 121 11 110 44 NaN 165 ... 33 22 \n",
"9982 33 44 77 Blabla 132 22 121 55 11 NaN ... 44 33 \n",
"9983 44 55 88 Blabla 143 33 132 66 22 NaN ... 55 44 \n",
"9984 55 66 99 Blabla 154 44 143 77 33 11 ... 66 55 \n",
"9985 66 77 110 Blabla 165 55 154 88 44 22 ... 77 66 \n",
"9986 77 88 121 Blabla NaN 66 165 99 55 33 ... 88 77 \n",
"9987 88 99 132 Blabla NaN 77 NaN 110 66 44 ... 99 88 \n",
"9988 99 110 143 Blabla 11 88 NaN 121 77 55 ... 110 99 \n",
"9989 110 121 154 Blabla 22 99 11 132 88 66 ... 121 110 \n",
"9990 121 132 165 Blabla 33 110 22 143 99 77 ... 132 121 \n",
"9991 132 143 NaN Blabla 44 121 33 154 110 88 ... 143 132 \n",
"9992 143 154 NaN Blabla 55 132 44 165 121 99 ... 154 143 \n",
"9993 154 165 11 Blabla 66 143 55 NaN 132 110 ... 165 154 \n",
"9994 165 NaN 22 Blabla 77 154 66 NaN 143 121 ... NaN 165 \n",
"9995 NaN NaN 33 Blabla 88 165 77 11 154 132 ... NaN NaN \n",
"9996 NaN 11 44 Blabla 99 NaN 88 22 165 143 ... 11 NaN \n",
"9997 11 22 55 Blabla 110 NaN 99 33 NaN 154 ... 22 11 \n",
"9998 22 33 66 Blabla 121 11 110 44 NaN 165 ... 33 22 \n",
"9999 33 44 77 Blabla 132 22 121 55 11 NaN ... 44 33 \n",
"\n",
" S T U V W X Y Z \n",
"0 11 44 99 NaN 88 22 165 143 \n",
"1 22 55 110 NaN 99 33 NaN 154 \n",
"2 33 66 121 11 110 44 NaN 165 \n",
"3 44 77 132 22 121 55 11 NaN \n",
"4 55 88 143 33 132 66 22 NaN \n",
"5 66 99 154 44 143 77 33 11 \n",
"6 77 110 165 55 154 88 44 22 \n",
"7 88 121 NaN 66 165 99 55 33 \n",
"8 99 132 NaN 77 NaN 110 66 44 \n",
"9 110 143 11 88 NaN 121 77 55 \n",
"10 121 154 22 99 11 132 88 66 \n",
"11 132 165 33 110 22 143 99 77 \n",
"12 143 NaN 44 121 33 154 110 88 \n",
"13 154 NaN 55 132 44 165 121 99 \n",
"14 165 11 66 143 55 NaN 132 110 \n",
"15 NaN 22 77 154 66 NaN 143 121 \n",
"16 NaN 33 88 165 77 11 154 132 \n",
"17 11 44 99 NaN 88 22 165 143 \n",
"18 22 55 110 NaN 99 33 NaN 154 \n",
"19 33 66 121 11 110 44 NaN 165 \n",
"20 44 77 132 22 121 55 11 NaN \n",
"21 55 88 143 33 132 66 22 NaN \n",
"22 66 99 154 44 143 77 33 11 \n",
"23 77 110 165 55 154 88 44 22 \n",
"24 88 121 NaN 66 165 99 55 33 \n",
"25 99 132 NaN 77 NaN 110 66 44 \n",
"26 110 143 11 88 NaN 121 77 55 \n",
"27 121 154 22 99 11 132 88 66 \n",
"28 132 165 33 110 22 143 99 77 \n",
"29 143 NaN 44 121 33 154 110 88 \n",
"... ... ... ... ... ... ... ... ... \n",
"9970 99 132 NaN 77 NaN 110 66 44 \n",
"9971 110 143 11 88 NaN 121 77 55 \n",
"9972 121 154 22 99 11 132 88 66 \n",
"9973 132 165 33 110 22 143 99 77 \n",
"9974 143 NaN 44 121 33 154 110 88 \n",
"9975 154 NaN 55 132 44 165 121 99 \n",
"9976 165 11 66 143 55 NaN 132 110 \n",
"9977 NaN 22 77 154 66 NaN 143 121 \n",
"9978 NaN 33 88 165 77 11 154 132 \n",
"9979 11 44 99 NaN 88 22 165 143 \n",
"9980 22 55 110 NaN 99 33 NaN 154 \n",
"9981 33 66 121 11 110 44 NaN 165 \n",
"9982 44 77 132 22 121 55 11 NaN \n",
"9983 55 88 143 33 132 66 22 NaN \n",
"9984 66 99 154 44 143 77 33 11 \n",
"9985 77 110 165 55 154 88 44 22 \n",
"9986 88 121 NaN 66 165 99 55 33 \n",
"9987 99 132 NaN 77 NaN 110 66 44 \n",
"9988 110 143 11 88 NaN 121 77 55 \n",
"9989 121 154 22 99 11 132 88 66 \n",
"9990 132 165 33 110 22 143 99 77 \n",
"9991 143 NaN 44 121 33 154 110 88 \n",
"9992 154 NaN 55 132 44 165 121 99 \n",
"9993 165 11 66 143 55 NaN 132 110 \n",
"9994 NaN 22 77 154 66 NaN 143 121 \n",
"9995 NaN 33 88 165 77 11 154 132 \n",
"9996 11 44 99 NaN 88 22 165 143 \n",
"9997 22 55 110 NaN 99 33 NaN 154 \n",
"9998 33 66 121 11 110 44 NaN 165 \n",
"9999 44 77 132 22 121 55 11 NaN \n",
"\n",
"[10000 rows x 27 columns]"
]
},
"execution_count": 80,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"much_data = np.fromfunction(lambda x,y: (x+y*y)%17*11, (10000, 26))\n",
"large = pd.DataFrame(much_data, columns=list(\"ABCDEFGHIJKLMNOPQRSTUVWXYZ\"))\n",
"large[large%16==0] = np.nan\n",
"large.insert(3,\"some_text\", \"Blabla\")\n",
"large"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `head` method returns the top 5 rows:"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>some_text</th>\n",
" <th>D</th>\n",
" <th>E</th>\n",
" <th>F</th>\n",
" <th>G</th>\n",
" <th>H</th>\n",
" <th>I</th>\n",
" <th>...</th>\n",
" <th>Q</th>\n",
" <th>R</th>\n",
" <th>S</th>\n",
" <th>T</th>\n",
" <th>U</th>\n",
" <th>V</th>\n",
" <th>W</th>\n",
" <th>X</th>\n",
" <th>Y</th>\n",
" <th>Z</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>NaN</td>\n",
" <td>11</td>\n",
" <td>44</td>\n",
" <td>Blabla</td>\n",
" <td>99</td>\n",
" <td>NaN</td>\n",
" <td>88</td>\n",
" <td>22</td>\n",
" <td>165</td>\n",
" <td>143</td>\n",
" <td>...</td>\n",
" <td>11</td>\n",
" <td>NaN</td>\n",
" <td>11</td>\n",
" <td>44</td>\n",
" <td>99</td>\n",
" <td>NaN</td>\n",
" <td>88</td>\n",
" <td>22</td>\n",
" <td>165</td>\n",
" <td>143</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>11</td>\n",
" <td>22</td>\n",
" <td>55</td>\n",
" <td>Blabla</td>\n",
" <td>110</td>\n",
" <td>NaN</td>\n",
" <td>99</td>\n",
" <td>33</td>\n",
" <td>NaN</td>\n",
" <td>154</td>\n",
" <td>...</td>\n",
" <td>22</td>\n",
" <td>11</td>\n",
" <td>22</td>\n",
" <td>55</td>\n",
" <td>110</td>\n",
" <td>NaN</td>\n",
" <td>99</td>\n",
" <td>33</td>\n",
" <td>NaN</td>\n",
" <td>154</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>22</td>\n",
" <td>33</td>\n",
" <td>66</td>\n",
" <td>Blabla</td>\n",
" <td>121</td>\n",
" <td>11</td>\n",
" <td>110</td>\n",
" <td>44</td>\n",
" <td>NaN</td>\n",
" <td>165</td>\n",
" <td>...</td>\n",
" <td>33</td>\n",
" <td>22</td>\n",
" <td>33</td>\n",
" <td>66</td>\n",
" <td>121</td>\n",
" <td>11</td>\n",
" <td>110</td>\n",
" <td>44</td>\n",
" <td>NaN</td>\n",
" <td>165</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>33</td>\n",
" <td>44</td>\n",
" <td>77</td>\n",
" <td>Blabla</td>\n",
" <td>132</td>\n",
" <td>22</td>\n",
" <td>121</td>\n",
" <td>55</td>\n",
" <td>11</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>44</td>\n",
" <td>33</td>\n",
" <td>44</td>\n",
" <td>77</td>\n",
" <td>132</td>\n",
" <td>22</td>\n",
" <td>121</td>\n",
" <td>55</td>\n",
" <td>11</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>44</td>\n",
" <td>55</td>\n",
" <td>88</td>\n",
" <td>Blabla</td>\n",
" <td>143</td>\n",
" <td>33</td>\n",
" <td>132</td>\n",
" <td>66</td>\n",
" <td>22</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>55</td>\n",
" <td>44</td>\n",
" <td>55</td>\n",
" <td>88</td>\n",
" <td>143</td>\n",
" <td>33</td>\n",
" <td>132</td>\n",
" <td>66</td>\n",
" <td>22</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 27 columns</p>\n",
"</div>"
],
"text/plain": [
" A B C some_text D E F G H I ... Q R S T U \\\n",
"0 NaN 11 44 Blabla 99 NaN 88 22 165 143 ... 11 NaN 11 44 99 \n",
"1 11 22 55 Blabla 110 NaN 99 33 NaN 154 ... 22 11 22 55 110 \n",
"2 22 33 66 Blabla 121 11 110 44 NaN 165 ... 33 22 33 66 121 \n",
"3 33 44 77 Blabla 132 22 121 55 11 NaN ... 44 33 44 77 132 \n",
"4 44 55 88 Blabla 143 33 132 66 22 NaN ... 55 44 55 88 143 \n",
"\n",
" V W X Y Z \n",
"0 NaN 88 22 165 143 \n",
"1 NaN 99 33 NaN 154 \n",
"2 11 110 44 NaN 165 \n",
"3 22 121 55 11 NaN \n",
"4 33 132 66 22 NaN \n",
"\n",
"[5 rows x 27 columns]"
]
},
"execution_count": 81,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"large.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Of course there's also a `tail` function to view the bottom 5 rows. You can pass the number of rows you want:"
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>some_text</th>\n",
" <th>D</th>\n",
" <th>E</th>\n",
" <th>F</th>\n",
" <th>G</th>\n",
" <th>H</th>\n",
" <th>I</th>\n",
" <th>...</th>\n",
" <th>Q</th>\n",
" <th>R</th>\n",
" <th>S</th>\n",
" <th>T</th>\n",
" <th>U</th>\n",
" <th>V</th>\n",
" <th>W</th>\n",
" <th>X</th>\n",
" <th>Y</th>\n",
" <th>Z</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>9998</th>\n",
" <td>22</td>\n",
" <td>33</td>\n",
" <td>66</td>\n",
" <td>Blabla</td>\n",
" <td>121</td>\n",
" <td>11</td>\n",
" <td>110</td>\n",
" <td>44</td>\n",
" <td>NaN</td>\n",
" <td>165</td>\n",
" <td>...</td>\n",
" <td>33</td>\n",
" <td>22</td>\n",
" <td>33</td>\n",
" <td>66</td>\n",
" <td>121</td>\n",
" <td>11</td>\n",
" <td>110</td>\n",
" <td>44</td>\n",
" <td>NaN</td>\n",
" <td>165</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9999</th>\n",
" <td>33</td>\n",
" <td>44</td>\n",
" <td>77</td>\n",
" <td>Blabla</td>\n",
" <td>132</td>\n",
" <td>22</td>\n",
" <td>121</td>\n",
" <td>55</td>\n",
" <td>11</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>44</td>\n",
" <td>33</td>\n",
" <td>44</td>\n",
" <td>77</td>\n",
" <td>132</td>\n",
" <td>22</td>\n",
" <td>121</td>\n",
" <td>55</td>\n",
" <td>11</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>2 rows × 27 columns</p>\n",
"</div>"
],
"text/plain": [
" A B C some_text D E F G H I ... Q R S T \\\n",
"9998 22 33 66 Blabla 121 11 110 44 NaN 165 ... 33 22 33 66 \n",
"9999 33 44 77 Blabla 132 22 121 55 11 NaN ... 44 33 44 77 \n",
"\n",
" U V W X Y Z \n",
"9998 121 11 110 44 NaN 165 \n",
"9999 132 22 121 55 11 NaN \n",
"\n",
"[2 rows x 27 columns]"
]
},
"execution_count": 82,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"large.tail(n=2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `info` method prints out a summary of each columns contents:"
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"Int64Index: 10000 entries, 0 to 9999\n",
"Data columns (total 27 columns):\n",
"A 8823 non-null float64\n",
"B 8824 non-null float64\n",
"C 8824 non-null float64\n",
"some_text 10000 non-null object\n",
"D 8824 non-null float64\n",
"E 8822 non-null float64\n",
"F 8824 non-null float64\n",
"G 8824 non-null float64\n",
"H 8822 non-null float64\n",
"I 8823 non-null float64\n",
"J 8823 non-null float64\n",
"K 8822 non-null float64\n",
"L 8824 non-null float64\n",
"M 8824 non-null float64\n",
"N 8822 non-null float64\n",
"O 8824 non-null float64\n",
"P 8824 non-null float64\n",
"Q 8824 non-null float64\n",
"R 8823 non-null float64\n",
"S 8824 non-null float64\n",
"T 8824 non-null float64\n",
"U 8824 non-null float64\n",
"V 8822 non-null float64\n",
"W 8824 non-null float64\n",
"X 8824 non-null float64\n",
"Y 8822 non-null float64\n",
"Z 8823 non-null float64\n",
"dtypes: float64(26), object(1)\n",
"memory usage: 2.1+ MB\n"
]
}
],
"source": [
"large.info()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, the `describe` method gives a nice overview of the main aggregated values over each column:\n",
"* `count`: number of non-null (not NaN) values\n",
"* `mean`: mean of non-null values\n",
"* `std`: [standard deviation](https://en.wikipedia.org/wiki/Standard_deviation) of non-null values\n",
"* `min`: minimum of non-null values\n",
"* `25%`, `50%`, `75%`: 25th, 50th and 75th [percentile](https://en.wikipedia.org/wiki/Percentile) of non-null values\n",
"* `max`: maximum of non-null values"
]
},
{
"cell_type": "code",
"execution_count": 84,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" <th>E</th>\n",
" <th>F</th>\n",
" <th>G</th>\n",
" <th>H</th>\n",
" <th>I</th>\n",
" <th>J</th>\n",
" <th>...</th>\n",
" <th>Q</th>\n",
" <th>R</th>\n",
" <th>S</th>\n",
" <th>T</th>\n",
" <th>U</th>\n",
" <th>V</th>\n",
" <th>W</th>\n",
" <th>X</th>\n",
" <th>Y</th>\n",
" <th>Z</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>8823.000000</td>\n",
" <td>8824.000000</td>\n",
" <td>8824.000000</td>\n",
" <td>8824.000000</td>\n",
" <td>8822.000000</td>\n",
" <td>8824.000000</td>\n",
" <td>8824.000000</td>\n",
" <td>8822.000000</td>\n",
" <td>8823.000000</td>\n",
" <td>8823.000000</td>\n",
" <td>...</td>\n",
" <td>8824.000000</td>\n",
" <td>8823.000000</td>\n",
" <td>8824.000000</td>\n",
" <td>8824.000000</td>\n",
" <td>8824.000000</td>\n",
" <td>8822.000000</td>\n",
" <td>8824.000000</td>\n",
" <td>8824.000000</td>\n",
" <td>8822.000000</td>\n",
" <td>8823.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>87.977559</td>\n",
" <td>87.972575</td>\n",
" <td>87.987534</td>\n",
" <td>88.012466</td>\n",
" <td>87.983791</td>\n",
" <td>88.007480</td>\n",
" <td>87.977561</td>\n",
" <td>88.000000</td>\n",
" <td>88.022441</td>\n",
" <td>88.022441</td>\n",
" <td>...</td>\n",
" <td>87.972575</td>\n",
" <td>87.977559</td>\n",
" <td>87.972575</td>\n",
" <td>87.987534</td>\n",
" <td>88.012466</td>\n",
" <td>87.983791</td>\n",
" <td>88.007480</td>\n",
" <td>87.977561</td>\n",
" <td>88.000000</td>\n",
" <td>88.022441</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>47.535911</td>\n",
" <td>47.535523</td>\n",
" <td>47.521679</td>\n",
" <td>47.521679</td>\n",
" <td>47.535001</td>\n",
" <td>47.519371</td>\n",
" <td>47.529755</td>\n",
" <td>47.536879</td>\n",
" <td>47.535911</td>\n",
" <td>47.535911</td>\n",
" <td>...</td>\n",
" <td>47.535523</td>\n",
" <td>47.535911</td>\n",
" <td>47.535523</td>\n",
" <td>47.521679</td>\n",
" <td>47.521679</td>\n",
" <td>47.535001</td>\n",
" <td>47.519371</td>\n",
" <td>47.529755</td>\n",
" <td>47.536879</td>\n",
" <td>47.535911</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>11.000000</td>\n",
" <td>11.000000</td>\n",
" <td>11.000000</td>\n",
" <td>11.000000</td>\n",
" <td>11.000000</td>\n",
" <td>11.000000</td>\n",
" <td>11.000000</td>\n",
" <td>11.000000</td>\n",
" <td>11.000000</td>\n",
" <td>11.000000</td>\n",
" <td>...</td>\n",
" <td>11.000000</td>\n",
" <td>11.000000</td>\n",
" <td>11.000000</td>\n",
" <td>11.000000</td>\n",
" <td>11.000000</td>\n",
" <td>11.000000</td>\n",
" <td>11.000000</td>\n",
" <td>11.000000</td>\n",
" <td>11.000000</td>\n",
" <td>11.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>44.000000</td>\n",
" <td>44.000000</td>\n",
" <td>44.000000</td>\n",
" <td>44.000000</td>\n",
" <td>44.000000</td>\n",
" <td>44.000000</td>\n",
" <td>44.000000</td>\n",
" <td>44.000000</td>\n",
" <td>44.000000</td>\n",
" <td>44.000000</td>\n",
" <td>...</td>\n",
" <td>44.000000</td>\n",
" <td>44.000000</td>\n",
" <td>44.000000</td>\n",
" <td>44.000000</td>\n",
" <td>44.000000</td>\n",
" <td>44.000000</td>\n",
" <td>44.000000</td>\n",
" <td>44.000000</td>\n",
" <td>44.000000</td>\n",
" <td>44.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>88.000000</td>\n",
" <td>88.000000</td>\n",
" <td>88.000000</td>\n",
" <td>88.000000</td>\n",
" <td>88.000000</td>\n",
" <td>88.000000</td>\n",
" <td>88.000000</td>\n",
" <td>88.000000</td>\n",
" <td>88.000000</td>\n",
" <td>88.000000</td>\n",
" <td>...</td>\n",
" <td>88.000000</td>\n",
" <td>88.000000</td>\n",
" <td>88.000000</td>\n",
" <td>88.000000</td>\n",
" <td>88.000000</td>\n",
" <td>88.000000</td>\n",
" <td>88.000000</td>\n",
" <td>88.000000</td>\n",
" <td>88.000000</td>\n",
" <td>88.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>132.000000</td>\n",
" <td>132.000000</td>\n",
" <td>132.000000</td>\n",
" <td>132.000000</td>\n",
" <td>132.000000</td>\n",
" <td>132.000000</td>\n",
" <td>132.000000</td>\n",
" <td>132.000000</td>\n",
" <td>132.000000</td>\n",
" <td>132.000000</td>\n",
" <td>...</td>\n",
" <td>132.000000</td>\n",
" <td>132.000000</td>\n",
" <td>132.000000</td>\n",
" <td>132.000000</td>\n",
" <td>132.000000</td>\n",
" <td>132.000000</td>\n",
" <td>132.000000</td>\n",
" <td>132.000000</td>\n",
" <td>132.000000</td>\n",
" <td>132.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>165.000000</td>\n",
" <td>165.000000</td>\n",
" <td>165.000000</td>\n",
" <td>165.000000</td>\n",
" <td>165.000000</td>\n",
" <td>165.000000</td>\n",
" <td>165.000000</td>\n",
" <td>165.000000</td>\n",
" <td>165.000000</td>\n",
" <td>165.000000</td>\n",
" <td>...</td>\n",
" <td>165.000000</td>\n",
" <td>165.000000</td>\n",
" <td>165.000000</td>\n",
" <td>165.000000</td>\n",
" <td>165.000000</td>\n",
" <td>165.000000</td>\n",
" <td>165.000000</td>\n",
" <td>165.000000</td>\n",
" <td>165.000000</td>\n",
" <td>165.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>8 rows × 26 columns</p>\n",
"</div>"
],
"text/plain": [
" A B C D E \\\n",
"count 8823.000000 8824.000000 8824.000000 8824.000000 8822.000000 \n",
"mean 87.977559 87.972575 87.987534 88.012466 87.983791 \n",
"std 47.535911 47.535523 47.521679 47.521679 47.535001 \n",
"min 11.000000 11.000000 11.000000 11.000000 11.000000 \n",
"25% 44.000000 44.000000 44.000000 44.000000 44.000000 \n",
"50% 88.000000 88.000000 88.000000 88.000000 88.000000 \n",
"75% 132.000000 132.000000 132.000000 132.000000 132.000000 \n",
"max 165.000000 165.000000 165.000000 165.000000 165.000000 \n",
"\n",
" F G H I J \\\n",
"count 8824.000000 8824.000000 8822.000000 8823.000000 8823.000000 \n",
"mean 88.007480 87.977561 88.000000 88.022441 88.022441 \n",
"std 47.519371 47.529755 47.536879 47.535911 47.535911 \n",
"min 11.000000 11.000000 11.000000 11.000000 11.000000 \n",
"25% 44.000000 44.000000 44.000000 44.000000 44.000000 \n",
"50% 88.000000 88.000000 88.000000 88.000000 88.000000 \n",
"75% 132.000000 132.000000 132.000000 132.000000 132.000000 \n",
"max 165.000000 165.000000 165.000000 165.000000 165.000000 \n",
"\n",
" ... Q R S T \\\n",
"count ... 8824.000000 8823.000000 8824.000000 8824.000000 \n",
"mean ... 87.972575 87.977559 87.972575 87.987534 \n",
"std ... 47.535523 47.535911 47.535523 47.521679 \n",
"min ... 11.000000 11.000000 11.000000 11.000000 \n",
"25% ... 44.000000 44.000000 44.000000 44.000000 \n",
"50% ... 88.000000 88.000000 88.000000 88.000000 \n",
"75% ... 132.000000 132.000000 132.000000 132.000000 \n",
"max ... 165.000000 165.000000 165.000000 165.000000 \n",
"\n",
" U V W X Y \\\n",
"count 8824.000000 8822.000000 8824.000000 8824.000000 8822.000000 \n",
"mean 88.012466 87.983791 88.007480 87.977561 88.000000 \n",
"std 47.521679 47.535001 47.519371 47.529755 47.536879 \n",
"min 11.000000 11.000000 11.000000 11.000000 11.000000 \n",
"25% 44.000000 44.000000 44.000000 44.000000 44.000000 \n",
"50% 88.000000 88.000000 88.000000 88.000000 88.000000 \n",
"75% 132.000000 132.000000 132.000000 132.000000 132.000000 \n",
"max 165.000000 165.000000 165.000000 165.000000 165.000000 \n",
"\n",
" Z \n",
"count 8823.000000 \n",
"mean 88.022441 \n",
"std 47.535911 \n",
"min 11.000000 \n",
"25% 44.000000 \n",
"50% 88.000000 \n",
"75% 132.000000 \n",
"max 165.000000 \n",
"\n",
"[8 rows x 26 columns]"
]
},
"execution_count": 84,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"large.describe()"
]
},
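{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make these aggregates easier to check by hand, here is a minimal sketch on a tiny `DataFrame`. The names `toy`, `score` and `label` are made up for this illustration and are not part of the `large` dataset. It assumes a pandas version in which `describe` accepts the `include` argument. With one missing value among four rows, `count` should be 3, and the `label` column should only appear when `include=\"all\"` is passed:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data: a numeric column with one missing value, plus a text column\n",
"toy = pd.DataFrame({\n",
"    \"score\": [1.0, 2.0, np.nan, 4.0],\n",
"    \"label\": [\"a\", \"b\", \"c\", \"d\"],\n",
"})\n",
"\n",
"summary = toy.describe()                   # numeric columns only (the default)\n",
"everything = toy.describe(include=\"all\")  # also summarizes the text column\n",
"\n",
"print(summary)\n",
"print(sorted(everything.columns))"
]
},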
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# To be continued...\n",
"Coming soon:\n",
"* categories\n",
"* pivot tables\n",
"* stacking\n",
"* merging\n",
"* time series\n",
"* loading & saving"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.11"
}
},
"nbformat": 4,
"nbformat_minor": 0
}