In this chapter, we will discuss how to import datasets and libraries. Let us begin by understanding how to import libraries.
Let us start by importing Pandas, a library for managing relational (table-format) datasets. Seaborn works well with DataFrames, the most widely used data structure for data analysis.
The following command will help you import Pandas −
# Pandas for managing datasets
import pandas as pd
Now, let us import the Matplotlib library, which helps us customize our plots.
# Matplotlib for additional customization
from matplotlib import pyplot as plt
We will import the Seaborn library with the following command −
# Seaborn for plotting and styling
import seaborn as sb
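The three imports above can be collected into a single preamble. The following is a minimal sketch, assuming all three libraries are installed; the version printout is added here only as a sanity check and is not one of the original steps.

```python
# Pandas for managing datasets
import pandas as pd

# Matplotlib for additional customization
import matplotlib
from matplotlib import pyplot as plt

# Seaborn for plotting and styling
import seaborn as sb

# Collect the installed versions (illustrative sanity check only)
versions = {
    "pandas": pd.__version__,
    "matplotlib": matplotlib.__version__,
    "seaborn": sb.__version__,
}
for name, version in versions.items():
    print(f"{name}: {version}")
```

If any of the imports fails, the library is probably missing from the active environment.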
We have imported the required libraries. In this section, we will understand how to import the required datasets.
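Seaborn provides a few example datasets. They are not bundled with the installation: load_dataset() downloads the requested dataset from Seaborn's online data repository the first time it is called, so an internet connection is required, and caches it locally by default.
You can use any of these datasets for your learning. With the help of the following function you can load the required dataset −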
load_dataset()
In this section, we will import a dataset. The dataset loads as a Pandas DataFrame, so any Pandas DataFrame method works on it.
The following line of code will help you import the dataset −
# Seaborn for plotting and styling
import seaborn as sb

df = sb.load_dataset('tips')
print(df.head())
The above line of code will generate the following output −
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4
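Because the loaded dataset is an ordinary Pandas DataFrame, the usual DataFrame methods apply to it. The sketch below illustrates this on the five rows printed above, rebuilt by hand so that it runs without a download. The derived column tip_pct and the variable mean_tip_by_sex are hypothetical names introduced for illustration.

```python
import pandas as pd

# The five rows printed by df.head() above, rebuilt by hand so the
# example does not need to download anything.
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "smoker": ["No"] * 5,
    "day": ["Sun"] * 5,
    "time": ["Dinner"] * 5,
    "size": [2, 3, 3, 2, 4],
})

# Shape and summary statistics of the numeric columns
print(df.shape)
print(df.describe())

# Average tip per group
mean_tip_by_sex = df.groupby("sex")["tip"].mean()
print(mean_tip_by_sex)

# Derived column: tip as a percentage of the bill (hypothetical name)
df["tip_pct"] = 100 * df["tip"] / df["total_bill"]
print(df[["total_bill", "tip", "tip_pct"]].round(2))

# Note: df.size is a DataFrame attribute (the number of cells), so the
# "size" column has to be accessed with bracket notation.
print(df["size"].sum())
```

The same calls should work unchanged on the full DataFrame returned by load_dataset('tips').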
To view all the datasets available in the Seaborn library, use the get_dataset_names() function as shown below −
import seaborn as sb

print(sb.get_dataset_names())
The above code returns the list of available datasets. The exact list depends on the Seaborn version; for example −
['anscombe', 'attention', 'brain_networks', 'car_crashes', 'dots', 'exercise', 'flights', 'fmri', 'gammas', 'iris', 'planets', 'tips', 'titanic']
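Both get_dataset_names() and load_dataset() need network access, and a misspelled dataset name makes load_dataset() fail. The following sketch shows one way to guard against both cases. The helper load_named_dataset is a hypothetical name, not part of Seaborn, and the choice of exceptions to catch is an assumption about how such failures typically surface.

```python
import pandas as pd
import seaborn as sb


def load_named_dataset(name):
    """Return the named Seaborn example dataset, or None if unavailable.

    Hypothetical helper for illustration; not part of the Seaborn API.
    """
    try:
        # Check the name against the list published by Seaborn
        if name not in sb.get_dataset_names():
            print(f"No example dataset called {name!r}")
            return None
        return sb.load_dataset(name)
    except (OSError, ValueError) as exc:
        # Assumption: network problems surface as OSError subclasses
        # and bad names or unparsable data as ValueError.
        print(f"Could not load {name!r}: {exc}")
        return None


tips = load_named_dataset("tips")
if tips is not None:
    print(tips.head())
```

Returning None instead of raising keeps the calling code short, at the cost of hiding the precise cause of a failure.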
DataFrames store data in rectangular grids, which makes the data easy to inspect. Each row of the grid contains the values of one instance, and each column is a vector holding the data for one variable. Consequently, the values within a row need not share a data type: they can be numeric, character, logical, and so on, while each column typically holds values of one type. In Python, DataFrames come with the Pandas library, where they are defined as two-dimensional labeled data structures with potentially different types of columns.
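The point about column types can be illustrated with a small hand-built DataFrame. In this sketch, the column is_dinner is a hypothetical logical column added for illustration; the other values are taken from the first two rows of the tips output shown earlier.

```python
import pandas as pd

# Each column holds one variable and has its own data type
grid = pd.DataFrame({
    "total_bill": [16.99, 10.34],   # numeric (floating point)
    "size": [2, 3],                 # numeric (integer)
    "sex": ["Female", "Male"],      # character
    "is_dinner": [True, True],      # logical (hypothetical column)
})

# One data type per column
print(grid.dtypes)

# A single row mixes values of several types
first_row = grid.iloc[0]
print(first_row)
```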
For more details on DataFrames, visit our tutorial on pandas.