Naem Azam

A Beginner's Walkthrough Matplotlib Python

Why Visualize the Data?

Until now, we’ve analyzed our data based solely on the descriptive statistics that pandas showed us. But statistics can be very misleading; take Anscombe’s quartet, for example. Anscombe’s quartet consists of 4 datasets with nearly identical descriptive statistics, yet when visualized, the datasets turn out to be anything but similar.

That is why descriptive statistics should only be a step of the analysis pipeline and not the pipeline itself.
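To make this concrete, here is a small sketch that computes a few summary statistics for the four datasets of Anscombe’s quartet. The variable names (quartet, summary) are only illustrative; the point is that the numbers come out almost the same for all four, even though the datasets look completely different when plotted.

```python
import pandas as pd

# Anscombe's quartet: datasets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Collect the same descriptive statistics for each dataset.
rows = []
for name, (x, y) in quartet.items():
    d = pd.DataFrame({"x": x, "y": y})
    rows.append({
        "dataset": name,
        "mean_x": d["x"].mean(),
        "mean_y": d["y"].mean(),
        "var_y": d["y"].var(),
        "corr_xy": d["x"].corr(d["y"]),
    })

summary = pd.DataFrame(rows).set_index("dataset").round(2)
print(summary)
```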

Plots in Matplotlib Python

Matplotlib is a data visualization library in Python. pyplot, a submodule of matplotlib, is a collection of functions that help in creating a variety of charts. Using matplotlib, you can create many kinds of plots very easily. Let’s take a look at the plots we’ll cover:

  • Line Plot
  • Scatter Plot
  • Histogram
  • Bar Plot
  • Pie Chart
  • Box Plot

We’ll see how you can create them in this tutorial. For this tutorial, we’ll be using the Housing Price Dataset on Kaggle. For simplicity, I’ll remove every column with a NaN value.

import pandas as pd

df = pd.read_csv('data.csv')
# Drop every column that contains at least one NaN value
df.dropna(axis=1, inplace=True)
df.head()

Importing Matplotlib

Conventionally, we don’t import matplotlib as a whole; instead, we import its pyplot submodule as plt, along with an optional magic command when working in a Jupyter notebook.

import matplotlib.pyplot as plt
%matplotlib inline

%matplotlib notebook: It will display interactive plots within the notebook.

%matplotlib inline: It’ll display static images in the notebook.

Plotting Data using Matplotlib

Line Plot

Line plots are used to represent the relation between two variables X and Y, each on its own axis. Basically, a line plot is a plot where the data points are connected by lines. We can create them using plt.plot().
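A minimal sketch of a line plot follows. The DataFrame here is a tiny made-up stand-in for the housing data (the values are illustrative, not taken from the dataset); only the column name SalePrice comes from the tutorial.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the housing data (illustrative values only).
df = pd.DataFrame({"SalePrice": [208500, 181500, 223500, 140000, 250000, 143000]})

# With a single argument, the values go on the y-axis and the
# x-axis is filled in with the positions 0, 1, 2, ...
(line,) = plt.plot(df["SalePrice"])
# In a plain script (outside a notebook), call plt.show() to display the plot.
```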

Figure: Line plot using matplotlib in Python

When given only one sequence, plt.plot() uses it for the y-axis and assumes that the x-axis values start at zero and go up by one for each item in the data.

Scatter Plot

A Scatter plot is a plot that is used to represent the relation between 2 features. You can create them using plt.scatter().
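As a rough sketch, a scatter plot of 2ndFlrSF against SalePrice could look like this. The data is a small made-up sample (illustrative values only), with a few zeros in 2ndFlrSF to mimic houses without a 2nd floor.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the housing data (illustrative values only).
df = pd.DataFrame({
    "2ndFlrSF": [854, 0, 866, 756, 1053, 0],
    "SalePrice": [208500, 181500, 223500, 140000, 250000, 143000],
})

# First argument goes on the x-axis, second on the y-axis.
points = plt.scatter(df["2ndFlrSF"], df["SalePrice"])
```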

Figure: Scatter plot in matplotlib

As you can see, we created a scatter plot with the column 2ndFlrSF on the x-axis and the column SalePrice on the y-axis. The graph suggests that the higher the value of 2ndFlrSF, the higher the value of SalePrice. Some houses don’t have a 2nd floor, which is why there are so many points at x = 0.

Histogram

A histogram is used to visualize frequency distributions. Each bar in the histogram represents the frequency of the variable within a particular range, and the width of this range is the bin size. You can control the binning manually through the bins argument, which takes either the number of bins or a sequence of bin edges. You can create histograms using plt.hist().
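Here is a small sketch of a histogram with a manually chosen number of bins. The prices are made up for illustration; plt.hist() returns the counts per bin, the bin edges, and the drawn bars.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the housing data (illustrative values only).
df = pd.DataFrame({
    "SalePrice": [120000, 135000, 140000, 150000, 160000, 175000, 180000, 400000]
})

# bins=4 splits the range from min to max into 4 equal-width bins.
counts, bin_edges, bars = plt.hist(df["SalePrice"], bins=4)
print(counts)     # how many values fall into each bin
print(bin_edges)  # 5 edges delimit 4 bins
```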

Figure: Histogram in matplotlib in Python

You can either choose the number of bins manually or use formulas such as Sturges’ rule or the Rice rule to find it.
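Both rules compute the number of bins from the number of observations n. A minimal sketch, using the common textbook forms of the two formulas (the function names and the sample size are hypothetical):

```python
import math


def sturges_bins(n):
    """Sturges' rule: k = ceil(log2(n)) + 1."""
    return math.ceil(math.log2(n)) + 1


def rice_bins(n):
    """Rice rule: k = ceil(2 * n^(1/3))."""
    return math.ceil(2 * n ** (1 / 3))


n = 100  # hypothetical number of observations
print(sturges_bins(n), rice_bins(n))  # prints: 8 10
```

The result can then be passed as the bins argument of plt.hist(). As far as I know, plt.hist() also accepts rule names such as bins='sturges' or bins='rice' directly and leaves the calculation to NumPy.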

Bar Plot

A bar plot presents categorical data using bars whose lengths are proportional to the values that they represent. You can create them using plt.bar().
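A short sketch of a bar plot that counts how often each category occurs. The column name HouseStyle and its values are assumptions made for this example; substitute any categorical column from your own data.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up categorical column (name and values are assumptions).
df = pd.DataFrame({
    "HouseStyle": ["1Story", "2Story", "1Story", "1.5Fin", "2Story", "1Story"]
})

# Count the rows per category, then draw one bar per category.
counts = df["HouseStyle"].value_counts()
bars = plt.bar(counts.index, counts.values)
```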

Figure: Bar plot in matplotlib

The histogram presents numerical data whereas the bar graph shows categorical data. The histogram is drawn in such a way that there is no gap between the bars.

Box Plot

A box plot is used to visualize the 5-number summary of a distribution. Box plots can also show outliers, which are displayed as circles. You can create them using plt.boxplot().

Figure: Box plot in Python matplotlib

  • The colored line inside the box is the median (orange with current matplotlib defaults, red in older versions).
  • The lowest line is the minimum non-outlier value.
  • The highest line is the maximum non-outlier value.
  • The highest line of the box is the 3rd quartile value.
  • The lowest line of the box is the 1st quartile value.
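The elements listed above can also be read back from the dictionary that plt.boxplot() returns. The sketch below uses made-up prices (illustrative values only) with one deliberately extreme value so that an outlier shows up.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up prices; 400000 is deliberately far away from the rest.
df = pd.DataFrame({
    "SalePrice": [120000, 135000, 140000, 150000, 160000, 175000, 180000, 400000]
})

box = plt.boxplot(df["SalePrice"])

# The returned dict holds the drawn lines, so we can inspect their positions.
median = box["medians"][0].get_ydata()[0]     # line inside the box
lowest = box["caps"][0].get_ydata()[0]        # minimum non-outlier value
highest = box["caps"][1].get_ydata()[0]       # maximum non-outlier value
outliers = list(box["fliers"][0].get_ydata()) # points drawn as circles
print(median, lowest, highest, outliers)
```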

Pie chart

A pie chart is a circular statistical plot that can display only one series of data. Matplotlib has a pie() function in its pyplot module, which creates a pie chart representing the data in an array.
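A minimal sketch of a pie chart built from category counts. As in the bar plot, the column name HouseStyle and its values are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up categorical column (name and values are assumptions).
df = pd.DataFrame({
    "HouseStyle": ["1Story", "2Story", "1Story", "1.5Fin", "2Story", "1Story"]
})

# One wedge per category, sized by its share of the total count.
counts = df["HouseStyle"].value_counts()
plt.pie(counts.values, labels=list(counts.index))
```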

Figure: Pie chart in matplotlib

Customize the Plot

Adding Axis Labels

Until now, our x- and y-axes were unlabeled, which made it difficult to determine which axis represented what. Since labeling is necessary for understanding the chart dimensions, we will see how to add labels to the plot. To set labels, we pass them as arguments to plt.xlabel() and plt.ylabel().
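For example, the scatter plot from earlier could be labeled as follows (the data is again a small made-up sample):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the housing data (illustrative values only).
df = pd.DataFrame({
    "2ndFlrSF": [854, 0, 866, 756, 1053, 0],
    "SalePrice": [208500, 181500, 223500, 140000, 250000, 143000],
})

plt.scatter(df["2ndFlrSF"], df["SalePrice"])
plt.xlabel("2ndFlrSF")   # text shown below the x-axis
plt.ylabel("SalePrice")  # text shown beside the y-axis
```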

Figure: Adding axis labels in matplotlib Python

Adding Title of the Plot

While working with plots, it becomes essential to tell which plot represents what. This can be done by adding a title, which is shown above the graph. We can do that by passing the title as an argument to plt.title().
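A short sketch, with a made-up sample and an arbitrary title text:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the housing data (illustrative values only).
df = pd.DataFrame({
    "2ndFlrSF": [854, 0, 866, 756, 1053, 0],
    "SalePrice": [208500, 181500, 223500, 140000, 250000, 143000],
})

plt.scatter(df["2ndFlrSF"], df["SalePrice"])
plt.title("SalePrice vs 2ndFlrSF")  # shown centered above the plot
```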

Figure: Adding a title to the plot

Adjusting Plot Size

After visualizing for some time now, you might have found out that regardless of the amount of data, the size of the plot stays the same. But you can adjust the plot size by passing a (width, height) tuple, measured in inches, as the figsize argument to plt.figure().
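A minimal sketch; the size (10, 4) is an arbitrary choice and the prices are made up. Note that plt.figure() has to be called before the plotting function so that the plot is drawn on the resized figure.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the housing data (illustrative values only).
df = pd.DataFrame({"SalePrice": [208500, 181500, 223500, 140000, 250000, 143000]})

# figsize is (width, height) in inches; create the figure first.
fig = plt.figure(figsize=(10, 4))
plt.plot(df["SalePrice"])
```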

Figure: Adjusting plot size in matplotlib Python

Plotting 2 Plots in One

In matplotlib, you can draw 2 scatter plots on the same axes by simply adding the code for the second one after the first.
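A sketch of two scatter plots sharing one set of axes. The second column name, 1stFlrSF, is an assumption made for this example, and all values are made up.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample; the column 1stFlrSF is an assumption for illustration.
df = pd.DataFrame({
    "1stFlrSF": [856, 1262, 920, 961, 1145, 796],
    "2ndFlrSF": [854, 0, 866, 756, 1053, 0],
    "SalePrice": [208500, 181500, 223500, 140000, 250000, 143000],
})

# Consecutive calls draw on the same axes; matplotlib picks the next
# color from its default cycle for each new call (blue, then orange).
plt.scatter(df["1stFlrSF"], df["SalePrice"])
plt.scatter(df["2ndFlrSF"], df["SalePrice"])
```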

Figure: Plotting 2 plots in one

Adjusting Opacity of the Dots

The plot above has orange points overlapping the blue points. We can adjust the opacity of the dots by changing the value of the alpha argument. By default, alpha is 1 (fully opaque), so the lower the alpha value, the lower the opacity.
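For instance, setting alpha to 0.5 makes both sets of points half transparent, so overlapping dots remain visible. The value 0.5, the column 1stFlrSF, and the data are all assumptions made for this sketch.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample; the column 1stFlrSF is an assumption for illustration.
df = pd.DataFrame({
    "1stFlrSF": [856, 1262, 920, 961, 1145, 796],
    "2ndFlrSF": [854, 0, 866, 756, 1053, 0],
    "SalePrice": [208500, 181500, 223500, 140000, 250000, 143000],
})

# alpha ranges from 0 (invisible) to 1 (fully opaque).
first = plt.scatter(df["1stFlrSF"], df["SalePrice"], alpha=0.5)
second = plt.scatter(df["2ndFlrSF"], df["SalePrice"], alpha=0.5)
```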

Adding Legend

If we were to show someone the above plot, it would be hard for them to determine which dot color represents which variable. To tackle this, we can pass a label argument to each plotting call and then display those labels in a legend using plt.legend().
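A minimal sketch of labeled scatter plots with a legend. The label texts, the column 1stFlrSF, and the data are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample; the column 1stFlrSF is an assumption for illustration.
df = pd.DataFrame({
    "1stFlrSF": [856, 1262, 920, 961, 1145, 796],
    "2ndFlrSF": [854, 0, 866, 756, 1053, 0],
    "SalePrice": [208500, 181500, 223500, 140000, 250000, 143000],
})

# Each label becomes one entry in the legend.
plt.scatter(df["1stFlrSF"], df["SalePrice"], alpha=0.5, label="1stFlrSF")
plt.scatter(df["2ndFlrSF"], df["SalePrice"], alpha=0.5, label="2ndFlrSF")
legend = plt.legend()
```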

Figure: Adding a legend to a Python matplotlib visualization

Making Subplots

We have seen how we can create 2 scatter plots in the same plot. But we can also create them as 2 separate subplots. We can create subplots using plt.subplot2grid(), which takes 2 tuples: the shape of the grid and the coordinates of the particular plot within it. For example, the following subplots are placed in a grid with 1 row and 2 columns, at coordinates (0,0) and (0,1). We can also specify the span of a plot using the rowspan and colspan arguments.
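A sketch of the 1 row by 2 column layout described above. The names a1 and a2 hold the axes returned by plt.subplot2grid(); the column 1stFlrSF, the titles, and the data are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample; the column 1stFlrSF is an assumption for illustration.
df = pd.DataFrame({
    "1stFlrSF": [856, 1262, 920, 961, 1145, 796],
    "2ndFlrSF": [854, 0, 866, 756, 1053, 0],
    "SalePrice": [208500, 181500, 223500, 140000, 250000, 143000],
})

plt.figure(figsize=(10, 4))
# Grid shape (rows, columns) first, then the (row, column) position.
a1 = plt.subplot2grid((1, 2), (0, 0))
a2 = plt.subplot2grid((1, 2), (0, 1))

# Each axes object gets its own plot and its own customization.
a1.scatter(df["1stFlrSF"], df["SalePrice"])
a1.set_title("1stFlrSF")
a2.scatter(df["2ndFlrSF"], df["SalePrice"])
a2.set_title("2ndFlrSF")
```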

Figure: Making subplots

Here a1 and a2 refer to the two subplots. We can assign each of them the plot it has to display, along with its respective customization.

Matplotlib is a great tool for visualization, but as plots grow more complex they become harder to create, and there are many plot types that matplotlib does not support directly.