thespacebetweenstars.com

Hands-On Data Visualization with Python: Seaborn Scatter Plots


Chapter 1: Understanding Scatter Plots

A scatter plot is a chart that shows the relationship between two continuous numerical variables. For instance, it can depict how individuals' heights relate to their weights, or how salaries relate to years of work experience. In this guide, we will use the Python Seaborn library to create a scatter plot that reveals the relationship between salary and years of professional experience.

Prerequisite Libraries

To follow along, make sure you have the following libraries installed: pandas, seaborn, and matplotlib.

You can access the complete source code for this tutorial on my GitHub repository. Feel free to download it (Scatter-Plot.ipynb) to help you follow along.
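One way to confirm that the libraries used in this tutorial are available is to ask Python for their installed versions. This is only a sketch: it assumes Python 3.8 or later (for importlib.metadata) and that the packages are installed under their usual PyPI names.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as published on PyPI (assumed to match your install).
REQUIRED = ["pandas", "seaborn", "matplotlib"]

versions = {}
for name in REQUIRED:
    try:
        versions[name] = version(name)
    except PackageNotFoundError:
        versions[name] = None  # not installed in this environment

for name, found in versions.items():
    print(f"{name}: {found or 'not installed'}")
```

Any library reported as not installed can be added with pip before continuing.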

Building a Scatter Plot with Seaborn

  1. Importing Libraries

We will begin by importing the necessary libraries into our Python environment.

import pandas as pd

import seaborn as sns

import matplotlib.pyplot as plt

In addition to Seaborn, we will also need Pandas and Matplotlib for this tutorial. Pandas loads the data into a DataFrame, which Seaborn's plotting functions accept directly, and Matplotlib displays and customizes the resulting figure.

  2. Loading Data

Next, we will load salary data from a CSV file and display a preview of the records.

df = pd.read_csv("salary_data.csv")

print(df.head())

This dataset comprises details on years of experience, gender, age, and salary.
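If salary_data.csv is not at hand, a small stand-in DataFrame is enough to try the plotting calls in this tutorial. The sketch below is hypothetical: the column names YearsExperience, Salary, and Gender are taken from the plotting code in this guide, while the column name Age and every value in the table are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for salary_data.csv; every value below is invented.
df = pd.DataFrame({
    "YearsExperience": [1.1, 2.0, 3.2, 4.5, 5.9, 7.1, 8.7, 10.3],
    "Gender": ["Male", "Female", "Female", "Male",
               "Female", "Male", "Male", "Female"],
    "Age": [23, 25, 27, 29, 31, 34, 36, 39],  # column name "Age" is an assumption
    "Salary": [39000, 43500, 54000, 61000, 81000, 98000, 109000, 122000],
})

print(df.head())   # first five rows, as in the tutorial
print(df.dtypes)   # check that the numeric columns hold numbers, not strings
```

Checking the dtypes is worthwhile with a real CSV too, since a numeric column read as text will not plot as expected.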

A preview of the salary dataset

  3. Creating a Basic Scatter Plot

Now, let's generate our first scatter plot using the Seaborn scatterplot method.

sns.scatterplot(x='YearsExperience', y='Salary', data=df)

plt.show()

This plot requires only three parameters: the x-axis values, the y-axis values, and the dataset. We have set the DataFrame, df, as our data source and assigned the "YearsExperience" and "Salary" columns to the x and y axes, respectively.

Basic scatter plot showing salary vs. years of experience

From the scatter plot, we can observe a positive correlation between salary and years of experience.
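The visual impression of a positive correlation can be backed by a number. As a minimal sketch, the Pearson correlation coefficient can be computed with pandas; the tiny DataFrame here is invented sample data, not the tutorial's dataset.

```python
import pandas as pd

# Invented sample values standing in for the real salary data.
df = pd.DataFrame({
    "YearsExperience": [1.1, 2.0, 3.2, 4.5, 5.9, 7.1, 8.7, 10.3],
    "Salary": [39000, 43500, 54000, 61000, 81000, 98000, 109000, 122000],
})

# Series.corr uses the Pearson method by default.
r = df["YearsExperience"].corr(df["Salary"])
print(f"Pearson r = {r:.3f}")
```

A value close to 1 indicates a strong positive linear relationship, and a value near 0 indicates little linear relationship.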

  4. Incorporating the Hue Parameter

Next, we will add a hue parameter to our scatter plot to differentiate between subsets of data, such as male and female.

sns.scatterplot(x='YearsExperience', y='Salary', hue='Gender', data=df)

plt.show()

Here, we have set the "Gender" column as the value for the hue parameter.

Scatter plot with gender differentiation

The scatter plot now displays two distinct data groups, colored in blue for males and orange for females.
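To make the color assignment explicit rather than rely on defaults, scatterplot also accepts the hue_order and palette parameters. The sketch below uses invented sample data and assumes the Gender column holds the labels "Male" and "Female"; adjust the keys to match the values in your data.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Invented sample values; the labels "Male"/"Female" are an assumption.
df = pd.DataFrame({
    "YearsExperience": [1.1, 2.0, 3.2, 4.5, 5.9, 7.1],
    "Salary": [39000, 43500, 54000, 61000, 81000, 98000],
    "Gender": ["Female", "Male", "Female", "Male", "Female", "Male"],
})

ax = sns.scatterplot(
    x="YearsExperience",
    y="Salary",
    hue="Gender",
    hue_order=["Male", "Female"],  # fixes the legend order
    palette={"Male": "tab:blue", "Female": "tab:orange"},  # fixes the colors
    data=df,
)
# Call plt.show() to display the figure when running as a script.
```

Pinning the mapping this way keeps colors consistent across several plots of the same data.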

  5. Adding a Size Parameter

We can also use the size parameter to scale each data point according to the values in a column of the dataset.

sns.scatterplot(x='YearsExperience', y='Salary', size='YearsExperience', data=df)

plt.show()

Here, we set the "YearsExperience" column as the size parameter.

Scatter plot with varying data point sizes

The size of the data points now varies according to the number of years of experience, with larger points reflecting more years.
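If the default marker sizes are hard to tell apart, the sizes parameter of scatterplot can set the smallest and largest marker area. The following is a sketch on invented sample data; the range (20, 200) is an arbitrary choice for illustration.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Invented sample values standing in for the real salary data.
df = pd.DataFrame({
    "YearsExperience": [1.1, 2.0, 3.2, 4.5, 5.9, 7.1, 8.7, 10.3],
    "Salary": [39000, 43500, 54000, 61000, 81000, 98000, 109000, 122000],
})

ax = sns.scatterplot(
    x="YearsExperience",
    y="Salary",
    size="YearsExperience",
    sizes=(20, 200),  # (min, max) marker area; arbitrary illustrative range
    data=df,
)
# Call plt.show() to display the figure when running as a script.
```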

  6. Adjusting the Figure Size

We have nearly completed our scatter plot, but the default figure size may be too small for optimal visualization. Let's adjust the figure dimensions.

plt.figure(figsize=(12, 8))

sns.scatterplot(x='YearsExperience', y='Salary', data=df)

plt.show()

Enlarged scatter plot for better visualization

The larger figure makes the distribution of the data points easier to see.
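An alternative to plt.figure is to create the figure and axes explicitly with plt.subplots and pass the axes to Seaborn through the ax parameter. This is a sketch on invented sample data; it produces the same kind of plot but keeps a handle on the figure and axes for later adjustments.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Invented sample values standing in for the real salary data.
df = pd.DataFrame({
    "YearsExperience": [1.1, 2.0, 3.2, 4.5, 5.9, 7.1],
    "Salary": [39000, 43500, 54000, 61000, 81000, 98000],
})

# figsize is (width, height) in inches.
fig, ax = plt.subplots(figsize=(12, 8))
sns.scatterplot(x="YearsExperience", y="Salary", data=df, ax=ax)
# Call plt.show() to display the figure when running as a script.
```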

  7. Setting a Title

Finally, we will add a title to our scatter plot.

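plt.figure(figsize=(12, 8))

sns.scatterplot(x='YearsExperience', y='Salary', hue='Gender', data=df)

plt.title("Salary vs Years of Experience")

plt.show()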

Scatter plot with title and legends

We have successfully completed a scatter plot with an appropriate title and legends!
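The individual steps can also be gathered into one reusable function. The sketch below is one possible arrangement, not the tutorial's own code: the function name plot_salary, the axis label text, and the sample data are all hypothetical.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt


def plot_salary(df):
    """Draw salary against experience with hue, size, title, and axis labels."""
    fig, ax = plt.subplots(figsize=(12, 8))
    sns.scatterplot(
        x="YearsExperience",
        y="Salary",
        hue="Gender",
        size="YearsExperience",
        data=df,
        ax=ax,
    )
    ax.set_title("Salary vs Years of Experience")
    ax.set_xlabel("Years of experience")  # label text is an illustrative choice
    ax.set_ylabel("Salary")
    return ax


# Invented sample values standing in for the real salary data.
df = pd.DataFrame({
    "YearsExperience": [1.1, 2.0, 3.2, 4.5, 5.9, 7.1],
    "Salary": [39000, 43500, 54000, 61000, 81000, 98000],
    "Gender": ["Female", "Male", "Female", "Male", "Female", "Male"],
})

ax = plot_salary(df)
# Call plt.show() to display the figure when running as a script.
```

Returning the axes lets the caller make further changes, such as adjusting limits, before the figure is shown or saved.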

Afterword

Scatter plots are particularly valuable for examining the relationship between two continuous numerical variables. Seaborn lets us enrich a plot with hue and size, so that additional variables can be read from the same 2D chart.

Video Tutorials for Further Learning

To deepen your understanding of data visualization in Python, check out the following video tutorials:

The first video, "Seaborn Python for Beginners - Data Visualization Hands-on Lab," offers practical insights into using Seaborn for data visualization.

The second video, "Python Data Visualization Tutorial | Color, Marker and Size!" provides further exploration of customizing visualizations in Python.
