Skip to content

Seaborn in Practice: Syntax and Guide

Seaborn is a powerful data visualization library in Python that provides a high-level interface for drawing attractive and informative statistical graphics. One common misconception about Seaborn and programming in general is the necessity to remember all the syntax. In reality, it's more about understanding the tool's capabilities and how to leverage its functions to visualize data effectively.

So, what can i do exactly with seaborn ?

import the library

import seaborn as sns
sns.set(style="whitegrid")

Load some dataset

we will be using a dataset containing tips from a restaurant. We will know more more about it down the road

df = sns.load_dataset("tips")

Let's see a preview of the dataset

df
total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4
... ... ... ... ... ... ... ...
239 29.03 5.92 Male No Sat Dinner 3
240 27.18 2.00 Female Yes Sat Dinner 2
241 22.67 2.00 Male Yes Sat Dinner 2
242 17.82 1.75 Male No Sat Dinner 2
243 18.78 3.00 Female No Thur Dinner 2

We have an overview of the data but it is not enough. We will do a broad visualisation of the columns with pairplot.

Pairplot

A quick way to visualize relationships in a dataset is by using the method pairplot.

sns.pairplot(df)
Result

pairplot example

You can see here, we have a table of graphs. The 3 rows and 3 columns correpond to the 3 numerical values in out dataset: tip, total_bill and size In each cell, one column is plot against another:

  • In the diagonals, a column is plotted againt itselt and you have histograms
  • In the anti-diagonals, 2 columns are plotted against each other and you have a scatterplot

You can also notice only 3 columns of our dataframe is here. It is because they contain numerical values. The 3 others (sex, smoker, day and time) are

Histogram

Histograms are used both in univariate statistics and multivariate statistics To display the average notes by gender, you can use a bar plot:

import seaborn as sns
# group by gender, then get the column "notes" then, compute the mean of notes in a group
df1 = df.groupby('gender')[['notes']].mean().reset_index()
sns.set(style="whitegrid")
# show a barplot of out new dataframe (mean_notes = fct(gender))
ax = sns.barplot(x="gender", y="notes", data=df1)

Counting occurrences can be visualized using a categorical plot:

sns.catplot(x='gender', kind='count', data=ratings_df)

You can extend this to visualize counts by gender and skin color:

sns.catplot(x='gender', hue='couleur', kind='count', data=ratings_df)

Further stratifying by region:

sns.catplot(x='gender', hue='couleur', row='region', kind='count', data=ratings_df, height=3, aspect=2)

Scatterplot

Scatterplots offer a powerful way to visualize relationships between two variables:

To represent points based on 'eval' as a function of 'age':

ax = sns.scatterplot(x='age', y='eval', data=ratings_df)

You can distinguish points by gender using different colors:

ax = sns.scatterplot(x='age', y='eval', hue='sex', data=ratings_df)

For more complex visualizations involving multiple categorical variables:

sns.relplot(x="age", y="eval", hue="sex", row="region", data=ratings_df, height=3, aspect=2)

Including regression lines on scatterplots:

sns.lmplot(data=ratings_df, x="var1", y="var2", height=5, aspect=1.5)  # Height 5, width 1.5 times larger than height

Creating scatter plots with histograms for marginal distributions:

sns.jointplot(data=df, x="var1", y="var2", height=3.5)

Boxplot

Boxplots provide a visual summary of the distribution of data:

To view the average ages and percentiles at 5% and 95%:

sns.boxplot(ratings_df['age'], orient='v')

Visualizing the average notes and percentiles for each gender:

ax = sns.boxplot(x='sexe', y='notes', data=ratings_df)

Further stratifying data for insights:

df["xxx_grp"] = pd.cut(df.xxx, [18, 30, 40, 50, 60, 70, 80])  # Creating age strata
sns.boxplot(x="xxx_grp", y="xxx", hue="yyy", data=df)  # Optional hue for differentiation

Distribution Plot

Understanding the distribution of data:

ax = sns.distplot(ratings_df['notes'], kde=False)

Analyzing note distribution by gender:

sns.distplot(ratings_df[ratings_df['sexe'] == 'female']['eval'], color='green', kde=False)
sns.distplot(ratings_df[ratings_df['sexe'] == 'male']['eval'], color="orange", kde=False)
plt.show()

Heatmap

Utilizing heatmaps to visualize numerical data:

corr = new_df.corr()  # Calculating feature correlations
ax = sns.heatmap(corr, vmin=0, vmax=1, cmap="YlGnBu", annot=True)
plt.savefig('seabornPandas.png')
plt.show()

Conclusion

These examples showcase how Seaborn can be effectively utilized for various visualization needs without the necessity to memorize all the syntax.

Understanding the basic syntax and functionality of Seaborn allows you to explore various plots and graphs that suit your data analysis requirements. Through simple examples and by focusing on the visual representation of data, you can gain deeper insights without the burden of remembering intricate details.

Remember, Seaborn is designed to assist in the visual exploration of your data, offering a wide range of options for customizing and fine-tuning plots to suit your specific needs.

Experiment with different plot types and functionalities to better understand the story your data has to tell. And don't hesitate to refer to the documentation and various online resources available to enrich your understanding and application of Seaborn.

Let the visualization journey begin, and may your data tell its story vividly through Seaborn!