Seaborn is a module built on top of matplotlib that specializes in statistical visualization. It works seamlessly with pandas, which makes it easy for beginners to pick up. Compared with matplotlib, Seaborn has a more concise syntax; the relationship between the two is similar to that between numpy and pandas.
2.1 Installation:
1) Linux
sudo pip install seaborn
2) Windows
pip install seaborn
2.2 Quick start
import seaborn as sns
sns.set(style="ticks")
from matplotlib import pyplot
# Load dataset
tips = sns.load_dataset("tips")
# Draw box plots of total_bill per day, split by sex
sns.boxplot(x="day", y="total_bill", hue="sex", data=tips, palette="PRGn")
sns.despine(offset=10, trim=True)
# Save the figure to a file, then display it
pyplot.savefig("GroupedBoxplots.png")
pyplot.show()
2.3 Common seaborn methods
1. Plotting a single variable
1) Central tendency of the distribution: how closely the data cluster around a central value
import numpy as np
x = np.random.normal(size=100)
sns.histplot(x, kde=True)  # kde=False turns off the kernel density curve (distplot is deprecated since seaborn 0.11)
sns.rugplot(x)  # rug: a small tick at each observation along the x-axis (marginal rug)
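The histogram shows central tendency visually. As a minimal sketch (using only NumPy, with a fixed random seed added for reproducibility), the same idea can be quantified with the sample mean and median, which should both lie near 0 for a standard normal sample.

```python
import numpy as np

# Reproducible standard-normal sample, shaped like x in the example above
rng = np.random.default_rng(0)
x = rng.normal(size=100)

# Two common measures of central tendency
mean = x.mean()
median = np.median(x)
print(f"mean={mean:.3f}, median={median:.3f}")
```

The mean is sensitive to extreme values while the median is not, so a large gap between the two hints at a skewed distribution.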
2. For the relationship between two variables, a scatter plot is usually the best starting point
1) Fit the probability density directly (kernel density estimate)
import pandas as pd
mean, cov = [0, 1], [(1, .5), (.5, 1)]
data = np.random.multivariate_normal(mean, cov, 200)
df = pd.DataFrame(data, columns=["x", "y"])
sns.jointplot(x="x", y="y", data=df, kind="kde")
2) Hexbin plot: shows the distribution of points more directly when there is a lot of data
With many points, a hexbin plot shows which regions are densest (darker color means more points); a black-and-white color scheme works best.
x, y = np.random.multivariate_normal(mean, cov, 1000).T
with sns.axes_style("ticks"):
    sns.jointplot(x=x, y=y, kind="hex")
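To see what the hex kind encodes, the sketch below calls matplotlib's `Axes.hexbin` directly; it assumes seaborn's hex jointplot relies on the same hexagonal binning. The seed, `gridsize=20`, the `"Greys"` colormap and the headless `Agg` backend are choices made for this illustration. Each point falls into exactly one hexagon, so the per-hexagon counts that drive the color depth add up to the number of points.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
mean, cov = [0, 1], [(1, .5), (.5, 1)]
x, y = rng.multivariate_normal(mean, cov, 1000).T

fig, ax = plt.subplots()
# Black-and-white colormap: darker hexagons contain more points
hb = ax.hexbin(x, y, gridsize=20, cmap="Greys")
counts = hb.get_array()  # one count per hexagon
print(int(counts.sum()), int(counts.max()))
plt.close(fig)
```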
3. Pairwise display of multiple variables
# Iris dataset
iris = sns.load_dataset("iris")
sns.pairplot(iris)
4. Seaborn plot types in detail
1. Box plot
import matplotlib.pyplot as plt
import numpy as np
A box plot shows the median Q2, the first quartile Q1, the third quartile Q3, and outliers.
The interquartile range is IQR = Q3 - Q1.
Points below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are treated as outliers.
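The outlier rule can be written out in a few lines. This is a minimal sketch using NumPy's default (linear) percentile interpolation; the function name `iqr_outliers` and the sample values are hypothetical. matplotlib's `boxplot` uses the same 1.5 × IQR whisker rule by default (`whis=1.5`).

```python
import numpy as np

def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]

data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(iqr_outliers(data))  # Q1=3, Q3=7, IQR=4, bounds [-3, 13] -> [100.]
```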
tang_data = [np.random.normal(0, std, 100) for std in range(1,4)]
fig = plt.figure(figsize=(8,6))
plt.boxplot(tang_data, vert=True, notch=True)
plt.xticks([x+1 for x in range(len(tang_data))], ["x1", "x2", "x3"])
plt.xlabel("x")
plt.title("box plot")
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
2. Histogram of a single feature
1) histplot (formerly distplot)
x = np.random.normal(size=100)
sns.histplot(x, bins=20)  # replaces the deprecated sns.distplot(x, kde=False, bins=20)
2) countplot
As the name suggests, countplot draws a count plot. It can be viewed as a histogram for a categorical variable, or as a way to compare how many observations fall into each category: it is essentially a barplot whose bar heights come from counting.
seaborn.countplot(x=None, y=None, hue=None, data=None, order=None,
hue_order=None, orient=None, color=None, palette=None, saturation=0.75,
ax=None, **kwargs)
x, y, hue: names of variables in data or vector data, optional
data: DataFrame, array, or list of arrays, optional
order, hue_order: lists of strings, optional  # order in which the categories are drawn
orient: "v" | "h", optional  # vertical or horizontal bars
ax: matplotlib Axes, optional  # Axes to draw on (e.g. a subplot); plotting basics are covered in the next section
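Because countplot is a barplot of counts, its bar heights should match pandas' `value_counts`. The sketch below checks this on a small hypothetical DataFrame (not from the original article), using the headless `Agg` backend and an explicit `order` so the bars and counts line up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical categorical data for illustration
df = pd.DataFrame({"fruit": ["apple", "pear", "apple", "plum", "apple", "pear"]})

order = ["apple", "pear", "plum"]
ax = sns.countplot(x="fruit", data=df, order=order)

# Each bar's height is the number of rows in that category
heights = [p.get_height() for p in ax.patches]
counts = df["fruit"].value_counts().reindex(order).tolist()
print(heights, counts)
plt.close(ax.figure)
```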
3. Relationship between two features, shown as a scatter plot
mean, cov = [0,1], [(1, .5), (.5,1)]
data = np.random.multivariate_normal(mean, cov, 200)
df = pd.DataFrame(data, columns=["X1", "X2"])
sns.jointplot(x="X1", y="X2", data=df)
# kind="hex" bins the points into hexagons
data = np.random.multivariate_normal(mean, cov, 2000).T
with sns.axes_style("white"):
    sns.jointplot(x=data[0], y=data[1], kind="hex", color="k")
4. Pairwise relationships among all variables
iris = sns.load_dataset("iris")
sns.pairplot(iris)
5. Bar plot
titanic = sns.load_dataset("titanic")
sns.barplot(x="sex", y="survived", data=titanic, hue="class")
Point plot: rather than the central tendency itself, it highlights how the estimate changes across categories
sns.pointplot(x="sex", y="survived", data=titanic, hue="class")
sns.pointplot(x="class", y="survived", data=titanic, hue="sex",
palette={"male":"g","female":"m"}, markers=["^", "o"], linestyles=["-","--"])
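By default both barplot and pointplot draw the mean of y for each group, so with a 0/1 column such as `survived` each bar or point is a survival rate. The sketch below computes those group means with pandas on a small hypothetical DataFrame (not the real titanic data); confidence intervals, which seaborn also draws, are left out.

```python
import pandas as pd

# Hypothetical 0/1 outcome data for illustration
df = pd.DataFrame({
    "sex": ["male", "male", "female", "female", "male", "female"],
    "survived": [0, 1, 1, 1, 0, 0],
})

# The default estimator is the mean, giving one value per group
means = df.groupby("sex")["survived"].mean()
print(means)  # female 0.666667, male 0.333333
```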
tips = sns.load_dataset("tips", data_home=".")
# jitter=True adds a small random offset along the category axis so overlapping points become visible
sns.stripplot(x="day", y="total_bill", data=tips, jitter=True)
sns.swarmplot(x="day", y="total_bill", data=tips)
sns.swarmplot(x="day", y="total_bill", data=tips, hue="sex")
sns.swarmplot(x="day", y="total_bill", data=tips, hue="time")
6. Box plot
sns.boxplot(x="day", y="total_bill", data=tips, hue="time")
7. Violin plot
sns.violinplot(x="day", y="total_bill", data=tips, hue="sex", split=True)
sns.violinplot(x="day", y="total_bill", data=tips, inner=None, split=True)
sns.swarmplot(x="day", y="total_bill", data=tips, color="k", alpha=1.0)
8. Heat map: color encodes the magnitude of each value and makes trends easy to see
uniform_data = np.random.rand(3,3)
sns.heatmap(uniform_data)
sns.heatmap(uniform_data, vmin=0.2, vmax=0.5)
normal_data = np.random.randn(3,3)
sns.heatmap(normal_data, center=0)
flights = sns.load_dataset("flights")
data = flights.pivot(index="month", columns="year", values="passengers")
sns.heatmap(data)
sns.heatmap(data, annot=True, fmt="d", linewidths=.5, cbar=False,
cmap="YlGnBu")
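A heat map needs a wide (matrix-shaped) table, which is why the flights data is pivoted first. The sketch below shows that long-to-wide step on a few hand-typed rows shaped like the flights dataset (the numbers are placeholders for illustration), then draws the annotated heat map with the headless `Agg` backend.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Long format: one row per (month, year) observation
long = pd.DataFrame({
    "month": ["Jan", "Jan", "Feb", "Feb"],
    "year": [1949, 1950, 1949, 1950],
    "passengers": [112, 115, 118, 126],
})

# Wide format: months as rows, years as columns, passengers as cell values
wide = long.pivot(index="month", columns="year", values="passengers")

ax = sns.heatmap(wide, annot=True, fmt="d", cmap="YlGnBu")
print(wide)
plt.close(ax.figure)
```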
9. Setting the overall plot style
def sin_plot(flip=1):
    # Draw six phase-shifted sine waves with decreasing amplitude
    x = np.linspace(0, 14, 100)
    for i in range(1, 7):
        plt.plot(x, np.sin(x + i * .5) * (7 - i) * flip)
sin_plot()
10. There are five built-in themes: darkgrid, whitegrid, dark, white, ticks
sns.set_style("darkgrid")
data = np.random.normal(size=(20,6)) + np.arange(6) / 2
sns.boxplot(data=data)
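To compare the five themes side by side, each subplot can be created inside its own `sns.axes_style(...)` block, since the style is applied when the Axes is created. This is a sketch with a seeded random sample and the headless `Agg` backend added for reproducibility; the two grid themes turn on `axes.grid`, the other three leave it off.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

styles = ["darkgrid", "whitegrid", "dark", "white", "ticks"]
data = np.random.default_rng(0).normal(size=(20, 6)) + np.arange(6) / 2

fig = plt.figure(figsize=(15, 3))
for i, style in enumerate(styles, start=1):
    # The style only affects Axes created inside the with block
    with sns.axes_style(style):
        ax = fig.add_subplot(1, len(styles), i)
        sns.boxplot(data=data, ax=ax)
        ax.set_title(style)
plt.close(fig)
```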
11. Each subplot can use a different style: code inside the with block uses one style, code outside it uses the current global style
with sns.axes_style("whitegrid"):
    plt.subplot(211)
    sin_plot()
plt.subplot(212)
sin_plot(-1)
12. Plot context: scaling fonts and lines for paper, notebook, talk, or poster
sns.set_context("paper")
plt.figure(figsize=(8,6))
sin_plot()
sns.set_context("talk")
plt.figure(figsize=(8,6))
sin_plot()
sns.set_context("poster")
plt.figure(figsize=(8,6))
sin_plot()
sns.set_context("notebook", font_scale=3.5, rc={"lines.linewidth": 4.5})
plt.figure(figsize=(8,6))
sin_plot()
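`sns.plotting_context` returns the parameters that `set_context` would apply, which makes the scaling easy to inspect without drawing anything. In this sketch, font sizes grow from paper to poster, `font_scale` multiplies only the font-related sizes, and `rc` overrides individual entries such as `lines.linewidth`.

```python
import seaborn as sns

contexts = ["paper", "notebook", "talk", "poster"]
font_sizes = [sns.plotting_context(ctx)["font.size"] for ctx in contexts]
print(dict(zip(contexts, font_sizes)))

# font_scale multiplies the font sizes of the chosen context
base = sns.plotting_context("notebook")["font.size"]
scaled = sns.plotting_context("notebook", font_scale=3.5)["font.size"]

# rc overrides individual context parameters
lw = sns.plotting_context("notebook", rc={"lines.linewidth": 4.5})["lines.linewidth"]
print(base, scaled, lw)
```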