Cross analysis is typically used to study the relationship between two or more grouped variables, and a cross table is the usual way to present that relationship. We set two related variables as the row variable and the column variable and summarize the statistics in a two-dimensional cross table (a pivot table). In pandas, the function most commonly used for this is pivot_table():
pivot_table(values, index, columns, aggfunc, fill_value)
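The parameters are described below:

| Parameter | Description |
|---|---|
| values | Column(s) whose values are aggregated in the pivot table |
| index | Key(s) that form the rows of the pivot table |
| columns | Key(s) that form the columns of the pivot table |
| aggfunc | Aggregation (statistical) function applied to each cell; defaults to 'mean' |
| fill_value | Value used to replace missing (NaN) cells in the result |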
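To see how these parameters interact, here is a minimal sketch on a small hypothetical DataFrame (the column names `cls`, `sex` and `survived` are made up for illustration, not taken from the Titanic data). First class has no male rows in this toy data, so that cell would be NaN; `fill_value=0` replaces it.

```python
import pandas as pd

# Hypothetical toy data: cabin class, sex and a 0/1 survival flag
df = pd.DataFrame({
    'cls':      [1, 1, 2, 2, 3, 3, 3],
    'sex':      ['f', 'f', 'f', 'm', 'f', 'm', 'm'],
    'survived': [1, 1, 1, 0, 0, 0, 1],
})

# values: what to aggregate; index/columns: row and column keys;
# aggfunc: how each cell is aggregated; fill_value: replaces empty (NaN) cells
table = df.pivot_table(values='survived', index='cls', columns='sex',
                       aggfunc='mean', fill_value=0)
print(table)
```

Because `survived` is 0/1, the mean of each cell is a survival rate, which is the same idea used with the Titanic data below.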
It is comparable to the PivotTable feature in Excel.
Let's use the familiar Titanic dataset. Suppose we want to know how age and cabin class affect survival.
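```python
import numpy as np
import pandas as pd

# Titanic passenger data; the file name is a placeholder for your own copy
data = pd.read_csv('titanic.csv')

# Group ages into 10-year bins: (0, 10], (10, 20], ..., (70, 80]
bins = np.arange(0, 90, 10)
age_groups = pd.cut(data['Age'], bins)

# Mean of the 0/1 Survived column = survival rate per class and age group
new_df = data.pivot_table(values='Survived', index='Pclass',
                          columns=age_groups, aggfunc='mean', observed=False)
new_df
```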
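The binning step deserves a quick check. `np.arange(0, 90, 10)` gives the edges 0, 10, ..., 80 (90 itself is excluded), so `pd.cut` builds eight intervals that are closed on the right by default: (0, 10], (10, 20], ..., (70, 80]. An age of exactly 10 therefore falls in (0, 10], and a missing age, or one outside (0, 80], becomes NaN and is left out of the pivot table. A sketch on a few hypothetical ages:

```python
import numpy as np
import pandas as pd

bins = np.arange(0, 90, 10)                      # edges 0, 10, ..., 80 -> 8 intervals
ages = pd.Series([4, 10, 10.5, 35, 80, np.nan])  # hypothetical ages, one missing
age_groups = pd.cut(ages, bins)                  # right-closed: (0, 10], (10, 20], ...

print(bins.tolist())
print(age_groups)
```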
In this table, the survival rate of first-class passengers aged 0 to 10 is 0.755, the survival rate of third-class passengers aged 30 to 40 is 0.253, and the other cells read the same way. It is easy to see that in almost every age group, first class has the highest survival rate.
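The "highest survival rate in first class" observation can also be checked programmatically: `DataFrame.idxmax()` returns, for each column (age group), the index label (cabin class) of the largest value. The sketch below uses a small table of made-up rates shaped like the pivot table, not the real Titanic output.

```python
import pandas as pd

# Made-up survival rates (NOT the Titanic results):
# rows = Pclass, columns = age groups, as in the pivot table
new_df = pd.DataFrame(
    {'(0, 10]':  [0.8, 0.9, 0.4],
     '(10, 20]': [0.7, 0.4, 0.3],
     '(20, 30]': [0.6, 0.4, 0.2]},
    index=pd.Index([1, 2, 3], name='Pclass'),
)

best_class = new_df.idxmax()             # class with the highest rate per age group
share_first = (best_class == 1).mean()   # fraction of age groups led by first class
print(best_class)
print(share_first)
```

Running the same two lines on the real `new_df` gives the per-age-group leader directly instead of reading it off the table by eye.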
Let's visualize the result:
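```python
import matplotlib.pyplot as plt

# One bar chart per cabin class, showing survival rate by age group
for i in [1, 2, 3]:
    plt.figure(figsize=(8, 8))
    new_df.loc[i].plot(kind='bar', title='Pclass ' + str(i) + ' survival rate')
    plt.xlabel('age group')
    plt.ylabel('survival rate')
plt.show()
```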
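The loop draws three separate figures. If a side-by-side comparison of the classes is more useful, one alternative is to transpose the table so that age groups become the rows; `DataFrame.plot(kind='bar')` then draws one group of bars per age group with one bar per class. This sketch uses a small table of made-up rates (not the real results) and the non-interactive Agg backend so it also runs headless.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rates shaped like new_df: rows = Pclass, columns = age groups
new_df = pd.DataFrame(
    {'(0, 10]':  [0.8, 0.9, 0.4],
     '(10, 20]': [0.7, 0.4, 0.3],
     '(20, 30]': [0.6, 0.4, 0.2]},
    index=pd.Index([1, 2, 3], name='Pclass'),
)

# Transpose: one group of bars per age group, one bar (and legend entry) per class
ax = new_df.T.plot(kind='bar', figsize=(10, 6), rot=0,
                   title='Survival rate by age group and cabin class')
ax.set_xlabel('age group')
ax.set_ylabel('survival rate')
plt.tight_layout()
```

When running interactively, call `plt.show()` at the end to display the chart.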
Keep cross analysis to at most two dimensions. The more variables you cross, the finer the split becomes, and the harder it is to spot the key points, problems, and patterns.
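One practical way to see this is to count how many rows fall into each cell before reading the rates: every extra crossed variable multiplies the number of cells and shrinks the sample behind each rate. A sketch on hypothetical rows (the columns `Pclass`, `Sex`, `AgeGroup` and `Survived` and their values are made up for the example), using `aggfunc='count'`:

```python
import pandas as pd

# Hypothetical passengers (not Titanic data)
df = pd.DataFrame({
    'Pclass':   [1, 1, 1, 2, 2, 3, 3, 3],
    'Sex':      ['f', 'm', 'm', 'f', 'm', 'f', 'm', 'm'],
    'AgeGroup': ['child', 'adult', 'adult', 'adult', 'child', 'adult', 'adult', 'child'],
    'Survived': [1, 0, 1, 1, 0, 0, 0, 1],
})

# Two dimensions: class x sex
two_d = df.pivot_table(values='Survived', index='Pclass', columns='Sex',
                       aggfunc='count', fill_value=0)
# Three dimensions: class x (sex, age group) -> more cells, fewer rows per cell
three_d = df.pivot_table(values='Survived', index='Pclass',
                         columns=['Sex', 'AgeGroup'], aggfunc='count', fill_value=0)

empty_2d = int((two_d == 0).sum().sum())
empty_3d = int((three_d == 0).sum().sum())
print('2-D cells:', two_d.size, 'empty:', empty_2d)
print('3-D cells:', three_d.size, 'empty:', empty_3d)
```

With the same eight rows, the two-dimensional table has 6 cells and none are empty, while the three-dimensional one has 12 cells and 5 are empty, so many of its rates would rest on one passenger or none.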