According to survey results, eight of the ten most commonly used data tools come from or use Python. Python is widely used across data science, including data analysis, machine learning, deep learning and data visualization. But do you know how to use Python for data analysis? What do you need to learn? Let's talk about it.
Python has many libraries for data analysis, such as NumPy, pandas, Matplotlib and SciPy. Data analysis covers data import and export, data filtering, data description, data processing, statistical analysis, visualization and so on. Let's look at how to carry out a data analysis with Python.
Generate a data table
There are two common ways to generate a data table: import external data, or write the data directly. Python can import data from many file types, such as CSV and Excel. Before importing any data you need to import the pandas library; for convenience, we also import NumPy. The basic call is the simplest form, and the import functions accept many optional parameters, such as column names, the index column and data types.
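A minimal sketch of both approaches, assuming pandas and NumPy are installed. The column names and values are made up for illustration, and an in-memory CSV string (via `io.StringIO`) stands in for a real file so the example runs on its own.

```python
import io

import numpy as np
import pandas as pd

# Way 1: import external data. An in-memory CSV stands in for a real file here;
# for a file on disk you would pass its path, e.g. pd.read_csv("data.csv").
csv_text = """id,date,city,age,price
1001,2013-01-02,Beijing,23,1200
1002,2013-01-03,SH,44,
1003,2013-01-04,guangzhou,54,2133
"""
df_csv = pd.read_csv(
    io.StringIO(csv_text),
    index_col="id",          # use the id column as the row index
    parse_dates=["date"],    # parse this column as datetimes
)

# Way 2: write the data directly as a dict of columns (hypothetical sample data).
df = pd.DataFrame(
    {
        "id": [1001, 1002, 1003, 1004],
        "date": pd.date_range("2013-01-02", periods=4),
        "city": ["Beijing", "SH", "guangzhou", "Shenzhen"],
        "age": [23, 44, 54, 32],
        "price": [1200, np.nan, 2133, 5433],   # np.nan marks a missing value
    }
)

print(df_csv)
print(df.head())
```

The empty price field in the CSV is read as `NaN`, which the next sections use when checking and cleaning null values.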
Check the data table
In pandas, the shape attribute gives the dimensions of a data table, that is, the number of rows and columns. The info() method shows an overview of the table, and the dtypes attribute returns the data type of each column. isnull() tests for null values: it can check the whole table or a single column, and it returns Boolean values, True where a value is null and False otherwise. unique() returns the distinct values of a column, and the values attribute returns the data held in the table.
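The checks above can be sketched on a small table. The data is hypothetical; note that `shape`, `dtypes` and `values` are attributes (no parentheses), while `info()`, `isnull()` and `unique()` are method calls.

```python
import numpy as np
import pandas as pd

# Hypothetical sample table with one missing price.
df = pd.DataFrame(
    {
        "city": ["Beijing", "SH", "guangzhou", "SH"],
        "age": [23, 44, 54, 32],
        "price": [1200, np.nan, 2133, 5433],
    }
)

print(df.shape)               # (rows, columns)
df.info()                     # column names, non-null counts, dtypes, memory use
print(df.dtypes)              # data type of each column
print(df.isnull())            # True/False for every cell of the table
print(df["price"].isnull())   # null check on a single column
print(df["city"].unique())    # distinct values of one column
print(df.values)              # the data as a NumPy array
```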
Data table cleaning
Python handles null values flexibly: dropna() deletes the rows of a table that contain null values, and fillna() fills the null values in. dtype shows a column's data type, and its counterpart astype() changes it. rename() changes column names, drop_duplicates() removes duplicate values, and replace() replaces data.
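A minimal cleaning sketch on hypothetical data. Filling with the column mean and trimming spaces with `str.strip()` are common choices added here for illustration; the new column name `unit_price` is likewise just an example.

```python
import numpy as np
import pandas as pd

# Hypothetical messy table: a missing price, stray spaces, a duplicated last row.
df = pd.DataFrame(
    {
        "city": ["Beijing ", "SH", " guangzhou ", "SH", "SH"],
        "age": [23, 44, 54, 32, 32],
        "price": [1200, np.nan, 2133, 5433, 5433],
    }
)

# Null values: either drop the affected rows or fill them in.
dropped = df.dropna(how="any")                         # removes the row whose price is NaN
filled = df.fillna(value=0)                            # every NaN becomes 0
df["price"] = df["price"].fillna(df["price"].mean())   # or fill with the column mean

# Formats and names.
df["city"] = df["city"].str.strip()                    # trim stray spaces in strings
df["age"] = df["age"].astype("int32")                  # change a column's dtype
df = df.rename(columns={"price": "unit_price"})        # rename a column

# Duplicates and replacements.
df = df.drop_duplicates()                              # last row repeats the one before it
df["city"] = df["city"].replace("SH", "shanghai")      # replace one value with another

print(df)
```

Which null-handling strategy fits depends on the data: dropping rows loses information, while filling with 0 or a mean changes the statistics computed later.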
Data preprocessing
Data preprocessing organizes the cleaned data for later statistics and analysis. It mainly includes merging tables, sorting, splitting columns, and grouping and marking data. In Python, merge() joins two data tables; the default join type is inner, and left, right and outer joins are also available. sort_values() and sort_index() handle sorting, where() handles grouping and marking, and split() splits one column into several.
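A sketch of these preprocessing steps on two hypothetical tables. For grouping it uses NumPy's `np.where` (one form of the where function mentioned above) with an arbitrary price threshold of 3000, and for splitting it uses pandas' `str.split` on a made-up `category` column.

```python
import numpy as np
import pandas as pd

# Two hypothetical tables sharing an "id" column.
df = pd.DataFrame(
    {
        "id": [1001, 1002, 1003, 1004],
        "category": ["100-A", "100-B", "110-A", "110-C"],
        "price": [1200.0, 3300.0, 2133.0, 5433.0],
    }
)
df1 = pd.DataFrame(
    {
        "id": [1001, 1002, 1003, 1005],
        "gender": ["male", "female", "male", "female"],
    }
)

# Merge on the shared "id" column; how="inner" is the default.
inner = pd.merge(df, df1, how="inner")   # only ids present in both tables
left = pd.merge(df, df1, how="left")     # all rows of df, NaN where df1 has no match
outer = pd.merge(df, df1, how="outer")   # union of ids from both tables

# Sorting by a column's values, then back by the index.
by_price = df.sort_values(by="price", ascending=False)
by_index = by_price.sort_index()

# Grouping/marking: label each row by a (hypothetical) price threshold.
df["group"] = np.where(df["price"] > 3000, "high", "low")

# Splitting: break "100-A" into two new columns.
split = df["category"].str.split("-", n=1, expand=True)
split.columns = ["size", "grade"]
df = df.join(split)

print(df)
```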
Data extraction
Data extraction mainly uses three indexers: loc, iloc and ix. loc extracts by label, iloc by position, and ix by label and position at the same time. Note that ix was deprecated in pandas 0.20 and removed in pandas 1.0, so current code should use loc and iloc instead. Besides extracting data by label and position, you can also extract it by specific conditions, for example by combining loc with isin().
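A short sketch of label-based, position-based and condition-based extraction on a hypothetical table with string index labels. Since `ix` no longer exists in current pandas, the mixed case is shown by chaining `iloc` (positions) with column labels.

```python
import pandas as pd

# Hypothetical table with string labels so labels and positions differ.
df = pd.DataFrame(
    {
        "city": ["beijing", "shanghai", "guangzhou", "shenzhen"],
        "age": [23, 44, 54, 32],
        "price": [1200.0, 3300.0, 2133.0, 5433.0],
    },
    index=["a", "b", "c", "d"],
)

row_b = df.loc["b"]              # by label: the row labelled "b"
first_two = df.iloc[:2]          # by position: the first two rows
block = df.iloc[0:2, 1:3]        # by position: rows 0-1, columns 1-2 (age, price)
cell = df.loc["c", "price"]      # by row label and column label

# Instead of the removed ix: positions for rows, labels for columns.
mixed = df.iloc[:2][["city", "price"]]

# Condition-based extraction: rows whose city is in a given list.
mask = df["city"].isin(["beijing", "shenzhen"])
selected = df.loc[mask]

print(selected)
```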
Data filtering and summarizing
In Python, filtering is done with loc plus filter conditions; combined with sum() and count(), it can also reproduce Excel's SUMIF and COUNTIF functions. The main functions for summarizing are groupby and pivot_table. groupby groups data and summarizes each group, and it is easy to use: just specify the column to group by. You can also specify several column names at once, and groupby groups by them in the order they appear.
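A minimal filtering and summarizing sketch on hypothetical sales-style data. The SUMIF/COUNTIF equivalents are a boolean condition inside `loc` followed by `sum()` or `count()`; `pivot_table` is shown with `aggfunc="sum"` and `fill_value=0`, which are choices made for this example.

```python
import pandas as pd

# Hypothetical table.
df = pd.DataFrame(
    {
        "city": ["beijing", "shanghai", "beijing", "shenzhen", "shanghai"],
        "size": ["A", "B", "A", "C", "A"],
        "age": [23, 44, 54, 32, 34],
        "price": [1200.0, 3300.0, 2133.0, 5433.0, 4432.0],
    }
)

# Filtering: loc with conditions combined by & (and) or | (or).
cond = (df["age"] > 25) & (df["city"] == "beijing")
filtered = df.loc[cond, ["city", "age", "price"]]

# Equivalents of Excel's SUMIF and COUNTIF.
price_sum = df.loc[df["city"] == "shanghai", "price"].sum()
city_count = df.loc[df["city"] == "shanghai", "city"].count()

# groupby: one key, then several keys (grouped in the order they are listed).
by_city = df.groupby("city")["price"].sum()
by_city_size = df.groupby(["city", "size"])["price"].count()

# pivot_table: the same kind of summary laid out as a cross table.
pivot = pd.pivot_table(
    df,
    index=["city"],
    values=["price"],
    columns=["size"],
    aggfunc="sum",
    fill_value=0,     # combinations with no rows show 0 instead of NaN
)

print(filtered)
print(by_city)
print(pivot)
```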
Of course, there are many more operations in Python data analysis. If you want to make Python your career, it is better to take a dedicated course. When choosing one, consider the content, the learning environment, how the teaching is done and whether it is taught face to face, and try a class in person if you can. What suits you is best.