Kimball and Inmon are the two mainstream data warehouse methodologies, proposed by Ralph Kimball and Bill Inmon respectively. In actual data warehouse construction, the industry tends to combine the two development modes, borrowing the strengths of each.
Inmon and Kimball are both pioneers of the data warehouse field and spent many years on data warehouse research; Inmon is also known as the "father of the data warehouse". Inmon's "Building the Data Warehouse" and Kimball's "The Data Warehouse Toolkit" are both classics in this field. Their data warehouse ideas were later summarized as "Inmon theory" and "Kimball theory". The two share some common ground and also differ in important ways.
What is a data warehouse? The classic definition by Inmon, the father of the data warehouse, is as follows: a data warehouse is a subject-oriented, integrated, non-volatile, time-variant collection of data used to support management decision making.
1. Common ground
(1) Both strongly advocate the data warehouse and hold that one must be established between OLTP systems and BI analysis;
(2) Both hold that a data warehouse should be built from the perspective of the enterprise as a whole and developed iteratively, avoiding independent per-department data warehouses wherever possible;
(3) Before data enters the data warehouse, it must be integrated through ETL.
2. Differences
Inmon theory
(1) Data warehouse concept: a data warehouse is a subject-oriented, integrated, non-volatile, time-variant collection of data used to support management decision making;
(2) Build the data warehouse top-down by subject, for example establishing separate subjects for customers, suppliers, products, and so on. One subject is added at a time during development;
(3) When a data mart spans multiple subjects, it must be built on the integrated subject data.
Kimball theory
(1) Bottom-up, dimensional modeling;
(2) First build a fact table at the finest granularity along the main business line, then create the dimension tables to form a data mart; through "conformed dimensions", the information in different data marts can be viewed together.
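The role of a conformed dimension can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module, not a definitive implementation; the table names (dim_product, fact_sales, fact_returns) and the sample rows are assumptions made up for the example. Two data marts each have their own fact table, and because both reference the same product dimension, their measures can be reported side by side.

```python
import sqlite3

def build_marts(conn):
    # One shared (conformed) dimension and two fact tables, one per data mart.
    # All table and column names are hypothetical.
    conn.executescript("""
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
        CREATE TABLE fact_sales (product_key INTEGER, quantity INTEGER);
        CREATE TABLE fact_returns (product_key INTEGER, quantity INTEGER);
    """)
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(1, "widget"), (2, "gadget")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                     [(1, 10), (1, 5), (2, 7)])
    conn.executemany("INSERT INTO fact_returns VALUES (?, ?)",
                     [(1, 2)])

def drill_across(conn):
    # Aggregate each fact table separately, then join both results
    # to the shared dimension so the two marts line up by product.
    return conn.execute("""
        SELECT d.product_name,
               COALESCE(s.sold, 0),
               COALESCE(r.returned, 0)
        FROM dim_product AS d
        LEFT JOIN (SELECT product_key, SUM(quantity) AS sold
                   FROM fact_sales GROUP BY product_key) AS s
               ON s.product_key = d.product_key
        LEFT JOIN (SELECT product_key, SUM(quantity) AS returned
                   FROM fact_returns GROUP BY product_key) AS r
               ON r.product_key = d.product_key
        ORDER BY d.product_key
    """).fetchall()

conn = sqlite3.connect(":memory:")
build_marts(conn)
rows = drill_across(conn)
```

Each fact table is aggregated before the join so that rows from one mart do not multiply rows from the other.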
1. What is Kimball
1.1 Concept
In terms of process, the Kimball model is bottom-up: an agile development method that works from the data mart to the data warehouse to the data source (the data mart comes first, then the data warehouse). In the Kimball model, the data source is usually a given set of database tables; the data is stable, but the relationships among the data are complex. The analytical data structures need to be extracted from the transactional data structures produced by these OLTP systems and loaded into the data mart, ready for BI and decision support as the next step.
1.2 Process
Kimball is usually task-oriented. First, after obtaining the data, explore it and try to split it into the different tables that the target requires. Second, once the data dependencies are defined, each task is converted by ETL from the Stage layer to the DM layer; here the DM layer data consists of several fact tables and dimension tables. Next, once the DM layer has been split into fact tables and dimension tables, the data mart can on the one hand output data directly to BI, and on the other hand first output data to the DW layer to facilitate subsequent multidimensional analysis.
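The Stage-to-DM step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function split_stage_rows, the customer and amount fields, and the sample rows are all hypothetical, and a real ETL task would read from and write to database tables rather than in-memory lists.

```python
def split_stage_rows(stage_rows):
    """Split flat staged rows into one dimension table and one fact table."""
    surrogate_keys = {}  # natural key (customer name) -> surrogate key
    fact_rows = []
    for row in stage_rows:
        name = row["customer"]
        if name not in surrogate_keys:
            # Assign the next surrogate key the first time a customer appears.
            surrogate_keys[name] = len(surrogate_keys) + 1
        # The fact row keeps the measure and references the dimension by key.
        fact_rows.append({"customer_key": surrogate_keys[name],
                          "amount": row["amount"]})
    dim_rows = [{"customer_key": key, "customer": name}
                for name, key in surrogate_keys.items()]
    return dim_rows, fact_rows

# Hypothetical staged data, as it might arrive from an OLTP extract.
stage = [
    {"customer": "alice", "amount": 30.0},
    {"customer": "bob", "amount": 12.5},
    {"customer": "alice", "amount": 7.5},
]
dim_rows, fact_rows = split_stage_rows(stage)
```

The descriptive attribute (the customer name) ends up stored once in the dimension table, while the fact table holds one row per staged event.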
Kimball supports the data warehouse bus architecture and advocates dimensional modeling, building a dimensional data warehouse with a star schema or a snowflake schema. Architecturally, data marts are tightly integrated with the dimensional data warehouse: a data mart is a logical subject area within the data warehouse. Front-end tools access the dimensional data warehouse directly.
Kimball usually means fast delivery and agile iteration, and the data warehouse architecture is not designed to be overly complex. In the fast-changing Internet industry, this architecture has gradually become a mainstream paradigm.
2. What is Inmon
2.1 Concept
In terms of process, the Inmon model is top-down: a waterfall development method that works from the data source to the data warehouse and then to the data mart (the data warehouse comes first, then the data mart).
In the Inmon model, data sources are often heterogeneous. Custom crawler data is a typical example: the data source is tailored to the final goal. The main data processing work here is cleaning the heterogeneous data, including data type validation, value range checks, and other complex rules. In this scenario, data cannot be output directly from the stage layer to the dm layer; it must first be cleaned and formatted by ETL and placed in the dw layer, and then the required data combinations are selected from the dw layer and output to the dm layer.
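The kind of cleaning rule described here can be sketched as follows. This is a minimal illustration, not a definitive implementation: the field names (name, price), the accepted price range, and the functions clean_record and stage_to_dw are all assumptions chosen for the example.

```python
def clean_record(raw):
    """Return a cleaned record, or None if the record fails validation."""
    # Type validation: the price must be convertible to a number.
    try:
        price = float(raw["price"])
    except (KeyError, TypeError, ValueError):
        return None
    # Value range check (the bounds are an assumption for this sketch).
    if not 0 <= price <= 1_000_000:
        return None
    # The name must be a non-empty string once surrounding whitespace is removed.
    name = raw.get("name")
    if not isinstance(name, str) or not name.strip():
        return None
    return {"name": name.strip(), "price": price}

def stage_to_dw(stage_rows):
    """Keep only the staged rows that pass every cleaning rule."""
    cleaned = (clean_record(row) for row in stage_rows)
    return [row for row in cleaned if row is not None]

# Hypothetical heterogeneous input, such as crawler output.
stage = [
    {"name": " widget ", "price": "9.5"},
    {"name": "bad", "price": "n/a"},
    {"name": "negative", "price": -1},
    {"price": 3},
]
dw_rows = stage_to_dw(stage)
```

Rejected rows are simply dropped in this sketch; a production pipeline would more likely route them to a separate error table for inspection.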
The Inmon model does not emphasize the concepts of fact table and dimension table. Because the data source is more likely to change, more emphasis is placed on data cleaning and on extracting entity-relationship structures from the data.
2.2 Process
Inmon is usually data-source-oriented. First, explore the sources to obtain data that matches expectations as closely as possible, and try to divide the data into the different tables that are expected. Second, define the data cleaning rules, and each task is converted by ETL from the Stage layer to the DW layer; the DW layer usually involves more UDF development, abstracting the data into an entity-relationship model. Next, after data governance at the DW layer is complete, data can be exported to the data mart for basic data combination. Finally, data is exported from the data mart to the BI system to support specific business needs.
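The DW-to-DM step can be sketched as below, assuming the DW layer already holds cleaned, normalized entities. The entity names (customers, orders), their fields, and the function build_mart are hypothetical and serve only to illustrate selecting and combining DW data for a data mart.

```python
from collections import defaultdict

def build_mart(customers, orders):
    """Combine normalized DW entities into a per-customer summary for the DM layer."""
    totals = defaultdict(float)
    for order in orders:
        # Orders reference customers through a foreign key (customer_id).
        totals[order["customer_id"]] += order["amount"]
    # Customers with no orders get a total of 0.0 from the defaultdict.
    return [{"customer": c["name"], "total_amount": totals[c["customer_id"]]}
            for c in customers]

# Hypothetical DW-layer entities in entity-relationship form.
customers = [
    {"customer_id": 1, "name": "alice"},
    {"customer_id": 2, "name": "bob"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 30.0},
    {"order_id": 11, "customer_id": 1, "amount": 7.5},
]
mart_rows = build_mart(customers, orders)
```

The DW entities stay normalized and reusable; the mart is a derived combination that can be rebuilt whenever a BI requirement changes.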
3. Kimball vs. Inmon: comparison of advantages and disadvantages
The Kimball method does not require a high level of technical skill from the team and is easier to implement, starting from the construction of data marts for small subject areas. However, as construction proceeds step by step, the consistency of the combined dimensional data warehouse is hard to control. It suits planning at the tactical level, or cases with urgent goals to achieve.
The Inmon method is well standardized and addresses data integration and data consistency; it suits large, enterprise-level, strategic planning. But the requirements on the team are high, the implementation cycle is long, and the cost is high. The choice can be made by weighing enterprise scale, project planning, budget, and team capability.
Characteristic          Kimball             Inmon
Delivery time           Fast delivery       Long road to delivery
Development difficulty  Low                 High
Maintenance difficulty
Skill requirements      Entry level         Expert level
Data requirements       Specific business   Enterprise-wide
4. Feature comparison
Technology