<> 1. Tool introduction
<> Kettle: brief introduction
Kettle is a Java-based ETL tool with a graphical (GUI) design interface. Jobs are laid out as workflows, and it performs reliably across simple and complex data extraction, quality checking, data cleaning, data conversion, and data filtering.
<> Sqoop: brief introduction
Sqoop is open-source Apache software, used mainly to transfer data between Hadoop (Hive) and traditional databases (MySQL, PostgreSQL, …).
It is suited to bulk data transfer between a big data cluster and relational databases that the cluster can connect to directly.
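As a rough illustration of the transfer described above, the sketch below assembles a `sqoop import` command line that copies one relational table into HDFS. The helper name `build_sqoop_import`, the host, database, table, and paths are all made up for the example; only the Sqoop flags themselves (`--connect`, `--username`, `--password-file`, `--table`, `--target-dir`, `--num-mappers`) are real options. The code only builds and prints the command; it does not run Sqoop.

```python
import shlex


def build_sqoop_import(jdbc_url, username, password_file, table,
                       target_dir, num_mappers=4):
    """Return the argv list for importing one table into HDFS with Sqoop.

    All argument values are supplied by the caller; nothing here is
    specific to any particular cluster.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,             # JDBC URL of the source database
        "--username", username,
        "--password-file", password_file,  # file holding the DB password
        "--table", table,                  # source table to copy
        "--target-dir", target_dir,        # HDFS directory for the output
        "--num-mappers", str(num_mappers), # number of parallel map tasks
    ]


if __name__ == "__main__":
    # Hypothetical connection details, for illustration only.
    argv = build_sqoop_import(
        jdbc_url="jdbc:mysql://db.example.com:3306/sales",
        username="etl_user",
        password_file="/user/etl/.db_password",
        table="orders",
        target_dir="/data/raw/orders",
    )
    print(shlex.join(argv))
```

On a machine where Sqoop is installed, the printed line could be pasted into a shell, or the list passed to `subprocess.run`.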
<> 2. Comparison
| Function | Kettle | Sqoop |
| --- | --- | --- |
| Domain | Data extraction, transformation, and loading | Data migration between relational and non-relational databases |
| Input | Relational databases, HDFS, HBase, Excel, HL7, JSON, RSS, text files, and so on | Relational databases, non-relational databases |
| Output | Relational databases, HBase, HDFS, Excel, CSV, and so on | Relational databases, non-relational databases |
| Hadoop integration | External tool; the plug-in for the corresponding version must be installed; only popular Hadoop distributions are supported | Part of the Hadoop ecosystem; ready to use |
| Applicable data volume | Hundreds of thousands to millions of rows | Tens of millions of rows |
| Supported systems | Linux, Windows, Unix | Linux |
| Interaction | Graphical interface | No graphical interface |
| Underlying mechanism | Multithreading to improve efficiency | MapReduce |
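The MapReduce entry in the comparison can be made more concrete. When Sqoop imports a table, it looks up the lowest and highest values of a splitting column (the primary key by default, or the column named with `--split-by`) and gives each map task an evenly sized slice of that range. The sketch below is a simplified illustration of that idea for an integer key, not Sqoop's actual implementation; the function name `split_ranges` is invented for the example.

```python
def split_ranges(min_id, max_id, num_mappers):
    """Divide the inclusive key range [min_id, max_id] into contiguous,
    near-equal slices, one per map task."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if max_id < min_id:
        raise ValueError("max_id must not be smaller than min_id")

    total = max_id - min_id + 1
    # Never create more slices than there are key values.
    num_chunks = min(num_mappers, total)
    base, extra = divmod(total, num_chunks)

    ranges = []
    start = min_id
    for i in range(num_chunks):
        # The first `extra` slices take one additional key each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


if __name__ == "__main__":
    for low, high in split_ranges(1, 1000, 4):
        # Each slice corresponds to one bounded query run by one mapper.
        print(f"WHERE id >= {low} AND id <= {high}")
```

Each slice is read by a separate task, which is why a key with evenly spread values parallelizes better than a skewed one.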
<> 3. Summary
If the requirement is only to migrate relational database data (Oracle, MySQL) into non-relational storage (HDFS, HBase, Hive), Sqoop is recommended and is enough to meet the need.
If the task is to consolidate relational databases of different kinds (Oracle, MySQL, SQL Server) into a single relational database such as MySQL, Kettle is recommended; its GUI makes it easy to operate.
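The two recommendations above can be written down as a small rule of thumb. The sketch below is only an illustration of that reasoning: the function name `recommend_tool` and the two system sets are assumptions made for the example, and any case the summary does not cover is reported as undecided rather than guessed.

```python
# Systems named in this article, grouped the way the summary groups them.
RELATIONAL = {"oracle", "mysql", "sql server", "postgresql"}
HADOOP_SIDE = {"hdfs", "hbase", "hive"}


def recommend_tool(sources, target):
    """Return "Sqoop", "Kettle", or "undecided" for a migration from the
    given source systems to the given target system."""
    srcs = {s.lower() for s in sources}
    tgt = target.lower()
    if not srcs:
        raise ValueError("at least one source system is required")

    if srcs <= RELATIONAL and tgt in HADOOP_SIDE:
        # Relational data going into HDFS, HBase, or Hive.
        return "Sqoop"
    if srcs <= RELATIONAL and tgt in RELATIONAL:
        # Different relational databases consolidated into one.
        return "Kettle"
    return "undecided"


if __name__ == "__main__":
    print(recommend_tool(["Oracle", "MySQL"], "Hive"))
    print(recommend_tool(["Oracle", "MySQL", "SQL Server"], "MySQL"))
```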