{"id":1404,"date":"2016-12-08T21:11:37","date_gmt":"2016-12-08T21:11:37","guid":{"rendered":"http:\/\/62.131.51.129\/?p=1404"},"modified":"2016-12-08T21:11:37","modified_gmt":"2016-12-08T21:11:37","slug":"create-a-hive-table","status":"publish","type":"post","link":"http:\/\/archief.van-maanen.com\/?p=1404","title":{"rendered":"Create a Hive table &#8211; 3 ways"},"content":{"rendered":"<p>In this little note, I want to show three\u00a0different ways to create a table on Hive. The first one starts with a file on HDFS that is available and we create a table upon this file. This table is defined as an external file that is exposed as a table.<\/p>\n<p>The code to be executed in Hive is:<\/p>\n<pre>CREATE EXTERNAL TABLE fam\n(nummer SMALLINT,\nname STRING)\nROW FORMAT DELIMITED\nFIELDS TERMINATED BY ','\nLOCATION '\/user\/hdfs'\n<\/pre>\n<p>And we may say the content of the table from, say SQL Developer:<\/p>\n<p>&nbsp;<\/p>\n<p><a href=\"http:\/\/62.131.51.129\/wp-content\/uploads\/2016\/12\/Untitled.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1408\" src=\"http:\/\/62.131.51.129\/wp-content\/uploads\/2016\/12\/Untitled.png\" alt=\"\" width=\"493\" height=\"331\" \/><\/a><\/p>\n<p>From the clause &#8220;LOCATION &#8216;\/user\/hdfs'&#8221;, we know that files that will deliver the data will be stored in directory \/user\/hdfs. If we want to add data to the table, we only have to copy files to that location. Let us start with two files:<\/p>\n<p><a href=\"http:\/\/62.131.51.129\/wp-content\/uploads\/2016\/12\/Untitled-4.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1416\" src=\"http:\/\/62.131.51.129\/wp-content\/uploads\/2016\/12\/Untitled-4.png\" alt=\"\" width=\"746\" height=\"92\" \/><\/a><\/p>\n<p>We then copy an additional file to this environment ( hadoop dfs -put \/tmp\/fam3 \/user\/hdfs ). This shows:<\/p>\n<p><a href=\"http:\/\/62.131.51.129\/wp-content\/uploads\/2016\/12\/Untitled-5.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1417\" src=\"http:\/\/62.131.51.129\/wp-content\/uploads\/2016\/12\/Untitled-5.png\" alt=\"\" width=\"673\" height=\"112\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>When we then start the Hive environment: ( \u00a0beeline -u jdbc:hive2:\/\/192.168.2.60:10000 -n training -p training ), we may issue the select command: select * from fam; . This provides the content of the fam table. One may see the content of the newly added fam3 table. This shows clearly how easy it is to add data to a Hadoop platform. The only thing we have to do is to add files to the correct environment. These files will be automatically recognised as part of a table.<\/p>\n<p>An alternative to adding data to a table is to move data within the HDFS environment. Suppose we have a file on HDFS that contains data that should be added to the table. First, we start Hive:<\/p>\n<pre>beeline -u jdbc:hive2:\/\/192.168.2.60:10000 -n training -p training\n<\/pre>\n<p>I then moved data into the table:<\/p>\n<pre>LOAD DATA INPATH '\/user\/training\/fam4' INTO TABLE fam;\n<\/pre>\n<p>I then saw the file being added to the data environment:<br \/>\n<a href=\"http:\/\/62.131.51.129\/wp-content\/uploads\/2016\/12\/Untitled-6.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1420\" src=\"http:\/\/62.131.51.129\/wp-content\/uploads\/2016\/12\/Untitled-6.png\" alt=\"\" width=\"761\" height=\"334\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>Another way is a direct import from, say Oracle, which creates on the fly a file on HDFS and which adds the table definition to Hive. The code is:<\/p>\n<pre>sqoop import \\\n--connect \"jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=192.168.2.2)(port=1521))(connect_data=(service_name=orcl)))\" \\\n--username scott --password binvegni \\\n--table fam \\\n--columns \"NUMMER, NAAM\" \\\n--m 1 \\\n--fields-terminated-by ',' \\\n--table device \\\n--hive-import\n<\/pre>\n<p>Which creates a file on hdfs:<\/p>\n<p><a href=\"http:\/\/62.131.51.129\/wp-content\/uploads\/2016\/12\/Untitled-1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1409\" src=\"http:\/\/62.131.51.129\/wp-content\/uploads\/2016\/12\/Untitled-1.png\" alt=\"\" width=\"1020\" height=\"575\" \/><\/a><\/p>\n<p>and a table in Hive that can be seen from, say Excel:<\/p>\n<p><a href=\"http:\/\/62.131.51.129\/wp-content\/uploads\/2016\/12\/Untitled-2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1410\" src=\"http:\/\/62.131.51.129\/wp-content\/uploads\/2016\/12\/Untitled-2.png\" alt=\"\" width=\"186\" height=\"280\" \/><\/a><\/p>\n<p>A slightly different script is this. Load data from Oracle into a file on hdfs:<\/p>\n<pre>\nsqoop import \\\n--connect \"jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=192.168.2.2)(port=1521))(connect_data=(service_name=orcl)))\" \\\n--table COUNTRIES \\\n--columns \"NAME, LOCAL\"  \\\n--num-mappers 1 \\\n--compression-codec=snappy \\\n--username ANONYMOUS \\\n--password ine13758 \\\n--warehouse-dir=\/user\/hive\/warehouse \n<\/pre>\n<p>with a subsequent code in hive:<\/p>\n<pre>\nCREATE EXTERNAL TABLE COUNTRIES ( \n    NAME STRING, \n    LOCAL STRING) \nROW FORMAT DELIMITED FIELDS TERMINATED BY ',' \nLOCATION '\/user\/hive\/warehouse\/COUNTRIES';\n<\/pre>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>It is good to realise that we must undertake two actions:<\/p>\n<p>1:An action to store the data (in a file, that is stored in a HDFS directory.<\/p>\n<p>2: An action to create table definitions that are stored in a metastore.<\/p>\n<p><a href=\"http:\/\/62.131.51.129\/wp-content\/uploads\/2016\/12\/Untitled-3.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1414\" src=\"http:\/\/62.131.51.129\/wp-content\/uploads\/2016\/12\/Untitled-3.png\" alt=\"\" width=\"979\" height=\"447\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>A third manner is to use the HCatalog:<\/p>\n<pre>\nhcat -e \"CREATE TABLE prut \\\n(id INT, company STRING) \\\nROW FORMAT DELIMITED FIELDS TERMINATED BY ',' \\\nLOCATION '\/user\/prut'\"\n<\/pre>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this little note, I want to show three\u00a0different ways to create a table on Hive. The first one starts with a file on HDFS that is available and we create a table upon this file. This table is defined as an external file that is exposed as a table. The code to be executed [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1405,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-1404","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/posts\/1404","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1404"}],"version-history":[{"count":0,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/posts\/1404\/revisions"}],"wp:attachment":[{"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1404"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1404"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1404"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}