{"id":1452,"date":"2016-12-18T13:13:17","date_gmt":"2016-12-18T13:13:17","guid":{"rendered":"http:\/\/62.131.51.129\/?p=1452"},"modified":"2016-12-18T13:13:17","modified_gmt":"2016-12-18T13:13:17","slug":"partitioned-table-in-hive","status":"publish","type":"post","link":"http:\/\/archief.van-maanen.com\/?p=1452","title":{"rendered":"Partitioned Table in Hive"},"content":{"rendered":"<p>It is possible to partition the tables in Hive. Remember the data are stored in files. So we expect the files to be partitioned. This is accomplished by a split of the files over different directories. One directory serves one partition, a second another partition etc.<br \/>\nLet us take the example of 7 records that are split over 5 partitions according to their number. We have:<br \/>\n<a href=\"http:\/\/62.131.51.129\/wp-content\/uploads\/2016\/12\/Untitled.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1454\" src=\"http:\/\/62.131.51.129\/wp-content\/uploads\/2016\/12\/Untitled.jpg\" alt=\"\" width=\"500\" height=\"300\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>We see in the directory that corresponds to the file location, that 5 directories are present, each directory corresponding to one partition.\u00a0<a href=\"http:\/\/62.131.51.129\/wp-content\/uploads\/2016\/12\/Untitled0.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1455\" src=\"http:\/\/62.131.51.129\/wp-content\/uploads\/2016\/12\/Untitled0.png\" alt=\"\" width=\"529\" height=\"446\" \/><\/a><\/p>\n<p>Let us switch to\u00a0partition 2 that stores (2, ine), (2,ellen) and (2, henkjan). When the directory with the data on partition 2 is investigated, we see a file with the data within that directory:<\/p>\n<p><a href=\"http:\/\/62.131.51.129\/wp-content\/uploads\/2016\/12\/Untitled-9.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1456\" src=\"http:\/\/62.131.51.129\/wp-content\/uploads\/2016\/12\/Untitled-9.png\" alt=\"\" width=\"1213\" height=\"298\" \/><\/a><\/p>\n<p>With the data, the nummer that is used to create the partitions is omitted. The nummer is stored in the directory name.<\/p>\n<p>To create this table, we use this command:<\/p>\n<pre>CREATE EXTERNAL TABLE fam_by_nummer(\nnaam STRING)\nPARTITIONED BY (nummer INT)\nROW FORMAT DELIMITED\nFIELDS TERMINATED BY ','\nLOCATION '\/loudacre\/fam_by_nummer';\n<\/pre>\n<p>To load the table, we take a detour.<br \/>\nFirst, the data are loaded to hdfs:<\/p>\n<pre>sqoop import \\\n--connect \"jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=192.168.2.2)(port=1521))(connect_data=(service_name=orcl)))\" \\\n--username scott --password bunvegni \\\n--table fam \\\n--columns \"NUMMER, NAAM\" \\\n--m 1 \\\n--target-dir \/loudacre\/fam_by_nummer_temp1;\n<\/pre>\n<p>Then a table is created:<\/p>\n<pre>CREATE EXTERNAL TABLE fam_by_nummer_temp(\nnummer INT, naam STRING)\nROW FORMAT DELIMITED\nFIELDS TERMINATED BY ','\nLOCATION '\/loudacre\/fam_by_nummer_temp1';\n<\/pre>\n<p>And the data are loaded from this table into the partitioned table:<\/p>\n<pre>insert into table fam_by_nummer\npartition (nummer)\nselect naam, nummer from fam_by_nummer_temp;\n<\/pre>\n<p>The exact sequence of the field wasn&#8217;t directly clear to me, but it could be derived from the Hue interface.<br \/>\nAdding data to an existing partition is easy.<br \/>\nFirst create a file and then upload it to the correct directory:<\/p>\n<pre>hadoop dfs -put \/tmp\/extra \/loudacre\/fam_by_nummer\/nummer=2\n<\/pre>\n<p>Adding a new partition requires action on the Hive side:<\/p>\n<pre>ALTER TABLE fam_by_nummer\nADD PARTITION (nummer=6);\n<\/pre>\n<p>This must be added by the creation of the directory in the proper environment:<\/p>\n<pre>hadoop dfs -mkdir \/loudacre\/fam_by_nummer\/nummer=6\n<\/pre>\n<p>And the storage of a file with data in that directory:<\/p>\n<pre>hadoop dfs -put \/tmp\/extra \/loudacre\/fam_by_nummer\/nummer=6\n<\/pre>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>It is possible to partition the tables in Hive. Remember the data are stored in files. So we expect the files to be partitioned. This is accomplished by a split of the files over different directories. One directory serves one partition, a second another partition etc. Let us take the example of 7 records that [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1453,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-1452","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/posts\/1452","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1452"}],"version-history":[{"count":0,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/posts\/1452\/revisions"}],"wp:attachment":[{"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1452"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1452"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1452"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}