{"id":1423,"date":"2016-12-13T08:48:49","date_gmt":"2016-12-13T08:48:49","guid":{"rendered":"http:\/\/62.131.51.129\/?p=1423"},"modified":"2016-12-13T08:48:49","modified_gmt":"2016-12-13T08:48:49","slug":"avro-format","status":"publish","type":"post","link":"http:\/\/archief.van-maanen.com\/?p=1423","title":{"rendered":"Avro format"},"content":{"rendered":"<p>In Hive, we see a situation where a table definition is stored in a metastore. This table definition is linked to a directory where the data are stored. It is possible to use different formats here. One may think of a text format. But other formats are possible too. One example is the avro format. This format can be characterised by a combination of file definition and the actual data.<br \/>\nLet us see how these data can be imported.<br \/>\nA possibility is to use the scoop command to import data in avro format:<\/p>\n<pre>sqoop import \\\n--connect \"jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=192.168.2.2)(port=1521))(connect_data=(service_name=orcl)))\" \\\n--username scott --password binvegni \\\n--table fam \\\n--columns \"NUMMER, NAAM\" \\\n--m 1 \\\n--target-dir \/loudacre\/fam_avro \\\n--null-non-string '\\\\N' \\\n--as-avrodatafile\n<\/pre>\n<p>The result is composed of two parts:<br \/>\n1: The directory where the table content is stored. This can be found on HDFS in directory in \/loudacre\/fam_avro. This can be verified with &#8220;hadoop dfs -ls \/loudacre\/fam_avro&#8221;. This provides an overview of the files where the data are stored. When this is opened, one sees:<br \/>\n<a href=\"http:\/\/62.131.51.129\/wp-content\/uploads\/2016\/12\/Untitled-7.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1429\" src=\"http:\/\/62.131.51.129\/wp-content\/uploads\/2016\/12\/Untitled-7.png\" alt=\"\" width=\"785\" height=\"323\" \/><\/a><\/p>\n<p>So, we have the data. The data definition is stored in the same directory as the directory where the scoop command was launched. To get this on HDFS platform, we issue &#8221; hadoop dfs -put sqoop_import_fam.avsc \/loudacre\/&#8221;. This stored the definition on HDFS level. This can be investigated:<\/p>\n<p><a href=\"http:\/\/62.131.51.129\/wp-content\/uploads\/2016\/12\/Untitled-8.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1430\" src=\"http:\/\/62.131.51.129\/wp-content\/uploads\/2016\/12\/Untitled-8.png\" alt=\"\" width=\"481\" height=\"477\" \/><\/a><\/p>\n<p>If we have the data (in directory \/loudacre\/fam_avro) and the definition (in file \/loudacre\/sqoop_import_fam.avsc), we may proceed by a definition in hive. Then hive is started by: &#8220;beeline -u jdbc:hive2:\/\/192.168.2.60:10000 -n training -p training&#8221;, followed by:<\/p>\n<pre>\nCREATE EXTERNAL TABLE fam_avro\nSTORED AS AVRO\nLOCATION '\/loudacre\/fam_avro'\nTBLPROPERTIES ('avro.schema.url'=\n'hdfs:\/loudacre\/sqoop_import_fam.avsc');\n<\/pre>\n<p>A check with &#8220;select * from fam_avro&#8221; showed the data as expected.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In Hive, we see a situation where a table definition is stored in a metastore. This table definition is linked to a directory where the data are stored. It is possible to use different formats here. One may think of a text format. But other formats are possible too. One example is the avro format. [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1425,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-1423","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/posts\/1423","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1423"}],"version-history":[{"count":0,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/posts\/1423\/revisions"}],"wp:attachment":[{"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1423"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1423"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1423"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}