{"id":1905,"date":"2018-05-23T12:51:28","date_gmt":"2018-05-23T10:51:28","guid":{"rendered":"http:\/\/van-maanen.com\/?p=1905"},"modified":"2018-05-23T12:51:28","modified_gmt":"2018-05-23T10:51:28","slug":"write-an-avro-file","status":"publish","type":"post","link":"http:\/\/archief.van-maanen.com\/?p=1905","title":{"rendered":"Write an AVRO file"},"content":{"rendered":"<p>Below, I provide some Python code to write an AVRO file. An AVRO file consists of a schema and a set of records. The records are written in binary format. The schema used here is as follows:<\/p>\n<pre>\n{\"type\": \"record\",\n \"name\": \"StringPair\",\n \"doc\": \"A pair of strings.\",\n \"fields\": [\n     {\"name\": \"left\", \"type\": \"string\"},\n     {\"name\": \"right\", \"type\": \"string\"}]}\n<\/pre>\n<p>The code to write such a file is as follows:<\/p>\n<pre>\nimport sys\nfrom avro import schema\nfrom avro import io\nfrom avro import datafile\n# Location of the schema file (.avsc) that holds the JSON shown above.\nSCHEMA_PATH = 'C:\\\\Users\\\\tmaanen\\\\.spyder-py3\\\\tom.avsc'\nif __name__ == '__main__':\n    if len(sys.argv) != 2:\n        sys.exit('Usage: %s &lt;data_file&gt;' % sys.argv[0])\n    avro_file = sys.argv[1]\n    # schema.Parse is the avro-python3 spelling; newer avro releases name it schema.parse.\n    with open(SCHEMA_PATH, 'r') as schema_file:\n        schema_object = schema.Parse(schema_file.read())\n    writer = open(avro_file, 'wb')\n    dfw = datafile.DataFileWriter(writer, io.DatumWriter(), schema_object)\n    # Each line on standard input holds one pair: left,right\n    for line in sys.stdin:\n        left, right = line.rstrip('\\r\\n').split(',', 1)\n        dfw.append({'left': left, 'right': right})\n    # close() flushes the remaining records and closes the underlying file.\n    dfw.close()\n<\/pre>\n<p>The script reads comma-separated pairs from standard input, one pair per line, and can be run on the command line as C:\\ProgramData\\Anaconda3\\python.exe C:\\Users\\tmaanen\\.spyder-py3\\TomHdfs.py C:\\Users\\tmaanen\\.spyder-py3\\a.avro<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Below, I provide some Python code to write an AVRO file. An AVRO file consists of a schema and a set of records. The records are written in binary format. 
The schema is as follows: {&#8220;type&#8221;: &#8220;record&#8221;, &#8220;name&#8221;: &#8220;StringPair&#8221;, &#8220;doc&#8221;: &#8220;A pair of strings.&#8221;, &#8220;fields&#8221;: [ {&#8220;name&#8221;: &#8220;left&#8221;, &#8220;type&#8221;: &#8220;string&#8221;}, {&#8220;name&#8221;: &#8220;right&#8221;, &#8220;type&#8221;: &#8220;string&#8221;}]} The [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"class_list":["post-1905","post","type-post","status-publish","format-standard","hentry","category-allgemein"],"_links":{"self":[{"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/posts\/1905","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1905"}],"version-history":[{"count":0,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/posts\/1905\/revisions"}],"wp:attachment":[{"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1905"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1905"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1905"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}