{"id":969,"date":"2015-08-16T20:41:47","date_gmt":"2015-08-16T20:41:47","guid":{"rendered":"http:\/\/www.van-maanen.com\/?p=969"},"modified":"2015-08-16T20:41:47","modified_gmt":"2015-08-16T20:41:47","slug":"sending-data-via-avro","status":"publish","type":"post","link":"http:\/\/archief.van-maanen.com\/?p=969","title":{"rendered":"Sending data via AVRO"},"content":{"rendered":"<p>I got a better understanding of AVRO when I used it to write data from PHP and read them back in Java. It showed me how data written in one language can subsequently be read in another.<br \/>\nPHP writes the data to a file, and Java subsequently reads that file.<br \/>\nThe question then is: what is the advantage of using AVRO to write data to a file, compared with an ordinary CSV file or a more elaborate format such as XML?<br \/>\nLet us first write the data from PHP, <a href=\"http:\/\/www.van-maanen.com\/wp-content\/uploads\/2015\/08\/php\">using this script<\/a>.<br \/>\nWe have now written some data to a file from PHP. The nice thing is that the data are stored together with their description, the schema. Here the schema states that each record has two fields: a number and a name. The schema is written only once, in the header of the file, and the records follow it.<br \/>\nCompared with a plain CSV file, the difference is that the schema is stored in the file itself. A programme that reads the data can therefore verify that they are in the expected format.<br \/>\nOne may argue that this is similar to XML, but in XML the field names (the tags) are repeated with every data element. Hence XML files tend to be very large compared with CSV.
An AVRO file avoids this by storing the schema just once.<\/p>\n<p>The file can be read in Java as follows:<\/p>\n<pre>\npackage avro;\n\nimport java.io.File;\nimport java.io.IOException;\n\nimport org.apache.avro.Schema;\nimport org.apache.avro.Schema.Parser;\nimport org.apache.avro.file.DataFileReader;\nimport org.apache.avro.generic.GenericDatumReader;\nimport org.apache.avro.generic.GenericRecord;\nimport org.apache.avro.io.DatumReader;\n\npublic class GenericMain {\n\tpublic static void main(String[] args) throws IOException {\n\t\t\/\/ Reader schema: describes the records stored in data.avr\n\t\tSchema schema = new Parser().parse(new File(\"C:\/inetpub\/wwwroot\/user.avsc\"));\n\t\t\/\/ AVRO data file written by the PHP script\n\t\tFile file = new File(\"C:\/inetpub\/wwwroot\/data.avr\");\n\t\tDatumReader&lt;GenericRecord&gt; datumReader = new GenericDatumReader&lt;GenericRecord&gt;(schema);\n\t\t\/\/ try-with-resources closes the file once all records have been read\n\t\ttry (DataFileReader&lt;GenericRecord&gt; dataFileReader = new DataFileReader&lt;GenericRecord&gt;(file, datumReader)) {\n\t\t\tGenericRecord member = null;\n\t\t\twhile (dataFileReader.hasNext()) {\n\t\t\t\t\/\/ Reuse the record object by passing it to next(). This saves us from\n\t\t\t\t\/\/ allocating and garbage collecting many objects for files with\n\t\t\t\t\/\/ many items.\n\t\t\t\tmember = dataFileReader.next(member);\n\t\t\t\tSystem.out.println(member);\n\t\t\t}\n\t\t}\n\t}\n}\n\n<\/pre>\n<p>This demonstrates that the data can be read in another language, in this case Java. The Java programme needs to know where the data are stored (data.avr); in this example it is also given the schema (user.avsc). After that, the file can be read and its records accessed. Strictly speaking, the schema file is optional: because the schema is stored in data.avr itself, a GenericDatumReader created without a schema uses the schema found in the file. Supplying user.avsc tells AVRO which schema the reader expects, and AVRO resolves the stored data against it.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I got a better understanding of AVRO when I used it to write data from PHP and read them back in Java. It showed me how data written in one language can subsequently be read in another.
PHP writes the data to a file, and Java subsequently [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":970,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5],"tags":[],"class_list":["post-969","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-nice-to-know"],"_links":{"self":[{"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/posts\/969","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=969"}],"version-history":[{"count":0,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/posts\/969\/revisions"}],"wp:attachment":[{"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=969"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=969"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=969"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}