{"id":1000,"date":"2015-09-03T19:48:07","date_gmt":"2015-09-03T19:48:07","guid":{"rendered":"http:\/\/www.van-maanen.com\/?p=1000"},"modified":"2015-09-03T19:48:07","modified_gmt":"2015-09-03T19:48:07","slug":"flume","status":"publish","type":"post","link":"http:\/\/archief.van-maanen.com\/?p=1000","title":{"rendered":"Flume"},"content":{"rendered":"<p>Flume can transfer messages directly into a file. It can even store such files on Hadoop. This opens a way to capture messages in a file that is stored on Hadoop, ready to be analysed. A typical example is a series of events from a log that are collected in a file. The file is then transferred to another platform (say Hadoop) to be processed further.<br \/>\nI got Flume working on a Cloudera sandbox. It looks as if three related parameters must be provided. The first parameter refers to a flume.conf file (found in \/etc\/flume-ng\/conf.empty\/flume.conf). The second parameter is the name of the agent, which can be found in the flume.conf file; in my case it is sandbox. Finally, a third parameter refers to a conf directory, whose settings are supplied via flume-env.sh.<\/p>\n<pre>\nflume-ng agent --conf-file \/etc\/flume-ng\/conf.empty\/flume.conf --name sandbox --conf \/opt\/examples\/flume\/conf\n<\/pre>\n<p>Another example is the following statement, which is more or less similar to the command given above:<\/p>\n<pre>\nsudo flume-ng agent -c \/etc\/gphd\/flume\/conf -f \/etc\/gphd\/flume\/conf\/flume.conf -n agent\n<\/pre>\n<p>This one uses a conf file that is stored at \/etc\/gphd\/flume\/conf\/flume.conf. The name of the agent is &#8220;agent&#8221;; the conf directory is \/etc\/gphd\/flume\/conf. From the conf file, we know that a netcat source is set up that listens on port 44444. 
We use this to open a terminal session that sends a stream from the local Linux platform:<\/p>\n<pre>\n[pivhdsne:~]$ nc localhost 44444\ntesting\nOK\n1\nOK\n3\nOK\n4\nOK\n<\/pre>\n<p>From the conf file, we know that these streams are stored on HDFS in the directory \/user\/flume. If we look there, we see this file:<\/p>\n<pre>\n[pivhdsne:~]$ hadoop dfs -cat \/user\/flume\/FlumeData.1442949290540\nDEPRECATED: Use of this script to execute hdfs command is deprecated.\nInstead use the hdfs command for it.\n\ntesting\n1\n3\n4\n[pivhdsne:~]$ \n<\/pre>\n<p>The file created on HDFS stores the data that was streamed from the Linux platform. This is one way to transfer data from Linux to HDFS. A final example is given below, with the same netcat listener:<\/p>\n<pre>\ncat test|nc localhost 44444\n<\/pre>\n<p>Here cat turns a file into a stream. The stream is piped to the nc process, which sends it to port 44444. Flume then catches the stream and stores it in HDFS files.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Flume can transfer messages directly into a file. It can even store such files on Hadoop. This opens a way to capture messages in a file that is stored on Hadoop, ready to be analysed. A typical example is a series of events from a log that are collected in a file. 
The file is then [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1001,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5],"tags":[],"class_list":["post-1000","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-nice-to-know"],"_links":{"self":[{"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/posts\/1000","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1000"}],"version-history":[{"count":0,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/posts\/1000\/revisions"}],"wp:attachment":[{"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1000"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1000"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1000"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}