{"id":1957,"date":"2018-09-08T22:56:45","date_gmt":"2018-09-08T20:56:45","guid":{"rendered":"http:\/\/van-maanen.com\/?p=1957"},"modified":"2018-09-08T22:56:45","modified_gmt":"2018-09-08T20:56:45","slug":"useful-scala-programme","status":"publish","type":"post","link":"http:\/\/archief.van-maanen.com\/?p=1957","title":{"rendered":"Useful Scala Programme"},"content":{"rendered":"<p>I saw a small Scala programme that calculates totals from subtotals. The input is a flat file in which each record holds a name and a subtotal. A given name can occur more than once, so it can have several subtotals. The task is to calculate the total per name. Let me provide some records from such a file:<\/p>\n<pre>\nABIGAIL,46\nABIGAIL,13\nABDULLAH,11\n<\/pre>\n<p>The programme then looks like this:<\/p>\n<pre>\n\/\/ Read the file (from HDFS on the Hadoop cluster) into an RDD with one element per line; sc is the SparkContext, e.g. as provided by spark-shell\nval testTom = sc.textFile(\"\/user\/training\/testTom.txt\")\n\/\/ Drop the header line (assumed to be the only line containing \"Comma\") and split each record at the comma\nval filteredRows = testTom.filter(line => !line.contains(\"Comma\")).map(line => line.split(\",\"))\n\/\/ Build (name, subtotal) pairs, sum the subtotals per name, sort by total (ascending) and print\nfilteredRows.map(n => (n(0), n(1).trim.toInt)).reduceByKey((a, b) => a + b).sortBy(_._2).collect().foreach(println)\n<\/pre>\n<p>The first statement reads the file from the Hadoop platform and stores it as a so-called RDD (Resilient Distributed Dataset), with one element per line.<br \/>\nThe second statement removes the header line (the filter drops every line that contains the text \"Comma\") and splits each remaining line at the commas into an array of fields.<br \/>\nIn the third statement, each record becomes a (name, subtotal) pair, with the first column as the key and the second column converted to an integer. <code>reduceByKey<\/code> then adds up the subtotals for each name. The totals are sorted in ascending order with <code>sortBy(_._2)<\/code>, collected to the driver and printed.<br \/>\nThe results look like this:<\/p>\n<pre>\n(JAYDEN,7807)\n(MATTHEW,7891)\n(MICHAEL,9187)\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>I saw a small Scala programme that calculates totals from subtotals. The input is a flat file in which each record holds a name and a subtotal. A given name can occur more than once, so it can have several subtotals. The task is to calculate the total per name. 
Let me provide some [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4],"tags":[],"class_list":["post-1957","post","type-post","status-publish","format-standard","hentry","category-data-warehousing"],"_links":{"self":[{"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/posts\/1957","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1957"}],"version-history":[{"count":0,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/posts\/1957\/revisions"}],"wp:attachment":[{"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1957"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1957"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1957"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}