{"id":1981,"date":"2018-10-07T21:51:19","date_gmt":"2018-10-07T19:51:19","guid":{"rendered":"http:\/\/van-maanen.com\/?p=1981"},"modified":"2018-10-07T21:51:19","modified_gmt":"2018-10-07T19:51:19","slug":"analysing-json-and-database-tables-in-spark","status":"publish","type":"post","link":"http:\/\/archief.van-maanen.com\/?p=1981","title":{"rendered":"Analysing JSON and database tables in Spark"},"content":{"rendered":"<p>In a <a href=\"http:\/\/van-maanen.com\/index.php\/2018\/09\/29\/processing-csv-files-in-spark\/\">previous note<\/a>, I showed how CSV files can be analysed. One may use the same technique to analyse JSON files or tables in a database. First, analysing JSON files can be analysed with code that looks like:<\/p>\n<pre>\nval jsonRDD = sc.wholeTextFiles(\"\/user\/tom\/baby_names.json\").map(x => x._2)\nval namesJson = sqlContext.read.json(jsonRDD)\nnamesJson.registerTempTable(\"names\")\nsqlContext.sql(\"select * from names\").collect.foreach(println)\n<\/pre>\n<p>Going to a table in a database, requires an additional jar, that allows to access a database. in case of MySQL, we may use mysql-connector-java-5.0.8-bin.jar, that must be stored in a directory next to other jars. We can then fire spark with spark-shell &#8211;jars lib\/mysql-connector-java-5.0.8-bin.jar.<br \/>\nThe code looks like:<\/p>\n<pre>\nval df1 = sqlContext.read.format(\"jdbc\").option(\"url\",\"jdbc:mysql:\/\/van-maanen.com\/wordpress\").option(\"driver\",\"com.mysql.jdbc.Driver\").option(\"dbtable\",\"wp_comments\").option(\"user\",\"tom\").option(\"password\",\"filpso\").load()\ndf1.registerTempTable(\"names\")\nsqlContext.sql(\"select * from names\").collect.foreach(println)\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>In a previous note, I showed how CSV files can be analysed. One may use the same technique to analyse JSON files or tables in a database. First, analysing JSON files can be analysed with code that looks like: val jsonRDD = sc.wholeTextFiles(&#8220;\/user\/tom\/baby_names.json&#8221;).map(x => x._2) val namesJson = sqlContext.read.json(jsonRDD) namesJson.registerTempTable(&#8220;names&#8221;) sqlContext.sql(&#8220;select * from names&#8221;).collect.foreach(println) Going [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"class_list":["post-1981","post","type-post","status-publish","format-standard","hentry","category-allgemein"],"_links":{"self":[{"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/posts\/1981","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1981"}],"version-history":[{"count":0,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/posts\/1981\/revisions"}],"wp:attachment":[{"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1981"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1981"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1981"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}