{"id":737,"date":"2014-12-18T10:31:29","date_gmt":"2014-12-18T10:31:29","guid":{"rendered":"http:\/\/tomvanmaanen.nl\/?p=737"},"modified":"2014-12-18T10:31:29","modified_gmt":"2014-12-18T10:31:29","slug":"a-nice-utility-to-investigate-files-on-unix","status":"publish","type":"post","link":"http:\/\/archief.van-maanen.com\/?p=737","title":{"rendered":"AWK to investigate files on Unix"},"content":{"rendered":"<p>Today, I worked with the Unix awk utility. It is an extremely potent tool for investigating text files on a Unix platform. It is invoked from the terminal command line, and the command must start with awk.<\/p>\n<p>The keyword awk is followed by a script placed between single quotes. After the closing quote comes the name of the text file (say ww-ii-data.txt).<\/p>\n<p>When some items need to be initialised, we use the BEGIN clause. Its actions are placed between braces {}.<br \/>\nAfter that, a pattern between slashes selects the lines to act on. The actions for those lines are also placed between braces. Finally, after the keyword END, an end clause may be included; it runs once all lines have been read. We then have:<\/p>\n<p>awk &#8216;BEGIN {} \/selection\/ {} END {}&#8217; file<\/p>\n<p>As an example:<\/p>\n<pre>\nawk ' \nBEGIN {count=0;max=0}\n \/\/{\n   temp = substr($0,37,3) + 0;\n   count++;\n   if (max &lt; temp)\n      max=temp\n        }\nEND {print \"lines:  \", count,\" max in Celsius\", (5\/9)*(max-32);}\n'   ww-ii-data.txt\n<\/pre>\n<p>I noticed that variables can be used without any declaration. Nice.<\/p>\n<p>A second script works on a file whose columns are separated by commas. In that case, the separator must be set in the BEGIN clause. This is done with FS=\"separator\", for example FS=\",\". With that set, the columns of each line are available as $1, $2, etc., so a column can be accessed directly. 
For example, $3 stands for the third column of the current line.<\/p>\n<pre>\nawk ' \nBEGIN {count=0;max=0;FS=\",\"}\n \/\/ {\n   temp = $3 + 0;\n   count++;\n   if (max &lt; temp)\n      max=temp\n        }\nEND {print \"lines:  \", count,\" max \", max;}\n'   \/home\/hadoop\/a.csv\n<\/pre>\n<p>Finally, a gawk command that replaces the newlines inside double-quoted fields with a space and removes carriage returns from the file (RT is a gawk extension):<\/p>\n<pre>\nawk -v RS='\"[^\"]*\"' -v ORS= '{gsub(\/\\n\/, \" \", RT); print $0 RT}' {Source_File} | tr -d \"\\015\" &gt; {Processed_File}\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Today, I worked with the Unix awk utility. This is an extremely potent utility to investigate text files on a Unix platform. It can be invoked from the terminal command line. The command must start with awk. The keyword awk is followed by a script that is positioned between quotes. After the quotes, the text file [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":754,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-737","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/posts\/737","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=737"}],"version-history":[{"count":0,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=\/wp\/v2\/posts\/737\/revisions"}],"wp:attachment":[{"href":"http:\/\/archie
f.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=737"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=737"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/archief.van-maanen.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=737"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}