Navigation Bar

Sunday, July 3, 2016

Group functions using AWK

Many a times it is required to print a summary by aggregating the data in a file employee wise, year wise etc. This may be required mainly for reporting purposes.

Below is an example of printing employee wise sum of salaries from a file. The example also prints the number of occurances of each employee in the file.

emp_sal.txt
EmpName,MonthandYear,Salary
aaaaa,Jan-2016,25000
bbbbb,Jan-2016,30000
cccccccc,Jan-2016,16000
aaaaa,Feb-2016,25000
bbbbb,Feb-2016,30000
cccccccc,Feb-2016,16000

awk -F"," 'NR>1{arr[$1]=arr[$1]+$3;arr1[$1]++}END {for (a in arr) print a, arr[a],  arr1[a]}' emp_sal.txt

arr sums up the values in column 3 for each employee  - arr[$1]=arr[$1]+$3
arr1 gives the count of each employee in the file - arr1[$1]++
NR>1 skips the first line which is the header line. This can be omitted if there is no header.

If the file is a csv file with columns enclosed in '"' as may be the case many a time a slight tweaking of the script is required

"EmpName","MonthandYear","Salary"
"aaaaa","Jan-2016","25000"
"bbbbb","Jan-2016","30000"
"cccccccc","Jan-2016","16000"
"aaaaa","Feb-2016","25000"
"bbbbb","Feb-2016","30000"
"cccccccc","Feb-2016","16000"

awk -F"\",\"" 'NR>1{arr[$1]=arr[$1]+$3;arr1[$1]++}END {for (a in arr) print substr(a,2), arr[a],  arr1[a]}' emp_sal.txt
Here the delimiter is taken as "," and the leading " is removed using substr.


Quote for the day
“Whether You Think You Can Or Think You Can’t, You’re Right.”- Henry Ford

No comments:

Post a Comment