Many a times it is required to print a summary by aggregating the data in a file employee wise, year wise etc. This may be required mainly for reporting purposes.
Below is an example of printing employee wise sum of salaries from a file. The example also prints the number of occurances of each employee in the file.
emp_sal.txt
EmpName,MonthandYear,Salary
aaaaa,Jan-2016,25000
bbbbb,Jan-2016,30000
cccccccc,Jan-2016,16000
aaaaa,Feb-2016,25000
bbbbb,Feb-2016,30000
cccccccc,Feb-2016,16000
awk -F"," 'NR>1{arr[$1]=arr[$1]+$3;arr1[$1]++}END {for (a in arr) print a, arr[a], arr1[a]}' emp_sal.txt
arr sums up the values in column 3 for each employee - arr[$1]=arr[$1]+$3
arr1 gives the count of each employee in the file - arr1[$1]++
NR>1 skips the first line which is the header line. This can be omitted if there is no header.
If the file is a csv file with columns enclosed in '"' as may be the case many a time a slight tweaking of the script is required
"EmpName","MonthandYear","Salary"
"aaaaa","Jan-2016","25000"
"bbbbb","Jan-2016","30000"
"cccccccc","Jan-2016","16000"
"aaaaa","Feb-2016","25000"
"bbbbb","Feb-2016","30000"
"cccccccc","Feb-2016","16000"
awk -F"\",\"" 'NR>1{arr[$1]=arr[$1]+$3;arr1[$1]++}END {for (a in arr) print substr(a,2), arr[a], arr1[a]}' emp_sal.txt
Here the delimiter is taken as "," and the leading " is removed using substr.
Quote for the day
“Whether You Think You Can Or Think You Can’t, You’re Right.”- Henry Ford
Below is an example of printing employee wise sum of salaries from a file. The example also prints the number of occurances of each employee in the file.
emp_sal.txt
EmpName,MonthandYear,Salary
aaaaa,Jan-2016,25000
bbbbb,Jan-2016,30000
cccccccc,Jan-2016,16000
aaaaa,Feb-2016,25000
bbbbb,Feb-2016,30000
cccccccc,Feb-2016,16000
awk -F"," 'NR>1{arr[$1]=arr[$1]+$3;arr1[$1]++}END {for (a in arr) print a, arr[a], arr1[a]}' emp_sal.txt
arr sums up the values in column 3 for each employee - arr[$1]=arr[$1]+$3
arr1 gives the count of each employee in the file - arr1[$1]++
NR>1 skips the first line which is the header line. This can be omitted if there is no header.
If the file is a csv file with columns enclosed in '"' as may be the case many a time a slight tweaking of the script is required
"EmpName","MonthandYear","Salary"
"aaaaa","Jan-2016","25000"
"bbbbb","Jan-2016","30000"
"cccccccc","Jan-2016","16000"
"aaaaa","Feb-2016","25000"
"bbbbb","Feb-2016","30000"
"cccccccc","Feb-2016","16000"
awk -F"\",\"" 'NR>1{arr[$1]=arr[$1]+$3;arr1[$1]++}END {for (a in arr) print substr(a,2), arr[a], arr1[a]}' emp_sal.txt
Here the delimiter is taken as "," and the leading " is removed using substr.
Quote for the day
“Whether You Think You Can Or Think You Can’t, You’re Right.”- Henry Ford
No comments:
Post a Comment