Difference between revisions of "Awk/scripts"

From Christoph's Personal Wiki

Revision as of 05:17, 6 August 2007

Basic examples

Note: The examples in this section have been taken directly from the original Awk paper.[1]

Printing

  • Print all input lines whose length exceeds 72 characters:
length > 72
  • Print all lines with an even number of fields:
NF % 2 == 0
  • Replace the first field of each line by its logarithm:
{ $1 = log($1); print }
  • Print the third and second columns of a table in that order:
{print $3,$2}
  • Print all input lines with an A, B, or C in the second field:
$2 ~ /A|B|C/
  • Print all lines in which the first field is different from the previous first field:
$1 != prev { print; prev = $1 }
  • Print each record preceded by the record number and the number of fields:
{print NR, NF, $0}
  • Write the first field, $1, on the file foo1, and the second field on the file foo2:
{print $1>"foo1"; print $2>"foo2"}
  • Append the output to the file foo:
print $1>>"foo"
  • Use the contents of field 2 as a file name:
print $1>$2
  • Print $1 as a floating point number 8 digits wide, with two after the decimal point, and $2 as a 10-digit long decimal number, followed by a newline:
printf "%8.2f %10ld\n", $1, $2
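
One quick way to try the printing patterns above is to pipe a few lines into awk with printf. The sketch below uses made-up sample input. It uses %10d rather than %10ld, on the assumption that the l length modifier is not needed by current awk implementations.

```shell
# Record number, number of fields, then the record itself.
printf 'alpha beta gamma\ndelta epsilon\n' | awk '{ print NR, NF, $0 }'
# 1 3 alpha beta gamma
# 2 2 delta epsilon

# $1 as an 8-wide float with two decimals, $2 as a 10-wide integer.
printf '3.14159 42\n' | awk '{ printf "%8.2f %10d\n", $1, $2 }'
#     3.14         42
```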

BEGIN and END

Note: The special pattern BEGIN matches the beginning of the input, before the first record is read. The pattern END matches the end of the input, after the last record has been processed. BEGIN and END thus provide a way to gain control before and after processing, for initialization and wrapup.

  • As an example, the field separator can be set to a colon by:
BEGIN { FS = ":" }
... rest of program ...
  • Or, the input lines may be counted by:
END { print NR }

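If BEGIN is present, it must be the first pattern; END must be the last if used. (This restriction comes from the original awk described in the paper; POSIX awk and current implementations accept BEGIN and END rules anywhere in the program.)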
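
As a minimal sketch of how the three kinds of rule fit together, the program below sets the field separator in BEGIN, prints the first field of every record, and reports the record count in END. The colon-separated input is made up for illustration.

```shell
printf 'root:x:0\ndaemon:x:1\n' | awk '
BEGIN { FS = ":" }             # runs once, before the first record is read
      { print $1 }             # runs for every record
END   { print NR, "records" }  # runs once, after the last record
'
# root
# daemon
# 2 records
```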

Regular expressions

  • Print all lines which contain any occurrence of the name "smith":
/smith/
  • () (parentheses) — for grouping;
  • | (pipe) — for alternatives;
  • + — for "one or more"; and
  • ? — for "zero or one"
  • Print all lines which contain any of the names "Aho", "Weinberger", or "Kernighan", whether capitalized or not:
/[Aa]ho|[Ww]einberger|[Kk]ernighan/
  • Match any string of characters enclosed in slashes:
/\/.*\//
  • Print all lines where the first field matches "john" or "John" (also matches "Johnson", etc.):
$1 ~ /[jJ]ohn/
  • Match exactly "john" or "John":
$1 ~ /^[jJ]ohn$/
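
The difference between an unanchored and an anchored match, and the effect of ?, can be checked with a small experiment. This is only a sketch; the names and numbers in the input are made up.

```shell
# Unanchored: matches "john" or "John" anywhere in the first field.
printf 'John 1\nJohnson 2\njohn 3\nJon 4\n' | awk '$1 ~ /[jJ]ohn/'
# John 1
# Johnson 2
# john 3

# Anchored with ^ and $: the whole first field must match.
printf 'John 1\nJohnson 2\njohn 3\nJon 4\n' | awk '$1 ~ /^[jJ]ohn$/'
# John 1
# john 3

# h? makes the "h" optional (zero or one occurrence), so "Jon" matches too.
printf 'John 1\nJohnson 2\njohn 3\nJon 4\n' | awk '$1 ~ /^[jJ]oh?n$/'
# John 1
# john 3
# Jon 4
```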

Relational expressions

The relational operators are: <, <=, ==, !=, >=, and >.

  • Select lines where the second field is more than 100 greater than the first field:
$2 > $1 + 100
  • Print lines with an even number of fields:
NF % 2 == 0
  • Select lines whose first field begins with an "s", "t", "u", etc.:
$1 >= "s"
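  • Compare the first two fields (note: the paper states that, in the absence of any other information, fields are treated as strings; in POSIX awk, two fields that both look like numbers are compared numerically instead):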
$1 > $2
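
Whether a comparison is numeric or string-based changes the result. The sketch below assumes a POSIX-style awk, where two fields that look numeric are compared as numbers. Concatenating an empty string is one common way to force a string comparison. The sample input is made up.

```shell
# Both fields look numeric, so the comparison is numeric: 10 > 9 is true.
printf '10 9\n' | awk '$1 > $2'
# 10 9

# Concatenating "" forces a string comparison: "10" sorts before "9",
# so nothing is printed.
printf '10 9\n' | awk '($1 "") > ($2 "")'

# Comparing against a string constant is a string comparison.
printf 'apple 1\nsmith 2\ntaylor 3\n' | awk '$1 >= "s"'
# smith 2
# taylor 3
```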

Combination of Patterns
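
This section has no text in this revision. As a hedged sketch of what the heading refers to: awk patterns can be combined with the boolean operators && (and), || (or), and ! (not). The sample input below is made up.

```shell
# First field begins with "s" but is not "smith".
printf 'sam 1\nsmith 2\ntom 3\nsue 4\n' | awk '$1 >= "s" && $1 < "t" && $1 != "smith"'
# sam 1
# sue 4

# First field does not begin with "s", or second field is greater than 3.
printf 'sam 1\nsmith 2\ntom 3\nsue 4\n' | awk '!($1 ~ /^s/) || $2 > 3'
# tom 3
# sue 4
```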

Examples

Note: Taken directly from Patrick Hartigan's awk page. The # is the comment character for awk. 'field' means 'column'.

# Print first two fields in opposite order:
  awk '{ print $2, $1 }' file

# Print lines longer than 72 characters:
  awk 'length > 72' file

# Print length of string in 2nd column
  awk '{print length($2)}' file

# Add up first column, print sum and average:
       { s += $1 }
  END  { print "sum is", s, " average is", s/NR }

# Print fields in reverse order:
  awk '{ for (i = NF; i > 0; --i) print $i }' file

# Print the last line
      {line = $0}
  END {print line}

# Print the total number of lines that contain the word Pat
  /Pat/ {nlines = nlines + 1}
  END {print nlines}

# Print all lines between start/stop pairs:
  awk '/start/, /stop/' file

# Print all lines whose first field is different from previous one:
  awk '$1 != prev { print; prev = $1 }' file

# Print column 3 if column 1 > column 2:
  awk '$1 > $2 {print $3}' file

# Print line if column 3 > column 2:
  awk '$3 > $2' file

# Count number of lines where col 3 > col 1
  awk '$3 > $1 { n++ } END { print n + 0 }' file

# Print sequence number and then column 1 of file:
  awk '{print NR, $1}' file

# Print every line after erasing the 2nd field
  awk '{$2 = ""; print}' file

# Print hi 28 times
  yes | head -28 | awk '{ print "hi" }'

# Print hi.0010 to hi.0099 (NOTE IRAF USERS!)
  yes | head -90 | awk '{printf("hi.%04d\n", NR+9)}'

# Print out 4 random numbers between 0 and 1
  yes | head -4 | awk '{print rand()}'

# Print out 40 random integers modulo 5
  yes | head -40 | awk '{print int(100*rand()) % 5}'

# Replace every field by its absolute value
  { for (i = 1; i <= NF; i=i+1) if ($i < 0) $i = -$i; print }

# If you have another character that delimits fields, use the -F option
# For example, to print out the phone number for Jones in the following file,
# 000902|Beavis|Theodore|333-242-2222|149092
# 000901|Jones|Bill|532-382-0342|234023
# ...
# type
  awk -F"|" '$2=="Jones"{print $4}' filename
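
The -F example above can be tried without creating a file by piping the two sample records into awk. This is only a sketch; it feeds the sample lines from printf instead of reading filename.

```shell
printf '000902|Beavis|Theodore|333-242-2222|149092\n000901|Jones|Bill|532-382-0342|234023\n' |
  awk -F'|' '$2 == "Jones" { print $4 }'
# 532-382-0342
```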

# Some looping commands
# Print the commands that would remove a bunch of print jobs from the queue
# (pipe the output to sh to actually run them)
  BEGIN {
      for (i = 875; i > 833; i--) {
          printf "lprm -Plw %d\n", i
      }
      exit
  }

# Formatted printouts are of the form printf( "format\n", value1, value2, ... valueN)
#   e.g. printf("howdy %-8s What it is bro. %.2f\n", $1, $2*$3)
#	%s = string
#	%-8s = 8 character string left justified
# 	%.2f = number with 2 places after .
#	%6.2f = field 6 chars with 2 chars after .
#	\n is newline
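
A small sketch of the format specifiers listed above, with brackets added so the padding is visible. The input line is made up.

```shell
printf 'disk 3 1.5\n' |
  awk '{ printf("[%s] [%-8s] [%.2f] [%6.2f]\n", $1, $1, $2*$3, $2*$3) }'
# [disk] [disk    ] [4.50] [  4.50]
```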
#	\t is a tab

# Print frequency histogram of column of numbers
$2 <= 0.1 {na=na+1}
($2 > 0.1) && ($2 <= 0.2) {nb = nb+1}
($2 > 0.2) && ($2 <= 0.3) {nc = nc+1}
($2 > 0.3) && ($2 <= 0.4) {nd = nd+1}
($2 > 0.4) && ($2 <= 0.5) {ne = ne+1}
($2 > 0.5) && ($2 <= 0.6) {nf = nf+1}
($2 > 0.6) && ($2 <= 0.7) {ng = ng+1}
($2 > 0.7) && ($2 <= 0.8) {nh = nh+1}
($2 > 0.8) && ($2 <= 0.9) {ni = ni+1}
($2 > 0.9) {nj = nj+1}
END {print na, nb, nc, nd, ne, nf, ng, nh, ni, nj, NR}
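
The ten histogram rules above can also be written with an array indexed by bin number. The following is a possible variant, not the original author's method. It assumes bins of width 0.1 and computes the bin as int($2 * 10), so each bin includes its lower edge rather than its upper edge as in the rules above. The sample input is made up.

```shell
printf 'a 0.05\nb 0.15\nc 0.17\nd 0.95\n' | awk '
{
    b = int($2 * 10)      # bin index; bins are [0.0,0.1), [0.1,0.2), ...
    if (b < 0) b = 0      # clamp out-of-range values into the end bins
    if (b > 9) b = 9
    count[b]++
}
END {
    for (b = 0; b <= 9; b++)
        printf "%d ", count[b] + 0   # + 0 prints 0 for empty bins
    print NR                         # total number of records, as in the original
}'
# 1 2 0 0 0 0 0 0 0 1 4
```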

# Find maximum and minimum values present in column 1
NR == 1 {m=$1 ; p=$1}
$1 >= m {m = $1}
$1 <= p {p = $1}
END { print "Max = " m, "   Min = " p }

# Example of defining variables, multiple commands on one line
NR == 1 {prev=$4; preva = $1; prevb = $2; n=0; sum=0}
$4 != prev {print preva, prevb, prev, sum/n; n=0; sum=0; prev = $4; preva = $1; prevb = $2}
$4 == prev {n++; sum=sum+$5/$6}
END {print preva, prevb, prev, sum/n}

# Example of defining and using a function, inserting values into an array
# and doing integer arithmetic mod(n). This script finds the number of days
# elapsed since Jan 1, 1901. (from http://www.netlib.org/research/awkbookcode/ch3)
function daynum(y, m, d,    days, i, n)
{   # 1 == Jan 1, 1901
    split("31 28 31 30 31 30 31 31 30 31 30 31", days)
    # 365 days a year, plus one for each leap year
    n = (y-1901) * 365 + int((y-1901)/4)
    if (y % 4 == 0) # leap year from 1901 to 2099
        days[2]++
    for (i = 1; i < m; i++)
        n += days[i]
    return n + d
}
    { print daynum($1, $2, $3) }

# Example of using substrings
# substr($2,9,7) picks out characters 9 thru 15 of column 2
{print "imarith", substr($2,1,7) " - " $3, "out."substr($2,5,3)}
{print "imarith", substr($2,9,7) " - " $3, "out."substr($2,13,3)}
{print "imarith", substr($2,17,7) " - " $3, "out."substr($2,21,3)}
{print "imarith", substr($2,25,7) " - " $3, "out."substr($2,29,3)}

References

  1. Aho AV, Kernighan BW, Weinberger PJ (1978). "Awk - A pattern scanning and processing language". Second Edition, Bell Laboratories, 8 pp.

External links

  • AWK Book Code — contains all the programs from The AWK Programming Language, by Aho, Kernighan and Weinberger (Addison-Wesley, 1988). They have been packed by the bundle program found on page 81, and can be unpacked by the unbundle on page 82, also included here. A text editor will also do this pretty easily.
  • Patrick Hartigan's awk page