Difference between revisions of "Awk"

From Christoph's Personal Wiki

Latest revision as of 21:29, 30 May 2022

Awk is a general-purpose scripting language designed for processing text-based data, either in files or in data streams. This article mainly considers the GNU Awk (aka Gawk) variant. Note: Awk gets its name from its authors: Aho, Weinberger, and Kernighan.

In the words of its creators,

Awk is a programming language whose basic operation is to search a set of files for patterns, and to perform specified actions upon lines or fields of lines which contain instances of those patterns. Awk makes certain data selection and transformation operations easy to express.[1]

Awk is an example of a programming language that extensively uses the string datatype, associative arrays (that is, arrays indexed by key strings), and regular expressions. The power, terseness, and limitations of Awk programs and sed scripts inspired Larry Wall to write Perl.

See Awk/scripts for detailed examples, or Awk/tips_and_tricks.

Quick tutorial

Note: The following was inspired by Cesar A. Murakami.

Some Special variables (Input/Output)

FS
Input field separator (char or regex)
RS
Input record separator
NF
Number of fields in the current record
NR
Current input record number
OFS
Output field separator
ORS
Output record separator
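As a minimal sketch of how these variables interact, the following command reads colon-separated records and prints them again with a different output separator. The sample data and the shell variable name out are made up for illustration.

```shell
# Sample input: two colon-separated records (made up for this sketch).
out=$(printf 'alice:30:admin\nbob:25\n' | awk '
BEGIN { FS = ":"; OFS = " | " }   # split input on ":", join output with " | "
{ print NR, NF, $1 }              # record number, field count, first field
')
echo "$out"
# 1 | 3 | alice
# 2 | 2 | bob
```

Setting FS and OFS in a BEGIN block ensures they are in effect before the first record is read.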

Some more special variables (files and match)

ARGC
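Number of command-line arguments passed to awk, counting the command name itself (ARGV[0])
ARGV
Array of the command-line arguments (typically the input files), indexed from 0 to ARGC-1
FILENAME
Name of the file currently being processed
FNR
Record number within the current file (FILENAME)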
RSTART
starting position of string matched by match() function
RLENGTH
length of string matched by match() function
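The difference between FNR and NR, and the way match() fills in RSTART and RLENGTH, may be easier to see in a small sketch. The file names a.txt and b.txt, the scratch directory, and the sample text are assumptions made only for this example.

```shell
# Work in a scratch directory so the sketch leaves nothing behind.
dir=$(mktemp -d)
printf 'one\ntwo\n' > "$dir/a.txt"
printf 'three\n'    > "$dir/b.txt"

# FNR restarts at 1 for each file, while NR keeps counting across files.
counts=$(cd "$dir" && awk '{ print FILENAME, FNR, NR }' a.txt b.txt)
echo "$counts"
# a.txt 1 1
# a.txt 2 2
# b.txt 1 3

# match() sets RSTART and RLENGTH; substr() then extracts the matched text.
found=$(echo 'order id 4521 shipped' | awk '
match($0, /[0-9]+/) { print RSTART, RLENGTH, substr($0, RSTART, RLENGTH) }
')
echo "$found"
# 10 4 4521

rm -r "$dir"
```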

Conditionals

if (var1 > 2) {
  print "greater than 2";
  var1 = var1 + 1;
}
if ( var1 in arr )
  printf "%d is an array index\n",var1;
else {
  arr[var1] = var2;
  printf "%d was not an array index\n",var1;
}

Loops

for (i=1; i<=10; i++) {
  print i;
}
for (i in arr) {
  print arr[i];
}
while ( x != 0 ) {
  print x;
  x--;   # update x inside the loop, otherwise it never ends (assumes x starts as a positive integer)
}

Functions

function mysum(param1, param2) {
  return param1 + param2;
}

Code Blocks ('PATTERN {ACTION}')

BEGIN {
  # Code to be executed before record processing.
}
/regex/ {
  # Code to be executed when $0 contains a substring that is matched by regex.
}
$1 == "XYZ" {
  # Code to be executed when the first field is equal to "XYZ".
}
$2 ~ /regex/ {
  # Code to be executed when the second field contains a substring that is matched by regex.
}
! PATTERN {
  # Code to be executed when PATTERN is not matched / satisfied.
}
PAT1, PAT2 {
  # Code to be executed for a range of records (PAT1 matches the first record, PAT2 the last; both records are included in the range).
}
{
  # Code to be executed for every record.
}
END {
  # Code to be executed after record processing.
}
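A few of these pattern forms can be combined in one small program. The following is only a sketch: the log lines, the START and STOP markers, and the counter names inside and other are invented for the example.

```shell
# Made-up log lines for this sketch.
log='INFO start
START
ERROR disk full
STOP
INFO done'

result=$(printf '%s\n' "$log" | awk '
/^START$/, /^STOP$/ { inside++ }                 # range pattern, both ends included
$1 == "ERROR"       { print "error:", $2, $3 }   # comparison on the first field
! /^INFO/           { other++ }                  # records that do not start with INFO
END                 { print inside, other }
')
echo "$result"
# error: disk full
# 3 3
```

Every rule is tested against every record, so a single record (here the ERROR line) can trigger several actions.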

Structure of Awk programs

Generally speaking, two pieces of data are given to Awk: a command file and a primary input file. A command file (which can be an actual file, or can be included in the command line invocation of Awk) contains a series of commands which tell Awk how to process the input file. The primary input file is typically text that is formatted in some way; it can be an actual file, or it can be read by Awk from the standard input. A typical Awk program consists of a series of lines, each of the form

/pattern/ { action }

where pattern is a regular expression and action is a command. Awk looks through the input file; when it finds a line that matches pattern, it executes the command(s) specified in action. Alternate line forms include:

BEGIN { action }
Executes action commands at the beginning of the script execution, i.e., before any of the lines are processed.
END { action }
Similar to the previous form, but executes action after the end of input.
/pattern/
Prints any lines matching pattern.
{ action }
Executes action for each line in the input.

Each of these forms can be included multiple times in the command file. Lines in the command file are executed in order, so if there are two "BEGIN" statements, the first is executed, then the second, and then the rest of the lines. BEGIN and END statements do not have to be located before and after (respectively) the other lines in the command file.
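The ordering rules can be checked with a short sketch in which the END rule is deliberately written first and the BEGIN rules last. The two input records are arbitrary.

```shell
order=$(printf 'x\ny\n' | awk '
END   { print "end" }          # written first, still runs last
      { print "record", NR }   # runs once per input record
BEGIN { print "begin 1" }
BEGIN { print "begin 2" }      # multiple BEGIN rules run in source order
')
echo "$order"
# begin 1
# begin 2
# record 1
# record 2
# end
```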

Awk was created as a broad-based replacement for the C algorithmic approaches that had been developed to integrate text-parsing methods.

Awk commands

Awk commands are the statements that are substituted for action in the examples above. Awk commands can include function calls, variable assignments, calculations, or any combination thereof. Awk contains built-in support for many functions; many more are provided by the various flavors of Awk. Also, some flavors support the inclusion of dynamically linked libraries, which can also provide more functions.

For brevity, the enclosing curly braces ( { } ) will be omitted from these examples.

The print command

The print command is used to output text. The simplest form of this command is

print

This displays the contents of the current line. In Awk, lines are broken down into fields, and these can be displayed separately:

print $1
Displays the first field of the current line
print $1, $3
Displays the first and third fields of the current line, separated by a predefined string called the output field separator (OFS) whose default value is a single space character

Although these fields ($X) may bear resemblance to variables (the $ symbol indicates variables in Perl), they actually refer to the fields of the current line. A special case, $0, refers to the entire line. In fact, the commands "print" and "print $0" are identical in functionality.
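A short sketch of the forms described above, run on one made-up input line, shows how the comma in print differs from simply placing fields next to each other:

```shell
fields=$(echo 'alpha beta gamma' | awk '{
  print $1, $3    # comma: fields joined with OFS (a space by default)
  print $1 $3     # no comma: plain string concatenation
  print $0        # the whole record, same as a bare print
}')
echo "$fields"
# alpha gamma
# alphagamma
# alpha beta gamma
```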

The print command can also display the results of calculations and/or function calls:

print 3+2
print foobar(3)
print foobar(variable)
print sin(3-2)

Output may be sent to a file:

print "expression" > "file name"

Variables, et cetera

Variable names may contain the characters [A-Za-z0-9_] but must not begin with a digit, and must not be a language keyword. The operators + - * / are addition, subtraction, multiplication, and division, respectively. For string concatenation, simply place two variables (or string constants) next to each other, optionally with a space in between. String constants are delimited by double quotes. Statements need not end with semicolons. Finally, comments can be added to programs with #; everything from the # to the end of the line is ignored.
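These rules can be tried out in a BEGIN-only program, which needs no input. The variable names n and label are arbitrary choices for this sketch.

```shell
line=$(awk 'BEGIN {
  n = 3 * 4 - 2             # arithmetic: n is 10
  label = "total" ": " n    # concatenation by juxtaposition
  print label               # a comment may follow code on the same line
}')
echo "$line"
# total: 10
```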

User-defined functions

In a format similar to C, function definitions consist of the keyword function, the function name, argument names and the function body. Here is an example function:

function add_three(number,temp) {
   temp = number+3
   return temp
}

This function can be invoked as follows:

print add_three(36)     # prints '39'

Functions can have variables that are in the local scope. The names of these are added to the end of the argument list, though values for these should be omitted when calling the function. It is convention to add some whitespace in the argument list before the local variables, in order to indicate where the parameters end and the local variables begin.
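To illustrate the local-variable convention, the sketch below wraps the add_three function in a complete program and also sets a global variable named temp (an assumption added for this example) to show that the function does not overwrite it.

```shell
scope=$(awk '
function add_three(number,    temp) {   # temp is local: an extra, unused parameter
  temp = number + 3
  return temp
}
BEGIN {
  temp = "untouched"                     # a global that happens to share the name
  print add_three(36), temp
}')
echo "$scope"
# 39 untouched
```

Had temp been left out of the parameter list, the assignment inside the function would have changed the global variable instead.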

Predefined variables

FILENAME
Name of current input file
RS
Input record separator character (default is a newline)
OFS
Output field separator string (default is a single space)
ORS
Output record separator string (default is a newline)
NF
Number of fields in the current input record
NR
Number of input records read so far
OFMT
Output format for numbers (default is "%.6g")
FS
Input field separator (default is a single space, which splits fields on runs of blanks and tabs)
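The output-related variables can be seen working together in a small sketch. The input lines and the values chosen for OFS, ORS, and OFMT are arbitrary.

```shell
formatted=$(printf 'a b\nc d\n' | awk '
BEGIN { OFS = "-"; ORS = ";"; OFMT = "%.2f" }
{ print $1, $2, NR / 3 }   # OFMT is applied to the non-integer number
END { printf "\n" }        # finish the output with a real newline
')
echo "$formatted"
# a-b-0.33;c-d-0.67;
```

Because ORS replaces the newline that print normally appends, both records end up on one output line.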

String functions

The following are Awk's built-in string functions:

gsub(r,s,t)
globally substitutes s for each match of the regular expression r in the string t. Returns the number of substitutions. If t is not supplied, defaults to $0.
index(s,t)
returns position of substring t in string s or zero if not present.
length(s)
returns length of string s or length of $0 if no string is supplied.
match(s,r)
returns either the position in s where the regular expression r begins, or 0 if no occurrences are found. Sets the values of RSTART and RLENGTH.
split(s,a,sep)
parses string s into elements of array a using field separator sep; returns number of elements. If sep is not supplied, FS is used. Array splitting works the same way as field splitting.
sprintf("fmt",expr)
returns expr formatted according to the printf format specification fmt.
sub(r,s,t)
substitutes s for first match of the regular expression r in the string t. Returns 1 if successful; 0 otherwise. If t is not supplied, defaults to $0.
substr(s,p,n)
returns substring of string s at beginning position p up to a maximum length of n. If n is not supplied, the rest of the string from p is used.
tolower(s)
translates all uppercase characters in string s to lowercase and returns the new string.
toupper(s)
translates all lowercase characters in string s to uppercase and returns the new string.
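Several of these functions are exercised in the sketch below. The sample strings and the variable names s, n, count, and parts are made up for the example.

```shell
strings=$(awk 'BEGIN {
  s = "hello world hello"
  n = gsub(/hello/, "bye", s)          # s becomes "bye world bye", n is 2
  print n, s
  print index(s, "world"), length(s)   # position of the substring, string length
  print substr(s, 5, 5), toupper(substr(s, 1, 3))
  count = split("a,b,c", parts, ",")   # parts[1]="a", parts[2]="b", parts[3]="c"
  print count, parts[2]
  print sprintf("%05.1f|%-4s|", 3.14159, "ab")
}')
echo "$strings"
# 2 bye world bye
# 5 13
# world BYE
# 3 b
# 003.1|ab  |
```

Note that gsub() modifies its third argument in place and returns only the number of substitutions.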

Awk versions and implementations

GNU Awk, or gawk, is another free software implementation. It was written before the original implementation became freely available, and is still widely used.

References

  1. Aho AV, Kernighan BW, Weinberger PJ (1978). "Awk - A pattern scanning and processing language". Second Edition, Bell Laboratories, 8 pp.

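Books

The AWK Programming Language (http://cm.bell-labs.com/cm/cs/awkbook/) by Alfred V. Aho, Brian W. Kernighan, and Peter J. Weinberger (1988). Addison-Wesley. ISBN 0-201-07981-X (note: the book's webpage includes downloads of the original implementation of Awk and links to others.)
GAWK: Effective AWK Programming: A User's Guide for GNU Awk (http://www.gnu.org/software/gawk/manual/html_node/index.html) by Arnold Robbins, 3rd Edition.
sed & awk (http://www.oreilly.com/catalog/sed2/) by Dale Dougherty and Arnold Robbins, 2nd Edition (1997). O'Reilly Media. ISBN 1-56592-225-5
Computer-Books.us (http://www.computer-books.us/awk.php) - A collection of Awk books available for free download.

External links

GAWK (GNU Awk) webpage (http://www.gnu.org/software/gawk/gawk.html)
GAWK Manual (http://web.mit.edu/gnu/doc/html/gawk_1.html), on MIT.edu
Wikibooks:Programming:AWK
Getting started with awk (http://www.cs.hmc.edu/qref/awk.html), documentation originally written by Andrew M. Ross.
comp.lang.awk (news:comp.lang.awk) is a USENET newsgroup dedicated to awk.
DJGPP port of Gawk 3.11b as a downloadable 768KB zipfile (http://clio.rice.edu/djgpp/win2k/gwk311b.zip)
example scripts (http://www.awk-scripting.de/cgi-bin/wiki.cgi/scripting/00-WikiIndex)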
