Difference between revisions of "Sed"

From Christoph's Personal Wiki
(added redirect)
 
(SED emulating UNIX commands)
 
(11 intermediate revisions by the same user not shown)
Line 1: Line 1:
#REDIRECT [[Sed programming language]]
+
{{lowercase|title=sed}}
 +
'''sed''' (which stands for '''S'''tream '''ED'''itor) is a simple but powerful [[:Category:Linux Command Line Tools|command line tool]] (or [[:Category:Scripting languages|scripting language]]) used to apply various pre-specified textual transformations to a sequential stream of text data. It reads input files line by line, edits each line according to rules specified in its simple language (the ''sed script''), and then outputs the line.
 +
 
 +
See [[Sed/manpage|sed manpage]] and [[Sed/Scripts|sed scripts]] for detailed examples.
 +
 
 +
== Functions ==
 +
<tt>sed</tt> is often thought of as a non-interactive text editor. It differs from conventional text editors in that the processing of the two inputs is inverted. Instead of iterating once through a list of edit commands, applying each one to the whole text file in memory, sed iterates once through the text file, applying the whole list of edit commands to each line. Because only one line at a time is in memory, sed can process text files with an arbitrarily large number of lines, although some implementations limit the length of an individual line.
 +
 
 +
sed's command set is modeled after the <tt>ed</tt> editor, and most commands work similarly in this inverted paradigm. For example, the command '''25d''' means ''if this is line 25, then delete (don't output) it'', rather than ''go to line 25 and delete it'' as it does in ed. The notable exceptions are the copy and move commands, which span a range of lines and thus don't have straightforward equivalents in sed. Instead, sed introduces an extra buffer called the ''hold space'', and additional commands to manipulate it. The ed command to copy line 25 to after line 76 ('''25t76'''), for example, would be coded as two separate commands in sed ('''25h; 76G'''): the first stores the line in the hold space, and the second appends it after line 76 when that line is reached.
 +
 
 +
==Usage==
 +
The following example shows a typical usage of sed, where the ''-e'' option indicates that the sed expression follows:
 +
    sed -e 's/oldstuff/newstuff/g' inputFileName > outputFileName
 +
 
 +
The ''s'' stands for substitute; the ''g'' stands for global, which means that all matching occurrences in the line are replaced (without it, only the first match on each line is replaced). After the first slash is the [[regular expression]] to search for, and after the second slash is the expression to replace it with. The substitute command (s///) is by far the most powerful and most commonly used sed command.
 +
 
 +
<tt>sed</tt> is often used as a filter in a pipeline:
 +
    generate_data | sed -e 's/x/y/'
 +
That is, generate the data, but make the small change of replacing ''x'' with ''y''.
 +
 
 +
Several substitutions or other commands can be put together in a file called, for example, ''subst.sed'' and then be applied using the ''-f'' option to read the commands from the file:
 +
    sed -f subst.sed inputFileName > outputFileName
 +
 
 +
Besides substitution, other forms of simple processing are possible. For example, the following deletes empty lines or lines that only contain spaces:
 +
    sed -e '/^ *$/d' inputFileName
 +
 
 +
This example used some of the following regular expression [[metacharacter]]s:
 +
* The caret (<code>^</code>) matches the beginning of the line.
 +
* The dollar sign (<code>$</code>) matches the end of the line.
 +
* The period (<code>.</code>) matches any single character.
 +
* The asterisk (<code>*</code>) matches zero or more occurrences of the preceding character or expression.
 +
* A bracketed expression delimited by <code>[</code> and <code>]</code> matches any of the characters inside the brackets.
 +
 
 +
Complex sed constructs are possible, to the extent that it can be conceived of as a highly specialised, albeit simple, programming language. Flow of control, for example, can be managed with a label (a colon followed by the label name) and the branch instruction '''b''': a '''b''' followed by a valid label name moves processing to the commands following that label, while a '''b''' with no label name branches to the end of the script.
 +
 
 +
== Commands ==
 +
(the number in parentheses is the maximum number of addresses the command accepts)
 +
; (2)!cmd : exclamation sign means "Don't apply to specified addresses"
 +
; (0)# : comment
 +
; (0)<nowiki>:</nowiki>label : place a label
 +
; (1)= : display line number
 +
; (2)D : delete first part of the pattern space
 +
; (2)G : append contents of hold area
 +
; (2)H : append pattern space to the hold space
 +
; (2)N : append next line
 +
; (2)P : print first part of the pattern space
 +
; (1)a : append text
 +
; (2)blabel : branch to label
 +
; (2)c : change lines
 +
; (2)d : delete lines
 +
; (2)g : get contents of hold area
 +
; (2)h : hold pattern space (in a hold buffer)
 +
; (1)i : insert lines
 +
; (2)l : list lines
 +
; (2)n : next line
 +
; (2)p : print
 +
; (1)q : quit
 +
; (1)r file : read the contents of file
 +
; (2)tlabel : test substitutions and branch on successful substitution
 +
; (2)w file : write to file
 +
; (2)x : exchange hold space with pattern space
 +
; (2){ : group commands
 +
; (2)s/RE/replacement/[flags] : substitute
 +
; (2)y/list1/list2/ : translates list1 into list2
 +
 
 +
==History==
 +
sed is one of the very early Unix commands that permitted command line processing of data files. It evolved as the natural successor to the popular [[grep]] command. Cousin to the later [[AWK programming language|AWK]], sed allowed powerful and interesting data processing to be done by shell scripts. sed was probably the earliest Unix tool that really encouraged regular expressions to be used ubiquitously. In terms of speed of operation, sed is generally faster than Perl and markedly faster than AWK.
 +
 
 +
sed and AWK are often cited as the progenitors and inspiration for [[Perl]]; in particular the s/// syntax from the example above is part of Perl's syntax.
 +
 
 +
sed's language does not have variables and has only primitive GOTO and branching functionality; nevertheless, the language is Turing-complete.
 +
 
 +
[[GNU]] sed includes several new features such as in-place editing of files (i.e., replacing the original file with the result of applying the sed program). In-place editing is often used instead of <tt>ed</tt> scripts; for example,
 +
 
 +
    sed -i 's/abc/def/' file
 +
 
 +
can be used instead of
 +
 
 +
    ed file
 +
    1,$ s/abc/def/
 +
    w
 +
    q
 +
 
 +
There is an extended version of sed called '''Super-sed'''  (<tt>ssed</tt>) that includes regular expressions compatible with [[Perl]].
 +
 
 +
== Samples ==
 +
sed normally processes one line at a time. The following example makes it work across lines: it joins each line that starts with a single space onto the end of the preceding line.
 +
 
 +
Consider the following text:
 +
  This is my cat
 +
  my cat's name is betty
 +
  This is my dog
 +
  my dog's name is frank
 +
 
 +
The sed script below will turn it into:
 +
  This is my cat my cat's name is betty
 +
  This is my dog my dog's name is frank
 +
 
 +
Here's the script:
 +
  sed 'N;s/\n / /;P;D;'
 +
 
 +
* (N) append the next line of input to the pattern space
 +
* (s) substitute
 +
* (/\n /) match: a newline followed by one space
 +
* (/ /) replace with: one space
 +
* (P) print the pattern space up to the first newline
 +
* (D) delete the pattern space up to the first newline and run the script again on whatever remains
 +
 
 +
'''Addresses (restricting a command to matching lines)'''
 +
 
 +
A command can be restricted to certain lines by prefixing it with an ''address'', such as a pattern:
 +
  /pattern1/s/pattern2/replacement/flags 
 +
:will replace pattern2 with replacement where pattern1 is matched.
 +
Likewise:
 +
  /pattern1/!s/pattern2/replacement/flags 
 +
:will replace pattern2 where pattern1 is *not* matched.
 +
 
 +
For example, if you have a file (text.txt) containing the following lines:
 +
 
 +
  Hello world.
 +
  Hello world. I love sed.
 +
 
 +
And you want to replace "world" with "mom", but only on those lines that contain the word "sed", you can use:
 +
  sed -e '/^.*sed.*$/s/world/mom/g' text.txt
 +
This results in:
 +
  Hello world.
 +
  Hello mom. I love sed.
 +
 
 +
You can negate this behavior with:
 +
  sed -e '/^.*sed.*$/!s/world/mom/g' text.txt
 +
which will result in the opposite:
 +
  Hello mom.
 +
  Hello world. I love sed.
 +
 
 +
== SED emulating UNIX commands ==
 +
Note: by [http://sed.sourceforge.net/local/docs/emulating_unix.txt Aurélio Marinho Jargas]
 +
 
 +
UNIX        |  SED
 +
-------------+----------------------------------------------------------------
 +
cat          |  sed ':'
 +
cat -s      |  sed '1s/^$//p;/./,/^$/!d'
 +
tac          |  sed '1!G;h;$!d'
 +
grep        |  sed '/patt/!d'
 +
grep -v      |  sed '/patt/d'
 +
head        |  sed '10q'
 +
head -1      |  sed 'q'
 +
tail        |  sed -e ':a' -e '$q;N;11,$D;ba'
 +
tail -1      |  sed '$!d'
 +
tail -f      |  sed -u '/./!d'
 +
cut -c 10    |  sed 's/\(.\)\{10\}.*/\1/'
 +
cut -d: -f4  |  sed 's/\(\([^:]*\):\)\{4\}.*/\2/'
 +
tr A-Z a-z  |  sed 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/'
 +
tr a-z A-Z  |  sed 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/'
 +
tr -s ' '    |  sed 's/ \+/ /g'
 +
tr -d '\012' |  sed 'H;$!d;g;s/\n//g'
 +
wc -l        |  sed -n '$='
 +
uniq        |  sed 'N;/^\(.*\)\n\1$/!P;D'
 +
rev          |  sed '/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//'
 +
basename    |  sed 's,.*/,,'
 +
dirname      |  sed 's,[^/]*$,,'
 +
xargs        |  sed -e ':a' -e '$!N;s/\n/ /;ta'
 +
paste -sd:  |  sed -e ':a' -e '$!N;s/\n/:/;ta'
 +
cat -n      |  sed '=' | sed '$!N;s/\n/ /'
 +
grep -n      |  sed -n '/patt/{=;p;}' | sed '$!N;s/\n/:/'
 +
cp orig new  |  sed 'w new' orig
 +
hostname -s  |  hostname | sed 's/\..*//'
 +
 
 +
== Further reading ==
 +
*[http://www.cs.hmc.edu/qref/sed.html A very basic introduction]
 +
*[http://sed.sourceforge.net/sedfaq.html The sed FAQ]
 +
*[http://www.gnu.org/software/sed/manual/sed.html GNU sed manual]
 +
*[http://www.linuxmanpages.com/man1/sed.1.php Manual page for sed version 4.1.2], the program's [[manpage]]
 +
 
 +
==See also==
 +
*[[Awk|AWK]]
 +
*[[regular expression]]s
 +
 
 +
==External links==
 +
*[http://sed.sourceforge.net Major sources for sed scripts, files, usage]
 +
*[http://sed.sourceforge.net/sed1line.txt Handy one-line sed scripts]
 +
*[http://sed.sourceforge.net/grabbag/tutorials/do_it_with_sed.txt do-it-with-sed]
 +
*[http://sed.sourceforge.net/grabbag/scripts/ example sed scripts]
 +
*[http://sed.sourceforge.net/grabbag/scripts/turing.txt Paper describing Turing machine in sed, and its universality]
 +
*[http://sed.sourceforge.net/grabbag/scripts/turing.sed Turing machine in sed, the actual script]
 +
*[http://www.npcguild.org/~ksb/hack/math.sed A calculator written in sed ]
 +
*[http://www.gnu.org/directory/text/editors/super-sed.html Super-sed]
 +
*[http://aurelio.net/sed/sokoban/ sed Sokoban]
 +
*[http://www.exactcode.de/oss/minised/ Minised homepage]
 +
*[http://www.selectorweb.com/sed_tutorial.html sed tutorial]
 +
*[http://www.student.northpark.edu/pemente/sed/index.htm Eric Pement's sed page]
 +
*[http://www.rtfiber.com.tw/~changyj/sed Yao-Jen Chang's sed page]
 +
*[http://users.cybercity.dk/~bse26236/batutil/help/SED.HTM sed examples]
 +
*[http://lvogel.free.fr/sed.htm Sed resources] &mdash; by Laurent VOGEL (2004-02-12)
 +
*[http://www.opengroup.org/onlinepubs/009695399/utilities/sed.html sed] &mdash; by The Open Group Base Specifications Issue 6
 +
 
 +
[[Category:Linux Command Line Tools]]
 +
[[Category:Scripting languages]]

Latest revision as of 00:56, 18 May 2015

The correct title of this article is sed. The initial letter is capitalized due to technical restrictions.

sed (which stands for Stream EDitor) is a simple but powerful command line tool (or scripting language) used to apply various pre-specified textual transformations to a sequential stream of text data. It reads input files line by line, edits each line according to rules specified in its simple language (the sed script), and then outputs the line.

See sed manpage and sed scripts for detailed examples.

Functions

sed is often thought of as a non-interactive text editor. It differs from conventional text editors in that the processing of the two inputs is inverted. Instead of iterating once through a list of edit commands, applying each one to the whole text file in memory, sed iterates once through the text file, applying the whole list of edit commands to each line. Because only one line at a time is in memory, sed can process text files with an arbitrarily large number of lines, although some implementations limit the length of an individual line.

sed's command set is modeled after the ed editor, and most commands work similarly in this inverted paradigm. For example, the command 25d means "if this is line 25, then delete (don't output) it", rather than "go to line 25 and delete it" as it does in ed. The notable exceptions are the copy and move commands, which span a range of lines and thus don't have straightforward equivalents in sed. Instead, sed introduces an extra buffer called the hold space, and additional commands to manipulate it. The ed command to copy line 25 to after line 76 (25t76), for example, would be coded as two separate commands in sed (25h; 76G): the first stores the line in the hold space, and the second appends it after line 76 when that line is reached.
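
A minimal sketch of the hold-space idea on a five-line input, assuming a POSIX-style sed. The line numbers and sample text are chosen only for illustration. The h command saves line 2 in the hold space, and G appends the saved copy after line 4, which is what ed's t command does. Lowercase g at the same point would instead overwrite line 4 with the saved copy.

```shell
# Sample input: five lines (illustrative only).
input=$(printf 'one\ntwo\nthree\nfour\nfive\n')

# 2h: copy line 2 into the hold space.
# 4G: on line 4, append a newline plus the hold space to the pattern space.
copied=$(printf '%s\n' "$input" | sed '2h;4G')

# 4g: on line 4, replace the pattern space with the hold space instead.
replaced=$(printf '%s\n' "$input" | sed '2h;4g')

printf '%s\n---\n%s\n' "$copied" "$replaced"
```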

Usage

The following example shows a typical usage of sed, where the -e option indicates that the sed expression follows:

   sed -e 's/oldstuff/newstuff/g' inputFileName > outputFileName

The s stands for substitute; the g stands for global, which means that all matching occurrences in the line are replaced (without it, only the first match on each line is replaced). After the first slash is the regular expression to search for, and after the second slash is the expression to replace it with. The substitute command (s///) is by far the most powerful and most commonly used sed command.
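
The effect of the g flag can be seen with a short sketch. The sample text is made up for illustration.

```shell
# Without g, only the first match on each line is replaced.
first_only=$(printf 'old old old\n' | sed -e 's/old/new/')

# With g, every match on the line is replaced.
all_matches=$(printf 'old old old\n' | sed -e 's/old/new/g')

printf '%s\n%s\n' "$first_only" "$all_matches"
```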

sed is often used as a filter in a pipeline:

   generate_data | sed -e 's/x/y/'

That is, generate the data, but make the small change of replacing x with y.

Several substitutions or other commands can be put together in a file called, for example, subst.sed and then be applied using the -f option to read the commands from the file:

   sed -f subst.sed inputFileName > outputFileName
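
As a rough sketch of the -f workflow, the following writes a two-command script file and applies it to one line of input. The temporary directory, the file contents and the sample input are assumptions made for the example; only the name subst.sed comes from the text above.

```shell
# Create a scratch directory so the example leaves nothing behind.
workdir=$(mktemp -d)

# One sed command per line; lines starting with # are comments.
cat > "$workdir/subst.sed" <<'EOF'
# replace every "oldstuff", then the first "x" on each line
s/oldstuff/newstuff/g
s/x/y/
EOF

# Apply all commands in the file to each input line.
result=$(printf 'oldstuff x oldstuff\n' | sed -f "$workdir/subst.sed")
printf '%s\n' "$result"

rm -rf "$workdir"
```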

Besides substitution, other forms of simple processing are possible. For example, the following deletes empty lines or lines that only contain spaces:

   sed -e '/^ *$/d' inputFileName 

This example used some of the following regular expression metacharacters:

  • The caret (^) matches the beginning of the line.
  • The dollar sign ($) matches the end of the line.
  • The period (.) matches any single character.
  • The asterisk (*) matches zero or more occurrences of the preceding character or expression.
  • A bracketed expression delimited by [ and ] matches any of the characters inside the brackets.
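
The metacharacters listed above can be tried out with a small sketch. The five-line input is invented for the example.

```shell
# Input: three words, one empty line and one line of spaces.
input=$(printf 'cat\n\n   \ncot\ncut\n')

# ^, * and $ together: delete lines that are empty or contain only spaces.
# [ao]: match "cat" or "cot" but not "cut".
bracket=$(printf '%s\n' "$input" | sed -e '/^ *$/d' -e 's/^c[ao]t$/X/')

# The period matches any single character, so all three words match.
dot=$(printf '%s\n' "$input" | sed -e '/^ *$/d' -e 's/^c.t$/X/')

printf '%s\n---\n%s\n' "$bracket" "$dot"
```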

Complex sed constructs are possible, to the extent that it can be conceived of as a highly specialised, albeit simple, programming language. Flow of control, for example, can be managed with a label (a colon followed by the label name) and the branch instruction b: a b followed by a valid label name moves processing to the commands following that label, while a b with no label name branches to the end of the script.
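
A small sketch of branching, with an invented label name (skip) and invented input: lines starting with # jump past the substitution, while all other lines are edited.

```shell
input=$(printf '# foo stays\nfoo changes\n')

# "b skip" jumps to the label ":skip", bypassing the substitution on comment lines.
with_label=$(printf '%s\n' "$input" | sed -e '/^#/b skip' -e 's/foo/bar/' -e ':skip')

# A bare "b" branches to the end of the script, which has the same effect here.
without_label=$(printf '%s\n' "$input" | sed -e '/^#/b' -e 's/foo/bar/')

printf '%s\n---\n%s\n' "$with_label" "$without_label"
```

Each command is passed with its own -e here so that the label name is clearly terminated, which keeps the example portable between sed implementations.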

Commands

(the number in parentheses is the maximum number of addresses the command accepts)

(2)!cmd 
exclamation sign means "Don't apply to specified addresses"
(0)# 
comment
(0):label 
place a label
(1)= 
display line number
(2)D 
delete first part of the pattern space
(2)G 
append contents of hold area
(2)H 
append pattern space to the hold space
(2)N 
append next line
(2)P 
print first part of the pattern space
(1)a 
append text
(2)blabel 
branch to label
(2)c 
change lines
(2)d 
delete lines
(2)g 
get contents of hold area
(2)h 
hold pattern space (in a hold buffer)
(1)i 
insert lines
(2)l 
list lines
(2)n 
next line
(2)p 
print
(1)q 
quit
(1)r file 
read the contents of file
(2)tlabel 
test substitutions and branch on successful substitution
(2)w file 
write to file
(2)x 
exchange hold space with pattern space
(2){ 
group commands
(2)s/RE/replacement/[flags] 
substitute
(2)y/list1/list2/ 
translates list1 into list2
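
A few of the commands above, sketched on a four-line input that is made up for the example:

```shell
input=$(printf 'a\nb\nc\nd\n')

# p with two addresses: print lines 2 to 3 (-n suppresses the default output).
printed=$(printf '%s\n' "$input" | sed -n '2,3p')

# q with one address: stop after line 2.
quit=$(printf '%s\n' "$input" | sed '2q')

# = on the last line: print the number of lines.
count=$(printf '%s\n' "$input" | sed -n '$=')

# y: map a, b and c to A, B and C.
upper=$(printf '%s\n' "$input" | sed 'y/abc/ABC/')

# !: apply d to every line except line 2.
negated=$(printf '%s\n' "$input" | sed '2!d')

printf '%s\n' "$printed" "$quit" "$count" "$upper" "$negated"
```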

History

sed is one of the very early Unix commands that permitted command line processing of data files. It evolved as the natural successor to the popular grep command. Cousin to the later AWK, sed allowed powerful and interesting data processing to be done by shell scripts. sed was probably the earliest Unix tool that really encouraged regular expressions to be used ubiquitously. In terms of speed of operation, sed is generally faster than Perl and markedly faster than AWK.

sed and AWK are often cited as the progenitors and inspiration for Perl; in particular the s/// syntax from the example above is part of Perl's syntax.

sed's language does not have variables and has only primitive GOTO and branching functionality; nevertheless, the language is Turing-complete.

GNU sed includes several new features such as in-place editing of files (i.e., replacing the original file with the result of applying the sed program). In-place editing is often used instead of ed scripts; for example,

   sed -i 's/abc/def/' file

can be used instead of

   ed file
   1,$ s/abc/def/
   w
   q
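
A sketch of in-place editing on a scratch file, assuming GNU sed. The temporary directory, file contents and .bak suffix are choices made for the example. Attaching a suffix to -i keeps a backup copy of the original file.

```shell
# Work on a throwaway file so nothing real is modified.
workdir=$(mktemp -d)
printf 'abc abc\nxyz\n' > "$workdir/file"

# -i.bak rewrites the file in place and saves the original as file.bak.
sed -i.bak 's/abc/def/' "$workdir/file"

edited=$(cat "$workdir/file")
backup=$(cat "$workdir/file.bak")
printf '%s\n---\n%s\n' "$edited" "$backup"

rm -rf "$workdir"
```

As in the ed version, only the first match on each line is changed because the g flag is not given.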

There is an extended version of sed called Super-sed (ssed) that includes regular expressions compatible with Perl.

Samples

sed normally processes one line at a time. The following example makes it work across lines: it joins each line that starts with a single space onto the end of the preceding line.

Consider the following text:

 This is my cat
  my cat's name is betty
 This is my dog
  my dog's name is frank

The sed script below will turn it into:

 This is my cat my cat's name is betty
 This is my dog my dog's name is frank

Here's the script:

 sed 'N;s/\n / /;P;D;'
  • (N) append the next line of input to the pattern space
  • (s) substitute
  • (/\n /) match: a newline followed by one space
  • (/ /) replace with: one space
  • (P) print the pattern space up to the first newline
  • (D) delete the pattern space up to the first newline and run the script again on whatever remains
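
The script can be checked against the sample text with a short sketch, where the input is supplied through printf rather than a file:

```shell
# The sample text: continuation lines start with a single space.
input=$(printf "This is my cat\n my cat's name is betty\nThis is my dog\n my dog's name is frank\n")

# Join each continuation line onto the line before it.
joined=$(printf '%s\n' "$input" | sed 'N;s/\n / /;P;D;')

printf '%s\n' "$joined"
```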

Addresses (restricting a command to matching lines)

A command can be restricted to certain lines by prefixing it with an address, such as a pattern:

 /pattern1/s/pattern2/replacement/flags   
will replace pattern2 with replacement where pattern1 is matched.

Likewise:

 /pattern1/!s/pattern2/replacement/flags  
will replace pattern2 where pattern1 is *not* matched.

For example, if you have a file (text.txt) containing the following lines:

 Hello world.
 Hello world. I love sed.

And you want to replace "world" with "mom", but only on those lines that contain the word "sed", you can use:

 sed -e '/^.*sed.*$/s/world/mom/g' text.txt

This results in:

 Hello world.
 Hello mom. I love sed.

You can negate this behavior with:

 sed -e '/^.*sed.*$/!s/world/mom/g' text.txt

which will result in the opposite:

 Hello mom.
 Hello world. I love sed.
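
The two cases above can be reproduced with the following sketch, which feeds the sample lines through printf instead of text.txt. It uses the shorter pattern /sed/, which selects the same lines as /^.*sed.*$/, and adds a line-number address for comparison.

```shell
input=$(printf 'Hello world.\nHello world. I love sed.\n')

# Substitute only on lines that contain "sed".
matched=$(printf '%s\n' "$input" | sed -e '/sed/s/world/mom/g')

# Substitute only on lines that do not contain "sed".
unmatched=$(printf '%s\n' "$input" | sed -e '/sed/!s/world/mom/g')

# An address can also be a line number: substitute on line 2 only.
second_line=$(printf '%s\n' "$input" | sed -e '2s/world/mom/g')

printf '%s\n---\n%s\n---\n%s\n' "$matched" "$unmatched" "$second_line"
```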

SED emulating UNIX commands

Note: by Aurélio Marinho Jargas

UNIX         |  SED
-------------+----------------------------------------------------------------
cat          |  sed ':'
cat -s       |  sed '1s/^$//p;/./,/^$/!d'
tac          |  sed '1!G;h;$!d'
grep         |  sed '/patt/!d'
grep -v      |  sed '/patt/d'
head         |  sed '10q'
head -1      |  sed 'q'
tail         |  sed -e ':a' -e '$q;N;11,$D;ba'
tail -1      |  sed '$!d'
tail -f      |  sed -u '/./!d'
cut -c 10    |  sed 's/\(.\)\{10\}.*/\1/'
cut -d: -f4  |  sed 's/\(\([^:]*\):\)\{4\}.*/\2/'
tr A-Z a-z   |  sed 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/'
tr a-z A-Z   |  sed 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/'
tr -s ' '    |  sed 's/ \+/ /g'
tr -d '\012' |  sed 'H;$!d;g;s/\n//g'
wc -l        |  sed -n '$='
uniq         |  sed 'N;/^\(.*\)\n\1$/!P;D'
rev          |  sed '/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//'
basename     |  sed 's,.*/,,'
dirname      |  sed 's,[^/]*$,,'
xargs        |  sed -e ':a' -e '$!N;s/\n/ /;ta'
paste -sd:   |  sed -e ':a' -e '$!N;s/\n/:/;ta'
cat -n       |  sed '=' | sed '$!N;s/\n/ /'
grep -n      |  sed -n '/patt/{=;p;}' | sed '$!N;s/\n/:/'
cp orig new  |  sed 'w new' orig
hostname -s  |  hostname | sed 's/\..*//'
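
A few rows of the table can be compared with the commands they emulate. This is only a spot check on an invented input and an invented path, not a proof that the pairs are equivalent in every case.

```shell
input=$(printf 'alpha\nbeta\ngamma\nbeta\n')

# grep: keep only the lines that match the pattern.
grep_out=$(printf '%s\n' "$input" | grep 'beta')
sed_grep=$(printf '%s\n' "$input" | sed '/beta/!d')

# tail -1 (written here as tail -n 1): keep only the last line.
tail_out=$(printf '%s\n' "$input" | tail -n 1)
sed_tail=$(printf '%s\n' "$input" | sed '$!d')

# basename: strip everything up to the last slash.
base_out=$(basename /usr/local/bin/sed)
sed_base=$(printf '%s\n' /usr/local/bin/sed | sed 's,.*/,,')

printf '%s\n' "$sed_grep" "$sed_tail" "$sed_base"
```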
