Difference between revisions of "Here document"
Latest revision as of 00:46, 29 June 2020
In computing, a here document (here-document, here-text, heredoc, hereis, here-string, or here-script) is a file literal or input stream literal: it is a section of a source code file that is treated as if it were a separate file. The term is also used for a form of multiline string literal that uses similar syntax, preserving line breaks and other whitespace (including indentation) in the text.
Narrowly speaking, here documents are file literals or stream literals. These originate in the Unix shell, though similar facilities are available in some other languages.
Here documents are available in many Unix shells. In the following example, text is passed to the tr command (transliterating lower to upper-case) using a here document. This could be in a shell file or entered interactively at a prompt.

$ LANG=C tr a-z A-Z << END_TEXT
> one two three
> four five six
> END_TEXT
ONE TWO THREE
FOUR FIVE SIX
END_TEXT was used as the delimiting identifier. It specified the start and end of the here document. The redirect and the delimiting identifier do not need to be separated by a space: <<END_TEXT and << END_TEXT both work equally well.
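The spacing rule can be checked with a small sketch that captures the output of both spellings and compares them. The variable names with_space and without_space are illustrative only and are not part of the original example.

```shell
# Capture the output of tr for each spelling of the redirection.
with_space=$(LANG=C tr a-z A-Z << END_TEXT
one two three
four five six
END_TEXT
)

without_space=$(LANG=C tr a-z A-Z <<END_TEXT
one two three
four five six
END_TEXT
)

printf '%s\n' "$with_space"

# The two captures should be byte-for-byte identical.
if [ "$with_space" = "$without_space" ]; then
    echo "both spellings produce the same output"
fi
```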
By default, the behaviour is largely identical to the contents of double quotes: variable names are replaced by their values, commands within backticks are evaluated, etc.
$ cat << EOF
> \$ Working dir "$PWD" `pwd`
> EOF
$ Working dir "/home/user" /home/user
This can be disabled by quoting any part of the label; the here document is then ended by the unquoted value of the label. "Quoting" includes escaping, so if \EOF is used, the label is quoted, variable interpolation does not occur, and the here document ends with EOF, while if \\EOF is used, the label is quoted and the here document ends with \EOF. This perhaps surprising behaviour is easy to implement in a shell: the tokenizer simply records that a token was quoted (during the evaluation phase of lexical analysis), without needing to preserve the original, quoted value.
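As a minimal sketch of the difference an escaped label makes, the following captures the same body twice, once with a plain label and once with \EOF. The variable names name, expanded and literal are hypothetical and chosen only for illustration.

```shell
name=world   # hypothetical variable used to make expansion visible

# Unquoted label: the body is expanded like the contents of double quotes.
expanded=$(cat << EOF
hello $name
EOF
)

# Escaped label: the body is taken literally, and the terminator is plain EOF.
literal=$(cat << \EOF
hello $name
EOF
)

printf '%s\n' "$expanded"
printf '%s\n' "$literal"
```

With the plain label the first line printed contains the value of name; with the escaped label the text $name is passed through unchanged.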
One application is to use \' as the starting delimiter and thus ' as the ending delimiter, which is similar to a multiline string literal but strips the starting and ending line breaks. With a quoted label, the behaviour is essentially identical to that of contents enclosed in single quotes. For example, with the label set in single quotes:

$ cat << 'EOF'
> \$ Working dir "$PWD" `pwd`
> EOF
\$ Working dir "$PWD" `pwd`
Double quotes may also be used, but this is subject to confusion, because expansion does occur in a double-quoted string, but does not occur in a here document with double-quoted delimiter. Single- and double-quoted delimiters are distinguished in some other languages, notably Perl (see below), where behaviour parallels the corresponding string quoting.
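The potential for confusion can be seen by placing a double-quoted string next to a here document with a double-quoted delimiter. This is a small sketch; name, as_string and as_heredoc are hypothetical names.

```shell
name=world   # hypothetical variable

# A double-quoted string: $name IS expanded.
as_string="hello $name"

# A here document with a double-quoted delimiter: $name is NOT expanded.
as_heredoc=$(cat << "EOF"
hello $name
EOF
)

printf '%s\n' "$as_string"
printf '%s\n' "$as_heredoc"
```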
Appending a minus sign to the << (i.e. <<-) causes leading tabs to be ignored (this is not supported in csh or tcsh). This allows here documents in shell scripts to be indented (primarily for alignment with existing indentation) without changing their value. Note that while tabs can typically be entered directly in editors, at the command line they are typically entered with Ctrl-V + Tab instead, because of tab completion. In the following example the indentation consists of actual tabs, so the example can be copied and pasted.
A script containing:

LANG=C tr a-z A-Z <<- END_TEXT
Here doc with <<-
 A single space character (i.e. 0x20 ) is at the beginning of this line
	This line begins with a single TAB character i.e 0x09 as does the next line
	END_TEXT

echo The intended end was before this line
echo and these were not processed by tr
echo +++++++++++++++

LANG=C tr a-z A-Z << END_TEXT
Here doc with <<
 A single space character (i.e. 0x20 ) is at the beginning of this line
	This line begins with a single TAB character i.e 0x09 as does the next line
	END_TEXT

echo The intended end was before this line,
echo but because the line with the delimiting Identifier began with a TAB it was NOT recognized and
echo the tr command continued processing.

produces:

HERE DOC WITH <<-
 A SINGLE SPACE CHARACTER (I.E. 0X20 ) IS AT THE BEGINNING OF THIS LINE
THIS LINE BEGINS WITH A SINGLE TAB CHARACTER I.E 0X09 AS DOES THE NEXT LINE
The intended end was before this line
and these were not processed by tr
+++++++++++++++
HERE DOC WITH <<
 A SINGLE SPACE CHARACTER (I.E. 0X20 ) IS AT THE BEGINNING OF THIS LINE
	THIS LINE BEGINS WITH A SINGLE TAB CHARACTER I.E 0X09 AS DOES THE NEXT LINE
	END_TEXT

ECHO THE INTENDED END WAS BEFORE THIS LINE,
ECHO BUT BECAUSE THE LINE WITH THE DELIMITING IDENTIFIER BEGAN WITH A TAB IT WAS NOT RECOGNIZED AND
ECHO THE TR COMMAND CONTINUED PROCESSING.
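Because literal tabs are easily lost when text is copied, the following sketch builds the tab with printf and assembles the script text in a variable before running it with eval. This is only one way to demonstrate the behaviour; the names tab, stripped and kept and the delimiter END are hypothetical.

```shell
tab=$(printf '\t')   # a real TAB character, built so it survives copy and paste

# With <<- the leading tab is removed from the body and from the delimiter line.
# A leading space is not a tab, so it is left alone.
stripped=$(eval "cat <<- END
${tab}indented with a tab
 indented with a space
${tab}END
")

# With plain << the leading tab stays in the body
# (the delimiter must then start at the beginning of its line).
kept=$(eval "cat << END
${tab}indented with a tab
END
")

printf '[%s]\n' "$stripped"
printf '[%s]\n' "$kept"
```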
Another use is to output to a file:
$ cat << EOF > ~/testFile001
>    3 spaces precede this text.
> 	A single tab character is at the beginning of this line.
> Nothing precedes this text
> EOF
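In a script, the same redirection can be tried without touching the home directory by writing to a scratch file instead. This sketch assumes the mktemp utility is available; outfile and contents are hypothetical names.

```shell
outfile=$(mktemp)   # scratch file standing in for ~/testFile001

# The here document is the input of cat; the second redirection sends
# cat's output to the file.
cat << EOF > "$outfile"
   3 spaces precede this text.
Nothing precedes this text
EOF

# Read the file back to confirm that leading whitespace was preserved.
contents=$(cat "$outfile")
printf '%s\n' "$contents"

rm -f "$outfile"
```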
A here string (available in bash, ksh, or zsh) is syntactically similar, consisting of <<<, and effects input redirection from a word (a sequence treated as a unit by the shell, in this context generally a string literal). The word uses the usual shell syntax; the only special syntax is the redirection itself. A here string is thus an ordinary string used for input redirection, not a special kind of string.
A single word need not be quoted:
$ LANG=C tr a-z A-Z <<< one
ONE
In case of a string with spaces, it must be quoted:
$ LANG=C tr a-z A-Z <<< 'one two three'
ONE TWO THREE
This could also be written as:
$ foo='one two three'
$ LANG=C tr a-z A-Z <<< "$foo"
ONE TWO THREE
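Since the word after <<< is an ordinary shell word, it can come from a variable or from a command substitution. The following is a minimal sketch that assumes bash (or another shell with here strings); from_var and from_cmd are hypothetical names.

```shell
foo='one two three'

# The word is a quoted variable expansion.
from_var=$(LANG=C tr a-z A-Z <<< "$foo")

# The word is a quoted command substitution.
from_cmd=$(LANG=C tr a-z A-Z <<< "$(printf 'four five')")

printf '%s\n' "$from_var" "$from_cmd"
```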
Multiline strings are acceptable, yielding:
$ LANG=C tr a-z A-Z <<< 'one
> two three'
ONE
TWO THREE
Note that leading and trailing newlines, if present, are included:
$ LANG=C tr a-z A-Z <<< '
> one
> two three
> '

ONE
TWO THREE

$
The key difference is that in a here document the delimiters are on separate lines, so the line breaks next to the delimiters are stripped rather than treated as content. A here string uses no delimiters, so any leading and trailing newlines in the word are passed through.
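One way to observe this is to count the lines each form delivers. The sketch below assumes bash, which also appends one newline to the word of a here string, and uses grep -c '' to count lines; the variable names are hypothetical.

```shell
# Here string whose word starts and ends with a newline:
# an empty line, "one", and another empty line reach the command.
padded_lines=$(grep -c '' <<< '
one
')

# Here string with no extra newlines: a single line.
plain_lines=$(grep -c '' <<< 'one')

# Here document: the delimiter lines are not part of the input.
heredoc_lines=$(grep -c '' << EOF
one
EOF
)

echo "padded here string: $padded_lines line(s)"
echo "plain here string:  $plain_lines line(s)"
echo "here document:      $heredoc_lines line(s)"
```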
Here strings are particularly useful for commands that often take short input, such as the calculator bc:
$ bc <<< 2^10
1024

# or, in a loop:
$ for i in $(seq 1 10); do bc <<< 2^$i; done
2
4
8
16
32
64
128
256
512
1024
Note that here string behavior can also be accomplished (with the order reversed) via piping and the echo command, as in:

$ echo 'one two three' | LANG=C tr a-z A-Z
ONE TWO THREE
However, here strings are particularly useful when the last command needs to run in the current process, as is the case with the read builtin:

$ echo 'one two three' | read -r a b c
$ echo "$a $b $c"

yields nothing, while

$ read -r a b c <<< 'one two three'
$ echo "$a $b $c"
one two three

This happens because in the first example piping causes read to run in a subprocess, and as such it cannot affect the environment of the parent process.
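The same comparison can be written as a script. This sketch assumes bash with its default options, where every command of a pipeline runs in a subshell; other shells may run the last command of a pipeline in the current process. The names piped and direct are hypothetical.

```shell
unset a b c

# Pipeline: in bash (default options) read runs in a subshell,
# so a, b and c are still unset afterwards.
echo 'one two three' | read -r a b c
piped=${a-unset}

# Here string: read runs in the current shell and sets the variables.
read -r a b c <<< 'one two three'
direct="$a $b $c"

echo "after pipeline:    a is ${piped}"
echo "after here string: $direct"
```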