Wget
wget — The non-interactive network downloader.
Usage
- Simple download:
$ wget http://www.example.com/index.html
- Download a file and store it locally using a different file name:
$ wget -O example.html http://www.example.com/index.html
- Background download:
$ wget -b https://www.kernel.org/pub/linux/kernel/v4.x/linux-4.0.4.tar.gz
The above command is useful when you initiate a download on a remote machine: wget detaches and downloads in the background, so you can disconnect from the terminal once the command is issued.
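With -b, wget writes its progress to a log file (wget-log in the current directory, unless overridden with -o), so you can check on the download at any time:
$ tail -f wget-log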
- Mirror an entire web site:
$ wget -m http://www.example.com
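The wget manual notes that -m is currently equivalent to -r -N -l inf --no-remove-listing, so the same mirror can be written out explicitly:
$ wget -r -N -l inf --no-remove-listing http://www.example.com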
- Mirror an entire subdirectory of a web site (-k converts links for local viewing, -w 20 waits 20 seconds between requests, and -np keeps wget from ascending to the parent directory when pages link back up the tree):
$ wget -mk -w 20 -np http://example.com/foo/
- Download all pages from a site and the pages the site links to (one-level deep):
$ wget -H -r --level=1 -k -p http://www.example.com
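Because -H allows the crawl to span onto other hosts, it can wander further than intended; --domains restricts it to a comma-separated whitelist (the list here is a placeholder):
$ wget -H -r --level=1 -k -p --domains=example.com http://www.example.com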
- Resume a large file download (-c continues from an existing partial file):
$ wget -c --output-document=MIT8.01F99-L01.mp4 "https://www.youtube.com/watch?v=X9c0MRooBzQ"
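If the transfer is interrupted, re-running the exact same command resumes from the end of the partial file; without -c, --output-document would simply truncate the partial file and start over.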
- Schedule hourly downloads of a file:
$ wget --output-document=traffic_$(date +\%Y\%m\%d\%H).gif "http://sm3.sitemeter.com/YOUR_CODE"
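The escaped percent signs (\%) suggest this command is meant to run from a crontab, where an unescaped % is treated as a newline. A minimal sketch of the corresponding hourly crontab entry (YOUR_CODE remains the placeholder from the command above):
0 * * * * wget --output-document=traffic_$(date +\%Y\%m\%d\%H).gif "http://sm3.sitemeter.com/YOUR_CODE"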
- Automatically download music (by Jeff Veen):
$ wget -r -l1 -H -t1 -nd -N -np -A.mp3 -erobots=off -i mp3_sites.txt
where mp3_sites.txt lists your favourite (legal) download sites.
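For reference: -r recurses, -l1 limits the recursion to one level, -H lets the crawl span onto other hosts, -t1 tries each file only once, -nd saves everything into the current directory instead of recreating the remote directory tree, -N skips files no newer than a local copy, -np never ascends to the parent directory, -A.mp3 keeps only files ending in .mp3, -e robots=off ignores robots.txt, and -i reads the start URLs from the named file. A hypothetical mp3_sites.txt is simply one URL per line, for example:
http://music.example.com/free/
http://mp3.example.org/live-sets/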
- Crawl a website and generate a log file of any broken links:
$ wget --spider -o wget.log -e robots=off --wait 1 -r -p http://www.example.com/
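Here --spider makes wget check links without saving anything, -o writes the crawl log to wget.log, and --wait 1 pauses a second between requests. Afterwards, one way to pull the failures out of the log is to grep for the error responses:
$ grep -B 2 '404 Not Found' wget.log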
Download multiple files
- Create a variable that holds all of the URLs, then use a Bash for loop to download each one:
% URLS="http://www.example.com/foo.tar.gz ftp://ftp.example.org/pub/bar.tar.gz"
- Use the for loop as follows:
% for u in $URLS; do wget $u; done
- You can also put a list of the URLs in a file and download them using the -i option:
% wget -i download.txt
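For long lists, a parallel variant can help; this sketch assumes GNU xargs (for the -P option) and runs up to four wget processes at a time, one URL each:
% xargs -n 1 -P 4 wget < download.txt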
Automating/scripting download process
#!/bin/sh
# wget-list: manage the list of downloaded files
# invoke wget-list without arguments
# loop while .wget-list is non-empty (find prints the name only for files of size > 0)
while [ `find .wget-list -size +0` ]
do
    url=`head -n1 .wget-list`   # take the first URL on the list
    wget -c $url                # fetch it, resuming any partial download
    sed -si 1d .wget-list       # then remove it from the list
done
#!/bin/sh
# wget-all: process .wget-list in every subdirectory
# invoke wget-all without arguments
find -name .wget-list -execdir wget-list ';'
#!/bin/bash
# wget-dirs: run wget-all in specified directories
# invoking: wget-dirs <path-to-directory> ...
# (bash rather than sh: pushd/popd are bash builtins)
for dir in $*
do
    pushd $dir
    wget-all
    popd
done
wget-all   # finally, process the current directory as well
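A hypothetical end-to-end session, assuming all three scripts are executable and on your PATH:
% echo "http://www.example.com/foo.tar.gz" > ~/downloads/.wget-list
% wget-dirs ~/downloads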
See also
External links
- GNU Wget Manual: http://www.gnu.org/software/wget/manual/ — last update: 15-Jun-2005
- Geek to Live: Mastering Wget: http://lifehacker.com/161202/geek-to-live--mastering-wget — via lifehacker.com
- wget: your ultimate command line downloader: http://www.cyberciti.biz/nixcraft/vivek/blogger/2005/06/linux-wget-your-ultimate-command-line.php