Wget
wget — The non-interactive network downloader.
Usage
- Simple download:
$ wget http://www.example.com/index.html
- Download a file and store it locally using a different file name:
$ wget -O example.html http://www.example.com/index.html
- Background download:
$ wget -b https://www.kernel.org/pub/linux/kernel/v4.x/linux-4.0.4.tar.gz
The above command is useful when you initiate a download on a remote machine: wget detaches and downloads in the background, so you can disconnect from the terminal once the command is issued.
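With -b, wget writes its progress to a log file (wget-log in the current directory, unless overridden with -o), so you can check on the download at any time:
$ tail -f wget-log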
- Mirror an entire web site:
$ wget -m http://www.example.com
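The wget manual notes that -m is currently equivalent to -r -N -l inf --no-remove-listing, so the same mirror can be written out explicitly:
$ wget -r -N -l inf --no-remove-listing http://www.example.com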
- Mirror an entire subdirectory of a web site (-k converts links for local viewing, -w 20 waits 20 seconds between requests, and -np keeps wget from ascending to the parent directory when pages link back up the tree):
$ wget -mk -w 20 -np http://example.com/foo/
- Download all pages from a site and the pages the site links to (one-level deep):
$ wget -H -r --level=1 -k -p http://www.example.com
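Because -H allows the crawl to span onto other hosts, it can wander further than intended; --domains restricts it to a comma-separated whitelist (the list here is a placeholder):
$ wget -H -r --level=1 -k -p --domains=example.com http://www.example.com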
- Resume a large file download (-c continues from an existing partial file):
$ wget -c --output-document=MIT8.01F99-L01.mp4 "https://www.youtube.com/watch?v=X9c0MRooBzQ"
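If the transfer is interrupted, re-running the exact same command resumes from the end of the partial file; without -c, --output-document would simply truncate the partial file and start over.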
- Schedule hourly downloads of a file:
$ wget --output-document=traffic_$(date +\%Y\%m\%d\%H).gif "http://sm3.sitemeter.com/YOUR_CODE"
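The escaped percent signs (\%) suggest this command is meant to run from a crontab, where an unescaped % is treated as a newline. A minimal sketch of the corresponding hourly crontab entry (YOUR_CODE remains the placeholder from the command above):
0 * * * * wget --output-document=traffic_$(date +\%Y\%m\%d\%H).gif "http://sm3.sitemeter.com/YOUR_CODE"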
- Automatically download music (by Jeff Veen):
$ wget -r -l1 -H -t1 -nd -N -np -A.mp3 -erobots=off -i mp3_sites.txt
where mp3_sites.txt lists your favourite (legal) download sites.
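For reference: -r recurses, -l1 limits the recursion to one level, -H lets the crawl span onto other hosts, -t1 tries each file only once, -nd saves everything into the current directory instead of recreating the remote directory tree, -N skips files no newer than a local copy, -np never ascends to the parent directory, -A.mp3 keeps only files ending in .mp3, -e robots=off ignores robots.txt, and -i reads the start URLs from the named file. A hypothetical mp3_sites.txt is simply one URL per line, for example:
http://music.example.com/free/
http://mp3.example.org/live-sets/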
- Crawl a website and generate a log file of any broken links:
$ wget --spider -o wget.log -e robots=off --wait 1 -r -p http://www.example.com/
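Here --spider makes wget check links without saving anything, -o writes the crawl log to wget.log, and --wait 1 pauses a second between requests. Afterwards, one way to pull the failures out of the log is to grep for the error responses:
$ grep -B 2 '404 Not Found' wget.log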
Download multiple files
- Create a variable that holds all of the URLs, then use a Bash for loop to download each one:
% URLS="http://www.example.com/foo.tar.gz ftp://ftp.example.org/pub/bar.tar.gz"
- Use the for loop as follows:
% for u in $URLS; do wget $u; done
- You can also put a list of the URLs in a file and download them using the -i option:
% wget -i download.txt
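For long lists, a parallel variant can help; this sketch assumes GNU xargs (for the -P option) and runs up to four wget processes at a time, one URL each:
% xargs -n 1 -P 4 wget < download.txt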
Automating/scripting download process
#!/bin/sh
# wget-list: manage the list of downloaded files
# invoke wget-list without arguments
# loop while .wget-list is non-empty (find prints the name only for files of size > 0)
while [ `find .wget-list -size +0` ]
do
    url=`head -n1 .wget-list`   # take the first URL on the list
    wget -c $url                # fetch it, resuming any partial download
    sed -si 1d .wget-list       # then remove it from the list
done
#!/bin/sh
# wget-all: process .wget-list in every subdirectory
# invoke wget-all without arguments
find -name .wget-list -execdir wget-list ';'
#!/bin/bash
# wget-dirs: run wget-all in specified directories
# invoking: wget-dirs <path-to-directory> ...
# (bash rather than sh: pushd/popd are bash builtins)
for dir in $*
do
    pushd $dir
    wget-all
    popd
done
wget-all   # finally, process the current directory as well
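A hypothetical end-to-end session, assuming all three scripts are executable and on your PATH:
% echo "http://www.example.com/foo.tar.gz" > ~/downloads/.wget-list
% wget-dirs ~/downloads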
See also
External links
- GNU Wget Manual: http://www.gnu.org/software/wget/manual/ — last update: 15-Jun-2005
- Geek to Live: Mastering Wget: http://lifehacker.com/161202/geek-to-live--mastering-wget — via lifehacker.com
- wget: your ultimate command line downloader: http://www.cyberciti.biz/nixcraft/vivek/blogger/2005/06/linux-wget-your-ultimate-command-line.php