How to download a website using wget

Sometimes you need to download a website, or a single webpage together with everything it links to. Right now, I need to download a homework assignment webpage that could be removed at any time. (TT It’s already past due.) Here, I am going to use wget.

I will just list the options that can be used to achieve this.

  • --no-clobber: When a file would be downloaded more than once, this option prevents wget from overwriting the existing copy or saving a second, numbered copy. A file that already exists locally is simply not downloaded again.
  • --restrict-file-names=[modes]: Controls which characters from the URL are escaped in local file names. The modes are a comma-separated set of values: unix, windows, nocontrol, ascii, lowercase, uppercase. On Unix-like systems the default is unix. Set it to windows if the downloaded files have to be valid file names on a Windows system.
  • --recursive: Turn on recursive retrieval, so that wget follows the links found on the starting page.
  • --adjust-extension: This is useful when you are downloading dynamic links or CGI pages. If a downloaded file is HTML but its URL does not end in .html or .htm (for example asp, php, or cgi pages), wget appends .html to the local file name so that you can open the saved page as plain HTML.
  • --domains [domain list]: You may need to set a boundary. This option takes a comma-separated list of domains that wget is allowed to follow, which prevents it from wandering beyond them and downloading links that point outside.
  • --page-requisites: This option causes wget to download all the files that are necessary to properly display a given HTML page, such as images, stylesheets, and scripts.
  • --no-parent: Like --domains, this sets a boundary. wget will never ascend to a parent directory during recursive retrieval, which restricts the target files to those below the given directory.
  • --convert-links: After the download finishes, this changes the links in the downloaded documents to make them suitable for local viewing. For example, if the downloaded file /foo/doc.html links to /bar/img.gif, which was also downloaded, the link in doc.html is rewritten to the relative path ../bar/img.gif.
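
Most of the options above also have short forms, which are handy when typing the command by hand. The snippet below is a small sketch that only prints a command rather than running it; example.com is a placeholder host and short_opts is just a variable name chosen for illustration.

```shell
# Short forms of the options listed above
# (--restrict-file-names has no short form):
#   -nc  --no-clobber
#   -r   --recursive
#   -E   --adjust-extension
#   -D   --domains
#   -p   --page-requisites
#   -np  --no-parent
#   -k   --convert-links
short_opts="-nc -r -E -p -np -k"

# example.com is a placeholder; echo prints the command instead of running it.
echo "wget $short_opts -D example.com http://example.com/docs/index.html"
```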

Here is the command for my case:

wget \
--no-clobber \
--recursive \
--adjust-extension \
--domains www.cs.indiana.edu \
--page-requisites \
--no-parent \
--convert-links \
http://www.cs.indiana.edu/classes/p415-sjoh/hw/project/index.htm
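
If you do this often, the command can be wrapped in a small helper that takes the host for --domains from the URL itself. This is only a sketch: url_host and mirror_page are hypothetical names, not part of wget, and the URL handling assumes a simple scheme://host[:port]/path form without user info or IPv6 literals. By default the helper prints the command so you can check it before running anything.

```shell
# Hypothetical helpers; neither function is part of wget.
# Both bodies use ( ... ) so their variables stay inside a subshell.

# Print the host part of a URL, e.g. http://example.com:8080/a/b -> example.com
url_host() (
  rest="${1#*://}"             # drop the scheme, if any
  rest="${rest%%/*}"           # drop the path
  printf '%s\n' "${rest%%:*}"  # drop an optional port
)

# Print the wget command for a URL.
# Pass "run" as the second argument to execute it instead.
mirror_page() (
  url="$1"
  mode="${2:-print}"
  host="$(url_host "$url")"
  set -- wget --no-clobber --recursive --adjust-extension \
    --domains "$host" --page-requisites --no-parent \
    --convert-links "$url"
  if [ "$mode" = "run" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
)

# Dry run: only prints the command.
mirror_page "http://www.cs.indiana.edu/classes/p415-sjoh/hw/project/index.htm"
```

Calling mirror_page with run as the second argument starts the actual download into the current directory.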
