How to Download All Pages of a Web Site in Linux

Wget is a powerful open source program for Linux that can be used to download content from websites. Sometimes it is necessary to make a copy of all the content on a website, perhaps to create a mirror of the site or to preserve the current content for later use. With Wget, an entire website can be downloaded with a single one-line command.

Make sure that Wget is installed by using the “which” command. Typing “which wget”, without quotes, in a terminal will show where Wget is installed. If nothing is returned, Wget is not installed; use the distribution’s package manager, such as apt-get or yum, to install it.
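For example, the check and install might look like the following (the first command applies to Debian- and Ubuntu-based systems, the second to Fedora, CentOS and other RPM-based distributions):

which wget
sudo apt-get install wget
sudo yum install wget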

Set the flag to mirror the website. Wget has multiple flags that can be set to alter the program’s behavior. The “-m” flag is used to download an entire website. For instance, type:

wget -m www.fake-web-site.com

This creates a local copy of “www.fake-web-site.com” on your computer. By default, Wget creates a directory with the same name as the website inside the directory in which it was executed.
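If the goal is a copy that can be browsed offline, the mirror flag is often combined with a few other standard Wget options. One common combination, using the same example site, is:

wget -m -k -p -E www.fake-web-site.com

Here “-k” rewrites links in the downloaded pages so they point to the local copies, “-p” fetches page requisites such as images and stylesheets, and “-E” adds “.html” extensions to saved pages where needed.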

Check that the content was acquired by opening the local copy of the page in a browser.
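For instance, assuming the site’s front page was saved as index.html inside the newly created directory, it can be opened in the default browser with:

xdg-open www.fake-web-site.com/index.html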

Max Powers is a Python software developer in Austin, Texas. Powers received his Ph.D. in physics from the University of South Carolina in 2008, and has contributed to scientific publications such as “The Astrophysical Journal.”
