💾 Archived View for perso.pw › blog › articles › wget-mirror.gmi captured on 2022-07-16 at 14:26:47. Gemini links have been rewritten to link to archived content


Download files listed in an HTTP index with wget

=> https://bsd.network/@solenepercent/104367119838935215 Comment on Mastodon

Sometimes I need to download files over HTTP from a list on an "autoindex" page, and it's always painful to find the correct command for this.

The easy solution is **wget**, but you need to use the correct parameters: wget has a lot of mirroring options and only a few specific ones are needed for this task.

I ended up with the following command:

wget --continue --accept "*.tgz" --no-directories --no-parent --recursive http://ftp.fr.openbsd.org/pub/OpenBSD/6.7/amd64/

This downloads every tgz file available at the address given as the last parameter.

These parameters restrict the download to **tgz** files, put the files in the current working directory and, most importantly, prevent wget from escaping to the parent directory and downloading from there. The `--continue` parameter lets you interrupt wget and start it again: files already downloaded will be skipped and partially downloaded files will be completed.
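
As a rough sketch, the same command can be wrapped in a small shell function using wget's short option names, with one comment per option. The function name `fetch_index` and the `WGET` override variable are hypothetical helpers added for illustration, they are not part of wget or of the command above.

```shell
#!/bin/sh
# fetch_index URL [PATTERN]
# Hypothetical helper: download every file matching PATTERN (default: *.tgz)
# from an autoindex page into the current directory.
fetch_index() {
    url=$1
    pattern=${2:-*.tgz}
    # -c  = --continue        resume partial files, skip complete ones
    # -A  = --accept          only keep files matching the pattern
    # -nd = --no-directories  do not recreate the remote tree locally
    # -np = --no-parent       never climb above the starting directory
    # -r  = --recursive       follow the links found in the index page
    # WGET is an assumed override: set WGET=echo to print the arguments
    # instead of running wget.
    ${WGET:-wget} -c -A "$pattern" -nd -np -r "$url"
}
```

Calling `fetch_index http://ftp.fr.openbsd.org/pub/OpenBSD/6.7/amd64/` should behave like the long-option command, and prefixing the call with `WGET=echo` gives a dry run that only prints the arguments.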

Note that the continue feature only works if your local file and the remote file are the same. wget simply looks at the local and remote file names and asks the remote server to start sending from the byte offset matching the size of your local file. If the remote file changed in the meantime, you will end up with a mix of the old and new file.
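
To make the "mix of old and new" problem concrete, here is a small offline simulation that does not use wget or the network at all. It only imitates a resume by appending the remote bytes that come after the current local size. The `simulate_resume` function and the file names are made up for this illustration.

```shell
#!/bin/sh
# simulate_resume LOCAL REMOTE
# Imitates a byte-range resume: append to LOCAL everything in REMOTE
# located after the first <size of LOCAL> bytes.
simulate_resume() {
    local_file=$1
    remote_file=$2
    size=$(wc -c < "$local_file" | tr -d ' ')
    tail -c +"$((size + 1))" "$remote_file" >> "$local_file"
}

workdir=$(mktemp -d)
printf 'AAAAAAAAAA' > "$workdir/old.tgz"    # remote file, first version
printf 'BBBBBBBBBB' > "$workdir/new.tgz"    # remote file after it changed

# The download was interrupted after 4 bytes of the old version.
head -c 4 "$workdir/old.tgz" > "$workdir/partial.tgz"

# Resuming against the changed remote file mixes both versions.
simulate_resume "$workdir/partial.tgz" "$workdir/new.tgz"
cat "$workdir/partial.tgz"    # prints AAAABBBBBB
```

The resulting file has the expected size but matches neither version, so when in doubt it is safer to delete the partial file and download it again.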

Obviously, the FTP protocol would be better suited for this download job, but FTP is less and less available, so I find **wget** to be a nice workaround.