All source listed below is under MIT license if no LICENSE file stating different is available.

Downie

Downie is a full site downloader. I tried several downloaders and none of them could download big sites like molodetz.nl (containing 300.000+ links / resources). See Features for what makes this site downloader better than the competition.

If you encounter a site that can't be downloaded using this downloader, contact me at retoor@molodetz.nl. You can also create a bug ticket after logging in. I respond to both within 24h.

Features

  • Lightning-fast async fetching of your pages, with many requests running concurrently.
  • Fetching is limited to the given domain. Remote URLs are not followed, and neither are redirects to remote URLs (yes, that can be an issue).
  • Discriminates between files and folders in URLs by adding .html to pages that have no extension. So /pony becomes /pony.html, which keeps /pony/photos.html possible.
  • Absolute URLs are converted to relative URLs so that the site becomes portable and works in isolation on your web server.
  • Progress is saved. You do not have to start over completely after exiting the application. It will just continue.
  • Caching of already visited urls.
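
The URL handling described in the list above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not Downie's actual code: the function names is_same_domain, local_path_for and relative_link are hypothetical, and the index.html fallback for the site root is an assumption.

```python
import posixpath
from urllib.parse import urlparse


def is_same_domain(url: str, root_url: str) -> bool:
    """Return True when url lives on the same host as root_url."""
    return urlparse(url).netloc == urlparse(root_url).netloc


def local_path_for(url: str) -> str:
    """Map a URL to a relative file path, adding .html when it has no extension."""
    path = urlparse(url).path.strip("/")
    if not path:
        # Assumption: the site root is stored as index.html.
        return "index.html"
    if not posixpath.splitext(path)[1]:
        path += ".html"
    return path


def relative_link(from_url: str, to_url: str) -> str:
    """Rewrite a link so it is relative to the page that contains it."""
    from_dir = posixpath.dirname(local_path_for(from_url))
    return posixpath.relpath(local_path_for(to_url), start=from_dir or ".")
```

With this mapping, /pony is stored as pony.html next to the pony/ folder, and a link from /pony/photos.html back to /pony becomes ../pony.html.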

Using Downie

After following the installation instructions, use downie https://target-site.com if globally installed. If not globally installed, execute ./downie https://target-site.com from the current directory.

Installation

Debian (for development)

The only requirement is make. Install it by executing sudo apt install make -y in the terminal. All required Python dependencies are then installed by executing make install. This first installs the following system packages using apt:

  • python
  • python3-venv
  • python3-pip

Chances are you already have them. After that:

  1. A Python environment will be created named .venv. You can ignore this folder.
  2. Required Python libraries will be installed (aiohttp, aiofiles, beautifulsoup4, app (molodetz)).
  3. Downie will be installed. (For the devs: in edit mode with -e!).
  4. An executable file named downie will be placed in the current directory. This is the main application. Tip: make downie globally accessible by copying downie to /usr/local/bin.

Other operating systems (Windows / Mac, not for development)

You have to figure out how to install python3, python3-venv and python3-pip on your OS. Chances are you already have them. Once they are on your system, run the following commands:

python3 -m venv .venv 
source .venv/bin/activate 
pip install git+https://retoor.molodetz.nl/retoor/downie.git
cp .venv/bin/downie .

Tip: make downie globally accessible by copying downie to /usr/local/bin.

Default configuration / internals

Not defined by CLI parameters, but configurable as Downie class constructor parameters:

  • concurrent_request_count - Number of concurrent requests. This defaults to 500.
  • concurrent_write_count - Number of concurrent file downloads. This defaults to 10.

These values are based on my usage. There may be more optimal parameters.
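
To give an idea of what such a concurrency limit does, here is a minimal sketch of capping concurrent work with asyncio.Semaphore. It is an assumption about the general technique, not Downie's actual implementation: run_limited, fake_fetch and demo are hypothetical names, and fake_fetch only sleeps instead of doing a real request.

```python
import asyncio


async def run_limited(jobs, limit):
    """Run coroutine factories with at most `limit` of them active at once."""
    semaphore = asyncio.Semaphore(limit)
    active = 0
    peak = 0

    async def guarded(job):
        nonlocal active, peak
        async with semaphore:
            active += 1
            peak = max(peak, active)
            try:
                return await job()
            finally:
                active -= 1

    # gather keeps the results in the same order as the jobs.
    results = await asyncio.gather(*(guarded(job) for job in jobs))
    return results, peak


async def fake_fetch(n):
    # Stand-in for a network request; a real crawler would use aiohttp here.
    await asyncio.sleep(0.01)
    return n * 2


def demo():
    jobs = [lambda n=n: fake_fetch(n) for n in range(20)]
    return asyncio.run(run_limited(jobs, limit=5))
```

Calling demo() runs 20 fake requests while never having more than 5 in flight. A higher limit trades memory and load on the target server for speed, which is why the best values depend on the site being downloaded.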

Running a downloaded site:

Open site statically

Just open a page (.html) from the site using your browser.

Serve the site

Simple, run:

python3 -m http.server [port]

If you don't know what port to use, use 7331. Your website is now accessible at http://localhost:7331.

Caution

URLs in the downloaded content can be modified to make the site portable. So it is possible that your downloaded site is not literally the content that is online. It is optimized for offline use, because a literal copy of the site could be unusable.
