Downie
Downie is a full site downloader. I tried several downloaders, and none of them could download big sites like molodetz.nl (containing 300,000+ links / resources). See Features for what makes this site downloader better than the competition.
If you encounter a site that can't be downloaded using this downloader, contact me at retoor@molodetz.nl. You can also create a bug ticket after logging in. I respond to both within 24 hours.
Features
- Fast asynchronous fetching of your pages, with many requests running concurrently.
- Fetching is limited to the given domain. Remote URLs are not followed. Redirects to remote URLs (yes, that can be an issue) are not followed either.
- Discriminates between file and folder for URLs by adding .html to pages not having an extension. So /pony becomes /pony.html so that /pony/photos.html is possible.
- Absolute URLs will be converted to relative URLs so that the site becomes portable and will work isolated on your webserver.
- Progress is saved. You do not have to start over completely after exiting the application. It will just continue.
- Caching of already visited URLs.
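The URL handling described in the features above can be sketched as follows. This is a minimal illustration and not Downie's actual code: the function names is_same_domain, local_path and relative_link are hypothetical, and mapping the site root to /index.html is an assumption.

```python
import posixpath
from urllib.parse import urlsplit


def is_same_domain(url: str, root: str) -> bool:
    """Return True when url is on the same host as the root URL."""
    return urlsplit(url).netloc == urlsplit(root).netloc


def local_path(url: str) -> str:
    """Map a URL to a file path, adding .html when there is no extension."""
    path = urlsplit(url).path
    if path in ("", "/"):
        return "/index.html"  # assumption: the site root is stored as index.html
    path = path.rstrip("/")
    if not posixpath.splitext(path)[1]:
        path += ".html"
    return path


def relative_link(from_page: str, to_page: str) -> str:
    """Express to_page relative to the folder that contains from_page."""
    return posixpath.relpath(to_page, start=posixpath.dirname(from_page))
```

With this mapping, /pony is stored as the file /pony.html, which leaves the name /pony free to be a folder holding /pony/photos.html. A link from the photos page back to the pony page would then be written as ../pony.html.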
Using Downie
After following the installation instructions, use downie https://target-site.com if globally installed. If not globally installed, execute ./downie https://target-site.com from the current directory.
Installation
Debian (for development)
The only requirement is make. Install it by executing sudo apt install make -y in the terminal.
All required Python dependencies are installed by executing make install.
This will install system packages using apt:
python
python3-venv
python3-pip
There is a good chance you already have them. After that:
- A Python environment named .venv will be created. You can ignore this folder.
- Required Python libraries will be installed (aiohttp, aiofiles, beautifulsoup4, app (molodetz)).
- Downie will be installed. (For the devs: in editable mode with -e!)
- An executable file named downie will be placed in the current directory. This is the main application.
Tip: make downie globally accessible by copying downie to /usr/local/bin.
Other operating systems (Windows / Mac, not for development)
You have to figure out how to install python3, python3-venv and python3-pip on your OS. Make sure you have these on your system; chances are you already do. Then run the following commands:
python3 -m venv .venv
source .venv/bin/activate
pip install git+https://retoor.molodetz.nl/retoor/downie.git
cp .venv/bin/downie .
Tip: make downie globally accessible by copying downie to /usr/local/bin.
Default configuration / internals
Not defined by CLI parameters, but configurable as Downie class constructor parameters:
- concurrent_request_count - Number of concurrent requests. This defaults to 500.
- concurrent_write_count - Number of concurrent file downloads. This defaults to 10.
These values are based on my usage. There may be more optimal parameters.
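To give an idea of what two separate limits like these do, here is a minimal sketch using asyncio semaphores. It is not Downie's implementation: the crawl function and its fetch and save callbacks are hypothetical, and only the two parameter names and their defaults come from the text above.

```python
import asyncio


async def crawl(urls, fetch, save,
                concurrent_request_count=500, concurrent_write_count=10):
    """Fetch and store every URL while capping requests and writes separately."""
    request_slots = asyncio.Semaphore(concurrent_request_count)
    write_slots = asyncio.Semaphore(concurrent_write_count)

    async def handle(url):
        # At most concurrent_request_count fetches run at the same time.
        async with request_slots:
            body = await fetch(url)
        # At most concurrent_write_count writes run at the same time.
        async with write_slots:
            await save(url, body)

    await asyncio.gather(*(handle(url) for url in urls))
```

A high request limit keeps the network busy, while a lower write limit avoids having too many files open at once. Which values work best depends on the target site and on your machine.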
Running a downloaded site:
Open site statically
Just open a page (.html) from the site using your browser.
Serve the site
Simple, run:
python3 -m http.server [port]
If you don't know what port to use, use 7331. Your website is now accessible on http://localhost:7331 (plain HTTP, since http.server does not serve HTTPS).
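If you would rather start the server from a Python script, the same standard library module can be used directly. This is a small sketch, not part of Downie: make_server is a hypothetical helper, and it binds to 127.0.0.1 only, whereas the command above listens on all interfaces by default.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory: str, port: int = 7331) -> ThreadingHTTPServer:
    """Create a static file server for the given folder on localhost."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)


# Usage (blocks until interrupted with Ctrl+C):
#     make_server("path/to/downloaded/site").serve_forever()
```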
Caution
URLs of downloaded content can be modified to make the site portable. So it is possible that your downloaded site is not literally the content online. It is optimized. A literal copy of the site could be unusable.