Initial commit.
This commit is contained in:
commit
6ad826f3ee
4
.gitignore
vendored
Normal file
4
.gitignore
vendored
Normal file
@ -0,0 +1,4 @@
|
|||||||
|
downie*
|
||||||
|
__pycache__/
|
||||||
|
www*
|
||||||
|
.venv
|
18
Makefile
Normal file
18
Makefile
Normal file
@ -0,0 +1,18 @@
|
|||||||
|
PYTHON=.venv/bin/python
|
||||||
|
PIP=.venv/bin/pip
|
||||||
|
DOWNIE=.venv/bin/downie
|
||||||
|
|
||||||
|
|
||||||
|
install:
|
||||||
|
sudo apt install python3 python3-venv python3-pip -y
|
||||||
|
python3 -m venv .venv
|
||||||
|
${PIP} install -e .
|
||||||
|
@ln -s ${DOWNIE} downie || true
|
||||||
|
@echo "Installed downie. Use by executing ./downie [your url to downie]"
|
||||||
|
|
||||||
|
build:
|
||||||
|
${PIP} install build
|
||||||
|
${PYTHON} -m build .
|
||||||
|
|
||||||
|
example:
|
||||||
|
${DOWNIE} https://www.molodetz.nl
|
65
README.md
Normal file
65
README.md
Normal file
@ -0,0 +1,65 @@
|
|||||||
|
# Downie
|
||||||
|
|
||||||
|
**Downie** is a full **site downloader**. I tried several downloaders and they all were not capable to download big sites like **molodetz.nl** (containing **300.000+** links / resources). See **features** to see what makes this site downloader **better than most competition**.
|
||||||
|
|
||||||
|
If you encounter a site that can't be downloaded using this downloader, contact me @ *retoor@molodetz.nl*. Also, creating a bug ticket is possible after logging in. **I respond to both methods within 24h**.
|
||||||
|
|
||||||
|
## Features
|
||||||
|
- Fast as light **async** fetching of your pages allowing concurrency.
|
||||||
|
- Limit fetching only to given domain. **Remote URLs are not followed**. Also redirects to remote URLs (yes, that can be an issue) are not followed.
|
||||||
|
- Discriminates between file and folder for urls by adding `.html` to pages not having an extension. So `/pony` becomes /pony.html so that `/pony/photos.html` is possible.
|
||||||
|
- **Absolute URLs will be converted to relative URLs** so that the site becomes **portable** and **will work isolated** on your webserver.
|
||||||
|
- **Progress is saved.** You do not have to start over completely after exiting the application. It will just continue.
|
||||||
|
- Caching of already visited urls.
|
||||||
|
|
||||||
|
## Using Downie
|
||||||
|
After following the **installation instructions**, use `downie https://target-site.com` if **globally installed**. If not globally installed, execute `./downie https://target-site.com` from the current directory.
|
||||||
|
|
||||||
|
## Installation
|
||||||
|
|
||||||
|
### Debian (for development)
|
||||||
|
Only **requirement** is `make`. Install by executing `sudo apt install make -y` in the terminal.
|
||||||
|
Installation of all required Python dependencies are done by executing `make install`.
|
||||||
|
This will **install system packages** using `apt`:
|
||||||
|
- `python`
|
||||||
|
- `python3-venv`
|
||||||
|
- `python3-pip`
|
||||||
|
*There is a big chance you already have them.*
|
||||||
|
After that:
|
||||||
|
1. A Python environment will be created named `.venv`. *You can ignore this folder.*
|
||||||
|
2. Required Python libraries will be installed. *(aiohttp, aiofiles, beautifullsoup4, app (molodetz)).*
|
||||||
|
3. Downie will be installed. *(For the devs: in edit mode with -e!).*
|
||||||
|
4. A executable file named `downie` will be placed in the current directory. This is the main application.
|
||||||
|
**Tip:** make downie globally accessable by copying `downie` to `/usr/local/bin`.
|
||||||
|
|
||||||
|
### Other operating systems; Windows / Mac; not for development.
|
||||||
|
You have to figure out how to install python3, python3-venv and python3-pip on your OS. Make sure you have these on your system. Chance is big that you already got those.
|
||||||
|
Run the following commands:
|
||||||
|
```bash
|
||||||
|
python3 -m venv .venv
|
||||||
|
source .venv/bin/activate
|
||||||
|
pip install git+https://retoor.molodetz.nl/retoor/downie.git
|
||||||
|
cp .venv/bin/downie .
|
||||||
|
```
|
||||||
|
**Tip:** make downie globally accessable by copying `downie` to `/usr/local/bin`.
|
||||||
|
|
||||||
|
## Default configuration / internals
|
||||||
|
Not defined by CLI parameters, but configurable as Downie class constructor parameters:
|
||||||
|
- `concurrent_request_count` - Amount of concurrent requests. This defaults to 500.
|
||||||
|
- `concurrent_write_count` - Amount of concurrent downloads of files. This defaults to 10.
|
||||||
|
These values are based on my usage. There is a possibility that there are more optimial parameters.
|
||||||
|
|
||||||
|
## Running a downloaded site:
|
||||||
|
|
||||||
|
### Open site statically
|
||||||
|
Just open a page (.html) from the site using your browser.
|
||||||
|
|
||||||
|
### Serve the site
|
||||||
|
Simple, run:
|
||||||
|
```bash
|
||||||
|
python3 -m http.server [port]
|
||||||
|
```
|
||||||
|
If you don't know what port to use, use `7331`. Your website is accessable on `https://localhost:7331` now.
|
||||||
|
|
||||||
|
## Caution
|
||||||
|
URLs of downloaded content can be modified to make the site portable. So it is possible that your downloaded site is literally the content online. It is optimized. A literal copy of the site could be unusable.
|
34
pyproject.toml
Normal file
34
pyproject.toml
Normal file
@ -0,0 +1,34 @@
|
|||||||
|
[build-system]
|
||||||
|
requires = ["setuptools>=61.0", "wheel"]
|
||||||
|
build-backend = "setuptools.build_meta"
|
||||||
|
|
||||||
|
[project]
|
||||||
|
name = "downie"
|
||||||
|
version = "1.0.0"
|
||||||
|
description = "A fast async download parser with concurrency control. Written for specific use case to crawl a certain site. Will only download urls under the base url. Remote links are not followed."
|
||||||
|
readme = "README.md"
|
||||||
|
requires-python = ">=3.7"
|
||||||
|
dependencies = [
|
||||||
|
"aiohttp",
|
||||||
|
"beautifulsoup4",
|
||||||
|
"aiofiles",
|
||||||
|
"app @ git+https://retoor.molodetz.nl/retoor/app.git"
|
||||||
|
]
|
||||||
|
authors = [
|
||||||
|
{ name = "retoor", email = "retoor@molodetz.nl" }
|
||||||
|
]
|
||||||
|
|
||||||
|
[project.scripts]
|
||||||
|
downie = "downie.__main__:main"
|
||||||
|
|
||||||
|
[project.optional-dependencies]
|
||||||
|
autocomplete = [
|
||||||
|
"argcomplete>=1.12.3"
|
||||||
|
]
|
||||||
|
|
||||||
|
[tool.setuptools]
|
||||||
|
package-dir = {"" = "src"}
|
||||||
|
|
||||||
|
[tool.setuptools.packages.find]
|
||||||
|
where = ["src"]
|
||||||
|
|
Loading…
Reference in New Issue
Block a user