for https://retoor.molodetz.nl/retoor/isspam

instructions for running on the ubuntu terminal at https://snek.molodetz.nl/terminal.html:
```
mkdir /project
cd /project
git clone https://retoor.molodetz.nl/retoor/isspam.git
apt install valgrind curl
export RUSTUP_HOME=/project/.rustup
export CARGO_HOME=/project/.cargo
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
. "/project/.cargo/env"
cd isspam
rustup install nightly
rustup default nightly
make
make benchmark
python3 bench.py
```

clone: `git clone https://gitlab.com/jestdotty-group/draft/jisspam.git jest_rust`

edit the makefile: `vi makefile` and add a build target:
```
build_jest:
	@echo "compiling jest_rust project"
	cd jest_rust && cargo build --release && cp target/release/jisspam ..
```

append `build_jest` to the `all` target:
```
all: build run valgrind build_risspam run_risspam build_cpp build_borded_cpp build_py build_jest
```

add to the benchmark script: `vi bench.py`
```py
# assumes `time` and `subprocess` are already imported at the top of bench.py
time_start = time.time()
subprocess.check_output('./jisspam books/*.txt', shell=True)
print("Time Jest Rust:", time.time() - time_start)
```
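
The same timing pattern can be wrapped in a small helper so every binary is measured the same way. This is only a sketch: `time_command` is a hypothetical name, not something bench.py is known to define, and it swaps `time.time()` for `time.perf_counter()`, which is a monotonic clock intended for measuring intervals.

```python
import subprocess
import time


def time_command(command):
    """Run a shell command and return how many seconds it took."""
    start = time.perf_counter()
    # shell=True so a glob such as books/*.txt is expanded by the shell
    subprocess.check_output(command, shell=True)
    return time.perf_counter() - start
```

With that in place the bench entry could become `print("Time Jest Rust:", time_command('./jisspam books/*.txt'))`.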

run: `python3 bench.py`
output looks something like this:
```
***benchmarking***
Time C: 31.315868377685547
Time Rust: 41.232205867767334
Time CPP: 20.1683189868927
Time Borded CPP: 15.468477964401245
Time Jest Rust: 54.74523115158081
Time Retoor Python: 287.63036131858826
***end benchmark***
```

add `/jisspam` to `.gitignore` so the executable isn't committed accidentally

# local machine benchmarks

single threaded:
```
***benchmarking***
Time C: 2.4082751274108887
Time Rust: 2.865687847137451
Time CPP: 1.1568822860717773
Time Borded CPP: 1.9657189846038818
Time Jest Rust: 33.63373279571533
Time Retoor Python: 133.92413425445557
***end benchmark***
```

rayon:
```
***benchmarking***
Time C: 2.457853317260742
Time Rust: 3.0170154571533203
Time CPP: 1.1482579708099365
Time Borded CPP: 2.002591371536255
Time Jest Rust: 4.294418811798096
Time Retoor Python: 201.2997748851776
***end benchmark***
```

tokio:
```
***benchmarking***
Time C: 2.448648452758789
Time Rust: 3.095592737197876
Time CPP: 1.1662013530731201
Time Borded CPP: 1.9207634925842285
Time Jest Rust: 4.717588901519775
Time Retoor Python: 139.8203284740448
***end benchmark***
```
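
To put the three Jest Rust numbers side by side, here is a small sketch that turns them into speedups over the single threaded run. The timings are copied from the outputs above; `speedup` is just a hypothetical helper name. Each figure comes from a single run, so treat the ratios as rough.

```python
# Jest Rust timings in seconds, copied from the three runs above
single_threaded = 33.63373279571533
rayon = 4.294418811798096
tokio = 4.717588901519775


def speedup(baseline, candidate):
    """How many times faster candidate is than baseline."""
    return baseline / candidate


rayon_speedup = speedup(single_threaded, rayon)
tokio_speedup = speedup(single_threaded, tokio)
print(f"rayon: {rayon_speedup:.1f}x, tokio: {tokio_speedup:.1f}x")
# prints "rayon: 7.8x, tokio: 7.1x"
```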

## compile options benchmarks

lto not thin: `Time Jest Rust: 5.306957483291626` slower

lto fat: `Time Jest Rust: 5.413678407669067` slower

codegen-units 1: `Time Jest Rust: 4.451631546020508` faster

opt-level z: `Time Jest Rust: 7.045313119888306` slower

strip true: `Time Jest Rust: 4.337219476699829` faster

lto true: `Time Jest Rust: 4.703521728515625` slower

lto none: `Time Jest Rust: 4.817203998565674`

lto thin: `Time Jest Rust: 4.429729223251343` faster

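The notes don't say which run the faster/slower labels are measured against, so one way to read the list is simply to rank the options by time. A minimal sketch, with the timings copied from the compile options list above (single runs, so small gaps may be noise):

```python
# Jest Rust timings in seconds for each compile option, copied from above
timings = {
    "lto not thin": 5.306957483291626,
    "lto fat": 5.413678407669067,
    "codegen-units 1": 4.451631546020508,
    "opt-level z": 7.045313119888306,
    "strip true": 4.337219476699829,
    "lto true": 4.703521728515625,
    "lto none": 4.817203998565674,
    "lto thin": 4.429729223251343,
}

# Sort option names from fastest to slowest
ranked = sorted(timings, key=timings.get)
for name in ranked:
    print(f"{name}: {timings[name]:.2f}s")
```
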
# data integrity

(these counts aren't verified, just guessed at; I don't have reference data to compare them with)

for loops:
```
file count: 904
failed file count: 0
sentence count: 5602301
word count: 81701260
capitalized count: 1753639
numeric count: 14981248
forbidden count: 1237059
words per sentence average: 14.6
forbidden word percentage: 2%
capitalized word percentage: 2%

benchmark: 5033ms
```

muncher:
```
file count: 904
failed file count: 0
sentence count: 5338705
word count: 86765116
capitalized count: 13640820
numeric count: 10902254
forbidden count: 0
words per sentence average: 16.3
forbidden word percentage: 0%
capitalized word percentage: 16%

benchmark: 504ms
```

muncher with forbidden words:
```
file count: 904
failed file count: 0
sentence count: 5338705
word count: 86765116
capitalized count: 13640820
numeric count: 10902254
forbidden count: 279717
words per sentence average: 16.3
forbidden word percentage: 0%
capitalized word percentage: 16%

benchmark: 6078ms
```
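
Without reference data, about the only thing that can be checked is that each report is internally consistent. The sketch below recomputes the derived lines from the raw counts. It assumes the averages and percentages are rounded to the nearest value, which is a guess about how the program formats them, and `derived_stats` is a hypothetical helper name.

```python
def derived_stats(sentences, words, capitalized, forbidden):
    """Recompute the derived lines of the report from the raw counts."""
    return {
        "words per sentence average": round(words / sentences, 1),
        "forbidden word percentage": round(100 * forbidden / words),
        "capitalized word percentage": round(100 * capitalized / words),
    }


# Raw counts copied from the "for loops" run and the muncher run with forbidden words
for_loops = derived_stats(5602301, 81701260, 1753639, 1237059)
muncher = derived_stats(5338705, 86765116, 13640820, 279717)
print(for_loops)
print(muncher)
```

Both results match the reports above (14.6 / 2% / 2% and 16.3 / 0% / 16%), which only shows the arithmetic is consistent; it says nothing about which tokenizer's raw counts are right.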

# forbidden words benchmarks
forbidden words seem to take about 4000ms to churn through in the original version

for loops, counting a forbidden word once only:
```
file count: 904
failed file count: 0
sentence count: 5602301
word count: 81701260
capitalized count: 1753639
numeric count: 14981248
forbidden count: 1143234
words per sentence average: 14.6
forbidden word percentage: 1%
capitalized word percentage: 2%

benchmark: 4737ms
```

for loops with trie:
```
file count: 904
failed file count: 0
sentence count: 5602301
word count: 81701260
capitalized count: 1753639
numeric count: 14981248
forbidden count: 176528
words per sentence average: 14.6
forbidden word percentage: 0%
capitalized word percentage: 2%

benchmark: 1588ms
```

muncher with trie: 2600ms

for loops with fxhash trie: 1200ms
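
For readers who haven't met a trie, here is a minimal sketch of the idea in Python. It is not the Rust implementation benchmarked above: the `Trie` class, `count_forbidden`, and the word list are all hypothetical, and it assumes whole-word matching, which may differ from how the real code matches forbidden words.

```python
class Trie:
    """Character trie holding a set of words."""

    def __init__(self):
        self.children = {}    # next character -> child Trie node
        self.is_word = False  # True if a stored word ends at this node

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def contains(self, word):
        node = self
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False  # no stored word starts with this prefix
        return node.is_word


def count_forbidden(words, trie):
    """Count how many of the given words are stored in the trie."""
    return sum(1 for word in words if trie.contains(word))


# Hypothetical word list, for illustration only
forbidden = Trie()
for word in ["free", "winner", "win"]:
    forbidden.insert(word)

print(count_forbidden("you are a winner of a free prize".split(), forbidden))
# prints 2
```

Each lookup walks at most one node per character of the word, so its cost doesn't grow with the number of forbidden words, unlike comparing against every forbidden word in a loop.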