for https://retoor.molodetz.nl/retoor/isspam

instructions for running it in the Ubuntu terminal at https://snek.molodetz.nl/terminal.html:
```sh
mkdir /project
cd /project
git clone https://retoor.molodetz.nl/retoor/isspam.git
apt install valgrind curl
# keep rustup and cargo under /project instead of the home directory
export RUSTUP_HOME=/project/.rustup
export CARGO_HOME=/project/.cargo
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# load cargo's environment (PATH) into the current shell
. "/project/.cargo/env"
cd isspam
rustup install nightly
rustup default nightly
make
make benchmark
python3 bench.py
```
clone the Jest Rust version into the isspam directory as jest_rust:

```sh
git clone https://gitlab.com/jestdotty-group/draft/jisspam.git jest_rust
```

edit the makefile (vi makefile) and add a build target (recipe lines must be indented with a tab):

```make
build_jest:
	@echo "compiling jest_rust project"
	cd jest_rust && cargo build --release && cp target/release/jisspam ..
```

then append build_jest to the all target:

```make
all: build run valgrind build_risspam run_risspam build_cpp build_borded_cpp build_py build_jest
```
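add a timing block to bench.py (vi bench.py):

```python
# time the jisspam binary over all books, the same way as the other entries
time_start = time.time()
subprocess.check_output('./jisspam books/*.txt', shell=True)
print("Time Jest Rust:", time.time() - time_start)
```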
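If you end up timing more binaries, the pattern in the snippet above could be folded into a small helper. This is a hypothetical `time_command` function, not something bench.py already has. It switches to `time.perf_counter()`, which is monotonic and intended for measuring intervals, and keeps `shell=True` so the shell still expands `books/*.txt`:

```python
import subprocess
import time


def time_command(label: str, command: str) -> float:
    """Run `command` through the shell, print its wall-clock time and return it.

    Hypothetical helper for bench.py. shell=True is kept so globs such as
    books/*.txt are expanded by the shell, as in the original snippet.
    check_output raises CalledProcessError if the command exits non-zero.
    """
    start = time.perf_counter()  # monotonic, unaffected by system clock changes
    subprocess.check_output(command, shell=True)
    elapsed = time.perf_counter() - start
    print(f"Time {label}:", elapsed)
    return elapsed


# usage inside bench.py:
# time_command("Jest Rust", "./jisspam books/*.txt")
```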
then run python3 bench.py; the output looks something like this:
```
***benchmarking***
Time C: 31.315868377685547
Time Rust: 41.232205867767334
Time CPP: 20.1683189868927
Time Borded CPP: 15.468477964401245
Time Jest Rust: 54.74523115158081
Time Retoor Python: 287.63036131858826
***end benchmark***
```
add /jisspam to isspam's .gitignore so the executable doesn't get committed by accident.
local machine benchmarks

single threaded:

```
***benchmarking***
Time C: 2.4082751274108887
Time Rust: 2.865687847137451
Time CPP: 1.1568822860717773
Time Borded CPP: 1.9657189846038818
Time Jest Rust: 33.63373279571533
Time Retoor Python: 133.92413425445557
***end benchmark***
```
rayon:

```
***benchmarking***
Time C: 2.457853317260742
Time Rust: 3.0170154571533203
Time CPP: 1.1482579708099365
Time Borded CPP: 2.002591371536255
Time Jest Rust: 4.294418811798096
Time Retoor Python: 201.2997748851776
***end benchmark***
```
tokio:

```
***benchmarking***
Time C: 2.448648452758789
Time Rust: 3.095592737197876
Time CPP: 1.1662013530731201
Time Borded CPP: 1.9207634925842285
Time Jest Rust: 4.717588901519775
Time Retoor Python: 139.8203284740448
***end benchmark***
```
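The rayon and tokio runs are where the Jest Rust time drops from about 33.6 s to about 4.3 to 4.7 s, presumably by processing files concurrently. The general shape (count each file independently, then sum the per-file results) is sketched below in Python with `concurrent.futures`. It only illustrates the pattern and is not the jisspam code: `count_file` is a hypothetical stand-in that only counts words, and CPython threads won't speed up CPU-bound counting the way Rust threads do (a `ProcessPoolExecutor` is the usual swap for that).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_file(text: str) -> Counter:
    """Hypothetical per-file work: just a whitespace-separated word count."""
    return Counter(words=len(text.split()))


def count_all(texts: list[str], workers: int = 4) -> Counter:
    """Map count_file over every file on a thread pool, then reduce by summing.

    Same shape as rayon's texts.par_iter().map(count_file).reduce(...).
    """
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order; summing is order-independent anyway
        for partial in pool.map(count_file, texts):
            total += partial
    return total


print(count_all(["one two three", "four five", ""]))  # Counter({'words': 5})
```

Because every file is counted independently, the only shared step is the final sum, which is what makes this kind of work easy to spread across cores.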
compile options benchmarks (Time Jest Rust, in seconds, for each setting):

- lto not thin: 5.306957483291626, slower
- lto fat: 5.413678407669067, slower
- codegen-units 1: 4.451631546020508, faster
- opt-level z: 7.045313119888306, slower
- strip true: 4.337219476699829, faster
- lto true: 4.703521728515625, slower
- lto none: 4.817203998565674
- lto thin: 4.429729223251343, faster
data integrity

(this isn't tested, just guessed, and I don't have data to compare it with)

for loops:

```
file count: 904
failed file count: 0
sentence count: 5602301
word count: 81701260
capitalized count: 1753639
numeric count: 14981248
forbidden count: 1237059
words per sentence average: 14.6
forbidden word percentage: 2%
capitalized word percentage: 2%
benchmark: 5033ms
```
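The averages and percentages in these summaries appear to be derived from the raw counts. Assuming words per sentence is words / sentences rounded to one decimal, and each percentage is count / words × 100 rounded to a whole number (inferred from the numbers here, not taken from the jisspam source), this recomputes them:

```python
def derived_stats(sentences: int, words: int, capitalized: int, forbidden: int) -> dict:
    """Recompute the summary ratios from raw counts.

    Assumed formulas, inferred from the reported numbers:
      words per sentence = words / sentences, one decimal place
      percentage         = count / words * 100, rounded to a whole number
    """
    return {
        "words_per_sentence": round(words / sentences, 1),
        "forbidden_pct": round(forbidden / words * 100),
        "capitalized_pct": round(capitalized / words * 100),
    }


# the "for loops" run above
print(derived_stats(5_602_301, 81_701_260, 1_753_639, 1_237_059))
# {'words_per_sentence': 14.6, 'forbidden_pct': 2, 'capitalized_pct': 2}
```

The same formulas also give the muncher figures (16.3 words per sentence, 16% capitalized) and explain why 279717 forbidden words out of 86765116 still shows as 0%.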
muncher:

```
file count: 904
failed file count: 0
sentence count: 5338705
word count: 86765116
capitalized count: 13640820
numeric count: 10902254
forbidden count: 0
words per sentence average: 16.3
forbidden word percentage: 0%
capitalized word percentage: 16%
benchmark: 504ms
```
muncher with forbidden words:

```
file count: 904
failed file count: 0
sentence count: 5338705
word count: 86765116
capitalized count: 13640820
numeric count: 10902254
forbidden count: 279717
words per sentence average: 16.3
forbidden word percentage: 0%
capitalized word percentage: 16%
benchmark: 6078ms
```
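The for loops and muncher versions disagree on the sentence, capitalized and numeric counts, so the exact definitions matter. As a rough illustration of collecting all four counts in one pass over the text, here is a hypothetical `count_text`. Its definitions (whitespace-separated words, sentences ending at `.`, `!` or `?`, capitalized meaning an uppercase first character, numeric meaning all digits) are assumptions and aren't meant to reproduce either set of numbers above:

```python
def count_text(text: str) -> dict:
    """Hypothetical one-pass counter; every definition below is an assumption.

    - word: a maximal run of non-whitespace characters
    - sentence: each '.', '!' or '?' ends one
    - capitalized: the word's first character is uppercase
    - numeric: the word consists only of digits
    """
    counts = {"sentences": 0, "words": 0, "capitalized": 0, "numeric": 0}
    start = None  # index where the current word began, or None between words
    for i, ch in enumerate(text + " "):  # trailing space flushes the last word
        if ch in ".!?":
            counts["sentences"] += 1
        if ch.isspace():
            if start is not None:
                word = text[start:i]
                counts["words"] += 1
                counts["capitalized"] += word[0].isupper()  # bool adds as 0 or 1
                counts["numeric"] += word.isdigit()
                start = None
        elif start is None:
            start = i
    return counts


print(count_text("Hello world. It is 42 degrees!"))
# {'sentences': 2, 'words': 6, 'capitalized': 2, 'numeric': 1}
```

Small definitional choices, such as whether "world." counts as a word or whether "..." ends three sentences, shift these totals, which is one reason two implementations can disagree without either being buggy.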
forbidden words benchmarks

in the original version, the forbidden words seem to take about 4000ms to churn through.
for loops, counting each forbidden word only once:

```
file count: 904
failed file count: 0
sentence count: 5602301
word count: 81701260
capitalized count: 1753639
numeric count: 14981248
forbidden count: 1143234
words per sentence average: 14.6
forbidden word percentage: 1%
capitalized word percentage: 2%
benchmark: 4737ms
```
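One reading of "count forbidden word once only" is that each word stops being checked after its first forbidden match, instead of being counted once for every entry it matches. Assuming substring matching and a made-up forbidden list (the real check and list aren't shown here), the two variants differ like this:

```python
# hypothetical forbidden list; "am" overlaps with the other two entries
FORBIDDEN = ("spam", "scam", "am")


def count_every_match(words, forbidden=FORBIDDEN) -> int:
    """Each (word, entry) substring hit counts, so one word can count several times."""
    return sum(1 for w in words for f in forbidden if f in w)


def count_once_per_word(words, forbidden=FORBIDDEN) -> int:
    """any() stops at the first hit, so each word counts at most once."""
    return sum(1 for w in words if any(f in w for f in forbidden))


words = ["spam", "scam", "ham", "eggs"]
print(count_every_match(words), count_once_per_word(words))  # 5 3
```

Stopping at the first hit also skips the remaining comparisons for that word, which fits the small speedup (5033ms to 4737ms) seen above.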
for loops with trie:

```
file count: 904
failed file count: 0
sentence count: 5602301
word count: 81701260
capitalized count: 1753639
numeric count: 14981248
forbidden count: 176528
words per sentence average: 14.6
forbidden word percentage: 0%
capitalized word percentage: 2%
benchmark: 1588ms
```
- muncher with trie: 2600ms
- for loops with fxhash trie: 1200ms
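A trie stores the forbidden words by shared prefix, so checking a word walks at most one node per character instead of comparing it against every entry in the list. What follows is a minimal sketch that assumes exact whole-word lookups (the trie versions above may match differently). Each node keeps its children in a dict. On the Rust side, that map is presumably where the fxhash variant swaps in a faster, non-cryptographic hasher.

```python
class TrieNode:
    __slots__ = ("children", "terminal")

    def __init__(self) -> None:
        self.children = {}     # next character -> TrieNode
        self.terminal = False  # True if a forbidden word ends at this node


class ForbiddenTrie:
    """Hypothetical forbidden-word trie with exact whole-word lookup."""

    def __init__(self, words) -> None:
        self.root = TrieNode()
        for word in words:
            node = self.root
            for ch in word:
                child = node.children.get(ch)
                if child is None:
                    child = node.children[ch] = TrieNode()
                node = child
            node.terminal = True

    def contains(self, word: str) -> bool:
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:  # no forbidden word continues with this character
                return False
        return node.terminal


trie = ForbiddenTrie(["spam", "spa", "scam"])
text = "spam and spa but not sp or spammer".split()
print(sum(trie.contains(w) for w in text))  # 2
```

Lookup cost depends on the word's length rather than on how many forbidden words there are, so the per-word cost no longer grows with the size of the forbidden list.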