for https://retoor.molodetz.nl/retoor/isspam

https://snek.molodetz.nl/terminal.html ubuntu running thing instructions:
```
mkdir /project
cd /project
git clone https://retoor.molodetz.nl/retoor/isspam.git
apt install valgrind curl
export RUSTUP_HOME=/project/.rustup
export CARGO_HOME=/project/.cargo
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
. "/project/.cargo/env"
cd isspam
rustup install nightly
rustup default nightly
make
make benchmark
python3 bench.py
```

clone: `git clone https://gitlab.com/jestdotty-group/draft/jisspam.git jest_rust`

edit make: `vi makefile` and add build:
```
build_jest:
	@echo "compiling jest_rust project"
	cd jest_rust && cargo build --release && cp target/release/jisspam ..
```

append to all script:
```
all: build run valgrind build_risspam run_risspam build_cpp build_borded_cpp build_py build_jest
```

add to bench: `vi bench.py`
```py
time_start = time.time()
subprocess.check_output('./jisspam books/*.txt', shell=True)
print("Time Jest Rust:", time.time() - time_start)
```

run: `python3 bench.py`

output looks something like this:
```
***benchmarking***
Time C: 31.315868377685547
Time Rust: 41.232205867767334
Time CPP: 20.1683189868927
Time Borded CPP: 15.468477964401245
Time Jest Rust: 54.74523115158081
Time Retoor Python: 287.63036131858826
***end benchmark***
```

add `/jisspam` to `.gitignore` to not commit the executable accidentally

# local machine benchmarks

single threaded:
```
***benchmarking***
Time C: 2.4082751274108887
Time Rust: 2.865687847137451
Time CPP: 1.1568822860717773
Time Borded CPP: 1.9657189846038818
Time Jest Rust: 33.63373279571533
Time Retoor Python: 133.92413425445557
***end benchmark***
```

rayon:
```
***benchmarking***
Time C: 2.457853317260742
Time Rust: 3.0170154571533203
Time CPP: 1.1482579708099365
Time Borded CPP: 2.002591371536255
Time Jest Rust: 4.294418811798096
Time Retoor Python: 201.2997748851776
***end benchmark***
```

tokio:
```
***benchmarking***
Time C: 2.448648452758789
Time Rust: 3.095592737197876
Time CPP: 1.1662013530731201
Time Borded CPP: 1.9207634925842285
Time Jest Rust: 4.717588901519775
Time Retoor Python: 139.8203284740448
***end benchmark***
```

## compile options benchmarks

- lto not thin: `Time Jest Rust: 5.306957483291626` slower
- lto fat: `Time Jest Rust: 5.413678407669067` slower
- codegen-units 1: `Time Jest Rust: 4.451631546020508` faster
- opt-level z: `Time Jest Rust: 7.045313119888306` slower
- strip true: `Time Jest Rust: 4.337219476699829` faster
- lto true: `Time Jest Rust: 4.703521728515625` slower
- lto none: `Time Jest Rust: 4.817203998565674`
- lto thin: `Time Jest Rust: 4.429729223251343` faster

# data integrity

(this isn't tested, just guessed, and I don't have data to compare it with)

for loops:
```
file count: 904
failed file count: 0
sentence count: 5602301
word count: 81701260
capitalized count: 1753639
numeric count: 14981248
forbidden count: 1237059
words per sentence average: 14.6
forbidden word percentage: 2%
capitalized word percentage: 2%
benchmark: 5033ms
```

muncher:
```
file count: 904
failed file count: 0
sentence count: 5338705
word count: 86765116
capitalized count: 13640820
numeric count: 10902254
forbidden count: 0
words per sentence average: 16.3
forbidden word percentage: 0%
capitalized word percentage: 16%
benchmark: 504ms
```

with forbidden words:
```
file count: 904
failed file count: 0
sentence count: 5338705
word count: 86765116
capitalized count: 13640820
numeric count: 10902254
forbidden count: 279717
words per sentence average: 16.3
forbidden word percentage: 0%
capitalized word percentage: 16%
benchmark: 6078ms
```

# forbidden words benchmarks

seems they take up about 4000ms to churn through in the original version

for loops count forbidden word once only:
```
file count: 904
failed file count: 0
sentence count: 5602301
word count: 81701260
capitalized count: 1753639
numeric count: 14981248
forbidden count: 1143234
words per sentence average: 14.6
forbidden word percentage: 1%
capitalized word percentage: 2%
benchmark: 4737ms
```

for loops with trie:
```
file count: 904
failed file count: 0
sentence count: 5602301
word count: 81701260
capitalized count: 1753639
numeric count: 14981248
forbidden count: 176528
words per sentence average: 14.6
forbidden word percentage: 0%
capitalized word percentage: 2%
benchmark: 1588ms
```

muncher with trie is 2600ms

for loops with fxhash trie: 1200ms
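
for reference, a rough sketch of what a trie lookup for forbidden words can look like. this is not the code from jisspam, just a minimal illustration: `Trie`, `TrieNode` and `count_forbidden` are made-up names, matching is whole-word and ASCII case-insensitive (an assumption, the real matching rules may differ), and it uses the std `HashMap` to stay dependency-free. the fxhash variant would presumably only swap the hasher.

```rust
use std::collections::HashMap;

/// One trie node: children keyed by (lowercased) byte, plus an end-of-word flag.
#[derive(Default)]
struct TrieNode {
    children: HashMap<u8, TrieNode>,
    is_word: bool,
}

/// Hypothetical forbidden-word trie, not the jisspam implementation.
#[derive(Default)]
struct Trie {
    root: TrieNode,
}

impl Trie {
    /// Build a trie from a list of forbidden words.
    fn new(words: &[&str]) -> Self {
        let mut trie = Trie::default();
        for word in words {
            trie.insert(word);
        }
        trie
    }

    /// Walk down the trie byte by byte, creating nodes as needed.
    fn insert(&mut self, word: &str) {
        let mut node = &mut self.root;
        for b in word.bytes() {
            node = node.children.entry(b.to_ascii_lowercase()).or_default();
        }
        node.is_word = true;
    }

    /// True only if the whole word is in the trie (prefixes do not count).
    fn contains(&self, word: &str) -> bool {
        let mut node = &self.root;
        for b in word.bytes() {
            match node.children.get(&b.to_ascii_lowercase()) {
                Some(next) => node = next,
                None => return false,
            }
        }
        node.is_word
    }
}

/// Count whitespace-separated words in `text` that are in the trie.
fn count_forbidden(trie: &Trie, text: &str) -> usize {
    text.split_whitespace().filter(|w| trie.contains(w)).count()
}
```

each lookup costs at most one hash probe per byte of the word and bails out at the first byte that doesn't match, instead of comparing the word against every forbidden word in a loop. usage would be something like `let trie = Trie::new(&["spam", "free"]);` once at startup, then `count_forbidden(&trie, &file_contents)` per file.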