All source code listed below is under the MIT license unless a LICENSE file states otherwise.

isspam

Lightning-fast evaluator for text files that summarizes specific details about them: capitalized words, sentences, words, numbers and forbidden words.

This repository contains multiple versions of the same(-ish) algorithm.

Versions

Building

Build all versions to the repo root:

make build_all

Build isspam (C) with a memory check (requires valgrind to be installed):

make valgrind

Benchmarking

After all binaries have been built to the repo root, you can benchmark them like this:

make benchmark

or without extracting books again:

make benchmark_only

Running

Using files as parameter

./(r)isspam ./spam/*.txt
./(r)isspam ./not_spam/*.txt

Using stdin

Useful for automation. Works only with the isspam (C) version.

cat ./spam/example_spam1.txt | ./isspam
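
For automation from a script, the same stdin pipe can be driven programmatically. The following is a minimal Python sketch, assuming the isspam binary has been built to the repo root and writes its report to stdout. The helper name run_isspam and its binary parameter are hypothetical and not part of this repository.

```python
import subprocess


def run_isspam(text: str, binary: str = "./isspam") -> str:
    """Pipe `text` to the binary's stdin and return whatever it prints.

    Assumption: the binary reads the text from stdin and writes its
    report to stdout, as in the `cat ... | ./isspam` example.
    """
    result = subprocess.run(
        [binary],
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Calling run_isspam with the contents of ./spam/example_spam1.txt should then return the same report that the shell pipe prints.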

Example output

Output example made by isspam.

File: ./spam/example_spam3.txt
Capitalized words: 39
Sentences: 20
Words: 420
Numbers: 1
Forbidden words: 15
<0:recovery>
<1:techie>
<2:https>
<3:digital>
<4:hack>
<5://>
<6:com>
<7:@>
<8:crypto>
<9:bitcoin>
<10:whatsapp>
<11:cryptocurrency>
<12:stolen>
<13:contact>
<14:understanding>
Word count per sentence: 21
Memory usage: 1 MB, 6.460 (re)allocated, 4.222 unqiue free'd, 0 in use.
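
When the report is consumed by another tool, it can be turned into structured data. The sketch below assumes the line layout shown in the example above ("Label: value" lines plus "<index:word>" entries for forbidden words). The function name parse_report and the forbidden_list key are hypothetical, and the real output format may differ between versions.

```python
import re


def parse_report(report: str) -> dict:
    """Turn one report into a dict of counters plus the forbidden words found."""
    counters = {}
    forbidden = []
    for line in report.splitlines():
        line = line.strip()
        # Forbidden-word entries look like "<3:digital>".
        match = re.fullmatch(r"<(\d+):(.+)>", line)
        if match:
            forbidden.append(match.group(2))
            continue
        # Everything else of interest is "Label: value".
        label, sep, value = line.partition(":")
        if sep:
            value = value.strip()
            # Keep plain integers as int, anything else (paths, memory line) as str.
            counters[label] = int(value) if value.isdigit() else value
    counters["forbidden_list"] = forbidden  # hypothetical key, not program output
    return counters
```

In the example above, Words divided by Sentences (420 / 20) equals the reported "Word count per sentence" of 21, so a parsed report can be sanity-checked the same way.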

Valgrind status

Valgrind output for the isspam (C) version.

The Rust variant thinks it's too cool for memory checks.

Date: 2024-11-30

==58062== 
==58062== HEAP SUMMARY:
==58062==     in use at exit: 0 bytes in 0 blocks
==58062==   total heap usage: 6,490 allocs, 6,490 frees, 2,343,156 bytes allocated
==58062== 
==58062== All heap blocks were freed -- no leaks are possible
==58062== 
==58062== For lists of detected and suppressed errors, rerun with: -s
==58062== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
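
A heap summary like the one above can also be checked automatically, for example in a CI step. This is a minimal sketch that only looks at the two summary lines shown in the log. The function name heap_is_clean is hypothetical and this check is not part of the repository's Makefile.

```python
import re


def heap_is_clean(valgrind_output: str) -> bool:
    """Return True when allocs equal frees and nothing is in use at exit."""
    usage = re.search(
        r"total heap usage: ([\d,]+) allocs, ([\d,]+) frees", valgrind_output
    )
    in_use = re.search(r"in use at exit: ([\d,]+) bytes", valgrind_output)
    if not usage or not in_use:
        return False  # summary lines missing: treat as not verified
    allocs = int(usage.group(1).replace(",", ""))
    frees = int(usage.group(2).replace(",", ""))
    leaked = int(in_use.group(1).replace(",", ""))
    return allocs == frees and leaked == 0
```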