Compare commits


27 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| JestDotty | 5fc6c839a1 | notes (some checks failed: isspam build / build (push) failing after 4m49s) | 2025-10-04 10:03:15 -04:00 |
| JestDotty | c459fe6d79 | more tests | 2025-10-04 09:58:40 -04:00 |
| JestDotty | d73d4ff7c1 | reorg | 2025-10-04 09:18:03 -04:00 |
| JestDotty | 94b786f83a | sequential file read slightly faster | 2025-03-24 16:03:36 -04:00 |
| JestDotty | 422e8ace29 | clean up README | 2025-03-24 00:48:04 -04:00 |
| JestDotty | c581ca6817 | fxhash and I remembered how to static lazy right | 2025-03-24 00:20:15 -04:00 |
| JestDotty | 41f5398f20 | do or do not, there is always a trie organization | 2025-03-23 23:58:34 -04:00 |
| JestDotty | 55a4901a30 | if a word has forbidden only count it once. uses for loops again whoops | 2025-03-23 23:23:40 -04:00 |
| JestDotty | 320f6bf4af | muncher benchmark and data | 2025-03-23 23:14:30 -04:00 |
| JestDotty | a699aba7c2 | don't track | 2025-03-23 22:31:14 -04:00 |
| JestDotty | 90cd44f302 | optimization benchmarks. switched to tokio | 2025-03-23 22:29:54 -04:00 |
| JestDotty | e3c71f8fc9 | sync. rayon and build options, README | 2025-03-23 21:39:12 -04:00 |
|  | 6ddef94103 | updates. | 2025-03-24 02:31:31 +01:00 |
| Jest Dotty | 0eaf353463 | flatten jest_rust | 2025-03-23 20:23:38 -04:00 |
| Jest Dotty | 1c0fc334e8 | jest_rust, build, bench | 2025-03-23 20:03:09 -04:00 |
| BordedDev | f1b9005c9c | Removed TBB from default makefile | 2025-03-23 23:37:31 +01:00 |
| BordedDev | 50e01a2fe6 | Fixed horrible mistake ;P | 2025-03-23 22:40:43 +01:00 |
| BordedDev | f2c0942cc2 | Updated file io perf | 2025-03-23 22:06:48 +01:00 |
| BordedDev | 177a512c38 | Tweaked some variables, add main2.cpp which removes the struct | 2025-03-23 03:36:29 +01:00 |
|  | 42938575d3 | Added py | 2025-03-23 03:25:51 +01:00 |
| BordedDev | b14337dc60 | Fixed issues with utf8 | 2025-03-23 02:13:38 +01:00 |
|  | 7dc6143a7f | Added python version. | 2025-03-22 23:55:48 +01:00 |
| BordedDev | 6b148b3235 | Updated Dockerfile to use gcc; Added borded cpp to bench; Made borded part of make all | 2025-03-20 23:32:24 +01:00 |
| BordedDev | c78d43b593 | Added borded cpp to make file | 2025-03-20 23:21:33 +01:00 |
|  | 9b1ab5ed0c | Docker setup. | 2025-03-20 22:52:53 +01:00 |
|  | 748c404404 | Retoorii fixes. | 2025-03-20 22:23:41 +01:00 |
| BordedDev | 8528fe8f0f | Setup v1 of borded spam parser | 2025-03-20 21:44:22 +01:00 |

13 changed files with 170 additions and 332 deletions


@ -1,2 +1,3 @@
/target
/Cargo.lock
/test_books


@ -9,6 +9,7 @@ tokio = { version = "1.44.1", features = ["full"] }
[profile.release]
codegen-units = 1 # less means more compile work but better optimized
lto = "thin" # thin has best performance. fat the worst
lto = "fat" # thin has best performance. fat the worst
strip = true
# opt-level = "z" # slows down
panic = "abort"


@ -1,5 +1,35 @@
for https://retoor.molodetz.nl/retoor/isspam
extract `../books.tar.gz`
# local machine benchmarks
single threaded: `33.63373279571533`
rayon: `4.294418811798096`
tokio: `4.717588901519775`
tokio:
muncher: `2486ms`
for_loops: `1227ms`
for_loops_forbidden_only: `987ms`
trie creation and stats accumulation take 0ms
FxHashMap faster than BTreeMap
## compile options benchmarks
`lto` thin vs. fat doesn't change much
`codegen-units` 0 vs. 1 doesn't change much
`opt-level = "z"` slows things down
# ubuntu terminal running
instructions for running in the ubuntu terminal at https://snek.molodetz.nl/terminal.html:
```
mkdir /project
@ -51,145 +81,4 @@ Time Retoor Python: 287.63036131858826
***end benchmark***
```
add `/jisspam` to `.gitignore` to not commit the executable accidentally
# local machine benchmarks
single threaded:
```
***benchmarking***
Time C: 2.4082751274108887
Time Rust: 2.865687847137451
Time CPP: 1.1568822860717773
Time Borded CPP: 1.9657189846038818
Time Jest Rust: 33.63373279571533
Time Retoor Python: 133.92413425445557
***end benchmark***
```
rayon:
```
***benchmarking***
Time C: 2.457853317260742
Time Rust: 3.0170154571533203
Time CPP: 1.1482579708099365
Time Borded CPP: 2.002591371536255
Time Jest Rust: 4.294418811798096
Time Retoor Python: 201.2997748851776
***end benchmark***
```
tokio:
```
***benchmarking***
Time C: 2.448648452758789
Time Rust: 3.095592737197876
Time CPP: 1.1662013530731201
Time Borded CPP: 1.9207634925842285
Time Jest Rust: 4.717588901519775
Time Retoor Python: 139.8203284740448
***end benchmark***
```
## compile options benchmarks
lto not thin: `Time Jest Rust: 5.306957483291626` slower
lto fat: `Time Jest Rust: 5.413678407669067` slower
codegen-units 1: `Time Jest Rust: 4.451631546020508` faster
opt-level z: `Time Jest Rust: 7.045313119888306` slower
strip true: `Time Jest Rust: 4.337219476699829` faster
lto true: `Time Jest Rust: 4.703521728515625` slower
lto none: `Time Jest Rust: 4.817203998565674`
lto thin: `Time Jest Rust: 4.429729223251343` faster
# data integrity
(this isn't tested, just guessed, and I don't have data to compare it with)
for loops:
```
file count: 904
failed file count: 0
sentence count: 5602301
word count: 81701260
capitalized count: 1753639
numeric count: 14981248
forbidden count: 1237059
words per sentence average: 14.6
forbidden word percentage: 2%
capitalized word percentage: 2%
benchmark: 5033ms
```
muncher:
```
file count: 904
failed file count: 0
sentence count: 5338705
word count: 86765116
capitalized count: 13640820
numeric count: 10902254
forbidden count: 0
words per sentence average: 16.3
forbidden word percentage: 0%
capitalized word percentage: 16%
benchmark: 504ms
```
with forbidden words:
```
file count: 904
failed file count: 0
sentence count: 5338705
word count: 86765116
capitalized count: 13640820
numeric count: 10902254
forbidden count: 279717
words per sentence average: 16.3
forbidden word percentage: 0%
capitalized word percentage: 16%
benchmark: 6078ms
```
# forbidden words benchmarks
seems they take up about 4000ms to churn through in the original version
for loops count forbidden word once only:
```
file count: 904
failed file count: 0
sentence count: 5602301
word count: 81701260
capitalized count: 1753639
numeric count: 14981248
forbidden count: 1143234
words per sentence average: 14.6
forbidden word percentage: 1%
capitalized word percentage: 2%
benchmark: 4737ms
```
for loops with trie:
```
file count: 904
failed file count: 0
sentence count: 5602301
word count: 81701260
capitalized count: 1753639
numeric count: 14981248
forbidden count: 176528
words per sentence average: 14.6
forbidden word percentage: 0%
capitalized word percentage: 2%
benchmark: 1588ms
```
muncher with trie is 2600ms
for loops with fxhash trie: 1200ms
add `/jisspam` to `.gitignore` to not commit the executable accidentally
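
The notes above credit a trie for cutting the forbidden-word lookup cost, and the diff calls `FORBIDDEN_WORDS.contains(&lowercase_word)`. The crate's own `trie` module is not part of this diff, so the following is only a minimal sketch of how such a structure could look. It assumes a character-keyed trie and uses the standard library `HashMap` where the notes report `FxHashMap` to be faster; the `TrieNode` type and `insert` method are hypothetical names.

```rust
use std::collections::HashMap;

/// One node per character; `terminal` marks the end of a stored word.
/// (Hypothetical layout, the real `trie` module is not shown in the diff.)
#[derive(Default)]
struct TrieNode {
    children: HashMap<char, TrieNode>,
    terminal: bool,
}

/// Stand-in for the crate's `Trie`; only insertion and lookup are sketched.
#[derive(Default)]
pub struct Trie {
    root: TrieNode,
}

impl Trie {
    /// Stores `word`, creating child nodes along the way as needed.
    pub fn insert(&mut self, word: &str) {
        let mut node = &mut self.root;
        for ch in word.chars() {
            node = node.children.entry(ch).or_default();
        }
        node.terminal = true;
    }

    /// Returns true only if `word` was inserted as a whole word,
    /// not merely as a prefix of a longer one.
    pub fn contains(&self, word: &str) -> bool {
        let mut node = &self.root;
        for ch in word.chars() {
            match node.children.get(&ch) {
                Some(next) => node = next,
                None => return false,
            }
        }
        node.terminal
    }
}
```

A lookup walks at most one node per character of the word, regardless of how many forbidden words are stored, which is the likely reason it beats scanning a list of words for every token.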


@ -1,3 +1,4 @@
mod parser;
mod stats;
mod trie;
@ -50,142 +51,35 @@ static FORBIDDEN_WORDS: LazyLock<Trie> = LazyLock::new(|| {
trie
});
impl Stats {
pub fn process(&mut self, text: &str) {
// self.muncher(&text);
self.for_loops(&text);
}
#[allow(dead_code)]
/// probably buggy. for example, are new lines sentences? what if the text has no last period?
/// 500ms is without forbidden words check, but...
/// 6000ms if adding forbidden words.. so not faster
/// with trie this is 2600ms
fn muncher(&mut self, text: &str) {
let mut capitalized = true;
let mut whitespaced = false;
let mut dotted = false;
let mut word = String::new();
for char in text.chars() {
if whitespaced {
if !char.is_whitespace() {
whitespaced = false; //end whiteness
}
continue;
} else if char.is_whitespace() {
whitespaced = true;
self.word_count += 1; //end of word
if capitalized {
self.capitalized_count += 1;
} else {
//reset capitalized word
capitalized = true;
}
let lowercase_word = word.to_lowercase();
if FORBIDDEN_WORDS.contains(&lowercase_word) {
self.forbidden_count += 1;
}
word = String::new();
continue;
}
if dotted {
if char != '.' {
dotted = false; //end sentencing
}
continue;
} else if char == '.' {
dotted = true;
self.sentence_count += 1;
self.word_count += 1; //end of word
if capitalized {
self.capitalized_count += 1;
} else {
//reset capitalized word
capitalized = true;
}
let lowercase_word = word.to_lowercase();
if FORBIDDEN_WORDS.contains(&lowercase_word) {
self.forbidden_count += 1;
}
word = String::new();
continue;
}
word += &char.to_string();
if char.is_numeric() {
self.numeric_count += 1;
capitalized = false;
}
if !char.is_ascii_uppercase() {
capitalized = false;
}
}
}
#[allow(dead_code)]
/// typically 5000ms
/// with trie this is 1600ms
fn for_loops(&mut self, text: &str) {
for sentence in text
.split('.')
.map(|s| s.trim())
.filter(|s| !s.is_empty())
{
self.sentence_count += 1;
for word in sentence
.split_whitespace()
.map(|s| s.trim())
.filter(|s| !s.is_empty())
{
self.word_count += 1;
//get all numbers counted
let mut all_capitalized = true;
for char in word.chars() {
if char.is_numeric() {
self.numeric_count += 1;
//TODO are numbers capitalized or not? I don't know!
}
if !char.is_ascii_uppercase() {
all_capitalized = false;
}
}
if all_capitalized {
self.capitalized_count += 1;
}
let lowercase_word = word.to_lowercase();
if FORBIDDEN_WORDS.contains(&lowercase_word) {
self.forbidden_count += 1;
}
}
}
}
}
#[tokio::main]
async fn main() {
let files = env::args().skip(1);
let mut stats = Stats::default();
let mut rx = {
let (tx, rx) = mpsc::unbounded_channel();
for file in files {
//reading files not sequentially average shaves 30ms (of 1250ms), and that's on a NVMe SSD so why not
let Ok(text) = fs::read_to_string(&file) else {
stats.failed_file_count += 1;
continue;
};
stats.file_count += 1;
let tx = tx.clone();
tokio::spawn(async move {
let mut stats = Stats::default();
stats.process(&text);
tx.send(stats).unwrap();
//reading files in threads doesn't change speed of any sort but oh well
if let Ok(text) = fs::read_to_string(&file) {
stats.file_count += 1;
parser::for_loops::parse(&mut stats, &text);
} else {
stats.failed_file_count += 1;
}
let _ = tx.send(stats);
});
}
rx
};
let mut stats = Stats::default();
while let Some(file_stat) = rx.recv().await {
stats += file_stat;
}
println!("{stats}");
}
/// needs ../books.tar.gz to be extracted into ../books
#[test]
fn test() {
use std::{env, fs, process::Command, time::Instant};
@ -199,44 +93,6 @@ fn test() {
Err(err) => eprintln!("compile failed: {err}"),
}
//get test files
let files = fs::read_dir("test_files")
.unwrap()
.map(|f| {
f.unwrap()
.path()
.canonicalize()
.unwrap()
.to_str()
.unwrap()
.to_string()
})
.collect::<Vec<_>>();
println!("test files found: {:#?}", files);
//benchmark run
let benchmark = Instant::now();
let mut run = Command::new("target/release/jisspam");
let run_arged = run.args(files);
match run_arged.output() {
Ok(output) => println!("{}", String::from_utf8_lossy(&output.stdout)),
Err(err) => eprintln!("run failed: {err}"),
}
println!("benchmark: {}ms", benchmark.elapsed().as_millis());
}
#[test]
fn books_test() {
use std::{env, fs, process::Command, time::Instant};
println!("cwd: {}", env::current_dir().unwrap().display());
//compile
let mut compile = Command::new("cargo");
let compile_arged = compile.arg("build").arg("--release");
match compile_arged.output() {
Ok(output) => println!("compiled {}", String::from_utf8_lossy(&output.stdout)),
Err(err) => eprintln!("compile failed: {err}"),
}
//get test files
let files = fs::read_dir("../books")
.unwrap()
@ -250,7 +106,9 @@ fn books_test() {
.to_string()
})
.collect::<Vec<_>>();
println!("test files found: {:#?}", files);
println!("test files found: {}", files.len());
println!();
//benchmark run
let benchmark = Instant::now();
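
The `main` function in this file hands each file to its own task, sends the per-file `Stats` over a channel, and folds them together with `stats += file_stat`. The same fan-out and aggregate pattern can be sketched with only the standard library. This sketch swaps tokio tasks for `std::thread` and `std::sync::mpsc`, and `MiniStats` and `aggregate` are hypothetical names with just two counters, so it illustrates the shape of the pattern rather than the crate's actual code.

```rust
use std::ops::AddAssign;
use std::sync::mpsc;
use std::thread;

/// Reduced, hypothetical version of `Stats` with two counters.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct MiniStats {
    pub file_count: u32,
    pub word_count: u32,
}

/// Lets per-file results be merged with `total += stats`.
impl AddAssign for MiniStats {
    fn add_assign(&mut self, rhs: Self) {
        self.file_count += rhs.file_count;
        self.word_count += rhs.word_count;
    }
}

/// Counts words in each text on its own thread and sums the results.
pub fn aggregate(texts: Vec<String>) -> MiniStats {
    let (tx, rx) = mpsc::channel();
    for text in texts {
        let tx = tx.clone();
        thread::spawn(move || {
            let stats = MiniStats {
                file_count: 1,
                word_count: text.split_ascii_whitespace().count() as u32,
            };
            // Ignore send errors, as the diff does with `let _ = tx.send(stats)`.
            let _ = tx.send(stats);
        });
    }
    // Drop the original sender so the receive loop ends once every worker is done.
    drop(tx);

    let mut total = MiniStats::default();
    for stats in rx {
        total += stats;
    }
    total
}
```

The diff achieves the same effect as `drop(tx)` by creating the channel inside a block, so the original sender goes out of scope before the receive loop starts.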


@ -0,0 +1,37 @@
use crate::{FORBIDDEN_WORDS, stats::Stats};
#[allow(dead_code)]
/// typically 5000ms
/// with trie this is 1600ms
pub fn parse(stats: &mut Stats, text: &str) {
for sentence in text
.split('.')
.map(|s| s.trim())
.filter(|s| !s.is_empty())
{
stats.sentence_count += 1;
for word in sentence
.split_ascii_whitespace()
.map(|s| s.trim())
.filter(|s| !s.is_empty())
{
stats.word_count += 1;
//get all numbers counted
let mut all_capitalized = true;
for char in word.chars() {
if char.is_numeric() {
stats.numeric_count += 1;
all_capitalized = false;
} else if !char.is_ascii_uppercase() {
all_capitalized = false;
}
}
if all_capitalized {
stats.capitalized_count += 1;
}
if FORBIDDEN_WORDS.contains(&word.to_lowercase()) {
stats.forbidden_count += 1;
}
}
}
}
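
Since the README notes say the counts are guessed rather than tested, a self-contained variant of the loop above can help pin the counting rules down on a tiny input. This is a sketch under assumptions: `Counts` and `count` are hypothetical names, and a plain `HashSet` stands in for the `FORBIDDEN_WORDS` trie.

```rust
use std::collections::HashSet;

/// Hypothetical, reduced counterpart of `Stats`.
#[derive(Debug, Default, PartialEq)]
pub struct Counts {
    pub sentences: u32,
    pub words: u32,
    pub capitalized: u32,
    pub numeric: u32,
    pub forbidden: u32,
}

/// Applies the same rules as the parser above: sentences are split on '.',
/// words on ASCII whitespace, digits are counted per character, and a word
/// is "capitalized" only if every character is an ASCII uppercase letter.
pub fn count(text: &str, forbidden: &HashSet<&str>) -> Counts {
    let mut c = Counts::default();
    for sentence in text.split('.').map(str::trim).filter(|s| !s.is_empty()) {
        c.sentences += 1;
        for word in sentence.split_ascii_whitespace() {
            c.words += 1;
            let mut all_caps = true;
            for ch in word.chars() {
                if ch.is_numeric() {
                    c.numeric += 1;
                    all_caps = false;
                } else if !ch.is_ascii_uppercase() {
                    all_caps = false;
                }
            }
            if all_caps {
                c.capitalized += 1;
            }
            // Lowercase before lookup so "Bitcoin" matches "bitcoin".
            if forbidden.contains(word.to_lowercase().as_str()) {
                c.forbidden += 1;
            }
        }
    }
    c
}
```

Note that punctuation other than '.' stays attached to the word, so a token such as `times,` would not match a forbidden entry `times` under these rules.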


@ -0,0 +1,14 @@
use crate::{FORBIDDEN_WORDS, stats::Stats};
#[allow(dead_code)]
pub fn parse(stats: &mut Stats, text: &str) {
for word in text
.split_ascii_whitespace()
.map(|s| s.trim())
.filter(|s| !s.is_empty())
{
if FORBIDDEN_WORDS.contains(&word.to_lowercase()) {
stats.forbidden_count += 1;
}
}
}


@ -0,0 +1,3 @@
pub mod for_loops;
pub mod for_loops_forbidden_only;
pub mod muncher;


@ -0,0 +1,66 @@
use crate::{FORBIDDEN_WORDS, stats::Stats};
#[allow(dead_code)]
/// probably buggy. for example, are new lines sentences? what if the text has no last period?
/// 500ms is without forbidden words check, but...
/// 6000ms if adding forbidden words.. so not faster
/// with trie this is 2600ms
pub fn parse(stats: &mut Stats, text: &str) {
let mut capitalized = true;
let mut whitespaced = false;
let mut dotted = false;
let mut word = String::new();
for char in text.chars() {
if whitespaced {
if !char.is_whitespace() {
whitespaced = false; //end whiteness
}
continue;
} else if char.is_whitespace() {
whitespaced = true;
stats.word_count += 1; //end of word
if capitalized {
stats.capitalized_count += 1;
} else {
//reset capitalized word
capitalized = true;
}
let lowercase_word = word.to_lowercase();
if FORBIDDEN_WORDS.contains(&lowercase_word) {
stats.forbidden_count += 1;
}
word = String::new();
continue;
}
if dotted {
if char != '.' {
dotted = false; //end sentencing
}
continue;
} else if char == '.' {
dotted = true;
stats.sentence_count += 1;
stats.word_count += 1; //end of word
if capitalized {
stats.capitalized_count += 1;
} else {
//reset capitalized word
capitalized = true;
}
let lowercase_word = word.to_lowercase();
if FORBIDDEN_WORDS.contains(&lowercase_word) {
stats.forbidden_count += 1;
}
word = String::new();
continue;
}
word += &char.to_string();
if char.is_numeric() {
stats.numeric_count += 1;
capitalized = false;
}
if !char.is_ascii_uppercase() {
capitalized = false;
}
}
}


@ -46,12 +46,12 @@ impl Display for Stats {
)?;
writeln!(
f,
"forbidden word percentage: {:.0}%",
"forbidden word percentage: {:.2}%",
(self.forbidden_count as f32 / word_count) * 100.0,
)?;
write!(
f,
"capitalized word percentage: {:.0}%",
"capitalized word percentage: {:.2}%",
(self.capitalized_count as f32 / word_count) * 100.0,
)
}
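
The switch from `{:.0}` to `{:.2}` matters because the forbidden-word share is well below one percent in the README's benchmark output, so zero decimals rounds it to `0%`. A small sketch of the effect, using counts taken from that output; the `percentage` helper is a hypothetical name, not a function from the crate.

```rust
/// Formats `count / total` as a percentage with the given number of decimals,
/// using the same `f32` arithmetic as the `Display` impl above.
pub fn percentage(count: u64, total: u64, decimals: usize) -> String {
    // `{:.*}` takes the precision as an extra argument before the value.
    format!("{:.*}%", decimals, (count as f32 / total as f32) * 100.0)
}
```

With the muncher figures (279717 forbidden out of 86765116 words), `percentage(279717, 86765116, 0)` gives `0%` while `percentage(279717, 86765116, 2)` gives `0.32%`.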


@ -1 +0,0 @@
I am Nastya and downvoted this post because post is considered spam. Your message will be removed from this community site due too much downvotes. See my profile for more information. Read my source code mentioned on my profile to see what you did wrong. Should be no problem for a developer.


@ -1,20 +0,0 @@
HIRE Century Web Recovery TO RECOVER YOUR LOST BITCOIN
If youve lost your Bitcoin to an online scam, hiring a professional recovery service can significantly improve your chances of getting your funds back. Century Web Recovery specializes in Bitcoin recovery, helping victims reclaim their stolen assets. Heres what you need to know:
Understanding the Recovery Process
The recovery process begins with contacting Century Web Recovery. Their team will guide you through the steps necessary to initiate an investigation into your case. Understanding the process is key to managing your expectations.
Documenting Your Case
To facilitate recovery, its essential to document all relevant information regarding the scam. This includes transaction records, wallet addresses, and any communications with the scammer. Century Web Recovery will help you gather this information to build a strong case.
Investigation and Tracking
Once you hire Century Web Recovery, their experts will begin investigating your case. They use sophisticated tools to track the stolen Bitcoin, identifying the paths taken by the scammers. This tracing is crucial for successful recovery.
Freezing Stolen Assets
Quick action is vital in recovering stolen Bitcoin.Century Web Recovery works directly with cryptocurrency exchanges to freeze any stolen assets, preventing the scammers from cashing out your funds. This collaboration is essential for a successful recovery.
Legal Support and Guidance
If necessary, Century Web Recovery can provide legal support. They will guide you on reporting the scam to law enforcement and assist in filing any legal claims. Their expertise in crypto-related cases ensures you receive the best advice on how to proceed.
If youve lost Bitcoin to an online scam, dont hesitate. Hire Century Web Recovery to recover your lost assets and regain your financial security.


@ -1,3 +0,0 @@
Email; digital hack recovery @ techie . com
WhatsApp +19152151930
Website; https : // digital hack recovery . com


@ -1,7 +0,0 @@
TESTED CRYPTOCURRENCY RECOVERY SERVICE \\ DIGITAL HACK RECOVERY
When the devastating reality of lost or stolen Bitcoin strikes, the path to recovery can seem bleak and hopeless. However, the story of DIGITAL HACK RECOVERY stands as a shining beacon of hope, demonstrating the power of perseverance in the face of seemingly insurmountable odds. This specialized service, founded by a team of tenacious experts, has made it their mission to reunite people with their rightful digital assets, no matter how complex or convoluted the situation may be. Through their unwavering dedication and meticulous investigative techniques, DIGITAL HACK RECOVERY has time and again succeeded in tracking down lost Bitcoin, navigating the labyrinthine world of blockchain technology and leveraging their deep understanding of crypto ecosystems. Their success stories are a testament to the resilience of the human spirit, as they've helped individuals regain access to life-changing sums of money that had been presumed lost forever. In an industry rife with uncertainty and risk, DIGITAL HACK RECOVERY has emerged as a trusted ally, guiding clients through the darkness with a steadfast commitment to recovery. By combining cutting-edge digital forensics, strategic partnerships, and a relentless determination to leave no stone unturned, this remarkable organization has earned the gratitude of countless individuals who had resigned themselves to the permanent disappearance of their hard-earned Bitcoin. In a world where the digital landscape can feel overwhelming and unpredictable, DIGITAL HACK RECOVERY stands as a shining example of what can be achieved through perseverance, expertise, and an unwavering belief in the possibility of redemption. I tried everything I could think of. I contacted support forums, tried password recovery tools, scoured Reddit for advice, and spent countless hours following step-by-step guides. Every lead I followed seemed to end in disappointment. I felt like I was chasing an illusion—getting closer, but never quite reaching it. 
With every attempt that failed, my hope dwindled further. It was an overwhelming feeling, knowing that I had lost something irreplaceable, something I had worked so hard for, and worse—something I had no way of recovering. Months passed, and I was ready to give up. I had accepted that my Bitcoin was gone, lost forever. But that feeling of helplessness lingered, gnawing at me in the back of my mind but DIGITAL HACK RECOVERY made the change of my life when I got the news of the recovery. Thank you very much. Contact them via contact details bellow
Email; digital hack recovery @ techie . com
WhatsApp +19152151930
Website; https : // digital hack recovery . com