Compare commits

24 Commits

Author SHA1 Message Date
JestDotty
4415dd26ae sequential file read slightly faster
Some checks failed
isspam build / build (push) Failing after 2m24s
2025-03-24 16:03:36 -04:00
JestDotty
f9115a9e40 clean up README
Some checks failed
isspam build / build (push) Failing after 2m7s
2025-03-24 00:48:04 -04:00
JestDotty
b711d5a908 fxhash and I remembered how to static lazy right
Some checks failed
isspam build / build (push) Failing after 2m33s
2025-03-24 00:20:15 -04:00
JestDotty
5b8dd08348 do or do not, there is always a trie organization
Some checks failed
isspam build / build (push) Failing after 2m23s
2025-03-23 23:58:34 -04:00
JestDotty
12f2494411 if a word has forbidden only count it once. uses for loops again whoops
Some checks failed
isspam build / build (push) Failing after 2m3s
2025-03-23 23:23:40 -04:00
JestDotty
854a1c3991 muncher benchmark and data
Some checks failed
isspam build / build (push) Failing after 2m23s
2025-03-23 23:14:30 -04:00
JestDotty
ab32a81d9c don't track
Some checks failed
isspam build / build (push) Failing after 2m5s
2025-03-23 22:31:14 -04:00
JestDotty
019970b2aa optimization benchmarks. switched to tokio
Some checks failed
isspam build / build (push) Has been cancelled
2025-03-23 22:29:54 -04:00
JestDotty
5eded442b7 sync. rayon and build options, README
Some checks failed
isspam build / build (push) Failing after 2m2s
2025-03-23 21:39:12 -04:00
e4354a7645 updates.
Some checks failed
isspam build / build (push) Failing after 2m12s
2025-03-24 02:31:31 +01:00
Jest Dotty
a41f819cdf flatten jest_rust
Some checks failed
isspam build / build (push) Failing after 2m8s
2025-03-23 20:23:38 -04:00
Jest Dotty
c2fae46865 jest_rust, build, bench
Some checks failed
isspam build / build (push) Failing after 2m31s
2025-03-23 20:03:09 -04:00
BordedDev
c1a97c01f5
Removed TBB from default makefile
Some checks failed
isspam build / build (push) Failing after 2m2s
2025-03-23 23:37:31 +01:00
BordedDev
4f873d02ef
Fixed horrible mistake ;P
Some checks failed
isspam build / build (push) Failing after 2m39s
2025-03-23 22:40:43 +01:00
BordedDev
75092cb738
Updated file io perf
Some checks failed
isspam build / build (push) Failing after 2m35s
2025-03-23 22:06:48 +01:00
BordedDev
a08c181872
Tweaked some variables, add main2.cpp which removes the struct
Some checks failed
isspam build / build (push) Failing after 2m7s
2025-03-23 03:36:29 +01:00
fe0357b5d5 Added py
Some checks failed
isspam build / build (push) Failing after 2m9s
2025-03-23 03:25:51 +01:00
BordedDev
611ef8d124
Fixed issues with utf8
Some checks failed
isspam build / build (push) Failing after 1m58s
2025-03-23 02:13:38 +01:00
2f2710df41 Added python version.
Some checks failed
isspam build / build (push) Failing after 2m27s
2025-03-22 23:55:48 +01:00
BordedDev
fbb1db28d7
Updated Dockerfile to use gcc
Added borded cpp to bench
Made borded part of make all
Some checks failed
isspam build / build (push) Failing after 1m58s
2025-03-20 23:32:24 +01:00
BordedDev
b24b1b933c
Added borded cpp to make file
Some checks failed
isspam build / build (push) Failing after 2m5s
2025-03-20 23:21:33 +01:00
77926649ba Docker setup.
Some checks failed
isspam build / build (push) Failing after 2m0s
2025-03-20 22:52:53 +01:00
ce4997317a Retoorii fixes.
Some checks failed
isspam build / build (push) Failing after 2m4s
2025-03-20 22:23:41 +01:00
BordedDev
1fb6481f2b
Setup v1 of borded spam parser
Some checks failed
isspam build / build (push) Failing after 2m20s
2025-03-20 21:44:22 +01:00
13 changed files with 332 additions and 170 deletions

@ -1,3 +1,2 @@
/target
/Cargo.lock
/test_books

@ -9,7 +9,6 @@ tokio = { version = "1.44.1", features = ["full"] }
[profile.release]
codegen-units = 1 # less means more compile work but better optimized
lto = "fat" # thin has best performance. fat the worst
lto = "thin" # thin has best performance. fat the worst
strip = true
# opt-level = "z" # slows down
panic = "abort"

@ -1,35 +1,5 @@
for https://retoor.molodetz.nl/retoor/isspam
extract `../books.tar.gz`
# local machine benchmarks
single threaded: `33.63373279571533`
rayon: `4.294418811798096`
tokio: `4.717588901519775`
tokio:
muncher: `2486ms`
for_loops: `1227ms`
for_loops_forbidden_only: `987ms`
trie creation and stats accumulation take 0ms
FxHashMap faster than BTreeMap
## compile options benchmarks
`lto` thin, fat doesn't change much
`codegen-units` 0, 1 doesn't change much
`opt-level = "z"` slows things down
# ubuntu terminal running
instructions for running this in the Ubuntu terminal at https://snek.molodetz.nl/terminal.html:
```
mkdir /project
@ -81,4 +51,145 @@ Time Retoor Python: 287.63036131858826
***end benchmark***
```
add `/jisspam` to `.gitignore` to not commit the executable accidentally
add `/jisspam` to `.gitignore` to not commit the executable accidentally
# local machine benchmarks
single threaded:
```
***benchmarking***
Time C: 2.4082751274108887
Time Rust: 2.865687847137451
Time CPP: 1.1568822860717773
Time Borded CPP: 1.9657189846038818
Time Jest Rust: 33.63373279571533
Time Retoor Python: 133.92413425445557
***end benchmark***
```
rayon:
```
***benchmarking***
Time C: 2.457853317260742
Time Rust: 3.0170154571533203
Time CPP: 1.1482579708099365
Time Borded CPP: 2.002591371536255
Time Jest Rust: 4.294418811798096
Time Retoor Python: 201.2997748851776
***end benchmark***
```
tokio:
```
***benchmarking***
Time C: 2.448648452758789
Time Rust: 3.095592737197876
Time CPP: 1.1662013530731201
Time Borded CPP: 1.9207634925842285
Time Jest Rust: 4.717588901519775
Time Retoor Python: 139.8203284740448
***end benchmark***
```
## compile options benchmarks
lto not thin: `Time Jest Rust: 5.306957483291626` slower
lto fat: `Time Jest Rust: 5.413678407669067` slower
codegen-units 1: `Time Jest Rust: 4.451631546020508` faster
opt-level z: `Time Jest Rust: 7.045313119888306` slower
strip true: `Time Jest Rust: 4.337219476699829` faster
lto true: `Time Jest Rust: 4.703521728515625` slower
lto none: `Time Jest Rust: 4.817203998565674`
lto thin: `Time Jest Rust: 4.429729223251343` faster
# data integrity
(this isn't tested, just guessed, and I don't have data to compare it with)
for loops:
```
file count: 904
failed file count: 0
sentence count: 5602301
word count: 81701260
capitalized count: 1753639
numeric count: 14981248
forbidden count: 1237059
words per sentence average: 14.6
forbidden word percentage: 2%
capitalized word percentage: 2%
benchmark: 5033ms
```
muncher:
```
file count: 904
failed file count: 0
sentence count: 5338705
word count: 86765116
capitalized count: 13640820
numeric count: 10902254
forbidden count: 0
words per sentence average: 16.3
forbidden word percentage: 0%
capitalized word percentage: 16%
benchmark: 504ms
```
with forbidden words:
```
file count: 904
failed file count: 0
sentence count: 5338705
word count: 86765116
capitalized count: 13640820
numeric count: 10902254
forbidden count: 279717
words per sentence average: 16.3
forbidden word percentage: 0%
capitalized word percentage: 16%
benchmark: 6078ms
```
# forbidden words benchmarks
the forbidden word checks seem to take about 4000ms in the original version
for loops count forbidden word once only:
```
file count: 904
failed file count: 0
sentence count: 5602301
word count: 81701260
capitalized count: 1753639
numeric count: 14981248
forbidden count: 1143234
words per sentence average: 14.6
forbidden word percentage: 1%
capitalized word percentage: 2%
benchmark: 4737ms
```
for loops with trie:
```
file count: 904
failed file count: 0
sentence count: 5602301
word count: 81701260
capitalized count: 1753639
numeric count: 14981248
forbidden count: 176528
words per sentence average: 14.6
forbidden word percentage: 0%
capitalized word percentage: 2%
benchmark: 1588ms
```
muncher with trie is 2600ms
for loops with fxhash trie: 1200ms
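
The trie numbers above come from replacing a flat forbidden-word scan with a per-character lookup. The repository's own `Trie` is not shown in this diff, so the following is only a minimal sketch of the idea. It assumes one node per `char`, and it uses the standard library `HashMap` where the notes above mention `FxHashMap`, so the block stays free of external crates. `TrieNode`, `terminal`, and `build_forbidden` are hypothetical names.

```rust
use std::collections::HashMap;

/// One node per character; `terminal` marks the end of a stored word.
/// (Hypothetical layout, not the repository's actual `Trie`.)
#[derive(Default)]
struct TrieNode {
    children: HashMap<char, TrieNode>,
    terminal: bool,
}

#[derive(Default)]
struct Trie {
    root: TrieNode,
}

impl Trie {
    /// Walks the word character by character, creating nodes as needed.
    fn insert(&mut self, word: &str) {
        let mut node = &mut self.root;
        for ch in word.chars() {
            node = node.children.entry(ch).or_default();
        }
        node.terminal = true;
    }

    /// Returns true only for a whole stored word, not for a prefix of one.
    fn contains(&self, word: &str) -> bool {
        let mut node = &self.root;
        for ch in word.chars() {
            match node.children.get(&ch) {
                Some(next) => node = next,
                None => return false,
            }
        }
        node.terminal
    }
}

/// Builds a trie from a word list, lowercasing each entry first.
fn build_forbidden(words: &[&str]) -> Trie {
    let mut trie = Trie::default();
    for word in words {
        trie.insert(&word.to_lowercase());
    }
    trie
}
```

A lookup costs one map access per character of the candidate word, regardless of how many forbidden words are stored. Callers are expected to lowercase the candidate before calling `contains`, as the parsing code in this diff does.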

@ -1,4 +1,3 @@
mod parser;
mod stats;
mod trie;
@ -51,35 +50,142 @@ static FORBIDDEN_WORDS: LazyLock<Trie> = LazyLock::new(|| {
trie
});
impl Stats {
pub fn process(&mut self, text: &str) {
// self.muncher(&text);
self.for_loops(&text);
}
#[allow(dead_code)]
/// probably buggy. for example, are new lines sentences? what if the text has no last period?
/// 500ms is without forbidden words check, but...
/// 6000ms if adding forbidden words.. so not faster
/// with trie this is 2600ms
fn muncher(&mut self, text: &str) {
let mut capitalized = true;
let mut whitespaced = false;
let mut dotted = false;
let mut word = String::new();
for char in text.chars() {
if whitespaced {
if !char.is_whitespace() {
whitespaced = false; //end whiteness
}
continue;
} else if char.is_whitespace() {
whitespaced = true;
self.word_count += 1; //end of word
if capitalized {
self.capitalized_count += 1;
} else {
//reset capitalized word
capitalized = true;
}
let lowercase_word = word.to_lowercase();
if FORBIDDEN_WORDS.contains(&lowercase_word) {
self.forbidden_count += 1;
}
word = String::new();
continue;
}
if dotted {
if char != '.' {
dotted = false; //end sentencing
}
continue;
} else if char == '.' {
dotted = true;
self.sentence_count += 1;
self.word_count += 1; //end of word
if capitalized {
self.capitalized_count += 1;
} else {
//reset capitalized word
capitalized = true;
}
let lowercase_word = word.to_lowercase();
if FORBIDDEN_WORDS.contains(&lowercase_word) {
self.forbidden_count += 1;
}
word = String::new();
continue;
}
word.push(char);
if char.is_numeric() {
self.numeric_count += 1;
capitalized = false;
}
if !char.is_ascii_uppercase() {
capitalized = false;
}
}
}
#[allow(dead_code)]
/// typically 5000ms
/// with trie this is 1600ms
fn for_loops(&mut self, text: &str) {
for sentence in text
.split('.')
.map(|s| s.trim())
.filter(|s| !s.is_empty())
{
self.sentence_count += 1;
for word in sentence
.split_whitespace()
.map(|s| s.trim())
.filter(|s| !s.is_empty())
{
self.word_count += 1;
//get all numbers counted
let mut all_capitalized = true;
for char in word.chars() {
if char.is_numeric() {
self.numeric_count += 1;
//TODO are numbers capitalized or not? I don't know!
}
if !char.is_ascii_uppercase() {
all_capitalized = false;
}
}
if all_capitalized {
self.capitalized_count += 1;
}
let lowercase_word = word.to_lowercase();
if FORBIDDEN_WORDS.contains(&lowercase_word) {
self.forbidden_count += 1;
}
}
}
}
}
#[tokio::main]
async fn main() {
let files = env::args().skip(1);
let mut stats = Stats::default();
let mut rx = {
let (tx, rx) = mpsc::unbounded_channel();
for file in files {
//reading files not sequentially average shaves 30ms (of 1250ms), and that's on a NVMe SSD so why not
let Ok(text) = fs::read_to_string(&file) else {
stats.failed_file_count += 1;
continue;
};
stats.file_count += 1;
let tx = tx.clone();
tokio::spawn(async move {
let mut stats = Stats::default();
//reading files in threads doesn't change speed of any sort but oh well
if let Ok(text) = fs::read_to_string(&file) {
stats.file_count += 1;
parser::for_loops::parse(&mut stats, &text);
} else {
stats.failed_file_count += 1;
}
let _ = tx.send(stats);
stats.process(&text);
tx.send(stats).unwrap();
});
}
rx
};
let mut stats = Stats::default();
while let Some(file_stat) = rx.recv().await {
stats += file_stat;
}
println!("{stats}");
}
/// needs ../books.tar.gz to be extracted into ../books
#[test]
fn test() {
use std::{env, fs, process::Command, time::Instant};
@ -93,6 +199,44 @@ fn test() {
Err(err) => eprintln!("compile failed: {err}"),
}
//get test files
let files = fs::read_dir("test_files")
.unwrap()
.map(|f| {
f.unwrap()
.path()
.canonicalize()
.unwrap()
.to_str()
.unwrap()
.to_string()
})
.collect::<Vec<_>>();
println!("test files found: {:#?}", files);
//benchmark run
let benchmark = Instant::now();
let mut run = Command::new("target/release/jisspam");
let run_arged = run.args(files);
match run_arged.output() {
Ok(output) => println!("{}", String::from_utf8_lossy(&output.stdout)),
Err(err) => eprintln!("run failed: {err}"),
}
println!("benchmark: {}ms", benchmark.elapsed().as_millis());
}
#[test]
fn books_test() {
use std::{env, fs, process::Command, time::Instant};
println!("cwd: {}", env::current_dir().unwrap().display());
//compile
let mut compile = Command::new("cargo");
let compile_arged = compile.arg("build").arg("--release");
match compile_arged.output() {
Ok(output) => println!("compiled {}", String::from_utf8_lossy(&output.stdout)),
Err(err) => eprintln!("compile failed: {err}"),
}
//get test files
let files = fs::read_dir("../books")
.unwrap()
@ -106,9 +250,7 @@ fn test() {
.to_string()
})
.collect::<Vec<_>>();
println!("test files found: {}", files.len());
println!();
println!("test files found: {:#?}", files);
//benchmark run
let benchmark = Instant::now();
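
The `main` function in this file hands each file to its own task and merges the per-file `Stats` through a channel with `stats += file_stat`. A minimal sketch of that fan-out and merge pattern follows. It is not the repository's code: it assumes standard library threads and `std::sync::mpsc` in place of tokio, a cut-down hypothetical `Stats` with only two fields, and in-memory strings instead of file reads.

```rust
use std::ops::AddAssign;
use std::sync::mpsc;
use std::thread;

/// Cut-down stand-in for the real `Stats` (hypothetical, two fields only).
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct Stats {
    file_count: u32,
    word_count: u32,
}

/// Lets partial results be merged with `total += partial`.
impl AddAssign for Stats {
    fn add_assign(&mut self, other: Self) {
        self.file_count += other.file_count;
        self.word_count += other.word_count;
    }
}

/// Counts words in each text on its own thread and merges the partial results.
fn count_all(texts: Vec<String>) -> Stats {
    let (tx, rx) = mpsc::channel();
    for text in texts {
        let tx = tx.clone();
        thread::spawn(move || {
            let stats = Stats {
                file_count: 1,
                word_count: text.split_whitespace().count() as u32,
            };
            // The receiver stays alive until all senders are gone,
            // so a send error is not expected and is ignored here.
            let _ = tx.send(stats);
        });
    }
    // Drop the original sender so the receive loop ends once every worker is done.
    drop(tx);

    let mut total = Stats::default();
    for partial in rx {
        total += partial;
    }
    total
}
```

Dropping the original sender matters: the receive loop only terminates when every sender has been dropped. Scoping `tx` inside a block, as `main` does above, achieves the same thing.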

@ -1,37 +0,0 @@
use crate::{FORBIDDEN_WORDS, stats::Stats};
#[allow(dead_code)]
/// typically 5000ms
/// with trie this is 1600ms
pub fn parse(stats: &mut Stats, text: &str) {
for sentence in text
.split('.')
.map(|s| s.trim())
.filter(|s| !s.is_empty())
{
stats.sentence_count += 1;
for word in sentence
.split_ascii_whitespace()
.map(|s| s.trim())
.filter(|s| !s.is_empty())
{
stats.word_count += 1;
//get all numbers counted
let mut all_capitalized = true;
for char in word.chars() {
if char.is_numeric() {
stats.numeric_count += 1;
all_capitalized = false;
} else if !char.is_ascii_uppercase() {
all_capitalized = false;
}
}
if all_capitalized {
stats.capitalized_count += 1;
}
if FORBIDDEN_WORDS.contains(&word.to_lowercase()) {
stats.forbidden_count += 1;
}
}
}
}

@ -1,14 +0,0 @@
use crate::{FORBIDDEN_WORDS, stats::Stats};
#[allow(dead_code)]
pub fn parse(stats: &mut Stats, text: &str) {
for word in text
.split_ascii_whitespace()
.map(|s| s.trim())
.filter(|s| !s.is_empty())
{
if FORBIDDEN_WORDS.contains(&word.to_lowercase()) {
stats.forbidden_count += 1;
}
}
}

@ -1,3 +0,0 @@
pub mod for_loops;
pub mod for_loops_forbidden_only;
pub mod muncher;

@ -1,66 +0,0 @@
use crate::{FORBIDDEN_WORDS, stats::Stats};
#[allow(dead_code)]
/// probably buggy. for example, are new lines sentences? what if the text has no last period?
/// 500ms is without forbidden words check, but...
/// 6000ms if adding forbidden words.. so not faster
/// with trie this is 2600ms
pub fn parse(stats: &mut Stats, text: &str) {
let mut capitalized = true;
let mut whitespaced = false;
let mut dotted = false;
let mut word = String::new();
for char in text.chars() {
if whitespaced {
if !char.is_whitespace() {
whitespaced = false; //end whiteness
}
continue;
} else if char.is_whitespace() {
whitespaced = true;
stats.word_count += 1; //end of word
if capitalized {
stats.capitalized_count += 1;
} else {
//reset capitalized word
capitalized = true;
}
let lowercase_word = word.to_lowercase();
if FORBIDDEN_WORDS.contains(&lowercase_word) {
stats.forbidden_count += 1;
}
word = String::new();
continue;
}
if dotted {
if char != '.' {
dotted = false; //end sentencing
}
continue;
} else if char == '.' {
dotted = true;
stats.sentence_count += 1;
stats.word_count += 1; //end of word
if capitalized {
stats.capitalized_count += 1;
} else {
//reset capitalized word
capitalized = true;
}
let lowercase_word = word.to_lowercase();
if FORBIDDEN_WORDS.contains(&lowercase_word) {
stats.forbidden_count += 1;
}
word = String::new();
continue;
}
word.push(char);
if char.is_numeric() {
stats.numeric_count += 1;
capitalized = false;
}
if !char.is_ascii_uppercase() {
capitalized = false;
}
}
}

@ -46,12 +46,12 @@ impl Display for Stats {
)?;
writeln!(
f,
"forbidden word percentage: {:.2}%",
"forbidden word percentage: {:.0}%",
(self.forbidden_count as f32 / word_count) * 100.0,
)?;
write!(
f,
"capitalized word percentage: {:.2}%",
"capitalized word percentage: {:.0}%",
(self.capitalized_count as f32 / word_count) * 100.0,
)
}
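
The hunk above switches the percentage output between two and zero decimal places. A small sketch of what that precision does to the reported values follows, using counts taken from the README benchmarks in this diff. `percentage` is a hypothetical helper, and it computes in `f64` rather than the `f32` used in `Display for Stats`.

```rust
/// Formats `count / total` as a percentage with the given number of decimals.
/// (Hypothetical helper; the real code formats inline inside `Display`.)
fn percentage(count: u64, total: u64, decimals: usize) -> String {
    // Avoid dividing by zero when no words were counted.
    let value = if total == 0 {
        0.0
    } else {
        count as f64 / total as f64 * 100.0
    };
    format!("{:.prec$}%", value, prec = decimals)
}
```

With zero decimals, 1237059 forbidden words out of 81701260 prints as `2%`, while two decimals give `1.51%`. The trie run's 176528 forbidden words print as `0%` at zero decimals, which is why that README output shows a zero percentage next to a non-zero count.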

@ -0,0 +1 @@
I am Nastya and downvoted this post because post is considered spam. Your message will be removed from this community site due too much downvotes. See my profile for more information. Read my source code mentioned on my profile to see what you did wrong. Should be no problem for a developer.

@ -0,0 +1,20 @@
HIRE Century Web Recovery TO RECOVER YOUR LOST BITCOIN
If youve lost your Bitcoin to an online scam, hiring a professional recovery service can significantly improve your chances of getting your funds back. Century Web Recovery specializes in Bitcoin recovery, helping victims reclaim their stolen assets. Heres what you need to know:
Understanding the Recovery Process
The recovery process begins with contacting Century Web Recovery. Their team will guide you through the steps necessary to initiate an investigation into your case. Understanding the process is key to managing your expectations.
Documenting Your Case
To facilitate recovery, its essential to document all relevant information regarding the scam. This includes transaction records, wallet addresses, and any communications with the scammer. Century Web Recovery will help you gather this information to build a strong case.
Investigation and Tracking
Once you hire Century Web Recovery, their experts will begin investigating your case. They use sophisticated tools to track the stolen Bitcoin, identifying the paths taken by the scammers. This tracing is crucial for successful recovery.
Freezing Stolen Assets
Quick action is vital in recovering stolen Bitcoin.Century Web Recovery works directly with cryptocurrency exchanges to freeze any stolen assets, preventing the scammers from cashing out your funds. This collaboration is essential for a successful recovery.
Legal Support and Guidance
If necessary, Century Web Recovery can provide legal support. They will guide you on reporting the scam to law enforcement and assist in filing any legal claims. Their expertise in crypto-related cases ensures you receive the best advice on how to proceed.
If youve lost Bitcoin to an online scam, dont hesitate. Hire Century Web Recovery to recover your lost assets and regain your financial security.

@ -0,0 +1,3 @@
Email; digital hack recovery @ techie . com
WhatsApp +19152151930
Website; https : // digital hack recovery . com

@ -0,0 +1,7 @@
TESTED CRYPTOCURRENCY RECOVERY SERVICE \\ DIGITAL HACK RECOVERY
When the devastating reality of lost or stolen Bitcoin strikes, the path to recovery can seem bleak and hopeless. However, the story of DIGITAL HACK RECOVERY stands as a shining beacon of hope, demonstrating the power of perseverance in the face of seemingly insurmountable odds. This specialized service, founded by a team of tenacious experts, has made it their mission to reunite people with their rightful digital assets, no matter how complex or convoluted the situation may be. Through their unwavering dedication and meticulous investigative techniques, DIGITAL HACK RECOVERY has time and again succeeded in tracking down lost Bitcoin, navigating the labyrinthine world of blockchain technology and leveraging their deep understanding of crypto ecosystems. Their success stories are a testament to the resilience of the human spirit, as they've helped individuals regain access to life-changing sums of money that had been presumed lost forever. In an industry rife with uncertainty and risk, DIGITAL HACK RECOVERY has emerged as a trusted ally, guiding clients through the darkness with a steadfast commitment to recovery. By combining cutting-edge digital forensics, strategic partnerships, and a relentless determination to leave no stone unturned, this remarkable organization has earned the gratitude of countless individuals who had resigned themselves to the permanent disappearance of their hard-earned Bitcoin. In a world where the digital landscape can feel overwhelming and unpredictable, DIGITAL HACK RECOVERY stands as a shining example of what can be achieved through perseverance, expertise, and an unwavering belief in the possibility of redemption. I tried everything I could think of. I contacted support forums, tried password recovery tools, scoured Reddit for advice, and spent countless hours following step-by-step guides. Every lead I followed seemed to end in disappointment. I felt like I was chasing an illusion—getting closer, but never quite reaching it. 
With every attempt that failed, my hope dwindled further. It was an overwhelming feeling, knowing that I had lost something irreplaceable, something I had worked so hard for, and worse—something I had no way of recovering. Months passed, and I was ready to give up. I had accepted that my Bitcoin was gone, lost forever. But that feeling of helplessness lingered, gnawing at me in the back of my mind but DIGITAL HACK RECOVERY made the change of my life when I got the news of the recovery. Thank you very much. Contact them via contact details bellow
Email; digital hack recovery @ techie . com
WhatsApp +19152151930
Website; https : // digital hack recovery . com