Added regex.
parent 4532cc4637
commit cd9422355a

regex.md (new file, 99 lines)

# Regex

## Obsession

If you've had a good look around my repositories, you've probably seen that I have a special thing for regex interpreters.
I love writing them. Writing one from scratch is the most underestimated skill there is.

Yes, you can follow some basic tutorial on the internet and learn how to do it the way everyone does.
But the real game? It's writing something you can't find anywhere else.

And I've done that. Several times.

Compiled ones, bytecode ones, even one that used regex itself as bytecode. That one was very special.
Nice interpreters, fast interpreters, winning, losing... But the end product is not the interpreter.
It's your own brain.

## Why Do It?

Thinking and problem solving are among the best things there are.
And by problem solving, I do **not** mean solving it with Google or some book.
Pure thinking. With a good understanding of language basics, you're able to write a regex interpreter.
It takes a serious amount of time. Do not underestimate it.

But you don't need more than being hardcore at the basics (yes, that's a thing).
The beautiful thing is, once you get into it, you can keep going without having to Google or read a book.
It's all in your head.

## The Trap of Research

The most fun is when you haven’t researched regex or interpreters beforehand.
It makes you **extra creative** and lets your brain think freely.

Solutions from others can be inspiring... but they can also *pollute* your thought process.
You can get stuck in someone else's way of thinking and end up building the same thing they did.

For me, the target is not to create a regex engine that beats everyone else's.
Whether it does depends on many factors. In certain scenarios, I've even beaten the original glibc regex.
Cool? Sure. But not the point.

The goal is: **write something decent and unique**.
Own design. No influence from others. That's it.

## Questions Worth Asking

Do you know what an AST is?
Will you use one? Or will you just interpret the regex directly?
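
If the term is new: an AST is just the regex turned into a tree before any matching happens. Below is a minimal sketch of that idea in Python, for a tiny made-up subset (literals, `.`, `*`, `|` and parentheses). The names `parse`, `ends` and `fullmatch` and the tuple node shapes are my own assumptions for illustration, not taken from any of the interpreters mentioned here, and it is only one of many possible designs.

```python
def parse(pattern):
    """Parse a tiny regex subset into a tuple-based AST (hypothetical node shapes)."""
    pos = 0

    def peek():
        return pattern[pos] if pos < len(pattern) else None

    def alt():  # alt := concat ('|' concat)*
        nonlocal pos
        branches = [concat()]
        while peek() == "|":
            pos += 1
            branches.append(concat())
        return branches[0] if len(branches) == 1 else ("alt", branches)

    def concat():  # concat := repeat*
        items = []
        while peek() is not None and peek() not in "|)":
            items.append(repeat())
        return ("cat", items)

    def repeat():  # repeat := atom '*'*
        nonlocal pos
        node = atom()
        while peek() == "*":
            pos += 1
            node = ("star", node)
        return node

    def atom():  # atom := '(' alt ')' | '.' | literal
        nonlocal pos
        ch = peek()
        pos += 1
        if ch == "*":
            raise ValueError("nothing to repeat")
        if ch == "(":
            node = alt()
            if peek() != ")":
                raise ValueError("missing ')'")
            pos += 1
            return node
        return ("any",) if ch == "." else ("lit", ch)

    tree = alt()
    if pos != len(pattern):
        raise ValueError("unexpected ')'")
    return tree


def ends(node, text, i):
    """Yield every index where `node` can stop matching when started at index i."""
    kind = node[0]
    if kind == "lit":
        if i < len(text) and text[i] == node[1]:
            yield i + 1
    elif kind == "any":
        if i < len(text):
            yield i + 1
    elif kind == "alt":
        for branch in node[1]:
            yield from ends(branch, text, i)
    elif kind == "cat":
        yield from cat_ends(node[1], 0, text, i)
    elif kind == "star":
        yield i  # zero repetitions
        for j in ends(node[1], text, i):
            if j > i:  # skip empty iterations so the recursion always makes progress
                yield from ends(node, text, j)


def cat_ends(items, k, text, i):
    """Match items[k:] in sequence, backtracking over each item's possible ends."""
    if k == len(items):
        yield i
    else:
        for j in ends(items[k], text, i):
            yield from cat_ends(items, k + 1, text, j)


def fullmatch(pattern, text):
    tree = parse(pattern)
    return any(j == len(text) for j in ends(tree, text, 0))
```

For example, `fullmatch("a(b|c)*d", "abcbd")` walks the tree once per candidate and backtracks through the `star` node. Interpreting the pattern string directly skips the `parse` step entirely, which is exactly the trade-off the questions above are about.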

The easiest way must be the fastest, right?
Actually... no.

I've benchmarked interpreters a lot, and performance really depends on the regexes themselves.
There's no one-size-fits-all solution.

An advanced bytecode-compiled one with JIT has to pay for compilation up front, so on the first pass it will always be slower than a dumb interpreter that just walks character by character.
But after matching several lines? That JIT version takes the lead.
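
The trade-off is simple arithmetic: a one-time setup cost against a per-line saving. Here is a rough sketch of that break-even point in Python. The function name `break_even_lines` and every number in the usage note are made up for illustration; they are not measurements of any real engine.

```python
def break_even_lines(setup_fast, per_line_fast, setup_simple, per_line_simple):
    """Return the smallest line count at which the 'fast' engine is strictly
    cheaper in total, or None if it never catches up.

    All costs are integers in the same arbitrary unit (e.g. nanoseconds).
    Total cost model (an assumption): setup + lines * per_line.
    """
    saving_per_line = per_line_simple - per_line_fast
    if saving_per_line <= 0:
        return None  # the 'fast' engine is not actually faster per line
    extra_setup = setup_fast - setup_simple
    # Smallest n with extra_setup < n * saving_per_line.
    return max(0, extra_setup // saving_per_line + 1)
```

With invented costs of 20000 units of setup and 100 per line for the compiled engine, against 0 setup and 500 per line for the simple one, `break_even_lines(20000, 100, 0, 500)` gives 51: up to 50 lines the simple interpreter is at least as cheap, from line 51 on the compiled one is ahead.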

## Performance Myths

Validating strings is actually such a small task for a computer.
When it comes to performance, for most users, **it doesn’t matter** which parser you pick.

That’s probably why everyone just uses the one bundled with their favorite programming language.

But I had a parser that could parse an entire book.
We can’t say that for everyone. Looking at you, glibc regex interpreter.
That one dies at around 10 MB of content, if I remember correctly. Something like that.
So yeah, even things like that can be a target.

## Wild Ideas

What else could be fun?
Using a parser that validates while reading from a file descriptor.

By doing that, you can parse files of unlimited size, or even live network streams.
James Bond stuff. Real-time regex over TCP. Tapping into streaming data.
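
To make the idea concrete, here is one possible sketch in Python. It is not how any of my interpreters work. It swaps in a plain state-set simulation, so the matcher only remembers which pattern positions are currently alive and can pick up exactly where it left off when the next chunk arrives. The subset (literals, `.`, and `*` on a single character), the ASCII-only assumption, and the names `compile_flat`, `StreamMatcher` and `search_fd` are all invented for illustration.

```python
import os


def compile_flat(pattern):
    """Turn a flat pattern into a list of (symbol, starred) items."""
    items = []
    for ch in pattern:
        if ch == "*":
            if not items or items[-1][1]:
                raise ValueError("nothing to repeat")
            items[-1] = (items[-1][0], True)
        else:
            items.append((ch, False))
    return items


def closure(items, states):
    """Add every position reachable by skipping starred items (zero repetitions)."""
    out = set()
    for k in states:
        out.add(k)
        while k < len(items) and items[k][1]:
            k += 1
            out.add(k)
    return out


class StreamMatcher:
    """Unanchored search that keeps its state between chunks."""

    def __init__(self, pattern):
        self.items = compile_flat(pattern)
        self.states = closure(self.items, {0})
        self.matched = len(self.items) in self.states

    def feed(self, chunk):
        """Consume a bytes chunk; return True once the pattern has been seen."""
        for byte in chunk:
            ch = chr(byte)  # assumption: ASCII pattern, one byte per character
            nxt = {0}  # a new match attempt may start after any byte
            for k in self.states:
                if k == len(self.items):
                    continue
                sym, starred = self.items[k]
                if sym == "." or sym == ch:
                    nxt.add(k if starred else k + 1)
            self.states = closure(self.items, nxt)
            if len(self.items) in self.states:
                self.matched = True
        return self.matched


def search_fd(fd, pattern, chunk_size=4096):
    """Read fd chunk by chunk until the pattern is found or the stream ends."""
    matcher = StreamMatcher(pattern)
    if matcher.matched:  # pattern can match the empty string
        return True
    while True:
        chunk = os.read(fd, chunk_size)
        if not chunk:
            return False
        if matcher.feed(chunk):
            return True
```

Memory use is bounded by the pattern length, not by the input, which is what makes unlimited files or a socket workable. The same `search_fd` call works on a pipe or a connected socket's file descriptor, since it only relies on `os.read`.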

And now we’re getting close to my next hobby: **protocol design**.
But that’s a story for another time.

---

I don’t even expect people to read this far.

---

## Code You Should Never Read

At least, not until you’re ready.

I'm talking about a basic regex interpreter in C, written in around 30 lines.
I read it in the book *Beautiful Code*, in a chapter by Brian Kernighan. The code itself was written by Rob Pike.

I'm not posting the source because it would probably destroy your creativity. It’s easy to find if you want to.

Once you’ve seen it, you can’t unsee it.

What he built? That’s the level I aim for.