 # loreg
 retoor <retoor@molodetz.nl>
 A high-performance regular expression interpreter implemented from scratch in plain C. The engine uses Thompson's NFA construction algorithm for efficient pattern matching.
 ## CI
 The project includes Gitea Actions CI that runs on every push and pull request:
 - Build verification (release and debug)
 - Full test suite (569 tests)
 - Valgrind memory leak detection
 - Code coverage generation
 ## Features
 - Full regex syntax support: literals, metacharacters, quantifiers, character classes, groups, alternation, anchors
 - NFA-based matching engine with Thompson construction
 - Capturing groups with match position tracking
 - Interactive REPL for testing patterns
 - Zero external dependencies
 - Comprehensive test suite with 569 tests
 - Memory-safe implementation verified with Valgrind
 ## Building
 ```sh
 make            # optimized release build
 make debug      # debug build with symbols
 make test       # run all tests
 make coverage   # generate coverage report
 make profile    # generate profiling report
 make valgrind   # run under valgrind
 ```
 ## Usage
 ### Command Line
 ```sh
 ./loreg "pattern" "text"           # search for pattern in text
 ./loreg -m "pattern" "text"        # full match mode
 ./loreg -i                         # start REPL
 ./loreg                            # start REPL (default)
 ```
 ### REPL Commands
 ```
 :p <pattern>  compile and set pattern
 :m <text>     match text (anchored)
 :s <text>     search for pattern in text
 <text>        search (default)
 :h            help
 :q            quit
 ```
 ### C API
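 ```c
 #include <stdio.h>
 #include "loreg.h"

 int main(void) {
     loreg_error_t err;
     /* Compile the pattern once; on failure, err holds the reason. */
     loreg_regex_t *re = loreg_compile("\\d{3}-\\d{4}", &err);
     if (!re) {
         fprintf(stderr, "error: %s\n", loreg_error_string(err));
         return 1;
     }

     loreg_match_t result;
     /* Unanchored search: look for the pattern anywhere in the text. */
     if (loreg_search(re, "call 555-1234 now", &result)) {
         printf("match at [%zu-%zu]\n", result.match_start, result.match_end);
     }

     loreg_free(re);
     return 0;
 }
 ```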
 ## Supported Syntax
 | Pattern | Description |
 |---------|-------------|
 | `.` | any character except newline |
 | `*` | zero or more |
 | `+` | one or more |
 | `?` | zero or one |
 | `\|` | alternation |
 | `()` | grouping and capture |
 | `[]` | character class |
 | `[^]` | negated character class |
 | `[a-z]` | character range |
 | `^` | start anchor |
 | `$` | end anchor |
 | `{n}` | exactly n |
 | `{n,}` | n or more |
 | `{n,m}` | n to m |
 | `\d` | digit [0-9] |
 | `\w` | word [a-zA-Z0-9_] |
 | `\s` | whitespace |
 | `\D` | non-digit |
 | `\W` | non-word |
 | `\S` | non-whitespace |
 | `*?` `+?` `??` | non-greedy quantifiers |
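
The shorthand classes in the table can be read as simple per-character predicates. The sketch below is one way to express them in C. The function names are hypothetical and not taken from loreg's source, and the whitespace set used for `\s` (space, tab, newline, carriage return, form feed, vertical tab) is an assumption, since the table does not spell it out.

```c
#include <stdbool.h>

/* Hypothetical helpers for illustration; loreg's internals may differ. */

/* \d: digit [0-9] */
bool is_digit_class(unsigned char c) {
    return c >= '0' && c <= '9';
}

/* \w: word [a-zA-Z0-9_] */
bool is_word_class(unsigned char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           is_digit_class(c) || c == '_';
}

/* \s: whitespace (assumed set, see the note above) */
bool is_space_class(unsigned char c) {
    return c == ' ' || c == '\t' || c == '\n' ||
           c == '\r' || c == '\f' || c == '\v';
}

/* Dispatch on the letter after the backslash; the uppercase forms
 * negate the lowercase ones. Unknown letters never match. */
bool class_matches(char escape, unsigned char c) {
    switch (escape) {
    case 'd': return is_digit_class(c);
    case 'D': return !is_digit_class(c);
    case 'w': return is_word_class(c);
    case 'W': return !is_word_class(c);
    case 's': return is_space_class(c);
    case 'S': return !is_space_class(c);
    default:  return false;
    }
}
```

For example, `class_matches('w', '_')` is true and `class_matches('D', '7')` is false.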
 ## Architecture
 ```
 src/
 ├── lexer.c     tokenizer for regex patterns
 ├── parser.c    recursive descent parser producing AST
 ├── ast.c       abstract syntax tree node types
 ├── nfa.c       Thompson NFA construction
 ├── matcher.c   NFA simulation with epsilon closure
 ├── loreg.c     public API
 ├── repl.c      interactive REPL
 └── main.c      CLI entry point
 include/
 ├── loreg.h     public header
 ├── lexer.h     lexer interface
 ├── parser.h    parser interface
 ├── ast.h       AST types
 ├── nfa.h       NFA types
 ├── matcher.h   matcher interface
 └── repl.h      REPL interface
 tests/
 ├── test_lexer.c       lexer unit tests (10 tests)
 ├── test_parser.c      parser unit tests (20 tests)
 ├── test_nfa.c         NFA construction tests (14 tests)
 ├── test_matcher.c     matching tests (27 tests)
 ├── test_all.c         comprehensive tests (9 tests)
 └── test_integration.c integration tests (489 tests)
 ```
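
To make the first stage of this layout more concrete, here is a minimal tokenizer sketch. It is not the code in `lexer.c`: the token names, the `token_t` layout, and the `tokenize` signature are all assumptions made for illustration, and bracket expressions and `{n,m}` quantifiers are left out to keep it short.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds; loreg's lexer.h may define these differently. */
typedef enum {
    TOK_CHAR, TOK_DOT, TOK_STAR, TOK_PLUS, TOK_QUESTION, TOK_PIPE,
    TOK_LPAREN, TOK_RPAREN, TOK_CARET, TOK_DOLLAR, TOK_CLASS, TOK_END
} token_kind_t;

typedef struct {
    token_kind_t kind;
    char value;   /* literal character, or the class letter for TOK_CLASS */
} token_t;

/* Map an unescaped character to its token kind. */
static token_kind_t classify(char c) {
    switch (c) {
    case '.': return TOK_DOT;
    case '*': return TOK_STAR;
    case '+': return TOK_PLUS;
    case '?': return TOK_QUESTION;
    case '|': return TOK_PIPE;
    case '(': return TOK_LPAREN;
    case ')': return TOK_RPAREN;
    case '^': return TOK_CARET;
    case '$': return TOK_DOLLAR;
    default:  return TOK_CHAR;
    }
}

/* Write tokens for pattern into out (capacity cap), ending with TOK_END.
 * Returns the number of tokens written, or 0 if the buffer is too small
 * or the pattern ends with a dangling backslash. */
size_t tokenize(const char *pattern, token_t *out, size_t cap) {
    size_t n = 0;
    for (const char *p = pattern; *p != '\0'; p++) {
        if (n + 1 >= cap) return 0;        /* keep room for TOK_END */
        token_t tok;
        if (*p == '\\') {
            p++;
            if (*p == '\0') return 0;      /* dangling backslash */
            /* \d \w \s \D \W \S are classes; any other escape is a literal. */
            tok.kind = strchr("dwsDWS", *p) ? TOK_CLASS : TOK_CHAR;
        } else {
            tok.kind = classify(*p);
        }
        tok.value = *p;
        out[n++] = tok;
    }
    if (n >= cap) return 0;
    out[n].kind = TOK_END;
    out[n].value = '\0';
    return n + 1;
}
```

Separating escapes from operators at this stage means a parser only has to look at token kinds, never at raw backslashes.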
 ## Test Suite
 The test suite contains 569 tests covering:
 | Category | Description |
 |----------|-------------|
 | Lexer | Tokenization of patterns |
 | Parser | AST construction and error handling |
 | NFA | State machine construction |
 | Matcher | Pattern matching correctness |
 | Integration | Real-world regex patterns |
 Integration tests cover:
 - Literal matching and concatenation
 - Dot metacharacter and wildcards
 - Start/end anchors
 - All quantifiers (*, +, ?, {n,m})
 - Alternation and grouping
 - Character classes and ranges
 - Negated character classes
 - Escape sequences
 - Email, IP, URL, phone patterns
 - Greedy vs non-greedy matching
 - Nested groups and complex nesting
 - Edge cases and boundary conditions
 - Pathological/stress patterns
 Run the tests, then check for memory leaks under Valgrind:
 ```sh
 make test           # run all 569 tests
 make valgrind       # verify zero memory leaks
 ```
 ## Algorithm
 The implementation uses Thompson's construction to convert regex patterns to NFAs:
 1. **Lexer**: Tokenizes the pattern into a stream of tokens
 2. **Parser**: Builds an AST using recursive descent parsing
 3. **NFA Construction**: Converts the AST to an NFA using Thompson's algorithm
 4. **Matching**: Simulates the NFA with epsilon closure, so matching time is linear in the text length for a fixed pattern

 Time complexity: O(n*m), where n is the pattern length and m is the text length.
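
The matching step can be sketched as a state-set simulation. The example below is a simplified illustration, not loreg's `matcher.c`: the `state_t` layout, the function names, and the hand-built NFA for `a(b|c)*d` are all assumptions, and capture tracking is omitted. Each input character is checked against at most every NFA state once, which is where the O(n*m) bound comes from.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state layout; loreg's nfa.h may differ. */
enum { ST_CHAR, ST_SPLIT, ST_MATCH };

typedef struct {
    int kind;   /* ST_CHAR, ST_SPLIT or ST_MATCH */
    char ch;    /* character consumed by an ST_CHAR state */
    int out;    /* next state (ST_CHAR) or first epsilon target (ST_SPLIT) */
    int out1;   /* second epsilon target (ST_SPLIT only) */
} state_t;

#define MAX_STATES 16

/* Hand-built NFA for a(b|c)*d, as Thompson's construction might lay it out. */
static const state_t demo_nfa[] = {
    { ST_CHAR,  'a', 1, -1 },   /* 0: 'a'                        */
    { ST_SPLIT, 0,   2,  5 },   /* 1: loop: enter body or leave  */
    { ST_SPLIT, 0,   3,  4 },   /* 2: alternation b | c          */
    { ST_CHAR,  'b', 1, -1 },   /* 3: 'b', back to the loop      */
    { ST_CHAR,  'c', 1, -1 },   /* 4: 'c', back to the loop      */
    { ST_CHAR,  'd', 6, -1 },   /* 5: 'd'                        */
    { ST_MATCH, 0,  -1, -1 },   /* 6: accept                     */
};

typedef struct {
    int items[MAX_STATES];      /* non-epsilon states currently active */
    size_t count;
    bool seen[MAX_STATES];      /* guards against adding a state twice */
} state_set_t;

static void set_clear(state_set_t *set) {
    set->count = 0;
    for (size_t i = 0; i < MAX_STATES; i++) set->seen[i] = false;
}

/* Add state s plus everything reachable from it through epsilon edges. */
static void add_closure(const state_t *nfa, state_set_t *set, int s) {
    if (s < 0 || set->seen[s]) return;
    set->seen[s] = true;
    if (nfa[s].kind == ST_SPLIT) {
        add_closure(nfa, set, nfa[s].out);
        add_closure(nfa, set, nfa[s].out1);
        return;
    }
    set->items[set->count++] = s;
}

/* Anchored match: true if the NFA accepts the whole text. */
bool nfa_full_match(const state_t *nfa, int start, const char *text) {
    state_set_t a, b;
    state_set_t *cur = &a, *next = &b;
    set_clear(cur);
    add_closure(nfa, cur, start);
    for (const char *p = text; *p != '\0'; p++) {
        set_clear(next);
        /* Advance every active state that can consume this character. */
        for (size_t i = 0; i < cur->count; i++) {
            const state_t *st = &nfa[cur->items[i]];
            if (st->kind == ST_CHAR && st->ch == *p)
                add_closure(nfa, next, st->out);
        }
        state_set_t *tmp = cur;
        cur = next;
        next = tmp;
    }
    for (size_t i = 0; i < cur->count; i++)
        if (nfa[cur->items[i]].kind == ST_MATCH) return true;
    return false;
}

/* Convenience wrapper around the demo NFA. */
bool demo_match(const char *text) {
    return nfa_full_match(demo_nfa, 0, text);
}
```

With this NFA, `demo_match("abcbcd")` and `demo_match("ad")` succeed, while `demo_match("ab")` fails because the input ends before the accepting state is reached. Because all active states advance together, there is no backtracking, so nested quantifiers cannot cause exponential blowup in this scheme.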
 ## License
 MIT