Provided links
parent 711c3b4802
commit 994d5495b2
@ -3,10 +3,10 @@
I've converted 8 GB of PDFs to text in one afternoon on a decade-old x270 using this script. Performant enough, imho. Try getting 8 GB into your LLM and getting it to actually use it. That's the challenge.
## Convert all PDFs to text
This is a script for converting a batch of PDFs to text for machine learning.
This is a [script](/pdf2text) for converting a batch of PDFs to text for machine learning.
It only has two dependencies:
- python3
- pdf.miner (Python requirement, specified in the requirements.txt file)
- `python3`
- `pdf.miner` (Python requirement, specified in the [requirements.txt](/requirements.txt) file)
## Installation
```bash
@ -22,3 +22,6 @@ source .venv/bin/activate
./pdf2text [source/destination dir]
```
You read that correctly: the source directory is also the destination directory.
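
For illustration only, here is a rough sketch of what "source is also destination" could look like in Python. It is not the actual `pdf2text` script. It assumes the `pdf.miner` dependency refers to pdfminer.six and its `extract_text` helper, and the function name `convert_dir`, the recursive search, and the skip-if-already-converted behaviour are all hypothetical choices made for this example.

```python
from pathlib import Path
from typing import Callable, List, Optional


def convert_dir(directory: str,
                extract: Optional[Callable[[str], str]] = None) -> List[Path]:
    """Write a .txt file next to every PDF found under `directory`.

    `extract` maps a PDF path to its text. It is a parameter so the
    extractor can be swapped out; by default pdfminer.six is used.
    """
    if extract is None:
        # Assumption: pdfminer.six is installed; the real script may differ.
        from pdfminer.high_level import extract_text as extract

    written = []
    # Assumption: subdirectories are searched too.
    for pdf in sorted(Path(directory).rglob("*.pdf")):
        txt = pdf.with_suffix(".txt")  # same folder, same base name
        if txt.exists():
            continue  # hypothetical: skip PDFs converted on an earlier run
        txt.write_text(extract(str(pdf)), encoding="utf-8")
        written.append(txt)
    return written
```

Calling `convert_dir("path/to/pdfs")` would leave each `name.pdf` in place and put a `name.txt` beside it, which is the in-place behaviour described above.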
## Todo:
Make a decent Python package so it's installable system-wide without having to load the environment first. Not sure if it's worth it; it's not something you use daily.