diff --git a/README.md b/README.md
index b3e81ce..475eedd 100644
--- a/README.md
+++ b/README.md
@@ -3,10 +3,10 @@
 I've converted 8gb of PDF's to text in one afternoon on a decade old x270 using this script. Performant enough imho. Try to get 8Gb in your LLM and getting it to actually use it. That's the challenge.
 
 ## Convert all PDF's to text
-This is an script for converting a batch of PDF's to text for machine learning.
+This is an [script](/pdf2text) for converting a batch of PDF's to text for machine learning.
 It only has two dependencies:
- - python3
- - pdf.miner (python requirement, specified in requirements.txt file)
+ - `python3`
+ - `pdf.miner` (python requirement, specified in [requirements.txt](/requirements.txt) file)
 
 ## Installation
 ```bash
@@ -22,3 +22,6 @@ source .venv/bin/activate
 ./pdf2text [source/destination dir]
 ```
 You read that correctly, the source directory is also the destination directory.
+
+## Todo:
+Make decent python package so it's installable on system without having to load environment first. Not sure if worth it, it's not something you daily use.