Compare commits
	
		
No commits in common. "746f6da5d5f56d57da5987e844e65d7471042ec9" and "711c3b4802ce5835c36680967e5ee6afc040d11c" have entirely different histories.

746f6da5d5...711c3b4802

@@ -1,20 +0,0 @@
-name: pdf2text test
-run-name: syntax check
-on: [push]
-
-jobs:
-  Compile:
-    runs-on: ubuntu-latest
-    steps:
-      - name: Check out repository code
-        uses: actions/checkout@v4
-      - name: List files in the repository
-        run: |
-          ls ${{ gitea.workspace }}
-      - run: echo "Install dependencies."
-      - run: apt update
-      - run: apt install python3
-      - run: python3 -m pip install -r requirements.txt
-      - run: "Check if starts correcly. Syntax check."
-      - run: ./pdf2text .
-      - run: echo "This job's status is ${{ job.status }}."
@@ -3,10 +3,10 @@
 I've converted 8gb of PDF's to text in one afternoon on a decade old x270 using this script. Performant enough imho. Try to get 8Gb in your LLM and getting it to actually use it. That's the challenge.
 
 ## Convert all PDF's to text
-This is an [script](/pdf2text) for converting a batch of PDF's to text for machine learning.
+This is an script for converting a batch of PDF's to text for machine learning.
 It only has two dependencies:
- - `python3`
- - `pdf.miner` (python requirement, specified in [requirements.txt](/requirements.txt) file)
+ - python3
+ - pdf.miner (python requirement, specified in requirements.txt file)
 
 ## Installation
 ```bash
@@ -22,6 +22,3 @@ source .venv/bin/activate
 ./pdf2text [source/destination dir]
 ```
 You read that correctly, the source directory is also the destination directory.
-
-## Todo:
-Make decent python package so it's installable on system without having to load environment first. Not sure if worth it, it's not something you daily use.
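
The README text in the diff above says the source directory is also the destination directory. The `pdf2text` script itself is not part of this diff, so the following is only a rough sketch of what such an in-place batch conversion might look like. It assumes the `pdf.miner` dependency refers to pdfminer.six and its `pdfminer.high_level.extract_text` helper; the names `convert_dir` and `pdfminer_extract`, the recursive search, and the skip-if-exists behaviour are all hypothetical.

```python
from pathlib import Path
from typing import Callable, List


def pdfminer_extract(pdf_path: str) -> str:
    # Assumption: pdfminer.six is installed. Imported lazily so the rest of
    # the sketch can be exercised with a stand-in extractor.
    from pdfminer.high_level import extract_text
    return extract_text(pdf_path)


def convert_dir(root: str,
                extract: Callable[[str], str] = pdfminer_extract) -> List[Path]:
    """Write <name>.txt next to every <name>.pdf under root.

    Returns the list of text files written during this call.
    """
    written = []
    # Assumption: subdirectories are searched recursively.
    for pdf in sorted(Path(root).rglob("*.pdf")):
        txt = pdf.with_suffix(".txt")
        if txt.exists():
            # Hypothetical choice: leave already converted files untouched.
            continue
        txt.write_text(extract(str(pdf)), encoding="utf-8")
        written.append(txt)
    return written
```

Calling `convert_dir(".")` would mirror the `./pdf2text .` invocation from the workflow. Passing the extractor as a parameter keeps the directory walk testable without real PDF files.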