Added workflow

Provided links
2024-11-22 20:45:58 +01:00 · 2024-11-22 20:41:31 +01:00
2 changed files with 26 additions and 3 deletions
--- a/.gitea/workflows/test.yaml
+++ b/.gitea/workflows/test.yaml
@@ -0,0 +1,20 @@
+name: pdf2text test
+run-name: syntax check
+on: [push]
+
+jobs:
+  Compile:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Check out repository code
+        uses: actions/checkout@v4
+      - name: List files in the repository
+        run: |
+          ls ${{ gitea.workspace }}
+      - run: echo "Install dependencies."
+      - run: apt update
+      - run: apt install -y python3 python3-pip
+      - run: python3 -m pip install -r requirements.txt
+      - run: echo "Check if it starts correctly. Syntax check."
+      - run: ./pdf2text .
+      - run: echo "This job's status is ${{ job.status }}."
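
The last steps of the workflow above run `./pdf2text .` as a start-up check, which also executes the script against the repository directory. If only a syntax check is wanted, something like the following could be used instead. This is a minimal sketch, assuming `pdf2text` is a Python script; the helper name `syntax_ok` is hypothetical and not part of the repository.

```python
import sys
from pathlib import Path


def syntax_ok(path):
    """Return True if the file at `path` parses as Python, without running it."""
    source = Path(path).read_text(encoding="utf-8")
    try:
        # compile() only parses and byte-compiles the source; nothing is executed.
        compile(source, str(path), "exec")
    except SyntaxError as exc:
        print(f"{path}:{exc.lineno}: {exc.msg}", file=sys.stderr)
        return False
    return True
```

A workflow step could call `syntax_ok("pdf2text")` and exit non-zero when it returns `False`. The standard library's `python3 -m py_compile pdf2text` does roughly the same from the command line.
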
--- a/README.md
+++ b/README.md
@@ -3,10 +3,10 @@
 I've converted 8 GB of PDF's to text in one afternoon on a decade-old x270 using this script. Performant enough imho. Try getting 8 GB into your LLM and getting it to actually use it. That's the challenge.

 ## Convert all PDF's to text
-This is an script for converting a batch of PDF's to text for machine learning.
+This is a [script](/pdf2text) for converting a batch of PDF's to text for machine learning.
 It only has two dependencies:
- - python3
- - pdf.miner (python requirement, specified in requirements.txt file) 
+ - `python3` 
+ - `pdf.miner` (python requirement, specified in [requirements.txt](/requirements.txt) file) 

 ## Installation
 ```bash
@@ -22,3 +22,6 @@ source .venv/bin/activate
 ./pdf2text [source/destination dir]
 ```
 You read that correctly, the source directory is also the destination directory.
+
+## Todo:
+Make a decent Python package so it's installable system-wide without having to activate the virtual environment first. Not sure if it's worth it, it's not something you use daily.
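
The README states that the source directory is also the destination directory. A rough sketch of that in-place behaviour is shown below; it is not the actual `pdf2text` implementation. Assumptions: the `pdf.miner` dependency is taken to be the `pdfminer.six` package and its `extract_text` function, the walk is recursive, and files that already have a `.txt` next to them are skipped. The name `convert_tree` is hypothetical.

```python
from pathlib import Path


def convert_tree(root, extract=None):
    """Write a .txt file next to every .pdf under `root`; return the new paths."""
    if extract is None:
        # Assumption: pdfminer.six is installed and provides extract_text.
        from pdfminer.high_level import extract_text as extract

    written = []
    for pdf in sorted(Path(root).rglob("*.pdf")):
        target = pdf.with_suffix(".txt")
        if target.exists():
            continue  # assumed behaviour: do not redo finished conversions
        target.write_text(extract(str(pdf)), encoding="utf-8")
        written.append(target)
    return written
```

Passing the extractor as a parameter keeps the directory walk testable without real PDFs, for example `convert_tree("docs", extract=lambda p: "dummy text")`.
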