Olives

Excel training program generator

If you're like me, most of your favorite training programs come in the form of PDFs. This makes them annoying to work with since you want to keep a record of your progress which is usually not possible within the PDF itself.

To solve this problem, I have written a little python script which utilises OpenAIs 20b parameter model, through Ollama, to convert a PDF training progam into a ready-to-use excel file.

The actual setup in order to run this program is a little bit tedious since you need Ollama to run locally (or you need to edit the code to connect to a cloud API) and you need to set up a virtual python enviornment for security and to keep dependencies from messing stuff up.

You can download the program itself from my GitHub link here , I'll explain how to do so in the section below.

Preparation

  1. You need to have python installed. Personally, I use version 3.14.3 but other version might work too.
  2. You need to have Ollama installed and running. We will need to access Ollama through its API in order to basically clean up the data (that is, the raw text from the PDF). You can read more about the Ollama API here.

    In general, I don't really like to use AI very much since I find it makes me very intelectually lazy but I'm not a coder so I really don't know any other way to process the data. Furthermore, Ollama runs locally on a lot of mid-range hardware so it's the best option for now.

    The model that we will be using is the gpt-oss:20b. In order to download this model, run the following command in your terminal:

    ollama pull gpt-oss:20b

    I've tried using other models to make the script run faster but I found that the quality suffered way too much.

    To start ollama, use

    ollama serve

  3. You need to download the program itself. You can either go to the GitHub page and manually download it or you can just use git. For git, navigate to your desired directory in your terminal and run

    git clone https://github.com/OliverU201/PDF2Excel_Training_Program

  4. Lastly, we need to set up the virtual environment (venv). For this, we open a terminal inside PDF2Excel_Training_Program. We then run the following command:

    python -m venv venv

    How you activate the venv depends on your OS so if you're not on Linux, just look up how to activate venv on your OS. Since I'm on Fedora, I use

    source venv/bin/activate

    To install the required dependencies, we use pip:

    pip install pdfplumber ollama openpyxl

Usage

Now comes the fun part. Place your PDF in the input folder inside PDF2Excel_Training_Program. Now, in the terminal where you activated the venv, run

python main.py

Enter the page in the PDF where the program actually starts (skipping any intro and such). Then enter the page where it finishes. Lastly, the program needs some word (or words) that separate the workouts.

This part can get tricky if you're unlucky. The reason that we can't just feed the AI the whole, non-separated text is because it will start producing incorrectly formatted workouts. Therefore, we need to find some way to separate all the individual workouts.

In my program, Every workout is titles something along the lines of WEEK 1 PULL so I can just split the PDF using "WEEK" (which is perfect since you wouldn't write week in all caps as part of any normal sentence). In another program I tried, workouts were titles MONDAY PULL, TUESDAY PUSH , etc. I therefore created another option to split workouts by weekdays. If neither of these options fits your PDF, I have the option of a custom phrase/phrases but this option has been pretty unreliable in my experience and it's certainly something I could improve in the future.

Time to run

Since the program relies on quite a large LLM, it will take some time for it to complete. Ideally, you would have a dedicated GPU with at least 16GB of VRAM. I'm fortunate enough to have snagged a 5070ti around MSRP, before all the prices skyrocketed. Using that, the program processes a workout in under 10 seconds meaning that it can produce the excel in its entirety in just a few minutes.