Monday, March 25, 2024

Installing Tabula on Debian/Ubuntu

Mahesh Palamuttath March 25, 2024

Are you tired of manually extracting data from PDFs? Tabula might be the solution you need. Tabula is an open-source tool for extracting tables from PDF documents, and it runs as a small web application that you use from your browser. In this guide, we'll walk you through setting up a Tabula server on Debian or Ubuntu.

Step 1: Download Tabula

First things first, you'll need to download the Tabula release ZIP, which contains the Tabula JAR file. Navigate to your preferred directory (for example, /opt) and use wget to download it. Because /opt is owned by root on Debian/Ubuntu, the download needs sudo:

cd /opt && sudo wget https://github.com/tabulapdf/tabula/releases/download/v1.2.1/tabula-jar-1.2.1.zip
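If you want to script this step or pin a different release, the download can be wrapped in a small function. This is a sketch, not part of Tabula: the function names are hypothetical, and it assumes other releases follow the same `v<version>/tabula-jar-<version>.zip` naming as the 1.2.1 URL above, so check the releases page before relying on that.

```shell
# Sketch: build the release URL from a version string and download the ZIP.
# ASSUMPTION: releases other than 1.2.1 use the same URL pattern as this guide.

tabula_zip_url() {
  local version="$1"
  echo "https://github.com/tabulapdf/tabula/releases/download/v${version}/tabula-jar-${version}.zip"
}

download_tabula() {
  local version="${1:-1.2.1}"   # default to the release used in this guide
  local dest_dir="${2:-/opt}"
  local url
  url="$(tabula_zip_url "$version")"
  # /opt is root-owned, so writing the file there needs sudo.
  sudo wget -O "${dest_dir}/tabula-jar-${version}.zip" "$url"
}
```

Usage: `download_tabula 1.2.1 /opt` fetches the same file as the command above.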

Step 2: Install Unzip

Once the download is complete, you'll need to install unzip to extract the contents of the ZIP file. You can do this using apt:

sudo apt install unzip

Step 3: Unzip Tabula

Now, from /opt, unzip the downloaded Tabula ZIP file. It extracts into a tabula directory containing tabula.jar:

sudo unzip tabula-jar-1.2.1.zip
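Before moving on, it can help to confirm that the archive extracted where the later commands expect it (tabula/tabula.jar under /opt) and that a `java` binary is on your PATH, since starting the server depends on both. A minimal sketch; `check_tabula_install` is a hypothetical helper name.

```shell
# Sketch: verify the extracted layout and that a Java runtime is available.
check_tabula_install() {
  local base_dir="${1:-/opt}"
  local jar="${base_dir}/tabula/tabula.jar"
  if [ ! -f "$jar" ]; then
    echo "Tabula JAR not found at ${jar}" >&2
    return 1
  fi
  # command -v succeeds only if 'java' resolves on the current PATH.
  if ! command -v java >/dev/null 2>&1; then
    echo "java is not on PATH; install a Java runtime first" >&2
    return 1
  fi
  echo "ok: ${jar}"
}
```

Run `check_tabula_install /opt`; it prints the JAR path on success and an error on stderr otherwise.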

Step 4: Configure Firewall

To reach the Tabula server from other machines, you'll need to open its port (8080 by default) in your firewall. For instance, if you're using ufw, you can open port 8080 like so:

sudo ufw allow 8080/tcp
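The rule above lets anyone who can reach the server connect to the Tabula web interface. If only certain machines need access, ufw can limit the rule to a source address range. The sketch below wraps both forms; `open_tabula_port` is a hypothetical name, 192.168.1.0/24 is a placeholder subnet, and the `UFW` override exists only so the command can be dry-run with `echo`.

```shell
# Sketch: open the Tabula port with ufw, optionally limited to one source range.
# Set UFW=echo to print the ufw arguments instead of applying them.
open_tabula_port() {
  local port="${1:-8080}"
  local source_cidr="${2:-}"       # e.g. 192.168.1.0/24 (placeholder)
  local ufw="${UFW:-sudo ufw}"

  # Reject anything that is not a valid TCP port number.
  case "$port" in
    ''|*[!0-9]*) echo "invalid port: $port" >&2; return 2 ;;
  esac
  if [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
    echo "port out of range: $port" >&2
    return 2
  fi

  # $ufw is deliberately unquoted so "sudo ufw" splits into two words.
  if [ -n "$source_cidr" ]; then
    # Only hosts inside source_cidr may connect.
    $ufw allow from "$source_cidr" to any port "$port" proto tcp
  else
    # Same rule as above: allow the port from anywhere.
    $ufw allow "${port}/tcp"
  fi
}
```

For example, `open_tabula_port 8080 192.168.1.0/24` runs `sudo ufw allow from 192.168.1.0/24 to any port 8080 proto tcp`.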

Step 5: Run Tabula Server

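Tabula runs on Java, so you need a Java runtime. If `java -version` reports that Java is missing, install one first (for example, `sudo apt install default-jre`). Then, from /opt, start the Tabula server: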

java -Dfile.encoding=utf-8 -Xms256M -Xmx1024M -jar tabula/tabula.jar
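The command above uses the relative path tabula/tabula.jar, so it only works when your current directory is /opt (or wherever you unzipped the archive). A small wrapper with an absolute path avoids that while keeping the same JVM options. This is a sketch: `run_tabula`, `TABULA_HOME`, and the `JAVA` override (which lets you print the command instead of running it) are hypothetical names.

```shell
# Sketch: start Tabula in the foreground with this guide's JVM options,
# using an absolute JAR path so the current directory does not matter.
run_tabula() {
  local home="${TABULA_HOME:-/opt/tabula}"
  local java_cmd="${JAVA:-java}"
  # -Dfile.encoding=utf-8  use UTF-8 as the default character encoding
  # -Xms256M / -Xmx1024M   initial and maximum Java heap size
  "$java_cmd" -Dfile.encoding=utf-8 -Xms256M -Xmx1024M -jar "${home}/tabula.jar"
}
```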

If you want the server to keep running in the background after you close your terminal, you can use nohup:

nohup java -Dfile.encoding=utf-8 -Xms256M -Xmx1024M -jar tabula/tabula.jar &
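By default, nohup appends the server's output to nohup.out in the current directory (falling back to $HOME/nohup.out if that directory isn't writable), and nothing records the process ID for stopping it later. A sketch that logs to a file you choose and writes a PID file; the function names, the `JAVA`/`TABULA_HOME` overrides, and the log and PID locations under $HOME are all assumptions.

```shell
# Sketch: start Tabula in the background, log to a chosen file, record the PID.
TABULA_LOG="${TABULA_LOG:-$HOME/tabula.log}"   # assumed log location
TABULA_PID="${TABULA_PID:-$HOME/tabula.pid}"   # assumed PID-file location

start_tabula_bg() {
  local home="${TABULA_HOME:-/opt/tabula}"
  local java_cmd="${JAVA:-java}"
  # Redirecting stdout/stderr means nohup does not create nohup.out.
  nohup "$java_cmd" -Dfile.encoding=utf-8 -Xms256M -Xmx1024M \
    -jar "${home}/tabula.jar" >>"$TABULA_LOG" 2>&1 &
  echo $! >"$TABULA_PID"     # $! is the PID of the job just started
}

stop_tabula_bg() {
  [ -f "$TABULA_PID" ] || { echo "no PID file at $TABULA_PID" >&2; return 1; }
  kill "$(cat "$TABULA_PID")" && rm -f "$TABULA_PID"
}
```

Call `start_tabula_bg` to launch the server and `stop_tabula_bg` to stop it; the log file collects everything Tabula prints.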

And that's it! Your Tabula server is now up and running and ready to extract tables from PDF documents.

Access Tabula: open http://127.0.0.1:8080 in a browser on the server itself, or http://your-server-ip:8080 from another machine.
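On a headless server there may be no browser to check with, so you can poll the web interface with curl instead. Tabula can take some seconds to start, which is why this sketch retries rather than checking once. It assumes curl is installed (`sudo apt install curl` if not); `wait_for_tabula` is a hypothetical name, and the URL and number of attempts are parameters.

```shell
# Sketch: poll the Tabula web UI until it answers, trying up to N times,
# about one second apart.
wait_for_tabula() {
  local url="${1:-http://127.0.0.1:8080/}"
  local attempts="${2:-60}"
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f: fail on HTTP errors; -s: silent; -o /dev/null: discard the page body
    if curl -fs -o /dev/null --max-time 2 "$url"; then
      echo "Tabula is responding at $url"
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "no response from $url after $attempts attempts" >&2
  return 1
}
```

For example, run `wait_for_tabula http://127.0.0.1:8080/ 30` right after starting the server in the background.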

Reference: https://tabula.technology/