Are you tired of copying tables out of PDFs by hand? Tabula might be the tool you need. Tabula is an open-source tool that extracts tables from PDF documents, as long as the PDF contains real text rather than scanned images. In this guide, we'll walk through setting up Tabula as a server on an apt-based Linux system (such as Ubuntu) so you can use it from a browser.
Step 1: Download Tabula
First things first, you'll need to download Tabula. The release is a ZIP archive that contains the Tabula JAR file. Change to your preferred install directory (for example, /opt) and download the archive with wget. Since /opt is normally writable only by root, run wget with sudo:
cd /opt && sudo wget https://github.com/tabulapdf/tabula/releases/download/v1.2.1/tabula-jar-1.2.1.zip
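If you expect to upgrade later, it can help to keep the version number in one place instead of editing the URL by hand. Here is a minimal sketch. The function names tabula_zip_url and download_tabula are hypothetical, and it assumes that other releases follow the same naming pattern as v1.2.1, which you should confirm on the releases page.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: tabula_zip_url and download_tabula are illustrative names.

# Print the release URL for a Tabula version.
# Assumption: other releases use the same naming pattern as v1.2.1.
tabula_zip_url() {
  local version="$1"
  printf 'https://github.com/tabulapdf/tabula/releases/download/v%s/tabula-jar-%s.zip\n' \
    "$version" "$version"
}

# Download the ZIP for a version (default 1.2.1) into a directory (default /opt).
download_tabula() {
  local version="${1:-1.2.1}"
  local dest="${2:-/opt}"
  # sudo because /opt is usually root-owned; -O sets the output file name
  sudo wget -O "$dest/tabula-jar-$version.zip" "$(tabula_zip_url "$version")"
}
```

With the functions defined, download_tabula 1.2.1 /opt fetches the same archive as the command above.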
Step 2: Install Unzip
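Once the download is complete, install unzip so you can extract the ZIP archive. Tabula itself also needs a Java runtime, so check that java -version prints a version number; if it doesn't, install a JRE package (for example, OpenJDK) before Step 5. You can install unzip with apt: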
sudo apt install unzip
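Since the later steps depend on unzip and java both being on the PATH, you could check for them up front. This is a minimal sketch; missing_cmds is a hypothetical helper name, and it only tests whether each command can be found, not which Java version is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each command name that is not on PATH.
# Returns 0 if everything was found, 1 if anything is missing.
missing_cmds() {
  local cmd status=0
  for cmd in "$@"; do
    # command -v succeeds only if the shell can locate the command
    if ! command -v "$cmd" >/dev/null 2>&1; then
      printf '%s\n' "$cmd"
      status=1
    fi
  done
  return "$status"
}
```

For this guide, missing_cmds wget unzip java prints nothing when all three tools are available.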
Step 3: Unzip Tabula
Now, still in /opt, extract the downloaded archive. This creates a tabula directory containing tabula.jar:
sudo unzip tabula-jar-1.2.1.zip
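Because the run command in Step 5 expects the jar at tabula/tabula.jar relative to the install directory, a quick check after extraction can catch a wrong directory early. Here is a minimal sketch; tabula_jar_path is a hypothetical helper name, and the default of /opt matches the example directory used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of tabula.jar under an install
# directory (default /opt), or report an error if it is not there.
tabula_jar_path() {
  local base="${1:-/opt}"
  local jar="$base/tabula/tabula.jar"
  if [ -f "$jar" ]; then
    printf '%s\n' "$jar"
  else
    printf 'tabula.jar not found under %s/tabula\n' "$base" >&2
    return 1
  fi
}
```

If the check fails, unzip -l tabula-jar-1.2.1.zip lists what the archive contains without extracting it.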
Step 4: Configure Firewall
If you want to reach Tabula from other machines, you'll need to open its port (8080 by default) in your firewall. You can skip this step if you'll only use Tabula from the server itself. With ufw, for example, allow TCP traffic on port 8080 like so:
sudo ufw allow 8080/tcp
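The rule above opens port 8080 to every address. As far as I know, Tabula's web interface has no built-in login, so if only one network needs access, a narrower ufw rule limits who can reach it. This is a sketch: allow_tabula_from and the DRY_RUN variable are hypothetical names, and 203.0.113.0/24 is a documentation-only address range standing in for your own network.

```shell
#!/usr/bin/env bash
# Hypothetical helper: allow TCP access to the Tabula port (default 8080)
# only from one source network. With DRY_RUN=1 it prints the ufw command
# instead of running it.
allow_tabula_from() {
  local source_cidr="${1:-}" port="${2:-8080}"
  if [ -z "$source_cidr" ]; then
    echo "usage: allow_tabula_from CIDR [PORT]" >&2
    return 2
  fi
  # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is set and non-empty
  ${DRY_RUN:+echo} sudo ufw allow from "$source_cidr" to any port "$port" proto tcp
}
```

For example, DRY_RUN=1 allow_tabula_from 203.0.113.0/24 prints the rule so you can review it before running it for real. Afterwards, sudo ufw status shows whether ufw is active and which rules are in place.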
Step 5: Run Tabula Server
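Now you're all set to run the Tabula server. Run the following command from /opt so that the relative path tabula/tabula.jar resolves. The -Dfile.encoding=utf-8 option makes Java use UTF-8 as its default character encoding, and -Xms256M and -Xmx1024M set the JVM's initial and maximum heap sizes to 256 MB and 1024 MB: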
java -Dfile.encoding=utf-8 -Xms256M -Xmx1024M -jar tabula/tabula.jar
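If large PDFs run into the 1024 MB heap ceiling, a small wrapper makes the memory settings adjustable without retyping the whole command. This is a minimal sketch: run_tabula, TABULA_XMS, TABULA_XMX, and DRY_RUN are hypothetical names invented for this example, not Tabula settings, and the default jar path assumes the /opt layout from Steps 1 to 3.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the Step 5 command.
# TABULA_XMS / TABULA_XMX override the heap sizes; DRY_RUN=1 prints the
# command instead of running it. None of these are Tabula settings.
run_tabula() {
  local jar="${1:-/opt/tabula/tabula.jar}"
  local xms="${TABULA_XMS:-256M}"
  local xmx="${TABULA_XMX:-1024M}"
  # Fail early with a clear message instead of a Java "Unable to access jarfile" error
  if [ ! -f "$jar" ] && [ -z "${DRY_RUN:-}" ]; then
    echo "jar not found: $jar" >&2
    return 1
  fi
  ${DRY_RUN:+echo} java -Dfile.encoding=utf-8 "-Xms$xms" "-Xmx$xmx" -jar "$jar"
}
```

For example, TABULA_XMX=2048M run_tabula starts Tabula with a 2 GB maximum heap, as long as the server has that much memory to spare.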
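If you want the server to keep running after you log out, start it in the background with nohup. Its output goes to nohup.out in the current directory, or in your home directory if the current one isn't writable: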
nohup java -Dfile.encoding=utf-8 -Xms256M -Xmx1024M -jar tabula/tabula.jar &
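Starting the server with nohup and & doesn't leave an easy way to stop it later. A common pattern is to record the background process's PID in a file. The sketch below is generic: start_bg and stop_bg are hypothetical helper names, and the log and PID file locations in the usage lines are only examples.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start a command in the background with nohup,
# send its output to a log file, and record its PID in a PID file.
start_bg() {
  local pidfile="$1" logfile="$2"
  shift 2
  # nohup makes the process ignore SIGHUP so it survives logout;
  # stdout and stderr go to the log file instead of nohup.out
  nohup "$@" >"$logfile" 2>&1 &
  # $! is the PID of the background job just started
  echo "$!" >"$pidfile"
}

# Hypothetical helper: stop the process recorded in a PID file, then remove the file.
stop_bg() {
  local pidfile="$1" pid
  if [ ! -f "$pidfile" ]; then
    echo "no PID file at $pidfile" >&2
    return 1
  fi
  pid="$(cat "$pidfile")"
  # kill sends SIGTERM by default, which lets a JVM run its shutdown hooks
  if ! kill "$pid" 2>/dev/null; then
    echo "process $pid was not running" >&2
  fi
  rm -f "$pidfile"
}
```

To start and later stop Tabula this way:
cd /opt && start_bg "$HOME/tabula.pid" "$HOME/tabula.log" java -Dfile.encoding=utf-8 -Xms256M -Xmx1024M -jar tabula/tabula.jar
stop_bg "$HOME/tabula.pid"
If the server should also come back after a reboot, a service manager such as systemd is a sturdier option than nohup.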
And that's it! Your Tabula server is now up and running, ready to extract tables from your PDF documents.
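To use Tabula, open http://127.0.0.1:8080 in a browser on the server itself, or http://your-server-ip:8080 from another machine (replace your-server-ip with the server's address). If the remote address doesn't respond even though the firewall allows port 8080, check which address Tabula is listening on with sudo ss -tlnp; if it shows 127.0.0.1:8080, Tabula is accepting local connections only.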
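Before opening the browser, you might want to confirm that something is listening on the port, especially right after starting the JVM. This is a minimal sketch using bash's built-in /dev/tcp redirection (a bash feature, not a real file on disk); port_open and wait_for_port are hypothetical helper names. A successful connection only shows that the port is open, not that the Tabula page renders correctly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 if a TCP connection to HOST:PORT succeeds.
# The subshell opens fd 3 to the socket; it is closed when the subshell exits.
port_open() {
  local host="$1" port="$2"
  (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null
}

# Hypothetical helper: poll once per second until the port accepts
# connections or TRIES attempts (default 30) have failed.
wait_for_port() {
  local host="$1" port="$2" tries="${3:-30}" i=1
  while [ "$i" -le "$tries" ]; do
    port_open "$host" "$port" && return 0
    # Skip the pause after the final attempt
    [ "$i" -lt "$tries" ] && sleep 1
    i=$((i + 1))
  done
  return 1
}
```

For example, wait_for_port 127.0.0.1 8080 60 && echo "Tabula is accepting connections" waits up to about a minute for the server to come up.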
Reference: https://tabula.technology/