Reading from a MySQL table

In Sparkflows, we can read data from relational stores.


Here we read the ‘employees’ table from MySQL into a DataFrame in Sparkflows and display the first 10 rows.

Installing MySQL


We first download and install MySQL from

Installing MySQL JDBC driver

  • Download the MySQL JDBC driver from
  • Extract the JDBC driver JAR file from the downloaded file. For example:
    • tar zxvf mysql-connector-java-5.1.42.tar.gz
  • Copy the MySQL JDBC driver JAR file to the user-lib directory of fire-1.4.0
    • cd fire-1.4.0    (if you are not already in that directory)
    • cp ...../mysql-connector-java-5.1.42/mysql-connector-java.jar      user-lib

Load Test Data into MySQL


We load test data from:


git clone

/usr/local/mysql/bin/mysql -uroot -p < employees.sql


This creates the ‘employees’ database with a few tables in it.

Check that the employees DB has been created


mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| employees          |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
5 rows in set (0.00 sec)


The following tables have been created in the employees database.


mysql> use employees
Database changed

mysql> show tables;
+----------------------+
| Tables_in_employees  |
+----------------------+
| current_dept_emp     |
| departments          |
| dept_emp             |
| dept_emp_latest_date |
| dept_manager         |
| employees            |
| salaries             |
| titles               |
+----------------------+
8 rows in set (0.00 sec)


The employees table has 300,024 records.


mysql> select count(*) from employees;
+----------+
| count(*) |
+----------+
|   300024 |
+----------+
1 row in set (0.04 sec)


Reading the employees table into a Sparkflows workflow

We create the workflow below. It reads the data from the ‘employees’ table and displays the first 10 rows.

The first node is the JDBC node. We provide the URL, DB Table and Driver.

On interactively executing the ‘PrintNRows’ node, we see the output below.
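The JDBC node's configuration maps directly onto Spark's JDBC data source options. As a minimal sketch of how those fields fit together (the hostname, credentials, and helper function below are hypothetical illustrations, not part of Sparkflows):

```python
def jdbc_options(table, host="localhost", port=3306, db="employees",
                 user="root", password=""):
    """Build the option map a Spark JDBC read expects.

    'table' can be a table name or, as shown later, a parenthesized
    subquery with an alias.
    """
    return {
        "url": f"jdbc:mysql://{host}:{port}/{db}",   # the URL field
        "dbtable": table,                            # the DB Table field
        "user": user,
        "password": password,
        "driver": "com.mysql.jdbc.Driver",           # Connector/J 5.1.x driver class
    }

opts = jdbc_options("employees")

# With a live Spark session and MySQL server, the read itself would be:
#   df = spark.read.format("jdbc").options(**opts).load()
#   df.show(10)   # first 10 rows, as the PrintNRows node displays
```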

Executing the workflow


When we execute the workflow, we get the results below.

Specifying a sub-query


In the configuration of the JDBC node, anything that is valid in the FROM clause of a SQL query can be used for ‘dbtable’. For example, instead of a full table name you could also use a subquery in parentheses.


More details are available in the Spark Guide:

Above we have specified a subquery which selects only the ‘first_name’ column from the employees table.
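For instance, the ‘dbtable’ field could be set to a value of the following form (the alias name is arbitrary; Spark requires an alias for subqueries used as a table):

```sql
(select first_name from employees) as emp_subquery
```

Spark then pushes this subquery down to MySQL, so only the selected column is transferred into the DataFrame.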









© 2020 Sparkflows, Inc. All rights reserved.