As I was exploring the Hadoop eco-system, I came across this tool called ‘Sqoop’. It is termed as ‘Sqoop’ to denote ‘SQL to Hadoop’ functionality. Sqoop is a wonderful tool to transport data from Hadoop ecosystem such as HDFS/Hive to Relational databases and vice-versa.
I just wanted to blog about it mentioning its basic import and export functionality. I shall try to show an example in which a table is imported to HDFS from a relational database and export it back on to the database. To demonstrate this, the following are used:
Relational database – MySQL
Linux distro – Red Hat Enterprise Linux Server release 5.3 (Tikanga)
Hadoop ecosystem – Cloudera CDH3u3
Linux distro – Red Hat Enterprise Linux Server release 5.3 (Tikanga)
Hadoop ecosystem – Cloudera CDH3u3
IMPORT DATA FROM A RELATIONAL TABLE TO HDFS
Firstly, let us create a table on MySQL database.
1. Create a database.
mysql> create database sqoop_tutorial;
Query OK, 1 row affected (0.00 sec)
Query OK, 1 row affected (0.00 sec)
2. Create a table.
mysql> use sqoop_tutorial;
Database changed
Database changed
mysql> CREATE TABLE employee (id INT, name VARCHAR(20));
Query OK, 0 rows affected (0.12 sec)
Query OK, 0 rows affected (0.12 sec)
mysql> show tables;
+————————–+
| Tables_in_sqoop_tutorial |
+————————–+
| employee |
+————————–+
1 row in set (0.00 sec)
+————————–+
| Tables_in_sqoop_tutorial |
+————————–+
| employee |
+————————–+
1 row in set (0.00 sec)
3. Insert values into the table.
mysql> INSERT INTO employee(id,name) VALUES(1,”employee1″);
Query OK, 1 row affected (0.04 sec)
Query OK, 1 row affected (0.04 sec)
mysql> INSERT INTO employee(id,name) VALUES(2,”employee2″);
Query OK, 1 row affected (0.02 sec)
Query OK, 1 row affected (0.02 sec)
mysql> INSERT INTO employee(id,name) VALUES(3,”employee3″);
Query OK, 1 row affected (0.01 sec)
Query OK, 1 row affected (0.01 sec)
mysql> INSERT INTO employee(id,name) VALUES(4,”employee4″);
Query OK, 1 row affected (0.00 sec)
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO employee(id,name) VALUES(5,”employee5″);
Query OK, 1 row affected (0.03 sec)
Query OK, 1 row affected (0.03 sec)
mysql> select * from employee;
+——+———–+
| id | name |
+——+———–+
| 1 | employee1 |
| 2 | employee2 |
| 3 | employee3 |
| 4 | employee4 |
| 5 | employee5 |
+——+———–+
5 rows in set (0.00 sec)
+——+———–+
| id | name |
+——+———–+
| 1 | employee1 |
| 2 | employee2 |
| 3 | employee3 |
| 4 | employee4 |
| 5 | employee5 |
+——+———–+
5 rows in set (0.00 sec)
4. Now let us import this table ‘employee’ on to HDFS.
We need to run the following command.
bash [~]$ sqoop import –connect jdbc:mysql://localhost/<database> –table <table_name> –username <username> -P –target-dir <HDFS_Location> -m 1
bash [~]$ sqoop import –connect jdbc:mysql://localhost/<database> –table <table_name> –username <username> -P –target-dir <HDFS_Location> -m 1
Our command would be as follows:
bash [~]$ sqoop import –connect jdbc:mysql://localhost/sqoop_tutorial –table employee –username devbld -P –target-dir /usr/devbld/employee -m 1
bash [~]$ sqoop import –connect jdbc:mysql://localhost/sqoop_tutorial –table employee –username devbld -P –target-dir /usr/devbld/employee -m 1
After we run this, we may face the following error.
Unable to load com.mysql.jdbc.Driver
To workaround this, download the jar from http://www.mysql.com/downloads/connector/j/ and copy it to your ‘lib’ of sqoop (say, /usr/lib/sqoop/lib/).
Run the command again, this should solve the problem.
That creates a file on HDFS with data from the table. We can see the content of the tables in the file.
bash [~]$ hadoop fs -ls /usr/devbld/employee
Found 3 items
-rw-r–r– 1 devbld supergroup 0 2012-08-18 13:23 /usr/devbld/employee/_SUCCESS
drwxr-xr-x – devbld supergroup 0 2012-08-18 13:23 /usr/devbld/employee/_logs
-rw-r–r– 1 devbld supergroup 60 2012-08-18 13:23 /usr/devbld/employee/part-m-00000
Found 3 items
-rw-r–r– 1 devbld supergroup 0 2012-08-18 13:23 /usr/devbld/employee/_SUCCESS
drwxr-xr-x – devbld supergroup 0 2012-08-18 13:23 /usr/devbld/employee/_logs
-rw-r–r– 1 devbld supergroup 60 2012-08-18 13:23 /usr/devbld/employee/part-m-00000
bash [~]$ hadoop fs -cat /usr/devbld/employee/part-m-00000
1,employee1
2,employee2
3,employee3
4,employee4
5,employee5
1,employee1
2,employee2
3,employee3
4,employee4
5,employee5
EXPORT DATA FROM HDFS TO RELATIONAL TABLE
1. Let us create a table on the database into which we want our data to be exported. Let us name it ‘employee_exported’.
mysql> CREATE TABLE employee_exported (id INT, name VARCHAR(20));
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.01 sec)
2. Run the following command to export the data.
bash [~]$ sqoop export –connect jdbc:mysql://localhost/sqoop_tutorial –table employee_exported –export-dir /usr/devbld/employee –username devbld -P -m 1
3. Now, we see the contents of the table to find the data exported from HDFS.
mysql> select * from employee_exported;
+——+———–+
| id | name |
+——+———–+
| 1 | employee1 |
| 2 | employee2 |
| 3 | employee3 |
| 4 | employee4 |
| 5 | employee5 |
+——+———–+
5 rows in set (0.00 sec)
+——+———–+
| id | name |
+——+———–+
| 1 | employee1 |
| 2 | employee2 |
| 3 | employee3 |
| 4 | employee4 |
| 5 | employee5 |
+——+———–+
5 rows in set (0.00 sec)
References:
No comments:
Post a Comment