As we know, the following commands connect to HiveServer2 on HiveServerHost at port 10000:
> library(RHive)
> rhive.connect("HiveServerHost", 10000, hiveServer2=TRUE)
If we give rhive.connect no arguments, it connects to the Hive server on the local host at port 10000:
> rhive.connect()
You can then run queries on the Hive server with rhive.query(...). RHive can do more than that, though, such as accessing HDFS and writing a table to it.
To access HDFS, you first need to connect to its name node:
> rhive.hdfs.connect("hdfs://namenode:8020")
If the above command does not work or is not supported, add a dot (.) before the function name, like this:
> .rhive.hdfs.connect("hdfs://namenode:8020")
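The choice between the two names can also be made in code. The following is only a minimal sketch in base R: pick_function is a hypothetical helper (not part of RHive), and it assumes that whichever of the two names your RHive version provides is visible after library(RHive).

```r
# Return the first name in `candidates` that is bound to a function,
# or stop with a clear error if none of them is.
# NOTE: pick_function is a hypothetical helper, not an RHive function.
pick_function <- function(candidates) {
  for (name in candidates) {
    if (exists(name, mode = "function")) {
      return(get(name, mode = "function"))
    }
  }
  stop("none of these functions is available: ",
       paste(candidates, collapse = ", "))
}

# With RHive loaded, the HDFS connection could then be opened like this
# (the URL is the placeholder used in the text above):
# hdfs_connect <- pick_function(c("rhive.hdfs.connect", ".rhive.hdfs.connect"))
# hdfs_connect("hdfs://namenode:8020")
```

The RHive calls are left commented out so that the helper itself can be tried without a running cluster.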
Next, we can create a 3-column table named "test" in Hive from R, which saves it in HDFS:
> L3 <- LETTERS[1:3]
> d <- data.frame(cbind(x=1, y=1:10), fac=sample(L3, 10, replace=TRUE))
> rhive.write.table(data=d, tableName="test")
[1] "test"
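It can help to inspect the data frame locally before or after writing it. A minimal sketch in base R, needing no Hive connection; the set.seed call is an addition made here only so the sampled column is reproducible, and it is not part of the original example:

```r
# Build the same 3-column data frame as in the example above.
# set.seed is an assumption added for reproducibility of sample().
set.seed(42)
L3 <- LETTERS[1:3]
d <- data.frame(cbind(x = 1, y = 1:10), fac = sample(L3, 10, replace = TRUE))

# Show the number of rows and the name and type of each column.
str(d)

# Basic sanity checks on the shape of the data that will be written.
stopifnot(nrow(d) == 10, ncol(d) == 3)
stopifnot(identical(names(d), c("x", "y", "fac")))
```

The values in fac depend on the random sample, so they will generally differ from the rows shown in the query output.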
Now show the content of table test:
> rhive.query("select * from test");
   x  y fac
1  1  1   B
2  1  2   A
3  1  3   A
4  1  4   B
5  1  5   B
6  1  6   A
7  1  7   C
8  1  8   A
9  1  9   B
10 1 10   A
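To check where the data is stored, list the table's directory in HDFS:
> rhive.hdfs.ls("/user/hive/warehouse/test/")
If the listing shows files under this directory, the content of the table has been saved to HDFS.
There are two good slide decks you can refer to on how to use RHive's basic functions and its HDFS-related functions: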
RHive Basic Functions Tutorials
RHive HDFS Functions Tutorials