BookmarkSubscribeSubscribe to RSS Feed
scott_87

Community Trekker

Joined:

Feb 22, 2013

Hive SQL Query builder - Performance issues

Hi Guys,

I'm running JMP 13 and have used the cloudera hive ODBC driver to sucesfully connect the JMP query builder to our hadoop hive server. I am able to view the tables ok, but once i click on one of the tables it automatically sends a "select * from table" query to the hadoop cluster which completely hangs up my JMP due to the enormous size of the table its trying to pull without my consent. 

Is there a way to turn this off? 

I have attached a snapshot of my query builder preferences below:

 

Capture12343214.PNG

2 REPLIES
ih

Community Trekker

Joined:

Sep 30, 2016

Re: Hive SQL Query builder - Performance issues

@Eric_Hill gave me a solution here that might work for you.

aronwilson

Community Trekker

Joined:

Jan 27, 2017

Re: Hive SQL Query builder - Performance issues

I don't have a good solution other than to use custom SQL.

 

I've had to basically stop using the query builder interface because I'm accessing either vertica or hive and neither responds remotely well to a blanket "Select *" statement. 

 

My workaround: I've created a custom sql statement "select * from schema.table limit 0" which works very well for vertica to pull the columns of a table down into a jmp data table. Then I use that as a reference to build a custom sql query. It works fine to get basic data into jmp, but the simple inability to more efficiently query columns in query builder has rendered ALL of the great features it has completely useless and I'm about to give up and start building all the tables somewhere else and just opening them in jmp. Which is fine in theory except that makes updating or modifying the tables for new analysis a multi-tool process instead of just working in the jmp query builder (which works great when I'm working with our oracle exadata servers)

 

I actually tried a workaround accessing my vertica tables through our SAS server but I run into the same problem. So I'm going to try and build a SAS datastep program to sample the data and remove any of the blanket "select *" that SAS EG and SAS query builder in jmp loves to throws in everywhere. 

 

I'm just trying to take advantage of the nice stratified sampling capabilities and it's amazing how challenging that is when connecting between jmp or sas and vertica (or hive).

 

Aron