
SQL to matrix

ron_horne

Super User

Joined:

Jun 23, 2011

Hi all,

Is it possible to extract data from a database directly into a matrix, without first importing it into a data table?

Thanks!

1 ACCEPTED SOLUTION

msharp

Super User

Joined:

Jul 28, 2015

Solution

Well, yes and no.  You can import it directly into JSL as a matrix using:


matrix = Execute SQL(dsn, sql_string, table_name, invisible) << Get as matrix;

However, the Execute SQL statement still creates a data table. If "private" were an option it would be possible, since JMP automatically clears private data tables from your computer's memory if they aren't stored somewhere. Alas, invisible is the best they give us.

You can use the script below to close the invisible table. In practice it'll be just as fast, since executing the SQL statement will always be the bottleneck.

matrix = (dt = Execute SQL(dsn, sql_string, table_name, invisible)) << Get as matrix;

Close( dt );
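If you use this pattern a lot, it can be wrapped in a one-shot helper. A minimal sketch, assuming nothing beyond what's above (SQLToMatrix and the "temp" table name are made up; dsn and sql_string are whatever you already pass to Execute SQL):

SQLToMatrix = Function( {dsn, sql_string},
    {dt, m},
    // open invisibly, grab the matrix, then clean up the table
    dt = Execute SQL( dsn, sql_string, "temp", invisible );
    m = dt << Get As Matrix;
    Close( dt, No Save );
    m; // last expression is the return value
);

// usage: matrix = SQLToMatrix( dsn, sql_string );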

11 REPLIES

ron_horne

Super User

Joined:

Jun 23, 2011

Thanks msharp,

This makes it clear to me that I didn't miss anything.

The thing is that the data source is big and I only need a list of IDs (one column). After that I extract the data (many columns) for each ID and analyze it separately. Opening the whole data set is just not feasible.

ron

msharp

Super User

Joined:

Jul 28, 2015

Sounds like you could optimize your SQL statement.

Something like:

With IDtemp as (Select IDcol From IDtable Where IDcol = 'FilterCriteria')

Select * From IDtable

Inner Join IDtemp On IDtable.IDcol = IDtemp.IDcol


This will first create a temporary IDtemp table that SQL uses to join with the larger IDtable. The inner join greatly reduces the size of the tables SQL has to work with. Of course, there may be simpler ways to make it faster; it all depends on how the data is stored and how large the tables are.
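In JSL terms, a minimal sketch of running a query like that and pulling the result straight into a matrix (dsn, IDtable, and IDcol stand in for your own connection string and schema):

dbconnection = Create Database Connection( dsn ); // dsn = your ODBC connection string
sqlstr = "With IDtemp as (Select IDcol From IDtable Where IDcol = 'FilterCriteria')
Select IDtable.* From IDtable
Inner Join IDtemp On IDtable.IDcol = IDtemp.IDcol";
idmatrix = (dt = Execute SQL( dbconnection, sqlstr, "Filtered IDs", invisible )) << Get As Matrix;
Close( dt, No Save );
Close Database Connection( dbconnection );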


I would focus on optimizing your SQL statements over optimizing how JMP interacts and pulls the data.

ron_horne

Super User

Joined:

Jun 23, 2011

Thanks msharp,

Perhaps I should have shared my script before.

Any recommendations are welcome.

Usermatrix = (dt100 = Open Database(
       "DBQ=D:\MY DOCUMENTS;DefaultDir=D:\MY DOCUMENTS;Driver={Microsoft Text Driver (*.txt; *.csv)};DriverId=27;FIL=text;FILEDSN=D:\My Documents\SQLLog Data.dsn;MaxBufferSize=2048;MaxScanRows=8;PageTimeout=5;SafeTransactions=0;Threads=3;UID=admin;UserCommitSync=Yes;",
       // the next line extracts only the UserId column to reduce memory usage
       "SELECT UserId FROM dataset.csv",
       "dt10 - Userid list",
       invisible
       // private // this option does not work!!!
)) << Get As Matrix;

Close( dt100, No Save );

// identify the users and their ID numbers
Userlist = Associative Array( Usermatrix );
UniqueId = Userlist << Get Keys;

// this is just a statement in a string; ^UniqueId[i]^ is filled in by Eval Insert below
sqlstr = "SELECT * FROM dataset.csv WHERE UserId = ^UniqueId[i]^";

// loop through the dataset and extract each user's data one at a time
For( i = 1, i <= N Items( UniqueId ), i++,
       // extract a personal log for each user using the statement
       dt10i = Open Database(
              "DBQ=D:\MY DOCUMENTS;DefaultDir=D:\MY DOCUMENTS;Driver={Microsoft Text Driver (*.txt; *.csv)};DriverId=27;FIL=text;FILEDSN=D:\My Documents\SQLLog Data.dsn;MaxBufferSize=2048;MaxScanRows=8;PageTimeout=5;SafeTransactions=0;Threads=3;UID=admin;UserCommitSync=Yes;",
              Eval Insert( sqlstr ),
              "Log Data User " || Char( i )
       );
       // perform the analysis ... //
       Close( dt10i, No Save );
);

msharp

Super User

Joined:

Jul 28, 2015

A couple of things.

First, change your first query to "SELECT DISTINCT UserId FROM dataset.csv";

This does two things: it makes your query faster, and since you are pulling it as a matrix, Usermatrix will now equal UniqueId, cutting several lines of code. This removes the need to store potentially very large lists and matrices that aren't used (Userlist/UniqueId). Just be sure to change N Items to N Rows in the for loop.

Second, there's no need to establish a database connection every time through your for loop. Use dbconnection = Create Database Connection( dsn ); where dsn is your long connection string above. Then change Open Database(...) to Execute SQL( dbconnection, sqlstr, "Log Data User " || Char( i ), invisible ).

A couple of takeaways from this: you save the time of connecting to the database on every iteration (which should save a lot of time). You don't need to Eval Insert() the sqlstr; not a big deal, but one less step. Lastly, and pretty importantly, opening dt10i invisibly will save a lot of memory, helping your code run faster, especially if you manipulate the data table at all in the "perform analysis" portion. I see you end up closing it later on anyway. To make a script run faster, remember that any data table you plan to close should be opened in either invisible or private mode.

Be sure to have a Close Database Connection(dbconnection); at the end of your code.
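Putting all of that together, the skeleton would look roughly like this (just a sketch; dsn stands in for the long connection string and sqlstr for your per-user statement):

dbconnection = Create Database Connection( dsn );
// one query for the unique IDs, straight into a matrix
Usermatrix = (dt = Execute SQL( dbconnection, "SELECT DISTINCT UserId FROM dataset.csv", "Userid list", invisible )) << Get As Matrix;
Close( dt, No Save );
For( i = 1, i <= N Rows( Usermatrix ), i++,
    // reuse the same connection for every per-user query
    dt10i = Execute SQL( dbconnection, sqlstr, "Log Data User " || Char( i ), invisible );
    // ... analyze dt10i ...
    Close( dt10i, No Save );
);
Close Database Connection( dbconnection );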

ron_horne

Super User

Joined:

Jun 23, 2011

Thank you msharp,

This is very helpful!

My SQL statements were just crude scripts copied from opening the tables manually.

I think I managed to get most of it, except for the Eval Insert. How do I get by without it?

This is my current script.

dbconnection = Create Database Connection(
      "DBQ=D:\MY DOCUMENTS\;DefaultDir=D:\MY DOCUMENTS\;Driver={Microsoft Text Driver (*.txt; *.csv)};DriverId=27;FIL=text;FILEDSN=D:\My Documents\SQLLog Data.dsn;MaxBufferSize=2048;MaxScanRows=8;PageTimeout=5;SafeTransactions=0;Threads=3;UID=admin;UserCommitSync=Yes;"
);

// extract the UserId numbers into a matrix
Usermatrix = (dt100 = Execute SQL( dbconnection, "SELECT DISTINCT UserId FROM data.csv", "dt10 - Userid list", invisible )) << Get As Matrix;
Close( dt100, No Save );

sqlstr = "SELECT * FROM data.csv WHERE UserId = ^id^";

For( i = 1, i <= N Rows( Usermatrix ), i++,
      id = Usermatrix[i];
      dt10i = Execute SQL( dbconnection,
            Eval Insert( sqlstr ),
            "User Data Log " || Char( i ) || " - " || Char( id ),
            invisible
      );
      // analyze the data table //
      Close( dt10i, No Save );
);

Close Database Connection( dbconnection );

pmroz

Super User

Joined:

Jun 23, 2011

You do need Eval Insert, because you're re-evaluating id each time through the loop.
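For example, a quick way to see what it does (42 is just a stand-in value):

id = 42;
sqlstr = "SELECT * FROM data.csv WHERE UserId = ^id^";
Show( Eval Insert( sqlstr ) ); // prints the string with 42 substituted for ^id^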

msharp

Super User

Joined:

Jul 28, 2015

Yeah, you will need Eval Insert; I wasn't thinking.

ron_horne

Super User

Joined:

Jun 23, 2011

Thank you very much,

Now my script is running much faster and has fewer issues.

ron