Re: Question on the table references (from Russel)

martindemel · Feb 16, 2022 11:21 AM

| Posted in reply to message from martindemel 16-02-2022

Question on the table references you showed today: If you process multiple tables sequentially as part of a script, do you iterate the table references ad infinitum (dt1, dt2, dt3,...) or is there a smarter solution?

Background: I frequently manipulate tables (join, concatenate, split, etc.) and sometimes have quite a few tables that I sequentially open, process and close before moving on to the next table in the same script until I have the final table I want.

/****NeverStopLearning****/

martindemel · Feb 16, 2022 11:22 AM

Hi @Ressel ,

it really depends what you want to finally do. If you have let's say 100 data tables you want to process in the same way and be closed afterwards, may a loop (to open, process, close each table) would do the job better than opening all 100 tables and then processing it one after the other. One reason is that you do not create that many windows handles which might provoke another issue on the system side.

In general I would be a bit more descriptive in the table variable name, so I better understand what it is, if that makes sense. "dt" is for the dirty hack mainly

You could also get all open data tables and then loop through them using

Get Data Table List();

This probably would be a smart move.

How do others feel about this?

/****NeverStopLearning****/

DonMcCormack · Feb 16, 2022 12:51 PM

I agree with @martindemel, it depends on where the files exist, the order you process them, and what you want to do with them. JSL has many built in functions outside of the Enhanced Log that can help. For example, Files in Directory gives you a list of all files in a directory. Pick File lets you interactively select one or more files from a directory. While the Enhanced Log does a good job capturing interactive operations, you still want to replace the hard coded table name references with variable references to the tables. @martindemel showed some of this using dt = Open(... In most cases a JSL operation that opens a file will return a handle to the data table that gets created. Data table handles also get created when you message a table with an operation from the Tables menu. For example, if you're creating a subset the syntax

subsetDT = dt << Subset(...

will let you capture the reference to the newly created table. The other table operations work in a similar fashion.

Ressel · Feb 16, 2022 01:11 PM

Thanks for the replies. A little more on the background of what I am working with.

I load the tables from our Google Cloud Platform implementation via ODBC. This tables are regularly updated with recent manufacturing data. For example, the first table I load (I reference it as dt) has batch IDs, parameter names for different QC parameters, measurement results and units of measurement in separate columns. I split it by QC parameter name (and reference the output as dt2, closing dt). Then, for example, I load another table that cross references batch IDs with manufacturing locations (which I reference as dt3). After that I may join dt2 with dt3, creating dt4 and so on and so forth, until I have a table that has all the information I think I need before I go ahead with whatever analysis. In some scripts I am processing close to 10 tables, all referenced as dt(n). This is what my current skill level permits.

I have a medium term plan to use such a script for control charting. At the moment I only visually assess trends and send screenshots or html exported charts to different stake holders.

What I take away from this conversation is to use more descriptive table references. These are also what are referred to as handles?

jthi · Feb 16, 2022 02:14 PM

Descriptive handle/reference names will help you (or someone else who has to modify your code) at some point. This also makes debug printing easier, when you can use Show(dt_batchid_loc); for example to see the created tables name.

Also it might be useful to use Output table("name of table") when creating tables. This will help you if you (when you) have to debug the code. If you don't do this (I tend to skip this and it comes to haunt me from time to time) it is much easier to to see which table is open from name like "BatchID_with_Loc" when compared to "Untitled 3041 summary by QC, Loc" or something.

-Jarmo

Georg · Feb 16, 2022 05:23 PM

There is also the possibility to do the table manipulation in the database (query definition by e.g. Query Builder or in native SQL). This is advantageous for large data sets, as you avoid traffic and put the heavy load to the server. Then you also need to handle less tables in JMP.

Georg

Ressel · Feb 24, 2022 07:56 AM

I have tried using the query builder before. It was incredibly slow and kept crashing constantly. So I contacted SAS technical support, but they could not resolve the issue. As a result I stuck to handling individual tables in a script, which; given the dismal performance of the query builder; is much more convenient and efficient. It might be that this is due to the tables I'm loading having some 100 000+ rows - that was at least tech support's conclusion. Alas, I have practically nowhere else to turn to than these "massive" tables.

Also, I think I need to urgently familiarize with debugging. While I manage to "wrangle" a few useful lines of code, I am not considering architecture, aesthetics or; as a matter of fact; how to build a script that can be debugged. Unfortunately it is as of yet a foreign concept to me.

Georg · Feb 24, 2022 05:29 PM

Did you try to change the settings in Preferences for Query Builder? That may change a lot, by default a lot of data is transferred for each change (all levels on database, previews etc).

Additionally, once you made the query via GUI, you can convert it into custom SQL to create plain SQL. That then will be processed at the database only and is much faster.

My best known method is always doing the heavy stuff on database, but this always depends on the environment. There are other tools that let you define the query at database, that you can use in JMP (e.g. SQL Query Builder, depending on your database vendor).

Also you could ask your IT (e.g. database administrator) to help you with certain queries that are needed regularly, and import the plain SQL into JMP.

Getting the data from database in the right format is crucial, and it's worth to look at the full process finally.

Georg

Ressel · Mar 1, 2022 05:40 AM

Hi, yes, a lot of settings were changed although I can't exactly remember which ones. I mean, this was in the hands of SAS Tech Support, our IT support and mine. A substantial amount of time was used trying to fix this. Little did it help. (And I have on recently started to become mildly data literate.)

Also, I have no issue with getting the data by loading it through a script and manipulating the table, although it truly appears inefficient, given that I each time load 100 000s of rows, delete a majority and repeat a similar step with the next table. I imagine the query builder would probably do the same job, but on a different interface? At the moment I have access to the data I need, but; depending on workload; I may look into importing plain SQL into JMP. (Didn't know it was possible.)

The link below contains information on creating SQL for use with/in JMP?

Thanks!

Georg · Mar 1, 2022 06:48 AM

Hi @Ressel , it does not contain information on creating/writing the SQL, but more on deploying queries in JMP by means of an SQL-query or different ways. So my usual way is that I use a different tool to write that plain SQL (like "select name from big_class where age > 15"), and simply deploy that script in JMP. But you will see several examples how this may look.

Georg

Georg · Feb 25, 2022 02:45 AM

BTW, perhaps you want to have a look at this presentation, it's all about getting data from database with JMP:

https://community.jmp.com/t5/Discovery-Summit-Europe-2021/Strategies-and-Examples-for-Data-Acquisiti...

Georg

Ressel · Mar 26, 2022 01:38 PM

@Georg thank you!

Question on the table references (from Russel)

This question came up in the thread about "Re: 2022Q1 Practical Session - Beginners" - and I'd like to have others joining the discussion as well, so I put it to the Discussion forum.