Hello everybody.
My name is Peter Mroz.
I'm with Janssen Pharmaceutical,
and today I'm going to talk to you
about how combining JMP and Oracle can lead to a happy marriage.
I work for Janssen R&D.
We're a wholly- owned subsidiary of Johnson and Johnson,
and our charter is to discover and develop
innovative medicines and solutions that transform individuals lives
and solve the most important unmet medical needs of our time.
Within that world,
I'm in the Global Medical Safety Department,
and our charter is to protect patients by driving
robust medical safety excellence and benefit -risk assessment.
And then within that department,
I'm in a group called Methods and Analysis,
and our aim is to develop and implement analytic tools
to increase efficiency and analytical capability
to detect and evaluate safety signals.
Again, my name is Peter Mroz and I've been a JMP user since 2007.
Standard Disclaimer.
These are my views
and do not imply any endorsement of any product by Janssen or J &J.
Here's our agenda for today.
I'm going to give an introduction
and then we'll jump right into Oracle things.
We'll talk about ODBC, configuring the Oracle client,
the ODBC connection string, bringing data from Oracle into JMP,
and then writing data back to Oracle from JMP,
then fast data insertion, faster data loading,
then executing an Oracle PL /SQL procedures,
hiding the passwords, error trapping,
building IN lists,
and sprinkled throughout, I've got some miscellaneous tips.
So my department is called Global Medical Safety,
and we collect, process, and report and analyze adverse event data
for the medicinal products that we produce.
They are mostly spontaneous cases,
although there are some clinical trial cases.
These are called post -marketing.
They're a man or woman in the street, walking down the street,
and you experience some sort of drug side effect,
and you call it into our call center
and we run it through our process and store it in Oracle.
Our volume is about 5,000 cases a day, and a case consists of a person...
They're not a patient, they're not in a clinical trial.
It's a person, the drugs they took,
the events they experienced, the side effects they experienced,
maybe some medical history, and we store something called a narrative.
And here's an example, patient narrative.
S ubject had cancer,
which was diagnosed in June 1998, et cetera, et cetera.
It's a lengthy story about the patient and the side effects,
and this is very important for our surveillance physicians
and other scientists to look at.
This is stored as a [inaudible 00:02:52] in Oracle, by the way.
JMP, as we all know,
is great at statistical analysis and visualization.
Oracle is great at data storage and transactional processing.
Our users want to analyze and visualize data from Oracle using JMP.
Primarily, we look at tabular reports of safety data,
summary information, patient narrative drill downs,
and we do some visualization of safety data
via trending or forest plots.
With all that,
JMP and Oracle together make a happy marriage.
We'll start by talking about ODBC,
which stands for Open Database Connectivity.
ODBC drivers access the database using SQL.
SQL stands for Structured Query language.
So the ODBC driver allows a JMP client software
to communicate with the Oracle database.
The first thing you have to do is install the Oracle client on the PC.
This is an exercise left to the reader.
It's not a tutorial on installing this, so you can Google it.
However, once you've installed it,
there are a couple of things you need to supply for the Oracle client.
You need to define two environment variables.
One is ORACLE_ HOME, the other is TNS _ADMIN.
So ORACLE_HOME points to a folder where the client is actually installed.
Here, it's in C, Oracle , 19 , client _1.
And then you want to include the bin directory
in the path environment variable.
So in this case, it's the ORACLE_HOME with the slash bin.
The TNS_ADMIN points to the location of TNSNAMES. ORA,
—I'll explain what that is in a second—
and that's typically located
in the network admin path underneath the ORACLE_HOME .
Here's a hint;
you can point this TNS_ADMIN variable to a file share location
so multiple users can all point to this file
and it's easier to maintain one version of TNSNAMES.ORA.
What exactly is TNSNAMES.ORA?
It's a configuration file.
It's like a secret decoder ring,
it translates between a database alias and information needed
by the Oracle client to talk to your database.
Here's my example; my alias is MYDEVDB,
and then here's my description
for how to connect MYDEVDB to my Oracle database.
With that completed,
now we need to determine an ODBC connection string,
and the easiest way to do this is to click the Windows button and type ODBC,
and we want to match the hatch.
For 64 -bit JMP, we want the 64-bit ODBC data sources.
For 32 -bit JMP, we want the 32- bit ODBC data sources.
I have 64-bit JMP,
so I clicked that and I bring up this screen here,
and I click on the drivers tab,
and here are the drivers for ODBC that are installed on my system.
I have three Oracle clients and one SQL Server.
I have the Oracle client version 11, 12, and 19,
so I'll click on this Oracle version 19 driver,
and you want to make note of this; Oracle in Ora Client19 Home 1.
That's all you need to know.
Now we have our ODBC connection string.
We combine that like so with driver equals that string,
DBQ equals our database alias.
UID equals your username, PWD equals your password.
So here's a fully -formed ODBC connection string.
Driver equals my Oracle 19 driver. DBQ is MYDEVDB,
username is MYUSERNAME, password is MYPASSWORD.
Okay, now that we're all configured, we can bring data into JMP.
There are several ways to get Oracle data into JMP.
Ther e's Open Database, Execute SQL , New SQL Query, and Query Builder,
which is under the File Database menu.
This talk will focus exclusively on Execute SQL
because you can create a database connection
and then you can execute
several SQL commands with Execute SQL and then close the database connection.
If you compare that to Open Database, Open Database in one call,
opens the connection,
runs a SQL command, closes the connection.
So if you have 20 SQL statements,
you're opening and closing the connection 20 times,
whereas with Execute SQL, you only open it once,
execute your 20 commands with 20 execute SQL commands,
and then close the database connection, so it speeds things up.
In the scripting index, this is what Execute SQL looks like.
It takes the following arguments;
there's a database connection handle which is here,
defined by your connection string.
There's either a SELECT statement or a SQLFILE equals statement,
or a SQL FILE equals a pointer to a file containing your SQL commands.
An invisible keyword, if you supply a table Name,
that's equivalent to saying SELECT star from table Name,
and then an output Table Name provides the name for the JMP data set
and Execute SQL returns, pointer to a table
if you're issuing a SELECT statement , which returns a data set.
Here's an example;
I have my connection string, driver equals Oracle in Ora Client19 Home1,
MYDEVDB, my username, my password.
I'm calling it create database connection with this string.
I've got my SQL statement here,
I'm selecting some columns from a table called eba_sales _salesreps,
and I'm passing my connection, my SQL statement,
and then a title for the table to execute SQL,
then I'm closing the database connection.
Here's my table I rendered from Oracle.
That's seven columns, 20 rows.
Here's the first tip,
and that is if you have a string with a single quote inside it,
in order to use it with Oracle,
you have to replace that single quote with two single quotes like this.
Here, my SQL statement is SELECT star from my _table m,
where m name equals O'Malley,
and since this is inside the string, I have to replace it with two quotes.
Here's the second tip;
use column aliases for readability.
These are my column aliases, so here's my column,
and then in double quotes, I've got an alias.
You notice it's mixed case,
there are spaces in there, it makes it more readable.
Here's my table with the more readable column headers.
The other thing about this is,
I'm using backslash open square bracket, close square bracket backslash
to avoid the need to escape my double quotes.
You notice I've got double quote here,
then I've got my backslash open square bracket,
and then I've got double quotes,
and then I close it out here so it looks a lot cleaner.
What if you want to write data back to Oracle?
You can issue an UPDATE statement or an INSERT statement.
Here is my UPDATE statement, and I'm simply defining that,
passing to a variable, and then passing to Execute SQL.
I'm updating the table,
setting the last name to Smith where the first name is Sweed.
Or here I'm inserting a new record into this table
and I'm setting the value of these fields to the values shown here.
One thing to note about UPDATE and INSERT
is Execute SQL does an implicit COMMIT for these commands,
so you don't need to do a COMMIT yourself.
One thing about INSERT,
if you have multiple INSERT statements to execute,
they can be slow,
so I found an alternative which is INSERT ALL.
Here's an example where I'm inserting into this table.
The column is called sample _number
and I'm inserting ten values all at once with one statement.
The only weird thing is,
you have to put something like SELECT 1 FROM DUAL
at the end of it and then it works.
Let's see that in action.
I want to insert 100 rows one by one and compare it to 100 rows all at once.
I have a little example here, let's go ahead and run it.
So it took 8 seconds to do one at a time versus 0.15 for INSERT ALL,
and so it was 52 times faster.
Let's look at the code a little bit.
We have making a connection, truncating a table.
Here's my one at a time.
I have my Execute SQL inside my loop,
and by the way, I'm looping a hundred times.
Here, Execute SQL is inside the loop, and down here for INSERT ALL,
I'm starting with INSERT ALL
and I keep adding into the table, fields, values.
Keep adding that,
and then I only run one SQL command.
And if I look at this command, you can see it's pretty hefty.
Here's my SQL statement.
It's very long, but it took 0 .15 seconds to run.
INSERT ALL into TEST IMPORT,
field names, values, into, into, into.
Okay.
Let's go back to slide mode.
What if you have
more than a thousand rows to insert into your database?
What if you have 10,000 rows or 50,000 rows?
You can use a tool from Oracle called SQL Loader for faster data loading.
SQL Loader requires a data file which can be comma separated,
tab -delimited, fixed format, and a control file.
The control file describes
the structure of the data file and the target table,
and we're going to add another layer on this
because we're going to do all this from within JSL.
I'm going to create a command file
which runs SQL Loader in a command window.
It also generates the control and command files using JSL,
and I'll use run program to execute the file.
Here's my file.
It's very exciting,
it's six columns, four rows and it's tab delimited.
Here I'm showing the reveal code, so this is my tab character,
and here's my example.
I've got setting some variables,
and here I'm creating my control file
and I'm using eval insert to make these variables,
—surrounded by the little carets— convert to their values up here.
Import _file name will be test_import.txt, dest_table will be TEST_IMPORT.
Fields are terminated by tab, actually enclosed by double quotes.
Here's my six fields,
and I'm adding a couple of other fields, date _loaded and username _loaded.
Date _loaded will be the system date.
The username _loaded will be the account name
of the person running it.
I'm saving this file out to the directory
and I'm creating a command file to run SQL Loader.
Setting my drive to the C drive, seeding into this directory,
and then here's my command;
SQL Loader user ID equals my credentials at my database name,
and here's my control file, my log file.
And if you notice,
I had to add these backslash exclamation mark capital N.
These are hard returns.
For some reason, it didn't work without these in there,
so I had to add those in for the command file to work.
I'm saving it,
and then I'm running it here with run program.
Then I'm checking the results.
If it does not contain row successfully loaded in the output,
then I display an error message and display the output from run SQL load.
If it was successful,
then I load the log file in and display that.
Here's my log file.
I've got about four rows successfully loaded in 1.5 seconds.
I want to do a demo of six columns with 30,000 records,
and let's run that one.
I'm loading into the same table, I'm truncating the table,
and I'm running SQL Loader, load the data file.
It's the same control file.
When it's all done,
it's going to display the output from the log file.
Here's the output.
I've got 30,000 rows successfully loaded.
No rows were not loaded due to data errors and it took 16 seconds.
That was 30,000 rows, and we can look at that data.
Here I am in a tool called PL/SQL Developer.
SELECT star from the table and here's my values,
here's my date loaded, which is today,
my username,
and it's going to select all the rows.
I'll let that run.
Okay, so here's a look at the data file, and here's the log or previous run.
It took 25 seconds.
All right, moving on.
Now, what if you want to execute an Oracle PL /SQL procedure?
PL /SQL stands for Procedural Language Extensions to SQL,
and it's a sort of a 3GL, 4GL language Oracle uses
to do functions, procedures and the like.
If you have a procedure,
you simply surround it with begin and end and then pass it to Execute SQL.
Here is BEGIN, that's my schema name.
This is a package called package util, and then inside there,
there's a procedure called send email with an argument success ,
and then I add the END at the end.
So when you do this,
it runs it and control will return to JMP when the procedure is done,
so we'll wait.
Okay, let's talk about some security things.
When you pull data from Oracle,
by default, there's a source property in the data set,
and that will show you
the username and password and connection string,
and you might not want to show that to all your users.
If you run this command,
it will hide the connection string in the data set that's returned.
I go a couple of steps beyond that.
I create a connection in an encrypted JSL function.
This function contains a database name, the username, and the password,
and it returns a database connection
and default local ensures that function variables are not visible.
Let's have a look at that.
Here's a little function called my _dbc,
and this is the unencrypted version.
Here's my default local.
I check environment variables.
Here's my connection string.
This is the one that you don't want people to see.
Here's my driver, my database, my username , my password.
I create a database connection with that string
and then just return my database connection.
Go to encrypt the script,
you click on edit, encrypt script, enter a decrypt password.
—I don't use run passwords—
and then click yes here.
Here's my encrypted script,
and then I'm going to save it as my_dbc .jsl.
To use it, I include that script,
that encrypted script in my JSL code, and that defines that function for me.
Then I call my _dbc to get a database connection,
SELECT star from this table
execute SQL, close database connection,
so here's my table.
And if you look at DBC in the log,
all it shows is database and then your Oracle client driver.
Many times when you run SQL commands in JMP,
you run into errors, and it's not very easy to debug this,
so I wrote a function called log_execute_ sql,
which executes SQL commands and traps any ODBC errors found in the log.
If it finds errors, it displays a warning message to the user
along with the SQL, and along with the error.
If you set a global variable to one, it displays a SQL before executing it.
This has become very handy for developing and debugging SQL.
The function uses log capture to inspect the log for errors.
This is the syntax for log capture, string equals log capture expression,
and this is whatever commands you want executed and captured,
and anything that normally go to log will go into string,
and then you can inspect the string.
Log _execute _sql takes five arguments: the name of the calling program,
a database connection, a SQL statement,
an invisible flag, and a table name to return,
and here are two examples.
One works and one doesn't work.
This has SELECT SYSDATE FROM DUAL,
which is a standard Oracle command to get the current system date,
and this has an intentional error in it, dual X,
which I know doesn't exist.
When we run the first SQL statement, we get the system date.
Very good.
When we run the second statement, we get an error message.
Calling program is listed here, the error message is here.
This is very important,
along with this code, ORA-00942,
and then here's your SQL statement,
and this whole message is inside of a text edit box,
so you can copy and paste it.
Here's an example for debug output.
I turn on my debug flag and when I run my statement,
I get an informational message
—It's not an error, it's just informational—
showing the calling program, database connection,
whether it's invisible or not, the table name, and the SQL state,
and then I can click this checkbox
if I want to turn off subsequent debug output.
So here's log _execute _sql, there's a description,
a description of the arguments,
a couple of example calls, and then here's the function itself.
Here's my arguments, I check the database connection,
I check the SQL statement, I check the debug flag.
If the debug flag is on, I make a little window
and I display the current SQL in a text edit box,
and then I give the user the option to turn off subsequent debug output.
If they click that, I reset the flag to zero.
Then here's the meat of this function
I force all errors to go to the log with batch interactive one,
then I call log capture
with either an invisible flag on or non- invisible flag on
for execute SQL,
and then I turn batch interactive, set it back to zero,
then I check the log window for ODBC errors,
I look for Oracle ODBC or the word error and I set a flag,
and then I use words and the [inaudible 00:25:02]
to remind me to parse the output of the log into separate lines.
Then the error message is always on line one of this message.
If we found an error, then it displays the error.
I have an example here yet.
So here's an example where it says FROM keyword not found where expected.
If we look at the SQL,
I happen to know there's a comma missing here.
Let's talk about building IN lists.
The Oracle IN operator determines whether a value matches any values in a list.
Here's an example.
SELECT star, FROM EBA _SALES_ SALES REPS,
where the last name is in one of these values,
and it's similar to the JSL contains function
where here I'm saying does this list contain the word Raj?
In this case it was down at position two.
There's a caveat with the IN operator.
There's a limit of a thousand values.
Of course, I wrote my own function, get_sql _in _list,
which gets around that limitation.
And what it does,
it builds an inlist from the list provided.
If there are more than 1000 items on the list,
it separates them into 1,000 element chunks
connected via union to avoid the limit of a thousand items.
And if the elements are of type string,
any single quotes inside the strings will be replaced with two single quotes,
and single quotes will be put around each item.
So there's two arguments.
First one is item list.
It's the list of items to create an endless FROM,
and then a preamble,
which is a SQL string to preface the IN list with,
so we'll see what these mean in this example.
Here's an example where I have a numeric list,
and my preamble is select this ID from schema info,
where the ID's in open parentheses.
So here's my call to get _sql_in _list, my id_list, preamble
and the output looks like this;
SELECT star from my table m.
The ID is in here.
Select ID from schema info where ID is in one of these numbers.
If you look at a string example, here's four elements in this list.
The first one and the third one has a single quote inside them,
and here's my preamble.
When I call get _sql_in_list and combine it in my SQL statement,
this is my result.
SELECT star from my table m,
where the product name is IN and here it's my preamble,
where alert name is IN, A, B, C, or D.
And you'll notice for A and C
it replaced the single quotes with two quotes,
and it also converted these double quotes to single quotes here.
Here's an extract of a long example where I had a thousand of these ID values,
and so here's the first thousand, and then a union statement,
and then the next thousand and so on and so forth.
All right, here's tip number three.
That is to use an integrated development environment
for developing your SQL statements.
These have a GUI front end
and they let you develop and debug SQL and PL/SQL.
These are some popular tools that I'm aware of.
I use PL/SQL Developer.
It's from All round Automations.
There's a tool called SQL Developer from Oracle that's a freebie,
and TOAD comes from Quest.
Many people are familiar with TOAD.
Let's have a look at PL/SQL Developer.
We saw it earlier when I selected from this table.
I can do things like highlight these,
copy with header, back to JMP,
create an empty data set and click on Edit,
paste with column names.
Boom, there's my data.
Okay.
And I can browse functions, procedures, packages, tables, et cetera, et cetera.
Here's a little more
on that debugging example that we saw earlier,
and again, a comma missing here.
It's just another more explicit showing of that error message.
So you take this, copy it, look at it, and rework it.
Another tip, this is a soft tip,
and that is to avoid inline comments using dash- dash
as it can confuse the parser.
These are comments where the dash -dash says
everything after this on this line is a comment.
Sometimes, some situations,
the parser gets confused
and doesn't treat these properties as comments,
so it's better to use slash-star-star-slash.
I've just seen a couple of times where it didn't work
and I traced it down to these comments.
One more tip, and that is to use
this Oracle SYS _CONTEXT function to get useful information.
There's a namespace called U SERENV in Oracle
and you can get the IP address, the client computer,
the program making the ODBC call,
the operating system identifier for the client,
the current session,
operating system username, and the database name.
There's many more.
If you Google it, there 's many more parameters,
but these are the ones that I use.
Here I'm saying select IP address, module, terminal,
operating system user and service name.
Here's my call to SYS_CONTEXT,
and I'm just selecting it from DUAL, unioning these together,
and the results look like this down here.
Here's my IP address.
I'm calling from JMP .exe, no surprise there.
My username, my database, and then my computer name.
That's all I have today.
The conclusions I'll draw,
or if you configure the Oracle client properly,
get the ODBC connection string
and use Execute SQL JMP in Oracle can do great things.
So once again, JMP and Oracle equals a happy marriage.
Thank you, are there any questions?