JMP 19 will expand on the Data Connector feature introduced in JMP 18 to allow uniform access to more data sources. Besides adding more built-in connection types, JMP 19 will also allow users to define their own connection types by means of an exposed Python API. Like the built-in types, these types will share in the benefits of the Data Connector feature, including its configuration management and Query Builder integration. And because they're implemented in Python, they can make use of existing Python libraries. In this presentation, we describe the API, discuss the concepts involved in it, and show it in action.


Hello. I'm here today to talk about JMP's Data Connector framework, specifically how, in JMP 19, it can be extended with Python. In JMP 18, we introduced the Data Connector framework and used it to provide revamped connectivity with ODBC. In JMP 19, we're expanding that.

You might hear elsewhere about how we're using it to reconnect to SAS and how we're providing native connectivity to Snowflake. We're also using this very framework that I'm going to talk about, this Python API, to implement connectivity to Amazon S3 and Microsoft SharePoint; those connectors won't be bundled with JMP but will be distributed via the JMP Marketplace.

There are a few things I want to cover in this talk. First of all, I want to briefly make the case for why you might, if you have your own data source, integrate it with JMP. If you do want to integrate it, then you'll want to know where it fits in, what JMP needs to do in order to talk to your data source, and what you need to provide to let that happen; in particular, what code you need to write.

Finally, we'll talk about how you can share that. The code you've written works for you, but others in your company might want to use it, or you might want to share it with other JMP users worldwide. In order to make this easy, we've chosen Python, in part because of the libraries that are already available for it and the code that you might already have written. This is how we've done the connections for Amazon S3 and Microsoft SharePoint: we're able to wrap existing Python libraries, and we hope that as a result, it'll also be easier for you.

Another reason you might want to integrate, besides that we've tried to make it simple, is the ability to manage your configurations. A user can create and edit the configurations that describe connections using the Data Connector Editor. Query Builder is also accessible this way, so you can graphically create queries, filter your data, select columns, all sorts of things. Finally, you can collaborate: the connections and configurations that you create can be exported and shared via add-ins.

In case you haven't seen this functionality before, this is something that you'll get to see more of shortly when I show all this in action. But before I do that, let's talk about where everything fits in, how it all fits together. In the beginning, there's your data source, and you want to make it talk to JMP.

But when we talk about data in JMP, we're talking about data tables. Maybe those tables are very straightforward to provide from your data source; maybe they're something you have to do some work to make happen. For example, maybe you have some web endpoint somewhere that returns hierarchically structured JSON data. In that case, there is some processing that will need to happen. But as long as it can be exposed as a data table for JMP, it can potentially be used.
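To make that concrete, here is a minimal sketch of that kind of flattening step. It isn't from the talk: the endpoint's response shape is invented, and pandas (with its json_normalize function) is just one convenient way to do it.

```python
import pandas as pd

# Hypothetical response from the kind of web endpoint described above.
response_json = {
    "results": [
        {"id": 1, "measurements": {"height": 51, "weight": 4.2}},
        {"id": 2, "measurements": {"height": 60, "weight": 5.8}},
    ]
}

# json_normalize flattens the nested "measurements" objects into plain
# columns (id, measurements.height, measurements.weight), giving the kind
# of rectangular table that JMP can work with.
flat = pd.json_normalize(response_json["results"])
print(flat)
```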

Query Builder, when it accesses that data, is going to look at it as though it were a traditional database, as if it were something like MySQL or SQL Server. It expects that the data that you have is organized into named tables, and that those tables are potentially organized further into named schemas.

But you probably have more than one data source. Or if you don't, there are at least multiple ways of accessing it: people might have different credentials, or you might just not want to hard-code the address of where your data is. One way or another, you're probably going to have options. By providing specific values for those options, you get a configuration, and with that configuration, you can create a data source. Those configurations are things that you can create with the Data Connector Editor.

That's the picture in brief. But how does it turn into code? Let's take a look. I'm over here in JMP. We'll get that window out of the way, and let's start fresh. We have code here on the left that implements a couple of Data Connector types. A type is something like ODBC, which is built into JMP. It could be one of the SAS types: local, remote, or Viya connectivity. It could be Snowflake. But regardless, Python is the option for any type that you implement yourself.

At the top here, we have some imports; in particular, we have an import of the built-in jmp module. We also have this utility function here that we can use to see when a function is getting called. That's useful for this demo because JMP is what calls the code; we're not going to call it ourselves so much. To understand what's going on, it's important that we can keep track of that.

The code below this is what actually implements the type. Before I go into this more, let's check out this reference that I have. The various things in this list correspond, in many ways, to the diagram that I just showed you. We need to have options. We'll do that by subclassing the class jmp.DataConnectorType and defining a fields attribute that defines those options. We have a configuration that we need to turn into our data source. We'll subclass jmp.DataConnector, and we'll define a function, _do_as_data_source, that performs that conversion.

Finally, we have the data source itself. That's another subclass. We'll define functions that will let us know what the schemas are, if we have them, what our tables are, and what the data is for a particular table. Though we're not going to see it in our demo here, something that's useful to keep in mind is that if you have resources that won't automatically be cleaned up, for example, a connection that doesn't already have this behavior, then you will want to define Python's special double-underscore method, __del__, that cleans things up when they're no longer referenced, because leaving it up to Python's cleanup is how JMP handles closing the data source.
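As a sketch of what that cleanup looks like (the class here is invented for illustration, and a sqlite3 connection stands in for whatever resource your data source holds):

```python
import sqlite3

class MyDataSource:
    """Illustration: a data source holding a resource that needs cleanup."""

    def __init__(self, path):
        # Any handle that won't be released automatically belongs here...
        self._conn = sqlite3.connect(path)

    def __del__(self):
        # ...and gets released here. JMP closes the data source by dropping
        # its last reference, which is what triggers __del__.
        self._conn.close()

ds = MyDataSource(":memory:")
del ds  # the connection is closed once the object is no longer referenced
```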

With this in mind, let's go and look at the code again. We see SimpleFolderConnectorType, which subclasses jmp.DataConnectorType. It has the fields attribute, as I said, that holds the options. The first and only option here is "Folder".

In this case, what we're going to have Folder mean, indeed what our SimpleFolder connection is going to look like, is this: our data source corresponds to a single folder on my local file system here, and that folder exposes its data in the form of the files that are in it.

To get us started, we have "Folder", and its value here in the dictionary that describes our options is the type str | None. This is saying that we're going to have an optional string value. Then we have our configuration class, SimpleFolderConnector, that subclasses jmp.DataConnector. It defines this _do_as_data_source function that converts, or rather connects, from our configuration.

We can access our configuration value, the "Folder" one, just by indexing into self, just like you would with a built-in dictionary. We test whether it's None, meaning the user left it at the default value, and let them know it's required by throwing an error.

Finally, we'll convert it to the data source. The data source class is the last one on our list here. Of course, we have to be able to create it, so we define an __init__ function. We check, in this case, whether it's actually a directory, because what we want to do is let the user, yourself, or whoever's using the code, know if there's an error. Creating a data source is when you connect. In this case, there's not really any connecting to do, but we can at least check that the folder is there.

The next thing on the list was get_schemas. In this case, the way we've described the data source, there are no schemas; we just have direct access to the tables. We don't have to define get_schemas, and JMP will know that our data source does not use schemas.

But we do have to define what the tables are. In this case, while we have to take a schema parameter, we'll ignore it, because we're not using schemas. Then, since we're looking at files, we need some way to name those files; in this case, we just use the file name. We'll iterate over the folder, and for every file in it, we'll return its name.

Finally, we have to get the data tables. That's what this last open_table function is for. First, we have to reconstitute the path of the file that holds the data: we take the table name, which is just the file name, append it to the folder, and then return the path as a string.
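Putting those pieces together, here's a hedged reconstruction of the kind of code being shown. The names jmp.DataConnectorType, jmp.DataConnector, fields, _do_as_data_source, and the tie call are as described in the talk; the exact spellings and signatures of get_tables and open_table, and the data source class needing no particular base class, are my assumptions. The jmp module itself is only importable inside JMP's embedded Python.

```python
import pathlib

import jmp  # JMP's built-in module; only importable inside JMP

class SimpleFolderConnectorType(jmp.DataConnectorType):
    # One option: an optional string naming the folder to expose.
    fields = {"Folder": str | None}

class SimpleFolderDataSource:
    def __init__(self, folder):
        self._folder = pathlib.Path(folder)
        # Creating the data source is the "connect" step, so check now
        # that the folder actually exists.
        if not self._folder.is_dir():
            raise ValueError(f"{folder} is not a directory")

    def get_tables(self, schema):
        # No schemas in this simple model, so the parameter is ignored.
        return [p.name for p in self._folder.iterdir() if p.is_file()]

    def open_table(self, schema, table):
        # Reconstitute the file path; returning it as a string lets JMP
        # open the file itself.
        return str(self._folder / table)

class SimpleFolderConnector(jmp.DataConnector):
    def _do_as_data_source(self):
        folder = self["Folder"]  # index into self like a dictionary
        if folder is None:
            raise ValueError("Folder is required")
        return SimpleFolderDataSource(folder)

# Tie the type class and the connector class together so that JMP can get
# from one to the other.
jmp.DataConnector.tie(SimpleFolderConnectorType, SimpleFolderConnector)

# Resulting type string: python:__main__.SimpleFolderConnectorType
```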

The model I was just talking about calls for returning a data table, and you can do that. If you have, like the example I gave earlier, a web endpoint that returns hierarchical JSON or whatever, that's something you probably want to turn into a data table yourself and return directly.

But in this case, since this is coming straight from the file system, which is a case we saw in the creation of the S3 and SharePoint add-ins, and one where I think we've heard interest in accessing other file-like data stores, it is possible not to open the file directly but just to pass back this string. Then JMP will open the file for us.

It'll take care of things like making sure that the file is opened privately. The data table returned here is an implementation detail. It's something that shouldn't directly be exposed to the user. When you use Query Builder, you don't see intermediate tables popping up, just the final result. In this case, JMP will take care of that for us. Doing it this way also lets JMP preserve any settings that might be involved in opening the table. I'll talk about that more later.

Finally, we have this invocation of the function jmp.DataConnector.tie. What this is doing is taking its two arguments, the type class and the connector class, and making sure that they know about each other. Before this call, neither references the other, but the DataConnectorType class here at the top is what JMP uses as the access point. The type is this all-encompassing thing that captures all the code, all the definitions, that make up the connectivity. So they need to know about each other.

Finally, I have a comment here that just indicates what the final type string is. For ODBC, it would just be ODBC; it might be SAS Viya or SAS Remote. In this case, for Python, the string starts with "python:", indicating it's a Python type. Then we have __main__, which is the name of the module in this case, because we've run the code directly from JMP. Then finally, we have SimpleFolderConnectorType, the name of this class.

Let's see this in action. If I go to the Data Connector dialog, and I create a new connection, I'll open up the Data Connector Editor for that. If we go and change the type to Python, then we have a way to type the rest of the string here. We fill that in.

Once I do that, we can see our "Folder" option. You can see it has no value, so that will correspond to None. So if I try to test it, which will try to connect by calling this function and this function, we'll see our error message here: Folder is required.

Now, in order to actually have some data access, I'm going to use the sample import data that comes with JMP. That's this folder here. I'm going to copy it; let me show it again. We'll copy the path, go over to JMP, and put that in here. Now when we test, all is well, so we're going to go ahead and connect.

Now that we've connected, you can see a list of tables here. In fact, these tables correspond to the files that we have here: we see Animals_L.txt here, and we see it again here. If we look at the log, we can see where our use of this log entry function has come in handy. We can see that we called _do_as_data_source multiple times, and that we called the __init__ of the data source multiple times as we were testing it. We also see that get_tables was called, which provides us again with this list.

If I go and call Query Builder here, then we can see that we have access to what the columns are. We see that if we go ahead and build the query, we can add them. We can do some sort of filtering here; for example, we could select just two of these seasons. If we go look at the SQL, you see that it captures all of that. Run the query, and we have our results.

You might wonder how this all works. We created a SQL query; we ran SQL code. Yet when we were defining things earlier, there was scarcely a notion of SQL. The reason it can work is that we've provided this open_table call, this way to go and get the data. Then JMP does the rest for us. It uses SQLite, a sort of embedded database, under the hood in order to expose the data that we have and be able to run SQL code, for example, that WHERE clause that let us see only two of the seasons.
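To illustrate the mechanism with plain Python (this mirrors the idea, not JMP's internal code), here's data loaded into an embedded SQLite database and filtered with an ordinary WHERE clause:

```python
import sqlite3

# An in-memory SQLite database stands in for JMP's embedded one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Seasons (season TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO Seasons VALUES (?, ?)",
    [("Spring", 10.5), ("Summer", 20.0), ("Fall", 15.2), ("Winter", 8.9)],
)

# Query Builder's filter becomes a WHERE clause along these lines:
rows = conn.execute(
    "SELECT season, sales FROM Seasons WHERE season IN ('Spring', 'Fall')"
).fetchall()
print(rows)  # [('Spring', 10.5), ('Fall', 15.2)]
```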

Let's look at a more complicated file. Let's go back to Big Class, our favorite, which is an Excel file here. Go to Query Builder again. We see that before we actually do anything, this dialog popped up here. We decided that if you have files, [inaudible 00:17:43] times, you have to get prompted in order to actually specify what's there. When JMP tries to open this file, it prompts you to fill in the information it needs, in this case with the Import Wizard.

This all looks good, so I click Import, and now, as before, I can see the column names and the snapshot. I'm just going to go ahead and click Import now, and we see we have our data. Now, you might be wondering: do I have to go through that every time? The answer is no.

Let's look at this source script. At the top, we can see what we expect. We have New SQL Query. We have our connection information, which in this case is provided by the Data Connector. You can see it's referencing our type, and it has our folder that we've specified as well.

But when we get down to the table, we see that there's this new option here, Open Settings, and the Open Settings capture what was specified in that Import Wizard. In particular, it takes the additional arguments that were passed to Open in the source script of the underlying Big Class table and provides those here.

This way, if I go ahead and run this again, you see everything opens up and looks good. If we had done something nonstandard with the settings, maybe we had to start at a later column, then that would still be reflected here. We can see from the log that it started all over again: we called _do_as_data_source again, then we got the list of tables, and then we actually opened this table again.

That's the basics. This next one looks a little more complicated. But it's not only more complicated; it's a little more full-featured as well. What we just saw was, again, sort of the minimum, the smallest thing you can do and still have a connection.

It looks a little fancier now. Instead of a simple Folder Connector, we'll have a full-fledged Folder Connector. You can see the fields definition is a lot more complicated, but it's built out of similar building blocks as before.

What is added in this case, first of all, is this use of jmp.DataConnectorGroupedFields. What that does for us is let us specify that the fields are in different groups, which affects how they're shown in the Data Connector Editor. For ODBC, the options are organized into groups, and we can use the same functionality here with Python.

Each group is specified as a pair in this list that's passed to jmp.DataConnectorGroupedFields. We have the name, and then we have a dictionary like the one from before, with the option name and then the field definition. In the field definition, you can still see that type in there, that str | None. But we've also wrapped it in this use of jmp.DataConnectorField, which I've aliased because we're using it a few times and it's a little long, in order to specify additional information, such as a tooltip that will show up in the Data Connector Editor if you hover over the name of an option.

You can specify a default value. You can also specify a name that will show up in the Data Connector Editor that is different from the name used in configuration files and in JSL. We use this functionality in JMP, for example with ODBC, in order to localize the names of the options or to provide slightly better strings than we would otherwise, because by default, these names are what's used in JSL, and there we want a JSL-style name.

"Ordner" here is, I believe, the German word for folder. We'll see that there. As for what's changed with how we're viewing our data source, it's still based on the folder, but we're also going to optionally support schemas. In this case, we'll have an option that instead of looking at the data as we did before, let's look at it in a new way where we consider folders that are children of the folder provided here to be schemas, and then their files are considered the data tables.

We'll have another option that changes how we map table names to the underlying files, by limiting to a certain extension and also clipping it off of the table names.
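Here's a sketch of what such a fields definition might look like. jmp.DataConnectorGroupedFields, jmp.DataConnectorField, the alias, and the "Ordner" display name are as described in the talk, but the keyword argument names (tooltip, default, display_name) and the option names in the second group are my guesses for illustration, not the documented API:

```python
import jmp  # only importable inside JMP

DCField = jmp.DataConnectorField  # aliased because it's used several times

class FolderConnectorType(jmp.DataConnectorType):
    # Each group is a (name, options-dictionary) pair in this list.
    fields = jmp.DataConnectorGroupedFields([
        ("Location", {
            "Folder": DCField(
                str | None,
                tooltip="Folder whose files are exposed as tables",
                display_name="Ordner",  # e.g. a localized display name
            ),
        }),
        ("Layout", {
            "Use Schemas": DCField(
                bool,
                default=False,
                tooltip="Treat child folders as schemas",
            ),
            "Extension": DCField(
                str | None,
                tooltip="Only list files with this extension",
            ),
        }),
    ])
```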

Our conversion to a data source when we connect is similar but slightly more involved, because we have more options. We're still checking "Folder", but we're also going to check this extension option, raising an error if needed, and then we'll pass everything to our new data source.

Our new data source is going to check the folder as before and also set the other options. It will provide schemas: if we are using schemas, then we have this similar-looking code, where we look through the entries in the folder, and any child that's a directory has its name used as the name of a schema.

If we're not using schemas, though, if we didn't set our new fancy option, then we'll just return None. Before, we simply weren't defining get_schemas, but we can't conditionally define a function here, so instead you're allowed to return None to mean "I don't have any; we're not using schemas."

get_tables is a similar idea to before. Ultimately, we're getting the names of a bunch of files, but because of our options, we do it in potentially a few different ways. We get the directory, which is going to depend on what the schema is; indeed, it's going to depend on whether we're using schemas at all. Then we filter down to the files before getting their names, or do something fancier if we're processing the extensions.

Finally, in order to get the data table, we reverse the process of get_schemas and get_tables. We have to reconstruct the path: we might add the extension back on, then we get the directory that's involved and put the file name back on the end, before returning the path, as a string, again. We tie it together, and then we have our type name here as before.
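Here's a sketch of that fuller data source. The logic follows what's just been described; as before, the method names and signatures are assumptions carried over from the simple example, and the constructor arguments are invented:

```python
import pathlib

class FolderDataSource:
    def __init__(self, folder, use_schemas, extension):
        self._folder = pathlib.Path(folder)
        if not self._folder.is_dir():
            raise ValueError(f"{folder} is not a directory")
        self._use_schemas = use_schemas
        self._extension = extension  # e.g. ".txt", or None for all files

    def get_schemas(self):
        if not self._use_schemas:
            # Returning None means "this data source doesn't use schemas."
            return None
        # Child directories of the root folder act as the schemas.
        return [p.name for p in self._folder.iterdir() if p.is_dir()]

    def _directory(self, schema):
        # With schemas, a schema names a child folder; without, the root.
        return self._folder / schema if schema is not None else self._folder

    def get_tables(self, schema):
        files = [p for p in self._directory(schema).iterdir() if p.is_file()]
        if self._extension is None:
            return [p.name for p in files]
        # Limit to the extension and clip it off of the table name.
        return [p.stem for p in files if p.suffix == self._extension]

    def open_table(self, schema, table):
        # Reverse get_schemas/get_tables: put the extension and the
        # directory back before returning the path as a string.
        name = table + (self._extension or "")
        return str(self._directory(schema) / name)
```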

Let's take a look at that. Let me scroll back here so we can reference it. As before, let's set the type here. We can see that we have our customized name here, along with the tooltip. We'll use the same folder as before; let's get the path to that. In fact, let's go one level up, so we can use our schema functionality. We'll copy that.

Then we can say we do want to use child folders as schemas. Since we threw in this fancy option, we'll use it. Make sure it all looks good. Yep. We can connect. Let's make sure all the windows are front and center here.

Now we see on the left the schemas, which are in fact just the folder names. Here's our import data, and we see we've limited it to the .txt files. Take our Animals example as before and open it up. We have our data; it all works. If we look at the log, we have the calls as before: through our utility function, we see calls to get_schemas, get_tables, and open_table.

You've done all this work to create these types. Now you want to share it with others. That's the last thing we want to talk about, how you do that. I have another checklist.

You have your Python code that you've written, but depending on how you've written it, you might want to structure it a bit into a package. The goal of structuring it into a package is to make sure that it's all together, that it can be installed with a tool like pip, or in this case jpip, JMP's wrapper around pip, and that you can specify dependencies when you're doing that. Again, for example, with S3, we depend on the Boto3 package, which is on PyPI, and using a package provides a very natural way to specify that dependency.

You'll want to create base configurations. Remember earlier, I was going and filling in the type name directly in the Data Connector Editor. That's all well and good, but it would be easier if I could just find something in the list of configurations. I could just create something new based on that instead of having to remember the full name.

You'll probably want to write an install script. We have this wonderful Python package, but we need to put it in a place where it can actually be used. If we can do it automatically, then that would be pretty neat.

Finally, we want to put it all into an add-in. With an add-in, we have something that's easy to install. We have something that can be distributed via the Marketplace. At any rate, it's all together in a single package file.

Let's see what that looks like. I have here an add-in that contains a further developed version of the code that we were looking at earlier today. I've unzipped it here so that we can see what all is in it. The first thing on that list, I believe, was the Python package, and I've created that here. We have our Python code, but now it's structured into a Python project… sorry, into a Python package. We have our pyproject.toml file. I don't really have time today to go over exactly how you construct one of these, but there are online resources available for that.
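For orientation, a minimal pyproject.toml looks something like the sketch below. The project name and version are made up, and the dependencies list is where a connector such as the S3 one would declare Boto3:

```toml
[project]
name = "folder-connector"   # made-up name for this sketch
version = "0.1.0"
dependencies = [
    # the S3 connector, for example, would declare "boto3" here
]

[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"
```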

We have a Data Connectors folder. This is where the configurations that go with the add-in are provided, in this case, the base configurations. If I look at one of these, for example, we'll see it just has this one line where it specifies the type. It's not specifying additional configuration, but it provides something that the user can point at. Then we have, in this case, addinLoad, which is going to have the install script.

Now, of course, you can try to create all this directly, but you can also use JMP's add-in creator. If we go to File, New, Add-in, then I can provide a name, I can provide an ID, and I can specify the minimum JMP version, which in this case should be 19. The Start-up Script here is where we'd put the script that installs the package; we want to make sure that we only install it once.

We have the files that we want to add. In this case, they're already unzipped in here, but you might have these somewhere else if you're creating it yourself for the first time. We can add the package; see, we've got all the stuff there. We can add the Data Connectors. If we had the start-up script, then we'd be able to actually save and create our own add-in. Once you have everything there, the creation process is hopefully pretty straightforward.

That is all I wanted to show here, so let's sum up. First of all, we talked about why you'd want to integrate. I hope, as we saw when I started everything off, that there's a lot the Data Connector framework provides in terms of being able to edit configurations, share them, and use Query Builder to interactively construct your query, and that that provides a good reason to go and wrap your data source. Then, where it fits in: we talked about how, when it comes to JMP, everything's a data table, and we want to make it look like a database.

What code to write? We went over the various subclasses that you want to define, the various functions that you'll need to write, and how you specify the options. Finally, we talked about how you might create an add-in and share what you've written with others or on the JMP Marketplace.

This is, in some ways, a relatively quick overview of everything that's possible. Some of you might be taking very good notes and taking lots of screenshots or photos, but if you're like me, then probably, you want to have something else to reference. The good news is we have the information, and we are creating more.

The API that I've shown today is currently documented in the EA Notes, which you can find under the Data Access section. You'll want to make sure that you're getting the latest version, which might not be from the current EA if there haven't been changes. We are also adding documentation to the Scripting Index. It won't be able to provide the overview that's in the EA Notes, for example, but it will hopefully serve as a reference for the individual functions in the API.

Finally, that add-in that I was showing before: we're developing it a little bit more, and we're going to put it on the JMP Marketplace. We want it to serve as a resource, both showing off the API and showing how to create a Python package and then publish it in an add-in.

All that documentation will, again, cover everything you've seen today, plus a couple of other functions that I didn't have time to discuss that are a little more advanced but might still be useful. Be sure to check that out.

Finally, I want to say what we might be doing in the future. In JMP 20 and beyond, there are additional extensions to this API that we're considering. For example, you might have noticed that in order to process your data, JMP effectively brings it all in locally, your entire data table, because that's what you have to return from open_table. Then it's used locally, as I mentioned, via SQLite.

But if you can do some of that processing server side, if you don't have to download everything locally, then we want to support that. That's something that we are going to be looking into, how we can provide that sort of functionality as well.

With that, I think we're done. Thank you for listening. I hope that you can take what you've learned here today and go and create new data sources that are wrapped into this Data Connector framework, that you'll be able to take those and publish them in an add-in, and that overall you'll have a better, easier time working with your data and getting it into JMP. Thank you again. That's all.
