SUMMARY: I propose a visual Table Builder interface that would allow database selection, in-database operations such as joining and filtering, in-memory operations like "Table>Concatenate" and "Rows> Select Where...", and some JSL operations like Add Property to be chained in a visual flow. I believe this approach would provide a very intuitive way for people with little data science training to automate data preparation and cleanup to easily capture live data sources into JMP for insight generation and analysis.
DETAILS: I find myself using the JMP Query Builder quite a lot to create data queries from our internal Postgres database, and it works well as a basic database querying tool. That said, I still find myself returning to Python, KNIME, or raw SQL to address more complex workflows, such as generating a table by joining two subqueries, themselves the result of joins and transformations on base tables. Ultimately, I still have to make a simple JMP query to get the data in JMP from the database, though. Additionally, the current query builder is rather slow, and not very intuitive to people who do not have a strong foundation in database operations.
The approach taken by KNIME and Alteryx here feels really good to me: a visual flow that lends itself to a powerful mental model of database operations. Each step generates a new virtual table (really an abstracted SQL operation) that can then be pieced together into larger queries.
Configuring individual steps in KNIME is also much faster than in Query Builder, I assume because predicate steps remain in memory and only the modified operation needs to execute on the in-memory representation of the previous step. QB instead runs a lot of SQL queries (turning off automatic previews helps immensely but you lose the preview functionality). Basically every time you select a column or rename a column with automatic preview on is a database query and a disruption in flow.
I think improving Query Builder would go a long way toward democratizing data operations and database usage to scientists and engineers. Here is what I'm proposing:
Separate table selection from join operations and filtering/ordering
This can leverage the existing query builder, as if a single table was selected. The JMP QB way of selecting and renaming columns is pretty good!
Wherever possible, use an in-memory representation of a table sample (including all columns and ~100-500 rows) for preview operations to avoid SQL queries as often as possible, since IO is by far the slowest part of most operations.
I include column selection, renaming, group by, aggregation functions, etc in this.
A possible concern is absurdly wide tables (bioinformatics comes to mind), so there may need to be a column limit as well for in-memory mode, but I'm pretty confident it would cover the vast majority of uses.
Build a visual flow representation for joining, filtering, sorting/order by operations.
Each operation would use the in-memory representation for preview of the output if possible.
The in-memory representation of the source table samples would also remove the need to query column names for the join operation, a step which is shockingly slow in JMP.
Ideally, allow JMP table operations (split, stack, concat, maybe even addition of columns with formulas, row selection, row hide & exclude, etc) to be chained to the flow as visual steps after query from database.
I find a lot of my colleagues end up stringing together various data cleanup and prep steps as menu operations that I then need to help convert to a post-query script for them.
Attaching scripts to the final data table could also be a visual step, which would make for a convenient flow of having a series of N steps for N attached scripts.
From at least a UI standpoint, most or all of these pieces can leverage existing UIs in JMP: the configuration panel for a JMP Table Concatenate step can literally use the same format as the Tables>Concatenate option in JMP already.
My ideal state is that a colleague without significant SQL or database training could easily experiment with table selection, joining, filtering, and data preparation to end up with a fully-automated workflow that results in exactly the table they need to do their visualization and analysis. I think the visual flow approach is an excellent way to facilitate that! This would be a true Table Builder, combining the features of database querying, in-database operations, and in-memory table, row, column, and script operations to build a fully-automatic data preparation workflow.
The ultimate output would be kind of like query builder: either a .jmpflow file (or whatever) that will open to the Table Builder flow or a saved script that can be run from an add-in or the script window.
I've attached a badly-drawn picture of what I'm thinking. If this piques developer interest, please feel free to reach out to me via private message if you'd like to connect to talk about this!