Purpose
This is my first application built with App Builder, using the connection between JMP and Python to drive a complex machine learning algorithm.
The app launches a graphical user interface written in JSL and connects that interface to the "XGBoost" machine learning API.
How to Use
The add-in supports three validation schemes: holdout, cross validation, and grid search. Both regression and classification are available, with a selection of objectives and classical metrics.
At the end of training, a report is produced with metric values on both the train and validation sets, or with their mean and standard deviation for cross validation. For grid search, the report includes the best hyperparameter results, and early stopping can be enabled. The trained Python model can also be exported to predict new data without retraining it (although this must be done in a Python environment).
You should therefore check that JMP can communicate with Python before using the app.
For the New 2024 release:
Many updates were needed to keep up with the upgrades of the Python libraries, so I took the opportunity to add features I was missing in practical use and to fix quite a few bugs:
- Column names are now forced to a safe pattern to avoid any conflict between Python and JMP (all special characters, punctuation, and spaces are stripped), and I take advantage of the option introduced in JMP 16 to force a dot as the decimal separator, which avoids some crashes I had previously.
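The sanitisation rule can be sketched with a small stdlib helper. `sanitize` is a hypothetical name, and the exact pattern shown (keep letters, digits, underscores; turn spaces into underscores) is my assumption about what "clearing special characters, punctuation, spaces" amounts to:

```python
import re

def sanitize(name: str) -> str:
    """Strip characters that break the JMP <-> pandas/XGBoost round trip.
    Spaces become underscores; everything outside [0-9A-Za-z_] is dropped.
    (Hypothetical helper mirroring the pattern the add-in enforces.)"""
    return re.sub(r"[^0-9A-Za-z_]", "", name.replace(" ", "_"))

print(sanitize("Temp (°C) / max"))  # -> "Temp_C__max"
```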
- For classification problems, the responses are now checked for the correct format (they must be numeric and start at 0 since changes made to xgboost, presumably linked to numpy updates), and I have fixed the class problem on binary responses that could occur in some cases with the first version of the add-in.
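Mapping arbitrary class labels to 0-based integers can be done with a few lines of plain Python. This is a sketch, not the add-in's code, and `encode_labels` is a hypothetical name:

```python
def encode_labels(values):
    """Map arbitrary class labels to consecutive 0-based integers,
    as recent XGBoost versions require for classification targets.
    Returns the encoded list and the label -> integer mapping."""
    classes = sorted(set(values))
    mapping = {c: i for i, c in enumerate(classes)}
    return [mapping[v] for v in values], mapping

y, mapping = encode_labels(["yes", "no", "yes", "maybe"])
print(y, mapping)  # -> [2, 1, 2, 0] {'maybe': 0, 'no': 1, 'yes': 2}
```

The mapping is kept so predictions can be translated back to the original labels afterwards.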
- Added a checkbox to keep the generated train and test datasets, so that you can export them as CSVs and work on the same basis when mixing this add-in with plain Python (and therefore other ML algorithms, for even more fun ^^).
- Added the 'gamma' hyperparameter (a.k.a. 'min_split_loss'), which controls the degree of tree regularisation by specifying the minimum reduction in the loss function required to justify splitting a node further.
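As a small illustration, gamma sits alongside the other tree hyperparameters; the values below are arbitrary, not defaults or recommendations:

```python
# Illustrative XGBoost parameter set; gamma is the alias of min_split_loss.
# A candidate split is kept only if it reduces the loss by at least gamma;
# gamma=0 disables the check (more splits, less regularisation).
params = {
    "max_depth": 4,
    "learning_rate": 0.1,
    "gamma": 1.0,
}
print(params)
```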
- The model's hyperparameters are now systematically displayed after fitting, in the same order as in the add-in, to make it easier to associate them with the model's performance and to copy them back into the add-in (when moving from grid search to cross validation, for example).
- The list of features used by the trained model is now retrieved, normalised by F-score, and the features whose normalised value is greater than 0.20 are displayed in the log, sorted by decreasing F-score, in a format that can be copied and pasted directly into JMP (and therefore easily added to another analysis).
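The filtering described above can be sketched as follows. The input dict mimics what `booster.get_score(importance_type="weight")` returns in XGBoost; the normalisation by the maximum F-score and the 0.20 cut-off follow the description, while `top_features` itself and the example numbers are hypothetical:

```python
def top_features(fscores, threshold=0.20):
    """Normalise raw F-scores by the maximum and return the feature
    names whose normalised score exceeds `threshold`, sorted by
    decreasing score (sketch of the add-in's reported behaviour)."""
    top = max(fscores.values())
    kept = {name: v / top for name, v in fscores.items() if v / top > threshold}
    return sorted(kept, key=kept.get, reverse=True)

# Hypothetical raw F-scores, shaped like get_score(importance_type="weight")
print(top_features({"age": 120, "fare": 80, "deck": 10}))  # -> ['age', 'fare']
```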
- Added a matplotlib output so that the model's feature importance can now be displayed directly, without having to export and then re-import into a Python IDE (this was planned in my initial release, but JMP 15 crashed systematically as soon as matplotlib was imported; this seems to be resolved in JMP 17, though I have not tested it in JMP 16). The graph adapts its font size and margins to the number of features displayed.
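An importance plot of this kind can be built with plain matplotlib. This is only a sketch of the idea (XGBoost's own `xgboost.plot_importance` does something similar): the importance values are hypothetical, and the figure height is scaled with the number of features, as the add-in's graph adapts to it:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical normalised importances, as the add-in would display them
imp = {"fare": 1.0, "age": 0.65, "sex": 0.40}

# Height grows with the feature count so labels never overlap
fig, ax = plt.subplots(figsize=(6, 0.5 * len(imp) + 1))
ax.barh(list(imp)[::-1], list(imp.values())[::-1])  # best feature on top
ax.set_xlabel("normalised F-score")
fig.tight_layout()
```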
Example on a classic dataset ("Boston house prices": https://www.kaggle.com/competitions/home-data-for-ml-course/data)
System Configuration
Add-in developed and tested using the following system configuration:
JMP 17
Python 3.11 or 3.12 with numpy, pandas, pathlib, multiprocess, scikit-learn, xgboost
Tested on classical datasets (Titanic, Boston houses, ...)