
Application case: Finding influencing factors in patients with Helicobacter pylori infection based on JMP data exploration

Helicobacter pylori (hereinafter Hp) infection has become an international problem. The Hp infection rate in China is about 50% to 90%, higher than the average for developed countries (50% to 70%), and Hp infection has become a public health crisis in China. At present, there is no consensus on Hp infection routes and risk behaviors in China, and how to identify the factors that influence the disease effectively remains a topic worth studying.

 

AI and big data analysis can play an important role in research and supporting analysis. In this case study, Yuan Yiming, pharmacist-in-charge at the Traditional Chinese Medicine Hospital of Zhongshan City, Guangdong Province, shares how he used data to explore the linear relationship between feature data and outcomes, making full use of the visualization and interpretability of machine learning. The aim is to understand Hp infection patterns and risk behaviors in China and to build an easy-to-use, economical infection prediction model that can serve as a reference for asymptomatic infections.

 

The research subjects were healthy people who underwent Helicobacter pylori testing at the endoscopy and physical examination center, by carbon-13 or carbon-14 breath test or by gastroscopic mucosal test.

 

Drawing on expert input and the literature, the researchers designed a non-scale electronic questionnaire made up of binary and multi-category items related to Helicobacter pylori infection. The questionnaire covers three dimensions and 63 variables: ① basic patient information, ② clinical manifestations, and ③ living and eating habits.

 

The researcher used JMP Pro to perform statistical analysis on the data. First, single-factor analysis was performed on the data, and then multiple logistic regression analysis was performed to establish a prediction model. Finally, a forest plot was drawn based on the results of the model effect terms.

 

Main analysis ideas and steps

Step 1: Clean the existing data

JMP can classify and manage variables in batches. Because all of the collected variables are qualitative, the Standardize Attributes operation in JMP Pro was applied to all columns to convert them uniformly to character, nominal variables.


Step 2: Generate validation columns

JMP Pro's validation column feature can split the data into a training set and a validation set. Here the rows are randomly assigned to the training and validation sets in a 7.5:2.5 ratio (75% training, 25% validation).
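As an illustration outside JMP, a random 75/25 assignment can be sketched in a few lines of Python. This is a minimal sketch, not the author's procedure: the function name `make_validation_column`, the label strings, and the seed are all assumptions made for the example.

```python
import random


def make_validation_column(n_rows, train_fraction=0.75, seed=2022):
    """Return one 'Training' or 'Validation' label per row.

    Hypothetical helper: it fixes the group sizes first and then
    shuffles, so the split is exactly train_fraction (up to rounding).
    """
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be between 0 and 1")
    rng = random.Random(seed)  # seeded so the split is reproducible
    n_train = round(n_rows * train_fraction)
    labels = ["Training"] * n_train + ["Validation"] * (n_rows - n_train)
    rng.shuffle(labels)
    return labels


labels = make_validation_column(1000)
print(labels.count("Training"), labels.count("Validation"))
```

Fixing the seed keeps the same rows in each group across runs, which matters when the model is refitted after variables are removed.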


 

Step 3: Single factor analysis

Contingency analysis and the Distribution platform show that, apart from floor of residence, stomach pain, acid reflux, bloating, loss of appetite, bad breath, bitter taste in the mouth, stomach rumbling, hunger, anemia, cooking lunch at home, smoking, use of public transportation when going out, whether serving chopsticks are used, the habit of using serving chopsticks, and whether family members and friends are infected, none of the other variables is statistically significant.
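For a single binary factor, the contingency analysis boils down to a chi-square test on a 2x2 table of factor level by infection status. The sketch below computes the Pearson chi-square statistic and its p-value with the standard library only. The counts are invented for illustration and are not the study's data, and the function name `chi_square_2x2` is an assumption.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table

        [[a, b],    e.g. factor present: infected, not infected
         [c, d]]         factor absent:  infected, not infected

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts, for illustration only.
stat, p = chi_square_2x2(60, 40, 40, 60)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```

A factor would be carried forward to the multi-factor step when its p-value falls below the chosen significance level.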

 


 

Step 4: Multi-factor analysis

In JMP, choose Analyze → Fit Model (Figure 2). In the dialog box, assign the Helicobacter pylori infection status variable to Y and add the independent variables to be analyzed to the Construct Model Effects box. Use the validation column generated in Step 2 for validation.

 


 

In the Effect Likelihood Ratio Tests report, the Prob > ChiSq values for breath, bad breath, whether serving chopsticks are used, cooking lunch at home, the habit of using serving chopsticks, anemia, and floor of residence indicate that these effects are statistically significant.
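The idea behind a likelihood ratio test for one effect can be sketched as follows: fit the model with and without the effect, take twice the difference in log-likelihood, and compare it with a chi-square distribution whose degrees of freedom equal the number of parameters removed. The log-likelihood values below are made up, the function name is hypothetical, and the sketch handles only 1 or 2 degrees of freedom, where the chi-square tail has a simple closed form.

```python
import math


def likelihood_ratio_test(loglik_full, loglik_reduced, df=1):
    """Return (chi-square statistic, p-value) for a nested-model comparison.

    loglik_full    : log-likelihood of the model that includes the effect
    loglik_reduced : log-likelihood of the model without the effect
    df             : number of parameters the effect contributes (1 or 2 here)
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    if stat < 0:
        raise ValueError("the full model cannot fit worse than the reduced model")
    if df == 1:
        p_value = math.erfc(math.sqrt(stat / 2))  # chi-square tail, 1 df
    elif df == 2:
        p_value = math.exp(-stat / 2)  # chi-square tail, 2 df
    else:
        raise NotImplementedError("this sketch only covers df = 1 or 2")
    return stat, p_value


# Hypothetical log-likelihoods, for illustration only.
stat, p = likelihood_ratio_test(-520.0, -523.0, df=1)
print(f"LR chi-square = {stat:.2f}, Prob > ChiSq = {p:.4f}")
```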

 


Effect summary: contribution of the independent variables

The Effect Summary report (picture) shows the contribution of each variable, expressed mainly through the LogWorth value, which equals -log10(p), where p is the effect's p-value.
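The LogWorth transformation is easy to reproduce by hand. A small sketch, with p-values chosen only for illustration:

```python
import math


def logworth(p_value):
    """LogWorth = -log10(p-value); larger values mean stronger evidence."""
    if not 0 < p_value <= 1:
        raise ValueError("p-value must be in (0, 1]")
    return -math.log10(p_value)


# Illustrative p-values, not taken from the study.
for p in (0.05, 0.01, 0.001):
    print(f"p = {p:<6} LogWorth = {logworth(p):.2f}")
```

On this scale a p-value of 0.01 corresponds to a LogWorth of 2, and each further factor of ten in the p-value adds one unit, which is what makes the ranking of effects easy to read.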

 

As the figure shows, among the 15 independent variables the largest contributors are breath, bad breath, and the habit of using serving chopsticks, while nausea and stomachache have a smaller impact. This result gives an intuitive view of each variable's contribution and can also guide further variable screening. In the whole model test, Prob > ChiSq is less than 0.001, which indicates that the model is statistically significant. Variables with low contribution values are then deleted.

 

[Figure: Effect Summary before deleting low-contribution variables]

[Figure: Effect Summary after deleting low-contribution variables]

 

Parameter estimates

The Parameter Estimates report (picture) gives the formula of the prediction model and shows the effect of each variable on the outcome.
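To show how parameter estimates turn into a prediction formula, here is a minimal sketch of a logistic model's probability calculation. The intercept, coefficient values, and variable names are invented for the example, and the 0/1 indicator coding is an assumption: JMP's own coding of nominal effects and its choice of target level may differ, so signs and magnitudes in a JMP report should not be read off this sketch.

```python
import math


def predict_probability(intercept, coefficients, features):
    """Predicted probability from a logistic model.

    logit(p) = intercept + sum(coefficient_i * x_i)
    coefficients and features are dicts keyed by variable name.
    """
    logit = intercept + sum(
        coefficients[name] * features[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical estimates with 0/1 indicators, for illustration only.
coefficients = {"bad_breath": 0.8, "uses_serving_chopsticks": -0.5}
person = {"bad_breath": 1, "uses_serving_chopsticks": 0}
print(f"{predict_probability(-1.0, coefficients, person):.3f}")
```

A positive coefficient raises the predicted probability when the indicator is 1, and a negative coefficient lowers it.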


 

Model validation

The areas under the ROC curve (AUC) for the training and validation groups were 0.7334 (95% CI 0.709 to 0.784) and 0.7153 (95% CI 0.6729 to 0.7577), respectively. On this basis, the model's discrimination is judged to be at a good level.
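The AUC can be read as the probability that a randomly chosen infected subject receives a higher predicted score than a randomly chosen uninfected one. The sketch below computes it directly from that definition by comparing every positive/negative pair. The labels and scores are toy values, and `roc_auc` is a hypothetical helper rather than anything JMP exposes.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels : 1 for infected, 0 for not infected
    scores : predicted probabilities (higher = more likely infected)
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data, for illustration only.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2]
print(f"AUC = {roc_auc(labels, scores):.4f}")
```

Computing the value separately on the training rows and on the validation rows gives the two figures that are compared in this step.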


 

Using the forest plot to identify protective factors and risk factors

Save the table of multiple logistic regression results and odds ratios (OR), then create a forest plot in Graph Builder from the OR values.
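A forest plot draws each odds ratio with its confidence interval against the reference line OR = 1. The sketch below shows how those quantities can be derived from a logistic coefficient and its standard error, and how the interval is read. The coefficient values and variable names are hypothetical, the Wald-type 95% interval is an assumption about how the interval is formed, and the direction of each effect depends on how the variable is coded.

```python
import math


def odds_ratio_with_ci(beta, std_error, z=1.96):
    """Return (OR, lower, upper) using a Wald interval on the log-odds scale."""
    return (
        math.exp(beta),
        math.exp(beta - z * std_error),
        math.exp(beta + z * std_error),
    )


def classify(lower, upper):
    """Read an OR interval the way a forest plot is read against OR = 1."""
    if lower > 1:
        return "risk factor"
    if upper < 1:
        return "protective factor"
    return "not significant"


# Hypothetical (coefficient, standard error) pairs, for illustration only.
estimates = {
    "bad_breath": (0.69, 0.20),
    "uses_serving_chopsticks": (-0.50, 0.20),
    "nausea": (0.10, 0.20),
}
for name, (beta, se) in estimates.items():
    odds_ratio, lower, upper = odds_ratio_with_ci(beta, se)
    print(f"{name}: OR {odds_ratio:.2f} ({lower:.2f} to {upper:.2f}) "
          f"{classify(lower, upper)}")
```

An interval that lies entirely above 1 marks a risk factor, one entirely below 1 marks a protective factor, and one that crosses 1 is inconclusive.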

 


 

Moving from single-factor to multi-factor regression analysis shows that bloating, bad breath, cooking lunch at home, the habit of using serving chopsticks at home and when eating out, whether family members are infected, and floor of residence are the main factors associated with Helicobacter pylori infection. The forest plot shows which of these are protective factors and which are risk factors.

 

In summary, by analyzing the infection characteristics and risk behavioral factors of Helicobacter pylori, we can provide directions for exploring the infection pathways of Helicobacter pylori, and formulate pre-treatment nursing strategies based on the constructed model.

 

Original article: Data mining to build prediction models and prevention and treatment strategies for patients with Helicobacter pylori infection [J]. Journal of Gastroenterology and Hepatology, 2022, 31(09): 992-998.

 

If you are interested in the ideas and methods of this case study or want to have further in-depth discussions with the author, you are welcome to sign up for the JMP online seminar on April 27, 14:00-15:30:

Click here to register

 

At that time, Dr. Yuan will join the JMP live broadcast room to share in detail the analysis ideas and experience behind this case study. Don't miss it! You can also try JMP 17 free for 30 days.

 


This post was originally written in Chinese (Traditional) and has been translated for your convenience. When you reply, your reply will also be translated back to Chinese (Traditional).