<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to get correct and Predictive model in Discussions</title>
    <link>https://community.jmp.com/t5/Discussions/How-to-get-correct-and-Predictive-model/m-p/246835#M48467</link>
    <description>&lt;a href="https://community.jmp.com/t5/user/viewprofilepage/user-id/4358"&gt;@statman&lt;/a&gt;, &lt;a href="https://community.jmp.com/t5/user/viewprofilepage/user-id/12606"&gt;@dale&lt;/a&gt;&lt;BR /&gt;&lt;BR /&gt;Thank you for your feedback.&lt;BR /&gt;First of all, as you said, that's a time series.&lt;BR /&gt;As you mentioned, I checked the correlation and reduced the list of variables to 40, and I think I got some satisfactory results.&lt;BR /&gt;Since 'Y value' was not measured in real time, the amount of data is not as much as I thought, but I think I was able to determine the cause by applying Parallel Plot, etc.</description>
    <pubDate>Tue, 11 Feb 2020 00:27:08 GMT</pubDate>
    <dc:creator>Dongjin</dc:creator>
    <dc:date>2020-02-11T00:27:08Z</dc:date>
    <item>
      <title>How to get correct and Predictive model</title>
      <link>https://community.jmp.com/t5/Discussions/How-to-get-correct-and-Predictive-model/m-p/246456#M48398</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Hi, I need your help again.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I've been doing an analysis recently, and the setup is simple.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;The data has 376 rows, 223 columns, and a single Y value.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;To get a prediction model for Y, I used 'Fit Model' (Standard Least Squares / Effect Screening) and got an R-Square of 0.95 using rows 1 to 263.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I saved the prediction formula to a new column and compared the predictions for the remaining rows, 264 to 376, with the actual measurements.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;As you can see below, the error is larger on the new data.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="pred_02.JPG" style="width: 629px;"&gt;&lt;img src="https://community.jmp.com/t5/image/serverpage/image-id/21633i3B888055B5D1AE3C/image-dimensions/629x416?v=v2" width="629" height="416" role="button" title="pred_02.JPG" alt="pred_02.JPG" /&gt;&lt;/span&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I wonder what additional work I can do, or which other features I could take advantage of.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I attached my data, thank you.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;(FYI, X_1 is just a row index.)&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 07 Feb 2020 16:40:10 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/How-to-get-correct-and-Predictive-model/m-p/246456#M48398</guid>
      <dc:creator>Dongjin</dc:creator>
      <dc:date>2020-02-07T16:40:10Z</dc:date>
    </item>
    <item>
      <title>Re: How to get correct and Predictive model</title>
      <link>https://community.jmp.com/t5/Discussions/How-to-get-correct-and-Predictive-model/m-p/246459#M48399</link>
      <description>&lt;P&gt;Plus, the '&lt;SPAN&gt;Effect Summary' results rank all 223 effects by PValue, from smallest to largest.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I wonder what criterion I should use to decide which variables to exclude when building a predictive model.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;The top 30%? Or a PValue under 0.05?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 07 Feb 2020 16:16:10 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/How-to-get-correct-and-Predictive-model/m-p/246459#M48399</guid>
      <dc:creator>Dongjin</dc:creator>
      <dc:date>2020-02-07T16:16:10Z</dc:date>
    </item>
    <item>
      <title>Re: How to get correct and Predictive model</title>
      <link>https://community.jmp.com/t5/Discussions/How-to-get-correct-and-Predictive-model/m-p/246622#M48416</link>
      <description>&lt;P&gt;You need to provide more context about these variables - that is a lot of potential explanatory variables and they appear to be highly correlated with each other (which makes me think you might want to try something like a principal components analysis to reduce the features).&amp;nbsp; Also, it looks as if it is time series data - in which case you should analyze it as such.&amp;nbsp; Finally, the first 263 rows certainly behave differently than what comes after - so no matter how good your model is on those rows, that model is not likely to predict the trend that suddenly appears after that row.&amp;nbsp; I would want to focus on whether any of the predictors can pick up on that change in the pattern.&amp;nbsp; It is not likely you can find that by only modeling the 263 rows that don't exhibit that trend.&amp;nbsp; So, you have sort of broken your data into training and test data sets, but your training set looks very different than your test data - that is not a good way to produce a model for the test data.&amp;nbsp; Randomly selecting the training and test data (use a validation column) would be a better approach.&amp;nbsp; But, first and foremost, is this time series data (it looks like it to me)?&lt;/P&gt;</description>
      <pubDate>Sun, 09 Feb 2020 12:29:56 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/How-to-get-correct-and-Predictive-model/m-p/246622#M48416</guid>
      <dc:creator>dale_lehman</dc:creator>
      <dc:date>2020-02-09T12:29:56Z</dc:date>
    </item>
    <item>
      <title>Re: How to get correct and Predictive model</title>
      <link>https://community.jmp.com/t5/Discussions/How-to-get-correct-and-Predictive-model/m-p/246637#M48421</link>
      <description>&lt;P&gt;On the question of where to start in getting the "correct" predictive model,&amp;nbsp;there are two schools of thought.&lt;/P&gt;&lt;P&gt;1. develop mathematical models based solely on data analysis (e.g., neural networks, PCA)&lt;/P&gt;&lt;P&gt;2. understand, with scientific basis, relationships between input variables and output variable (what Deming called the analytic problem)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Being deterministic, the 2nd approach is what I prefer. &amp;nbsp;This begins with statements of hypotheses about the relationships between inputs and outputs. &amp;nbsp;It requires an understanding of inference (over what conditions do you want the model to be effective). &amp;nbsp;Then comes the appropriate "sampling plan" to acquire the data (directed sampling or experimentation). &amp;nbsp;Certainly you can examine historical data, but only to help develop hypotheses which then need to be tested.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Your data set lacks any context (as Dale suggests). &amp;nbsp;There is no "meaning" to the columns, just columns of numbers. &amp;nbsp;I'm not sure why Dale thinks this is a time series, as I see no times or dates in the data. So, creating the correct predictive model is left to option 1 above.&amp;nbsp;If you include all of the columns (224) and run Fit Model, you get Rsquare Adj of .59 and Rsquare of .83. &amp;nbsp;These values are way too different, which suggests you have over-specified the model (unimportant terms are in the model). &amp;nbsp;If you look at the VIFs (Parameter Estimates table), many are above the usual threshold of 5 (or 10), which indicates multicollinearity.&amp;nbsp;So the model needs to be reduced. &amp;nbsp;However, without context there is no informed basis for deciding how to do this.&lt;/P&gt;</description>
      <pubDate>Sun, 09 Feb 2020 15:59:35 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/How-to-get-correct-and-Predictive-model/m-p/246637#M48421</guid>
      <dc:creator>statman</dc:creator>
      <dc:date>2020-02-09T15:59:35Z</dc:date>
    </item>
    <item>
      <title>Re: How to get correct and Predictive model</title>
      <link>https://community.jmp.com/t5/Discussions/How-to-get-correct-and-Predictive-model/m-p/246835#M48467</link>
      <description>&lt;a href="https://community.jmp.com/t5/user/viewprofilepage/user-id/4358"&gt;@statman&lt;/a&gt;, &lt;a href="https://community.jmp.com/t5/user/viewprofilepage/user-id/12606"&gt;@dale&lt;/a&gt;&lt;BR /&gt;&lt;BR /&gt;Thank you for your feedback.&lt;BR /&gt;First of all, as you said, that's a time series.&lt;BR /&gt;As you mentioned, I checked the correlation and reduced the list of variables to 40, and I think I got some satisfactory results.&lt;BR /&gt;Since 'Y value' was not measured in real time, the amount of data is not as much as I thought, but I think I was able to determine the cause by applying Parallel Plot, etc.</description>
      <pubDate>Tue, 11 Feb 2020 00:27:08 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/How-to-get-correct-and-Predictive-model/m-p/246835#M48467</guid>
      <dc:creator>Dongjin</dc:creator>
      <dc:date>2020-02-11T00:27:08Z</dc:date>
    </item>
  </channel>
</rss>

