Solved: Creating a Sampling Distribution - Page 2

Apr 23, 2019 12:38 PM

I would like to create a true sampling distribution from my dataset and I am not sure what formula(s) to use in JMP to create it. I have a population dataset of 10,000 and would like to create a sampling distribution with n=100, which would then result in 100 sample means. What formula do I use to create this in JMP?
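For readers who want to see the idea outside JMP, here is a minimal Python sketch of the sampling distribution described above: draw 100 samples of n=100 from a population of 10,000 and keep each sample mean. This is not JMP's formula or the method recommended in the thread. The population values are simulated purely for illustration, and the function name `sampling_distribution` is a made-up helper.

```python
import random
import statistics


def sampling_distribution(population, sample_size=100, n_samples=100, seed=None):
    """Return n_samples sample means, each from a simple random sample
    (without replacement) of sample_size values from population."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]


# Hypothetical population of 10,000 call lengths in minutes (simulated, not real data).
pop_rng = random.Random(1)
population = [pop_rng.weibullvariate(6.0, 1.5) for _ in range(10_000)]

means = sampling_distribution(population, sample_size=100, n_samples=100, seed=2)

print("number of sample means:", len(means))
print("population mean:", round(statistics.fmean(population), 3))
print("mean of sample means:", round(statistics.fmean(means), 3))
print("std dev of sample means:", round(statistics.stdev(means), 3))
```

The spread of the sample means should be much smaller than the spread of the population, roughly the population standard deviation divided by the square root of the sample size.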

nil · Jan 16, 2020 12:24 AM

Thank You!! Yes, I shall do the suggested on JSL learning.

Mark_Bailey · Apr 23, 2019 03:06 PM

Now I understand. The ANOVA assumes that the random errors are normally distributed. So let's not use ANOVA!

There is a simple way around all of your issues. First of all, the length of call time is generally not normally distributed. It is a measure of the life of the call, or simply lifetime data. More generally, it is time-to-event data (the event is the end of the call). You want to use Analyze > Reliability and Survival > Life Distribution instead of ANOVA. Select the Compare Groups tab at the top of the launch dialog. Select the column with the length of the call values and click Y, Time to Event. Select the column with the values that identify the agents and click Grouping. (Big assumption for now but let's get things going: all the calls were completed. That is, none of the length of call observations represent incomplete calls. That situation is known as censoring. We can deal with that case properly later if necessary.) Now click Go.

I suggest using the Weibull distribution model and scaling. Click the checkbox before Weibull and the radio button after Weibull. You are going to get a lot of information back at you.
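To make the Weibull step less of a black box, here is a rough Python sketch of what fitting a Weibull model per agent involves. It is not JMP's implementation. It assumes every call is complete (no censoring), uses a plain maximum-likelihood fit with bisection on the shape parameter, and runs on simulated call lengths for two hypothetical agents (`agent_a`, `agent_b`).

```python
import math
import random


def fit_weibull(times, lo=0.01, hi=50.0, tol=1e-9):
    """Maximum-likelihood Weibull fit for complete (uncensored) positive times.

    Returns (scale, shape). The shape solves
    sum(t^k * ln t) / sum(t^k) - 1/k - mean(ln t) = 0,
    which is increasing in k, so bisection is enough.
    """
    logs = [math.log(t) for t in times]
    mean_log = sum(logs) / len(logs)

    def score(k):
        powered = [t ** k for t in times]
        weighted = sum(p * l for p, l in zip(powered, logs))
        return weighted / sum(powered) - 1.0 / k - mean_log

    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    shape = (lo + hi) / 2.0
    scale = (sum(t ** shape for t in times) / len(times)) ** (1.0 / shape)
    return scale, shape


# Simulated call lengths (minutes) for two hypothetical agents.
rng = random.Random(42)
calls = {
    "agent_a": [rng.weibullvariate(6.0, 1.5) for _ in range(2000)],
    "agent_b": [rng.weibullvariate(9.0, 1.5) for _ in range(2000)],
}

fits = {agent: fit_weibull(times) for agent, times in calls.items()}
for agent, (scale, shape) in fits.items():
    print(agent, "scale:", round(scale, 2), "shape:", round(shape, 2))
```

In this simulated case the two agents share a shape but differ in scale, which is the "parallel but displaced lines" pattern on the comparison plot.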

First, the plot at the top in Compare Distributions is useful to visually assess goodness of fit and assess differences between agents. Second, the Summary report informs you about each agent. Open and examine the Wilcoxon Group Homogeneity Test. Its null hypothesis is that all agents share the same distribution, so a significant result indicates that at least one agent differs. Third, there is a tab with an analysis of each agent. Each report is detailed and specific. You might not need or want all the information.

I will stop here and see if we are going in a good direction and if you have further questions.

P_Bartell · Apr 24, 2019 6:32 AM

To add a bit to @Mark_Bailey's recommended solution, I'll throw in another issue. What about focusing on the mean (or have you thought about the median instead?) AND the variance/spread? Recall Jack Welch's famous quote, paraphrasing, "Customers rarely experience the mean...they feel and experience the variance." So if your ultimate goal is to improve AVERAGE contact length or make it more consistent, I encourage you to also think about reducing contact length VARIANCE, since that is what customers experience and what drives their satisfaction.

tmfortney · Apr 24, 2019 10:08 AM

Very true and a good comment. I usually make it a point to look at both the medians and the spread of the data to determine if the process is in control. Thanks for the feedback.

Mark_Bailey · Apr 24, 2019 10:20 AM

You can use the profilers in Life Distribution to get more than the mean or median. You can estimate any quantile (time) or probability you like.
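As a rough illustration of what those profiler estimates correspond to, here is a short Python sketch using the standard two-parameter Weibull formulas. The scale and shape values are hypothetical placeholders, not output from anyone's data, and the sketch gives point estimates only (no confidence intervals like the JMP profilers provide).

```python
import math


def weibull_quantile(p, scale, shape):
    """Time by which a fraction p of calls have ended."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)


def weibull_cdf(t, scale, shape):
    """Probability that a call has ended by time t."""
    return 1.0 - math.exp(-((t / scale) ** shape))


# Hypothetical fitted parameters for one agent (minutes).
scale, shape = 6.0, 1.5

median = weibull_quantile(0.5, scale, shape)
p95 = weibull_quantile(0.95, scale, shape)
prob_over_10 = 1.0 - weibull_cdf(10.0, scale, shape)

print("median call length:", round(median, 2))
print("95th percentile call length:", round(p95, 2))
print("P(call longer than 10 min):", round(prob_over_10, 3))
```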

P_Bartell · Apr 25, 2019 11:03 AM

Then if you want to really go crazy, if you have access to written transcriptions of calls (say in a .txt file) AND JMP Pro...then a whole world of Text Analytics and Predictive/Exploratory modeling work is at your fingertips. With JMP Pro you can analyze the free-form text of agents' conversations using anything from simple word/phrase counts up to and including latent class analysis, topic analysis, and latent semantic analysis for exploration. From there, using the document term matrix or other dimensionality reduction methods, it's a short leap over to the Generalized Regression platform and its quantile regression capabilities for modeling text (or its surrogates) against contact time quantiles, such as the median or the 95th percentile. Now you have a link between words and talk time! And if you have customer satisfaction scores with respect to an engagement, you can model these as well. Here's a link to a Mastering JMP event that illustrates much of this workflow:

https://www.jmp.com/en_us/events/ondemand/mastering-jmp/using_text_explorer_to_extend_analysis.html

tmfortney · Apr 24, 2019 10:06 AM

Thanks for the advice and instructions. I have never used this platform but after reading more about it I see how it could be very useful in this situation. However, I am having some difficulty reading the output. I see under the Wilcoxon Group Homogeneity Test that the p-value is <.0001 so the variances are definitely not homogeneous. With ANOVA I usually look at pairwise comparisons and can compare, but I am struggling with how to compare the Weibull fits. As you mentioned, I can visually assess the goodness of fit under Compare Distributions, but is there a way I can statistically determine the difference (like a p-value) as we do with ANOVA?

Mark_Bailey · Apr 24, 2019 10:18 AM

You are correct, there is no analog to the choices for multiple comparisons as found in the Oneway platform.

The Wilcoxon test is an omnibus indicator of any difference. It is not specific to one parameter like the mean or variance. The plot at the top can help there, though. Parallel lines have the same variance or scale. Displaced lines have different mean or location. So if one agent is consistently completing their calls more quickly, their curve would shift to the left.

You also have the parameter point estimates and confidence intervals for each group (agent) for comparison, although that information is not the same as a multiple comparison test.

You can also use the profilers to extract information about each group. These answers are provided both as a point estimate and interval estimate.

I am not apologizing but simply recognizing that the methodology here comes from the reliability engineering field. The same methods were independently discovered in medical mortality and morbidity. The terminology, therefore, pertains to those fields but the methods are nonetheless relevant. It just requires a bit of translation. Sometimes it also requires reversing the goals. In reliability, an increasing hazard function is bad. In your case, though, it is good. It means that an event is more likely to happen. But in your case an event is not a failure, it is a completed call.
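To make the hazard point concrete, here is a small Python sketch of the standard Weibull hazard function. The parameter values are hypothetical. With a shape above 1 the hazard increases over time, which here means a call becomes more likely to end the longer it has already lasted.

```python
def weibull_hazard(t, scale, shape):
    """Instantaneous rate at which calls end at time t, given they lasted until t."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


times = [1.0, 3.0, 6.0, 12.0]  # minutes into the call
scale = 6.0                    # hypothetical scale parameter

increasing = [weibull_hazard(t, scale, 1.5) for t in times]  # shape > 1
constant = [weibull_hazard(t, scale, 1.0) for t in times]    # shape = 1 (exponential)
decreasing = [weibull_hazard(t, scale, 0.7) for t in times]  # shape < 1

print("shape 1.5:", [round(h, 3) for h in increasing])
print("shape 1.0:", [round(h, 3) for h in constant])
print("shape 0.7:", [round(h, 3) for h in decreasing])
```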

There are analogous methods for regression models with time to event data. So if you had covariates, you could include them in the model for lifetime and test them. There is a lot of flexibility here.

tmfortney · Apr 24, 2019 10:25 AM

Thanks, this helps a lot. It definitely helps me with what I am trying to do. One last question: if the variances were equal and the population were normally distributed (or my sample size was sufficiently large) I could have used an ANOVA as I have in the past, correct? I just want to make sure I am not using the wrong tool for the job.

Mark_Bailey · Apr 24, 2019 10:45 AM

Yes.

If the variances were unequal but the errors were normally distributed, then you could use the Welch ANOVA, which JMP automatically provides if you select Unequal Variances from the Oneway platform menu (red triangle).
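For reference, here is a minimal Python sketch of the textbook Welch ANOVA statistic, to show what changes when the variances are allowed to differ: each group is weighted by its size divided by its own variance. It is not JMP's code, the data are made up, and it returns only the F statistic and degrees of freedom, because the standard library has no F distribution for the p-value.

```python
import statistics


def welch_anova(groups):
    """Welch's one-way ANOVA for groups with unequal variances.

    Returns (F, df_numerator, df_denominator).
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [statistics.fmean(g) for g in groups]
    # Weight each group by n / s^2 so noisier groups count for less.
    weights = [n / statistics.variance(g) for n, g in zip(sizes, groups)]
    total_w = sum(weights)
    grand = sum(w * m for w, m in zip(weights, means)) / total_w
    between = sum(w * (m - grand) ** 2 for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / total_w) ** 2 / (n - 1) for w, n in zip(weights, sizes))
    f_stat = between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    return f_stat, k - 1, (k ** 2 - 1) / (3 * lam)


# Made-up measurements for three hypothetical groups.
groups = [
    [4.1, 5.0, 6.2, 5.5, 4.8],
    [7.9, 6.5, 9.1, 8.4, 7.2, 10.3],
    [5.9, 6.1, 6.0, 6.3, 5.8, 6.2, 6.1],
]
f_stat, df1, df2 = welch_anova(groups)
print("F:", round(f_stat, 3), "df1:", df1, "df2:", round(df2, 2))
```

With only two groups this reduces to the square of Welch's unequal-variance t statistic, which is a handy sanity check.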

Life Distribution is the right tool in this case, as far as I can tell.
