Teaching Workflows from Day 1: Using JMP® Projects in the Classroom (2020-US-30MP-608)

Level: Beginner

Ruth Hummel, JMP Academic Ambassador, SAS
Rob Carver, Professor Emeritus, Stonehill College / Brandeis University

Statistics educators have long recognized the value of projects and case studies as a way to integrate the topics in a course. Whether introducing novice students to statistical reasoning or training employees in analytic techniques, it is valuable for students to learn that analysis occurs within the context of a larger process that should follow a predictable workflow.

In this presentation, we’ll demonstrate how the JMP Project tool supports each stage of an analysis of Airbnb listings data. Using Journals, Graph Builder, Query Builder and many other JMP tools within the JMP Project environment, students learn to document the process. The process looks like this:

1. Ask a question.
2. Specify the data needs and analysis plan.
3. Get the data.
4. Clean the data.
5. Do the analysis.
6. Tell your story.

We do our students a great favor by teaching a reliable workflow, so that they begin to follow the logic of statistical thinking and develop good habits of mind. Without the workflow orientation, a statistics course looks like a series of unconnected and unmotivated techniques. When students adopt a project workflow perspective, the pieces come together in an exciting way.

Auto-generated transcript...

Speaker	Transcript
	So welcome everyone. My name is
00	07.933
	3
	Ambassador with JMP. I am now a
	retired professor of Business
00	30.566
	7
	between a student and a
	professor working on a project.
00	49.700
	11
	12
	engage students in statistical
	reasoning, teach that
00	12.433
	16
	to that, current thinking is
	that students should be learning
	about reproducible workflows,
00	36.266
	21
	elementary data management. And,
	again, viewing statistics as
00	58.800
	25
	26
	wanted to join you today on this
	virtual call. Thanks for having
00	20.600
	30
	and specifically in Manhattan,
	and you'd asked us, so you
00	36.433
	34
	And we chose to do the Airbnb
	renter perspective. So we're
00	51.733
	38
	expensive.
	So we
	started filling out...you gave us
00	09.166
	43
	44
	separate issue, from your main
	focus of finding a place in
00	36.066
	49
	you get...if you get through the
	first three questions, you've
00	54.100
	53
	know, is there a part of
	Manhattan you're interested in?
00	11.133
	58
	repository that you sent us to.
	And we downloaded the really
00	26.433
	32.866
	63
	thing we found, there were like
	four columns in this data set
00	46.766
	67
	figured out so that was this
	one, the host neighborhood. So
00	58.100
	71
	72
	figured out that the first two
	just have tons of little tiny
00	13.300
	76
	Manhattan. So we selected
	Manhattan. And then when we had
00	29.700
	80
	that and then that's how we got
	our Manhattan listings. So
00	44.033
	84
	data is that you run into these
	issues like why are there four
00	03.300
	88
	restricted it to Manhattan, I'll
	go back and clean up some
00	18.033
	92
	data will describe everything we
	did to get the data, we'll talk
00	28.400
	33.200
	97
	know I'm supposed to combine
	them based on zip, the zip code,
00	47.166
	101
	102
	107 columns,
	it's just hard to find the
00	09.366
	106
	them, so we knew we had to clean
	that up. All right, we also had
00	27.366
	111
	journal of notes. In order to
	clean this up, we use the recode
00	45.500
	115
	Exactly. Cool.
	Okay, so we did the cleanup
00	02.200
	119
	Manhattan tax data has this zip
	code. So I have this zip code
00	19.300
	123
	day of class, when we talked
	about
	data types. And notice in the
00	42.300
	128
	the...analyze the distribution of
	that column, it'll make a funny
00	03.200
	133
	Manhattan doesn't really tell
	you a thing.
	But the zip code clean data in
00	18.466
	23.266
	139
	just a label, an identifier, and
	more to the point,
	when you want to join or merge
00	41.833
	48.766
	145
	important. It's not just an
	abstract idea. You can't merge
00	03.166
	11.266
	150
	nominal was the modeling type,
	we just made sure.
00	26.200
	31.033
	155
	about the main table is the
	listings. I want to keep
00	45.533
	159
	to combine it with Manhattan tax
	data.
	Yeah. Then what? Then we need to
00	03.266
	164
	tell it that the column called
	zip clean,
	zip code clean...
	Almost. There we go.
	And the column called zip, which
00	33.200
	171
	172
	Airbnb listing
	and match it up with anything in
00	57.033
	177
	178
	them in table every row, whether
	it matches with the other or
00	13.233
	182
	main table, and then only the stuff
	that overlaps from the second
00	29.600
	186
	another name, like Airbnb IRS
	or something? Yeah, it's a lot
00	50.966
	190
	do one more thing
	because I noticed these are just
	data tables scattered around
00	06.666
	195
	running. Okay. So I'll save this
	data table. Now what?
	And really, this is the data
00	19.833
	22.033
	26.266
	35.466
	203
	anything else, before we lose
	track of where we are, let's
00	49.733
	58.800
	01.833
	209
	or Oak Team?
	And then
	part of the idea of a project
00	23.700
	214
	thing. So if you
	grab, I would say, take the
00	50.100
	218
	219
	220
	two original data sets, and then
	my final merged. Okay. Now
00	16.200
	225
	them as tabs.
	And as you generate graphs and
00	36.566
	229
	230
	231
	even when I have it in these
	tabs. Okay, that's really cool.
00	58.833
	02.500
	236
	right, go Oak Team.
	Well, hi, Dr. Carver, thanks so
00	19.233
	240
	you would just glance at some of
	these things, and let me know if
00	32.300
	244
	we used Graph Builder to look at
	the price per neighborhood. And
00	45.400
	248
	help it be a little easier to
	compare between them. So we kind
00	01.000
	252
	have a lot of experience with
	New York City. So we plotted
00	18.166
	256
	stand in front of the UN and
	take a picture with all the
00	31.733
	260
	staying in Gramercy Park or
	Murray Hill.
	If we look back at the
00	46.566
	265
	thought we should expand our
	search beyond that neighborhood to
00	58.766
	269
	270
	just plotted what the averages
	were for the neighborhoods but
00	14.533
	274
	the modeling, and to model the
	prediction. So if we could put
00	30.766
	279
	expected price. We started
	building a model and what we've
00	42.800
	283
	factors. And so then when we put
	those factors into just a
00	58.833
	287
	more, some of the fit statistics
	you've told us about in class.
00	15.466
	292
	but mostly it's a cloud around
	that residual zero line. So
00	30.766
	296
	which was way bigger than any of
	our other models. So we know
00	45.800
	300
	reasons we use real data.
	Sometimes, this is real. This is
00	58.266
	304
	looking?
	Like this is residual values.
00	19.266
	309
	is good. Ah, cool.
	Cool. Okay, so I'll look for
00	34.966
	313
	is sort of how we're answering
	our few important questions. And
00	47.300
	317
	was really difficult to clean
	the data and to join the data.
00	57.866
	03.500
	322
	wanted to demonstrate how JMP
	in combination with a real world
00	28.700
	327
	Number one in a real project,
	scoping is important. We want to
00	47.600
	331
	hope to bring to the
	group. Pitfall number two,
	it's vital to explore the
00	08.033
	336
	the area of linking data,
	combining data from multiple
00	27.800
	341
	recoding
	and making sure that linkable
00	45.100
	345
	346
	reproducible research is vital,
	especially in a team context,
	especially for projects that may
00	05.966
	351
	habits of guaranteeing
	reproducibility. And finally,
	we hope you notice that in these
00	32.633
	356
	on the computation and
	interpretation falls by the
00	51.900
	360
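
The transcript describes cleaning a zip code column, treating it as a label rather than a number, and then joining the listings (the main table) to the tax data on that key while keeping every listing. The sketch below illustrates that idea in plain Python rather than JMP. It is not the presenters' method: the field names (zipcode, zip_clean, zip, returns), the sample values, and the helper functions clean_zip and left_join are all hypothetical.

```python
# Illustrative sketch only: plain Python, not JMP. All field names and
# values are hypothetical stand-ins, not the presenters' data.

def clean_zip(raw):
    """Recode a raw zip value into a 5-character string label, or None."""
    digits = "".join(ch for ch in str(raw).strip() if ch.isdigit())
    # Keep the first five digits (drops ZIP+4 suffixes such as 10016-1234).
    return digits[:5] if len(digits) >= 5 else None


def left_join(main_rows, other_rows, main_key, other_key):
    """Keep every row of main_rows; attach matching fields from other_rows.

    Rows of main_rows with no match get None for the added fields.
    """
    lookup = {row[other_key]: row for row in other_rows}
    other_fields = [k for k in other_rows[0] if k != other_key] if other_rows else []
    joined = []
    for row in main_rows:
        match = lookup.get(row[main_key])
        extra = {k: (match[k] if match else None) for k in other_fields}
        joined.append({**row, **extra})
    return joined


# Hypothetical listings with inconsistently recorded zip codes.
listings = [
    {"id": 1, "zipcode": "10016-1234", "price": 150},
    {"id": 2, "zipcode": 10003, "price": 210},
    {"id": 3, "zipcode": "NY 10128", "price": 95},
]
# Hypothetical tax table keyed by a zip code stored as text.
tax_data = [
    {"zip": "10016", "returns": 100},
    {"zip": "10003", "returns": 200},
]

# Clean step: add a text key so both tables use the same data type.
for row in listings:
    row["zip_clean"] = clean_zip(row["zipcode"])

# Join step: listings is the main table, so all three rows are kept.
merged = left_join(listings, tax_data, "zip_clean", "zip")
for row in merged:
    print(row)
```

In this sketch the key is stored as text in both tables before the join. If one side held the integer 10003 and the other the string "10003", the dictionary lookup would find no match, which is one way to see why the data type of a join key matters.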

cdjacobs · 10-12-2020

Great beginner presentation on data project workflows and its workflow thinking!