Speaker | Transcript |
| So welcome everyone. My name is |
00 | 07.933 |
| Ambassador with JMP. I am now a |
| retired professor of Business |
00 | 30.566 |
| between a student and a |
| professor working on a project. |
00 | 49.700 |
| engage students in statistical |
| reasoning, teach that |
00 | 12.433 |
| to that, current thinking is |
| that students should be learning |
| about reproducible workflows, |
00 | 36.266 |
| elementary data management. And, |
| again, viewing statistics as |
00 | 58.800 |
| wanted to join you today on this |
| virtual call. Thanks for having |
00 | 20.600 |
| and specifically in Manhattan, |
| and you'd asked us, so you |
00 | 36.433 |
| And we chose to do the Airbnb |
| renter perspective. So we're |
00 | 51.733 |
| expensive. |
| So we |
| started filling out...you gave us |
00 | 09.166 |
| separate issue, from your main |
| focus of finding a place in |
00 | 36.066 |
| you get...if you get through the |
| first three questions, you've |
00 | 54.100 |
| know, is there a part of |
| Manhattan you're interested in? |
00 | 11.133 |
| repository that you sent us to. |
| And we downloaded the really |
00 | 26.433 |
| 32.866 |
| thing we found, there were like |
| four columns in this data set |
00 | 46.766 |
| figured out so that was this |
| one, the host neighborhood. So |
00 | 58.100 |
| figured out that the first two |
| just have tons of little tiny |
00 | 13.300 |
| Manhattan. So we selected |
| Manhattan. And then when we had |
00 | 29.700 |
| that and then that's how we got |
| our Manhattan listings. So |
00 | 44.033 |
| data is that you run into these |
| issues like why are there four |
00 | 03.300 |
| restricted it to Manhattan, I'll |
| go back and clean up some |
00 | 18.033 |
| data will describe everything we |
| did to get the data, we'll talk |
00 | 28.400 |
| 33.200 |
| know I'm supposed to combine |
| them based on zip, the zip code, |
00 | 47.166 |
| 107 columns, |
| it's just hard to find the |
00 | 09.366 |
| them, so we knew we had to clean |
| that up. All right, we also had |
00 | 27.366 |
| journal of notes. In order to |
| clean this up, we use the recode |
00 | 45.500 |
| Exactly. Cool. |
| Okay, so we did the cleanup |
00 | 02.200 |
| Manhattan tax data has this zip |
| code. So I have this zip code |
00 | 19.300 |
| day of class, when we talked |
| about |
| data types. And notice in the |
00 | 42.300 |
| the...analyze the distribution of |
| that column, it'll make a funny |
00 | 03.200 |
| Manhattan doesn't really tell |
| you a thing. |
| But the zip code clean data in |
00 | 18.466 |
| 23.266 |
| just a label, an identifier, and |
| more to the point, |
| when you want to join or merge |
00 | 41.833 |
| 48.766 |
| important. It's not just an |
| abstract idea. You can't merge |
00 | 03.166 |
| 11.266 |
| nominal was the modeling type, |
| we just made sure. |
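The point about data types can be illustrated outside JMP. The following is a minimal Python sketch, not the presenters' JMP steps: the two values and the helper name `as_zip_label` are hypothetical, chosen only to show why a zip code stored as text in one table and as a number in the other will not match until both are treated as labels.

```python
# Hypothetical values for illustration only; not taken from the talk's data.
listing_zip = "10016"   # stored as text in one table
tax_zip = 10016         # stored as a number in the other table

# Keys of different types never compare equal, so a join would find no match.
print(listing_zip == tax_zip)  # False


def as_zip_label(value):
    """Treat a zip code as a 5-character label, not a quantity."""
    return str(value).strip().zfill(5)


# Once both sides are labels of the same type, the keys line up.
print(as_zip_label(listing_zip) == as_zip_label(tax_zip))  # True
```

Padding with `zfill(5)` is an added assumption here: it restores leading zeros that are lost when a zip code is read as a number.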
00 | 26.200 |
| 31.033 |
| about the main table is the |
| listings. I want to keep |
00 | 45.533 |
| to combine it with Manhattan tax |
| data. |
| Yeah. Then what? Then we need to |
00 | 03.266 |
| tell it that the column called |
| zip clean, |
| zip code clean... |
| Almost. There we go. |
| And the column called zip, which |
00 | 33.200 |
| Airbnb listing |
| and match it up with anything in |
00 | 57.033 |
| them in table every row, whether |
| it matches with the other or |
00 | 13.233 |
| main table, and then only the stuff |
| that overlaps from the second |
00 | 29.600 |
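The join described here, keeping every row of the main table and only the matching rows of the second, is what is usually called a left outer join. Below is a rough Python sketch of that idea, not the JMP dialog itself. The key names `zip code clean` and `zip` echo the talk; the rows, the `returns` column, and the `left_join` helper are made up for illustration, and the sketch assumes each zip appears at most once in the second table.

```python
# Made-up rows for illustration; only the key column names echo the talk.
listings = [
    {"id": 1, "zip code clean": "10016", "price": 150},
    {"id": 2, "zip code clean": "10003", "price": 210},
    {"id": 3, "zip code clean": "99999", "price": 95},  # no match in tax data
]
tax_data = [
    {"zip": "10016", "returns": 1200},
    {"zip": "10003", "returns": 900},
    {"zip": "10280", "returns": 300},  # no listing uses this zip
]


def left_join(main, other, main_key, other_key):
    """Keep every row of `main`; add columns from `other` only where keys match."""
    # Assumes keys in `other` are unique; a later duplicate would overwrite an earlier one.
    lookup = {row[other_key]: row for row in other}
    other_cols = [col for col in other[0] if col != other_key]
    joined = []
    for row in main:
        match = lookup.get(row[main_key])
        # Unmatched main rows are kept, with missing values for the added columns.
        extra = {col: (match[col] if match else None) for col in other_cols}
        joined.append({**row, **extra})
    return joined


merged = left_join(listings, tax_data, "zip code clean", "zip")
for row in merged:
    print(row)
```

Listing 3 survives the join with a missing `returns` value, while zip 10280, which no listing uses, does not appear in the result.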
| another name, like Airbnb IRS |
| or something? Yeah, it's a lot |
00 | 50.966 |
| do one more thing |
| because I noticed these are just |
| data tables scattered around |
00 | 06.666 |
| running. Okay. So I'll save this |
| data table. Now what? |
| And really, this is the data |
00 | 19.833 |
| 22.033 |
| 26.266 |
| 35.466 |
| anything else, before we lose |
| track of where we are, let's |
00 | 49.733 |
| 58.800 |
| 01.833 |
| or Oak Team? |
| And then |
| part of the idea of a project |
00 | 23.700 |
| thing. So if you |
| grab, I would say, take the |
00 | 50.100 |
| two original data sets, and then |
| my final merged. Okay. Now |
00 | 16.200 |
| them as tabs. |
| And as you generate graphs and |
00 | 36.566 |
| even when I have it in these |
| tabs. Okay, that's really cool. |
00 | 58.833 |
| 02.500 |
| right, go Oak Team. |
| Well, hi, Dr. Carver, thanks so |
00 | 19.233 |
| you would just glance at some of |
| these things, and let me know if |
00 | 32.300 |
| we used Graph Builder to look at |
| the price per neighborhood. And |
00 | 45.400 |
| help it be a little easier to |
| compare between them. So we kind |
00 | 01.000 |
| have a lot of experience with |
| New York City. So we plotted |
00 | 18.166 |
| stand in front of the UN and |
| take a picture with all the |
00 | 31.733 |
| staying in Gramercy Park or |
| Murray Hill. |
| If we look back at the |
00 | 46.566 |
| thought we should expand our |
| search beyond that neighborhood to |
00 | 58.766 |
| just plotted what the averages |
| were for the neighborhoods but |
00 | 14.533 |
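Averaging price by neighborhood, which the students did in Graph Builder, amounts to a group-and-summarize step. A small Python sketch of that computation follows, under stated assumptions: the neighborhood names appear in the talk, but every price below is made up for illustration.

```python
from collections import defaultdict

# Made-up prices for illustration; only the neighborhood names come from the talk.
rows = [
    ("Gramercy Park", 200),
    ("Gramercy Park", 260),
    ("Murray Hill", 150),
    ("Murray Hill", 170),
    ("Murray Hill", 190),
]

# Collect the prices that belong to each neighborhood.
prices_by_neighborhood = defaultdict(list)
for neighborhood, price in rows:
    prices_by_neighborhood[neighborhood].append(price)

# Reduce each group to its mean price.
averages = {
    neighborhood: sum(prices) / len(prices)
    for neighborhood, prices in prices_by_neighborhood.items()
}

for neighborhood, average in averages.items():
    print(f"{neighborhood}: {average:.2f}")
```

A mean alone hides the spread within each neighborhood, so a summary like this is only a first look.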
| the modeling, and to model the |
| prediction. So if we could put |
00 | 30.766 |
| expected price. We started |
| building a model and what we've |
00 | 42.800 |
| factors. And so then when we put |
| those factors into just a |
00 | 58.833 |
| more, some of the fit statistics |
| you've told us about in class. |
00 | 15.466 |
| but mostly it's a cloud around |
| that residual zero line. So |
00 | 30.766 |
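The "cloud around the residual zero line" can be made concrete with a small calculation. The sketch below fits a least-squares line with a single predictor and computes the residuals. That is a simplification, since the students' model used several factors, and the data points are invented for illustration.

```python
# Made-up (predictor, price) pairs for illustration; not the talk's data.
xs = [1, 2, 3, 4, 5]
ys = [110, 135, 170, 180, 215]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Ordinary least-squares slope and intercept for a single predictor.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# A residual is the observed value minus the value the line predicts.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

for x, r in zip(xs, residuals):
    print(f"x={x}: residual={r:+.1f}")
```

Least-squares residuals always sum to zero when the model includes an intercept, so the thing to look for in a residual plot is a pattern, such as a curve or a funnel shape, and not the average level.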
| which was way bigger than any of |
| our other models. So we know |
00 | 45.800 |
| reasons we use real data. |
| Sometimes, this is real. This is |
00 | 58.266 |
| looking? |
| Like this is residual values. |
00 | 19.266 |
| is good. Ah, cool. |
| Cool. Okay, so I'll look for |
00 | 34.966 |
| is sort of how we're answering |
| our few important questions. And |
00 | 47.300 |
| was really difficult to clean |
| the data and to join the data. |
00 | 57.866 |
| 03.500 |
| wanted to demonstrate how JMP |
| in combination with a real-world |
00 | 28.700 |
| Number one in a real project, |
| scoping is important. We want to |
00 | 47.600 |
| hope to bring to the |
| group. Pitfall number two, |
| it's vital to explore the |
00 | 08.033 |
| the area of linking data, |
| combining data from multiple |
00 | 27.800 |
| recoding |
| and making sure that linkable |
00 | 45.100 |
| reproducible research is vital, |
| especially in a team context, |
| especially for projects that may |
00 | 05.966 |
| habits of guaranteeing |
| reproducibility. And finally, |
| we hope you notice that in these |
00 | 32.633 |
| on the computation and |
| interpretation falls by the |
00 | 51.900 |