Hello.
My name is Rob Carver, and today I want to share a story
about a project I've been working on in my small town in Massachusetts.
At the outset, I'll point out that the slides
and the JMP data table are up on the Discovery website,
and there's a new academic case study on this very topic that will be posted.
It's not posted yet; it will be posted very soon.
What I'm hoping to do in 30 minutes is spend most of our time with a JMP demo,
but you're going to need some context and background.
I want to provide a little bit of a scenario,
give you a sense of the problem that I'm trying to solve,
and talk about the research strategy
and then get into the demo and wrap up with some conclusions.
I live in a town called Sharon, which is an archetypal New England town.
Here you see a picture of our town center.
It was incorporated in 1765, so an old community.
Like many New England communities, we handle the legislative function of the town
through the annual town meeting, in which anyone can come and speak,
as memorialized in the Norman Rockwell images.
From the start, we have used an open-town meeting,
and the executive function is carried out by a three-member board
known as the Select Board.
But since Norman Rockwell's day, municipal government has become
technologically, financially, and legally more complex,
even for the most fundamental services that a town provides.
Attendance at the town meeting has really dwindled.
About a year and a half ago,
the Select Board created a governance study committee, of which I'm a member.
We are doctors and lawyers,
accountants and teachers,
marketing people, and local business people.
I'm the resident stats guy.
Over time, the population of the town has evolved; it's grown.
It's more diverse than it was 100 years ago.
We've gone from an agrarian and manufacturing community
to a bedroom community for the city of Boston.
Lots of professionals working in hospitals and universities, law firms in the city.
People tend not to live and work in the town,
and that affects participation in town governance.
The charge to the governance study committee
is to find ways to boost citizen engagement.
We've been doing our due diligence.
We've been researching, we've surveyed residents,
we've read the literature, we've interviewed town officials.
One part of our research, and that's what this talk is about,
is we wanted to reach out to towns like Sharon
to find out what are they doing, what's their experience.
This is comparative research.
There are 350 towns in Massachusetts.
We have time constraints,
and so we're looking for a way to identify a smallish number of communities
that are similar to us.
We didn't want to reinvent the wheel,
but we thought that modernizing it some would be a good idea.
The driving question covered in this research
is: which towns are similar to Sharon?
A little bit about Sharon.
We sit in southeastern Massachusetts.
We are not too far from Plymouth,
which is where the 1620 Mayflower landing happened.
This community was originally populated by Wampanoag peoples.
Europeans arrived in 1637.
We're about halfway between Boston and Providence.
For the sports fans out there,
we are next door to where the New England Patriots play football.
Population about 18,500, which is quite average in Massachusetts.
A high percentage of our population are registered voters.
Yet out of all those people, only 2% come to town meeting.
This was the scene most recently, in May of 2022,
and a lot of that is COVID-related.
There were social distancing rules in effect, but turnout is low,
partly because of COVID,
partly because of factors that we don't fully understand.
One task for the governance study committee is to consider
other alternatives to town meeting,
or tweaks and enhancements to town meeting.
Under state law, to participate in an open-town meeting,
you have to be in the room.
It's broadcast on local television,
but you have to be present to speak or to vote.
State law also says there are three ways to run local government.
74% of the communities, the large majority, do what Sharon does:
open-town meeting once or twice a year.
A small number have what's called representative town meeting in which
voters elect their neighbors, maybe a few hundred of them,
to participate and vote in town meeting.
Traditionally cities have had small councils
with a mayor or administrator of some kind.
Increasingly that's being adopted by towns, and so we're looking into that.
For this talk, the task is to identify peer towns
that we could then interview, consult with, and reach out to.
I mentioned some of the state legal constraints.
One other constraint is that town boards,
like the governance study committee, have to hold open meetings.
Anything we do and decide and deliberate about has to be in public,
which is a good thing.
We have no budget.
We have some wonderful staff in the town hall, but they are
busy doing other things as well.
Data availability was a mixed story.
Plenty of data available about characteristics of communities.
We're really interested in how many people participate in local government
and there's no centralized data about that,
so we needed to hunt for proxies.
We also had no ability to compel folks in other towns
to meet with us, advise us, or share data with us.
We're operating in a topic area that is heavily governed by tradition.
People really cleave to that Norman Rockwell image.
We came up with a three-stage plan.
As a committee, we brainstormed variables, asking: Why do people participate?
Why don't people participate?
Why are different towns different?
I then grabbed some data from voter turnout in a recent state-wide election
to use as a proxy for citizen engagement.
I ran some models in JMP
to identify those variables that seem to have predictive value.
The committee then discussed and added some more variables
that they thought were important on the town meeting dimension.
That generated 20 predictor columns,
which I knew was far more than I wanted to deal with.
I consulted my brain trust of academic colleagues; special thanks go to
Mia Stephens and Ruth Hummel at JMP,
who advised me on principal components analysis,
which, I'll note at the outset, was not part of my comfort zone,
so I'll tell you a little bit about that.
Then I ran cluster analysis.
That's the main event today.
People on the committee understood
that we probably want to be talking to towns of comparable size,
but there's more to similarity than size.
There's more to similarity than being a geographic neighbor.
Part of the work involved
instructing the committee a little bit on cluster analysis.
Just in case anybody watching doesn't have much background in this,
here's how I did it.
I said, well,
we can look at population and something else at the same time,
and maybe that something else has an impact on participation.
In this case, the Y axis was single-family property tax bills.
You can see that there's a bunch of towns similar in size to Sharon,
but which might have very different tax impacts.
The idea in cluster analysis,
if you are going to work in two dimensions, is to
choose two attributes that you think are relevant to your query,
spread the towns out on those two dimensions,
and then identify a reasonable number of towns
that are reasonably similar to Sharon.
That's the big idea in cluster analysis.
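To make the two-dimensional idea concrete, here is a minimal Python sketch; it is not the committee's actual workflow, which ran in JMP, and the town names and numbers are hypothetical. It standardizes two attributes, population and single-family tax bill, so that neither dominates just because of its units, then ranks towns by Euclidean distance from Sharon.

```python
import math
from statistics import mean, pstdev

def zscores(values):
    """Standardize a list of numbers to mean 0 and standard deviation 1."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]

def nearest_towns(towns, target, k):
    """Return the k towns closest to `target` on two standardized attributes.

    `towns` maps a town name to a (population, tax_bill) pair.
    """
    names = list(towns)
    pop = zscores([towns[n][0] for n in names])
    tax = zscores([towns[n][1] for n in names])
    coords = dict(zip(names, zip(pop, tax)))
    tx, ty = coords[target]
    dist = {n: math.hypot(x - tx, y - ty)
            for n, (x, y) in coords.items() if n != target}
    return sorted(dist, key=dist.get)[:k]

# Hypothetical (population, single-family tax bill) values, for illustration only.
towns = {
    "Sharon": (18500, 11000),
    "TownA": (18000, 10800),
    "TownB": (19000, 6000),
    "TownC": (5000, 11200),
    "TownD": (40000, 7000),
}
print(nearest_towns(towns, "Sharon", 2))  # ['TownA', 'TownC']
```

With these made-up numbers, TownB is almost exactly Sharon's size but lands far away, because its tax bill is more than two standard deviations lower. That is the reason for looking at more than size.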
Fortunately we're not limited to two attributes or two dimensions.
We can have more than that.
With that, I think you now know enough to follow the demo.
Going into this demo,
I had gathered data from a variety of state and publicly available sources,
used Query Builder to build a large data table,
and inspected it for outliers and missing data.
The one real outlier is the city of Boston,
which is just unique.
That's excluded from all the analysis.
A little bit of missing data, but nothing terrible.
I'm going to be showing you a JMP project.
Let me switch gears, move into the demo and I hope that I do this correctly.
What we're looking at is my data table of 351 cities and towns.
The first several columns are identification:
name of community, size of the Select Board,
and their legislative option.
The next 20 columns are our predictors.
Just to ground us a bit,
if we look at some basic descriptives of the communities,
towns in Massachusetts tend to be on the small side.
The median is only 10,000 people.
Sharon is quite near the mean community size.
In terms of legislative function, 74% use open-town meetings,
so we are in good company.
In terms of the size of the Select Board, which is
another thing the governance committee is looking at, it's just about 50/50.
Half of the towns with a Select Board have three members, half have five.
We've got these 20 predictors.
One issue that comes up pretty early in the analysis is collinearity.
Here I have five
variables that all speak to size: the size of the electorate, the size of the town.
You can see that there are some very strong correlations.
Generally speaking,
we don't want to deal with that much collinearity.
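A quick way to spot this kind of collinearity outside JMP is to compute pairwise Pearson correlations and flag the pairs above some cutoff. This is only a sketch: the 0.9 threshold, the column names, and the values are all hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def collinear_pairs(columns, threshold=0.9):
    """List (col_a, col_b, r) for pairs whose |r| exceeds `threshold`."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) > threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged

# Hypothetical size-related columns (plus one that is not) for five towns.
columns = {
    "population":        [5000, 10000, 18500, 30000, 60000],
    "registered_voters": [3800, 7600, 14000, 22000, 45000],
    "housing_units":     [2100, 4000, 7000, 12500, 24000],
    "pct_over_65":       [22, 15, 18, 20, 14],
}
for a, b, r in collinear_pairs(columns):
    print(a, b, r)
```

The three size measures all flag one another, while the age column does not; that redundancy is what principal components analysis is meant to squeeze out.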
One way out is principal components analysis.
At this point, I'm not quite ready to jump into clustering,
but I want to take those 20 columns and distill them down,
conserve as much information as possible,
but reduce the redundancy and collinearity across columns.
To do that,
principal components analysis is an excellent option.
I don't have the ability today to give a full crash course
in principal components analysis,
but we can see that we have variables that seem to be
overlapping in terms of their message.
We also can see that
when you give PCA 20 columns, it initially comes up with 20 components,
the first few of which seem to capture most of the variability.
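Why does PCA return as many components as there are columns? Assuming the analysis is run on correlations (that is, on standardized columns), the components come from the eigendecomposition of the p-by-p correlation matrix, which has exactly p eigenvalues that sum to p. The sketch below, in plain Python with made-up data, computes those eigenvalues with the classical Jacobi rotation method. JMP's own numerical routine may differ.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    z = []
    for col in columns:
        mu = sum(col) / n
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / n)
        z.append([(v - mu) / sd for v in col])
    p = len(z)
    return [[sum(z[i][t] * z[j][t] for t in range(n)) / n for j in range(p)]
            for i in range(p)]

def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that entry (p, q) becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J: update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- J^T A: update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

# Hypothetical columns: the first two nearly duplicate each other.
columns = [
    [1, 2, 3, 4, 5, 6],
    [1.1, 1.9, 3.2, 3.9, 5.1, 6.0],
    [3, 1, 4, 1, 5, 9],
]
eigenvalues = jacobi_eigenvalues(correlation_matrix(columns))
total = sum(eigenvalues)  # equals the number of columns for a correlation matrix
for i, lam in enumerate(eigenvalues, start=1):
    print(f"component {i}: eigenvalue {lam:.3f}, {lam / total:.1%} of variance")
```

Because two of the three columns carry nearly the same message, the first component alone accounts for well over half of the total variance.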
We have to make a decision about
how many principal components to use and what they represent.
For this, the scree plot is helpful,
and we're looking for a kink or an elbow in the plot.
That seems to happen somewhere down here, around 4, 5, or 6 components.
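The elbow is read by eye from the scree plot. As a numerical cross-check, a sketch like the following tabulates each component's share of variance and applies the Kaiser rule (keep eigenvalues above 1). That rule is a common rule of thumb swapped in here for illustration, not necessarily part of the committee's decision, and the eigenvalues are invented.

```python
def scree_table(eigenvalues):
    """(component, eigenvalue, proportion, cumulative proportion) per component."""
    total = sum(eigenvalues)
    rows, cumulative = [], 0.0
    for i, lam in enumerate(eigenvalues, start=1):
        cumulative += lam / total
        rows.append((i, lam, lam / total, cumulative))
    return rows

def kaiser_count(eigenvalues):
    """Number of components with eigenvalue above 1 (the Kaiser rule of thumb)."""
    return sum(1 for lam in eigenvalues if lam > 1.0)

# Hypothetical eigenvalues for 20 standardized columns (they sum to 20).
eigenvalues = [6.0, 3.5, 2.2, 1.8, 1.4, 1.1, 0.6, 0.5, 0.45, 0.4,
               0.35, 0.3, 0.3, 0.25, 0.25, 0.2, 0.15, 0.1, 0.1, 0.05]

# A text-mode scree plot: one bar of '#' per component.
for i, lam, prop, cum in scree_table(eigenvalues)[:8]:
    print(f"{i:2d}  {lam:4.2f}  {prop:6.1%}  {cum:6.1%}  " + "#" * round(lam * 4))
print("Kaiser rule keeps", kaiser_count(eigenvalues), "components")
```

With these invented values, six components have eigenvalues above 1 and together carry 80% of the variance; with real data the rule and the visual elbow can disagree, which is why judgment and the loadings still matter.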
If we consult the
loading matrix to see how
different variables load onto different components,
we can begin to subjectively assign meaning to the components.
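Assigning meaning to a component usually starts with the variables that load most heavily on it. Here is a small sketch with a hypothetical loading matrix; the variable names and values are illustrative only.

```python
def top_loadings(loadings, component, n=2):
    """Variables with the largest absolute loading on one component (0-based index)."""
    return sorted(loadings, key=lambda var: abs(loadings[var][component]),
                  reverse=True)[:n]

# Hypothetical loadings: rows are variables, columns are components 1 and 2.
loadings = {
    "population":        [0.92, 0.10],
    "registered_voters": [0.90, 0.12],
    "median_income":     [0.15, 0.88],
    "median_home_value": [0.20, 0.85],
    "pct_over_65":       [-0.30, 0.05],
}
for comp in range(2):
    print(f"component {comp + 1}:", top_loadings(loadings, comp))
# Component 1 would read as "size" and component 2 as "affluence".
```

Using the absolute value matters: a large negative loading is just as informative about what a component means as a large positive one.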
I'll cut to the chase.
We selected six principal components
as being informative for the purposes of cluster analysis,
and came up with some interpretations that made sense to us.
Things like how big is the town?
How affluent is the town?
How fast is it growing?
Now we're ready for clustering.
Back to JMP.
There are two basic approaches to clustering.
You might think of them as a top-down and a bottom-up.
Both of them take the raw data, standardize it,
and then compute Euclidean distances for each pair of rows in the data table,
that is, for each pair of communities.
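Standardizing and computing pairwise Euclidean distances takes only a few lines. This sketch uses z-scores based on the population standard deviation and Python's `math.dist`; the town names and scores are hypothetical, and JMP's exact standardization options may differ.

```python
import math

def standardize(rows):
    """Z-score each column of a list of equal-length numeric rows.

    Assumes no column is constant (a zero standard deviation would divide by zero).
    """
    stats = []
    for col in zip(*rows):
        mu = sum(col) / len(col)
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / len(col))
        stats.append((mu, sd))
    return [[(v - mu) / sd for v, (mu, sd) in zip(row, stats)] for row in rows]

def distance_matrix(rows):
    """Euclidean distance between every pair of rows."""
    return [[math.dist(a, b) for b in rows] for a in rows]

# Hypothetical scores for four towns on two components.
towns = ["Sharon", "TownA", "TownB", "TownC"]
scores = [[0, 0], [2, 0], [0, 2], [2, 2]]
d = distance_matrix(standardize(scores))
for name, row in zip(towns, d):
    print(f"{name:7s}", " ".join(f"{x:5.2f}" for x in row))
```

The matrix is symmetric with zeros on the diagonal; both clustering methods work from distances like these.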
Taking into account six factors,
the six principal components,
size, affluence, education, things like that,
which ones are similar to Sharon?
In hierarchical clustering,
the report starts us off with a dazzling graph, but with 350 rows,
it is hard to interpret.
We'll come back to it.
Let's begin with something that's a little easier to interpret,
which is the cluster summary.
In the hierarchical method, JMP has found for us 16 clusters.
I can tell you, because I peeked,
that Sharon turns out to be in cluster 15.
Sharon and 23 other towns.
For example,
if you scan down the affluence column, we see that,
again these are standardized scores,
these are the most affluent towns in the state.
If we come over here to the growth column,
and this is largely growth in housing units and population,
we see the least growth, in fact, some negative growth.
If we come back up here.
Now, having looked at the cluster summary,
all of the clusters have been identified and colored.
JMP gives us a cut point.
If we zoom in on cluster 15.
Let's make that a little bigger.
Sharon is here in the center.
Its nearest Euclidean neighbor is Winchester,
which is about an hour's drive away.
We now have a provisional list of towns to consult with.
All right, so that's a crash course in the hierarchical clustering.
I'll move on to K-Means.
Hierarchical is bottom-up.
We start with 350 individual towns as clusters.
We interrogate the distance matrix, all the pairwise distances,
and find the two towns that are nearest to each other;
they form a cluster.
We take the mean distance of that cluster.
Now we're either looking for the next two nearest towns
or the next town that's closest to that cluster,
and we iteratively build the tree until
we have one gigantic cluster of 350 towns.
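The bottom-up procedure can be sketched as naive agglomerative clustering. Here the distance between two clusters is the average distance between their members (average linkage), which is one reading of "the mean distance of that cluster". JMP offers several linkage methods, and this is not necessarily the one used in the demo; the data are hypothetical.

```python
import math

def cluster_distance(points, a, b):
    """Average linkage: mean distance between the members of clusters a and b."""
    total = sum(math.dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))

def agglomerate(points, k=1):
    """Bottom-up clustering. Start with every point as its own cluster and
    repeatedly merge the closest pair until only k clusters remain.
    Returns the final clusters (lists of row indices) and the merge history."""
    clusters = [[i] for i in range(len(points))]
    history = []
    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(points, clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        history.append((clusters[x], clusters[y], d))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters, history

# Hypothetical two-component scores for five towns.
points = [[0, 0], [0, 1], [10, 10], [10, 11], [20, 0]]
clusters, history = agglomerate(points, k=3)
print(clusters)  # [[0, 1], [2, 3], [4]]
```

Calling `agglomerate(points, k=1)` runs the merges all the way to one cluster, and `history` then records the full tree that a dendrogram draws. Stopping at some k is the code equivalent of choosing a cut point on that tree.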
With K-Means clustering,
we flip the process.
Rather than building up from individual towns, we start with all 350 towns
and partition them, in multiple dimensions, into a number of clusters that we choose.
In this approach, with the same standardized data and the same Euclidean distances,
we end up with Sharon being in cluster number 4,
with a full complement of 33 towns.
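K-Means is usually computed with Lloyd's algorithm: choose k starting centroids, assign each point to the nearest one, recompute each centroid as the mean of its points, and repeat until nothing changes. A minimal sketch with hypothetical data follows. The random starting points, controlled here by a hypothetical `seed` parameter, are one reason k-means results can shift from run to run, and JMP's own initialization may differ.

```python
import math
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd's algorithm for k-means clustering.

    Picks k points at random as starting centroids, then alternates between
    assigning every point to its nearest centroid and moving each centroid
    to the mean of its points, until the assignments stop changing.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids

# Hypothetical two-component scores for six towns, in two obvious groups.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)  # per-cluster means, analogous to a cluster-means summary
```

Note that k is an input here, just as in JMP, where the user specifies the number of clusters.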
We automatically get a cluster means picture.
Again, very affluent low growth,
not necessarily the lowest, but low growth again.
We get slightly different results.
I think in the interest of time, I will show you one other graph.
There are various things to look at,
but let's look at the parallel coordinate plots.
What is this tool?
We have 16 clusters,
and by the way in K-Means,
it's up to the user to specify the number of clusters.
I chose 16 as a starting point because that's what hierarchical gave us.
Here we are, cluster 4.
The dark brown line is Sharon.
Here we see the six characteristics, the six principal components.
For example, if we compare,
how is cluster 4 different from cluster 3, let's say, or cluster 5,
which may be similar in size?
Cluster 5 is less affluent.
Property values are a little lower.
Permanent population refers to
communities with universities, hospitals, prisons, and so forth,
or with vacation homes and snowbirds who leave for the winter.
Towns differ in terms of their permanent populations; in cluster 3, it's much lower.
Here's where we find our university towns.
I just popped up the town of Shirley, Massachusetts,
which is home to the state's largest
maximum security prison.
I don't know if we consider these folks permanent residents or not, but in any event.
We've done two different clustering methods.
Let's take a look at how the results compare.
Within each clustering method, I saved
the cluster assignments for each town and created binary variables:
are you in the same cluster as Sharon, or are you in a different cluster?
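The same-cluster indicator, and a cross-tabulation comparing the two methods, can be sketched like this. The cluster numbers and town names are hypothetical.

```python
from collections import Counter

def same_cluster_flag(assignments, target):
    """Map each town to 1 if it shares the target town's cluster, else 0."""
    home = assignments[target]
    return {town: int(c == home) for town, c in assignments.items()}

# Hypothetical cluster numbers from the two methods.
hierarchical = {"Sharon": 15, "TownA": 15, "TownB": 15, "TownC": 3, "TownD": 7}
k_means = {"Sharon": 4, "TownA": 4, "TownB": 2, "TownC": 4, "TownD": 1}

h = same_cluster_flag(hierarchical, "Sharon")
k = same_cluster_flag(k_means, "Sharon")

# Cross-tabulate (hierarchical flag, k-means flag) for every town except Sharon.
crosstab = Counter((h[t], k[t]) for t in hierarchical if t != "Sharon")
print(crosstab)
both = [t for t in hierarchical if t != "Sharon" and h[t] and k[t]]
print("peer under both methods:", both)
```

The (1, 1) cell counts the towns both methods agree are Sharon's peers; the off-diagonal cells show where the methods disagree.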
Just as an aside,
JMP has lots of wonderful built-in geographic maps.
It does not have a built-in map showing
municipality borders within the state of Massachusetts.
But it turns out that with JMP
it's fairly easy to create a new geographic map.
I was able to do this without very much work at all.
Here are the results of hierarchical clustering, cluster 15.
Sharon is here.
Its similar towns are in blue.
I was pleased to see that my little tiny hometown of Marblehead
is similar to the place I moved to.
Hierarchical cluster 15 gives us these 24 towns.
K-Means clustering adds some more towns, but there's an awful lot of overlap.
I also, just out of curiosity,
looked to identify the 33 towns, about 10% of the state,
that are most similar to Sharon.
This is a larger group.
Again, an awful lot of repeats.
That last approach also gave us some other advantages.
I want to now shift back to PowerPoint and talk about some of those.
One last point before I finish the demo.
We were also curious to ask...
Mostly our goal was, who shall we interview?
Who shall we call in to meet publicly with our committee?
But while we're at it,
let's see what our peers do in terms of governance.
Statewide,
open-town meeting (OTM) dominates at 74%, and there is no dominant board size.
If we look at who's in our cluster,
and let me use K-Means, because it's a slightly larger group,
that 74%
jumps up to 85% with open-town meeting.
By a two-to-one margin,
towns have five-member Select Boards.
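These percentages are simple within-group shares. A sketch of that tabulation, using a made-up peer group of 20 towns rather than the study's actual counts:

```python
from collections import Counter

def shares(values):
    """Percentage of each category in a list, rounded to one decimal place."""
    counts = Counter(values)
    return {cat: round(100 * n / len(values), 1) for cat, n in counts.items()}

# Hypothetical peer group of 20 towns; not the study's actual counts.
legislative = ["OTM"] * 17 + ["RTM"] * 2 + ["Council"] * 1
board_size = [5] * 13 + [3] * 7
print(shares(legislative))
print(shares(board_size))
```

Running the same function on the statewide list and on the peer-group list gives the side-by-side comparison described here.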
Now, this isn't definitive. To channel my mother:
if all the other towns jumped off the Empire State Building,
we wouldn't necessarily want to jump off.
But it's interesting to note that the towns most similar to Sharon
favor the five-member Board
and are even more inclined to open-town meeting.
With that, let me
get back to some conclusions.
So what did we learn?
One thing we learned was that geographic proximity is uninformative: in those maps,
none of the abutting towns came up blue.
Our most similar communities are not our next-door neighbors.
As I just noted,
open-town meeting and five-member boards really predominate.
So what did we do?
This work actually happened several months ago.
We were able to prioritize our outreach,
begin contacting those towns most similar to us.
Many were extremely cooperative and shared a lot of information and data.
We also didn't want to assume that open-town meeting
was the only option to consider,
so we wanted to talk to towns with representative town meetings or councils.
Those Euclidean distances became instructive:
okay, none of our immediate neighbors, our closest neighbors,
use a town council or representative town meeting.
But which RTM town, which council town, is most like us?
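Finding the most similar town within each form of government amounts to a nearest-neighbor search restricted to each group. A sketch, with hypothetical component scores and town labels:

```python
import math

def nearest_by_group(scores, groups, target):
    """For each governance form, return the town closest to `target`."""
    best = {}  # group -> (town, distance)
    for town, vec in scores.items():
        if town == target:
            continue
        d = math.dist(vec, scores[target])
        g = groups[town]
        if g not in best or d < best[g][1]:
            best[g] = (town, d)
    return {g: town for g, (town, _) in best.items()}

# Hypothetical two-component scores and governance forms.
scores = {"Sharon": (0, 0), "TownA": (0.2, 0.1), "TownB": (1.5, 0.5),
          "TownC": (0.8, -0.4), "TownD": (2.0, 2.0), "TownE": (0.5, 0.5)}
groups = {"Sharon": "OTM", "TownA": "OTM", "TownB": "RTM",
          "TownC": "RTM", "TownD": "Council", "TownE": "Council"}
print(nearest_by_group(scores, groups, "Sharon"))
```

The same distance matrix used for clustering answers this question directly, without re-running any clustering.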
And we contacted those folks as well.
We went from having to contemplate outreach to 350 towns
in a limited amount of time and with no money or staff,
to a focused sampling method.
Then because town officials talk to one another
and they are professionally active,
that led us to other interviewees.
With that, I think that's about my time.
I hope this has been interesting and constructive.
Thank you for coming
and I hope you enjoy the rest of the program.