Geographic Voting Cluster Analysis
A discussion of higher-dimensional visualization of geographic clusters of voting patterns. Most visualizations are based on one-dimensional (Democrat/Republican axis) or two-dimensional (correlation between two races, e.g. gap) relationships.
It is possible to measure the "distance" between precincts in a higher-dimensional space in terms of their voting patterns, where the distance is represented as sqrt((candidate_a^2 + candidate_b^2 + ...) / number of candidates).
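As a minimal sketch of the distance described above, the Python below computes a root-mean-square distance between two precincts. It assumes that each squared term is the difference between the two precincts' vote share for one candidate, which the formula implies but does not state outright. The function name `voting_distance` and the sample vote shares are hypothetical.

```python
import math

def voting_distance(precinct_a, precinct_b):
    """Root-mean-square difference between two precincts' vote shares.

    Each argument maps a candidate name to that candidate's share of the
    precinct vote (0.0 to 1.0). Treating each squared term as the difference
    in share between the two precincts is an assumption.
    """
    candidates = sorted(set(precinct_a) & set(precinct_b))
    if not candidates:
        raise ValueError("precincts share no candidates")
    total = sum((precinct_a[c] - precinct_b[c]) ** 2 for c in candidates)
    return math.sqrt(total / len(candidates))

# Hypothetical vote shares for two precincts across three races.
p1 = {"candidate_a": 0.60, "candidate_b": 0.50, "candidate_c": 0.40}
p2 = {"candidate_a": 0.30, "candidate_b": 0.50, "candidate_c": 0.80}
print(voting_distance(p1, p2))  # roughly 0.289
```

Dividing by the number of candidates keeps distances comparable when different numbers of races are included.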
Correlation and Targeting of Voters
Central to any effort to promote a candidate or issue is identifying voters as likely to be supportive or not.
Paradoxically, on a macro scale the efforts of large campaigns and the major political parties rely on microtargeting: the use of personal information such as race, community affiliations and past campaign donations, and hypothetically also shopping patterns, magazine subscriptions, what kind of car someone drives and so forth, to establish correlations which predict whether someone might support a candidate or an issue being contested in the present.
Correlation means establishing, based on polling data or other sources, that factors such as the above "follow the same lines" as the level of support for past issues or candidates.
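One common way to quantify whether two measures "follow the same lines" is the Pearson correlation coefficient. The sketch below is illustrative only: the article does not say which correlation measure campaigns actually use, and the function name and sample figures are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / (spread_x * spread_y)

# Hypothetical per-cohort figures: share holding some indicator
# versus share supporting a past candidate.
indicator = [0.10, 0.25, 0.40, 0.55, 0.70]
support = [0.30, 0.38, 0.45, 0.61, 0.66]
print(pearson(indicator, support))
```

A value near +1 or -1 means the indicator tracks support closely; a value near 0 means it tells you little.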
The predictive element involves selecting past issues or candidates, and then the appropriate microtargeting indicators, in hopes of identifying cohorts of voters likely to support the cause celebre.
In fact, with enough different indicators and historical information, this can be done with enough certainty to be undeniably effective in the political sphere.
Clustering As Visualization
The second part of the puzzle is visualizing the data. Visualization may not be necessary with simple one-dimensional (e.g. Democrat/Republican axis) analysis, or two-dimensional (gap) analysis.
But beyond that it becomes a challenge to visualize any dataset.
Geographic Clustering and Mapping
This refers to looking at geographic voting patterns; establishing correlations; and developing a visualization of those correlations.
Predictive assertions, for the author, are secondary. While some tentative experiments have been tried, they've mostly been strategic or else in the realm of determining "like-mindedness": valuable, but not particularly suited to the immediate tactical needs of an issue in the throes of a campaign.
The argument for geographic clustering
We accept as received wisdom (the author believes correctly) that there are "red" and "blue" states. We can make some large cultural stereotypes to account for this. But this breaks down as we get into neighborhoods: nobody likes to admit that they moved to a certain neighborhood because people there agree with them (how would they determine that: "new car smell"?) or that their thinking was affected by their neighbors ("it's something in the water").
But the facts remain the facts. Here in King County, Washington (where the author lives), voting results are reported at the precinct level, with most precincts aggregating 100 to 300 geographically contiguous voters. This is the data we are reporting upon here.
I reproduce below a histogram. On the X (horizontal) axis is the percentage of the vote for Gore in 2000; on the Y (vertical) axis is the number of precincts with that percentage of the vote.
This is not a statistically normal distribution! Most if not all other races exhibit non-normal distributions, as discussed in Distributions Are Not Normal.
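A histogram of this kind can be tabulated with a few lines of Python. This is only a sketch of the binning step: the function name, the bin width, and the sample percentages are hypothetical and are not the King County data.

```python
import math

def precinct_histogram(percentages, bin_width=5):
    """Count precincts per bin of vote percentage (0 to 100)."""
    n_bins = math.ceil(100 / bin_width)
    counts = [0] * n_bins
    for pct in percentages:
        if not 0 <= pct <= 100:
            raise ValueError(f"percentage out of range: {pct}")
        # A precinct at exactly 100% is placed in the last bin.
        index = min(int(pct // bin_width), n_bins - 1)
        counts[index] += 1
    return counts

# Hypothetical precinct percentages, for illustration only.
sample = [22.0, 38.5, 41.0, 43.2, 57.9, 61.4, 62.0, 63.3, 64.8, 71.5]
for i, count in enumerate(precinct_histogram(sample, bin_width=10)):
    print(f"{i * 10:3d}-{i * 10 + 10:3d}% {'#' * count}")
```

Printing one row per bin gives a quick text view of whether the shape is lopsided or has more than one peak.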
Cluster maps (different kinds) as a visualization tool
Cluster map is the generic name given to what is often also referred to as a dot plot. However, a dot plot is a representation of data in two dimensions. In higher dimensions, what is often presented as a cluster map is usually a projection. Indeed, what we are about to delve into here is not even a (Cartesian) projection but rather an elastic "squishing" into two dimensions of what is manifested in a higher-dimensional space.
What does that mean?
There are many different algorithms which are used by "licensed statisticians" for cluster analysis. The particular algorithm utilized here is (hopefully) popularly accessible if we describe it as a model of stardust coalescing into a solar system (but in two dimensions). The algorithm is iterative, as would be a model of a coalescing solar system.
This algorithm is based on an invariant attraction between precincts, determined by the calculated distance described at the beginning of this article, and a non-linear mutual repulsion (the inverse of distance in the iterating map): these conditions being sufficient to ensure self-organizing behavior (Prigogine et al.).
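The following Python is a rough sketch of one way to read that description; it is not the author's implementation. It assumes that voting distances lie between 0 and 1, that each pair's attraction is fixed at 1 minus its voting distance (so like-voting precincts pull harder), and that repulsion is a constant divided by the pair's current separation on the map. The names `cluster_layout`, `repel` and `step` are hypothetical.

```python
import math
import random

def cluster_layout(distances, iterations=1000, step=0.01, repel=0.1, seed=0):
    """Iteratively place items in 2D from a symmetric voting-distance matrix.

    Assumptions (not from the original article): distances are in [0, 1],
    attraction per pair is 1 - distance and never changes, and repulsion
    is repel / (current separation on the map).
    """
    rng = random.Random(seed)
    n = len(distances)
    pos = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    for _ in range(iterations):
        moves = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                # Clamp the separation so the repulsion cannot blow up.
                gap = max(math.hypot(dx, dy), 0.01)
                attraction = 1.0 - distances[i][j]  # fixed for the pair
                repulsion = repel / gap             # grows as points approach
                pull = attraction - repulsion       # positive: move toward j
                moves[i][0] += step * pull * dx / gap
                moves[i][1] += step * pull * dy / gap
        # Apply all moves at once so every pair sees the same snapshot.
        for i in range(n):
            pos[i][0] += moves[i][0]
            pos[i][1] += moves[i][1]
    return pos

# Hypothetical distances: precincts 0 and 1 vote alike, as do 2 and 3.
example = [
    [0.00, 0.05, 0.90, 0.90],
    [0.05, 0.00, 0.90, 0.90],
    [0.90, 0.90, 0.00, 0.05],
    [0.90, 0.90, 0.05, 0.00],
]
for index, (x, y) in enumerate(cluster_layout(example)):
    print(index, round(x, 2), round(y, 2))
```

Under these assumptions each pair settles where attraction and repulsion balance, so precincts that vote alike end up close together on the map and dissimilar ones drift apart.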
Many examples of these cluster maps can be found here.
If you find more than two clusters, they probably don't disagree for the same reasons. Given clusters (A,B,C) and issues (X,Y), you will see that A agrees with B on X and disagrees on Y, and that A agrees with C on Y. This sort of analysis tends to reveal the local issues of contention.
Emissaries are hard to find (and doubly valuable if you do!). Whether you subscribe to the "new car smell" or "it's the water" camp of theorists, the fact remains that generally the people in a cluster have the most empathic connection with others in that cluster: if you are going to send an emissary to talk to people in that cluster, best if it's someone from that cluster! The problem is that the clusters do in fact tend to cluster geographically, so it is hard to bridge geographic gaps. Of course, when you can find a cluster which is on your side and which bridges a geographic gap, and you can identify an actual person as an emissary in that cluster, then that person is extremely valuable.
Fred Morris November 2006