<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.ouseful.info/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>PLEASE change feed URL to http://blog.ouseful.info/feed/</title>
	
	<link>http://blog.ouseful.info</link>
	<description>In the expectation that Google will axe feedburner, please change any subscriptions to this feed to http://blog.ouseful.info/feed/</description>
	<lastBuildDate>Sat, 25 May 2013 17:29:24 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain="blog.ouseful.info" port="80" path="/?rsscloud=notify" registerProcedure="" protocol="http-post" />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>OUseful.Info, the blog...</title>
		<link>http://blog.ouseful.info</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://blog.ouseful.info/osd.xml" title="OUseful.Info, the blog..." />
	
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.ouseful.info/ouseful" /><feedburner:info uri="ouseful" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://blog.ouseful.info/?pushpress=hub" /><xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" /><feedburner:emailServiceId>ouseful</feedburner:emailServiceId><feedburner:feedburnerHostname>http://feedburner.google.com</feedburner:feedburnerHostname><item>
		<title>Pondering Bibliographic Coupling and Co-citation Analyses in the Context of Company Directorships</title>
		<link>http://feeds.ouseful.info/~r/ouseful/~3/uQWpy4eZNzQ/</link>
		<comments>http://blog.ouseful.info/2013/05/24/pondering-bibliographic-coupling-and-co-citation-analyses-in-the-context-of-company-directorships/#comments</comments>
		<pubDate>Fri, 24 May 2013 12:13:25 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[Anything you want]]></category>
		<category><![CDATA[opencorporates]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=10734</guid>
		<description>Over the last month or so, I&amp;#8217;ve made a start reading through Mark Newman&amp;#8217;s Networks: An Introduction, trying (though I&amp;#8217;m not sure how successfully!) to bring an element of discipline to my otherwise osmotically acquired understanding of the techniques employed by various network analysis tools. One distinction that made a lot of sense to me [&amp;#8230;]</description>
				<content:encoded><![CDATA[<p>Over the last month or so, I&#8217;ve made a start reading through Mark Newman&#8217;s <a href="http://www.amazon.co.uk/Networks-Introduction-Mark-Newman/dp/0199206651?tag=ouseful-21">Networks: An Introduction</a>, trying (though I&#8217;m not sure how successfully!) to bring an element of discipline to my otherwise osmotically acquired understanding of the techniques employed by various network analysis tools.</p>
<p>One distinction that made a lot of sense to me came from the domain of bibliometrics, specifically between the notions of <em>bibliographic coupling</em> and <em>co-citation</em>.</p>
<p><em>Co-citation</em><br />
The idea of co-citation will be familiar to many &#8211; when one article cites a set of other articles, those other articles are &#8220;co-cited&#8221; by the first. When the same articles are co-cited by lots of other articles, we may have reason to believe that they are somehow related in a meaningful way.</p>
<p><a href="http://en.wikipedia.org/wiki/Co-citation"><img src="http://ouseful.files.wordpress.com/2013/05/cocitation-analysis.png?w=700" alt="cocitation analysis"   class="alignnone size-full wp-image-10736" /></a><br />
<small><em><a href="http://en.wikipedia.org/wiki/Co-citation">Image via Wikipedia</a></em></small></p>
<p>In graph terms, we might also represent this as a simpler graph within which edges between two articles indicate that they have been co-cited by documents within a particular corpus, with the weight of each edge representing the number of documents within that corpus that have co-cited them.</p>
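<p>As a minimal sketch of that weighting (the function name <tt>cocitation_weights</tt> and the toy corpus are my own inventions for illustration, not anything from Newman), we might count, for every pair of cited works, how many citing documents reference both of them:</p>

```python
from collections import Counter
from itertools import combinations

def cocitation_weights(citations):
    """Count, for each unordered pair of cited works, how many citing works cite both."""
    weights = Counter()
    for cited in citations.values():
        # Each citing work contributes one co-citation to every pair it references.
        for pair in combinations(sorted(set(cited)), 2):
            weights[pair] += 1
    return weights

# Hypothetical corpus: citing article -> the articles it cites.
corpus = {
    "A": ["X", "Y", "Z"],
    "B": ["X", "Y"],
    "C": ["Y", "Z"],
}
cocited = cocitation_weights(corpus)
```

<p>Each key in <tt>cocited</tt> is an edge between two cited works, and its value is the edge weight.</p>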
<p><em>Bibliographic coupling</em><br />
Bibliographic coupling is actually an earlier notion, describing the extent to which two works are related by virtue of them both referencing the same other work.</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/bibliographic-coupling.png"><img src="http://ouseful.files.wordpress.com/2013/05/bibliographic-coupling.png?w=700" alt="Bibliographic coupling"   class="alignnone size-full wp-image-10735" /></a><br />
<small><em><a href="http://en.wikipedia.org/wiki/Bibliographic_coupling">Image via Wikipedia</a></em></small></p>
<p>Again, in graph terms, we might think of a simpler undirected network in which edges between two articles act as an indicator that they have cited or referenced the same work, with the weight of the edge representing the number of works that both of them cite.</p>
<p>A comparison of co-citation and bibliographic coupling networks shows one to be &#8220;retrospective&#8221; and the other to be &#8220;forward looking&#8221;. A bibliographic coupling network can be generated directly from the reference lists of the articles in a corpus, and to this extent bibliographic coupling looks to the past. In a co-citation network, the edges that connect two articles can only be generated when a future published article cites them both.</p>
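<p>A comparable sketch for bibliographic coupling, again using a made-up corpus and a hypothetical function name (<tt>coupling_weights</tt>), links pairs of citing works and weights each edge by the number of references the two works have in common:</p>

```python
from itertools import combinations

def coupling_weights(citations):
    """For each unordered pair of citing works, count the references they share."""
    weights = {}
    for a, b in combinations(sorted(citations), 2):
        shared = set(citations[a]).intersection(citations[b])
        if shared:
            weights[(a, b)] = len(shared)
    return weights

# Hypothetical corpus: citing article -> the articles it cites.
corpus = {
    "A": ["X", "Y", "Z"],
    "B": ["X", "Y"],
    "C": ["Y", "Z"],
}
coupled = coupling_weights(corpus)
```

<p>Here the nodes are the citing works (A, B, C) rather than the works they reference.</p>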
<p><em>Co-citation, Bibliographic Coupling and Company Director Networks</em></p>
<p>For some time I&#8217;ve been tinkering with the notion of co-director networks, using OpenCorporates data as a data source (eg <a href="http://blog.ouseful.info/2013/04/23/mapping-corporate-networks-with-opencorporates/">Mapping Corporate Networks With OpenCorporates</a>). What I&#8217;ve tended to focus on are networks built up from active companies and their current directors, looking to see which companies are currently connected by virtue of currently sharing the same directors. On the to do list are timelines showing the companies that a particular director has been associated with, and when, as well as directorial appointments and terminations within a particular company.</p>
<p>In both co-citation and bibliographic coupling analyses, the nodes are the same type of thing (that is, works that are cited, such as articles). A work cites a work. <em>(Note: does <tt>author co-citation analysis</tt> rely on mappings from works to cited authors, or citing authors to cited authors?)</em> In company-director networks, we have a bipartite representation, with directors and companies as the two types of node, and edges that connect companies to directors but not companies to companies or directors to directors (unless a company is itself a director, but we generally fudge the labelling there).</p>
<p>If we treat &#8220;companies that retain directors&#8221; as &#8220;articles that cite other articles&#8221;:</p>
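<p>- under a &#8220;co-citation&#8221; style view, we generate links between directors of the same companies (the directors are &#8220;co-cited&#8221; by the company that retains them);<br />
- under a &#8220;bibliographic coupling&#8221; style view, we generate links between companies that share common directors (the companies &#8220;cite&#8221; the same directors).</p>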
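<p>Both views are projections of the same bipartite company-director data. A rough sketch, assuming the data arrives as a simple mapping from company to a list of director names (the company names, director names and function names below are all invented for illustration):</p>

```python
from collections import Counter, defaultdict
from itertools import combinations

def shared_membership_weights(memberships):
    """memberships: group -> members. Weight each member pair by the groups they share."""
    weights = Counter()
    for members in memberships.values():
        for pair in combinations(sorted(set(members)), 2):
            weights[pair] += 1
    return weights

def invert(memberships):
    """Turn group -> members into member -> groups."""
    inverted = defaultdict(list)
    for group, members in memberships.items():
        for member in members:
            inverted[member].append(group)
    return dict(inverted)

# Hypothetical data: company -> its directors.
boards = {
    "Acme Ltd": ["Alice", "Bob"],
    "Bolt plc": ["Bob", "Carol"],
    "Cog LLP": ["Alice", "Bob", "Carol"],
}

# Directors linked by sitting on the same boards.
director_links = shared_membership_weights(boards)
# Companies linked by sharing directors.
company_links = shared_membership_weights(invert(boards))
```

<p>The same counting function serves both projections; only the direction of the mapping changes.</p>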
<p>I&#8217;ve been doing this anyway, but the bibliographic coupling/co-citation distinction may help me tighten it up a little, as well as improving ways of calculating and analysing these networks by reusing analyses described by the bibliometricians?</p>
<p>Pondering the &#8220;future vs. past&#8221; distinction, the following also comes to mind:</p>
<p>- at the moment, I am generating networks based on current directors of active companies;<br />
- could we construct a dynamic (temporal?) hypergraph from hyperedges that connect all the directors associated with a particular company at a particular time? If so, what could we do with this graph?! (As an aside, it&#8217;s probably worth noting that I know absolutely nothing about hypergraphs!)</p>
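<p>One very naive way of getting at the hyperedge idea, assuming each directorship is recorded as a (director, company, start year, end year) tuple (a made-up format, not the OpenCorporates schema), would be to take a snapshot of every board at a given time:</p>

```python
from collections import defaultdict

def hyperedges_at(terms, year):
    """Return {company: frozenset of directors in post during `year`}."""
    boards = defaultdict(set)
    for director, company, start, end in terms:
        # A term covers the years start .. end - 1 (the end year is exclusive).
        if year in range(start, end):
            boards[company].add(director)
    return {company: frozenset(people) for company, people in boards.items()}

# Hypothetical directorship terms: (director, company, start year, end year).
terms = [
    ("Alice", "Acme", 2005, 2010),
    ("Bob", "Acme", 2008, 2013),
    ("Carol", "Bolt", 2009, 2012),
]
snapshot_2009 = hyperedges_at(terms, 2009)
snapshot_2011 = hyperedges_at(terms, 2011)
```

<p>Each frozenset is one hyperedge; stepping the year forward gives a crude time series of hyperedges for each company.</p>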
<p>I&#8217;ve also started wondering about &#8216;director pathways&#8217; in which we define directors as nodes (where all we require is that a person was a director of a company at some time) and directed &#8220;citation&#8221; edges. These edges would go from one director to other director nodes under the condition that the &#8220;citing&#8221; director was appointed to a particular company within a particular time period t1..t2 before the appointment to the same company of a &#8220;cited&#8221; director. If one director follows another director into more than one company, we increase the weight of the edge accordingly. (We could maybe also explore modes in which edge weights represent the amount of time that two directors are in the same company together.)</p>
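<p>A first stab at those pathway edges might look something like the following, assuming appointments are given as (director, company, month number) tuples and that t1 and t2 are gaps measured in months (the data format, names and function are all hypothetical):</p>

```python
from collections import Counter, defaultdict

def pathway_edges(appointments, t1, t2):
    """Weight directed (citing, cited) edges by the number of companies in which
    the citing director was appointed between t1 and t2 months before the cited one."""
    by_company = defaultdict(list)
    for director, company, month in appointments:
        by_company[company].append((director, month))

    weights = Counter()
    for people in by_company.values():
        for citing, citing_month in people:
            for cited, cited_month in people:
                gap = cited_month - citing_month
                if citing != cited and gap in range(t1, t2 + 1):
                    weights[(citing, cited)] += 1
    return weights

# Hypothetical appointments: (director, company, month number).
appointments = [
    ("Alice", "Acme", 0),
    ("Bob", "Acme", 6),
    ("Alice", "Bolt", 20),
    ("Bob", "Bolt", 24),
    ("Carol", "Bolt", 60),
]
edges = pathway_edges(appointments, 1, 12)
```

<p>In this toy data Bob follows Alice into two companies within a year, so that edge gets a weight of two, while Carol arrives too late to be linked to either of them.</p>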
<p>The aim is&#8230; probably pointless and not that interesting. Unless it is&#8230; The sort of questions this approach would allow us to ask would be along the lines of: are there groups of directors whose directorial appointments follow similar trajectories through companies; or are there groups of directors who appear to move from one company to another along with each other?</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2013/05/24/pondering-bibliographic-coupling-and-co-citation-analyses-in-the-context-of-company-directorships/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/abbd9f90565ce9ae4d065d93a81d8c03?s=96&amp;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">Tony Hirst</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/cocitation-analysis.png" medium="image">
			<media:title type="html">cocitation analysis</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/bibliographic-coupling.png" medium="image">
			<media:title type="html">Bibliographic coupling</media:title>
		</media:content>
	<creativeCommons:license>http://creativecommons.org/licenses/by/2.0/</creativeCommons:license><feedburner:origLink>http://blog.ouseful.info/2013/05/24/pondering-bibliographic-coupling-and-co-citation-analyses-in-the-context-of-company-directorships/</feedburner:origLink></item>
		<item>
		<title>Notes on Narrative Science and Automated Insights</title>
		<link>http://feeds.ouseful.info/~r/ouseful/~3/LlSS0JY1sio/</link>
		<comments>http://blog.ouseful.info/2013/05/22/notes-on-narrative-science-and-automated-insight/#comments</comments>
		<pubDate>Wed, 22 May 2013 14:40:33 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[Anything you want]]></category>
		<category><![CDATA[narrativeScience]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=10703</guid>
		<description>In October 2009, the New York Times Media Decoder blog picked up on a story that had been doing the rounds about a research project called Stats Monkey from the Intelligent Information Laboratory at Northwestern University. The Robots Are Coming!, it declared, with the immediate rejoinder, Oh, They’re Here. Using play by play baseball data, [&amp;#8230;]</description>
				<content:encoded><![CDATA[<p>In October 2009, the New York Times <em>Media Decoder</em> blog picked up on a story that had been doing the rounds about a research project called <a href="http://infolab.northwestern.edu/projects/stats-monkey/">Stats Monkey</a> from the <a href="http://infolab.northwestern.edu/">Intelligent Information Laboratory</a> at Northwestern University. <a href="http://mediadecoder.blogs.nytimes.com/2009/10/19/the-robots-are-coming-oh-theyre-here/"><em>The Robots Are Coming!</em></a>, it declared, with the immediate rejoinder, <em>Oh, They’re Here.</em> Using play by play baseball data, <em>Stats Monkey</em> produced human readable reports of a baseball game, formulaic admittedly, but good enough, particularly when complemented by quotes from a post-match press conference report. Mechanical churnalism complementing data-driven analysis, cast into prose. (It&#8217;s worth noting that the Media Decoder post itself is little more than a restatement of what was presumably the Stats Monkey website blurb at the time.)</p>
<p>In April 2010, Bloomberg Businessweek Magazine asked <a href="http://www.businessweek.com/magazine/content/10_19/b4177037188386.htm">Are Sportswriters Really Necessary?</a>, describing how <a href="http://narrativescience.com/">Narrative Science</a>, a company that was incorporated at the start of that year and spun out of the Stats Monkey project, had teamed up with the Big Ten Network to produce automatically generated sports reports, <a href="http://btn.com/?s=narrative+science">a relationship that presumably continues to this day</a>.</p>
<p><a href="http://btn.com/2013/03/15/track-no-4-wisconsin-vs-no-5-michigan/"><img src="http://ouseful.files.wordpress.com/2013/05/btn-and-narratve-science.png?w=700" alt="BTN and Narrative Science?"   class="alignnone size-full wp-image-10711" /></a></p>
<p>A year later, and Forbes magazine produced a report in June 2011 about <a href="http://www.forbes.com/sites/bobcook/2011/06/17/gamechanger-and-narrative-science-fulfilling-the-heretofore-unrealized-demand-for-stilted-stories-about-childrens-game/">GameChanger and Narrative Science: Fulfilling the Heretofore Unrealized Demand for Stilted Stories About Children&#8217;s Games</a>, describing a tie-up between Narrative Science and <a href="http://www.gamechanger.io/">GameChanger</a>, a company that produces a scorekeeping app that allows sports fans, parents and coaches to capture data about a match.</p>
<p><em>(What other companies/apps are out there for crowdsourcing sports analytics in this way, I wonder?)</em></p>
<p>Using GameChanger data and Narrative Science story generation tools, it was possible to automate the creation of match reports for small audiences. I don&#8217;t know if these stories used to be freely accessible, but today the match reports appear to take the form of paywalled <em><a href="http://help.gamechanger.io/customer/portal/articles/354794-game-recap-stories">recap stories</a></em>.</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/recap-stories-commercial.png"><img src="http://ouseful.files.wordpress.com/2013/05/recap-stories-commercial.png?w=700" alt="recap stories commercial"   class="alignnone size-full wp-image-10710" /></a></p>
<p>Paywall aside, examples of other stories generated by Narrative Science using GameChanger data can be found using a simple web search on the phrase <a href="https://duckduckgo.com/?q=%22Powered+by+Narrative+Science+and+GameChanger+Media%22">&#8220;Powered by Narrative Science and GameChanger Media&#8221;</a></p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/powered-by-gamechanger-media.png"><img src="http://ouseful.files.wordpress.com/2013/05/powered-by-gamechanger-media.png?w=700&#038;h=651" alt="powered by gamechanger media" width="700" height="651" class="alignnone size-full wp-image-10709" /></a></p>
<p>You can also just search for the byline, as for example it appears in this report:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/narrative-science-byline.png"><img src="http://ouseful.files.wordpress.com/2013/05/narrative-science-byline.png?w=700" alt="Narrative science byline"   class="alignnone size-full wp-image-10708" /></a></p>
<p>In passing, it&#8217;ll be interesting to see how automatically generated stories start to feed into the <em>glitch aesthetic</em> (h/t @danmcquillan for introducing me to this phrase and the related notion of the <a href="http://www.wired.com/beyond_the_beyond/2012/04/an-essay-on-the-new-aesthetic/"><em>new aesthetic</em></a> in his presentation at #opentech last week).</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/glitch-aesthetic.png"><img src="http://ouseful.files.wordpress.com/2013/05/glitch-aesthetic.png?w=700" alt="GLitch aesthetic"   class="alignnone size-full wp-image-10714" /></a></p>
<p>September 2011 saw a media outlook report from Mediabistro&#8217;s <em>Media Jobs Daily</em> noting that <a href="http://www.mediabistro.com/mediajobsdaily/narrative-sciences-robot-journalists-now-tackling-real-estate_b8366">Narrative Science’s ‘Robot Journalists’ Now Tackling Real Estate</a>. The story links through to a page on Builder Online that provides a <a href="http://www.builderonline.com/local-housing-data/mid-atlantic/new-york-northern-new-jersey-long-island-ny-nj-pa.aspx">summary report of housing data</a> for various US cities.</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/canned-reporting.png"><img src="http://ouseful.files.wordpress.com/2013/05/canned-reporting.png?w=700" alt="Canned reporting"   class="alignnone size-full wp-image-10712" /></a></p>
<p>What this example, and the GameChanger example, show is how the generation of timely text stories can be automated on top of regularly updated datasets. The use of natural language interpretive text to describe patterns observed in the underlying data presumably also has SEO benefits.</p>
<p>That same month, September 2011, saw another stats-to-insight company, again emerging from the automated interpretation of sports data, renaming itself <a href="http://techcrunch.com/2011/09/12/statsheet-changes-name-to-automated-insights-lands-4-million/">from StatSheet to Automated Insights</a>. Today, StatSheet continues to publish <a href="http://statsheet.com/statblogs_mlb/boston-red-sox/boston-red-sox/game-recap/red-sox-fall-white-sox-3-1">game recaps</a> combining short natural language summaries with statistical charts, all of which are presumably automatically generated. Within a year, the parent company, <a href="http://automatedinsights.com/">Automated Insights</a>, had scaled up and begun publishing <a href="http://automatedinsights.com/yahoo">recaps for Yahoo!&#8217;s fantasy sports matches</a>.</p>
<div class="embed-vimeo"><iframe src="http://player.vimeo.com/video/56590321" width="640" height="360" frameborder="0" webkitAllowFullScreen mozallowfullscreen allowFullScreen></iframe></div>
<p>More recently, Automated Insights have started producing realtime content feeds to support sports commentators &#8211; <a href="http://realtime.automatedinsights.com/mlb">Real-time Insights for MLB</a> &#8211; as well as feeding consumers via the <a href="http://stat.us/">stat.us</a> powered Twitter feeds.</p>
<p><em>(See also: <a href="http://www.yseop.com/EN/home.php">yseop</a>, a French company that generates automated reports from data. [Any more?])</em></p>
<p>Fast forward to the start of 2013, and Narrative Science started publishing human readable prose reports based on US schools data (<a href="http://www.propublica.org/nerds/item/how-to-edit-52000-stories-at-once">ProPublica: <em>How To Edit 52,000 Stories at Once</em></a>). They&#8217;re also doing a lot more work with financial reporting, for example <a href="http://www.forbes.com/sites/narrativescience/">with Forbes</a> as well as for financial services clients, <a href="http://www.youtube.com/watch?v=qc5uxFTUYvw">as this interview with Narrative Science&#8217;s Stuart Frankel describes</a>.</p>
<p><a href="http://www.forbes.com/sites/narrativescience/"><img src="http://ouseful.files.wordpress.com/2013/05/narrative-science-forbes.png?w=700&#038;h=534" alt="narrative science forbes" width="700" height="534" class="alignnone size-full wp-image-10725" /></a></p>
<p>Generating human readable reports from Google Analytics data and dashboards also appears to be a hot topic, with both Narrative Science (<a href="http://www.datarunsdeep.com.au/blog/automated-insight-from-google-analytics-with-quill/">Automated Insight From Google Analytics With Quill</a>) and Automated Insights (<a href="http://techcrunch.com/2013/05/20/automated-insights-site-ai/">With Site Ai, Automated Insights Provides A Cliffs Notes Version Of Your Web Analytics</a>) recently developing tools around this topic.</p>
<p>What I thought was particularly interesting about the ProPublica example was how it suggests a possible widespread future use of &#8220;automatically generated insight&#8221; pulling out headline interpretations from open data sets, as touched on in this great <a href="http://vimeo.com/55439427">introductory technical presentation by Narrative Science&#8217;s Larry Adams</a> (which also happens to mention the possibility of Narrative Science offering platform services via an API&#8230;? It also mentions work with the NHS?):</p>
<div class="embed-vimeo"><iframe src="http://player.vimeo.com/video/55439427" width="700" height="394" frameborder="0" webkitAllowFullScreen mozallowfullscreen allowFullScreen></iframe></div>
<p>At one point during that presentation, Larry Adams suggests that Narrative Science use a small set of narrative templates or story types (&#8220;the horserace&#8221; for example, or &#8220;top 10&#8221;) to frame the construction of their stories, as well as mentioning the sorts of features that they look for within a data set (trends and changes in trends, for example, or outliers). Another presentation, this time by <a href="http://www.youtube.com/watch?v=VJUcChX9Y7w">Narrative Science&#8217;s Kris Hammond</a>, also hints at some of the features they look for in data: &#8220;inflexion points, trends, correlations&#8221;.</p>
<p>So what sorts of techniques might we use ourselves to start generating the insights that we might be able to work up into simple narrative sentences, at least for starters?</p>
<p>Top 10, bottom 5 are easy pickings if we can rank the data somehow. I thought this trick for detecting inflexions by coding a time series symbolically and then using a regular expression to detect features was really interesting: <a href="http://dahtah.wordpress.com/2013/05/17/finding-patterns-in-time-series-using-regular-expressions/">Finding patterns in time series using regular expressions</a>. And I wonder, how does the <a>OpenSecrets anomaly tracker </a> define the anomalies it detects?</p>
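<p>By way of a rough illustration of both ideas (this is my own toy version rather than the code from the linked post, and the function names, place names and numbers are invented), we might rank a set of scores to pull out a &#8220;top N&#8221;, and code each step of a time series as up, down or flat so that a regular expression can pick out a sustained rise followed by a sustained fall:</p>

```python
import re

def top_n(scores, n):
    """Return the n highest-scoring (name, value) pairs, best first."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]

def encode(series):
    """Code each step of the series as 'u' (up), 'd' (down) or 'f' (flat)."""
    symbols = []
    for prev, cur in zip(series, series[1:]):
        if cur > prev:
            symbols.append("u")
        elif cur == prev:
            symbols.append("f")
        else:
            symbols.append("d")
    return "".join(symbols)

def find_peaks(series, min_run=2):
    """Indices where at least min_run rises are followed by at least min_run falls."""
    pattern = "(u{%d,})(d{%d,})" % (min_run, min_run)
    # Symbol j describes the step from series[j] to series[j + 1], so the end of
    # the run of 'u' symbols is the index of the peak value in the series.
    return [m.end(1) for m in re.finditer(pattern, encode(series))]

# Hypothetical data for illustration only.
scores = {"Ayr": 12, "Bath": 30, "Cork": 7, "Deal": 21}
series = [1, 2, 4, 7, 6, 3, 3, 4]

leaders = top_n(scores, 2)
code = encode(series)
peaks = find_peaks(series)
sentence = "%s led the table with %d, ahead of %s on %d." % (leaders[0] + leaders[1])
```

<p>The last line hints at how a ranked result could be dropped into a canned sentence template, which is presumably a very crude version of the sort of thing the story generation tools do.</p>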
<p><em>Other posts you might be interested in:<br />
- <a href="http://blog.ouseful.info/2008/11/06/the-tesco-data-business-notes-on-scoring-points/">The Tesco Data Business &#8211; Notes on “Scoring Points”</a><br />
- <a href="http://blog.ouseful.info/2008/12/11/more-remarks-on-the-tesco-data-play/">More Remarks on the Tesco Data Play</a></em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2013/05/22/notes-on-narrative-science-and-automated-insight/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/abbd9f90565ce9ae4d065d93a81d8c03?s=96&amp;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">Tony Hirst</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/btn-and-narratve-science.png" medium="image">
			<media:title type="html">BTN and Narrative Science?</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/recap-stories-commercial.png" medium="image">
			<media:title type="html">recap stories commercial</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/powered-by-gamechanger-media.png" medium="image">
			<media:title type="html">powered by gamechanger media</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/narrative-science-byline.png" medium="image">
			<media:title type="html">Narrative science byline</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/glitch-aesthetic.png" medium="image">
			<media:title type="html">GLitch aesthetic</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/canned-reporting.png" medium="image">
			<media:title type="html">Canned reporting</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/narrative-science-forbes.png" medium="image">
			<media:title type="html">narrative science forbes</media:title>
		</media:content>
	<creativeCommons:license>http://creativecommons.org/licenses/by/2.0/</creativeCommons:license><feedburner:origLink>http://blog.ouseful.info/2013/05/22/notes-on-narrative-science-and-automated-insight/</feedburner:origLink></item>
		<item>
		<title>Are We Just Google’s Lab Rats?</title>
		<link>http://feeds.ouseful.info/~r/ouseful/~3/KO5nWf7UD_I/</link>
		<comments>http://blog.ouseful.info/2013/05/19/are-we-just-googles-rats/#comments</comments>
		<pubDate>Sun, 19 May 2013 11:19:24 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[Anything you want]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=10691</guid>
		<description>There are some interesting comments relating to my previous post on Google Lock-In Lock-Out in a comment thread on OSnews: Why Google gets so much credit. Here are some of my own lazy Sunday morning notes/thoughts relating to that, and other comments&amp;#8230; - killing Google Reader does not kill RSS/there was no &amp;#8220;malicious intent&amp;#8221; mapping [&amp;#8230;]</description>
				<content:encoded><![CDATA[<p>There are some interesting comments relating to my previous post on <a href="http://blog.ouseful.info/2013/05/16/google-lock-in/">Google Lock-In Lock-Out</a> in a <a href="http://www.osnews.com/comments/27049">comment thread on OSnews: <em>Why Google gets so much credit</em></a>. Here are some of my own lazy Sunday morning notes/thoughts relating to that, and other comments&#8230;</p>
<p><em>- killing Google Reader does not kill RSS/there was no &#8220;malicious intent&#8221; mapping out the Reader/RSS strategy:</em></p>
<p>A nice phrase in an #opentech talk yesterday was that we (technologists and engineers and data scientists, for example) have to &#8220;act responsibly&#8221;. Google Reader helped popularise feed reading when some of us were hopeful for its future (<a href="http://ouseful.open.ac.uk/blogarchive/010271.html">&#8220;We ignore RSS at OUr Peril&#8221;</a>), and as such attracted many readers away from other clients (myself included), with the result that competition was harder (&#8220;compete against Google? Hmm&#8230; maybe not&#8230;&#8221;). Google Reader&#8217;s infrastructure and unofficial APIs enabled folk to build services off the back of the Google Reader infrastructure, turning it into de facto infrastructure for other people&#8217;s applications and services. (Remember: <a href="http://www.wired.com/culture/lifestyle/news/2005/05/67514">the Google Maps API was unofficial at first</a>). There aren&#8217;t many OPML bundlers out there, for example, but for hackers who are into appropriating tech, Google Reader is one. Since I moved away from Google Reader (to <em>theoldreader</em>) I haven&#8217;t used Flipboard so much, which as far as I was concerned was using Reader essentially as infrastructure. <em>Caveat emptor</em>, I guess, for developers building on top of other companies&#8217; services (as many Twitter and Facebook app developers keep discovering).</p>
<p>With Feedburner, Google bought up a service that acted as a proxy, taking public syndication feeds, instrumenting them with analytics, and then encouraging the people taking up the syndicated content to subscribe to the Feedburner feed. Where RSS and Atom were designed to support syndication between independent parties, Feedburner &#8211; and then Google &#8211; insinuated itself between those parties. By replacing self-controlled feeds as the subscription endpoint with Google controlled endpoints, publishers gave up control of their syndication infrastructure. With Google losing interest in open syndication feeds as it pursues its own closed content network agenda, we are faced with a situation whereby Google can potentially trash a widespread syndication infrastructure that would have remained resilient if Google hadn&#8217;t insinuated itself into it. Or if we hadn&#8217;t been so stupid as to simplistically accept its overtures.</p>
<p><em>Hmmm&#8230; thinks&#8230; do we need a Google users&#8217; motto? <strong>Don&#8217;t be stupid</strong> perhaps&#8230;?!</em></p>
<p>I applaud Google for developing the services it does, getting them to scale and opening up API access. But as these services become <em>de facto</em> infrastructure, the question of how Google acknowledges any responsibility that flows from this (even if this responsibility is incorrectly assumed) becomes an issue. Responsibilities arise in other areas too, of course, such as taxation and corporate transparency. But that&#8217;s another issue. (Would Google act differently if its motto was <em>&#8220;Be responsible&#8221;</em> or <em>&#8220;Act responsibly&#8221;</em> rather than <em><a href="http://investor.google.com/corporate/code-of-conduct.html">&#8220;Don&#8217;t be evil&#8221;</a></em>? It strikes me that &#8220;Act responsibly&#8221; could work as a motto for both companies and their users.)</p>
<p>It seems to me that with Google+, Google is not adopting open syndication standards in two ways: not using them &#8220;internally&#8221;, and not making feeds publicly available. There may be good technical reasons for the first, but by the second Google is *not allowing* its community members to participate in an open content syndication network/system. Google&#8217;s choice, but I&#8217;m not playing.</p>
<p>Google is not killing the open standards by closing off access to them in commercial licensing terms, but it may contribute to stifling their adoption by adopting alternative standards that others feel they have to adopt because of the influence Google has on web traffic.</p>
<p>Consider this other way of looking at it &#8211; Google is presumably trying to get other parties to adopt WebP by developing it as an open standard. Google assumes that it can drive adoption of this as a web standard by adopting it itself. In terms of argumentation, it doesn&#8217;t follow that by not adopting something Google can prevent it being adopted (i.e. that by not adopting a standard, or by stopping its own use of one, Google kills it generally), but people follow bad logic all the time (and if they follow Google for their technology choices, or have a technology model based on being parasitic on Google infrastructure, Google&#8217;s dropping of a standard effectively kills it for those people)&#8230;</p>
<p><em>- control of what we see</em></p>
<p>Google makes money by putting ad-links that people click on in front of eyeballs. By presenting &#8220;relevant&#8221; ads, Google presumably tries to maximise the click-thru rate so that it can make more money per displayed link.</p>
<p>To encourage you to spend your attention on pages that Google controls, Google has adopted the idea that by presenting you (and me; us) with &#8220;relevant&#8221; content, we are likely to remain engaged. With Google web search, the relevance of search results supposedly attracts us back to the Google search tool. With services such as Google Now, Google pre-emptively tries to present you with information it thinks you need, presumably based on predictive models of sequences of action that other people (or you yourself) have demonstrated in the past.</p>
<p>I&#8217;m not really up on behavioural psychology models, but I have a vague memory that intermittent reinforcement schedules were demonstrated to be one of the more effective modes of behaviourist training/operant conditioning. So I wonder: how effective are predictive intermittent positive reinforcement schedules? (You get the idea, right? We&#8217;re pigeons that peck at Android phones and Google is the experimenter trying to get us to peck the right way, by reinforcing us every now and again by satisfying our intent. That is, has there been a flip away from Google using us to provide reinforcement training signals to its algorithms, into a situation in which we have become Google&#8217;s experimental lab rats that are coupled in a series of ongoing experiments that train us and its algorithms, jointly, together, to maximise&#8230; something&#8230;?)</p>
<p>There is a danger, I think, in Google chasing the &#8220;relevance&#8221; thing too far, seeing the maximisation of whatever conversion metrics it decides on as being a sign that it has &#8220;got things right&#8221; for us, that it is satisfying our &#8220;intent&#8221;. And if operant conditioning does influence the way we behave, maybe we do actually need to start thinking about what the machine algorithms are training us to do. Are training us to do. Training us.</p>
<p>Google&#8217;s stated aim is to &#8220;organize the world’s information and make it universally accessible and useful&#8221;.</p>
<p>- Through web search, it started to organise the information it presented to us through search results that were more appealingly ranked (seemed &#8220;more relevant&#8221;) than those of the other search engines.</p>
<p>- Through personalised search, it started to organise the way it presented results to each of us individually.</p>
<p>- Through web tracking, it presents us with information &#8211; adverts &#8211; organised in a way it presumably thinks is more personally meaningful to us (but maximising what metric exactly? More likely to cause us to act in a particular way, as measured by whether we click the link, or linger on a page, or engage in a particular behaviour that can be captured &#8211; for model building and exploitation purposes &#8211; by web tracking algorithms?)</p>
<p>- Through Google Now, and the new Google image gallery tools, Google is seeking to organise <em>our</em> information (we&#8217;re part of the world, right?) on our behalf and present it back to us in a way that the Google algorithms decide.</p>
<p>The old photos in a drawer back at my family home are sorted howsoever (by whatever algorithm &#8220;use&#8221; and random access results in). Now they&#8217;ll be sorted by Google. Maybe the algorithms are similar. Or maybe they&#8217;re not. What would be evil, I think, is if the ranking algorithms that are used to decide the order in which organic information is presented to us start to be influenced by the algorithms that are tied to advertising or marketing, that is, to algorithms that are used to try to maximise the extent to which we are influenced in accord with the goals, beliefs, desires and intents of others (with a hat tip there to agent logic and the theories of intelligent software agents).</p>
<p>At the moment I believe that Google believes it is trying to develop algorithms that benefit us personally, in a utilitarian way. But I&#8217;m not sure what function it is they are maximising or how they think it maps onto any personal theories or preferences we may have about what is &#8220;accessible&#8221; and &#8220;useful&#8221;. I guess we might also ask whether &#8220;accessible&#8221; and &#8220;useful&#8221; are the road to a Good Life (because in the end this comes down to <a href="http://www.google.com/about/company/philosophy/">philosophy</a> and ethics, doesn&#8217;t it?) or whether we should be &#8220;organising the world’s information&#8221; with some other purpose in mind?</p>
<p><em>PS Just by the by, it&#8217;s worth noting that the educational arena is seeking to use <em>learning analytics</em> to instrumentalise our behaviour and engagement within learning systems and contexts for our, erm, learning benefit. (Measured how?)</em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2013/05/19/are-we-just-googles-rats/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/abbd9f90565ce9ae4d065d93a81d8c03?s=96&amp;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">Tony Hirst</media:title>
		</media:content>
	<creativeCommons:license>http://creativecommons.org/licenses/by/2.0/</creativeCommons:license><feedburner:origLink>http://blog.ouseful.info/2013/05/19/are-we-just-googles-rats/</feedburner:origLink></item>
		<item>
		<title>Google Lock-In Lock-Out</title>
		<link>http://feeds.ouseful.info/~r/ouseful/~3/0dpzcgX7zwg/</link>
		<comments>http://blog.ouseful.info/2013/05/16/google-lock-in/#comments</comments>
		<pubDate>Thu, 16 May 2013 18:13:34 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[Anything you want]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=10673</guid>
		<description>As John Naughton feels obliged to remind folk every now and again, the web is not the internet. Because we all know that for many people, Facebook apparently is. Or Google is. And as anyone following my tweets over the last year or two will know, I&amp;#8217;ve started finding Google more and more irksome. It&amp;#8217;s [&amp;#8230;]</description>
				<content:encoded><![CDATA[<p>As John Naughton feels obliged to remind folk every now and again, <a href="http://www.guardian.co.uk/technology/2010/jun/20/internet-everything-need-to-know">the web is not the internet</a>. Because we all know that for many people, Facebook apparently is. Or Google is.</p>
<p>And as anyone following my tweets over the last year or two will know, I&#8217;ve started finding Google more and more irksome.</p>
<p>It&#8217;s not just that the one or two people I know who use Google Plus (Google+?) are now all but lost to me as sources of neat ideas because I don&#8217;t do Gooplus and it doesn&#8217;t do RSS&#8230;</p>
<p>It&#8217;s not just because Google is shutting down the Google Reader backbone that powers a lot of RSS and Atom syndication feed services (and leaves me wondering: how long is Feedburner for this world? Maybe it&#8217;s time to start moving your feeds and trying to get folk off that piece of infrastructure&#8230;)&#8230;</p>
<p>It&#8217;s not just that <a href="https://support.google.com/fusiontables/answer/171215?hl=en&amp;ref_topic=27017">geocoding done within Fusion Tables is not exported</a> &#8211; if you look at a KML feed from Google Fusion Tables, you&#8217;ll find there&#8217;s no lat-long data there. To get a geo-view, you need to stay within Google Fusion Tables or wire the feed into Google Earth, which will then &#8220;initiate geocoding of location descriptions while viewing [the] KML file&#8221;&#8230;</p>
<p>It&#8217;s not just that Google is <a href="http://support.google.com/drive/bin/answer.py?hl=en&amp;answer=2791335&amp;ctx=cb&amp;src=cb&amp;cbid=-au8fo9d68c25&amp;cbrank=0">deprecating gadgets from spreadsheets</a>, which as Martin points out means that <a href="http://mashe.hawksey.info/2013/05/punchcard-charts-in-google-sheets/">if I want to visualise data in a spreadsheet all I’m going to be left with is Google’s crappy charts</a>&#8230;</p>
<p>It&#8217;s not just that <a href="https://developers.google.com/google-apps/calendar/caldav">Google moved away from using CalDav</a> to support calendar interoperability&#8230; (<a href="http://googleblog.blogspot.com.au/2013/03/a-second-spring-of-cleaning.html">announcement</a>: <em>&#8220;CalDAV API will become available for whitelisted developers, and will be shut down for other developers on September 16, 2013. Most developers’ use cases are handled well by Google Calendar API, which we recommend using instead.&#8221;</em>)</p>
<p>It&#8217;s not just that <a href="https://news.ycombinator.com/item?id=5714557">Google is moving away from using the XMPP instant messaging protocol</a> (and <a href="http://mqtt.org/2011/08/mqtt-and-android-make-great-partners">nor</a>, I think, making a move towards using <a href="http://bits.blogs.nytimes.com/2013/04/25/a-messenger-for-the-internet-of-things/">MQTT</a>?)&#8230;</p>
<p>It&#8217;s not just that Google will be using your photos to create <a href="http://www.tomshardware.com/news/google-io-auto-awesome-photo-enhancement-highlights,22602.html">photos you never took</a> and presumably offer them up via your image gallery in favour of photos it thinks aren&#8217;t up to scratch&#8230;</p>
<p>Though I&#8217;m sure that Google wouldn&#8217;t start pushing images in <em>just</em> the <a href="http://www.webmonkey.com/2013/03/put-your-site-on-a-diet-with-googles-image-shrinking-webp-format/">WebP image format</a> so that you&#8217;d feel obliged to use Chrome&#8230;</p>
<p><em>And also in the browser, I&#8217;m sure Google wouldn&#8217;t start using <a href="https://developers.google.com/speed/public-dns/">Google Public DNS</a> as a Chrome default setting. (Is the same true of Chromebook? Presumably folk connected to <a href="https://fiber.google.com/about/">Google Fiber</a> use Google Public DNS?) But does it use <a href="http://blog.chromium.org/2012/01/making-web-speedier-and-safer-with-spdy.html">SPDY as a default</a>? How about <a href="http://blog.chromium.org/2013/03/data-compression-in-chrome-beta-for.html">on Android</a>?</em></p>
<p>It&#8217;s not just that <a href="http://www.technologyreview.com/news/514836/googles-social-network-gets-smarter/">Google will tag your social media posts</a> using tags you might never use yourself, and as it does so altering the externalised memory embodied by that post&#8230;</p>
<p>It&#8217;s not just that as web search gets increasingly personalised and localised, we lose any sense of <a href="http://blog.ouseful.info/2011/06/21/filter-bubbles-google-ground-truth-and-twitter-echochambers/">Google ground truth</a>; I&#8217;m not quite sure how the info-skills trainers are going to address this when training a motley crew of different learners to discover a particular resource other than by using known-item search strategies (which sort of misses the point). Or maybe it&#8217;s right that a cohort of students should all get different results when they run ostensibly the same search?</p>
<p><em>Hmmm.. thinks: if personalised/localised search could be reduced to raw search phrase (whatever I put in the search box) plus a set of invisible search limits that reflect the personalisation/localisation tweaks applied to my search, how might my hidden/invisible search limits compare with yours?</em></p>
<p>It&#8217;s not just that Google uses <a href="http://uk.reuters.com/article/2013/05/01/uk-tax-uk-google-specialreport-idUKBRE94005R20130501">tax efficient corporate structures</a> to <a href="http://www.parliamentlive.tv/Main/Player.aspx?meetingId=13138">minimise its tax bill</a>, because lots of companies do that&#8230;</p>
<p>It&#8217;s not just any one of these things, taken on its own merits&#8230; it&#8217;s all of them taken together&#8230;</p>
<p><em>&#8220;Embrace, extend, extinguish&#8221;</em>&#8230; where have we heard that before?</p>
<p>Drip; drip; drip&#8230;</p>
<p>PS see also M. Wunsch on <a href="http://blog.markwunsch.com/post/50588412660/on-google">The Great Google Goat Rodeo</a></p>
<p>PPS Although not an open standard, I forgot this one &#8211; <a href="http://redmondmag.com/articles/2013/03/27/sync-squabble.aspx">Google dropped support for the closed Microsoft ActiveSync protocol</a> (see also <a href="http://support.google.com/a/bin/answer.py?hl=en-uk&amp;hlrm=en&amp;answer=2716936">Google Sync End of Life</a>)</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2013/05/16/google-lock-in/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/abbd9f90565ce9ae4d065d93a81d8c03?s=96&amp;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">Tony Hirst</media:title>
		</media:content>
	<creativeCommons:license>http://creativecommons.org/licenses/by/2.0/</creativeCommons:license><feedburner:origLink>http://blog.ouseful.info/2013/05/16/google-lock-in/</feedburner:origLink></item>
		<item>
		<title>Asking Questions of Data Contained in a Google Spreadsheet Using a Basic Structured Query Language</title>
		<link>http://feeds.ouseful.info/~r/ouseful/~3/lKJ-EdaXBRk/</link>
		<comments>http://blog.ouseful.info/2013/05/15/asking-questions-of-data-contained-in-a-google-spreadsheet-using-a-basic-structured-query-language/#comments</comments>
		<pubDate>Wed, 15 May 2013 10:00:53 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[Infoskills]]></category>
		<category><![CDATA[School_Of_Data]]></category>
		<category><![CDATA[gspreadsheets]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=10652</guid>
		<description>There is an old saying along the lines of &amp;#8220;give a man a fish and you can feed him for a day; teach a man to fish and you&amp;#8217;ll feed him for a lifetime&amp;#8221;. The same is true when you learn a little bit about structured query languages&amp;#8230; In the post Asking Questions of Data [&amp;#8230;]</description>
				<content:encoded><![CDATA[<p>There is an old saying along the lines of &#8220;give a man a fish and you can feed him for a day; teach a man to fish and you&#8217;ll feed him for a lifetime&#8221;. The same is true when you learn a little bit about structured query languages&#8230; In the post <a href="http://schoolofdata.org/2013/05/13/asking-questions-of-data-some-simple-one-liners/">Asking Questions of Data – Some Simple One-Liners</a>, we saw how the SQL query language could be used to ask questions of an election-related dataset hosted on Scraperwiki that had been compiled by scraping a &#8220;Notice of Poll&#8221; PDF document containing information about election candidates. In this post, we&#8217;ll see how a series of queries constructed along very similar lines can be applied to data contained within a Google spreadsheet using the <a href="https://developers.google.com/chart/interactive/docs/querylanguage">Google Chart Tools Query Language</a>.</p>
<p>To provide some sort of context, I&#8217;ll stick with the local election theme, although in this case the focus will be on <em>election results</em> data. If you want to follow along, the data can be found in this Google spreadsheet &#8211; <a href="https://docs.google.com/spreadsheet/ccc?key=0AirrQecc6H_vdEZOZ21sNHpibnhmaEYxbW96dkNxZGc&amp;usp=sharing">Isle of Wight local election data results, May 2013</a> (the spreadsheet key is <tt>0AirrQecc6H_vdEZOZ21sNHpibnhmaEYxbW96dkNxZGc</tt>).</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/iw-poll-spreadsheet.png"><img src="http://ouseful.files.wordpress.com/2013/05/iw-poll-spreadsheet.png?w=700&#038;h=380" alt="IW Poll spreadsheet" width="700" height="380" class="alignnone size-full wp-image-10653" /></a></p>
<p>The data was obtained from <a href="http://onthewight.com/2013/05/03/isle-of-wight-election-results-2013-the-detail/">a dataset originally published by the OnTheWight hyperlocal blog</a> that was shaped and cleaned using OpenRefine using a data wrangling recipe similar to the one described in <a href="http://blog.ouseful.info/2013/05/03/a-wrangling-example-with-openrefine-making-ready-data/">A Wrangling Example With OpenRefine: Making “Oven Ready Data”</a>.</p>
<p>To query the data, I&#8217;ve popped up a simple query form on Scraperwiki: <a href="https://views.scraperwiki.com/run/google_spreadsheet_query/">Google Spreadsheet Explorer</a></p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/google-spreadsheet-explorer.png"><img src="http://ouseful.files.wordpress.com/2013/05/google-spreadsheet-explorer.png?w=700&#038;h=470" alt="Google spreadsheet explorer" width="700" height="470" class="alignnone size-full wp-image-10654" /></a></p>
<p>To use the explorer, you need to:</p>
<ol>
<li>provide a spreadsheet key value and optional sheet number (for example, <tt>0AirrQecc6H_vdEZOZ21sNHpibnhmaEYxbW96dkNxZGc</tt>);</li>
<li>preview the table headings;</li>
<li>construct a query using the column letters;</li>
<li>select the output format;</li>
<li>run the query.</li>
</ol>
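<p>If you&#8217;d rather script the same steps than use the form, something along the following lines might work as a starting point. This is only a rough sketch: the <tt>build_query_url</tt> helper is made up for illustration, and the endpoint address and the <tt>key</tt>, <tt>gid</tt>, <tt>tq</tt> and <tt>tqx</tt> parameter names are my assumptions about the query interface rather than anything shown in the explorer itself, so check them against the query language documentation before relying on them.</p>
<pre>
```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: the address of the spreadsheet query interface (not given in this post).
ENDPOINT = "https://spreadsheets.google.com/tq"


def build_query_url(key, query, sheet=0, out="csv"):
    """Hypothetical helper: build a URL that runs `query` against a spreadsheet."""
    params = {
        "key": key,           # step 1: the spreadsheet key...
        "gid": sheet,         # ...and the optional sheet number
        "tq": query,          # step 3: the query, written using column letters
        "tqx": "out:" + out,  # step 4: the output format
    }
    # urlencode escapes the spaces, quotes and other awkward characters for us
    return ENDPOINT + "?" + urlencode(params)


key = "0AirrQecc6H_vdEZOZ21sNHpibnhmaEYxbW96dkNxZGc"
query = "SELECT A,D,E,F WHERE E != 'NA' ORDER BY A,F DESC"
url = build_query_url(key, query)
print(url)

# Decoding the URL again gives us back the query we started with
decoded = parse_qs(urlsplit(url).query)
print(decoded["tq"][0])
```
</pre>
<p>Step 5, running the query, would then just be a matter of fetching that URL, assuming the spreadsheet is shared publicly.</p>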
<p>So what sort of questions might we want to ask of the data? Let&#8217;s build some up.</p>
<p>We might start by just looking at the raw results as they come out of the spreadsheet-as-database: <tt>SELECT A,D,E,F</tt></p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/simple-query1.png"><img src="http://ouseful.files.wordpress.com/2013/05/simple-query1.png?w=700&#038;h=377" alt="SImple query" width="700" height="377" class="alignnone size-full wp-image-10666" /></a></p>
<p>We might then want to look at each electoral division seeing the results in rank order: <tt>SELECT A,D,E,F WHERE E != 'NA' ORDER BY A,F DESC</tt></p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/results-in-order1.png"><img src="http://ouseful.files.wordpress.com/2013/05/results-in-order1.png?w=700&#038;h=352" alt="Results in order" width="700" height="352" class="alignnone size-full wp-image-10667" /></a></p>
<p>Let&#8217;s bring the spoiled vote count back in: <tt>SELECT A,D,E,F WHERE E != 'NA' OR D CONTAINS 'spoil'  ORDER BY A,F DESC</tt> (we might equally have said <tt>OR D = 'Papers spoilt'</tt>).</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/papers-spoilt-included.png"><img src="http://ouseful.files.wordpress.com/2013/05/papers-spoilt-included.png?w=700&#038;h=415" alt="Papers spoilt included" width="700" height="415" class="alignnone size-full wp-image-10662" /></a></p>
<p>How about doing some sums? How does the league table of postal ballot percentages look across each electoral division? <tt>SELECT A,100*F/B WHERE D CONTAINS 'Postal' ORDER BY 100*F/B DESC</tt></p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/postal-turnout.png"><img src="http://ouseful.files.wordpress.com/2013/05/postal-turnout.png?w=700&#038;h=314" alt="Postal Turnout" width="700" height="314" class="alignnone size-full wp-image-10659" /></a></p>
<p>Suppose we want to look at the turnout. The &#8220;NoONRoll&#8221; column B gives the number of people eligible to vote in each electoral division, which is a good start. Unfortunately, using the data in the spreadsheet we have, we can&#8217;t do this for all electoral divisions &#8211; the &#8220;votes cast&#8221; is not necessarily the number of people who voted because some electoral divisions (Brading, St Helens &amp; Bembridge and Nettlestone &amp; Seaview) returned <em>two</em> candidates (which meant people voting were each allowed to cast up to and including two votes; the number of people who voted was in the original OnTheWight dataset). If we bear this <em>caveat</em> in mind, we can run the numbers for the other electoral divisions though. The <tt>Total votes cast</tt> is actually the number of &#8220;good&#8221; votes cast &#8211; the turnout was actually the <tt>Total votes cast</tt> <em>plus</em> the <tt>Papers spoilt</tt>. Let&#8217;s start by calculating the &#8220;good vote turnout&#8221; for each ward, ranking the electoral divisions by turnout (<tt>ORDER BY 100*F/B DESC</tt>), labelling the turnout column appropriately (<tt>LABEL 100*F/B 'Percentage'</tt>) and formatting the results (<tt>FORMAT 100*F/B '#,#0.0'</tt>) using the query <tt>SELECT A, 100*F/B WHERE D CONTAINS 'Total' ORDER BY 100*F/B DESC LABEL 100*F/B 'Percentage' FORMAT 100*F/B '#,#0.0'</tt></p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/good-vote-turnout.png"><img src="http://ouseful.files.wordpress.com/2013/05/good-vote-turnout.png?w=700&#038;h=369" alt="Good vote turnout" width="700" height="369" class="alignnone size-full wp-image-10661" /></a></p>
<p>Remember, the first two results are &#8220;nonsense&#8221; because electors in those electoral divisions may have cast two votes.</p>
<p>How about the three electoral divisions with the lowest turnout? <tt>SELECT A, 100*F/B WHERE D CONTAINS 'Total' ORDER BY 100*F/B ASC LIMIT 3 LABEL 100*F/B 'Percentage' FORMAT 100*F/B '#,#0.0'</tt> (Note that the order of the arguments &#8211; such as where to put the <tt>LIMIT</tt> &#8211; is important; the wrong order can prevent the query from running&#8230;)</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/worst-3-by-turnout.png"><img src="http://ouseful.files.wordpress.com/2013/05/worst-3-by-turnout.png?w=700&#038;h=272" alt="Worst 3 by turnout" width="700" height="272" class="alignnone size-full wp-image-10669" /></a></p>
<p>The actual turnout (again, with the caveat in mind!) is the total votes cast plus the spoilt papers. To calculate this as a proportion of the electorate, we need to SUM the total and spoilt contributions in each electoral division and divide by the size of the electoral roll. Because multiple (two) rows are summed for each electoral division, we find the size of the electoral roll in each electoral division as SUM(B)/COUNT(B) &#8211; that is, we count it twice and divide by the number of times we counted it. The query (without tidying) starts off looking like this: <tt>SELECT A,SUM(F)*COUNT(B)/SUM(B) WHERE D CONTAINS 'Total' OR D CONTAINS 'spoil' GROUP BY A</tt></p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/summing-rows-in-a-group.png"><img src="http://ouseful.files.wordpress.com/2013/05/summing-rows-in-a-group.png?w=700&#038;h=333" alt="Summing rows in a group" width="700" height="333" class="alignnone size-full wp-image-10660" /></a></p>
<p>In terms of popularity, which 5 candidates received the largest number of votes? <tt>SELECT D,A, E, F WHERE E!='NA' ORDER BY F DESC LIMIT 5</tt></p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/top-5-by-votes-cast.png"><img src="http://ouseful.files.wordpress.com/2013/05/top-5-by-votes-cast.png?w=700&#038;h=295" alt="Top 5 by votes cast" width="700" height="295" class="alignnone size-full wp-image-10658" /></a></p>
<p>How about if we normalise these numbers by the number of people on the electoral roll in the corresponding areas &#8211; <tt>SELECT D,A, E, F/B WHERE E!='NA' ORDER BY F/B DESC LIMIT 5</tt></p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/top-5-as-percentage-on-roll.png"><img src="http://ouseful.files.wordpress.com/2013/05/top-5-as-percentage-on-roll.png?w=700&#038;h=291" alt="TOp 5 as percentage on roll" width="700" height="291" class="alignnone size-full wp-image-10657" /></a></p>
<p>Looking at the parties, how did the sum of their votes across all the electoral divisions compare? <tt>SELECT E,SUM(F) where E!='NA' GROUP BY E ORDER BY SUM(F) DESC</tt></p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/votes-by-party.png"><img src="http://ouseful.files.wordpress.com/2013/05/votes-by-party.png?w=700&#038;h=380" alt="VOtes by party" width="700" height="380" class="alignnone size-full wp-image-10656" /></a></p>
<p>How about if we bring in the number of candidates who stood for each party, and normalise by this to calculate the average &#8220;votes per candidate&#8221; by party? <tt>SELECT E,SUM(F),COUNT(F), SUM(F)/COUNT(F) where E!='NA' GROUP BY E ORDER BY SUM(F)/COUNT(F) DESC</tt></p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/average-votes-per-candidate.png"><img src="http://ouseful.files.wordpress.com/2013/05/average-votes-per-candidate.png?w=700&#038;h=319" alt="Average votes per candidate" width="700" height="319" class="alignnone size-full wp-image-10655" /></a></p>
<p>To summarise then, in this post, we have seen how we can use a structured query language to interrogate the data contained in a Google Spreadsheet, essentially treating the Google Spreadsheet as if it were a database. The query language can also be used to perform a series of simple calculations over the data to produce a derived dataset. Unfortunately, the query language does not allow us to nest SELECT statements in the same way we can nest SQL SELECT statements, which limits some of the queries we can run.</p>
<br />  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/ouseful.wordpress.com/10652/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/ouseful.wordpress.com/10652/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.ouseful.info&#038;blog=325417&#038;post=10652&#038;subd=ouseful&#038;ref=&#038;feed=1" width="1" height="1" /><div class="feedflare">
<a href="http://feeds.ouseful.info/~ff/ouseful?a=lKJ-EdaXBRk:8FJpZf_XL6c:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/ouseful?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.ouseful.info/~ff/ouseful?a=lKJ-EdaXBRk:8FJpZf_XL6c:2mJPEYqXBVI"><img src="http://feeds.feedburner.com/~ff/ouseful?d=2mJPEYqXBVI" border="0"></img></a> <a href="http://feeds.ouseful.info/~ff/ouseful?a=lKJ-EdaXBRk:8FJpZf_XL6c:gIN9vFwOqvQ"><img src="http://feeds.feedburner.com/~ff/ouseful?i=lKJ-EdaXBRk:8FJpZf_XL6c:gIN9vFwOqvQ" border="0"></img></a> <a href="http://feeds.ouseful.info/~ff/ouseful?a=lKJ-EdaXBRk:8FJpZf_XL6c:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/ouseful?i=lKJ-EdaXBRk:8FJpZf_XL6c:F7zBnMyn0Lo" border="0"></img></a> <a href="http://feeds.ouseful.info/~ff/ouseful?a=lKJ-EdaXBRk:8FJpZf_XL6c:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/ouseful?i=lKJ-EdaXBRk:8FJpZf_XL6c:V_sGLiPBpWU" border="0"></img></a> <a href="http://feeds.ouseful.info/~ff/ouseful?a=lKJ-EdaXBRk:8FJpZf_XL6c:cGdyc7Q-1BI"><img src="http://feeds.feedburner.com/~ff/ouseful?d=cGdyc7Q-1BI" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/ouseful/~4/lKJ-EdaXBRk" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2013/05/15/asking-questions-of-data-contained-in-a-google-spreadsheet-using-a-basic-structured-query-language/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/abbd9f90565ce9ae4d065d93a81d8c03?s=96&amp;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">Tony Hirst</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/iw-poll-spreadsheet.png" medium="image">
			<media:title type="html">IW Poll spreadsheet</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/google-spreadsheet-explorer.png" medium="image">
			<media:title type="html">Google spreadsheet explorer</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/simple-query1.png" medium="image">
			<media:title type="html">Simple query</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/results-in-order1.png" medium="image">
			<media:title type="html">Results in order</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/papers-spoilt-included.png" medium="image">
			<media:title type="html">Papers spoilt included</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/postal-turnout.png" medium="image">
			<media:title type="html">Postal Turnout</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/good-vote-turnout.png" medium="image">
			<media:title type="html">Good vote turnout</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/worst-3-by-turnout.png" medium="image">
			<media:title type="html">Worst 3 by turnout</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/summing-rows-in-a-group.png" medium="image">
			<media:title type="html">Summing rows in a group</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/top-5-by-votes-cast.png" medium="image">
			<media:title type="html">Top 5 by votes cast</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/top-5-as-percentage-on-roll.png" medium="image">
			<media:title type="html">Top 5 as percentage on roll</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/votes-by-party.png" medium="image">
			<media:title type="html">Votes by party</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/average-votes-per-candidate.png" medium="image">
			<media:title type="html">Average votes per candidate</media:title>
		</media:content>
	<creativeCommons:license>http://creativecommons.org/licenses/by/2.0/</creativeCommons:license><feedburner:origLink>http://blog.ouseful.info/2013/05/15/asking-questions-of-data-contained-in-a-google-spreadsheet-using-a-basic-structured-query-language/</feedburner:origLink></item>
		<item>
		<title>To What Extent Do Candidates Support Each Other Redux – A One-Liner, Thirty Second Route to the Info</title>
		<link>http://feeds.ouseful.info/~r/ouseful/~3/_RLgydzIRCM/</link>
		<comments>http://blog.ouseful.info/2013/05/08/to-what-extent-do-candidates-support-each-other-redux-a-one-liner-thirty-second-route-to-the-info/#comments</comments>
		<pubDate>Wed, 08 May 2013 10:50:40 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[Tinkering]]></category>
		<category><![CDATA[Tutorial]]></category>
		<category><![CDATA[schoolofdata]]></category>
		<category><![CDATA[scraperwiki]]></category>
		<category><![CDATA[sql]]></category>
		<category><![CDATA[sqlite]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=10643</guid>
		<description>In More Storyhunting Around Local Elections Data Using Gephi – To What Extent Do Candidates Support Each Other? I described a visual route to finding out which local council candidates had supported each other on their nomination papers. There is also a thirty second route to that data that I should probably have mentioned;-) From [&amp;#8230;]&lt;img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.ouseful.info&amp;#038;blog=325417&amp;#038;post=10643&amp;#038;subd=ouseful&amp;#038;ref=&amp;#038;feed=1" width="1" height="1" /&gt;</description>
				<content:encoded><![CDATA[<p>In <a href="http://blog.ouseful.info/2013/05/08/more-storyhunting-around-local-elections-data-using-gephi-to-what-extent-do-candidates-support-each-other/">More Storyhunting Around Local Elections Data Using Gephi – To What Extent Do Candidates Support Each Other?</a> I described a visual route to finding out which local council candidates had supported each other on their nomination papers. There is also a thirty second route to that data that I should probably have mentioned;-)</p>
<p>From the <a href="https://scraperwiki.com/scrapers/iw_poll_notices_scrape/">Scraperwiki database</a>, we need to interrogate the API:</p>
<p><a href="https://scraperwiki.com/scrapers/iw_poll_notices_scrape/"><img src="http://ouseful.files.wordpress.com/2013/05/scraperwiki-api.png?w=700" alt="scraperwiki api"   class="alignnone size-full wp-image-10645" /></a></p>
<p>To do this, we&#8217;ll use a database query language &#8211; SQL.</p>
<p>What we need to ask the database is which of the assentors (members of the <em>support</em> column) are also candidates (members of the <em>candinit</em> column), and just return those rows. The SQL command is simply this:</p>
<p><tt>select * from support where support in (select candinit from support)</tt></p>
<p>Note that &#8220;support&#8221; refers to two things here &#8211; these are columns:</p>
<p><tt>select <strong>*</strong> from support where <strong>support</strong> in (select <strong>candinit</strong> from support)</tt></p>
<p>and this is the table the columns are being pulled from:</p>
<p><tt>select * from <strong>support</strong> where support in (select candinit from <strong>support</strong>)</tt></p>
<p>Here&#8217;s the result of <em>Run</em>ning the query:</p>
<p><a href="https://scraperwiki.com/docs/api?name=iw_poll_notices_scrape#sqlite"><img src="http://ouseful.files.wordpress.com/2013/05/sql-select-on-scraperwiki.png?w=700" alt="sql select on scraperwiki"   class="alignnone size-full wp-image-10644" /></a></p>
<p>We can also get a <a href="https://api.scraperwiki.com/api/1.0/datastore/sqlite?format=htmltable&amp;name=iw_poll_notices_scrape&amp;query=select%20*%20from%20%60support%60%20where%20support%20in%20(select%20candinit%20from%20support)">direct link to a tabular view of the data</a> (or generate a link to a CSV output etc from the <em>format</em> selector).</p>
<p><a href="https://api.scraperwiki.com/api/1.0/datastore/sqlite?format=htmltable&amp;name=iw_poll_notices_scrape&amp;query=select%20*%20from%20%60support%60%20where%20support%20in%20(select%20candinit%20from%20support)"><img src="http://ouseful.files.wordpress.com/2013/05/candidates-mutual-table.png?w=700&#038;h=241" alt="candidates mutual table" width="700" height="241" class="alignnone size-full wp-image-10646" /></a></p>
<p>There are 15 rows in this result compared to the 15 edges/connecting lines discovered in the Gephi approach, so each method corroborates the other:</p>
<p><a href="http://blog.ouseful.info/2013/05/08/more-storyhunting-around-local-elections-data-using-gephi-to-what-extent-do-candidates-support-each-other/"><img src="http://ouseful.files.wordpress.com/2013/05/tidier-intra-candidate-support-map.png?w=700&#038;h=618" alt="Tidier intra-candidate support map" width="700" height="618" class="alignnone size-full wp-image-10602" /></a></p>
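<p>If you want to try the same query away from ScraperWiki, the following is a rough sketch using Python's built-in <tt>sqlite3</tt> module and an in-memory database. The rows are invented, and the two-column table is a simplifying assumption (the real <em>support</em> table may well have more columns); only the table name, the column names and the query itself come from the post.</p>

```python
import sqlite3

# In-memory stand-in for the scraper's database (assumed two-column layout).
conn = sqlite3.connect(":memory:")
conn.execute("create table support (candinit text, support text)")

# Invented rows: (candidate, assentor). Not real nomination data.
rows = [
    ("Alice B. Carter", "Dan E. Fox"),
    ("Alice B. Carter", "Gina H. Ives"),
    ("Dan E. Fox", "Alice B. Carter"),
    ("Dan E. Fox", "Jo K. Lee"),
]
conn.executemany("insert into support values (?, ?)", rows)

# The query from the post: keep rows whose assentor is also a candidate.
query = "select * from support where support in (select candinit from support)"
matches = conn.execute(query).fetchall()

for candidate, assentor in matches:
    print(assentor, "supports", candidate)
```

<p>In this toy data only two rows survive: the ones where the assentor's name also appears in the <em>candinit</em> column.</p>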
<p>Simples:-)</p>
<br />  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/ouseful.wordpress.com/10643/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/ouseful.wordpress.com/10643/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.ouseful.info&#038;blog=325417&#038;post=10643&#038;subd=ouseful&#038;ref=&#038;feed=1" width="1" height="1" /><div class="feedflare">
<a href="http://feeds.ouseful.info/~ff/ouseful?a=_RLgydzIRCM:8GF5KMGFZSs:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/ouseful?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.ouseful.info/~ff/ouseful?a=_RLgydzIRCM:8GF5KMGFZSs:2mJPEYqXBVI"><img src="http://feeds.feedburner.com/~ff/ouseful?d=2mJPEYqXBVI" border="0"></img></a> <a href="http://feeds.ouseful.info/~ff/ouseful?a=_RLgydzIRCM:8GF5KMGFZSs:gIN9vFwOqvQ"><img src="http://feeds.feedburner.com/~ff/ouseful?i=_RLgydzIRCM:8GF5KMGFZSs:gIN9vFwOqvQ" border="0"></img></a> <a href="http://feeds.ouseful.info/~ff/ouseful?a=_RLgydzIRCM:8GF5KMGFZSs:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/ouseful?i=_RLgydzIRCM:8GF5KMGFZSs:F7zBnMyn0Lo" border="0"></img></a> <a href="http://feeds.ouseful.info/~ff/ouseful?a=_RLgydzIRCM:8GF5KMGFZSs:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/ouseful?i=_RLgydzIRCM:8GF5KMGFZSs:V_sGLiPBpWU" border="0"></img></a> <a href="http://feeds.ouseful.info/~ff/ouseful?a=_RLgydzIRCM:8GF5KMGFZSs:cGdyc7Q-1BI"><img src="http://feeds.feedburner.com/~ff/ouseful?d=cGdyc7Q-1BI" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/ouseful/~4/_RLgydzIRCM" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2013/05/08/to-what-extent-do-candidates-support-each-other-redux-a-one-liner-thirty-second-route-to-the-info/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/abbd9f90565ce9ae4d065d93a81d8c03?s=96&amp;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">Tony Hirst</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/scraperwiki-api.png" medium="image">
			<media:title type="html">scraperwiki api</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/sql-select-on-scraperwiki.png" medium="image">
			<media:title type="html">sql select on scraperwiki</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/candidates-mutual-table.png" medium="image">
			<media:title type="html">candidates mutual table</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/tidier-intra-candidate-support-map.png" medium="image">
			<media:title type="html">Tidier intra-candidate support map</media:title>
		</media:content>
	<creativeCommons:license>http://creativecommons.org/licenses/by/2.0/</creativeCommons:license><feedburner:origLink>http://blog.ouseful.info/2013/05/08/to-what-extent-do-candidates-support-each-other-redux-a-one-liner-thirty-second-route-to-the-info/</feedburner:origLink></item>
		<item>
		<title>More Storyhunting Around Local Elections Data Using Gephi – To What Extent Do Candidates Support Each Other?</title>
		<link>http://feeds.ouseful.info/~r/ouseful/~3/ef_n0ke3NZQ/</link>
		<comments>http://blog.ouseful.info/2013/05/08/more-storyhunting-around-local-elections-data-using-gephi-to-what-extent-do-candidates-support-each-other/#comments</comments>
		<pubDate>Wed, 08 May 2013 09:05:06 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[Tinkering]]></category>
		<category><![CDATA[gephi]]></category>
		<category><![CDATA[schoolofdata]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=10599</guid>
		<description>In Questioning Election Data to See if It Has a Story to Tell I started to explore various ways in which we could start to search for stories in a dataset finessed out of a set of poll notices announcing the recent Isle of Wight Council elections. In this post, I&amp;#8217;ll do a little more [&amp;#8230;]&lt;img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.ouseful.info&amp;#038;blog=325417&amp;#038;post=10599&amp;#038;subd=ouseful&amp;#038;ref=&amp;#038;feed=1" width="1" height="1" /&gt;</description>
				<content:encoded><![CDATA[<p>In <a href="http://blog.ouseful.info/2013/05/05/questioning-election-data-to-see-if-it-has-a-story-to-tell/">Questioning Election Data to See if It Has a Story to Tell</a> I started to explore various ways in which we could start to search for stories in a dataset finessed out of a set of poll notices announcing the recent Isle of Wight Council elections. In this post, I&#8217;ll do a little more questioning, especially around the assentors (proposers, seconders etc) who supported each candidate, looking to see whether there are any social structures in there resulting from candidates supporting each others&#8217; applications. The essence of what we&#8217;re doing is some simple social network analysis around the candidate/assentor network. (For an alternative route to the result, see <a href="http://blog.ouseful.info/2013/05/08/to-what-extent-do-candidates-support-each-other-redux-a-one-liner-thirty-second-route-to-the-info/">To What Extent Do Candidates Support Each Other Redux – A One-Liner, Thirty Second Route to the Info</a>.)</p>
<p>This is what we&#8217;ll be working towards:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/tidier-intra-candidate-support-map.png"><img src="http://ouseful.files.wordpress.com/2013/05/tidier-intra-candidate-support-map.png?w=700&#038;h=618" alt="Tidier intra-candidate support map" width="700" height="618" class="alignnone size-full wp-image-10602" /></a></p>
<p>If you want to play along, you can get the data from my <a href="https://scraperwiki.com/scrapers/iw_poll_notices_scrape/">IW poll notices scrape</a> on ScraperWiki, specifically the <em>support</em> table.</p>
<p><a href="https://scraperwiki.com/scrapers/iw_poll_notices_scrape/"><img src="http://ouseful.files.wordpress.com/2013/05/scraperwiki-council-elections-assentors.png?w=700&#038;h=275" alt="scraperwiki council elections - assentors" width="700" height="275" class="alignnone size-full wp-image-10620" /></a></p>
<p>Here&#8217;s a reminder of what the <a href="http://www.iwight.com/azservices/documents/1174-Notice%20of%20Poll%20-%20IOWC%20May%202013.pdf">original PDF</a> doc looked like (<a href="https://dl.dropboxusercontent.com/u/1156404/1174-Notice%20of%20Poll%20-%20IOWC%20May%202013.pdf">archive copy</a>):</p>
<p><a href="http://www.iwight.com/azservices/documents/1174-Notice%20of%20Poll%20-%20IOWC%20May%202013.pdf"><img src="http://ouseful.files.wordpress.com/2013/05/iw-poll-notice-assentors.png?w=700&#038;h=538" alt="IW poll notice assentors" width="700" height="538" class="alignnone size-full wp-image-10619" /></a></p>
<p>Checking the extent to which candidates supported each other is something we could do by hand, looking down each candidate&#8217;s list of  assentors for names of other candidates, but it would be a laborious job. It&#8217;s far easier(?!;-) to automate it&#8230;</p>
<p>When we want to compare names using a computer programme or script, the simplest approach is to do an <strong>exact string match</strong> (a <em>string</em> is a list of characters). Two strings match if they are exactly the same, so for example: <em>This string</em> is the same as <em>This string</em>, but not <em>this string</em> (they differ in their first character &#8211; upper case <em>T</em> in the first example as compared with lower case <em>t</em> in the last). We&#8217;ll be using exact string matching to identify whether a candidate has the same name as any of the assentors, so on the scraper, I did a little fiddling around with the names, in particular generating a new column that recasts the name of the candidate into the same presentation form used to identify the assentors (<em>Firstname I. Lastname</em>).</p>
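<p>As an illustration of the kind of fiddling involved, here is a small Python sketch. It assumes, purely for the sake of example, that candidate names arrive as <em>Lastname, Firstname Middlename</em>; the function name <tt>to_assentor_form</tt> and the names used are hypothetical, and the actual scraper may do this differently.</p>

```python
def to_assentor_form(candidate_name):
    """Recast 'Lastname, Firstname Middlename' as 'Firstname M. Lastname'.

    The input format is an assumption made for this sketch.
    """
    last, _, given = candidate_name.partition(",")
    parts = given.split()
    first, middles = parts[0], parts[1:]
    initials = [m[0].upper() + "." for m in middles]
    return " ".join([first] + initials + [last.strip()])

# Invented names, not taken from the poll notices.
assentors = {"Jane Q. Example", "Sam T. Sample"}
recast = to_assentor_form("Example, Jane Quinn")

# Exact string matching is case sensitive: one differing character and the match fails.
is_match = recast in assentors
is_lowercase_match = recast.lower() in assentors
```

<p>Here <tt>is_match</tt> is true, but the lower-cased version of the same name does not match, which is why getting the presentation form exactly right matters.</p>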
<p>We can download a <a href="https://api.scraperwiki.com/api/1.0/datastore/sqlite?format=csv&amp;name=iw_poll_notices_scrape&amp;query=select+*+from+`support`&amp;apikey=">CSV representation of the data</a> from the scraper directly:</p>
<p><a href="https://api.scraperwiki.com/api/1.0/datastore/sqlite?format=csv&amp;name=iw_poll_notices_scrape&amp;query=select+*+from+`support`&amp;apikey="><img src="http://ouseful.files.wordpress.com/2013/05/scraperwiki-csv-download.png?w=700" alt="Scraperwiki CSV download"   class="alignnone size-full wp-image-10626" /></a></p>
<p>The first thing I want to explore is the extent to which candidates support other candidates to see if we can identify any political groupings. The tool I&#8217;m going to use to visualise the data is Gephi, an open-source cross-platform application (requires Java) that you can download for free from <a href="http://gephi.org">gephi.org</a>.</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/gephi-org.png"><img src="http://ouseful.files.wordpress.com/2013/05/gephi-org.png?w=700&#038;h=337" alt="Gephi.org" width="700" height="337" class="alignnone size-full wp-image-10622" /></a></p>
<p>To view the data in Gephi, it&#8217;s easiest if we rename a couple of columns so that Gephi can recognise relations between supporters and candidates; if we open the CSV download file in a text editor, we can rename the <em>candinit</em> column as <em>Target</em> and the <em>support</em> column as <em>Source</em> to represent an arrow going from an assentor to a candidate, where the arrow reads something along the lines of &#8220;is a supporter of&#8221;.</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/csv-rename.png"><img src="http://ouseful.files.wordpress.com/2013/05/csv-rename.png?w=700" alt="csv rename"   class="alignnone size-full wp-image-10618" /></a></p>
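<p>If you would rather not edit the file by hand, the renaming can be scripted. This is a minimal Python sketch that assumes the assentor names live in the <em>support</em> column and the candidate names in <em>candinit</em>; the <tt>party</tt> column, the sample row and the function name <tt>rename_header</tt> are made up for the example.</p>

```python
import csv
import io

# Assumed mapping: assentor column becomes Source, candidate column becomes Target.
RENAMES = {"candinit": "Target", "support": "Source"}

def rename_header(csv_text, renames=RENAMES):
    """Return the CSV text with header names swapped according to `renames`."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    rows[0] = [renames.get(name, name) for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()

# Invented sample data with a hypothetical extra column.
sample = "candinit,support,party\nJane Q. Example,Sam T. Sample,Party A\n"
renamed = rename_header(sample)
print(renamed)
```

<p>Only the header row changes; the data rows pass through untouched.</p>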
<p>Start Gephi, select Data Laboratory tab and then New Project from the File menu.</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/geohi-data-lab-new-project.png"><img src="http://ouseful.files.wordpress.com/2013/05/geohi-data-lab-new-project.png?w=700&#038;h=279" alt="geohi data lab new project" width="700" height="279" class="alignnone size-full wp-image-10617" /></a></p>
<p>You should now see a toolbar that includes an &#8220;Import Spreadsheet&#8221; option:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/gephi-import-spreadsheet.png"><img src="http://ouseful.files.wordpress.com/2013/05/gephi-import-spreadsheet.png?w=700&#038;h=49" alt="gephi import spreadsheet" width="700" height="49" class="alignnone size-full wp-image-10616" /></a></p>
<p>Import the CSV file as such, identifying it as an <em>Edges Table</em>:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/import-data-into-gephi-ata-laboaratory.png"><img src="http://ouseful.files.wordpress.com/2013/05/import-data-into-gephi-ata-laboaratory.png?w=700" alt="import data into gephi data laboaratory"   class="alignnone size-full wp-image-10615" /></a></p>
<p>You should notice that the Source and Target columns have been identified as such and we have the choice to import the other columns or not &#8211; let&#8217;s bring them in&#8230;</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/source-and-target-recognised.png"><img src="http://ouseful.files.wordpress.com/2013/05/source-and-target-recognised.png?w=700" alt="SOurce and Target recognised"   class="alignnone size-full wp-image-10614" /></a></p>
<p>You should now see the data has been loaded in to Gephi&#8230;</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/data-loaded-in.png"><img src="http://ouseful.files.wordpress.com/2013/05/data-loaded-in.png?w=700" alt="Data loaded in"   class="alignnone size-full wp-image-10613" /></a></p>
<p>If you click on the <em>Overview</em> tab button, you should see a mass of nodes/circles representing candidates and assentors with arrows going from assentors to candidates.</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/mess1.png"><img src="http://ouseful.files.wordpress.com/2013/05/mess1.png?w=700" alt="mess..."   class="alignnone size-full wp-image-10628" /></a></p>
<p>Let&#8217;s see how they connect &#8211; we can <em>Run</em> the <em>Force Atlas 2</em> <strong>Layout</strong> algorithm for starters. I tweaked the <em>Scaling</em> value and ticked on <em>Stronger Gravity</em> to help shape the resulting layout:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/force-layout-tweaks.png"><img src="http://ouseful.files.wordpress.com/2013/05/force-layout-tweaks.png?w=700&#038;h=492" alt="force layout tweaks" width="700" height="492" class="alignnone size-full wp-image-10611" /></a></p>
<p>If you look closely, you&#8217;ll be able to see that there are many separate groupings of connected circles &#8211; these represent candidates who are supported by folk who are not also candidates (sometimes a node sits on top of a line so it looks as if two nodes are connected when in fact they aren&#8217;t&#8230;)</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/close-up-simple-patterns1.png"><img src="http://ouseful.files.wordpress.com/2013/05/close-up-simple-patterns1.png?w=700" alt="Close up simple patterns"   class="alignnone size-full wp-image-10629" /></a></p>
<p>However, there are also other groupings in which one candidate may support another:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/candidate-support1.png"><img src="http://ouseful.files.wordpress.com/2013/05/candidate-support1.png?w=700" alt="candidate support"   class="alignnone size-full wp-image-10630" /></a></p>
<p>These connections may allow us to see grouping of candidates supporting each other along party lines.</p>
<p>One of the powerful things about Gephi is that it allows us to construct quite complex, nested filters that we can apply to the data based on the properties of the network the data describes, so that we can focus on particular aspects of the network. I&#8217;m going to filter the network so that it shows only those individuals who are supported by at least one person (in-degree 1 or more) <em>and</em> who support at least one person (out-degree 1 or more) &#8211; that is, folk who are candidates (in-degree 1 or more) who also supported (out-degree 1 or more) another candidate. Let&#8217;s also turn labels on to see which candidates the filter identifies, and colour the edges along party lines. We can now see some information about the connectedness a little more clearly:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/lots-going-on.png"><img src="http://ouseful.files.wordpress.com/2013/05/lots-going-on.png?w=700&#038;h=483" alt="lots going on" width="700" height="483" class="alignnone size-full wp-image-10608" /></a></p>
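<p>The degree filter described above can also be expressed in a few lines of Python. This is only a sketch of the idea, not how Gephi implements it; edges are assumed to be (assentor, candidate) pairs, and the single-letter names are placeholders.</p>

```python
from collections import Counter

def candidates_who_also_support(edges):
    """Return names with in-degree >= 1 and out-degree >= 1.

    `edges` is a list of (source, target) pairs, read as
    "source is a supporter of target".
    """
    out_degree = Counter(source for source, _ in edges)
    in_degree = Counter(target for _, target in edges)
    return sorted(name for name in out_degree if in_degree[name] >= 1)

# Placeholder network: A supports B, B supports C, D supports C, E supports A.
edges = [("A", "B"), ("B", "C"), ("D", "C"), ("E", "A")]
both = candidates_who_also_support(edges)
```

<p>In the placeholder network only A and B both receive and give support: C is supported but supports nobody, while D and E support someone but are not candidates themselves.</p>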
<p>Hmmm.. how about if we extend our filter to see who&#8217;s connected to these nodes (this might include other candidates who do not themselves assent to another candidate), and also resize the nodes/labels so we can better see the candidates&#8217; names? The Neighbours Network filter takes the nodes we have and then also finds the nodes that are connected to them, to depth 2 in this case (that is, it brings in nodes connected to the candidates who are also supporters (depth 1), and the nodes connected to those nodes (depth 2)). Which is to say, it will bring in the candidates who are supported by candidates, and their supporters:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/a-few-more-tweaks.png"><img src="http://ouseful.files.wordpress.com/2013/05/a-few-more-tweaks.png?w=700&#038;h=401" alt="A few more tweaks" width="700" height="401" class="alignnone size-full wp-image-10607" /></a></p>
<p>That&#8217;s a bit clearer, but there are still overlapping lines, so it may make sense to layout the network again:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/improve-the-layout.png"><img src="http://ouseful.files.wordpress.com/2013/05/improve-the-layout.png?w=700&#038;h=390" alt="improve the layout" width="700" height="390" class="alignnone size-full wp-image-10606" /></a></p>
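<p>The neighbours step used a little earlier can be sketched in Python as a breadth-limited walk out from a set of starting nodes. The sketch assumes edges are followed regardless of their direction, which may not match Gephi's behaviour exactly; the function name <tt>neighbours_network</tt> and the node names are placeholders.</p>

```python
from collections import defaultdict

def neighbours_network(edges, seeds, depth):
    """Return the seed nodes plus everything within `depth` hops of them.

    Edge direction is ignored here (an assumption made for this sketch).
    """
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)

    reached = set(seeds)
    frontier = set(seeds)
    for _ in range(depth):
        # Step one hop further out, keeping only nodes not already seen.
        frontier = {n for node in frontier for n in adjacency[node]} - reached
        reached |= frontier
    return reached

# Placeholder chain: A - B - C - D - E
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
within_two = neighbours_network(edges, {"A"}, depth=2)
```

<p>Starting from A with depth 2, the walk reaches B and C but stops short of D and E.</p>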
<p>We can also experiment with other colourings &#8211; if we go to the Statistics panel, we can run a <em>Connected Components</em> filter that tries to find nodes that are connected into distinct groups. We can then colour each of the separate groups uniquely:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/colour-the-groups.png"><img src="http://ouseful.files.wordpress.com/2013/05/colour-the-groups.png?w=700&#038;h=410" alt="colour the groups" width="700" height="410" class="alignnone size-full wp-image-10634" /></a></p>
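<p>To give a feel for what finding connected groups involves, here is a small Python sketch that collects nodes into groups by following edges in either direction (so-called weakly connected components). It illustrates the idea rather than Gephi's own implementation, and the node names are placeholders.</p>

```python
def connected_components(edges):
    """Group nodes so that any two nodes in a group are linked by some path.

    Edge direction is ignored, so these are weakly connected components.
    """
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        group = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adjacency[node] - group)
        seen |= group
        components.append(group)
    return components

# Placeholder network with two separate groups.
edges = [("A", "B"), ("B", "C"), ("D", "E")]
components = connected_components(edges)
```

<p>Each group found this way could then be given its own colour, which is the effect shown in the screenshot above.</p>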
<p>Let&#8217;s reset the colours and go back to colourings along party lines:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/gephi-reset-colours.png"><img src="http://ouseful.files.wordpress.com/2013/05/gephi-reset-colours.png?w=700" alt="Gephi reset colours"   class="alignnone size-full wp-image-10633" /></a></p>
<p>If we go to the <em>Preview</em> view, we can generate a prettified view of the network:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/preview-layout.png"><img src="http://ouseful.files.wordpress.com/2013/05/preview-layout.png?w=700&#038;h=450" alt="Preview layout" width="700" height="450" class="alignnone size-full wp-image-10605" /></a></p>
<p>In it, we can clearly see groupings along party lines (inside the blue boxes). There is something odd, though? There appears to be a connection between UKIP and Independent groupings? Let&#8217;s zoom in:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/this-is-odd.png"><img src="http://ouseful.files.wordpress.com/2013/05/this-is-odd.png?w=700" alt="this is odd"   class="alignnone size-full wp-image-10604" /></a></p>
<p>Going back to the Graph view and zooming in, we see that <em>Paul G. Taylor</em> appears to be supporting two candidates of different parties&#8230; Hmm &#8211; I wonder: are there actually <em>two</em> Paul G. Taylors, with different political preferences? (Note to self: check on Electoral Commission website what regulations there are about assenting. Can you only assent to one person, and then only within the ward in which you are registered to vote? For local elections, could you be registered to vote in more than one electoral division within the same council area?)</p>
<p>To check that there are no other names that support more than one candidate, we can create another, simple filter that just selects nodes with out-degree 2 or more &#8211; that is, who support 2 or more other nodes:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/filter-on-nodes-out-degree-2.png"><img src="http://ouseful.files.wordpress.com/2013/05/filter-on-nodes-out-degree-2.png?w=700" alt="Filter on nodes out degree 2"   class="alignnone size-full wp-image-10600" /></a></p>
<p>Just that one then&#8230;</p>
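<p>The same check can be made against the data directly with a grouped SQL query. The sketch below runs it through Python's built-in <tt>sqlite3</tt> module on an in-memory table; the rows are invented and the two-column layout is an assumption, with only the table and column names borrowed from the scraper.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table support (candinit text, support text)")

# Invented rows: (candidate, assentor). Not real nomination data.
conn.executemany("insert into support values (?, ?)", [
    ("Candidate One", "Sam T. Sample"),
    ("Candidate Two", "Sam T. Sample"),
    ("Candidate Two", "Jo K. Lee"),
])

# Names that appear as an assentor for two or more different candidates.
query = """
select support, count(distinct candinit) as supported
from support
group by support
having count(distinct candinit) >= 2
"""
multi = conn.execute(query).fetchall()
```

<p>In the invented data, one name turns up against two different candidates, which is the pattern the out-degree filter is looking for.</p>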
<p>Looking at the fuller chart, it&#8217;s still rather scruffy. We could tidy it by removing assentors who are not themselves candidates (that is, there are no arrows pointing in to them). The way Gephi filters work supports chaining. If you look at the filters, you will see they are nested, much like a nested comment thread in a forum. Filters at the bottom of the tree act on the graph and pass the network as filtered so far up the tree to the next filter. This means we can pass the network as shown above into another filter layer that removes folk who are &#8220;just&#8221; assentors and not candidates.</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/nested-filters.png"><img src="http://ouseful.files.wordpress.com/2013/05/nested-filters.png?w=700" alt="nested filters"   class="alignnone size-full wp-image-10601" /></a></p>
<p>Here&#8217;s the result:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/nesting-filters-in-gephi.png"><img src="http://ouseful.files.wordpress.com/2013/05/nesting-filters-in-gephi.png?w=700&#038;h=396" alt="Nesting filters in gephi" width="700" height="396" class="alignnone size-full wp-image-10603" /></a></p>
<p>And again we can go into Preview mode to generate a nice vectorised version of the graph:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/tidier-intra-candidate-support-map.png"><img src="http://ouseful.files.wordpress.com/2013/05/tidier-intra-candidate-support-map.png?w=700&#038;h=618" alt="Tidier intra-candidate support map" width="700" height="618" class="alignnone size-full wp-image-10602" /></a></p>
<p>This quite clearly shows several mutual support networks between Labour candidates (red edges), Conservative candidates (blue edges), independents (black edges) and a large grouping of UKIP candidates (purple edges).</p>
<p>So there we have it: a quick tour of how to use Gephi to look at the co-support structure of a group of local election candidates. Were the highlighted candidates to be successful in their election, it could signify possible factions or groupings within the council, particularly amongst the independents? Along the way we saw how to make use of filters, and spotted something we need to check (whether the same person supported two candidates (if that isn&#8217;t allowed?) or whether they are two different people sharing the same name).</p>
<p>If this all seems like too much effort, remember that there&#8217;s always the <a href="http://blog.ouseful.info/2013/05/08/to-what-extent-do-candidates-support-each-other-redux-a-one-liner-thirty-second-route-to-the-info/">One-Liner, Thirty Second Route to the Info</a>.</p>
<p>PS by the by, a recent FOI request on WhatDoTheyKnow suggests another possible line of enquiry around possible candidates &#8211; if they have been elected to the council before, <a href="https://www.whatdotheyknow.com/request/charles_chapman_former_councillo">how good was their attendance record</a>? (I don&#8217;t think OpenlyLocal scrapes this information? Presumably it is available somewhere on the council website?)</p>
<br />  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/ouseful.wordpress.com/10599/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/ouseful.wordpress.com/10599/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.ouseful.info&#038;blog=325417&#038;post=10599&#038;subd=ouseful&#038;ref=&#038;feed=1" width="1" height="1" /><div class="feedflare">
<a href="http://feeds.ouseful.info/~ff/ouseful?a=ef_n0ke3NZQ:1W_WEBU6WSc:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/ouseful?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.ouseful.info/~ff/ouseful?a=ef_n0ke3NZQ:1W_WEBU6WSc:2mJPEYqXBVI"><img src="http://feeds.feedburner.com/~ff/ouseful?d=2mJPEYqXBVI" border="0"></img></a> <a href="http://feeds.ouseful.info/~ff/ouseful?a=ef_n0ke3NZQ:1W_WEBU6WSc:gIN9vFwOqvQ"><img src="http://feeds.feedburner.com/~ff/ouseful?i=ef_n0ke3NZQ:1W_WEBU6WSc:gIN9vFwOqvQ" border="0"></img></a> <a href="http://feeds.ouseful.info/~ff/ouseful?a=ef_n0ke3NZQ:1W_WEBU6WSc:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/ouseful?i=ef_n0ke3NZQ:1W_WEBU6WSc:F7zBnMyn0Lo" border="0"></img></a> <a href="http://feeds.ouseful.info/~ff/ouseful?a=ef_n0ke3NZQ:1W_WEBU6WSc:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/ouseful?i=ef_n0ke3NZQ:1W_WEBU6WSc:V_sGLiPBpWU" border="0"></img></a> <a href="http://feeds.ouseful.info/~ff/ouseful?a=ef_n0ke3NZQ:1W_WEBU6WSc:cGdyc7Q-1BI"><img src="http://feeds.feedburner.com/~ff/ouseful?d=cGdyc7Q-1BI" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/ouseful/~4/ef_n0ke3NZQ" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2013/05/08/more-storyhunting-around-local-elections-data-using-gephi-to-what-extent-do-candidates-support-each-other/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/abbd9f90565ce9ae4d065d93a81d8c03?s=96&amp;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">Tony Hirst</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/tidier-intra-candidate-support-map.png" medium="image">
			<media:title type="html">Tidier intra-candidate support map</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/scraperwiki-council-elections-assentors.png" medium="image">
			<media:title type="html">scraperwiki council elections - assentors</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/iw-poll-notice-assentors.png" medium="image">
			<media:title type="html">IW poll notice assentors</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/scraperwiki-csv-download.png" medium="image">
			<media:title type="html">Scraperwiki CSV download</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/gephi-org.png" medium="image">
			<media:title type="html">Gephi.org</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/csv-rename.png" medium="image">
			<media:title type="html">csv rename</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/geohi-data-lab-new-project.png" medium="image">
			<media:title type="html">geohi data lab new project</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/gephi-import-spreadsheet.png" medium="image">
			<media:title type="html">gephi import spreadsheet</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/import-data-into-gephi-ata-laboaratory.png" medium="image">
			<media:title type="html">import data into gephi data laboaratory</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/source-and-target-recognised.png" medium="image">
			<media:title type="html">SOurce and Target recognised</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/data-loaded-in.png" medium="image">
			<media:title type="html">Data loaded in</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/mess1.png" medium="image">
			<media:title type="html">mess...</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/force-layout-tweaks.png" medium="image">
			<media:title type="html">force layout tweaks</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/close-up-simple-patterns1.png" medium="image">
			<media:title type="html">Close up simple patterns</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/candidate-support1.png" medium="image">
			<media:title type="html">candidate support</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/lots-going-on.png" medium="image">
			<media:title type="html">lots going on</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/a-few-more-tweaks.png" medium="image">
			<media:title type="html">A few more tweaks</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/improve-the-layout.png" medium="image">
			<media:title type="html">improve the layout</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/colour-the-groups.png" medium="image">
			<media:title type="html">colour the groups</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/gephi-reset-colours.png" medium="image">
			<media:title type="html">Gephi reset colours</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/preview-layout.png" medium="image">
			<media:title type="html">Preview layout</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/this-is-odd.png" medium="image">
			<media:title type="html">this is odd</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/filter-on-nodes-out-degree-2.png" medium="image">
			<media:title type="html">Filter on nodes out degree 2</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/nested-filters.png" medium="image">
			<media:title type="html">nested filters</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/nesting-filters-in-gephi.png" medium="image">
			<media:title type="html">Nesting filters in gephi</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/tidier-intra-candidate-support-map.png" medium="image">
			<media:title type="html">Tidier intra-candidate support map</media:title>
		</media:content>
	<creativeCommons:license>http://creativecommons.org/licenses/by/2.0/</creativeCommons:license><feedburner:origLink>http://blog.ouseful.info/2013/05/08/more-storyhunting-around-local-elections-data-using-gephi-to-what-extent-do-candidates-support-each-other/</feedburner:origLink></item>
		<item>
		<title>Ephemeral Citations – When Presentations You Have Cited Vanish from the Public Web</title>
		<link>http://feeds.ouseful.info/~r/ouseful/~3/p_6KuJOnySI/</link>
		<comments>http://blog.ouseful.info/2013/05/07/ephemeral-citations-when-presentations-you-have-cited-vanish-from-the-public-web/#comments</comments>
		<pubDate>Tue, 07 May 2013 17:35:44 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[Infoskills]]></category>
		<category><![CDATA[digischol]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=10595</guid>
		<description>A couple of months ago, I came across an interesting slide deck reviewing some of the initiatives that Narrative Science have been involved with, including the generation of natural language interpretations of school education grade reports (I think: some natural language take on an individual&amp;#8217;s academic scores, at least?). With MOOC fever in part focussing [&amp;#8230;]&lt;img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.ouseful.info&amp;#038;blog=325417&amp;#038;post=10595&amp;#038;subd=ouseful&amp;#038;ref=&amp;#038;feed=1" width="1" height="1" /&gt;</description>
				<content:encoded><![CDATA[<p>A couple of months ago, I came across an interesting slide deck reviewing some of the initiatives that Narrative Science have been involved with, including the generation of natural language interpretations of school education grade reports (I think: some natural language take on an individual&#8217;s academic scores, at least?). With MOOC fever in part focussing on the development of automated marking and feedback reports, this represents one example of how we might take numerical reports and dashboard displays and turn them into human readable text with some sort of narrative. (Narrative Science do a related thing for reports on schools themselves &#8211; <a href="http://www.propublica.org/nerds/item/how-to-edit-52000-stories-at-once">How To Edit 52,000 Stories at Once</a>.)</p>
<p>Whenever I come across a slide deck that I think may be in danger of being taken down (for example, because it&#8217;s buried down a downloads path on a corporate workshop promoter&#8217;s website and has CONFIDENTIAL written all over it) I try to grab a copy of it, but this presentation looked &#8220;safe&#8221; because it had been on Slideshare for some time.</p>
<p>Since I discovered the presentation, I&#8217;ve been recommending it to various folk, particularly slides 20-22? that refer to the educational example. Trying to find the slidedeck today, a websearch failed to turn it up so I had to go sniffing around to see if I had mentioned a <a href="http://www.slideshare.net/justinlink/introduction-to-narrative-science-13728805">link to the original presentation</a> anywhere. Here&#8217;s what I found:</p>
<p><a href="http://www.slideshare.net/justinlink/introduction-to-narrative-science-13728805"><img src="http://ouseful.files.wordpress.com/2013/05/no-narrative-science-slideshow.png?w=700&#038;h=300" alt="no narrative science slideshow" width="700" height="300" class="alignnone size-full wp-image-10596" /></a></p>
<p>The Wayback machine had grabbed bits and pieces of text, but not the actual slides&#8230;</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/wayback-narrative-science.png"><img src="http://ouseful.files.wordpress.com/2013/05/wayback-narrative-science.png?w=700&#038;h=482" alt="wayback narrative science" width="700" height="482" class="alignnone size-full wp-image-10597" /></a></p>
<p>Not only did I not download the presentation, I don&#8217;t seem to have grabbed any screenshots of the slides I was particularly interested in&#8230; bah:-(</p>
<p>For what it&#8217;s worth, here&#8217;s the commentary:</p>
<blockquote><p>Introduction to Narrative Science — Presentation Transcript</p>
<p>We Transform Data IntoStories and Insight…In Seconds<br />
Automatically,Without Human Intervention and at a Significant Scale<br />
To Help Companies: Create New Products Improve Decision-MakingOptimize Customer Interactions<br />
Customer Types Media and Data Business Publishing Companies Reporting<br />
How Does It Work? The Data The Facts The Angles The Structure Stats Tests Calls The Narrative Language Completed Text Our technology platform, Quill™, is a powerful integration of artificial intelligence and data analytics that automatically transforms data into stories.<br />
The following slides are examples of our work based upon a simple premise: structured data in, narrative out. These examples span several domains, including Sports Journalism, Financial Reporting, Real Estate, Business Intelligence, Education, and Marketing Services.<br />
Sports Journalism: Big Ten Network – Data InTransforming Data into Stories<br />
Sports Journalism: Big Ten Network – NarrativeTransforming Data into Stories<br />
Financial Journalism: Forbes – Data InTransforming Data into Stories<br />
Financial Journalism: Forbes – NarrativeTransforming Data into Stories<br />
Short Sale Reporting: Data Explorers &#8211; JSON Input<br />
Short Sale Reporting: Data Explorers &#8211; Overview North America Consumer Services Short Interest Update There has been a sharp decline in short interest in Marriott International (MAR) in the face of an 11% increase in the companys stock price. Short holdings have declined nearly 14% over the past month to 4.9% of shares outstanding. In the last month, holdings of institutional investors who lend have remained relatively unchanged at just below 17% of the companys shares. Investors have built up their short positions in Carnival (CCL) by 54.3% over the past month to 3.1% of shares outstanding. The share price has gained 8.3% over the past week to $31.93. Holdings of institutional investors who lend are also up slightly over the past month to just above 23% of the common shares in issue by the company. Institutional investors who make their shares available to borrow have reduced their holdings in Weight Watchers International (WTW) by more than 26% to just above 10% of total shares outstanding over the past month. Short sellers have also cut back their positions slightly to just under 6% of the market cap. The price of shares in the company has been on the rise for seven consecutive days and is now at $81.50.<br />
Sector Reporting: Data Explorers &#8211; JSON Input<br />
Sector Reporting: Data Explorers &#8211; OverviewThursday, October 6, 2011 12:00 PM: HEALTHCARE MIDDAY COMMENTARY:The Healthcare (XLV) sector underperformed the market in early trading on Thursday. Healthcarestocks trailed the market by 0.4%. So far, the Dow rose 0.2%, the NASDAQ saw growth of 0.8%, andthe S&amp;P500 was up 0.4%.Here are a few Healthcare stocks that bucked the sectors downward trend.MRK (Merck &amp; Co Inc.) erased early losses and rose 0.6% to $31.26. The company recentlyannounced its chairman is stepping down. MRK stock traded in the range of $31.21 &#8211; $31.56. MRKsvolume was 86.1% lower than usual with 2.5 million shares trading hands. Todays gains still leavethe stock about 11.1% lower than its price three months ago.LUX (Luxottica Group) struggled in early trading but showed resilience later in the day. Shares rose3.8% to $26.92. LUX traded in the range of $26.48 &#8211; $26.99. Luxottica Group’s early share volumewas 34,155. Todays gains still leave the stock 21.8% below its 52-week high of $34.43. The stockremains about 16.3% lower than its price three months ago.Shares of UHS (Universal Health Services Inc.) are trading at $32.89, up 81 cents (2.5%) from theprevious close of $32.08. UHS traded in the range of $32.06 &#8211; $33.01…<br />
Real Estate: Hanley Wood – Data InTransforming Data into Stories<br />
Real Estate: Hanley Wood – NarrativeTransforming Data into Stories<br />
BI: Leading Fast Food Chain – Data InTransforming Data into Stories<br />
BI: Leading Fast Food Chain – Store Level Report January Promotion Falling Behind Region The launch of the bagels and cream cheese promotion began this month. While your initial sales at the beginning of the promotion were on track with both your ad co-op and the region, your sales this week dropped from last week’s 142 units down to 128 units. Your morning guest count remained even across this period. Taking better advantage of this promotion should help to increase guest count and overall revenue by bringing in new customers. The new item with the greatest growth opportunity this week was the Coffee Cake Muffin. Increasing your sales by just one unit per thousand transactions to match Sales in the region would add another $156 to your monthly profit. That amounts to about $1872 over the course of one year.Transforming Data into Stories<br />
Education: Standardized Testing – Data InTransforming Data into Stories<br />
Education: Standardized Testing – Study RecommendationsTransforming Data into Stories<br />
Marketing Services &amp; Digital Media: Data InTransforming Data into Stories<br />
Marketing Services &amp; Digital Media: NarrativeTransforming Data into Stories</p></blockquote>
<p>Bah&#8230;:-(</p>
<p>PS Slideshare appears to have a new(?) feature &#8211; Saved Files &#8211; that keeps a copy of files you have downloaded. Or does it? If I save a file and someone deletes it, will the empty shell only remain in my &#8220;Saved Files&#8221; list?</p>
<br />  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/ouseful.wordpress.com/10595/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/ouseful.wordpress.com/10595/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.ouseful.info&#038;blog=325417&#038;post=10595&#038;subd=ouseful&#038;ref=&#038;feed=1" width="1" height="1" /><div class="feedflare">
<a href="http://feeds.ouseful.info/~ff/ouseful?a=p_6KuJOnySI:TXd_Ebc7pz8:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/ouseful?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.ouseful.info/~ff/ouseful?a=p_6KuJOnySI:TXd_Ebc7pz8:2mJPEYqXBVI"><img src="http://feeds.feedburner.com/~ff/ouseful?d=2mJPEYqXBVI" border="0"></img></a> <a href="http://feeds.ouseful.info/~ff/ouseful?a=p_6KuJOnySI:TXd_Ebc7pz8:gIN9vFwOqvQ"><img src="http://feeds.feedburner.com/~ff/ouseful?i=p_6KuJOnySI:TXd_Ebc7pz8:gIN9vFwOqvQ" border="0"></img></a> <a href="http://feeds.ouseful.info/~ff/ouseful?a=p_6KuJOnySI:TXd_Ebc7pz8:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/ouseful?i=p_6KuJOnySI:TXd_Ebc7pz8:F7zBnMyn0Lo" border="0"></img></a> <a href="http://feeds.ouseful.info/~ff/ouseful?a=p_6KuJOnySI:TXd_Ebc7pz8:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/ouseful?i=p_6KuJOnySI:TXd_Ebc7pz8:V_sGLiPBpWU" border="0"></img></a> <a href="http://feeds.ouseful.info/~ff/ouseful?a=p_6KuJOnySI:TXd_Ebc7pz8:cGdyc7Q-1BI"><img src="http://feeds.feedburner.com/~ff/ouseful?d=cGdyc7Q-1BI" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/ouseful/~4/p_6KuJOnySI" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2013/05/07/ephemeral-citations-when-presentations-you-have-cited-vanish-from-the-public-web/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/abbd9f90565ce9ae4d065d93a81d8c03?s=96&amp;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">Tony Hirst</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/no-narrative-science-slideshow.png" medium="image">
			<media:title type="html">no narrative science slideshow</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/wayback-narrative-science.png" medium="image">
			<media:title type="html">wayback narrative science</media:title>
		</media:content>
	<creativeCommons:license>http://creativecommons.org/licenses/by/2.0/</creativeCommons:license><feedburner:origLink>http://blog.ouseful.info/2013/05/07/ephemeral-citations-when-presentations-you-have-cited-vanish-from-the-public-web/</feedburner:origLink></item>
		<item>
		<title>Questioning Election Data to See if It Has a Story to Tell</title>
		<link>http://feeds.ouseful.info/~r/ouseful/~3/0fdIopdDiTI/</link>
		<comments>http://blog.ouseful.info/2013/05/05/questioning-election-data-to-see-if-it-has-a-story-to-tell/#comments</comments>
		<pubDate>Sun, 05 May 2013 23:38:40 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[digital storytelling]]></category>
		<category><![CDATA[OpenRefine]]></category>
		<category><![CDATA[ddj]]></category>
		<category><![CDATA[openrefine]]></category>
		<category><![CDATA[schoolofdata]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=10547</guid>
		<description>I know, I know, the local elections are old news now, but elections come round again and again, which means building up a set of case examples of what we might be able to do &amp;#8211; data wise &amp;#8211; around elections in the future could be handy&amp;#8230; So here&amp;#8217;s one example of a data-related question [&amp;#8230;]&lt;img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.ouseful.info&amp;#038;blog=325417&amp;#038;post=10547&amp;#038;subd=ouseful&amp;#038;ref=&amp;#038;feed=1" width="1" height="1" /&gt;</description>
				<content:encoded><![CDATA[<p>I know, I know, the local elections are old news now, but elections come round again and again, which means building up a set of case examples of what we might be able to do &#8211; data wise &#8211; around elections in the future could be handy&#8230;</p>
<p>So here&#8217;s one example of a data-related question we might ask (where in this case by data I mean &#8220;information available in: a) electronic form, that b) can be represented in a structured way&#8221;): <em>are the candidates standing in different seats local to that ward/electoral division?</em> By &#8220;local&#8221;, I mean &#8211; can they vote in that ward by virtue of having a home address that lies within that ward?</p>
<p>Here&#8217;s what the original data for my own local council (the Isle of Wight council, a unitary authority) looked like &#8211; a multi-page PDF document collating the <a href="http://www.iwight.com/azservices/documents/1174-Notice%20of%20Poll%20-%20IOWC%20May%202013.pdf">Notice of polls</a> for each electoral division (<a href="https://dl.dropboxusercontent.com/u/1156404/1174-Notice%20of%20Poll%20-%20IOWC%20May%202013.pdf">archive copy</a>): </p>
<p><a href="http://www.iwight.com/azservices/documents/1174-Notice%20of%20Poll%20-%20IOWC%20May%202013.pdf"><img src="http://ouseful.files.wordpress.com/2013/05/iw-council-notice-of-poll.png?w=700" alt="IW council - notice of poll"   class="alignnone size-full wp-image-10548" /></a></p>
<p>Although it&#8217;s a PDF, the document is reasonably nicely structured for scraping (I&#8217;ll do a post on this over the next week or two) &#8211; you can find a Scraperwiki scraper <a href="https://scraperwiki.com/scrapers/iw_poll_notices_scrape/">here</a>. I pull out three sorts of data &#8211; information about the polling stations (the table at the bottom of the page), information about the signatories (of which, more in a later post&#8230;;-), and information about the candidates, including the electoral division in which they were standing (the &#8220;ward&#8221; column) and a home address for them, as shown here:</p>
<p><a href="https://scraperwiki.com/scrapers/iw_poll_notices_scrape/"><img src="http://ouseful.files.wordpress.com/2013/05/scraperwiki-candidates.png?w=700&#038;h=371" alt="scraperwiki candidates" width="700" height="371" class="alignnone size-full wp-image-10549" /></a></p>
<p>So what might we be able to do with this information? Does the <em>home address</em> take us anywhere interesting? Maybe. If we can easily look up the electoral division the home addresses fall in, we have a handful of news story search opportunities: 1) to what extent are candidates &#8211; and election winners &#8211; &#8220;local&#8221;? 2) do any of the parties appear to favour standing in/out of ward candidates? 3) if candidates are standing out of their home ward, why? If we complement the data with information about the number of votes cast for each candidate, might we be able to find any patterns suggestive of a beneficial or detrimental effect of living within, or outside of, the electoral division a candidate is standing in? And so on.</p>
<p>In this post, I&#8217;ll describe a way of having a conversation with the data using OpenRefine and Google Fusion Tables as a way of starting to explore some of the stories we may be able to tell with, and around, the data. (Bruce Mcphereson/<em>Excel Liberation</em> blog has also posted an Excel version of the methods described in the post: <a href="http://excelramblings.blogspot.co.uk/2013/05/mashing-up-electoral-data-follow-on.html">Mashing up electoral data</a>. Thanks, Bruce:-)</p>
<p>Let&#8217;s get the data into OpenRefine so we can start to work it. Scraperwiki provides a CSV output format for each scraper table, so we can <a href="https://api.scraperwiki.com/api/1.0/datastore/sqlite?format=csv&amp;name=iw_poll_notices_scrape&amp;query=select+*+from+`candidates`&amp;apikey=">get a URL</a> for it that we can then use to pull the data into OpenRefine:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/scraperwiki-csv-export.png"><img src="http://ouseful.files.wordpress.com/2013/05/scraperwiki-csv-export.png?w=700" alt="scraperwiki CSV export"   class="alignnone size-full wp-image-10581" /></a></p>
<p>In OpenRefine, we can Create a New Project and then import the data directly:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/openrefine-import-from-url.png"><img src="http://ouseful.files.wordpress.com/2013/05/openrefine-import-from-url.png?w=700&#038;h=150" alt="openrefine import from URL" width="700" height="150" class="alignnone size-full wp-image-10580" /></a></p>
<p>The data is in comma separated CSV format, so let&#8217;s specify that:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/import-as-csv-comma-separated.png"><img src="http://ouseful.files.wordpress.com/2013/05/import-as-csv-comma-separated.png?w=700" alt="import as csv comma separated"   class="alignnone size-full wp-image-10579" /></a></p>
<p>We can then name and create the project and we&#8217;re ready to start&#8230;</p>
<p>&#8230;but start what? If we want to find out if a candidate lives in ward or out of ward, we either need to know whether their address is in ward or out of ward, or we need to find out which ward their address is in and then see if it is the same as the one they are standing in.</p>
<p>Now it just so happens (:-) that MySociety run a service called <a href="http://mapit.mysociety.org/">MapIt</a> that lets you submit a postcode and it tells you a whole host of things about what administrative areas that postcode is in, including (in this case) the unitary authority electoral division.</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/mapit-postcode-lookup.png"><img src="http://ouseful.files.wordpress.com/2013/05/mapit-postcode-lookup.png?w=700&#038;h=487" alt="mapit postcode lookup" width="700" height="487" class="alignnone size-full wp-image-10577" /></a></p>
<p>And what&#8217;s more, MapIt also makes the data available in a format that OpenRefine can read (JSON), at a web address (aka a URL) that we can construct from a postcode:</p>
<p><a href="http://mapit.mysociety.org/postcode/po36%200JT.html"><img src="http://ouseful.files.wordpress.com/2013/05/mapit-json.png?w=700" alt="mapit json"   class="alignnone size-full wp-image-10578" /></a></p>
<p>Here&#8217;s an example of just such a web address: <tt><a href="http://mapit.mysociety.org/postcode/PO36%200JT" rel="nofollow">http://mapit.mysociety.org/postcode/PO36%200JT</a></tt></p>
<p>Can you see the postcode in there? <tt><a href="http://mapit.mysociety.org/postcode/" rel="nofollow">http://mapit.mysociety.org/postcode/</a><strong>PO36%200JT</strong></tt></p>
<p>The %20 is a character encoding for a space. In this case, we can also use a +.</p>
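<p>As a rough illustration of that encoding step outside OpenRefine, here&#8217;s a minimal Python sketch. The helper name <tt>mapit_url</tt> and its <tt>plus_for_space</tt> flag are my own inventions for this example; the base address is the one shown above.</p>

```python
from urllib.parse import quote, quote_plus

# Base address taken from the example URL in the post.
MAPIT_BASE = "http://mapit.mysociety.org/postcode/"


def mapit_url(postcode, plus_for_space=False):
    """Build a MapIt lookup URL for a postcode (hypothetical helper).

    quote() encodes a space as %20; quote_plus() encodes it as +.
    """
    encode = quote_plus if plus_for_space else quote
    return MAPIT_BASE + encode(postcode.strip())


print(mapit_url("PO36 0JT"))
print(mapit_url("PO36 0JT", plus_for_space=True))
```

<p>The first call gives the %20 form and the second the + form of the same address.</p>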
<p>So &#8211; to get information about the electoral division an address lies in, we need to get the postcode, construct a URL to pull down corresponding data from MapIt, and then figure out some way to get the electoral division name out of the data. But one step at a time, eh?!;-)</p>
<p><em>Hmmm&#8230;I wonder if postcode areas <em>necessarily</em> fall within electoral divisions? I can imagine (though it may be incorrect to do so!) a situation where a division boundary falls within a postcode area, so we need to be suspicious about the result, or at least bear in mind that an address falling near a division boundary may be wrongly classified. (I guess if we plot postcodes on a map, we could look to see how close to the boundary line they are, because <a href="http://schoolofdata.org/2013/05/02/proving-the-data-a-quick-guide-to-mapping-local-elections/">we already know how to plot boundary lines</a>.)</em></p>
<p>To grab the postcode, a quick skim of the addresses suggests that they are written in a standard way &#8211; the postcode always seems to appear at the end of the string preceded by a comma. We can use this information to extract the postcode, by splitting the address at each comma into an ordered list of chunks, then picking the last item in the list. Because the postcode might be preceded by a space character, it&#8217;s often convenient for us to <em>strip()</em> any white space surrounding it.</p>
<p>What we want to do then is to create a new, derived column based on the address:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/add-derived-column.png"><img src="http://ouseful.files.wordpress.com/2013/05/add-derived-column.png?w=700" alt="Add derived column"   class="alignnone size-full wp-image-10576" /></a></p>
<p>And we do this by creating a list of comma separated chunks from the address, picking the last one (by counting backwards from the end of the list), and then stripping off any whitespace/space characters that surround it:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/grab-a-postcode.png"><img src="http://ouseful.files.wordpress.com/2013/05/grab-a-postcode.png?w=700" alt="grab a postcode"   class="alignnone size-full wp-image-10575" /></a></p>
<p>Here&#8217;s the result&#8230;</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/postcodes.png"><img src="http://ouseful.files.wordpress.com/2013/05/postcodes.png?w=700" alt="postcodes..."   class="alignnone size-full wp-image-10574" /></a></p>
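<p>The same split, pick-the-last-chunk, strip recipe can be sketched in Python. This is not the OpenRefine expression from the screenshot, just an equivalent under the same assumption (the postcode is always the last comma separated chunk); the function name and the sample address are made up for illustration.</p>

```python
def extract_postcode(address):
    """Return the last comma separated chunk of an address, trimmed.

    Assumes, as in the post, that the postcode always comes last.
    """
    chunks = address.split(",")   # ordered list of chunks
    return chunks[-1].strip()     # count back from the end, drop whitespace


# A made-up address in the same style as the scraped data.
print(extract_postcode("1 Example Road, Sandown, Isle of Wight, PO36 0JT"))
```

<p>If an address has no comma at all, the whole string comes back, so it&#8217;s worth eyeballing the new column for anything that doesn&#8217;t look like a postcode.</p>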
<p>Having got the postcode, we can now generate a URL from it and then pull down the data from each URL:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/col-from-url.png"><img src="http://ouseful.files.wordpress.com/2013/05/col-from-url.png?w=700" alt="col from URL"   class="alignnone size-full wp-image-10573" /></a></p>
<p>When constructing the web address, we need to remember to encode the postcode by escaping it so as not to break the URL:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/get-data-from-url.png"><img src="http://ouseful.files.wordpress.com/2013/05/get-data-from-url.png?w=700" alt="get data from URL"   class="alignnone size-full wp-image-10572" /></a></p>
<p>The throttle value slows down the rate at which OpenRefine loads in data from the URLs. If we set it to 500 milliseconds, it will load one page every half a second.</p>
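<p>For anyone wanting to do the same fetch in a script, here&#8217;s a hedged Python sketch of a throttled download loop. The names <tt>fetch_text</tt> and <tt>fetch_all</tt> are hypothetical, the <tt>fetch</tt> and <tt>sleep</tt> parameters exist only so the loop can be tried out without touching the network, and the MapIt address and usage terms may have changed since this was written.</p>

```python
import time
from urllib.parse import quote
from urllib.request import urlopen

MAPIT_BASE = "http://mapit.mysociety.org/postcode/"


def fetch_text(url):
    """Download a URL and return the response body as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def fetch_all(postcodes, fetch=fetch_text, throttle_ms=500, sleep=time.sleep):
    """Fetch MapIt data for each postcode, pausing between requests.

    Returns a dict mapping each postcode to the raw text that came back.
    """
    results = {}
    for i, postcode in enumerate(postcodes):
        if i > 0:
            sleep(throttle_ms / 1000)  # 500 ms is one request every half second
        results[postcode] = fetch(MAPIT_BASE + quote(postcode))
    return results


# Example call (commented out because it makes real web requests):
# data = fetch_all(["PO36 0JT"])
```

<p>The pause goes between requests rather than before the first one, so a single lookup isn&#8217;t delayed.</p>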
<p>When it&#8217;s loaded in all the data, we get a new column, filled with data from the MapIt service&#8230;</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/lots-of-data.png"><img src="http://ouseful.files.wordpress.com/2013/05/lots-of-data.png?w=700&#038;h=307" alt="lots of data" width="700" height="307" class="alignnone size-full wp-image-10571" /></a></p>
<p>We now need to parse this data (which is in a JSON format) to pull out the electoral division. There&#8217;s a bit of jiggery pokery required to do this, and I couldn&#8217;t work it out myself at first, but <a href="http://stackoverflow.com/questions/10782737/google-refine-iterate-over-a-json-dictionary">Stack Overflow came to the rescue</a>:</p>
<p><a href="http://stackoverflow.com/questions/10782737/google-refine-iterate-over-a-json-dictionary"><img src="http://ouseful.files.wordpress.com/2013/05/thats-handy.png?w=700&#038;h=710" alt="that&#039;s handy..." width="700" height="710" class="alignnone size-full wp-image-10570" /></a></p>
<p>We need to tweak that expression slightly by first grabbing the <em>areas</em> data from the full set of MapIt data. Here&#8217;s the expression I used:</p>
<p><tt>filter(('[' + (value.parseJson()['areas'].replace( /"[0-9]+":/,""))[1,-1] + ']' ).parseJson(), v, v['type']=='UTE' )[0]['name']</tt></p>
<p>to create a new column containing the electoral division:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/parse-out-the-electroal-division.png"><img src="http://ouseful.files.wordpress.com/2013/05/parse-out-the-electroal-division.png?w=700" alt="parse out the electroal division"   class="alignnone size-full wp-image-10569" /></a></p>
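<p>The GREL expression has to work around the fact that the <em>areas</em> element is a dictionary keyed by numeric ids. As a rough comparison, the same lookup might be sketched in Python as below. The sample record is made up and only mimics the assumed shape of the response (areas keyed by id, each with a <tt>type</tt> and a <tt>name</tt>); <tt>area_name</tt> is a hypothetical helper.</p>

```python
import json


def area_name(mapit_json, area_type="UTE"):
    """Return the name of the first area of the given type, or None."""
    areas = json.loads(mapit_json).get("areas", {})
    for area in areas.values():
        if area.get("type") == area_type:
            return area.get("name")
    return None


# Made-up sample in the assumed shape: areas keyed by numeric id.
sample = json.dumps({
    "postcode": "ZZ1 1ZZ",
    "areas": {
        "1001": {"type": "UTA", "name": "Example Council"},
        "1002": {"type": "UTE", "name": "Example Division"},
    },
})
print(area_name(sample))
```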
<p>Now we can create another column, this time based on the new Electoral Division column, that compares the value against the corresponding original &#8220;ward&#8221; column value (i.e. the electoral division the candidate was standing in) and prints a message saying whether they were standing <em>in</em> ward or <em>out</em>:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/inward-or-out.png"><img src="http://ouseful.files.wordpress.com/2013/05/inward-or-out.png?w=700" alt="inward or out"   class="alignnone size-full wp-image-10568" /></a></p>
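<p>The comparison itself is simple. A hedged Python sketch of the same in/out test follows; the whitespace and case normalisation is my own addition rather than something the OpenRefine expression in the screenshot necessarily does, and the division names used in any call are just illustrative values.</p>

```python
def in_or_out(home_division, ward):
    """Label a candidate 'in' if their home division matches the ward they stood in."""
    def norm(name):
        # Collapse whitespace and ignore case so trivial differences do not count.
        return " ".join(name.split()).casefold()
    return "in" if norm(home_division) == norm(ward) else "out"
```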
<p>If we collapse down the spare columns, we get a clearer picture:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/collapse.png"><img src="http://ouseful.files.wordpress.com/2013/05/collapse.png?w=700" alt="collapse..."   class="alignnone size-full wp-image-10567" /></a></p>
<p>Like this:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/summary-data.png"><img src="http://ouseful.files.wordpress.com/2013/05/summary-data.png?w=700&#038;h=189" alt="summary data" width="700" height="189" class="alignnone size-full wp-image-10566" /></a></p>
<p>If we generate a text facet on the In/Out column, and increase the number of rows displayed, we can filter the results to show just the candidates who stood in their local electoral division (or conversely, those who stood outside it):</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/facet-on-inout.png"><img src="http://ouseful.files.wordpress.com/2013/05/facet-on-inout.png?w=700" alt="facet on inout"   class="alignnone size-full wp-image-10565" /></a></p>
<p>We can also start to get investigative, and ask some more questions of the data. For example, we could apply a text facet on the party/desc column to let us filter the results even more&#8230;</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/inout-facet-filter.png"><img src="http://ouseful.files.wordpress.com/2013/05/inout-facet-filter.png?w=700&#038;h=128" alt="inout facet filter" width="700" height="128" class="alignnone size-full wp-image-10564" /></a></p>
<p>Hmmm&#8230; were most of the Labour Party candidates standing outside their home division (and hence unable to vote for themselves?!)</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/hmm-labour-out.png"><img src="http://ouseful.files.wordpress.com/2013/05/hmm-labour-out.png?w=700&#038;h=230" alt="Hmm.. labour out" width="700" height="230" class="alignnone size-full wp-image-10563" /></a></p>
<p>There aren&#8217;t too many parties represented across the Island elections (a text facet on the desc/party description column should reveal them all), so it wouldn&#8217;t be too hard to treat the data as a source, get paper and pen in hand, and write down the in/out counts for each party, describing the extent to which they fielded candidates who lived in the electoral divisions they were standing in (and as such, could vote for themselves!) versus those who lived &#8220;outside&#8221;. This data could reasonably be displayed using a staggered bar chart (the data collection and plotting are left as an exercise for the reader  <em>[See Bruce McPherson's <a href="http://excelramblings.blogspot.co.uk/2013/05/mashing-up-electoral-data-follow-on.html">Mashing up electoral data</a> post for a stacked bar chart view.]</em>;-) Another possible line of questioning is how the different electoral divisions fare in terms of in-vs-out resident candidates. If we pull in affluence/poverty data, might it tell us anything about the likelihood of candidates living in area, or even tell us something about the likely socio-economic standing of the candidates?</p>
<p>One more thing we could try to do is to geocode the postcode of the address of each candidate rather more exactly. A blog post by Ordnance Survey blogger John Goodwin (@gothwin) shows how we might do this (<em>note: copying the code from John&#8217;s post won&#8217;t necessarily work; WordPress has a tendency to replace single quotes with all manner of exotic punctuation marks that f**k things up when you copy and paste them into forms for use in other contexts</em>). When we &#8220;Add column by fetching URLs&#8221;, we should use something along the lines of the following:</p>
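<p>If pen and paper doesn&#8217;t appeal, the per-party tally could be sketched in a few lines of Python once the party and In/Out columns have been exported. The rows below are invented purely to show the shape of the result, and <tt>tally_in_out</tt> is a hypothetical helper name.</p>

```python
from collections import Counter


def tally_in_out(rows):
    """Count 'in' and 'out' candidates per party from (party, in_out) pairs."""
    counts = {}
    for party, in_out in rows:
        counts.setdefault(party, Counter())[in_out] += 1
    return counts


# Invented rows purely to show the shape of the result.
rows = [("Party A", "in"), ("Party A", "out"), ("Party B", "out"), ("Party A", "in")]
for party, c in sorted(tally_in_out(rows).items()):
    print(party, "in:", c["in"], "out:", c["out"])
```

<p>A <tt>Counter</tt> returns zero for a missing key, so a party with no &#8220;in&#8221; candidates still reports a count rather than raising an error.</p>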
<p><tt>'http://beta.data.ordnancesurvey.co.uk/datasets/code-point-open/apis/search?output=json&amp;query=' + escape(value,'url')</tt></p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/os-postcode-lookup.png"><img src="http://ouseful.files.wordpress.com/2013/05/os-postcode-lookup.png?w=700" alt="os postcode lookup"   class="alignnone size-full wp-image-10562" /></a></p>
<p>The data, as imported from the Ordnance Survey, looks something like this:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/osdata.png"><img src="http://ouseful.files.wordpress.com/2013/05/osdata.png?w=700&#038;h=233" alt="o:sdata" width="700" height="233" class="alignnone size-full wp-image-10587" /></a></p>
<p>As is the way of national services, the Ordnance Survey returns a data format that is all well and good but isn&#8217;t the one that mortals use. Many of my geo-recipes rely on latitude and longitude co-ordinates, but the call to the Ordnance Survey API returns Eastings and Northings.</p>
<p>Fortunately, Paul Bradshaw had come across this problem before (<a href="http://onlinejournalismblog.com/2011/08/12/how-to-convert-eastingnorthing-into-latlong-for-an-interactive-map/">How to: Convert Easting/Northing into Lat/Long for an Interactive Map</a>) and bludgeoned(?!;-) Stuart Harrison/@pezholio, ex- of Lichfield Council, now of the Open Data Institute, to produce a pop-up service that returns lat/long co-ordinates in exchange for a Northing/Easting pair.</p>
<p>The service relies on URLs of the form <tt><a href="http://www.uk-postcodes.com/eastingnorthing.php?easting=" rel="nofollow">http://www.uk-postcodes.com/eastingnorthing.php?easting=</a><strong>EASTING</strong>&amp;northing=<strong>NORTHING</strong></tt>, which we can construct from data returned from the Ordnance Survey API:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/easting-northing-lat-long.png"><img src="http://ouseful.files.wordpress.com/2013/05/easting-northing-lat-long.png?w=700" alt="easting northing lat -long"   class="alignnone size-full wp-image-10561" /></a></p>
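<p>As a point of comparison, building that URL in Python might look like the sketch below. The base address is the one quoted above; the function name is hypothetical, and whether the service is still answering requests is not something this sketch checks.</p>

```python
from urllib.parse import urlencode

SERVICE = "http://www.uk-postcodes.com/eastingnorthing.php"


def latlong_url(easting, northing):
    """Build the conversion URL from an easting/northing pair."""
    return SERVICE + "?" + urlencode({"easting": easting, "northing": northing})
```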
<p>Here&#8217;s what the returned lat/long data looks like:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/lat-long-json.png"><img src="http://ouseful.files.wordpress.com/2013/05/lat-long-json.png?w=700" alt="lat-long json"   class="alignnone size-full wp-image-10560" /></a></p>
<p>We can then create a new column derived from this JSON data by parsing it as follows:<br />
<a href="http://ouseful.files.wordpress.com/2013/05/parse-latlong-to-lat.png"><img src="http://ouseful.files.wordpress.com/2013/05/parse-latlong-to-lat.png?w=700" alt="parse latlong to lat"   class="alignnone size-full wp-image-10559" /></a></p>
<p>A similar trick can be used to generate a column containing just the longitude data.</p>
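<p>Outside OpenRefine, the same parsing step might be sketched as follows. The key names <tt>lat</tt> and <tt>lng</tt> are an assumption on my part (check them against the JSON actually returned), which is why they are left as parameters.</p>

```python
import json


def lat_long(latlong_json, lat_key="lat", long_key="lng"):
    """Pull latitude and longitude out of a JSON string as floats."""
    data = json.loads(latlong_json)
    return float(data[lat_key]), float(data[long_key])
```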
<p>We can then export a view over the data to a CSV file, or <a href="https://www.google.com/fusiontables/DataSource?docid=1-Ngzhdy3WoRO9_CHFzh48rqeV36aMnzAB-PPEqs">direct to Google Fusion Tables</a>.</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/postcode-lat-long-export.png"><img src="http://ouseful.files.wordpress.com/2013/05/postcode-lat-long-export.png?w=700" alt="postcode lat long export"   class="alignnone size-full wp-image-10558" /></a></p>
<p>With the data in Google Fusion Tables, we can let Fusion Tables know that the Postcode lat and Postcode long columns define a location:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/fusion-table-edit-column.png"><img src="http://ouseful.files.wordpress.com/2013/05/fusion-table-edit-column.png?w=700" alt="Fusion table edit column"   class="alignnone size-full wp-image-10557" /></a></p>
<p>Specifically, we pick either the lat or the long column and use it to cast a two column latitude and longitude location type:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/fusion-table-config-cols-to-location-type.png"><img src="http://ouseful.files.wordpress.com/2013/05/fusion-table-config-cols-to-location-type.png?w=700" alt="fusion table config cols to location type"   class="alignnone size-full wp-image-10556" /></a></p>
<p>We can inspect the location data using a more convenient &#8220;natural&#8221; view over it&#8230;</p>
<p><a href="https://www.google.com/fusiontables/DataSource?docid=1-Ngzhdy3WoRO9_CHFzh48rqeV36aMnzAB-PPEqs"><img src="http://ouseful.files.wordpress.com/2013/05/fusion-table-add-map.png?w=700" alt="fusion table add map"   class="alignnone size-full wp-image-10555" /></a></p>
<p>By applying a filter, we can look to see where the candidates for a particular ward have declared their home address to be:</p>
<p><a href="https://www.google.com/fusiontables/DataSource?docid=1-Ngzhdy3WoRO9_CHFzh48rqeV36aMnzAB-PPEqs"><img src="http://ouseful.files.wordpress.com/2013/05/havenstreet-candidates.png?w=700&#038;h=422" alt="havenstreet candidates" width="700" height="422" class="alignnone size-full wp-image-10553" /></a></p>
<p>(Note &#8211; it would be more useful to plot these markers over a boundary line defined region corresponding to the area covered by the corresponding electoral ward. I don&#8217;t think Fusion Tables lets you do this directly (or if it does, I don&#8217;t know how to do it..!). This workaround &#8211; <a href="http://fusion-tables-api-samples.googlecode.com/svn/trunk/FusionTablesLayerWizard/src/index.html">FusionTablesLayer Wizard</a> &#8211; on merging outputs from Fusion Tables as separate layers on a Google Map is the closest I&#8217;ve found following a not very thorough search;-)</p>
<p>We can go back to the tabular view in Fusion Tables to run a filter to see who the candidates were in a particular electoral division, or we can go back to OpenRefine and run a filter (or a facet) on the ward column to see who the candidates were:</p>
<p><a href="https://www.google.com/fusiontables/DataSource?docid=1-Ngzhdy3WoRO9_CHFzh48rqeV36aMnzAB-PPEqs"><img src="http://ouseful.files.wordpress.com/2013/05/refine-filter-by-division.png?w=700&#038;h=349" alt="refine filter by division" width="700" height="349" class="alignnone size-full wp-image-10552" /></a></p>
<p>Filtering on some of the other wards using local knowledge (i.e. using the filter to check/corroborate things I knew), I spotted a couple of missing markers. Going back to the OpenRefine view of the data, I ran a faceted view on the postcode to see if there were any &#8220;non-postcodes&#8221; there that would in turn break the Ordnance Survey postcode geocoding/lookup:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/postcode-missing.png"><img src="http://ouseful.files.wordpress.com/2013/05/postcode-missing.png?w=700&#038;h=320" alt="postcode missing..." width="700" height="320" class="alignnone size-full wp-image-10554" /></a></p>
<p>Ah &#8211; oops&#8230; It seems we have a &#8220;data quality&#8221; issue, albeit a minor one&#8230;</p>
<p><em>So, what do we learn from all this? One takeaway for me is that data is a source we can ask questions of. If we have a story or angle in mind, we can tune our questions to tease out corroborating facts (possibly! <tt>caveat emptor</tt> applies!) that might confirm, help develop, or even cause us to rethink, the story we are working towards telling based on the support the data gives us.</em></p>
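<p>A check like this could also be run before doing any geocoding at all. The sketch below uses a deliberately simplified pattern for the overall shape of a UK postcode; it is an assumption-laden shape test rather than a full validator, and the function and column names are hypothetical.</p>

```python
import re

# Simplified shape check, not a full validator of UK postcodes.
POSTCODE_SHAPE = re.compile(r"^[A-Z]{1,2}[0-9][0-9A-Z]? ?[0-9][A-Z]{2}$")


def looks_like_postcode(value):
    """True if the value has the rough outward/inward shape of a UK postcode."""
    return bool(POSTCODE_SHAPE.match(value.strip().upper()))


def suspect_rows(rows, column="postcode"):
    """Return the rows whose postcode cell is empty or oddly shaped."""
    return [row for row in rows if not looks_like_postcode(row.get(column) or "")]
```

<p>Rows flagged this way are candidates for a manual look; a value can pass the shape test and still not be a real postcode.</p>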
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2013/05/05/questioning-election-data-to-see-if-it-has-a-story-to-tell/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/abbd9f90565ce9ae4d065d93a81d8c03?s=96&amp;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">Tony Hirst</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/iw-council-notice-of-poll.png" medium="image">
			<media:title type="html">IW council - notice of poll</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/scraperwiki-candidates.png" medium="image">
			<media:title type="html">scraperwiki candidates</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/scraperwiki-csv-export.png" medium="image">
			<media:title type="html">scraperwiki CSV export</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/openrefine-import-from-url.png" medium="image">
			<media:title type="html">openrefine import from URL</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/import-as-csv-comma-separated.png" medium="image">
			<media:title type="html">import as csv comma separated</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/mapit-postcode-lookup.png" medium="image">
			<media:title type="html">mapit postcode lookup</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/mapit-json.png" medium="image">
			<media:title type="html">mapit json</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/add-derived-column.png" medium="image">
			<media:title type="html">Add derived column</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/grab-a-postcode.png" medium="image">
			<media:title type="html">grab a postcode</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/postcodes.png" medium="image">
			<media:title type="html">postcodes...</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/col-from-url.png" medium="image">
			<media:title type="html">col from URL</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/get-data-from-url.png" medium="image">
			<media:title type="html">get data from URL</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/lots-of-data.png" medium="image">
			<media:title type="html">lots of data</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/thats-handy.png" medium="image">
			<media:title type="html">that's handy...</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/parse-out-the-electroal-division.png" medium="image">
			<media:title type="html">parse out the electroal division</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/inward-or-out.png" medium="image">
			<media:title type="html">inward or out</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/collapse.png" medium="image">
			<media:title type="html">collapse...</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/summary-data.png" medium="image">
			<media:title type="html">summary data</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/facet-on-inout.png" medium="image">
			<media:title type="html">facet on inout</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/inout-facet-filter.png" medium="image">
			<media:title type="html">inout facet filter</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/hmm-labour-out.png" medium="image">
			<media:title type="html">Hmm.. labour out</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/os-postcode-lookup.png" medium="image">
			<media:title type="html">os postcode lookup</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/osdata.png" medium="image">
			<media:title type="html">o:sdata</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/easting-northing-lat-long.png" medium="image">
			<media:title type="html">easting northing lat -long</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/lat-long-json.png" medium="image">
			<media:title type="html">lat-long json</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/parse-latlong-to-lat.png" medium="image">
			<media:title type="html">parse latlong to lat</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/postcode-lat-long-export.png" medium="image">
			<media:title type="html">postcode lat long export</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/fusion-table-edit-column.png" medium="image">
			<media:title type="html">Fusion table edit column</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/fusion-table-config-cols-to-location-type.png" medium="image">
			<media:title type="html">fusion table config cols to location type</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/fusion-table-add-map.png" medium="image">
			<media:title type="html">fusion table add map</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/havenstreet-candidates.png" medium="image">
			<media:title type="html">havenstreet candidates</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/refine-filter-by-division.png" medium="image">
			<media:title type="html">refine filter by division</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/postcode-missing.png" medium="image">
			<media:title type="html">postcode missing...</media:title>
		</media:content>
	<creativeCommons:license>http://creativecommons.org/licenses/by/2.0/</creativeCommons:license><feedburner:origLink>http://blog.ouseful.info/2013/05/05/questioning-election-data-to-see-if-it-has-a-story-to-tell/</feedburner:origLink></item>
		<item>
		<title>A Wrangling Example With OpenRefine: Making “Oven Ready Data”</title>
		<link>http://feeds.ouseful.info/~r/ouseful/~3/2Z5KZm7WhbE/</link>
		<comments>http://blog.ouseful.info/2013/05/03/a-wrangling-example-with-openrefine-making-ready-data/#comments</comments>
		<pubDate>Fri, 03 May 2013 23:33:20 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[OpenRefine]]></category>
		<category><![CDATA[openrefine]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=10500</guid>
		<description>As well as being a great tool for cleaning data, OpenRefine can also be used to good effect when you need to wrangle a dataset into another shape. Take this set of local election results published by the Isle of Wight local online news blog, onthewight.com: There&amp;#8217;s lots of information in there (rank of each [&amp;#8230;]</description>
				<content:encoded><![CDATA[<p>As well as being a great tool for cleaning data, OpenRefine can also be used to good effect when you need to wrangle a dataset into another <em>shape</em>. Take this set of <a href="http://onthewight.com/2013/05/03/isle-of-wight-election-results-2013-the-detail/">local election results</a> published by the Isle of Wight local online news blog, onthewight.com:</p>
<p><a href="http://onthewight.com/2013/05/03/isle-of-wight-election-results-2013-the-detail/"><img src="http://ouseful.files.wordpress.com/2013/05/onthewight-results.png?w=700&#038;h=671" alt="onthewight results" width="700" height="671" class="alignnone size-full wp-image-10501" /></a></p>
<p>There&#8217;s lots of information in there (rank of each candidate for each electoral division, votes cast per candidate, size of electorate for the division, and hence percentage turnout, and so on), and it&#8217;s very nearly available in a <em>ready data</em> format &#8211; that is, a data format that is ready for reuse&#8230; Something like this, for example:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/slightly-tidier.png"><img src="http://ouseful.files.wordpress.com/2013/05/slightly-tidier.png?w=700" alt="Slightly tidier"   class="alignnone size-full wp-image-10502" /></a></p>
<p>Or how about something like this, that shows the size of the electorate for each ward:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/turnout.png"><img src="http://ouseful.files.wordpress.com/2013/05/turnout.png?w=700" alt="turnout"   class="alignnone size-full wp-image-10538" /></a></p>
<p>So how can we get from the OnTheWight results into a ready data format?</p>
<p>Let&#8217;s start by copying all the data from OnTheWight (click into the results frame, select all (ctrl-A) and copy (ctrl-C); I&#8217;ve also posted a copy of the data I grabbed <a href="https://dl.dropboxusercontent.com/u/1156404/IW-results.tsv">here</a>*), then paste the data into a new OpenRefine project:</p>
<p><a href="https://docs.google.com/spreadsheet/ccc?key=0AirrQecc6H_vdDZzakNidXMtY05rRlF4U0VRd2VjM0E&amp;usp=sharing"><img src="http://ouseful.files.wordpress.com/2013/05/paste-data-into-openrefine.png?w=700&#038;h=357" alt="Paste data into OpenRefine" width="700" height="357" class="alignnone size-full wp-image-10534" /></a></p>
<p><small><em>* there were a couple of data quality issues (now resolved in the sheet published by OnTheWight) which relate to the archived data file/data used in this walkthrough. Here are the change notes from @onTheWight:</p>
<p><tt>_Corrected vote numbers<br />
Totland - Winning votes wrong - missed zero off end - 420 not 42<br />
Brading, St Helens &amp; Bembridge - Mike Tarrant (UKIP) got 741 not 714</p>
<p>_Votes won by figures - filled in<br />
Lots of the 'Votes won by figures' had the wrong number in them. It's one of the few figures that needed a manual formula update and in the rush of results (you heard how fast they come), it just wasn't possible.</p>
<p>'Postal votes (inc)' line inserted between 'Total votes cast' and 'Papers spoilt'</p>
<p>Deleted an empty row from Ventnor West</tt></em></small></p>
<p>The data format is &#8220;tab separated&#8221;, so we can import it as such. We might as well get rid of the blank lines at the same time.</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/import-data-as-tsv-no-blanks.png"><img src="http://ouseful.files.wordpress.com/2013/05/import-data-as-tsv-no-blanks.png?w=700&#038;h=532" alt="import data as TSV no blanks" width="700" height="532" class="alignnone size-full wp-image-10533" /></a></p>
<p>Here&#8217;s what we end up with:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/election-data-raw-import.png"><img src="http://ouseful.files.wordpress.com/2013/05/election-data-raw-import.png?w=700" alt="ELection data raw import"   class="alignnone size-full wp-image-10532" /></a></p>
<p>The data format I want has a column specifying the ward each candidate stood in. Let&#8217;s start by creating a new column that is a copy of the column that has the Electoral Division names in it:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/copy-a-column.png"><img src="http://ouseful.files.wordpress.com/2013/05/copy-a-column.png?w=700" alt="COpy a column"   class="alignnone size-full wp-image-10531" /></a></p>
<p>Let&#8217;s define the new column as having exactly the same <tt>value</tt> as the original column:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/create-new-col-as-copy-of-old.png"><img src="http://ouseful.files.wordpress.com/2013/05/create-new-col-as-copy-of-old.png?w=700" alt="Create new col as copy of old"   class="alignnone size-full wp-image-10530" /></a></p>
<p>Now we start puzzling based on what we want to achieve bearing in mind what we can do with OpenRefine. (Sometimes there are many ways of solving a problem, sometimes there is only one, sometimes there may not be any obvious route&#8230;)</p>
<p>The Electoral Division column contains the names of the Electoral Divisions on some rows, and numbers (highlighted green) on others. If we identify the rows containing numbers in that column, we can blank them out&#8230; The Numeric facet will let us do that:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/facet-the-numbers.png"><img src="http://ouseful.files.wordpress.com/2013/05/facet-the-numbers.png?w=700" alt="Facet the numbers"   class="alignnone size-full wp-image-10529" /></a></p>
<p>Select just the rows containing a numeric value in the Electoral Division column, and then replace those values with blanks.</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/filter-and-blank.png"><img src="http://ouseful.files.wordpress.com/2013/05/filter-and-blank.png?w=700&#038;h=315" alt="filter and blank" width="700" height="315" class="alignnone size-full wp-image-10528" /></a></p>
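<p>The effect of the numeric facet plus blanking step can be sketched in Python as below. The helper names are hypothetical, the number test is intentionally crude (unsigned integers and simple decimals only), and the division names in any example call are only illustrative values.</p>

```python
def is_number(text):
    """Crude test: digits with at most one decimal point."""
    return text.strip().replace(".", "", 1).isdigit()


def blank_numeric(column):
    """Blank out cells that hold a number, leaving the names untouched."""
    return ["" if is_number(cell) else cell for cell in column]
```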
<p>Then remove the numeric facet filter:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/filter-update.png"><img src="http://ouseful.files.wordpress.com/2013/05/filter-update.png?w=700" alt="filter update"   class="alignnone size-full wp-image-10527" /></a></p>
<p>Here&#8217;s the result, much tidier:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/much-tidier.png"><img src="http://ouseful.files.wordpress.com/2013/05/much-tidier.png?w=700&#038;h=504" alt="Much tidier" width="700" height="504" class="alignnone size-full wp-image-10526" /></a></p>
<p>Before we fill in the blanks with the Electoral Division names, let&#8217;s just note that there is at least one &#8220;messy&#8221; row in there corresponding to Winning Margin. We don&#8217;t really need that row &#8211; we can always calculate it &#8211; so let&#8217;s remove it. One way of doing this is to display just the rows containing the &#8220;Winning margin&#8221; string in column three, and then delete them. We can use the Text filter to highlight the rows:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/selectt-openrefine-filter.png"><img src="http://ouseful.files.wordpress.com/2013/05/selectt-openrefine-filter.png?w=700" alt="Selectt OpenRefine filter"   class="alignnone size-full wp-image-10525" /></a></p>
<p>Simply state the value you want to filter on and blitz the matching rows&#8230;</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/choose-rows-then-blitz-them.png"><img src="http://ouseful.files.wordpress.com/2013/05/choose-rows-then-blitz-them.png?w=700&#038;h=308" alt="CHoose rows then blitz them" width="700" height="308" class="alignnone size-full wp-image-10524" /></a></p>
<p>&#8230;then remove the filter:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/then-remove-the-filter.png"><img src="http://ouseful.files.wordpress.com/2013/05/then-remove-the-filter.png?w=700" alt="then remove the filter"   class="alignnone size-full wp-image-10523" /></a></p>
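<p>In code terms, the filter-then-remove step amounts to keeping every row that does <em>not</em> match. A minimal Python sketch follows, assuming rows are held as lists and matching case-insensitively; <tt>drop_rows_containing</tt> is a hypothetical helper.</p>

```python
def drop_rows_containing(rows, column, needle):
    """Keep only the rows whose cell in `column` does not contain `needle`."""
    needle = needle.casefold()
    return [row for row in rows if needle not in str(row[column]).casefold()]
```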
<p>We can now fill down the blanks in the Electoral Division column:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/fill-down-on-electoral-division.png"><img src="http://ouseful.files.wordpress.com/2013/05/fill-down-on-electoral-division.png?w=700" alt="Fill down on Electoral Division"   class="alignnone size-full wp-image-10522" /></a></p>
<p>Fill down starts at the top of the column then works its way down, filling in blank cells in that column with whatever was in the cell immediately above.</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/filled-down-now-flag-unwanted-row.png"><img src="http://ouseful.files.wordpress.com/2013/05/filled-down-now-flag-unwanted-row.png?w=700" alt="Filled down - now flag unwanted row"   class="alignnone size-full wp-image-10521" /></a></p>
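<p>Fill down is easy to express in code, which may help make the behaviour concrete. A small Python sketch of the idea (my own illustration, not OpenRefine&#8217;s implementation):</p>

```python
def fill_down(column):
    """Replace blank cells with the most recent non-blank value above them."""
    filled, last = [], None
    for cell in column:
        if cell is None or str(cell).strip() == "":
            filled.append(last)
        else:
            last = cell
            filled.append(cell)
    return filled
```

<p>Note that a blank at the very top of the column has nothing above it to copy, so in this sketch it comes back as <tt>None</tt>.</p>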
<p>Looking at the data, I notice the first row is also &#8220;unwanted&#8221;. If we flag it, we can then facet/filter on that row from the All menu:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/facet-on-flagged-row.png"><img src="http://ouseful.files.wordpress.com/2013/05/facet-on-flagged-row.png?w=700&#038;h=212" alt="facet on flagged row" width="700" height="212" class="alignnone size-full wp-image-10520" /></a></p>
<p>Then we can Remove all matching rows from the cell menu as we did above, then remove the facet.</p>
<p>Now we can turn to just getting the data relating to votes cast per candidate (we could also leave in the other returns). Let&#8217;s use a trick we&#8217;ve already used before &#8211; facet by numeric:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/remove-header-rows.png"><img src="http://ouseful.files.wordpress.com/2013/05/remove-header-rows.png?w=700" alt="Remove header rows"   class="alignnone size-full wp-image-10519" /></a></p>
<p>And then this time just retain the non-numeric rows.</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/electoral-ward-properties.png"><img src="http://ouseful.files.wordpress.com/2013/05/electoral-ward-properties.png?w=700&#038;h=239" alt="Electoral ward properties" width="700" height="239" class="alignnone size-full wp-image-10518" /></a></p>
<p>Hmmm..before we remove it, this data could be worth keeping too in its own right? Let&#8217;s rename the columns:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/rename-column.png"><img src="http://ouseful.files.wordpress.com/2013/05/rename-column.png?w=700" alt="Rename column"   class="alignnone size-full wp-image-10517" /></a></p>
<p>Like so:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/columns-renamed.png"><img src="http://ouseful.files.wordpress.com/2013/05/columns-renamed.png?w=700" alt="columns renamed"   class="alignnone size-full wp-image-10516" /></a></p>
<p>Now let&#8217;s turn those comma-mangled numbers into actual numbers by transforming them:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/transform-the-cells-by-removeing-commas.png"><img src="http://ouseful.files.wordpress.com/2013/05/transform-the-cells-by-removeing-commas.png?w=700" alt="transform the cells by removing commas"   class="alignnone size-full wp-image-10515" /></a></p>
<p>The transform we&#8217;re going to use is to replace the comma by nothing:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/replace-comma.png"><img src="http://ouseful.files.wordpress.com/2013/05/replace-comma.png?w=700" alt="replace comma"   class="alignnone size-full wp-image-10514" /></a></p>
<p>Then convert the values to a number type.</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/then-convert-to-number.png"><img src="http://ouseful.files.wordpress.com/2013/05/then-convert-to-number.png?w=700" alt="then convert to number"   class="alignnone size-full wp-image-10513" /></a></p>
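<p>The two steps (strip the commas, then convert to a number) look something like this in Python. The <code>parse_count</code> name is made up, and the sketch assumes the values are whole-number counts. In OpenRefine itself, a GREL expression along the lines of <code>value.replace(",", "")</code> followed by the common To number transform should do the same job, though the exact expression in the screenshot may differ.</p>

```python
def parse_count(text):
    """Turn a string such as '1,234' into the integer 1234."""
    cleaned = text.replace(",", "").strip()  # step 1: replace the comma with nothing
    return int(cleaned)                      # step 2: convert the text to a number


print(parse_count("1,234"))
```

<p>A cell that is still not a clean number after the commas are removed will raise a <code>ValueError</code>, which is a handy way of spotting stray rows.</p>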
<p>We can then do the same thing for the Number on Roll column:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/reuse-is-good.png"><img src="http://ouseful.files.wordpress.com/2013/05/reuse-is-good.png?w=700" alt="reuse is good"   class="alignnone size-full wp-image-10512" /></a></p>
<p>We seem to have a rogue row in there too &#8211; a Labour candidate with a 0% poll. We can flag that row and delete it as we did above.</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/final-stages-of-electroal-division-data.png"><img src="http://ouseful.files.wordpress.com/2013/05/final-stages-of-electroal-division-data.png?w=700&#038;h=274" alt="Final stages of electoral division data" width="700" height="274" class="alignnone size-full wp-image-10511" /></a></p>
<p>There also seem to be a couple of other scrappy rows &#8211; the overall count and another rogue percentage-bearing line &#8211; so again we can flag these, do an All facet on them, remove all matching rows and then remove the flag facet.</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/a-little-more-tidying-to-do.png"><img src="http://ouseful.files.wordpress.com/2013/05/a-little-more-tidying-to-do.png?w=700" alt="a little more tidying to do"   class="alignnone size-full wp-image-10510" /></a></p>
<p>Having done that, we can take the opportunity to export the data.</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/openrefine-exporter.png"><img src="http://ouseful.files.wordpress.com/2013/05/openrefine-exporter.png?w=700" alt="openrefine exporter"   class="alignnone size-full wp-image-10509" /></a></p>
<p>Using the custom tabular exporter, we can select the columns we wish to export.</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/export-column-select.png"><img src="http://ouseful.files.wordpress.com/2013/05/export-column-select.png?w=700&#038;h=453" alt="Export column select" width="700" height="453" class="alignnone size-full wp-image-10508" /></a></p>
<p>Then we can export the data to the desktop as a file in a variety of formats:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/openrefine-export-download.png"><img src="http://ouseful.files.wordpress.com/2013/05/openrefine-export-download.png?w=700&#038;h=275" alt="OpenRefine export download" width="700" height="275" class="alignnone size-full wp-image-10506" /></a></p>
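<p>As a rough analogue of the custom tabular exporter, the sketch below writes only the chosen columns, in the chosen order, as delimited text using Python's standard <code>csv</code> module. The <code>export_columns</code> function, the column names and the sample row are illustrative rather than taken from the project.</p>

```python
import csv
import io


def export_columns(rows, columns, delimiter=","):
    """Return the selected columns of rows as delimited text, header first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,       # controls both which columns appear and their order
        delimiter=delimiter,
        extrasaction="ignore",    # silently drop any columns that were not selected
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical row with one column we do not want to export
rows = [{"Ward": "Ward A", "Electorate": 3021, "Turnout": 1100, "Notes": "scrappy"}]
print(export_columns(rows, ["Ward", "Turnout"]))
```

<p>Passing <code>delimiter="\t"</code> gives tab separated output instead of comma separated.</p>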
<p>Or we can upload it to a Google document store, such as Google Spreadsheets or Google Fusion Tables:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/openrefine-upload-to-goole.png"><img src="http://ouseful.files.wordpress.com/2013/05/openrefine-upload-to-goole.png?w=700" alt="OpenRefine upload to Google"   class="alignnone size-full wp-image-10507" /></a></p>
<p>Here&#8217;s <a href="https://docs.google.com/spreadsheet/ccc?key=0AirrQecc6H_vdG1GNGYtSmxvWmV2Mzg0ZDNDdlhDV1E&amp;usp=sharing">the data I uploaded</a>.</p>
<p>If we go back to the results for candidates by ward, we can export that data too, although I&#8217;d be tempted to do a little bit more tidying, for example by removing the &#8220;Votes won by&#8221; rows, and maybe also the Total Votes Cast column. I&#8217;d probably also rename what is now the Candidates column to something more meaningful! (Can you work out how?!;-)</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/change-filter-settings.png"><img src="http://ouseful.files.wordpress.com/2013/05/change-filter-settings.png?w=700&#038;h=218" alt="change filter settings" width="700" height="218" class="alignnone size-full wp-image-10505" /></a></p>
<p>When we upload the data, we can tweak the column ordering first so that the data makes a little more sense at first glance:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/reorder-columns.png"><img src="http://ouseful.files.wordpress.com/2013/05/reorder-columns.png?w=700&#038;h=457" alt="reorder columns" width="700" height="457" class="alignnone size-full wp-image-10504" /></a></p>
<p>Here&#8217;s what I uploaded to a <a href="https://docs.google.com/spreadsheet/ccc?key=0AirrQecc6H_vdDZzakNidXMtY05rRlF4U0VRd2VjM0E&amp;usp=sharing">Google spreadsheet</a>:</p>
<p><a href="https://docs.google.com/spreadsheet/ccc?key=0AirrQecc6H_vdDZzakNidXMtY05rRlF4U0VRd2VjM0E&amp;usp=sharing"><img src="http://ouseful.files.wordpress.com/2013/05/spreadsheet.png?w=700" alt="Spreadsheet"   class="alignnone size-full wp-image-10503" /></a></p>
<p>[<a href="https://dl.dropboxusercontent.com/u/1156404/IW-elections.google-refine.tar.gz">OpenRefine project file</a>]</p>
<p><em>So &#8211; there you have it&#8230; another OpenRefine walkthrough. Part conversation with the data, part puzzle. As with most puzzles, once you start to learn the tricks, it becomes ever easier&#8230; Or you can start taking on ever more complex puzzles&#8230;</em></p>
<p><em>Although you may not realise it, most of the work related to generating raw graphics has now been done. Once the data has a reasonable shape to it, it becomes oven ready, data ready, and is relatively easy to work with.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2013/05/03/a-wrangling-example-with-openrefine-making-ready-data/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/abbd9f90565ce9ae4d065d93a81d8c03?s=96&amp;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">Tony Hirst</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/onthewight-results.png" medium="image">
			<media:title type="html">onthewight results</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/slightly-tidier.png" medium="image">
			<media:title type="html">Slightly tidier</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/turnout.png" medium="image">
			<media:title type="html">turnout</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/paste-data-into-openrefine.png" medium="image">
			<media:title type="html">Paste data into OpenRefine</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/import-data-as-tsv-no-blanks.png" medium="image">
			<media:title type="html">import data as TSV no blanks</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/election-data-raw-import.png" medium="image">
			<media:title type="html">Election data raw import</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/copy-a-column.png" medium="image">
			<media:title type="html">Copy a column</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/create-new-col-as-copy-of-old.png" medium="image">
			<media:title type="html">Create new col as copy of old</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/facet-the-numbers.png" medium="image">
			<media:title type="html">Facet the numbers</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/filter-and-blank.png" medium="image">
			<media:title type="html">filter and blank</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/filter-update.png" medium="image">
			<media:title type="html">filter update</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/much-tidier.png" medium="image">
			<media:title type="html">Much tidier</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/selectt-openrefine-filter.png" medium="image">
			<media:title type="html">Select OpenRefine filter</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/choose-rows-then-blitz-them.png" medium="image">
			<media:title type="html">Choose rows then blitz them</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/then-remove-the-filter.png" medium="image">
			<media:title type="html">then remove the filter</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/fill-down-on-electoral-division.png" medium="image">
			<media:title type="html">Fill down on Electoral Division</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/filled-down-now-flag-unwanted-row.png" medium="image">
			<media:title type="html">Filled down - now flag unwanted row</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/facet-on-flagged-row.png" medium="image">
			<media:title type="html">facet on flagged row</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/remove-header-rows.png" medium="image">
			<media:title type="html">Remove header rows</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/electoral-ward-properties.png" medium="image">
			<media:title type="html">Electoral ward properties</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/rename-column.png" medium="image">
			<media:title type="html">Rename column</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/columns-renamed.png" medium="image">
			<media:title type="html">columns renamed</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/transform-the-cells-by-removeing-commas.png" medium="image">
			<media:title type="html">transform the cells by removing commas</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/replace-comma.png" medium="image">
			<media:title type="html">replace comma</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/then-convert-to-number.png" medium="image">
			<media:title type="html">then convert to number</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/reuse-is-good.png" medium="image">
			<media:title type="html">reuse is good</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/final-stages-of-electroal-division-data.png" medium="image">
			<media:title type="html">Final stages of electoral division data</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/a-little-more-tidying-to-do.png" medium="image">
			<media:title type="html">a little more tidying to do</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/openrefine-exporter.png" medium="image">
			<media:title type="html">openrefine exporter</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/export-column-select.png" medium="image">
			<media:title type="html">Export column select</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/openrefine-export-download.png" medium="image">
			<media:title type="html">OpenRefine export download</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/openrefine-upload-to-goole.png" medium="image">
			<media:title type="html">OpenRefine upload to Google</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/change-filter-settings.png" medium="image">
			<media:title type="html">change filter settings</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/reorder-columns.png" medium="image">
			<media:title type="html">reorder columns</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/spreadsheet.png" medium="image">
			<media:title type="html">Spreadsheet</media:title>
		</media:content>
	<creativeCommons:license>http://creativecommons.org/licenses/by/2.0/</creativeCommons:license><feedburner:origLink>http://blog.ouseful.info/2013/05/03/a-wrangling-example-with-openrefine-making-ready-data/</feedburner:origLink></item>
	<item><title>Links for 2013-01-05 [del.icio.us]</title><link>http://feeds.ouseful.info/~r/ouseful/~3/H16YaMdnLbA/feedthru</link><pubDate>Sun, 06 Jan 2013 00:00:00 PST</pubDate><guid isPermaLink="false">http://del.icio.us/psychemedia/feedthru#2013-01-05</guid><description>&lt;ul&gt;
&lt;li&gt;&lt;a href="http://www.johntedesco.net/blog/2012/06/21/how-to-solve-impossible-problems-daniel-russells-awesome-google-search-techniques/"&gt;How to solve impossible problems: Daniel Russell&amp;rsquo;s awesome Google search techniques&lt;/a&gt;&lt;br/&gt;
Handy summary of Google search tricks. Worth reading simply as a refresher... do you know about the intext: search limit for example?&lt;/li&gt;
&lt;/ul&gt;&lt;img src="http://feeds.feedburner.com/~r/ouseful/~4/H16YaMdnLbA" height="1" width="1"/&gt;</description><feedburner:origLink>http://del.icio.us/psychemedia/feedthru#2013-01-05</feedburner:origLink></item><item><title>Links for 2012-12-17 [del.icio.us]</title><link>http://feeds.ouseful.info/~r/ouseful/~3/ToQLXSSkOMg/feedthru</link><pubDate>Tue, 18 Dec 2012 00:00:00 PST</pubDate><guid isPermaLink="false">http://del.icio.us/psychemedia/feedthru#2012-12-17</guid><description>&lt;ul&gt;
&lt;li&gt;&lt;a href="http://www.mikestirling.co.uk/bbc-micro-on-an-fpga/"&gt;BBC Micro on an FPGA | mikestirling.co.uk&lt;/a&gt;&lt;br/&gt;
This is just insane - a BBC Micro on an FPGA. The creator, Mike Stirling, has also done a ZX Spectrum on an FPGA. /sort of via @benosteen&lt;/li&gt;
&lt;/ul&gt;&lt;img src="http://feeds.feedburner.com/~r/ouseful/~4/ToQLXSSkOMg" height="1" width="1"/&gt;</description><feedburner:origLink>http://del.icio.us/psychemedia/feedthru#2012-12-17</feedburner:origLink></item><item><title>Links for 2012-12-15 [del.icio.us]</title><link>http://feeds.ouseful.info/~r/ouseful/~3/8TXrUaXlSuk/feedthru</link><pubDate>Sun, 16 Dec 2012 00:00:00 PST</pubDate><guid isPermaLink="false">http://del.icio.us/psychemedia/feedthru#2012-12-15</guid><description>&lt;ul&gt;
&lt;li&gt;&lt;a href="http://www.open.edu/openlearn/science-maths-technology/mathematics-and-statistics/statistics/diary-data-sleuth-getting-grips-the-census-data"&gt;Diary of a data sleuth: Getting to grips with the census data&lt;/a&gt;&lt;br/&gt;
Following the release of statistical tables around the 2011 England and Wales census, I thought I'd have a quick look at what sort of shape the data was in...&lt;/li&gt;
&lt;/ul&gt;&lt;img src="http://feeds.feedburner.com/~r/ouseful/~4/8TXrUaXlSuk" height="1" width="1"/&gt;</description><feedburner:origLink>http://del.icio.us/psychemedia/feedthru#2012-12-15</feedburner:origLink></item><item><title>Links for 2012-12-12 [del.icio.us]</title><link>http://feeds.ouseful.info/~r/ouseful/~3/DMseenkkTQc/feedthru</link><pubDate>Thu, 13 Dec 2012 00:00:00 PST</pubDate><guid isPermaLink="false">http://del.icio.us/psychemedia/feedthru#2012-12-12</guid><description>&lt;ul&gt;
&lt;li&gt;&lt;a href="http://www.open.edu/openlearn/science-maths-technology/mathematics-and-statistics/statistics/two-can-play-game-when-polls-collide"&gt;Two can play at that game: When polls collide - OpenLearn - Open University&lt;/a&gt;&lt;br/&gt;
Two polls, operated by the same polling organisation, that ostensibly ask similar questions but apparently give opposing results. What's going on? I search for the data...&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.open.edu/openlearn/science-maths-technology/engineering-and-technology/engineering/partial-history-ou-adventures-lego-engineering"&gt;A partial history of OU adventures in lego engineering - OpenLearn - Open University&lt;/a&gt;&lt;br/&gt;
A brief, personal history of recent OU courses where Lego has played a role...&lt;/li&gt;
&lt;/ul&gt;&lt;img src="http://feeds.feedburner.com/~r/ouseful/~4/DMseenkkTQc" height="1" width="1"/&gt;</description><feedburner:origLink>http://del.icio.us/psychemedia/feedthru#2012-12-12</feedburner:origLink></item><item><title>Links for 2012-12-03 [del.icio.us]</title><link>http://feeds.ouseful.info/~r/ouseful/~3/Foj2A0GZdvs/feedthru</link><pubDate>Tue, 04 Dec 2012 00:00:00 PST</pubDate><guid isPermaLink="false">http://del.icio.us/psychemedia/feedthru#2012-12-03</guid><description>&lt;ul&gt;
&lt;li&gt;&lt;a href="http://www.guardian.co.uk/news/datablog/2012/nov/28/data-shadows-twitter-uk-floods-mapped"&gt;Digital trails of the UK floods - how well do tweets match observations? | News | guardian.co.uk&lt;/a&gt;&lt;br/&gt;
You can imagine maps like these being used to illustrate academic research papers published months, or years, after the event. Rather than a few *days* after the event.

What does this say about the sorts of things that are valid in an academic research context, given academic research timescales?&lt;/li&gt;
&lt;/ul&gt;&lt;img src="http://feeds.feedburner.com/~r/ouseful/~4/Foj2A0GZdvs" height="1" width="1"/&gt;</description><feedburner:origLink>http://del.icio.us/psychemedia/feedthru#2012-12-03</feedburner:origLink></item><item><title>Links for 2012-11-28 [del.icio.us]</title><link>http://feeds.ouseful.info/~r/ouseful/~3/UzzKz8-GBs0/feedthru</link><pubDate>Thu, 29 Nov 2012 00:00:00 PST</pubDate><guid isPermaLink="false">http://del.icio.us/psychemedia/feedthru#2012-11-28</guid><description>&lt;ul&gt;
&lt;li&gt;&lt;a href="http://www.open.edu/openlearn/science-maths-technology/mathematics-and-statistics/statistics/diary-data-sleuth-football-injury-time"&gt;Diary of a data sleuth: Football injury time&lt;/a&gt;&lt;br/&gt;
Does Fergie-time exist? Does injury time lead to last-minute goals?

(By me), a (failed) attempt at finding football stats and open data sets that might have helped me answer this question for myself...&lt;/li&gt;
&lt;li&gt;&lt;a href="http://blog.stephenwolfram.com/2012/11/mathematica-9-is-released-today/"&gt;Stephen Wolfram Blog : Mathematica 9 Is Released Today!&lt;/a&gt;&lt;br/&gt;
As I read of Mathematica 9 I appreciate one of the many magical things about software is how it bootstraps itself...

...and I wonder, by Mathematica 10 will we just have to write the first line of an analysis, or present a data file to it, and then just click to accept various suggestions about what code or analysis generating step to try next?&lt;/li&gt;
&lt;/ul&gt;&lt;img src="http://feeds.feedburner.com/~r/ouseful/~4/UzzKz8-GBs0" height="1" width="1"/&gt;</description><feedburner:origLink>http://del.icio.us/psychemedia/feedthru#2012-11-28</feedburner:origLink></item><item><title>Links for 2012-11-02 [del.icio.us]</title><link>http://feeds.ouseful.info/~r/ouseful/~3/VE0sooQYqB8/feedthru</link><pubDate>Sat, 03 Nov 2012 00:00:00 PDT</pubDate><guid isPermaLink="false">http://del.icio.us/psychemedia/feedthru#2012-11-02</guid><description>&lt;ul&gt;
&lt;li&gt;&lt;a href="http://www.bbc.co.uk/blogs/blogcollegeofjournalism/posts/How-to-map-your-social-network"&gt;BBC - Blogs - College of Journalism - How to map your social network&lt;/a&gt;&lt;br/&gt;
A post wot I wrote for the BBC College blog...&lt;/li&gt;
&lt;/ul&gt;&lt;img src="http://feeds.feedburner.com/~r/ouseful/~4/VE0sooQYqB8" height="1" width="1"/&gt;</description><feedburner:origLink>http://del.icio.us/psychemedia/feedthru#2012-11-02</feedburner:origLink></item></channel>
</rss>
