<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Why Open Data Is More Important than Open Source</title>
	<atom:link href="http://blog.iandavis.com/2009/03/open-data-open-source/feed" rel="self" type="application/rss+xml" />
	<link>http://blog.iandavis.com/2009/03/open-data-open-source</link>
	<description>blog.iandavis.com</description>
	<lastBuildDate>Wed, 04 Nov 2009 14:19:37 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=3.0-alpha</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Jonah</title>
		<link>http://blog.iandavis.com/2009/03/open-data-open-source/comment-page-1#comment-1377</link>
		<dc:creator>Jonah</dc:creator>
		<pubDate>Fri, 13 Mar 2009 03:50:39 +0000</pubDate>
		<guid isPermaLink="false">http://iandavis.com/blog/?p=1336#comment-1377</guid>
		<description>Well put.  Now I am even more sorry I missed code4lib this year.

I have been making this argument lately by raising the concern that the free software/culture movement is being outflanked by a land grab for data. Free software is only one corner piece of this puzzle - to complete the jigsaw we need the corners of free data, in a free format.  If I can get my data back out in a free format, do I really care which calendar service I am using (if the data is meant to be public and shared, that is)? As the applications become commodities, data is king.

http://alchemicalmusings.org/2006/03/12/saints-in-the-church-of-writely/</description>
		<content:encoded><![CDATA[<p>Well put.  Now I am even more sorry I missed code4lib this year.</p>
<p>I have been making this argument lately by raising the concern that the free software/culture movement is being outflanked by a land grab for data. Free software is only one corner piece of this puzzle &#8211; to complete the jigsaw we need the corners of free data, in a free format.  If I can get my data back out in a free format, do I really care which calendar service I am using (if the data is meant to be public and shared, that is)? As the applications become commodities, data is king.</p>
<p><a href="http://alchemicalmusings.org/2006/03/12/saints-in-the-church-of-writely/" rel="nofollow">http://alchemicalmusings.org/2006/03/12/saints-in-the-church-of-writely/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nick Gall</title>
		<link>http://blog.iandavis.com/2009/03/open-data-open-source/comment-page-1#comment-1372</link>
		<dc:creator>Nick Gall</dc:creator>
		<pubDate>Mon, 09 Mar 2009 22:17:41 +0000</pubDate>
		<guid isPermaLink="false">http://iandavis.com/blog/?p=1336#comment-1372</guid>
		<description>Ian, you&#039;ll be happy to know that the adage &quot;data outlasts code&quot; has been around for a while. Here is a quote from 1994: &quot;Awareness is growing throughout the world that data outlasts computer programs,
in the same way that an aircraft manual long outlasts any typewriter, ...&quot; ( see http://books.google.com/books?filter=0&amp;um=1&amp;q=%22long+outlasts+any+typewriter%22&amp;btnG=Search+Books ).

I also just saw this quote in a circa 1997 Oracle MDM whitepaper: &quot;It has been said that data outlasts applications.&quot; (see www.oracle.com/master-data-management/oracle-master-data.pdf ).</description>
		<content:encoded><![CDATA[<p>Ian, you&#8217;ll be happy to know that the adage &#8220;data outlasts code&#8221; has been around for a while. Here is a quote from 1994: &#8220;Awareness is growing throughout the world that data outlasts computer programs,<br />
in the same way that an aircraft manual long outlasts any typewriter, &#8230;&#8221; ( see <a href="http://books.google.com/books?filter=0&amp;um=1&amp;q=%22long+outlasts+any+typewriter%22&amp;btnG=Search+Books" rel="nofollow">http://books.google.com/books?filter=0&amp;um=1&amp;q=%22long+outlasts+any+typewriter%22&amp;btnG=Search+Books</a> ).</p>
<p>I also just saw this quote in a circa 1997 Oracle MDM whitepaper: &#8220;It has been said that data outlasts applications.&#8221; (see <a href="http://www.oracle.com/master-data-management/oracle-master-data.pdf" rel="nofollow">http://www.oracle.com/master-data-management/oracle-master-data.pdf</a> ).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael Roessler</title>
		<link>http://blog.iandavis.com/2009/03/open-data-open-source/comment-page-1#comment-1369</link>
		<dc:creator>Michael Roessler</dc:creator>
		<pubDate>Sat, 07 Mar 2009 00:26:59 +0000</pubDate>
		<guid isPermaLink="false">http://iandavis.com/blog/?p=1336#comment-1369</guid>
		<description>Data is in the eye of the beholder.

In my opinion, the following is the most valuable statement among your slides and deserves much attention and discussion:

&quot;Much of the value in our data will be unexpected and unintended, therefore we should engineer for serendipity.&quot; - Ian Davis

Data may be important, code may be important, but the ability to incorporate into engineering certain methodologies to enhance the ability to read and understand significance in data is magic - especially in a manner that does not excessively limit potential meaning in data through preconceptions. Data is in the eye of the beholder, at least as related to using data for decision-making.

How do we ask questions that help us to engineer for serendipity? This, in my opinion, is a hugely significant question for all of us that we ought not obscure through an open data vs open source argument. I think this question should be investigated on its own.

How do we add value to data by engineering for serendipity?

This fits in with another comment you made:

&quot;Network effects arise when the act of participation makes the entire network more useful for everyone.&quot; - Ian Davis

This is an excellent statement! Most of us know how the example of social networking fits in to this statement. We know that the more people who participate in social networking, the more valuable that network can become for each of us. We know how the web fits in, as for example, your presenting these ideas on a public web page makes the web more useful for each of us. But not enough of us understand networking effects in raw data, especially within a corporate business environment. Not enough of us understand that keeping data in a spreadsheet within one department may add insufficient value to the organization because there can be few serendipitous value creations since few people will be exposed to the data. Placing the spreadsheet on an intranet might not be the most efficient method of producing the network effects that are possible.

Combining network effects with engineering for serendipity is a good recipe for changing the world by making data not only open, but also useful and positioning it squarely in the eye of the beholder.

I&#039;d very much like to pursue this further.</description>
		<content:encoded><![CDATA[<p>Data is in the eye of the beholder.</p>
<p>In my opinion, the following is the most valuable statement among your slides and deserves much attention and discussion:</p>
<p>&#8220;Much of the value in our data will be unexpected and unintended, therefore we should engineer for serendipity.&#8221; &#8211; Ian Davis</p>
<p>Data may be important, code may be important, but the ability to incorporate into engineering certain methodologies to enhance the ability to read and understand significance in data is magic &#8211; especially in a manner that does not excessively limit potential meaning in data through preconceptions. Data is in the eye of the beholder, at least as related to using data for decision-making.</p>
<p>How do we ask questions that help us to engineer for serendipity? This, in my opinion, is a hugely significant question for all of us that we ought not obscure through an open data vs open source argument. I think this question should be investigated on its own.</p>
<p>How do we add value to data by engineering for serendipity?</p>
<p>This fits in with another comment you made:</p>
<p>&#8220;Network effects arise when the act of participation makes the entire network more useful for everyone.&#8221; &#8211; Ian Davis</p>
<p>This is an excellent statement! Most of us know how the example of social networking fits in to this statement. We know that the more people who participate in social networking, the more valuable that network can become for each of us. We know how the web fits in, as for example, your presenting these ideas on a public web page makes the web more useful for each of us. But not enough of us understand networking effects in raw data, especially within a corporate business environment. Not enough of us understand that keeping data in a spreadsheet within one department may add insufficient value to the organization because there can be few serendipitous value creations since few people will be exposed to the data. Placing the spreadsheet on an intranet might not be the most efficient method of producing the network effects that are possible.</p>
<p>Combining network effects with engineering for serendipity is a good recipe for changing the world by making data not only open, but also useful and positioning it squarely in the eye of the beholder.</p>
<p>I&#8217;d very much like to pursue this further.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mike Linksvayer</title>
		<link>http://blog.iandavis.com/2009/03/open-data-open-source/comment-page-1#comment-1365</link>
		<dc:creator>Mike Linksvayer</dc:creator>
		<pubDate>Thu, 05 Mar 2009 01:50:16 +0000</pubDate>
		<guid isPermaLink="false">http://iandavis.com/blog/?p=1336#comment-1365</guid>
		<description>Rocks outlive humans. In other words, the argument seems facile.

What &quot;open&quot; are you referring to in &quot;open data&quot; in the context of this conjecture? As you say, legal restrictions expire, eventually. Do you mean data in open formats and open standards? Those are defined by a legal component (lack of patent encumbrance, which is more certain to expire if present anyway than are copyright restrictions), documentation, and sometimes -- open source reference implementation.

If pressed I&#039;d argue the whole &quot;open&quot; stack requires free/open source software (even if all software isn&#039;t), thus in a sense &quot;open&quot; software is more important than &quot;open&quot; data (whether data is more important than software is a somewhat different argument). For something along those lines (briefly, open formats/standards was a losing battle without source code ... the article at the link tries to argue that we&#039;re in danger of losing the battle again as code moves to servers), see http://lists.canonical.org/pipermail/kragen-tol/2006-July/000818.html ... note that this is from a person who considers closed formats unethical http://lists.canonical.org/pipermail/kragen-tol/2002-July/000725.html :)

Or maybe by &quot;open data&quot; you merely mean data which is accessible, eg downloadable. Your comments toward the end contrasting having lost your data and having lost your code seem to indicate that this is what you mean. Ok, but this again seems rather facile, at least not without some context. For example, you could argue that it is more important for governments to make their data available than it is for governments to use open source software.</description>
		<content:encoded><![CDATA[<p>Rocks outlive humans. In other words, the argument seems facile.</p>
<p>What &#8220;open&#8221; are you referring to in &#8220;open data&#8221; in the context of this conjecture? As you say, legal restrictions expire, eventually. Do you mean data in open formats and open standards? Those are defined by a legal component (lack of patent encumbrance, which is more certain to expire if present anyway than are copyright restrictions), documentation, and sometimes &#8212; open source reference implementation.</p>
<p>If pressed I&#8217;d argue the whole &#8220;open&#8221; stack requires free/open source software (even if all software isn&#8217;t), thus in a sense &#8220;open&#8221; software is more important than &#8220;open&#8221; data (whether data is more important than software is a somewhat different argument). For something along those lines (briefly, open formats/standards was a losing battle without source code &#8230; the article at the link tries to argue that we&#8217;re in danger of losing the battle again as code moves to servers), see <a href="http://lists.canonical.org/pipermail/kragen-tol/2006-July/000818.html" rel="nofollow">http://lists.canonical.org/pipermail/kragen-tol/2006-July/000818.html</a> &#8230; note that this is from a person who considers closed formats unethical <a href="http://lists.canonical.org/pipermail/kragen-tol/2002-July/000725.html" rel="nofollow">http://lists.canonical.org/pipermail/kragen-tol/2002-July/000725.html</a> <img src='http://blog.iandavis.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Or maybe by &#8220;open data&#8221; you merely mean data which is accessible, eg downloadable. Your comments toward the end contrasting having lost your data and having lost your code seem to indicate that this is what you mean. Ok, but this again seems rather facile, at least not without some context. For example, you could argue that it is more important for governments to make their data available than it is for governments to use open source software.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: robert</title>
		<link>http://blog.iandavis.com/2009/03/open-data-open-source/comment-page-1#comment-1364</link>
		<dc:creator>robert</dc:creator>
		<pubDate>Wed, 04 Mar 2009 19:15:23 +0000</pubDate>
		<guid isPermaLink="false">http://iandavis.com/blog/?p=1336#comment-1364</guid>
		<description>oh, what i should have added: each generation of code should leave the data in a better state than before.</description>
		<content:encoded><![CDATA[<p>oh, what i should have added: each generation of code should leave the data in a better state than before.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Rochkind</title>
		<link>http://blog.iandavis.com/2009/03/open-data-open-source/comment-page-1#comment-1363</link>
		<dc:creator>Jonathan Rochkind</dc:creator>
		<pubDate>Wed, 04 Mar 2009 14:18:51 +0000</pubDate>
		<guid isPermaLink="false">http://iandavis.com/blog/?p=1336#comment-1363</guid>
		<description>I&#039;m actually shocked that this is a controversial claim among librarians and library workers. It seems so obviously true to me, _especially_ from the experience of working with library metadata. As you note, we&#039;re still working with data files that have been essentially untouched for years.   AACR was published in 1967, and AACR2 in 1978, and most of our catalogs include both AACR and even pre-AACR records in them still. 

Software comes and goes, but data sticks around.  The data is actually more expensive to produce or replace than software. 

One of the unfortunate things about much of our data is that it was sort of fitted to the idiosyncrasies of a particular piece of software that was going to be used to display it when it was created.  Values were put in certain fields because they would then be displayed (or not displayed) by certain software, in an idiosyncratic ad hoc way.  But the data has long outlasted the software whose behaviors it was molded to. 

You&#039;d think catalogers would be pleased to have computer programmers acknowledging that what they do (data control and generation) is more important than what we do!</description>
		<content:encoded><![CDATA[<p>I&#8217;m actually shocked that this is a controversial claim among librarians and library workers. It seems so obviously true to me, _especially_ from the experience of working with library metadata. As you note, we&#8217;re still working with data files that have been essentially untouched for years.   AACR was published in 1967, and AACR2 in 1978, and most of our catalogs include both AACR and even pre-AACR records in them still. </p>
<p>Software comes and goes, but data sticks around.  The data is actually more expensive to produce or replace than software. </p>
<p>One of the unfortunate things about much of our data is that it was sort of fitted to the idiosyncrasies of a particular piece of software that was going to be used to display it when it was created.  Values were put in certain fields because they would then be displayed (or not displayed) by certain software, in an idiosyncratic ad hoc way.  But the data has long outlasted the software whose behaviors it was molded to. </p>
<p>You&#8217;d think catalogers would be pleased to have computer programmers acknowledging that what they do (data control and generation) is more important than what we do!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Steve Tolley</title>
		<link>http://blog.iandavis.com/2009/03/open-data-open-source/comment-page-1#comment-1360</link>
		<dc:creator>Steve Tolley</dc:creator>
		<pubDate>Wed, 04 Mar 2009 12:14:12 +0000</pubDate>
		<guid isPermaLink="false">http://iandavis.com/blog/?p=1336#comment-1360</guid>
		<description>An interesting and perhaps stretching view given the likely audience for the comments. There certainly seems to be a growing sense of utility in computing provision and as such more fundamental social needs are being debated. SaaS and cloud approaches will hopefully focus more brain power on the philosophical issues of informational purpose in broader context and longer time line.

You seem more at the evangelist end of the CTO spectrum, and this debate will hopefully become more commonplace and thought provoking.

Looking forward to more semantic serendipity!</description>
		<content:encoded><![CDATA[<p>An interesting and perhaps stretching view given the likely audience for the comments. There certainly seems to be a growing sense of utility in computing provision and as such more fundamental social needs are being debated. SaaS and cloud approaches will hopefully focus more brain power on the philosophical issues of informational purpose in broader context and longer time line.</p>
<p>You seem more at the evangelist end of the CTO spectrum, and this debate will hopefully become more commonplace and thought provoking.</p>
<p>Looking forward to more semantic serendipity!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: robert</title>
		<link>http://blog.iandavis.com/2009/03/open-data-open-source/comment-page-1#comment-1359</link>
		<dc:creator>robert</dc:creator>
		<pubDate>Wed, 04 Mar 2009 08:40:53 +0000</pubDate>
		<guid isPermaLink="false">http://iandavis.com/blog/?p=1336#comment-1359</guid>
		<description>fully agree with your &quot;data outlasts code&quot; statement. even in my own 5 years of programming in a digital library environment, i&#039;ve seen code bit-rot and get replaced, while i&#039;m still working with the same data(bases).

i&#039;d even go a bit further and propose that whenever you have to create code to access/manage some data, this must not take longer than - say - 1/10 of the time necessary to create the data. and maybe we should put a hard limit of 5 years on that, too, because in most parts of the programming world, that&#039;s about the time it takes to make technologies obsolete.</description>
		<content:encoded><![CDATA[<p>fully agree with your &#8220;data outlasts code&#8221; statement. even in my own 5 years of programming in a digital library environment, i&#8217;ve seen code bit-rot and get replaced, while i&#8217;m still working with the same data(bases).</p>
<p>i&#8217;d even go a bit further and propose that whenever you have to create code to access/manage some data, this must not take longer than &#8211; say &#8211; 1/10 of the time necessary to create the data. and maybe we should put a hard limit of 5 years on that, too, because in most parts of the programming world, that&#8217;s about the time it takes to make technologies obsolete.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Egon Willighagen</title>
		<link>http://blog.iandavis.com/2009/03/open-data-open-source/comment-page-1#comment-1358</link>
		<dc:creator>Egon Willighagen</dc:creator>
		<pubDate>Wed, 04 Mar 2009 07:49:06 +0000</pubDate>
		<guid isPermaLink="false">http://iandavis.com/blog/?p=1336#comment-1358</guid>
		<description>Hi Ian,

thanx for getting back on this and your detailed explanation of your perspective.

From my perspective, your arguments seem to be built on two assumptions:

1. data is static
2. code is cheap to build

About 2: in science this is typically not the case. Surely, writing an XML parser is easy enough, and writing a simple text editor likewise. However, much of the scientific code output is the outcome of several years of development. Believe me, it is not cheap to rebuild scientific software. The market for scientific programmers is so small that the cost of rebuilding software is quite high. This is unlike desktop software, where there is a huge user market and redoing it is trivial.

Unfortunately, it is common to not distinguish between desktop software and scientific software. People typically do not understand what makes development of new scientific algorithms more difficult than writing a desktop tool. A choice of CSV versus XML is a non-issue in scientific software design; what matters is proper representation of the scientific problem, which requires a lot of domain knowledge and a lot of testing against that knowledge. These scientific algorithms easily outlast a lot of the data around.

In short, the development of scientific software which reflects new scientific insights is certainly not by default cheaper than measuring the data again.

About 1: there is another reason why data does not outlast code. Street map data of 100 years ago surely has some historical value, but I leave it to the reader how useful it is to current problems. Likewise, chemical data measured 50 years ago is certainly not as useful now as it was then: measurements we make now are so much more accurate and more precise. Infrared (IR) spectroscopy was the main identification method in organic chemistry until NMR came around. The latter practically made IR obsolete, though it often still is used to back up the presence of a certain chemical fragment. I&#039;m sure this is even more the case in biological sciences.

Old scientific data is nice, but to use it, you just have to redo your experiment (it&#039;s exponentially cheaper to do the measuring now than when it was originally done), gaining much more information. Who cares about the speed of light measured 100 years ago? That data simply did not last. Scientific data does not last (and I do not think telephone numbers of 5 years ago are useful either). The assumption that data is static and lasts is flawed. The assumption that measuring things is prohibitively expensive is flawed too. Well, if you don&#039;t have code, it surely is.

Bottom line is that either can be less or more expensive to reproduce: the data and the code. At least in science.

Egon</description>
		<content:encoded><![CDATA[<p>Hi Ian,</p>
<p>thanx for getting back on this and your detailed explanation of your perspective.</p>
<p>From my perspective, your arguments seem to be built on two assumptions:</p>
<p>1. data is static<br />
2. code is cheap to build</p>
<p>About 2: in science this is typically not the case. Surely, writing an XML parser is easy enough, and writing a simple text editor likewise. However, much of the scientific code output is the outcome of several years of development. Believe me, it is not cheap to rebuild scientific software. The market for scientific programmers is so small that the cost of rebuilding software is quite high. This is unlike desktop software, where there is a huge user market and redoing it is trivial.</p>
<p>Unfortunately, it is common to not distinguish between desktop software and scientific software. People typically do not understand what makes development of new scientific algorithms more difficult than writing a desktop tool. A choice of CSV versus XML is a non-issue in scientific software design; what matters is proper representation of the scientific problem, which requires a lot of domain knowledge and a lot of testing against that knowledge. These scientific algorithms easily outlast a lot of the data around.</p>
<p>In short, the development of scientific software which reflects new scientific insights is certainly not by default cheaper than measuring the data again.</p>
<p>About 1: there is another reason why data does not outlast code. Street map data of 100 years ago surely has some historical value, but I leave it to the reader how useful it is to current problems. Likewise, chemical data measured 50 years ago is certainly not as useful now as it was then: measurements we make now are so much more accurate and more precise. Infrared (IR) spectroscopy was the main identification method in organic chemistry until NMR came around. The latter practically made IR obsolete, though it often still is used to back up the presence of a certain chemical fragment. I&#8217;m sure this is even more the case in biological sciences.</p>
<p>Old scientific data is nice, but to use it, you just have to redo your experiment (it&#8217;s exponentially cheaper to do the measuring now than when it was originally done), gaining much more information. Who cares about the speed of light measured 100 years ago? That data simply did not last. Scientific data does not last (and I do not think telephone numbers of 5 years ago are useful either). The assumption that data is static and lasts is flawed. The assumption that measuring things is prohibitively expensive is flawed too. Well, if you don&#8217;t have code, it surely is.</p>
<p>Bottom line is that either can be less or more expensive to reproduce: the data and the code. At least in science.</p>
<p>Egon</p>
]]></content:encoded>
	</item>
</channel>
</rss>
