Failing is an expensive way to learn

Reading Josh Infiesto’s post got me thinking about failure:

Learning how to reflexively avoid stupidity is a key ingredient to attaining great heights with any skill. It’s amazing how many hours you can piss away trying add new and interesting techniques to your repertoire before you’ve really mopped up the basics.

I commented that perhaps a paraphrase of his “Stop doing stupid shit” principle was “avoid failure”, but avoiding failure doesn’t necessarily bring success.

The European startup scene is often contrasted with Silicon Valley in terms of tolerance of failure. On the west coast, investors are comfortable with previous failures, recognising that they are great opportunities for entrepreneurs to learn and are evidence of an appetite for risk.

But aren’t business failures hugely expensive lessons to take? Perhaps a typical entrepreneur has 2 or 3 failed ideas behind them before stumbling on even moderate success, let alone getting to superstar level. Maybe that’s 2-3 million dollars of burned investment, 20-30 employees’ lives turned over, a few thousand consumers left without a product. Maybe an angel investor lawsuit for good measure.

That’s an expensive way to learn those lessons, and you don’t know how many you’re going to need before you get successful.

The west coast startup mindset is to focus on finding the great idea and accept failures along the way.

Maybe a cheaper and more efficient way would be to prioritise avoiding failure, to give yourself the stability to iterate like crazy on ideas and find the one that’s going to have the impact you crave?

Posted in Opinion | 1 Comment

Kasabi Transit Data for New York City

I just published a new dataset on Kasabi: Transit data for Metropolitan Transportation Agency of New York City

The NYCTA operates the Staten Island Railway and the New York City Subway which covers Manhattan, The Bronx, Brooklyn, and Queens. The Kasabi dataset was converted from their publicly available transit data which they publish in GTFS format.

The dataset I published on Kasabi is pretty rich. It contains details of each of the 27 routes operated by the agency, including the official colours of the lines for use on a route map. Those routes encompass 493 stops, each of which is geolocated with latitude and longitude. The agency operates 19,000 services on those 27 routes, where each service is a combination of a schedule and the particular set of stops served, split into inbound and outbound portions. These services include a full outline of the route in GeoJSON format too, for plotting on maps. Overall there are over half a million data items describing which service calls at which stop on a particular day and time, including distinct arrival and departure times and whether the service picks up, drops off or both. Information about special transfers between stops is also included.

I created a set of sample SPARQL queries that should help you get started. I also created a new transit vocabulary for representing this kind of data, which is based on the GTFS format. I plan to publish more GTFS conversions in the coming weeks. Hopefully feedback on this current Kasabi dataset will improve the schema design and the future conversions.

Posted in Projects

Ordnance Survey Linksets on Kasabi

I just published 5 new datasets in Kasabi. These are quite simple datasets that provide links between the Ordnance Survey Administrative Geography and various other geographic datasets hosted in Kasabi.

To create them I first extracted all the shapefile data from the Ordnance Survey dataset. Because the OS publish shapefile data in their RDF, I could do this by simply running a SPARQL query on the Ordnance Survey dataset (here’s an example of one of the GML shapefiles). That streamed out about 800MB of JSON results, maxing out my ADSL bandwidth, much to the frustration of the rest of the family!

To make each of my datasets I sparqled the target dataset to extract all the geographic points then simply checked each OS shapefile to see if it contained any of the points. I used the os:within and os:contains properties defined by the OS to link the regions to the points they contain.
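
A minimal sketch of that kind of containment check, assuming each region is a simple polygon given as an array of [x, y] vertices. The ray-casting test, the input shapes, the direction of the two properties and the example.org URIs are all illustrative assumptions, not the code or data actually used for these linksets.

```javascript
// Ray-casting test: does the polygon contain the point?
// polygon is an array of [x, y] vertices (a hypothetical format, not the OS GML structure).
function polygonContains(polygon, point) {
  const [px, py] = point;
  let inside = false;
  for (let i = 0, j = polygon.length - 1; i < polygon.length; j = i++) {
    const [xi, yi] = polygon[i];
    const [xj, yj] = polygon[j];
    // The edge counts when it straddles the horizontal line through the point
    // and the crossing lies to the right of the point.
    const crosses = (yi > py) !== (yj > py) &&
      px < ((xj - xi) * (py - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

// Emit one os:contains and one os:within triple per region/point match.
// Assumes region os:contains point, and point os:within region.
function linkRegions(regions, points) {
  const triples = [];
  for (const region of regions) {
    for (const p of points) {
      if (polygonContains(region.polygon, p.coords)) {
        triples.push([region.uri, 'os:contains', p.uri]);
        triples.push([p.uri, 'os:within', region.uri]);
      }
    }
  }
  return triples;
}

// Made-up data: one square region and two points, only the first inside it.
const regions = [
  { uri: 'http://example.org/region/a', polygon: [[0, 0], [4, 0], [4, 4], [0, 4]] }
];
const points = [
  { uri: 'http://example.org/point/1', coords: [2, 2] },
  { uri: 'http://example.org/point/2', coords: [5, 1] }
];
const links = linkRegions(regions, points);
```

Checking every point against every region is quadratic, which is fine for a sketch; a spatial index or a bounding-box pre-filter would be the obvious next step for larger datasets.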

Here are the linksets I created:

What are they useful for? Well, you can get a list of Norfolk’s renewable energy generators or find out which constituency or county council to inform about the traffic incident that caused the closure of the A93 Spittal of Glenshee on 6th July 2011.

Finally, I licensed these under CC0 so there are no restrictions on how you can use them.

Posted in Projects

In Search of Ambiguity

This is inspired by Jeni’s recent blog post What do URIs mean anyway? where she writes:

The imperfection of the real world as it applies to linked data is that URIs will be used in ambiguous ways. We might not like it; we might write best practice documents that encourage people to have separate URIs for web-thing and non-web-thing, develop tools that help people detect when they’ve used the wrong URI, and so on. But it will still happen, and in my opinion we need to work out how to cope.

I think there is less ambiguity than Jeni states.

A lot of the perception of ambiguity in these arguments comes from in-built preconceptions about the nature of documents on the web. It’s easy to forget that when you think you’re accessing a webpage you’re not really getting the actual document that is on the web server but just a kind of snapshot of it at a point in time. In HTTP we call those snapshots “representations”. The important point is that the URI always identifies the resource and never the representation. You use the representations to learn about the resource you are interacting with.

Posted in Opinion | 8 Comments

Big Ben

Ambiguous references happen in the real world all the time. For example, the name Big Ben is used to refer to the clock tower and to the bell, but strictly speaking only one of those is right.

Most of the time these misplaced names don’t matter – listeners make assumptions about the intended referent and correct those assumptions as more information comes to light.

A lot of the time the possible referents share a lot of characteristics and can be used interchangeably. When giving directions you can say “head towards Big Ben” without fear of confusing people. You could even get away with “listen to the sound of Big Ben” because sound is perceived to come from a general location. The information a speaker produces about Big Ben can be inconsistent for quite a long time before it becomes a problem.

But what happens when the speaker says something that crystallizes that inconsistency for the listener?

Suppose the speaker believes that Big Ben refers to the bell and the listener assumes it refers to the tower. The speaker suddenly says:

“Big Ben was cast in 1856 in Stockton-on-Tees”

The listener realizes now that they have been talking about different things.

“Oh, I thought you were talking about the tower”, the listener might exclaim.

“No, you ignoramus, the tower is called Victoria Tower. Big Ben is the bell inside it! How can a tower possibly be cast in a northern town?” retorts the speaker.

“Well, you learn something new every day,” says the listener ruefully.

Now the listener backtracks and applies all the information they had learned from the speaker about Big Ben to the bell instead of the tower.

However, throughout this conversation neither the speaker nor the listener assumed the name Big Ben applied to two things at once. They both assumed it applied to a single referent, but they happened to be different ones. Once more information was shared then they managed to reach agreement on the actual referent for Big Ben.

Posted in Opinion

Usability of JSON Serializations of RDF

Thomas Steiner produced a wiki page showing how various RDF + JSON proposals serialized a particular RDF graph as part of the RDF working group’s work in this area. I found it particularly interesting because I’m the main originator of one of the proposals (RDF/JSON) and I looked at all the issues the WG is considering when writing it.

One of the things I think would be useful is to include examples of how a JavaScript developer would use each serialization for various tasks. Here’s what some could look like in RDF/JSON:

Get a photo of Jon:

// data[subject][predicate] is an array of value objects; take the first one.
data['http://jondoe.example.org/#me']['http://xmlns.com/foaf/0.1/depiction'][0]['value'];

Get the names of Jon’s friends:

// Each foaf:knows value is a friend's URI, which is also a subject key in data.
var knows = data['http://jondoe.example.org/#me']['http://xmlns.com/foaf/0.1/knows'];
for (var i = 0; i < knows.length; i++) {
  alert(data[knows[i]['value']]['http://xmlns.com/foaf/0.1/name'][0]['value']);
}

Get the French description of Jon:

// Literal values carry an optional 'lang' key; pick the ones tagged 'fr'.
var descriptions = data['http://jondoe.example.org/#me']['http://xmlns.com/foaf/0.1/description'];
for (var i = 0; i < descriptions.length; i++) {
  if (descriptions[i]['lang'] === 'fr') {
    alert(descriptions[i]['value']);
  }
}
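
The snippets above assume that data holds a parsed RDF/JSON document. To try them out, here is a small hypothetical one, plus a helper that tolerates missing subjects and properties. The friend’s URI, the photo URL and all the literal values are made up for illustration and are not taken from Thomas’s graph; values() is just a convenience function, not part of any proposal.

```javascript
const FOAF = 'http://xmlns.com/foaf/0.1/';
const JON = 'http://jondoe.example.org/#me';
const JANE = 'http://janedoe.example.org/#me'; // hypothetical friend

// RDF/JSON shape: subject URI -> predicate URI -> array of value objects.
const data = {
  [JON]: {
    [FOAF + 'depiction']: [{ type: 'uri', value: 'http://jondoe.example.org/me.jpg' }],
    [FOAF + 'knows']: [{ type: 'uri', value: JANE }],
    [FOAF + 'description']: [
      { type: 'literal', value: 'Just a regular guy', lang: 'en' },
      { type: 'literal', value: 'Un type normal', lang: 'fr' }
    ]
  },
  [JANE]: {
    [FOAF + 'name']: [{ type: 'literal', value: 'Jane Doe' }]
  }
};

// Return every value of a property, or [] when the subject or property is absent.
function values(graph, subject, predicate) {
  const props = graph[subject] || {};
  return (props[predicate] || []).map(o => o.value);
}

// The same three lookups as above, without the risk of indexing into undefined.
const photo = values(data, JON, FOAF + 'depiction')[0];
const friendNames = values(data, JON, FOAF + 'knows')
  .flatMap(friend => values(data, friend, FOAF + 'name'));
const frenchDescription = data[JON][FOAF + 'description']
  .filter(d => d.lang === 'fr')
  .map(d => d.value)[0];
```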

I also noticed that there are missing commas between the resource blocks {} in the RDF/JSON example. That JSON isn’t quite valid because of that but it doesn’t get in the way of what Thomas is trying to show.

BTW, I am writing this as a blog post because there isn’t a good way to supply feedback to a W3C working group that is still in the exploratory phase. I would either have to join the group or post to the comments list which seems inappropriate for a side discussion like this especially since this is not an official WG document.

Posted in Ideas and Experiments

Is Idiomatic JSON for RDF Desirable?

The RDF Working Group seems to be making some useful progress in many areas. However, they are circling around the JSON serialisation a bit. Lee Feigenbaum asked on Twitter:

#RDF WG #JSON task force — should the group focus on RDF serializations in JSON, or bridging the worlds of (normal) JSON and RDF?

Here’s what I wrote in email when David Wood asked my opinion on it a few weeks back:

I wouldn’t underestimate the trivial use case that JSON is a convenient data format for parsing and most languages have extremely fast JSON parsers. It’s certainly much simpler to parse than XML (only one character encoding). It’s also extremely compact, with a low syntax to content ratio (unlike XML again). This is the use case the Talis RDF/JSON serialisation is targeted at.

The main problem I see with the “idiomatic JSON” use case is that although it’s much more usable by the average web author, it’s always going to butt up against various mismatches in model: graphs vs trees, URIs vs shortnames, literals/languages/datatypes vs strings, repeated properties vs simple values, blank nodes, lists/collections vs arrays/dictionaries.

The blunt truth is all of those things make RDF an unfriendly model to web authors and I think it will be very hard, or impossible, to develop an idiomatic JSON serialisation that web authors will care about.
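
To make a few of those mismatches concrete, here is a rough sketch of what happens when one RDF/JSON subject is naively flattened into “idiomatic” JSON. The shortName rule, the example.org vocabulary and all the values are hypothetical, chosen only to provoke the problems; this is not any proposal the working group is considering.

```javascript
// Shorten a URI to the text after its last '#' or '/' (a naive, hypothetical rule).
function shortName(uri) {
  return uri.slice(Math.max(uri.lastIndexOf('#'), uri.lastIndexOf('/')) + 1);
}

// Flatten one RDF/JSON subject (predicate URI -> array of value objects)
// into a plain object keyed by short names.
function toIdiomatic(properties) {
  const out = {};
  for (const [predicate, objects] of Object.entries(properties)) {
    const vals = objects.map(o => o.value); // type, lang and datatype are dropped here
    // One value becomes a scalar, several become an array,
    // so the shape of a field depends on the data.
    out[shortName(predicate)] = vals.length === 1 ? vals[0] : vals;
  }
  return out;
}

const jon = {
  'http://xmlns.com/foaf/0.1/name': [{ type: 'literal', value: 'Jon Doe' }],
  'http://purl.org/dc/elements/1.1/description': [
    { type: 'literal', value: 'Just a regular guy', lang: 'en' },
    { type: 'literal', value: 'Un type normal', lang: 'fr' }
  ],
  // Hypothetical second vocabulary whose property also shortens to "name".
  'http://example.org/vocab#name': [{ type: 'literal', value: 'jdoe' }]
};

const flat = toIdiomatic(jon);
```

Here the two name properties collide and only the last survives, the language tags on the descriptions are gone, and a consumer has to check whether each field is a string or an array.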

I also tend to agree with Leigh Dodds that what we really want is a standardised JavaScript API for RDF.

Note: The Talis RDF/JSON serialisation can now be found at http://docs.api.talis.com/platform-api/output-types/rdf-json. Redirect should be in place soon.

Posted in Opinion | 2 Comments