James picked up on my corporate FOAF idea and suggested that:
Looking at patterns in press releases (a major source of this kind of information), there is often a trailer with the company's URL...
...A rule for disambiguation could be that the companies in the press release are identified by any URL that is a top-level domain or subdomain root.
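To make James's rule concrete, here is a rough Python sketch of how I read it: pull every URL out of the press release text and keep only those that point at the root of a domain or subdomain, with no path, query or fragment. The function name `company_urls` and the URL regular expression are my own assumptions for illustration, not anything James specified.

```python
import re
from urllib.parse import urlparse

# Deliberately loose URL matcher (an assumption, good enough for a sketch).
URL_PATTERN = re.compile(r"https?://[^\s<>\"')]+")


def company_urls(text):
    """Return the root URLs (domain or subdomain roots) found in text."""
    roots = []
    for match in URL_PATTERN.findall(text):
        # Drop sentence punctuation that often trails a URL in prose.
        url = match.rstrip(".,;:")
        parsed = urlparse(url)
        is_root = (
            parsed.hostname is not None
            and parsed.path in ("", "/")
            and not parsed.query
            and not parsed.fragment
        )
        if is_root:
            root = f"{parsed.scheme}://{parsed.hostname.lower()}/"
            if root not in roots:
                roots.append(root)
    return roots
```

Under this reading, a trailer link such as `http://www.acme.example/` would be kept, while a deep link to the release itself would be ignored.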
It's a good idea, although not foolproof. It made me think of a related area that James and I used to debate endlessly at Calaba. James was very keen to devise an XML format to describe companies and businesses, including contact information and opening hours. Search engines could then use this information to answer questions like "find me a nearby dentist open on Sundays". Current yellow pages directories can answer the first part of that question but not the second, probably because gathering opening hours and keeping them accurate is too expensive to scale.
However, if the businesses themselves could maintain the data, say by editing some XML on their website, then the scalability problems would be solved. My position, being a scraper at heart, was that persuading companies to do it would be too difficult and we should try to generate it instead.
However, I'm coming to the opinion that the syndic8 evangelism model could be the way to go. That means informing and educating businesses about the hypothetical format, providing validation, search facilities and help guides.
What would such a format look like? It would make sense to make it RDF-aware in some way, but not to the point that the RDF impedes its uptake. FOAF is far and away the best example of this.
As a straw man, what about this?
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://purl.org/vocab/busy">
  <Business>
    <name>Glenwood Cafe</name>
    <phone rdf:resource="tel:+44-7966-473-239"/>
    <location>
      <StreetAddress>
        <number>1</number>
        <street>Glenwood Avenue</street>
        <town>Southend on sea</town>
        <county>Essex</county>
        <country>United Kingdom</country>
        <postcode>SS0 91S</postcode>
      </StreetAddress>
    </location>
    <businessType>Restaurant</businessType>
    <hoursOfBusiness>
      <Schedule>
        <open>9:00</open>
        <close>17:00</close>
      </Schedule>
    </hoursOfBusiness>
  </Business>
</rdf:RDF>
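To get a feel for how a search engine might consume the straw man, here is a minimal Python sketch that reads it as plain namespaced XML (not as full RDF) and lists businesses of a given type that are open at a given time. The function names `parse_clock` and `open_businesses` are hypothetical, and `SAMPLE` is a trimmed copy of the example above. Note that the straw man has no day-of-week field yet, so this can only answer "open at 12:00", not "open on Sundays".

```python
import xml.etree.ElementTree as ET
from datetime import time

# Namespace prefixes for lookups; "busy" is the straw man's default namespace.
NS = {
    "busy": "http://purl.org/vocab/busy",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

# Trimmed copy of the straw-man document (address and phone omitted).
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://purl.org/vocab/busy">
  <Business>
    <name>Glenwood Cafe</name>
    <businessType>Restaurant</businessType>
    <hoursOfBusiness>
      <Schedule>
        <open>9:00</open>
        <close>17:00</close>
      </Schedule>
    </hoursOfBusiness>
  </Business>
</rdf:RDF>"""


def parse_clock(text):
    """Turn an 'H:MM' or 'HH:MM' string into a datetime.time."""
    hours, minutes = text.strip().split(":")
    return time(int(hours), int(minutes))


def open_businesses(xml_text, business_type, at):
    """Names of businesses of business_type whose schedule covers time `at`."""
    root = ET.fromstring(xml_text)
    names = []
    for business in root.findall("busy:Business", NS):
        kind = business.findtext("busy:businessType", default="", namespaces=NS)
        if kind != business_type:
            continue
        for schedule in business.findall("busy:hoursOfBusiness/busy:Schedule", NS):
            opens = parse_clock(schedule.findtext("busy:open", namespaces=NS))
            closes = parse_clock(schedule.findtext("busy:close", namespaces=NS))
            # Open from the opening time up to, but not including, closing time.
            if opens <= at < closes:
                names.append(business.findtext("busy:name", namespaces=NS))
                break
    return names
```

Calling `open_businesses(SAMPLE, "Restaurant", time(12, 0))` should pick out the Glenwood Cafe. A real consumer would probably want a proper RDF parser, but the point is that the format stays usable with ordinary XML tools.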