Nov 10 2005

Planetary Collateral Damage

Published by Ian Davis under Uncategorized

Phil Ringnalda wrote recently about the damage planet-style aggregators could do to the blogosphere. I’ve thought of another problem. If you’re being aggregated onto a planet site, none of the other people on that planet will link to your posts. Why should they? After all, a link to a post that is already on the planet will just look like noise when it hits the aggregator. This could have serious repercussions for your Google PageRank.

This issue doesn’t affect me too much since, as far as I know, there are no planets aggregating the whole of my blog. Planet RDF just takes my RDF category feed, and I don’t aggregate myself on the two planets I run (Agile Planet and Planet Web 2.0) because they’re designed to be my reading lists, not a publishing mechanism. (Actually I do partly aggregate myself on both: Agile Planet takes a category of posts from this blog, but they’re just status messages, and I write on the Silkworm group blog, which is aggregated by Planet Web 2.0.)

3 responses so far

Sep 27 2004

WordPress Hack for Slim Pages

Published by Ian Davis under Uncategorized

Here’s the PHP file I use to generate my slim page. It’s called wp-slim.php and lives in the same directory as index.php.


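<?php
// Bootstrap WordPress only when $feed is not already set,
// e.g. when this file is requested directly.
if (!isset($feed)) {
    $blog = 1;
    $doing_rss = 1;
    require('wp-blog-header.php');
}
$more = 1; // show the full post content rather than the teaser
$charset = get_settings('blog_charset');
if (!$charset) $charset = 'UTF-8';
// Serve as text/html so ordinary browsers render the page, and
// state the same charset that the XML declaration below uses.
header('Content-type: text/html; charset=' . $charset, true);

?>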
<?php echo '<?xml version="1.0" encoding="' . $charset . '"?'.'>'; ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <base href="<?php bloginfo_rss('url') ?>"/>
    <title><?php bloginfo_rss('name') ?></title>
    <meta name="author" content="Ian Davis" />
    <meta name="copyright" content="Copyright (c) 1999-<?php _e(gmdate("Y")) ?> Ian Davis" />
    <meta name="description" content="<?php bloginfo_rss("description") ?>" />
  </head>
  <body>
    <?php $items_count = 0; if ($posts) { foreach ($posts as $post) { start_wp(); ?>
    <div class="entry" id="entry<?php _e($post->ID) ?>">
      <h1><a href="<?php permalink_single_rss() ?>"><?php the_title_rss() ?></a></h1>
<?php if (get_settings('rss_use_excerpt')) : ?>
      <div class="content"><?php the_excerpt_rss(get_settings('rss_excerpt_length'), 2) ?></div>
<?php else : ?>
      <div class="content"><?php the_content('', 0, '') ?></div>
<?php endif; ?>

    </div>
    <?php $items_count++; if (($items_count == get_settings('posts_per_rss')) && empty($m)) { break; } } } ?>
  </body>
</html>

I also changed wp-feed.php to dispatch requests for slim pages (the case 'slim' branch is the addition):

    case 'rss2':
        require('wp-rss2.php');
        break;
    case 'slim':
        require('wp-slim.php');
        break;
    }
}

I changed the rewrite rule in my .htaccess to map index.slim to wp-feed.php:

RewriteRule ^index\.(feed|rdf|rss|rss2|atom|slim)$ /2004/09/wordpress/wp-feed.php?feed=$1 [QSA]

Note: I’m not using the standard WordPress rewrite rule set. For backwards and future compatibility with other weblog systems I prefer to use file extensions for the various formats of a document.
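
A note on the pattern itself: in a rewrite-rule regular expression an unescaped dot matches any single character, so a pattern beginning ^index. would also accept a path such as indexXslim, while ^index\. accepts only a literal dot. The sketch below illustrates the difference. It uses Python's re module as a stand-in for Apache's regex engine (an assumption that holds for patterns this simple), and the names FORMATS, loose, strict and feed_for are made up for the illustration.

```python
import re

# Python's re module stands in for Apache's regex engine here; for
# patterns this simple the two behave the same way.
FORMATS = "feed|rdf|rss|rss2|atom|slim"
loose = re.compile(rf"^index.({FORMATS})$")    # '.' matches any character
strict = re.compile(rf"^index\.({FORMATS})$")  # '\.' matches a literal dot


def feed_for(path, pattern=strict):
    """Return the format captured from the path, or None if it does not match."""
    match = pattern.match(path)
    return match.group(1) if match else None
```

The captured group plays the role of $1 in the rule, which is what ends up in the feed query parameter.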

One response so far

Sep 27 2004

Slim Pages

Published by Ian Davis under Uncategorized

I’m experimenting with a slim page version of this site. Slim Pages are slimmed-down versions of standard web pages. The basic rules are:

  1. Slim Pages are a subset of XHTML Strict. They cannot contain script or noscript elements, style attributes, or any of the event attributes such as onclick.
  2. The body tag can only contain div tags, each with a class of “entry” and an id attribute providing a site-unique identifier for the entry the div contains.
  3. Each entry div contains an h1 heading as its first tag. The heading contains a link to the permalink for the entry. The link text is the title of the entry.
  4. The heading is followed by another div with a class of “content”, which contains the content of the entry.

That’s it. Here’s an example:


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
 <head>
  <base href="http://internetalchemy.org/" />
  <title>Internet Alchemy</title>
  <meta name="author" content="Ian Davis" />
  <meta name="copyright" content="Copyright (c) 1999-2004 Ian Davis" />
  <meta name="description" content="Digital explorations and experiments" />
 </head>
 <body>
  <div class="entry" id="entry810">
   <h1>
    <a href="http://internetalchemy.org/2004/09/more-comment-spam">More Comment Spam</a>
   </h1>
   <div class="content">
    <p>
     I'm still getting comment spam, despite the posting timeslot
     idea. Obviously the assumptions I made there were unsound.
     So, here's the supplement that might raise the bar a little. If
     you want to comment on an entry here, you now have to
     enter a particular word of a well known quotation. If you
     don't or enter the wrong word then you get locked out for 10
     seconds (a standard WordPress feature).
    </p>
   </div>
  </div>
 </body>
</html>
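
The four rules are simple enough to check mechanically. Here is a rough sketch of such a check, written in Python with the standard library's ElementTree parser. It is only an illustration under a few assumptions: the page is well-formed XML, it uses no named entities beyond the XML built-ins, and ids are checked for uniqueness within one page only (site-wide uniqueness would need knowledge of the other pages). The function name check_slim_page is invented here.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"


def check_slim_page(text):
    """Return a list of rule violations found in a Slim Page (empty if none)."""
    problems = []
    root = ET.fromstring(text)

    # Rule 1: no script/noscript elements, style attributes or event attributes.
    for el in root.iter():
        if el.tag in (XHTML + "script", XHTML + "noscript"):
            problems.append("forbidden element: " + el.tag[len(XHTML):])
        for name in el.attrib:
            if name == "style" or name.startswith("on"):
                problems.append("forbidden attribute: " + name)

    body = root.find(XHTML + "body")
    if body is None:
        return problems + ["missing body"]

    seen_ids = set()
    for child in body:
        # Rule 2: every child of body is a div of class "entry" with its own id.
        if child.tag != XHTML + "div" or child.get("class") != "entry":
            problems.append("body child is not an entry div")
            continue
        entry_id = child.get("id")
        if not entry_id or entry_id in seen_ids:
            problems.append("entry id missing or repeated")
        seen_ids.add(entry_id)

        parts = list(child)
        # Rule 3: the first tag in the entry is an h1 holding the permalink.
        if not parts or parts[0].tag != XHTML + "h1" or parts[0].find(XHTML + "a") is None:
            problems.append("entry %s: no h1 permalink heading" % entry_id)
        # Rule 4: the heading is followed by a div of class "content".
        if len(parts) < 2 or parts[1].tag != XHTML + "div" or parts[1].get("class") != "content":
            problems.append("entry %s: no content div after heading" % entry_id)
    return problems
```

An empty list means the page passed every check the sketch knows about; it does not validate the page against the XHTML Strict DTD.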

Because it’s XHTML it has all kinds of nice properties such as being viewable on smartphones and PDAs. It prints nicely if all you want is the content and it can be styled using CSS. It’s readable by ordinary people with a web browser. The tags it uses are pretty well known by every web developer so it’s quite easy to write, perhaps even using an off-the-shelf authoring program. All the meta and link conventions in HTML headers such as geo location work too. It can be transformed using XSLT into any flavour of RSS or Atom, although there are fewer programs that understand those formats than understand Slim Pages.

It has a regular entry structure, which means you could aggregate it, and because the id attributes are site-unique, the aggregator can work out when something new is posted.
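
To make the aggregation idea concrete, here is a minimal sketch in Python of how an aggregator might spot new entries by id. It assumes the page is well-formed XML that follows the rules above. The function names read_entries and new_entries, and the idea of keeping a plain set of ids already seen, are illustrative choices rather than part of any existing aggregator.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"


def read_entries(page_text):
    """Map each entry id to its (title, permalink) pair."""
    entries = {}
    root = ET.fromstring(page_text)
    for div in root.iter(XHTML + "div"):
        if div.get("class") != "entry":
            continue
        link = div.find(XHTML + "h1/" + XHTML + "a")
        if link is not None:
            title = "".join(link.itertext()).strip()
            entries[div.get("id")] = (title, link.get("href"))
    return entries


def new_entries(page_text, seen_ids):
    """Return entries whose ids are not in seen_ids, then remember them."""
    fresh = {eid: entry for eid, entry in read_entries(page_text).items()
             if eid not in seen_ids}
    seen_ids.update(fresh)
    return fresh
```

Calling new_entries a second time with the same page and the same set returns an empty dictionary, because every id has already been recorded.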

Slim Pages are just the content without the candy. Some people like candy, some don’t. Now you can choose :)

3 responses so far

Mar 16 2004

The Nucleus of Atom

Published by Ian Davis under Uncategorized

I’ve carefully stayed away from the Atom discussions for several reasons. Most of these concern the inaccessibility of the original discussions, which required permanent connectivity to participate. I’m also quite happy with RSS, which meets my needs right now. I’m planning to support Atom in myRSS when it’s more stable. Anyway, tonight I thought I’d take a look at the current syntax specification and was taken aback by the obvious overlap with Dublin Core.

Nearly every element in Atom is already contained in one of the DC specs. I took the time to compare the Atom elements with their DC equivalents and found something quite interesting: when you remove the overlap with Dublin Core what’s left is pure syndication.

The only elements that don’t have obvious counterparts in DC are those that deal with the syndication aspect of Atom, and to be honest there aren’t many of them: atom:feed, atom:info, atom:entry, atom:link, atom:content. So my question is this: why isn’t Atom defining a Syndication Element Set as a complement to the Dublin Core Element Set? Why duplicate all that effort when the Dublin Core people have been over the issues again and again for nine years? There are many people that I know are intimately familiar with Dublin Core who are involved in the syntax effort, so why aren’t these elements being used? Is it NIH or something else?

Here are the Dublin Core/Atom correspondences:

atom:title – conveys a human-readable title for the entry
dc:title – A name given to the resource.

atom:author – construct that indicates the default author of the feed
dc:creator – An entity primarily responsible for making the content of the resource.

atom:contributor – construct that indicates a person or other entity who contributes to the feed.
dc:contributor – An entity responsible for making contributions to the content of the resource.

atom:tagline – conveys a human-readable description or tagline for the feed
dc:description – An account of the content of the resource.

atom:id – conveys a permanent, globally unique identifier for the feed.
dc:identifier – An unambiguous reference to the resource within a given context.

atom:generator – identifies the software agent used to generate the feed
dc:publisher – An entity responsible for making the resource available

atom:copyright – conveys a human-readable copyright statement for the feed.
dc:rights – Information about rights held in and over the resource

atom:modified – indicates the time when the state of the feed was last modified
dcterms:modified – Date on which the resource was changed.

atom:link – The “atom:link” element is a Link construct that conveys a URI associated with the entry.
dc:relation – A reference to a related resource.

atom:issued – construct that indicates the time that the entry was issued.
dcterms:issued – Date of formal issuance (e.g., publication) of the resource.

atom:created – construct that indicates the time that the entry was created
dcterms:created – Date of creation of the resource.

atom:summary – construct that conveys a short summary, abstract or excerpt of the entry.
dcterms:abstract – A summary of the content of the resource.
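
The same correspondences can be written down as a lookup table, which makes it easy to separate the elements Dublin Core already covers from the purely syndication ones. The Python sketch below is only an illustration: the names ATOM_TO_DC and split_elements are invented here, and atom:link is treated as covered because it appears in the list above, even though it is also named earlier among the syndication elements.

```python
# Atom element -> Dublin Core term, following the correspondences listed above.
ATOM_TO_DC = {
    "atom:title": "dc:title",
    "atom:author": "dc:creator",
    "atom:contributor": "dc:contributor",
    "atom:tagline": "dc:description",
    "atom:id": "dc:identifier",
    "atom:generator": "dc:publisher",
    "atom:copyright": "dc:rights",
    "atom:modified": "dcterms:modified",
    "atom:link": "dc:relation",
    "atom:issued": "dcterms:issued",
    "atom:created": "dcterms:created",
    "atom:summary": "dcterms:abstract",
}


def split_elements(names):
    """Split Atom element names into DC-covered and syndication-only groups."""
    covered = {name: ATOM_TO_DC[name] for name in names if name in ATOM_TO_DC}
    syndication_only = [name for name in names if name not in ATOM_TO_DC]
    return covered, syndication_only
```

Feeding it the full list of Atom elements leaves atom:feed, atom:info, atom:entry and atom:content in the second group.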

One response so far

Nov 02 2002

RSS 0.91 Test Cases For LiSA

Published by Ian Davis under Uncategorized

As part of my development of a Perl-based LiSA parser I have devised a suite of test cases for the RSS 0.91 parser. Each test case presents the input XML and the expected sequence of LiSA events.

Continue Reading »


Oct 18 2002

LiSA LightWeight Syndication API

Published by Ian Davis under Uncategorized

LiSA is an attempt to abstract away the details of the various syndication formats, such as RSS, that now proliferate on the web. Its premise is that there is a core model used by all the formats, namely a channel that contains a number of items, each with a title, link and description.
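
As an illustration of that core model only, and not of LiSA's actual API, the channel and its items could be sketched as two small Python data classes, with a reader for RSS 0.91 to fill them in. The class names Channel and Item and the function read_rss091 are hypothetical, and the sketch assumes an un-namespaced RSS 0.91 document with a single channel element.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class Item:
    title: str
    link: str
    description: str


@dataclass
class Channel:
    title: str
    link: str
    description: str
    items: list = field(default_factory=list)


def read_rss091(text):
    """Build a Channel from an RSS 0.91 document (elements are un-namespaced)."""
    channel_el = ET.fromstring(text).find("channel")
    if channel_el is None:
        raise ValueError("no channel element found")

    def core(el):
        # findtext returns the default when the child element is absent
        return [el.findtext(name, default="").strip()
                for name in ("title", "link", "description")]

    channel = Channel(*core(channel_el))
    channel.items = [Item(*core(item)) for item in channel_el.findall("item")]
    return channel
```

Other formats would need their own readers, but each would produce the same Channel and Item shapes, which is the point of having a common model.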

Continue Reading »

One response so far

Jun 18 2002

AmphetaDesk and the Adventures of Morbus Iff

Published by Ian Davis under Uncategorized

AmphetaDesk and the Adventures of Morbus Iff.
Morbus Iff shares his thoughts on the development of AmphetaDesk and
the current state of syndicated news:

For developers, this creates a wonderful perplexity. Developers can
support RSS 1.0 and know that the format won’t change – the primary
means of development are in extended modules (which typically fill a
small niche and are not widely used). Supporting 1.0, however, may be
more difficult than supporting the easier v0.9x versions. The downside
of supporting v0.9x is that you can’t be sure a new version won’t come
out while you sleep the weekend away.


May 14 2002

Open Content Network

Published by Ian Davis under Uncategorized

Open Content Network.
Sounds interesting – peer to peer content distribution. [via HTP]

The Open Content Network [...] aims to be the world’s largest content
delivery network (CDN).

Users will soon be able to download open source and public domain
software, movies, and music at incredibly fast speeds from this
global, distributed network.

Using a new Peer-to-Peer technology, called the “Content-Addressable
Web”, individuals will be able to contribute to the open source
movement by donating their spare bandwidth and disk space to the
network.


May 04 2001

Making Revenue from Content Syndication

Published by Ian Davis under Uncategorized

Is Content Syndication A Viable Revenue Stream?

The other benefit of syndication, many analysts believe, is increased traffic.
But Mardle cautions against using such metrics. “We have seen the idiocy of
“quickly building traffic” as a business objective. That is what killed most
of the dotcoms. Traffic doesn’t mean squat, it is not the same as audience.
Got to get that across,” he says.


May 04 2001

Figby

Published by Ian Davis under Uncategorized

Amusingly, I found the above article while visiting Figby, a news aggregation site. The article appeared in the right-hand column three times, from three different sources, which just underlines the importance of the following excerpt:

“The syndication method is an interesting one because it presumes
that there is no readership overlap to devalue the information,”
Mardle adds. “However, the Internet is an excellent multi-channel
delivery system and I will quickly discover that the information has
massive overlap because it is syndicated, which will naturally devalue
all the places on which the information is repeated.” If a consumer
is checking six or ten sites a day, he suggests, they would expect a
wider range of information or they would be unlikely to return.

