Grunk

I’ve just come across Grunk, a Java-based toolkit for scraping semi-structured text. The concept is a bit like having regular expressions that work on words rather than characters. The patterns are specified in an XML format, so conceivably a library of common ones could be built up. Output is also in XML, so coupled with some XSLT this could be a significant addition to the scraper’s toolset.
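
To make the "regular expressions over words" idea concrete, here is a minimal Java sketch. It is not Grunk's API or its XML pattern format, neither of which I have reproduced here; the class `WordPatternSketch` and the helpers `literal`, `number`, `anyWord` and `findFirst` are all hypothetical names invented for illustration. It assumes the text is split on whitespace and that a pattern is just a sequence of per-word tests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Illustrative only: a toy word-level matcher, not Grunk itself.
public class WordPatternSketch {

    // Hypothetical helper: matches one exact word, ignoring case.
    public static Predicate<String> literal(String word) {
        return token -> token.equalsIgnoreCase(word);
    }

    // Hypothetical helper: matches a token made up only of digits.
    public static Predicate<String> number() {
        return token -> !token.isEmpty() && token.chars().allMatch(Character::isDigit);
    }

    // Hypothetical helper: matches any single word.
    public static Predicate<String> anyWord() {
        return token -> true;
    }

    // Slides the pattern across the whitespace-separated tokens and returns
    // the first window in which every token passes the test at the same
    // position. Returns an empty list when no window matches.
    public static List<String> findFirst(List<Predicate<String>> pattern, String text) {
        List<String> tokens = Arrays.asList(text.trim().split("\\s+"));
        for (int start = 0; start + pattern.size() <= tokens.size(); start++) {
            boolean matched = true;
            for (int i = 0; i < pattern.size(); i++) {
                if (!pattern.get(i).test(tokens.get(start + i))) {
                    matched = false;
                    break;
                }
            }
            if (matched) {
                return new ArrayList<>(tokens.subList(start, start + pattern.size()));
            }
        }
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        // Pattern: the word "price", then any word, then a number.
        List<Predicate<String>> pattern =
                Arrays.asList(literal("price"), anyWord(), number());
        // prints [Price, is, 42]
        System.out.println(findFirst(pattern, "The Price is 42 dollars today"));
    }
}
```

The point of working at this level is that the pattern never has to care about how many characters a word has or what separates the words, which is where character-level regular expressions tend to get fiddly on semi-structured text.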

About Ian Davis

British entrepreneur and CEO of Kasabi. Primary interests are open data, the semantic web and decentralization.
