Jan 21 2004

Grunk

Published by Ian Davis at 5:14 pm under Uncategorized

I’ve just come across Grunk, a Java-based toolkit for scraping semi-structured text. The concept is a bit like having regular expressions that work in terms of words rather than characters. The patterns are specified in an XML format, so conceivably a library of common ones could be built up. Output is also in XML, so coupled with some XSLT this could be a significant addition to the scraper’s toolset.
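To make the "regular expressions over words" idea a little more concrete, here is a minimal sketch in plain Java. It is not Grunk's API or its XML pattern format, neither of which I'm reproducing here; the `WordPattern` class and its `findFirst` method are names I've made up purely for illustration. The assumption is simply that a pattern is a sequence of tests, each applied to one whole whitespace-separated word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration only: NOT part of Grunk.
public class WordPattern {
    // Each step tests one whole word, not one character.
    private final List<Predicate<String>> steps;

    public WordPattern(List<Predicate<String>> steps) {
        this.steps = steps;
    }

    // Returns the first run of consecutive words that satisfies every step
    // in order, or an empty list if there is no such run.
    public List<String> findFirst(String text) {
        List<String> match = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return match;
        }
        String[] words = trimmed.split("\\s+");
        for (int start = 0; start + steps.size() <= words.length; start++) {
            boolean ok = true;
            for (int i = 0; i < steps.size(); i++) {
                if (!steps.get(i).test(words[start + i])) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                for (int i = 0; i < steps.size(); i++) {
                    match.add(words[start + i]);
                }
                return match;
            }
        }
        return match;
    }

    public static void main(String[] args) {
        // Pattern: the literal word "price:", then a number, then a
        // three-letter upper-case code.
        List<Predicate<String>> steps = List.of(
            w -> w.equalsIgnoreCase("price:"),
            w -> w.matches("\\d+(\\.\\d+)?"),
            w -> w.matches("[A-Z]{3}"));
        WordPattern price = new WordPattern(steps);
        // prints [price:, 19.99, USD]
        System.out.println(price.findFirst("Widget price: 19.99 USD in stock"));
    }
}
```

Each step still uses an ordinary character-level regex internally, but the matching loop advances a word at a time, which is the part that makes this style of pattern convenient for semi-structured text.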
