I had a great idea yesterday for spamming Google. Why would I want to spam Google? Well, I don't, but that doesn't stop me thinking about how it could be done. Google works on the citation principle: if many sites link to me then I must have a good site, so I should appear higher in the rankings. [Aside: Google's flaw is that it assumes every citation is a good one, which means that all those 'Microsoft Sucks' links end up boosting Microsoft's site in the rankings.]

My idea is to create a script that outputs a slew of random but sensible HTML with some fake content and links, perhaps using WordNet to include as many relevant keywords as possible. The links would be to subdomains of the domain the script is hosted on. It's possible to set up a wildcard host record in a DNS server so that news.thegreatgooglespam.com and football.thegreatgooglespam.com both point to the same server and the same script. The script can detect which subdomain the request is for and serve up appropriate content related to the subdomain name, which of course could even contain multiple keywords separated by dots. What we have now is essentially an unlimited number of subdomains, each with unique content, each citing the others as good sites.

I guess Google might catch on fairly quickly, but if the script and server configuration were distributed widely enough across enough domains then they'd have a fair amount of work on their hands separating the signal from the noise. The key to the script is making the random HTML generation different enough on different subdomains, otherwise it'll be easy to spot the fake sites. Pointless, I know, but it annoys me that the PageRank algorithm is so damn effective!
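
To make the citation principle and the 'Microsoft Sucks' aside a bit more concrete, here is a toy sketch of link-based ranking in Python. It is a simplified power iteration in the spirit of PageRank, not Google's actual implementation; the page names, the `pagerank` function, the iteration count and the 0.85 damping value (a commonly quoted default) are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy power-iteration rank over a dict of page -> list of pages it links to."""
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with rank spread evenly across all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank equally among the pages it cites.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank over everyone.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: the ranking sees only that a link exists,
# not whether the linking page is praising or complaining.
links = {
    "fan-site": ["microsoft"],
    "critic-site": ["microsoft"],  # a 'Microsoft Sucks' page still counts as a citation
    "microsoft": [],
    "obscure-site": [],
}

ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{page}: {score:.3f}")
```

In this sketch the cited page ends up ranked above the page nobody links to, and the critic's link counts exactly as much as the fan's, which is the flaw the aside is grumbling about.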