Link schemes
Formerly, Google's PageRank ratings could be gamed using links placed in comment spam. "Between about 2004 and 2007, people used to be able to leave comments on blogs on other sites with links to their own sites, to build up links quickly," explains Wall.
In response, starting in 2005, Google promoted the "nofollow" attribute for link coding. Nofollow links are excluded when calculating PageRank ratings. Nofollow was subsequently adopted by blogging systems like WordPress and Movable Type for links inserted within comments, cutting off that source of links, Wall says.
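As a rough illustration of how a crawler could treat such links, the sketch below separates ordinary links from those marked rel="nofollow" using Python's standard html.parser module. The class name, function name and sample markup are hypothetical, and nothing here reflects how Google's own crawler is implemented.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect link targets, separating those marked rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel may hold several space-separated tokens, e.g. "nofollow noopener"
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


def split_links(html):
    """Return (followed, nofollow) lists of hrefs found in the markup."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.followed, parser.nofollow


# Hypothetical blog page: one comment-spam link, one editorial link.
sample = """
<p>Great post! <a href="http://example.com/my-shop" rel="nofollow">my shop</a></p>
<p>See also <a href="http://example.org/reference">this reference</a>.</p>
"""
followed, nofollow = split_links(sample)
print("counted:", followed)
print("ignored:", nofollow)
```

Only the links in the first list would be candidates for passing along any ranking credit; the comment link is set aside.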
Webmasters also used to trade links to mutually boost their PageRank ratings, but now they are supposed to use the nofollow attribute in those links, Wall adds. Consequently, such links are no longer of any benefit. (Advertisements on a page don't get counted as links, as they are typically linked to a click counter.)
Because they can no longer use linked comments and traded links, webmasters seeking to maximize their PageRank rating have two options: offer compelling content that other sites (especially blogs) will spontaneously link to -- or pay other sites to link, as J.C. Penney did. The latter is a direct violation of Google's guidelines.
When done by amateurs, paid links are easy to spot, says Fishkin. "When you see a dentist's site with links to student credit card loan sites, you can assume the guy is getting a couple of hundred bucks a month," he says.
To avoid detection, businesses called link farms arrange paid links from established sites with respectable public PageRank ratings, since those links carry more weight.
The most successful link farms don't operate openly. "You just have to know about them," notes Fox. "But often they have been discovered by Google. They continue operating, but their links are not valued by the search engine."
Several reputed link farms were approached for comment but none responded -- except one, whose spokesman said, off the record, that it was an advertising firm and not a link farm, and then hung up. Another announced in a blog that it was getting out of the link business.
Another problem with the link farm business is that it is based on the public PageRank ratings of the linking sites. But Google's Ohye says that public PageRank ratings are actually manipulated by Google to make it harder to game the system. While Google updates the public PageRank database every few months, the public ratings of pages known to be selling links are never updated, rendering their PageRank rating meaningless, she explains.
Anyway, Ohye says, Google's algorithm does not even use the public PageRank ratings to decide how to rank search results. Instead, it uses a completely different, nonpublic database, whose values (fractional numbers rather than a zero to 10 scale) are updated continuously.
As for how this works in practice, Ohye uses the hypothetical example of a news site with a breaking story whose owners find that a lot of other sites are linking to the page with that story.
"They might decide to try to sell links from that page, since it has PageRank," she says, "so we may decide that the rest of the site is good, but we are unsure about the links from that particular page. So we adjust the database so that links from that article don't propagate PageRank. So many link buyers are getting nothing."
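The effect Ohye describes can be sketched with the textbook power-iteration form of PageRank, which works with fractional scores. This is not Google's nonpublic system: the function name, the suppressed parameter, the damping value and the four-page graph are all assumptions made for illustration. Here a suppressed page keeps whatever score it earns from inbound links, but its outbound links pass nothing along.

```python
def pagerank(links, suppressed=frozenset(), damping=0.85, iterations=50):
    """Textbook power-iteration PageRank (illustrative only).

    links: dict mapping each page to the list of pages it links to.
    suppressed: pages whose outbound links are ignored (an assumption
    standing in for "links from that article don't propagate PageRank").
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Outbound links actually used; suppressed pages contribute none.
    out = {
        p: [] if p in suppressed else [t for t in links.get(p, []) if t != p]
        for p in pages
    }

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            if out[p]:
                share = damping * rank[p] / len(out[p])
                for t in out[p]:
                    new[t] += share
            else:
                # No usable outbound links: spread the score evenly so the
                # total stays at 1 (the usual handling of dangling pages).
                share = damping * rank[p] / n
                for t in pages:
                    new[t] += share
        rank = new
    return rank


# Hypothetical graph: two blogs link to a news story, which sells a link.
links = {
    "blogA": ["news"],
    "blogB": ["news"],
    "news": ["buyer"],
    "buyer": [],
}
normal = pagerank(links)
adjusted = pagerank(links, suppressed={"news"})
print("buyer, link counted:   ", round(normal["buyer"], 4))
print("buyer, link suppressed:", round(adjusted["buyer"], 4))
```

With the link suppressed, the buyer ends up with the same score as a page that has no inbound links at all, which is the sense in which link buyers "are getting nothing."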
Content schemes
The first search engines rated pages purely on their content, spawning keyword-stuffing schemes in which selected words were added to a page to suggest that it was about a popular topic. The added words were then hidden using various coding tricks, such as using white text on a white background or indenting the text so far that it wouldn't appear on the screen.
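A minimal sketch of how the two hiding tricks mentioned above might be flagged, assuming the styling is written inline on the element. The function names, the assumed white page background and the 1,000-pixel indent threshold are illustrative choices, not a description of any search engine's actual checks. Real pages usually style text through external stylesheets and use many equivalent color spellings, none of which this sketch handles.

```python
import re
from html.parser import HTMLParser


def parse_style(style):
    """Turn an inline style string into a dict of lowercase declarations."""
    decls = {}
    for part in style.split(";"):
        if ":" in part:
            name, value = part.split(":", 1)
            decls[name.strip().lower()] = value.strip().lower()
    return decls


def looks_hidden(style, page_background="white", indent_limit=1000):
    """Heuristic: text color equals background, or text is indented far away."""
    decls = parse_style(style)
    color = decls.get("color")
    background = decls.get("background-color", page_background)
    if color is not None and color == background:
        return True
    match = re.fullmatch(r"(-?\d+(?:\.\d+)?)px", decls.get("text-indent", ""))
    if match and abs(float(match.group(1))) >= indent_limit:
        return True
    return False


class HiddenTextFinder(HTMLParser):
    """Record every element whose inline style matches the heuristic."""

    def __init__(self):
        super().__init__()
        self.flagged = []

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style") or ""
        if looks_hidden(style):
            self.flagged.append((tag, style))


# Hypothetical page fragment with two hidden blocks and one visible one.
sample = (
    '<p style="color: white; background-color: white">cheap loans cheap loans</p>'
    '<div style="text-indent: -9999px">credit cards</div>'
    '<p style="color: black">Visible paragraph.</p>'
)
finder = HiddenTextFinder()
finder.feed(sample)
finder.close()
flagged_tags = [tag for tag, _ in finder.flagged]
print(flagged_tags)
```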
A variant of this is called cloaking, where the search engine is shown one thing (usually text) and the user is shown something else (usually ads). There are also screen scrapes, which are spam sites composed of material copied from other sites just to draw traffic for ad revenue.
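One simple way to think about spotting cloaking is to compare what a crawler was served with what an ordinary visitor was served. The sketch below does this with Python's standard difflib module on two text samples; the function names, the 0.5 threshold and the sample strings are assumptions for illustration, and fetching the two versions of a page is left out entirely.

```python
from difflib import SequenceMatcher


def view_similarity(crawler_view, user_view):
    """Return a 0..1 similarity score between two versions of a page's text."""
    crawler_words = crawler_view.lower().split()
    user_words = user_view.lower().split()
    return SequenceMatcher(None, crawler_words, user_words).ratio()


def looks_cloaked(crawler_view, user_view, threshold=0.5):
    """Flag a page when the two views share too little text (assumed cutoff)."""
    return view_similarity(crawler_view, user_view) < threshold


# Hypothetical text captured from the same URL under two different identities.
crawler_text = "Discount airline tickets and hotel deals for every destination"
user_text = "Click here to win a prize now"

print(looks_cloaked(crawler_text, user_text))
print(looks_cloaked(crawler_text, crawler_text))
```

A legitimate page can differ somewhat between visits (rotating ads, personalized modules), so any real cutoff would need tuning against known-good pages.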
"Do any of these things, and you will probably get caught," Fox warns. "Once Google has found a site that uses one technique, they can use that knowledge to find all other sites that use that technique."
Site hacking
Finally, there is outright hacking, which involves taking over sites with poor security for use in linking schemes. Ohye says that hacking appears to follow two-year cycles, as new techniques erupt and are then brought under control with security patches. Currently, things are under control, she says.
Crime and punishment
Using these methods to boost your rankings carries risks. J.C. Penney got off lightly.
"The likelihood of being caught in a few days is low, within six months is pretty good, and within two to four years it's nearly impossible to avoid," Fishkin says. "I have seen black-hat techniques that worked for multiple years, but no one who used a particular technique in 2007 is using that technique today. It takes Google a while to catch up, but it does catch up."
As a result, "I get a call about once a week from firms that have tried black-hat methods and their ranking got hurt when Google found them out," Fox says. "If you were really egregious and not trying to build content at all, you will be yanked out of the index entirely. If there is actual value in your site, you will be demoted in ranking. You must fix the problem and then file a request with Google to get your ranking back. It may take some time to do that.
"It can put you out of business," Fox adds. "I have seen it where the traffic drop-off was so severe that by the time they fixed it, they had had to lay everyone off. In other cases, they saw from the start that it would take too much investment to fix the problem. But most of the time you will see the site come back -- it can be done in a few weeks."
Getting reinstated can be a Kafkaesque process, Fishkin notes, since webmasters are not informed of the specific complaint against them.
Google's Ohye confirms that this is intentional. "Since we are trying to protect our algorithm, we cannot tell you that you have not done X, since you could be a spammer trying to find out where the line is," she says. But Google does give webmasters as much information as possible if their site has been hacked, she adds.

