This website discusses domaining, the prediction of valuable domain names, and domain development opportunities.

What is a spam website according to Google?

by Barry on January 22, 2009

This is the second post analyzing a Google document from 2007 entitled “Google Guidelines For Quality Raters”, which somehow found its way onto the Internet. The second half of this document describes some common examples of spam websites. Understanding the signs of spam websites will help you avoid problems as you develop your websites.

Webspam

“Webspam is the term for web pages that are designed by webmasters to trick search engine robots and direct traffic to their websites.” Spam labels applied to webpages are independent of the query, so it doesn’t matter what search was used to find the webpage. They are also independent of the quality of the information on the page: if deceptive techniques are discovered, the page will be labeled as spam regardless of whether it is a vital or useful page.

What pages are classified as spam?

  • PPC Pages
  • JavaScript Redirects
  • Parked Domains
  • Keyword Stuffing
  • Thin Affiliates
  • 100% Frame
  • Hidden Text and Hidden Links
  • Sneaky Redirects

Thin pages

Yes, you read that correctly. Parked domains are considered spam. Pages designed to deliver ads with little to no added value are considered thin pages. This includes pages with AdSense or affiliate ads and little content. For the website developer, this means you should add ads only after all the content pages are in place, and only to pages with plenty of content. Website developers who put ads on a page too early risk having those pages flagged as spam.
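As a rough way to apply the “only on pages with plenty of content” rule, here is a minimal Python sketch that gates ads on a page's word count. The guidelines give no number, so `MIN_WORDS_FOR_ADS` and both function names are hypothetical, and word count is only a crude proxy for added value.

```python
import re

# Hypothetical threshold: the Google document gives no word count, so 300 is
# only a placeholder you would tune for your own site.
MIN_WORDS_FOR_ADS = 300


def visible_word_count(text: str) -> int:
    """Count word-like tokens in the page's main content text."""
    return len(re.findall(r"[A-Za-z0-9']+", text))


def ready_for_ads(content_text: str, min_words: int = MIN_WORDS_FOR_ADS) -> bool:
    """Return True only if the page carries at least min_words of content."""
    return visible_word_count(content_text) >= min_words
```

A stub page such as “Coming soon: great deals on widgets.” counts six words, so `ready_for_ads` returns False and the ad code stays off until real content is written.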

Scraped content

Scrapers are people or bots that collect content from other websites and publish it on their website surrounded by ads. “Scraped or copied content refers to content that has been stolen from another source, either through the use of a piece of software that searches for content containing specific keywords, or through simple copy-and-paste.”

Common sources of scraped content are DMOZ and Wikipedia. If you use information from these sources, make sure you have original content alongside that content.

The acid test described by this document is as follows: “The important thing to remember is that if the scraped (copied) content on the page is removed and all that remains is ads, it is Spam.”

So for you mini-website or affiliate website developers out there, make sure you have sufficient original content. If you are buying content, make sure it is original and not just cut and pasted. You can check using Copyscape.
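Copyscape's own method isn't described here, but a common generic way to spot cut-and-paste text is to compare overlapping word sequences (“shingles”) between two documents. The sketch below, with hypothetical function names, computes the Jaccard similarity of 5-word shingles; a score near 1.0 suggests the text was copied.

```python
import re


def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences in text (lowercased)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def overlap(candidate: str, source: str, k: int = 5) -> float:
    """Jaccard similarity (0.0 to 1.0) of the two texts' shingle sets."""
    a, b = shingles(candidate, k), shingles(source, k)
    if not a or not b:
        # Texts shorter than k words have no shingles to compare.
        return 0.0
    return len(a & b) / len(a | b)
```

For example, a high `overlap(purchased_article, suspected_source)` is a reason to ask the writer where the text came from. What counts as “too high” is a judgment call; the source gives no threshold.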

Keyword Stuffing

There are many variations on this. Putting too many keywords, or irrelevant ones, into the text, titles and tags is a fairly common implementation. Making sure that your keywords are relevant to the content of the page will help you avoid this. Avoiding too many repetitions of a keyword also helps. The language should always read naturally, even when it contains keywords.

Keyword frequency is a much-debated area of SEO, but the general consensus seems to be that high keyword frequency will trigger a spam filter and that the frequency needs to be lower than it has been in the past.
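Since nobody publishes a safe number, the most useful thing is simply to measure your own pages. This minimal sketch (hypothetical function names) reports each word's share of all words as a percentage and deliberately applies no threshold. It counts single words only, so common words like “the” will also rank high, and multi-word phrases are not handled.

```python
import re
from collections import Counter


def keyword_density(text: str) -> dict[str, float]:
    """Return each word's share of all words in text, as a percentage."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return {}
    total = len(words)
    return {word: 100.0 * n / total for word, n in Counter(words).items()}


def top_keywords(text: str, n: int = 5) -> list[tuple[str, float]]:
    """The n most frequent words with their density, highest first."""
    density = keyword_density(text)
    return sorted(density.items(), key=lambda item: item[1], reverse=True)[:n]
```

Running `top_keywords(page_text)` on a draft shows at a glance whether one keyword is crowding out everything else.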

Hidden text

Hidden text is often used in combination with keyword stuffing. The simplest implementation is to have a text color identical or very close to the background color so the text is invisible to the visitor but seen by the search engine bots.

The invisible text is then usually full of keywords, often unconnected to the content, to try to drive more traffic to the page.

Webmasters can accidentally set the color of text incorrectly, so it is always a good idea to highlight the whole page and see if any hidden text appears. On a PC, you can press Ctrl+A to highlight the whole webpage.
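If you want to check colors from a stylesheet rather than by eye, the contrast ratio defined in the W3C's WCAG 2.x guidelines gives a number from 1 (identical colors) to 21 (black on white). This sketch assumes you have already pulled the text and background colors out as 6-digit hex strings; the `NEAR_INVISIBLE` cut-off of 1.5 is a hypothetical value, not something taken from the Google document.

```python
def _channel(c: int) -> float:
    """Linearize one 0-255 sRGB channel (WCAG 2.x relative luminance)."""
    s = c / 255
    return s / 12.92 if s <= 0.03928 else ((s + 0.055) / 1.055) ** 2.4


def luminance(hex_color: str) -> float:
    """Relative luminance of a 6-digit hex color such as '#fefefe'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _channel(r) + 0.7152 * _channel(g) + 0.0722 * _channel(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio: (lighter + 0.05) / (darker + 0.05)."""
    lighter, darker = sorted((luminance(fg), luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


# Hypothetical cut-off: text this close to its background is effectively invisible.
NEAR_INVISIBLE = 1.5


def looks_hidden(fg: str, bg: str) -> bool:
    """Flag text whose color is identical or very close to the background."""
    return contrast_ratio(fg, bg) < NEAR_INVISIBLE
```

Near-white text on a white page (`looks_hidden("#fefefe", "#ffffff")`) is flagged, while ordinary black text on white is not.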

Redirects

Redirects have a specific purpose, and website developers most often use 301 redirects to direct the visitor (and bot) from an old URL to a new URL where the requested information now resides. These redirects are usually within the website and so are not seen as spam. The other common use is to redirect other TLDs, e.g. .net or .org, to .com.
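Both legitimate uses above come down to answering with a 301 status and a `Location` header. Most sites configure this in the web server itself; purely as an illustration, here is a minimal Python WSGI sketch. The domain names and the `MOVED_PAGES` mapping are hypothetical, and it assumes the canonical site is served over HTTPS.

```python
from urllib.parse import quote

# Hypothetical names for illustration: substitute your own domains and paths.
CANONICAL_HOST = "example.com"
ALIAS_HOSTS = {"example.net", "example.org"}
MOVED_PAGES = {"/old-page.html": "/new-page.html"}


def app(environ, start_response):
    """WSGI app that answers alias TLDs and moved pages with a 301 redirect."""
    host = environ.get("HTTP_HOST", "").split(":")[0].lower()
    path = environ.get("PATH_INFO", "") or "/"
    new_path = MOVED_PAGES.get(path, path)

    # Redirect if the request came in on another TLD or asked for a moved page.
    if host in ALIAS_HOSTS or new_path != path:
        location = "https://" + CANONICAL_HOST + quote(new_path)
        query = environ.get("QUERY_STRING", "")
        if query:
            location += "?" + query
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]

    # Everyone, bot or visitor, gets the same response body.
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Page content\n"]
```

To try it locally, pass `app` to `wsgiref.simple_server.make_server("", 8000, app)` and call `serve_forever()` on the result.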

Redirection outside of these examples should be done carefully and judiciously. A major spam flag occurs when a bot is redirected and a visitor is not. In other words, always try to present the same information to the bot that you present to the visitor.
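One rough self-check is to request the same page with a Googlebot user-agent and with a browser user-agent and compare what comes back. This sketch, with hypothetical function names, uses Googlebot's published user-agent string. It only catches differences keyed on the user-agent (a server can also recognize bots by IP address), and pages with rotating ads or timestamps will differ legitimately, so treat a mismatch as a prompt to look rather than proof of cloaking.

```python
import urllib.request

# Googlebot's published user-agent string, plus a generic browser string.
BOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
VISITOR_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def fetch(url: str, user_agent: str) -> tuple[str, bytes]:
    """GET url with the given User-Agent; return (final URL after redirects, body)."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.url, resp.read()


def same_for_bot_and_visitor(url: str, fetcher=fetch) -> bool:
    """True if the bot and the visitor end up at the same URL with the same body."""
    bot_url, bot_body = fetcher(url, BOT_UA)
    vis_url, vis_body = fetcher(url, VISITOR_UA)
    return bot_url == vis_url and bot_body == vis_body
```

The `fetcher` parameter lets you swap in a stub for offline testing; with the default it performs two real HTTP requests.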

If you do need to prevent the bot from seeing certain information, use a robots.txt file or robots meta tags to do that.
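Python's standard library includes a robots.txt parser, which makes it easy to confirm that your rules block what you intend before you publish them. The rules and URLs below are hypothetical. For a single page, the meta tag equivalent is `<meta name="robots" content="noindex">` in the page's `<head>`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep all crawlers out of a /members/ area.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Check how the rules apply to a public page and a members-only page.
for path in ("/articles/domain-tips.html", "/members/account.html"):
    url = "https://example.com" + path
    print(path, "crawlable:", parser.can_fetch("Googlebot", url))
```

Because the rules apply to every user-agent, the members page is reported as not crawlable for Googlebot, while the article stays open.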

Other techniques

I am not going to cover the other techniques listed simply because an honest website developer is not going to accidentally implement any of them. It would take a conscious malicious intent to put any of these into action.

Google Guidelines For Quality Raters

Here is the leaked document. At this point it is easily found on the Internet, so I don’t mind linking to it here. It is in PDF format.

Google Guidelines For Quality Raters

6 comments

Patrick McDermott January 22, 2009 at 8:12 pm

barry,

That is an excellent tutorial!
—————————–
sex golf mortgage sex sex home

real estate golfing sex love travel

woman porn sex money stocks website

sex money travel loan sex love money

finance sex ………….

house girls porno………….

Barry January 22, 2009 at 8:24 pm

Nice Patrick. I can’t wait to see all the traffic I get now ;)
Actually I have a funny story on a similar topic. I will have to add a post about it. Thanks for reminding me.

Susi October 21, 2009 at 4:54 am

Yeah, those words generate long-tail traffic, like mine. I was looking for how to get out of a Google penalty. I have bought an old domain with a penalty.

Barry October 21, 2009 at 7:46 pm

It very much depends on what caused the penalty. Have you checked to see what content was on the site before you bought it? You can file for reinclusion, but you had better make sure the site is squeaky clean.

Susi October 23, 2009 at 7:05 am

There is not much information from 2004 to 2007 on archive.org, but in 2004 there were many links to another domain. In the following years there were many 500 errors and a parked domain.

I have cleaned up some errors. I am waiting for the information in Google Webmasters to update before requesting reinclusion.

Barry October 23, 2009 at 8:06 pm

Check all the links, including inbound ones, and avoid redirects. Make sure there is no hidden text, i.e. check the HTML code itself. If you are showing ads, make sure you have a privacy policy. Good luck with the site.
