What is a spam website according to Google?
This is a second post analyzing a Google document from 2007 entitled “Google Guidelines For Quality Raters” which somehow found its way onto the Internet. The second half of this document describes some common examples of spam websites. Understanding the signs of spam websites will help you avoid problems as you develop your own.
“Webspam is the term for web pages that are designed by webmasters to trick search engine robots and direct traffic to their websites.” A spam label applied to a webpage is independent of the query, so it doesn’t matter what search was used to find the page. The label is also independent of the quality of the information on the page. If deceptive techniques are discovered, the page will be labeled as spam even if it is otherwise a vital or useful page.
What are classified as spam pages?
- PPC Pages
- Parked Domains
- Keyword Stuffing
- Thin Affiliates
- 100% Frame
- Hidden Text and Hidden Links
- Sneaky Redirects
Yes, you read that correctly. Parked domains are considered spam. Pages designed to deliver ads with little to no added value are considered thin pages. This includes pages with AdSense or affiliate ads with little content. For the website developer, this means you want to put ads on your website after you have added all the content pages and only to pages with plenty of content. Sometimes website developers put ads on a page too early and risk having those pages flagged as spam.
Scrapers are people or bots that collect content from other websites and publish it on their website surrounded by ads. “Scraped or copied content refers to content that has been stolen from another source, either through the use of a piece of software that searches for content containing specific keywords, or through simple copy-and-paste.”
Common sources of scraped content are DMOZ and Wikipedia. If you use information from these sources, make sure you have original content alongside that content.
The acid test described by this document is as follows: “The important thing to remember is that if the scraped (copied) content on the page is removed and all that remains is ads, it is Spam.”
So for you mini-website or affiliate website developers out there, make sure you have sufficient original content. If you are buying content, make sure it is original and not just cut and paste. You can check using Copyscape.
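The acid test can be written down as a small check. This is only a sketch of the reasoning: the `Block` type and its `kind` labels ("original", "scraped", "ad") are hypothetical, and in practice you would have to label each part of a page yourself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """One piece of a page. `kind` is a hand-assigned, hypothetical label."""
    kind: str  # "original", "scraped", or "ad"
    text: str


def fails_acid_test(blocks: List[Block]) -> bool:
    """Return True if only ads (or nothing at all) remain once scraped content is removed."""
    remaining = [block for block in blocks if block.kind != "scraped"]
    return all(block.kind == "ad" for block in remaining)


# A page made of copied text plus ads fails the test.
scraper_page = [Block("scraped", "Copied encyclopedia entry"), Block("ad", "Buy now")]

# The same page with an original write-up alongside the copied text passes.
better_page = scraper_page + [Block("original", "My own review and photos")]
```

Calling `fails_acid_test(scraper_page)` gives `True`, while `fails_acid_test(better_page)` gives `False`.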
There are many variations on keyword stuffing. Putting too many keywords, or irrelevant ones, into the text, titles and tags is a fairly common implementation. Making sure that your keywords are relevant to the content of the page will help you avoid this. Avoiding too many repetitions of a keyword also helps. The language used should always be natural to read even when it contains keywords.
Keyword frequency is a much-debated area of SEO, but the general consensus seems to be that high keyword frequency will trigger a spam filter and that the frequency needs to be lower than it has been in the past.
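If you want a rough number to look at, keyword frequency can be measured as the share of words on the page that match the keyword. The sketch below assumes a simple word-splitting rule and a hypothetical `keyword_density` helper. The Google document gives no threshold, so the function only reports the ratio and leaves the judgment to you.

```python
import re


def keyword_density(text: str, keyword: str) -> float:
    """Return the fraction of words in `text` equal to `keyword` (case-insensitive)."""
    # Assumption: a "word" is any run of letters, digits, or apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


sample = "Cheap shoes! Cheap shoes here. Buy cheap shoes."
density = keyword_density(sample, "cheap")
print(f"'cheap' makes up {density:.0%} of the words")
```

Only single-word keywords are counted here; a phrase such as "cheap shoes" would need a different matching rule.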
Hidden text is often used in combination with keyword stuffing. The simplest implementation is to have a text color identical or very close to the background color so the text is invisible to the visitor but seen by the search engine bots.
The invisible text is then usually full of keywords often unconnected to the content to try to drive more traffic to the page.
Webmasters can accidentally set the color of text incorrectly, so it is always a good idea to highlight the whole page and see if any text appears. On a PC, you can press Ctrl+A to highlight the whole webpage.
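The same check can be roughed out in code if you know a page's text and background colors. This sketch assumes both are given as hex values such as `#ffffff`. The `looks_hidden` helper and its `tolerance` of 30 are illustrative choices, not anything stated in the Google document.

```python
def parse_hex_color(value: str) -> tuple:
    """Convert '#rgb' or '#rrggbb' into an (r, g, b) tuple of ints from 0 to 255."""
    value = value.lstrip("#")
    if len(value) == 3:
        # Expand shorthand such as 'fff' to 'ffffff'.
        value = "".join(ch * 2 for ch in value)
    if len(value) != 6:
        raise ValueError(f"not a hex color: {value!r}")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def looks_hidden(text_color: str, background_color: str, tolerance: int = 30) -> bool:
    """Return True if every color channel differs by at most `tolerance`."""
    fg = parse_hex_color(text_color)
    bg = parse_hex_color(background_color)
    return max(abs(a - b) for a, b in zip(fg, bg)) <= tolerance


print(looks_hidden("#fefefe", "#ffffff"))  # near-white text on a white background
print(looks_hidden("#000", "#fff"))        # black text on a white background
```

This only compares two colors you supply. It does not read a stylesheet or catch text hidden by other means, so the Ctrl+A check is still worth doing.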
Redirects have a specific purpose, and website developers most often use 301 redirects to direct the visitor (and bot) from an old URL to a new URL where the requested information now resides. These redirects are usually within the website and so are not seen as spam. The other common use is to redirect other TLDs, e.g. .net or .org, to .com.
Redirection outside of these examples should be done carefully and judiciously. A major spam flag occurs when a bot is redirected and a visitor is not. In other words, always try to present the same information to the bot that you present to the visitor.
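One way to spot-check this, as a rough sketch, is to request the same URL with a crawler-style and a browser-style User-Agent header and compare where each request ends up. The header strings and the `final_url` and `redirects_match` helpers are illustrative assumptions. A site that treats bots differently by IP address rather than by User-Agent would not be caught this way.

```python
import urllib.request

# Illustrative User-Agent strings; real crawlers and browsers send longer ones.
BOT_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def final_url(url: str, user_agent: str) -> str:
    """Fetch `url` with the given User-Agent and return the URL reached after redirects."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.geturl()


def redirects_match(url: str, fetch=final_url) -> bool:
    """Return True if the bot and the browser end up at the same final URL."""
    return fetch(url, BOT_UA) == fetch(url, BROWSER_UA)
```

Calling `redirects_match("https://example.com/old-page")` performs two live requests. The `fetch` parameter exists so the comparison can be tried with a stand-in function instead of the network.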
If you do need to prevent the bot from seeing certain information, use the robots.txt file or robots meta tags to do that.
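To see how a robots.txt rule is interpreted, Python's standard `urllib.robotparser` module can evaluate one without touching the network. The rules and the example.com URLs below are made up for illustration, and `is_allowed` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt that keeps all bots out of one directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/private/notes.html"))
print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/index.html"))
```

The first call reports that the page under `/private/` is blocked and the second that the home page is allowed. Well-behaved bots follow these rules voluntarily, so robots.txt is a request rather than an access control.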
I am not going to cover the other techniques listed simply because an honest website developer is not going to accidentally implement any of them. It would take a conscious malicious intent to put any of these into action.
Google Guidelines For Quality Raters
Here is the leaked document. At this point it is easily found on the Internet so I don’t mind linking to it here. It is in PDF format.