3/20/2003

A MODEST WEBMASTER'S INTRODUCTION TO GOOGLE GROKKERY

At this point in history the collective brain of the Internet is at best a blithering fool, an idiot savant whose ramblings can be strung together sensibly only by the keenest of efforts and most dedicated panning. I have learned from first-hand experience that having some insight into the way the world's premier panning tool works helps websites get noticed by other search engines and directories, and surfed by visitors.

Before we start, let it be said that lots of things go into making a good website, and there are a lot of variables to consider beyond just specific search engines or directories when considering a web marketing strategy. This, however, is a narrow focus article. Don't write to me telling me there is a web beyond Google. I know. Shut up.

Secondly, I don't claim to have the fanciest website this side of San Francisco. I have a pretty basic collection of pages including a portfolio, description of professional services, and links to independent projects, built with non-cutting-edge HTML and minimal JavaScript, with QuickTime required for the multimedia components. I also have some kibbles of miscellanea like this one, clustered around my homepage and my blog, which serve to feed some additional traffic to my main website. People surf my pages, some of those people contact me via e-mail or telephone, and some of those people end up giving me jobs. I get enough such jobs that I can pick and choose between them, so I judge my website to be a modest success.

In terms of traffic my site is also modest, in that the server tends to shell out between six hundred and eight hundred pages per day at the time of this writing, which isn't a very strenuous load. (Note: counting pages is different from counting raw hits, which include calls for graphics, menu bar icons, robot restrictions files, error files and other crumbs.) Big commercial websites need over ten thousand page-views per day to start doing real e-business (they need millions to actually stay there), or so I have read, so in comparison my current page-view average isn't anything particularly remarkable.

But keep in mind: my average used to be five pages per day, when I first secured the domain (my old domain remains unused at the time of this writing). And on a typical day that meant two of those page-views were me checking to see if the domain name resolution was working, one of them was a typo from somebody in Hong Kong and the resultant 404 error, and the other two were spambots looking for e-mail addresses to harvest. Welcome to the ground floor.

My site's pubescence came when I took the time to learn a little more about that sexy, secretive super-vedette of search enginery, Google [sic]. Though I had known about Google for some time, I had not really come to appreciate the fact that among surfers at large it was enjoying the fleeting privilege of being the pope of the world wide web -- a sort of Yahoo for the digitally-literate, in the sense of a symbolic portal.

My curiosity was first aroused when I noticed that my resume tended to turn up on Google's first page of results for my very specific queries, while my main sales pages languished far behind, on the tenth page or deeper. Like most amateur web spinners I had picked up on the old inertia that put a lot of emphasis on keywords listed in meta tags; I wondered why this technique wasn't yielding fruit. Though I was unfamiliar with the term "weblog" at the time, the answer turned out to lie in the blog-like character of my resume: a certain density of regular text and foreign links, updated regularly with dated entries. I have learned a lot since this initial curiosity, but it still illustrates the most basic credo of the Googlebot: fresh, topical content and hyperlinks with text context are good.

The basic credo of Google itself is "don't be evil." Glib as it sounds, this is a powerful message for a corporation to live by. And since Google has so far seemed to live by it so consistently, the only way to get along with Google is to emulate it. That is: the best way to do well in Google listings is to feed Google what it wants: a good, non-spammy website that can be sensibly navigated and that features high-quality content along a theme. Featuring good content goes further with Google than trying to trick it with stuffed-crotch meta tags or link farms, because the latter approach is evil and Google is designed to sniff out and ignore evil (or sentence the site to be ignored, however you want to understand it). Shameless self-promotion will naturally cause some friction with Google, because Google has the capacity for shame.

Beyond that, here are some general guidelines I have gleaned from my armchair, attempting like so many others to divine the secret algorithms behind the curtains:

Google Likes Text
That may seem obvious, but some people get confused when their "Gallery Page" exclusively featuring photographs and navigation icons made from images without alt attributes isn't well ranked. How does Google know your page is about leprechauns when you never actually use the word "leprechaun" in plain text on the page? This tip should be underlined and written in fluorescent lettering for people whose sites exist only through Flash or other multimedia schemes whose output is not parsed by the Googlebot at all.
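
To make the point concrete, here is a minimal Python sketch of what a text-only crawler could read from a page. It is not how the Googlebot actually works; the class name `VisibleText`, the helper `mentions`, and both sample pages are made up for illustration. It collects body text plus any alt text, and counts images that offer no alt text at all.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect the words a text-only crawler could read (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.words = []
        self.images_without_alt = 0

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                # Alt text is the only readable part of an image.
                self.words.extend(alt.lower().split())
            else:
                self.images_without_alt += 1

    def handle_data(self, data):
        self.words.extend(data.lower().split())


def mentions(html, keyword):
    """Return (keyword found in readable text?, number of images lacking alt)."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return keyword.lower() in parser.words, parser.images_without_alt


# Hypothetical pages: an image-only gallery and a described one.
gallery = '<html><body><img src="lep1.jpg"><img src="lep2.jpg"></body></html>'
described = (
    "<html><body><h1>Leprechaun gallery</h1>"
    '<img src="lep1.jpg" alt="A leprechaun"></body></html>'
)

print(mentions(gallery, "leprechaun"))    # (False, 2)
print(mentions(described, "leprechaun"))  # (True, 0)
```

The image-only gallery gives a crawler nothing to read, while the second page offers the keyword twice: once in the headline and once in the alt text.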

Google Likes Formatting
How does Google tell whether your page is about leprechauns versus just making a passing reference to leprechauns? Through formatting, the Googlebot attempts to rank the importance of a given keyword. For instance, a keyword echoed in H1 or H2 headline text or boldface type is taken to be a more primary subject than one merely mentioned in the body text (though the number of occurrences in the body text is relevant as well).
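
As a toy model of that idea, the Python sketch below scores a keyword higher when it appears inside headline or boldface tags. The weights in `TAG_WEIGHTS` are invented for illustration; nobody outside Google knows the real numbers, or whether the real scheme looks anything like this.

```python
import re
from html.parser import HTMLParser

# Hypothetical weights, invented for this example only.
TAG_WEIGHTS = {"h1": 5.0, "h2": 3.0, "b": 2.0, "strong": 2.0}
BODY_WEIGHT = 1.0


class KeywordScorer(HTMLParser):
    """Add up keyword occurrences, weighted by the formatting around them."""

    def __init__(self, keyword):
        super().__init__()
        self.keyword = keyword.lower()
        self.open_tags = []  # currently open tags that carry a weight
        self.score = 0.0

    def handle_starttag(self, tag, attrs):
        if tag in TAG_WEIGHTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close the most recently opened matching tag, if there is one.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i] == tag:
                del self.open_tags[i]
                break

    def handle_data(self, data):
        hits = re.findall(r"[a-z]+", data.lower()).count(self.keyword)
        weight = max((TAG_WEIGHTS[t] for t in self.open_tags), default=BODY_WEIGHT)
        self.score += hits * weight


def keyword_score(html, keyword):
    scorer = KeywordScorer(keyword)
    scorer.feed(html)
    scorer.close()
    return scorer.score


about = (
    "<h1>Leprechaun lore</h1>"
    "<p>Every <b>leprechaun</b> guards gold. A leprechaun hides it.</p>"
)
passing = "<h1>Irish holidays</h1><p>We saw a leprechaun costume.</p>"

print(keyword_score(about, "leprechaun"))    # 8.0
print(keyword_score(passing, "leprechaun"))  # 1.0
```

The first page earns 5 for the headline, 2 for the boldface and 1 for the plain mention; the second earns only 1 for its passing reference.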

Google Respects Exclusion Standards
Use them. You should have a robots.txt file at the root level of your website which states your general rules, and then specify in the header of your pages any specific restrictions using the robots meta tag. This should be used to avoid embarrassing things like seeing your navigation frame ranked higher than your actual content pages, by indicating in the robots meta tag that the links are to be followed, but the content of the navbar.html file itself is not to be indexed.
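
For anyone who wants to check their own rules, here is a small Python sketch using the standard library's `urllib.robotparser`. The `/private/` directory, the sample file contents and the `RobotsMeta` helper class are assumptions made up for the example; the `noindex,follow` value is the usual way to say "follow the links, but don't index this page".

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical site-wide rules; the /private/ directory is invented.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

private_ok = rules.can_fetch("Googlebot", "http://www.domain.tld/private/notes.html")
index_ok = rules.can_fetch("Googlebot", "http://www.domain.tld/index.html")
print(private_ok, index_ok)  # False True


class RobotsMeta(HTMLParser):
    """Read the directives from a page's robots meta tag (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives |= {d.strip().lower() for d in content.split(",") if d.strip()}


# A made-up navbar.html: follow its links, but keep the page itself out of the index.
NAVBAR_HTML = (
    '<html><head><meta name="robots" content="noindex,follow"></head>'
    '<body><a href="portfolio.html">Portfolio</a></body></html>'
)

meta = RobotsMeta()
meta.feed(NAVBAR_HTML)
meta.close()
print(sorted(meta.directives))  # ['follow', 'noindex']
```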

Google Likes Outbound Hyperlinks
Linking to similarly themed sites that Google already respects adds a layer of sugar to your content. Google is happy that you have contributed your grey matter to culling the content of the Internet, and rewards you with a certain amount of good faith that you know what you're talking about, and are not linking random, stupid things.

Google Loves Inbound Hyperlinks
Being linked to by a respected site (in this context a site with a high PageRank) is the single best way to boost your listing. Being linked to from sites themed with related keywords amplifies this effect, especially when the anchoring link text itself contains relevant keywords (for example "cool leprechaun site" rather than "click here" being the actual underlined text on the foreign site). Of course, Google knows that you know that hyperlinks are important, so Google keeps a sharp eye out for a practice known as "link-farming", where websites attempt to simulate the appearance of popularity by either exchanging links with strangers shamelessly or by creating circles of forwarding-domains which all point to one (sometimes duplicated) resource. How do you know if your website's friendly reciprocal linking policy smells like link-farming? Simple. Ask yourself: is it your intention to deceive? In other words: are you being evil? If you're not being evil, you're probably in the clear (despite paranoid discussion threads to the contrary).
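
To see why a link from a respected site counts for more, here is a Python sketch of the simplified PageRank calculation as it is usually described in public write-ups: each page splits its rank among the pages it links to, with a damping factor (0.85 is the commonly quoted value). This is a textbook toy, not Google's actual ranking, and the page names and link graph are invented.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution (toy version)."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page shares its rank equally among its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank over everyone.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Invented link graph: "lucky" and "unlucky" each have exactly one inbound link,
# but lucky's comes from a well-linked hub and unlucky's from a page nobody links to.
links = {
    "fan1": ["hub"],
    "fan2": ["hub"],
    "fan3": ["hub"],
    "hub": ["lucky"],
    "nobody": ["unlucky"],
}

ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:8s} {score:.3f}")
```

Both "lucky" and "unlucky" have a single inbound link, yet "lucky" ends up ranked higher because its one link comes from a page that is itself well linked.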

Google Likes Accessibility
Google has respect for all browsers, and thus feels more warmly about pages that can be parsed by any browser, no matter how humble, rather than cryptic pages that require the latest and greatest browser bristling with bleeding-edge plug-ins in order to work. Google looks for good old HTML tricks like internal anchor names related to sub-topics, subject headlines, text justification and paragraph size to help determine the kind of content a page contains. While Google understands cascading style-sheets and some other dynamic tricks, always keep in mind how these pages will be parsed by low-end browsers before you publish. Google gets mad if you snub the blind man using Lynx, because that's evil.

Googlebot Isn't Psychic
Other than coming in the front door (to your HTML root directory, usually http://www.domain.tld), the Googlebot can only follow links that you provide to it. If you have pages that aren't linked to from anywhere else on your site (called "orphan" pages) they won't get spidered by Google or by anyone else for that matter. So, always remember to connect your pages together -- it is a web, after all.
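
Finding orphans is a simple reachability check. The Python sketch below assumes you already have a map of which of your pages link to which (the `site` dictionary and its filenames are made up), starts at the front door, and reports any page that cannot be reached by following links.

```python
from collections import deque


def find_orphans(site_links, start="index.html"):
    """Return pages in site_links that cannot be reached from the start page."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in site_links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return sorted(set(site_links) - seen)


# Hypothetical site map: page -> pages it links to.
site = {
    "index.html": ["portfolio.html", "services.html"],
    "portfolio.html": ["index.html"],
    "services.html": [],
    "old-resume.html": ["index.html"],  # links out, but nothing links to it
}

print(find_orphans(site))  # ['old-resume.html']
```

Note that a page linking out to the rest of the site is still an orphan if nothing links in to it.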

Location, Location, Location
Google likes you to tell it where you are. It thinks city names are delicious.

Direct Me To Your Keyword
You score points for having a relevant keyword in the name of your directory or your actual HTML file, like http://domain.tld/littlefolk/leprechauns.html. Some sources, like this informative discussion thread, suggest that this practice is only good for a few levels deep, after that becoming potentially harmful to one's PageRank. In other words, you may not be making Google happier by going with http://domain.tld/folk/legendary/little/western/irish/domestic/leprechaun-related/leprechauns.html -- it may be more specific, but it feels spammy.
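
If you want to audit your own URLs, a few lines of Python will report how deep a page sits and whether the keyword appears anywhere in its path. The helper name `path_report` is made up, and the sketch deliberately sets no cut-off for "too deep", since the sources only say "a few levels".

```python
from urllib.parse import urlparse


def path_report(url, keyword):
    """Return (number of directories above the file, keyword present in path?)."""
    parts = [part for part in urlparse(url).path.split("/") if part]
    depth = max(len(parts) - 1, 0)
    return depth, any(keyword.lower() in part.lower() for part in parts)


short_url = "http://domain.tld/littlefolk/leprechauns.html"
long_url = (
    "http://domain.tld/folk/legendary/little/western/irish/"
    "domestic/leprechaun-related/leprechauns.html"
)

print(path_report(short_url, "leprechaun"))  # (1, True)
print(path_report(long_url, "leprechaun"))   # (7, True)
```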

Google Likes Long Pages
I have often read the opposite advice, but my personal experience has consistently been that pages with long text content do better than shorter ones. The frequently quoted pearl that only the first twenty-five words of the page's plain text really count is nonsense.

Google Likes Experts
If you know about something, write a personal or formal essay about it, humble or fancy as you like, and post it on your site. If other people find your knowledge useful in any way they will find their way to the rest of your site (assuming you have semi-decently designed pages). The more obscure the topic you cover, the better you'll do as an expert. At the time of this writing my own musings on the subject of greasy Quebecois junk-food are the number one result for the query "poutine". (If my website were about selling poutine, that would be a real feather in my cap. As it is, it probably just sends me a surge of raw traffic, which is good too.)

Invisible Text Isn't Necessarily Evil
I do use so-called "invisible text" on some of my title pages. I do this because I used to see these title or intro pages listed in Google with no description or keywords beneath the title-link. By inserting some text coloured the same as the background I have maintained my design integrity while making the pages provide readable descriptions in the listings. Some people say that this is a spammy, evil activity since it reeks of burying pages in invisible keywords, but I think that they are being too paranoid. Use complete sentences, and don't go overboard.

Don't Obsess
The archive of search engine results pages is a constantly roiling soup, a dynamic and fleeting thing, subject to the throes of the monthly dance, as the servers around the world synchronise the Google database, and to the day-to-day fartings of endless everflux. Listings move up and down all the time. Don't panic. Be patient.

Google Likes Freshness
The Freshbot has been described as a sort of bonus feature, which indexes select portions of the web on a daily or weekly rather than monthly basis. It is hard to say what exactly makes the Freshbot visit more or less frequently, but it is true that the Freshbot is attracted to blog-like and/or newsfeed-like features. So, update your site often with topical, contemporary information...

...Here I am following my own advice. Share and enjoy!
