
Increasing SEO Benefits By Adding Page Segmentation To Your Site

By David Harry
Expert Author
Article Date: 2009-07-14

Brother Bill expertly covered (yet another) patent on page segmentation the other day, this time from Yahoo. It is certainly an area that SEOs might want to pay more attention to these days. Considering that each of the big three has dabbled in it (to varying degrees) over the last half decade, there is every reason to believe that something might be here (all that ‘where there is smoke’… yada yada).

Page segmentation was largely developed for use in OCR applications, to better understand text/image relationships. More recently, we've seen this area of expertise brought to the online world. One aspect that should be of particular interest to those in the SEO world is the link-related ramifications.

Consider that Google, the king of link-reliant search engines, would certainly find a huge benefit in this approach for valuing links (or even for making indexation decisions).

An interesting example is the recent update to Google Blog Search that fixed the issue of Google indexing blogroll links (problematic when using the link: command). As soon as that news broke I thought, ‘How'd they do that, I wonder?’ Obviously, page segmentation came to mind.



Benefits of page segmentation
Now, as fastidious little detectives, the first thing we need to look for is motive. Why would a search engine want to do this? A few of the potential benefits include:
  1. Crawling/indexing resources - once the template structure of a site is prioritized, crawling certain layers infrequently, or discarding them altogether, would save on computing and storage costs.
  2. Topic Drifting - some pages will have more than one topic and search engines can struggle to properly index/categorize them.
  3. False positives - segmentation can also better deal with problems that arise when a page contains links/citations to other topics not directly related to the content of the page. Avoiding such false positives would help improve the quality of the results.
  4. Spam Detection - obviously understanding boilerplate and other inherently spammy elements would be an important part of the value and attraction.
  5. Paid links - it could also go a long way in fighting link spam and even paid links. By devaluing segments or passing them over altogether, it makes the practice less enticing and handicaps the activity.
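
To make the template idea in the list above a little more concrete, here is a minimal sketch in Python. It is not taken from any of the patents or papers; it simply assumes that a block of text repeated across most pages of a site is probably template (boilerplate). The function name, the 80% threshold and the three-page sample site are all hypothetical.

```python
from collections import Counter


def find_boilerplate_blocks(pages, min_share=0.8):
    """Return text blocks that recur on at least `min_share` of a site's pages.

    `pages` maps a URL to the list of text blocks found on that page.
    Both the input shape and the threshold are assumptions for illustration.
    """
    counts = Counter()
    for blocks in pages.values():
        # Count each block once per page, however often it repeats there.
        counts.update(set(blocks))
    threshold = min_share * len(pages)
    return {block for block, n in counts.items() if n >= threshold}


# Hypothetical three-page site: the nav and sponsor blocks repeat everywhere.
site = {
    "/a": ["Home | About | Contact", "Article about widgets", "Our Sponsors"],
    "/b": ["Home | About | Contact", "Article about gadgets", "Our Sponsors"],
    "/c": ["Home | About | Contact", "Article about gizmos", "Our Sponsors"],
}
boilerplate = find_boilerplate_blocks(site)
print(sorted(boilerplate))
```

A real system would have to cope with near-duplicate blocks and visual layout rather than exact text matches, but the motive is the same: once the recurring layers are known, they can be crawled less often, discarded or devalued.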

These are but a few ways that page segmentation can be a useful tool for search engines. There are more, but I just wanted to get you up to speed (for more see last post on it). What we're concerned with this time is how it has the potential to change the world of links as we know it… m'kay?



Page segmentation link values
Links in the chain
What is most important is how these types of approaches would affect link valuations and, by association, link building campaigns. There are a few distinct areas link builders might want to consider that can affect the strength of a given link location:
  • Segment indexation - one of the benefits of page segmentation is deciding which parts of the page should or should not be indexed. For example, we could suppose that a side panel segment titled ‘Our Sponsors’ may be passed over altogether when the page is indexed. As you might imagine, having a link in a part of the page that the search engine doesn't index could be problematic.
  • Topical diversity - imagine a page that has multiple topics and multiple link types going to that page. If the search engine can establish which links are to which segments, some segments may be of more value than others.
  • Segment location - if search engines can understand the (boilerplate) template of a page as well as common segments, it would be easy to refine the values. We could find that various areas of the page would be valued more than others in a kind of PageRank dampening.

It should be noted that most of the patents/papers on the topic have not gone far into the whole link valuation area, but it seems quite intuitive that this would be a valuable tool. This means there would likely be a premium on the editorial/content areas of a given web page.

One paper that does go into it is Microsoft's Block-level Link Analysis (PDF), where they discuss a Block Level PageRank (BLPR) approach.
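
To show roughly what a block-aware PageRank might look like, here is a simplified sketch in Python. To be clear, this is not the algorithm from the Microsoft paper; it is an ordinary weighted PageRank (power iteration) in which each link carries a weight based on the block it sits in. The function name, the block labels and the weights are all assumptions made for illustration.

```python
def block_weighted_pagerank(links, block_weights, damping=0.85, iterations=50):
    """Weighted PageRank where a link's weight depends on its containing block.

    links: list of (source, target, block) tuples.
    block_weights: block label -> relative weight (hypothetical values).
    """
    pages = sorted({p for s, t, _ in links for p in (s, t)})
    # Sum the block weights of all links from each source to each target.
    out = {p: {} for p in pages}
    for source, target, block in links:
        w = block_weights.get(block, 1.0)
        out[source][target] = out[source].get(target, 0.0) + w

    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for source in pages:
            total = sum(out[source].values())
            if total == 0:
                # Page with no (weighted) out-links: spread its rank evenly.
                for p in pages:
                    new_rank[p] += damping * rank[source] / len(pages)
            else:
                for target, w in out[source].items():
                    new_rank[target] += damping * rank[source] * w / total
        rank = new_rank
    return rank


# Hypothetical page "hub" links to "a" from its content and to "b" from its footer.
links = [("hub", "a", "content"), ("hub", "b", "footer")]
ranks = block_weighted_pagerank(links, {"content": 1.0, "footer": 0.2})
print(ranks["a"] > ranks["b"])
```

In this toy graph the content link passes on more value than the footer link, which is the kind of dampening by placement discussed here.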



Some possible areas where a search engine might dampen the value of links are:
  1. Header/footer links - easily definable in most templates.
  2. Navigation links - internal site links in segments
  3. Blog rolls - as mentioned before, this could already be in play.
  4. Advertiser/Supporter side bar links
  5. Forum signatures - recurring template elements
  6. Blog comments - sorry mister comment spammer :0(
  7. Social user pages - profile pages etc., with common elements
  8. Directory/link pages - very easy boilerplate to identify

As you can imagine, this would put a premium on links within the content of a page and varied degrees of devaluation for other placements.
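
As a rough illustration of dampening by placement, the sketch below uses Python's standard `html.parser` to tag each link with the page segment it sits in, then looks up a dampening factor. It assumes the page marks its segments with elements such as `header`, `nav`, `aside`, `footer` and `article`; a real engine would have to infer segments from layout and templates. The class name, the weights and the sample HTML are all hypothetical.

```python
from html.parser import HTMLParser

# Assumption: segments are marked with these elements in the page source.
SEGMENT_TAGS = {"header", "nav", "aside", "footer", "article"}

# Hypothetical dampening factors; no search engine has published such values.
DAMPENING = {"article": 1.0, "nav": 0.3, "aside": 0.2,
             "header": 0.2, "footer": 0.1, "unknown": 0.5}


class LinkSegmenter(HTMLParser):
    """Collect (href, segment) pairs for every link in a page."""

    def __init__(self):
        super().__init__()
        self.stack = []  # open segment tags, innermost last
        self.links = []  # (href, segment) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in SEGMENT_TAGS:
            self.stack.append(tag)
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                segment = self.stack[-1] if self.stack else "unknown"
                self.links.append((href, segment))

    def handle_endtag(self, tag):
        if tag in SEGMENT_TAGS and tag in self.stack:
            # Pop back to the matching open tag (tolerates sloppy nesting).
            while self.stack.pop() != tag:
                pass


html = """
<header><a href="/">Home</a></header>
<article><p>See <a href="http://example.com/study">this study</a>.</p></article>
<aside><a href="http://example.com/sponsor">Our Sponsors</a></aside>
<footer><a href="/terms">Terms</a></footer>
"""
parser = LinkSegmenter()
parser.feed(html)
for href, segment in parser.links:
    print(href, segment, DAMPENING[segment])
```

Here only the in-content link keeps its full value, while the sponsor and footer links are heavily discounted.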




Time to link ahead
Ok, sure… this is all interesting, but is it real? There certainly does seem to be ‘something’ going on with link valuations the last few years, but we can't be sure it has anything to do with segmentation. Still, I can't shake the feeling that this is an important discipline to be aware of. Another interesting tidbit that I came across, which further denotes Google's interest in some form of segmentation, was a comment in a post about HTMM:

"(…) shifting topics within a document, and in so doing, provides a topic segmentation within the document," - Google Research 2007



And they (partially) sponsored the ICDAR2009 Page Segmentation Competition.







Anyway, there is enough logic and anecdotal evidence to at least do some testing and see what we find. It is likely a better route for SEOs these days than ‘real time social search’ if you ask me. This is certainly something that we'll try to get some of the Dojo warriors working on (I'll report back on that).

For now, it is definitely an area to watch and consider when future-proofing your link building efforts. And really, aren't the editorial links what one is after anyway? And why is that? How does Google know which links are which? Maybe they're further ahead on this than we know… time to break out some fashionable tin foil I'd say… because where there is smoke… well, you know the rest.



If ‘Block Level PageRank’ (BLPR) isn't in your lexicon, maybe it's time it was…

 
Resources;

Posts;

Microsoft Granted Patent on Vision-Based Document Segmentation (VIPS) - SEO by the Sea

Google and Document Segmentation Indexing for Local Search - SEO by the Sea

Page segmentation; ignore at your own peril - fantomaster



Research Papers

VIPS a Vision-based Page Segmentation Algorithm - Microsoft : Nov. 1 2003 : Deng Cai, Shipeng Yu, Ji-Rong Wen  and Wei-Ying Ma

Block Based Web Search - Microsoft Research : 2004 : Deng Cai, Shipeng Yu, Ji-Rong Wen, Wei-Ying Ma

Block-level Link Analysis : Microsoft : 2004 : Deng Cai, Xiaofei He, Ji-Rong Wen,  Wei-Ying Ma

Learning Block Importance Models for Web Pages - Microsoft : 2004

Page-level Template Detection via Isotonic Smoothing : Yahoo : May 2007 : Deepayan Chakrabarti, Ravi Kumar, Kunal Punera

Also somewhat related is the WebTables project, which is similar to Google Sets but looks for data or layout elements via HTML etc. (hat tip to Bill on that angle): Uncovering the Relational Web - Google; WebTables: Exploring the Power of Tables on the Web - Google

Patents

Microsoft;

Vision-based document segmentation - Microsoft : filed July 28, 2003 and awarded Sept. 23, 2008 : Wen; Ji-Rong (Beijing, CN), Yu; Shipeng (Beijing, CN), Cai; Deng (Beijing, CN), Ma; Wei-Ying (Beijing, CN) - and on SEO by the Sea

Method and system for calculating importance of a block within a display page - Microsoft : filed Apr 2004 and assigned April 2008 : Ma; Wei-Ying (Beijing, CN), Wen; Ji-Rong (Beijing, CN), Song; Ruihua (Beijing, CN), Liu; Haifeng (Toronto, CA) : (great coverage from Bill as always)

Method and system for identifying object information : Microsoft: filed April 2005 and assigned June 2008 : Wen; Ji-Rong (Beijing, CN), Ma; Wei-Ying (Beijing, CN), Nie; Zaiqing (Beijing, CN)

Retrieval of structured documents - Microsoft : filed Mar. 2006 and awarded Sept. 23, 2008 : Wen; Ji-Rong (Beijing, CN), Cui; Hang (National University of Singapore, SG)

Yahoo

System and method for detecting a web page template - Yahoo : filed May 2007 and assigned Nov 2008 : Chakrabarti; Deepayan; (Mountain View, CA) ; Punera; Kunal; (Austin, TX) ; Ravikumar; Shanmugasundaram; (Berkeley, CA)

System and method for smoothing hierarchical data using isotonic regression : Yahoo : filed May 2007 and assigned Nov 2008 : Chakrabarti; Deepayan; (Mountain View, CA); Punera; Kunal; (Austin, TX); Ravikumar; Shanmugasundaram; (Berkeley, CA)

Techniques for approximating the visual layout of a web page and determining the portion of the page containing the significant content - Yahoo - filed August 2006 and assigned Feb 2008 - Kesari; Anandsudhakar

Google

Document segmentation based on visual gaps - Google : filed Dec 30 2004 and assigned Sept 2008 : Daniel Egnor

Systems and methods for analyzing boilerplate - Google : filed March 2004 and assigned Feb 2008 : Stephen R. Lawrence


About the Author:
David Harry is the President of Reliable SEO and has been building and marketing websites since 1998. He can be found writing about search and internet marketing on the Fire Horse Trail and is the author of the SEO Handbook series.

http://www.reliable-seo.com
http://www.huomah.com
http://www.the-seo-handbook




DevWebProNL is an iEntry, Inc.® publication - 1998-2009 All Rights Reserved Privacy Policy and Legal