Wednesday, May 23, 2007

User Agent detection and segmentation

Web browsers identify themselves with a particular string, called the User Agent string. Segmenting the whole spectrum of user agents (or, at least, a big chunk of it) in order to better fit content to different scenarios proved to be a daunting task.

While working on a prototype of an adaptation engine for Web-based documents, I came across the need to find out which type of Web browser is requesting a document. Some efforts are being made nowadays to ease this task, such as UAProf and WURFL. However, being pragmatic, it may take several years for these two technologies to become ubiquitous. Therefore, the simplest way of doing it today is to sniff the request's HTTP headers and look for the User-Agent field.
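As a minimal sketch of that sniffing step, assuming the engine runs behind a Python WSGI server (my assumption, not something the prototype requires), the header can be read from the request environment. The function name user_agent_from_environ is made up for illustration.

```python
def user_agent_from_environ(environ):
    """Return the raw User-Agent value of a WSGI request, or "" if absent."""
    # WSGI exposes each HTTP request header as an HTTP_* key in `environ`.
    # The header is optional, so fall back to an empty string.
    return environ.get("HTTP_USER_AGENT", "").strip()
```

A missing header simply yields an empty string, which a later heuristic can treat as "unknown".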

According to the HTTP 1.1 specification, Web browsers and other Web user agents (e.g., crawlers) may identify themselves with a specific string in the header of each HTTP request. The production rule for this header is:

User-Agent = "User-Agent" ":" 1*( product | comment )
product = token ["/" product-version]
product-version = token


What's the meaning of this expression? Basically, it states that this header field should start with the string User-Agent:, followed by a sequence of items, each of which is either a product name (optionally with its version) or a comment. There must be at least one item and, at most... infinitely many. Hence, user agents identify themselves with almost arbitrary strings, as long as they comply with the production rules. Headache warning.
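To make the production rule a bit more concrete, here is a rough Python sketch that splits a User-Agent value (the part after "User-Agent:") into products and comments. It is a simplification under stated assumptions: comments are taken to be non-nested, and the names TOKEN, ITEM and split_user_agent are mine, not part of any specification or library.

```python
import re

# token: one or more characters other than whitespace and HTTP separators.
TOKEN = r'[^\s()<>@,;:\\"/\[\]?={}]+'

# An item is either a product (token, optional "/" version)
# or a parenthesised comment (assumed not to be nested).
ITEM = re.compile(
    r"(?P<product>" + TOKEN + r")(?:/(?P<version>" + TOKEN + r"))?"
    r"|\((?P<comment>[^()]*)\)"
)

def split_user_agent(ua_str):
    """Split a User-Agent value into product and comment tuples."""
    items = []
    for match in ITEM.finditer(ua_str):
        if match.group("product") is not None:
            # version is None when the product has no "/version" part
            items.append(("product", match.group("product"), match.group("version")))
        else:
            items.append(("comment", match.group("comment")))
    return items
```

For instance, the value Mozilla/4.0 (compatible; MSIE 4.01; Windows CE; PPC; 240x320) comes out as one product (Mozilla, version 4.0) followed by one comment holding everything between the parentheses.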

Despite the huge number of Web browsers available on the market, my adaptation engine should identify them according to their segment, such as desktop browsers, mobile browsers, etc. But, thanks to HTTP's loose user agent rule, putting browsers in the correct segment is really hard (read: cumbersome, error-prone, nearly impossible).

Here's a quick sample of user agent strings from miscellaneous browsers, taken from a huge list found elsewhere:


  • Internet Explorer 7: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; WOW64; .NET CLR 2.0.50727)

  • Firefox 2.0: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-GB; rv:1.8.1) Gecko/20060918 Firefox/2.0

  • Sony Ericsson K610i built-in Web browser: SonyEricssonK610i/R1CB Browser/NetFront/3.3 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Link/6.2.3.15.0

  • Pocket PC Internet Explorer: Mozilla/4.0 (compatible; MSIE 4.01; Windows CE; PPC; 240x320)



Once again, pragmatism tells me that tailoring the Web to every single device is infeasible. Hence, it should be possible to define different segments and associate each User Agent string with the appropriate segment through a set of heuristics. Even when WURFL, UAProf, or the more recent work from the W3C Mobile Web Initiative's Device Description Working Group becomes widespread, segmenting the Web's end-points (browsers) into a manageable set of characteristics will continue to be useful.

Going back to the User Agent string mumbo jumbo, my initial proposal is to distinguish between the mobile and desktop landscapes, and it goes something like this (beware: pseudo-code algorithm):

function user_agent_segment(string ua_str)
{
    switch (ua_str)
    {
        case /MSIE/ except /PPC|PocketPC|Windows CE/:
        case /Gecko/:
        case /KHTML/:
        case /Opera/ except /Mini|Mobile|Wii/:
            return DESKTOP;
        default:
            return MOBILE;
    }
}


The simple, yet crucial, aspect of this algorithm is that it detects desktop browsers first, since the (useful) desktop browser landscape is much narrower than the Wild West-style range of User Agents found on mobile phones. From there, one only needs to look for a few specific substrings, and everything else falls into the mobile segment.
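For those who prefer something runnable, the pseudo-code above could be sketched in Python roughly as follows. This is one possible reading of it, not a definitive implementation: I assume case-sensitive substring matching, that an "except" pattern only vetoes its own case, and the string values chosen for DESKTOP and MOBILE are arbitrary.

```python
import re

DESKTOP = "desktop"  # arbitrary labels, chosen for illustration
MOBILE = "mobile"

# Each rule pairs a pattern that suggests a desktop browser with an
# optional pattern that vetoes it, mirroring the case/except pairs.
DESKTOP_RULES = [
    (re.compile(r"MSIE"), re.compile(r"PPC|PocketPC|Windows CE")),
    (re.compile(r"Gecko"), None),
    (re.compile(r"KHTML"), None),
    (re.compile(r"Opera"), re.compile(r"Mini|Mobile|Wii")),
]

def user_agent_segment(ua_str):
    """Return DESKTOP if any desktop rule matches, MOBILE otherwise."""
    for wanted, excluded in DESKTOP_RULES:
        if wanted.search(ua_str) and not (excluded and excluded.search(ua_str)):
            return DESKTOP
    # Anything not recognised as a desktop browser is treated as mobile.
    return MOBILE
```

With the four sample strings listed earlier, this sketch places Internet Explorer 7 and Firefox 2.0 in the desktop segment, and the Sony Ericsson and Pocket PC browsers in the mobile one.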

If you're keen on this topic, please feel free to implement, test, extend, and improve the algorithm. My (mid-term) goal is to expand it in order to detect and differentiate mobile phones, ultra mobile PCs, and desktop environments (at least). Also, it could be interesting to infer the available input mechanisms (i.e., modalities): for example, if a mobile phone is detected, we may infer a numeric pad (and possibly arrow/cursor keys) as the available input modality. This way, navigation on a Web site may be tweaked to facilitate user interaction, thus improving the user's experience and satisfaction.
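The modality idea could start out as nothing more than a lookup table. The sketch below is purely illustrative: the segment names, the modality labels, and in particular the desktop entry (keyboard and mouse) are my own assumptions, and infer_input_modalities is a hypothetical helper.

```python
# Hypothetical segment names and modality labels, for illustration only.
INPUT_MODALITIES = {
    "mobile": ["numeric keypad", "arrow keys"],
    "desktop": ["keyboard", "mouse"],  # assumed, not derived from the UA string
}

def infer_input_modalities(segment):
    """Return the input modalities assumed for a segment, or [] if unknown."""
    # Copy the list so callers cannot modify the shared table by accident.
    return list(INPUT_MODALITIES.get(segment, []))
```

An adaptation engine could then, say, favour numbered shortcuts in navigation menus whenever "numeric keypad" shows up in the result.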