Web Analytics Demystified

Archive for 'Web 2.0'

Example uses of the visitor engagement metric

My post last week on measuring visitor engagement was pretty long by the time I outlined the calculation, so I put off publishing examples of how the metric could be used until now. I’m excited to see that this topic has generated so much interest, both in terms of comments and emails sent to me directly.

My goal for this post is to provide a few examples and explanations to show how the metric can be used to supplement our otherwise already-rich set of web analytics data. Since so many folks have been willing to explore the engagement metric, I have embedded a bunch of questions in this post in italics that I’d love your feedback on.

Distribution of engagement scores and segmentation. Here is the distribution of engagement scores for about six months at Web Analytics Demystified by percent of visitors. As you can see, the scores are concentrated at the low end and tail off as the score increases, showing that nearly half (47.6%) of visitors to my site are “poorly engaged”. When I look at this distribution it makes perfect sense to me. What do you think?

I have created segments to group visitors by their engagement score: “well engaged” visitors have engagement scores over 30%, “moderately engaged” visitors are those between 10% and 30%, and “poorly engaged” visitors score less than 10%. These segments can then be used to explore how the behavior of visitors in each engagement group differs by looking at my page and referring source dimensions (page, content group, referring domain, campaign, search phrase, etc.)
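
For the code-minded, here is a minimal sketch in Python of how segments like these could be assigned once each visitor has an engagement score. The function names and the sample scores are made up for illustration, and I am assuming scores of exactly 10% and 30% fall into the “moderately engaged” group; this is not how any particular analytics tool implements segmentation.

```python
from collections import Counter


def engagement_segment(score_pct):
    """Map an engagement score (in percent) to one of the three segments."""
    if score_pct > 30:
        return "well engaged"
    if score_pct >= 10:
        return "moderately engaged"
    return "poorly engaged"


def segment_shares(scores):
    """Return the percent of visitors that fall into each segment."""
    total = len(scores)
    if total == 0:
        return {}
    counts = Counter(engagement_segment(s) for s in scores)
    return {name: 100.0 * n / total for name, n in counts.items()}


# Hypothetical scores for eight visitors (not real data from my site).
sample_scores = [2.0, 4.5, 8.0, 9.9, 10.0, 22.0, 30.0, 45.0]
shares = segment_shares(sample_scores)
```

With the sample above, half of the visitors land in the “poorly engaged” group. The same segment labels could then be used as a filter on any other dimension.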

Identify relationships that might otherwise not be found. At the top of this report you can see the pronounced difference in visitor engagement (and traditional metrics) for “branded” and unbranded searches (“None”) bringing visitors to my site. Now, because branded searches are a component of the calculation (Brand Index), you definitely expect to see a difference between the two engagement scores. What is interesting is that while other metrics (duration, sessions per visitor, page views per session) show a slight difference, visitor engagement and conversion are both three times higher for branded searches. I think this difference observed in all the metrics is further evidence that brand-driven searches are bringing more engaged visitors. What do you think?

In the middle table you can see search phrases bringing visitors to my site, showing visitor engagement, page views per session, and sessions per visitor. Here three phrases stand out to me:

  1. “web analytics book” and “web analytics process”, neither of which is particularly distinguished from other search phrases based on page views per session or sessions per visitor, but both of which have visitor engagement scores over double my site-wide average of 8.8%. This is important to me because these are un-branded search terms that are critically important to my business.
  2. “vendor discovery tool” which would appear to be pretty important based on traditional metrics but only stands out slightly using the visitor engagement score (at 13.6%). I spend a lot of time trying to figure out how to drive folks using the vendor discovery tool to take other actions (buy books, inquire about consulting) and this data suggests that there is an unrealized opportunity.
  3. “performance indicators” which shows that the visitor engagement metric is useful to identify terms that you’d think are important to the site but aren’t attracting the right audience (average engagement score for these visitors is only 5.6%).

I think this level of information is actually pretty helpful for identifying search marketing opportunities — what do you think?

Engagement-derivative metrics like “Percent Highly Engaged Visitors” are useful. Here you can see a select group of referring domains showing the percent of highly and percent moderately engaged visitors they’re sending my way (with conversion to show that engagement and conversion are in fact different!) Avinash Kaushik is sending me a few (0.2%) highly engaged visitors (thanks!) but Ian Thomas is sending me a bunch (70.4%) of moderately engaged visitors, many of whom are purchasing books (1.2% conversion rate.)
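
As a rough illustration of how a derivative metric like this could be computed, here is a small Python sketch that groups visitors by referring domain and reports the percent who are highly engaged alongside the conversion rate. The record layout, the domain names, and the sample values are all hypothetical, and the 50% cut-off for “highly engaged” is taken as an assumption from the threshold I mention later in this post.

```python
from collections import defaultdict


def referrer_report(visitors, high_threshold=50.0):
    """Percent highly engaged and conversion rate per referring domain."""
    totals = defaultdict(lambda: {"visitors": 0, "high": 0, "converted": 0})
    for v in visitors:
        row = totals[v["referrer"]]
        row["visitors"] += 1
        row["high"] += int(v["score"] >= high_threshold)
        row["converted"] += int(v["converted"])
    return {
        ref: {
            "pct_highly_engaged": 100.0 * row["high"] / row["visitors"],
            "conversion_rate": 100.0 * row["converted"] / row["visitors"],
        }
        for ref, row in totals.items()
    }


# Hypothetical visitor records (made-up domains and scores).
sample_visitors = [
    {"referrer": "blog-a.example", "score": 55.0, "converted": False},
    {"referrer": "blog-a.example", "score": 12.0, "converted": False},
    {"referrer": "blog-a.example", "score": 8.0, "converted": False},
    {"referrer": "blog-a.example", "score": 60.0, "converted": True},
    {"referrer": "blog-b.example", "score": 20.0, "converted": True},
    {"referrer": "blog-b.example", "score": 25.0, "converted": False},
]
report = referrer_report(sample_visitors)
```

In this made-up sample the second domain sends no highly engaged visitors yet converts at a higher rate, which is the kind of divergence between engagement and conversion I am describing.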

By looking at traffic from Avinash’s site over time (bar graph) I can see peaks and valleys in overall engagement from folks coming from his site. It would be useful to back into those peaks to try to determine what another blogger’s readers might be reacting to when they’re exhibiting highly-engaged behavior on my site (see late August and early September.) Given that Clint proved that conversion is a poor measure of success when trying to evaluate traffic from other bloggers, I think visitor engagement is useful for examining the non-revenue value of referring sources. What do you think?

Those of you who are looking for correlation between engagement and conversion, have a look at the data for Mr. Jim Sterne’s wonderful site emetrics.org: 5.6% of the folks coming from Jim’s site are highly engaged, 66.2% moderately engaged, and man-oh-man does Jim help sell some copies of Web Analytics Demystified. You’re the man, Jim!

Visitor engagement is globally useful. At least in Visual Sciences Visual Site you can apply engagement metrics and segments to pretty much any dimension tracked. Here I’m looking at the percentage of “highly” engaged visitors (50% or more) in my “well engaged” segment broken down by country. Now, this is certainly more interesting in light of the total volume of traffic coming from each geographic location, and as I think about localizing my books and planning future trips around the world this information becomes very helpful.

There is more, including some of the more granular visitor-level stuff I talked about in the first series of posts on the subject, but I want to be sensitive to protecting the identity of individual users on my site. If you’re interested in helping me collect some “ground truth” regarding the engagement calculation, write me and I’ll explain how you can help.

So what do you think? Do the screen-shots help you understand the calculation better? Or do they still make it look super-complicated and scary? Is there something specific you’d like to see me demonstrate with the calculation? Or do you think you could come up with these same insights using more traditional metrics?

Web Analytics 2.0? I am more worried about Web Analytics 3.0!

If you’re reading the web analytics blogs, you’ve probably already heard about the recent presentations I’ve given on the subject of “Web Analytics 2.0”. The future of web analytics and the relationship between Web 2.0 technology and measurement is something I’ve been talking about for over six months (I actually have a Web Analytics 2.0 workshop that I regularly give that you can read about under Analytics Consulting on my site), but given that it is “conference season” it is no wonder that this subject is getting attention from other folks in the industry. I have given my presentation at Web Analytics Day in Brussels, SEMphonic X Change in Napa, and will be giving a variation on the same at Jim Sterne’s Marketing Optimization Summit in October.

Due to demand, you can download a PDF of the presentation from the white papers section of my site. If you’re interested in learning more about Web Analytics 2.0, please give me a call and I’d be happy to discuss it with you.

Strangely enough, the slides that are generating the most interest and commentary are not those about the Web Site Optimization Ecosystem, the integration of quantitative and qualitative data, or the Web Analytics Demystified RAMP, but rather the few slides I included outlining my thoughts about Web 3.0 and what I am calling Web Analytics 3.0.

What the heck is Web Analytics 3.0?!

Before I can tell you what Web Analytics 3.0 is, I need to tell you what I think Web 3.0 is going to be. The good old Wikipedia basically dodges this by saying:

Web 3.0 is a term that has been coined with different meanings to describe the evolution of Web usage and interaction along several separate paths. These include transforming the Web into a database, a move towards making content accessible by multiple non-browser applications, the leveraging of artificial intelligence technologies, the Semantic web, the Geospatial Web, or the 3D web.

While I know that Judah is all hopped up on the notion of the semantic web, after having traveled to Tokyo and Europe in the past month, I find myself absolutely convinced that the next technology era will be characterized by our collective ability to access the Internet anyplace, anytime, using so many devices we begin to look back on computers much the same way young people do television today — as something nice to use when YouTube is unavailable. Rolf Skyberg, a disruptive innovator from eBay who I met in Rotterdam a few weeks back, called it “digital ubiquity” — the point where we forget that the Internet actually exists and take our ability to access information completely for granted.

Given so many sexy alternatives — 3D web, transforming the Internet into a database, artificial intelligence, and the such — why am I so convinced that in the next three years we’ll be talking about Web 3.0 when we talk about mobile phones and non-traditional browsers?

Easy. The financial opportunity available via the mobile Internet makes the billions transacted today look like pocket change.

Think about it:

Just think for a minute about how your browsing experience might change if the web sites you visited remembered you and delivered a tailored experience based on your demographic profile (theoretically available via your phone number), your browsing history (accurate because you’re not deleting your phone number) and your specific geographic location when you make the request?

Now think about how the advertising buying experience would change if the same were true, not to mention behavioral targeting. I mean, given GPS and demographic data, the behavior being tracked could be “works downtown during the day, checks Facebook on his phone often, lives in the suburbs, surfs sports scores from his neighborhood bar.” The Starbucks web site could have a link at the top with a coupon to save $1 on my double-tall non-fat latte in stores 1 block, 2 blocks, and 5 blocks from my current location; the Best Buy web site could have an in-store promotion for the store I am standing in, targeted to my age and gender; and my search engine could disambiguate my searches based on my demographic profile, my geographic location, and my recent search history to serve me paid search ads designed to influence my geo-spatial movement, not just my likelihood to click.

Jeepers, huh?

Sure there are privacy issues, but given the intensely personal relationship most people have with their cell phones, and the fact that far more people in the world have mobile phones than computers (Gartner estimates 271 million units sold to end-users by Q2 2007) it is easy to make a convincing case for mobile computing and digital ubiquity defining the next technology era, much like social networking, AJAX, XML, and mashed-up business models define the current Web 2.0 era we’re living in today.

Okay, mobile is the future. So what the heck is Web Analytics 3.0?

If Web Analytics 1.0 was all about measuring page views to generate reports and define key performance indicators, and if Web Analytics 2.0 is about measuring events and integrating qualitative and quantitative data, then Web Analytics 3.0 is about measuring real people and optimizing the flow of information to individuals as they interact with the world around them.

Your log file analyzer can do that, right?

The current state of mobile measurement isn’t about Omniture and Visual Sciences, it isn’t about JavaScript and cookies, and it isn’t about page views, visits, and visitors. Web Analytics 3.0 is going to be something completely different, and it will depend on completely new technology. Anil Batra and I talked about a project he did a few years back while he was at digiMine: he hacked together WAP gateway logs into a pseudo-log file, using the phone number in place of a cookie. Brilliant, and the fact that Anil has this experience propels him to very near the head of the class for Web Analytics 3.0 analysts.
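
I don’t have the details of Anil’s implementation, so the following is only a guess at the general shape of the idea, sketched in Python. The gateway record fields (gateway_ip, msisdn, and so on) are hypothetical, the output merely resembles a common web server log line, and hashing the phone number before using it as the visitor ID is my own addition rather than something from the original project.

```python
import hashlib
from datetime import datetime, timezone


def visitor_id(phone_number, salt="example-salt"):
    """Derive a stable visitor ID from a phone number (plays the cookie's role)."""
    digest = hashlib.sha256((salt + phone_number).encode("utf-8")).hexdigest()
    return digest[:16]


def to_pseudo_log_line(record):
    """Turn one hypothetical gateway record into a web-server-style log line."""
    timestamp = record["time"].strftime("%d/%b/%Y:%H:%M:%S %z")
    return '{ip} - {vid} [{ts}] "GET {url} HTTP/1.1" {status} {size}'.format(
        ip=record["gateway_ip"],
        vid=visitor_id(record["msisdn"]),
        ts=timestamp,
        url=record["url"],
        status=record["status"],
        size=record["bytes"],
    )


# A made-up gateway record; real gateway logs will look different.
sample = {
    "gateway_ip": "10.0.0.1",
    "msisdn": "15555550100",
    "time": datetime(2007, 9, 15, 14, 30, 0, tzinfo=timezone.utc),
    "url": "/index.wml",
    "status": 200,
    "bytes": 1834,
}
line = to_pseudo_log_line(sample)
```

Because the same phone number always maps to the same ID, a log analyzer could treat that field the way it would treat a persistent cookie value.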

In theory, the mobile Internet has many of the same measurements as the hard-wired Internet. But as the information the platform and device providers make available changes, something I very much believe will happen, the quality and volume of information at our disposal will increase and improve. The W3C document on “Mobile Best Practices 1.0” already exists but, surprisingly enough, doesn’t have a section about logging requests or measuring user interaction. M:Metrics is out there providing analyst reports, but the service is more similar to comScore and Nielsen than WebTrends and ClickTracks.

This post is already extremely long but I wanted to start the conversation. In future posts, as time allows, I’ll expand on some of what I believe is possible and how. In the interim, let me know what you think! Am I wrong? Is Web 3.0 bigger than mobile? Or do you already have a handle on measuring your mobile content, even without GPS and phone numbers as unique IDs? Do you personally have experience doing analysis on mobile content? If so, I’d love to hear about your experience.

As usual, I very much welcome your comments here, but I am also happy to receive them directly via email. Also, if you’re a mobile service provider or device manufacturer concerned with how advertisers and marketers will measure their success through your platform, application, or device, I would love to talk to you about the Web Analytics Demystified vision for Web Analytics 3.0.

Congratulations to the WAA Standards Committee!

I wanted to say congratulations to Jason Burby, Angie Brown, and everyone on the Web Analytics Association’s Standards Committee for publishing their standards document last week. Given the number of web analytics terms they defined (26) and the somewhat slow process the Association has for getting documents approved, this effort is a huge milestone for the organization, one that Jason and Angie deserve great praise for indeed!

If you haven’t already downloaded and read the definitions, check them out here (PDF download).

While the PDF document says that the final product is “Web Analytics Definitions – Version 4.0” this is clearly a “Web Analytics 1.0” document. The committee relegated all of the really wonderful Web 2.0 stuff like AJAX, RSS, XML, and the such to the same confusing obscurity they exist in today with the comment “certain technologies including (but not limited to) Flash, AJAX, media files, downloads, documents, and PDFs do not follow the typical page paradigm but may be definable as pages in specific tools.”

Given the last year’s push towards measuring Web 2.0 the right way and some great, insightful work from folks like Ian Houston and Judah Phillips, it is kind of a shame that this document doesn’t address event-based measurement architecture more directly. The group does define “event” but only does so under the header of “Conversion Metrics”, stating that an event is “any logged or recorded action that has a specific date and time assigned to it by either the browser or server.”

Sounds like the definition of a Web 2.0 event to me, but I’m not sure why this is relegated to conversion metrics.

Regardless, this is great and valuable and useful work on the part of these hard-working volunteers. But the definition of standards raises one particularly important question: Given the definition of standards, what the hell do web analytics practitioners do with them?

The Fundamental Problem

The fundamental problem with these definitions (and any standard definitions IMHO) is that without an enforcement mechanism they are unlikely to provide any real benefit to the folks in the trenches. As long as smart folks like Eric Enge at Stone Temple Consulting continue to uncover as much as a 154% difference in the measured number of visitors and a 161% difference in the measured number of page views between concurrently deployed solutions, the average web analytics end user should not be comforted by the existence of standards.

Put another way, it is not the definition of standards that makes a difference, it is the adherence to standards by technology vendors that will provide the portability of skills, knowledge, and solutions so desired by many in our industry. Jason Burby sagely points this out in his Clickz article on his volunteer work when he says:

“Companies often switch metrics tools and subsequently change the terms they use to discuss analytics. One tool will call something one name, while another tool calls it by a different name or applies different meanings to a very similar name. When people switch tools and bring data with them, they don’t get an apples-to-apples comparisons. As a result, companies lose the important year-over-year view.

Though the new standards won’t instantly take care of that issue, they provide a step in the right direction.”

The Barrier to the Adoption of Standards

The problem as I see it is this: For many web analytics vendors, the way they calculate some of the critical metrics in web analytics is the “secret sauce” in their solution. Consider the WAA’s definition of unique visitors which states that unique visitors are:

“The number of inferred individual people (filtered for spiders and robots), within a designated reporting timeframe, with activity consisting of one or more visits to a site. Each individual is counted only once in the unique visitor measure for the reporting period.”

This is perfectly reasonable, but the definition goes on to say that “a unique visitor count is always associated with a time period (most often a day, week, or month), and it is a non-additive metric.”
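
To make “non-additive” concrete, here is a tiny Python sketch with made-up visit data. The visitor IDs and dates are hypothetical; the point is only that summing daily unique visitor counts double-counts anyone who shows up on more than one day.

```python
# Hypothetical visit log: (day, visitor_id) pairs, already filtered for robots.
visits = [
    ("2007-09-01", "a"), ("2007-09-01", "b"),
    ("2007-09-02", "a"), ("2007-09-02", "c"),
    ("2007-09-03", "a"), ("2007-09-03", "b"),
]


def unique_visitors(visits, days):
    """Count distinct visitor IDs seen on any of the given days."""
    return len({visitor for day, visitor in visits if day in days})


all_days = sorted({day for day, _ in visits})

# Daily uniques: each visitor is counted once per day they appear.
daily = {day: unique_visitors(visits, {day}) for day in all_days}
sum_of_daily = sum(daily.values())

# Period uniques: each visitor is counted once for the whole period.
period = unique_visitors(visits, set(all_days))
```

Here the three daily counts add up to 6, but only 3 distinct visitors came to the site over the three days, so the period figure has to be de-duplicated across the whole timeframe rather than summed.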

Do you wonder what the folks at Visual Sciences who have spent millions to perfect their “data wheels” technology that effectively removes the “time period” requirement would say to this? One of the major value propositions at Visual Sciences (at least during my brief tenure) was that time was irrelevant — if you wanted the number of unique visitors for the football season, you dragged your mouse across the calendar; if you wanted the number of unique visitors for a few hours during the day, you dragged your mouse; if you wanted the number of unique visitors to your site since recording began, you dragged your mouse.

You can make the case that this example more or less removes the time dependence associated with the WAA definition. But should all the vendors who don’t have this capability (anywhere you are forced to use metrics like “Daily Unique Visitors”) spend the R&D money necessary to eliminate the dependence on time? Or should Visual back this functionality out of their application?

When you start to think about these kinds of things, much less issues associated with data sampling and data roll-off that occurs for a litany of reasons, you can start to understand why I made this somewhat snide comment in a MediaShift article awhile back:

“A friend of mine described it as the most beautiful fantasy…but it would never happen,” consultant Peterson said. “Omniture has a $1 billion market cap, and I don’t see Omniture tearing apart their technology to calculate unique visitors and page views differently because all their competitors have decided there’s a different way to do it. It’s hard to imagine. Not impossible. Fantasies sometimes come true.”

Ironically the cost isn’t the main problem: The impact on existing customers who would be forced to learn new definitions and suffer from potentially dramatic changes in data collection and reporting is the main problem. Do you want to be the person who has to tell a Fortune 500 customer that because you’re adopting more standard definitions that their page view count will suddenly drop by 35% month-over-month?

I had to do that once. Trust me here, it wasn’t a fun conversation to have.

An Idea in the Absence of a Solution

Given that I think that the WAA has produced some incredibly valuable work, despite some potential barriers to the work’s adoption, I do have an idea that I would love to see the Association follow-up on, one that would add a tremendous amount of value to this already great work.

I would love to see the Standards Committee create a matrix of standards compliance for each of the vendors in the marketplace today: basically a checklist that details, on a term-by-term basis, which vendors are currently using the WAA definitions, so that companies looking for a solution could include that criterion in their assessment. Something that would let everyone quickly determine:

  1. How standards compliant a given solution is (and which solution today is “most compliant”)
  2. Which standard definitions are calculated out-of-box in each solution (for example, “Original Referrer” and “Bounce Rate”)
  3. Which currently available solutions dramatically differ from the norm in their use of standard terms

Something like this would probably have to be backed up with some documentation or examples as proof points, just for reference. And yeah, this is kind of a lot of work, but if you think about it all you really need is for one WAA member per solution to poke around in their documentation and then someone (Jason and Angie maybe) to collate the results and write it up. I would be happy to contribute the matrix assessment for the web analytics solution I’m using now if that would help!

Who knows, maybe we’d discover that all the vendors are already standards compliant and there really isn’t a problem with definitions!

What Do You Think?
I’d love to hear what all of you think about the new standards and my concerns about how they’ll be used (or not used.) Am I missing something? Were you disappointed to not see something that spoke more clearly to your concerns about Web 2.0 technology? Or are you just pleased that the WAA published these definitions and see them as a small-but-important first step?

The Web 2.0 Measurement Group has moved to Facebook

Jeremiah Owyang from PodTech has been bugging me to start a web analytics group at Facebook.  I initially resisted but upon kicking the tires at Facebook a bit it seemed like a good idea.  If you’re already in the Web 2.0 Measurement Working Group you should have seen an email from me about shutting it down at Google Groups.  Either way, the group is now totally open to everyone and can be joined here:

http://www.facebook.com/group.php?gid=2668335473

My hope is that putting the Web 2.0 Measurement Group on the premier “Web 2.0” platform will drive interest and promote conversation. If nothing else it is a reflection that sometimes you can bring the audience to you and sometimes you have to go where the audience is.

I hope you’ll join me at Facebook. 

On NetRatings and time spent on site

Lost in all of the fuss about NetRatings dropping page views as a metric used to calculate site popularity is the fact that the company actually did a pretty smart thing: they took my advice from February 15th of this year and rolled in a very valuable and useful “sessions” metric. Well, maybe it wasn’t my advice they took, but either way I think it was a great idea to drop page views, since they’ve become increasingly inconsistent, and instead focus on the one metric that is consistently applied and well defined: sessions.

Unfortunately NetRatings chose to focus their announcement on “total minutes” saying that time was a better measure of engagement. Personally I’ve never been a very big fan of the time spent metrics — I guess I’ve just looked too long and too hard at all the problems associated with how time is collected and recorded in the web analytics realm.

There is a really engaged thread at the Web Analytics Forum at Yahoo! Groups on this subject that is definitely worth a read if you’re interested.

And I’ll admit, I don’t have all the details associated with how panel-based services like Nielsen and comScore track time spent. If they’re actively tracking the user and only counting time when the browser window is active and the mouse is moving, well that would be a good use of the panel. My suspicion is that, like in web analytics, they’re simply recording the delta between the first and last request for a page in the domain, a strategy that suffers from a litany of well-described problems.

The two I see as most problematic are:

  • Single page visits are either difficult to count or not counted in time spent calculations
  • The amount of time a web page is open is likely only poorly correlated to the visitor’s actual engagement with the page
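
The first of these problems falls straight out of the arithmetic. Here is a minimal Python sketch of the “delta between first and last request” approach; the timestamps are made up, and this is a generic illustration rather than any vendor’s or panel’s actual method.

```python
from datetime import datetime


def session_duration_seconds(request_times):
    """Time spent, measured as the delta between the first and last request."""
    if not request_times:
        return 0.0
    return (max(request_times) - min(request_times)).total_seconds()


# A single page view session: only one request, so there is no delta to measure.
single_page = [datetime(2007, 7, 10, 9, 0, 0)]

# A three page session: the clock stops at the last request, so whatever
# time was spent reading the final page is never captured.
three_pages = [
    datetime(2007, 7, 10, 9, 0, 0),
    datetime(2007, 7, 10, 9, 2, 30),
    datetime(2007, 7, 10, 9, 10, 0),
]

single_seconds = session_duration_seconds(single_page)
multi_seconds = session_duration_seconds(three_pages)
```

The single page session comes out at zero seconds no matter how long the page was actually open, and the three page session is credited with ten minutes even if the visitor read the last page for another hour.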

Some have already noted that very popular sites like Google will do poorly in time spent on site because one of the dominant use cases involves only a single page (I search and I go.) Conversely, depending on how time spent on site is calculated, the search engines may have inordinately long times spent based on a search leading to a long browse time on a discovered site, leading back to the search results (same session, clock is presumably still ticking), leading to the next discovered site, etc.

I for one use iGoogle in exactly this way: I load the page frequently throughout the day and do nothing more than look at a single page view. In fact, unless Nielsen is either tracking the AJAX-interaction with the iGoogle interface, or counting single page view sessions, it is likely that my interaction with iGoogle is not counted at all. But let me assure you, I am quite engaged with the content in my Google portal (something that would be well evidenced by the total session count I generate at the site each day.)

As I looked back through the plethora of comments that my original post on using sessions to compare sites generated, I noticed that I had made this statement in response to a comment from Jacques Warren:

  • If you want to compare two or more web sites, use sessions because of the reasons I outlined in my original post.
  • If you’re interested in the number of people coming to one web site (presumably yours), use de-duplicated unique visitors but be mindful of cookie deletion.
  • If you’re interested in the activity of people on your web site, and if you have a “Web 1.0” web site, use page views but be mindful of issues like code coverage, proxies, robots, etc.
  • If you’re interested in the activity of people on your web site, and if you have a “Web 2.0” web site built around RIAs, etc., use some form of event model.

I’ll stand by this. Until I know more about how N/NR and comScore calculate their time spent on site metrics it’s hard to believe their numbers to be any more useful or accurate than those provided by direct measurement systems. That said, I’d welcome a briefing on the subject from either company if they’re reading this and are interested in spending some time with me while I pick apart their methodology.

If companies really need to use time spent on site, they should consider using better key performance indicators for time such as Percent Low/Medium/High Time Spent on Site categories (something I talk about at length in The Big Book of Key Performance Indicators.)  That way N/NR could report on the percent of all tracked sessions that were “30 seconds or less”, “31 seconds to 5 minutes”, and “More than 5 minutes” (as an example) which would give us a more powerful view into the relationship between visitors and the time they spend on site.
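
Here is a short Python sketch of that kind of bucketed indicator, using the example cut-offs above. The session durations are made up, and the function names are just for illustration.

```python
BUCKETS = ("30 seconds or less", "31 seconds to 5 minutes", "More than 5 minutes")


def time_bucket(seconds):
    """Assign a session duration (in seconds) to one of the three buckets."""
    if seconds <= 30:
        return BUCKETS[0]
    if seconds <= 300:
        return BUCKETS[1]
    return BUCKETS[2]


def bucket_percentages(durations):
    """Percent of all tracked sessions that fall into each bucket."""
    counts = {name: 0 for name in BUCKETS}
    for seconds in durations:
        counts[time_bucket(seconds)] += 1
    total = len(durations)
    if total == 0:
        return {name: 0.0 for name in BUCKETS}
    return {name: 100.0 * n / total for name, n in counts.items()}


# Hypothetical session durations in seconds.
sample_durations = [0, 12, 30, 45, 120, 300, 301, 900]
percentages = bucket_percentages(sample_durations)
```

Reporting the three percentages side by side keeps a handful of very long sessions from dragging an average around, and it makes the large block of near-zero sessions visible instead of hiding it.
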

At the end of the day I like that N/NR has provided a consistent and easily compared metric to their customers in “total sessions” which is what I will inevitably focus on as a measure of site popularity. Having devoted quite a bit of time to describing what I believe to be a solid measure of visitor engagement, it’s difficult for me to think about “time spent on site” (or even “total sessions”) as a good proxy. Time spent, recency, depth of session, session number, etc. are all components of engagement, not direct measures.

What do you think? Is Nielsen right and I’m crazy? Have you been looking closely at your time spent on site metric for years and are delighted that the rest of the world has finally caught up? Or are you like me and spend far too much time browsing from site to site, flipping from task to task, and thusly confounding clocks and counters on every site you visit?

I welcome your comments.

 
COPYRIGHT © 2011 WEB ANALYTICS DEMYSTIFIED, INC. ALL RIGHTS RESERVED.