Web Analytics Blogs

Eric T. Peterson has been working in web analytics for over ten years and has built up an incredibly rich body of knowledge about the subject, knowledge Mr. Peterson works to share every week here in his Web Analytics Demystified weblog. Whether you're new to the subject or the most experienced practitioner, you should join the thousands of people around the globe already subscribing to Peterson's blog and start reading today.


Archive for 'Web 2.0'


Congratulations to the WAA Standards Committee!

I wanted to say congratulations to Jason Burby, Angie Brown, and everyone on the Web Analytics Association’s Standards Committee for publishing their standards document last week. Given the number of web analytics terms they defined (26) and the somewhat slow process the Association has for getting documents approved, this effort is a huge milestone for the organization, one that Jason and Angie deserve great praise for indeed!

If you haven’t already downloaded and read the definitions, check them out here (PDF download).

While the PDF document says that the final product is “Web Analytics Definitions - Version 4.0,” this is clearly a “Web Analytics 1.0” document. The committee relegated all of the really wonderful Web 2.0 stuff like AJAX, RSS, XML, and the like to the same confusing obscurity it exists in today with the comment “certain technologies including (but not limited to) Flash, AJAX, media files, downloads, documents, and PDFs do not follow the typical page paradigm but may be definable as pages in specific tools.”

Given the last year’s push towards measuring Web 2.0 the right way and some great, insightful work from folks like Ian Houston and Judah Phillips, it is kind of a shame that this document doesn’t address event-based measurement architecture more directly. The group does define “event” but only does so under the header of “Conversion Metrics,” stating that an event is “any logged or recorded action that has a specific date and time assigned to it by either the browser or server.”

Sounds like the definition of a Web 2.0 event to me, but I’m not sure why this is relegated to conversion metrics.

Regardless, this is great and valuable and useful work on the part of these hard-working volunteers. But the definition of standards raises one particularly important question: Given the definition of standards, what the hell do web analytics practitioners do with them?

The Fundamental Problem

The fundamental problem with these definitions (and any standard definitions IMHO) is that without an enforcement mechanism they are unlikely to provide any real benefit to the folks in the trenches. As long as smart folks like Eric Enge at Stone Temple Consulting continue to uncover as much as a 154% difference in the measured number of visitors and a 161% difference in the measured number of page views between concurrently deployed solutions, the average web analytics end user should not be comforted by the existence of standards.

Put another way, it is not the definition of standards that makes a difference, it is the adherence to standards by technology vendors that will provide the portability of skills, knowledge, and solutions so desired by many in our industry. Jason Burby sagely points this out in his Clickz article on his volunteer work when he says:

“Companies often switch metrics tools and subsequently change the terms they use to discuss analytics. One tool will call something one name, while another tool calls it by a different name or applies different meanings to a very similar name. When people switch tools and bring data with them, they don’t get an apples-to-apples comparisons. As a result, companies lose the important year-over-year view.

Though the new standards won’t instantly take care of that issue, they provide a step in the right direction.”

The Barrier to the Adoption of Standards

The problem as I see it is this: For many web analytics vendors, the way they calculate some of the critical metrics in web analytics is the “secret sauce” in their solution. Consider the WAA’s definition of unique visitors which states that unique visitors are:

“The number of inferred individual people (filtered for spiders and robots), with a designated reporting timeframe, with activity consisting of one or more visits to a site. Each individual is counted only once in the unique visitor measure for the reporting period.”

This is perfectly reasonable, but the definition goes on to say that “a unique visitor count is always associated with a time period (most often a day, week, or month), and it is a non-additive metric.”
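To make "non-additive" concrete, here is a minimal sketch in Python. The visit log and visitor IDs are invented for illustration and are not taken from any vendor's data. The point is only that adding up daily unique visitor counts over-counts anyone who came back on more than one day.

```python
from collections import defaultdict

# Hypothetical visit log: (day, visitor_id). The IDs stand in for whatever
# identifier a tool uses to infer an individual (usually a cookie).
visits = [
    ("Mon", "a"), ("Mon", "b"),
    ("Tue", "a"), ("Tue", "c"),
    ("Wed", "a"), ("Wed", "b"),
]

def daily_uniques(visits):
    """Count distinct visitors separately for each day."""
    seen = defaultdict(set)
    for day, visitor in visits:
        seen[day].add(visitor)
    return {day: len(ids) for day, ids in seen.items()}

def period_uniques(visits):
    """Count distinct visitors once across the whole reporting period."""
    return len({visitor for _, visitor in visits})

per_day = daily_uniques(visits)
summed_daily = sum(per_day.values())   # adds up the three daily counts
for_period = period_uniques(visits)    # de-duplicates across all three days

print(per_day, summed_daily, for_period)
```

In this toy log each day has two unique visitors, so the daily counts sum to six, while only three distinct visitors appear over the three days.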

Do you wonder what the folks at Visual Sciences who have spent millions to perfect their “data wheels” technology that effectively removes the “time period” requirement would say to this? One of the major value propositions at Visual Sciences (at least during my brief tenure) was that time was irrelevant — if you wanted the number of unique visitors for the football season, you dragged your mouse across the calendar; if you wanted the number of unique visitors for a few hours during the day, you dragged your mouse; if you wanted the number of unique visitors to your site since recording began, you dragged your mouse.

You can make the case that this example more or less removes the time dependence associated with the WAA definition. But should all the vendors who don’t have this capability (anywhere you are forced to use metrics like “Daily Unique Visitors”) spend the R&D money necessary to eliminate the dependence on time? Or should Visual back this functionality out of their application?

When you start to think about these kinds of things, let alone issues associated with data sampling and data roll-off that occurs for a litany of reasons, you can start to understand why I made this somewhat snide comment in a MediaShift article a while back:

“A friend of mine described it as the most beautiful fantasy…but it would never happen,” consultant Peterson said. “Omniture has a $1 billion market cap, and I don’t see Omniture tearing apart their technology to calculate unique visitors and page views differently because all their competitors have decided there’s a different way to do it. It’s hard to imagine. Not impossible. Fantasies sometimes come true.”

Ironically the cost isn’t the main problem: The impact on existing customers who would be forced to learn new definitions and suffer from potentially dramatic changes in data collection and reporting is the main problem. Do you want to be the person who has to tell a Fortune 500 customer that, because you’re adopting more standard definitions, their page view count will suddenly drop by 35% month-over-month?

I had to do that once. Trust me here, it wasn’t a fun conversation to have.

An Idea in the Absence of a Solution

Given that I think that the WAA has produced some incredibly valuable work, despite some potential barriers to the work’s adoption, I do have an idea that I would love to see the Association follow-up on, one that would add a tremendous amount of value to this already great work.

I would love to see the Standards Committee create a matrix of standards compliance for each of the vendors in the marketplace today. Basically a checklist that details, on a term-by-term basis, which vendors are currently using the WAA definitions, so that companies looking for a solution could include that criterion in their assessment. Something that would let everyone quickly determine:

  1. How standards compliant a given solution is (and which solution today is “most compliant”)
  2. Which standard definitions are calculated out-of-box in each solution (for example, “Original Referrer” and “Bounce Rate”)
  3. Which currently available solutions dramatically differ from the norm in their use of standard terms

Something like this would probably have to be backed up with some documentation or examples as proof points, just for reference. And yeah, this is kind of a lot of work, but if you think about it all you really need is for one WAA member per solution to poke around in their documentation and then someone (Jason and Angie maybe) to collate the results and write it up. I would be happy to contribute the matrix assessment for the web analytics solution I’m using now if that would help!
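As a rough illustration of what such a matrix could look like once collated, here is a small Python sketch. The vendor names and the True/False values are entirely hypothetical placeholders, not an assessment of any real product, and the scoring rule (share of terms that match the WAA definition) is just one assumed way to summarize it.

```python
# Hypothetical compliance matrix: vendor -> term -> does the vendor's
# out-of-box definition match the WAA definition? All values are made up.
matrix = {
    "Vendor A": {"Unique Visitors": True, "Bounce Rate": True, "Original Referrer": False},
    "Vendor B": {"Unique Visitors": True, "Bounce Rate": False, "Original Referrer": False},
}

def compliance_score(terms):
    """Share of assessed terms that match the standard definition."""
    return sum(terms.values()) / len(terms)

def most_compliant(matrix):
    """Vendor with the highest compliance score."""
    return max(matrix, key=lambda vendor: compliance_score(matrix[vendor]))

def differing_terms(matrix, vendor):
    """Terms where the vendor departs from the standard definition."""
    return sorted(term for term, ok in matrix[vendor].items() if not ok)

print(most_compliant(matrix))
print(differing_terms(matrix, "Vendor B"))
```

The three helper functions line up with the three questions in the list above: how compliant a solution is, which terms it supports as defined, and where it differs from the norm.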

Who knows, maybe we’d discover that all the vendors are already standards compliant and there really isn’t a problem with definitions!

What Do You Think?
I’d love to hear what all of you think about the new standards and my concerns about how they’ll be used (or not used.) Am I missing something? Were you disappointed to not see something that spoke more clearly to your concerns about Web 2.0 technology? Or are you just pleased that the WAA published these definitions and see them as a small-but-important first step?

The Web 2.0 Measurement Group has moved to Facebook

Jeremiah Owyang from PodTech has been bugging me to start a web analytics group at Facebook.  I initially resisted but upon kicking the tires at Facebook a bit it seemed like a good idea.  If you’re already in the Web 2.0 Measurement Working Group you should have seen an email from me about shutting it down at Google Groups.  Either way, the group is now totally open to everyone and can be joined here:

http://www.facebook.com/group.php?gid=2668335473

My hope is that putting the Web 2.0 Measurement Group on the premier “Web 2.0” platform will drive interest and promote conversation.  If nothing else it is a reflection that sometimes you can bring audience to you and sometimes you have to go where the audience is.

I hope you’ll join me at Facebook. 

On NetRatings and time spent on site

Lost in all of the fuss about NetRatings dropping page views as a metric used to calculate site popularity is the fact that the company actually did a pretty smart thing: they took my advice from February 15th of this year and rolled in a very valuable and useful “sessions” metric. Well, maybe it wasn’t my advice they took, but I think it was a great idea either way to drop page views, since they’ve become increasingly inconsistent, and instead focus on the one metric that is consistently applied and well defined: sessions.

Unfortunately NetRatings chose to focus their announcement on “total minutes” saying that time was a better measure of engagement. Personally I’ve never been a very big fan of the time spent metrics — I guess I’ve just looked too long and too hard at all the problems associated with how time is collected and recorded in the web analytics realm.

There is a really engaged thread at the Web Analytics Forum at Yahoo! Groups on this subject that is definitely worth a read if you’re interested.

And I’ll admit, I don’t have all the details associated with how panel-based services like Nielsen and comScore track time spent. If they’re actively tracking the user and only counting time when the browser window is active and the mouse is moving, well that would be a good use of the panel. My suspicion is that, like in web analytics, they’re simply recording the delta between the first and last request for a page in the domain, a strategy that suffers from a litany of well-described problems.

The two I see as most problematic are:

  • Single page visits are either difficult to count or not counted in time spent calculations
  • The amount of time a web page is open is likely only poorly correlated to the visitor’s actual engagement with the page
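The single-page problem falls straight out of the arithmetic. Here is a minimal Python sketch of the "delta between first and last request" approach; the timestamps are invented, and this is my guess at the general technique rather than a description of how any panel service actually computes time spent.

```python
from datetime import datetime

def time_on_site_seconds(request_times):
    """Seconds between the first and last page request in one session."""
    if not request_times:
        return 0.0
    return (max(request_times) - min(request_times)).total_seconds()

# Hypothetical sessions. With only one request there is no "last" request
# to subtract from, so the measured time is zero no matter how long the
# page actually stayed open.
single_page = [datetime(2007, 7, 10, 9, 0, 0)]
three_pages = [
    datetime(2007, 7, 10, 9, 0, 0),
    datetime(2007, 7, 10, 9, 4, 30),
    datetime(2007, 7, 10, 9, 12, 0),
]

print(time_on_site_seconds(single_page))
print(time_on_site_seconds(three_pages))
```

Note that even in the three-page session, whatever time the visitor spent reading the final page is invisible to this calculation.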

Some have already noted that very popular sites like Google will do poorly in time spent on site because one of the dominant use cases involves only a single page (I search and I go.) Conversely, depending on how time spent on site is calculated, the search engines may have inordinately long times spent based on a search leading to a long browse time on a discovered site, leading back to the search results (same session, clock is presumably still ticking), leading to the next discovered site, etc.

I for one use iGoogle in exactly this way: I load the page frequently throughout the day and do nothing more than look at a single page view. In fact, unless Nielsen is either tracking the AJAX-interaction with the iGoogle interface, or counting single page view sessions, it is likely that my interaction with iGoogle is not counted at all. But let me assure you, I am quite engaged with the content in my Google portal (something that would be well evidenced by the total session count I generate at the site each day.)

As I looked back through the plethora of comments that my original post on using sessions to compare sites generated, I noticed that I had made this statement in response to a comment from Jacques Warren:

  • If you want to compare two or more web sites, use sessions because of the reasons I outlined in my original post.
  • If you’re interested in the number of people coming to one web site (presumably yours), use de-duplicated unique visitors but be mindful of cookie deletion.
  • If you’re interested in the activity of people on your web site, and if you have a “Web 1.0” web site, use page views but be mindful of issues like code coverage, proxies, robots, etc.
  • If you’re interested in the activity of people on your web site, and if you have a “Web 2.0” web site built around RIAs, etc., use some form of event model.

I’ll stand by this. Until I know more about how N/NR and comScore calculate their time spent on site metrics it’s hard to believe their numbers to be any more useful or accurate than those provided by direct measurement systems. That said, I’d welcome a briefing on the subject from either company if they’re reading this and are interested in spending some time with me while I pick apart their methodology.

If companies really need to use time spent on site, they should consider using better key performance indicators for time such as Percent Low/Medium/High Time Spent on Site categories (something I talk about at length in The Big Book of Key Performance Indicators.) That way N/NR could report on the percent of all tracked sessions that were “30 seconds or less”, “31 seconds to 5 minutes”, and “More than 5 minutes” (as an example) which would give us a more powerful view into the relationship between visitors and the time they spend on site.

At the end of the day I like that N/NR has provided a consistent and easily compared metric to their customers in “total sessions” which is what I will inevitably focus on as a measure of site popularity. Having devoted quite a bit of time to describing what I believe to be a solid measure of visitor engagement, it’s difficult for me to think about “time spent on site” (or even “total sessions”) as a good proxy. Time spent, recency, depth of session, session number, etc. are all components of engagement, not direct measures.
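For anyone who wants to try the low/medium/high time-spent breakdown mentioned above, here is a minimal Python sketch. The bucket boundaries are the example ones from this post; the session durations are made-up numbers.

```python
from collections import Counter

LABELS = ["30 seconds or less", "31 seconds to 5 minutes", "More than 5 minutes"]

def bucket(seconds):
    """Assign one session duration (in seconds) to a time-spent category."""
    if seconds <= 30:
        return LABELS[0]
    if seconds <= 300:
        return LABELS[1]
    return LABELS[2]

def percent_by_bucket(durations):
    """Percent of sessions falling into each time-spent category."""
    if not durations:
        return {label: 0.0 for label in LABELS}
    counts = Counter(bucket(s) for s in durations)
    return {label: 100.0 * counts[label] / len(durations) for label in LABELS}

# Hypothetical session durations in seconds.
durations = [5, 20, 45, 120, 400, 900, 10, 250]
shares = percent_by_bucket(durations)
print(shares)
```

Reporting the three percentages side by side avoids the averaging problem, where a handful of very long sessions drags the mean far away from what the typical visitor does.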

What do you think? Is Nielsen right and I’m crazy? Have you been looking closely at your time spent on site metric for years and are delighted that the rest of the world has finally caught up? Or are you like me and spend far too much time browsing from site to site, flipping from task to task, and thusly confounding clocks and counters on every site you visit?

I welcome your comments.

Worried about page views dying? Don’t be.

I found myself thinking, “Are we really having this conversation?” today after reading Steve Rubel’s post on “What will replace the almighty page view?” where Mr. Rubel commented:

“The page view is on life support. It fails to capture all of the myriad of ways consumers engage in online activities without ever leaving a web page.”

Okaaaaaaaaaaaaay.

I suppose Steve is coming at this from a different perspective than anyone who works in the web analytics field, more-or-less looking at page views as a basis for comparing the relative value of one advertising opportunity to another. If that’s the case then yeah, page views are becoming increasingly limited in their utility.

But damn, as a web analytics professional, doesn’t all this talk about page views going the way of the Dodo bird just make your stomach feel all funny? Like, you know there are problems with the metric, but A) when compared to the other problems web site operators have vis-a-vis counting (cookie deletion, cookie blocking, poor implementations, caching, robots, lack of understanding, lack of interest) and B) when put in the context of the number of sites that still rely on good old fashioned HTML, don’t these proclamations seem a bit premature?

Is it just me? Maybe it’s just me …

Anyway, we can stop worrying about dying pages and dying page views now since the answer has been with us the whole time. It’s not unique visitors … too many problems with how unique visitors are counted, what with cookie deletion and some of the inaccuracies ascribed to panel-based services. It’s not time spent on site … the problems with this metric as the basis of comparison are many (connection speed, amount of content, quality of content, bathroom breaks, etc.)

It’s sessions.

Yep, sessions. Good old “start ‘em with the first page view and stop ‘em after 30 minutes of inactivity” sessions. And while they don’t necessarily solve the problem of how many impressions a site can serve (you need old fashioned web analytics for that), they provide a stable basis for comparison across sites:

  • Sessions are defined by a widely-used and widely-understood standard, the 30 minute timeout between subsequent page views. Heck, in the web analytics industry, it’s pretty much the only standard we have …
  • Sessions are counted once and only once when a visitor goes to a web site in a single web browser and are thusly not subject to inflation due to crappy web design or RIAs. No more complaints about MySpace!
  • Sessions are time independent, except for the session timeout. You can click away all day and you’ll still only count one session, unless you walk away for 30 minutes and one second …
  • Sessions mitigate issues associated with error pages and the like, because again, the number of pages viewed is irrelevant after the visitor views the first page. Again, no more complaints about MySpace …
  • Sessions are not affected by cookie deletion and are not always affected by cookie blocking. Whoopie! We can stop bugging out about cookie deletion …
  • Sessions are not affected by users visiting sites from multiple web browsers, since regardless of location (home, work, etc.) the session is counted. Hurrah! No more massive over-counting of unique visitors during Fantasy Football season …
  • Sessions can be counted even when the visitor is not on your web site, depending on what tracking technology you’re using and how it’s deployed. For example, a session can be counted when someone reads a post in their RSS reader …
  • Sessions are easily tied back to relevant referring sources, such as advertising units, RSS feeds, search terms, etc. Yippie! Not only do we get more accurate counts, we know from where the sessions are originating …
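The 30 minute rule described above is simple enough to sketch in a few lines of Python. This is a minimal illustration for one visitor in one browser, assuming you have a list of page-request timestamps; the timestamps are invented, and real tools will differ in the details of how they identify the visitor.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)

def count_sessions(hit_times, timeout=SESSION_TIMEOUT):
    """Count sessions for one visitor: a new session starts with the first
    page view, and again whenever the gap since the previous page view
    exceeds the timeout."""
    sessions = 0
    previous = None
    for current in sorted(hit_times):
        if previous is None or current - previous > timeout:
            sessions += 1
        previous = current
    return sessions

# Hypothetical page requests from a single browser on one morning.
hits = [
    datetime(2007, 2, 15, 9, 0, 0),
    datetime(2007, 2, 15, 9, 10, 0),
    datetime(2007, 2, 15, 9, 40, 0),    # exactly 30 minutes later: same session
    datetime(2007, 2, 15, 10, 10, 1),   # 30 minutes and one second: new session
]

print(count_sessions(hits))
```

However many pages are requested inside the window, the count only moves when the visitor goes quiet for longer than the timeout, which is why page-heavy designs and RIAs do not inflate it.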

Yep, good old fashioned sessions … who’da thunk it? You can call them “visits” if you’d like!

What’s better is that the reporting networks should just as easily be able to report on sessions as they do unique visitors. If they can report on “unique searches” and “time per person” and “page views” and all that, nothing should theoretically stop them from using “sessions” as the basis for reporting.

Clint Ivy pointed out to me that Hitwise uses sessions as the basis for their reporting platform. However, they report only percent market share and not the actual number of sessions, which is almost certainly what advertisers would prefer to see. Neither of us was sure why they don’t give raw session counts. Do any of you know?

Just think of all the problems we can solve by using sessions to compare the popularity of web sites! No more complaints about newspaper sites reporting more unique visitors than live in the entire state. No more complaints about huge differences in reported numbers ascribed to cookie deletion. No more freaking out about inanimate objects dying …

What do you think? Am I crazy? Is it just me? As always, I welcome your comments.

Measuring social activities online using my visitor engagement metric (Part V in a series)

(If you need to catch up on where we are to date, have a look at my last post in this series on measuring visitor engagement.)

I had a nice conversation a few days ago with Jeremiah Owyang, Web Strategist at PodTech.net, on how I have been measuring engagement. Jeremiah has been thinking about how engagement is defined for some time and had a very fresh perspective that has somewhat expanded my own thinking. Jeremiah, by virtue of being an “A-list” blogger (IMHO), gets great critical feedback from folks like Forrester’s Charlene Li (who says that my measurement is too explicit, oh well …). After we talked, I realized that I really needed to get the promised post on measuring “social engagement in a Web 2.0 world” out the door. So here it is.

One of the links that Jeremiah references is this one from Wiredset, published in November of last year. In their post, Wiredset gives a definition of engagement as “a consumer based measurement that regards interaction with an aspect of a brand or media property” and goes on to say that “Web 2.0 Engagement” could include activities (Jeremiah refers to these as “gestures”) like:

  • Publishing
  • Creating and Publishing to a Group
  • Posting
  • Subscribing
  • Favoriting
  • Adding Friends
  • Bookmarking
  • Emailing
  • Distributing
  • Streaming
  • Networking
  • Creating Mash-up Content

I absolutely agree with Wiredset, and they go on to say:

When measuring engagement, the level of user interaction (i.e. 200 vs. 2,000,000 streams) is an obvious and important component. Yet engagement is complex in that it is not comprised solely by clicks, but also a range of involved user actions.

If you’ve been reading along the entire time, you’ll note that my current definition of visitor engagement is derived exclusively from click-stream data and it tries to be as independent of content as possible. While this makes sense for a lot of reasons, the larger conversation (as Clint and Jeremiah wisely point out) is about how a visitor engagement metric can help us better understand the value of emerging Internet technologies.

While Web Analytics Demystified is not your typical Web 2.0 or social community site, I have enough of the activities listed above on my site to apply a social media filter to my measurement calculation and look at the effects. Again, if you’ve been reading along, I covered many of these in Part III of this series.

Here is the list of things that I am tracking vis-a-vis social media/Web 2.0 on my site:

Now, up until this point I have basically fought applying any weighting to the visitor engagement metric, mostly because I think it’s pretty difficult to rationalize any particular weighting over another and it will complicate what has already been described as “the mother of all KPIs”. That said, I am scoring these social activities into what I call an “interaction index” (ratio of sessions with one of the activities above vs. sessions without) and using the interaction index to weight the visitor engagement metric.

So instead of the existing definition of visitor engagement:

We have the new definition of “Social Engagement”:

Both metrics are the sum of component indices divided by seven, so you can hopefully see that the latter metric is weighted by any contribution made by the “Interaction Index”. For definitions of the component indices, please see Part IV in this series.
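For readers who want to experiment, here is a rough Python sketch of the two building blocks described above: an interaction index and a score that is the sum of seven component indices divided by seven. Several things here are assumptions on my part: the index is computed as the share of sessions containing at least one tracked social activity (the text could also be read as sessions with divided by sessions without), the component names are placeholders, and the sketch deliberately does not attempt the actual weighting step, whose formula is not reproduced in this text.

```python
def interaction_index(sessions):
    """Share of sessions containing at least one tracked social activity.
    Each session is represented as a set of activity names (assumption)."""
    if not sessions:
        return 0.0
    with_activity = sum(1 for activities in sessions if activities)
    return with_activity / len(sessions)

def mean_of_indices(indices):
    """Sum of the component indices divided by how many there are."""
    return sum(indices.values()) / len(indices)

# Hypothetical sessions for one visitor; activity names are illustrative.
sessions = [
    {"submit a comment"},
    set(),
    set(),
    {"join community", "share a social bookmark"},
]

# Seven placeholder component indices, each between 0 and 1. The real
# component definitions are in Part IV of the series.
components = {
    "index_1": 0.5, "index_2": 0.25, "index_3": 0.75, "index_4": 0.5,
    "index_5": 0.25, "index_6": 0.75, "index_7": 0.5,
}

print(interaction_index(sessions))
print(mean_of_indices(components))
```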

So what does this give us? Well, if you were interested in tracking individual users based on their level of visitor or social engagement, you would be able to drill-down along each Web 2.0 activity and perhaps learn something interesting:

There is Frank Faubert from Unica again, not much more socially engaged with my site than he is otherwise engaged. Remember that Frank initially complained about his only having a 21 percent engagement score, to which I responded that I had lost him in my data. Well, I found him, and based on the evolving calculation, Frank is over 31 percent engaged but little of his measured engagement is “social” in nature.

But what if I drill-down along each of my defined social activities, what can I learn?

First we can see my good friend Jeff Katz, formerly of WebTrends, who is a regular reader of my blog and whose social engagement score is much higher than his visitor engagement score. Jeff has repeatedly joined the community (Web 2.0 Measurement Working Group, Web Analytics Wednesday attendee) and has also hosted a WAW event here in Portland, OR.

Looking at direct engagement via email, we can see the great Aurelie Pols from OX2 Belgium who has also submitted comments to my blog.

I can also apply the visitor and social engagement scores to other relevant dimensions like referrers:

Here you can see that I’ve calculated the variance between visitor and social engagement and am color-coding that against my site referrers. O’Reilly’s XML.com, E-consultancy, and Jim Sterne’s Emetrics web site all are sending visitors who are well-engaged socially.

Finally, you can see the difference between visitor and social engagement applied to the various blog posts I am tracking for Clint Ivy, Ian Houston, Robbin Steif, and Avinash Kaushik. Clint’s open letter to Jeff Jarvis (a controversial piece if ever there was one) is driving a great deal of Web 2.0 engagement amongst Clint’s readers. Nice work, Clint!

Hopefully you get the picture here. By weighting the visitor engagement metric with these social media activities, I am able to easily identify individuals, referring sources, marketing campaigns, rich Internet applications, etc. that are actively interacting, both on my site (join community, engage directly, submit a comment, contribute content) and off (host an event, share a social bookmark).

Wiredset proposes a distilled definition of “Engagement = Interaction/Attention” which makes sense to me … you have attention by virtue of their coming to the site, but can you drive interaction? I would propose that the visitor and social engagement metrics I have described in this series of blog posts describe this equation practically applied.

As always, I welcome your comments and criticism.
