Nick Arnett challenges my visitor engagement calculation
Nick Arnett from MCC Media (and one of the creators of BuzzMetrics) posted a very well thought-out and moderately critical assessment of the visitor engagement calculation I wrote about earlier this week. Nick makes some great points and I thought it was worth addressing them while I prepare the follow-up post that shows off some of what the metric can do. My comments are preceded by ETP and Nick’s statements are in italics.
Definitely thought-provoking, Eric… I’m deep into this issue, although focused specifically on community sites. Overall, your approach doesn’t work for me on two main counts: it is too complicated (and thus unlikely to become any sort of standard) and doesn’t generate a metric that allows different sites to be compared. The latter is arguable, since standardized weightings could yield comparable numbers, but I think that’s excess complication also.
ETP: I’m sorry the calculation doesn’t work for you but I do appreciate your thoughts on the subject. Regarding it being too complicated, compared to what? Compared to “simple” metrics like bounce rate and average page views per session? Or compared to the technology you built to power BuzzMetrics? I guess I separate the complexity of making the calculation from one’s ability to actually explain the calculation.
ETP: Regarding using this metric to compare different sites … as I mentioned in the post, I don’t think there is “one” measure of visitor engagement, and thus trying to compare sites is probably a futile effort at best. I suppose you could remove the Brand, Feedback, Subscription and Interaction indices and come up with a standard set of threshold values for specific vertical markets, but I’m not sure that is really the best use of this calculation.
Is there any ground truth behind this? In case that isn’t clear, do you have any sort of primary market data for engagement that correlates with the output of your engagement metric?
ETP: Hmmm, here I’m not sure what you mean. What kind of primary market data is actually able to identify “engaged” visitors? Because I am able to see individuals interacting with my web site, I did talk to a handful of people based on their engagement scores when I was doing the original work on this metric, and some of their feedback was critical to tweaking the metric and inputs to its current state. But other than that I’d love to see the primary data you’re talking about if you’re able to share it!
As I’ve dug into the issues and our data (about five dozen communities ranging from very large to very small), I keep coming back to two main indicators of engagement — return rates and proactive behavior. If visitors don’t visit regularly and do something other than passive page viewing, I have a tough time including them in any measurement of community engagement.
ETP: Exactly why the Recency Index and Interaction Index are included in the calculation, but I disagree with your assessment that these are the only measures of engagement. I’m not sure exactly how I would determine that someone was only “passively” viewing pages, and again this metric is not designed to be a measure of “community engagement” but rather visitor engagement more broadly considered.
Some point-by-point thoughts…
Click-depth index — this is a place where ground truth really matters, I think. I’m not comfortable with the assumption that more clicks per session means greater engagement. Do we know enough about browsing behavior to know that this is true? And of course there’s the old problem of bad design resulting in more clicks… but when I consider that issue, I tend to think that if people show willingness to click through a bad design, maybe that means they really are engaged! Perhaps we should all include some known bad design… ;-)
ETP: I haven’t seen anything that says that more clicks means less engagement, but I agree that confused people might generate more clicks. Still, I think it’s unlikely that confused and frustrated people would return, complete defined events, subscribe to blogs, etc. So despite your assertion that the metric is complex, the multiple inputs are there to offset any single input that may be misleading.
ETP: You do, however, make an excellent argument for not using something as simple as “click-depth” or “average depth of visit” as your sole measure of engagement.
I have pretty much the same questions about duration. Is there good, objective evidence that session duration correlates to engagement? There are visitors with long-duration visits who don’t visit regularly and don’t do anything proactive… I can’t see including them in any measurement of engagement.
ETP: It sorta depends on your definition of engagement, doesn’t it? But see my comment above about why a single measure like duration (as in Nielsen’s Time Spent ranking system) is perhaps inappropriate on its own to determine engagement.
Recency makes perfect sense to me — the fact that engaged visitors return often is practically a tautology. I would be very skeptical of calling anybody engaged if they aren’t returning regularly.
ETP: What about first-time visitors? Are you saying you can’t be engaged on the first visit to a site? I agree that regular returns are a good indicator of engagement, but in my analysis the metric I’ve defined is able to resolve first-time visitors into several engagement segments, which I personally have found quite useful.
Your Brand Index is a great piece of data, but I don’t believe it works in a metric intended to compare sites. Language is too subtle and ambiguous to infer engagement from search terms. I spent years in the search engine and related markets, which gave me a great appreciation for the fact that what sometimes seems obvious about language isn’t. When people search on brand-related terms, it indicates *reach* to me, not engagement. I’m unwilling to assume anything more than brand awareness. People search on things they dislike, but that doesn’t mean they’re engaged with the subject they’re searching. And my data shows that visitors who show many other indications of engagement actually search *less* often.
ETP: Same comment about this metric not being specifically designed for comparing sites. I know that is the uber-goal for lots of folks in the world, it’s just not necessarily my goal or the best use for my engagement calculation.
ETP: Doesn’t “reach” plus “action” equal engagement? I haven’t spent years in search and related markets, but I struggle to believe that people searching on brands they dislike are not somehow engaged. Again, maybe this is a semantic issue arising from conflicting definitions of engagement.
ETP: Because the calculation is designed to be made over the lifetime of visitor sessions, searching less often is not a problem. I guess I more-or-less expect that the “direct” component of the Brand Index will be more important over time with truly engaged visitors (who wouldn’t be as likely to go back to Google and search on a branded term.)
Counting brand-related searches makes sense if we’re measuring *brand* engagement.
Counting direct (non-referred) visits makes sense if we’re measuring *site* engagement.
Counting both in the same metric doesn’t make sense to me. I don’t think we should even be talking here about ways to measure brand engagement… because I believe that’s well beyond the scope of site analytics. It requires massive monitoring systems along the lines of BuzzMetrics. (I’m the primary inventor of one of their systems.)
ETP: I’m not differentiating *brand* and *site* engagement since I’m trying to calculate an operational measure of ongoing *visitor* engagement. Brand is just a component, and the site is the measurement point. I think I understand your desire to differentiate the two given your background with Nielsen but I’m not trying to do the same thing.
One more problem with the Brand Index — people will argue all day long about what terms are appropriate to include… and there’s a strong incentive for site owners to err on the side of too many terms if their success is being measured by this metric. For example, you included “web site measurement hacks” in your list… but that could be a generic term. Is “Web Analytics Wednesday” really your brand? Or is it the WAA’s? I don’t want to argue which it is, just point out the kind of ambiguity that is inevitable.
ETP: Here I agree with you, coming up with a reasonable list is not easy, but web analytics is hard so at some point you have to make some tough decisions. “Web Site Measurement Hacks” is a book title and a branded term but could be a generic phrase. “Web Analytics Wednesday” is a branded Web Analytics Demystified term and has nothing to do with the Web Analytics Association. I don’t think there is that much ambiguity at the site level, at least in my experience.
Your Feedback Index is a specific instance of what I think of as the general principle of tracking proactive behaviors — what you seem to be getting at in your Interaction Index. In communities, visitors have many such opportunities — posting, editing, tagging, voting and so forth. I decided very early in this work to just give people one point for each such proactive action, despite the temptation to weight them (which would violate the need to keep things simple). These are the behaviors that make a community work; sites that aren’t based on user-generated content can exist without them.
ETP: Same comment about this calculation perhaps not being what you’re looking for vis-a-vis communities.
Your session focus really got me thinking. Does it make more sense to count the number of sessions in which visitors signal engagement or the number of actual such signals? I think it’s close to a toss-up, but so far, our ground truth suggests the latter — the number of proactive actions correlates better to our subjective estimates of engagement… but among our future tasks is to establish better ground truth. So far, I’m just using our community manager’s collective subjective scoring… but it correlates quite well to all but our smallest communities.
ETP: I agree, it’s probably a toss-up, but if you think about the calculation, all it does is count the number of signals. Long sessions are a signal, deep sessions are a signal, frequent sessions are a signal, etc. I know you don’t like anything but recency and interaction, but we can agree to disagree on this point. I’d love to hear about your “ground truthing” efforts and I’ll try to keep you apprised of mine.
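To make the "count the signals" idea concrete, here is a minimal sketch in Python. It is not the actual engagement calculation: it assumes each index is simply the fraction of a visitor's sessions that show a given signal, and that the score is the plain average of those fractions. The `Session` fields, the thresholds, and the choice of three indices are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Session:
    """One visit by one visitor (hypothetical fields for illustration)."""
    page_views: int
    minutes: float
    had_interaction: bool


# Hypothetical thresholds; real values would be chosen per site.
MIN_PAGE_VIEWS = 5
MIN_MINUTES = 5.0


def signal_fraction(sessions: List[Session],
                    shows_signal: Callable[[Session], bool]) -> float:
    """Fraction of sessions in which the signal is present."""
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if shows_signal(s)) / len(sessions)


def engagement_score(sessions: List[Session]) -> float:
    """Average of the per-signal fractions (an assumed, unweighted combination)."""
    indices = [
        signal_fraction(sessions, lambda s: s.page_views >= MIN_PAGE_VIEWS),  # deep sessions
        signal_fraction(sessions, lambda s: s.minutes >= MIN_MINUTES),        # long sessions
        signal_fraction(sessions, lambda s: s.had_interaction),               # proactive sessions
    ]
    return sum(indices) / len(indices)


# Illustrative sessions for one made-up visitor.
example_sessions = [
    Session(page_views=8, minutes=12.0, had_interaction=True),
    Session(page_views=2, minutes=1.0, had_interaction=False),
    Session(page_views=6, minutes=3.0, had_interaction=False),
    Session(page_views=1, minutes=0.5, had_interaction=False),
]
example_score = engagement_score(example_sessions)
```

Counting actual signals rather than sessions, as Nick suggests, would mean replacing the per-session true/false test with a per-session count.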
The subscriber index doesn’t work for me because we want to be able to compare communities regardless of whether or not visitors are able to subscribe, join, become members or what-have-you. Some of our clients — e.g., a large professional sports organization — allow full participation without any need to sign up. Also, as I’ll explain below, I’ve found a strong negative correlation between highly active visitors and RSS subscribers.
ETP: Again, not designed for comparison (and at this point no wonder you don’t like my calculation!) I’d love to see the negative correlation data for RSS and yes, if you don’t have a subscription it doesn’t make sense to assign a negative penalty.
Finally, I guess I’ll toss out one of the ideas that I’m working with — segmenting visitors by proactivity.
In several ways, communities (and most web sites, I suspect) have a bimodal distribution of users. There’s typically a relatively large “Core” group that visits often, looks at lots of pages and does a lot of proactive stuff. There’s a middle ground, which I’m calling “Lingerers,” of people who fall into the 10th to 80th percentiles of such activities. Third and last, there’s a large contingent in the 0th percentile, people who might have one or two activities in a given time period, which I call the “Drive-bys.” In our communities, the Drive-bys are the largest group, but the Core usually is a bigger group than the Lingerers. What this says to me is that people tend to engage a lot or hardly at all — there isn’t much middle ground. I’ve been focusing on the Core’s relationship to the whole community for my engagement measurements. That’s what seems to correlate best to what little ground truth we have.
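A segmentation along these lines could be sketched as follows. This is only an illustration under stated assumptions, not Nick's implementation: it assumes a visitor's percentile is the share of visitors with strictly fewer proactive actions, that "Core" means the 80th percentile and above, and that everyone below the 10th percentile counts as a "Drive-by". The activity counts are made up.

```python
from bisect import bisect_left
from typing import List


def percentile_ranks(action_counts: List[int]) -> List[float]:
    """Percent of visitors with strictly fewer actions than each visitor."""
    ordered = sorted(action_counts)
    total = len(ordered)
    # bisect_left gives the number of values strictly below the count.
    return [100.0 * bisect_left(ordered, count) / total for count in action_counts]


def segment(rank: float) -> str:
    """Map a percentile rank to a segment (cutoffs are assumptions)."""
    if rank >= 80:
        return "Core"
    if rank >= 10:
        return "Lingerer"
    return "Drive-by"


# Made-up proactive-action counts for ten visitors in one period.
counts = [1, 1, 1, 1, 2, 3, 5, 8, 40, 55]
segments = [segment(rank) for rank in percentile_ranks(counts)]
```

With this definition, every visitor tied at the minimum activity level lands at the 0th percentile, which fits the description of the Drive-bys as people with only one or two activities.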
ETP: I am seeing a more normal distribution, especially as visitors return a third time, but it is definitely left-skewed towards lower levels of engagement. I’ll try to highlight this when I share some data that shows the calculation in action. And since I’m not working on a community proper, I’ve found myself focusing on my middle group (“Moderately Engaged”) and trying to determine what I might be able to do to shift them up to “Highly Engaged”.
Overall, I’ve found that the Drive-bys and Lingerers exhibit fairly similar behavior, but the Core is different. The Core visitors post more, search less and use RSS far less (so much for “subscribing” to RSS as a positive indicator of engagement!)
ETP: Your assessment of RSS being a poor indicator of engagement runs contrary to popular opinion (why would you subscribe to an RSS feed or email newsletter if you weren’t engaged??!) Perhaps this result is uncovering a flaw in your engagement calculation?
This post is getting long… so I’ll wrap it up (but ready to discuss further, of course) by repeating myself. I think any sort of engagement metric has to be backed up by demonstrating correlation to some kind of ground truth. Otherwise, it’s a mental exercise that runs the risk of having little relevance to the marketplace.
ETP: You keep coming back to the notion of “ground truth” but surely you recognize that this is A) extraordinarily difficult to come by and B) if we had it easily available we wouldn’t need a measure of engagement. I would love to see your “ground truth” data and talk about how you’re generating that, but unless I’m missing something it sounds a little impractical for widespread use. Still, I appreciate your feedback and very thoughtful comments and will endeavor to demonstrate the correlation between my calculation and “truly engaged” visitors.
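For what it's worth, the correlation check itself is the easy part once some ground truth exists. Here is a minimal sketch, assuming you have a subjective engagement rating and a calculated score for the same set of visitors (or communities). The `pearson` helper is a plain Pearson correlation written out by hand, and the numbers are invented purely for illustration; they are not results from either of our data sets.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return covariance / (spread_x * spread_y)


# Invented example values: calculated scores vs. subjective 1-5 ratings.
calculated_scores = [0.10, 0.25, 0.40, 0.55, 0.80]
subjective_ratings = [1, 2, 2, 4, 5]
agreement = pearson(calculated_scores, subjective_ratings)
```

A value near 1 would suggest the calculation tracks the subjective ratings; a value near 0 would suggest it does not. Collecting trustworthy ratings is still the hard part.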
Man, talk about a long post! What do you think? Is Nick more right than wrong? Are you focusing on communities and have the same concerns? Do you have similar concerns about your site? The conversation is almost as interesting as the metric and resulting analysis in my opinion so please, comment away!