Web Analytics Blogs

Eric T. Peterson has been working in web analytics for over ten years and has built up an incredibly rich body of knowledge about the subject, knowledge Mr. Peterson works to share every week here in his Web Analytics Demystified weblog. Whether you're new to the subject or the most experienced practitioner, you should join the thousands of people around the globe already subscribing to Peterson's blog and start reading today.

Archive for 'Cookies'

The comScore study on cookie deletion is finally out

I happened to write to my contact at comScore today asking about their follow-up report on cookie deletion. He said it would be out today, and here it is:

http://www.comscore.com/request/cookie_deletion.asp

The report does a good job of providing additional data and information about the comScore methodology, something missing from the press release and critical to our collective understanding of cookie deletion. It explicitly addresses anti-spyware and the differences between third- and first-party cookie deletion, essentially showing that there is an anti-spyware effect but that it is minimal compared to manual cookie deletion, which appears to be the primary culprit.

comScore also presents some of the attitudinal data they alluded to in their press release, essentially confirming what I first reported at JupiterResearch in 2005 … that most consumers aren’t really sure what cookies do.

Since I last saw the report they added a few sections: one on international traffic and one on cookie blocking. While the section on international traffic doesn’t add much to the conversation other than to explain why the numbers from panel-based and log-based systems differ (something that should be fairly obvious), the cookie blocking data is pretty interesting.

According to comScore, if your web analytics application falls back to an IP-based value for unique visitor identification when a cookie cannot be set, you’re likely worse off than you would be simply dropping those visitors. Their table on page 15 shows that, due to dynamic IP assignment, the average home computer has 10.5 different IP addresses in a month. Yikes!
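To make the trade-off concrete, here is a minimal sketch in Python comparing the two counting strategies. Everything in it is an assumption for illustration: the `Hit` record, the function names, and the tiny made-up traffic sample are mine, not comScore's data or any vendor's implementation.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    person: str                # ground truth; a real analytics tool never sees this
    cookie_id: Optional[str]   # None when the browser blocked the cookie
    ip: str


def count_drop_cookieless(hits):
    """Count unique visitors by cookie, ignoring hits that carry no cookie."""
    return len({h.cookie_id for h in hits if h.cookie_id is not None})


def count_with_ip_fallback(hits):
    """Count unique visitors by cookie, falling back to the IP address."""
    return len({
        h.cookie_id if h.cookie_id is not None else "ip:" + h.ip
        for h in hits
    })


# Hypothetical month of traffic: alice accepts cookies, bob blocks them
# and is handed a new dynamic IP address on every visit.
hits = [
    Hit("alice", "c1", "10.0.0.1"),
    Hit("alice", "c1", "10.0.0.2"),
    Hit("bob", None, "10.0.1.1"),
    Hit("bob", None, "10.0.1.2"),
    Hit("bob", None, "10.0.1.3"),
]

true_visitors = len({h.person for h in hits})
print(true_visitors, count_drop_cookieless(hits), count_with_ip_fallback(hits))
```

In this toy sample there are two real people. Dropping cookieless hits undercounts by one, while the IP fallback counts bob three times and overcounts by two, which is the kind of error comScore's table suggests gets worse as the number of IP addresses per computer grows.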

If you’re into this stuff, or if you’re interested in how much cookie deletion might be impacting your own audience measurement, you should download the report and give it a careful read. It certainly doesn’t provide a solution to the problem, but oftentimes knowing is half the battle.

http://www.comscore.com/request/cookie_deletion.asp

I welcome your feedback on the report and the usual comments and criticism.

Ian Houston publishes very interesting cookie deletion data of his own

My friend Ian IM’d me last week and said he had confirmed comScore’s data on first-party cookie deletion. Since Ian is easily one of the sharpest people I know, I was immediately intrigued, given that he has been working on a methodology to restore deleted cookies using the browser’s cache and a dynamically generated script. Unfortunately Ian hasn’t been able to implement his strategy on a high-volume site yet, but he did do a very robust comparison of measured site traffic data to comScore numbers.

What Ian saw by comparing the number of measured unique visitors based on accepted cookies to comScore data for the same site was, well, roughly a 2.5X inflation from panel to measured visitor counts. His monthly numbers ranged between 1.99X and 3.15X but he reports the average as 2.47X.

Ian also reported data for daily unique visitors where he saw an average inflation of 1.96X (range of 1.35X to 2.84X). Ian commented, and I agree, that the daily numbers are somewhat disconcerting given that they appear to support the notion that “serial deleters” are among the most engaged.

Keep in mind, these numbers are based on a direct comparison to the comScore panel-based numbers, numbers whose accuracy has long been questioned and continues to be questioned today.

As usual, Ian’s writing continues to be well thought out and well written, and I highly recommend reading him if you’re not already. I also want to congratulate Ian on joining the team at WebSideStory/Visual Sciences. The blogosphere loses a great practitioner but gains a great vendor/consultant (and to be fair, Ian has been a private consultant for as long as I have known him).

comScore answers a few of my questions about their recent report

As I mentioned a few times in the Yahoo! group, I have been talking to the folks at comScore about their recent report on cookie deletion. I got an email back from Andrew Lipsman with some more information and partial answers to questions of mine and a few passed to me by other cookie-savvy folk.

According to Andrew, comScore will be publishing a more complete document describing their research methodology in the next few weeks. Until then, they’re giving me the scoop, so here you have it, direct from comScore (each of my questions is followed by comScore’s answer):

(Andrew provided this preamble to his answers …)

The reason we have done this study for two cookies is to ensure that we are very familiar with the cookie structure, the different value pairs (e.g. GUID=1234) and their purpose. We are in particular interested in ID value pair that identifies a user over time, and does not change when the cookie gets refreshed.

How did they identify the unique values of the cookies? Using the Set Cookie Response header, the Cookie Request header, or the actual storage (cookie) file?

We are reading the cookie request call and parsing out specific ID value pairs. Over time we will observe a time series for each panelist for the value of this identifier corresponding to each cookie request. Cookie reset events are based on qualified value changes for a targeted ID value-pair.
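As I read this answer, the approach could be sketched roughly as follows in Python. This is my own illustration, not comScore's code: the function names are hypothetical, `GUID` is borrowed from the example in the preamble above, and since comScore does not say what makes a value change "qualified", the sketch simply counts any change in the identifier between consecutive observations as a reset.

```python
def parse_cookie_header(header):
    """Split a Cookie request header into a dict of name/value pairs."""
    pairs = {}
    for part in header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep:  # skip fragments that are not name=value
            pairs[name] = value
    return pairs


def count_resets(headers, id_name="GUID"):
    """Count identifier changes across one panelist's chronological headers.

    Assumption: any change in the identifier's value counts as a reset.
    Requests that do not carry the identifier at all are skipped.
    """
    resets = 0
    last_value = None
    for header in headers:
        value = parse_cookie_header(header).get(id_name)
        if value is None:
            continue
        if last_value is not None and value != last_value:
            resets += 1
        last_value = value
    return resets


# Hypothetical time series for a single panelist. The session pair is
# refreshed on every request, but only the GUID identifies the user.
observed = [
    "GUID=1234; session=a",
    "GUID=1234; session=b",
    "GUID=5678; session=c",
    "GUID=5678",
]
print(count_resets(observed))
```

Only the targeted identifier matters here: the `session` pair changes on every request without triggering a reset, while the switch from `1234` to `5678` counts as one.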

How did they take into account non-persistence and/or cookie expiration settings?

The cookie domain value-pairs were chosen to represent passively assigned unique identifiers designed to be persistent over time. Cookies of this nature should only expire in the event that the visitor never returned within a relatively long expiration window.

How did they identify First vs. Third party cookies?

We are reading specific value-pairs for specific domain cookies. The first party cookie is the cookie used by the Portal site. The third party cookie was used by the ad serving company. All information is directly observed from metered panel activity. Recall information was not a source of determining preservation rates.

How was cookie blocking treated or accounted for?

The analysis is based on a sample for which at least one cookie value was observed.

What were the domains they examined? If not the domains, what was the nature of the first-party site?

We will not disclose the names of the sites used for the analysis. First party site is a major internet portal. The third-party site is a major ad server.

What were the survey questions asked? How many people were asked and how were they selected?

All deletion/retention figures were derived from direct panel observations, not from a recall-based survey. Only qualitative information came from the survey.

Obviously some of the answers provided are lacking but I’m willing to admit that it may be more a function of my incomplete knowledge of what the comScore panel application is able to capture.

One particularly good question from a reader asked whether P3P-instigated cookie blocking could be artificially running up cookie reset counts (essentially counting each page request as a new cookie). comScore answered that the study only included panel members for which “at least one cookie value was observed.” Besides, P3P is less likely to impact the first-party cookies that I’m more interested in.

The encouraging news is that comScore is now officially on the record as willing to produce additional documentation about the study within the next week or so. I conveyed to Andrew some of the skepticism about the results they report, skepticism I told them they would hear, and pointed him to the ongoing conversation so hopefully the community’s concerns will be directly addressed in their methodology document.

Suffice to say, if some major flaw appears in their research, the company will have major egg on their face as they approach their announced IPO. Conversely, if the research proves sound under examination, regardless of whether you’re a data purist looking for “perfection” or willing to manage based on trends however flawed the underlying data might actually be, we all have something to consider the next time someone asks us “how many visitors come to your web site?”

Perhaps the only true and precise answer is, “It depends!”

What do you think about the answers that comScore provided? As always, your comments are greatly appreciated!

comScore study sheds new light on risks to cookie-based measurement

A while back the folks at comScore called me and asked if I would be surprised to learn that cookies were being deleted at a pretty high rate. Of course I said, “No, because I reported as much in 2005.” Through the course of the conversation, however, it became clear that comScore had the ability to shed new light on our understanding of cookie-based measurement; specifically, they had the ability to measure the rate of deletion associated with first-party cookies.

comScore published the results of that study today.

I will fight the temptation to smugly say, “Ah ha! I told you so …” since the comScore data shows that I was both right and wrong when I first wrote about cookie deletion when I was with JupiterResearch. I was right in my assessment that this is happening far more frequently than those of us in the web analytics field particularly want to believe. But I was wrong in my assumption that cookie deletion was largely limited to third-party cookies.

The comScore data reports that over 30 percent of their panel of 400,000 home user computers deleted both first- and third-party cookies. Now, when I talked to Andrew Lipsman and Gian Fulgoni from comScore, I repeatedly encouraged them to check and double-check these findings, since their number for first-party cookies in particular is much, much higher than I think any of us expected to see.

That said, I have no reason to believe that comScore would make this claim frivolously (okay, except for the fact that they provide a competing methodology to cookies) … I have asked comScore for a deeper briefing on their research but nothing has been scheduled as of this posting. Perhaps on my urging, comScore took their research a step further and surveyed a subset of their panel, asking about their stated behavior towards cookies. In the press release, Dr. Magid Abraham addresses this in the context of the conventional wisdom that assigns greater risk to third- than first-party cookies:

“There is a common perception that third-party cookie deletion rates should be significantly higher than first-party cookie deletion rates,” continued Dr. Abraham. “Because many PC users reset or delete their cookies using security protection programs, conventional wisdom dictates that people are more likely to selectively expunge third-party cookies – which are generally deemed more invasive – while maintaining their first-party cookies. But these findings suggest that selective cookie management is not prevalent, a fact that comScore confirmed via a survey, with only 4 percent of Internet users indicating that they delete third-party but not first-party cookies.”

Yikes. When you look at the tables in the comScore study you can see where the problem is coming from: serial cookie deleters, the 7% of site visitors (measured via the comScore panel) who repeatedly remove their cookies and thus appear as a new site visitor with every visit. I addressed the idea of serial deleters in my final JupiterResearch report on “The Crumbling Cookie” and, at the time, speculated that some of the more nefarious activities available through the Internet were to blame.

Still, I never would have put the number as high as 7 percent.
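For a rough sense of why a 7 percent group matters so much, here is a back-of-the-envelope sketch in Python. It is my own simplification, not anything from the comScore study: it assumes every serial deleter shows up as a brand-new visitor on each visit, that everyone else keeps a single cookie all month, and the visits-per-month figure is a made-up parameter.

```python
def inflation_factor(serial_share, visits_per_deleter):
    """Ratio of cookie-counted unique visitors to real people.

    Assumptions (illustrative only): serial deleters generate one new
    cookie per visit; all other visitors keep one cookie for the period.
    """
    if not 0.0 <= serial_share <= 1.0:
        raise ValueError("serial_share must be between 0 and 1")
    if visits_per_deleter < 1:
        raise ValueError("visits_per_deleter must be at least 1")
    return (1.0 - serial_share) + serial_share * visits_per_deleter


# The 7 percent share is from the comScore study; ten visits a month is
# a hypothetical figure chosen only to show the shape of the effect.
factor = inflation_factor(serial_share=0.07, visits_per_deleter=10)
print(f"unique visitors overstated by about {factor:.2f}x")
```

Under these assumptions a small group of frequent visitors is enough to push the unique visitor count well above the number of real people, and the more engaged the serial deleters are, the larger the overstatement becomes.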

It’s interesting to me that cookies are back in the news. It will be more interesting to see how all of this is digested in the coming days, weeks, and months. I wonder if Seth Godin will comment on the comScore study? I mean, I’m not sure that the “echo chamber” argument applies to comScore’s panel of 400,000 measured, identified individuals.

This seems to be a topic ripe for commentary and conversation. What do you think? Is comScore crazy? Is this report flawed? Or are we just fooling ourselves when we believe that “unique visitor” counts are an accurate representation of the number of real human beings coming to our web sites over long periods of time?