Why Do Groupon Campaigns Damage Yelp Ratings?
One of the many benefits of visiting Microsoft Research this semester is that I get to attend some interesting talks by computer scientists working with social and economic data. One in particular this week turned out to be extremely topical. The paper was on "The Groupon Effect on Yelp Ratings" and it was presented by Georgios Zervas.
The starting point of the analysis was this: the Yelp ratings of businesses that launch Groupon campaigns suffer a sharp and immediate decline which recovers only gradually over time, with persistent effects lasting for well over a year. The following chart sums it up:
The trend line is a 30-day moving average, but re-initialized on the launch date (so the first few points after this date average just a few observations). There is a second sharp decline after about 180 days, as the coupons are about to expire. The chart also shows the volume of ratings, which surges after the launch date. Part of the surge is driven by raters who explicitly reference Groupon (the darker volume bars). But not all Groupon users identify as such in their reviews, and about half the increase in ratings volume comes from ratings that do not reference Groupon.
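To make the construction concrete, here is a minimal sketch of how such a re-initialized moving average might be computed. The schema and column names are hypothetical; this is not the authors' code:

```python
import pandas as pd

def relaunched_moving_average(reviews, launch_date, window="30D"):
    """Daily mean ratings smoothed with a 30-day moving average that is
    re-initialized at the Groupon launch date, so pre-launch reviews
    never enter post-launch averages.

    `reviews` is a hypothetical DataFrame with a datetime column 'date'
    and a numeric column 'stars'.
    """
    daily = reviews.groupby("date")["stars"].mean().sort_index()
    pre = daily[daily.index < launch_date]
    post = daily[daily.index >= launch_date]

    def smooth(s):
        # min_periods=1 is what makes the first few post-launch points
        # averages over just a handful of observations.
        return s.rolling(window, min_periods=1).mean()

    return pd.concat([smooth(pre), smooth(post)])
```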
As is typical of computer scientists working with social data, the total number of observations is enormous. Almost 17,000 daily deals from over 5,000 businesses in 20 cities over a six-month period are included, along with review histories for these businesses both during and prior to the observational window. In addition, the entire review histories of those who rated any of these businesses during the observational window were collected, expanding the set of reviews to over 7 million, and covering almost a million distinct businesses in all.
So what accounts for the damage inflicted on Yelp ratings by Groupon campaigns? The authors explore several hypotheses. Groupon users could be especially harsh reviewers regardless of whether or not they are rating a Groupon business. Businesses may be overwhelmed by the rise in demand, resulting in a decline in quality for all customers. The service provided to Groupon users may be worse than that provided to customers paying full price. Customer preferences may be poorly matched to businesses they frequent using Groupons. Or the ratings prior to the campaign may be artificially inflated by fake positive reviews, which get swamped by more authentic reviews after the campaign. All of these seem plausible and consistent with anecdotal evidence.
One hypothesis that is rejected quite decisively by the data is that Groupon users tend to be harsh reviewers in general. To address this, the authors looked at the review histories of those who identified Groupon use for the businesses in the observational window. Most of these prior reviews do not involve Groupon use, which allows for a direct test of the hypothesis that these raters were harsh in general. It turns out that they were not. Groupon users tend to write detailed and informative reviews that are more likely to be considered useful, cool and funny by their peers. But they do not rate businesses without Groupon campaigns more harshly than other reviewers.
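The core of that test is simple enough to sketch, with hypothetical column names and ignoring the controls the authors surely apply:

```python
import pandas as pd

def harshness_check(reviews: pd.DataFrame) -> pd.Series:
    """Compare average stars on non-Groupon reviews, split by whether
    the reviewer has ever self-identified as a Groupon user elsewhere.

    Expects hypothetical boolean columns 'mentions_groupon' and
    'is_groupon_user', plus a numeric 'stars' column.
    """
    non_groupon = reviews[~reviews["mentions_groupon"]]
    return non_groupon.groupby("is_groupon_user")["stars"].mean()
```

If Groupon users were harsh across the board, the mean for the `is_groupon_user` group would come out lower; in the paper's data it does not.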
What about the hypothesis of businesses being overwhelmed by the rise in demand? Since only about half the surge in reviews comes from those who explicitly reference Groupon, the remaining ratings pool together non-Groupon customers with those who don't reveal Groupon use. This makes a decline in ratings by the latter group hard to interpret. John Langford (who was in the audience) noted that if the entire surge in reviews could be attributed to Groupon users, and if undeclared and declared users had the same ratings on average, then one could infer the effect of the campaign on the ratings of regular customers. This seems worth pursuing.
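Under Langford's two assumptions, the calculation is a simple mixture decomposition, which can be sketched as follows (the numbers below are made up for illustration, not taken from the paper):

```python
def regular_customer_mean(m, g, p):
    """Back out the average rating r of regular customers from the
    observed mean m of reviews that never mention Groupon.

    Assumes the entire post-launch surge comes from Groupon users, and
    that undeclared Groupon users rate like declared ones on average
    (mean g). If a fraction p of the non-referencing reviews are really
    undeclared Groupon users, then m = p*g + (1-p)*r, so:
    """
    return (m - p * g) / (1 - p)

# Made-up illustration: non-referencing reviews average 3.6 stars,
# declared Groupon reviews average 3.2, and half of the non-referencing
# reviews are undeclared Groupon users.
print(regular_customer_mean(m=3.6, g=3.2, p=0.5))  # -> 4.0
```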
Anecdotal evidence on discriminatory treatment of customers paying discounted prices is plentiful (the authors mention the notorious FTD flowers case for instance). If mistreatment of coupon-carrying customers by a few bad apples were bringing down the ratings average, then a campaign should result in a more negatively skewed distribution of ratings relative to the pre-launch baseline. The authors look for this shift in skewness and find some evidence for it, but the effect is not large enough to account for the entire drop in the average rating.
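A bare-bones version of this check is just a comparison of third moments before and after launch (a sketch only; the authors' procedure is presumably more careful):

```python
from scipy.stats import skew

def skewness_shift(pre_stars, post_stars):
    """Difference in skewness of star ratings after vs. before launch.

    A move toward more negative skew after launch would indicate a
    heavier left tail: a minority of very bad experiences dragging
    the average down, as the bad-apples story predicts.
    """
    return skew(post_stars) - skew(pre_stars)
```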
To test the hypothesis that ratings prior to a campaign are artificially inflated by fake or purchased reviews, the authors look at the rate at which reviews by self-identified Groupon users are filtered, compared with the corresponding rate for reviews that make no mention of Groupon. (Yelp allows filtered reviews to be seen, though they are harder to access and are not used in the computation of ratings). Reviews referencing Groupon are filtered much less often, suggesting that they are more likely to be authentic. If Yelp's filtering algorithm is lenient enough to let a number of fake reviews through, then the post-campaign ratings will be not just more numerous but also more authentic and less glowing.
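Comparing the two filtering rates amounts to a two-sample proportion test, along these lines (the counts are placeholders, not the paper's figures):

```python
from statsmodels.stats.proportion import proportions_ztest

# Placeholder counts: filtered reviews and group totals, split by
# whether the review text mentions Groupon.
filtered = [120, 9000]      # [mentions Groupon, does not]
totals = [10000, 100000]

# One-sided test: is the filtering rate lower for reviews that
# reference Groupon?
stat, pval = proportions_ztest(filtered, totals, alternative="smaller")
print(f"z = {stat:.2f}, p = {pval:.3g}")
```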
Finally, consider the possibility of a mismatch between the preferences of Groupon users and the businesses whose offers they accept. To look for evidence of this, the authors consider the extent to which reviews associated with Groupon use reveal experimentation on the part of the consumer. This is done by comparing the business category and location to the categories and locations in the reviewer's history. Experimentation is said to occur when the business category or zipcode differs from any in the reviewer's history. The data provide strong support for the hypothesis that individuals are much more likely to be experimenting in this sense when using a Groupon than when not. And such experimentation could plausibly lead to a greater incidence of disappointment.
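The experimentation criterion itself is easy to state in code (a made-up record schema; the authors may handle businesses with multiple categories differently):

```python
def is_experimenting(review, history):
    """True if the reviewed business's category or zipcode has never
    appeared in the reviewer's prior review history.

    `review` is a dict with 'category' and 'zipcode' keys, and
    `history` is a list of such dicts (a hypothetical schema).
    """
    past_categories = {r["category"] for r in history}
    past_zipcodes = {r["zipcode"] for r in history}
    return (review["category"] not in past_categories
            or review["zipcode"] not in past_zipcodes)
```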
This point deserves further elaboration. Even without experimentation on categories or locations, an individual who accepts a daily deal has a lower anticipated valuation for the product or service than someone who pays full price. Even if the expectations of both types of buyers are met, and each feels that they have gotten what they paid for, there will be differences in the ratings they assign. To take an extreme case, if the product were available for free, many buyers would emerge who consider the product essentially worthless, and would rate it accordingly even if their expectations are met.
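This selection effect can be seen in a toy model of my own construction (not from the paper): valuations are uniform, a consumer buys only if their valuation covers the price, and ratings simply reflect valuations, so expectations are always met. Lowering the price still lowers the average rating, purely through who chooses to buy:

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_rating(price, n=100_000):
    """Average star rating among buyers at a given price, in a toy
    model where valuations are uniform on [0, 1], a consumer buys
    iff valuation >= price, and stars are a linear map of valuation.
    """
    valuations = rng.uniform(0, 1, n)
    buyers = valuations[valuations >= price]
    return (1 + 4 * buyers).mean()  # map [0, 1] onto 1-5 stars

for price in (0.8, 0.4, 0.0):       # full price, half off, free
    print(f"price={price:.1f}  mean rating={mean_rating(price):.2f}")
# roughly 4.6, 3.8, and 3.0: the free product gets the worst ratings
```

At a price of zero every consumer buys, including those who value the product at essentially nothing, which is exactly the extreme case described above.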
There may be a lesson here for companies contemplating Groupon campaigns. Perhaps the Yelp rating would suffer less damage if the discount itself were not as steep. At present there is very little variation in discounts, which are mostly clustered around 50%, so there is no way to check whether smaller discounts actually result in better ratings. But it certainly seems worth exploring, at least for businesses that depend on strong ratings to thrive.
The Groupon strategy of prioritizing growth above earnings has been criticized on the grounds that there are few barriers to entry in this industry, and no network externalities that can protect an incumbent from competition. But if the link between campaigns and ratings can't be broken, there may be deeper problems with the business model than a change of leadership or strategy can solve.