Like Karl Broman said in his answer, a Bayesian approach would likely be a lot better than using confidence intervals.
The Problem With Confidence Intervals
Why might using confidence intervals not work too well? Suppose you rank items by the lower bound of a confidence interval around their mean rating. If an item has only a few ratings, its confidence interval will be very wide, so its lower bound will be low. As a result, items without many ratings end up at the bottom of your list, even when the ratings they do have are good.
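To make this concrete, here is a minimal Python sketch that ranks items by a normal-approximation lower bound, the mean minus 1.96 standard errors. The choice of this particular interval and the rating lists are illustrative assumptions. They are not taken from the question.

```python
from math import sqrt
from statistics import mean, stdev

def lower_bound(ratings, z=1.96):
    """Lower end of a normal-approximation confidence interval for the mean rating."""
    se = stdev(ratings) / sqrt(len(ratings))  # standard error of the mean
    return mean(ratings) - z * se

# Hypothetical items: A has the higher average but only two ratings.
item_a = [5, 3]
item_b = [4] * 50 + [3] * 50

print("A:", mean(item_a), round(lower_bound(item_a), 2))
print("B:", mean(item_b), round(lower_bound(item_b), 2))
```

Item A has the higher average (4 vs 3.5) but the much lower bound (about 2.04 vs about 3.40). A lower-bound ranking therefore puts A below B.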
Intuitively, however, you probably want items without many ratings to be near the average item. In other words, you want to shrink your estimated rating of the item toward the mean rating over all items (i.e., push your estimated rating toward a prior). This is exactly what a Bayesian approach does.
Bayesian Approach I: Normal Distribution over Ratings
One way of moving the estimated rating toward a prior is, as in Karl's answer, to use an estimate of the form $w R + (1 - w) C$:
- $R$ is the mean of the ratings for the item.
- $C$ is the mean rating over all items (or whatever prior you want to shrink your rating toward).
- Note that the formula is just a weighted combination of $R$ and $C$.
- $w = \frac{v}{v + m}$ is the weight assigned to $R$, where $v$ is the number of ratings for the item and $m$ is a constant "threshold" parameter.
- Note that when $v$ is very large (i.e., when we have many ratings for the current item), $w$ is very close to 1, so our estimated rating is very close to $R$ and we pay little attention to the prior $C$. When $v$ is small relative to $m$, however, $w$ is close to 0, so the estimated rating places most of its weight on the prior $C$.
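Here is a minimal sketch of this weighted estimate with two hypothetical items. The prior mean C = 3.0 and threshold m = 5 are arbitrary illustrative values. In practice C would be the mean over all items and m would be tuned.

```python
def shrunk_rating(ratings, prior_mean, m):
    """Weighted estimate w*R + (1 - w)*C with w = v / (v + m)."""
    v = len(ratings)                        # number of ratings for this item
    r = sum(ratings) / v if v else 0.0      # item's own mean rating R (unused when v == 0)
    w = v / (v + m)                         # weight on R; approaches 1 as v grows
    return w * r + (1 - w) * prior_mean     # falls back to C when there are no ratings

item_a = [5, 3]                 # few ratings, mean 4.0
item_b = [4] * 50 + [3] * 50    # many ratings, mean 3.5

for name, ratings in [("A", item_a), ("B", item_b)]:
    print(name, round(shrunk_rating(ratings, prior_mean=3.0, m=5), 3))
```

With these values, A's estimate is pulled from 4.0 down to about 3.29. B has 100 ratings and barely moves, from 3.5 to about 3.48. An item with no ratings gets exactly C.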
This estimate can, in fact, be given a Bayesian interpretation. It is the posterior mean of the item's true mean rating when individual ratings come from a normal distribution centered on that mean, and the mean itself has a normal prior centered on $C$.
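The missing step is the standard normal-normal calculation. Assume each rating is $x_j \sim N(\mu, \sigma^2)$ with known $\sigma^2$, and the prior is $\mu \sim N(C, \tau^2)$. The variances $\sigma^2$ and $\tau^2$ are symbols introduced here for the derivation. Under these assumptions, the posterior mean is:

$$
E[\mu \mid x_1, \ldots, x_v] = \frac{\frac{v}{\sigma^2} R + \frac{1}{\tau^2} C}{\frac{v}{\sigma^2} + \frac{1}{\tau^2}} = \frac{v}{v + m} R + \frac{m}{v + m} C, \qquad m = \frac{\sigma^2}{\tau^2}.
$$

Under this model, the threshold $m$ is the ratio of the rating variance to the prior variance. A tighter prior (smaller $\tau^2$) gives a larger $m$ and more shrinkage toward $C$.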
However, assuming that ratings come from a normal distribution has two problems:
- A normal distribution is continuous, but ratings are discrete.
- Ratings for an item don't necessarily follow a unimodal Gaussian shape. For example, maybe your item is very polarizing, so people tend to either give it a very high rating or give it a very low rating.
Bayesian Approach II: Multinomial Distribution over Ratings
So instead of assuming a normal distribution for ratings, let's assume a multinomial distribution. That is, given some specific item, there's a probability $p_1$ that a random user will give it 1 star, a probability $p_2$ that a random user will give it 2 stars, and so on.
Of course, we have no idea what these probabilities are. As we get more and more ratings for this item, we can guess that $p_1$ is close to $n_1 / n$, where $n_1$ is the number of users who gave it 1 star and $n$ is the total number of users who rated the item. When we first start out, however, we have nothing. So we place a Dirichlet prior $\mathrm{Dir}(\alpha_1, \ldots, \alpha_k)$ on these probabilities.
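This prior is convenient because it is conjugate to the multinomial. Combining a Dirichlet prior with observed counts $n_1, \ldots, n_k$ gives another Dirichlet, so updating amounts to adding the counts to the parameters:

$$
p \mid n_1, \ldots, n_k \sim \mathrm{Dir}(\alpha_1 + n_1, \ldots, \alpha_k + n_k), \qquad E[p_i \mid n_1, \ldots, n_k] = \frac{\alpha_i + n_i}{\sum_{j=1}^{k} \alpha_j + n}.
$$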
What is this Dirichlet prior? We can think of each $\alpha_i$ parameter as a "virtual count" of the number of times some virtual person gave the item $i$ stars. For example, if $\alpha_1 = 2$, $\alpha_2 = 1$, and all the other $\alpha_i$ are equal to 0, then we can think of this as saying that two virtual people gave the item 1 star and one virtual person gave the item 2 stars. So before we even get any actual users, we can use this virtual distribution to provide an estimate of the item's rating.
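For that example, where the only virtual ratings are 1 and 2 stars, the prior alone gives an expected rating of:

$$
\frac{1 \cdot \alpha_1 + 2 \cdot \alpha_2}{\alpha_1 + \alpha_2} = \frac{1 \cdot 2 + 2 \cdot 1}{2 + 1} = \frac{4}{3} \approx 1.33 \text{ stars}.
$$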
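[One way of choosing the $\alpha_i$ parameters would be to set $\alpha_i$ equal to the overall proportion of votes of $i$ stars. Since these proportions sum to 1, this amounts to a single virtual rating spread across the star levels. (Note that the $\alpha_i$ parameters aren't necessarily integers.)]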
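A minimal sketch of that choice follows, assuming a 5-star scale and hypothetical site-wide vote totals. The `strength` multiplier is a hypothetical knob added for illustration. With `strength=1.0` the alphas sum to 1, exactly as described above. Larger values make the prior act like more virtual ratings.

```python
def dirichlet_prior_from_totals(star_totals, strength=1.0):
    """Set alpha_i proportional to the overall share of i-star votes.

    star_totals[i] is the number of (i+1)-star votes across all items.
    strength (hypothetical) is the value the alphas sum to.
    """
    total = sum(star_totals)
    return [strength * count / total for count in star_totals]

# Hypothetical site-wide counts of 1..5 star votes.
overall = [100, 150, 300, 250, 200]
alpha = dirichlet_prior_from_totals(overall)
print(alpha)
```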
Then, once actual ratings come in, add their counts to the virtual counts of your Dirichlet prior. Whenever you want to estimate the item's rating, take the mean over all of its ratings, both virtual and actual.
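Here is a minimal sketch of that update-and-average step, assuming a 5-star scale. The prior values and observed counts are hypothetical.

```python
def posterior_mean_rating(alpha, counts):
    """Mean star rating over virtual (alpha) plus actual (counts) ratings.

    alpha[i] and counts[i] both refer to (i+1)-star ratings.
    """
    combined = [a + n for a, n in zip(alpha, counts)]  # Dirichlet posterior parameters
    total = sum(combined)
    return sum((i + 1) * c for i, c in enumerate(combined)) / total

# Hypothetical prior (sums to 1, i.e. one virtual rating) and observed counts.
alpha = [0.1, 0.15, 0.3, 0.25, 0.2]
no_ratings = [0, 0, 0, 0, 0]
two_fives = [0, 0, 0, 0, 2]

print(round(posterior_mean_rating(alpha, no_ratings), 3))
print(round(posterior_mean_rating(alpha, two_fives), 3))
```

With this weak prior, an unrated item gets the prior mean of 3.3 stars. Two 5-star ratings move it to about 4.43. Scaling the alphas up to a larger total would keep the estimate closer to 3.3.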