Anything with user-submitted ratings can be sorted by those ratings. But what is the best way? The naive approach is to sort by average rating. Unfortunately, this would rank a single 5-star rating from one user above a 4.5-star average from 150 users.
This guy has an answer. In short, his recommendation is to sort by the lower bound of the 95% Wilson score confidence interval for the fraction of positive ratings. The interval is given by the following, where the minus sign gives the lower bound:
[pmath] {hat{ p }+{ z^2_{alpha slash 2}} / { 2 n } pm z_{alpha slash 2} sqrt{ { [ hat{p} (1 - hat{p}) + z^2_{alpha slash 2}/{4n}]}/n}} / {1+z^2_{ alpha slash 2 }/n} [/pmath]
where
- [pmath]hat{p}[/pmath] is the observed fraction of positive ratings
- [pmath]z_{alpha slash 2}[/pmath] is the [pmath]1-alpha slash 2[/pmath] quantile of the standard normal distribution
- [pmath]n[/pmath] is the total number of ratings
Or, in Ruby:
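require 'statistics2'

# Lower bound of the Wilson score confidence interval for the
# fraction of positive ratings.
def ci_lower_bound(pos, n, power)
  # No ratings at all: nothing to estimate, so rank at the bottom.
  if n == 0
    return 0
  end
  # z is the (1 - power/2) quantile of the standard normal distribution.
  z = Statistics2.pnormaldist(1 - power/2)
  # Observed fraction of positive ratings (1.0* forces float division).
  phat = 1.0*pos/n
  (phat + z*z/(2*n) - z * Math.sqrt((phat*(1-phat) + z*z/(4*n))/n)) / (1 + z*z/n)
end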
where
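- pos is the number of positive ratings
- n is the total number of ratings
- power is, despite its name, the significance level alpha of the interval rather than the statistical power (0.05 recommended, which gives a 95% confidence interval)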
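To see how the lower bound changes the ranking, here is a minimal sketch that needs no gem. It assumes z is fixed at roughly 1.96 (the 0.975 quantile of the standard normal, i.e. a 95% interval); the helper name wilson_lower_bound and the sample items are made up for illustration.

```ruby
# Hypothetical helper: same formula as above, but with z hard-coded
# to the 0.975 normal quantile so that no gem is required.
Z_95 = 1.959964

def wilson_lower_bound(pos, n, z = Z_95)
  return 0.0 if n == 0
  phat = pos.to_f / n
  (phat + z * z / (2 * n) -
    z * Math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)
end

# Made-up sample data: [label, positive ratings, total ratings]
items = [
  ["one perfect rating", 1, 1],
  ["135 of 150 positive", 135, 150],
  ["no ratings yet", 0, 0]
]

# Sort by descending lower bound, then print each item with its score.
ranked = items.sort_by { |_label, pos, n| -wilson_lower_bound(pos, n) }
ranked.each do |label, pos, n|
  puts format("%-22s %.3f", label, wilson_lower_bound(pos, n))
end
```

With these numbers the single perfect rating scores about 0.21, while 135 positive ratings out of 150 score about 0.84, so the well-rated item with many votes sorts first.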
Aidan Findlater Impersonal ruby, statistics, web development
Summary: Attached is a pure Ruby implementation of the AS66 algorithm (Hill 1973), ported from the Fortran code available here. It estimates the integral of the normal distribution, defaulting to the area under the right tail.
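For a quick sanity check of any such routine, the right-tail area can also be computed from Ruby's built-in Math.erfc. To be clear, this is not the AS66 algorithm, just an independent sketch to compare results against; the name normal_tail and its upper: keyword are assumptions made for this example.

```ruby
# Not AS66: a cross-check based on the complementary error function.
# normal_tail and the upper: keyword are hypothetical names.
def normal_tail(z, upper: true)
  # P(Z > z) for a standard normal Z
  right = 0.5 * Math.erfc(z / Math.sqrt(2.0))
  upper ? right : 1.0 - right
end

puts normal_tail(1.96)                # area under the right tail beyond 1.96
puts normal_tail(1.96, upper: false)  # area to the left of 1.96
```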
Aidan Findlater Impersonal bioinformatics, research, ruby, statistics
This macro calculates the upper tail of the cumulative hypergeometric distribution, i.e. the probability of getting at least the given number of successes, using Excel’s built-in HypGeomDist function:
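Public Function CumHypGeom(sample_s As Long, number_sample As Long, _
                           population_s As Long, number_pop As Long) As Double
    ' Returns the cumulative hypergeometric distribution (i.e. p-value):
    ' the probability of drawing at least sample_s successes.
    Dim RetVal As Double
    Dim i As Long
    Dim upper As Long
    ' HypGeomDist rejects success counts above min(number_sample, population_s),
    ' so stop the sum there.
    upper = WorksheetFunction.Min(number_sample, population_s)
    RetVal = 0
    For i = sample_s To upper
        RetVal = RetVal + WorksheetFunction.HypGeomDist(i, number_sample, population_s, number_pop)
    Next i
    CumHypGeom = RetVal
End Function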
To use it, go to Tools > Macro > Visual Basic Editor, then Insert > Module, and paste the above code into the box. Save, return to Excel (using the “View Microsoft Excel” button), and then you can call it like any other function.
The function’s parameters are the same as the built-in HypGeomDist function. For a population of red and blue balls, where red balls are considered a success, they are:
- sample_s: number of successes (i.e. red balls) in the sample
- number_sample: sample size (i.e. number of balls drawn from the total population)
- population_s: number of successes (i.e. red balls) in the population
- number_pop: population size (i.e. total number of both red and blue balls)
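To check the macro's output outside Excel, the same sum can be sketched in Ruby with exact integer binomial coefficients. This is only an illustrative cross-check: the names choose, hyp_geom_pmf and cum_hyp_geom are made up here, and the parameter order simply mirrors the list above.

```ruby
# Exact binomial coefficient C(n, k) using integer arithmetic only.
def choose(n, k)
  return 0 if k < 0 || k > n
  (1..k).reduce(1) { |acc, i| acc * (n - k + i) / i }
end

# P(exactly sample_s successes in the sample), as an exact Rational.
def hyp_geom_pmf(sample_s, number_sample, population_s, number_pop)
  Rational(
    choose(population_s, sample_s) *
      choose(number_pop - population_s, number_sample - sample_s),
    choose(number_pop, number_sample)
  )
end

# P(at least sample_s successes): sum the point probabilities upward.
def cum_hyp_geom(sample_s, number_sample, population_s, number_pop)
  upper = [number_sample, population_s].min
  (sample_s..upper).sum do |i|
    hyp_geom_pmf(i, number_sample, population_s, number_pop)
  end.to_f
end

# Made-up example: 50 balls, 5 of them red, draw 10, at least 1 red.
puts cum_hyp_geom(1, 10, 5, 50)
```

For this made-up example the result is roughly 0.69, which should match =CumHypGeom(1, 10, 5, 50) in the worksheet.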
Edit: I’d just like to state for the record that this is probably the least efficient way to do the calculation, but it was the easiest to program. In the end, I sacrificed CPU cycles to conserve brain cycles.
Aidan Findlater Impersonal bioinformatics, excel, statistics