Social Followed Recommendations.

ran88dom99

ran88dom99 8 years, 7 months ago at Nov 7 0:39 -

How do I find people with similar tastes again?

I am still working on a program that will do the same for critics. TL;DR: 'Focus' is much easier and more reliable than actually comparing ratings, if not as useful. And I need statistics help.

I made a spreadsheet with a bunch of ratings of games (26k) by publications (330). When the sheet picks publications by the percentage of their reviews that cover games I also scored (how much a pub 'Focuses' on my games), the outputs show a strong genre bias not explainable by chance. Example: if the user inputs a bunch of strategy games, then the one publication that only ever reviews strategy games and no other genre gets selected first. I input my favorite 4X games and the output shows strategy and RPG games getting a 4% better score than other genres, and 'Most Misrated' favors indie, RPG and strategy and dislikes FPS. While there is plenty of silliness in video game journalism, when I try correlation and other comparisons against the numbers of my scores, the outputs tend to be much less certain. Here is an old video.
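A minimal sketch of the 'Focus' measure described above, assuming it is simply the share of a publication's reviews that cover games I also scored. The publication names and game lists are made up for illustration.

```python
def focus(pub_games, my_games):
    """Share of a publication's reviews that cover games I also scored."""
    pub_games = set(pub_games)
    if not pub_games:
        return 0.0
    return len(pub_games & set(my_games)) / len(pub_games)


# Hypothetical data: publication name -> games it reviewed.
reviews = {
    "StrategyOnlyMag": {"Civ IV", "Master of Orion", "Endless Space"},
    "GeneralMag": {"Civ IV", "Doom", "FIFA", "Halo", "Portal", "Endless Space"},
}
my_games = {"Civ IV", "Master of Orion", "Endless Space", "Alpha Centauri"}

# Publications sorted from most to least focused on my games.
ranking = sorted(reviews, key=lambda p: focus(reviews[p], my_games), reverse=True)
```

Here the strategy-only publication ranks first even though it says nothing about whether its scores agree with mine, which is the genre bias described above.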

First, an algorithm selects publications for agreement with my opinions using either average absolute difference ('AAD': my score minus the publication's score, drop the minus sign if there is one, then average all of these. Ex: I rate Scribblenauts 90 and the publication rates it 80, while I rate Darwinia 60 but the publication rates it 80, for an AAD of 15 points) or correlation. On top of that it can require at least X games in common, use 'Focus' as above, or penalize the final score by the average deviation of each missed game. Those last three parts are there to try to eliminate publications that got a great correlation or average deviation by accident. If I tweak the weight of each input of the algorithm enough, the intermediary metrics (a bunch of comparisons between the average of all publications 'MC', my personal scores, and the average of the selected publications 'PMC') look amazing, but the actual final output is often pretty bad. I know how to calculate the certainty of a correlation being significantly different from 0, but how do I calculate the certainty of one correlation being greater than another if one data set is used in both but the data points in common vary? The same question goes for the AAD.
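A rough sketch of the AAD calculation, plus one possible way to attack the "is A really closer to me than B" question: a bootstrap that resamples my games with replacement and counts how often publication A ends up with the lower AAD. The bootstrap is my assumption about a workable approach, not an established answer to the question, and the function names and parameters (`rounds`, `min_common`, `seed`) are hypothetical.

```python
import random


def aad(mine, pub):
    """Average absolute difference over games both rated, plus the overlap size."""
    common = [g for g in mine if g in pub]
    if not common:
        return None, 0
    return sum(abs(mine[g] - pub[g]) for g in common) / len(common), len(common)


def prob_a_closer(mine, pub_a, pub_b, rounds=2000, min_common=2, seed=0):
    """Fraction of bootstrap resamples of my games where pub_a has the lower AAD."""
    rng = random.Random(seed)
    games = list(mine)
    wins = valid = 0
    for _ in range(rounds):
        sample = [rng.choice(games) for _ in games]
        da = [abs(mine[g] - pub_a[g]) for g in sample if g in pub_a]
        db = [abs(mine[g] - pub_b[g]) for g in sample if g in pub_b]
        # Skip resamples where either publication has too few games in common.
        if len(da) < min_common or len(db) < min_common:
            continue
        valid += 1
        if sum(da) / len(da) < sum(db) / len(db):
            wins += 1
    return wins / valid if valid else None


# The example from the post: differences of 10 and 20 average to 15.
mine = {"Scribblenauts": 90, "Darwinia": 60}
pub = {"Scribblenauts": 80, "Darwinia": 80}
example_aad, overlap = aad(mine, pub)
```

A result near 0.5 from `prob_a_closer` would suggest the two publications are not distinguishable on the games I rated. The same resampling loop could compare correlations instead of AAD by swapping the statistic computed on each resample.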

The main test of a set of publications is the games they most think deserve a different score from the average consensus of all publications. I call this 'Most Misrated'. Mathematically, it's just the number of selected publications rating the game times the difference between the average of all publications 'MC' and the average of the selected publications 'PMC'.
Ex: I picked a bunch of publications I liked and wanted to see which games they thought were badly judged by all the rest of the publications. Let's say I just take the absolute difference: Gnomoria is rated 70% on average by all publications and 95% by the ones I selected, which gives a big (it is big) 25-point difference. Compare that to Silent Hill HD, rated 80 by the average of all publications and 90 by the ones I selected. However, only one of my selected publications rated Gnomoria while 5 rated Silent Hill HD. That matters, so I multiply by the number of selected publications rating each game to get a 'Most Misrated' of 25 for Gnomoria and 50 for Silent Hill HD.
I know that is just unraveling the averaging and adding up all the PMC-to-MC differences while keeping the sign, but I cannot think of a better way. 'Most Misrated' could be happening by chance, and I want to gauge how much a selection of publications agree with each other based on how high their top 10 'Most Misrated' scores are. That is a long probability theory question. The chance that a game is rated by x pubs, times the chance they all rated in favor or against, times the average deviation, I guess? Also, would 'Most Misrated' be a better indicator of the pubs' opinions if I divided it by (removed the influence of) the average deviation or the number of all publications rating the game?
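The 'Most Misrated' arithmetic above can be sketched like this. The five individual Silent Hill HD scores are invented so that they average 90; only the averages and counts come from the example.

```python
def most_misrated(mc, selected_scores):
    """Number of selected publications times (PMC - MC), sign kept."""
    n = len(selected_scores)
    if n == 0:
        return 0.0
    pmc = sum(selected_scores) / n
    return n * (pmc - mc)


# MC = average of all publications; the lists hold selected publications' scores.
gnomoria = most_misrated(70, [95])                        # 1 pub, 25 points above MC
silent_hill_hd = most_misrated(80, [90, 85, 95, 90, 90])  # 5 pubs, 10 points above MC

# Same thing written as a plain sum of signed differences from MC.
as_sum = sum(s - 80 for s in [90, 85, 95, 90, 90])
```

Written as a sum, it is easier to see that the value grows with the number of selected publications rating the game, so dividing by that count would just give back PMC minus MC.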

The lower the score, the greater the average deviation per game (correlation .5), and the lower the average of a publication, the higher its average deviation (correlation .4). I want to increase the lower scores until those correlations and average deviation differences go away. They mess with 'Most Misrated' among other things. I want a score of 10 to become 30 and a score of 60 to become 65, or something like that. I made a CSV file showing the number of ratings, Median-Average (M-A) and average deviation of all games, grouped by their averages in 5-point intervals. Bar Graph.
Blue is M-A, red is AveDev, yellow is # of reviews.
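One simple way to get "10 becomes 30, 60 becomes 65" is a piecewise-linear stretch through a few anchor points. This is only a sketch: the anchors at 0 and 100 are my own assumptions (0 maps to 23 by extending the 10-to-60 line, and 100 stays 100), and the real anchors would have to be tuned until the correlations with average deviation go away.

```python
from bisect import bisect_right

# (original score, rescaled score). Only (10, 30) and (60, 65) come from the post;
# the end points are assumed.
ANCHORS = [(0, 23), (10, 30), (60, 65), (100, 100)]


def rescale(score, anchors=ANCHORS):
    """Linearly interpolate a score between the surrounding anchor points."""
    xs = [x for x, _ in anchors]
    if score <= xs[0]:
        return float(anchors[0][1])
    if score >= xs[-1]:
        return float(anchors[-1][1])
    i = bisect_right(xs, score) - 1
    (x0, y0), (x1, y1) = anchors[i], anchors[i + 1]
    return y0 + (y1 - y0) * (score - x0) / (x1 - x0)


low, mid, top = rescale(10), rescale(60), rescale(100)  # 30.0, 65.0, 100.0
```

Because every segment slopes upward, the order of the scores is preserved; only the spacing at the low end is compressed.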

Final question! If I input a bunch of scores that are lower than the average for each game, AAD selects publications that on average rate lower than the average for each game. It feels like this affects everything, and such a publication giving a rating equal to one from a publication that does not should actually count for much more. After fixing (is this called normalizing? standardizing?) the problem in the previous paragraph, how should I make each publication's differences from each game's average average out to about 0? Ex: MC rates 90 and the pub rates 85, MC rates 80 and the pub rates 71, for an average difference (AD) of -7.
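A minimal sketch of one way to do this, assuming it is enough to subtract each publication's own mean signed difference from MC (often called centering) from all of its scores. The game names are placeholders; the numbers are the ones from the example.

```python
def pub_bias(pub_scores, mc):
    """Mean signed difference (publication minus MC) over games the pub rated."""
    diffs = [pub_scores[g] - mc[g] for g in pub_scores if g in mc]
    return sum(diffs) / len(diffs) if diffs else 0.0


def debias(pub_scores, mc):
    """Shift all of a publication's scores so its mean difference from MC is 0."""
    bias = pub_bias(pub_scores, mc)
    return {g: s - bias for g, s in pub_scores.items()}


mc = {"Game A": 90, "Game B": 80}   # average of all publications
pub = {"Game A": 85, "Game B": 71}  # one harsh publication

bias = pub_bias(pub, mc)            # (-5 + -9) / 2 = -7.0
centered = debias(pub, mc)          # {"Game A": 92.0, "Game B": 78.0}
```

After centering, the publication sits 2 points above MC on one game and 2 below on the other, so what is left is its opinion relative to its own baseline.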

One last question! I bet, and 'Most Misrated' often implies, that publications prefer and give better ratings to certain genres not yet commonly identified, like 'corridor shooter' and 'indie introspective'. So I want to take my big dataset, run correlations between each pair of games to make a correlation table, and then try to find sets of games that all have good positive correlations with each other. I don't know how to do that last part. I don't even know what branch of statistics deals with this stuff. Please help.
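Finding groups of mutually correlated items usually falls under cluster analysis (factor analysis is a related approach). A rough sketch of the idea, assuming a Pearson correlation between every pair of games over the publications that rated both, followed by a simple greedy grouping where a game joins a group only if it correlates above a threshold with every member. The greedy pass is a stand-in for a proper clustering method, and the games, publications and scores are invented.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; None if either list has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def game_correlations(ratings, min_common=3):
    """ratings: game -> {publication: score}. Returns {(game_a, game_b): r}."""
    table = {}
    for a, b in combinations(sorted(ratings), 2):
        pubs = sorted(set(ratings[a]) & set(ratings[b]))
        if len(pubs) < min_common:
            continue  # too few shared publications to trust the correlation
        r = pearson([ratings[a][p] for p in pubs], [ratings[b][p] for p in pubs])
        if r is not None:
            table[(a, b)] = r
    return table


def greedy_groups(games, table, threshold=0.5):
    """Put each game in the first group where it correlates with every member."""
    def linked(a, b):
        r = table.get((a, b), table.get((b, a)))
        return r is not None and r > threshold

    groups = []
    for g in sorted(games):
        for grp in groups:
            if all(linked(g, m) for m in grp):
                grp.append(g)
                break
        else:
            groups.append([g])
    return groups


# Hypothetical scores: P1 and P2 like shooters, P3 and P4 like indie games.
ratings = {
    "Shooter1": {"P1": 90, "P2": 85, "P3": 60, "P4": 55},
    "Shooter2": {"P1": 88, "P2": 80, "P3": 65, "P4": 50},
    "Indie1": {"P1": 50, "P2": 60, "P3": 90, "P4": 85},
    "Indie2": {"P1": 55, "P2": 65, "P3": 95, "P4": 80},
}
table = game_correlations(ratings)
groups = greedy_groups(ratings, table)
```

With real data it would probably help to remove each publication's overall harshness or leniency first, since a lenient publication rating everything high makes unrelated games look correlated.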

Stehako

Moderator

Stehako 8 years, 7 months ago at Nov 7 15:53 -

You might want to send that directly to the site owner, Tom. His username is Tom (look at the post above yours, about this forum: "Sticky").