Hi all,
As I hate to lose, like most of us here, I started to think that the armor stats were not quite as good for me as for the AI. Seeing repeated crafted armor failures when my opponent keeps succeeding at protecting himself with Plates is something I don't like to see. So I started to compile a list of all the armor (2+, 3+ and 4+) tests in my games. I now have almost a thousand results. Just for fun. And because I hate to lose to a cheating computer. So here are the results.
Armors 2+ (normal success rate is 5 out of 6 / 83.33%): Me: 77%, Him: 92%
Armors 3+ (normal success rate is 4 out of 6 / 66.67%): Me: 60%, Him: 74%
Armors 4+ (normal success rate is 3 out of 6 / 50%): Me: 37% (ouch), Him: 64% (ouch again)
If I calculate the stats including all the result-modifying cards such as Bad Luck or bonuses, it's even worse (meaning my bonuses fail to work more often than his and my maluses trigger more often than his). Anyway... let's go back to the game.
I strongly doubt that the die rolls are rigged in favour of the AI; mostly because I know that it's more work for the developers to make an unfair dice system than it is to make a fair one, and I don't think they'd want to spend time and effort just to make the game lie about important probabilities. On the other hand, a thousand trials should provide decent evidence of whether it's rigged. How did you gather your results? In particular, did you record the result for every time an armour card triggered for every game you played during the testing period, without exception? Can you provide the raw data that you used to get the results you've given? Depending on how many trials you have for each armour type, the 2+ and 3+ results look like they might be within the range of normal statistical error; but the 4+ result is pretty far out, and the fact that all three results seem to favour the AI is suspicious. (If you have around 100 trials on the 4+, the standard deviation is around 5%, so I'd expect the results to be between 45% and 55% most of the time, and between 40% and 60% almost all the time...)
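For anyone who wants to check that back-of-the-envelope estimate: the standard deviation of an observed success rate over n independent rolls with true success chance p is sqrt(p(1-p)/n). Here is a minimal Python sketch of that arithmetic. The function name is made up for illustration, and the 100 trials is the assumed figure from the post above, not a recorded count.

```python
import math

def success_rate_sd(p, n):
    """Standard deviation of the observed success rate over n independent rolls."""
    return math.sqrt(p * (1 - p) / n)

# 4+ armour (fair chance p = 0.5) with an ASSUMED 100 trials
sd = success_rate_sd(0.5, 100)
print(f"sd = {sd:.1%}")
print(f"within 1 sd: {0.5 - sd:.0%} to {0.5 + sd:.0%}")
print(f"within 2 sd: {0.5 - 2 * sd:.0%} to {0.5 + 2 * sd:.0%}")
```

Roughly 68% of fair samples should land within one standard deviation and roughly 95% within two, which is where "most of the time" and "almost all the time" come from.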
Hello karadoc,
I simply used pen and paper with columns, and I recorded every single success or failure. I have about 300 results for 2+ and 3+, about 400 for 4+. Being a bit familiar with the videogames world (and strategic games), I must say that I wouldn't be surprised if there was a tweak of the stats to help the AI (which is only that, an AI, and therefore sometimes needs help to simulate higher levels and difficulties).
I have to say I'm really surprised by these results. The battle logic has no idea which players are human and which are AI.
Well, for this particular game I would be surprised; because the probabilities are made so explicit, and they play a critical role in the player's decision making. I think it would be appalling if they were tilted in favour of the AI without any hint or warning given to the player. UIs that blatantly lie to the player are bad, and rigged dice rolls for armour would certainly fall into that category. [edit] Believe it or not, I wrote my message before Farbs posted, but I didn't press send for ages. Based on what Farbs said, the results are probably a statistical fluke - although I'd estimate the probability of results like that would be less than 1/1000 if there was nothing wrong with the data collection. (I haven't calculated the actual probability of it being a fluke - I'm estimating based on what I think the standard deviation would be and so on.)
Well, really, it wouldn't be the first time (we have all played Civilization and seen strange results at high levels, right?). Anyway, the best way to confirm that is to build your own spreadsheet and see how it goes. I had good streaks and bad ones, but overall the AI wins.
I've had quite a lot of discussions with people about randomness in Civ4 (because I've got a lot of modding experience in that game). It's somewhat natural to suspect that it's rigged, because of attentional bias and stuff like that, but it actually isn't. Normally I'd just assume that's what was going on here - but if you really have that data, then attentional bias apparently is not the explanation!
OK, so I ran some quick statistical tests. I used his estimates for sample size (300 for 2+ and 3+ and 400 for 4+) and used that to estimate the number of "successes" for each situation.
For The Claw:
For 2+, z=-2.94, p=.001
For 3+, z=-2.45, p=.007
For 4+, z=-5.2, p=9.98E-8 (holy crap)
For the AI (I forgot to write down the p values for two of these):
For 2+, z=4.03
For 3+, z=2.69
For 4+, z=5.6 (p=1.07E-8)
Obviously these aren't conclusive, because I'm assuming a sample size that might not be accurate, but they certainly seem to indicate that something hinky is going on. If you're unfamiliar, z is the number of standard deviations away from expected the result is (big=strange), and p is the probability of getting a result at least that extreme from fair dice by random chance alone (small=strange). For most purposes, a p less than .01 is sufficient to indicate a statistically significant effect. I'm glossing over a lot of details because we're not trying to cure cancer here. I should note here that I don't mean to conclude there's bias in the die rolls, just that something is strange.
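The figures above are consistent with a one-proportion z-test using a normal approximation, so here is a minimal Python sketch that anyone can rerun with different sample sizes. It is an assumption that this is the exact test used; the function name and layout are made up for illustration, and the sample sizes are The Claw's rough estimates applied to each side, not exact counts.

```python
import math

def z_test(observed_rate, expected_rate, n):
    """One-proportion z-test against the fair success rate.

    Returns (z, p), where p is the one-sided tail probability of a result
    at least this far from expected_rate in the same direction.
    """
    se = math.sqrt(expected_rate * (1 - expected_rate) / n)
    z = (observed_rate - expected_rate) / se
    p = 0.5 * math.erfc(abs(z) / math.sqrt(2))  # normal tail area beyond |z|
    return z, p

# (armour, fair rate, ASSUMED n per side, player rate, AI rate) from the first post
cases = [
    ("2+", 5 / 6, 300, 0.77, 0.92),
    ("3+", 4 / 6, 300, 0.60, 0.74),
    ("4+", 3 / 6, 400, 0.37, 0.64),
]

for armour, expected, n, player, ai in cases:
    for who, rate in (("player", player), ("AI", ai)):
        z, p = z_test(rate, expected, n)
        print(f"{armour} {who}: z={z:+.2f}, p={p:.2e}")
```

Changing the `n` values shows how sensitive the conclusion is to the sample size: halving `n` shrinks every z by a factor of about 1.41.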
I did this for holy armor a while back because I was noticing very poor performance for a long time, and I mean like 8-11 fails and one success in a battle. So I did the numbers: at a sample size of 600 I was 26% off, but by the time I was up to 1200 it was 4.8%. Sample size really matters a lot when it comes to RNG. That said, obviously the thread is important and the conversation is valid; I just wanted to drop that information.
Just FYI, your sample size is pretty small. Saying "almost a thousand" is a bit misleading. You have 6 different sample populations: you and the AI for 3 different cases. At 1000, that's only about 160 samples per population.
Not really necessary to look at it that way, Jotun, as we're mostly interested in the difference between player results and AI results, regardless of armor quality. You could simply start counting pips for player rolls and AI rolls - if there's a significant discrepancy it doesn't matter what armor the units are using, and if our numbers are like The Claw's numbers and there are way more AI pips, the dice are indeed rigged. In an hour or two of recording pips we should have a significant result if his numbers are right... I suspect they are, and there's some kind of a thumb on the scale with some or all armor rolls.
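A rough way to score the pip-counting idea, as a minimal Python sketch: compare the average pips per roll on each side, using the known variance of a fair six-sided die (35/12) to judge how big a gap is surprising. This assumes the game shows the actual die face for every roll; the function name is hypothetical and the rolls below are made-up placeholders, not recorded data.

```python
import math

DIE_VARIANCE = 35 / 12  # variance of a single fair six-sided die

def pip_gap_z(player_rolls, ai_rolls):
    """z-score for the gap between AI and player mean pips, assuming fair d6 rolls."""
    mean_player = sum(player_rolls) / len(player_rolls)
    mean_ai = sum(ai_rolls) / len(ai_rolls)
    # Standard error of the difference between two independent sample means
    se = math.sqrt(DIE_VARIANCE * (1 / len(player_rolls) + 1 / len(ai_rolls)))
    return (mean_ai - mean_player) / se

# PLACEHOLDER rolls for illustration only; replace with the pips you record
player = [1, 3, 2, 6, 4, 2, 5, 1]
ai = [4, 6, 3, 5, 2, 6, 5, 4]
print(f"z = {pip_gap_z(player, ai):+.2f}")
```

A positive z means the AI is averaging more pips than the player. With only a handful of rolls even a large-looking gap is not significant, so the value only becomes meaningful after a few hundred recorded rolls per side.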
Ah, I think I misinterpreted the sample size. The Claw, when you said that you have about 300 results for 2+ (for example) did you mean 300 each for you and the AI, or roughly 150 each? If it's the second case I'll need to recalculate, and it will affect the findings a bit. Even so, I think especially the 4+ result is really notable.