Anyone good at stats?
Not sure if this is the right place to start a thread, but I have a stats question that is beyond my level of competence, and I wondered if anyone could point me in the right direction, please?
Let's say I told you the outcome of an event follows a normal distribution, and I tell you the range of results that equates to 1 standard deviation either side of the mean, i.e. the range of numbers within which the actual results will fall ~68% of the time.
How many actual instances would you need to see to be 95% certain that I had told you the correct range for 1 SD? Is there a formula or a website I can use to help calculate this?
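One way to make the question concrete (an assumption, not something the thread settles on) is to ask how many observations you need before the sample SD lands within some relative margin of the true SD at 95% confidence. A minimal sketch, using the standard large-sample approximation SE(s) ≈ σ/√(2(n−1)) for normally distributed data; the function names are hypothetical:

```python
import math
from statistics import NormalDist


def one_sd_coverage() -> float:
    """Fraction of a normal distribution within mean ± 1 SD (about 0.6827)."""
    std = NormalDist()  # standard normal: mu=0, sigma=1
    return std.cdf(1.0) - std.cdf(-1.0)


def n_to_pin_down_sd(rel_margin: float, confidence: float = 0.95) -> int:
    """Approximate sample size so the sample SD is within ±rel_margin
    (as a fraction of the true SD) at the given confidence.

    Assumes the data really are normal and uses the large-sample
    approximation SE(s) ≈ sigma / sqrt(2 * (n - 1)).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # ≈ 1.96 for 95%
    # Solve z / sqrt(2 * (n - 1)) <= rel_margin for n, rounding up.
    return math.ceil(1 + 0.5 * (z / rel_margin) ** 2)


if __name__ == "__main__":
    print(f"coverage of ±1 SD: {one_sd_coverage():.4f}")
    for margin in (0.20, 0.10, 0.05):
        print(f"SD within ±{margin:.0%}: n ≈ {n_to_pin_down_sd(margin)}")
```

The required n grows with the square of 1/margin, so halving the tolerance roughly quadruples the number of observations.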
paul.deitch said:
Maybe this will help you.
https://qlutch.com/marketing-tips/determine-statis...
Thanks, but I don't think that applies, as those events are not normally distributed (although I agree it is also useful to know the sample size needed in those scenarios to understand the reliability of a survey).
You're looking for a normality test, to check your theory that, given x results, 68% of your data fits a bell (normal) distribution at 95% confidence.
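A full normality test (for example Shapiro–Wilk, available as `scipy.stats.shapiro`) checks the overall shape of the data. The narrower claim here, that about 68% of results fall inside the stated range, can instead be checked with a confidence interval for a proportion. A minimal sketch using the Wilson score interval (a swapped-in technique, not the normality test itself); `low`/`high` are the hypothetical bounds of the claimed 1 SD range:

```python
import math
from statistics import NormalDist


def wilson_interval(inside: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for the binomial proportion inside / n."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = inside / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def coverage_consistent(results: list[float], low: float, high: float,
                        target: float = 0.6827, confidence: float = 0.95) -> bool:
    """True if the share of results in [low, high] is compatible with `target`."""
    inside = sum(low <= x <= high for x in results)  # count of results in range
    lo, hi = wilson_interval(inside, len(results), confidence)
    return lo <= target <= hi
```

With 100 results the interval is still roughly ±9 percentage points wide, which gives a feel for how many results are needed before a wrong range can be ruled out.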
Rule of thumb: 68% inside leaves ~1/3 outside. Say something happens (or doesn't) every day of the week: in one week you'd need four individual days' data points to fit your assumed distribution.
This assumes that the data follows a bell (normal) distribution, and that you are 100% correct in that assumption.
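To see how noisy a single week of data is, here is a sketch of the binomial distribution of how many of 7 days fall within ±1 SD, assuming the days are independent and the data really are normal (`week_pmf` is a hypothetical name):

```python
from math import comb
from statistics import NormalDist


def week_pmf(days: int = 7) -> list[float]:
    """P(exactly k of `days` results fall within ±1 SD), for k = 0..days.

    Assumes independent days and a true normal distribution.
    """
    p = NormalDist().cdf(1) - NormalDist().cdf(-1)  # ≈ 0.6827
    return [comb(days, k) * p**k * (1 - p) ** (days - k) for k in range(days + 1)]


if __name__ == "__main__":
    for k, prob in enumerate(week_pmf()):
        print(f"{k} of 7 days inside ±1 SD: {prob:.3f}")
```

The expected count is about 4.8 days, but the probability is spread over several values of k, so one week on its own says little about whether the stated range is right.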
Read some papers on the Vysochanskij–Petunin inequality for more detailed methods than the above "rule of thumb".
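For reference, the Vysochanskij–Petunin inequality says that for any unimodal distribution, P(|X − μ| ≥ λσ) ≤ 4/(9λ²) when λ > √(8/3) ≈ 1.63. A minimal sketch comparing it with Chebyshev's bound (any distribution with finite variance) and the exact normal tail; note this form says nothing useful at exactly 1 SD:

```python
import math
from statistics import NormalDist


def vp_bound(lam: float) -> float:
    """Vysochanskij–Petunin bound on P(|X - mu| >= lam * sigma), unimodal X.

    This form is only valid for lam > sqrt(8/3) ≈ 1.633.
    """
    if lam <= math.sqrt(8 / 3):
        raise ValueError("this form of the bound needs lam > sqrt(8/3)")
    return 4 / (9 * lam**2)


def chebyshev_bound(lam: float) -> float:
    """Chebyshev bound on the same probability (finite variance, lam > 0)."""
    return min(1.0, 1 / lam**2)


def normal_tail(lam: float) -> float:
    """Exact two-sided tail probability if X really is normal."""
    return 2 * (1 - NormalDist().cdf(lam))


if __name__ == "__main__":
    for lam in (2.0, 3.0):
        print(f"lam={lam}: normal {normal_tail(lam):.4f}, "
              f"VP <= {vp_bound(lam):.4f}, Chebyshev <= {chebyshev_bound(lam):.4f}")
```

The ordering normal < VP < Chebyshev shows what each extra assumption (unimodal, then normal) buys you.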
This is a pretty fun problem, and not one I’ve a great answer for. But
You want to prove, with 95% confidence, that you know the SD of a Gaussian distribution.
This seems very analogous to AQL (acceptable quality level), used for statistical sampling of parameters in a manufacturing process; I’d look at that maths as a starting point.
There are tables that tell you how many pass/fail samples you need to test to get a given confidence that your parts meet a certain criterion; the problem seems very similar to what you describe.
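Pass/fail sampling rests on binomial reasoning. A minimal sketch of two standard building blocks (not an AQL table itself, which comes from standards such as ISO 2859-1): the zero-failure "success run" sample size, and the acceptance probability of a single sampling plan. Applying this to the original question means treating each result as "inside the stated 1 SD range" or not, which is an assumption; the function names are hypothetical:

```python
import math


def zero_failure_sample_size(reliability: float, confidence: float = 0.95) -> int:
    """Samples that must all pass to claim `reliability` at `confidence`.

    If the true pass rate were only `reliability`, the chance of n straight
    passes is reliability**n; require that to be at most 1 - confidence.
    """
    return math.ceil(math.log(1 - confidence) / math.log(reliability))


def acceptance_probability(n: int, c: int, defect_rate: float) -> float:
    """P(accept) for a single sampling plan: test n items, accept if <= c fail."""
    return sum(math.comb(n, k) * defect_rate**k * (1 - defect_rate) ** (n - k)
               for k in range(c + 1))


if __name__ == "__main__":
    print(zero_failure_sample_size(0.95))  # samples for 95% reliability at 95% confidence
    print(acceptance_probability(50, 2, 0.05))  # chance a 5%-defective lot passes n=50, c=2
```

Plotting `acceptance_probability` against `defect_rate` gives the plan's operating characteristic curve, which is what sampling tables summarise.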
Shout if you can’t find a way into AQL and I’ll dig more for you, as this is actually an interesting problem to a non-statistician.
Thanks for the suggestions and ideas, they were helpful in getting me to a usable answer.
I built a small spreadsheet that takes a set of data, draws randomly selected datasets of increasingly large sizes from it, and measures the variation over several hundred iterations.
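A rough Python equivalent of that spreadsheet experiment, as a sketch: the dataset is simulated normal data, subsets are drawn without replacement, and reporting the middle 95% of subset SDs is an assumption about what "measured the variation" means (`sd_spread_by_size` is a hypothetical name):

```python
import random
import statistics


def sd_spread_by_size(data: list[float], sizes: list[int], iterations: int = 500,
                      seed: int = 1) -> dict[int, tuple[float, float]]:
    """For each subset size, draw `iterations` random subsets (without
    replacement) and return the 2.5th and 97.5th percentiles of their SDs."""
    rng = random.Random(seed)
    result = {}
    for size in sizes:
        sds = [statistics.stdev(rng.sample(data, size)) for _ in range(iterations)]
        cuts = statistics.quantiles(sds, n=40)  # 39 cut points in 2.5% steps
        result[size] = (cuts[0], cuts[-1])
    return result


if __name__ == "__main__":
    gen = random.Random(0)
    data = [gen.gauss(100, 15) for _ in range(2000)]  # hypothetical dataset
    for size, (lo, hi) in sd_spread_by_size(data, [10, 30, 100, 300]).items():
        print(f"n={size:>3}: 95% of subset SDs between {lo:.1f} and {hi:.1f}")
```

Each subset size must be at least 2 (for `stdev`) and no larger than the dataset.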
And then I found this calculator: you tell it the number of data points, the mean, the standard deviation and the confidence level you want to work with, and it tells you the resulting range for the mean. So, to my original question: if the observed mean falls outside the upper or lower bound, you know the result is outside the 95% confidence interval.
https://www.omnicalculator.com/statistics/confiden...
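The calculation described above can be sketched as follows, assuming a z-based (normal) interval, mean ± z·SD/√n; the linked calculator may use a t-distribution for small samples, so its figures could differ slightly. This interval checks the claimed mean rather than the claimed SD, and the function names are hypothetical:

```python
import math
from statistics import NormalDist


def mean_confidence_interval(mean: float, sd: float, n: int,
                             confidence: float = 0.95) -> tuple[float, float]:
    """z-based confidence interval for the mean: mean ± z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # ≈ 1.96 for 95%
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin


def mean_outside(observed_mean: float, claimed_mean: float, claimed_sd: float,
                 n: int, confidence: float = 0.95) -> bool:
    """True if an observed sample mean falls outside the claimed interval."""
    lo, hi = mean_confidence_interval(claimed_mean, claimed_sd, n, confidence)
    return not (lo <= observed_mean <= hi)


if __name__ == "__main__":
    print(mean_confidence_interval(100, 10, 50))
    print(mean_outside(104, 100, 10, 50))
```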