Stats help - Variance
Discussion
If I have two sets of statistics from two identical tests on different populations (with the same number of samples in each), how do I get a combined variance for the total data set?
E.g. if I had the variance in height of a group of 100 males and of a group of 100 females, can I calculate the variance in the combined group of 200?
This link seems to suggest I can just add the variances, but can I really do that if the two populations have different means? http://onlinestatbook.com/2/summarizing_distributi...
The other problem I have with that approach: if, instead of joining the male and female groups, I accidentally picked the male group twice, I'd have two measurements for each individual. My gut feeling is that the variance wouldn't change, so why on earth would it double?
I am, however, aware that statistics rarely follows "gut feeling" logic, and I'm now getting even more confused. Any suggestions?
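A worked equation may help separate the two ideas in the question. It is a sketch under stated assumptions: two groups of equal size n, with group means \mu_1, \mu_2 and population variances (dividing by n, not n - 1) \sigma_1^2, \sigma_2^2. These symbols are introduced here for illustration and do not appear in the original post.

```latex
% Pooled mean and variance of the combined group of 2n values
\mu = \frac{\mu_1 + \mu_2}{2}
\qquad
\sigma^2 = \frac{\sigma_1^2 + \sigma_2^2}{2} + \frac{(\mu_1 - \mu_2)^2}{4}

% Same group picked twice: \mu_1 = \mu_2 and \sigma_1^2 = \sigma_2^2, so
\sigma^2 = \frac{\sigma_1^2 + \sigma_1^2}{2} + 0 = \sigma_1^2
```

So the pooled variance is the average of the two variances plus a term for the gap between the means, which means the group means are needed as well as the variances. Picking the same group twice leaves the variance unchanged, as the gut feeling says. Adding variances is the rule for the variance of a sum (or difference) of independent quantities, which is a different operation from pooling two data sets.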
RizzoTheRat said:
If I have two sets of statistics from two identical tests on different populations (with the same number of samples in each), how do I get a combined variance for the total data set?
E.g. if I had the variance in height of a group of 100 males and of a group of 100 females, can I calculate the variance in the combined group of 200?
This link seems to suggest I can just add the variances, but can I really do that if the two populations have different means? http://onlinestatbook.com/2/summarizing_distributi...
The other problem I have with that approach: if, instead of joining the male and female groups, I accidentally picked the male group twice, I'd have two measurements for each individual. My gut feeling is that the variance wouldn't change, so why on earth would it double?
I am, however, aware that statistics rarely follows "gut feeling" logic, and I'm now getting even more confused. Any suggestions?
Define your dependent and independent variables.
Which are height and gender.
Run an independent-samples (Student's) t-test, or just use a calculator if your knowledge of statistics software isn't up to scratch. Might take some time though.
Add a variable, say children's height, and you'd use ANOVA or MANOVA, assuming parametricity.
If you want the means for the two combined, then just do that. Why would you accidentally pick the male group twice? I'd suggest if the numbers are that close you might want to check for normal distribution (parametricity) of data.
Trouble is I don't have the individual data points. I tried to simplify with the heights analogy, but maybe that's not helpful.
The actual problem is looking at a computer model with a lot of inputs and a lot of outputs. I have a program someone has written to vary inputs by the Morris and Sobol methods, to allow us to look at which inputs have the greatest effect on the outputs. The only problem is that it's limited in the number of runs it can perform: with 37 inputs it can do 420 runs, which seems to be enough to get a stable result using Morris (two runs show the same rank order for the variance of each input, even if the numbers aren't identical), but Sobol seems to need way more. It's spitting out a variance for each input against the individual outputs, and I'm wondering if I can legitimately combine the variances from two sets of runs to effectively have a data set twice the size.
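For what it's worth, here is a minimal Python sketch of combining two batches from summary statistics alone, without the individual data points. It assumes the reported figure is an ordinary population variance of an output over the runs, and that each batch's mean and run count are available too; whether the sensitivity program's "variance" output can be treated this way is an assumption, not something established in this thread. The function name `combine_variance` and the numbers are made up for illustration.

```python
from statistics import mean, pvariance


def combine_variance(n1, mean1, var1, n2, mean2, var2):
    """Pooled mean and population variance from two batches' summary stats.

    var1 and var2 are population variances (divide by n, not n - 1).
    """
    n = n1 + n2
    pooled_mean = (n1 * mean1 + n2 * mean2) / n
    # Each batch contributes its own spread plus the squared offset
    # of its mean from the pooled mean.
    pooled_var = (
        n1 * (var1 + (mean1 - pooled_mean) ** 2)
        + n2 * (var2 + (mean2 - pooled_mean) ** 2)
    ) / n
    return pooled_mean, pooled_var


# Made-up outputs from two hypothetical batches of runs.
run_a = [1.0, 2.0, 4.0, 7.0]
run_b = [3.0, 5.0, 6.0, 10.0]

pooled_mean, pooled_var = combine_variance(
    len(run_a), mean(run_a), pvariance(run_a),
    len(run_b), mean(run_b), pvariance(run_b),
)

# Cross-check against the variance computed from all points at once.
direct_var = pvariance(run_a + run_b)
print(pooled_mean, pooled_var, direct_var)
```

If both batches sample the same model with the same input distributions, the two means should be close and the pooled variance is then roughly the average of the two reported variances. Without the means, that average is only an approximation.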