Statistics question

Author
Discussion

Dr Jekyll

Original Poster:

23,820 posts

267 months

Saturday 18th January 2020
quotequote all
I'm looking at a book on statistics for beginners.

It refers to decision trees without actually explaining them, but that's OK, I know what they are. Then it goes on to talk about 'Random Forests'. Not something I've come across, but from context I can figure out roughly what they are. But after a quick introduction saying you have to decide whether to use a decision tree or a random forest, there is this paragraph.

Book said:
If there are M input variable amounts then m<M is going to be specified from the beginning, and it will be held as a constant. The reason that this is so important is that it means that each tree that you have is randomly picked from their own variable using M.
What on earth does this mean?

Chester35

505 posts

61 months

Sunday 19th January 2020
quotequote all

It means that it is not a statistics book for beginners.

A statistics book for beginners should allow you to see the wood rather than the trees.

smile

At least 90% of the time, of course.


V8LM

5,237 posts

215 months

Sunday 19th January 2020
quotequote all
Also surprised that machine learning is covered in a beginner's book on statistics.

Flooble

5,567 posts

106 months

WatchfulEye

505 posts

134 months

Tuesday 21st January 2020
quotequote all
It means that in a random forest, each tree uses only some of the information available.

So, if you have 5 (M) variables in a classification task (e.g. leaf length, leaf width, stem length, stem diameter, petal number), then each individual tree is developed using fewer than 5 (m).

For example, if m is 3: tree 1 (leaf length, stem length, petal count); tree 2 (leaf width, stem length, stem diameter); tree 3 (stem length, stem diameter, petal count); tree 4 (leaf length, leaf width, stem diameter).
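A minimal Python sketch of that idea, using the feature names from the example above (the seed and the number of trees are arbitrary, chosen just to make the output repeatable):

```python
import random

# M = 5 input variables available to the forest as a whole
features = ["leaf_length", "leaf_width", "stem_length",
            "stem_diameter", "petal_count"]

m = 3            # m < M, fixed in advance and held constant for every tree
n_trees = 4      # arbitrary, just for illustration
random.seed(42)  # for a repeatable demonstration

# Each tree gets its own random subset of exactly m features
subsets = [sorted(random.sample(features, m)) for _ in range(n_trees)]

for i, subset in enumerate(subsets, start=1):
    print(f"tree {i}: {subset}")

# Every tree sees exactly m features; m never changes between trees
assert all(len(s) == m for s in subsets)
```

Note this shows one feature subset per tree (the "random subspace" flavour described above); many library implementations, such as scikit-learn's, instead draw a fresh subset of m features at every split within each tree, but the m < M constraint works the same way.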