Statistics question

Author
Discussion

Dr Jekyll

Original Poster:

23,820 posts

268 months

Saturday 18th January 2020
I'm looking at a book on statistics for beginners.

It refers to decision trees without actually explaining them, but that's OK, I know what they are. Then it goes on to talk about 'Random Forests'. Not something I've come across, but from context I can figure out roughly what they are. But after a quick introduction saying you have to decide whether to use a decision tree or a random forest, there is this paragraph.

Book said:
If there are M input variable amounts then m<M is going to be specified from the beginning, and it will be held as a constant. The reason that this is so important is that it means that each tree that you have is randomly picked from their own variable using M.
What on earth does this mean?

Chester35

505 posts

62 months

Sunday 19th January 2020

It means that it is not a statistics book for beginners.

A statistics book for beginners should let you see the wood rather than the trees.

smile

At least 90% of the time, of course.


V8LM

5,265 posts

216 months

Sunday 19th January 2020
Also surprised that machine learning is covered in a beginner's book on statistics.


WatchfulEye

505 posts

135 months

Tuesday 21st January 2020
It means that in a random forest, each tree uses only some of the information available.

So, if you have 5 (M) variables in a classification task (e.g. leaf length, leaf width, stem length, stem diameter, petal count), then each individual tree is developed using fewer than 5 (m) of them.

For example, if m is 3: tree 1 (leaf length, stem length, petal count); tree 2 (leaf width, stem length, stem diameter); tree 3 (stem length, stem diameter, petal count); tree 4 (leaf length, leaf width, stem diameter).
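The idea can be sketched in a few lines of Python. This is only a toy illustration of the book's per-tree reading of m < M (variable names are made up for the example); in Breiman's full algorithm the m candidate features are typically re-drawn at each split within a tree, not once per tree.

```python
import random

# The M available input variables for the classification task
features = ["leaf_length", "leaf_width", "stem_length",
            "stem_diameter", "petal_count"]  # M = 5

m = 3  # fixed in advance and held constant for every tree (m < M)

random.seed(0)  # just to make this example repeatable

def feature_subset(features, m):
    """Randomly pick the m features one tree is allowed to use."""
    return random.sample(features, m)

# Each tree in the forest gets its own random subset of the features
forest_features = [feature_subset(features, m) for _ in range(4)]

for i, subset in enumerate(forest_features, start=1):
    print(f"tree {i}: {subset}")
```

Because each tree only ever sees its own random subset, the trees end up making different mistakes, and averaging their votes is what makes the forest more robust than any single tree.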