It seems to me that green threads exist primarily for convenience.
They are definitely not more performant than native threads used with a work-stealing approach (producer-consumer pattern) that is tuned to the number of available cores. How could they be? Even the cheapest switch between threads adds some cost. From a performance perspective it never makes sense to run thousands of threads on 8 cores, whether they are scheduled by the OS or by the language implementation.
So to answer your third question: short-lived I/O tasks.
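To make the "native threads tuned to the number of cores" idea concrete, here is a minimal sketch of a fixed-size worker pool. The name `run_pool` and its parameters are my own, purely for illustration. It uses a single shared queue (plain producer-consumer), not real work stealing, and it assumes the tasks do not raise. In CPython the GIL also limits CPU-bound parallelism, so treat this as a picture of the structure rather than a benchmark.

```python
import os
import queue
import threading


def run_pool(tasks, num_workers=None):
    """Run callables on a fixed pool of native threads; results keep task order."""
    # Size the pool to the number of cores unless the caller overrides it.
    num_workers = num_workers or os.cpu_count() or 1
    work = queue.Queue()
    results = [None] * len(tasks)

    def worker():
        # Consumer: pull tasks until a sentinel (None) arrives.
        while True:
            item = work.get()
            if item is None:
                return
            index, task = item
            results[index] = task()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    # Producer: enqueue every task, then one sentinel per worker.
    for index, task in enumerate(tasks):
        work.put((index, task))
    for _ in threads:
        work.put(None)

    for t in threads:
        t.join()
    return results


# Ten small tasks share a handful of threads instead of getting one thread each.
squares = run_pool([lambda n=n: n * n for n in range(10)])
```

The point is that the thread count stays fixed no matter how many tasks are queued, so the number of switches is bounded by the pool size rather than by the workload.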