I actually built the same thing a while back, when whisper.cpp came out. A significant challenge is sane paragraph segmentation, since even humans often don't agree on the best place for a line break. I'm curious what approach you used.
I've adopted a very simple approach: 80 words per "paragraph". I'm now experimenting with computing an embedding for each sentence and trying to detect topic segments from those. So far, though, the simple approach has produced pleasant segments.
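For anyone curious, here's a rough sketch of both ideas. It is not the poster's actual code. The fixed-size version just slices the word list into 80-word chunks. The topic version starts a new paragraph whenever two adjacent sentences look dissimilar. To keep it dependency-free, a bag-of-words count vector stands in for real sentence embeddings. The function names, the naive regex sentence splitter, and the 0.1 similarity threshold are all hypothetical choices for illustration.

```python
import math
import re
from collections import Counter


def chunk_by_words(text, words_per_paragraph=80):
    """Fixed-size segmentation: every paragraph gets the same number of words."""
    words = text.split()
    return [
        " ".join(words[i:i + words_per_paragraph])
        for i in range(0, len(words), words_per_paragraph)
    ]


def split_sentences(text):
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def embed(sentence):
    """Stand-in 'embedding' (assumption): a bag-of-words count vector.
    A real setup would call a sentence-embedding model here instead."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment_by_topic(text, threshold=0.1):
    """Start a new paragraph when adjacent sentences fall below the threshold."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]
    paragraphs = [[sentences[0]]]
    for prev, cur, sentence in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev, cur) < threshold:
            paragraphs.append([sentence])  # likely topic shift: new paragraph
        else:
            paragraphs[-1].append(sentence)  # same topic: keep extending
    return [" ".join(p) for p in paragraphs]


if __name__ == "__main__":
    sample = ("The cat sat on the mat. The cat likes the mat. "
              "Stock prices fell sharply today. Investors sold stock shares.")
    for paragraph in segment_by_topic(sample):
        print(paragraph)
```

Comparing only neighbouring sentences is the simplest possible variant. Real topic segmenters usually compare windows of several sentences on each side of a candidate break and look for dips in similarity, which is less sensitive to one odd sentence.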