Optillm works with llama.cpp, but this particular approach is implemented as a decoding strategy in PyTorch, so at the moment you will need to use optillm's local inference server to use it.
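For reference, here is a minimal sketch of what a call might look like once the optillm proxy with its local inference server is running; the port, API key, and model name below are placeholders I'm assuming for illustration, not verified defaults:

```python
# Sketch only: assumes an optillm proxy is already running locally and
# exposing an OpenAI-compatible endpoint (address and model are placeholders).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local optillm proxy address
    api_key="optillm",                    # placeholder key for the local server
)

response = client.chat.completions.create(
    model="your-local-model",  # placeholder; whatever model the local inference server is serving
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```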
Not yet, but I'm curious if that would be a good improvement. Would it really benefit you to gray out one subreddit, and not the rest? Maybe it'd be better to add exceptions instead?
We have tried to solve exactly this problem of "journey skipping". Would you consider taking a look at our attempt (it's free)? I'd be curious to hear your feedback: https://littlestory.io/en