Yes, we can stream with Ollama, and in general stream responses from any model to a point in a buffer.
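For reference, here's a minimal sketch of the general technique (curl plus a process filter inserting at a marker), not this package's actual implementation. It assumes a local Ollama server on the default port, uses Ollama's `/api/chat` streaming endpoint, and the function name `my/ollama-stream` and the `llama3` model are just placeholders:

```elisp
;; -*- lexical-binding: t; -*-
;; Sketch only: stream an Ollama chat reply into the current buffer at point.
(require 'json)
(require 'subr-x)

(defun my/ollama-stream (prompt)
  "Stream Ollama's reply to PROMPT into the buffer at point (illustrative sketch)."
  (let ((marker (point-marker))
        (proc (start-process
               "ollama-stream" nil "curl" "-s" "-N"
               "http://localhost:11434/api/chat"
               "-d" (json-encode
                     `((model . "llama3")
                       (stream . t)
                       (messages . [((role . "user") (content . ,prompt))]))))))
    (set-process-filter
     proc
     (lambda (_proc chunk)
       ;; Ollama streams newline-delimited JSON objects; each one carries a
       ;; fragment of the reply under message.content.  Partial lines split
       ;; across chunks are simply skipped here to keep the sketch short.
       (dolist (line (split-string chunk "\n" t))
         (when-let* ((obj (ignore-errors
                            (json-parse-string line :object-type 'alist)))
                     (text (alist-get 'content (alist-get 'message obj))))
           (when (buffer-live-p (marker-buffer marker))
             (with-current-buffer (marker-buffer marker)
               (save-excursion
                 (goto-char marker)
                 (insert text)
                 (set-marker marker (point)))))))))))
```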
I’m not sure what you mean by "filtering results"; can you give an example?
You can change the model, temperature and system messages. There’s no more sophisticated support for agent workflows, but if there’s something you think would be a good fit and is sufficiently general, I’m open to implementing it.
Ah, got it. That is indeed a problem I’d like to solve. If you look at the OpenAI integration, I already have code to solve it there, but how to extend it to everything else has been an open question that I’ll eventually have to figure out. The interfaces involved are also not clear. Any insight you’ve come up with is likely to be helpful, so please don’t hesitate to share.