So we have multiple modules, like Selenium, that automate the computer or web pages. Now the Self-Operating Computer script that was just released uses GPT-4 Vision to tell GPT-4 what it is looking at on the screen, and lets it operate the computer according to a prompt.

It seems to me that there should be a module that does the same thing: one that explains to an LLM, through prompts, what it can do with the mouse and keyboard, customized for different operating systems.

Then all we would have to do is load the module, and we could build something similar for our own special needs.
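
To make the idea concrete, here is a rough sketch of what I have in mind. Everything in it is hypothetical: `ActionRegistry`, `OS_HINTS`, the JSON reply format, and the action names are made up for illustration and are not taken from Self-Operating Computer or any existing library. The handlers only record calls, and the LLM reply is hard-coded, so it runs with the standard library alone.

```python
import json
import platform
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical per-OS hints; a real module would need far more detail.
OS_HINTS = {
    "Windows": "Open the Start menu with the 'win' key.",
    "Darwin": "Open Spotlight with 'command' + 'space'.",
    "Linux": "The application launcher depends on the desktop environment.",
}


@dataclass
class Action:
    name: str
    description: str
    handler: Callable[..., None]


class ActionRegistry:
    """Describes mouse/keyboard actions to an LLM and runs the one it picks."""

    def __init__(self, os_name: Optional[str] = None) -> None:
        # Default to the OS reported by the standard library.
        self.os_name = os_name or platform.system()
        self._actions: Dict[str, Action] = {}

    def register(self, name: str, description: str,
                 handler: Callable[..., None]) -> None:
        self._actions[name] = Action(name, description, handler)

    def system_prompt(self) -> str:
        """Build the text that tells the LLM what it is allowed to do."""
        lines = [
            f"You control a {self.os_name} computer.",
            OS_HINTS.get(self.os_name, ""),
            'Reply with one JSON object: {"action": <name>, "args": {...}}.',
            "Available actions:",
        ]
        lines += [f"- {a.name}: {a.description}" for a in self._actions.values()]
        return "\n".join(line for line in lines if line)

    def execute(self, reply: str) -> None:
        """Parse the LLM's JSON reply and call the matching handler."""
        data = json.loads(reply)
        if not isinstance(data, dict):
            raise ValueError("reply must be a JSON object")
        action = self._actions.get(data.get("action"))
        if action is None:
            raise ValueError(f"unknown action: {data.get('action')!r}")
        action.handler(**data.get("args", {}))


# Demo: handlers just record what they were asked to do.
log = []
registry = ActionRegistry()
registry.register("click", "Click at screen position (x, y).",
                  lambda x, y: log.append(("click", x, y)))
registry.register("type_text", "Type the given text.",
                  lambda text: log.append(("type_text", text)))

print(registry.system_prompt())

# Stand-in for what the LLM might send back after seeing the prompt.
registry.execute('{"action": "click", "args": {"x": 100, "y": 200}}')
print(log)
```

In real use, the handlers could presumably wrap an automation library (for example `pyautogui.click` and `pyautogui.write`), and the reply would come from the model rather than a hard-coded string. The screenshot / vision part is left out entirely here.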

Is anyone working on something like this?