Hmm, lately I’ve been using mistral-nemo, which has 12B parameters. Since I’m more of a programmer than a gamer, I didn’t put a graphics card into my PC, and I believe it’s too old to accommodate any recent one (older PCIe generation, only x8 slots). I’d have to replace everything. And then I might as well go for a Radeon RX 7900 XTX or something. That’s $1,000(?) but it has 24GB of VRAM. I don’t think buying an entire PC and then going for an old GPU will make me happy. And thanks to llama.cpp I get about 2 tokens per second just on the CPU, so it’d have to be a considerable step up to be worth it. And last time I checked, even a P40 was like $300+, and it’s super old and it’s unclear whether it’ll continue to be supported in the major frameworks. I’m not sure. I still lean towards paying for cloud GPU compute.
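For what it’s worth, the buy-versus-rent question can be framed as a break-even calculation. This is only a rough sketch: the $1,000 is my guess from above, and the hourly cloud rate, power draw and electricity price are placeholders I made up for illustration, not quotes from any provider.

```python
def break_even_hours(gpu_price, cloud_rate_per_hour,
                     power_watts=0.0, electricity_per_kwh=0.0):
    """Hours of GPU use after which buying beats renting.

    All inputs are assumptions you fill in yourself. Resale value,
    the rest of the PC and your own time are ignored.
    """
    # Running cost of the local card per hour (electricity only).
    local_cost_per_hour = power_watts / 1000.0 * electricity_per_kwh
    # What each hour on your own hardware saves compared to renting.
    saving_per_hour = cloud_rate_per_hour - local_cost_per_hour
    if saving_per_hour <= 0:
        # Renting is never more expensive per hour, so buying never pays off.
        return float("inf")
    return gpu_price / saving_per_hour


# Hypothetical numbers: a $1,000 card vs. a made-up $0.50/hour cloud GPU.
print(break_even_hours(1000, 0.50))  # 2000.0
```

With those made-up numbers the card would only pay for itself after 2,000 hours of actual use, which is why occasional tinkering tends to favor renting.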
Thanks for the numbers on your setup. That certainly helps me weigh my options. Maybe some of my friends have upgrades planned and want to give me their older 8GB NVIDIA cards…
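Whether a 12B model would even fit on one of those 8GB cards is mostly arithmetic: the weights need roughly parameters × bits per weight / 8 bytes. A quick sketch, counting the weights only. Context cache and runtime overhead come on top, and real quantization formats usually land a bit above their nominal bit width, so treat the results as lower bounds.

```python
def weight_gb(n_params, bits_per_weight):
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# 12B parameters at common precisions.
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: ~{weight_gb(12e9, bits):.0f} GB")
# 16-bit: ~24 GB
#  8-bit: ~12 GB
#  4-bit: ~6 GB
```

So on paper a 4-bit quantization might just squeeze into 8GB, without much headroom left for context.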
Good point. I think it’s super important to make this decision early on: whether you want to invest the time and self-host, or whether you’d rather use managed services or regular non-free platforms. Doing things yourself certainly teaches a lot. I do it. I gain knowledge and independence, and I think it’s important to understand the tools I use on a regular basis and not let Apple/Google take care of my life. And since I do a lot of things with computers, I can make good use of the knowledge I gain. However, I can also see why someone wouldn’t want to do that. They might have other hobbies, a stressful job or a family, and I do spend quite some time digging through configuration files, reading documentation and maintaining stuff. It has to be worth it in some way, or it becomes a liability. And I think that’s not super obvious when starting the journey. I’m glad we have managed services that give you independence without costing too much time. But I also prefer going all the way and learning lots of stuff.