It really depends on how you quantize the model and the K/V cache as well. This is a useful calculator: https://smcleod.net/vram-estimator/ I can comfortably fit most 32B models quantized to 4-bit (usually Q4_K_M or IQ4_XS) on my 3090's 24 GB of VRAM with a reasonable context size. If you're going to need a much larger context window to feed in large documents etc., then you'd need to go smaller on model size (14B, 27B, etc.), get a multi-GPU setup, or use something with unified memory and a lot of RAM (like the Mac Minis others are mentioning).
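The back-of-envelope math behind calculators like that one can be sketched in a few lines: quantized weight size plus KV cache plus a fixed overhead. This is a rough sketch, not the calculator's actual formula, and the 32B model config below (layer count, KV heads, head dim) is an illustrative assumption, not any specific model's real architecture.

```python
def estimate_vram_gib(params_b, weight_bits, n_layers, n_kv_heads, head_dim,
                      context_len, kv_bits=16, overhead_gib=1.0):
    """Rough VRAM estimate in GiB: quantized weights + KV cache + overhead."""
    # Weights: total params * bits per weight, converted to bytes then GiB
    weights_gib = params_b * 1e9 * weight_bits / 8 / 1024**3
    # KV cache: 2 tensors (K and V) per layer, per KV head, per head dim,
    # per token of context, at kv_bits precision
    kv_gib = (2 * n_layers * n_kv_heads * head_dim * context_len
              * (kv_bits / 8) / 1024**3)
    return weights_gib + kv_gib + overhead_gib

# Hypothetical 32B model with grouped-query attention at ~4.5 bits/weight
# (roughly what a 4-bit K-quant averages out to) and an 8K context:
est = estimate_vram_gib(params_b=32, weight_bits=4.5, n_layers=64,
                        n_kv_heads=8, head_dim=128, context_len=8192)
print(f"{est:.1f} GiB")  # lands under a 3090's 24 GB
```

Pushing `context_len` to 32K in this sketch roughly quadruples the KV term, which is why a bigger context window forces a smaller model or a lower-precision (quantized) KV cache.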
FrankLaskey@lemmy.ml to You Should Know@lemmy.world • YSK there's a tool to check US non-profit compensation • English • 1 • 3 months ago

It would be cool if they would provide some useful statistics about the aggregated data as well. Maybe something like showing the percentile for pay to the ED/CEO or for the total compensation compared to other organizations in the sector.
I didn’t scour the site so maybe this does exist.
Looks like it now has Docling content extraction support for RAG. Has anyone used Docling much?