You're viewing the front-end.social public feed.

Replies

  • Jun 9, 2026, 4:33 AM

    @Migueldeicaza This seems very similar to Google's new Gemma 4 QAT models. Testing them with Ollama yesterday, I noticed that instead of loading the full model into VRAM, it loads only a small portion, likely using mmap to read the weights from disk during inference. Impressive for those of us with limited GPU memory!

  • Jun 9, 2026, 6:36 AM

    @Migueldeicaza So "Apple Intelligence" is at least 5 different models: 2 on device, 2 on Apple Silicon in the cloud, and a final one that is basically Gemini running on NVIDIA in Google Cloud.

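The first reply guesses that the weights are read through mmap instead of being loaded in full. As a rough illustration of that general idea only (not of what Ollama actually does), here is a minimal Python sketch. The file layout (a flat run of little-endian float32 values) and the helper names `write_fake_weights` and `read_weight` are hypothetical, made up for this example.

```python
import mmap
import struct


def write_fake_weights(path, n_floats):
    # Hypothetical stand-in for a model file: n_floats little-endian float32
    # values, where the value at position i is simply float(i).
    values = [float(i) for i in range(n_floats)]
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{n_floats}f", *values))


def read_weight(path, index):
    # Map the whole file read-only. Mapping does not copy the file into
    # memory; the OS pages data in on demand when a region is accessed.
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Decode the single float32 at the requested position.
            (value,) = struct.unpack_from("<f", mm, index * 4)
            return value
```

With a file written by `write_fake_weights(path, 1024)`, calling `read_weight(path, 500)` touches only the page holding that value, which is the property that would let a runtime keep most of a large model on disk.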