A screenshot of this question was making the rounds last week, but this article covers testing against all the well-known models out there.

Also includes outtakes from the ‘reasoning’ models.

  • Saterz@lemmy.world · 13 days ago

    Well, it is a 9B model, after all. Self-hosted models only start to seem minimally “intelligent” at around 16B parameters. For context, the models running on Google’s servers are close to 300B parameters.

    • SuspciousCarrot78@lemmy.world · edited · 12 days ago

      Not sure how we’re quantifying intelligence here. Benchmarks?

      Qwen3-4B 2507 Instruct (4B) outperforms GPT-4.1 nano (7B) on all stated benchmarks. It outperforms GPT-4.1 mini (~27B according to scuttlebutt) on mathematical and logical reasoning benchmarks, but loses (barely) on instruction-following and knowledge benchmarks. It outperforms GPT-4o (~200B) in a few specific domains (math, creative writing), but loses overall (because of course it would). The abliterated fine-tunes (“cooks”) of it are stronger still in a few specific areas.

      https://huggingface.co/unsloth/Qwen3-4B-Instruct-2507-GGUF

      https://huggingface.co/DavidAU/Qwen3-4B-Hivemind-Instruct-NEO-MAX-Imatrix-GGUF
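
      If you want to poke at these yourself, here’s a minimal sketch using llama-cpp-python to pull a quant straight from the first repo. The quant filename pattern, context size, and prompt are my assumptions, not anything from the article; check the repo’s file list for the exact names.

      ```python
      # Minimal sketch: run the unsloth Qwen3-4B GGUF locally via llama-cpp-python.
      # Requires: pip install llama-cpp-python huggingface-hub
      from llama_cpp import Llama

      # Downloads the first matching .gguf from the Hugging Face repo and loads it.
      llm = Llama.from_pretrained(
          repo_id="unsloth/Qwen3-4B-Instruct-2507-GGUF",
          filename="*Q4_K_M.gguf",  # assumed quant level; any .gguf in the repo works
          n_ctx=8192,               # context window for this session
      )

      out = llm.create_chat_completion(
          messages=[{"role": "user", "content": "In one sentence, what is a GGUF file?"}],
          max_tokens=128,
      )
      print(out["choices"][0]["message"]["content"])
      ```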

      So, in that instance, a 4B > 7B (globally), > 27B (significantly), and > 200-500B(?) situationally. I’m pretty sure there are other SLMs that achieve this too now (IBM Granite series, Nanbeige, Nemotron, etc.)

      It’s sort of wild to think that 2024 SOTA is roughly a ‘strong’ 4-12B model these days.

      I think we’re getting to the point where the next step forward is going to be “densification” and/or an architecture shift (maybe M$ can finally pull their finger out and release the promised 1.58-bit next-step architectures).
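
      For reference, the 1.58-bit idea boils down to ternary weights in {-1, 0, +1}, i.e. log2(3) ≈ 1.58 bits per weight. Here’s a toy sketch of the “absmean” quantizer described in the BitNet b1.58 paper; purely illustrative, not M$’s actual kernel code.

      ```python
      # Toy version of BitNet b1.58's absmean ternary weight quantization.
      import numpy as np

      def absmean_ternarize(w: np.ndarray, eps: float = 1e-8):
          """Quantize a weight tensor to {-1, 0, +1} with one per-tensor scale."""
          gamma = np.mean(np.abs(w)) + eps           # absmean scale factor
          w_q = np.clip(np.round(w / gamma), -1, 1)  # RoundClip to ternary
          return w_q.astype(np.int8), gamma          # dequantize: w ~ gamma * w_q

      w = np.random.randn(4, 4).astype(np.float32)
      w_q, gamma = absmean_ternarize(w)
      print(w_q)              # ternary matrix, ~1.58 bits of information per entry
      print(gamma * w_q - w)  # per-element quantization error
      ```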

      ICBW / IANAE