• Kazumara@discuss.tchncs.de
    4 days ago

    Inference is cheap and efficient.

    Tell that to all the GitHub users who are screaming about the new token-based billing. In reality, inference on these massive models with big context windows is expensive, but it was subsidized so heavily that nobody has an accurate feel for the cost.

      • Kazumara@discuss.tchncs.de
        3 days ago

        Sure, it’s much, much cheaper than training, but importantly, those companies are not recouping anything with inference, because it still costs more than what they are selling it for.

        They are double bankrupting themselves.

        At work we run inference for a research project with an open-weights model in the public cloud that another part of my company provides, and we pay around $25 a day for a VM with a single L40S. It’s slow, despite not even serving concurrent users, and its outputs are kind of bad.
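        To put a flat daily VM price in per-token terms, here is a rough back-of-the-envelope sketch. Only the $25/day figure comes from the post; the 20 tokens/s throughput and the utilization values are made-up assumptions for illustration, not measurements, so plug in your own numbers. It also shows why a GPU that sits idle most of the day (no concurrent users) gets expensive per token.

```python
# Back-of-the-envelope cost per token for a dedicated inference VM.
# Assumptions (hypothetical, not measured): the VM is billed a flat daily
# rate, and it generates tokens at a constant rate while it is busy.

SECONDS_PER_DAY = 24 * 60 * 60


def cost_per_million_tokens(daily_cost_usd: float, tokens_per_second: float,
                            utilization: float = 1.0) -> float:
    """Return USD per one million generated tokens.

    utilization is the fraction of the day the GPU is actually generating
    (1.0 means busy around the clock, 0.25 means busy six hours a day).
    """
    if tokens_per_second <= 0 or not 0 < utilization <= 1:
        raise ValueError("need tokens_per_second > 0 and 0 < utilization <= 1")
    tokens_per_day = tokens_per_second * utilization * SECONDS_PER_DAY
    return daily_cost_usd / tokens_per_day * 1_000_000


if __name__ == "__main__":
    # $25/day is from the post; 20 tokens/s is a hypothetical throughput.
    for util in (1.0, 0.25):
        price = cost_per_million_tokens(25, 20, util)
        print(f"utilization {util:.0%}: ${price:.2f} per 1M tokens")
```

        Since the daily price is fixed, the cost per token scales inversely with utilization: the same VM busy a quarter of the day costs four times as much per token.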

        Edit: Interference -> Inference. Arguing on the internet first thing after waking up might not have been the best idea.