• David Gerard (OP, mod)
    47 months ago

    there’s a research result that the precise tokeniser makes bugger-all difference — it’s almost entirely the data you put in

    because LLMs are lossy compression for text