• GissaMittJobb@lemmy.ml
    arrow-up
    18
    ·
    2 days ago

    Is this real? Because of how LLMs tokenize their input, this can actually be a pretty tricky task for them to accomplish. This is also the reason why it’s hard for them to count the number of ‘R’s in the word ‘Strawberry’.

    • jj4211@lemmy.world
      arrow-up
      3
      ·
      1 day ago

      The LLM doesn’t have to implement the filtering itself. You can layer a more traditional, concrete filter on top. So you might sneak something problematic into the prompt that’s too clever for the input filter to catch, but then the output filter can catch that the prompt tricked the LLM into generating something undesired. Another commenter said they tried this and it started to work, but then the reply was abruptly cut off in the middle, presumably the moment the LLM’s output hit a more traditional filter, which shut it down.

      I think I’ve seen this sort of approach applied largely to mask embarrassing answers that become memes, or to detect input the model is known to handle badly and either shut it down or redirect it to a better facility (e.g. redirecting math to Wolfram Alpha).
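
The layered filtering described in the comment above can be sketched roughly as follows. This is only an illustration: `BLOCKED_TERMS`, `filtered_stream`, the notice text, and the fake streamed output are all made up here, and nothing is known about how any real service implements its filter.

```python
from typing import Iterable, Iterator

# Hypothetical blocklist; a real deployment would use something far richer.
BLOCKED_TERMS = ("forbidden topic",)


def filtered_stream(
    chunks: Iterable[str],
    blocked=BLOCKED_TERMS,
    notice: str = "[reply withdrawn by output filter]",
) -> Iterator[str]:
    """Pass chunks through until the accumulated reply contains a blocked term."""
    seen = ""
    for chunk in chunks:
        seen += chunk
        # Check the whole reply so far, so a term split across chunks is caught.
        if any(term in seen.lower() for term in blocked):
            yield notice
            return
        yield chunk


# Stand-in for a model's streamed output (invented for illustration).
fake_model_output = ["Sure, ", "here is some ", "forbid", "den topic", " content."]
reply = list(filtered_stream(fake_model_output))
print(reply)
```

Because the filter only sees text that has already been streamed, part of the reply reaches the user before the cut-off, which would match the "shut out in the middle" behaviour described.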

    • kautau@lemmy.world
      arrow-up
      6
      ·
      2 days ago

      It’s probably DeepSeek R1, which is a “reasoning” model, so basically it has sub-models doing things like running computation while the “supervisor” part of the model “talks to them” and relays back the approach, trying to imitate the way humans think. That being said, models are getting “agentic”, meaning they have the ability to run software tools against what you send them, and while it’s obviously being super hyped up by all the tech bro accelerationists, it is likely where LLMs and the like are headed, for better or for worse.

      • GissaMittJobb@lemmy.ml
        arrow-up
        1
        ·
        2 days ago

        Still, this does not quite address the issue of tokenization making it difficult for most models to accurately distinguish between the hexadecimal digits here.

        Having the model write code to solve a problem and then asking it to execute that code is an established technique for circumventing this issue, but all of the model interfaces I know of with this capability are very explicit about when they are making use of the tool.
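
A minimal sketch of the "write code, then execute it" pattern mentioned above, assuming a harness that runs the model's code in a separate Python process. The `generated_code` string stands in for model output, `run_generated_code` is a hypothetical helper, and the hex input is invented rather than taken from the post.

```python
import subprocess
import sys

# Stand-in for code a model might write when asked to decode a hex string.
generated_code = 'print(bytes.fromhex("48 65 6c 6c 6f").decode("ascii"))'


def run_generated_code(code: str, timeout: float = 5.0) -> str:
    """Run the code in a separate Python process and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise if the generated code exits with an error
    )
    return result.stdout.strip()


tool_output = run_generated_code(generated_code)
# Label the result so it is explicit that a tool produced it.
print(f"[tool: python] {tool_output}")
```

The decoding is done by the interpreter, character by character, so tokenization never comes into play. A separate process is not a sandbox; anything running untrusted model output would need real isolation.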

        • morrowind@lemmy.ml
          arrow-up
          1
          ·
          2 days ago

          Not really a concern. It’s basically translation, which language models excel at. It just needs a mapping from hex to bytes.
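
To make "a mapping from hex to bytes" concrete, here is a sketch that decodes purely by table lookup. `HEX_TO_BYTE` and `decode_hex` are illustrative names, and this says nothing about how a model represents such a mapping internally.

```python
# Lookup table from every two-character hex pair to its byte value.
HEX_TO_BYTE = {f"{value:02x}": value for value in range(256)}


def decode_hex(text: str) -> bytes:
    """Translate a hex string to bytes using only table lookups."""
    digits = "".join(text.split()).lower()  # drop whitespace, normalise case
    if len(digits) % 2:
        raise ValueError("hex string must have an even number of digits")
    pairs = (digits[i:i + 2] for i in range(0, len(digits), 2))
    return bytes(HEX_TO_BYTE[pair] for pair in pairs)


decoded = decode_hex("68 65 78")
print(decoded)
```

The table has only 256 entries, but the lookup assumes the input has already been cut into clean two-digit pairs.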

            • morrowind@lemmy.ml
              arrow-up
              1
              ·
              2 days ago

              I’m well aware, but you don’t necessarily need to see each character to translate to bytes.

              • GissaMittJobb@lemmy.ml
                arrow-up
                1
                ·
                2 days ago

                It’s not out of the question that we get emergent behaviour where the model can connect non-optimally mapped tokens and still translate them correctly, yeah.

                • kautau@lemmy.world
                  arrow-up
                  1
                  ·
                  24 hours ago

                  I’m confused: is the concern that the model doesn’t properly identify when it is using software to identify something like a hex pattern?

                  • GissaMittJobb@lemmy.ml
                    arrow-up
                    2
                    ·
                    22 hours ago

                    The concern is that the model doesn’t actually see the world in terms of distinct hexadecimal digits, but instead as tokens of variable size. You can see this using the tiktokenizer-webapp: enter some text and it will split it into the series of tokens the model will actually process.

                    It’s not impossible for the model to work it out anyway, but it is a reason for this type of task to be a bit harder for LLMs.
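
The mismatch between tokens and hex digits can be illustrated with a toy tokenizer. This is a deliberately simplified sketch: `TOY_VOCAB` is an invented vocabulary and greedy longest-match stands in for the learned merge rules real tokenizers use, so the exact splits here are not what any actual model sees.

```python
# Hypothetical vocabulary: a few multi-character tokens plus single hex digits.
TOY_VOCAB = {"48", "656", "c6c", "6f"} | set("0123456789abcdef")


def toy_tokenize(text: str, vocab=TOY_VOCAB) -> list[str]:
    """Split text greedily, always taking the longest token that matches."""
    max_len = max(len(tok) for tok in vocab)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab:
                tokens.append(piece)
                i += size
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return tokens


hex_text = "48656c6c6f"  # invented example input
tokens = toy_tokenize(hex_text)
byte_pairs = [hex_text[i:i + 2] for i in range(0, len(hex_text), 2)]
print(tokens)      # what the toy "model" would see
print(byte_pairs)  # the two-digit pairs a decoder needs
```

In this toy setup the token boundaries fall in the middle of byte pairs, so the model would have to reassemble the pairs across tokens before it could translate them.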