• Demdaru@lemmy.world
    English
    arrow-up
    81
    arrow-down
    5
    ·
    23 days ago

    I am so confused by the low link lol.

    • “AI haters build tarpits to trap and trick AI (!)” - Oh my god, poor AI :<
    • “…that ignore robots.txt!” - …oh, so illegal AI…?
    • “Attackers explain-” - YEAH! THE EVIL AGGRESSIVE
    • “how anti-spam defense became an AI weapon” - …folk trying to defend from spam…?

    FFS, they try to paint people protecting themselves as evil, but they keep too many of the facts in, and it becomes an absolutely confusing mess xD

    • skulblaka@sh.itjust.works
      English
      arrow-up
      39
      arrow-down
      6
      ·
      23 days ago

      It’s not really that confusing.

      The software equivalent of armed masked men is illegally breaking into your personal property, stealing everything that isn’t nailed down and ripping all the nails out of everything that is, and then leaving with it in order to reuse it for personal profit. It is, in all ways, similar to a home invasion. These invaders are then telling you that you’re a bad person because you don’t want them invading your property and stealing all your shit.

      It’s highly illegal, everyone involved with it knows for a fact that it’s highly illegal, so the best they can do is try and spin propaganda around it because nobody has the balls to try and arrest Sam Altman, et al. over it.

      If you pick the lock on my front door and enter my home without permission I am going to put a 12 gauge slug through your solar plexus. If I could do the same to an AI crawler I would.

    • [object Object]@lemmy.world
      English
      arrow-up
      10
      arrow-down
      2
      ·
      23 days ago

      It’s deliberately slow to load

      That kinda defeats the goal of feeding AI as much garbage as possible. They will just fetch a page from a different site in that time, instead of spending cycles on this page. It’s not like the crawler works strictly serially.

      • gressen@lemmy.zip
        English
        arrow-up
        35
        ·
        23 days ago

        The idea is to protect your own server from unnecessary load. You’re welcome to provide a faster AI tar pit, just mind that ultimately this is a waste of resources.
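To make the "deliberately slow" part concrete, here is a minimal sketch of a slow-drip response body. It is only an illustration of the general idea, not the code of any particular tarpit; `drip`, `chunk_size`, `delay`, and the injectable `sleep` are names made up for this example.

```python
import time

def drip(body, chunk_size=16, delay=0.5, sleep=time.sleep):
    """Yield `body` in small chunks, pausing before each one.

    The server does almost no work per chunk, while the client's
    connection stays occupied for roughly len(body) / chunk_size * delay
    seconds. `sleep` is injectable so the pacing can be tested without
    actually waiting.
    """
    for start in range(0, len(body), chunk_size):
        sleep(delay)
        yield body[start:start + chunk_size]

if __name__ == "__main__":
    # Hypothetical page body; delay=0 so the demo finishes instantly.
    for chunk in drip(b"<html><body>nothing to see here</body></html>", delay=0):
        print(chunk)
```

The cost to the server is one idle connection, which is the trade-off described above: cheap for the host, and slow only for whoever keeps reading.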

        • [object Object]@lemmy.world
          English
          arrow-up
          9
          arrow-down
          1
          ·
          edit-2
          23 days ago

          I’m guessing that Markov chains are pretty efficient computationally compared to AI training. Don’t have a site currently, but I’d love to see a bot rip through hundreds of pages a minute.
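For a sense of how cheap this is, here is a rough word-level Markov chain sketch. It assumes a plain-text corpus and an order-2 chain; `build_chain` and `babble` are hypothetical names for this example and not taken from any existing tarpit.

```python
import random
from collections import defaultdict

def build_chain(text, order=2):
    """Map each run of `order` words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return dict(chain)

def babble(chain, length=50, seed=None):
    """Walk the chain to emit `length` words of plausible-looking nonsense."""
    rng = random.Random(seed)
    keys = sorted(chain)            # sorted so a fixed seed is reproducible
    state = rng.choice(keys)
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:
            # Dead end (state only occurred at the end of the corpus):
            # restart from a random state.
            state = rng.choice(keys)
            out.extend(state)
            continue
        nxt = rng.choice(followers)
        out.append(nxt)
        state = state[1:] + (nxt,)
    return " ".join(out[:length])

if __name__ == "__main__":
    corpus = ("the crawler follows the link and the link leads to the page "
              "and the page leads to the crawler and the crawler never stops")
    print(babble(build_chain(corpus), length=30, seed=1))
```

Generating a word is a dictionary lookup plus one random choice, so the per-page cost is tiny compared with anything done with the text downstream.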

  • _thebrain_@sh.itjust.works
    English
    arrow-up
    33
    ·
    23 days ago

    I wonder how effective they are. When I first heard about ssh tarpits (like endlessh) I thought it was an awesome idea. But once I started looking at analyzed log data, it turned out they are anywhere from slightly effective to not effective at all. If simple logic can be written so that a dumb ssh bot programmed to find vulnerable ssh servers can easily avoid a tar pit, I would think it is pretty trivial for an AI crawler to do the same thing. I am interested to see some analyzed data on something like this after several months on the open internet.

    • tempest@lemmy.ca
      English
      arrow-up
      19
      arrow-down
      1
      ·
      23 days ago

      The reality is that, depending on the crawling architecture, someone is watching.

      As aggressive as the LLM crawlers are, they still have limits, so a competently written one will have a budget for each host/site as well as a heuristic for the quality of results. It may dig for a bit and periodically return, but if your site is not one that is known to generate high-quality data, it may only get crawled when there isn’t something better in the queue.
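As a rough picture of what "a budget for each host plus a quality heuristic" could look like, here is a toy crawl frontier. Everything in it is an assumption for illustration (the `Frontier` class, the 0.5 default score, the 0.8/0.2 smoothing); real crawlers are not known to work exactly this way.

```python
import heapq
from collections import defaultdict
from urllib.parse import urlsplit

class Frontier:
    """Toy crawl queue: best-scoring hosts first, hard cap per host."""

    def __init__(self, per_host_budget=100):
        self.per_host_budget = per_host_budget
        self.fetched = defaultdict(int)           # pages handed out per host
        self.quality = defaultdict(lambda: 0.5)   # running score per host
        self._heap = []
        self._counter = 0                         # tie-breaker: FIFO order

    @staticmethod
    def _host(url):
        return urlsplit(url).hostname or ""

    def add(self, url):
        # Priority is fixed when the URL is queued (a simplification).
        priority = -self.quality[self._host(url)]
        heapq.heappush(self._heap, (priority, self._counter, url))
        self._counter += 1

    def report(self, url, score):
        """Blend a 0..1 quality score for a fetched page into its host's score."""
        host = self._host(url)
        self.quality[host] = 0.8 * self.quality[host] + 0.2 * score

    def next_url(self):
        """Return the best queued URL whose host still has budget, else None."""
        while self._heap:
            _, _, url = heapq.heappop(self._heap)
            host = self._host(url)
            if self.fetched[host] >= self.per_host_budget:
                continue                          # budget spent: drop it
            self.fetched[host] += 1
            return url
        return None
```

With a setup like this, a tarpit that emits endless links only ever costs the crawler `per_host_budget` fetches, and pages that score as gibberish push the host further down the queue.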

    • Saprophyte@lemmy.world
      English
      arrow-up
      7
      ·
      22 days ago

      Super effective. I tried both of these on a couple of domains I have, and the number of hits they get versus how long crawlers stay in them is insane. I use the AI robots.txt file, and if they ignore it they will spend hours scraping randomized nonsense text from unlimited internal links. I’m sure large legit AI companies have protection, but I get a lot of traffic from Africa and Asia in particular. Not sure if it’s the source or a VPN, but I just look at the geos and tend not to dig deep.
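For reference, this is roughly the check a well-behaved crawler is expected to make before fetching, using Python's standard `urllib.robotparser`. The `/tarpit/` path, the `ExampleBot` user agent, and the `example.com` URLs are placeholders, not anything from the setup described above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that fences off the trap for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tarpit/
"""

def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if `user_agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    print(allowed("ExampleBot", "https://example.com/blog/post"))
    print(allowed("ExampleBot", "https://example.com/tarpit/page1"))
```

A crawler that runs this check never reaches the trap, so anything that does show up under the disallowed path has ignored robots.txt.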

  • pigup@lemmy.world
    English
    arrow-up
    33
    arrow-down
    1
    ·
    23 days ago

    Please, someone make us a super-easy-to-implement version of this.

    • boonhet@sopuli.xyz
      English
      arrow-up
      10
      ·
      22 days ago

      Appreciate you using the ß correctly instead of using it as a replacement for “B”

    • VoterFrog@lemmy.world
      English
      arrow-up
      7
      arrow-down
      3
      ·
      22 days ago

      Doesn’t work either

      The text you provided translates to:
      “But what about typing like this?”. This style of writing involves replacing standard Latin letters with similar-looking characters from other alphabets or adding diacritical marks (accents, tildes, umlauts) available in the Unicode standard.
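A scraper does not even need an LLM to undo most of this. Here is a minimal sketch using Python's standard `unicodedata` module; the `LOOKALIKES` table is a tiny hypothetical stand-in, since letters like the thorn have no Unicode decomposition and need an explicit mapping.

```python
import unicodedata

# Hypothetical, deliberately tiny mapping for letters that Unicode
# normalization leaves untouched. A real table would be far larger.
LOOKALIKES = str.maketrans({"þ": "th", "Þ": "Th"})

def plain_text(text):
    """Strip diacritics and replace a few look-alike letters."""
    text = text.translate(LOOKALIKES)
    # NFKD splits accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keep everything else.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

if __name__ == "__main__":
    print(plain_text("Bût whát àbout týping lïke thïs?"))
    print(plain_text("Þe þorn"))
```

This only covers accents and whatever is in the mapping table; look-alike characters from other alphabets would each need their own entry.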

    • Luffy@lemmy.ml
      English
      arrow-up
      6
      arrow-down
      2
      ·
      22 days ago

      Even if the LLM doesn’t recognise it, the human ghost workers will train/translate it

      You’re only hindering people who have trouble reading

  • shalafi@lemmy.world
    English
    arrow-up
    24
    ·
    23 days ago

    Seems like these traps would be trivially easy to defeat. I should get off my ass and run one, see how it goes.

    • Krudler@lemmy.world
      English
      arrow-up
      9
      arrow-down
      3
      ·
      22 days ago

      Agree. This is another revenge fantasy from people that think the idea is great, without understanding that the implementation part is where it’s gonna break down.

      • VoterFrog@lemmy.world
        English
        arrow-up
        8
        arrow-down
        3
        ·
        22 days ago

        Yeah, much like the thorn, LLMs are more than capable of recognizing when they’re being fed Markov gibberish. Try it yourself. I asked one to summarize a bunch of keyboard autocomplete junk.

        The provided text appears to be incoherent, resembling a string of predictive text auto-complete suggestions or a corrupted speech-to-text transcription. Because it lacks a logical grammatical structure or a clear narrative, it cannot be summarized in the traditional sense.

        I’ve tried the same with posts that have the thorn in them, and it’ll explain that the person writing the post is being cheeky - and still successfully summarize the information. These aren’t real techniques for LLM poisoning.

          • VoterFrog@lemmy.world
            English
            arrow-up
            3
            ·
            22 days ago

            An AI crawler is both. It uses LLMs to extract useful information from websites in order to create higher-quality training data. They’re also used for RAG.

  • laranis@lemmy.zip
    English
    arrow-up
    19
    arrow-down
    1
    ·
    22 days ago

    Soon: US Republicans introduce law to prohibit the use of AI tar pits; cite copyright law and freedom of speech.

  • FosterMolasses@leminal.space
    English
    arrow-up
    17
    ·
    22 days ago

    It’s weird how this is written to make them sound more like animals or insects than a computer algorithm… “thrash around” lol

    • wolframhydroxide@sh.itjust.works
      English
      arrow-up
      4
      ·
      22 days ago

      It’s because the tool is named Nepenthes, after the pitcher plants, into which victims fall, cannot escape, and thrash around until they die and are digested.