Some thoughts on how useful Anubis really is. Combined with comments I've read elsewhere about scrapers starting to solve the challenges, it makes me afraid Anubis will soon be outdated and we'll need something else.

  • Dremor@lemmy.world · 4 days ago

    Anubis isn't a challenge in the captcha sense. Anubis is a resource waster: it forces crawlers to solve a crypto challenge (basically like mining bitcoin) before being allowed in. That's how it defends so well against bots; they don't want to waste their resources on needless computing, so they just cancel the page load before it even happens and go crawl elsewhere.
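
    In rough terms, the proof-of-work idea looks like the Go sketch below. This is illustrative only, not Anubis's exact scheme or parameters (the challenge token and difficulty are made up): the point is that the client has to grind through many hashes while the server verifies the answer with a single hash.

    ```go
    package main

    import (
        "crypto/sha256"
        "encoding/hex"
        "fmt"
        "strconv"
        "strings"
    )

    // solve grinds nonces until SHA-256(challenge + nonce) starts with
    // `difficulty` hex zeros. Each extra zero multiplies the expected
    // work by 16, while verification stays a single hash.
    func solve(challenge string, difficulty int) (int, string) {
        prefix := strings.Repeat("0", difficulty)
        for nonce := 0; ; nonce++ {
            sum := sha256.Sum256([]byte(challenge + strconv.Itoa(nonce)))
            hash := hex.EncodeToString(sum[:])
            if strings.HasPrefix(hash, prefix) {
                return nonce, hash
            }
        }
    }

    func main() {
        // Hypothetical challenge token and difficulty, for illustration only.
        nonce, hash := solve("example-challenge-token", 4)
        fmt.Println(nonce, hash)
    }
    ```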

    • tofu@lemmy.nocturnal.gardenOP · 4 days ago

      No, it works because the scraper bots haven't implemented it yet. Of course the companies would rather not spend additional compute resources, but their pockets are deep, and some have already adapted and solve the challenges.

      • Encrypt-Keeper@lemmy.world · 3 days ago

        The point was never that Anubis challenges are something scrapers can’t get past. The point is it’s expensive to do so.

        Some bots don’t use JavaScript and can’t solve the challenges, so they’re blocked outright, but there was never any point in time where no scrapers could solve them.

        • JuxtaposedJaguar@lemmy.ml · 3 days ago

          Wait, so browsers that disable JavaScript won’t be able to access those websites? Then I hate it.

          Not everyone wants unauthenticated RCE from thousands of servers around the world.

          • Encrypt-Keeper@lemmy.world · 3 days ago

            Not everyone wants unauthenticated RCE from thousands of servers around the world.

            I've got really bad news for you, my friend.

      • Dremor@lemmy.world · 4 days ago

        Whether they solve it or not doesn't change the fact that they have to use more resources to crawl, which is the objective here. And meanwhile, the website sees a lot less load than it did before Anubis. Either way, I see it as a win.

        But despite that, it has its detractors, like any solution that becomes popular.

        But let’s be honest, what are the arguments against it?
        It takes a bit longer to access for the first time? Sure, but it's not like you have to click or type anything.
        It executes foreign code on your machine? Literally 90% of the web does that these days. Just disable JavaScript and see how many websites are still functional. I'd be surprised if even a handful are.

        The only ones who gain anything from a site not having Anubis are web crawlers, be they AI bots, indexing bots, or script kiddies looking for a vulnerable target.

        • tofu@lemmy.nocturnal.gardenOP · 4 days ago

          Sure, I’m not arguing against Anubis! I just don’t think the added compute cost is sufficient to keep them out once they adjust.

          • rumba@lemmy.zip · 3 days ago

            Conceptually, you could just turn the knobs way up. A human can wait 15 seconds to read a page. But if you're trying to scrape 100,000 pages and each one takes 15 seconds… If you can make it expensive in both power and time, that's a win.
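
            As a back-of-the-envelope illustration (the 15-second and 100,000-page figures are just the hypothetical numbers above, not Anubis defaults), the asymmetry looks like this:

            ```go
            package main

            import (
                "fmt"
                "time"
            )

            func main() {
                perPage := 15 * time.Second // hypothetical per-page proof-of-work cost
                pages := 100_000            // hypothetical crawl size

                total := time.Duration(pages) * perPage
                fmt.Printf("human visitor: %v, once\n", perPage)
                fmt.Printf("crawler, %d pages: ~%.0f hours of compute\n", pages, total.Hours())
                // A human pays ~15 s once; the crawler pays it 100,000 times
                // (~417 hours), or buys proportionally more hardware to parallelize.
            }
            ```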