• meowmeowbeanz@sh.itjust.works
    English
    arrow-up
    43
    arrow-down
    2
    ·
    6 days ago

    Oh look, another tech giant treating open knowledge initiatives like their personal data buffet. Let me translate this corporate nonsense for you:

    Meta: “We need training data for our AI!” Also Meta: Let’s leech 81.7TB from a community project without contributing anything back.

    The absolute audacity of downloading terabytes through torrents while their employees were internally admitting it was “legally problematic”. And the best part? They couldn’t even be bothered to seed properly - just grab and go, classic corporate behavior.

    Remember when companies actually contributed to open source instead of just parasitically consuming it? But no, they’d rather burden volunteer-run projects with massive bandwidth costs while their lawyers probably bill more per hour than these projects’ entire monthly budget.

    Pro tip Meta: If you’re going to pilfer knowledge from the commons, at least seed back properly. Your “move fast and break things” motto isn’t supposed to apply to community archives.

  • njordomir@lemmy.world
    English
    arrow-up
    45
    arrow-down
    2
    ·
    7 days ago

    If someone were to acquire a few hundred gigs of books and feed them to something like paperless-ngx, would it work as a sort of Google of books? Are there any software projects better suited for doing this that understand synonyms and perhaps some context? I guess AI search, but guided, for the intermediate user.

    Google is so bad lately. Basically every result is official sponsored corporate biased BS. It would be nice to be able to instantly query a bunch of ebooks.

  • ad_on_is@lemm.ee
    English
    arrow-up
    12
    ·
    6 days ago

    If buying ain’t owning, then downloading…

    oh wait, that’s our slogan

    • rottingleaf@lemmy.world
      English
      arrow-up
      17
      ·
      7 days ago

      In copyright protection terms the ratio shouldn’t matter. They should pay for all the lost profits from pirating everything they’ve downloaded. Every time someone pirated it should be counted. And every time someone uses the AI trained on the data.

      They can become the corporate Jesus of the interwebs, having paid for our sins.

      • grue@lemmy.world
        English
        arrow-up
        9
        arrow-down
        1
        ·
        7 days ago

        Technically, copyright infringement is committed by the entity making and sending the copy, not the entity receiving it. Leeching could indeed remove liability.

        I’m not sure if the courts have cared about that nuance when prosecuting the ‘small fish,’ but I bet they would in this ‘big fish’ case.

        • MangoCats@feddit.it
          English
          arrow-up
          6
          arrow-down
          1
          ·
          7 days ago

          If the receiving entity then ingests all that copyrighted material into its AI, and the AI sends it piece at a time to other receiving entities, that should be the AI infringing on everything it is copying to make its answers.

          • grue@lemmy.world
            English
            arrow-up
            5
            ·
            7 days ago

            Yes, yes it should. But that’s a different act than the one being discussed here.

  • Snot Flickerman@lemmy.blahaj.zone
    English
    arrow-up
    106
    ·
    7 days ago

    “Meta downloaded millions of pirated books from LibGen through the bit torrent protocol using a platform called LibTorrent. Internally, Meta acknowledged that using this protocol was legally problematic,” the third amended complaint noted.

    Just want to make clear that Libtorrent is just the torrent application they were using, while the Libgen torrents are easily accessible on the libgen site, not through a separate “platform” called Libtorrent.

    I wish people like us could help with these complaints, because then they might actually get the details more accurate to reality.

    https://libgen.is/repository_torrent/

    https://www.libtorrent.org/

    The amended complaint makes it sound like Libtorrent is a private tracker website when it’s just the application they were using on the publicly available torrents.

  • SinningStromgald@lemmy.world
    English
    arrow-up
    37
    arrow-down
    1
    ·
    7 days ago

    Given the extent, it should be considered criminal, so $250k per offense, and the higher-ups who authorized the torrenting should get conspiracy charges at a minimum.

    But this is America so they’ll probably pay a small amount, for Meta, and a light slap on the wrist with a finger wagging.

    • Pika@sh.itjust.works
      English
      arrow-up
      14
      ·
      7 days ago

      You are being optimistic; it’s likely going to be considered “fair use” and then be business as usual. Meta themselves have claimed that they aren’t filing to dismiss because they believe they are on the right side of the law, since they aren’t distributing the pirated content, only using it for training, which is currently a massive grey area that hasn’t been ruled as non-fair use.

      • merc@sh.itjust.works
        English
        arrow-up
        5
        ·
        7 days ago

        Average ebook size: 2.5 MB or so.

        Meta downloaded 81 TB, or 81,000,000 MB.

        81,000,000 / 2.5 = 32.4 million, so approx. 30 million books.

        30,000,000 books * $250,000 per offence = $7.5 trillion

        (are you sure you’re from programming.dev?)
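
        A minimal sketch of the estimate above, assuming the thread's rough figures (2.5 MB per ebook, 81 TB in decimal units, $250,000 per work). The constant names are made up for illustration and none of the inputs are legal or measured values:

```python
# Back-of-the-envelope estimate using the rough figures from the comment above.
# Every input here is an assumption taken from the thread, not a measured value.
AVG_EBOOK_MB = 2.5            # assumed average ebook size in MB
DOWNLOADED_TB = 81            # rounded down from the 81.7 TB cited in the thread
MB_PER_TB = 1_000_000         # decimal units: 1 TB = 1,000,000 MB
FINE_PER_WORK_USD = 250_000   # the per-offence figure used in this thread

downloaded_mb = DOWNLOADED_TB * MB_PER_TB
books = downloaded_mb / AVG_EBOOK_MB          # estimated number of books
total = books * FINE_PER_WORK_USD             # total using the unrounded count

# The comment rounds the book count down to 30 million before multiplying.
rounded_total = 30_000_000 * FINE_PER_WORK_USD

print(f"{books:,.0f} books")
print(f"${total:,.0f} (unrounded count)")
print(f"${rounded_total:,} (rounded to 30 million books)")
```

        Without the rounding, the same inputs give about 32.4 million books and roughly $8.1 trillion, so the $7.5 trillion figure is on the conservative side.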

      • Snot Flickerman@lemmy.blahaj.zone
        English
        arrow-up
        11
        ·
        edit-2
        7 days ago

        Each time someone uses their LLM it should be considered a violation.

        People are using these things millions of times a day in aggregate. That adds up fast. $250k multiplied by millions suddenly isn’t so cheap.

      • grue@lemmy.world
        English
        arrow-up
        9
        ·
        7 days ago

        $250k * [every book in existence] is literally nothing?

        Remember, “offense” doesn’t mean “per torrent,” it means “per copyrighted work infringed.”

    • misk@sopuli.xyzOP
      English
      arrow-up
      43
      ·
      7 days ago

      It’s a popular search engine that works with shadow libraries like Sci-Hub or Library Genesis. Shadow libraries host copies of works of literature and science. Their legal status is murky at best, but it’s incredibly impractical to prosecute those accessing them.

        • PM_Your_Nudes_Please@lemmy.world
          English
          arrow-up
          22
          ·
          edit-2
          7 days ago

          TPB and 1337x are torrent sites, whereas Anna’s Archive typically uses direct downloads. So it’s more akin to the old CoolROMs back before the massive takedown purges.

          Anna’s Archive does offer torrents, but it’s not for individual files. Their torrents are more like database backups, with thousands of books each. In fact, people will download and seed them to help increase AA’s resilience. Since they aren’t super useful for individual files, very few people use them as such. But clearly, Meta just used them to feed into an LLM, because they didn’t care about the content of the files as long as they were properly written. It was less “looking for your favorite fantasy book” and more “looking to grab every fantasy book ever written.”

          • jaybone@lemmy.world
            English
            arrow-up
            3
            ·
            7 days ago

            Thanks. It’s confusing because everyone is talking about torrents. It’s in the title even, but I didn’t read the article.

            • Corkyskog@sh.itjust.works
              English
              arrow-up
              6
              ·
              7 days ago

              Well, I think you can also torrent from there. There are massive backup files on their home page that they are basically begging people to download and seed… So maybe it’s that?

      • MonkderVierte@lemmy.ml
        English
        arrow-up
        4
        arrow-down
        1
        ·
        6 days ago

        it’s incredibly impractical to prosecute those accessing them.

        Always was. If you’re serious, prosecute those hosting it.

  • Grimy@lemmy.world
    English
    arrow-up
    45
    arrow-down
    25
    ·
    7 days ago

    Meta has open-sourced every single one of their LLMs. They essentially gave birth to the whole open LLM scene.

    If they start losing all these lawsuits, the whole scene dies and all those nifty models and their fine-tunes get removed from Hugging Face, to be repackaged and sold to us with a subscription fee. All the other domestic open source players will close down.

    The copyright crew aren’t the good guys here, even if it’s spearheaded by Sarah Silverman and Meta has traditionally played the part of the villain.

    • misk@sopuli.xyzOP
      English
      arrow-up
      46
      arrow-down
      16
      ·
      7 days ago

      Meta stole from everyone, including those that struggle to make ends meet, so it doesn’t matter that they gave you back some of it. Any moral qualms should evaporate when you consider that they did it to create shareholder value and the rest is philanthropy (aka pretend tax). As a socialist I believe that man is owed for his work and you can’t take from him even though technology makes it so easy.

            • misk@sopuli.xyzOP
              English
              arrow-up
              2
              arrow-down
              1
              ·
              edit-2
              6 days ago

              The reason the world is in a mess is that we were told to choose between fascists and pro-market technocrat libertarians pretending to be leftists. This is a worldwide issue that’s doubly important because those liberals guilt-trip us for not supporting them, and that’s why I’m just laying little bricks here and there. At the end of the tunnel we either rework our society into a socialist one or we succumb to feudal lords again. Years of neoliberal hegemony need to be undone, so I try to go against the grain like that sometimes, hoping I made someone think.

                • misk@sopuli.xyzOP
                  English
                  arrow-up
                  1
                  ·
                  edit-2
                  6 days ago

                  I assume you probably want to know how this kind of leftism is different from others or other ideologies calling themselves leftist, rather than for me to write an essay on myself.

                  I believe in equal opportunity but reject that you should be able to “win” in any system. I believe in empathy over soulless meritocracy. I believe in collective ownership but don’t reject that one is owed for his work. You could say it all stems from egalitarianism, but this term has been caricatured by liberals too. For a long time I thought social democracy as an ideology gives you enough levers in the system to steer it toward that goal, but time and time again it turned out that in most places SocDem parties are no different from liberal ones, and so I learned from past mistakes.

      • Grimy@lemmy.world
        English
        arrow-up
        8
        arrow-down
        9
        ·
        edit-2
        7 days ago

        Don’t give me that slop. No one except the biggest names is getting a dime out of it once OpenAI buys up all the data and kills off their competition. It’s also highly transformative, which used to be perfectly legal.

        Copyright laws have been turned into a joke, only protecting big money and their interests.