‘Impossible’ to create AI tools like ChatGPT without copyrighted material, OpenAI says::Pressure grows on artificial intelligence firms over the content used to train their products

  • NeatNit@discuss.tchncs.de
    link
    fedilink
    English
    arrow-up
    27
    arrow-down
    1
    ·
    6 months ago

    hijacking this comment

    OpenAI was IMHO well within its rights to use copyrighted materials when it was just doing research. They were* doing research on how far large language models can be pushed, where’s the ceiling for that. It’s genuinely good research, and if copyrighted works are used just to research and what gets published is the findings of the experiments, that’s perfectly okay in my book - and, I think, in the law as well. In this case, the LLM is an intermediate step, and the published research papers are the “product”.

    The unacceptable turning point is when they took all the intermediate results of that research and flipped them into a product. That’s not the same, and most or all of us here can agree - this isn’t okay, and it’s probably illegal.

    * disclaimer: I’m half-remembering things I’ve heard a long time ago, so even if I phrase things definitively I might be wrong

    • dasgoat@lemmy.world
      link
      fedilink
      English
      arrow-up
      1
      arrow-down
      1
      ·
      edit-2
      6 months ago

      True, with the acknowledgement that this was their plan all along and the research part was always intended to be used as a basis for a product. They just used the term ‘research’ as a workaround that allowed them to do basically whatever to copyrighted materials, fully knowing that they were building a marketable product at every step of their research

      That is how these people essentially function, they’re the tax loophole guys that make sure you and I pay less taxes than Amazon. They are scammers who have no regard for ethics and they can and will use whatever they can to reach their goal. If that involves lying about how you’re doing research when in actuality you’re doing product development, they will do that without hesitation. The fact that this product now exists makes it so lawmakers are now faced with a reality where the crimes are kind of past and all they can do is try and legislate around this thing that now exists. And they will do that poorly because they don’t understand AI.

      And this just goes into fraud in regards to research and copyright. Recently it came out that LAION-5B, an image generator that is part of Stable Diffusion, was trained on at least 1000 images of child pornography. We don’t know what OpenAI did to mitigate the risk of their seemingly indiscriminate web scrapers from picking up harmful content.

      AI is not a future, it’s a product that essentially functions to repeat garbled junk out of things we have already created, all the while creating a massive burden on society with its many, many drawbacks. There are little to no arguments FOR AI, and many, many, MANY to stop and think about what these fascist billionaire ghouls are burdening society with now. Looking at you, Peter Thiel. You absolute ghoul.

      • NeatNit@discuss.tchncs.de
        link
        fedilink
        English
        arrow-up
        2
        arrow-down
        1
        ·
        6 months ago

        True, with the acknowledgement that this was their plan all along and the research part was always intended to be used as a basis for a product. They just used the term ‘research’ as a workaround that allowed them to do basically whatever to copyrighted materials, fully knowing that they were building a marketable product at every step of their research

        I really don’t think so. I do believe OpenAI was founded with genuine good intentions. But around the time it transitioned from a non-profit to a for-profit, those good intentions were getting corrupted, culminating in the OpenAI of today.

        The company’s unique structure, with a non-profit’s board of directors controlling the company, was supposed to subdue or prevent short-term gain interests from taking precedence over long-term AI safety and other such things. I don’t know any of the details beyond that. We all know it failed, but I still believe the whole thing was set up in good faith, way back when. Their corruption was a gradual process.

        There are little to no arguments FOR AI

        Outright not true. There’s so freaking many! Here’s some examples off the top of my head:

        • Just today, my sister told me how ChatGPT (her first time using it) identified a song for her based on her vague description of it. She has been looking for this song for months with no success, even though she had pretty good key details: it was a duet, released around 2008-2012, and she even remembered a certain line from it. Other tools simply failed, and ChatGPT found it instantly. AI is just a great tool for these kinds of tasks.
        • If you have a huge amount of data to sift through, looking for something specific but that isn’t presented in a specific format - e.g. find all arguments for and against assisted dying in this database of 200,000 articles with no useful tags - then AI is the perfect springboard. It can filter huge datasets down to just a tiny fragment, which is small enough to then be processed by humans.
        • Using AI to identify potential problems and pitfalls in your work, which can’t realistically be caught by directly programmed QA tools. I have no particular example in mind right now, unfortunately, but this is a legitimate use case for AI.
        • Also today, I stumbled upon Rapid, a map editing tool for OpenStreetMap which uses AI to predict and suggest things to add - with the expectation that the user would make sure the suggestions are good before accepting them. I haven’t formed a full opinion about it in particular (and especially wary because it was made by Facebook), but these kinds of productivity boosters are another legitimate use case for AI. Also in this category is GitHub’s Copilot, which is its own can of worms, but if Copilot’s training data wasn’t stolen the way it was, I don’t think I’d have many problems with it. It looks like a fantastic tool (I’ve never used it myself) with very few downsides for society as a whole. Again, other than the way it was trained.

        As for generative AI and pictures especially, I can’t as easily offer non-creepy uses for it, but I recommend you see this video which takes a very frank take on the matter: https://nebula.tv/videos/austinmcconnell-i-used-ai-in-a-video-there-was-backlash if you have access to Nebula, https://www.youtube.com/watch?v=iRSg6gjOOWA otherwise.
        Personally I’m still undecided on this sub-topic.

        Deepfakes etc. are just plain horrifying, you won’t hear me give them any wiggle room.

        Don’t get me wrong - I am not saying OpenAI isn’t today rotten at the core - it is! But that doesn’t mean ALL instances of AI that could ever be are evil.

        • dasgoat@lemmy.world
          link
          fedilink
          English
          arrow-up
          1
          arrow-down
          3
          ·
          6 months ago

          ‘It’s just this one that is rotten to the core’

          ‘Oh and this one’

          ‘Oh this one too huh’

          ‘Oh shit the other one as well’

          Yeah you’re not convincing me of shit. I haven’t even mentioned the goddamn digital slavery these operations are running, or how this shit is polluting our planet so someone somewhere can get some AI Childporn? Fuck that shit.

          You’re afraid to look behind the curtains because you want to ride the hypetrain. Have fun while it lasts, I hope it burns every motherfucker who thought this shit was a good idea to the motherfucking ground.