@FatCrab

FatCrab@lemmy.one · 2 months ago

Even better, gorditas literally translate to little fat girls i believe (but maybe I’m wrong). Tbf, we have a processed sausage-alike that we call a hot dog so language and our perception of it is weird

FatCrab@lemmy.one · 2 months ago

Not that I agree they’re conscious, but this is an incorrect and overly simplistic definition of a LLM. They are probabilistic in nature, yea, and they work on tokens, or fragments, of words. But it’s about as much of an oversimplification to say humans are just markov chains that make plausible sentences that can come after [the context] as it is to say modern GPTs are.

FatCrab@lemmy.one · 2 months ago

Grok is closed source, I believe, so it’s hard to say. But, ignoring unknown architecture or latent space details, this could be a lot of things. The way you seem to be using the term hallucination effectively applies to EVERY output of a GPT. They effectively reason probabilistically across a billion dimensioned space mapping language components, with various dimensions taking on various semantic values due to a sort of mathematical differentiation during training. This could be the result of influence from any number of things tbh.

FatCrab@lemmy.one · 3 months ago

Yes, this is what I said. Situations where a work can conceivably considered co-authored by a human, those components get copyright. However, whether that activit constitutes contribution and how is demarcated across the work is a case by case basis. This doesn’t mean any inpainting at all renders the whole work copyright protected–it means that it could in cases where it is so granular and directly corresponds to human decision making that it’s effectively digital painting. This is probably a higher bar than most expect but, as is not atypical with copyright, is a largely case by case quantitative/adjudicated vibes-based determination.

The second situation you quoted is also standard and effectively stands for the fact that an ordered compilation of individually copyrighted works may itself have its own copyright in the work as a whole. This is not new and is common sense when you consider the way large creative media projects work.

Also worth mentioning that none of this obviates the requirement that registrations reasonably identify and describe the AI generated components of the work (presumably to effectively disclaim those portions). It will be interesting to see a defense raised that the holder failed to do so and so committed a fraud on the Copyright Office and thus lost their copyright in the work as a whole (a possible penalty for committing fraud on the Office).

FatCrab@lemmy.one · 3 months ago

The CO didn’t say AI generated works were copyrightable. In fact, the second part of the report very much affirmed their earlier decisions that AI generated content is necessarily not protected under copyright. What you are probably referring to is the discussion the Office presented about joint works style pieces–that is, where a human performed additional creative contributions to the AI generated material. In that case, the portions such that they were generated by the human contributor are protected under copyright as expected. Further, they made very clear that what constitutes creative contribution and thus gets coverage is determined on a case by case basis. None of this is all that surprising, nor does it refute the rule that AI generated material, having been authored by something other than a human, is not afforded any copyright protection whatsoever.

FatCrab@lemmy.one · 3 months ago

For sure. I personally think our current IP laws are well equipped to handle AI generated content, even if there are many other situations where they require a significant overhaul. And the person you responded to is really only sort of maybe half correct. Those advocating for, e.g., there to be some sort of copyright infringement in training AI aren’t going to bat for current IP laws-- they’re advocating for altogether new IP laws effectively thar would effectively further assetize and allow even more rent seeking in intangibles. Artists would absolutely not come out ahead on this and it’s ludicrous to think so. Publishing platforms would make creators sign those rights away and large corporations would be the only ones financially capable of acting in this new IP landscape. The compromise also likely would be attaching a property right in the model outputs and so it would actually become far more practical to leverage AI generated material at commercial scale since the publisher could enforce IP rights on the product.

The real solution to this particular issue is require all models that out materials to the public at large be open source and all outputs distributed at large be marked as generated by AI and thus being effectively in the public domain.

FatCrab@lemmy.one · 3 months ago

AI art is not protected by copyright, yes. That isn’t a “should” but rather how it actually works in nearly all countries but a few, certainly including the US.

FatCrab@lemmy.one · 4 months ago

It’s been defined a few ways. Based on registration and also self reporting.

FatCrab@lemmy.one · 4 months ago

That’s because in practical reality, there are no independent voters. The vastly overwhelming majority of independents actually vote consistently along party lines.

FatCrab@lemmy.one · 7 months ago

??? It is literally impossible for any voter to not know the devil they chose. No, over 70 million voters actively chose to elect perhaps the most incompetent and transparently stupid president in history back into office, but with a well known and well documented playbook this time around on how literally entry metric of American life, from domestic policy to foreign policy, will be made worse to the sole benefit of big corporate actors and 1%ers. A whole bunch of others were too apathetic to be concerned by this.

Voters ultimately made their choice. A lot of folks are going to die as a result, but unfortunately it won’t be limited to just the idiots that actually chose this.

FatCrab@lemmy.one · 9 months ago

My point is just that they’re effectively describing a discriminator. Like, yeah, it entails a lot more tough problems to be tackled than that sentence makes it seem, but it’s a known and very active area of ML. Sure, there may be other metadata and contextual features to discriminate upon, but eventually those heuristics will inevitably be closed up and we’ll just end up with a giant distributed, quasi-federated GAN. Which, setting aside the externalities that I’m skeptical anyone in a position of power to address is equally in an informed position of understanding, is kind of neat in a vacuum.

FatCrab@lemmy.one · 9 months ago

Yes, it’s called a GAN and has been a fundamental technique in ML for years.

FatCrab@lemmy.one · 9 months ago

I think if you can actually define reasoning, your comments (and those like yours) would be much more convincing. I’m just calling yours out because I’ve seen you up and down in this thread repeating it, but it’s a general observed of the vocal critics of the technology overall. Neither intelligence nor reasons (likewise understanding and knowing, for that matter) are easily defined in a way that is more useful than invoking spirits and ghosts. In this case, detecting patterns certainly seems a critical component of what we would consider to be reasoning. I don’t think it’s sufficient, buy it is absolutely necessary.

FatCrab@lemmy.one · 9 months ago

Genetic algorithms is a sort of broad category and there’s certainly ways you could federate and parallelize. I think autoML basically applies this within the ML space (multiple trainings explore a solution topology and convergence progress is compared between epochs, with low performers dropping out). Keep in mind, you can also use a genetic algorithm to learn how to explore an old fashioned state tree.

FatCrab@lemmy.one · 9 months ago

Like I’ve said, you are arguing this into nuanced aspects of copyright law that are absolutely not basic, but I do not agree at all with your assessment of the initial reproduction of the image in a computer’s memory. First, to be clear, what you are arguing is that images on a website are licensed to the host to be reproduced for non-commercial purposes only and that such downstream access may only be non-commercial (defined very broadly–there is absolutely a strong argument here that commercial activity in this situation means direct commercial use of the reproduction; for example, you wouldn’t say that a user who gets paid to look at images is commercially using the accessed images) or it violates the license. Now, even ignoring my parentheses, there are contract law and copyright law issues with this. Again, using thumbs and, honestly, I’m not trying to write a legal brief as a result of a random reply on lemmy, but the crux is that it is questionable whether you can enforce licensing terms that are presented to a licensee AFTER you enable, if not force, them to perform the act of copying your work. Effectively, you allowed them to make a copy of the work, and then you are trying to say "actually, you can only do x, y, and z with that particular copy–and this is also where exhaustion rears its head when you add on your position that once a trained model switches from non-commercial deployment to commercial deployment it can suddenly retroactively recharacterize the initial use as unlicensed infringement. Logistically, it just doesn’t make sense either (for example, what happens when a further downstream user commercializes the model? Does that percolate back to recharacterize the original use? What about downstream from that? How deep into a toolchain history do you need to go to break time traveling egregious breach of exhaustion?) so I have a hard time accepting it.

Now, in response to your query wrt my edit, my point was that infringement happens when you do the further downstream reproduction of the image. When you print a unicorn on a t-shirt, it’s that printing that is the infringement. The commercial aspect has absolutely no bearing on whether an infringement occurs. It is relevant to damages and the fair use affirmative defense. The sole query of whether infringement has occurred is whether a copy has been made and thus violated the copyright.

And all this is just about whether there is even a copying at the training of the models stage. This doesn’t get into a fairly challenging fair use analysis (going by SCotUS’ reasoning on copyrightability of API in Oracle v Google, I actually think the fair use defense is very strong, but I also don’t think there is an infringement happening to even necessitate such an analysis so ymmv–also, that decision was terrible and literally every time the SCotUS has touched IP issues, it has made the law wildly worse and more expensive and time-consuming to deal with). It also doesn’t get into whether outputs that are very similar to works infringe in the way music does (even though there is no actual copying–I think it highly likely it is an infringement). It also also doesn’t get into how outputs might infringe even though there is no IP rights in the outputs of a generative architecture (this probably is more a weird academic issue but I like it nonetheless). Oh, and likeness rights haven’t made their way into the discussion (and the incredible weirdness of a class action that includes right of publicity among its claims).

We can, and probably will, disagree on how IP law works here. That’s cool. I’m not trying to litigate it on lemmy. My point in my replies at this point is just to show that it is not “basic copyright law bruh”. The copyright law, and all the IP law really, around generative AI techniques is fairly complicated and nuanced. It’s totally reasonable to hold the position that our current IP laws do not really address this the way most seem to want it to. In fact, most other IP attorneys I’ve talked to with an understanding of the technical processes at hand seem to agree. And, again, I don’t think that further assetizing intangibles into a “right to extract machine learning from” is a viable path forward in the mid and long run, nor one that benefits anyone but highly monied corporate actors either.

FatCrab@lemmy.one · edit-2 9 months ago

No, this is mostly incorrect, sorry. The commercial aspect of the reproduction is not relevant to whether it is an infringement–it is simply a factor in damages and Fair Use defense (an affirmative defense that presupposes infringement).

What you are getting at when it applies to this particular type of AI is effectively whether it would be a fair use, presupposing there is copying amounting to copyright infringement. And what I am saying is that, ignoring certain stupid behavior like torrenting a shit ton of text to keep a local store of training data, there is no copying happening as a matter of necessity. There may be copying as a matter of stupidity, but it isn’t necessary to the way the technology works.

Now, I know, you’re raging and swearing right now because you think that downloading the data into cache constitutes an unlawful copying–but it presumably does not if it is accessed like any other content on the internet. Because intent is not a part of what makes that a lawful or unlawful copying and once a lawful distribution is made, principles of exhaustion begin to kick in and we start getting into really nuanced areas of IP law that I don’t feel like delving into with my thumbs, but ultimate the point is that it isn’t “basic copyright law.” But if intent is determinitive of whether there is copying in the first place, how does that jive with an actor not making copies for themselves but rather accessing retained data in a third party’s cache after they grab the data for noncommercial purposes? Also, how does that make sense if the model is being trained for purely research purposes? And then perhaps that model is leveraged commercially after development? Your analysis, assuming it’s correct arguendo, leaves far too many outstanding substantive issues to be the ruling approach.

EDIT: also, if you download images from deviantart with the purpose of using them to make shirts or other commercial endeavor, that has no bearing on whether the download was infringing. Presumably, you downloaded via the tools provided by DA. The infringement happens when you reproduce the images for the commercial (though any redistribute is actually infringing) purpose.

FatCrab@lemmy.one · 9 months ago

Yes, inadvertent copying is still copying, but it would be copying in the output and is not evidence of copying happening in the creation of the model. That was why I used the music example, because it is rather probative of where there could be grounds for copyright infringement related to these model architectures. This may not seem an important distinction, but it has significant consequences on who is ultimately liable and how.

FatCrab@lemmy.one · 9 months ago

I get that that’s how it feels given how it’s being reported, but the reality is that due to the way this sort of ML works, what internet archive does and what an arbitrary GPT does are completely different, with the former being an explicit and straightforward copy relying on Fair Use defense and the latter being the industrialized version of intensive note taking into a notebook full of such notes while reading a book. That the outputs of such models are totally devoid of IP protections actually makes a pretty big difference imo in their usefulness to the entities we’re most concerned about, but that certainly doesn’t address the economic dilemma of putting an entire sector of labor at risk in narrow areas.

FatCrab@lemmy.one · 9 months ago

You are misunderstanding what I’m getting at and unfortunately no this isn’t just straightforwardly copyright law whatsoever. The training content does not need to be copied. It isn’t saved in a database somewhere (as part of the training…downloading pirated texts is a whole other issue completely removed from the inherent processes of training a model), relationships are extracted from the material, however it is presented. So the copyright extends to the right of displaying the material in the first place. If your initial display/access to the training content is non-infringing, the mere extraction of relationships between components is not itself making a copy nor is it making a derivative work in any way we haven’t historically considered it. Effectively, it’s the difference between looking at material and making intensive notes of how different parts of the material relate to each other and looking at a material and reproducing as much of it as possible for your own records.

FatCrab@lemmy.one · 9 months ago

I have no personal interest in the matter, tbh. But I want people to actually understand what they’re advocating for and what the downstream effects would inevitably be. Model training is not inherently infringing activity under current IP law. It just isn’t. Neither the law, legislative or judicial, nor the actual engineering and operations of these current models support at all a finding of infringement. Effectively, this means that new legislation needs to be made to handle the issue. Most are effectively advocating for an entirely new IP right in the form of a “right to learn from” which further assetizes ideas and intangibles such that we get further shuffled into endstage capitalism, which most advocates are also presumably against.