I've been writing code for nearly as long, and man, you have to try it. It's more like autocomplete on 'roids than something that generates Fibonacci methods automagically.
The completions are often more “similar to stuff I’ve typed before” than “generate this working function”, and it’s not an order-of-magnitude improvement over regular IntelliJ completion.
…but I find it’s generally better.
Up to you if you consider it worth paying for, and the macro completion is so-so, but “it’s Rust, so Copilot can’t do it” isn’t really true.
I guess any "abuse" is the fact that any code reproduced verbatim doesn't carry the license under which it was originally published. But the reality is that what Copilot is most useful for is (semi) intelligent autocomplete, and generating functions that almost certainly exist as StackOverflow answers. It won't do any thinking for you when it comes to the bigger picture.
Someone can still mirror your stuff on GitHub. I wonder if they should make a special open source license that disallows use of the source code for the purpose of training something like Copilot.
> I wonder if they should make a special open source license, that disallows use of the source code for the purpose of training something like Copilot.
Since Microsoft uses material for copilot outside of licensing on the basis that it is Fair Use, that would probably have no effect in practice on whether or not the material is used in training something like Copilot. For that to matter, you’d first have to win a lawsuit on the basis that training something like Copilot requires permission of the copyright owner of the training material, to invalidate the premise of Microsoft’s action.
I don't care about all the handwaving here, it comes down to this.
The moment a fragment of code from a GPL'd or AGPL'd project shows up almost verbatim in someone's closed source or non-copylefted, etc. project, and someone proves it, sparks are going to fly.
And it's probably already happening, just people haven't discovered it yet.
How many years did the Oracle/Google lawsuit go on for? And in the end it came down to a handful of lines of code only tangentially related to the issue at hand?
Part of the lesson from that must be: employers should be telling their workers to stay TF away from Copilot or things like it. And be careful in general when browsing source. License literacy is critical.
I don't touch it because I need to feed my kids. I don't need my career exploded. Overcautious? Maybe. I'll let someone else find out. I make a living in and around open source software.
> The moment a fragment of code from a GPL'd or AGPL'd project shows up almost verbatim in someone's closed source or non-copylefted, etc. project, and someone proves it, sparks are going to fly.
I'm hoarding popcorn and can't wait for that, honestly.
> License literacy is critical.
It's beyond critical, but most people I've talked to say they feel they have the right to copy and use any code they see online. They don't care.
Building this open source corpus was not easy, and we need to defend it too. This is a culture.
Anything generated by Copilot that is a derivative work is not something Copilot can hold the copyright on.
From the auditor's perspective it doesn't matter whether you copied it out of Stack Overflow, from some GitHub search, or from Copilot. You, the human, didn't check the license / plagiarism detector. It is you, the human, claiming copyright on the work you are creating, which may incorporate material from other sources.
Copilot isn't claiming fair use.
You could argue that the model Copilot runs on is a derivative work (and this is going to be interesting when it gets to the courts because, frankly, no one will come out the 'winner' when trying to explain it to a judge) - but that's not the code a human is claiming as their creative work, which is where the license violation ultimately lies.
Personally, I (not a lawyer) believe that Copilot is on OK ground - but anyone using it needs to do their due diligence in verifying that the code they've incorporated is licensed appropriately - just as if they'd copied something from Stack Overflow - who knows where that copy was copied from.
I have fewer concerns about identifiable code from Copilot than about humans not caring about the licenses of their source material when creating human-generated content.
It's not really an issue when you're a large software corporation; you already have mechanisms in place to check for license compliance in everything that ships, including F/OSS plagiarism checks.
I think that's the part that people who don't think it's worth the money aren't getting. This kind of system is a godsend for the likes of Infosys, TCS, etc. So the immediate threat is to the jobs there - but the side effect is that it'll make it all even cheaper, so we'll see more "outsourcing to the cloud", so to speak. Often to the obvious detriment of quality, but that doesn't seem to matter in this market.
There is a common trend of devs on HN getting angry that their work has been "stolen" to train Copilot, while none of them raised the same concerns when everything else, like art, music, and literature, was used to train other models. Now that it affects them, it's a real issue.
Yes. It's especially amusing given that much of the other material you mention was actually intended for commercial use (i.e. sales) from the start, unlike open source software.
I just don't understand the OSS community sometimes. "Software should be open and free (libre) for me to study and modify" includes what Github did for copilot. If you don't want your software to be free (in either sense), don't host it on an open source platform, especially one that makes it available gratis to the public.
There's possibly a valid argument that any private repo code that was used for copilot doesn't fit the proper definition of "open" (or gratis). But I haven't actually read the Github license around this, so I don't know.
Your argument fails to distinguish between "open source" and "free software".
Copyleft, free software, GPL-style licenses do not have their source open purely for the purpose of studying and modifying. Their licenses also require that derivative works be free and that such modifications be distributed under the same terms.
Copilot does not comply with this. And so violates the spirit of those licenses, and probably also the letter of the law.
In what sense? Copilot isn’t a derivative work in the sense these licenses usually are understood to mean. And given that they’re open source code bases I expect licenses to explicitly disallow things, and consider anything not explicitly disallowed as permitted.
> Copilot isn’t a derivative work in the sense these licenses usually are understood to mean
The phrase “derived work” is, IIUC, a phrase from copyright law. And you’d have a hard time convincing me that Copilot-generated code is not a derived work from its training data.
> And given that they’re open source code bases I expect licenses to explicitly disallow things, and consider anything not explicitly disallowed as permitted.
That is very much not how copyright and licences work. Copyright law gives the copyright holder the exclusive right to make copies of the work, making derived works, (and to do some other related things, like making a public performance of it, etc.), so to do any of those things, you need explicit permission, i.e. a license from the copyright holder to do it. A license is not a list of things you are forbidden to do; on the contrary, it is a list of things you are permitted to do, which you would not otherwise be legally allowed to do according to copyright law.
Sure, but there are things you can do without a license because they're not copyright violations. You can read the work, learn from it, and sometimes make quotations under fair use.
This is a novel scenario. It seems unclear how the courts will interpret it? Never mind what we think, will they decide it's a derivative work, or is it a transformative use?
“Fair use” is, technically, not actually permitted by copyright law. ISTR that “fair use” is only a defense you can use when you are being sued for copyright violation.
Suppose we create a new AI image generator, and use as training input every image ever made of a Disney character (official images by Disney, that is, no fan art), including every frame of every Disney movie. Could we just use the output images of that AI however we wanted to? (Not withstanding trademarks.)
Looks like there is case law that fictional characters are protected if they are "sufficiently delineated." I don't see how that applies to code, though.
This is unclear. I have never seen an open source license that was explicit about this. Seems like a grey area.
It's not even clear how often training machine learning algorithms on code results in copyright violations. Copilot does have a setting to detect and disallow direct copying, but how well does it work?
This legal uncertainty is enough that I wouldn't advise using it, but maybe people who use it will be fine?
I found carefully reviewing the suggestions it gave me more work than actually writing the code myself. Granted, I only used it for a day, but many of the suggestions were subtly wrong, needlessly inefficient, or used outdated/deprecated paradigms or standard library features.
I only used it for a language I'm very familiar with. I'd be a lot more hesitant using it for a language I'm less familiar with because I won't be able to spot the problems so easily.
I’ve really found no use for it at all. It doesn’t understand the codebase it’s being used in. I can’t tell it to write a service that gets data from another internal microservice, oh and make sure you do it in the same way the other services are implemented so that this passes code review… it can cough up slightly wrong answers to leetcode problems, but who has a job where that’s useful?
I've found it extremely helpful in writing highly repetitive code that's too complicated for a Regex find/replace. For example, I used it when writing a recursive descent parser in Rust for a hobby project.
I wrote the grammar in a comment at the top of the file, wrote and imported the AST enum, and wrote the first production. After that, I just prompted Copilot and it worked its way down the grammar, producing the parser functions one at a time. The CLion integration was able to consider the imported data structures as part of the prompt, so it even stored everything in the correct AST nodes.
For something like that, it's easy to verify that it did it correctly (through visual inspection and testing), and it allowed me to write the entire parser in about 2-3 seconds per rule.
I don’t think they mean that it can’t, just that the better way to think about the advantages it gives an experienced engineer is more along the lines of “autocomplete v2”, i.e. a keystroke-saver.
Yes, because oftentimes, while I'm unable to recall the exact idiosyncratic keyword incantation I need, Copilot will retrieve it automatically. This saves me a context switch to the MSDN docs or Stack Overflow.
It automates a great deal of the boilerplate crap that you have to write, especially in web frameworks such as Angular.
I can picture autogen boilerplate being collected and distributed in versioned "community expansion packs" to popular languages, and I'm not looking for fragmentation like that in my tools. I really don't want my IDE involved like this.
I'd rather see Copilot used to expand existing libraries. Pulling potential additions to your own library off of Copilot would be an interesting twist on the situation. A hacktoberfest-alike based off this would be weird.
Nobody is forcing you to use it... Others like myself find it useful and to be a huge timesaver. If it doesn't benefit you, then just don't use it. Why must people be so vocal about not liking something? I get that you don't think it would be useful for yourself, but it sounds like you've never used it, yet are against it enough to come bash on it in a thread.
I think you should give it a try and see what you think about it. I was hesitant about it at first but was very surprised at how much time it could save me from having to look up things on Google. I've found it especially useful when I'm switching to a language I may be less familiar with. I can understand the basic logic of what I want to do, but would have to spend time looking up how to do it in this specific language. Orrr... I just have copilot help me out and generate such a solution.
It's obviously not going to be a tool that is applicable to everyone. Just like how many manual labour oriented contractors have a bunch of tools, each of them may have their own set of tools that slightly differs from the other person. That is okay, and there would be no need to try and bring someone else down for their choice to use a certain tool.
This is a good example of why I dread Copilot: even if Go specifically couldn't express this any more concisely, there is a language that can and Copilot's very existence makes it less likely for that other language to be used as much as it deserves.
Besides, the generated example seems to be missing code to gracefully handle the case where len(filtered) is zero. Maybe there's a precondition that prevents that from happening or maybe a division by zero is exactly what you'd want, but at face value it looks like the bot did a rush job.
Zero is gracefully handled; the mean of a zero-sized set is best represented by NaN, and this would be idiomatic in most languages' IEEE754-style handling.
Saturation is not. This is what really bugs me: If I'm going to drag in a billion GPUs of external computation (or a dependency, which is basically the same thing but with human brains), I want it to provide the hard algorithm I can't write, not the easy one I can. I am not limited by typing speed.
Agreed about saturation and the choice of variable name, but the code would trigger a division by zero and not result in NaN: https://go.dev/play/p/vYm4tSNEJ7M
(Also, in, say, Ruby and JavaScript, 1.0/0.0 is Infinity and not NaN.)
Your playground link shows a build error. 0.0/0.0 at runtime will be NaN. And in basically every language, 1.0/0.0 is Infinity. But we're talking about 0.0/0.0.
Both good examples of coding where you should be thinking instead, though.
I think if anything there is far too much thinking going on here; the tiny example is something I copied from the window I literally already had open, with the function I was working on.
For what it's worth, Copilot (correctly) inferred a loop variable called "detection", I imagine based on similar usage earlier in the function. And there is already a conditional in place to prevent invalid operations; if I remove it I see a new suggestion:
if len(filtered) > 0 {
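For context, a hypothetical reconstruction of what the guarded version looks like (the `filtered`/`detection` names come from the thread; the ok-flag return is my own choice for handling the empty case, not necessarily what Copilot emitted):

```go
package main

import "fmt"

// mean averages the filtered values; the len(filtered) > 0 guard
// prevents the 0.0/0.0 case from ever being evaluated.
func mean(filtered []float64) (float64, bool) {
	if len(filtered) > 0 {
		sum := 0.0
		for _, detection := range filtered {
			sum += detection
		}
		return sum / float64(len(filtered)), true
	}
	return 0, false // empty input: no meaningful mean
}

func main() {
	m, ok := mean([]float64{1, 2, 3})
	fmt.Println(m, ok) // 2 true
	_, ok = mean(nil)
	fmt.Println(ok) // false
}
```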
This tool is far from perfect, but it very much sounds like you folks haven't used it. If that's the case, I would encourage you to research it like all tooling and draw some informed conclusions about its applicability instead of making assumptions.
Not until there's a setting that can guarantee the autocomplete is based on verifiably license-unencumbered source, automatically tracked in some kind of sourcemap that tells me which parts of "the code I didn't write" come from which other project and file inside that project.
Until then, copilot is a giant liability that ensures I can't use it for code that my company ends up owning, nor can I contribute code I write with it to literally any open source project because in a very real sense: I didn't write it. I just assembled it from parts unknown, and those parts may end up being lawsuits.
As a hypothetical, what would that case actually look like? I'm suing you because I have a strong belief that part of the codebase of your personal project was assembled from code I wrote and didn't license permissively, so now I'm claiming ownership?
Obviously IANAL so this is largely conjecture, but until we _actually_ see how this would play out in court, I'm leaning towards this being less of a legal issue than folks here act like. For personal projects, I'd say the likelihood of some other engineer reading your code, noticing a similarity or duplication, and dragging you to court for it is near 0.
- "you were hired to write code for us, not to use an autocomplete service that makes us liable for both copyright and patent lawsuits, I hope you like getting fired."
- "as per this project's license, we can only take code on board that you contributed under our license, but an audit shows that your PR/MRs contain tons of GPL/MIT/Whatever licensed code instead. We're going to have to back all of that out, and we're going to revoke your contributor status"
- etc.
If you don't know where the code in your autocomplete comes from (and Copilot can autocomplete large swathes of code), then literally anything that falls under "you didn't write this code" may apply: from fraud (depending on what contract you signed) to trademark infringement, to license violations, to even just simply misrepresenting your skills to an employer. As with all things, it's a sliding scale, but just because the majority of incidents will be on the benign part of the spectrum doesn't mean the litigating part doesn't exist, and that's what your legal department plans for.
Work for a big company? Good bet you're not allowed to use copilot. And depending on the company, not even "for your personal projects" because you might accidentally read someone else's license encumbered code that you would not have come up with yourself and may now open your employer up to "you stole our ideas instead of properly crediting/paying for licenses".
I would also be comfortable with a service that was willing to broadly indemnify me as the customer from copyright and patent claims arising from code generated by their service. I doubt that will ever happen either.