Show HN: Promptr, let GPT operate on your codebase and other useful goodies (github.com/ferrislucas)
109 points by deathmonger5000 on April 4, 2023 | 95 comments
Hi HN, I've been working on an experimental tool that helps you use GPT to work on your codebase. I'd love to improve the tool if there's interest. New ideas welcome! I think this could also be useful for experimenting with other types of recursive prompts.

It’s a little bit Swiss Army knife and a little bit skynet:

https://github.com/ferrislucas/promptr

From the README: Promptr is a CLI tool for operating on your codebase using GPT. Promptr dynamically includes one or more files into your GPT prompts, and it can optionally parse and apply the changes that GPT suggests to your codebase. Several prompt templates are included for various purposes, and users can create their own templates.



I'd be worried about the per-token cost of using the OpenAI API when submitting an entire codebase. And I'd probably only use it with open-source codebases.

I've been using chatblade; it has the nice feature of a cost-estimate call:

https://news.ycombinator.com/item?id=35223759

As to how to use GPT to help with coding, I wanted to be able to store the chatblade output to a file with a simple one-word call, so I wrote a wrapper that does that, and the first step was to submit this question to GPT:

> "I want to use Python's subprocess module in a script (that takes command-line arguments) to manage a call to an OpenAI API (that takes a variable amount of time to complete), and which prints output to the terminal, and I also want to tee that output to a specified file for storage. What do you recommend, from the perspective of an expert Python programmer?"

It laid out a nice template using several Python modules, and I was able to write it in a single morning with a little additional reference to pydocs and a few more questions about specifics. Now all the command-line queries get appended to a file with an up-to-date cost estimate attached to each one. And I'm pretty junior as far as programming goes.
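
For what it's worth, the core of that wrapper is small. A minimal sketch, assuming chatblade is on your PATH (the exact invocation and log path are illustrative):

    import subprocess
    import sys

    def run_and_tee(cmd, logfile="gpt-log.txt"):
        """Run a command, streaming output to the terminal and a log file."""
        with open(logfile, "a") as log:
            with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
                for line in proc.stdout:   # lines arrive as the API call produces them
                    sys.stdout.write(line) # echo to the terminal
                    log.write(line)        # and append to the log

    run_and_tee(["chatblade", "what is a tokenizer?"])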


Thanks, chatblade looks cool; I'll check it out! I've found GPT to be a phenomenal tool for solving problems with code.


It sounds like a good idea to send your codebase through a MITM to OpenAI. Both of these ideas, I mean.


The files you pass to promptr are indeed sent to OpenAI. There’s no man in the middle. Privacy is important, and I’m glad you care.

The relevant code is here https://github.com/ferrislucas/promptr/blob/3ae09d1cffbb6b93...

And here: https://github.com/ferrislucas/promptr/blob/3ae09d1cffbb6b93...


I think, if I'm not wrong, the commenter is saying that you are the man in the middle.


Is it really a MITM if the codebase is open source and you trust him to deploy that code?


Default trust is the problem. How can you establish whether this person is a good-faith actor?


This is good for security overall... think about it, if we sent all of our stuff to the NSA, they would find security bugs and fix them for us.


NSA be like:

"Romani ite domum"

After catching you writing:

"Romanes eunt domus"


Is the size/file limit with the OpenAI API different from what you can send via ChatGPT in the browser?

How much are you going to pay, for example, to send a 1kb, 2kb, 4kb, 8kb, or 16kb source file for analysis? If you are paying per "token", a 3kb file I have with about 70 lines of code works out to roughly 10 tokens per line if I understand what a token is, so around 750 tokens. Each token costs...
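
To make that concrete, a quick back-of-the-envelope sketch, assuming ~4 characters per token and gpt-3.5-turbo's price at the time (~$0.002 per 1K tokens; both numbers should be checked against the pricing page):

    PRICE_PER_1K_TOKENS = 0.002   # gpt-3.5-turbo, assumed; see openai.com/pricing
    CHARS_PER_TOKEN = 4           # rough rule of thumb for English text and code

    for size_kb in (1, 2, 4, 8, 16):
        tokens = size_kb * 1024 / CHARS_PER_TOKEN
        cost = tokens / 1000 * PRICE_PER_1K_TOKENS
        print(f"{size_kb:>2} KB ~ {tokens:5.0f} tokens ~ ${cost:.4f}")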

https://openai.com/pricing

Ok, not as bad as I thought.


I want to see the evolution of context management tools for coding with GPT.

  - implement a function, providing structural context (e.g. model, service, controller) with stubbed modules, but leave out utilities/libraries and the database model

  - drop the structural context, and ask it to expand the stub implementation with some additional context about the database model

  - drop the database model context, and ask it to refactor the solution with some additional context about utilities/libraries

I think this is doable. If we can build up some utilities for context management and iterative development, GPT should start to be usable on large codebases. It could work similarly to how one person wrote an entire novel using GPT.


I have been experimenting with exactly this. I hit the context window limit while developing a small web app [1] by having GPT do all the coding.

I've tried a few things:

1. Summarize/collapse the code.

  - Use GPT to summarize code blocks into a 1 line comment.
  - Collapse all the top level code blocks into their summary line. This turns the entire codebase into a much smaller "top level map".
  - Use a ReAct pattern via langchain to have the LLM itself try and determine which collapsed blocks need to be recursively "expanded" to understand the code with respect to the user requested feature/bugfix/change/modification.
  - Feed GPT this partially expanded code base along with the user requested change.
  - Have it spit back the modified version of that partially expanded code.
  - Apply the GPT changes back to the original source files.
2. Explicitly ask GPT "which parts of the code are relevant to this change request" or "which parts of this code would need to be modified to make the needed changes", etc.

3. Again using ReAct, give GPT tools to "grep" and "cat" the code so that it can explore the codebase to find and understand relevant chunks. I've even armed it with bash in a sandbox. It started importing and running parts of the Python code I was operating over.

None of these approaches have panned out fully yet. But there are promising signs.
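
For concreteness, here's roughly what the collapse step from approach 1 could look like. A minimal sketch, assuming Python source files, single-line def/class signatures, and the pre-1.0 openai client (model choice and prompt wording are illustrative):

    import ast
    import openai

    def summarize(block: str) -> str:
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user",
                       "content": "Summarize this code block in one line:\n\n" + block}],
        )
        return resp.choices[0].message.content.strip()

    def collapse(source: str) -> str:
        """Collapse each top-level def/class body to a one-line summary comment."""
        lines = source.splitlines()
        out, cursor = [], 0
        for node in ast.parse(source).body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                out.extend(lines[cursor:node.lineno])        # code up to and incl. the def line
                block = "\n".join(lines[node.lineno - 1:node.end_lineno])
                out.append("    # ... " + summarize(block))  # the collapsed body
                cursor = node.end_lineno
        out.extend(lines[cursor:])                           # any trailing code
        return "\n".join(out)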

[1] This is the small webapp that GPT wrote for me. I'm working on it mainly as a forcing function to explore these sorts of "GPT as junior developer / coding collaborator" workflows.

https://github.com/paul-gauthier/easy-chat#created-by-chatgp...


Hey, I started tinkering with this last night, your comments probably saved me a lot of time.

It's more work, but maybe language-specific tooling as a first pass? I'm wondering how far you'd get by feeding it all the type information first (from, let's say, rustdoc as a specific example), and then asking the LLM to understand the structure of the program.

Then taking that output (which you could cache) + any source-file-local context + the user's request for a change.


Ya, I played with giving a ReAct loop access to some Python `jedi` tools for navigating a Python codebase. Considered wiring the Language Server Protocol (LSP) into the ReAct toolchain as well, but couldn't find easy bindings for that.
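
For flavor, here's roughly what wiring one jedi lookup into a ReAct loop can look like. A sketch against langchain's 2023-era agent API; the tool description, model, and example query are all illustrative:

    import jedi
    from langchain.agents import AgentType, Tool, initialize_agent
    from langchain.chat_models import ChatOpenAI

    def find_references(symbol_at: str) -> str:
        """Input 'path:line:col'; returns locations that reference that symbol."""
        path, line, col = symbol_at.rsplit(":", 2)
        refs = jedi.Script(path=path).get_references(int(line), int(col))
        return "\n".join(f"{r.module_path}:{r.line}  {r.description}" for r in refs)

    agent = initialize_agent(
        tools=[Tool(
            name="find_references",
            func=find_references,
            description="Find references to the symbol at 'path:line:col'.",
        )],
        llm=ChatOpenAI(model_name="gpt-4"),
        agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    )

    agent.run("Where is parse_config used, and what would break if it returned None?")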

There are two pieces to this puzzle:

1. Condensing the codebase to fit into the context window.

2. Getting GPT to generate good code modifications.

A big stumbling block I have encountered with all of these approaches is that when you feed GPT condensed code, it tends to GENERATE similarly condensed code. It doesn't fill in the details for the new code it's supposed to be writing; rather, it just generates stubs and comments like the ones it was shown.


Wait, can't you just reinsert the condensed code into the original source file and then have it expand it? ChatGPT will already expand (or try to) a comment into an implementation for me. I'm sorry if I'm missing something obvious here; you're much further along on this than I am.


As an aside, I'd love to see the code you've got so far. How did you wire up the ReAct loop? Haystack, Langchain, or just direct API calls to OpenAI? Getting LLMs to talk to LSP directly seems like a good idea as well.


I've been mostly using langchain and the OpenAI API directly.

None of my code for this is worth sharing. It's all quick and dirty early experiments just to test if/how GPT handles various approaches.


Have you tried passing types, function definitions, etc. into the OpenAI embeddings API? It might be possible to automatically build up the context by comparing embedding vectors of the request prompt and the code.
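
Something like this minimal sketch, assuming the pre-1.0 openai client and the ada-002 embedding model (chunking strategy and k are illustrative choices):

    import numpy as np
    import openai

    def embed(text):
        resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
        return np.array(resp["data"][0]["embedding"])

    def relevant_chunks(chunks, request, k=5):
        """Rank code chunks (types, function defs, ...) against a change request."""
        q = embed(request)
        cosine = lambda v: np.dot(v, q) / (np.linalg.norm(v) * np.linalg.norm(q))
        return sorted(chunks, key=lambda c: cosine(embed(c)), reverse=True)[:k]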


Imagine this integrated into software as exception handling. Instead of hitting a critical error, failing, and sending a report: save the input state, create an error report + problem description, send it to ChatGPT, recompile, and onwards. Software might never die, but instead "self-repair"..
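
A toy sketch of that loop (the prompt, the model, and especially the "recompile and onwards" step are all hand-wavy assumptions; a real system would need sandboxing and review):

    import traceback
    import openai

    def self_repairing(fn):
        """On failure, ship the traceback + inputs to GPT and surface its fix."""
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except Exception:
                report = (f"Function {fn.__name__} crashed.\n"
                          f"Inputs: args={args!r} kwargs={kwargs!r}\n"
                          f"{traceback.format_exc()}\n"
                          "Suggest a corrected version of the function.")
                resp = openai.ChatCompletion.create(
                    model="gpt-4",
                    messages=[{"role": "user", "content": report}],
                )
                # The imagined system would apply this patch and reload;
                # here we just re-raise with the suggestion attached.
                raise RuntimeError(resp.choices[0].message.content)
        return wrapper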


Reminds me of that middleware from a while ago that would run all your uncaught exceptions through Stack Overflow before presenting them. Just on even more steroids.


The question is the quality of the ChatGPT-generated fixes.


Awesome, thanks for this!

I have been thinking a lot about the time when I can start jamming with GPT inside my dev environment and code base and this is a step closer.

Use case 3, where you define tests and it tries to give you a passing implementation, is the dream.


Thanks for saying this. I feel the same way. There's something magic about telling the robot to "make the tests pass" and watching the implementation magically appear. It does surprisingly well sometimes. I think if I put some work into the prompts then things could perform better and more consistently.


I firmly believe there will always be a horizon effect: some solution will match all of the tests but fail in the general case. Computers are more likely to find that test-solving solution that doesn't handle the general case because of their iterative nature, whereas humans want to solve the general case first and then add conditions to handle the unexpected cases.


As I understand it, the proponents of TDD would argue that as long as the implementation tends towards laziness (avoiding unnecessary complexity), adding more and more tests to handle additional edge cases should cause the code to converge on solving the general case (and avoid local minima).

I suppose that the crucial piece is the laziness, including refactoring whenever possible, to save time on addressing each subsequent test case.


`expect(true).to be true`


Just want to say that I'm not going to touch any AI tools until the legalities are ironed out, and I absolutely won't be using them at work without very clear approval. AI poses a huge risk to businesses as programmers start feeding code into other companies' backends without thinking and pulling random snippets from projects with varying licenses.


Ignoring AI tools is very shortsighted IMO


Being cautious in order to keep one's IP and job safe during an economic downturn isn't shortsighted, it's playing the long game. AI isn't going anywhere, and waiting and seeing how the legalities shake out and how companies will want to consume this internally is the smart play.

Jumping onto a new technology with unknown risks and getting burned is shortsighted.


The "long game" becomes short if your organisation is out-executed by a competitor who isn't as conservative in approach.


Then I get home and think: how come we're collectively killing the biome that sustains us and degrading its ability to nurture human civilization? Then I remember I need to race to the bottom with everyone else to get that cash, son!


It's the reality of the marketplace. I'd like to change it too.


You'd like to or you're going to?


I'm not sure what company you're working for where generating code that may or may not be correct, and that needs careful analysis, is somehow increasing your coding velocity to the point of trumping higher-level thinking, protecting your IP, or staying out of expensive lawsuits. I would like to hear more.


It's not about generating code which may or may not be correct. It's about being free to use a new wave of tools which allows engineers to try many approaches and analyse results at warp speed. The engineers (and organisations) who can capture this velocity will have a huge advantage over those who do not.

I understand that it would be nice for the industry to behave within the law when it comes to IP and copyright. But historically, that isn't the case. Google v. Oracle over Java. ZeniMax v. Oculus. Samsung v. Apple. Microsoft v. Motorola. Companies will happily look the other way when it comes to IP if it means capturing more of the market.


Why do you care? You use the tools and get a competitive edge. Get rich, if you're right!


I feel like there's a big lack of understanding of how the world works here.


He's just saying he won't share the codebase owned by the company he works for on a third-party platform without explicit approval from whoever owns it, and he won't just willy-nilly copy-paste snippets from the AI into said codebase without properly vetting them.

He didn't say anything about ignoring AI. Pretty reasonable if you ask me.


Beyond shortsighted.


What's beyond shortsighted? What would you call that?


You are spot on with the work comment. Posting IP you do not own to a service you may not be permitted to use is a bad idea.


Wise decision as far as I'm concerned. Unfettered AI experiments on production systems are an IP and (depending on your field) regulatory nightmare waiting to happen. Unless you have lawyer-vetted guidelines and oversight, use OpenAI products exclusively for experiments and personal projects.


If you've reduced your codebase to vector weights and that's all you send to an API, I wonder how much that would mitigate this concern?


So AI products can only be developed in AI friendly legal areas? Similar to cloning and stem cell research?


s/developed/used/ I think is more the comment.

OrgA will be 'all in' with the convenience of Copilot; OrgB will flee due to legality/litigation concerns.


I never said anything about AI development.


Thanks for sharing promptr! I will try it out.

I have also been exploring a similar pattern for using GPT as a coding collaborator:

  - Send all the (relevant) code to GPT along with a change request
  - Have it reply with all the code, modified to include the requested change
  - Automatically replace the original files with the GPT edited versions
  - Use git diff, etc to review and either accept/reject the changes.
GPT is significantly better at modifying code when following this "all code in, all code out" pattern. The pattern has downsides: you can quickly exhaust the context window, it's slow waiting for GPT to re-type your code (most of which it hasn't modified), and of course you're running up token costs. But GPT's ability to understand and execute high-level changes to the code is far superior with this approach.
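
A bare-bones sketch of one round trip of this pattern (the file markers, prompt wording, and reply parsing are illustrative assumptions, not how any particular tool does it):

    import pathlib
    import openai

    def apply_change(paths, request):
        """Send all files + a change request; write GPT's full rewrite to disk."""
        code = "\n".join(f"====={p}=====\n{pathlib.Path(p).read_text()}" for p in paths)
        prompt = (f"{code}\n\nMake this change: {request}\n"
                  "Reply with EVERY file, modified or not, "
                  "using the same =====path===== markers.")
        resp = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
        )
        parts = resp.choices[0].message.content.split("=====")[1:]
        for path, content in zip(parts[0::2], parts[1::2]):  # alternating path, body
            pathlib.Path(path.strip()).write_text(content.lstrip("\n"))
        # then: `git diff` to review, `git checkout -- .` to reject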

I have tried quite a large number of alternative workflows. Outside the "all code in/out" pattern, GPT gets confused, makes mistakes, implements the requested change in different ways in different sections of the code, or just plain fails.

If you're asking for self contained modifications to a single function, that's all the code that needs to go in/out. On the other side of the spectrum, I had GPT build an entire small webapp using this pattern by repeatedly feeding it all the html/css/js along with a series of feature requests. Many feature requests required coordinated changes across html/css/js.

https://github.com/paul-gauthier/easy-chat#created-by-chatgp...

Another HN user has also released a command line tool along these lines called gish:

https://github.com/drorm/gish


We're thinking the same thing. You might be interested in the prompt template I used to make GPT respond in JSON format here: https://github.com/ferrislucas/promptr/blob/main/templates/r...

Having GPT's response in JSON made it easy to apply the changes GPT wants to the filesystem. GPT-4 is significantly more consistent about responding with only JSON.
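
The apply step is then pleasantly boring. Illustrative only, since the real schema lives in the linked template; assume something like {"files": [{"path": ..., "content": ...}]}:

    import json
    import pathlib

    def apply_reply(reply: str) -> None:
        """Write each file from GPT's JSON reply to the filesystem."""
        for f in json.loads(reply)["files"]:
            p = pathlib.Path(f["path"])
            p.parent.mkdir(parents=True, exist_ok=True)  # new directories as needed
            p.write_text(f["content"])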

Gish looks cool!


This use case is why we need local models that work. In the future you will build a server to host your own models or pay.



Thing is, GPT-3.5-turbo is so cheap that buying hardware to run something equivalent doesn't make any sense, while at the same time you likely won't be able to run GPT-4-class models locally, so you're forced to use their servers; the hardware cost wouldn't make sense for personal use. This is OpenAI's strategy: they balance the cost envelope so people are always stuck between a rock and a hard place.


It’s a loosing strategy for OpenAI.

We only need so much AI power before we can make our own improved versions by leveraging that initial AI power.

Individual developers are now creating models that run on phones and regular computers.

OpenAI cannot possibly catch up to what millions of people can do in the open.


Nah, they'll run directly on your edge device. For 90% of tasks, there will be no market for paid or self-hosted inference.


I think in 5 years ChatGPT might revolutionize the handling of technical debt: for example, with multidisciplinary knowledge spanning legal and finance on top of code and architecture, it could rewrite all of the ancient z/OS systems in the financial sector into something fitting the modern age.


I promise you that all financial and legal systems worth rewriting require a lot more context than 32K tokens.

Given the utter absence of solid test coverage, a much more likely approach is massive coverage generation via ChatGPT test generation & human audits, and then a gradual rewrite by humans assisted by ChatGPT.

It's a really good tool, but it's nowhere near good enough to do automated rewrites. (And as long as the results matter, you'll continue to have humans in the loop for more than 5 years, if for no other reason than cleanly assigning legal blame.)


Yes, I agree. I work with these systems; they are so complex that I think all of it would have to be in memory at once. My "dream" requires many chained miracles, and if something like this were possible at all, abolishing technical debt would only be a footnote among its effects on the world. Anyway, I still think it's possible, and obviously, one way or another, humans will always be in the loop. I'll tell you about the current state: if I write 50 lines of code in a very important place, there will be 3 months of testing.


Also, it could be used to save a lot of abandonware through AI-enhanced reverse engineering. This is the kind of high-toil work that very few people want to do unless it becomes a lot easier.


For people looking for development tools built on GPT: one of the best is CodeGPT, with 350k+ downloads on the Visual Studio Marketplace: https://marketplace.visualstudio.com/items?itemName=DanielSa...


I use a local LLaMA API as my code inference endpoint. No way I'm just shipping all my code to OpenAI.


You can also use GPT4All https://github.com/nomic-ai/gpt4all


I use Vicuna[0]. It's much better than GPT4All.

Vicuna is based on the 13B model (not the 7B), and its training data includes humans chatting with GPT-4, vs. GPT4All's purely synthetic dataset generated by GPT-3.5.

[0] https://github.com/lm-sys/FastChat


Thank you!

Could Vicuna be used to further fine-tune GPT4All to make it better?


I think GPT4All's inferior-quality dataset would make a combined model worse than Vicuna alone. Vicuna-30B will likely be better than GPT-3.5 level and approach GPT-4 level when it's done training, but it will run slowly on CPU.


I don't think the world needs yet another ChatGPT proxy to mess with source code when Copilot already fills the need; they have the engineers and lawyers to make it work.

What I don't see are unique, complex applications with a great UI that don't overlap with applications like Office with Copilot, or Copilot X.


This isn't a proxy. Say you use ChatGPT to assist you while you write code... there's a lot of copy-paste action in that workflow. It gets old.

promptr gets rid of the copy-paste. That's much better developer ergonomics (IMO).


Plus your tool is open source, and direct calls to the OpenAI API are much cheaper than Copilot. If anything, Copilot's looking like the redundant one here to me. Thanks for making Promptr; it looks great. I'm going to give it a shot.


Thank you! Have fun - please share if you do anything cool!


Take a look at https://www.youtube.com/watch?v=s7AGkcSMiaI for a demo of what Copilot is capable of. Did you try Copilot and Copilot Labs? They have the best DX. See the interviews with Nat Friedman about the tool.

The code is a prompt: a "for each file, execute prompt and return whatever" loop. It can be used as a template to build something useful and not redundant, but definitely not code. Don't reinvent the wheel.


Cool stuff. I recently created a similar tool that automatically fixes errors in source code: https://github.com/mherrmann/fix


The tool you linked looks really cool, nice job! "What could possibly go wrong?" - I love it!


Thank you! Yours looks very nice as well :)


Consider an alternative approach: chunk up and embed your entire codebase (dramatically cheaper) and insert it into a vector store. When you go to send a query, search the store, retrieve the most relevant chunks, and send those along with your query.

Optionally send other specific code alongside it.

This should be similar in quality and dramatically cheaper. It's quite doable with langchain, and you can use Chroma + DuckDB to avoid having to pay for Pinecone.

You could even maintain the vector store using git diffs to only update chunks that have changed.
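
A sketch of that flow with Chroma (collection name, chunk size, and the 50% overlap are illustrative; Chroma's default embedding function does the embedding, and langchain has helpers for most of this):

    import chromadb

    client = chromadb.Client()
    collection = client.create_collection("codebase")

    def index_file(path, source, chunk_lines=40):
        """Chunk a source file with 50% overlap and add it to the store."""
        lines = source.splitlines()
        for i in range(0, len(lines), chunk_lines // 2):
            chunk = "\n".join(lines[i:i + chunk_lines])
            collection.add(documents=[chunk], ids=[f"{path}:{i}"])

    def retrieve(query, k=5):
        """Return the k chunks most semantically similar to the query."""
        return collection.query(query_texts=[query], n_results=k)["documents"][0]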


The problem with your suggested approach is the resulting lack of holistic context. The problem with OP's approach (direct parsing) is cost and context-window limits. There has to be a better way.


It's searching by semantic meaning, so it should be able to find all relevant pieces. Using overlap during chunking should help too.

Using the "give it everything" method will cause it to forget most of what you're feeding it if you have a large repo anyway, right?


I don't think it will forget anything as long as everything fits in the context window, but I could totally be wrong. That's the big problem with the "give it everything" approach: if your codebase doesn't fit then it's game over. I've had success limiting what I give it to the relevant files.


Right: "forgetting" assumes a rolling context window maxed out at the model's token limit. As for "if it doesn't fit then it's game over": at 8k tokens with each token being ~4 characters, that's only about 32KB of code, a pretty small repo. And that's a motivator behind the similarity-search approach.


Great ideas thank you!


Yeah this is the way!


The README and code examples are quite dense. To be honest, I'd rather think about my own code than try to figure out how to write Promptr commands to get GPT to do it for me. Is there any way you can simplify the syntax, or at least create some short aliases for common use cases?


Yes, I totally agree. I just wanted to get it out there. Probably a little early, but oh well.


Great to see this here. I am working on a VS Code extension that provides some nice UX to use GPT for autonomous software development. Check it out:

https://github.com/MateusZitelli/PromptMate


Thanks for dropping this here. promptr could sure use a better UX - something like what you've created. Happy to collaborate if there's any interest. Building great tooling for using LLMs to code is something that's really interesting to me.


Definitely, it is inspiring to see so many fantastic initiatives popping up. I am emailing you.


Man that’d cost a pretty penny. You have input and output token costs, likely close to dollar per call. And you could need 2/3 calls that get it right, worth it?


I inherit small- to medium-sized projects from other developers at the agency I work with, and being able to run a project through it and get this sort of analysis without having to find the entry points and whatnot myself... easily worth the money.

It's not something I'd be using with any regularity (on the same project), but I can see where it'd be worth it, even at a dollars-per-project cost.


Depends on your scenario, I guess. If we assume an engineer's time is worth $75/hr and it would take an hour (much more or less depending on the size and complexity of the codebase) to complete one of these tasks, then this looks like a bargain, right?


it likely won’t be a no brainer once you factor in testing the code and understanding the code depending on your constraints like security, performance, expandablilty etc. Not like you can run it and check in, you’d get fired doing that


How can I select a block of code in vim and pass it to this? Maybe I should build a vim plugin for this with ChatGPT.


if it supports stdin and stdout, try

    :'<,'>!promptr -m gpt4 -t refactor -p "Cleanup the code"


promptr doesn't support stdin in this way, but this is a great reason to make it do so.


It looks like this only works on JavaScript? It errors out if you don't have a package.json.


I'd love it if you opened an issue on the GitHub repo. I haven't seen this happen. My guess is that maybe you have an older version of Node, but I might be way off. Maybe I should dockerize this so you wouldn't need Node installed to use it.


Is this JavaScript-only at the moment? Looking for something that can handle Rails.


No, it works on any code and even things that aren't code. It will write and revise documentation for you, create jQuery code, NoSQL databases, and more. Basically anything a human can do with text files.

If that's not enough for you, projects like Auto-GPT give GPT-4 full autonomy to figure out how to do complex multi-step tasks beyond modifying text, all on its own, with only a vague goal provided. https://github.com/Torantulino/Auto-GPT


You can use this on any code (or text files) that you want. I've used it for Rails, and it's great. I think you'd run into issues with more niche languages where the model hasn't had much exposure.



