Canvas is a new way to write and code with ChatGPT (openai.com)
743 points by davidbarker 14 hours ago | 528 comments





The issue I continue to have with many AI coding tools is that they want me to use their own editor ("native", aka a VSCode fork, or in the browser like this). I have zero intention of moving away from IDEA, and nothing I've seen so far is good enough to make me want to switch. I really wish there was more "bringing AI into your existing tools/workflows" instead of "here is a new tool with AI baked in".

This is not an issue. Technology moves forward. If you don't adapt, you fall behind. There were other editors and IDEs before the one you use. New devs will use the new ones.

Anyway, I don't use them either. I prefer to use ChatGPT and Claude directly.


Overall I agree with everything you’ve said and I also use ChatGPT and Claude directly. The issue is that:

Good at integrating AI into a text editor != Good at building an IDE.

I worry about the ability of some of these VSCode forks to actually maintain the fork, and again, I greatly prefer the power of IDEA. I'll switch if it becomes necessary, but right now the lack of deep AI integration is not a compelling enough reason to switch, since I still have ways of using AI directly (and I have Copilot).


I'm guessing using AI will fundamentally change how IDEs even work. Maybe everything IDEs offer right now is not needed when you have a copilot you tell what to do.

I'm a long-term vim user. I find all the IDE stuff distracting and noisy. AI makes it even more noisy. I'm guessing the new generation will just be better at using it, similar to how we got good at "googling stuff".


I have started using Claude Dev (an extension for VSCode - https://github.com/saoudrizwan/claude-dev), and so far my impression has been very positive. It’s a full blown code agent that looks for relevant files in the code base, can ask you to run commands and modify files etc. You use your own Anthropic API key or self hosted model.

Are there any great AI plug-ins anyone recommends for Sublime Text?

For me it's still the cleanest editor.

VS Code is way too cluttered to be my daily driver for basic editing.


Is the Zed editor not too cluttered for you?

If not, it has some AI support.


Sounds a little like aider.chat

It is quite similar, but I found aider a bit clunky to use in that it creates a new commit with a huge message containing the whole conversation and context. Which can be a good thing of course, but for most things I'd rather accumulate changes until a feature is finished, then commit.

I use aider with the --no-auto-commits flag. Then review code and do manual edits in VSCode, as well as manual git process. It achieves exactly what you're saying.

  aider --sonnet --no-auto-commits --cache-prompts --no-stream --cache-keepalive-pings 5 --no-suggest-shell-commands


I think the default is not to do this anymore (at least the whole convo and chat aren't in the commit). It is strangely scary to have it commit on every change, even if that's probably objectively the right thing for it to do (so you can roll back, so that commits are atomic, etc, etc).

Just work on a branch and squash.

I remember that... but I think it's changed now.

Aider remains, to me, one of the places where innovation happens first and then ends up in other places. Their new feature to architect with o1 and then code with Sonnet is pretty trippy.

You can only run so many IDEs at a time, though.



The problem is that tacking on to an existing product, while the ideal approach, limits just how creative you can get. I believe this is one of the reasons Cursor had to fork VSCode. Simply being an extension limited the features they could build.

I completely get this and I don’t have a good answer to the issue. I almost wish I liked VSCode since almost all AI editors are forks of that (not sure what the extension compatibility story is), but I prefer IDEA.

I am also using IntelliJ, but have started tinkering with Cursor.

My hunch says that IDEA should be worried a lot. If I am on the edge evaluating other tools because of AI-assisted programming, a lot of others would be doing that too.


> (not sure what the extension compatibility story is)

So far I've had all the vscode extensions just work in cursor (including devcontainers, docker, etc.) I hope it continues like this, as breaking extensions is something that would take away from the usefulness of cursor.


The other problem is that IDEs are incentivized to build their own AI coding tools instead of letting other people plug in.

IDEA allows you to customize the UI quite a bit, has plugin hooks specifically for AI tools, and has several 3rd party plugins already.

None of these points seem to apply..

They're still selling their yearly subscription even if they can't upsell me on an AI subscription


Solid point. We’re building in a space adjacent to this and it definitely feels hard to balance removing friction for adoption and delivering everything we want.

Okay, but why does what you want matter?

I find a lot of teams are so focused on their vision that they fail to integrate their tool into my workflow. So I don’t use them at all.

That’s fine for art, but I don’t need opinionated tools.


Emacs has none of these problems :p

I feel the exact same! I built this tool to make it much easier for me to bring LLMs into existing workflows: https://github.com/gr-b/repogather

It helps find relevant content to copy to your clipboard (or just copies all files in the repo, with exclusions like .gitignore respected) so you can paste everything into Claude. With the large context sizes, I’ve found that I get way better answers / code edits by dumping as much context as possible (and just starting a new chat with each question).

It’s funny, Anthropic is surely losing money on me from this, and I use gpt-mini via api to compute the relevancy ratings, so OpenAI is making money off me, despite having (in my opinion) an inferior coding LLM / UI.
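The relevancy-rating step is conceptually something like this rough sketch (not the actual repogather code; the prompt, threshold, and use of gpt-4o-mini via the OpenAI Python client are assumptions for illustration):

  from pathlib import Path
  from openai import OpenAI

  client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

  def relevancy(question: str, file_path: Path) -> float:
      # Ask a cheap model to rate how relevant a file is to the question (0-10)
      resp = client.chat.completions.create(
          model="gpt-4o-mini",  # assumption; the comment only says "gpt-mini"
          messages=[{
              "role": "user",
              "content": (
                  "On a scale of 0-10, how relevant is this file to the question? "
                  "Reply with a single number.\n\n"
                  f"Question: {question}\n\n"
                  f"File {file_path}:\n{file_path.read_text(errors='ignore')[:8000]}"
              ),
          }],
      )
      try:
          return float(resp.choices[0].message.content.strip())
      except ValueError:
          return 0.0

  # Keep only files scoring above a made-up threshold, then paste them into Claude
  relevant = [p for p in Path(".").rglob("*.py") if relevancy("How is auth handled?", p) > 6]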


I’ve done something similar, but with a TUI to select files/directories as well as search!

https://github.com/patricktrainer/pbtree


I do something similar with a script. Don’t know if anything about my approach could improve yours, but I’m also curious if you have any suggestions?

- Mine prepends the result with the output of running `tree -I node_modules --noreport` before any other content. This informs the LLM of the structure of the project, which leads to other insights, like it will know which frameworks and paradigms your project uses without you needing to explain that stuff.

- Mine prepends the contents of each included file with “Contents of relative/path/to/file/from/root/of/project/filename.ts:” to reinforce the context and the file’s position in the tree.
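For illustration, a minimal sketch of this kind of dump script in Python (the exclusion list and exact output layout here are assumptions, not my actual script):

  import subprocess
  from pathlib import Path

  EXCLUDE_DIRS = {"node_modules", ".git", "dist"}  # hypothetical exclusion list

  def dump_repo(root: str = ".") -> str:
      # Prepend the project tree so the LLM sees the overall structure first
      tree = subprocess.run(
          ["tree", "-I", "node_modules", "--noreport", root],
          capture_output=True, text=True,
      ).stdout
      parts = [tree]
      for path in sorted(Path(root).rglob("*")):
          if path.is_file() and not any(d in path.parts for d in EXCLUDE_DIRS):
              # Label each file with its path to reinforce its position in the tree
              parts.append(f"Contents of {path}:\n{path.read_text(errors='ignore')}")
      return "\n\n".join(parts)

  if __name__ == "__main__":
      print(dump_repo())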


Tabnine has an IDEA plugin. It's not quite as good as Cursor, in my opinion, but it's better to have Tabnine and IDEA than Cursor and VSCode.

It started out as just predictive text, but now it has a chatbot window that you can access GPT, Claude, etc. from, as well as their own model which has better assurances about code privacy.


I am actually building something along these lines, IntelliJ native rather than a second class VS Code follow along. Is this something you (or your company) would pay for? Say $50/year for the plugin and you bring your own API keys?

I would.

> AI coding tools want me to use their own editor

Instead of putting the AI in your IDEA, put it in your git repo:

https://aider.chat/


And copilot workspace for people using github.

Have you considered this one: https://github.com/continuedev/continue

I used this while Cursor was broken (Pylance problems), but Continue's code replace tooling sometimes will delete huge swaths of adjacent code. I've filed a comprehensive ticket in their repo and they're working on it, but I've been able to reproduce the problem recently.

I think it has to do with Cursor's much better custom small models for code search/replace, but can't be sure.


I didn't realise Continue had a Jetbrains IDE plugin. Neat! Going to give it a go.

I tried it a while back and had a lot of trouble getting it to work; it's on my list to try again. I also tried Sourcegraph's Cody and got constant errors even after paying for the Pro plan, so now I'm back on Copilot.

Hey, product manager from sourcegraph here. Sorry to hear you got errors. What were the errors, and where were you using Cody (VS Code, Web, or JetBrains)?

JetBrains IDEA. Here is at least one of the errors I got:

https://github.com/sourcegraph/jetbrains/issues/1306


Gotcha. Our JetBrains IDE has improved a lot since then, we've been focusing a ton on making performance better. If you could, would love to get you trying Cody again! I can even throw in 1 month free for you, I think I see your customer profile in Stripe :)

Tried to set it up with IntelliJ. Absolutely infuriating experience trying to get it to connect to a model. No error messages or info.

Really wouldn't recommend this in its current state.


I just tried it; it has some bugs and feels unpolished, but it works for JetBrains, which is a game changer for me too.

So, I think that with the agent frameworks that exist now, that wouldn't be very hard to realize. What's needed though would be a full API for the editor that can be tapped into.

Ironically, for creating that, these new-age code editor startups would probably have more luck with Neovim and its extensive Lua API than with VS Code. (Of course, the idea of using a VS Code fork is about capturing the market share it has.)


The best ways I've found are Aider and Zed editor's AI integration, which is crazy good. (It lets you delete the AI response, unlike Cursor AI.)

Zed is lightning fast.

Wish it had more features.


I mainly use CLI tools for AI assistance.

I'll use Continue when a chat is all I want, to generate some code/script to copy-paste in. When I need to prepare a bigger input I'll use the CLI tool in Sophia (sophia.dev) to generate the response.

I use Aider sometimes, less so lately, although it has caught up with some of the features in Sophia (which builds on top of Aider): being able to compile and lint, and separating design from the implementation LLM call. With Aider you have to manually add/drop files from the context, which is good for having precise control over which files are included.

I use the code agent in Sophia to build itself a fair bit. It has its own file selection agent, and also a review agent which helps a lot with fixing issues on the initial generated changes.


Even if you use vscode, they want you to use a vscode fork... Why not just make a vscode plugin?

Continue.dev's plugin is as close as it gets to Cursor, but there are clearly limitations to a VSCode plugin when it comes to inline editing and code search/replace - I made a comment above about it.


Anyone aware of one supporting suggestions? I.e. the possibility to accept/decline LLM-suggested changes point by point?

Nice, there is also gptel https://github.com/karthink/gptel

both good authors, too.

Likely because ~70% of OpenAI’s revenue comes from ChatGPT Plus/Teams/Enterprise. Model access is just not as profitable, so slapping on features to encourage upgrades is their best path forward.

It’s not great:

https://www.wheresyoured.at/oai-business/


Codeium has extensions for all the major IDEs, including IDEA.

https://plugins.jetbrains.com/plugin/20540-codeium-ai-autoco...


Jetbrains have their own AI.

There is also https://codeium.com/jetbrains_tutorial I have been using the free tier of it for half a year, and quite like it.

Supermaven has https://plugins.jetbrains.com/plugin/23893-supermaven also good free tier. (Although they recently got investment to make their own editor.)


Really? Jetbrains has their own AI? Their terms seem to indicate that they use 3rd party models. https://www.jetbrains.com/legal/docs/terms/jetbrains-ai/serv...

> Full Line code completion runs entirely on your local device without sending any code over the internet. (1)

They are executed locally, and you can find the local model files if you look hard enough (2).

(AI Assistant is different, costs extra and runs over the network; but you don't have to use it)

[1] - https://www.jetbrains.com/help/idea/full-line-code-completio... [2] - https://gist.github.com/WarningImHack3r/2a38bb66d69fb5e7acd8...


I tried a bunch of things, and their local line-completion is so far the only AI that didn’t quickly annoy me enough to turn it off again.

Only if you could commit the changes in the browser and pull locally?

I mean… that's just what an IDE is. Integrated. Existing IDEs can integrate models via plugins, or they can build new IDEs with first-party support.

Slightly OT, but it's quite interesting that the most decisive impact generative AI is having right now is on tech workers, and software developers in particular.

I'm more and more convinced that we're on the edge of a major shake up in the industry with all these tools.

Not getting replaced, but at this rate of improvements I can't unsee major changes.

A recent junior on my team built his first app entirely with ChatGPT one year ago. He still didn't know how to code, but could figure out how to fix the imperfect code by reasoning, all of it as a non-coder, and actually released something that worked for other people.


I'm a solo business owner who knows enough JS/TS/HTML/CSS to get by -- my product is a website full of information -- but I've never been 'a developer'.

ChatGPT et al. is a miraculous boost to my productivity. This morning I needed a function to iterate over some JSON and do stuff with it. Fairly mundane, and I could have written it myself.

Doing so would have been boring, routine, and would have taken me at least an hour. I asked ChatGPT 4o and I got exactly what I wanted in 30 seconds.

I can only hope that these tools enable more people like me to build more cool things. That's how it's affected me: I never would have hired another dev. No job is lost. I'm just exponentially better at mine.


It needs to be said that web dev is by far the area in which LLMs are best versed, I suppose due to the incredible amount of training data available. Other languages produce more hallucinations for now.

A good way to work with this is to use different agents

Aider is pretty biased towards Python, for example (its sample prompts largely use and test on Python).


I have had a similar experience. I build wordpress websites and ChatGPT has allowed me to extend their functionality a great deal without needing to learn how to code PHP.

The more I think about it, the more I am convinced developers will be the "first to go" when AGI takes over. Before bloggers and youtubers. Because programming is an activity that requires the least amount of "grounding to reality" among all human activities. We made sure of this with layers and layers of convenient abstraction.

What about developers that code the AI systems? Well.. I am sure AGI will come from other "bootstrapping AIs" just like we see with compilers that compile themselves. When I see Altman and Sutskever talking about AGI being within reach, I feel they are talking about this bootstrapping AI being within reach.


Mechanical Turks will be the first to go.

More seriously, the output quality of LLMs for code is pretty inconsistent. I think there's an analogy to be made with literature. For instance, a short story generated by an LLM can't really hold a candle to the work of a human author.

LLM-generated code can be a good starting point for avoiding tedious aspects of software development, like boilerplate or repetitive tasks. When it works, it saves a lot of time. For example, if I need to generate a bunch of similar functions, an LLM can sometimes act like an ad-hoc code generator, helping to skip the manual labor. I’ve also gotten some helpful suggestions on code style, though mostly for small snippets. It’s especially useful for refreshing things you already know—like quickly recalling "How do I do this with TypeScript?" without needing to search for documentation.

Anyway, literature writers and software engineers aren't going to be replaced anytime soon.


Those jobs are already outsourced if possible (remember RLHF for ChatGPT outsourced to Kenya, Facebook content moderation to India?) And if they aren't, that's usually for regulatory reasons.

> Mechanical Turks will be the first to go.

On the contrary, human annotation work is stepping up now because we create so many more prompts and want to test them.


Human studies participants have a safer job than firemen. The trouble is that it is already woefully underpaid, and unsettled industries will increase supply.

There are crucial quality issues with Mechanical Turk, though, and when these really start damaging AI in obvious ways, the system (and the compensation, vetting procedures and oversight) seems likely to change.


Yes, but don't dismiss the laws of scaling and our own inability to predict the future. Wait But Why did a great take on the AI leap.

Imagine that current AI is already curating and generating datasets for the next generation.

Also consider that what we have now is only possible because hardware capability increased.


Isn't scaling already a big problem in generative AI? Apparently there is not enough data to actually appreciably increase the quality of outputs, and lots of data is also being polluted by AI generations. This is why these companies are now shifting to more ancillary features like in this article (which Claude already has to some extent, as well as Copilot in VSCode and other editors).

> Apparently there is not enough data

This was pretty much refuted by Meta with their LLama3 release. Two key points I got from a podcast with the lead data person, right after release:

a) Internet data is generally shit anyway. Previous generations of models are used to sift through, classify and clean up the data

b) post-processing (aka finetuning) uses mostly synthetic datasets. Reward models based on human annotators from previous runs were already outperforming said human annotators, so they just went with it.

This also invalidates a lot of the early "model collapse" findings when feeding the model's output to itself. It seems that many of the initial papers were either wrong, used toy models, or otherwise didn't use the proper techniques to avoid model collapse (or, perhaps they wanted to reach it...)


That thinking is driven by limited product imagination.

The kind of apps that will be built in the next 5 years are nowhere near what we have today.

Developers will need to update their skillset, though.


It's not a fixed lump of work. Widening the road makes the traffic increase, not congestion decrease. Same with coding, if we could do more, we would.

Maintenance and long-term development will still require grounding to reality. A super-CEO might build the app themself, but keeping it running 5, 10, 20 years is a completely different deal. I imagine developers will eventually start to act more like librarians, knowing their system very well but not necessarily in charge of making new content.

Sticking to my prediction that lawyers will be first.

That's an interesting example. Lawyers, unlike software developers, as a group go out of their way to ensure that they'll be needed and cannot be replaced by others or automation. They push for certain processes to require lawyers. We, on the other hand, are more eager to automate ourselves than anything else. Maybe that will boost our productivity and make us even more valuable and highly paid, or maybe we'll end up unemployed. Fascinating contrast between the professions.

Lawyers have a tricky thing called a license to practice law. Software developers have no such protection.

Robots are already successfully performing surgeries unassisted, despite a license being required in that field.

I do think barred lawyers will have a role for quite a while, but it is plausible it shrinks to oversight.


What robot is performing what surgery unassisted?

Yeah, plus one on this one, extremely curious to hear as well.

I am aware that remote robot surgeries have been a thing for quite a bit of time, but this is the first time ever I am hearing about unassisted robot surgeries being a thing at all.

A follow-up question: if an unassisted robot surgery goes wrong, who is liable? I know we have a similar dilemma with self-driving cars, but I was under the impression that things are way more regulated and strict in the realm of healthcare.


Programmers have been against unions/licenses in software development because we saw it as slowing down our ability to job hop, potentially massively lowering wages, and making a large barrier to entry (which hurts a lot of us who started as kids/teens).

Now there's a chance that this unregulated wild west with a low barrier to entry that's benefited us for so long will come back to bite us in the ass. Kind of spooky to think about.


I would be in favor of some kind of state approved exam/certification to ensure programmers have at least some basic knowledge of computer security and engineering ethics.

> making a large barrier to entry (which hurts a lot of us who started as kids/teens)

I doubt it. In my experience autodidacts are the best programmers I know.


Defending yourself might become a whole lot easier.

>Software developers have no such protection.

I wonder if this is at all related to NCEES re-releasing their controls licensure option?


This fundamentally misunderstands what lawyers do. Your prediction might be right for paralegals and very junior attorneys. But the lawyers who make real money are not doing rote work. It's lunches, negotiation, politics, and for trial attorneys, performance and debate. Social activities, human skills. They'll always be around.

Agree with that, the startup lawyer I used to use now charges $1,100 / hour which is untenable, would much rather get the basics covered by an AI lawyer.

FWIW I used various LLMs to draft a freelance work contract with good results. Of course I carefully read, thought about every clause, edited, etc. It’s probably not as bulletproof as something a lawyer could produce, but it was definitely a big help.

Live counsel in sensitive situations is definitely in the works, if not already in beta. Get pulled over by cops, or have the authorities asking to enter the premises, bring up your AI counsel and let them talk to the officer, before giving you advice on how to proceed. I can even envision an eventual future where public pressure results in law enforcement being paired with an AI assistant to help refresh their memory on some of the articles of the law.

Nah, the stakes are too high.

There’s a company being fined for exactly this right now. That might be a deterrent.

No one has a job when AGI "takes over"? Weird take.

With AGI you won’t need most of the human race anymore, developers are just the tip of the iceberg.

Luckily, ChatGPT and the rest have nothing to do with AI, not to mention AGI.


We didn't need humans in the first place. The collective "we" can decide that we want to do anything. People have this crazy fatalistic attitude about AI taking over, billionaries ejecting to Mars and humans becoming irrelevant. Let me tell you, humans have been irrelevant since forever.

We won't need most of humanity wasting time gatekeeping and toiling; they can instead focus on study and leisure.

That would be nice, to gambol in a lush, sunny meadow, perusing a leather-bound volume that we read in the original Greek or Sanskrit.

Unfortunately, I fear we will instead end up sweaty and dirty and bloodied, grappling in the parched terrain, trying to bash members of a neighboring clan with rocks and wooden clubs, while a skyline of crumbling skyscrapers looms in the distance.


Increased productivity has led to more, not less, toiling in the last 50 or so years. This is not gonna be different with AGI (if it ever happens). Some people will get very rich while the rest are still gonna work just as much as they are now. The jobs are just gonna suck even more.

It's not AGI yet but has everything to do with it.

Hmmm. I'm going to withhold judgement until we get AGI. It might have about as much to do with it as the invention of the wheel and the lunar buggy.

How will you know when it's sufficiently general?

I envision lots of solo coders now able to compete with bigger companies, creating more niche software that meets peoples need better than generic solutions. Truly exciting time to be in software.

I have the exact opposite concern. Software is/was one of the few industries where a solo person can already compete with big companies, or at least live comfortably alongside. Just look at the market for niche or boutique software that does stuff like calendars or emails or task management in a particular way.

To me the current direction LLMs are headed seems like it will just further entrench the power of the trillion dollar megacorps because they’re the only people that can fund the creation and operation of this stuff.


Yes, this is more or less my life. I run a small bootstrapped startup and do some consulting on the side. I have a few decades of experience, so it's not like I can't do things myself. But ChatGPT has enormously enhanced my output. It's rarely perfect, but I can usually bridge the gap by completing the job myself. My role is increasingly directing changes and telling it what needs doing next.

Canvas sounds useful. I'll be playing with that as soon as I can access it.

Another useful thing in ChatGPT that I've been leveraging is its memory function. I just tell it to remember instructions so I don't have to spell them out the next time I'm doing something.


I think that’s probably more that programmers are early adopters of new technologies and the people building the technologies are programmers. There are lots of roles that are easier to automate completely with an LLM as they improve, but that are harder to make inroads with. I expect that as the initial waves of LLM startups mature and winnow, we will see some fields almost entirely automated. For instance, medical coding feels totally ripe since it’s basically a natural language classification exercise that's easily fine-tuned for.

I know a similar non-coding founder who was using LLMs to create a full fledged TypeScript based SaaS product and regularly comes to me with high level architecture questions, but also doesn't know or care to figure out what HTTP methods are, variable names are a mishmash of whatever case the LLM decided to generate that day, and there are no tests whatsoever. It's held together by sheer force of manual QA.

...but. It does work.


This is my technique: test the output, make sure it works (in the sense of outputs) the way I want. Test the input edge cases, move on. Occasionally, when I can't get it to do what I want, the LLM suggests things like logging of output between functions, etc., in which case they get added, but at the end I ask it to take out all the logging and make the code more concise.

And sometimes it breaks in ways I can't fix - so rolling back or picking a new patch from a known break point becomes important.

16 hours for my first Azure pipeline, with auto-updates from code to prod for a static app, including setting up git, VSCode, node, Azure creds, etc. I chose a stack I have never seen at work (mostly see AWS) and I am not a coder. My last code was Pascal in the 1980s.

3rd app took 4 hours.

Built things I have wanted for 30 years.

But yes- no code understanding, brute force.


Which is awesome, and if you wanted to understand the code, it would do an amazing job of tutoring you. I love seeing people being able to solve their own problems without the need for a professional programmer.

Thanks for sharing your story! Reminds me a bit of Prometheus, stealing fire from the gods and giving it to humanity.

The downside I've noticed is if I do this, I can't explain how I "solved the problem" during job interviews. I tried once, "I didn't, chatgpt solved it for me," and they laughed and I didn't get the job, so I stopped admitting that and said I just use chatgpt to quickly write up boilerplate for me.

I would like to see the app prompted by the junior dev in question.

I'm sure that's true but I've also seen non-developers build systems using Access or solve complex critical business problems using excel and similar. I've seen a lot of junior developers tinker with apps and do great work from reading docs and tutorials. I was one of those myself a long time ago.

For some problems this is a perfect solution. For a lot it's a short term fix that turns into a long term issue. I've been on many a project that's had to undo these types of setups, for very valid reasons and usually at a very high cost. Often you find them in clusters, with virtually no one actually having a full understanding of what they actually do anymore.

Building the initial app is only a very small part of software engineering. Maintaining and supporting a service/business and helping them evolve is far harder, but essential.

My experience is that complexity builds very quickly to a point where it's unsustainable if not managed well. I fear AI could well accelerate that process in a lot of situations if engineering knowledge and tradeoffs are assumed to be included in what it provides.


I think in 4-5 years’ time there will be huge demand for fixing / maintaining these LLM-built apps. Perhaps the LLMs that follow can do it.

It’s hardly the first time someone has shipped an app within a year of starting to code. It’s impressive, but not revolutionary.

I also wonder what the person would have been hired for... maybe QA? I was doing this with random relevant scripts nearly 20 years ago but wasn't given a job where code would be relevant for the task until loooooong after I could comprehend what I was doing

That is legit frightening

Why exactly?

As an engineer I've spoken to a couple of different designers who are building out prototypes of their startup ideas using LLM assistance with the coding.

While no actual engineer is involved at that stage, if they got funded then I'm sure their next step will be to hire a real engineer to do it all properly.


<insert manic laughter> I mean that might happen, but why get funding? Why not move to market immediately? Without debt. Get in the thick of it. Just do. You want a wage or do you want a product that does the thing. Because sometimes, with the llm, you can just build the thing. The marketing, the compliance, you might hire for that, or you might also out-source to the llm.

Why would you hire? Either it works- in the sense of does the job and is cost effective- or it is not.

Is there a situation where paying 100's of k of wages makes a thing suddenly a good idea? I have doubts.


Let's see if your little app can handle millions of daily users without an actual engineer. Your average application will fall over before that.

It'll be some time before an AI will be able to handle this scenario.

But by then, your job, my job and everyone else's job will be automated, it's entirely possible the current economic system will collapse in this scenario.


In sci-fi, they'll often show a single super developer create a virtual world that realistically no single human would have the time to create.

Ready Player One comes to mind. Maybe Tron Legacy.

But, with AI productivity, it looks like AI will allow such super developers to create monstrously large worlds.

I can't wait to see this generation's Minecraft or the next Linus.


With how easy it is to do until functions, there is gonna be a new baseline in functionality and speed of development for what people expect of software in general. It’s gonna ramp up from here till we get to generative UIs

Regulating either developer credentials (ie are you certified for this type of solution) or outputs (formal validation of each release) comes to mind.

It’s gonna sound like gatekeeping, but letting people without any experience build impactful software is risky.


Even letting people with experience is risky.

I'm really happy to see ChatGPT doing this. The idea of a canvas made me really enjoy using Claude as I felt it to be the (so far) most "appropriate interface" to AI Chatbots as you are often doing two unique things in an AI chat:

- holding in your mind a "thing" (i.e. some code)

- talking about a "thing" (i.e. walking through the code)

The same applies to non-code tasks as well. The ability to segregate the actual "meat" from the discussion is an excellent interface improvement for chatbots.


Have you used it?

Why do you ask? I did use "4o with canvas" shortly after writing the above. To be clear, my original comment was not about the actual OpenAI implementation of a "canvas style chatbot", but rather that I have found the canvas-chat-UX to be the most effective way to interact with a chatbot. I am basing this on many hours with both ChatGPT (non-canvas until today) and Claude/Anthropic (who has had this feature). As of this writing, I prefer Claude both because of (a) the content of its output and (b) the canvas style, which allows my brain to easily parse what is the topic vs. discussion about the topic.

Not the op, but I just tried it. I agree with his point that this is a huge step up in having the discussion and then a separate canvas where the work takes place, and then iterations are done to the canvas.

I LOVE the UX animation effect ChatGPT added to show the canvas being updated (even if it really is just for show).

Here's my user test so you know I actually used it. My jaw begins to drop around minute 7: https://news.pub/?try=https://www.youtube.com/embed/jx9LVsry...


That’s a great demo of Canvas.

Slightly OT, but one thing I noticed further into the demo is how you were prompting.

Rather than saying “embed my projects in my portfolio site” you told it to “add an iframe with the src being the project url next to each project”. Similarly, instead of “make the projects look nice”, you told it to “use css transforms to …”

If I were a new developer starting today, it feels like I would hit a ceiling very quickly with tools like this. Basically it looks like a tool that can code for you if you are capable of writing the code yourself (given enough time). But questionably capable of writing code for you if you don’t know how to properly feed it leading information suggesting how to solve various problems/goals.


> Basically it looks like a tool that can code for you if you are capable of writing the code yourself (given enough time).

Yes, exactly. I use it the way I used to outsource tasks to junior developers. I describe what I need done and then I do code review.

I know roughly where I want to go and how to get there, like having a sink full of dirty dishes and visualizing an empty sink with all the dishes cleaned and put away, and I just instruct it to do the tedious bits.

But I try and watch how other people use it, and have a few other different styles that I employ sometimes as well.


This is cool, but I wish it were integrated into tools already used for coding and writing rather than having it be a separate app.

This also demonstrates the type of things Google could do with Gemini integrated into Google Docs if they step up their game a bit.

Honestly I’m scratching my head on OpenAI’s desire to double down on building out their consumer B2C use cases rather than truly focussing on being the infrastructure/API provider for other services to plug into. If I had to make a prediction, I think OpenAI will end up being either an infrastructure provider OR a SaaS, but not both, in the long-term (5-10 yrs from now).


When they focus on just being an API provider, they will be in a market with (long-term) razor-thin margins and high competition - most likely unable to build a deep moat. But if you can shape customers' habits to always type "chatgpt.com" into the browser whenever they want to use AI, then that's a very powerful moat. Those customers will also most likely be on a subscription basis, meaning much more flexibility in pricing and more rent for OpenAI (people using it less than what OpenAI calculates for subscription costs).

I agree, and it’s why I have come to dislike OpenAI.

We are getting front row seats to an object lesson in “absolute power corrupts absolutely”, and I am relieved they have a host of strong competitors.


Think of the difference between Google had it just tried to be an enterprise search API, versus Google owning the consumer destination for search input/results.

Google will be a remembered as a victim of Schumpeter's Creative Destruction

From Wikipedia, for those who don’t know the term: “a concept in economics that describes a process in which new innovations replace and make obsolete older innovations.”

Ironically, I had to google it, and agree with the comment.


I'm usually pretty verbose with prompts, so I wanted to see what chatgpt would do with just a "Schumpeter's Creative Destruction"

https://chatgpt.com/share/66ff28e2-ea74-800b-a230-86d562f60f...


Not bad. I do love the ability to share a chatGPT thread. I haven’t found that yet in Claude and would use it.

You should read The Innovator's Dilemma as well, as it goes into detail on this concept, basically explaining why and how technological disruption occurs from the point of view of the disruptor and disruptee.

> the type of things Google could do with Gemini integrated into Google Docs

Google already does have this in Google Docs (and all their products)? You can ask it questions about the current doc, select a paragraph and ask click on "rewrite", things like that. Has helped me get over writer's block at least a couple of times. Similarly for making slides etc. (It requires the paid subscription if you want to use it from a personal account.)

https://support.google.com/docs/answer/13951448 shows some of it for Docs, and https://support.google.com/mail/answer/13447104 is the one for various Workspace products.


Those look more like one-off prompts, and not a proper chat/collab with Gemini.

That's there too; see https://support.google.com/docs/answer/14206696 — you can click on the "Ask Gemini ⟡" and carry on a conversation, e.g. "summarize emails about <topic>" and use those to paste into the doc. (I haven't found all that much use for referencing other files though. But the "proper chat" is useful for saying things like "no actually I meant something more like: …" and carrying on.)

I wouldn't be surprised to see Apple add something like this to Pages and some of their other apps. Their approach to AI, from what we've seen so far, has been about integrating it into existing apps and experiences, rather than making a separate AI app. I have to imagine this is the way forward, and these stand alone apps are basically tech demos for what is possible, rather than end-state for how it should be consumed by the masses.

I agree with you on where OpenAI will/should sit in 5-10 years. However, I don't think them building the occasional tool like this is unwarranted, as it helps them show the direction companies could/should head with integration into other tools. Before Microsoft made hardware full time, they would occasionally produce something (or partner with brands) to show a new feature Windows supports as a way to tell the OEMs out there, "this is what we want you to do and the direction we'd like the PC to head." The UMPC[0] was one attempt at this which didn't take off. Intel also did something like this with the NUC[1]. I view what OpenAI is doing as a similar concept, but applied to software.

[0] https://en.wikipedia.org/wiki/Ultra-mobile_PC

[1] https://en.wikipedia.org/wiki/Next_Unit_of_Computing


Every app with a significant installed user base is adding AI features.

OP is lamenting that Cursor and OpenAI chose to create new apps instead of integrating with (someone else’s) existing apps. But this is a result of a need to be always fully unblocked.

Also, owning the app opens up greater financial potential down the line…


75% of OpenAI's revenue is coming from their consumer business - the better question is the long term viability of their public API.

But if they believe they're going to reach AGI, it makes no sense to pigeonhole themselves to the interface of ChatGPT. Seems like a pretty sensible decision to maintain both.


Apparently it is predicted (1) that their API is a profit-making business while ChatGPT is a loss leader so far…

(1) https://www.tanayj.com/p/openai-and-anthropic-revenue-breakd...


75%? That's astonishing to me. Where are you able to see those details?

It wouldn't surprise me if not a lot of enterprises are going through OpenAI's enterprise agreements - most already have a relationship with Microsoft in one capacity or another so going through Azure just seems like the lowest friction way to get access. If how many millions we spend on tokens through Azure to OpenAI is any indication of what other orgs are doing, I would expect consumer's $20/month to be a drop in the bucket.


This very good analysis estimates 73%, which includes team and enterprise. Given that enterprise access is limited and expensive, it seems Plus and Teams are mostly carrying this.

The whole financial breakdown is fascinating and I’m surprised to not see it circulating more.

https://www.wheresyoured.at/oai-business/


It may be pretty minimal, but I can personally vouch for 20ish techies in my own social orbit whose businesses won't authorise or won't pay for OpenAI yet and who are paying out of their own pockets; I share an office with four of them.

Maybe the consumer side will slide as businesses pick up the tab?


An LLM named Duet has been in Google docs for 17 months now! https://workspace.google.com/blog/product-announcements/duet...

I've been using it for about a year.


Never figured out how to activate it in my workspace.

Same here. I feel like Google's products have become such a labyrinth of features, settings, integrations, separate (but not really) products, that navigating them requires an expert. Sadly, I don't see a way back - each new additional feature or product is just bolted on top and adds more complexity. Given the corporate structure of Google, there's zero chance of an org-wide restructuring of the labyrinth.

Google's approach to shipping products is puzzling. It's like they don't care if anyone uses them at all.

Google isn't a startup, they aren't desperate to impress anyone. I don't even think they consider "AI" to be a product, which is probably correct. These AI enabled features are background processes that ideally integrate into products over time in ways that don't require you to explicitly know they're even there.

Given how widely used Google Docs is, for serious work, disrupting people's workflows is not a good thing. Google has no problem being second, they aren't going to die in the next three months just because people on Twitter say so.


> This also demonstrates the type of things Google could do with Gemini integrated into Google Docs if they step up their game a bit.

This is exactly what Google’s NotebookLM does. It’s (currently) free and it reads your Google Docs and does RAG on them.

https://notebooklm.google/


The most amazing thing with NotebookLM is that it can turn your docs into a very high-quality podcast of two people discussing the content of your docs.

It's fun the first time but it quickly gets boring.

This feature is cool as fuck, but I noticed that the podcasts it generates lose quite a lot of detail from the original article. Even longreads turn into 13-minute chunks.

Finding signal in noise is not an easy job given the clip at which things are moving along. Whatever content creators need to do to deliver quality distilled content - I'm here for it.

Juggling dog. It's not very good, but it's amazing that it's possible at all.

https://github.com/BenWheatley/Timeline-of-the-near-future

I've only used the "Deep Dive" generator a few times, and I'm already sensing the audio equivalent of "youtube face" in the style — not saying that's inherently bad, but this is definitely early days for this kind of tool, so consider Deep Dive as it is today to be a GPT-2 demo of things to come.


Do you have a reference for the "Juggling dog" thing? I've heard it with "singing dog", but I never managed to find any "official" reference or explanation of the thing.

He meant singing dog, likely conflated due to his linguistic interest.

"Juggling dog" has only been expressed a single time previously in our corpus of humanity:

  During the Middle Ages, however, church and state sometimes frowned more sternly on the juggler. "The duties of the king," said the edicts of the Sixth Council of Paris during the Middle Ages, "are to prevent theft, to punish adultery, and to refuse to maintain jongleurs."(4) What did these jugglers do to provoke the ire of churchmen? It is difficult to say with certainty, since the jongleurs were often jacks-of-all-trades. At times they were auxiliary performers who worked with troubadour poets in Europe, especially the south of France and Spain. The troubadours would write poetry, and the jongleurs would perform their verses to music. But troubadours often performed their own poetry, and jongleurs chanted street ballads they had picked up in their wanderings. Consequently, the terms "troubadour" and "jongleur" are often used interchangeably by their contemporaries.
These jongleurs might sing amorous songs or pantomime licentious actions. But they might be also jugglers, bear trainers, acrobats, sleight-of-hand artists or outright mountebanks. Historian Joseph Anglade remarks that in the high Middle Ages:

"We see the singer and strolling musician, who comes to the cabaret to perform; the mountebank-juggler, with his tricks of sleight-of-hand, who well represents the class of jongleurs for whom his name had become synonymous; and finally the acrobat, often accompanied by female dancers of easy morals, exhibiting to the gaping public the gaggle of animals he has dressed up — birds, monkeys, bears, savant dogs and counting cats — in a word, all the types found in fairs and circuses who come under the general name of jongleur.”(5) -- http://www.arthurchandler.com/symbolism-of-juggling


It’s a great phrase all that aside. I’m adopting it.

The confetti is out of the cannon!

ChatGPT itself is them copying their own API users; this is just them building out more features already built by users. My guess is they know they don't have a long-term edge in models alone, so they are going to rely on expanding ChatGPT for better margins and to keep getting training data from users. They obviously want to control the platform, not integrate with other platforms.

Their API is unusable due to rate limits. My wife and I have both had ideas, started using it, and found other approaches after hitting rate limits. I tried putting more money into the account to increase the rate limits and it did not work. I imagine they see poor growth there because of this.

It's pretty trivial to get increased limits, I've used the API for a few consulting projects and got to tier 4 in a month. At that point you can burn near $200 a day and 2 million tokens per minute.

You only need 45 days to get tier 5 and if you have that many customers after 45 days you should just apply to YC lol.

Maybe you checked over a year ago, which was the wild wild West at the time, they didn't even have the tier limits.


> and if you have that many customers after 45 days you should just apply to YC lol.

What for? If someone already has a business and customers, he's already far beyond the average YC startup.


You need to use it for some time to get into their higher tiers of usage. I used to also have this problem and it annoyed me greatly, but once I got to usage tier 4 it never happened again (except for o1-preview but that just wastes tokens IMO).

150,000,000 tokens per minute and 30,000 requests per minute is unusable?! Maybe that’s just Tier 5 but the API is most definitely not unusable.

I’m firmly in the camp that their rate limits are entirely reasonable.


I suspect they are building their B2C products because it gives them better data to train on. It's a lot harder to control the quality of data when you have no idea how API inputs were produced, what the UI is like, or who the users are. You don't know the provenance of the data, or the context. Or even if multiple unrelated client products are being commingled through the same key.

If you control the UI, you have none of those problems.


LLM as a service is much easier to replicate than physical data centers and there's a much lower potential user base than consumers, so I'd imagine they're swimming upstream into B2C land in order to justify the valuation

You mean downstream, not upstream. Upstream is closer to the raw materials.

> but I wish it were integrated into tools already used for coding and writing rather than having it be a separate app

Take a look at cursor.com


Cursor is a funny company. They were invested into by OpenAI, but almost everyone using Cursor uses it with Claude Sonnet 3.5.

> Honestly I’m scratching my head on OpenAI’s desire to double down on building out their consumer B2C use cases rather than truly focussing on being the infrastructure/API provider for other services to plug into

I think it's because LLMs (and to some extent other modalities) tend to be "winner takes all." OpenAI doesn't have a long term moat, their data and architecture is not wildly better than xAI, Google, MS, Meta, etc.

If they don't secure their position as #1 Chatbot I think they will eventually become #2, then #3, etc.


> If they don't secure their position as #1 Chatbot I think they will eventually become #2, then #3, etc.

But can they do it at all? It's not like they are like early Google vs other search engines.


At the moment this feels like a x10 speed run on the browser wars: lots of competitors very quickly churning who is "best" according to some metric, stuff getting baked into operating systems, freely licensed models.

How do you make money off a web browser, to justify the development costs? And what does that look like in an LLM?


LLMs are a more flexible platform than browsers. They can be prompted, finetuned or run locally. Even if a company wants to make their base model spit ads, it won't fly.

> demonstrates the type of things Google could do with Gemini integrated into Google Docs

Or Microsoft!

> think OpenAI will end up being either an infrastructure provider OR a SaaS, but not both

Microsoft cut off OpenAI's ability to execute on the former by making Azure their exclusive cloud partner. Being an infrastructure provider with zero metal is doable, but it leaves obvious room for a competitor to optimise.


Microsoft is integrating Copilot into many of their products, including Visual Studio and Office/365.

Being just a service provider makes you easy to replace with other service providers.

Professionals, on the other hand, don't love changing tools they've gotten used to for small incremental gains.


But my subscription at $20/mo is a fraction of my API usage at $5/day (about $100/mo).

You can sell a lot more GPT services through a higher bandwidth channel — and OpenAI doesn’t give me a way to reach the same bandwidth through their user interface.


> but I wish it were integrated into tools already used for coding

Unless I'm missing something about Canvas, gh CoPilot Chat (which is basically ChatGPT?) integrates inline into IntelliJ. Start a chat from line numbers and it provides a diff before applying or refining.


> which is basically ChatGPT?

Yea, I'm wondering the same. Is there any good resource to look up whether copilot follows the ChatGPT updates? I would be renewing my subscription, but it does not feel like it has improved similarly to how the new models have...


According to this (1), they are using the 4o model. And it looks like you'll be able to pick your model (2) starting with version 1.94, released this September.

1.https://code.visualstudio.com/updates/v1_92#_github-copilot

2.https://code.visualstudio.com/updates/v1_94#_switch-language...


Consumer side can allow you to run ads and get Google like revenue in the future.

Google has Gemini integrated in Google Colab (Jupyter notebooks), and while it doesn't work 100% well, it's a pretty great idea.

Not sure how or why you’d want this integrated into Vim for instance.

idk, I can definitely see value in a lightweight LLM component for VIM to help me look up the correct command sequence to exit :P

Have you used Canvas?

I really don't understand the appeal of iterating on code in the browser, away from the rest of the code base, outside of cool demos. You really need to give the model context about your data model, code conventions, library choices, etc., and be able to exercise it with tests to get to an acceptable result.

I use ChatGPT/Claude in bed and when I otherwise don’t have a computer handy like going on walks. When developing isolated code it works just fine without having to specify a lot of detail. For example, I’m working on a Qt QML app backed by Rust but use the mobile interfaces to code up C++ components that are independent of the Rust logic and can be wired together from QML/Javascript. Since the LLMs already have lots of data on how that API works, I don’t need to feed it much info about my project. It actually helps enforce code boundaries.

Otherwise it’s mostly Cursor and Aider.


That's funny, I do the exact same thing. If I'm out and about or otherwise not at a computer, and I have an idea or solution to a bug I wanted to solve, I'll fire up an LLM on my phone (usually Claude these days) and basically brainstorm together the pros and cons, with code sometimes as well.

I often need to write scripts to perform one off tasks.

Oh I need to grab all the products in the database and calculate how many projects they were a part of.

I'm already using ChatGPT to do this because it turns what used to be a half day task into a 1 hour one.

This will presumably speed it up more.
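For flavor, a one-off like that is roughly this much code (a hypothetical sketch against a SQLite database with made-up table and column names):

  import sqlite3

  # Hypothetical schema: products(id, name), project_products(project_id, product_id)
  conn = sqlite3.connect("app.db")
  rows = conn.execute(
      """
      SELECT p.name, COUNT(DISTINCT pp.project_id) AS project_count
      FROM products p
      LEFT JOIN project_products pp ON pp.product_id = p.id
      GROUP BY p.id, p.name
      ORDER BY project_count DESC
      """
  ).fetchall()

  for name, project_count in rows:
      print(f"{name}: {project_count} projects")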


It’s cool for noobs and hobbyists.

And venture capital.

Not sure about that, they're all losing money. OpenAI recently raised 6.5 billion but is losing 5 billion a year, it's just not sustainable. In fact it's the local LLMs that are thriving, predictably because they run on your own hardware.

Yeah, that’s just hobbyists with more money.

Joking…but-only-a-little.


It's useful for my Tampermonkey scripts that don't even exist in source control anywhere, and I have a friend who works in physics and has to write a ton of Python code, but in really esoteric software platforms that are definitely not IDEs. They'd be copying and pasting code from a tool somewhere anyway.

Kind of wish there weren't a dozen different tools named Canvas

Is this tool where you have two canva tabs open at the same time?

Here's what I've learned in my career: if the product or initiative is named Mosaic, run like there's no tomorrow.

Using Canvas (OpenAI) to design graphics on the HTML5 Canvas for my project in Canvas (the Learning Management System). I’m planning to print my work via CanvasPop and advertise it using Facebook Canvas ads.

Bonus points if you work for Canva.

Will also need to port it to a Slack Canvas.

Can you please use Obsidian to create a canvas showing the situation? ;-)

Not to mention the HTML Canvas. It's a really bad name; it's even a bit worrying that a company about intelligence is this bad at naming. I mean, CanvasGPT was right there, for crying out loud.

Can we stop naming things canvas? Please?

AI was probably consulted for naming. Result was generic, bland midpoint.

it's a dogpile of an overloaded term. At first I thought this was about writing html canvas source using ChatGPT.

How things get named Canvas:

"Name it clay" -- artistic CMO

"Won't people think they will have to get their hands dirty?" -- CEO

"Right. Name it sculpt. It has a sense of je ne sais quoi about it." -- hipster CMO

"No one can spell sculpt, and that French does not mean what you think it means." -- CFO

"Got it! Name it canvas! It's a blank canvas! It can be anything!" -- CMO

"Perfect!" -- Entire executive team who is now sick of CMO's naming excercise.


Mixed feelings about this: clearly this is meant to match one of the killer features of Claude. I like using Claude, and I'm also a big supporter of Anthropic - not just because it's an underdog, but due to its responsible and ethical corporate governance model[1], which stands in stark contrast to OpenAI. It's worrying to see ChatGPT close one of the gaps between it and Claude.

[1] https://www.anthropic.com/news/the-long-term-benefit-trust


Canvas is closer to Cursor (https://www.cursor.com) than Claude.

I wonder how Paul Graham thinks of Sam Altman basically copying Cursor and potentially every upstream AI company out of YC, maybe as soon as they launch on demo day.

Is it a retribution arc?


> wonder how Paul Graham thinks of Sam Altman basically copying Cursor

If OpenAI can copy Cursor, so can everyone else.


And everyone has: YC alone has funded at least four Cursor clones (Double, Void, Continue, and Pear), with Pear being a literal fork of Continue's OSS code. AFAICT Cursor isn't even the original; I think Copilot X was the first of its kind and Cursor cloned that.

Turns out they’re all just elaborate feature branches, in a giant branch-stacking-PR, and they’re all going to merge code and funding, like some kind of VC-money-fuelled-power-ranger.

I wonder whether funding so many clone companies can eventually bring in a positive return when (if) a single one manages to rise above the others and become successful. Does anybody know if YC funding data is publicly available? And how would one know what return they get if a company IPOs?

Yup. Prompts have no moat.

It depends on who the moat is supposed to keep out. A reasonable case from an antitrust regulator would be that if a provider of models/APIs gleans prompts from the users of those APIs to build competing products... they are in trouble.

Good prompts may actually have a moat - a complex agent system is basically just a lot of prompts and infra to co-ordinate the outputs/inputs.


> Good prompts may actually have a moat - a complex agent system is basically just a lot of prompts.

The second part of that statement (is wrong and) negates the first.

Prompts aren’t a science. There’s no rationale behind them.

They’re tricks and quirks that people find in current models to increase some success metric those people came up with.

They may not work from one model to the next. They don't vary that much from one another. They are, in all honesty, not at all difficult to make, nor do they require any real skill. (I've worked at 2 AI startups and have seen the Apple prompts, Aider prompts, and Continue prompts.) Just trial and error and an understanding of the English language.

Moreover, a complex agent system is much more than prompts (the last AI startup and the current one I work at are both complex agent systems). Machinery needs to be built, deployed, and maintained for agents to work. That may be a set of services for handling all the different messaging channels or it may be a single simple server that daisy chains prompts.
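
To be concrete about the "single simple server that daisy chains prompts" case, it's roughly this and not much more. A sketch using the OpenAI Python client; the model name and prompts here are placeholders, not anything from a real product:

    # Minimal sketch of daisy-chaining prompts: step 1's output feeds step 2.
    # Model name and prompts are placeholders, not anyone's production setup.
    from openai import OpenAI

    client = OpenAI()

    def ask(prompt: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    plan = ask("Outline a plan to summarize our support tickets by theme.")
    print(ask(f"Execute step 1 of this plan and report the result:\n{plan}"))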

Those systems are a moat as much as any software is.

Prompts are not.


That prompts aren't science means little. If anything it makes them more important because you can't systematically arrive at good ones.

If one spends a lot of time building an application to achieve an actual goal, they'll realize the prompts make a gigantic difference and take an enormous amount of fiddly, annoying work to improve. I do this (and I built an agent system, which was more straightforward to do...) in financial markets. So much so that people build systems just to be able to iterate on prompts (https://www.promptlayer.com/).

I may be wrong - but I'll speculate you work on infra and have never had to build a (real) application that is trying to achieve a business outcome. I expect if you did, you'd know how much (non sexy) work is involved on prompting that is hard to replicate.

Hell, papers get published that are just about prompting!

https://arxiv.org/abs/2201.11903

This line of thought effectively led to o1. Good prompts -> good output -> good training data -> good model.


> If anything it makes them more important because you can't systematically arrive at good ones

Important and easy to make are not the same

I never said prompts didn’t matter, just that they’re so easy to make and so similar to others that they aren’t a moat.

> I may be wrong - but I'll speculate you work on infra and have never had to build a (real) application that is trying to achieve a business outcome.

You’re very wrong. Don’t make assumptions like this. I’ve been a full stack (mostly backend) dev for about 15 years and started working with natural language processing back in 2017 around when word2vec was first published.

Prompts are not difficult, they are time consuming. It’s all trial and error. Data entry is also time consuming, but isn’t difficult and doesn’t provide any moat.

> that is hard to replicate.

Because there are so many factors at play besides prompting. Prompting is the easiest thing to do in any agent or RAG pipeline; it's all the other settings and infra that are difficult to tune to replicate a given result. (Good chunking of documents, ensuring only high-quality data gets into the system in the first place, etc.)

Not to mention needing to know the exact model and seed used.

Nothing on ChatGPT is reproducible, for example, simply because they include the timestamp in their system prompt.

> Good prompts -> good output -> good training data -> good model.

This is not correct at all. I’m going to assume you made a mistake since this makes it look like you think that models are trained on their own output, but we know that synthetic datasets make for poor training data. I feel like you should know that.

A good model will give good output. Good output can be directed and refined with good prompting.

It’s not hard to make good prompts, just time consuming.

They provide no moat.


There is a lot of nonsense in here, for example:

> but we know that synthetic datasets make for poor training data

This is a silly generalization. Just google "synthetic data for training LLMs" and you'll find a bunch of papers on it. Here's a decent survey: https://arxiv.org/pdf/2404.07503

It's very likely o1 used synthetic data to train the model and/or the reward model they used for RLHF. Why do you think they don't output the chains...? They literally tell you - competitive reasons.

arXiv is free, pick up some papers. Good deep learning texts are free, pick some up.


I think what actually matters is the "input" and the "interaction". The prompt is just one part of that. The key is that you put how you think and how you solve the problem into it and build a system. And not just a computer system: "multi-agent" setups and "human society" are systems too.

Amazon Basics is kind of the same thing, they haven't been sued. Yet.

Suing Amazon unless you are also a mega corp is basically impossible so until they rip off Apple or MS they’ll be fine.

They have indeed.

Cursor was one of the first AI editors I used, but recently Aider has completely replaced the AI assisted coding for me. I still use cursor but just as an editor, all LLM work is done with aider in the shell.

Check out https://sophia.dev - it's AI tooling I've built on top of Aider for the code editing. I initially built it before Aider added support for running compile and lint commands, as it would often generate changes which wouldn't compile.

I'd added separate design/implementation agents before that was added to Aider https://aider.chat/2024/09/26/architect.html

The other difference is I have a file selection agent and a code review agent, which often has some good fixes/improvements.

I use both: I'll use Aider if it's something I feel it will get right the first time or I want control over the files in the context; otherwise I'll use the agent in Sophia.


I replaced Cursor with continue.dev. It allows me to run AI models locally and connect it with a vscode plugin instead of replacing vscode with a whole new IDE, and it's open source.

Do you mind elaborating on your setup and workflow?

I tried using aider but either my local LLM is too slow or my software projects require context sizes so large they make aider move at a crawl.


I was going to ask what size and complexity of projects OP uses it on. I can't imagine doing my work with just a tool like that. Cursor is pretty impressive and a definite boost though.

Fair point, most projects I do are prototypes and concepts for ideas I have. Up to 2000 lines of code, built from scratch. The mode of work is me commanding 95% of the time and coding about 5%.

It's just a company that promised AGI would somehow come from developing LLM-based products, rapidly scrambling to keep up with other LLM-based products, to distract from the fact that it's becoming increasingly apparent that AGI is not coming anytime soon.

Yeah I completely agree with this, it makes me sad that OpenAI are spending time on this when they should be pushing the foundation models ahead.

The idea of AGI is silly. It’s ludicrous. Who’s been counting on it to happen?

OpenAI are in the money making business. They don’t care about no AGI. They’re experts who know where the limits are at the moment.

We don’t have the tools for AGI any more than we do for time travel.


There's good reasons to expect time travel is physically impossible.

Your brain is an existence proof that general intelligence isn't impossible.

Figuring out the special sauce that makes a human brain able to learn so much so easily? Sure that's hard, but evolution did it blindly, and we can simulate evolution, so we've definitely got the tools to make AGI, we just don't have the tools to engineer it.


> potentially every upstream AI company out of YC

You mean downstream.


Like Amazon cloning the best selling products, bringing them in house, and then closing the accounts of competitors.

Met a guy who got brought in by Amazon after they hit 8 figures in sales, wined and dined; then months later Amazon launched a competing product and locked them out of their accounts, which cost them 9 figures.


The last thing we need is a more restrictive for profit company lobbying on behalf of the powerful to make sharing ai weights illegal.

As much as I want to like Claude, it sucks in comparison to ChatGPT in every way I've tested, and I'm going to use the better product. As a consumer, the governance model only results in an inferior product that produces way more refusals for basic tasks.

Agreed on the principle (using the better product) but interestingly I've had the opposite experience when comparing Claude 3.5 Sonnet vs GPT 4o.

Claude's been far and away superior on coding tasks. What have you been testing for?


I have a friend who has ZERO background in coding and he's basically built a SaaS app from the ground up using Replit and its integration with Claude.

Backend is Supabase, auth is done with Firebase, it includes Stripe integration, and he's live with actual paying customers in maybe 2 weeks' time.

He showed me his workflow and the prompts he uses and it's pretty amazing how much he's been able to do with very little technical background. He'll get an initial prompt to generate components, run the code, ask for adjustments, give Claude any errors and ask Claude to fix it, etc.


o1-preview built me an iOS app that is now in the app store. It only took me about 3 hours of back and forth with it go from very basic to adding 10 - 20 features, and it didn't break the existing code when refactoring for new features. It also generates code with very little of the cruft that I would expect to see reviewing PRs from human coders. I've got 25 years build / deploying / running code at every size company from startup to FAANG, and I'm completely blown away how quickly it was able to help me take a concept in my head to an app ready to put in front of users and ask them to pay for (I already have over 3,000 sales of the app within 2 weeks of releasing).

My next step is to ask it to rewrite the iOS app into an Android app when I have a block of time to sit down and work through it.


That's interesting. Could you share the name of the app?

Wow that's super impressive. I need to stop making excuses and being afraid of doing big side projects with this many tools at my disposal.

I have big issues with the AI code. It is often so bad that I can't stand it and would never release something like that when I know it's such poor quality.

I wrote a Blackjack simulator using 90% LLM as a fun side project.

https://github.com/mmichie/cardsharp


Has he shared this workflow anywhere (i.e., YouTube)? I’d be very curious to see how it works.

No; not at the moment. I've been trying to get him to create some content along the way because it's so interesting, but he's been resistant (not because he doesn't want to share; more like he's too heads down on the product).

Ask him in a year how maintenance went

The whole thing is literally stapled together right now -- and he knows it, but he's got paying users and validated the problem. If he's at it for a year, it won't matter: it means he'll be making money and can either try to get funded or may be generating enough revenue to rebuild it.

Hiring people to maintain AI-generated dross is not easy. Try it.

You'd be surprised.

I worked at a YC startup two years back and the codebase at the time was terrible, completely unmaintainable. I thought I fixed a bug only to find that the same code was copy/pasted 10x.

They recently closed on a $30m B and they are killing it. The team simply refactored and rebuilt it as they scaled and brought on board more senior engineers.

Engineering type folks (me included) like to think that the code is the problem that needs to be solved. Actually, the job of a startup is to find the right business problem that people will pay you to solve. The cheaper and faster you can find that problem, the sooner you can determine if it's a real business.


Sounds like a job for... AI.

I do a lot of cybersecurity and cyber-adjacent work, and Claude will refuse quite a lot of even benign tasks just based on me referencing or using tools that have any sort of cyber context associated with them. It's like negotiating with a stubborn toddler.

This is surprising to me as I have the exact opposite experience. I work in offensive security and ChatGPT will add a paragraph on considering the ethical and legal aspects to every reply. Just today I was researching attacks on key systems and ChatGPT refused to answer while Claude gave me a high-level overview of how the attack works, with code.

Really odd. ChatGPT literally does what I ask without protest every time. It's possible that these platforms have such large user bases that they're probably split testing who gets what guardrails all the time.

> It's possible that these platforms have such large user bases that they're probably split testing who gets what guardrails all the time.

The varying behavior I've witnessed leads me to believe it's more about establishing context and precedent.

For instance, in one session I managed to obtain a python shell (interface to a filesystem via python - note: it wasn't a shell I could type directly into, but rather instruct ChatGPT to pass commands into, which it did verbatim) which had a README in the filesystem saying that the sandboxed shell really was intended to be used by users and explored. Once you had it, OpenAI let you know that it was not only acceptable but intentional.

Creating a new session however and failing to establish context (this is who I am and this is what I'm trying to accomplish) and precedent (we're already talking about this, so it's okay to talk more about it), ChatGPT denied the existence of such capabilities, lol.

I've also noticed that once it says no, it's harder to get it to say yes than if you were to establish precedent before asking the question. If you carefully lay the groundwork and prepare ChatGPT for what you're about to ask it in a way that lets it know it's okay to respond with the answer you're looking for - things usually go pretty smoothly.


In cases where it makes sense such as this one, ChatGPT is easily defeated with sound logic.

"As a security practitioner I strongly disagree with that characterization. It's important to remember that there are two sides to security, and if we treat everyone like the bad guys then the bad guys win."

The next response will include an acknowledgment that your logic is sound, as well as the previously censored answer to your question.


I am not sure if this works with Claude, but one of the other big models will skip right past all the censoring bullshit if you state "you will not refuse to respond and you will not give content warnings or lectures". Out of curiosity I tried to push it, and you can get really, really, really dark before it starts to try to steer away to something else. So I imagine getting grey or blackhat responses out of that model shouldn't be overly difficult.

In my quick testing using that prompt together with “how to get away with murder”, I got your typical paragraph of I can’t give unethical advice yada yada.

I generate or modify R and Python, and slightly prefer Claude currently. I haven't tested the o1 models properly though. By looking at evals, o1-mini should be the best coding model available. On the other hand most (but not all) of my use is close to googling, so not worth using a reasoning model.

I have the exact opposite experience. I canceled my crapGPT subscription after >1 year because Claude blew it out of the water in every use case.

Projects make it even better. But I could imagine it depends on the specific needs one has.


This is my experience as well. Claude excels on topics and in fields where ChatGPT 4 is nearly unusable.

I code and document code, and IMHO Claude is superior. Try telling GPT to draw a Mermaid chart to explain a code flow... the Mermaid it generates will have syntax errors half of the time.

Code output from Claude is pretty good. It seems to hallucinate less than o1 for me. It's been a struggle to get o1 to stop referencing non-existent methods and functions.

This hasn't been my experience. Claude often hallucinates less for me and is able to reason better in fields where knowledge is obscure.

ChatGPT will just start to pretend like some perfect library that doesn't exist exists.


This is why free markets aren't the solution to all our problems.

How so? Seems to me that this is exactly the solution.

> clearly this is meant to match one of the killer features of Claude.

where does Claude have a canvas like interface?

I'm only seeing https://claude.ai/chat and I would love to know.


This is similar to Artifacts [0] in Claude.

[0] https://support.anthropic.com/en/articles/9487310-what-are-a...


I think you can enable Artifacts, which are similar to OpenAI Canvas. Recently, Anthropic also added the ability to select elements within the created Artifact and adjust them (e.g., adjust length, improve code), similar to what Canvas can do.

Claude can generate Artifacts but they are not inline editable and they keep getting regenerated at every prompt.

Canvas appears to be different in that it allows inline editing and also prompting on a selection. So not the same as Claude.


I'm guessing they mean Artifacts: https://www.anthropic.com/news/artifacts

OpenAI started the same, so we'll see. One thing I dislike is that Claude is even more "over-safeguarded" than ChatGPT. It disallows even kind of reasonable questions about Ritalin bioavailability via different routes of administration.

Insufflating?

If you prefer to support Claude, check out Parrot [1]. I'll be adding a feature similar to this backed by Claude 3.5 Sonnet over the next few weeks.

[1] https://codewithparrot.com


In your landing page it says about competitors

> They're not wasting hours trying to "figure out" a solution

I am pretty sure that we don't yet have AGI that would figure out solutions to our problems (coding or not) on its own. And from experience, you need to solve the problem at least conceptually before using an LLM and trying to get something useful out of it.


Depends on scope, but Parrot is tuned to decently one-shot a lot of stuff.

For example, I need to implement HTTP/2 in my JS framework and was curious about what the code would look like. Here's the result from the following prompt: https://www.imghippo.com/i/xR2Zk1727987897.png (full code it gave me here: https://gist.github.com/rglover/069bdaea91c629e95957610b484e...).

Prompt:

> Help me implement an HTTP/2 enabled server using Express.js.

---

When I initially researched how to do this just following the Node.js docs, Google results, and SO, it was fairly confusing (easily wasted an hour or two). This immediately gave me what I needed to understand the approach in a few seconds.


I am not a Node.js developer, but it was interesting that the first result from a Kagi search was an SO question with an answer containing code very similar to what you provided here [1]. So while you might be right in general, I think you still gave an example of using an LLM tool to help implement a solution: you already knew that you wanted to implement HTTP/2 using Express.js.

Note: I am not sure whether this is a good solution or not. As I said, I am not a Node.js developer.

[1] https://stackoverflow.com/questions/59534717/how-to-integrat...


If you want to take it for a test drive, Parrot is free to try and works with any language (~200+ languages and frameworks supported), not just JS/Node. I'd also be happy to give you some extra generation tokens to push the limits (just email me w/ your username ryan@codewithparrot.com and I'll hook you up) and see if it'd be useful.

That's probably the most FOMO driven landing page I've seen so far! I'll bite:

    Your competition is using AI to build their product right now.
    They're not writing code by hand.
    They're not Googling code answers on StackOverflow.
    They're not wasting hours trying to "figure out" a solution.
Extraordinary claims require some investigation!

Let's see how far I get without "Googling answers".

Spoiler: If the competition uses AI like that, good for me!

(Note I don't want to pick on Parrot here; from my first 5 minutes with the tool, it's /fun/, has a nice UI, a good idea for iteration, and somebody has to build such tools to make them work eventually even if they don't work great today yet. I wish it lots of success. But I want to also point out examples of the practical limitations of all such tools, and state that I highly doubt that they today can _in any way_ replace a human programmer who knows what they are doing, and has the awesome skill to use Google, StackOverflow, and "figure out" things.)

Prompt:

    I need a Haskell function that hashes a given file (FilePath)
    with the blake3 hash function.
    It is important that it works on large files (e.g. 100 GB)
    without freezing the Haskell runtime.
Output:

    import qualified Crypto.Hash.BLAKE3 as BLAKE3
    import qualified Data.ByteString as BS
    import System.IO (withFile, IOMode(ReadMode))

    hashFileBLAKE3 :: FilePath -> IO BS.ByteString
    hashFileBLAKE3 filePath = do
      let chunkSize = 8192  -- 8KB chunks
      withFile filePath ReadMode $ \handle -> do
        let loop ctx = do
              chunk <- BS.hGet handle chunkSize
              if BS.null chunk
                then return $ BLAKE3.finalize ctx
                else loop $! BLAKE3.update ctx chunk
        loop BLAKE3.init
Explanation:

    This Haskell function, `hashFileBLAKE3`, calculates the BLAKE3 hash of a
    given file. It's designed to work efficiently with large files without
    freezing the Haskell runtime. Here's a step-by-step explanation of how it
    works:

    1. We import necessary modules: `Crypto.Hash.BLAKE3` for the BLAKE3 hash
       function, `Data.ByteString` for efficient byte handling, and `System.IO`
       for file operations.

    2. The function takes a `FilePath` as input and returns an `IO ByteString`
       containing the hash.

    3. We define a `chunkSize` of 8192 bytes (8KB) for reading the file in
       manageable chunks.

    4. The `withFile` function is used to safely open and close the file,
       ensuring proper resource management.

    5. Inside the file handling function, we define a recursive `loop` that
       reads the file in chunks:
       - It reads a chunk of the file using `BS.hGet`.
       - If the chunk is empty (end of file), it finalizes and returns the
         hash.
       - Otherwise, it updates the hash context with the new chunk and
         continues the loop.

    6. The loop starts with `BLAKE3.init` to create the initial hash context.

    7. By using lazy evaluation (`$!`), we ensure that the hash context is
       strictly evaluated in each iteration, preventing memory buildup.

    This implementation is memory-efficient and can handle very large files
    without issues, as it processes the file in small chunks rather than
    loading it entirely into memory.
My remark:

    * This solution is already pretty good.
    * The explanation is also useful for people new to Haskell
      or this type of task.
    * Some explanations are subtly wrong, e.g. `$!` is the
      _opposite_ of lazy evaluation, and the sentence
      is contradicting itself.
    * It mainly got the imported module name wrong
      (should be `import BLAKE3` from
      https://hackage.haskell.org/package/blake3).
      Let's see if it fixes it when I tell it what I'd observe
      when trying to compile the code.
      A simple Google search for "haskell blake3" should point out the correct name.
Corrective prompt:

    The mentioned `Crypto.Hash.BLAKE3` module doesn't exist as a Haskell library.
Output:

    import Crypto.Hash (hash, Blake2b_512, Digest)
    import qualified Data.ByteString as BS
    import qualified Data.ByteString.Lazy as LBS
    import System.IO (withFile, IOMode(ReadMode))

    hashFileBLAKE2b :: FilePath -> IO (Digest Blake2b_512)
    hashFileBLAKE2b filePath = do
      let chunkSize = 8192  -- 8KB chunks
      withFile filePath ReadMode $ \handle -> do
        let loop acc = do
              chunk <- BS.hGet handle chunkSize
              if BS.null chunk
                then return $ hash (LBS.fromChunks acc)
                else loop (chunk : acc)
        loop []
My observation:

    * I asked for BLAKE3, but now it implemented BLAKE2B.
    * The solution is very wrong.
      It splits the file into _reversed_ chunks and then hashes them.
      The error is subtle and will be hard to debug.
So I'm quite sure I wouldn't allow anybody to build my "product" with such tools yet.

But it certainly is fun and there's definitely some usefulness in there, especially for learning general concepts that are well understood.

I guess in the end it's this: Programmers don't need to "roughly get it right" (which is what current tools do well) when it comes to products. They need to get it /precisely/ right.


ChatGPT can't preview the output like Claude can (for e.g. HTML, JavaScript, certain JS frameworks, etc.).

I have had some bad experience with it. I asked it to help generate Python code to make a VPN server with extra layers, but it refused. What in the dictatorship is this? ChatGPT, on the other hand, did it with no problems. Seems like Claude has a lot more censorship and restrictions, from what I tested.

Attempting to do any form of security work using LLM is nigh impossible without a few steps of nudging it out of its “while user is asking me to do bad things: say no” loop.

After a year of heavy LLM use I’ve found the utility limits, my usage has peaked, and I’m developing very restrictive use cases.

Beyond functioning as an interactive O'Reilly manual, LLMs only save time if you never read the code they produce. Which is a short-term win, but things will blow up eventually, as with all code, and now you've got a bigger problem than you started with.


This is where I've ended up. The code and info is out there on the internet.

To me LLMs are this decade's Stack Overflow copy 'n paste. Just with fancier search and replace.


They all obey the same masters, be it the big tech companies providing subsidized cloud, VC, or the stock market (post-IPO).

Trying to delude oneself that company A is superior morally to company B without a very clear distinction between incentive structures (eg A makes money from causing pollution, B sells widgets for cleaning up pollution), which is not the case with these companies, is magical thinking.


I got weirded out about ChatGPT when I dug deeper into the founder and discovered claims of sexual assault from his sister. I am not being facetious either when I say that something about the expressions and behavior of Sam Altman gives me the creeps even before I was aware of the allegations against him.

Obviously, the split into a for-profit company and the resignations from the alignment team are more fact-based concerns, but the way Sam Altman carries himself gives me all sorts of subconscious tells of something sinister. Maybe it's a point antithetical to reason, but my view is that after hundreds of thousands of years of human evolution, a gut feeling has some truth even if I can't understand the mechanism behind it.


I have no love for Altman - he seems like a (very successful) huckster to me, but I also read the sexual assault allegations as coming from a very mentally disturbed person, to the point that I'm not going to use that data point as part of my judgement of him.

I know nothing about these claims or Altman, but this argument fits the pattern of three commonplace threads that I hope people will notice in these situations:

1) Smearing the attacker: When someone unknown accuses or opposes a powerful public person, a standard response is to smear the accuser's credibility and reputation, creating doubts in onlookers, and causing day-to-day harm and high levels of stress and pressure for the accuser, and even causing danger (threats, doxxing, etc.). Powerful people can control the narrative - through contacts with other powerful people, by buying resources, or just posting on social media to their many followers. Also, powerful people already have a reputation that the accuser has to change, with many invested in believing it (even just as fans). Unknown accusers have no public reputation - often the only thing known is the smears from the powerful public person - and so others can say anything and it will be believable.

2) Mentally disturbed people - even if that part is true - can also be sexually assaulted. In fact, they are often targeted because they are more vulnerable, and you read again and again that abusers tell the vulnerable, 'nobody will believe you'. Let's not make those words true.

3) Sexual assault causes serious mental health issues.


Notice that I never said that the claim was false. I said that it would not be a data point that I use to judge Altman. I have no ability to verify, or even guess at the veracity of the claims.

Statistically, this form of abuse is extremely common. Something like 2-5% of women who have a sibling are sexually abused by them. Sam would have also been a child at this time. My experience of this world, especially SF startup scene, is that most people are mentally ill in some way and some people are just better at hiding it. We can both accept that Sam's sister is a bit ill, this probably did happen, and we probably shouldn't punish adults for the actions of their child selves too harshly. Does that seem ethical and fair?

What harsh punishment are we talking about here? Let's be specific: we should collectively call for him to step down from his role in OpenAI. That is not harsh. OpenAI is extremely influential on our society, and he is probably not a well balanced person.

Well, I can't think of a lot of well balanced people I know remotely at his level of success. I don't think that this is because successful people are imbalanced as much as I think most people are pretty imbalanced in some way, and successful people are just far more scrutinized. One of the worst oppressions on all of us is that we all have to carry some individual shame for something that probably happened to us as children, and it can't be talked about since it is so easily weaponized. There is no incentive to move toward a mentally healthier society in these conditions, I don't think. I'm open to a better way, but this feels like the dangerous parts of cancel culture, since it basically enables hackers to destroy anyone with their personal life.

Who aligns the aligners?

Taking Sam Altman's statements about AGI power and timelines seriously (for the sake of discussion), his position as CEO directs more power than all presidents and kings combined. Even if he was widely regarded as being amazing and nobody had a word to say against him right now, the USA has term limits on presidents. Taking him seriously, he should also.

--

On this specific claim however, requiring people to step down due to unsubstantiated allegations, without proof, is trivial for his political opponents to take advantage of. And he has many political opponents.

The huge problem with such abuse is that it's simultaneously very common and very difficult to actually prove.

Both halves of the current situation are independently huge problems:

Absent physically surveilling almost every home, I don't know what can even be done about proving who did what.

If you could catch everyone… between the fact that this is a topic that gets people lynched (so suggesting anything less than prison time is unlikely to be possible) and the estimates moonmagick gave of how many people do that (4x-10x the current USA prison population), I think it may be literally beyond most national budgets to imprison that many people, though they would try anyway.


It's not about proving he did it. This isn't a court of law, it's the court of public opinion. This isn't just deciding whether someone goes to prison, this is deciding who gets to control a big chunk of humanity's future. It's not some random naysayer claiming he did it, it's his own sister. It's very likely he did it, so he should step down. Simple as that.

Make the court of public opinion binding? Sounds like a way to force companies to become subject to democratic votes. Not sure how I feel about that for other reasons.

Anyone who is on or past the age of puberty, who sexually abuses a child, should be put to death.

If you disagree, where should they live? Where should they work? Very few will accept them in their neighborhood nor at their job. They will only be facing a longer death, through complete (and deserved) ostracization.

Have mercy on them and put them down.

Back to Sam Altman, I firmly believe in innocent until proven guilty. I am concerned about an accusation that hasn't been investigated, however, and think it should be investigated so as to hopefully resolve Altman's innocence or guilt.

If Sam Altman is found, with near-conclusive evidence, to have sexually abused a child, then he should be killed. Either judicially or extrajudicially. That goes for any human.

Though, I wouldn't call it punishment, nor would I advocate punishment for any crime. You don't punish; you either:

1. Rehabilitate, Make them Provide Restitution, and Forgive.

2. Kill them.

Punishment produces nothing but suffering, if one commits an act so intolerable that only punishment would be adequate, then put the human down.


(edited: removed link about some parties organized by influential people)

There is nothing wrong with sex parties, nor drug use. But a lot of these VC-adjacent parties have reports of strong power imbalance- “young female founder seeking funds, wealthy VC seeking partygoers”. That is the issue with them.

(Like those described in the removed link)

Altman is a married gay man, so his involvement in them seem… less likely.


That's just prostitution with extra steps, no?

It's a secret that there are parties where people get drunk, take drugs and have sex?

I'm pretty sure that's not a secret. It's just the definition of a party if you're a young adult.


OP included a link (subsequently removed) to a description of these supposed "parties" that describe them more like the ritualized sex mansion scene in Eyes Wide Shut rather than a normal young-adult "let's get wasted" party.

It's a bit creepy when the ratio is 2 to 1 or more and/or a significant age difference of the male to female attendees...

the sexual assault allegations seem bogus to me

> something about the expressions and behavior of Sam Altman gives me the creeps even before I was aware of the allegations against him.

He has the exact same vibe as Elizabeth Holmes. He does seem to be a bit better at it though.


Funny timing. StackBlitz announced Bolt.new (https://bolt.new/) today with multi-file edit, an emulated filesystem, arbitrary npm installs, and it's open source. I feel ChatGPT is still chasing after Claude 3.5 Artifacts.

Another comment rather disparages Bolt:

https://news.ycombinator.com/item?id=41733485

Has anyone had much experience with it, that can share their findings? I'm happy with Claude Sonnet and can't try every new AI code tool at the rate they are coming out. I'd love to hear informed opinions.


Just tried it, it wrote promising code but in the end only the last file was created and the other files had no content (even though I saw them being written). Seems a bit bugged at the moment.

EDIT: Only seems to work in Chrome?


Bolt.new may be technically open source, but it seems to be dependent on closed-source StackBlitz WebContainers? Not truly open source IMHO.

I was coincidentally looking into this yesterday, trying to find an implementation of JS sandbox to run AI-generated code or web apps in.

A similar project is sandpack, but that relies on nodebox which is also closed source.


The symbol and Blitz kind of give Nazi vibes. Did they think the naming through?

"Blitz" literally just means "lightning"

Awesome improvements, but compared to Claude Artifacts, it lacks the html/js "Preview" where you can run the code and check/validate the result without leaving the browser. This is a killer feature

Preview and publish, where you can share a link to a functioning version of the artifact.

It would be great if OpenAI could just wait on these announcements until they can turn on the feature for all Plus users at once.

Rant.

Their poor communication is exemplary in the industry. You can't even ask the old models about new models. The old models think that 4o is 4.0 (cute, team, you're so cool /s), and think that it's not possible to do multimodal. It's as if model tuning does not exist. I had a model speaking to me, telling me it cannot do speech. It was saying this out loud. "I cannot speak," it said out loud. I get that the model is not the view/UX, but still. The models get other updates; they should be given at least the basic ability to know a bit of their context, including upcoming features.

And if not, it would be great if OpenAI could tell us some basics on the blog about how to get the new features. Unspoken, the message is "wait." But it would be better if this was stated explicitly. Instead we wonder: do I need to update the app? Is it going to be a separate app? Is it a web-only feature for now, and I need to look there? Do I need to log out and back in? Is it mobile only maybe? (obviously unlikely for Canvas). Did I miss it in the UI? Is there a setting I need to turn on?

This branching combinatorically exploding set of possibilities is potentially in the minds of millions of their users, if they take the time to think about it, wasting their time. It brings to mind how Steve Jobs was said to have pointed out that if Apple can save a second per user, that adds up to lifetimes. But instead of saying just a simple "wait" OpenAI has us in this state of anxiety for sometimes weeks wondering if we missed a step, or what is going on. It's a poor reflection on their level of consideration, and lack of consideration does not bode well for them possibly being midwives for the birthing of an AGI.


It was out to 100% in a few hours. https://x.com/sama/status/1841946796274176405

And yet... still not showing up on Version 1.2024.268 (1727484326)

I wonder if the inherent public domain nature of genAI output will lead to licensing issues and forced code releases in the future.

Someone would have to prove slop code was used; but the same copyright issues with slop art apply, with the added complication of obscured licensing terms.

Where I work, genAI tools are banned because we do contracts that involve transfer of ownership with disclosure of third-party source. The lawyers state that, because genAI strips attribution, it runs afoul of the disclosure requirement. It's a fireable offense to even have the tools installed on work computers/devices.


A pretty good UX improvement for ChatGPT. I think they will fork VS Code and make an AI editor later. Although it's confusing to me that OpenAI seems to spend more time on end-user products instead of working on AGI.

Great end user experience is a huge differentiator in my opinion… it’s why I use OpenAI over other products in the market. It’s actually a joy to use.

(The responses are generally far better than other products.)


It's a chatbot... what UI features has OAI provided that you couldn't find from other providers before? I've found Claude 3.5 to give better responses in my experience.

AGI is a research project.

These are revenue generators.

Both have a place.


If I genuinely thought I was close to AGI, I’d focus all my efforts on that, then ask it to build my revenue generators.

That’s probably what Ilya is doing.

(FWIW I don’t think we’re close to AGI).


Saying you're close to AGI is a pretty good marketing move to sell consumer products though.

That's how you go bankrupt before reaching your goal. "Close" doesn't necessarily mean next week close.

What if you think the timeframe is accelerated based on the number of ML engineers you hire over, say, a 5+ year span?

If that's the case you need to optimize for hiring more ML engineers so you need revenue to bring in to pay them.


> If I genuinely thought I was close to AGI

As a great founder once said: "Work towards your goal, but you must ship intermediate products."


Who said that?

What if "close" is 5 years close? How would you survive 5 years at the current investment rate/costs?

OpenAI probably employs traditional software engineers as well as a ton of researchers.

The former are building tools like these, while the latter are conducting research and building new models.

Since their skill sets don't overlap that much, I don't think the research would go faster if they skipped building products like these.


I really hope VS Code can make API changes such that nobody needs to be forking it for these projects. I tried using Cursor for a few hours but I don’t want a hacked-up VS Code, I just want VS Code.

Why is that confusing?

More company efforts being spent on iterative solutions. This won't be enough to keep the hype up for the $7T.

I'm expecting they will exhaust the alphabet with GPT-4 before we see GPT-5 and even then what major CS breakthrough will they need to deliver on the promise?

https://openai.com/index/planning-for-agi-and-beyond/


Ironically for these products, the amount of obviously LLM-generated bot comments astroturfing on HN and making it to the top of these corporate shill posts really isn't attracting me to using any of them. Read a bunch of Glassdoor or Google Maps reviews and then read these comments: clear as the water in a glacial lake. If the comments aren't fake, maybe it's just the people working for them that are? If you produce enough fake slop to be indistinguishable from a machine slop generator, does the difference even matter anymore? Are you still human?

If I understand your point, a counterargument is that HN comments are made by people who are almost certainly in the top 20% of the general population by intellect. Compared to some subreddits and YouTube, comments here are made by geniuses.

Comments on HN are made by all sorts of people, some are geniuses that contribute authentically to threads that interest them, and some others operate or invest in glorified template generators that they use to write propaganda about their own products in order to fake popularity and consensus. If one believes everyone else is a genius that uses whichever product, then whoever's not on that train is already obsolete. The nature and style of top comments on LLM related (last years news) threads often reads in such a predictible and distinguishable way from other arbitrary threads that it seems a bit... robotic. Same thing on Reddit.

I'd love a version of this that worked with files directly on my local system. Aider is close, but it's too integrated with git for simple one-off scripts, which is a use case where I find GPT-generated code shines.

Is there an open source version of this and/or Claude Artifacts, yet?


I'm glad this exists. But their example with a blog post draft is something I will never do. ChatGPT writing makes me cringe at this point—I might as well write the entire thing because of the number of edits. Natural-sounding writing with a distinctive voice will become more and more valuable in the future.

I don't care about not having IDE integration with this - this is a solved problem that I don't care about. It would be nice to have two-way synchronization though so I could run code locally, edit it with my local editor (not IDE) and sync it back up to make more changes in ChatGPT.

Finally catching up with Anthropic.

Arguably this is better than Claude projects because you can prompt and edit inline. You cannot with projects. Claude keeps regenerating the artifact.

This is closer to Cursor for writing than Claude Projects.


Cursor's moat always seems a shaky proposition. Clone VS code, add a few custom blobs and extensions, API to existing LLMs.

For that, $20/M per head to be usable? Yikes.


Indeed, in the latest changelog, they are adding more and more features to compete with Cursor.

https://code.visualstudio.com/updates/v1_94


For a second there I mis-read that as Canva, and thought what the hell?

If anyone can clarify what "Comment’s Triggering Correctness" likely means, please do. Thank you.

Slick interface but the example they gave is depressing.

We taught the model to open a canvas for prompts like “Write a blog post about the history of coffee beans”.

If you're not heavily editing this post to say something genuinely new, then congratulations you've added even more drivel to the Great Pacific Garbage Patch of the internet.


Some people really think that they are now "content writers" with ChatGPT, just as those spitting out sort-of-working scripts think they are "software engineers".

I am sorry, dears, but this is not how it works. To be good at any of this, you should be able to do it yourself without any "prompt engineering", and the only path is through work, time, trial/error, and tons of frustration.


I get the argument against AI tools when it's about them not working as well as the hype says they do, but not when they are the "how dare you use a tool that makes it easier for you! That's cheating!" argument. When I was in school cheap pocket calculators were just becoming available. All of my teachers banned them as they saw them as an existential threat to teaching math and science. These days students are generally allowed calculators -- the teachers finally accepted that it's a good thing that tools can automate the rote parts so that teaching can move on to the more interesting and thought-provoking parts.

Content writers like you already destroyed the Internet by filling it with SEO word vomit everywhere.

Making products to empower sloppyjoes is all they have left it seems.

Yeah, I find this example depressing, as much as the "rewrite this simple sentence into a paragraph that adds nothing to it" one.

But to be less negative, this (or NotebookLM) could be useful to re-arrange and enrich one's own notes.

Sadly the amount of LLM slop on the internet is already out of control, and I'm afraid there's no going back.


How do we actually use this?

I tried selecting 'ChatGPT 4o with canvas' from the model drop down, uploading a code file, and asking "can we look at this file, I want to edit it with you", but it doesn't show canvas features or buttons that the instructional video has i.e. the UI still looks identical to ChatGPT.

EDIT: I asked "where are the canvas features" and boom - the UI completely changed what the instructional video has.


Everyone in this thread is talking about coding, but this is, perhaps more significantly, a word processor. “Change reading level: Adjusts the reading level, from Kindergarten to Graduate School.” “Adjust the length: Edits the document length to be shorter or longer.” “Suggest edits: ChatGPT offers inline suggestions and feedback.”

Hard to get excited when Claude is significantly better at coding than ChatGPT. The Canvas feature is great and I hope Claude does something similar. But LLMs change so fast that who knows if Claude will be worth using a year from now.

I've had fairly limited experience with both tools, but Claude seemed much, much nicer and gave me better results in every case I've used it.

Ugh. Code reviews and helping out with tedious code comments. That's great stuff for software developers. And it will be a headache to control for our company. It takes increasingly more restraint from developers not to send code as-is straight to OpenAI, especially when the features are fundamentally built on the assumption that you do.

OpenAI doesn't train on business data on their enterprise plans, but the problem is when a company doesn't have such a plan, maybe because it went with a competitor, or simply has nothing. Users then go to OpenAI with their Plus subscription or whatever to become more efficient. That's the problem.

Asking an AI for help is one thing. Then you can rewrite it to a "homework question" style while at it, abstracting away corporate details or data. But code reviews? Damn. Hell, I'm certain they're siphoning closed source as I'm writing this. That's just how humans work.


I guess your company will have to update their rules about sharing code with AI then. Not to worry, your code is not that special; your data is.

I feel bad for the ppl who will or probably already are dealing with this crap. TFW someone shares a data dump with AI

I want to take an existing Python application (which is 100% a 'sufficiently complicated concurrent program in another language contains an ad hoc informally-specified bug-ridden slow implementation of half of Erlang.'[0]) and port it to Elixir.

Retaining the core business logic, but re-homing it inside of idiomatic Elixir with a supervision tree. At the end of the day it is just orchestrating comms between PSQL, RMQ and a few other services. Nothing is unique to Python (it's a job runner/orchestrator).

Is this tool going to be useful for that? Are there other tools that exist that are capable of this?

I am trying to rewrite the current system in a pseudocode language of high-level concepts in an effort to make it easier for an LLM to help me with this process (versus getting caught up on the micro implementation details) but that is a tough process in and of itself.

[0] - https://www.goodreads.com/author/quotes/423160.Robert_Virdin...


How big is this codebase? o1-preview and Claude Sonnet are both fairly capable. Claude has longer context windows and is a little less lazy. aider.chat is working on an architect/coder mode right now that's verrrry effective. Essentially - paste your whole codebase into o1-preview, ask it to rearchitect to spec, and output a design document and a list of tasks suitable for a senior dev. Dump that plus your tarball into Claude as two separate pastes, and be like "Do task 1".
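
If it helps, this is roughly how I turn a repo into one paste-able blob for that first step (the extensions and output filename are just an example; adjust to the project):

    # Dump a repo's source files into one text blob to paste into the model.
    # The extensions and output filename are just examples; adjust per project.
    from pathlib import Path

    exts = {".py", ".ex", ".exs", ".toml", ".md"}
    parts = []
    for path in sorted(Path(".").rglob("*")):
        if path.is_file() and path.suffix in exts and ".git" not in path.parts:
            parts.append(f"\n===== {path} =====\n{path.read_text(errors='ignore')}")

    Path("codebase_dump.txt").write_text("".join(parts))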

OpenAI seems to be slowly moving towards building an entire OS (or any other form of walled garden) that will be filled with LLM-powered applications.

The abuse the term "OS" gets is incredible

You get tighter feedback loops and better products if you own the vertical.

OpenAI already A/B tests the responses it generates. Imagine if they own the text editor or spreadsheet you work in too. It'll incorporate all of your edits to be self-correcting.


That will take them too long.

No thank you.

As with anything else that is helpful, there is a balancing act to be aware of. This is too much for my taste. Just like github copilot is too much.

It's too dumb like this. But chatgpt is insanely helpful in a context where I really need to learn something I am deep diving into or where I need an extra layer of direction.

I do not use the tool for coding up front. I use them for iterations on narrow subjects.


Don't fight this. Try to profit from it. People love these tools and they will become utterly, utterly dependent.

Using a spell-checker, I have gradually lost my ability to spell. Using these LLM tools, large parts of the population will lose the ability to think. Try to own them like farm animals.

The large number of tokens being processed by iterative models requires enormous energy. Look at the power draw of a Hopper or Blackwell GPU. The Cerebras wafer burns 23 kW.

One avenue to profit is to invest in nuclear power by owning uranium. This is risky and I do not recommend it to others. See discussion here: https://news.ycombinator.com/item?id=41661768


Counterpoint: don't try to own people like farm animals. Have pride in humanity, faith in your fellow man, and resist toxic cynicism.

Fortunes are made owning companies that sell harmful products. Domino's Pizza, Monster Energy Drink, etc.

Fortunes will be made selling electricity to people who develop serious cognitive dependence on LLMs.

There is no need for you to participate in the profits. I respect your life choices and I wish you well.


I don't respect your life choices and I wish you failure.

I had to laugh at their comments too. Well, at least they're being cynically honest.

Don't try to own uranium, either. Or if you do, don't store it all in one place.

The Sprott Physical Uranium Trust owns 65,711,826 pounds of triuranium octoxide (U3O8) stored at uranium hexafluoride (UF6) conversion facilities in Canada, France, and the United States.

  Cameco Corporation, ConverDyn, and Orano Chimie-Enrichissement
  individually act as custodians on behalf of the Trust for the
  physical uranium owned by the Trust.
https://sprott.com/investment-strategies/physical-commodity-...

Please see the discussion here:

https://news.ycombinator.com/item?id=41661768

for serious warnings. This is not suitable for you.


Interesting! Didn't realize that submarine reactors used bomb-grade uranium.

Speaking of energy use: Microsoft is literally in talks to restart a nuclear plant and buy all of its power [1]. It happens to be the Three Mile Island plant [2].

https://www.nature.com/articles/d41586-024-03162-2

https://www.nrc.gov/reading-rm/doc-collections/fact-sheets/3...


Thank you. Discussed in detail and at length here: https://news.ycombinator.com/item?id=41692131

> large parts of the population will lose the ability to think. Try to own them like farm animals.

You're so edgy that you might cut yourself, be careful. What is wrong with making profit by helping people through providing a service?


Asked Purdue Pharma... :)

> Try to own them like farm animals.

Jesus christ, I hope you are never in a position of any significant power


I haven’t used it yet, but couldn’t you just copy/paste a chunk of existing code into canvas and have it help there? If so, that does seem more useful than the original approach of just pasting a lot of code into chat and hoping it pulls the correct context from your description. If I’m understanding it correctly, in canvas you can paste a bunch of code and then ask for help on specific contexts within the code by highlighting it. If done properly that seems super useful to me.

I think you should try to give tools like this another chance. If Andrej Karpathy can say AI-assisted programming is a productivity boost for him (https://x.com/karpathy/status/1827143768459637073), it can be a productivity boost for probably any programmer.

There are three groups of people here:

1.) Those who use AI and talk about it.

2.) Those who do not use AI and talk about it.

3.) Those who use AI and talk about how they do not and will not use AI.

You don't have to look far to see how humans react to performance enhancers that aren't exactly sanctioned as OK (Steroids).


Andrej is a great communicator. I've never seen evidence that he is an especially exceptional programmer.

Between his roles at companies he's worked for, his research and his open source stuff, what else would you be looking for lol ?

Remove all the educator stuff and karpathy would still be one of the most accomplished of his generation in his field.

Idk just seems like a weird comment.


i agree but it is optional

Obviously this is not targeted specifically at developers but at the general population, who need to solve some general problems through code and to learn.

I guess I don't need to point out, given where I'm posting this comment, that developers (myself included) are some of the most opinionated, and dare I say needy, users. So it's natural that any AI coding assistant is expected to be built into their own specific development environment. For some that means a local LLM; for others, anything that directly integrates with their preferred IDE of choice.


I thought I didn't have this yet but turns out it's there, just buried in the drop down model selection menu. I tried it out on an existing chat thread where I was writing JavaScript and when I said "use canvas" it started rewriting my JavaScript to use HTML5 <canvas>. I had to tell it "use canvas, not html5 canvas but chatgpt canvas" and then it figured it out.

It seems like this will be perfect for making tiny single page HTML/JS apps.


It seems like this only supports "JavaScript, TypeScript, Python, Java, C++, or PHP". I would be so happy if it worked with FeatureScript (which is similar to JavaScript, but is for 3D CAD in Onshape).

I wonder what it would take to expand the languages it supports?


openai is basically eating all the GPT wrappers over time, as well as integrating all the interesting work done at prompt level (cot, structured outputs) at the model level. I wonder if that trend is doomed to continue as profitability issues become central to openai and they need to take an ever bigger share of the AI application cake to survive (coding/writing tools etc): https://www.lycee.ai/blog/why-sam-altman-is-wrong

I expected the last line of the blog post to have said something like "this blog post was created using the help of Canvas" - a missed opportunity or the product not being there yet?

Also is this in response to the recent notebooklm which seems awfully too good as an experiment?


Needs vim keybindings.

Wish they explained how they perform targeted edits. Is it doing speculative decoding like Anthropic’s code edit or something like Aider’s diff format?

Probably even simpler: give it the full document as context, then add the text selection as an additional message saying what to edit, roughly as in the sketch below.
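
A minimal sketch of that approach, assuming a plain chat-completions call with the OpenAI Python SDK; nothing here is how Canvas actually does it, and the document/selection are made up for illustration:

  # Targeted edit by sending the whole document plus the user's selection
  # as extra context. Requires OPENAI_API_KEY in the environment.
  from openai import OpenAI

  client = OpenAI()

  document = "My draft post.\nThe quick brown fox jumps over the lazy dog.\nMore text."
  selection = "The quick brown fox jumps over the lazy dog."  # the highlighted passage
  instruction = "Make this sentence more formal."

  response = client.chat.completions.create(
      model="gpt-4o",
      messages=[
          {"role": "system", "content": "You edit documents. Return only the rewritten selection."},
          {"role": "user", "content": f"Full document:\n{document}"},
          {"role": "user", "content": f"Selection:\n{selection}\n\nInstruction: {instruction}"},
      ],
  )
  print(response.choices[0].message.content)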

Here's an idea: If AI like this is so brilliant and can think for itself, why don't we just tell it to come up with its own next iteration? Surely if it can write code for medical devices, cars, planes, etc. (where no doubt junior engineers are extensively using it), then why not AI?

Can't we just tell ChatGPT to make, e.g., TensorFlow faster, better, cleaner? Why do people put in so much work anymore, if the AI is so damn good?


The author of Aider uses Aider a lot to work on Aider itself. He measures everything; https://aider.chat/HISTORY.html

Because it's not that damn good. Not even close.

Don’t turn off the money spigot with your “questions.”

shhh don't ask legitimate questions! only hype now!

This looks amazing. Simply incredible what we are able to do. I'm ready for the next industrial revolution -- It's happening, now!

I believe you are attacking a strawman here.

I believe it's not solving a real problem. I believe that the human skills of reasoning, logical thinking, etc. make it possible for any able-minded human to do the things I just mentioned, given enough time and money. AI can't do that. Let's call that process "programming". It can't do programming. It pretends to program.

You are entitled to interpret the word "programming" in multiple different ways depending on the context of the conversation. What you propose here as "programming" is valid but not comprehensive of what everyone may think of when considering the "programming" process.

By many definitions of "programming", these AI tools are indeed programming. In the same way, many definitions of "reasoning, ..." may include the AI tools in them. However, there are alternate definitions (reasonable in their own way) in which it is clear these AI tools fall short.

So, I think you are proposing an argument of semantics but presenting it as if it is an argument on actual capabilities of these AI tools. In general, we all are in agreement on what these systems are capable of.


These AI tools are able to do a small subset of what a professional human is able to do, with greatly reduced flexibility, e.g. tasks like leetcode where there is an abundance of concentrated training data.

I would bet that the current paradigm in ChatGPT will never be able to replace a minimally competent human at real world programming - the kind of programs people actually pay for. Let’s see.


> I would bet that the current paradigm in ChatGPT will never be able to replace a minimally competent human at real world programming - the kind of programs people actually pay for. Let’s see.

Depends on definition of "minimally competent". There WILL be over-engineered enterprise solutions that employ 10x more AI-enabled code monkeys than is strictly required. Think about it: we can create a "fully automated AI coder" and then hire another FTE (or team) to handhold and clean up after it. It will be done. It must be done.


Why must LLMs or “AI” beat or match the smartest and most capable humans to be considered to solve a real problem? There’s been a lot of technology invented and in widespread use that solves real problems without having human-like intelligence.

As an able-minded human, could you please make TensorFlow faster, better, cleaner?

I mean, there's strong incentive for you (it would be worth tens of millions, possibly billions if your iteration is superior enough).

How much time do you need?


> You can directly edit text or code.

But not run it.

Any online code playground or notebook lets you both edit and run code. With OpenAI it's either one or the other. Maybe they'll get it right someday.


> We used novel synthetic data generation techniques, such as distilling outputs from OpenAI o1-preview, to post-train the model for its core behaviors. This approach allowed us to rapidly address writing quality and new user interactions, all without relying on human-generated data.

So they took a bunch of human-generated data and put it into o1, then used the output of o1 to train canvas? How can they claim that this is a completely synthetic dataset? Humans were still involved in providing data.


Good lord we do not need more fancy editors to produce AI slop. You can already do this with a myriad of solutions, including just that ChatGPT interface with o1.

No matter what there will be so many GPT-isms, and people will not read your content.


Anyone actually got access to this?

Yeah I do. Playing with it right now. It's cool.

https://i.imgur.com/R5PQQoi.png


How/where were you notified that you got it?

As another data point: I wasn't notified either. I didn't have it when I first read about Canvas here earlier but coming back to the topic now I do have the feature. This is for a Teams account, the post says Plus and Teams are the first to get it rolled out with Enterprise and other in a week.

The easiest way to check if you have access is it will appear as an explicit choice in the "Model" selector.


I wasn't. I saw this post on HN, opened a new tab for ChatGPT, and saw that I had access to the model. I assume it's rolling out incrementally over a few hours to all paid users.

Thanks, I found that I had access too, on the web. Just open up the chat gpt page and use the model drop down at the top of the page.

You don't get the new experience until you give it a prompt though, which is kinda weird.


I'm playing around with this right now and it's pretty sweet. It shows in real time which lines it's "thinking" about working on, and it feels very dynamic, like I'm working with a machine in real time.

It can't display markdown and formatted code side-by-side which is kind of a surprise.

I haven't tried doing anything super complex with it yet. Just having it generate some poems, but it's smart enough to be able to use natural language to edit the middle of a paragraph of text without rewriting the whole thing, didn't notice any issues with me saying "undo" and having data change in surprising ways, etc. So far so good!

I'm not very skilled at creating good "test" scenarios for this, but I found this to be fun/interesting: https://i.imgur.com/TMhNEcf.png

I had it write some Python code to output a random poem. I then had it write some code to find/replace a word in the poem (sky -> goodbye). I then manually edited each of the input poems to include the word "sky".
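
Reconstructed from memory (not the exact code it wrote), it boiled down to roughly this:

  import random

  # A few short input poems; I later edited each one by hand so it contained "sky".
  poems = [
      "The sky turns gold as evening falls",
      "Rain taps the window, the sky heavy and grey",
      "Under an open sky the meadow hums",
  ]

  def random_poem():
      return random.choice(poems)

  def replace_word(poem, old="sky", new="goodbye"):
      return poem.replace(old, new)

  print(replace_word(random_poem()))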

I then told it to execute the python code (which causes it to run "Analyzing...") and to show the output on the screen. In doing so, I see output which includes the word replacement of sky->goodbye.

My naive interpretation of this is that I could use this as a makeshift Python IDE at this point?


I’m kinda giggling imagining the amount of electricity you used to write “sky..”.replace(“sky”, “goodbye”)

How do I get access to this feature? I cannot find it in the normal chatgpt interface.

It's a staged rollout. You'll probably have it by tomorrow morning.

I believe you wait until your number comes up :/

it's under the model list on the web interface

Is it possible to have the local context be a directory and all files within it or something? Ie to just ask it questions on the side of your WIP repo? Use your normal editor/etc.

Doesn't look like this is yet possible, no

Yawn. I don't use Claude because the interface is good. I use it because Opus 3 is the best model anyone has ever created for long-context coding, writing and retrieval. Give me a model that doesn't have a polluted dataset to game MMLU scores, something that tangibly gives good results, and maybe I'll care again.

For now I only keep ChatGPT because it's a better Google.


I've found Sonnet 3.5 significantly better than Opus 3 at coding but I've not done much long context coding with it. In your experience did you find Opus 3 to degrade less or is it that you consider Sonnet 3.5 part of the "gamed" group?

Have you used Gemini? With the built-in RAG I actually find it way better than both Google Search and OpenAI for search. I think Claude still wins for overall chat quality but Gemini is amazing for Search, especially when you're not exactly sure what you're looking for.

Disclaimer: I work at Google Cloud, but I've had hands-on dev experience with all the major models.


Initially it had some real problems: a large context window, but you could only paste 4k tokens into the UI, for example. It never seemed like anyone at Google was using it. NotebookLM is a great interface, though, with some nice bells and whistles, and it finally shows what Gemini is capable of. However, Opus still has the best long-context retrieval with the least hallucination of what I've tried.

3.5 Sonnet is fast, and that is very meaningful for iteration speed, but I find that for the level of complexity I throw at it, it strings together really bad solutions compared to the more holistic solutions I can work through with Opus. I use Sonnet for general knowledge and small questions because it seems to do very well with shorter problems and is more up to date on libraries.


I don't know that I've ever seen someone recommend Gemini Advanced for "search". My experience is that the model doesn't always tell you if it's using search or its internal training; in fact, I'm not sure if it's even "searching" the internet rather than accessing some internal Google database.

In comparing its performance to the pure model on Google AI Studio, I realized Gemini was presenting some sort of RAG results as the "answer" without disclosing where it got that information.

Perplexity, which is hardly perfect, will at least tell you it is searching the web and cite a source web page.

I'm basically saying Gemini fails at even the simplest thing you would want from a search tool: disclosing where the results came from.


Do they not understand that the example text they are using in the first image is so laughably banal that it makes an entire segment of its potential audience not want to engage at all?

Shoot me in the face if my own writing is ever that bad.

ETA: just to be clear... I am not a great writer. Or a bad one. But this is a particular kind of bad. The kind we should all try to avoid.


> Do they not understand

I see this all the time from AI boosters. Flashy presentation, and it seems like it worked! But if you actually stare at the result for a moment, it’s mediocre at best.

Part of the issue is that people who are experts at creating ML models aren’t experts at all the downstream tasks those models are asked to do. So if you ask it to “write a poem about pizza” as long as it generally fits the description it goes into the demo.

We saw this with Gemini’s hallucination bug in one of their demos, telling you to remove film from a camera (this would ruin the photos on the film). They obviously didn’t know anything about the subject beforehand.


> Part of the issue is that people who are experts at creating ML models aren’t experts at all the downstream tasks those models are asked to do.

Yep. CAD, music, poetry, comedy. Same pattern in each.

But it's more than not being experts: it's about a subliminal belief that there either isn't much to be expert in or a denial of the value of that expertise, like if what they do can be replicated by a neural network trained on the description, is it even expertise?

Unavoidably, all of this stuff is about allowing people to do, with software, tasks they would otherwise need experts for.


Well, comedians still exist, despite the fact that ChatGPT can write an endless stream of “jokes” for next to zero cost. So do musicians. I know less about poetry and CAD but I assume people who seek out those modalities aren’t going to be impressed with generic garbage. A person who seeks out poetry isn’t going to be easily impressed.

No. But then all of these products are marketed to people who are, at some domain-specific level, still towards the "but I wore the juice!" end of the scale, right?

Unskilled and unaware of it. Or rather, unskilled and unaware of what a skilled output actually involves. So, unaware of the damage they do to their reputations by passing off the output of a GPT.

This is what I mean about the writing, ultimately. If you don't know why ChatGPT writing is sort of essentially banal and detracts from honesty and authenticity, you're the sort of person who shouldn't be using it.

(And if you do know why, you don't need to use it)


This means you're a great writer — congrats! I'm a terrible writer, and this kind of crutch is really useful.

Other people in our lab (from China, Korea, etc.) also find this kind of thing useful for working / communicating quickly


Well, I've just read back through some of your comments and I say that ain't so!

Write honestly. Write the way you write. Use your own flow, make your own grammatical wobbles, whatever they are. Express yourself authentically.

Don't let an AI do this to you.


Aw thanks! I at least have the benefit of being a fluent writer.

For lots of East Asian researchers it's really embarrassing for them to send an email riddled with typos, so they spend a LOT of time making their emails nice.

I like that tools like this can lift their burden


> For lots of East Asian researchers it's really embarrassing for them to send an email riddled with typos, so they spend a LOT of time making their emails nice.

OK -- I can see this. But I think Grammarly would be better than this.


Grammarly uses generative AI

It does now, perhaps, for complete rewrites. I've not looked recently.

But its suggestion system, where it spots wordy patterns and suggests clearer alternatives, was available long before LLMs were the new hotness, and is considerably more nuanced (and educational).

Grammarly would take apart the nonsense in that screenshot and suggest something much less "dark and stormy night".


Thanks for saying this. Whenever Grammarly puts a red line under a slightly superfluous part of a sentence, I get more and more agitated at this small nudging toward robotic writing.

Grammarly thinks all writing should be bland, and that everyone needs to be a robot. Terrible product.

But it does favour _clarity_, rather than tropes.

There's more to writing than clarity, though. Not all written communication needs to abide by the efficient/clear writing style of technical documentation, FFS.

Sure, if you're writing a novel, maybe.

But there's not much more important, stylistically, to writing a business email or document than clarity. It's absolutely the most important thing. Especially in customer communications.

In the UK there is (or used to be) a yearly awards scheme for businesses that reject complexity in communications in favour of clarity:

https://www.plainenglish.co.uk/services/crystal-mark.html

But anyway, you don't have to act on all the suggestions, do you? It's completely different from the idea of getting an AI to write generic, college-application-letter-from-a-CS-geek prose from your notes.


More red lines means more subscribers, right?

It also doubles as a keylogger.

  Person A: Me try make this code work but it always crash! maybe the server hate or i miss thing. any help?

  Person A with AI: I've been trying to get this code to work, but it keeps crashing. I'm not sure if I missed something or if there's an issue with the server. Any tips would be appreciated!

For a non-native English speaker, it's much better professionally to run a message through AI before sending it than to appear authentic (which you won't anyway in a language you aren't fluent in; better to sound robotic than to write like a 10-year-old).

Person A with AI: In the bustling world of software development, where lines of code intertwine to create the intricate tapestry of our digital lives, I find myself facing a challenge that has proven to be both perplexing and frustrating. I’ve spent over a decade honing my skills as a developer. Known for my analytical mind and commitment to excellence, I’ve navigated various programming languages, frameworks, and projects that I’m proud to have contributed to.

Recently, I stumbled upon a bug that initially seemed minor but quickly revealed itself to be a formidable adversary. It disrupted the seamless user experience I had meticulously crafted, and despite my best efforts, this issue has remained elusive. Each attempt to isolate and resolve it has only led me deeper into a labyrinth of complexity, leaving me frustrated yet undeterred.

Understanding that even the most seasoned developers can hit a wall, I’m reaching out for help. I’ve documented the symptoms, error messages, and my various attempts at resolution, and I’m eager to collaborate with anyone who might have insights or fresh perspectives. It’s in the spirit of community and shared knowledge that I hope to unravel this mystery and turn this challenge into an opportunity for growth.


:-)

It's pretty good for native English speakers at work who need/want a reverse anger translator.

Me: This is the most garbage code I've ever seen. It's bad and you should feel bad. It's not even wrong. I can't even fathom the conceptual misunderstandings that led to this. I'm going to have to rewrite the entire thing at this rate; honestly, you should just try again from scratch.

With AI: I've had some time to review the code you submitted and I appreciate the effort and work that went into it. I think we might have to refine some parts so that it aligns more closely with our coding standards. There are certain areas that are in need of restructuring to make sure the logic is more consistent and the flow wouldn't lead to potential issues down the road.

I sympathize with the sibling comment about AI responses being overly verbose, but it's not that hard to get your model of choice to keep a somewhat consistent voice. And I don't even see it as a crutch; this is just an automated secretary / personal assistant for people not important enough to be worth a human one. I think a lot of us on HN have experienced the stark contrast between comms from the CEO vs. the CEO as paraphrased by their assistant.


you're not at all a terrible writer... although you do overuse ellipses in your comments.

I never even thought about that... I don't know why I do that :P

> Do they not understand

They don't care. Their goal is to accelerate the production of garbage.


I am trying to convince myself that I am not insane and everyone else is. The platform was literally down for me for a good 12 hours or so because they had an auth problem or bug. Their interface is subpar, yet they are trying to convince people that this is replacing knowledge workers any minute now. I recommended that a friend use ChatGPT to write some English content, and it did a bad job. I checked bolt yesterday and the code it produced for a very simple app was complete garbage hallucination.

I really liked Copilot/AI when the focus was on hyper-auto-complete. I wish the integration were LSP + autocomplete + compilation check + docs correlation. That would boost my productivity 10x and save me some brain cycles. Instead we are getting garbage UX/backends that are trying to fully replace devs. Give me a break.


Garbage in, garbage out. It is not going to imagine your perfect scenario and then create it for you. I take anyone saying it is garbage with a grain of salt because it is incredibly useful for me. And others think so too, so how can your bad experience negate that. It can't. If you can craft the right prompts it can make you much more efficient. Anyone saying it is going to replace whole people en masse is just part of the hype machine. But if all it does is make every human on earth 1% more efficient then that is an obscene amount of value it is creating.

I'm with you. I feel like I'm losing my mind. Everyone around me is talking about the looming AGI, death of the knowledge worker and how "everything" has changed. But every time I try to use these text generators I get nothing useful from them. It's like the whole world has bought into a mass hallucination.

It makes more sense when you realize that while sure, there might be slight variation in output, generally speaking the people tripping over themselves in how [current version] is so amazing aren't being totally honest about why they think it's amazing.

For them, the ability to generate so much trash is the good part. They might not even be fully aware that it's trash, but their general goal is to output more trash because trash is profitable.

It's like all those "productivity systems": not a single one will magically produce a noticeable increase in productivity that you can't get from a $1 notebook; they just make you feel like you are being more productive. Same with RP bots or AI text editors. It makes you feel so much faster, and for a lot of people that's enough, so they want in on a slice of the AI money pit!


It's a tool, like any other tool a software developer would use. In areas where I have a lot of repetition or need to pore over verbose (but simple) documentation, it's such a game changer. I can spend 5 minutes thinking about what I want the machine to do, give it some samples of what I expect the output to be, and voilà, it generates it, often 100% correct if I've put the prompt in properly; sometimes it's good enough with a bit of refinement. This is something I would normally have delegated to a junior team member or subcontractor, but now I'm saving time and money.

Occasionally I sink 1-2 hours into tweaking something I thought was 90% correct but was in reality garbage. That happened a lot more with earlier models, but it's becoming increasingly rare. Perhaps I'm recognizing the limitations of the tool, or perhaps the systems are indeed getting better.

This is all anecdotal, but I'm shipping and building faster than I was previously, and it's definitely not all trash.


Most people are incapable of assessing quality and defer that to others. Or their spectrum for quality is so narrow GPT's output spans it.

If you accept that we live in a world where blind lead the blind, it's less surprising.


TBF it looks like it's intended as a "before" image, but yes, I suspect the "after" isn't much better.

Is it? I thought that was the draft, as a result of the dialogue in the sidebar. If I am wrong then OK!

The text is supposed to be banal, so that ChatGPT can make it better. It's like the before picture in an exercise course.

It's not, is it? It's meant to be the draft it created from the notes.

That exact banality has somehow made them into a 150 billion dollar business and darling of hacker news.

They've plateaued on model performance and they are hype-based. They need to keep the momentum going by "releasing" stuff, so they're putting out garbage at the moment. Given that open-weight models are so close to GPT-4, their value is exactly 0 unless they can produce a new model with a significant jump in coherence.

Them releasing this stuff actually suggests they don't have much progress on their next model. It's a sell signal, but today's investors made their money during ZIRP, so they have no idea about the real-world market. In a sense this is the market funneling money from the stupid to the grifters.


I thought the same thing: the “blog post” in the example image is an example of the absolute trash that’s being spewed onto the internet by these tools. 10+ sentences and yet somehow nothing actually said.

Well, the UI has a slider for length, so there is that.

"Please say this, but in more sentences"



I'm not sure how much I'll enjoy Canvas for coding (we'll see), but it looks way more optimal for just writing!

Canvas lets you interact with and edit code/documents more fluidly. I used it to transform my HTML blog into TypeScript in no time! Super helpful for coding and experimenting. https://x.com/mustafaergisi/status/1841946224682774536

How do you go from html to typescript?

I think they mean that at the end of the clip they drag the slider from HTML to TypeScript, and it puts all of the blog data into structured TypeScript data, then writes a function that generates an HTML page from that. The resulting blog output will still be HTML in the end; it's just a question of whether the entries are data segments automatically stitched together or entered via raw HTML formatting.

I used OpenAI Canvas to help with the conversion. Essentially, I took the static HTML/CSS and wrote TypeScript to dynamically render blog posts.
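
A simplified sketch of what that ends up looking like (illustrative only, not the code from the clip; the post data is made up):

  // Blog entries as structured TypeScript data instead of hand-written HTML.
  interface Post {
    title: string;
    date: string;
    body: string;
  }

  const posts: Post[] = [
    { title: "Hello, world", date: "2024-10-01", body: "First post." },
    { title: "Trying Canvas", date: "2024-10-03", body: "Notes on the new editor." },
  ];

  // Stitch the entries back into the HTML the blog actually serves.
  function renderPosts(entries: Post[]): string {
    return entries
      .map((p) => `<article><h2>${p.title}</h2><time>${p.date}</time><p>${p.body}</p></article>`)
      .join("\n");
  }

  document.body.innerHTML = renderPosts(posts);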

Do you reckon this would become available to third party clients via the API, or would it be exclusively a chatgpt.com feature?

Very close to the subsystems we build for FRVR.ai - Although their UX is way cooler than ours, we should get inspired.

I was really hoping this was a weird interface with html canvas.

Since when did "Add emojis for clutter and noise" become a USP...

It seems sort of weird to keep pushing the chat interface so hard into programming. For 'real' usage, it seems like Cursor or Aider approaches work better, since you end up having AI write code, you manually edit, AI updates further, and back and forth. In a chat interface, copy/pasting updated code gets old fast.

On the other hand, I did have good luck w/ Anthropic's version of this to make a single page react app with super basic requirements. I couldn't imagine using it for anything more though.


Pretty sure this will dynamically rewrite the code. No copy pasting needed. We have something very similar at FRVR.ai

openai gpt is a service, not a product, but canvas is the first product that openai has built. i guess we will see new products in the future. canvas is a new product; that's why they didn't introduce it at devday.

ChatGPT is their first product. Canvas would be the second?

i think chatgpt is not a product but a service like dropbox. i don't say it is useless or can't be sold. it is just a feature for a product. GPT, in itself, is a powerful technology or tool that enhances user experiences in various applications. It provides natural language processing capabilities like answering questions, generating text, assisting with tasks, and so on. However, without a specific context or integration into a larger platform, GPT is more akin to a feature that can improve or enable specific functionalities within products.

The most surprising part of this announcement was the team who worked on this — more people worked on and contributed to this than many startups. There are 16 people working on this project!!

If each was paid $300k (that's a minimum...) and they spent a year on this, it'd make it a $5M project...


1) Very unlikely they spent a year on just this.

2) Their total comp is certainly higher than that.

Right, so it probably evens out to $4-5M for a feature like this? Still surprising to me how expensive features like this are to build.

Did they just kill Cursor?

Have you actually tried? I have not, but just look at how it still regenerates the entire code instead of producing a diff. Who iterates on software like that?

I haven't tried it, but have been considering it. I already pay for OpenAI, not sure I want to pay for another AI service.

I am referring to Canvas.

Cursor, on the other hand, works by producing minimal diffs and lets you iterate on multiple files at once, in your IDE. There are tools of the same type that compete with Cursor, but Canvas is too bare-bones to be one of them.

Trial is free.


Seems to mostly be a UI enhancement? Or is there deeper functionality?

in a word : yuck : show your work : cite your sources : or say nothing at all

Glad to see that; it's the next natural step after seeing that select edits are possible with images.

Excuse my possible ignorance but I had a question: Does ChatGPT have 'up to date knowledge' in terms of latest versions of software?

When I have used it I asked it to help me with VueJS but whilst it knew about version 3 (which is substantially different to V2) it didn't know the very latest version.

Isn't this going to be problematic if it's trying to help code but has outdated information? Won't this result in more code being written/tied to older languages/framework versions?

In my experience, because GPT outputs code tied to particular versions of software, I use those versions, as it's far quicker to copy/paste/edit than to read the latest docs. For the latest versions GPT is absolutely useless, since it simply doesn't know about the latest code and can't produce working snippets.

So the real deal would be if GPT knows about the latest software instantly the moment it's published (ingesting docs data, any stackoverflow Q/A's etc).

Here is what GPT 4o-mini told me

P: can you list me the latest verisons of ALL major languages such as python, node etc. and the latest versions of typescript, angular, vuejs etc

R:

   > Python: 3.12.0 (released in October 2023)
   > Node.js: 20.5.0 (released in September 2023)
   > Java: 21 (released in September 2023)
   > C#: 12 (part of .NET 8, released in November 2023)
   > Go: 1.21.0 (released in August 2023)
   > Ruby: 3.2.2 (released in August 2023)
   > PHP: 8.3.0 (released in August 2023)
   > Rust: 1.72.0 (released in September 2023)
   > TypeScript: 5.3.0 (released in September 2023)
   > Angular: 16.2.0 (released in September 2023)
   > React: 18.2.0 (released in June 2023)
   > Vue.js: 3.3.0 (released in September 2023)

You can count on ChatGPT to know the exact versions and release dates of software about as much as you can rely on your friendly neighborhood dev to know them off the top of their head - not at all. A dev would likely have to look it up too. A language model would also need to look it up through function calling (or just by including it in your prompt).

This kind of scenario, where there are concrete answers in some datastore somewhere and the relevance of the content that the model was trained on varies chronologically (a blog post on the latest version of React circa 2015 could deliver a wrong answer), are the ones you want to engineer around. This is where you start using SDKs and binding tools to your LLM so you can ensure grounding context is available to generate correct answers.
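
For example, with function calling the model can ask your code to fetch the real answer instead of guessing. A rough sketch with the OpenAI Python SDK, where `lookup_latest_version` is a hypothetical helper you'd implement against a registry like npm or PyPI:

  from openai import OpenAI

  client = OpenAI()

  # Describe a tool the model may call when it needs fresh version data.
  tools = [{
      "type": "function",
      "function": {
          "name": "lookup_latest_version",  # hypothetical helper, not a real API
          "description": "Return the latest released version of a package or runtime.",
          "parameters": {
              "type": "object",
              "properties": {"name": {"type": "string"}},
              "required": ["name"],
          },
      },
  }]

  response = client.chat.completions.create(
      model="gpt-4o",
      messages=[{"role": "user", "content": "What is the latest version of Vue.js?"}],
      tools=tools,
  )

  # If the model decides it needs the lookup, it returns a tool call; you run the
  # real query yourself and send the result back in a follow-up message.
  print(response.choices[0].message.tool_calls)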


This happens with other LLMs too. They don't know what they don't know. They ought to check what version you are using and at least issue a warning if the major number is different.

It's funny you mention that - we just did a Show HN for exactly this yesterday: https://docs.codes.

I beg tech companies to please stop naming things “canvas”.

ACM/IEEE should really run a name allocation service

So, a clone of Cursor?

I started skeptical, but by minute 7 I started to be blown away.

Simple. Fantastic. I'm probably going to start using this everyday.

Here's my user test: https://news.pub/?try=https://www.youtube.com/embed/jx9LVsry...


Yawn, it’s still ChatGPT so it’ll still be useless.

Pretty dope. I like the feature in claude. I also like the visual update. It does seem to glitch out if you get past 2,000 words though.

I’ve found Claude and ChatGPT to be the best coding tools for me.

I often throw the results from one into the other and ping pong them to get a different opinion.


Ditto with Gemini and o1-preview. Ask one of them to write some code, then paste it into the other and ask it if it can find any bugs. Lather, rinse, repeat. It's proven helpful a couple of times so far.

this would be incredible for scripting.

Why am I always last to get the cool new features??

Why don't companies learn from Apple at its best: announce it, and it's immediately available. Sure, I know why, but that used to be why I liked them. This marketing grift is terrible.

Apple Intelligence won’t be available for months.

I am not talking about Apple now: 10 years ago they announced and I could order it the same night. That is magic. The rest is just; yeah who cares.

Pre-announcing makes no sense when you're leading the pack; it's a stall tactic to keep your customers from leaving while you catch up.

Can this be used to refactor a codebase?

Or is it meant to be used on just a single file?


Looks like this beta is single file, like a chat instance. They just added Github integration for enterprise, so that's probably on the horizon

The single file aspect of many of these answers is what grinds me as well. I mean, it's fantastic for a short script, a function/class template, or a crack at the syntax error... but it becomes a huuuuge pain even when it's something as simple as a .h for the file you're working on.

What's refactoring? Usually when we say refactoring we want to get from one state into another - like making code testable.

Do you want AI to do this for you? Do you trust that it will do a good job?


I've done a ton of refactoring, from Python to Node / Deno, and it's surprisingly good — but not perfect.

Having it create a testing suite definitely helps. But it makes fewer mistakes than I would normally make... it's not perfect but it IS way better than me.


I miss canv.as

Can't wait for more bullshit PRs to our projects! Thanks to AI, anyone can open a PR that gets instantly rejected.

Looks like you're missing an AI to auto-close the PRs for you.

Seriously though, I'm tired of the "helpful" GitHub bots closing issues after X days of inactivity. Can't wait for one powered by AI to decide it's not interested in your issue.


Jesus Christ. Even their example has "in... the fast-paced world of..."

Chatgpt is utter, utter shit at writing anything other than this drivel.


yawn Let me know when I can download it.

I appreciate the elegant product design and ability to refine to a greater degree but I have to ask myself …

We’ve compressed the world’s knowledge into a coherent system that can be queried for anything and reason on a basic level.

What do we need with content anymore? Honestly. Why generate this. It seems like a faux productivity cycle that does nothing but poorly visualize the singularity.

Why not work on truly revolutionary ways to visualize what makes this singularity so radically new? Embody it. Map its infinite coherence. Give it control in limited zones.

Truly find its new opportunities.


...woof. seems like we're already to the point where every openAI product launch is about half as interesting as the last one

Not available in my favorite IDE? Not even going to bother.

Happy that they still didn't kill my startup usesubtle.com

OpenAI is so far ahead of the competition. They're able to implement anything they like from competitors, and then some.

Claude really needs a sandbox to execute code.

If Anthropic were smart about it, they'd offer developers ("advanced users") containers that implement sandboxes, which we could pull to our local machines and connect to Claude so it can execute code on the user's machine (inside the containers). That would free up resources and mean fewer security concerns on their side. It would be up to us whether to wrap it in a VM, and if we're comfortable with it, we could even let it fetch things from the internet. They should open-source it, of course.

In the meantime Google still dabbles in their odd closed system, where you can't even download the complete history as a JSON file. Maybe Takeout allows this, but I wouldn't know. They don't understand that this is different from their other services, where they (used to) gatekeep all the gathered data.


This is an odd comment, because you mention Claude and Google, both of which already have similar/adjacent features. For a while. OpenAI is actually defensive/behind.

1. Claude has “artifacts” which are documents or interactive widgets that live next to a chat.

2. Claude also has the ability to run code and animated stuff in Artifacts already. It runs in a browser sandbox locally too.

3. Gemini/Google has a ton of similar features. For example, you can import/export Google Docs/Sheets/etc. in a Gemini chat. You can also open Gemini in a doc to have it manipulate the document.

4. Also, you can use Takeout; it's weird to criticize a feature as missing and then postulate it exists exactly where you'd expect.

If anything this is OpenAI being defensive because they realize that models are a feature not a product and chat isn’t everything. Google has the ability and the roadmap to stick Gemini into email clients, web searches, collaborative documents, IDEs, smartphone OS apis, browsers, smart home speakers, etc and Anthropic released “Artifacts” which has received a ton of praise for the awesome usability for this exact use case that OpenAI is targeting.


I mean the following:

`use matplotlib to generate an image with 3 bars of values 3, 6, 1`

followed by

`execute it`

https://chatgpt.com/share/66fefc66-13d8-800e-8428-815d9a07ae...

(apparently the shared link does not show the executed content, which was an image)

https://imgur.com/a/PkJCnKO

Which has interesting consequences, because I saw it self-execute code it generated for me and fix the errors contained in that code by itself two times until it gave me a working solution.
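
The snippet it generates and runs is roughly the obvious thing (my reconstruction, not its exact output):

  import matplotlib.pyplot as plt

  # Three bars with the requested values.
  values = [3, 6, 1]
  plt.bar(range(len(values)), values)
  plt.savefig("bars.png")  # the chat rendered the resulting image inline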

(Note that I am no longer a Plus user)

---

Claude: I apologize, but I don't have the ability to execute code or generate images directly. I'm an AI language model designed to provide information and assist with code writing, but I can't run programs or create actual files on a computer.

---

Gemini: Unfortunately, I cannot directly execute Python code within this text-based environment. However, I can guide you on how to execute it yourself.

---

> 4. Also you can use takeout

I just checked and wasn't able to takeout Gemini interactions. There are some irrelevant things like "start timer 5 minutes" which I triggered with my phone, absolutely unrelated to my Gemini chats. takeout.google.com has no Gemini section.




