Hey, GitHub – Waiting list signup (githubnext.com)
316 points by rcshubhadeep 83 days ago | 241 comments

This is the third such thread in the last 24 hours which consists of nothing but an elaborate waiting list signup. I've changed the titles to make that clear.

GitHub Blocks – waiting list signup - https://news.ycombinator.com/item?id=33537706 - Nov 2022 (41 comments)

GitHub code search – waiting list signup - https://news.ycombinator.com/item?id=33537614 - Nov 2022 (48 comments)

A good HN discussion needs more than a waiting list signup. A good time to have a thread would be when something is actually available.

When I talk to my Google Home, 50% of my brain power is engaged in predicting and working out how best to phrase something so that the "AI" understands what I mean, and the other 50% is used to actually think about what I want to accomplish in the first place. This is just about okay for things like switching lights on/off or requesting a nice song I want to listen to, but I could never be productive programming like this. When I'm in the zone I don't want to waste any mental capacity on compensating for an imperfect AI; I want to be thinking 100% about what I want to code and just let my fingers do the work.

For that reason I think this will be less appealing to developers than GitHub may think; otherwise, I think it's a cool idea.

I think the biggest use case for this is accessibility. There are plenty of people who permanently or temporarily cannot use a keyboard (and/or mouse). This will be great for those users.

For the average dev, I agree this is more of a novelty.

I am highly suspicious of new tech coming in the guise of 'accessibility'. As someone going blind, I find a lot of things touted as good for me cumbersome and bad.

Maybe this will be different, and that'd be neat. Though I just think more ways of expressing code are neat. I also know the accessibility you're talking about isn't for blindness.

That being said I can talk about code decently well, but if you've never heard code come out of text-to-speech, well, it's painful.

I bring up text-to-speech because if speech is the input, it would make sense for speech to also be the output. Selfishly, getting a lot of developers to spend time coding through voice might end up producing some novel and well-thought-out solutions.

For sight problems you are correct. But voice input is valuable by itself. I had chronic tendonitis in my wrists a few years ago. I looked into voice coding and it was difficult to set up. Fortunately for me I've been able to adapt with a vertical mouse and split keyboard.

You're looking at the product from your own point of view, and you're not the target group. It's that simple.

I do think there will be big advancements in the text-to-speech realm. I've noticed some ML projects imitating voices surprisingly well and while it's not quite there yet - it's already a bit less grating than it was even a few years ago.

“I think there is a world market for maybe five computers.” - Thomas Watson

I bet if we use our imaginations, we’ll think of a lot of places where using voice to code could come in handy.

Personally, I’ve been waiting for it for a few decades.

The creator of Tcl has RSI and has been using voice input since the late 1990s.


Thought we were really close 10 years ago when Tavis Rudd developed a system:


GitHub seems to be more high-level. It figures out the syntax and what you actually want to write.

This would help if you barely knew the language.

Time to learn Rust or Scala with a little help from machine learning.

> GitHub seems to be more high-level. It figures out the syntax and what you actually want to write.

To me, it looks like it's feeding your voice input to Copilot, which then generates the code output just as before. So the same strengths and weaknesses of Copilot apply (and you can probably mimic it locally with a voice input method you control: just dictate comments for Copilot).
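A minimal sketch of the "dictate comments for Copilot" idea, assuming the speech has already been transcribed by some local engine (the transcript is just a string argument here, not a call to any real speech API). The function name `insert_dictated_comment` is hypothetical; it only turns the transcript into a wrapped, correctly indented comment at the cursor, which a completion tool such as Copilot would then see as its prompt.

```python
import textwrap

def insert_dictated_comment(lines, cursor, transcript, prefix="# ", width=72):
    """Return a copy of `lines` with `transcript` inserted as comment lines
    just before index `cursor`, indented like the line currently there.

    Hypothetical helper: `transcript` is assumed to come from a
    speech-to-text step that is not shown here."""
    target = lines[cursor] if cursor < len(lines) else ""
    indent = target[: len(target) - len(target.lstrip())]
    # Leave room for indentation and the comment marker on each line.
    body_width = max(width - len(indent) - len(prefix), 20)
    comment = [indent + prefix + chunk
               for chunk in textwrap.wrap(transcript.strip(), width=body_width)]
    return lines[:cursor] + comment + lines[cursor:]
```

For example, inserting "load the titanic csv and drop duplicate rows" at index 1 of `["def main():", "    pass"]` yields a `    # load the titanic csv ...` line inside the function body, ready for the completion engine to expand.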

> “I think there is a world market for maybe five computers.” - Thomas Watson

This statement probably was never made. The closest thing to it came 10 years after the quote is usually supposed to have happened, and it was about a single model of a single machine: https://geekhistory.com/content/urban-legend-i-think-there-w...

As a new dad, I would love to have the voice-to-text accuracy and speed I get on my Pixel phone on my desktop OS. If it were done right, I could easily see myself using it more often than just when I have my youngling in one arm, since I've been WFH for the better part of the last six years.

This looks to be much more heavily using GPT-3/Codex/Copilot, which I've found to be eerily effective. It basically feels like a voice interface to Copilot. The main difference between these and something like Google Home is how effectively they pick up on context. "Hey, GitHub" would be able to use all the code in the file as context, so when you say "wrap this in a function", it'll have an idea of what you mean, without that function having to be explicitly programmed. Voice assistants have to _always_ be in a voice space, so context is very limited. And generally, Google Home-style voice assistants are built by programming specific actions linked to specific phrases. ML helps make the phrase matching flexible, but the action is usually entirely explicitly coded. Using Codex would let the action be ML-influenced as well.
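To make that contrast concrete, here is a toy version of the classic assistant design described above: every action is an explicitly coded handler, and only the phrase matching is fuzzy. Plain `difflib` string similarity stands in for the ML matcher; the phrases, handlers, and cutoff are all made up for illustration.

```python
import difflib

# Each supported phrase is wired to an explicitly coded action (hypothetical).
def lights_on():
    return "lights: on"

def lights_off():
    return "lights: off"

def play_music():
    return "music: playing"

COMMANDS = {
    "turn on the lights": lights_on,
    "turn off the lights": lights_off,
    "play some music": play_music,
}

def handle(utterance, cutoff=0.6):
    """Fuzzy-match the utterance to a known phrase and run its handler.

    Returns None when no phrase is similar enough: the assistant cannot
    do anything that was not explicitly programmed."""
    matches = difflib.get_close_matches(utterance.lower(), COMMANDS, n=1, cutoff=cutoff)
    return COMMANDS[matches[0]]() if matches else None
```

The matching tolerates rephrasing ("please turn on the lights" still maps to `lights_on`), but an unanticipated request such as "what's the weather" simply returns None, which is the limitation a code-generating model sidesteps.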

If Copilot is any indicator of effectiveness, then I have high hopes for this! I've always wanted to program while stationary biking :)

I think, yes, this could be a real multiplier for senior developers. You're doing something you've done many times before, just a bit differently, and you know pretty much everything you need to do, so you describe it until it's in a state where you can go through and finish it off. Exactly the stationary-bike or out-in-the-garden-with-your-kid type of thing.

IF the voice analysis is any good, of course. But maybe it can do better than typical voice analysis because the syntax is limited: when programming I use a much more limited vocabulary than when writing literary criticism. So while speech-to-text is total crap at handling complex literary phrasing, it might be adequate for programming structures.

I’m a senior/systems architect coming down with bad carpal tunnel, and this sounds like a godsend.

Around 1998 I broke my collarbone and had to use Dragon Dictate.

I found that for general subjects it was quite difficult to use because of the fairly poor recognition rate.

But when I talked about computers, it got almost everything right. I assumed it must have been trained by the developers, who talked about computers mostly.

This is another special purpose vocabulary, so it seems as if it would have a good chance of a high recognition rate.

It’s most likely just Cortana bolted on to Copilot.

GithubNext here! Just clarifying that this is not the case.

Then is it Whisper feeding into some kind of command tree?

I don't use voice assistants any more due to privacy concerns, but I wrote some similar software in the 2010s. I'm fluent in English, but with the current tech, my success rate giving commands to a machine is still 50/50.

> I could never be productive programming like this.

It's likely to work much better than a generic speech-to-text model due to fine-tuning.

Plus, consciously or not, we will adapt our human language to an English-ML "pidgin" (e.g. by introducing more efficient grammatical structures or using a specific subset of vocabulary).

The way I see it, it's not much different from giving commands to your dog, writing a Google query, or writing a Stable Diffusion prompt. It'll get better. Manual input is not as fast as speech, though, and that's where I see the issue.

I am happy to take a severe deficit over not being able to work at all. When my back was acting up, I could not physically use my left side. Dictation was the ONLY way I could code. By the end of this period, my output was back up to 95% of my typed output - especially as I don’t type code nearly as fast as I do general language writing.

GitHubNext here! We would love to hear more about your experience. Please help us out by signing up for this experiment :)

The voice interface experience (in general) so far is like trying to make a really stupid person do something for you. Out-of-context misunderstandings are the worst, because they break your flow while you try to understand why they happened and how to fix them.

I imagine that voice-to-code would be like standing over the shoulder of a junior coder who knows the syntax and just enough technique to follow orders, but has no idea what they're doing, and when they get it wrong, they get it very wrong.

"Writing is thinking. To write well is to think clearly. That’s why it’s so hard." ~David McCullough

This holds not only for literature but also for programming. As for the hard part, I would argue that is why it is not called "talking is thinking".

"If you're thinking without writing, you only think you are thinking." -Leslie Lamport

Even though speech recognition rates are now really high, I wonder how many authors use speech to write articles. The comparison may make sense, and I think there are few.

I think there's a difference between communicating your intent to a machine, which is hopeless since it has no model of intention; and commanding a machine to reproduce something.

I.e., when you're managing your house, you want something that can be communicated in an infinite number of ways, but the "AI" accepts only a tiny, finite number of them.

However, when programming it seems like we aren't asking the machine to "write a function to do X", but rather saying, "def open-paren star args...."

This seems like a pretty trivial problem to solve.
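For the literal-dictation style quoted above, the core really is a lookup from spoken names to symbols. A minimal sketch, assuming a made-up vocabulary (`open paren`, `star`, and so on) and a naive spacing rule (a space only between two consecutive plain words); real voice-coding systems use far richer grammars.

```python
# Hypothetical spoken vocabulary: phrase -> literal text to emit.
SYMBOLS = {
    "open paren": "(",
    "close paren": ")",
    "star": "*",
    "colon": ":",
    "comma": ", ",
    "equals": " = ",
    "dot": ".",
}

def dictate(spoken):
    """Turn literal dictation like 'def f open paren x close paren colon'
    into source text. Two-word symbol names are tried before one-word ones."""
    words = spoken.split()
    out = []
    prev_was_word = False
    i = 0
    while i < len(words):
        pair = " ".join(words[i:i + 2]) if i + 1 < len(words) else None
        if pair in SYMBOLS:
            out.append(SYMBOLS[pair])
            prev_was_word = False
            i += 2
        elif words[i] in SYMBOLS:
            out.append(SYMBOLS[words[i]])
            prev_was_word = False
            i += 1
        else:
            # Plain words are identifiers/keywords; separate consecutive ones.
            out.append((" " if prev_was_word else "") + words[i])
            prev_was_word = True
            i += 1
    return "".join(out)
```

With this table, "def add open paren a comma b close paren colon" becomes `def add(a, b):`. The hard part in practice is everything this ignores: recognition errors, homophones, and editing existing text.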

> However, when programming it seems like we aren't asking the machine to "write a function to do X", but rather saying, "def open-paren star args...."

Click the link first and take a look at what is being showcased, because your comment is the exact opposite of what they demo when you visit the HN link.

You're right... So, yes, it will be largely useless (as shown) for actual programming.

But I suspect there'll be a subset of its features consistent with my comment that will be actually useful.

Programming, via Naur/Ryle, is always a kind of theory building. And unless you're basically copy/pasting, it's a novel theory of some area (, business process, etc.).

That's something where intentions aren't even really communicable as such, since the art of programming is sketching possible theories as a way of finding out what we ought to intend.

So this is another gimmick with maybe marginal improvements at the edges.

It's really useful for those who have challenges typing (arthritis, disabilities, etc.), but perhaps not the best for a general audience, as typing with autocomplete is faster.

For repetitive tasks like preparing a report in the demo, speaking is definitely faster than typing. It's quite impressive if your boss asks you to prepare one and the report is done in less than two minutes.

However, I too really doubt there are any good use cases beyond simple tasks, not to mention that everyone in the office would hear what you ask the AI to do. Oh my! How embarrassing would that be?

The assistants (Google, Alexa, Siri) are not great at NLP. Compare how you speak to them vs speaking to an LLM like GPT-3: there is a world of difference. The latter feels like speaking to a human, the former more like you're trying to get your voice commands into a state machine.

Blind people are already very productive using voice-to-code.

There may well be examples of this, but while the blind developers I have known (a small sample, I admit) typically use screen-reader technologies to navigate and read code, they use a keyboard to write and edit it.

I don't disagree with that. I just meant I don't think it's going to have mainstream appeal. A wheelchair also makes a disabled person super productive if the alternative is not being able to go anywhere at all, but that doesn't make wheelchairs super appealing to people with two healthy legs, if you see what I mean.

I think this is great for: a) people who are visually impaired or have issues with their hands/fingers; b) people who aren't programmers: if you could make it more Scratch-like, this would be an amazing tool for showing off the power of programming.

The mental load would decrease very quickly with practice.

They're creating a new job, "prompt engineer", to replace the engineers. This is 2022.

The rise of the... t...talking monkey? cognizes intensely

Very interesting - I was sort of expecting it to happen soon.

I have been playing with using Whisper + GitHub Copilot in Vim [0]. The Whisper text transcription runs offline using a custom C/C++ inference implementation, and I use Copilot through the copilot.nvim plugin for Neovim. The results were very satisfying.

Edit: And just in case there is interest in this, the code is available [1]. It would be awesome if someone helped wrap this functionality in a proper Vim plugin.

[0] https://youtu.be/3flN9kTcZJY

[1] https://github.com/ggerganov/whisper.cpp/tree/master/example...

I feel like this is being misunderstood: the long-term view of this wouldn't be code scribing, it'd be letting non-technical people instantly create things. Imagine being able to say out loud to your phone "hey, create me a view of all the weather data from the past year correlated against x and y in z view format". The code is the means, not the product.

> "hey, create me a view of all the weather data from the past year correlated against x and y in z view format"

Ironically, I think only technical people would even want to do something like that. The less technical you get, the more high level (and ambitious) you need to go.

You can see this a lot in game dev questions. Beginner questions will be "How do I make an MMORPG?" and the more advanced questions will be "how do I return x from y" or whatever, and then it scales between the two ends of the spectrum.

There's plenty of technical people who don't code — like, in your example, a game designer.

Why would it be more successful than the countless past efforts to make that a reality? (It could well be, but why do you think so?)

In a simpler problem space, building programs from UML diagrams was one of Java's promises, and it failed miserably, not because the technology was lacking, but because it's just a damn hard problem.

As of now we have nothing approaching "non technical people to be able to instantly create things" if the "things" you want are useful in any way.

The big difference is that Copilot generates real, functioning code based on past implementations of real, functioning code. A no-code tool like Webflow is an entirely different product altogether. The ML transformers powering Copilot are the secret sauce here; that technology literally did not exist when Java and UML were all the rage.

I'd encourage anyone who hasn't tried it to give Copilot a try. There really has never been anything like it in my memory, and while I totally agree there have been dozens of efforts to allow non-technical people to generate code, I think Copilot may be on to something very special.

Copying functions and composing them into apps are two completely different things.

The problems are more than how to express.

You have to read and debug the code; that is inevitable. You can't claim there will be no bugs just because you use voice instead of typing.

I think it’ll be better because Copilot is pretty good for typing and this builds upon that framework

People who can think of weather in terms of plotting and correlating it are usually technical enough to code


The crazy thing is... this probably will work.

In 20 years. But, it probably will work.

There is absolutely no reason you cannot use a neural network to transcribe appropriately phrased requirements into an AST.
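Nothing in the comment specifies how such a transcription would work. As a much simpler stand-in for the neural network, here is a rule-based sketch that maps one rigidly phrased requirement onto Python source and parses it into an AST with the standard `ast` module, so malformed output is rejected at parse time. The phrase pattern and the operator words are assumptions made up for the example.

```python
import ast
import re

# Hypothetical spoken operator words and a single fixed phrase shape.
OPERATORS = {"plus": "+", "minus": "-", "times": "*", "over": "/"}
PATTERN = re.compile(r"define function (\w+) taking (.+?) returning (.+)")

def phrase_to_ast(phrase):
    """Translate e.g. 'define function add taking a and b returning a plus b'
    into an ast.Module. Raises ValueError if the phrase does not fit the
    grammar, SyntaxError if the generated source is not valid Python."""
    match = PATTERN.fullmatch(phrase.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised phrase: {phrase!r}")
    name, params, body = match.groups()
    args = ", ".join(p.strip() for p in params.split(" and "))
    expr = " ".join(OPERATORS.get(word, word) for word in body.split())
    source = f"def {name}({args}):\n    return {expr}\n"
    return ast.parse(source)
```

`ast.unparse(phrase_to_ast("define function add taking a and b returning a plus b"))` gives back `def add(a, b):` followed by `return a + b` (Python 3.9+ is needed for `ast.unparse`). A learned model would replace the regex, but the AST is a useful target either way because it can be validated.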

There are several reasons this wouldn't work:

1. How do you check the output of the voice to code step? If you need as much expertise as you do now to actually review the code, then the voice to code step is just a layer that adds confusion

2. How would debugging work? Again, would you still need to be able to understand the code? Same issue.

3. What if you have to pause and think? This will affect how the voice to code interface interprets your speech.

4. How would you make a precise edit to your source audio using a voice interface?

5. How would you make changes which touch multiple components across the project? How would you coordinate this?

6. Precisely defining interfaces between components and using correct references to specific symbols is very difficult to do in natural speech, which typically uses context to resolve ambiguous references. The language you would be using would still have to resemble the strictness of a programming language even when spoken, but you would have replaced a reliable, checkable channel (input through keyboard, transfer as-is to text buffer, feedback from visual view of source) with an unreliable channel (input through microphone, transfer through complex signal processing and multiple neural network language models, through multiple representations, where you have to be able to check multiple representations for feedback about the structure of your program (initial speech-to-text step, text to source)).
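Points 1 and 6 can be partly checked by machine. A hedged sketch of one narrow check, assuming the generated code is Python: walk its AST and list names that are read but never bound in the snippet or in a supplied set of known symbols. It catches misheard identifiers, not wrong logic, so it does not remove the need for review.

```python
import ast
import builtins

def undefined_names(source, known=()):
    """Return the sorted names that `source` reads but never binds.

    Deliberately simple: it ignores scoping rules, so a name bound anywhere
    in the snippet counts as defined everywhere."""
    tree = ast.parse(source)
    bound = set(known) | set(dir(builtins))
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            # Load = the name is read; Store/Del = the name is bound.
            if isinstance(node.ctx, ast.Load):
                used.add(node.id)
            else:
                bound.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # `import os.path` binds the top-level name `os`.
                bound.add((alias.asname or alias.name).split(".")[0])
    return sorted(used - bound)
```

On `def total(items): return sum(item.price for item in itmes)` this reports `['itmes']`, the kind of slip a speech-to-code pipeline could plausibly introduce.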

Voice to code into Scratch. The novice voice coder then debugs an extremely easy-to-debug, Scratch-like application.

Voice to code into Scratch, plus 20 years of hard product development effort.

I'm not saying it'll be easy or work the first time. The Apple Newton failed miserably the first time.

I'm saying touch screens still worked out amazingly well two decades later.

If we can say anything about programming today, it's that programming today with 20 years of advancements is remarkably different to 20 years ago.

I fully expect the next 20 years to be similarly fruitful.

It may work, since it may be just a new "programming language" (somewhat literally), i.e. a new level of high-level abstraction. We already know examples of such transitions to higher abstraction levels: binary code -> assembly languages -> c/lisp/fortran/etc -> c++/javascript/go/python/r -> np/torch/react/whatever frameworks/libraries. For an average programmer nowadays, knowledge of frameworks/libraries is as important as (if not more important than) actual knowledge of the programming language they use. The only disadvantage is that people will need to adapt to something generated and updated via machine learning. So far there aren't many examples of that, except maybe people adapting to Tesla Autopilot with every new release. Before, we adapted to a new c++/python/framework version; in the future there will be GitHubNext v1, v2 and v3 with known features and bugs.

The only problem with this being the next abstraction level is that it actually leads to more "coding", because general spoken language is less information-dense than any programming language.

Previously, each step from binary to assembly to higher-level languages to frameworks/libraries generally reduced the amount of "code" being written; with voice programming it seems to be the opposite.

Before Copilot, we were far from it.

But now, given how magical this thing is, it opens many doors to what's possible with no-code.

I never really believed in any no-code approach before, apart from Excel and RAD.

But basic tasks are going to become accessible to a lot of people sooner than I expected.

I'm almost certain a trained professional is still going to do a far better job than an amateur.

That part is unlikely to change much, if at all.

Trained professionals will always outperform the latest fellow dragged off the street.

That doesn't matter much.

You free part of the industry from needing professionals for some simple tasks, which is enough to empower users with a lot of possibilities, and focus pros on where they are really needed.

"Hey phone, next time Mum sends me a text about voting, can you send back an 'ok boomer'?" or "hey phone, can you set up a webpage that lists my TikTok videos up to last year?"

Most people are not going to hire a pro for that, but we could end up with an AI general enough to do that for users. It's OK if the result is not extensible, maintainable or modular.

From memory, there was a time (end of the millennium?) when using voice recognition to write documents was the next big thing. There was a pricey bit of software for Windows that was popular with power users, and they would spend hours training it to their voice.

Then it seemed to just die off. I don't think it was bad technology, because I don't think novelty value was enough to account for its popularity - you had to put hours in to get it to work well, it wasn't a casual toy.

What's changed since then in terms of technology? Unless it's very significant, I suspect it will go the same way. Apart from an assistive technology viewpoint, my gut instinct is that it's not that satisfying or rewarding talking to a computer all day.

Ah yes, Dragon NaturallySpeaking. Training for hours and hours and getting incredibly subpar results. It was a fun toy but there's a reason it didn't really take off in corporate settings.

Dragon NaturallySpeaking is still alive at least in medical practice. Its Nuance Dragon Medical One product is fairly popular in some regions for medical report dictation as radiologists don't like to write them down (sorry ;) ). I've seen a Philips product in that field too. Seems like LG's TVs used their recognition engine for a while (don't know if that's still valid)

The story of the most mainstream-popular dictation software is kind of funny. Back in the late 90's there was Dragon NaturallySpeaking and IBM's ViaVoice. In the early 00s, after a financial fraud and bankruptcy involving both the then-current Dragon owners and Goldman Sachs, Dragon was bought by ScanSoft. ScanSoft bought Nuance, began to use its name, and then got exclusive rights to ViaVoice (!) from IBM.

Now, in March this year, Nuance was acquired by Microsoft.

Interesting, I didn't know the history there! Just remember playing with it as a kid :)

The best recognition rate was on the words ‘scratch that’, which you used every other sentence.

> What's changed since then in terms of technology? Unless it's very significant, I suspect it will go the same way. Apart from an assistive technology viewpoint, my gut instinct is that it's not that satisfying or rewarding talking to a computer all day.

Training data is now abundant compared to twenty years ago, and so is computation power. That means training can be much more complex now.

The underlying technology is now typically neural networks (broadly speaking), whereas twenty years ago it might have been Hidden Markov Models.

Overall, recognition quality, even without speaker-specific training, is now on a very different level than back then. Whether it’s considered good is a matter of opinion. But it’s significantly better than twenty years ago.

WFH has become more common. You can't program by voice in an office full of people who are also programming by voice. At home you can multitask, working by voice while doing household stuff with your hands. You just need a projector to display on the wall while you cook or assemble an IKEA table.

I cannot even concentrate on reading a text while the radio is playing; let alone programming while assembling furniture.

> What's changed since then in terms of technology? Unless it's very significant, I suspect it will go the same way. Apart from an assistive technology viewpoint, my gut instinct is that it's not that satisfying or rewarding talking to a computer all day.

It's hugely significant - look at this graph of Google's speech model accuracy across 2013 to 2017:


Or this that shows a similar pattern:


Unfortunately 95% isn’t a lot of nines

Human transcription accuracy is just two nines :)
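The "nines" shorthand is just the negative base-10 logarithm of the error rate. A quick sketch of that arithmetic, plus what a per-word accuracy implies for a whole spoken command, under the simplifying (and optimistic) assumption that word errors are independent:

```python
import math

def nines(accuracy):
    """Number of nines: 0.9 -> 1, 0.99 -> 2, 0.999 -> 3."""
    return -math.log10(1.0 - accuracy)

def command_success(word_accuracy, words):
    """Chance that every word of a command is recognised correctly,
    assuming per-word errors are independent."""
    return word_accuracy ** words

print(f"{nines(0.95):.2f}")                # 1.30
print(f"{command_success(0.95, 10):.2f}")  # 0.60
```

So 95% is about 1.3 nines, and a recogniser with 95% word accuracy would get a ten-word command entirely right only about 60% of the time.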

One thing that changed, which you can see in the demo, is that speech recognition is paired with a neural network. He doesn't have to say "write titanic = titanic.drop_duplicates()"; he just says "drop duplicates from titanic", so "in theory" you have to write less. It probably falls apart when you have more complex things to write, and then you have to fall back to speaking it out word for word, but it is an interesting development.
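The product presumably relies on a language model for this step (the thread suggests Codex/Copilot), so the following is not how it works. It is a toy sketch with hand-written regex templates, only meant to show the gap between a short spoken intent and the code it stands for. The phrases are assumptions; the emitted strings use real pandas method names (pandas spells it `drop_duplicates`), but nothing here is executed.

```python
import re

# Hypothetical spoken templates -> code templates (pandas method names).
TEMPLATES = [
    (re.compile(r"drop duplicates from (\w+)"), "{0} = {0}.drop_duplicates()"),
    (re.compile(r"drop missing values from (\w+)"), "{0} = {0}.dropna()"),
    (re.compile(r"show the first (\d+) rows of (\w+)"), "{1}.head({0})"),
]

def intent_to_code(spoken):
    """Return generated code for the first matching template, else None."""
    text = spoken.strip().lower()
    for pattern, template in TEMPLATES:
        match = pattern.fullmatch(text)
        if match:
            return template.format(*match.groups())
    return None
```

The "falls apart" point shows up immediately: anything outside the template list returns None, which is exactly where a learned model would have to take over.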

There is a reason we don't have many programming languages where you can say “drop duplicates from titanic”.

Programming languages need to be unambiguous, but voice interfaces to them don't have to be. With a voice interface, you can say something vague, see what it gets translated into, and either click Run or edit it.

what's the reason?

Natural language is ambiguous and doesn't represent nested structures well.
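A small illustration of the nesting problem: a flat spoken phrase such as "one plus two times three" carries no brackets, so every way of bracketing it is a valid reading. This sketch enumerates them (digits stand in for number words, and the operator words are assumptions); the number of readings grows as the Catalan numbers.

```python
OPS = {"plus": "+", "minus": "-", "times": "*"}

def readings(spoken):
    """All fully bracketed expressions for a phrase like '1 plus 2 times 3'."""
    words = spoken.split()
    operands, ops = words[0::2], [OPS[w] for w in words[1::2]]

    def bracket(i, j):
        # Every bracketing of operands[i..j], splitting at each operator k.
        if i == j:
            return [operands[i]]
        return [
            f"({left} {ops[k]} {right})"
            for k in range(i, j)
            for left in bracket(i, k)
            for right in bracket(k + 1, j)
        ]

    return bracket(0, len(operands) - 1)

print(readings("1 plus 2 times 3"))
# ['(1 + (2 * 3))', '((1 + 2) * 3)']
```

The two readings evaluate to 7 and 9; with three operators there are already five. A programming language removes the ambiguity with precedence rules and explicit brackets, which speech doesn't naturally provide.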

Main things that have changed are:

1) Improvements in speech to text, as others have mentioned

2) Improvements in language models (and model size) allowing for more flexible interpretation of speech. This isn't dictation anymore. It's more like instruction. You don't have to tell the computer exactly what to write; you tell it in much broader terms. E.g. "pull this out into a function". Or "delete the cookie before creating the transaction". Or "lint the file".

That's my guess anyways! This mostly feels like a voice interface to Copilot in a lot of ways. Can't say whether it'll be effective, but I'd love to be able to program while I'm e.g. on a stationary bike!

Isn't that a very niche application simply because it's voice-based? I.e. it can only be used if you are alone in an office, otherwise you would be annoying your coworkers and the voices would get mixed up.

The only time I’ve ever successfully used voice recognition was teaching Skyrim to recognize my words of power. Shouting fus-roh-dah! was incredibly satisfying.

It might have been Dragon NaturallySpeaking. I remember toying around with it 20 years ago or so. Apparently it has just been bought by Microsoft.

I have good memories spending 4+ hours training Dragon to end up with what seemed like 30% accuracy.

I sometimes use voice recognition in Notes on my Mac to write up my meeting notes, but I find that my waffly speech results in very verbose notes. There are also still quite a few mishearings, where I read my notes back later and have to work out what I was actually saying.

Can I have transcription that can then turn my rambling into neat and concise prose?
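Only the mechanical part of that is easy. A minimal sketch, assuming dictation arrives as a list of utterances: drop a few filler words and treat "scratch that" (the Dragon-style command mentioned above) as deleting the previous utterance. The filler list is an assumption; actually condensing rambling into concise prose would need a language model.

```python
import re

# Hypothetical filler list; each match also swallows a trailing comma/space.
FILLERS = re.compile(r"\b(?:um+|uh+|erm|you know)\b,?\s*", re.IGNORECASE)

def clean_dictation(utterances):
    """Join utterances into text, honouring 'scratch that' and dropping fillers."""
    kept = []
    for utterance in utterances:
        if utterance.strip().lower() == "scratch that":
            if kept:
                kept.pop()  # undo the previous utterance
            continue
        text = FILLERS.sub("", utterance).strip()
        text = re.sub(r"\s{2,}", " ", text)  # collapse gaps left by removals
        if text:
            kept.append(text)
    return " ".join(kept)
```

For instance, `["Um, we agreed to ship on Friday.", "Actually Thursday.", "scratch that", "Uh, Alice owns the release notes."]` becomes "we agreed to ship on Friday. Alice owns the release notes."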

Many people dictate messages on their phones. Doctors use it extensively.

Not a doctor, but I've also started using the dictation feature on my iPhone more and more recently. It's often more convenient than typing when I'm walking (which I do a lot), and the rise of voice messaging, talking into your phone like it's a mic, has made me more comfortable doing it in public.

Dragon speech? I used it quite a bit 10 years ago!

If this works well, I would pay a seriously high amount of money. My daily coding time is currently limited by the pain in my hand/fingers that eventually becomes too uncomfortable, and I have to wait for a "cooldown" period of days to "reset" my hands back to normal. I can't even code on a normal keyboard or trackpad for a long time anymore.

The problem with current voice programming systems is that they're just too slow, so I end up getting impatient and using my fingers anyway.

I imagine you have done extensive research on your own, but in any case I found this article by Josh Comeau on coding with voice commands and eye tracking very interesting: https://www.joshwcomeau.com/blog/hands-free-coding/

I wonder if the Quest Pro could do the eye tracking as well. It has pretty extensive eye-tracking cameras, though I'm not sure how precise they are.

Could it also do a virtual keyboard, but a custom layout to not trigger arm, elbow and hand pains?

We built a VR prototype a year ago for voice to code using Codex in VR: https://www.youtube.com/watch?v=icHLoxOFerk

If you want to code with your voice, also check out https://github.com/cursorless-dev/cursorless

Have you tried an organ MIDI pedalboard and a script to translate MIDI to keystrokes? You could also put a microcontroller between the pedalboard and your computer so that it looks like a normal keyboard to the computer. I do not know whether that would be practical, but some sort of foot keyboard is in my 'what if' idea space.
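A hedged sketch of the translation step only, working on raw 3-byte MIDI channel messages (note-on has status 0x9n, note-off 0x8n, where n is the channel, and a note-on with velocity 0 also means note-off). The pedal-to-key mapping is made up. Reading the MIDI port and injecting keystrokes would need a MIDI library plus an OS-specific input API, or a microcontroller presenting itself as a USB keyboard; none of that is shown.

```python
# Hypothetical mapping from pedal notes (36 = C2) to key names.
PEDAL_KEYS = {36: "ctrl", 38: "shift", 40: "esc", 41: "enter"}

def midi_to_key_events(messages):
    """Translate 3-byte MIDI messages into ('down' | 'up', key) events.

    Non-note messages and unmapped notes are ignored."""
    events = []
    for status, note, velocity in messages:
        kind = status & 0xF0  # high nibble = message type, low nibble = channel
        key = PEDAL_KEYS.get(note)
        if key is None:
            continue
        if kind == 0x90 and velocity > 0:
            events.append(("down", key))
        elif kind == 0x80 or (kind == 0x90 and velocity == 0):
            events.append(("up", key))
    return events
```

Emitting separate down/up events, rather than single key presses, is what lets a pedal act as a held modifier such as Ctrl or Shift.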

GitHubNext here! We appreciate your support. Please consider helping us by signing up for the experiment on the website and providing feedback. :)

I'm assuming you have an ergonomic keyboard? If so which one?

And a movable monitor, decent chair and desk at the appropriate height...

An ergonomic keyboard layout like Workman doesn't hurt either. QWERTY was made with typewriter jamming in mind, sacrificing ergonomics completely, especially for English.

Ergodox EZ

The challenge that remains in speech coding is not so much generating code as navigating through existing code or an application.

There are only two ways to do this effectively, and unfortunately no one has taken the true path to accessibility. The more common way is plugins/extensions that grab information from the editor.

Accessibility is more than just one editor. It's the OS and all the applications. Microsoft needs to take the hard route to make an accessibility UI automation server to grab that information and only make up the difference through plugins as needed.

It's all about grabbing information from the application and generating commands on the fly, not just parsing free dictation, in order to get the best accuracy.

It takes a lot of expertise to make any sort of UI automation fast and efficient for navigating and selecting text or out-of-focus menu items.

I've fussed around and managed to get tree-sitter to navigate across code. For example, generic commands are things like 'next function'. Code written by others simply isn't pronounceable. Therefore, navigating across generic tokens is really the best method. Then other methods can be used for fine navigation if needed.

My hope is that they develop a grammar system that is open source and integrates with accessibility frameworks focused on performance.

I wish I could have a phone call with the development team.

I think accessibility à la Vim, or built on something like tree-sitter, would help immensely, e.g.:

  “Top of file
  Down 5 lines
  Modify import source to …
  inside first class
  Down 5 methods
  Insert new method after
  Inside arg list
  Append arg named … of type …
And add a way to identify types and parameters with special pronunciation.
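The example commands above read like a small, regular grammar. A minimal sketch of interpreting the purely positional ones against a line cursor; the phrasing follows the example list, but the parser, its names, and the clamping behaviour are assumptions, and the structural commands (classes, methods, arg lists) would need a real parser such as tree-sitter.

```python
import re

def run_commands(num_lines, commands, cursor=0):
    """Apply spoken cursor commands to a 0-based line cursor in a buffer
    of `num_lines` lines, clamping at the top and bottom."""
    last = max(num_lines - 1, 0)
    for command in commands:
        text = command.strip().lower()
        if text == "top of file":
            cursor = 0
        elif text == "bottom of file":
            cursor = last
        elif (m := re.fullmatch(r"(up|down) (\d+) lines?", text)):
            step = int(m.group(2))
            cursor += step if m.group(1) == "down" else -step
            cursor = min(max(cursor, 0), last)
        else:
            raise ValueError(f"unknown command: {command!r}")
    return cursor
```

Rejecting unknown commands outright, instead of guessing, keeps a misrecognition from silently moving the cursor somewhere unexpected.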

I recognize that not all applications are accessible through accessibility APIs. However, there is no high-level access to accessibility APIs. There are quite a few frameworks for automated UI testing; however, none of them are performant enough for speech-to-code or screen readers. Testing automation frameworks don't really require high performance.

Using accessibility APIs to access the application's content and context is what's important. It's more important than the speech recognition backend.

Speech recognition works best with a narrow context (when those commands are available).

The type of performance we need as a speech recognition community and screen reader community is quite high. By the time speech begins, and just before decoding, the information needs to be available to be parsed for navigation/editing. That way these tokens can be weighted as commands for recognition.
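One cheap approximation of weighting on-screen tokens, sketched in Python: collect the identifiers in the visible code and snap a heard word to the closest one. The identifier regex and the `difflib` similarity cutoff are assumptions; a real engine would bias the decoder itself rather than post-correct its output.

```python
import difflib
import re

def visible_identifiers(code):
    """Identifiers that appear in the visible code, in first-seen order."""
    return list(dict.fromkeys(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code)))

def snap_to_context(heard, code, cutoff=0.75):
    """Replace a misheard word with the closest on-screen identifier if one
    is similar enough; otherwise return the word unchanged."""
    match = difflib.get_close_matches(heard, visible_identifiers(code), n=1, cutoff=cutoff)
    return match[0] if match else heard
```

With `parse_config` and `load_yaml` on screen, a transcript of "parse_confg" or "load yaml" snaps to the real identifiers, while an unrelated word passes through untouched.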

Commands could be modeled after vim functionality though.

Outside of tree-sitter, it would be interesting to hook into a language server as a client. However, I think they only expect one client. In addition, I still see that as a lesser approach compared with dedicated support: a high-performance UI automation server for the speech recognition engine to leverage.

Yes, understanding the context of where the user is lets you minimize the number and specificity of commands needed for navigation, which saves the user time.

Imagine even more precise commands: 'next function' followed by a letter, which navigates only to a function whose name starts with that letter. Really, the possibilities are endless when we have complete context of the screen and the structure of the code.
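Here is a minimal sketch of both commands, 'next function' and 'next function' plus a letter. tree-sitter would make this work across languages. To keep the example self-contained, this version swaps in Python's standard `ast` module, so it only understands Python source. `next_function` and the sample file are hypothetical.

```python
import ast
from typing import Optional

def next_function(source: str, cursor_line: int,
                  letter: Optional[str] = None) -> Optional[int]:
    """Return the 1-based line of the next function definition below cursor_line.

    If `letter` is given, only functions whose name starts with it
    (case-insensitively) qualify, mimicking "next function <letter>".
    """
    starts = sorted(
        node.lineno
        for node in ast.walk(ast.parse(source))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and (letter is None or node.name.lower().startswith(letter.lower()))
    )
    return next((line for line in starts if line > cursor_line), None)

# Hypothetical file used for the usage examples.
SAMPLE = """\
def alpha():
    pass

class Greeter:
    def hello(self):
        pass

    def bye(self):
        pass

async def beta():
    pass
"""

print(next_function(SAMPLE, 1))       # 5  (hello)
print(next_function(SAMPLE, 1, "b"))  # 8  (bye)
```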

Someday I hope for the release of something like Stable Diffusion for voice coding: an open, complete pipeline that users can iterate on quickly and innovate with!

However weird and seemingly useless this might appear to the normal programmer on here, I see this as a huge accomplishment and an incredibly important tool. Why? Accessibility.

Let’s hope that I never get in a serious accident or get a disabling disease, but if I do, I am not planning on giving up coding. What would you do if you lost your hands or became permanently paralyzed? This is the tool we need to combat that. Hats off to GitHub on this one.

People who cannot see or use a keyboard already use tools like this to code. Been doing this for a long time.

Related: https://www.youtube.com/watch?v=MzJ0CytAsec

It does look like we've made some progress in the 15 years since. I do wonder how this would work in an office setting though - so much noise, so much distraction, and so much crosstalk between programmers...

> I do wonder how this would work in an office setting

Everyone gets a throat mic and the cubicle farm is full of unintelligible whispering instead of clacking of keyboards? Can't wait for the future. /s

Hahahah, thank you for posting this! I was about to go look for it, because I remember being in tears laughing when I saw it the first time, and I immediately think of it whenever I see voice-controlled things.

Having programmed and navigated my PC via voice exclusively for about 6 months, done a ton of research, and written several articles about it and about the options out there [0][1], I think this might be pretty groundbreaking stuff.

Inputting code with voice is generally difficult, often due to variable names, casing, punctuation etc being hard to get right in voice-to-text. I think this might help quite a lot with that.
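Casing is a common sticking point. Voice-coding setups typically handle it with "formatter" commands that turn a dictated phrase into an identifier in a chosen style. Here is a minimal sketch. The style names are hypothetical and are not the vocabulary of any particular tool.

```python
import re

def spoken_to_identifier(phrase: str, style: str = "snake") -> str:
    """Turn a dictated phrase such as 'get user name' into an identifier."""
    words = re.findall(r"[a-z0-9]+", phrase.lower())
    if not words:
        raise ValueError("nothing to convert")
    if style == "snake":
        return "_".join(words)
    if style == "camel":
        return words[0] + "".join(w.capitalize() for w in words[1:])
    if style == "pascal":
        return "".join(w.capitalize() for w in words)
    if style == "kebab":
        return "-".join(words)
    raise ValueError(f"unknown style: {style!r}")

print(spoken_to_identifier("get user name", "camel"))  # getUserName
```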

_However_, the hardest part of voice coding isn't necessarily the input. Navigating large codebases is hard, and editing existing code in particular can be extremely difficult, probably much more difficult than just inputting new code.

Based on the demonstration shown here, I have my doubts that it can make complex editing tasks simple, but if it does, I cannot overstate how huge a leap forward it is.

[0]: https://www.gustavwengel.dk/state-of-voice-coding-2017/ [1]: https://www.gustavwengel.dk/state-of-voice-coding-2019/

>Having programmed and navigated my PC via voice exclusively for about 6 months...

I'm curious, why have you done this?

I can't speak for his use case, but this applies to people with medical conditions like RSI, stroke, or anything else that limits their use of a keyboard and mouse.

However, the average developer doesn't need those fine-grained navigation controls, but can still benefit from enhanced input through voice. Some people have mental disabilities and interface differently. Others are average developers who simply supplement their input with voice as a preventative measure against repetitive strain injury (RSI). The hope is to someday develop something that every developer could see value in and leverage. In a way, accessibility is for everyone.

In general, I see accessibility as a hierarchy that could benefit everyone: accessibility APIs, close-to-real-time OCR, eye tracking, and alternative inputs (e.g. pedal, touch pad, stylus), allowing for the broadest possible input, plus APIs to extract information from applications. Extracting information from applications and sending input to them allows users to specialize for their use case.

In my experience, as people become experts in voice, their command vernacular shortens as they carve out their niche use case. It goes beyond single shortcuts to series of actions that get stuff done. However, what really needs to happen is for voice systems to get access to the OS and to the application; that is where they would really shine. That would enable not only navigation for those who are disabled but also intuitive, abstracted, context-specific commands like 'next function' or 'next parameter'.

I had very bad RSI

In the meantime, Talon is pretty good. You can use Vim motions and commands as you normally would, except using your voice (this applies to any editor, really):


Talon is exceptional; I only wish it were more natural to drive CLI commands. I find I need to spell them out, which I’m still quite slow at.

I think commenters here are, as usual, missing the point. This is the training ground (literally) for better models able to respond to commands like "take the CSV from my desktop, plot columns A and D and check if the KL divergence is close to zero", and from there to more complex tasks. You always need the first step, and this is it.

I'm bullish.


Copilot is getting better every day, because it's learning from the way we are using it.

GitHubNext here! We appreciate your support.

I've tried writing documentation and fiction using speech-to-text, and for me it doesn't work. Apparently the part of my brain I use to think about what I'm going to say is the same part I use to actually say it, so I can't do both things at once. I end up writing far more slowly than I can type.

In case anyone else stopped after watching the video: if you scroll down a bit further, you see the list of features:


Write/edit code

Just state your intent in natural language and let Hey, GitHub! do the heavy lifting of suggesting a code snippet. And if you don't like what was generated, ask for a change in plain English.

    Go to the next method

Code navigation

No more using mouse and arrow keys. Ask Hey, GitHub! to...

    go to line 34
    go to method X
    go to next block

Control the IDE

"Toggle zen mode", “run the program”, or use any other Visual Studio Code command.

Code summarization

Don’t know what a piece of code does? No problem! Ask Hey, GitHub! to explain lines 3-10 and get a summary of what the code does.

    Explain lines 3-10

All I could think of while looking at this was having to tell Siri where every comma and period should go while texting with it.

"insert curly brace", "insert semicolon", "insert insertion", etc. does not sound too fun.

My reactions to the demo (when all is good there is no reaction, so here are only the problematic ones, sorry)

1) import matplotlib.pyplot as plt

Why "as plt"?! Leave the import alone. But this is a matter of style.

2) Get titanic csv data from the web [...]

Surprise, it turns out that "the web" is a URL on raw.githubusercontent.com. Hopefully I'll be able to spell out a URL of my choice.

3) clean records from titanic data where age is null

Somehow I already know that there is an Age field, and somehow it knows that it must capitalize age into Age.

4) fill null values of column Fare with average column values

The generated code looks great but somehow I managed to spell a capitalized Fare this time :-) (this is probably a typo in the demo)

5) Hey, GitHub! New line

Inserting a new line can't take so many words. We're going to do without new lines or rely on a formatter or something equivalent.

6) plot line graph of age vs fare column

This is where it becomes evident that there was no need to import as plt because I'm not pressing those keys anyway. But this is style and it's going to be uniform across all the users of these tools.

7) Hey, GitHub! Run program



A) Why do commands (new line, run) need "Hey, GitHub!", which is pretty long and terrible to repeat all day long (just imagine having to say "Hey Joe" before every sentence within a long conversation with Joe), while text-to-code doesn't?

B) We got a graph at the end. Now what should I do to edit the code in those 99% of cases where I got the graph wrong? An acceptable answer could be mouse and keyboard. It's a little underwhelming but voice to code already gave me the structure of the code.

C) Does that mean that Microsoft and GitHub are going to know all the closed source code we'll write for our customers (there might be contractual implications) or is this something that will be self hosted in our machines?
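For readers who skip the video, steps 3) and 4) come down to two data-cleaning operations. The demo generates pandas code, which isn't reproduced here. This stdlib-only sketch shows the same logic on made-up sample rows; only the `Age` and `Fare` column names come from the demo.

```python
import csv
import io
from statistics import mean

# Made-up rows standing in for the Titanic CSV; empty cells are nulls.
SAMPLE_CSV = """Name,Age,Fare
Alice,29,7.25
Bob,,71.28
Carol,35,
Dan,54,51.86
"""

def clean_and_fill(text: str) -> list:
    rows = list(csv.DictReader(io.StringIO(text)))
    # Step 3: "clean records from titanic data where age is null"
    rows = [row for row in rows if row["Age"] != ""]
    # Step 4: "fill null values of column Fare with average column values"
    # (assumes at least one remaining row has a Fare)
    average = mean(float(row["Fare"]) for row in rows if row["Fare"] != "")
    for row in rows:
        row["Fare"] = float(row["Fare"]) if row["Fare"] != "" else average
    return rows

for row in clean_and_fill(SAMPLE_CSV):
    print(row["Name"], row["Age"], row["Fare"])
```

In pandas this is roughly `df = df.dropna(subset=["Age"])` followed by `df["Fare"] = df["Fare"].fillna(df["Fare"].mean())`. The order matters: the average fare is computed after the rows without an age have been dropped.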

GitHubNext here! Here is a little writeup that explains a bit more about the project https://github.com/githubnext/githubnext/tree/main/HeyGitHub

Hope this is helpful :)

To note, there's a class action lawsuit against GitHub Copilot, since it learns from a bunch of open source code with very specific licenses. It's very interesting from the perspective of establishing copyright in AI training. Hopefully it goes the distance and some nuanced arguments come out in the court case.


Spoken language is incredibly ambiguous. It's one thing to generate a drawing which can vary wildly in output and still be acceptable. It's another to specify something precisely to a computer. Working with non-programmers on a daily basis it is incredible how difficult it is to communicate even relatively simple things without confusion.

So all the more power to them, but I am very skeptical, especially since Copilot has zero knowledge of the formal semantics of programming languages.

This is a lot different from the half-assed autocomplete it already does, since that at least has some context.

It's the same with Copilot: you have to know how to implement things to implement them with Copilot (for the most part). But when you are a programmer and could write the code yourself, you know which prompt will generate 10+ lines of code from one comment, all day long. Especially for data transformation, Copilot has been a real magic tool. If you put in a comment:

      this function transforms this JSON from:

          ... some complex structure in json 
      to this json: 
          ... some different structure in json 
... Copilot comes up with the function that takes in the first and spits out the latter. Even if the field names do not match, it usually 'guesses' right what maps to what (so it does have some context from its learning phase about what 'looks alike' or 'might be the same thing'). Example: I had a structure with firstName: string, lastName: string and a target structure with name: string; it just did name: firstName+' '+lastName, which was indeed what I wanted. But it comes up with more intricate stuff as well that is pretty surprising (too human, basically).

Another bonus: if you generated the function transformAtoB(a: A) above, then you only have to write:

      do the reverse of function transformAtoB, accept JSON structure B as input and return structure A
And it'll come up with the reverse.
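The commenter works in TypeScript. Here is the same firstName/lastName mapping sketched in Python, with hypothetical type and function names. It also shows why the reverse deserves the tests the commenter runs: joining two fields is easy, but splitting them back apart is a guess.

```python
from typing import TypedDict

class PersonA(TypedDict):   # hypothetical source shape
    firstName: str
    lastName: str

class PersonB(TypedDict):   # hypothetical target shape
    name: str

def transform_a_to_b(a: PersonA) -> PersonB:
    # The mapping described above: name = firstName + ' ' + lastName
    return {"name": a["firstName"] + " " + a["lastName"]}

def transform_b_to_a(b: PersonB) -> PersonA:
    # The reverse has to guess: split at the first space, so
    # "Mary Ann Smith" becomes firstName="Mary", lastName="Ann Smith".
    first, _, last = b["name"].partition(" ")
    return {"firstName": first, "lastName": last}

person = {"firstName": "Ada", "lastName": "Lovelace"}
assert transform_b_to_a(transform_a_to_b(person)) == person
```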

It's not hard to write yourself, but it's boring and error-prone (some of these structures are huge). Now I press Tab a bunch of times and run the tests to see if it worked. I'm also not that worried about infringing someone's open source code; this is all way too custom to look like anything else. That's where this shines: for anything it copies verbatim, you should've been using a library anyway.

Static typing with TypeScript definitely works better than the other combinations I have tried (C# was pretty bad last I tried it; JS is good but often subtly wrong because of type issues).

With Copilot, ambiguous language gets transformed into concrete syntax. If the implementation doesn't fit your ambiguous request, you should be able to refine it … with ambiguous language. Theoretically this would create a "programming dialog" environment.

So you are going to have to verbalize something, interpret the code, build a mental model of how it works, and then if it does not match what you want go back to step 1?

That sounds exhausting when we have spent countless human years developing languages which let us communicate our intentions precisely to computers.

If you don't do this, there is no ambiguity detector. It's entirely possible for the computer to interpret what you are saying completely differently than intended, yet still arrive at a perfectly valid interpretation. So the only one who can judge whether it got it right is you.

I probably wouldn’t use this to write code, but I could see it being really useful for navigating around a project.

“Go to line 35” “Open the model controller” “Show the get method and set method side by side”

If you remember the keyboard shortcuts, you can be quite fast while working with VS Code. Voice will never be perfect.

Oh cool, a few years back my brother used to wish out loud that something like this existed, when his wrists were really killing him. His wrists were so far gone that he couldn't type on an ergonomic keyboard for any length of time, so he used to wish he could just talk instead.

As for me, I got an ergonomic keyboard before my wrists went bad, and so far they seem to be holding up!

Moral of the story: get a good keyboard early, or you might need a tool like this one someday!

Hey Github what did the previous developer actually _mean_ with this piece of legacy code?

Eye strain is one reason I have been waiting for something like this. If I could close my eyes and just navigate the codebase through a mental model and some voice commands, I really wouldn't mind paying!

I have looked at some tools for the blind, but you need just way too much dedication for it to work for you and since you have working eyes it is usually easier to just open your eyes.

There was an excellent talk at Strange Loop a few years ago by Emily Shea about how she'd learnt to code in Vim using her voice to combat RSI.


The demos are in Ruby, but I could imagine that languages with strong type-aware auto-completion could be easier to do.

This is effectively a new higher level programming language without a fixed syntax. Describing more the "what", not "how", and being much closer to natural language over computer language.

The voice part seems like an (albeit important) accessibility add on.

I'm sure it won't be perfect, but it's an amazing step forward in the evolution of programming languages.

I could be wrong, but I think (minus editor commands) much of this can be emulated in existing Copilot by writing a comment symbol followed by natural language. I wouldn't even be surprised if under the hood "Hey, GitHub!" is basically doing exactly that with the voice input.

I’m working on something similar. The target market is the 99% of people who want to solve ad-hoc, domain-specific problems with programming. For example, generating charts w/o having to dig through all the data sources (Wolfram Alpha does a simple version of this). Or building a financial risk model for a client’s specific request (today you have to be a whiz at Excel, Python or some internal IDE). Even home automation: my mom can’t use Alexa’s awful app to customize routines.

I don’t think the voice part is necessary. It’s easy enough to slap ASR on the front. But going from natural language -> full problem spec -> code is hard in the general case, but doable in well-understood domains. Why can’t Scotty talk to a computer? (https://youtube.com/watch?v=hShY6xZWVGE&feature=share)

Imagine sitting there, talking to your computer, and trying to get the notations right.

If err unequal nil opening bracket, no no don't open the racket opening bracket... BRACKET, do you know what a bracket is No don't do a do while, delete delete. Don't delete everything... sigh

Well something like that, I imagine it being a very painful experience.

Maybe you could use little clicks and pops with your mouth to signify different characters. The computer could learn which one is which. That way instead of typing

"if (int i = 0; i < count; i++)"

you could say something like

"if beep int i click zero boop i bop count boop i pop pop zing"

This would achieve the same thing, but much faster and with less effort than typing.

If saying that is faster than typing it, you're really slow at typing.

Or debugging. Goodness. I can only imagine.

I remember a talk given some years ago by a man who was using voice to text for creating source code. The key point I remember from his talk & demonstration is that it was not casual ordinary speech but instead a very weird mashup of sounds intended to represent the various symbols which we use in source code.

I think you're talking about this video: https://youtu.be/8SkdfdXWYaI?t=1049

Yes! That's the talk.

GitHub is doing a whole lot. I think I prefer to edit my code in an editor, not on the website where it's hosted. And I think I don't want fancy AI driven code editor features using my code either. But I guess it is nice they are considering solutions for vision impaired users.

I really hope this is very easy to use. I have severe RSI and can barely surf the web. I tried using other voice to code stuff and it just hurt my voice so I'm hoping I can speak very naturally. I'm really looking forward to seeing if this can help me code again.

I could see it being useful for things like "goto line 42" or "rename this file as...", or very simple things like that, otherwise, I don't want the cognitive overhead of having to translate coding intent through a voice interpreter.

I think this, or a future version of this, would have real potential.

I'm thinking about this in terms of the navigator-pilot pair programming approach. As a senior, I believe that if it's even half as good as working with a fresh-out-of-uni hire, it could have real value. When there's a piece of code I would like written and I have good test cases in mind, but I'd prefer to offload it onto someone, I could perhaps write the test cases and function signatures (maybe with the bot's help), get the bot to fill in the blanks until it passes the tests, and then give it direct feedback on how to refactor the code.

I've signed up for the waiting list and am excited to try this out.

What you are describing is more akin to what GitHub Copilot already does. It is really good at taking a description and a function signature and producing a solution. Paired with a solid test suite it can definitely speed up development in my experience.

People can’t seriously believe this is going to be useful at all?

I can see this helping as an accessibility tool, but beyond that I don’t think it will be useful. This kind of assumes you know everything about what you’re doing, most of the time you don’t.

As someone who works remotely from home, the last thing I need is to start babbling to myself in code for 8 hours a day. I imagine that's a one way ticket to developing some sort of disorder.

Someone emailed me the other day to share their FOSS voice control system. I was really impressed. It seems to map syllables onto actions in a modal sense, à la vim. If I were to build a voice control system, it would look much like this.


It's free software, it's local to your machine, you don't have to sign up for it, and it works today.

Great for accessibility, but I don't see how this would work well in an open office, or even at home if other people are around. Seems really annoying.

Imagine using this in a setting where you're not alone in the room. Imagine using this surrounded by other developers who do the same.

Curious: in "Clean records from titanic data where age is null", how does it know that the age field is exactly `Age` and not just `age`? You cannot know this without examining the data set (the headers), so is the software inspecting the loaded CSV "on the fly" before we tell it to actually execute the code?
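The demo doesn't say how it resolves this. One plausible approach, purely an assumption, is to read just the header row and match the spoken name case-insensitively. It is equally possible that the model simply learned from training data that the widely used Titanic CSV has an `Age` column. Here is a sketch of the first idea, with a hypothetical `resolve_column` helper and a made-up header.

```python
import csv
import io
from typing import Optional

def resolve_column(spoken: str, csv_text: str) -> Optional[str]:
    """Map a spoken column name ('age') to the exact header spelling ('Age')."""
    header = next(csv.reader(io.StringIO(csv_text)))
    by_lower = {name.strip().lower(): name for name in header}
    return by_lower.get(spoken.strip().lower())

# Made-up header and row for illustration.
data = "PassengerId,Survived,Age,Fare\n1,0,22,7.25\n"
print(resolve_column("age", data))   # Age
print(resolve_column("fare", data))  # Fare
```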

Why are all the comments here so negative? Maybe typing is a hard sell, but some of the navigation stuff seems quite useful. Even being able to invoke VS Code's command palette would be really cool with this. Something like "Open Dockerfile" would be useful and maybe faster than typing.

My worst prediction ever was at the end of my book, when I struck a positive note about voice interfaces. The startup I was at in 2015 had the pitch "Let your sales people talk directly to Salesforce" and we pushed the limits of what we could do with NLP. That particular startup had spectacularly bad management and so it flamed out in a series of screaming, raging fights, which I documented here:


But at the end of the book I struck an upbeat note, about how the technology was advancing quickly and within 3 or 4 years someone would achieve something much greater than our own limited successes.

But I was wrong. 7 years later I'm surprised at how little progress there has been. I don't see any startup that's done much better than what we did in 2015. Voice interfaces remain limited in accuracy and use.

So this is a frontend of Copilot. The example of "import pandas" getting translated into "import pandas as pd" is pretty convincing, as the tool helps developers to state their intentions. On the other hand, "hey, github, a new line" kills me.

We have come a long way. I remember when announcements like this one were done by companies on April 1st!

If translation is semantic and not literally identical, chances are that the user asks for a piece of code and it outputs something that is 100% identical to code that is copyrighted elsewhere. Big "blame the AI" legal loophole waiting to happen?

Actually would be useful.

If this is reliable I would pay to use it to some capacity, like add an argument.

I spent half an hour today trying to convince the O2 voice agent to get me a real person. Conversational AI is a special kind of hell filled with unhappy paths.

But for a glimpse of the future watch The Expanse or read William Gibson's Agency.

Execution is everything with this. I've wanted something like this so I could actually code while performing other activities or in various states of intoxication. Don't code and drive. Don't drink and code.

I hope to see the click consonant "‖" adopted as "||" one day.

Let's try to picture the noise in an open space full of people using that... Focusing is going to be difficult, at least for people like me who are easily distracted by background noise and conversations.

What if the code in question is a DSL? Say, something that is syntactically Python, but with a namespace defined through a narrow set of imports. This would be interesting to explore for end-user scripting.

Nice attempt and interesting workflow using a prompt based transformer. I would prefer being able to spawn a command palette and skip over the voice, alongside having the choice between different variations.

Imagine an office where everyone is sitting screaming at their computer.

Programming Perl with speech recognition (an oldie but goodie)


This is awesome. I could see using this to write code on my phone even.

Interesting. I would find this annoying because it's so different from what I'm used to, but the potential it has for people with disabilities is huge.

This is not going to play well with open-space offices.

It is not practical if we have to describe each and every line.

Also, imagine you are sitting in an office with other team mates - what happens if all of them talk together but are working on different projects. It will disturb others in terms of noise pollution.

but it will definitely be a fun project and might work perfectly when you are working alone from home.

Those who say it's useless, what do you think about blind people using this, or those who couldn't type?

Why does the OAuth scope require permission to “operate on your behalf” when the app is “not owned or operated by GitHub”?


One concern is in office space, saying things aloud is ... awkward to say the least.

I'm sure this will work well with my Scottish accent... (or any non-US accent)

Next is thoughts to code. Just read my mind; I'm gonna sit there and think.

Song to code, we shall be singing our next systems:)

Dance to code. Transform the esthetics of your movements into … whatever your boss requires.

I'd like to see how they do with my creative variable and function names.

    bool success equals user dot no i mean ah fuck stop stop quit

Thank god we're remote. An open office space with this would suck.

So software development houses will become call centers.

export const ButtonComponent; FunctionComponent no Github no semicolon i meant colon Github backspace 5 times no backspace delete delete Github Arrrgh goddammit

What you're describing is more like dictation. What you'd probably say is "export the button component", and it would determine the syntax.

Which will probably, outside of small, perfectly planned experiments, work similarly well

You're free to speculate, but I would respectfully disagree. Extrapolating from how effective GPT3/Copilot are at interpreting text and generating code, I think it has more of a chance of working well than any other tech has ever had. I definitely hope it succeeds, anyhow!

And you thought open plan offices were bad already!

I have RSI. GitHub, please make it work well.

I imagine happiness in the open space

How does it do with SQL?

I do not want this.

Copilot -> Pilot

Add some comments

No. Thanks.

Very interesting!


Drop the "Hey Github" nonsense (hopefully it's only for illustration purposes anyways) and … this will be a generational paradigm change in how to write code… if it works. The hard part will be editing code with your voice too. Like "no, I meant …" etc.

VERY PROMISING, in any case you can just manually fill the gaps with the keyboard!

Generational? Idk. I work for a company that regularly sends out surveys, and there are several tools to integrate voice into it. Willingness to speak instead of type is quite low across respondents (which is a representative population sample). It looks as if speaking to a machine does not hold the same appeal as speaking to a human (something that can also be seen in telephone queue screener questions).

I hate talking to machines. Sometimes it’s the best option (I love using a voice assistant in the kitchen), but almost always I’d have a full keyboard as an interface instead.

If machines were amazing at Speech-to-Text, okay, sure. But while the capabilities are impressive, they still kinda suck at it.

The only thing I use voice control for is creating reminders on my iPhone, and the only reason I use it for that is that the default Reminders app's UX is so bad that it's quicker to use voice commands.

I don't see how text to code would be faster than typing. And even if it is, typing speed is not really a limiting factor in the speed at which I can produce code.

Speech to text is now amazing: https://huggingface.co/spaces/openai/whisper

It got "Theresa's and Aidsdorm" for "Turisas and Alestorm". Surprisingly, it got pretty close with the German band Schandmaul (something Alexa recognizes 100% of the time as Sean Paul), transcribing to "Schandmauel or Schandmöhrl".

But yeah, that is pretty close to amazing.

I kinda forgot about it after seeing that the Rhasspy community experimented with it, and it had issues with short utterances and a slow startup time.

The first amazing result would not be for you to program with this, but for somebody with a phone to be able to automate a few small tasks with just the voice.

Exactly. We've seen constant improvement in the tech for decades. I remember phones having "voice control" before color LCDs, and today we have assistants, which are orders of magnitude more sophisticated.

Yet, it hasn't stuck. I'm exclusively using Siri to set timers. Most people are like me, or don't use it at all. Some use assistants for googling factoids or something. Fidelity wise, it's really underwhelming.

It's not a social acceptance issue, because people would still use it at home, and they don't. It's a small chance there's some key UI insight missing (discoverability for one), but I doubt it. Even with perfect UI, natural language is quite flawed when you're dealing with technical details (see exhibit on variable naming).

Anyway, the chances of Github solving this in an exceptionally difficult subdomain, as a side project, seems like a... Let's say, long shot.

That said, the silver lining in all these billions spent on voice interfaces is accessibility. For some people, these things are a life saver.

This is not marketed as an alternative input mechanism for people who have otherwise no difficulty typing code. It's an input mechanism for people whose abilities to type are limited.

Yes, but this answers to the grandparent, not the parent.

This means it's an assistive technology, but hardly "a generational paradigm change in how to write code".

If it works as well as Apple Siri or Google Hey (or whatever it's called), then it will be... totally useless? I can't imagine that they bring a better product than two of the richest companies in the world even can't figure out (perfectly). And if I need to read and adjust all my code for typos, I can just write it myself.

Because in my experience it is very often like "Call Peter" -> "Today it's sunny in NY".

To be fair, Siri was really good on the phone before iOS 15; it very rarely got a word wrong. Then, I don't know what they changed, but it went belly up for me, and many other people have said the same.

On macOS it still seems pretty good - I have carpal tunnel syndrome and by Thursday or Friday most weeks I end up using Siri to dictate not code but a lot of conversations in Slack, pull requests, iMessage, etc. In fact, I wrote this reply with Siri right now.

I don't know what version number, but when it was new I could depend on it to do things like sending SMS while driving, changing the navigation, etc.

Now it's barely worth attempting, because it gets it wrong more than it gets it right.

I definitely notice a difference in quality depending on network latency. I thought quite a bit of the processing was done locally now, but latency seems to play a big part in its quality.

iPhone ability to convert speech to text has always been good. It’s always been Siri’s capacity to take a meaningful action from the recognized speech that has been problematic.

I've been trying to use Siri more and more while driving, and it's amazing how distracting it is compared to peeking at the screen (it's naughty, I know, I try not to do it).

But yeah, something about talking to a device which gets things wrong all the time is ridiculously distracting, at least for me.

Sometimes I look back at the road after trying to work out what it interpreted, and I feel scared at how focused on the phone I became.

>I can't imagine that they bring a better product than two of the richest companies in the world

Code is much more constrained by language syntax though.

Even for the "call peter" example, while the input is easy, the expected range of inputs that Siri should handle and be able to differentiate it from is huge.

Of course this is still a problem for e.g. defining variable names, where you could say anything.
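For variable names, one mitigation is to snap whatever was heard onto the closest identifier already in the file, since most references are to existing names. Here is a minimal sketch using the standard library's `difflib`. The helper names and the 0.75 cutoff are illustrative assumptions, and brand-new names would still need spelling or a formatter.

```python
import difflib
import re
from typing import Optional

def _normalise(name: str) -> str:
    """Lower-case and drop separators: 'user id', 'user_id', 'userId' -> 'userid'."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

def snap_to_identifier(heard: str, source: str,
                       cutoff: float = 0.75) -> Optional[str]:
    """Map a possibly misheard name onto the closest identifier in `source`."""
    identifiers = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source))
    # If two identifiers normalise identically, one of them wins arbitrarily.
    by_norm = {_normalise(ident): ident for ident in identifiers}
    best = difflib.get_close_matches(_normalise(heard), list(by_norm),
                                     n=1, cutoff=cutoff)
    return by_norm[best[0]] if best else None

print(snap_to_identifier("unit prize", "total_price = unit_price * quantity"))  # unit_price
```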

In my experience, OpenAI's Whisper speech recognition is beyond anything currently out there. Likely Github will use it on the backend.

> I can't imagine that they bring a better product than two of the richest companies in the world even can't figure out

Are either of those companies investing particularly heavily into voice agents? Certainly neither of them has anywhere near the kind of power of something like Copilot.

Also, a general agent is way different from one that's specific to writing code.

Somehow Google has gotten worse in the last couple of years.

It seems wonderful for people who can't as easily use a keyboard, but for most people, this doesn't seem any easier than using a keyboard. Am I missing something?

I use a Czech keyboard layout on my Mac, because Czech has some letters that don't exist on a US keyboard, and I don't like switching between layouts. So basically all "programming" characters (braces, brackets, parentheses, apostrophes, quotation marks, pipes, colons) are behind modifiers.

I would totally enjoy being able to tell my IDE to "call foo with bar and string hello there end string with a block of gee times two" or something, instead of:

  foo(:bar, "Hello there") { |gee| gee * 2 }

Just that, not having to think about typing different symbols, would be a serious quality-of-life feature for me.
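Since the snippet above is Ruby, here is a toy Ruby sketch of the constrained case: one regex that recognizes exactly the sentence shape from this comment and emits the call. The names `translate`, `SPOKEN_CALL` and `NUMBER_WORDS` are made up for illustration; this is not how GitHub's tool or any real voice-coding product works, only a demonstration that a fixed phrase structure maps onto syntax easily.

```ruby
# Hypothetical toy translator: NOT any real tool's grammar.
# It recognizes exactly one spoken sentence shape and turns it into a Ruby call.

NUMBER_WORDS = { "one" => 1, "two" => 2, "three" => 3 }.freeze

# Extended (/x) mode ignores the literal whitespace and newlines in the pattern,
# so every required space is spelled out as \s+.
SPOKEN_CALL = /\A
  call\s+(?<meth>\w+)
  \s+with\s+(?<sym>\w+)
  \s+and\s+string\s+(?<str>.+?)\s+end\s+string
  \s+with\s+a\s+block\s+of\s+(?<var>\w+)
  \s+times\s+(?<num>\w+)
\z/x

def translate(phrase)
  m = SPOKEN_CALL.match(phrase.strip)
  raise ArgumentError, "unrecognized phrase: #{phrase}" unless m

  # Accept a number word ("two") or digits ("2"); anything else raises ArgumentError.
  n = NUMBER_WORDS.fetch(m[:num]) { Integer(m[:num]) }
  "#{m[:meth]}(:#{m[:sym]}, #{m[:str].inspect}) { |#{m[:var]}| #{m[:var]} * #{n} }"
end

puts translate("call foo with bar and string hello there end string with a block of gee times two")
```

It produces `foo(:bar, "hello there") { |gee| gee * 2 }` (lowercase, because the spoken string is lowercase), and any sentence outside that one shape raises `ArgumentError`. Handling the open-ended rest of speech is the hard part.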

>So basically all "programming" characters (braces, brackets, parentheses, apostrophes, quotation marks, pipes, colons) are behind modifiers.

Poland ditched a similar QWERTZ-based layout in favour of this: https://pl.wikipedia.org/wiki/Plik:Polish_programmer%27s_lay...

It's basically the standard US layout but the right alt (AltGr) is a modifier. So, for example, AltGr+A gives "ą".

I don't see why something similar can't be done for the Czech alphabet.

> I don't see why something similar can't be done for the Czech alphabet.

It probably could — we already can't fit all the letters with diacritics on the number row, so "ď, ť, ň, ó" are key combos. But as far as I know, Czech uses diacritics a bit more than Polish (e.g. for sounds that are digraphs in Polish), consider:

"Že se nestydÍŠ, nutit lidi psÁt ČeskÉ speciÁlnÍ znaky pomocÍ dvojhmatŮ!" ("Shame on you for forcing people to type special Czech characters with two-key combos!"): that's 10 modifiers just for the diacritics.

Having ALL diacritics as modifier combos would make typing actual texts even more annoying than programming is now.

My solution is just not to use Czech characters; seems to work well so far :D

Have you tried the UCW layout? English-like keyboard, but with a bonus modifier key that produces Czech (and other) letters. I use it and it's so much better than the traditional Czech layout.

If voice dictation were a killer feature, everybody would use it all the time for ordinary text. But for some reason only a few people (lawyers? doctors?) use it.

I believe that's mostly because it doesn't work reliably. Doctors, lawyers, architects, etc. have a somewhat limited professional vocabulary and often say the same things, so voice recognition works pretty well for them. But when you write a random message, you cover a much broader range of topics, and dictation that fails quickly turns the whole thing from an improvement into an ordeal. "No, not 'or deal'. delete word. delete word. delete word. O-R-D-E-A-L. Yes, that's it. No, don't write that, sigh".

"… this will be a generational paradigm change in how to write code… if it works."

Can't really see myself working like this in an office, on a plane, in a cafe, with music on (my favorite way to code), or in the house where my partner is also working. And as others have said, editing might suck.

If it were a neural link, then I'd be in agreement.

> The hard part will be editing code with your voice too

The hard part will be open plan offices.

It’s bad enough that so many meetings are now on Zoom/Teams and proximity to coworkers means you end up hearing their side of their meetings.

Just wait until all the devs are coding this way too.

It's the future I always imagined as a child. A vast divider-less cubicle scape of people in Patagonia vests who define all caps constants by yelling at their standing desks.

I dunno, we already have stuff like Krisp AI background voice cancellation. I don't think we're far from cancelling out background talking completely. This is already huge for things like pair programming when one person is in the office and the other is at home. If the person in the office also has noise-cancelling headphones (with a bit of white noise), you can have a pretty perfect call in a noisy room. (not sponsored)

I'm not bothered by my call quality, which as you noticed is fine, I'm bothered by all the other people speaking (sometimes quite loudly) on their calls while I'm not on a call :-)

True, that's where I find noise-cancelling headphones with some white noise help a lot. But I feel you.

Crazy idea – whisper to use your computer. Might produce some quality ASMR in open plan office.

> this will be a generational paradigm change in how to write code… if it works

I could see it maybe being important once GitHub Copilot is embedded in it? You tell it roughly what you want and then adapt by hand. But it is kinda funny seeing the parent make such claims so early.

  > once GitHub Copilot is embedded

That's exactly the point of the demo, no?

Yup, my bad for jumping to conclusions. This certainly seems worth exploring.

How can it not be a paradigm change when it changes the way people write code from “write by hand” to “generated by ai with natural language”?

The problem with speech to code has always been that precise syntax is hard, but AI codegen solves that.

So, no, it might not take off, but I feel like if it does, then it means ai-codegen will become the dominant way code is crafted.

That would be paradigm shifting.

It’s inconceivable that it wouldn’t be.

> The problem with speech to code has always been that precise syntax is hard

The biggest problem is that talking sucks. You presumably can handle voice input as well as is possible, yet here we are typing to you anyway, and for good reason. Even if the natural language part is nailed, you may as well type in that natural language.

I imagine it will bring some quality of life improvements to those with certain disabilities, but I don't see why the typical developer would want to go in that direction.

> generated by ai with natural language

I don't want to disparage their work, because it's really impressive, but "fill null values of column Fare with average column values" is closer to AppleScript than it is to natural language.
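For comparison, here is roughly the code such a command stands for. This is a hypothetical Ruby sketch (Ruby only to match the snippet earlier in the thread; whatever the demo actually generates isn't reproduced here) that treats the table as an array of hashes, uses `nil` for a missing value, and replaces missing `Fare` values with the mean of the present ones. `fill_nulls_with_mean` and the sample rows are made up.

```ruby
# Hypothetical sketch: each row is a Hash, and nil marks a missing value.
def fill_nulls_with_mean(rows, column)
  present = rows.map { |row| row[column] }.compact
  return rows if present.empty? # nothing to average, so leave rows unchanged

  mean = present.sum.to_f / present.size
  # Hash#merge returns a new hash, so the caller's rows are not mutated.
  rows.map { |row| row[column].nil? ? row.merge(column => mean) : row }
end

# Made-up sample rows, for illustration only.
rows = [
  { "Name" => "A", "Fare" => 10.0 },
  { "Name" => "B", "Fare" => nil },
  { "Name" => "C", "Fare" => 20.0 },
]

fill_nulls_with_mean(rows, "Fare").each { |row| p row }
```

The spoken version still has to name the column and the exact operation; what it skips is the syntax.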

It doesn’t matter.

It solves the issue of trying to speak obscure code syntax like “close parenthesis semicolon newline”.

That’s enough to lower the barrier to entry for many people; I don’t know how good it is practically but it’s disingenuous to suggest it’s not offering a novel solution to an old problem.

If you prefer that, why not just use a language like AppleScript or Inform 7?

> …because speaking is easier than writing, obviously?

Easier isn’t always better.

The example puts it quite well. You kind of know what you want to achieve, step by step, but are not so comfortable with your tools.

Usually this kind of exploratory work involves a lot of Googling and copy-pasting snippets from Stack Overflow without putting much time into trying to deeply understand things. If you get what you want, great; if not, back to Google.

Only works remotely. Used in the office, it'd be madness.

I can't wait to edit my Unreal blueprints using voice commands. Truly the future of programming.

This feels like GitHub expanding because it can't find anything else to do... Being a for-profit organization means it's unable to say "you know what, we pretty much have everything we wanted, so we're just going into maintenance/optimization mode". This happens all the time in open source projects, which simply tell their users to move elsewhere for better alternatives, but it will never happen at a for-profit organization.
