Hey, GitHub – Waiting list signup

dang · on Nov 10, 2022

This is the third such thread in the last 24 hours which consists of nothing but an elaborate waiting list signup. I've changed the titles to make that clear.

GitHub Blocks – waiting list signup - https://news.ycombinator.com/item?id=33537706 - Nov 2022 (41 comments)

GitHub code search – waiting list signup - https://news.ycombinator.com/item?id=33537614 - Nov 2022 (48 comments)

A good HN discussion needs more than a waiting list signup. A good time to have a thread would be when something is actually available.

dustedcodes · on Nov 10, 2022

When I talk to my Google Home then 50% of my brain power is engaged in predicting and working out how to best phrase something so that the "AI" understands what I mean and the other 50% is used to actually think about what I want to accomplish in the first place. This is just about okay for things like switching lights on/off or requesting a nice song I want to listen to, but I could never be productive programming like this. When I'm in the zone I don't want to have to waste any mental capacity on supplementing an imperfect AI, I want to be thinking 100% about what I want to code and just let my fingers do the work.

For that reason I think this will be less appealing to developers than GitHub may think, otherwise I think it's a cool idea.

chipgap98 · on Nov 10, 2022

I think the biggest use case for this is accessibility. There are plenty of people who permanently or temporarily cannot use a keyboard (and/or mouse). This will be great for those users.

For the average dev, I agree this is more of a novelty.

jesterswilde · on Nov 10, 2022

I am highly suspicious of new tech coming in the guise of 'accessibility'. As someone goin blind, a lot of things toted as good for me are cumbersome and bad.

Maybe this will be different, and that'd be neat. Though I just think more expressions of code is neat. I also know the accessibility you're talkin about isn't for blindness.

That being said I can talk about code decently well, but if you've never heard code come out of text-to-speech, well, it's painful.

I bring up the text-to-speech because if speech is input, it would make sense for speech to also be the output. Selfishly, getting a lot of developers to spend time coding through voice might end up with some novel and well thought out solutions.

harvey9 · on Nov 10, 2022

For sight problems you are correct. But voice input is valuable by itself. I had chronic tendonitis in my wrists a few years ago. I looked into voice coding and it was difficult to set up. Fortunately for me I've been able to adapt with a vertical mouse and split keyboard.

theCrowing · on Nov 10, 2022

You look at the product from your point of view and you are not the target group, it's that easy.

boredumb · on Nov 10, 2022

I do think there will be big advancements in the text-to-speech realm. I've noticed some ML projects imitating voices surprisingly well and while it's not quite there yet - it's already a bit less grating than it was even a few years ago.

melling · on Nov 10, 2022

“I think there is a world market for maybe five computers.” - Thomas Watson

I bet if we use our imaginations, we’ll think of a lot of places were using voice to code could come in handy.

Personally, I’ve been waiting for it for a few decades.

The creator of TCL has RSI and has been using voice since the late 1990’s

https://web.stanford.edu/~ouster/cgi-bin/wrist.php

Thought we were really close 10 years ago when Tavis Rudd developed a system:

https://youtu.be/8SkdfdXWYaI

GitHub seems to be more high-level. It figures out the syntax and what you actually want to write.

This would help if you barely knew the language.

Time to learn Rust or Scala with a little help from machine learning.

darkwater · on Nov 10, 2022

> GitHub seems to be more high-level. It figures out the syntax and what you actually want to write.

To me, looks like it's feeding your voice input to Copilot that then generates the code output just as before. So, the same strength and weaknesses of Copilot apply (and you can probably mimic it locally with a voice input method you control, just dictate comments for copilot)

mkl · on Nov 10, 2022

> “I think there is a world market for maybe five computers.” - Thomas Watson

This statement probably didn't happen. The closest thing to it was 10 years after the quote is usually supposed to have happened and was about a single model of a single machine: https://geekhistory.com/content/urban-legend-i-think-there-w...

awslattery · on Nov 10, 2022

As a new dad, I would love to have the voice-to-text accuracy and speed I get on my Pixel phone on my desktop OS. Done right, I could easily see myself using it more often than when I have my youngling in one arm as I've been WFH for the better part of the last 6 years of work.

cdrini · on Nov 10, 2022

This looks to be much more heavily using GPT3/Codex/Copilot, which I've found to be eerily effective. It basically feels like a voice interface to Copilot. The main difference between these and something like Google Home is how effectively they pick up on context. "Hey Github" would be able to use all the code in the file as context, so when you say "wrap this in a function", it'll have an idea of what you mean, without that function having to be explicitly programmed. Voice assistants have to _always_ be in a voice space, so context is very limited. And generally the way Google home-style voice assistants are created is by programming specific actions linked to specific phrases. ML helps make the phrase matching flexible, but the action is usually entirely explicitly coded. Using Codex would let the action be ML influenced as well.

If Copilot is any indicator of effectiveness, then I have high hopes for this! I've always wanted to program while stationary biking :)

bryanrasmussen · on Nov 10, 2022

I think yes this could be a real multiplier for seniors, you're doing something you have done lots of times before just a bit different you know pretty much everything you need to do, describe it until it is in a state where you can through and finish it off. Exactly like a stationary bike or out in the garden with your kid type thing.

IF the voice analysis was any good of course. But maybe it will also be able to be better than typical voice analysis because the syntax is limited, when programming I use a much more limited vocabulary than when writing literary criticism. So while text to speech is total crap for handling complex literary phrasing it might be adequate for programming structures.

edgyquant · on Nov 10, 2022

I’m a senior/systems architect coming down with bad carpel tunnel and this sounds like a godsend

raylad · on Nov 10, 2022

Around 1998 I broke my collarbone and had to use Dragon Dictate.

I found that for general subjects it was quite difficult to use because of the fairly poor recognition rate.

But when I talked about computers, it got almost everything right. I assumed it must have been trained by the developers, who talked about computers mostly.

This is another special purpose vocabulary, so it seems as if it would have a good chance of a high recognition rate.

eurasiantiger · on Nov 10, 2022

It’s most likely just Cortana bolted on to Copilot.

rahulpandita · on Nov 10, 2022

GithubNext here! Just clarifying that this is not the case.

machiaweliczny · on Nov 10, 2022

Then it’s Whisper into command tree thing?

rpastuszak · on Nov 10, 2022

I don't use voice assistants any more due to privacy concerns but I wrote some similar software in 2010s. I'm fluent in English, but with the current tech, the success rate for me giving commands to a machine is still 50/50.

> I could never be productive programming like this.

It's likely to work much better than a generic speech-to-text model due to fine-tuning.

Plus, consciously or not, we will adapt our human language to the English-ML "pidgin" (e.g. by introducing a more efficient grammatical structures, using a specific subset of vocabulary).

The way I see it is that it's not much different from giving commands to your dog, writing a Google query, writing a Stable Diffusion prompt. It'll get better. Manual input is not as fast as speech though and that's where I see the issue.

atdrummond · on Nov 10, 2022

I am happy to take a severe deficit over not being able to work at all. When my back was acting up, I could not physically use my left side. Dictation was the ONLY way I could code. By the end of this period, my output was back up to 95% of my typed output - especially as I don’t type code nearly as fast as I do general language writing.

rahulpandita · on Nov 10, 2022

GitHubNext here! We would love to hear more about your experience. Please help us out by signing up for this experiment :)

mrtksn · on Nov 10, 2022

The voice interface experience(in general) so far is like trying to make a really stupid person do something for you. Out of context misunderstandings are the worst because it breaks your flow trying to understand why that happens and how to fix it.

I imagine that voice to code would be like standing over the shoulders of a junior coder who knows the syntax and some techniques just enough to follow orders but has no idea whats doing and when gets it wrong will be very wrong.

smartmic · on Nov 10, 2022

"Writing is thinking. To write well is to think clearly. That’s why it’s so hard." ~David McCullough

This not only holds for literature but also for programming. Concerning the hard part, I would argue that is the reason why it is not called "talking is thinking".

yi_xuan · on Nov 10, 2022

"If you're thinking without writing, you only think you are thinking." -Leslie Lamport

Even though now speech recognition rate is really high, but I wonder how many authors use speech to write articles. The comparison may make sense. And I think there's few.

mjburgess · on Nov 10, 2022

I think there's a difference between communicating your intent to a machine, which is hopeless since it has no model of intention; and commanding a machine to reproduce something.

Ie., when you're managing your house you want something that can be communicated in an infinite number of ways, but the "AI" accepts a tiny finitude of ways.

However when programming it seems like we arent asking the machine to "write a function to do X", but rather saying, "def open-paren star args...."

This seems like a pretty trivial problem to solve.

dustedcodes · on Nov 10, 2022

> However when programming it seems like we arent asking the machine to "write a function to do X", but rather saying, "def open-paren star args...."

Click the link first and take a look at what is being showcased, because your comment is the exact opposite of what they demo when you visit the HN link.

mjburgess · on Nov 10, 2022

You're right... So, yes, it will be largely useless (as shown) for actual programming.

But I suspect there'll be a subset of its features consistent with my comment that will be actually useful.

Programming, via Naur/Ryle, is always a kind of theory building. And unless you're basically copy/pasting, it's a novel theory of some area (, business process, etc.).

That's something where intentions arent even really communicable as such, since the art of programming is sketching possible theories as a way of finding out what we ought intend.

So this is another gimmik with maybe marginal improvements at the edges.

wizardofmysore · on Nov 10, 2022

It's really useful for those who have challenges typing (arthritis, disabilities etc..), perhaps not best for general audience as typing with auto complete is faster.

alvis · on Nov 10, 2022

For repetitive tasks like preparing a report in the demo, saying is definitely faster than typing. It's quite impressive if your boss ask you to prepare one and the report is done in less than two minutes.

However, I too really doubt if there's any better use cases than simple tasks, let alone everyone would hear what you ask the AI to do in the office. Oh my! How embarrassing am I?

danielbln · on Nov 10, 2022

The assistants (Google, Alexa, Siri) are not great at NLP. Compare how you speak to them vs speaking to a LLM like gpt-3, there is a world of difference. The latter feels like speaking to a human, the former more like your trying to get your voice commands into a state machine.

Kiro · on Nov 10, 2022

Blind people are already very productive using voice-to-code.

jfk13 · on Nov 10, 2022

There may well be examples of this, but while the blind developers I have known (a small sample, I admit) typically use screen-reader technologies to navigate and read code, they use a keyboard to write and edit it.

dustedcodes · on Nov 10, 2022

I don't disagree with that. I just meant I don't think it's going to have mainstream appeal. A wheel chair also makes a disabled person super productive if the alternative is not being able to go anywhere at all, but it doesn't make wheelchairs super appealing to people with healthy two legs if you see what I mean.

dkns · on Nov 10, 2022

I think this is great for: a) people who are visually impaired or have issues with their hands/fingers b) people who aren't programmers; if you could make it more Scratch-like then this is amazing tool for showing off power of programming

0-_-0 · on Nov 10, 2022

The mental load would reduce with practice very quickly

CrociDB · on Nov 10, 2022

they're creating a new job "prompt engineers" to replace the engineers. this is 2022.

ykonstant · on Nov 10, 2022

The rise of the... t...talking monkey? cognizes intensely

ggerganov · on Nov 10, 2022

Very interesting - I was sort of expecting it to happen soon.

I have been playing with using Whisper + Github Copilot in Vim [0]. The Whisper text transcription runs offline with a custom C/C++ inference and I use Copilot through the copilot.nvim plugin for Neovim. The results were very satisfying.

Edit: And just in case there is interest in this, the code is available [1]. It would be very awesome if someone helps to wrap this functionality in a proper Vim plugin.

[0] https://youtu.be/3flN9kTcZJY

[1] https://github.com/ggerganov/whisper.cpp/tree/master/example...

Sheeny96 · on Nov 10, 2022

I feel like this is being misunderstood - the long term view of this wouldn't be for code scribing, it'd be for non technical people to be able to instantly create things. Imagine being able to say outloud to your phone "hey, create me a view of all the weather data from the past year correlated against x and y in z view format". The code is the means, not the product.

bodge5000 · on Nov 10, 2022

> "hey, create me a view of all the weather data from the past year correlated against x and y in z view format"

Ironically, I think only technical people would even want to do something like that. The less technical you get, the more high level (and ambitious) you need to go.

You can see this a lot in game dev questions. Beginner questions will be "How do I make an MMORPG?" and the more advanced questions will be "how do I return x from y" or whatever, and then it scales between the two ends of the spectrum.

golergka · on Nov 10, 2022

There's plenty of technical people who don't code — like, in your example, a game designer.

makeitdouble · on Nov 10, 2022

Why would it be more successful than the past countless efforts to make that a reality ? (it could as well be, but why do you think so?)

As a more simple problem space, building programs from UML charts was one of Java's promise, and it failed miserably, not because the technology was lacking, but because it's just a damn hard problem.

As of now ee have nothing approaching "non technical people to be able to instantly create things" if the "things" you want are useful in any way.

joenot443 · on Nov 10, 2022

The big difference is that Copilot generates real, functioning code based off past implementations of real, functioning code. A no-code tool like Webflow is an entirely different product all together. The ML transformers powering Copilot are the secret sauce here, that technology just literally did not exist when Java and UML were all the rage.

I'd encourage anyone who hasn't tried it to give Copilot a try. There really has never been anything like it in my memory, and while I totally agree there have been dozens of efforts to allow non-technical people to generate code, I think Copilot may be on to something very special.

candiddevmike · on Nov 10, 2022

Copying functions and composing them into apps are two completely different things.

yi_xuan · on Nov 10, 2022

The problems are more than how to express.

You have to read code and debug it that is inevitable, you can't say that there will not be any bugs if you use voice instead of writing.

nfRfqX5n · on Nov 10, 2022

I think it’ll be better because Copilot is pretty good for typing and this builds upon that framework

wwilim · on Nov 10, 2022

People who can think of weather in terms of plotting and correlating it are usually technical enough to code

urthor · on Nov 10, 2022

Exactly.

The crazy thing is... this probably will work.

In 20 years. But, it probably will work.

There is absolutely no reason you cannot use a neural network to transcribe appropriately phrased requirements into an AST.

tovej · on Nov 10, 2022

There are several reasons this wouldn't work:

1. How do you check the output of the voice to code step? If you need as much expertise as you do now to actually review the code, then the voice to code step is just a layer that adds confusion

2. How would debugging work? Again, would you still need to be able to understand the code? Same issue.

3. What if you have to pause and think? This will affect how the voice to code interface interprets your speech.

4. How would you make a precise edit to your source audio using a voice interface?

5. How would you make changes which touch multiple components across the project? How would you coordinate this?

6. Precisely defining interfaces between components and using correct references to specific symbols is very difficult to do in natural speech, which typically uses context to resolve ambiguous references. The language you would be using would still have to resemble the strictness of a programming language even when spoken, but you have replaced a reliable checkable channel (input through keyboard, transfer as-as to text buffer, feedback from visual view of source) with an unreliable channel (input through microphone, transfer through complex signal processing and multiple neural network language models, through multiple representations, where you have to be able to check multiple representations for feedback about the structure of your program (initial speech-to-text step, text to source))

urthor · on Nov 10, 2022

Voice to code into Scratch. Novice voice coder then debugs a scratch-like extremely easy to debug application.

Voice to code into Scratch, plus 20 years of hard product development effort.

I'm not saying it'll be easy or work the first time. The Apple Newton failed miserably the first time.

I'm saying touch screens still worked out amazingly well two decades later.

If we can say anything about programming today, it's that programming today with 20 years of advancements is remarkably different to 20 years ago.

I fully expect the next 20 years to be similarly fruitful.

nuccy · on Nov 10, 2022

It may work. Since it may be just a new "programming language" (somewhat literally), i.e. a new level of high level abstraction. We already know examples of such transition to higher abstraction levels: binary code -> assembly languages -> c/lisp/fortran/etc -> c++/javascript/go/python/r -> np/torch/react/whatever frameworks/libraries. For an average programmer nowadays knowledge of frameworks/libraries is as important (if not more important even) as actual knowledge of the programming language they use. The only disadvantage of this is that people will need to adapt to something generated and updated via a machine learning. So far there are not much examples of that, except maybe people adapting to Tesla Autopilot with every new release. Before we were adapting to a new c++/python/framework version, in future there will be GitHubNext v1, v2 and v3 with known features and bugs.

nuccy · on Nov 10, 2022

The only problem with this being a next abstraction level, is that it actually leads to more "coding", because of general spoken language being less informationly dense as any programming language.

Before by switching from binary to assembly to higher level languages to frameworks/libraries, you generally reduce amount of "code" being written after each step, with voice programming this seems to be the opposite.

BiteCode_dev · on Nov 10, 2022

Before copilot, we were far from it.

But now, given how magical this thing is, it opens many doors to what's possible with no code.

I never really believe in anything no code before apart from Excel and RAD.

But basic tasks are going to get accessible to a lot of people sooner than I expected

urthor · on Nov 10, 2022

I'm almost certain a trained professional is still going to do a far better job than an amateur.

That part is unlikely to change much, if it all.

Trained professionals will always outperform the latest fellow dragged off the street.

BiteCode_dev · on Nov 10, 2022

That doesn't matter much.

You free part of the industry from needing professionals for some simple tasks, which is enough to empower users with a lot of possibilities, and focus pros on where they are really needed.

"Hey phone, next time mum send me a text about voting, you can send back a 'ok boomer'?" or "hey phone, can you setup a webpage that list my tiktok videos up to last year?"

Most people are not going to hire a pro for that, but we could end up with a AI general enough to be able to do that for users. It's ok if the result is not extensible, maintainable or modular.

tweetle_beetle · on Nov 10, 2022

From memory, there was a time (end of millenium?) when using voice recognition to write documents was the next big thing. There was a pricey bit of software for Windows that was popular with power users and they would spend hours training it to their voice.

Then it seemed to just die off. I don't think it was bad technology, because I don't think novelty value was enough to account for its popularity - you had to put hours in to get it to work well, it wasn't a casual toy.

What's changed since then in terms of technology? Unless it's very significant, I suspect it will go the same way. Apart from an assistive technology viewpoint, my gut instinct is that it's not that satisfying or rewarding talking to a computer all day.

junon · on Nov 10, 2022

Ah yes, Dragon NaturallySpeaking. Training for hours and hours and getting incredibly subpar results. It was a fun toy but there's a reason it didn't really take off in corporate settings.

tecleandor · on Nov 10, 2022

Dragon NaturallySpeaking is still alive at least in medical practice. Its Nuance Dragon Medical One product is fairly popular in some regions for medical report dictation as radiologists don't like to write them down (sorry ;) ). I've seen a Philips product in that field too. Seems like LG's TVs used their recognition engine for a while (don't know if that's still valid)

The story for the most mainstream-popular dictation softwares is kind of funny. Back in the late 90's there was Dragon NaturallySpeaking and IBM's ViaVoice. In early 00s, after a financial fraud and bankruptcy involving both the then current Dragon owners and Goldman Sachs, they got bought by Scansoft. Scansoft bought Nuance, began to use its name, and then got exclusive rights for ViaVoice (!) from IBM.

Now, in March this year, Nuance has been acquire by Microsoft.

junon · on Nov 10, 2022

Interesting, I didn't know the history there! Just remember playing with it as a kid :)

Aeolun · on Nov 10, 2022

The best recognition rates on the words ‘scratch that’, which you used every other sentence.

mjochim · on Nov 10, 2022

> What's changed since then in terms of technology? Unless it's very significant, I suspect it will go the same way. Apart from an assistive technology viewpoint, my gut instinct is that it's not that satisfying or rewarding talking to a computer all day.

Training data is now abundant compared to twenty years ago, and so is computation power. That means training can be much more complex now.

The underlying technology is now typically neural networks (broadly speaking), whereas twenty years ago it might have been Hidden Markov Models.

Overall, recognition quality, even without speaker-specific training, is now on a very different level than back then. Whether it’s considered good is a matter of opinion. But it’s significantly better than twenty years ago.

rommel917 · on Nov 10, 2022

WFH become more common. You cant program with voice in office full of people who also program with voice. At home you can multytask work with voice while doing home stuff with hands. You just need projector do display on wall while you cook or assemble IKEA table.

JW_00000 · on Nov 10, 2022

I cannot even concentrate on reading a text while the radio is playing; let alone programming while assembling furniture.

Closi · on Nov 10, 2022

> What's changed since then in terms of technology? Unless it's very significant, I suspect it will go the same way. Apart from an assistive technology viewpoint, my gut instinct is that it's not that satisfying or rewarding talking to a computer all day.

It's hugely significant - look at this graph of Google's speech model accuracy across 2013 to 2017:

https://sonix.ai/packs/media/images/corp/articles/history-of...

Or this that shows a similar pattern:

https://cdn.static-economist.com/sites/default/files/externa...

danjc · on Nov 10, 2022

Unfortunately 95% isn’t a lot of nines

Closi · on Nov 10, 2022

Human transcription accuracy is just two nines :)

arez · on Nov 10, 2022

one thing that changed that you can see in a demo, that it's speech recognition paired with a neural network. He doesn't have to say "write titanic = titanic.dropDuplicated()" he just says, "drop duplicates from titanic" so you "in theory" have to write less. It probably falls apart when you have more complex things to write and then you have to fallback to speaking it out word for word, but it is an interesting development

xigoi · on Nov 10, 2022

There is a reason we don't have many programming languages where you can say “drop duplicates from titanic”.

tlb · on Nov 10, 2022

Programming languages need to be unambiguous, but voice interfaces to them don't have to be. With a voice interface, you can say something vague, see what it gets translated into, and either click Run or edit it.

arez · on Nov 10, 2022

what's the reason?

xigoi · on Nov 10, 2022

Natural language is ambiguous and doesn't represent nested structures well.

cdrini · on Nov 10, 2022

Main things that have changed are:

1) Improvements in speech to text, as others have mentioned

2) Improvements in language models (and model size) allowing for more flexible interpretation of speech. This isn't dictation anymore. It's more like instruction. You don't have to tell the computer exactly what to write, you tell it in much more broader terms. Eg "pull this out into a function". Or "delete the cookie before creating the transaction". Or "lint the file".

That's my guess anyways! This mostly feels like a voice interface to Copilot in a lot of ways. Can't say whether it'll be effective, but I'd love to be able to program while I'm e.g. on a stationary bike!

Double_a_92 · on Nov 10, 2022

Isn't that a very niche application simply because it's voice-based? I.e. it can only be used if you are alone in an office, otherwise you would be annoying your coworkers and the voices would get mixed up.

Aeolun · on Nov 10, 2022

The only time I’ve ever successfully used voice recognition was teaching skyrim to recognize my words of power. Shouting fus-roh-dah ! was incredibly satisfying.

kreddor · on Nov 10, 2022

It might have been Dragon NaturallySpeaking. I remember toying around with it 20 years ago or so. Apparently it has just been bought by Microsoft.

tjixxu · on Nov 10, 2022

I have good memories spending 4+ hours training Dragon to end up with what seemed like 30% accuracy.

mmikeff · on Nov 10, 2022

I sometimes use voice recognition in Notes on my Mac to write up my meeting notes, but I find that my waffly speech results in very verbose notes. Also still quite a few mis-hears where I read my notes back later and have to work out what I was actually saying.

Can I have transcription that can then turn my rambling into neat and concise prose?

bagels · on Nov 10, 2022

Many people dictate messages on their phones. Doctors use it extensively.

towawy · on Nov 10, 2022

Not a doctor, but I also started to using the dictation feature on my iPhone more and more recently. It's often more convenient than typing when I'm walking (which I do a lot) and the pickup of voice messaging, talking to your phone like it's a mic, made me more comfortable doing it in public.

msaharia · on Nov 10, 2022

Dragon speech? I used it quite a bit 10 years ago!

chocolatkey · on Nov 10, 2022

If this works well, I would pay a seriously high amount of money. My daily coding time is currently limited by the pain in my hand/fingers that eventually becomes too uncomfortable, and I have to wait for a "cooldown" period of days to "reset" my hands back to normal. I can't even code on a normal keyboard or trackpad for a long time anymore.

The problem with current voice programming systems is they're just too slow so I end up getting impatient and using my fingers anyway

eajakobsen · on Nov 10, 2022

I imagine you have done extensive research on your own, but in any case I found this article by Josh Comeau on coding with voice commands and eye tracking very interesting: https://www.joshwcomeau.com/blog/hands-free-coding/

langsoul-com · on Nov 10, 2022

I wonder if the quest pro could do the eye tracking as well? It has pretty extensive eye tracking cameras, not sure how precise they're though.

Could it also do a virtual keyboard, but a custom layout to not trigger arm, elbow and hand pains?

atonalfreerider · on Nov 15, 2022

We built a VR prototype a year ago for voice to code using Codex in VR: https://www.youtube.com/watch?v=icHLoxOFerk

ron22 · on Nov 10, 2022

If you want to code with your voice, also checkout https://github.com/cursorless-dev/cursorless

Hesinde · on Nov 10, 2022

Have you tried an organ MIDI pedalboard and a script to translate MIDI to keystrokes? You could also put a micro controller between the pedalboard and your computer so that it looks like a normal keyboard to the computer. I do not know whether that would be pratical, but some sort of feet keyboard is in my idea space for what if.

rahulpandita · on Nov 10, 2022

GitHubNext here! We appreciate your support. Please consider helping us by signing up for the experiment on the website and providing feedback. :)

langsoul-com · on Nov 10, 2022

I'm assuming you have an ergonomic keyboard? If so which one?

tgv · on Nov 10, 2022

And a movable monitor, decent chair and desk at the appropriate height...

Comevius · on Nov 10, 2022

An ergonomic keyboard layout like Workman doesn't hurt either. QWERTY was made with typewriter jamming in mind, sacrificing ergonomics completely, especially for English.

chocolatkey · on Nov 10, 2022

Ergodox EZ

FloatArtifact · on Nov 10, 2022

The challenges that remain in speech coding is not generating code as much as it is navigating through existing code or an application.

There's only two ways to do this effectively and unfortunately no one has taken the true path to accessibility. The more common way is plugins/extensions to grab a information from the editor.

Accessibility is more than just one editor. It's the OS and all the applications. Microsoft needs to take the hard route to make an accessibility UI automation server to grab that information and only make up the difference through plugins as needed.

It's all about grabbing information from the application and generating on the fly commands, not just parsing free dictation in order to get the best accuracy.

It takes a lot of expertise to make any sort of UI automation, fast and efficient for navigating and selecting text or out of focus menu items.

I've fussed around and managed to get tree sitter to navigate across code. For example generic commands are like 'next function'. Code simply isn't pronounceable when it's written by others. Therefore, navigating across generic tokens is really the best method. Then other methods can be used for fine navigation if needed.

My hope is that they develop a grammar system that is open source and integrates with accessibility frameworks focused on performance.

I wish I could have a phone call with the development team.

skydhash · on Nov 10, 2022

I think an accessibility a la vim or with something like tree sitter, would help immensely like:

  “Top of file
  Down 5 lines
  Modify import source to …
  inside first class
  Down 5 methods
  Insert new method after
  Inside arg list
  Append arg named … of type …
  …
  …”

And add a way to indentify types and parameters with special pronunciation.

FloatArtifact · on Nov 10, 2022

I recognize that all applications are not accessible through accessibility APIs. However, there is no high level access to accessibility APIs. There are quite a few for automated testing UI. However, none of them are performant enough for speech to code or screen readers. Testing automation frameworks don't really require high performance.

Accessibility accessing the content of the application and the context is what's important. It's more important than the speech recognition backend.

Speech recognition shines work best with a narrow context. (when those commands are available)

The type of performance we need as a speech recognition community and screen reader community is quite high. By the beginning of speech and just before decode time information needs to be available to be parsed for navigation/editing. That way these tokens can be weighted as commands for recognition.

Commands could be modeled after vim functionality though.

Outside of tree sitter it would be interesting to hook into hooking into as a client a language protocol server. However, I think they only expect one client. In addition, I still see that as a lesser approach without dedicated support for high performance UI automation server for speech recognition engine to leverage.

FloatArtifact · on Nov 10, 2022

Yes, minimizing number of command and specificity as much as possible for navigation by understanding the context of where the user is optimizes the user's time in navigation.

Imagine even more precise commands 'next function' followed by a letter. That allows you to navigate to only a function with that letter defined. Really the possibilities are endless when we have complete context of the screen and the structure of the code.

Someday I hope for the release of something like stable diffusion for voice coding. An open complete pipeline that users can illiterate fast and innovate!

Cort3z · on Nov 10, 2022

However weird and seemingly useless this might appear to the normal programmer on here, I see this as a huge accomplishment and an incredibly important tool. Why? Accessibility.

Let’s hope that I never get in a serious accident or get an disabling disease, but if I do I am not planning on giving up coding. What would you do if you lost your hands, or became permanently paralyzed. This is the tool we need to combat that. Hats off to github on this one.

bamboozled · on Nov 10, 2022

People who cannot see or use a keyboard already use tools like this to code. Been doing this for a long time.

MauranKilom · on Nov 10, 2022

Related: https://www.youtube.com/watch?v=MzJ0CytAsec

It does look like we've made some progress in the 15 years since. I do wonder how this would work in an office setting though - so much noise, so much distraction, and so much crosstalk between programmers...

avian · on Nov 10, 2022

> I do wonder how this would work in an office setting

Everyone gets a throat mic and the cubicle farm is full of unintelligible whispering instead of clacking of keyboards? Can't wait for the future. /s

Hortinstein · on Nov 10, 2022

Hahahah thank you for posting this, I was about to go look for this because I remember being in tears laughing when I saw it this first time and immediately thinonof this whenever I see voice controlled things

geewee · on Nov 10, 2022

Having programmed and navigated my PC via voice exclusively for about 6 months, done a ton of research and written several articles about it and what options are out there [0][1], I think might be pretty ground-breaking stuff.

Inputting code with voice is generally difficult, often due to variable names, casing, punctuation etc being hard to get right in voice-to-text. I think this might help quite a lot with that.

_However_, some of the hardest things in voice coding isn't necessarily just the input. Navigating large codebases is hard, and particularly editing existing code can be extremely difficult, probably much more difficult than just inputting new code.

I have my doubt that with the demonstration shown here, that it's able to make complex editing tasks simple, but if it does - I cannot overstate how huge of a leap forward it is.

[0]: https://www.gustavwengel.dk/state-of-voice-coding-2017/ [1]: https://www.gustavwengel.dk/state-of-voice-coding-2019/

jovial_cavalier · on Nov 10, 2022

>Having programmed and navigated my PC via voice exclusively for about 6 months...

I'm curious, why have you done this?

FloatArtifact · on Nov 10, 2022

I can't speak for his use case. However, people with medical conditions like RSI, stroke or anything that limits their action between keyboard and mouse.

However, the average developer doesn't need those fine-grained navigation controls but can still benefit from enhanced input through voice. Some have mental disabilities who interface differently. Others are simply supplement their input as an average developer by voice as a preventative measure for repetitive strain RSI. The day the hope is develop something that every developer could see the value and leverage. In a way accessibility is for everyone.

In general I see accessibility as a hierarchy that could benefit everyone. Accessibility APIs, close to real time OCR, Eye tracking, alternative inputs (eg pedal, touch pad, stylus) allowing for the broadest possible input and APIs to extract information from applications. Extraction of information from applications and input to applications allows the user to specialize for their use case.

My experience as people will become experts in voice their command vernacular shortens as they carve out their niche use case. It goes beyond singular shortcuts too series of actions to get stuff done. However, what really means to happen is voice systems need access to the OS and to the application to really shine. That would empower not only navigation for those that are disabled but context-specific commands that are intuitive and abstracted like next function or parameter.

geewee · on Nov 16, 2022

I had very bad RSI

birriel · on Nov 10, 2022

In the meantime, Talon is pretty good. You can use Vim motions and commands as you normally would, except using your voice (this applies to any editor, really):

https://talonvoice.com/

willjp · on Nov 10, 2022

Talon is exceptional, I only wish it was more natural to drive cli commands, I find I need to spell them out which I’m still quite slow at.

pfd1986 · on Nov 10, 2022

I think commenters here are -- as usual -- missing the point. This is the training ground (literally) for better models able to respond to commands like "take the CSV from me desktop, plot columns A and D and check if the KL divergence os close to zero". And from that to more complex tasks. You always need the first step and this is it.

I'm bullish.

BiteCode_dev · on Nov 10, 2022

Exactly.

Copilot is getting better everyday, because it's learning from the way we are using it.

rahulpandita · on Nov 10, 2022

GitHubNext here! We appreciate your support.

rahulpandita · on Nov 10, 2022

GitHubNext here! We appreciate your support.

onion2k · on Nov 10, 2022

I've tried writing documentation and fiction using text-to-speech and, for me, it doesn't work because the apparently the of my part brain I use to think about what I'm going to say is the same part I use to actually say it, so I can't do both things at once. I end up writing far more slowly than I can type.

singularity2001 · on Nov 10, 2022

In case anyone else stopped after watching the video, if you scroll down a bit further you see the list of

FEATURES

Write/edit code

Just state your intent in natural language and let Hey, GitHub! do the heavy lifting of suggesting a code snippet. And if you don't like what was generated, ask for a change in plain English. Go to the next method

Code navigation

No more using mouse and arrow keys. Ask Hey, GitHub! to...

    go to line 34
    go to method X
    go to next block

Control the IDE

"Toggle zen mode", “run the program”, or use any other VisualStudio Code command.

Code Summarization Don’t know what a piece of code does? No problem! Ask Hey, GitHub! to explain lines 3-10 and get a summary of what the code does.

Explain lines 3 - 10

susrev · on Nov 10, 2022

All i could think of while looking at this was having to tell Siri where every comma and period should go while texting with it.

"insert curly brace", "insert semicolon", "insert insertion", etc. does not sound to fun.

pmontra · on Nov 10, 2022

My reactions to the demo (when all is good there is no reaction, so here are only the problematic ones, sorry)

1) import matplotlib.pyplot as plt

Why "as plt"?! Let the import alone. But this is a matter of style.

2) Get titanic csv data from the web [...]

Surprise, it turns out that "the web" is an URL on raw.githubusercontent.com Hopefully I'll be able to spell an URL of my choice

3) clean records from titanic data where age is null

Somehow I already know that there is an Age field and somehow it knows that it must capitalize age into Age

4) fill null values of column Fare with average column values

The generated code looks great but somehow I managed to spell a capitalized Fare this time :-) (this is probably a typo in the demo)

5) Hey,Github! New line

Inserting a new line can't take so many words. We're going to do without new lines or rely on a formatter or something equivalent.

6) plot line graph of age vs fare column

This is where it becomes evident that there was no need to import as plt because I'm not pressing those keys anyway. But this is style and it's going to be uniform across all the users of these tools.

7) Hey, Github! Run program

Good.

Considerations:

A) Why do commands (new line, run) need "Hey, Github!" which is pretty long and terrible to repeat all the day long (just imagine having to say Hey Joe every time we have to say a sentence to Joe, withing a long conversation with Joe) and text-to-code doesn't?

B) We got a graph at the end. Now what should I do to edit the code in those 99% of cases where I got the graph wrong? An acceptable answer could be mouse and keyboard. It's a little underwhelming but voice to code already gave me the structure of the code.

C) Does that mean that Microsoft and GitHub are going to know all the closed source code we'll write for our customers (there might be contractual implications) or is this something that will be self hosted in our machines?

rahulpandita · on Nov 10, 2022

GitHubNext here! Here is a little writeup that explains a bit more about the project https://github.com/githubnext/githubnext/tree/main/HeyGitHub

Hope this is helpful :)

hcnews · on Nov 10, 2022

To note, there's a class action lawsuit against GitHub Co-Pilot since it learns from a bunch of open source code with very specific licenses. It's very interesting from establishing copyright in an AI training perspective. Hopefully it goes the distance and some nuanced arguments come out in the court case.

https://www.theverge.com/2022/11/8/23446821/microsoft-openai...

nightski · on Nov 10, 2022

Spoken language is incredibly ambiguous. It's one thing to generate a drawing which can vary wildly in output and still be acceptable. It's another to specify something precisely to a computer. Working with non-programmers on a daily basis it is incredible how difficult it is to communicate even relatively simple things without confusion.

So all the more power to them, but I am very skeptical. Especially since co-pilot has zero knowledge of the formal semantics of programming languages.

This is a lot different than the half ass auto complete that it already does since that at least has some context.

tluyben2 · on Nov 10, 2022

It's the same with copilot; you have to know how to implement things to implement things with copilot (for the most part), but when you are a programmer and you could write the code, then you know the prompt to write to generate 10+ lines of code for 1 comment of text all day long. Especially for data transformation, copilot has been a real magic tool; if you put in a comment:

      /*
      this functions transforms this json from: 

      { 
          ... some complex structure in json 
      }
 
      to this json: 
   
      {
          ... some different structure in json 
      }
 
      */

... copilot comes up with the function that takes in the first and spits out the latter. Even if the fieldnames do not match etc, it usually 'guesses' right what fits on what (so it does have some context from it's learning phase what 'looks alike' or 'might be the same thing'. Example: I had a structure with firstName: string, lastName: string and a target structure with name: string; it just did name: firstName+' '+lastName, which was indeed what I wanted. But it comes up with more intricate stuff as well that is pretty much surprising (too human basically).

What is another bonus; if you generated function transfromAtoB(a: A) above, then you only have to do:

      /*
      do the reverse of function transfromAtoB, accept json structure B as input and return structure A
      */

And it'll come up with the reverse.

It's not hard to write yourself, but it's boring and error prone (some of these structures are huge). Now I press tab a bunch of times, and run the tests to see if it worked. I am also not that worried i'm infringing someone's open source code; this is all way to custom to look like anything else. That's where this shines; things where it verbatim copies something, you should've been using a library anyway.

Statically typing and using typescript definitely works better than other combinations I have tried (C# was pretty bad last I tried it, JS is good but often subtly wrong because of type issues).

singularity2001 · on Nov 10, 2022

With copilot ambiguous language gets transformed into concrete syntax. If the implementation doesn't fit your ambiguous request, you should be able to refine … with ambiguous language. Theoretically this would create a "programming dialog" environment.

nightski · on Nov 10, 2022

So you are going to have to verbalize something, interpret the code, build a mental model of how it works, and then if it does not match what you want go back to step 1?

That sounds exhausting when we have spent countless human years developing languages which let us communicate our intentions precisely to computers.

If you don't do this there is no ambiguity detector. Meaning it's entirely possible for the computer to interpret what you are saying completely different than intended, yet it is a perfectly valid interpretation. So the only one who can qualify if it got it right is you.

jasonlfunk · on Nov 10, 2022

I probably wouldn’t use this to write code, but I could see it being really useful for navigating around a project.

“Go to line 35” “Open the model controller” “Show the get method and set method side by side”

anshumankmr · on Nov 10, 2022

if you remember the keyboard shortcuts, you can be quite fast while working with VSCode. Voice will never be perfect.

amarant · on Nov 10, 2022

Oh cool, my brother used to wish out loud something like this existed a few years back when his wrists were really killing him. He's wrists were so far gone he couldn't even type on a ergonomical keyboard for any greater duration of time, so he used to wish he could just talk instead.

For me, I got a ergonomical keyboard before my wrists went bad, and so far they seem to be holding up!

Moral of the story: get a good keyboard early, or you might need a tool like this one someday!

wooptoo · on Nov 10, 2022

Hey Github what did the previous developer actually _mean_ with this piece of legacy code?

evnix · on Nov 10, 2022

Eye strain is one reason I have been waiting for something like this. If I could close my eyes and just navigate the codebase through a mental modal and some voice commands, I really wouldn't mind paying!

I have looked at some tools for the blind, but you need just way too much dedication for it to work for you and since you have working eyes it is usually easier to just open your eyes.

glenjamin · on Nov 10, 2022

There was an excellent talk at Strange Loop a few years ago by Emily Shea about how she'd learnt to code vim using her voice to combat RSI.

https://www.youtube.com/watch?v=YKuRkGkf5HU

The demos are in Ruby, but I could imagine that languages with strong type-aware auto-completion could be easier to do.

philmander · on Nov 10, 2022

This is effectively a new higher level programming language without a fixed syntax. Describing more the "what", not "how", and being much closer to natural language over computer language.

The voice part seems like an (albeit important) accessibility add on.

I'm sure it won't be perfect but an amazing step forward in the evolution of programming languages

meowface · on Nov 10, 2022

I could be wrong, but I think (minus editor commands) much of this can be emulated in existing Copilot by writing a comment symbol followed by natural language. I wouldn't even be surprised if under the hood "Hey, GitHub!" is basically doing exactly that with the voice input.

silverlake · on Nov 10, 2022

I’m working on something similar. The target market is the 99% of people who want to program ad-hoc domain-specific problems. For example, generating charts w/o having to dig through all the data sources (Wolfram Alpha does a simple version of this). Building a financial risk model for a client’s specific request (you have to be a whiz at Excel, python or some internal ide). Even for home automation, my mom can’t use Alexa’s awful app to customize routines.

I don’t think the voice part is necessary. It’s easy enough to slap ASR on the front. But going from natural language -> full problem spec -> code is hard in the general case, but doable in well-understood domains. Why can’t Scotty talk to a computer? (https://youtube.com/watch?v=hShY6xZWVGE&feature=share)

lakomen · on Nov 10, 2022

Imagine sitting there, talking to your computer, and trying to get the notations right.

If err unequal nil opening bracket, no no don't open the racket opening bracket... BRACKET, do you know what a bracket is No don't do a do while, delete delete. Don't delete everything... sigh

Well something like that, I imagine it being a very painful experience.

dorkwood · on Nov 10, 2022

Maybe you could use little clicks and pops with your mouth to signify different characters. The computer could learn which one is which. That way instead of typing

"if (int i = 0; i < count; i++)"

you could say something like

"if beep int i click zero boop i bop count boop i pop pop zing"

This would achieve the same thing, but much faster and with less effort than typing.

xigoi · on Nov 10, 2022

If saying that is faster than typing it, you're really slow at typing.

3D30497420 · on Nov 10, 2022

Or debugging. Goodness. I can only imagine.

Quequau · on Nov 10, 2022

I remember a talk given some years ago by a man who was using voice to text for creating source code. The key point I remember from his talk & demonstration is that it was not casual ordinary speech but instead a very weird mashup of sounds intended to represent the various symbols which we use in source code.

simme_ · on Nov 10, 2022

I think you're talking about this video: https://youtu.be/8SkdfdXWYaI?t=1049

Quequau · on Nov 10, 2022

Yes! That's the talk.

nxpnsv · on Nov 10, 2022

GitHub is doing a whole lot. I think I prefer to edit my code in an editor, not on the website where it's hosted. And I think I don't want fancy AI driven code editor features using my code either. But I guess it is nice they are considering solutions for vision impaired users.

raidicy · on Nov 10, 2022

I really hope this is very easy to use. I have severe RSI and can barely surf the web. I tried using other voice to code stuff and it just hurt my voice so I'm hoping I can speak very naturally. I'm really looking forward to seeing if this can help me code again.

pcj-github · on Nov 10, 2022

I could see it being useful for things like "goto line 42" or "rename this file as...", or very simple things like that, otherwise, I don't want the cognitive overhead of having to translate coding intent through a voice interpreter.

falcor84 · on Nov 10, 2022

I think this, or a future version of this, would have real potential.

I'm thinking about this in terms of the navigator-pilot pair programming approach, and believe that as a senior, if it's even half-as-good as working with a fresh out of uni hire, then it could have real value. When there's a piece of code that I would like written, when I have good test cases in mind, but would prefer to offload it on someone, I could perhaps write the test cases and function signatures (maybe with the bot's help), get the bot to fill in the blanks until it passes the tests, and then give it direct feedback on how to refactor the code.

I've signed up for the waiting list and am excited to try this out.

knutzui · on Nov 10, 2022

What you are describing is more akin to what GitHub Copilot already does. It is really good at taking a description and a function signature and producing a solution. Paired with a solid test suite it can definitely speed up development in my experience.

kgrax01 · on Nov 10, 2022

People can’t seriously believe this is going to be useful at all?

I can see this helping as an accessibility tool, but beyond that I don’t think it will be useful. This kind of assumes you know everything about what you’re doing, most of the time you don’t.

boredumb · on Nov 10, 2022

As someone who works remotely from home, the last thing I need is to start babbling to myself in code for 8 hours a day. I imagine that's a one way ticket to developing some sort of disorder.

ddevault · on Nov 10, 2022

Someone emailed me the other day to share their FOSS voice control system. I was really impressed. It seems to map syllables onto actions in a modal sense ala vim. If I were to build a voice control system, it would look much like this.

https://numen.johngebbie.com/index.html

It's free software, it's local to your machine, you don't have to sign up for it, and it works today.

okasaki · on Nov 10, 2022

Great for accessibility, but I don't see this would work well in an open office, or even at home if other people are around. Seems really annoying.

tempodox · on Nov 10, 2022

Imagine using this in a setting where you're not alone in the room. Imagine using this surrounded by other developers who do the same.

danwee · on Nov 10, 2022

Curious: In "Clean records from titanic data where age is null", how does it know that the age field is exactly `Age` and not just `age`? You cannot know this without examing the data set (the headers), so is the software inspecting the loaded CSV "on the fly" before us telling it to actually execute the code?

kevmo314 · on Nov 10, 2022

Why are all the comments here so negative? Maybe typing is a hard sell, but some of the navigation stuff seems quite useful. Even being able to invoke VS Code's command palette would be really cool with this. Something like "Open Dockerfile" would be useful and maybe faster than typing.

lkrubner · on Nov 10, 2022

My worst prediction ever was at the end of my book, when I struck a positive note about voice interfaces. The startup I was at in 2015 had the pitch "Let your sales people talk directly to Salesforce" and we pushed the limits of what we could do with NLP. That particular startup had spectacularly bad management and so it flamed out in a series of screaming, raging fights, which I documented here:

https://www.amazon.com/Destroy-Tech-Startup-Easy-Steps/dp/09...

But at the end of the book I struck an upbeat note, about how the technology was advancing quickly and within 3 or 4 years someone would achieve something much greater than our own limited successes.

But I was wrong. 7 years later I'm surprised at how little progress there has been. I don't see any startup that's done much better than what we did in 2015. Voice interfaces remain limited in accuracy and use.

hintymad · on Nov 11, 2022

So this is a frontend of Copilot. The example of "import pandas" getting translated into "import pandas as pd" is pretty convincing, as the tool helps developers to state their intentions. On the other hand, "hey, github, a new line" kills me.

lleontop · on Nov 10, 2022

We have come a long way. I remember when announcements like this one were done by companies on April 1st!

squarefoot · on Nov 10, 2022

If translation is semantic and not literally identical, chances are that the user asks for a piece of code and it outputs something that is 100% identical to code that is copyrighted elsewhere. Big "blame the AI" legal loophole waiting to happen?

karmasimida · on Nov 10, 2022

Actually would be useful.

If this is reliable I would pay to use it to some capacity, like add an argument.

crucialfelix · on Nov 10, 2022

I spent half an hour today trying to convince the O2 voice agent to get me a real person. Conversational AI is a special kind of hell filled with unhappy paths.

But for a glimpse of the future watch The Expanse or read William Gibson's Agency.

darepublic · on Nov 10, 2022

Execution is everything with this. I've wanted something like this so I could actually code while performing other activities or in various states of intoxication. Don't code and drive. Don't drink and code

Tade0 · on Nov 10, 2022

I hope to see the click consonant "‖" adopted as "||" one day.

tabasselejambon · on Nov 10, 2022

Let's try to picture the noise in an openspace full of people using that ... focusing is going to be difficult, well at least for people like me who are easily distracted by background noise/conversations.

troelsSteegin · on Nov 10, 2022

What if the code in question is a DSL? Something say that is syntactically python, but with a namespace defined through a narrow set of imports. This would be interesting to explore for end-user scripting.

mtkhaos · on Nov 10, 2022

Nice attempt and interesting workflow using a prompt based transformer. I would prefer being able to spawn a command palette and skip over the voice, alongside having the choice between different variations.

gopheryourshelf · on Nov 10, 2022

Imagine an office where everyone is sitting screaming at their computer.

P5fRxh5kUvp2th · on Nov 10, 2022

Programming Perl with speech recognition (an oldie but goodie)

https://www.youtube.com/watch?v=vPXEDW30qBA

mindvirus · on Nov 10, 2022

This is awesome. I could see using this to write code on my phone even.

manesioz · on Nov 10, 2022

Interesting. I would find this annoying because its so different from what I'm used to, but the potential it has for people with disabilities is huge.

kdmytro · on Nov 10, 2022

This is not going to play well with open-space offices.

WormholeCreator · on Nov 10, 2022

it is not practical if we have to describe each and every line.

Also, imagine you are sitting in an office with other team mates - what happens if all of them talk together but are working on different projects. It will disturb others in terms of noise pollution.

but it will definitely be a fun project and might work perfectly when you are working alone from home.

iillexial · on Nov 10, 2022

Those who say it's useless, what do you think about blind people using this, or those who couldn't type?

dimazhlobo · on Nov 10, 2022

Why does the oauth scope requires to “operate on your behalf” but the app is “not owned or operated by GitHub”.

:/

qntmfred · on Nov 10, 2022