* Line-of-Code Completions - Kite's completions engine can now predict several tokens of code at a time, powered by the most sophisticated AI code models available.
* Cloudless Processing - Kite now performs all processing locally on users' computers, instead of in the cloud. No need to upload your code to our servers. You don't even have to sign up for a Kite account.
We know privacy is a big concern for users, so that's why we decided to bring Kite off the cloud. You can learn more about this decision as well as the full Kite release on our blog post, linked here.
Our core belief is programmers spend too much time on repetitive work like copying and pasting from StackOverflow, fixing simple errors, and writing boilerplate code. That's why Kite uses AI to make writing code less repetitive and more fun.
Speaking of fun, we've set up a playground for you to try our Line-of-Code Completions out in your browser. We hope you enjoy it!
As always, Kite is free to download and use. And we no longer require user accounts now that we've moved off the cloud.
If you already use Kite (thank you for your support!), you now have these features via auto-update.
We're really looking forward to your feedback. The detailed feedback we've received in the past has been immensely helpful in getting us to this point. We'll be here all day to answer questions, too.
I am interested (and I bet more people here are too) in getting a comprehensive list minus the marketing BS.
It may be useful to differentiate between what data we collect and what data we don't collect in order to clear up any confusion.
1. What we collect
We collect a variety of usage analytics that help us understand how you use Kite and how we can improve the product. These include:
* Which editors you are using Kite with
* Number of requests that the Kite Engine has handled for you
* How often you use specific Kite features, e.g. how many completions from Kite you used
* Size (in number of files) of codebases that you work with
* Names of 3rd party Python packages that you use
* Kite application resource usage (CPU and RAM)
You can opt-out of sending these analytics by changing a Kite setting (https://help.kite.com/article/79-opting-out-of-usage-metrics).
Additionally, we collect anonymized "heartbeats" that are used to make sure the Kite app is functioning properly and not crashing unexpectedly. These analytics are just simple pings with no metadata, and as mentioned, they're anonymized so that there's no way for us to trace which users they came from.
We also use third party libraries (Rollbar and Crashlytics) to report errors or bugs that occur during the usage of the product.
2. What we don't collect
* Contents (partial or full) of any source code file that resides on your hard drive
* Information (i.e. file paths) about your file system hierarchy
* Any indices of your code produced by the Kite Engine to power our features - these all stay local to your hard drive
Just like any other service, there's no guarantee that you won't start collecting snippets of source code or other metadata to start selling once the VCs start applying pressure to generate income. What's your strategy?
Shouldn't this be opt-in according to GDPR?
> * Line-of-Code Completions - Kite's completions engine can now predict several tokens of code at a time, powered by the most sophisticated AI code models available.
Won't this result in the same copy-paste-like functionality as StackOverflow? This seems great for small projects, but doesn't look useful for anything else. I'm skeptical that Kite provides anything of value compared to all of the other tools I have.
The website has mentioned that other languages "are coming soon", though it has said this since the last Kite scandal. Are these just empty words?
No, because it isn't personal data.
That is a huge step towards making us comfortable with the potential privacy aspects.
Do you collect any information?
And how will you make money? (I want you to be around, pay staff, do well etc)
TabNine is a different approach that uses far less semantic information than Kite. It really shines when you're talking about syntactic repetition, e.g. if you check out the screenshots on their homepage. They have a page about semantic completions, but the semantics are very shallow -- basically what attributes are on the instance you're accessing.
In contrast, we've spent ~50 eng-years semantically indexing all the code on Github, building statistical type inference, and rich statistical models that use this semantic information in a very deep way. The result is that Kite can help more often, in ways that reflect a deep understanding of the semantics of the code you're writing.
> If you're already a user, Kite has been auto-updated and is now working locally without sending code to the cloud. If you previously uploaded code to our servers, you can remove your data via our web portal.
> We're grateful to our users for helping us reach this point.
Why aren’t you scrubbing all surreptitiously uploaded data proactively?
Why are you pretending you are glad you got caught doing sketchy things?
You’re going to have to do a lot better than that to convince us that your company suddenly operates with integrity now simply because you got caught.
Do you have data backing this core belief up or is it just an intuition? I don't feel any of these things are problems I need a solution for. Maybe this depends on the type of programming someone does or the language they use but as a working programmer this is not a compelling sales pitch to me. You're offering solutions to problems I don't have. They're also not problems that I see to any great extent in junior programmers who report to me.
More concretely, the StackOverflow thread for "Parsing values from a JSON file?" has almost 3 million views. We take this as evidence that programming is repetitive.
(We should probably clarify that when we say 'repetitive' we mean on a global scale, not an individual scale.)
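For illustration, the task behind that thread takes only a few lines of Python, which is why it gets retyped so often. This is a minimal sketch, not code from the thread itself: the helper name `read_json_value`, the file name `config.json`, and the sample keys are all assumptions made up for the example.

```python
import json
import os
import tempfile


def read_json_value(path, key):
    """Load a JSON file and return the value stored under a top-level key."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)  # parse the whole file into Python objects
    return data[key]


# Hypothetical sample file, written to a temp directory so the sketch is self-contained.
sample_path = os.path.join(tempfile.mkdtemp(), "config.json")
with open(sample_path, "w", encoding="utf-8") as f:
    json.dump({"name": "kite", "version": 2}, f)

name = read_json_value(sample_path, "name")
version = read_json_value(sample_path, "version")
```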
I've always avoided dynamically typed languages because of these types of issues so I guess this seems like a solved problem to me - just use a statically typed language. Clearly there are many people who are using dynamically typed languages for whatever reason though so I can see how better tooling might be useful to them.
> More concretely, the StackOverflow thread for "Parsing values from a JSON file?" has almost 3 million views.
I certainly make use of Stack Overflow. I was taking issue with the idea that significant time is spent copying and pasting code from the site, though. I often refer to it to answer a programming-related question, but I almost never then copy and paste code. I'm looking for a pointer to a library or API, an explanation of some confusing or poorly documented behavior, or a workaround to a bug, and getting an answer to my question rarely leads to copying and pasting code in my experience.
I've been using Python on a daily basis for close to a decade now, and I can probably count on one hand how many times a bug has been caused by casting a string to an int.
Sounds like you will likely be introducing bugs into people's code. Or at the very least, regressing code quality towards the mean.
Moreover, there are common errors you can avoid if their ranking algorithm discourages code reported/identified as bad.
With a keyboard and mouse and large monitor, you can display the completion prompt and let the user select or ignore from the various options with Ctrl-Space or similar.
When "Select the first autocomplete over what I typed" is the default behavior, yeah, that sucks. I often end up using the "key combo" Space,Backspace,Space to accept the autocomplete on my phone.
But I definitely see utility in offering a correction for, e.g., `DateTime.Format("yyyy-mm-dd")` -> `"yyyy-MM-dd"`, or any of the million other idioms that we have to remember or look up. What I don't understand is how Kite can offer useful local-only suggestions: is the default dictionary that comes with it preloaded with a million lines of open source, or is it literally just my code being suggested?
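The format-string mix-up mentioned above has a close analogue in Python, where `%m` is the month and `%M` is the minute. A small sketch of the pitfall, using an arbitrary timestamp chosen for the example (the variable names are hypothetical):

```python
from datetime import datetime

# Arbitrary example timestamp: 28 January 2019, 09:05
moment = datetime(2019, 1, 28, 9, 5)

# %M is minutes, so this silently puts "05" where the month should be.
wrong = moment.strftime("%Y-%M-%d")

# %m is the zero-padded month, which is what an ISO-style date needs.
right = moment.strftime("%Y-%m-%d")
```

Both calls run without error, which is what makes this kind of idiom easy to get wrong and a plausible target for a suggested correction.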
The intuition here is that in natural language, context is defined locally. If you want to know whether a 'they' refers to a male or female, for example, you look at the nearby text. In contrast, the flow of data and control through code is highly non-local, which is a lot of why techniques like N-grams don't work.
This has been a very active area of research in academia since we started Kite in 2014. (Suggested google searches: [big code], [ml on code], etc.)
All that said I don't have data on how our approach would compare to N-grams, but I'm guessing if you look at some of the academic research, you'll find that NLP techniques were abandoned, despite the early papers' focus on them.
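To make the non-locality point concrete, here is a toy token trigram completer. It is only a sketch under stated assumptions: the three-line corpus, the `NEWLINE` token, and the function names are invented for the example, and it says nothing about how Kite's models actually work.

```python
from collections import Counter, defaultdict


def train_ngrams(token_lists, n=3):
    """Count which token follows each (n-1)-token context."""
    model = defaultdict(Counter)
    for tokens in token_lists:
        for i in range(n - 1, len(tokens)):
            model[tuple(tokens[i - n + 1:i])][tokens[i]] += 1
    return model


def complete(model, prefix_tokens, n=3):
    """Suggest the most frequent next token for the last n-1 tokens, or None."""
    context = tuple(prefix_tokens[-(n - 1):])
    candidates = model.get(context)
    return candidates.most_common(1)[0][0] if candidates else None


# Toy corpus: x is a string twice and a list once.
corpus = [
    "x = 'a' NEWLINE x . upper ( )".split(),
    "x = 'b' NEWLINE x . upper ( )".split(),
    "x = [ ] NEWLINE x . append ( 1 )".split(),
]
model = train_ngrams(corpus)

# x was assigned a list a few tokens earlier, but the trigram only sees ("x", ".").
suggestion = complete(model, "x = [ ] NEWLINE x .".split())
```

The model suggests `upper` for a list because the assignment that determines the type of `x` falls outside its two-token window.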
Did you publish any papers on how you approach intelligent code completion at Kite (the parts you can make public, or just rough areas I should look into)? I understand you have many emerging competitors. Thanks!
Thanks for checking it out!
Their only redeeming move could be to open source their project, and build a business model around that. Otherwise I don't see how they will ever gain back our trust, since they have repeatedly demonstrated a willingness to deceive users.
> “I have to wonder if your goal was to upset enough people that you'd generate real attention on various news sites and get Kite a ton of free publicity before your next funding round,” @DevOpsJohn wrote
It seems to have worked out great.
>When we started Kite, we were excited about the possible benefits of internet-connected programming. We were also well aware of the privacy and security concerns some people would have.
So their approach is going to continue being to ignore what they did and pretend they're cool now. Alright, well you can download the maybe-malware if you want, I'll stay as far away as I can from this company myself.
I mean, I don't necessarily disagree with you, they have to regain trust, but at some point you have to be open to it.
Trust but verify.
I would like something like this, but they need to address this more thoroughly. It's not just a footnote.
>Trust but verify.
What does that actually mean? If I verify, then did I trust?
I don't want to verify, but if they expect us to be doing that work for them they sure as hell need to open source it.
Well no, we don't. Trust is difficult to gain and easy to lose, and almost impossible to regain. If the concept is sound then there will be others.
IMO, the devs have the freedom to do whatever they want with the code they maintain. If the users don't like it someone can fork and maintain their own version.
>Is a $4 million venture capital-funded startup stealthily taking over popular coding tools and injecting ads and spyware into them?
Ads? Yea, I guess you can call cross product promotion an ad, but it's far from flashing banners up for sale on ad exchanges. Spyware? Hardly. A service uploading data and responding with results is a perfectly legitimate interaction.
This reductive description is too vague to bear the label "legitimate interaction". Context matters, no?
They took more of the Atlassian path of no/low-VC and slower organic growth starting in the early 2000's, so they've had a lot of time to build the "snowball" effect.
I wonder what their take on, say, IntelliJ is.
I'm not sure what advantage I'd get with Kite.
Often I'd start using a class in the code, and then ask IntelliJ to import it, rather than going to the top and using the autocomplete tool. It can do patterns as well, even custom-defined ones, with default values (https://www.jetbrains.com/help/idea/using-live-templates.htm...)
Not everyone's goal is to build with investors' money. Sometimes it is easier to succeed without it.
It's not a question of 'easy or hard', it's a question of 'easier or harder'.
Being in the Czech Republic definitely makes it harder, hands down. Thankfully, they are in the EU, but it's still a completely different land that many VCs might not be remotely willing to touch. Consider that very few can even read the laws.
Czech startups would be limiting themselves to German and possibly Russian money in general, and it'd be harder to raise in the US.
JB is a great company that might be able to 'easily' raise money, but even then there could be 'red lines' for many firms that just make investing there impossible. For example, a firm may require they re-incorporate in London or Luxembourg or something, where there are far better established commercial laws.
Open Delaware C-corp and hire European programmers remotely. Either via consulting agreements or wholly-owned European subsidiary. Our company (tensorflight.com) did that.
After fundraising with both European and American VCs, indeed the VC market in the USA is much better.
Those of us who are fossil grumpuses already think IDEs often allow people to write code they don’t think enough about with the idea that issues will be caught by someone else in code review. This, at least, is what I’ve observed over the last few years.
Something that writes the code for people and people basically “code” by doing a series of micro-code-reviews seems really crazy to me for any application that isn’t just fluff. Just look at what autocorrect has done to average incorrect-words-per-sentence. One of the problems with predictive text generation in general is that in isolation the output can seem very sensible even if it’s gibberish.
So as an IDE skeptic in general, I’d be very curious to try this tool out, if only to see how they deal with that.
[I spent years and years using VC++ and other tools and it was actually this feeling of not really knowing anything that drove me away from it. Etags/Cscope/grep/actually-reading-code was what I replaced it with.]
The problem with autocorrect is that it changes what you've written after the fact, without user input. This on the other hand is autocomplete, so if it suggests gibberish, you still need to explicitly accept it. If you do that, chances are you would've written gibberish in the first place.
Similarly, imagine going to type a common piece of code such as `if __name__ == '__main__':` or `def __init__(self):` at the start of a class, and have it automatically suggested. Obviously you can manually create snippets for each one of these, but that's much more effort.
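For anyone who doesn't write Python daily, this is the kind of boilerplate being described. The `Greeter` class and `main` function are hypothetical names used only to give the two idioms somewhere to live:

```python
class Greeter:
    def __init__(self, name):
        # Constructor boilerplate: typed at the top of almost every class
        self.name = name

    def greet(self):
        return f"Hello, {self.name}!"


def main():
    print(Greeter("world").greet())


if __name__ == '__main__':
    # Runs only when the file is executed as a script, not when it is imported
    main()
```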
I can see an interface similar to Gmail's new smart completion, or fish shell, that just shows you a suggestion in the background.
The example you link (from Zsh, not fish) is a fancy looking history autocompletion: in bash, just press Ctrl+r.
The parent has a point: I've spent time working with absolute beginners in programming and the first thing I was teaching them was to ignore the IDE "smarter autocompletion" and suggestions. For typing faster, sure; for suggesting anything else, not so clever.
This one suggests possible completions as you type, so it's a passive feature, compared to ctrl+r which is more active and requires explicit action to work.
And yes, I agree that the whole purpose is to write faster, not to write smarter. I'd maybe add cleaner too, because unless you auto format your code, people often don't write the best by default.
Thanks for making it run locally, I can finally give it a real try now.
Look, I'm all for second chances here. The past behavior of this company doesn't concern me as much as the pretty clear lack of strategy around how you plan to make money.
So this is what I've gathered so far (and correct me if I'm wrong on any of this):
- free product (based on the website)
- no serious mention of enterprise pricing or support (based again on the website)
- a promise in your TOS not to collect sensitive data (which almost all companies tend to modify, let's be honest)
- solicitation of monetization ideas on an internet forum by the CEO, having raised millions from VCs
This sounds like a product I'd stay away from if I cared about data collection.
Can you quantify how much money you're saving developers or companies who use this?
I don't want to complete with all the other code everybody else in the world has written. I want to complete with the words I wrote in a comment 3 lines ago. Or my README I've got open in another window. Or the JSON config file I was editing. That's my litmus test: can it autocomplete from all my other text? Kite can't.
I played with this online demo, and I actually found it pretty frustrating. It kept trying to replace what I was writing with snippets that other people, apparently, had written. I guess that could be useful when trying out a new library, but the rest of the time, it's just going to get in my way.
Codementor meets github.
Edit: Totally forgot to mention, exercism is a bit like this but not with your own custom code. Maybe codementor meets exercism meets github.
I can predict several weeks of stock prices... not very well.
Shannon had an estimation method: ask people to guess the next letter in text, to find its information content. (Assuming people have a perfect model of text - probably, today, with billions of samples, machines might be better?). Could do this with program text, to bound the benefit.
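A rough sketch of the machine version of that experiment: swap the human guessers for a simple character n-gram model and measure the average bits per character it needs on some program text. Everything here is an assumption for illustration (the order-3 context, add-one smoothing over a 256-symbol alphabet, and the toy training text), and the figure it gives is only an estimate under this particular model, not Shannon's original procedure.

```python
import math
from collections import Counter, defaultdict


def train_char_model(text, order=3):
    """Count which character follows each `order`-character context."""
    counts = defaultdict(Counter)
    for i in range(order, len(text)):
        counts[text[i - order:i]][text[i]] += 1
    return counts


def bits_per_char(counts, text, order=3, alphabet_size=256):
    """Average -log2 p(next char | context), with add-one smoothing."""
    total_bits = 0.0
    n = 0
    for i in range(order, len(text)):
        context = counts.get(text[i - order:i], Counter())
        p = (context[text[i]] + 1) / (sum(context.values()) + alphabet_size)
        total_bits -= math.log2(p)
        n += 1
    return total_bits / n if n else float("nan")


# Toy training text: one idiomatic loop repeated many times.
train = "for i in range(10):\n    print(i)\n" * 50
repetitive = "for i in range(10):\n    print(i)\n"
novel = "zq!xv@@kw#jj$$pp%yy^^uu&&tt**\n"

model = train_char_model(train)
low = bits_per_char(model, repetitive)   # contexts seen in training
high = bits_per_char(model, novel)       # no context seen in training
```

The gap between `low` and `high` is a crude proxy for how much of a line a predictor could fill in: the fewer bits per character, the more of the text is predictable from what came before.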
Line-of-code completion is a great idea and might work well with idioms, especially if it can figure out the likely parameterization (e.g. in a for(;;) loop). I reckon this approach will nowhere near realize its promise... but will serendipitously reveal unexpected adjacent benefits.
Also reminds me of that joke tool that automatically finds and pastes Stackoverflow code.
Also, anyone pasting most of their code from SO is doomed already, and smart code completion isn't going to make them successful.
Is this Grammarly for "programmers"?
Quick suggestion - can you support a Docker-based install? VSCode now supports remote debugging through Docker, etc. I'm not sure about your architecture, but I'm wondering if this isn't something that you can do.
Instead, only engineer for Linux and make it available through Docker on all platforms. Since VSCode exposes hooks to interact with Docker anyway, this might make more long term sense.
"Huh... that's weird
Something unexpected occurred. We'll investigate what happened"
It makes Vintage mode unusable.
So we took the approach of focusing on one demographic (Python developers) and making them really happy. We think we've reached that point, so we're excited to begin looking at how we can expand our reach. Stay tuned!
While I agree with your sentiment that the changes were intrusive, from the article they weren't actually ads in the strict sense. No Ugg Boots or Rolex Replicas. They were helpful links to knowledge articles or documentation, from what I gather.
Could be wrong though.
That's gotta be a tough position to be in, having raised a lot of money and being under pressure to make a suboptimal move to please the investors.