Kite launches line-of-code completions, goes cloudless, secures $17M in funding (kite.com)
173 points by adamsmith 20 days ago | 86 comments

Adam for Kite here. We learned a lot over the past year and have worked hard to build several new features into Kite based on detailed user feedback (much of which came from the HN community). We're excited to release two new features of special importance today:

* Line-of-Code Completions - Kite's completions engine can now predict several tokens of code at a time, powered by the most sophisticated AI code models available.

* Cloudless Processing - Kite now performs all processing locally on users' computers, instead of in the cloud. No need to upload your code to our servers. You don't even have to sign up for a Kite account.

We know privacy is a big concern for users, so that's why we decided to bring Kite off the cloud. You can learn more about this decision as well as the full Kite release on our blog post, linked here.

Our core belief is programmers spend too much time on repetitive work like copying and pasting from StackOverflow, fixing simple errors, and writing boilerplate code. That's why Kite uses AI to make writing code less repetitive and more fun.

Speaking of fun, we've set up a playground for you to try our Line-of-Code Completions out in your browser. We hope you enjoy it!

As always, Kite is free to download and use. And we no longer require user accounts now that we've moved off the cloud.

If you already use Kite (thank you for your support!), you now have these features via auto-update.

We're really looking forward to your feedback. The detailed feedback we've received in the past has been immensely helpful in getting us to this point. We'll be here all day to answer questions, too.

Could you explain in detail what kind of analytics data you are collecting, and for what purposes it is collected?

I am interested (and I bet more people here) in getting a comprehensive list minus the marketing BS.

Great question. We're in the midst of updating our privacy policy, but I'll give a full explanation here.

It may be useful to differentiate between what data we collect and what data we don't collect in order to clear up any confusion.

1. What we collect

We collect a variety of usage analytics that help us understand how you use Kite and how we can improve the product. These include:

* Which editors you are using Kite with

* Number of requests that the Kite Engine has handled for you

* How often you use specific Kite features e.g. how many completions from Kite did you use

* Size (in number of files) of codebases that you work with

* Names of 3rd party Python packages that you use

* Kite application resource usage (CPU and RAM)

You can opt-out of sending these analytics by changing a Kite setting (https://help.kite.com/article/79-opting-out-of-usage-metrics).

Additionally, we collect anonymized "heartbeats" that are used to make sure the Kite app is functioning properly and not crashing unexpectedly. These analytics are just simple pings with no metadata, and as mentioned, they're anonymized so that there's no way for us to trace which users they came from.

We also use third party libraries (Rollbar and Crashlytics) to report errors or bugs that occur during the usage of the product.

2. What we don't collect

* Contents (partial or full) of any source code file that resides on your hard drive

* Information (i.e. file paths) about your file system hierarchy

* Any indices of your code produced by the Kite Engine to power our features - these all stay local to your hard drive

How do you guys plan on monetizing this? This list of analytics seems fine at a quick glance, but I'm concerned that there's not a transparent path to profitability here.

Just like any other service, there's no guarantee that you won't start collecting snippets of source code or other metadata to start selling once the VC's start applying pressure to generate income. What's your strategy?

> You can opt-out of sending these analytics by changing a Kite setting (https://help.kite.com/article/79-opting-out-of-usage-metrics).

Shouldn't this be opt-in according to GDPR?

> * Line-of-Code Completions - Kite's completions engine can now predict several tokens of code at a time, powered by the most sophisticated AI code models available.

Won't this result in copy-paste-like functionality, as with StackOverflow? This seems great for small projects, but it doesn't look useful for anything else. I'm skeptical that Kite provides anything of value compared to all of the other tools I have.

The website has mentioned that other languages "are coming soon", though it has said this since the last Kite scandal. Are these just empty words?

> Shouldn't this be opt-in according to GDPR?

No, because it isn't personal data.

But cookies that save your search queries are personal data and require opt-in? You can consider anything not to be "personal data". I still really want this to be opt-in, not opt-out.

Thanks for moving it off the cloud and making it local.

That is a huge step towards making us comfortable with the potential privacy aspects.

Do you collect any information?

And how will you make money? (I want you to be around, pay staff, do well etc)

Quick question: Is it doing much differently from TabNine[1]? For me TabNine has been incredibly accurate at guessing what I'm wanting to type.

[1] https://tabnine.com

Great question!

TabNine is a different approach that uses far less semantic information than Kite. It really shines when you're talking about syntactic repetition, e.g. if you check out the screenshots on their homepage. They have a page about semantic completions, but the semantics are very shallow -- basically what attributes are on the instance you're accessing.

In contrast, we've spent ~50 eng-years semantically indexing all the code on Github, building statistical type inference, and rich statistical models that use this semantic information in a very deep way. The result is that Kite can help more often, in ways that reflect a deep understanding of the semantics of the code you're writing.

Not to mention by pushing malware to people via IDE addons, pretty good strategy for training those datasets.

Interesting - is the model that learns from (semantically indexed) code bound by its license? One could argue it is derivative work.

Interesting question. I’m not a lawyer, but I presume it’s derivative enough not to be copyrighted. For example, one could argue Google’s giant n-gram corpus is derived from many copyrighted webpages.

Here’s some feedback: pretending you’re grateful for users screaming at you for breaching their trust is not a good way to regain that trust.

> If you're already a user, Kite has been auto-updated and is now working locally without sending code to the cloud. If you previously uploaded code to our servers, you can remove your data via our web portal.

> We're grateful to our users for helping us reach this point.

Why aren’t you scrubbing all surreptitiously uploaded data proactively?

Why are you pretending you are glad you got caught doing sketchy things?

You’re going to have to do a lot better than that to convince us that your company suddenly operates with integrity now simply because you got caught.

> Our core belief is programmers spend too much time on repetitive work like copying and pasting from StackOverflow, fixing simple errors, and writing boilerplate code.

Do you have data backing this core belief up or is it just an intuition? I don't feel any of these things are problems I need a solution for. Maybe this depends on the type of programming someone does or the language they use but as a working programmer this is not a compelling sales pitch to me. You're offering solutions to problems I don't have. They're also not problems that I see to any great extent in junior programmers who report to me.

On an intuitive level, I would guess there are many, many developers debugging a "you forgot to cast your int to a string" error message as I'm writing this.

More concretely, the StackOverflow thread for "Parsing values from a JSON file?" has almost 3 million views. We take this as evidence that programming is repetitive.

(We should probably clarify that when we say 'repetitive' we mean on a global scale, not an individual scale.)
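For readers who haven't hit it recently, here is a minimal sketch of both examples from this comment: parsing values out of JSON, and the int-to-string mistake. The payload and the `describe` helper are made up for illustration and have nothing to do with Kite itself.

```python
import json

# Hypothetical payload, invented for this example.
payload = '{"user": "ada", "visits": 3}'
data = json.loads(payload)  # parse the JSON text into a dict


def describe(record):
    # str() is the explicit cast; leaving it out triggers the TypeError below.
    return record["user"] + " visited " + str(record["visits"]) + " times"


# The classic mistake: concatenating a str with an int.
try:
    broken = data["user"] + " visited " + data["visits"]
except TypeError as exc:
    error_message = str(exc)

message = describe(data)
print(message)
print("error was:", error_message)
```

The exact wording of the error message varies between Python versions, so the sketch only captures it rather than relying on its text.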

> I would guess there are many, many developers debugging a "you forgot to cast your int to a string" error message as I'm writing this.

I've always avoided dynamically typed languages because of these types of issues so I guess this seems like a solved problem to me - just use a statically typed language. Clearly there are many people who are using dynamically typed languages for whatever reason though so I can see how better tooling might be useful to them.

> More concretely, the StackOverflow thread for "Parsing values from a JSON file?" has almost 3 million views.

I certainly make use of Stack Overflow, I was taking issue with the idea that significant time is spent copying and pasting code from the site though. I often refer to it to answer a programming related question but I almost never then copy and paste code. I'm looking for a pointer to a library or API, to understand some confusing or poorly documented behavior or for a workaround to a bug etc. and getting an answer to my question rarely leads to copying and pasting code in my experience.

I am using dynamic languages and have for years, but I don't see this as something that would help me in my work. I mean, autocomplete is nice sometimes, but when I know what to write and where, those little autocomplete boxes just get in the way. Maybe I'm also not the target audience though. Which is good - after reading how they started I would never willingly install their software.

There's a trend amongst dynamic languages to slowly add type hints (see Mypy for Python, Typescript, etc) so I see these issues becoming less important over time.

I've been using Python on a daily basis for close to a decade now, and I can probably count on one hand how many times a bug has been caused by casting a string to an int.

"Our core belief is programmers spend too much time on repetitive work like copying and pasting from StackOverflow, fixing simple errors, and writing boilerplate code."

Sounds like you will likely be introducing bugs into people's code. Or at the very least, regressing code quality towards the mean.

Maybe, but these days you have programming languages and libraries with minimal documentation and underspecified parameters and return values, so you have to hunt for usage examples on StackOverflow, Github, etc. Many frameworks and tasks also have annoying boilerplate that everybody has to type all the time.

Moreover, there are common errors you can avoid if their ranking algorithm discourages code reported/identified as bad.

Kite aside, I think both of these things will happen as developers adopt these technologies. We're seeing similar effects with autocorrect on your phone and self driving cars (i.e. there will still be accidents, and they tend to drive more slowly), but on balance these technologies are positive.

For me autocompletion is the second thing I turn off on a phone (right after vibrate on touch). There is nothing more annoying than writing a word correctly and having autocomplete "fix" it, just because it was a slang word, or a word in another language because I forgot the English word in that moment.

A lot of that is because the minimal phone keyboard doesn't give you a good UI to accept or decline a suggested autocompletion.

With a keyboard and mouse and large monitor, you can display the completion prompt and let the user select or ignore from the various options with Ctrl-Space or similar.

When "Select the first autocomplete over what I typed" is the default behavior, yeah, that sucks. I often end up using the "key combo" Space,Backspace,Space to accept the autocomplete on my phone.

But I definitely see utility in offering a correction for, e.g., `DateTime.Format("yyyy-mm-dd")` -> `"yyyy-MM-dd"`, or any of the million other idioms that we have to remember or look up. What I don't understand is how Kite can offer useful local-only suggestions. Is the default dictionary that comes with it preloaded with a million lines of open source, or is it literally just my code being suggested?
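The date-format idiom above has a close Python analogue, sketched here as an assumption about what such a correction might target: in `strftime`, `%m` is the month and `%M` is the minute, so one wrong letter case silently produces a plausible-looking but wrong date. The timestamp is arbitrary.

```python
from datetime import datetime

# Arbitrary timestamp: 28 January 2019, 09:05.
moment = datetime(2019, 1, 28, 9, 5)

# %M is minutes, so this puts "05" where the month should be.
wrong = moment.strftime("%Y-%M-%d")

# %m is the zero-padded month, which is what was intended.
right = moment.strftime("%Y-%m-%d")

print(wrong)
print(right)
```

Nothing raises here, which is why this kind of slip tends to survive until someone reads the output closely.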

Curious how much better your approach is versus fairly vanilla "most popular ngrams".

N-grams are the first thing you try when you want to statistically model code. We tried them in 2014 and the results were disappointing if you want to do anything beyond identifying low-level syntactic patterns.

The intuition here is that in natural language, context is defined locally. If you want to know whether a 'they' refers to a male or female, for example, you look at the nearby text. In contrast, the flow of data and control through code is highly non-local, which is a lot of why techniques like N-grams don't work.

This has been a very active area of research in academia since we started Kite in 2014. (Suggested google searches: [big code], [ml on code], etc.)

All that said I don't have data on how our approach would compare to N-grams, but I'm guessing if you look at some of the academic research, you'll find that NLP techniques were abandoned, despite the early papers' focus on them.
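To make the "local context" point concrete, here is a toy bigram predictor over whitespace-separated tokens. It is only a sketch of the vanilla n-gram baseline being discussed, not Kite's approach; the tiny corpus and the function names are invented for the example.

```python
from collections import Counter, defaultdict


def train_bigrams(lines):
    """Count which token follows which, one line at a time."""
    counts = defaultdict(Counter)
    for line in lines:
        tokens = line.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, token):
    """Return the most frequent follower of `token`, or None if unseen."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Made-up corpus, already tokenized with spaces.
corpus = [
    "for i in range ( n ) :",
    "for x in items :",
    "for key in mapping :",
    "if x in items :",
]
model = train_bigrams(corpus)

print(predict_next(model, "in"))
print(predict_next(model, "undefined_name"))
```

The model will always suggest `items` after `in`, whether or not a variable called `items` exists in the current scope. It sees only the previous token, which is the kind of non-local information (data flow, scope, types) the comment says n-grams miss.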

Thoughtful answer, thank you. People complain about HN having gone downhill...but answers like this one still pop up fairly frequently.

This is excellent! Thanks for allowing me to play with it!

Did you publish any papers on how you approach intelligent code completion at Kite (the parts you can make public, or just rough areas I should look into)? I understand you have many emerging competitors. Thanks!

Yes, this may be a good starting point — https://github.com/src-d/awesome-machine-learning-on-source-...

Thanks for checking it out!

To those thinking of using this service, you might want to read this first: https://theoutline.com/post/1953/how-a-vc-funded-company-is-...

Discussion: https://news.ycombinator.com/item?id=14836653

And that isn't even the full story. After Kite was forced to come clean about their Atom extension purchases, they were caught with yet another unrelated Sublime Text plugin that was phoning home to an IP address for an entire year. Their explanation? They forgot they were doing it!


Their only redeeming move could be to open source their project, and build a business model around that. Otherwise I don't see how they will ever gain back our trust, since they have repeatedly demonstrated a willingness to deceive users.

The original article contains this quote:

> “I have to wonder if your goal was to upset enough people that you'd generate real attention on various news sites and get Kite a ton of free publicity before your next funding round,” @DevOpsJohn wrote

It seems to have worked out great.

From what I remember the biggest criticism was that they uploaded your code to the cloud, but if you read this, they say that it's now done entirely locally. So honest question, what would the risks be at this point?

What should compel anyone to trust them in the first place? Seems like the ball's in their court to figure out how to regain trust.


>When we started Kite, we were excited about the possible benefits of internet-connected programming. We were also well aware of the privacy and security concerns some people would have.

So their approach is going to continue being to ignore what they did and pretend they're cool now. Alright, well you can download the maybe-malware if you want, I'll stay as far away as I can from this company myself.

I feel like the answer to your question is implicitly that there is nothing they can do. They're "cancelled". Even a personal apology (with supporting documentation) would be ignored, since you don't care about the product anyway. Is this true?

I mean, I don't necessarily disagree with you, they have to regain trust, but at some point you have to be open to it.

Trust but verify.

No, but they need to explain what went wrong to cause them to think what they did was okay, and why we should believe it won't happen again, at a minimum. If they have no integrity, and hijacking an open source project certainly is sending that message, then I can't imagine what other incentives would stop them from abusing my trust in them for their gain.

I would like something like this, but they need to address this more thoroughly. It's not just a footnote.

>Trust but verify.

What does that actually mean? If I verify, then did I trust?

I don't want to verify, but if they expect us to be doing that work for them they sure as hell need to open source it.

>...but at some point you have to be open to it.

Well no, we don't. Trust is difficult to gain and easy to lose, and almost impossible to regain. If the concept is sound then there will be others.

tl;dr Kite hired a developer for an Atom plugin and promoted themselves in this plugin. Kite acquired an autocomplete plugin and switched the engine to use their own.

IMO, the devs have the freedom to do whatever they want with the code they maintain. If the users don't like it someone can fork and maintain their own version.

>Is a $4 million venture capital-funded startup stealthily taking over popular coding tools and injecting ads and spyware into them?

Ads? Yea, I guess you can call cross product promotion an ad, but it's far from flashing banners up for sale on ad exchanges. Spyware? Hardly. A service uploading data and responding with results is a perfectly legitimate interaction.

In a corporate environment this may lead to accidental violations of contracts. Usually code isn't allowed to leave the corporate network. So one day poor Joe updates his favourite autocomplete plugin, and the next day he has violated a contract.

An organization with those requirements should be careful which tools they use.

> A service uploading data and responding with results is a perfectly legitimate interaction.

this reductive description is too vague to bear the label "legitimate interaction". context matters, no?

I see it as the difference between a transactional email and an unsolicited one. This service seems to fall under what I would call transactional.

How come JetBrains didn't raise bazillion VC dollars, while their code completion and tools like refactoring are the best option for most popular programming languages?

Jetbrains has put a lot of engineering effort into building really great tools. They're really wonderful!

They took more of the Atlassian path of no/low-VC and slower organic growth starting in the early 2000's, so they've had a lot of time to build the "snowball" effect.

They've even put Jetbrains' IDE at the bottom of the "Staircase of intelligence" in their little infographic... I'm not sure if developers familiar with those tools can take this company seriously (especially given what they've done previously.)

I wonder what their take on, say, IntelliJ is.

My primary experience is with IntelliJ, though most of their IDEs behave similarly.

I'm not sure what advantage I'd get with Kite.

Often I'd start using a class in the code, and then ask intelliJ to import it, rather than going to the top, and use the autocomplete tool. It can do patterns as well, even custom defined ones, with default values (https://www.jetbrains.com/help/idea/using-live-templates.htm...)

Their focus is a sustainable and profitable business which does not need VC and is usually incompatible with the large exits required for investors.

Maybe they don't want it. VC money comes with strings attached.

JetBrains are from Russia. It's harder for them to raise VC because of that.

They are from Prague (Czech Republic). While not SV, I can assure you they could easily find VCs there (or abroad, maybe in Russia?) if they wanted to.

Not everyone's goal is to build with investors' money. Sometimes it is easier to succeed without it.

" I can assure you they could easily find VCs there "

It's not a question of 'easy or hard' it's a question of 'easier or harder'.

Being in the Czech Republic definitely makes it harder, hands down. Thankfully, they are in the EU, but it's still a completely different land that many VCs might not be remotely willing to touch. Consider that very few can even read the laws.

Czech startups would be limiting themselves to German and possibly Russian money in general, and it'd be harder to raise in the US.

JB is a great company that might be able to 'easily' raise money, but even then there could be 'red lines' for many firms that just make investing there impossible. For example, a firm may require they re-incorporate in London or Luxembourg or something, where there are far better established commercial laws.

I know many European startups raising money from American VCs.

Open Delaware C-corp and hire European programmers remotely. Either via consulting agreements or wholly-owned European subsidiary. Our company (tensorflight.com) did that.

After fundraising with both European and American VCs, indeed the VC market in the USA is much better.

I acknowledge that this will be unpopular, but ...

Those of us who are fossil grumpuses already think IDEs often allow people to write code they don’t think enough about with the idea that issues will be caught by someone else in code review. This, at least, is what I’ve observed over the last few years.

Something that writes the code for people and people basically “code” by doing a series of micro-code-reviews seems really crazy to me for any application that isn’t just fluff. Just look at what autocorrect has done to average incorrect-words-per-sentence. One of the problems with predictive text generation in general is that in isolation the output can seem very sensible even if it’s gibberish.

So as an IDE skeptic in general, I’d be very curious to try this tool out, if only to see how they deal with that.

[I spent years and years using VC++ and other tools and it was actually this feeling of not really knowing anything that drove me away from it. Etags/Cscope/Grey/actually-reading-code was what I replaced it with..]

> Just look at what autocorrect has done to average incorrect-words-per-sentence.

The problem with autocorrect is that it changes what you've written after the fact, without user input. This on the other hand is autocomplete, so if it suggests gibberish, you still need to explicitly accept it. If you do that, chances are you would've written gibberish in the first place.

As pointed out above by the other comment, autocorrect's issue is that it sometimes forces a correction on you (unless you disable that). On the other hand, there are many words I could never dream of spelling correctly without looking them up, but that I now get perfectly. Sadly, you don't notice those; you only notice when it goes wrong.

Similarly, imagine going to type a common piece of code such as `if __name__ == '__main__':` or `def __init__(self):` at the start of a class, and have it automatically suggested. Obviously you can manually create snippets for each one of these, but that's much more effort.

I can see an interface similar to Gmail's new smart completion, or fish shell, that just shows you a suggestion in the background.


Gmail smart completion is more akin to a template, like giving you an empty class or an empty main function. Its utility (at least in my experience) hasn't gone much farther.

The example you link (from Zsh, not fish) is a fancy looking history autocompletion: in bash, just press Ctrl+r.

The parent has a point: I've spent time working with absolute beginners in programming and the first thing I was teaching them was to ignore the IDE "smarter autocompletion" and suggestions. For typing faster, sure; for suggesting anything else, not so clever.

Just a correction, my example was from a zsh plugin which imitates a built-in fish feature. Also, said feature is different from ctrl+r, which all 3 shells still have.

This one suggests possible completions as you type, so it's a passive feature, compared to ctrl+r which is more active and requires explicit action to work.

And yes, I agree that the whole purpose is to write faster, not to write smarter. I'd maybe add cleaner too, because unless you auto format your code, people often don't write the best by default.

What's Grey?

Autocorrected grep.

Can I ask what your business model or pathway to monetization is?

Thanks for making it run locally, I can finally give it a real try now.

We'll monetize through enterprises. We've had lots of conversations with larger companies and it's clear that engineering time is really precious to them, as is shipping faster. We're not really sure how to mechanize charging enterprises yet. When we had a freemium offering we didn't like that e.g. students didn't get all the features. Github had an elegant approach in charging for private repos. Maybe Kite is paid if you're working in a repo that's not open source and has lots of active contributors. Would love any ideas!

You've raised millions and you're spitballing monetization ideas on an internet forum?

Look, I'm all for second chances here. The past behavior of this company doesn't concern me as much as the pretty clear lack of strategy around how you plan to make money.

So this is what I've gathered so far (and correct me if I'm wrong on any of this):

- free product (based on the website)

- no serious mention of enterprise pricing or support (based again on the website)

- a promise in your TOS not to collect sensitive data (which almost all companies tend to modify, let's be honest)

- solicitation of monetization ideas on an internet forum by the CEO, having raised millions from VCs

This sounds like a product I'd stay away from if I cared about data collection.

Can you quantify how much money you're saving developers or companies who use this?

In all my years programming, there's been only one feature that I've seen that I consider The Killer Feature of Autocomplete, and I've only seen it in one editor: Emacs.

I don't want to complete with all the other code everybody else in the world has written. I want to complete with the words I wrote in a comment 3 lines ago. Or my README I've got open in another window. Or the JSON config file I was editing. That's my litmus test: can it autocomplete from all my other text? Kite can't.
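A rough sketch of the kind of completion this comment asks for: candidates drawn from whatever text is open (comments, README, config) rather than from a model of other people's code. Emacs ships a dynamic-abbreviation command that behaves broadly like this, though the code below is only an illustration with made-up buffers and a hypothetical `buffer_completions` helper.

```python
import re

WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def buffer_completions(prefix, buffers):
    """Collect words from all open buffers that extend `prefix`, in first-seen order."""
    seen = []
    for text in buffers:
        for word in WORD.findall(text):
            if word.startswith(prefix) and word != prefix and word not in seen:
                seen.append(word)
    return seen


# Made-up "open buffers": a code comment, a JSON config, a README line.
buffers = [
    "# retry_backoff_seconds controls the wait between attempts",
    '{"retry_limit": 5, "region": "eu-west"}',
    "README: set retry_limit before deploying",
]

print(buffer_completions("retry", buffers))
print(buffer_completions("re", buffers))
```

The point of the design is that it needs no understanding of the language at all: a word typed three lines ago in a comment is as good a candidate as an identifier.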

I played with this online demo, and I actually found it pretty frustrating. It kept trying to replace what I was writing with snippets that other people, apparently, had written. I guess that could be useful when trying out a new library, but the rest of the time, it's just going to get in my way.

Can you also add a feature where my code is analyzed by a community/individuals for $/month if I wish to submit it? Sometimes as a dev, you get stuck or need to do refactoring & need help. Glitch has a community feature like that - but it would be amazing to build a team of experts paid to unstuck fellow devs - on top of a tool like kite; esp because you can then use that data for building a better suggestion engine.

I've used a tool similar to what you've described here called https://www.pullrequest.com/ for code review, though it's fairly new IIRC and not cheap.

$200/h for the cheapest option O.o

AFAIK that's pretty enterprisey

What a great idea. That could be a tool/app in its own right compatible with whatever IDE.

Codementor meets github.

Edit: Totally forgot to mention, exercism is a bit like this but not with your own custom code. Maybe codementor meets exercism meets github.

Yep, I hope someone can make it a reality!

> Line-of-Code Completions - Kite's completions engine can now predict several tokens of code at a time

I can predict several weeks of stock prices... not very well.

Shannon had an estimation method: ask people to guess the next letter in text, to find its information content. (Assuming people have a perfect model of text - probably, today, with billions of samples, machines might be better?). Could do this with program text, to bound the benefit.
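As a very rough stand-in for that experiment, one can replace the human guesser with a simple character model and measure bits per character on program text. The sketch below compares an order-0 estimate (no context) with an order-1 estimate (previous character as context). Both are computed on the same sample they are fitted to, so they are optimistic, and the sample string is made up.

```python
import math
from collections import Counter, defaultdict


def order0_bits(text):
    """Entropy in bits per character, ignoring all context."""
    counts = Counter(text)
    total = len(text)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def order1_bits(text):
    """Conditional entropy in bits per character, given the previous character."""
    pair_counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        pair_counts[prev][nxt] += 1
    total_pairs = len(text) - 1
    bits = 0.0
    for followers in pair_counts.values():
        context_total = sum(followers.values())
        for c in followers.values():
            # weight by joint frequency, score by conditional probability
            bits -= c / total_pairs * math.log2(c / context_total)
    return bits


# Made-up sample of program text.
sample = "for i in range(10):\n    print(i)\n" * 3

print(round(order0_bits(sample), 2), "bits/char with no context")
print(round(order1_bits(sample), 2), "bits/char given the previous char")
```

The fewer bits a model needs per character, the more of the text it could in principle have typed for you, which is one way to bound the benefit of completion as the comment suggests.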

LOC completion is a great idea and might work well with idioms, especially if it can figure out the likely parameterization (e.g. in a for(;;) loop). I reckon this approach will nowhere near realize its promise... but will serendipitously reveal unexpected adjacent benefits.

Also reminds me of that joke tool that automatically finds and pastes Stackoverflow code.

No way. I already spend nearly as much time double checking code completion as I save being able to tab my way thru a line of new code.

Also, anyone pasting most of their code from SO is doomed already, and smart code completion isn't going to make them successful.

Is this Grammarly for "programmers"?

> Kite doesn't support Linux yet, but coming very soon.

Quick suggestion - can you support a Docker-based install? VSCode now supports remote debugging through Docker, etc. I'm not sure about your architecture, but I'm wondering if this isn't something that you can do.

How would that work? Docker is based on container technology inside the Linux kernel. Docker on Mac and Windows uses virtual machines to provide a Linux kernel. If Kite doesn't support Linux, how would their technology work inside a container, which is an isolated Linux environment?

Well, Kite's server side is presumably Linux (most people's are). Pretty sure that they had to re-engineer specifically for Windows and OSX on their native APIs.

Instead, only engineer for Linux and make it available through Docker on all platforms. Since VSCode exposes hooks to interact with Docker anyway, this might make more long term sense.

So very impressive if you want to type exactly what they recommend in the demo app. And what's the copyright on that auto completed code?

If anyone from Kite is here reading: your site fails to load with Brave browser (on Windows 10) with Shields Up. Both the homepage and the blog post.

"Huh... that's weird Something unexpected occurred. We'll investigate what happened"

Seemed lackluster to me. Adding code in a literally defined data structure failed to complete anything: `with open() as f: ll = [f.` (no help from here on out).

Is there any way of using Kite in Sublime Text with Vintage mode, so that Kite doesn't capture the Esc keypress?

It makes Vintage mode unusable.

How is that different than tab9? From what I remember it’s one guy and it’s already working for VSCode and Sublime.

How come it's only for Python?

Looks like the plan is to support more than Python: https://kite.com/letmeknow

Yes, because we use deeply semantic information about code there is some engineering work to build support for other languages.

So we took the approach of focusing on one demographic (Python developers) and making them really happy. We think we've reached that point, so we're excited to begin looking at how we can expand our reach. Stay tuned!

Does "really happy" include injecting your ads into our editors by hijacking open source projects? Because that makes me really unhappy.

Ah yes, now I remember where I'd heard about kite.

Dunno if I should enter this seemingly infected argument, but I only now read up a bit on it: https://qz.com/1043614/this-startup-learned-the-hard-way-tha...

While I agree with your sentiment that the changes were intrusive, from the article they weren't actually ads in the strict sense. No Ugg Boots or Rolex Replicas. They were helpful links to knowledge articles or documentation, from what I gather.

Could be wrong though.

That's gotta be a tough position to be in, having raised ~ a lot of money and having the pressure to do a suboptimal move to please the investors.

Demo doesn't seem to work.
