Lilac is an open-source tool that enables AI practitioners to see and quantify their datasets.
Lilac allows users to:
- Browse datasets with unstructured data.
- Enrich unstructured fields with structured metadata using Lilac Signals, for instance near-duplicate and personal-information detection. Structured metadata lets us compute statistics, find problematic slices, and eventually measure changes over time.
- Create and refine Lilac Concepts: customizable AI models that find and score text matching a concept you have in mind.
- Download the results of the enrichment for downstream applications.
Out of the box, Lilac comes with a set of generally useful Signals and Concepts. This list is not exhaustive, however, and we will keep working with the OSS community to add more useful enrichments.
Unfortunately there is no attribution, but this tool was created by Daniel Smilkov, who also built TensorFlow Playground and who is a co-creator of TensorFlow.js.
- Many companies and projects have their entire server-side stack in JavaScript and Node.js, and often they simply want to make a prediction with a model. It's a lot to ask them to pull in a Python runtime just to run inference. TensorFlow.js with Node bindings to the TensorFlow C API enables this kind of inference with minimal overhead (see the sketch after this list).
- Privacy. You can make predictions locally, or send embeddings back to a server without the raw data ever leaving a client.
- Flexibility of JavaScript / TypeScript. Dynamic languages are great for scientific computing, and TypeScript lets you choose your own level of type safety, from raw JS on one end to strict typing on the other.
- Interactivity / education tooling. See TensorFlow Playground for an excellent example.
- No servers for applications. Running TensorFlow inference on a server can get expensive in the long run; hosting static weights as files is much, much cheaper.
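To make the first point concrete, here is a minimal sketch of server-side inference with @tensorflow/tfjs-node; the model path and input shape are placeholder assumptions, not anything from this thread:

```ts
// Server-side inference via the native TensorFlow C bindings.
// Assumes a Layers-format model exported to ./model/model.json;
// the path and feature count are placeholders.
import * as tf from '@tensorflow/tfjs-node';

async function main(): Promise<void> {
  const model = await tf.loadLayersModel('file://model/model.json');
  const input = tf.tensor2d([[0.1, 0.2, 0.3]]); // shape [1, nFeatures]
  const output = model.predict(input) as tf.Tensor;
  console.log(await output.data()); // prediction scores
  input.dispose();
  output.dispose();
}

main();
```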
JavaScript and Python ecosystems for machine learning are not mutually exclusive -- they both have their strengths and weaknesses.
There's literally only one reason to do it in Javascript: you want to use Javascript. There are dozens of reasons why it's a terrible idea: unfortunate memory consumption, abysmal performance, poor abstractions, bad library support, and so on. Tensorflow in Python isn't exactly a stellar choice for performance, but at least you gain flexibility and nice abstractions and good high-performance math/stats library support. Go with JS, and you're getting none of that.
"Many companies and projects have their entire server-side stack in JavaScript and Node.js, and often they want to simply make a prediction through a model."
Right. So this is "we don't want to use another language". Acknowledged.
"Privacy. You can make predictions locally, or send embeddings back to a server without the raw data ever leaving a client."
If calling out to a binary is a security problem for you, you have bigger problems than choice of language. Also, of course, you don't need tensorflow to convert your top-secret data into an input vector that you can send somewhere (seriously: it does not help with this problem).
Your third and fourth points -- flexibility and interactivity -- are indeed why people use Python vs C++ (even though it's more difficult and painful to get decent performance out of TF with that approach). So again, this boils down to "I don't want to use Python and I'd prefer to use JS instead."
"No servers for applications. Making predictions in TensorFlow on a server can be expensive in the long run. Hosting static weights on a server is much much cheaper."
You're contradicting yourself with this point. Servers are expensive so hosting static weights on a server is cheaper? I have no idea what this means.
> So don't do that. If you can run tensorflow on your device, you can call out to a local process.
Not from a webapp (without jumping through a dozen other hoops.) With tensorflow.js, you can do (for example) pose estimation, or face detection, or audio recognition, right in the browser without sending data to a remote server.
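For example, here is a sketch of in-browser face detection using the published @tensorflow-models/blazeface package; the video element id is a placeholder:

```ts
// In-browser face detection: camera frames never leave the machine.
// Assumes an HTML page with <video id="webcam"> already streaming the
// camera; the element id is a placeholder.
import * as tf from '@tensorflow/tfjs';
import * as blazeface from '@tensorflow-models/blazeface';

async function detectFaces(): Promise<void> {
  const video = document.getElementById('webcam') as HTMLVideoElement;
  const model = await blazeface.load();
  const faces = await model.estimateFaces(video, false /* returnTensors */);
  for (const face of faces) {
    console.log('face at', face.topLeft, '-', face.bottomRight);
  }
}
```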
> But that's not a good reason. It's just a reason.
Yes, of course it's a reason. The point is that it can be a good reason in many cases.
So now we're down to "I want to run a neural network exclusively in the browser" as the primary reason you'd want to use this.
OK, fine. That's a niche use-case for a domain where scale and performance matter so much that we're building specialized hardware to support it. 99.99% of developers would be better advised to find another way to solve their problem using more conventional tools.
There was a guy who built a life-sized house out of Lego once. It was a cool trick, but the difference between him and "modern Javascript" developers is that he didn't try to make anyone live in the house.
How can you run Python tensorflow locally? We're talking about web apps here.
Try to understand what is at play here. It's not all about raw performance; there are many more important considerations, which the GP explained extensively.
I think you're conflating privacy and security. Serving the model on the client side means the server never sees the inputs, or who provided them.
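A sketch of what that looks like, using the published @tensorflow-models/universal-sentence-encoder package; the /api/embeddings endpoint is a hypothetical server route:

```ts
// Compute a text embedding in the browser and ship only the vector.
// The raw text never leaves the client; /api/embeddings is a
// hypothetical endpoint on your own server.
import '@tensorflow/tfjs';
import * as use from '@tensorflow-models/universal-sentence-encoder';

async function sendEmbedding(text: string): Promise<void> {
  const model = await use.load();
  const embedding = await model.embed([text]); // [1, 512] tensor
  const vector = Array.from(await embedding.data());
  embedding.dispose();
  await fetch('/api/embeddings', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ vector }),
  });
}
```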
Also, he isn't contradicting himself: serving static files off a CDN or web server is a lot easier and a lot less expensive than running a cluster set up to do inference.
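A sketch of that setup, assuming the model files are hosted at a placeholder CDN URL:

```ts
// Loading a model hosted as plain static files (e.g. on a CDN).
// The "server" only serves model.json plus its weight shards;
// cdn.example.com is a placeholder host.
import * as tf from '@tensorflow/tfjs';

async function loadFromCdn(): Promise<tf.GraphModel> {
  return tf.loadGraphModel('https://cdn.example.com/model/model.json');
}
```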
We've done some initial tests ourselves. WASM doesn't yet support SIMD so WebGL tends to be 5-10x faster. SIMD is actively being worked on by many smart people in Chromium / other browsers, so I would expect to see huge wins in the near term future. When that happens, deeplearn.js will have a WASM backend. WASM has a much better memory management story (destructors on the C++ side) so I'm super excited about its future.
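For reference, a sketch of how backend selection looks in current TensorFlow.js, where the WASM backend ships separately as @tensorflow/tfjs-backend-wasm:

```ts
// Switching TensorFlow.js to the WASM backend; importing the
// @tensorflow/tfjs-backend-wasm package registers the backend.
import * as tf from '@tensorflow/tfjs';
import '@tensorflow/tfjs-backend-wasm';

async function useWasm(): Promise<void> {
  await tf.setBackend('wasm');
  await tf.ready();
  console.log('active backend:', tf.getBackend()); // 'wasm'
}
```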
GPUs are at least an order of magnitude faster at training neural nets than CPUs. This is why companies like Google buy very large numbers of multi-thousand-dollar GPUs like the Tesla V100. If CPU WASM is ever faster than WebGL, it just means the WebGL implementation is suboptimal (or that WebGL has too much overhead for real GPU compute).
My comment was meant to imply that if you're using JS, then you already care about other things _more_ than you care about performance.
Once you've decided that you do care about those other things, then getting the best performance you can is great and deeplearnjs is very useful.
But if what you care about most is performance, then using JS is not the way to go. I'm guessing deeplearnjs is an order of magnitude slower (at least) for training a modern convnet relative to cudnn or MKL. And I don't think it's currently possible to use multiple GPUs? Although you could imagine some crazy distributed asynchronous training running in browsers all over the world.
Companies will improve the algorithms and go low-level with hardware instructions. The problem with GPUs is that they have special instructions for some operations; not using them can be 20 times slower. You are not going to improve the algorithms 20x, and even if you do, you can still gain another 20x with hardware instructions. You do both if you can. That is a big saving in electricity in data centers.
Check out the demo on HuggingFace: https://lilacai-lilac.hf.space/
Find us on GitHub: https://github.com/lilacai/lilac