You're not the only one. I've worked quite a bit with Minio, Arrow, and Spark, and I don't really understand the point the article is trying to make or how it's related to Minio. It's either badly explained or just a fluff piece throwing together a bunch of technologies.
To me, the article says "columnar formats that don't require deserialization are good, oh, and you can use Minio to store data!"
I typically only look for upvoted content. Prefix your searches with "site:reddit.com", "site:news.ycombinator.com" or "site:stackoverflow.com" - The social/human filter is quite a good one in my experience and it gets rid of all the Medium-like personal branding fluff.
I'm skeptical of JAX. It feels good right now, but when the first TF beta version came out it was very much like that too - clean, simple, minimal, and just a better version of Theano. Then the "crossing the chasm" effort started and everyone at Google wanted to be part of it, making TF the big complex mess it is today. It's a great example of Conway's Law. I'm not convinced the same won't happen to JAX as it catches on.
PyTorch has already stood the test of time and proven that its development is led by a competent team.
I know where you're coming from, but TF in my opinion was very user-hostile even on arrival. I can't tell you how much hair-pulling I did over tf.conds, tf.while_loops and the whole gather / scatter paradigm for simple indexing into arrays. I really think the people working on it wanted users to write TF code in a certain, particular way and made it really difficult to use it in other ways. Just thinking back on that time still raises my blood pressure! So far Jax is much better and I'm cautiously optimistic they have learned lessons from TF.
I had the opposite experience. The early TF versions were difficult to use in that they required a lot of boilerplate code to do simple things, but at least there was no hidden complexity. I knew exactly what my code did and what was going on under the hood. When I use today's high-level opaque TF libraries I have no idea what's going on. It's much harder to debug subtle problems. The workflow went from "Damn, I need to write 200 lines of code to do this simple thing" to "I need to spend an hour looking through library documentation, gotchas, deprecation issues, and TF-internal code to figure out which function to call with what parameters, and then check whether it actually does exactly what I need" - I much prefer the former.
Having barriers to entry is not always a bad thing - it forces people to learn and understand concepts instead of blindly copying and pasting code from a Medium article and praying that it works.
But I agree with you that there are many different use cases. Those people who want to do high-level work (I have some images, just give me a classifier) shouldn't need to deal with that complexity. IMO the big mistake was trying to merge all these different use cases into one framework. Let's hope JAX doesn't go down the same route.
Not quite sure why you picked those particular examples... JAX also requires usage of lax.cond, lax.while_loop, and ops.segment_sum. Only gather has been improved with slice notation support. IMO, TF has landed on a pretty nice solution to cond/while_loop via AutoGraph.
While JAX has those operations, you don't always need them; it depends on which transformations you want to apply (jit or grad), and they have been working on making normal Python control structures compatible with all transformations.
You can't blame the TF people for things like while_loop. Those are inherited from Theano, and back then the dynamic graph idea wasn't obvious.
JAX is indeed a different situation as it has a more original design (although TF1 came with a huge improvement in compilation speed, so maybe there were innovations under the hood). But I don't know if I like it. The framework itself is quite neat, but last time I checked, the accompanying NN libraries had horrifying designs.
The difference is that in TF1 you had to use tf.cond, tf.while_loop etc for differentiable control flow. In JAX you can differentiate Python control flow directly, e.g.:
In [1]: from jax import grad
In [2]: def f(x):
   ...:     if x > 0:
   ...:         return 3. * x ** 2
   ...:     else:
   ...:         return 5. * x ** 3
   ...:
In [3]: grad(f)(1.)
Out[3]: DeviceArray(6., dtype=float32)
In [4]: grad(f)(-1.)
Out[4]: DeviceArray(15., dtype=float32)
In the above example, the control flow happens in Python, just as it would in PyTorch. (That's not surprising, since JAX grew out of the original Autograd [1]!)
Structured control flow functions like lax.cond, lax.scan, etc exist so that you can, for example, stage control flow out of Python and into an end-to-end compiled XLA computation with jax.jit. In other words, some JAX transformations place more constraints on your Python code than others, but you can just opt into the ones you want. (More generally, the lax module lets you program XLA HLO pretty directly [2].)
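To make that concrete, here's a minimal sketch (values are just illustrative) of staging the same branchy function into a single compiled XLA computation by combining jax.jit with lax.cond:

import jax
from jax import lax

@jax.jit
def f(x):
    # lax.cond traces both branches into the compiled computation;
    # the predicate is resolved at run time on the device.
    return lax.cond(x > 0,
                    lambda x: 3. * x ** 2,
                    lambda x: 5. * x ** 3,
                    x)

print(f(1.))   # 3.0
print(f(-1.))  # -5.0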
There are a bunch of frameworks built on top of PyTorch too (fastai, Lightning, torchbearer, Ignite...). I don't see why this should be a problem (or at least a problem for JAX but not for PyTorch).
IMO, this is not a fair comparison because Pytorch spans a larger amount of abstraction than jax (I don't quite know how to explain it other than "spans a larger amount of abstraction").
You can do much of the jax stuff in pytorch, you can't do the high level nn.LSTM stuff in jax, you have to use like flax or objax or something.
> I think the most dangerous distractions are the ones that feel productive but don’t actually work toward your goals. For example, browsing hacker news.
Time to set up my emacs config to manage my life and stop wasting time.
Not everyone is optimizing for making money. I left a Silicon Valley FAANG for another country more than five years ago, even though I was making ~3x there compared to what I made in the new place. Even with the higher living costs, that's still a lot more money.
I regret nothing. I'm so much happier here than I ever was in the Bay Area, and doing this early in my life allowed me to have a lot of fun and stories to tell. Why would I "waste" my precious 20s and early 30s being miserable just to retire at 35? Who even wants to retire? I wouldn't even know what to do with that money. Buy a house and sit in the garden the whole day? :) Not the life I wanted.
This whole "make a lot of $$$ and be miserable in your 20s to optimize for the future" is such a common thing I see in SV and on HN as part of the narratives that these VCs are pushing. I think you have it all backwards. You can make money any time. You can never go back to your 20s and 30s, when you don't have health problems, are full of energy, and have an easy time making friends.
Another illusion is that these are absolute numbers. They're relative. When you're young, $100k is a lot of money. Enough to be happy and worry-free. Once you have that, you need $1M in cash to be happy and retire. Once you have $1M but realize everyone around you is making $1M/year and aiming to retire at $10M, you suddenly need $10M. This will repeat itself, and you'll never have enough. You'll just become more miserable realizing how much more everyone else has. Yeah, platitudes. We all know this, right? Turns out, this cycle is totally unconscious and incredibly hard to break once you're in it and surrounded by such people. Get away while you can.
At least for me, this was one of the reasons why working at FAANG where everyone is striving for $$$ made me miserable. It gave me such a warped view of the world.
> working at FAANG where everyone is striving for $$$
This really hasn’t been my experience at either Google or Facebook, in fact I would nearly go so far as to say the opposite is true. I’ve seen far more “striving” in the traditional corporate world
The common term for this is lifestyle inflation. As you make more money your “needs” keep increasing. It’s incredibly challenging to reverse this trend but very easy to follow it.
I think the best way to prevent this is, whenever you get a raise, to open a new bank account and have the difference direct-deposited into it. That way, the main bank account that you work with regularly always sees the same amount deposited.
> “ I left a Silicon Valley FAANG for another country more than five years ago, even though I was making ~3x there compared to what I made in the new place.”
This is kind of what the comment you’re replying to was suggesting. Take advantage of the high income first before moving.
> “You can make money anytime.”
Sure, but interest compounds - money earlier is way more helpful than money later.
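As a rough illustration (assuming a flat 7% annual return, which is purely hypothetical), the same lump sum saved ten years earlier ends up roughly twice as large:

def grow(principal, rate, years):
    # simple annual compounding: principal * (1 + rate) ** years
    return principal * (1 + rate) ** years

print(grow(100_000, 0.07, 40))  # saved at 25, grown to ~1.50M by 65
print(grow(100_000, 0.07, 30))  # saved at 35, grown to ~0.76M by 65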
I always wonder a little about the family wealth of people that write things like this. Most of the people I know who left high paying jobs come from wealthy families and they’re already basically working for fun.
I didn’t grow up poor by any means, but I also don’t come from family wealth. If I want to have a family of my own, and make sure we will always have a place to live, saving high salary (and equity) in order to reach financial independence is worth it.
If you don’t want a family or you come from wealth then prioritizing other stuff is probably fine. If you do want these things though, I think it’s helpful to be more forward looking until reaching financial independence.
> “Not the life I wanted.”
Totally reasonable, but what people want over their life changes. Financial independence gives you freedom to be able to change too.
> I always wonder a little about the family wealth of people that write things like this. Most of the people I know who left high paying jobs come from wealthy families and they’re already basically working for fun.
Seriously, these replies that turn securing a future for you and your family into some kind of avarice anti-virtue are ridiculous.
As per the OP, the Quality Migrants visa becomes harder to obtain once you reach 30. Japan wants young talent to replace their aging workforce so youth is valued and considered. Grinding in the valley could have caused him to lose the chance to move to Japan completely.
What city are you in, if you don't mind me asking? While I agree with the sentiment to live in the present, I do think many of us don't see ourselves living in major US tech hubs for long; it's more about optimizing for money when you are young and de-prioritizing certain other aspects while making sure you still remain happy and not miserable. For instance, if you are from India, you can get a 3x+ salary/savings, learn different cultures/lifestyles, and also learn from the best in technology and business, for the cost of living away from your family (which you kind of do anyway if your family isn't in an Indian tech hub city).
"Work" (as in the thing you do for money) doesn't make me happy so why would I want to work for the rest of my life? Also, if I'm coming from a poor family why would I want to stay poor and doom my children?
I'd rather want to "work" (as in the thing you do for happiness) for the rest of my life and be "rich" at the same time.
But big SV money? I wouldn't be so sure. Ageism is rampant in tech. FAANGs love their coders young, in their 20s and 30s. Also, a dollar saved earlier in life is worth exponentially more later thanks to compound interest.
There's no better place to solve your financial problems than in SV in your 20's.
I'm not super familiar with prediction markets. Could someone explain how exactly these markets create initial liquidity, how they set the number of tradeable shares, etc?
I generally like next.js, but I have since built some things in Svelte [0] and prefer it for simple projects. Not dealing with JSX and complex React components is refreshing. It just works. There are some upsides to its static generation (with sapper) as well - it can create static pages for things that would need API routes in Next.
TLDR; For large projects that must make use of the React ecosystem I'd take Next. For smaller SPA projects I prefer svelte.
I'm in the same camp. I'm quite interested since it mentions asset data as an example, but I have no idea what this does from looking at the landing page. Does someone have an end-to-end example? Since this stores arrays, is this kind of like Apache Arrow but with a persistence layer? Is this suited for large amounts (~1TB) of time series data?
Please see my comment above in the thread for a full description of TileDB[1].
Compared to Apache Arrow we have some similarities but also some significant differences. Arrow as a project has many components; the most directly comparable are the in-memory data structure and Parquet for on-disk storage. For the in-memory data structure, TileDB has similar goals of doing zero-copy when moving data between libraries and applications. In fact, we even use Arrow in our TileDB-VCF[2] genomics project for zero-copy into Spark and Python (pyarrow). We are looking to expand support for Arrow into other integrations where appropriate.
For Parquet, a brief comparison is that Parquet is a one-dimensional columnar storage format, whereas TileDB is multi-dimensional. TileDB subsumes Parquet in that we include all of its functionality and more. TileDB natively handles the eventual consistency of cloud object stores, and natively handles updates through its MVCC design. TileDB is a complete storage engine, not only a file format. That said, Parquet does have some advantages, a primary one being a several-year head start on TileDB in being integrated into many tools, so it's more well known.
> Is this suited for large amounts (~1TB) of time series data?
Yes, it's well suited for time series data. We natively support timestamp/datetime fields, in the core library and in many of our integrations (TileDB-Py, TileDB-R, Spark, MariaDB to name a few). We allow for fast sub-slicing on the time dimension. You also have configurable tiling[3], so you can shape the array to fit your timestamp granularity and volume. Support for updates can also help if your time series data ever gets updated. Many time series databases don't recommend updates to records, or they recommend having no primary keys and allowing duplicates. TileDB supports fast and efficient updates (and duplicates), so you have full control of your design and implementation.
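For a flavor of what that can look like, here is a rough sketch in TileDB-Py (the array URI, tile extent, and raw Unix-second timestamps are illustrative assumptions; a real schema could use a native datetime dimension instead):

import numpy as np
import tiledb

uri = "ticks"  # could also be an s3:// URI

# 1-D sparse array indexed by a Unix-timestamp dimension, one float attribute.
dim = tiledb.Dim(name="ts", domain=(0, 2**62), tile=86_400, dtype=np.int64)
schema = tiledb.ArraySchema(
    domain=tiledb.Domain(dim),
    sparse=True,
    attrs=[tiledb.Attr(name="price", dtype=np.float64)],
)
tiledb.Array.create(uri, schema)

# Write a few points, then slice a time range back out.
with tiledb.open(uri, mode="w") as A:
    A[np.array([1_600_000_000, 1_600_000_060])] = {"price": np.array([101.5, 101.7])}

with tiledb.open(uri, mode="r") as A:
    day = A[1_600_000_000:1_600_086_400]  # fast sub-slice on the time dimension
    print(day["price"])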
Someone asked for a more nuanced perspective, so here we go.
For a lot of AI researchers, OpenAI has been a huge disappointment. We had hope that OpenAI would be the company to democratize AI with good open source work, transparency, no PR bullshit (aka DeepMind), and evangelism. That they would develop in the open, and perhaps even do research in the open. You know, kind of like the name says.
It all started out okay with their release of OpenAI Gym, tutorials, leaderboards, and competitions around that. That was when Karpathy was still there. Over time, many projects have become abandoned, poorly maintained, or just disappeared [1]. And many projects they promised never happened [2]. OpenAI became just another research lab obsessed with publishing papers in closed (!) journals, indistinguishable from Google AI, DeepMind, FAIR, MSR, and the many others.
There is nothing open or different about them. Most paper code is not published, and even when it is, it's just the typical poorly written and unmaintained research code that you see from other labs. None of their infrastructure is open source either, because it's needed to maintain their competitive advantage to train models and publish research papers. GPT-3 being offered as a paid API to a select number of people is the latest joke in a long series of jokes. All of this would be fine if it weren't for the name and branding of being a transparent and good-willed nonprofit. It is just misleading, and that rubs many people the wrong way, as if the whole "open" thing was just a PR stunt.
HuggingFace [0] these days is pretty much what OpenAI should have been, but only time will tell what happens.
I don't see how this is a nuanced perspective - it seems to restate the same complaints/arguments just about every comment makes in these discussions.
A nuanced perspective would look at the arguments as to why OpenAI is doing the things they are doing. For example:
* OpenAI publishes in closed journals (actually conference proceedings) because that is where all the cutting-edge research is published and reviewed. I cannot recall an OpenAI paper that wasn't available either via arXiv or their website, despite being published in a closed venue. What is the alternative here? Where should they go for quality peer review? Yes, you can argue the peer review at top conferences is not high quality, but is it worse than no peer review, or than peer review from open-access no-name journals?
* How does OpenAI make money? How much are they bringing in? How much does it cost to support things like the OpenAI Gym, etc.? How much does it cost OpenAI in terms of bandwidth to host pre-trained versions of GPT-3? At some point a company needs to make money and prioritize resources - they can't give everything away for free in perpetuity.
I don't think these questions have obvious answers - there is give and take.
It seems like there are a lot of good reasons for every choice they made.
Organizations are constantly making decisions that are trading off certain values for others, i.e. openness vs safety/expediency/funding. But if they use the word open in their name, signalling to people that is one of their foundational values, people will expect them to pick openness even when it's not necessarily the easiest, safest, most expedient, or most profitable choice. They expect them to pick openness when it's hard.
that's not what happened with OpenAI though. They're not a non-profit anymore; they changed to a "capped-profit" (lol) model.
I didn't know this was even possible/legal. Start as a non-profit for all the tax advantages and convert to for-profit once you've got a saleable product? Maybe startups should start doing this
What's the point? If your business doesn't turn a profit then you don't owe business income taxes anyways. Most businesses take several years to reach profitability.
"OpenAI is governed by the board of OpenAI Nonprofit, which consists of OpenAI LP employees Greg Brockman (Chairman & CTO), Ilya Sutskever (Chief Scientist), and Sam Altman (CEO), and non-employees Adam D’Angelo, Holden Karnofsky, Reid Hoffman, Shivon Zilis, and Tasha McCauley."
Yes, but the computing they are trying to do is expensive, so it makes sense to then try and get some self-sustaining revenue by leveraging their research into a software service. I must admit I don't know about their current status of funding from large companies etc, but I do think it makes sense to try and make a bit of their own money to be more independent.
A bunch of fellow researchers and I started Manifold Computing (https://manifoldcomputing.com), where we're hoping to do live, open source research and build open source tools. As you said, time will tell, but I hope we can do good work this way.
I think this illustrates just how complex the topic is, though. Hugging Face is awesome, and Transformers has done so much to democratize NLP, but does it exist without labs like OpenAI releasing models like GPT-2? The ecosystem is still young and fluid, and as a result, it's super complex. I completely understand the critique of what OpenAI is now vs. what they positioned themselves as early on, but I think it is also a symptom of all the open questions around how ML research is to be done in a way that maximizes community benefit while remaining sustainable.
As soon as I learned that Sam Altman is involved, I could infer the direction. I cannot recall Sam being an activist for anything open - but I do remember him as the "head of the startup world", i.e. creating exclusive opportunities for venture capitalists by using up and tearing through bright-eyed talent. I think he was pretty successful at the latter.
The sad thing is: OpenAI might just be a foreshadowing of the coming step function in power, disenfranchising those who are on the wrong side of the API much more than we see today (where it's somewhat limited to the gig economy).
Preface: I'd wait a few months before declaring it a game changer. There currently is a huge OpenAI marketing campaign going on touting how amazing GPT-3 is. Almost all the GPT-3 press comes from OpenAI's friends and network of YC founders who got privileged access and promised to build apps on top of the new API. You can see that by looking at the most popular tweets and posts. It's usually ex-YC people who are now working on new GPT-3 powered apps. Their examples and demos are clearly cherry picked. It's the typical Silicon Valley VC pitch deck smoke demo approach that you would expect from SV startup people. Let the hype and marketing campaign settle down.
That being said, if it all works as promised and the model becomes widely available it's quite amazing and has the potential to change a few things. The obvious one is that it becomes much easier to automatically create plausible-looking content such as news stories, comments, etc. This will create many more bots and spam than you are used to seeing. The other obvious one is to act as a natural-language based search engine or database, where you can ask questions and get facts as answers. This would be restricted to non-subjective things that are in the training data of course.
On a longer time scale, it could drive the adoption of technologies for fact and identity verification. As it becomes so much easier to automatically generate content, we need better ways to establish trust. Safety and bias are yet another concern, since anything generated will obviously be biased toward whatever is most common in the training data.
> They're probably being paid for it and their examples and demos are clearly cherry picked.
This is a hefty and ungrounded accusation. You get access to the API by sending an email with a use-case; this has been said independently by multiple people, eg. [1]. Gwern has also commented that you don't have to cherry pick nearly as much as you did for GPT-2; a decent fraction of samples are simply good.
One of the key points of OpenAI API is that they can vet users to prevent abuses like malicious bots and spam.
You're right that this is an ungrounded accusation and I changed my comment.
For what it's worth, I know several people at competing labs who applied for access and didn't hear back. If you are doing science, aren't other scientists, especially those who are critical, the first ones that should get access?
You know, this week I've been fascinated by a GPT-3 implementation writing HTML/React code, but, as you said, that is coming from a YC alumnus, so there might actually be some truth to the OpenAI press "conspiracy" theory.
Granted, I was/am still sceptical, but also very curious, about the actual performance and quality of that code generator thingy, and I hope he actually opens up the tool this weekend this time around, as he didn't open it up at the previously announced date.
I've seen multiple profiles just like that one tweeting amazing things about GPT-3. I didn't keep track, but [0] and [1] just turned up doing a quick Twitter search. There are a lot more.
It's quite suspicious that instead of giving access to AI researchers who have the ability to evaluate the model and may be skeptical, OpenAI has largely been giving access to Silicon Valley startup people and VCs who know very little about AI but say how game-changing it is. Perhaps it's just their network with extra incentives. Gwern being the only exception that comes to mind.