Hacker News | physcab's comments

If you have enough connections to get you through your first 6 months or so, then you can make the jump. Often when consulting, your rate is double your normal take-home, so it balances out the period of searching for more work. But you highlighted the key anxiety in going freelance: you will always have to be selling your services. It's just like any startup, except you are the product. Having to get new gigs will always be part of the job, whether it's now, 6 months from now, or 3 years from now.


I've used Snowflake a fair amount. It's a decent product, probably on par with Redshift / BigQuery. Obviously there's a lot of hype and free money floating around, but my take on why they are popular is that they are basically a replacement for large Hadoop installations that have become untenable to manage over the past decade. If a company is already using Redshift or BigQuery, I'm not sure why they would switch.

I would be apprehensive in investing in Snowflake long term purely because their product is highly susceptible to being obsoleted in the next 5-10 years.


I was at a company that switched from Redshift to Snowflake. It was a night and day difference. Faster (orders of magnitude!), cheaper, and significantly easier to work with (since everyone had their own personal view of the data to mutate/work with).

As far as I can tell, it is a unique product in the database space. Extremely well executed ideas and design.


Snowflake seems like a unique product and I can only imagine the complex math they're doing under the hood to achieve these incredible query times. MemSQL is the only real competitor I know of. Redshift is a lot less user-friendly (constant need to run vacuum queries). Parquet lakes / Delta lakes don't have anything close to the performance.

Predicate pushdown filtering enabled by the Snowflake Spark connector seems really promising. Lots of companies are currently running big data analyses on Parquet files in S3. Snowflake has the opportunity to grab a huge slice of the big data market.
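
For anyone unfamiliar with the term, here is a toy sketch of what predicate pushdown buys you. It is not how Snowflake or the Spark connector implement it; the `Partition` class and its min/max metadata are hypothetical stand-ins. The idea is only that the storage side sees the filter and can skip chunks that cannot match, instead of shipping every row to the client.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Partition:
    """A chunk of rows for one column (hypothetical layout, for illustration)."""
    rows: List[int]

    @property
    def min_value(self) -> int:
        return min(self.rows)

    @property
    def max_value(self) -> int:
        return max(self.rows)


def scan_then_filter(partitions: List[Partition], lo: int, hi: int) -> Tuple[List[int], int]:
    """No pushdown: read every row, then filter on the client side."""
    rows_read = 0
    result = []
    for part in partitions:
        for value in part.rows:
            rows_read += 1
            if lo <= value <= hi:
                result.append(value)
    return result, rows_read


def scan_with_pushdown(partitions: List[Partition], lo: int, hi: int) -> Tuple[List[int], int]:
    """Pushdown: the source knows the predicate and skips partitions
    whose min/max range cannot contain a matching row."""
    rows_read = 0
    result = []
    for part in partitions:
        if part.max_value < lo or part.min_value > hi:
            continue  # whole partition pruned without reading its rows
        for value in part.rows:
            rows_read += 1
            if lo <= value <= hi:
                result.append(value)
    return result, rows_read


# Ten partitions of 100 consecutive integers each: 0-99, 100-199, ...
partitions = [Partition(list(range(start, start + 100))) for start in range(0, 1000, 100)]

plain_result, plain_reads = scan_then_filter(partitions, 250, 260)
pushed_result, pushed_reads = scan_with_pushdown(partitions, 250, 260)
print(plain_reads, pushed_reads)
```

Both functions return the same rows; the only difference is how many rows had to be read to get there.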


What kind of math is involved in building a faster database? Genuinely curious. I would guess maybe linear algebra, indirectly.


Not at all. I'd highly recommend CMU's 15-445/645 Intro to Database Systems course (sponsored by Snowflake lol) because they put all their lectures online on YouTube [1]! Here's what's involved in making fast databases from the syllabus [2]:

This course is on the design and implementation of database management systems. Topics include data models (relational, document, key/value), storage models (n-ary, decomposition), query languages (SQL, stored procedures), storage architectures (heaps, log-structured), indexing (order preserving trees, hash tables), transaction processing (ACID, concurrency control), recovery (logging, checkpoints), query processing (joins, sorting, aggregation, optimization), and parallel architectures (multi-core, distributed). Case studies on open-source and commercial database systems are used to illustrate these techniques and trade-offs. The course is appropriate for students that are prepared to flex their strong systems programming skills.

[1] https://www.youtube.com/playlist?list=PLSE8ODhjZXjbohkNBWQs_...

[2] https://15445.courses.cs.cmu.edu/fall2020/syllabus.html


Oof... CMU courses directly sponsored by Snowflake. Gross.


Please elaborate? I can see a lot of ways a sponsored course could go badly, but I can't immediately see which ones apply here.


I'm not qualified to evaluate this particular course. But any time there is a corporate sponsor of a course, it provides strong incentives to the professor to not harm that sponsor at a minimum. If there's a methodology that the professor would like to teach, but that sidesteps, or calls into question, the sponsor's main offering, then that content is in jeopardy. The corruption will always take root given enough time, so that's why editorial and advertising, or academic content and corporate sponsors, etc. should always be at arm's length. Snowflake should give money to CMU to fund "database-related research and teaching" and the university should decide what to do with it. There's still a possibility of improper influence, but it's harder to achieve. This is particularly bad because it's CMU and not University of Phoenix... CMU is in the highest echelon of computer science universities, so it's sad to see it so debased.

What if Kodak sponsored an imaging class in 1990... what do you think they would have said about film vs. digital photography?


A lot of ML classes at CMU (and probably other prestigious campuses) are sponsored by AWS or GCP through cloud credit donation, including the popular Cloud Computing class. Is that any different?


Not really. Cloud computing has a lot of benefits, but a lot of risks and drawbacks. Who is sponsoring a class to teach about those? About keeping users’ data private by building your own infrastructure? CMU is actively tilting their students, who are the top CS students in the world, towards cloud computing, based on the choices of these sponsors.


Sounds kind of conspiratorial.

I think any increase in educational content is good, even if ‘bad actors’ are funding it.


Bad actors funding it always leads to bad actors writing it. Then it's hard to argue that an increase in its quantity is good.


>I can only imagine the complex math they're doing under the hood to achieve these incredible query times

Maybe it's cynical/paranoid, but in this age of Theranos I must ask: is it possible their algorithm excels at showing you a reasonable-looking number, rather than an accurate one?


It's SQL; if they were giving wrong answers, people would notice.


It's not too terribly difficult to load test Snowflake to get a sense of scaling. JMeter does the job well. Heck, I can pass you along some sample projects I've done against them if you really want.
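
A rough idea of what such a load test measures, sketched with a plain Python thread pool rather than JMeter. `fake_query` is a hypothetical stand-in that just sleeps; in a real test it would be replaced by a call through whatever client you use, and the query count and concurrency here are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import median
from typing import Callable, List


def fake_query(query_id: int) -> int:
    """Hypothetical stand-in for a warehouse query; replace with a real client call."""
    time.sleep(0.01)
    return query_id


def timed(fn: Callable[[int], object], arg: int) -> float:
    """Run fn(arg) and return the wall-clock latency in seconds."""
    start = time.perf_counter()
    fn(arg)
    return time.perf_counter() - start


def load_test(fn: Callable[[int], object], n_queries: int, concurrency: int) -> List[float]:
    """Issue n_queries calls with at most `concurrency` in flight at once."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(lambda i: timed(fn, i), range(n_queries)))


latencies = load_test(fake_query, n_queries=20, concurrency=5)
summary = {
    "count": len(latencies),
    "median_s": median(latencies),
    "max_s": max(latencies),
}
print(summary)
```

Rerunning with increasing `concurrency` and watching how the median and max latency move is the basic way to get a sense of scaling.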


Yeah, Redshift is not at all comparable to Snowflake. BigQuery is much closer; it's ahead in some areas and in the last year has closed some of the gaps where it wasn't. BigQuery's biggest problem is that it's tied to GCP, which is a distant 3rd in cloud market share. They have BigQuery Omni coming, which is multi-cloud, but it'll probably be a while before it's comparable to BigQuery in GCP.


The other problem with BigQuery is that you can very easily write a query that's going to cost you a lot of money to run - with Snowflake you can let it run for an hour or so, and then realise it was a bad idea and you're only out a few credits, a handful of dollars.

The killer feature for me was the query profiler - you can see WHY a query is taking a long time and optimise it - BigQuery just felt like Google were brute forcing the performance, and then charging you accordingly.

When the project I was on switched, the micro-clusters (and the ability to recluster a table) as well as the MERGE semantics beat BigQuery hands down - although those features may be out of beta now (but I've moved on to a new gig).


That's also a problem that it'd be fairly straightforward for Google to solve by automatically spinning up smaller, entirely separate serving clusters for customers who are worried about such a blowout (for a fee, obvs). It's just the serving tree (+ whatever in-memory storage service they use to do distributed joins nowadays), no need to duplicate the rest of the service. The caveat is, a smaller cluster will favor query optimizations specific to that smaller cluster. Some of those "small cluster" optimizations could hurt query performance when deployed against BQ proper with its tens of thousands of workers.

Also, BQ does explain the query plan to some extent: https://cloud.google.com/bigquery/query-plan-explanation. Not quite at the level of a "regular" SQL DB, but it does give you some info to work with when optimizing queries. If you haven't used it in a while I'd give it another try.


I believe this is exactly what slot reservations in BigQuery achieve. Instead of paying on-demand pricing that is determined by data read, you purchase a fixed number of “slots” that are shared by queries running within that particular project.


Ah OK, after reading their docs I see they've changed what "slots" used to mean in Dremel (internal version of BQ). It used to be that slots _guaranteed_ capacity, but did not limit it. Meaning that you could rely on having a certain number of workers in the cluster when you issue a query, but if Dremel had more it'd give you all it's got. Obviously this is not viable when people have to pay per terabyte read, because a ton can be read.

What they have now strikes me as an even better solution to the problem of bankrupting someone with a query. Not sure how pricing compares to Redshift et al., but pricing is the easiest thing for Google to change.


Slots don't control how much data you consume, your query does.

If you need to read a terabyte of data to answer your query then more slots only gets it done faster.
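
A back-of-the-envelope sketch of that distinction. The price per terabyte and the slot-seconds of work are placeholder assumptions, not quoted BigQuery figures, and the runtime model assumes perfect parallelism, which real queries will not achieve.

```python
def on_demand_cost_usd(tb_scanned: float, price_per_tb_usd: float = 5.0) -> float:
    """On-demand billing depends only on data read (price is an assumed placeholder)."""
    return tb_scanned * price_per_tb_usd


def estimated_runtime_s(work_slot_seconds: float, slots: int) -> float:
    """Idealized runtime: a fixed amount of work divided evenly across slots."""
    if slots <= 0:
        raise ValueError("slots must be positive")
    return work_slot_seconds / slots


TB_SCANNED = 1.0             # determined by the query, not by the slot count
WORK_SLOT_SECONDS = 36000.0  # hypothetical total work for this query

for slots in (100, 500, 2000):
    print(
        slots,
        on_demand_cost_usd(TB_SCANNED),
        estimated_runtime_s(WORK_SLOT_SECONDS, slots),
    )
```

The scanned-data figure stays the same on every row of the output; only the runtime shrinks as slots are added.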


BQ Slots lets you do essentially that (pre-commit to a particular cluster size)


I was hitting some rough edges / complexity with BigQuery's MERGE recently, but wasn't able to ascertain any significant difference with Snowflake by scanning their docs briefly -- what aspects of the MERGE semantics are better in Snowflake in your opinion?

Wondering if this is a somewhat new feature in BQ since you used it, or if there's still a feature gap here (e.g. see https://cloud.google.com/blog/products/gcp/performing-large-...).


BQ has per-project and per-user cost controls. Normally when running new large queries one would run them under a special user with a limit on costs.


I think the obsolescence issue is complicated.

I recently saw a criticism of Palantir which went: "The company has largely succeeded, they say, not because of its technological wizardry but because its interface is slicker and more user friendly than the alternatives created by defense contractors."

A lot of the most successful tech firms started post-dot-com are decent interfaces to not-particularly-revolutionary databases. In high-end consulting and investment banking, appearances are hugely important. You can't have trash decks. It's unsurprising to me that the same is true in defense and intelligence. You can get a roof over your head and breakfast at a trashy motel or the Ritz. Everybody knows the Ritz can command a much higher price because "its interface is slicker and more user friendly than the alternatives."

I think the same thing is true here.


The Ritz has far better beds, cleaner & safer rooms, better food and is far more likely to deliver that consistently. It's not just the appearance.


A closer reading will reveal that I'm not talking about superficial appearances, but the interface. That's an important distinction.

When I start talking about the Ritz and high-end consultants, I'm discussing the interface, which of course includes the "far better beds, cleaner & safer rooms, better food..." and consistency you're trying to contrast with appearance. I would agree that those things are more than superficial and are extremely important to the experience of the user, because that's exactly the point I'm making.

The beds and concierge are nicer at the Ritz, and the interface (note: not appearance) and support are better at Palantir (or, as we're discussing here, at Snowflake).


Maybe your Ritz experiences have been different than mine, but IMHO all hotel rooms are concrete boxes with a facsimile of home stuffed inside them, copied and pasted as many times as local demand will merit.

Hotel restaurants are the same principle, except replace furnishing with food.


Stay at an aging Courtyard Marriott. Some boxes are nicer than others.


I've stayed at everything from a Motel 6, to Courtyards / Residence Inns / Sheratons between NYC and San Diego, to Four Seasons / Ritz Carltons.

I stand by my claim. The relative differentiation in niceness is swamped by their mass produced boxness.

Ironically, my favorite road chain tends to be Aloft. At least they're upfront about their capsule-esque nature, in a sort of ironic/not-ironic way?

Least favorite: Embassy Suites. *shudders* It's like every Disney vacationing family's fantasy about what a hotel should be... packed with every Disney vacationing family. Omelette?


The point of hotel chains, and chains in general, is the consistency of the mass-produced experience. I can walk into a DoubleTree hotel anywhere in the world and get the same welcome cookie. It's a positive, not a negative; people often enjoy knowing what they're going to get. If you prefer a more unique experience, which is perfectly understandable, then simply avoid chains perhaps?


That's my point, but extended: I feel like walking into any hotel chain (including different product tiers and luxury brands) gives effectively the same experience.

Don't get me wrong, there's a benefit to consistency of product (especially when you travel Su-F for consulting).

But that benefit, parent company consolidation, and economies of scale drive a net result of overwhelming homogeneity.


Totally get your point of view, and I share it in vacation contexts. As the hotel chains have consolidated, they slice pennies everywhere.

When I'm traveling for business or putting my head on a pillow on a road trip, consistency makes my life easier and less stressful. I'm a gorilla-sized person :), and I would rather stay at a higher-end hotel that provides an actual bath sheet than a Marriott whatever where I have to call for 6 towels. Surprises aren't delightful at 10PM when you've been on the road for 15 hours.


I got eaten up by gnats (they claim not bed bugs) over a week at a particularly nice hotel. On the plus side, nothing came home with me, the bites healed, and they gave me enough "points" as compensation to cover a luxury hotel in Barcelona for 2 weeks. So... Future Self can look back on the experience with a smile.


Nothing ever gets obsolete once it gains a large foothold in the enterprise space. There's a reason why Oracle and IBM are worth what they are today.


> Nothing ever gets obsolete once it gains a large foothold in the enterprise space.

Lotus? Delphi?


Both still in very heavy use. In 2014, anyway, every single IBM employee had to keep a Lotus Notes window open. It was hellish.

Dunno if that's changed since Red Hat took them over.


I used Lotus Notes as recently as 2010, and I am pretty sure it's still going strong at my megacorp former employer.


Lotus is all over in government and insurance. As a mail client it is mostly dead, but the apps live on.


There is a reason, but it ain’t because of their cloud databases...


Novell, WordPerfect


WordPerfect was used in certain industries (legal especially, I think) long after it started dying everywhere else. I don't think it's an exception to this rule.


Yes but it’s dead now.


There was a post maybe two weeks ago from Tavis Ormandy (a tweet) that made the HN front page, about how he uses WordPerfect:

Tavis Ormandy (@taviso) Tweeted: @mkolsek Funny you should mention that, I was recently curious if there are any console word processors. I discovered there's a community who still use WordPerfect 5.1 for DOS. They kinda sold me on it, got it working in DOSEMU. https://t.co/t6j0c1G3w1


WordPerfect still has some users.

Last year we recruited an attorney from a firm that still uses WordPerfect for all their documents.


My school district still runs on ZENWorks.


At the end of the day, all the data warehouses run on SQL, with a bit of customization around ingestion and export. Most of them are backed by object storage (S3/GCS) and those integrations look very similar.

I wouldn't be that worried about lock-in or being made obsolete. Business logic is going to be pretty easy to port between Redshift, BigQuery, Snowflake, or whatever comes next.


> going to be pretty easy to port between Redshift, BigQuery, Snowflake, or whatever comes next.

This isn't even remotely true. Each has unique SQL syntax, and once you have a few hundred or thousand queries written using vendor-specific SQL (be it date functions or JSON), it is non-trivial to migrate.
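
To make the date/JSON point concrete, here is a small sketch that renders the same two operations in each dialect. The templates are written from memory of each vendor's documented functions and should be checked against current docs; the column and field names (`order_date`, `payload`, `user_id`) are hypothetical, and the JSON variants assume a string column in BigQuery and Redshift and a VARIANT column in Snowflake.

```python
# Templates recalled from vendor documentation; verify before relying on them.
DIALECT_TEMPLATES = {
    "bigquery": {
        "add_days": "DATE_ADD({column}, INTERVAL {days} DAY)",
        "json_field": "JSON_EXTRACT_SCALAR({column}, '$.{field}')",
    },
    "snowflake": {
        "add_days": "DATEADD(day, {days}, {column})",
        "json_field": "{column}:{field}::string",
    },
    "redshift": {
        "add_days": "DATEADD(day, {days}, {column})",
        "json_field": "JSON_EXTRACT_PATH_TEXT({column}, '{field}')",
    },
}


def render(dialect: str, operation: str, **params) -> str:
    """Render one logical operation as dialect-specific SQL text."""
    return DIALECT_TEMPLATES[dialect][operation].format(**params)


for dialect in DIALECT_TEMPLATES:
    print(
        dialect,
        "|",
        render(dialect, "add_days", column="order_date", days=7),
        "|",
        render(dialect, "json_field", column="payload", field="user_id"),
    )
```

Two operations already need three spellings; multiply that across a few hundred queries and the migration cost becomes clear.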


> Most of them are backed by object storage (S3/GCS)

Redshift is backed by worker instances that have their own stores in what's basically an EC2 instance. It's definitely not backed by S3 like Athena.

BigQuery and GCS are both built on top of Colossus, but they have different layers in between them.


With the newer Redshift RA3 instances, you use S3-backed storage with local SSD caching.

https://aws.amazon.com/redshift/features/ra3/


Same applies to Teradata Vantage on cloud.


Sorry, probably should have been more precise. Meant to say: most users are going to interact with the warehouses via object storage for import and export of data.

Since the object store APIs are almost identical across platforms, it doesn't matter that much which warehouse you actually use for production work. It's something that does massive SQL, imports data from S3, and exports data to S3.


> most users are going to interact with the warehouses via object storage for import and export of data.

No, most are going to be using SQL IDEs to query and export data.


> I would be apprehensive in investing in Snowflake long term purely because their product is highly susceptible to being obsoleted in the next 5-10 years.

This can be said about most products and companies. What keeps them alive is how robustly they capture (and hold on to) the market, reduce costs through economies of scale, and innovate. This specific market is also very rapidly growing.


I would think it wouldn't be the same product in 5-10 years.


Lots of companies have built on top of Snowflake.


Violence begets more violence, so I don’t see how that is a good strategy, unless you like seeing more violence


They should continue to do online demo days in conjunction with the live ones. It would be a great way to scale the platform.


YC's Demo Day attendee portal has had an online component at least since Winter 2018--the first I attended. The pitch videos were made available a few minutes delayed from real-time live.


I’ve lived in SF for 10 years and my family of 4 is finally moving out. Part of it is economic as we can’t really afford to raise kids here. Part of it is rapid deterioration of public spaces. The breaking point was someone blocking my apartment door by shooting up heroin when I brought my newborn son home from the hospital.

It’s not bad everywhere. The Marina, Noe, Pac Heights, Cole Valley, Outer Sunset, and parts of the Richmond district all manage to keep their neighborhoods clean. The Mission feels like a 3rd world country.

I’m a Democrat but I would most certainly vote in a hard-line Republican DA to take a stand. I’m done having my car broken into and literally shat on. I’m tired of seeing piles and piles of trash like we’re living in some kind of garbage mound. People just don’t care.

It wasn’t this bad even 3 years ago, but it’s certainly taken a turn for the worse. It’s untenable.


What I’ve heard from my therapist is that anxiety is often a cloud that covers the real emotion underneath. I experience anxiety when I’m really upset because I don’t know how to express anger. But sometimes it is sadness or fear. Anxiety is basically an evolutionary adaptation — we no longer are being hunted or having to hunt for survival, but our minds still think danger is present.


Story of my life. I’ve lived in SF for almost 9 years, my wife for 15 years, and we have a toddler, 90 lb dog, and a baby on the way, making it work in 650sf apt on combined $120k. I’m immensely grateful that the only reason we have savings at all is because my wife bought a below market rate condo with the city when she was single making $40k a year.

It feels weird to make this much but feel so poor. That any hiccup in life could make you miss bills. It’s been fine so far but as our family grows there will be no choice but to leave. Childcare is just too expensive.

I love this city, but over time I’ve felt I’ve gotten less out of it in exchange for more.


> making it work in 650sf apt on combined $120k.

Ten years ago I announced to my GF at dinner that I was going to write a book. Flat fucking broke in San Francisco on only $250 a day.


Reading stories like this: don't worry. It's like this everywhere.

Living paycheck-by-paycheck is far more common than most imagine.

In my country, Colombia, it's the same no matter where you ask, or whether you are poor, middle class, or almost rich.


Prevailing wage data indicates you should really be making north of $110k as a database analyst or administrator with over a decade of experience...


I used to be an analyst, and used to make at least double that. But to earn that kind of salary you often sacrifice work-life balance and experience all-consuming stress. I decided being poor was more healthy and quit.


Some clarification would be helpful for non-Americans. Is 650 sq ft considered small?


In an expensive city like SF, it's not small. It's a 1 bedroom. Most single people live in a 1bd or studio apt. But to raise a family with multiple kids, it feels tight. My wife and I live in the living room off the kitchen and our kids will take the bedroom. For the same (market) price as our unit, you can get 10x more space in the "country", but then finding work and interacting with friends will be harder. Lots of trade-offs.


As a former Twitch employee, I lol’d at this.


By that, do you mean there's billions of $$ in Twitch chat, or is the concept too absurd (and hilarious as a result)?


Why? I am genuinely curious


Might be hard for physcab to reply from their yacht, I guess we'll find out eventually though.


Probably because Twitch chat is the absolute darkest place on the internet, at least if you are a mod.


There’s always YouTube comments.


How so?


With new programmers, you can't have the expectation that you're going to get a response or be upvoted. You help them precisely because you like helping, not that you expect anything in return.

It's the long-term goodwill that eventually brings new programmers into the field, because programming is hard! Newbies need to know it's OK to take risks and there will be a safety net.


I understand that; it's just hard to know if what I told them was actually helpful or not, since I got no feedback.


It does have to fit in their 1MB limit, though (I think).


It's the CPU time that you should really worry about mainly. But the limits are here: https://developers.cloudflare.com/workers/writing-workers/re...

