rcar's comments

Agreed. These meetings can certainly end up being as cynically founded as the version posed in the post, but they're often also useful opportunities for execs to make sure that communication of the ultimate intent that led to the proposed work has made it through to the folks who will be tasked with doing the work. Understanding the business goal of the work makes it much easier to accurately weigh the inevitable tradeoffs that come up in projects without needing to run everything all the way back up the chain.


Title should be "packages" rather than "packets".


I got really excited thinking this was something super interesting that I didn't know about, i.e., "Python packets?? What?? Stealing AWS keys?? How??" Disappointed that it's actually just malicious packages.


There are far more fictional TV characters than there are types of wood. I'm a hobbyist woodworker and TV snob but could still name many more characters than woods.


It wasn't my best metaphor.


Regardless, you are correct.

I often wonder what society would be like if we treated engineers and scientists like we do sports and entertainment, and what you said, I think, falls loosely into that realm of thinking.


https://theinfosphere.org/Transcript:Crimes_of_the_Hot

    [Scene: Outside Conference Centre. Scientists from all over the world arrive. Photographers take photos of them and fans in the crowd wave papers for them to sign.]
Woman #1: [shouting] Oh, God, I can't believe it!

Woman #2: [shouting] I love you!

    [Farnsworth steps out of a limo. Joan Rivers' Head is commentating at the star-studded event.]
Rivers: Oh, oh, oh! It's Professor Hubert Farnsworth! He's looking sharp in a standard white lab coat and dark slacks! His wristwatch is a Casio.



Has anyone figured out who or what is behind Outline?


I don't know but I know that I LOVE that person deeply


Their balance sheet looks very healthy for a growing company with a strong gross margin and very sustainable net losses relative to the capital they've been able to raise. As long as they can maintain their brand position in the space, the multiple they last raised at doesn't seem absurd to me at least.


Oh I agree, but I would put large bets against any company with a fresh take on a popular product that suddenly has an IPO. The growth curve needed to satisfy shareholders will destroy the brand's name and quality, as what they do... doesn't really scale well. Eventually you are outsourcing shoe production to new sources, getting lower-quality wool for cheaper, and reducing costs in customer service. For a year or so things look great on paper, then slowly sales start to slip, the company cuts costs further to slow the slide, and the cycle repeats until the brand is known for its Shoe Department collection at $29.99.


From their S-1, 2020 revenue was $219m. From that, $106m was cost of revenue, $87m went out to SG&A, and $55m for marketing. Presumably if they wanted to hit pure profitability, they could take their foot off the gas on the admin expenses and marketing, but since they're still seeing good growth and seem to be having an easy enough time getting capital, they may as well keep spending on additional growth.
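Back-of-the-envelope from those same figures, assuming those are the major line items: $219m − ($106m + $87m + $55m) ≈ −$29m operating loss, so trimming SG&A and marketing (≈$142m combined) by roughly 20% would get them to about breakeven, all else equal.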


PCA is a cool technique mathematically, but in my many years of building models, I've never seen it result in a more accurate model. I could see it potentially being useful in situations where you're forced to use a linear/logistic model since you're going to have to do a lot of feature preprocessing, but tree ensembles, NNs, etc. are all able to tease out pretty complicated relationships among features on their own. Considering that PCA also complicates things from a model interpretability point of view, it feels to me like a method whose time has largely passed.


> Considering that PCA also complicates things from a model interpretability point of view

This is a strange comment, since my primary usage of PCA/SVD is as a first step in understanding the latent factors which are driving the data. Latent factors typically involve all of the important things that anyone running a business or deciding policy cares about: customer engagement, patient well-being, employee happiness, etc. all represent latent factors.

If you have ever wanted to perform data analysis and gain some exciting insight into explaining user behavior, PCA/SVD will get you there pretty quickly. It is one of the most powerful tools in my arsenal when I'm working on a project that requires interpretability.

The "loadings" in PC and the V matrix in SVD both contain information about how the original feature space correlates with the new projection. This can easily show thing things like "User's who do X,Y and NOT Z are more likely to purchase".

Likewise, running LSA (Latent Semantic Analysis/Indexing) on a term-frequency matrix gives you a first pass at a semantic embedding. You'll notice, for example, that "dog" and "cat" project onto the new space along a common principal component, which can be interpreted as "pets".
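A quick scikit-learn sketch of that LSA idea (the toy corpus, the component count, and what actually prints are just for illustration):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import TruncatedSVD

    docs = [
        "the dog chased the ball in the park",
        "the cat sat on the warm mat",
        "a dog and a cat make great pets",
        "the dog barked at the cat",
        "interest rates moved the stock market",
        "the market rallied after the rate decision",
    ]
    vec = CountVectorizer()
    X = vec.fit_transform(docs)                       # term-frequency matrix

    svd = TruncatedSVD(n_components=2, random_state=0).fit(X)

    # each row of components_ weights the original terms; terms that co-occur
    # (dog, cat, pets, ...) tend to load on the same component, separately from
    # the market/rates terms
    terms = vec.get_feature_names_out()
    for comp in svd.components_:
        top = comp.argsort()[::-1][:5]
        print([terms[i] for i in top])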

> I've never seen it result in a more accurate model. I could see it potentially being useful in situations where you're forced to use a linear/logistic model

PCA/SVD is a linear transformation of the data and shouldn't give you any performance increase on a linear model. However, it can be very helpful in transforming extremely high-dimensional, sparse vectors into lower-dimensional, dense representations. This can provide a lot of storage/performance benefits.

> NNs, etc. are all able to tease out pretty complicated relationships among features on their own.

PCA is literally identical to an autoencoder with no non-linear layers trained to minimize MSE. It is a very good first step towards understanding what your NN will eventually do. After all, an NN is just a stack of matrix transformations with non-linearities, applied so that your final vector space is ultimately linearly separable.
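In case it's useful, a rough numpy sketch of that equivalence (toy low-rank data; the sizes, init scale, and learning rate are arbitrary choices, not anything canonical):

    import numpy as np

    rng = np.random.default_rng(0)

    # toy data: a 3-dimensional signal embedded in 10 dimensions, plus noise
    Z = rng.normal(size=(1000, 3))
    A = rng.normal(size=(3, 10)) / np.sqrt(3)
    X = Z @ A + 0.3 * rng.normal(size=(1000, 10))
    X = X - X.mean(axis=0)

    # PCA reconstruction error with k components, via SVD
    k = 3
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    X_pca = (X @ Vt[:k].T) @ Vt[:k]
    pca_mse = np.mean((X - X_pca) ** 2)

    # linear autoencoder (no non-linearities) trained by gradient descent on MSE
    W_enc = rng.normal(scale=0.5, size=(10, k))
    W_dec = rng.normal(scale=0.5, size=(k, 10))
    lr = 0.01
    for _ in range(10_000):
        err = X @ W_enc @ W_dec - X                   # reconstruction error
        grad_dec = W_enc.T @ X.T @ err / len(X)
        grad_enc = X.T @ err @ W_dec.T / len(X)
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc

    ae_mse = np.mean((X @ W_enc @ W_dec - X) ** 2)
    print(pca_mse, ae_mse)                            # the two should come out very close

The trained autoencoder ends up spanning the same subspace as the top principal components; it just isn't forced to produce orthonormal axes the way PCA is.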


Sure, everyone wants to get to the latent factors that really drive the outcome of interest, but I've never seen a situation in which principal components _really_ represent latent factors unless you squint hard at them and want to believe. As for gaining insight and explaining user behavior, I'd much rather just fit a decent model and share some SHAP plots for understanding how your features relate to the target and to each other.

If you like PCA and find it works in your particular domains, all the more power to you. I just don't find it practically useful for fitting better models and am generally suspicious of the insights drawn from that and other unsupervised techniques, especially given how much of the meaning of the results gets imparted by the observer who often has a particular story they'd like to tell.


I've used PCA with good results in the past. My problem essentially simplified down to trying to find nearest neighbours in high-dimensional spaces. Distance metrics in high-dimensional spaces don't behave nicely. Using PCA to reduce the number of dimensions to something more manageable made the problem much more tractable.
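For what it's worth, a minimal scikit-learn sketch of that pattern (all the dimension counts here are arbitrary):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(0)
    X = rng.normal(size=(10_000, 500))               # high-dimensional points

    X_low = PCA(n_components=20).fit_transform(X)    # keep only the top 20 directions

    nn = NearestNeighbors(n_neighbors=5).fit(X_low)
    dist, idx = nn.kneighbors(X_low[:1])             # neighbours of the first point
    print(idx)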


Plenty of examples for these in finance and economics (term structure, asset pricing factors).


By definition there are more accurate models; PCA is kind of like a general lossy compression algorithm, and any model you come up with can be superseded by a more accurate one up until you have a perfect description of the phenomenon. But PCA is a well-understood technique, it can be computed very fast using optimized algorithms and GPUs, pretty much anyone can easily understand it and apply it to a wide variety of problems, and from a technical standpoint it preserves the maximum amount of information for a given ratio of output bits to input bits.

We use PCA quite a lot at my quant firm to do something similar to clustering in high-dimensional spaces. A simple use case would be arranging stocks so that stocks that move similarly to one another are grouped close together.

Another use case for PCA is breaking stocks down into constituent components, for example being able to express the price of a stock as a linear combination of factors: MSFT = 5% oil + 10% interest rates + 40% tech sector + ...

You can also do this for things like ETFs, where in principle an ETF might be made up of 100 stocks but in practice only 10 of those stocks really determine the price, so if you're engaged in ETF market making you can hold a neutral portfolio by carrying the ETF long and a small handful of stocks short.
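A rough numpy sketch of the "linear combination of factors" idea on synthetic returns (the random data and the choice of 3 factors are made up for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    n_days, n_stocks = 500, 20
    returns = rng.normal(0, 0.01, size=(n_days, n_stocks))   # synthetic daily returns

    # PCA via SVD of the centred return matrix
    X = returns - returns.mean(axis=0)
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    factors = U[:, :3] * S[:3]              # first 3 principal "factor" return series

    # express one stock's returns as a linear combination of those factors
    betas, *_ = np.linalg.lstsq(factors, X[:, 0], rcond=None)
    print("factor exposures:", betas)

In practice the principal factors get identified with things like oil, rates, or the tech sector by looking at which stocks load heavily on them.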


By definition, it's going to result in a less accurate model, unless you keep all of the dimensions or your data is very weird, right? And NNs are going to complicate your interpretability more?


When/if used properly, no. The idea behind PCA is to find a set of features with far less dimensionality than the original data. The hope/intent with this sort of approach is that any additional features beyond those would just be fitting noise.


For people who are curious, the GP is correct when it comes to fitting the training data: recall that with enough parameters we can get 100% on the training set. The parent's comment is about testing/validation, where we want to avoid overfitting, so removing the least important parameters can be helpful.


Not if many columns in your data are driven by some common latent factors.


PCA is good enough for a lot of things. For example, it is used in genetics to measure relatedness between populations reasonably well. A perfect model doesn't really exist anyway when the data you are able to realistically collect is only a subset of the population, and perhaps biased by how it was collected.


i can think of a few places where it's useful:

if you know that your data comes from a stationary distribution, you can use it as a compression technique which reduces the computational demands on your model. sure, computing the initial svd or covariance matrix is expensive, but once you have it, the projection is just a matrix multiply and a vector subtraction. (with the reverse being the same)
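for what it's worth, a tiny numpy sketch of that projection/reconstruction step (the shapes here are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5000, 100))    # pretend this comes from a fixed distribution

    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    W = Vt[:10]                         # top 10 principal directions, computed once up front

    z = (X[0] - mean) @ W.T             # compress: a vector subtraction and a matrix multiply
    x_back = z @ W + mean               # decompress: the reverse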

if you have some high dimensional data and you just want to look at it, it's a pretty good start. not only does it give you a sense for whether higher dimensions are just noise (by looking at the eigenspectrums) it also makes low dimensional plots possible.

pca, cca and ica have been around for a very long time. i doubt "their time has passed."

but who knows, maybe i'm wrong.


It is still a nice tool for projecting things (at least to visualize) where you expect the data to be on a lower dimensional hyperplane. I do agree in most cases t-SNE or UMAP are better (esp if you don’t care about distances).


I believe that the real intention behind the statement is that not all users of their product are going to have 5 working fingers, e.g. if they've had some sort of accident or similar.


That line of thinking doesn't get you far with anything. Nothing is universal; you can always find someone with some disability that prevents them from using a product.

Accommodating as many people as possible is good, but you can never accommodate everyone. The same goes for all of these "things programmers believe about X" lists, be it names or whatever. You absolutely need to provide a working product for the majority of people first, and the majority of people have at least two names, or in this case 5 fingers.


I want to erase data stolen from my phone.


Agreed with all points. I often find that these sorts of observational studies with weak methods that provide evidence for something the unblinded study participants would want to support (e.g., this study, observational studies of the effect of reduced workweek hours on productivity, many of the observational basic income studies) are really hard to evaluate fairly. It becomes easy to latch onto the conclusions when the intervention is one you're predisposed to believe in, and to latch onto the methodological weaknesses when it's one you start off opposed to, and so I feel like they don't really advance science at all.


I think it's useful to think of studies like this as an MVP. They're relatively cheap and fast. If they give a null result, you've failed fast. If they give a non-null result, you can invest in further study and iterate.


As opposed to an MVP, such a study can give you any result you want.


They provide dirty data, but cheaply, and that can be used to prod a more rigorous study into life. So they do advance science. It isn't all sterile test tubes and lab coats; at the coal face it gets mucky, and it has to.


I found myself wanting a little more content here. Felt mostly like hand waving followed by a plug for their product.


Yeah, half the article was commentary on an Andrew Ng keynote and the other half was an ad for their employer.

