There would almost surely be a censoring effect no matter what age cutoff you pick, and even allowing for typical government incompetence, surely at some level health econometricians are involved in this type of policy and are aware of and facilitate whatever trade-offs are being sought.
I love roasting poor government policy decision making as much as the next person, but that would be a bridge too far.
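For what it's worth, the censoring point above is easy to convince yourself of with a toy simulation. This is only a sketch with entirely invented numbers (the coverage rate, onset ages, and deferred-diagnosis mechanism are all assumptions, not the article's data): move the cutoff anywhere and the pile-up follows it.

```python
import random

# Toy simulation (all numbers invented, not from the article): if people
# without coverage defer care until an eligibility age, diagnoses pile up
# at the cutoff no matter where the cutoff is placed.
random.seed(0)

def diagnoses_by_age(cutoff, n=100_000):
    counts = {}
    for _ in range(n):
        onset = random.randint(40, 80)                 # age the condition appears
        insured_before_cutoff = random.random() < 0.6  # hypothetical coverage rate
        diagnosed_at = onset if (insured_before_cutoff or onset >= cutoff) else cutoff
        counts[diagnosed_at] = counts.get(diagnosed_at, 0) + 1
    return counts

for cutoff in (60, 65, 70):
    counts = diagnoses_by_age(cutoff)
    spike = counts[cutoff] / counts.get(cutoff - 1, 1)
    print(f"cutoff {cutoff}: ~{spike:.0f}x as many diagnoses at the cutoff as the year before")
```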
> There would almost surely be a censoring effect no matter what age cutoff you pick
This seems to be an ethical argument to make the age cutoff 0: in other words, Medicare for all.
> surely at some level health econometricians are involved in this type of policy and are aware of and facilitate whatever trade-offs are being sought.
I feel like this article would not be massive news if this were true. The other case I can recall that would be like this is when oil execs held back info that they were causing climate change. That seems completely different to me, however, because that was a private company and we're talking about government employees here.
I don’t see any reason to think your comment is accurate. It all depends on trade-offs. If the goal is to reduce cost, then you could equally argue for getting rid of Medicare entirely.
Clearly the goal (no matter what anyone’s separate normative opinion is) is to balance some complex trade-off between costs borne by taxpayers, costs borne by corporations (through taxes and through employer-based healthcare for working-age adults and their dependents), and a high level of access for all people.
Nothing about this censored-data effect says anything about the morality of different regions of that trade-off space.
As to your second comment, this article is not massive news, and it seems laughable to say it is. It’s just a blip in the news cycle, using a data artifact to drum up attention to something that is already well known to any econometrician or health policy analyst.
Full disclosure: I personally favor nationalized medicine and welcome higher taxes across the board. Nonetheless I don’t find your comment to be accurate or valuable.
Companies don’t financially reward the data plumbing work, yet it’s often much harder, prone to unusual failure modes at high scale, and carries more stringent on-call and incident triage responsibilities.
Given that it’s (a) more difficult, (b) more business critical, and (c) more stressful, requiring an incident-alerting on-call rotation, “data work” should be much better paid and should offer job security and career growth.
Yet no company I know of pays expert modelers & researchers less than expert data platform engineers.
So either the companies know something you don’t (e.g. that data platform work is more of a commodity and easier to replace than rarer modeling talent), or there’s a free lunch available: you could exploit the arbitrage by paying data platform experts more and capture the business value that other orgs are missing out on by putting modeler / researcher higher on the status hierarchy than data platform engineer.
My perspective after many years of experience managing machine learning teams (both platform/infra and research/modeling) is that data platforming is just a worse job. It’s unpleasant and stressful, and business stakeholders who are removed from backend engineering complexity and just want the report or the model couldn’t care less about organizational structures and workflows that support healthier lives for intermediate data platform teams. Because of this, the pay and bonuses for data platform roles should be much higher, but politically speaking it’s impossible to advocate for that. So it becomes a turnover mill where everyone burns out to keep the existing shitty system running, with comparatively low pay and low autonomy, and nobody ends up wanting to join that team or do that work.
There are so many issues besides just saving for a downpayment.
How will you keep a job that supports the mortgage? Companies are exceedingly disloyal and look to cut labor costs whenever possible, whole industries go through huge labor contractions, and that's before you factor in the accumulating effects of automation.
Even if I saved a downpayment, I would not feel comfortable entering into a mortgage like this at all. It just adds stress to your life and you get none of the independence or peace of mind you expect to get from private property ownership.
Your comment is exceedingly political. It’s unfortunate that you mistakenly feel you are politically neutral and that you represent “actual” free speech, whereas companies that choose not to render services to insurrection-plotting fascists and racists apparently only practice “convenient” free speech.
I hope you can step back and evaluate the mistake in your perspective. Number one, you are demonstrating that you care more about politics than about free speech (never mind that your viewpoint literally leads to people being harmed in preventable ways, where the prevention methods don’t threaten free speech). Number two, you are trying to frame the conversation as if you are politically neutral and merely upholding an ideal principle, but that’s very clearly false to a huge degree, and you’re not admitting the political bias and blinders you yourself bring to it.
I think your perspective is actually way off the mark here. AI excels at long-tail problems where the cost of failure is high, precisely because human failure is such an expensive problem in those cases and the nature of long-tail problems makes it impossible to apply QA to every use case. In other words, you know you are forced to get things wrong a lot and pay the high failure cost, so using a system capable of optimizing that trade-off explicitly is often much better than pretending that a human in the loop is somehow sparing you the failure costs when it isn’t (in fact humans are simply less efficient than algorithmic solutions here).
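To make that trade-off concrete, here is a toy back-of-the-envelope sketch (every number below is invented): a model that is worse than a human on any single case can still come out ahead once you account for the long tail of cases that human review never reaches.

```python
# Toy expected-cost comparison (all numbers invented for illustration).
cases_per_day = 10_000
cost_per_failure = 500.0          # hypothetical dollar cost of one failure

human_error_rate = 0.02           # on the cases humans actually review
human_coverage = 0.10             # QA capacity only reaches 10% of the tail
unreviewed_failure_rate = 0.08    # failure rate when nobody reviews the case

model_error_rate = 0.04           # worse than a human per case...
                                  # ...but the model reviews every case

human_cost = cases_per_day * cost_per_failure * (
    human_coverage * human_error_rate
    + (1 - human_coverage) * unreviewed_failure_rate
)
model_cost = cases_per_day * cost_per_failure * model_error_rate

print(f"expected daily failure cost, human-only triage: ${human_cost:,.0f}")
print(f"expected daily failure cost, automated triage:  ${model_cost:,.0f}")
```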
What constitutes a useful sequence of facts in root cause analysis is not some Platonic given. It’s a complex problem involving mind-melting log sleuthing, correlating all kinds of disparate metrics, comparing against the timestamps of merges, and eventually synthesizing the results.
Even seasoned veterans who know systems inside and out struggle with the sheer volume of logs, metrics and facts to compile. And most of the time their approach is based purely on inductive experience with similar incidents combined with heuristics.
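For a sense of what that correlation step looks like when done by hand, here is a minimal sketch (the deploy names, timestamps, and 30-minute window are all invented): line up error-log timestamps against merge/deploy timestamps and flag the deploy with the biggest spike right after it shipped. Doing this across thousands of services and metrics is exactly the part that doesn't scale for humans.

```python
from datetime import datetime, timedelta

# Hypothetical deploy times (e.g. taken from merge timestamps).
deploys = {
    "feat/new-cache": datetime(2022, 3, 1, 14, 2),
    "fix/retry-loop": datetime(2022, 3, 1, 16, 45),
}

# Hypothetical timestamps of error-level log lines.
error_timestamps = [
    datetime(2022, 3, 1, 14, 30),
    datetime(2022, 3, 1, 16, 50),
    datetime(2022, 3, 1, 16, 52),
    datetime(2022, 3, 1, 17, 5),
]

WINDOW = timedelta(minutes=30)

def errors_after(deploy_time):
    """Count error lines that landed within WINDOW after a deploy."""
    return sum(deploy_time <= t <= deploy_time + WINDOW for t in error_timestamps)

suspect = max(deploys, key=lambda name: errors_after(deploys[name]))
print(f"most suspicious deploy: {suspect} "
      f"({errors_after(deploys[suspect])} errors within {WINDOW})")
```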
This is precisely the kind of problem that ML solutions excel at. It has many hallmarks of a good fit and almost none of the hallmarks of “solution in search of a problem” ML over-engineering.
First, you are conflating the underlying log relevance scoring ML system with the GPT-3 summarizing system. ML is a good fit for identifying relevant logs for the reasons you describe, although characterizing this software as root cause identification is not very accurate in my opinion, based on the examples you can find on their website. But the value of summarizing a log line into natural language is low, while the cost of misleadingly characterizing that log line is high. Whoever needs to debug this system and find the real root cause (e.g. why did the system go OOM?) probably needs certainty more than convenience, and in all likelihood they are more likely to correctly summarize what the log line says than GPT-3 is (obviously we don't know, since there is no evidence, but I don't work with any engineers whose ability to summarize the contents of a log line would be described as "mostly not misleading").
Secondly, I can't agree with this sentence:
> AI excels at long-tail problems where the cost of failure is high, precisely because human failure is such an expensive problem in those cases
Maybe it depends on domain and tech, but in my experience humans don't fail on out-of-sample data nearly as often as AI does. When they do fail, it is often more predictable to other humans and humans inherently have the ability to assign confidence levels to their conclusion which you don't see in many AI models such as GPT-3. Humans are also more effective at applying rules (e.g. common sense) to improve predictions on out-of-sample inputs. I think of "AI is worse than humans at generalizing to out-of-sample" as being a widely held, well-evidenced belief, but I would be interested if you disagree.
For me, the quintessential example is something like traffic light identification, where models generally struggle to identify unseen variants correctly while humans rarely struggle. What examples are you thinking of where AI excels at long-tail problems?
Can it be both? AI excels at doing repetitive things better than humans, like maybe driving a car, until it encounters a situation it hasn’t seen before that cannot accurately be described with bits and pieces of what it knows. Then its result isn’t a little off the mark, it’s a lot off. Think of a disaster like what happened with the Ever Given or at Chernobyl. How many different ways could AI have made those situations worse, given that there is no good definition of an optimal solution?
“Root cause” is relative. To the CEO the root cause is “some engineering thing broke.” To the data engineer the root cause may be that a human config error led to a rogue process that caused a VM disk to fill up. To a quantum super-intelligence the root cause may be that in Everett Branch 2765425 atom 67896533 collided with atom 78532578.
(Just kidding. The space of Everett branches is not countable.)
I worked previously for a very high traffic ecommerce company (Alexa top 300 site).
As part of the search team, I worked on a project where we deliberately rewrote the whole product search engine in Python and Cython, including our own algorithms for marking documents for deletion, low-latency reindexing after edits, and more.
We did this because Solr was too slow: the process of defining custom sort orders (for example, new sort orders produced by machine learning ranking algorithms, which needed to be A/B tested regularly) was awful, and performance was poor.
It was a really fun project. One of the slogans for our group at the time was “rewriting search in Python for speed.”
The system we ultimately deployed was insanely fast. I doubt you could have made it faster even by writing the entire thing directly in C or C++. It became a reference project in the company, used to push back on various flavors of arrogant dismissal of Python as a backend service language for performance-critical systems.
Defining custom sort orders in Solr is as simple as uploading a text file with the values you intend to use for ranking.
This is a great feature that is in fact missing from Elasticsearch and saves you so much reindexing time.
There certainly are use cases where Lucene-based solutions aren't the best fit. But I think the claim that you couldn't make something faster by moving away from Python is outlandish.
> There certainly are use cases where Lucene-based solutions aren't the best fit. But I think the claim that you couldn't make something faster by moving away from Python is outlandish.
I read that as a statement that they implemented a proper, bespoke algorithm, not that Python is faster than C. I am surprised that you read it that way. Who in their right mind would claim that Python is faster than C?
Yes. If the implementation language isn't the determining factor for speed, then what is? There is a branch of Computer Science called Algorithmics[0], in which one common way of expressing cost is Big-O notation[1].
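To put a number on that point in plain Python (a toy example with invented sizes): changing the data structure changes the asymptotics, which swamps any constant-factor argument about implementation languages.

```python
import timeit

n = 100_000
haystack_list = list(range(n))      # membership test is O(n) per lookup
haystack_set = set(haystack_list)   # membership test is O(1) on average

needles = list(range(n - 1000, n))

list_time = timeit.timeit(lambda: [x in haystack_list for x in needles], number=10)
set_time = timeit.timeit(lambda: [x in haystack_set for x in needles], number=10)
print(f"list lookups: {list_time:.3f}s, set lookups: {set_time:.4f}s")
```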
You seem fundamentally confused, for example about tools like Cython.
Many extension module implementations in Python are literally as fast as pure C (not just nearly as fast with minor extra CPython overhead, but literally as fast as pure C, because they deliberately bypass the CPython VM loop and data model).
Because you're writing regular Python for a production service, though, and not artificially optimized examples, you will occasionally have to pay extra costs.
Are you perhaps a bit too invested in your own narrative?
You are incorrect. This is only true if you can precompute all sort orders, and many types of sort orders cannot be precomputed because they depend on additional context data only available at query time, especially for personalization or trending solutions. Additionally, with Solr you are precommitted to very poor sharding properties. With our Python approach, we could easily hold all results in memory (billions of content items, with trimmed-down data structures representing only the thinnest container needed for each sort order), and dynamically re-sort on the fly, or double-sort by multiple sort orders that each required contextual data only available at query time.
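To illustrate the distinction, here is a hypothetical sketch (not the actual production code, and all names are made up): the precomputed part of each candidate is just a thin record, and the final ordering mixes in signals that only exist once the query arrives, such as the user's personalization profile or the weight assigned by the live A/B test arm.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    doc_id: int
    base_score: float        # precomputed offline (e.g. text relevance)
    category: str
    trending_score: float    # refreshed frequently, still precomputed

def rank(candidates, user_category_affinity, trending_weight):
    """Re-sort on the fly using context available only at query time."""
    def score(c):
        personalization = user_category_affinity.get(c.category, 0.0)
        return c.base_score + personalization + trending_weight * c.trending_score
    return sorted(candidates, key=score, reverse=True)

results = rank(
    [Candidate(1, 2.0, "shoes", 0.1), Candidate(2, 1.5, "hats", 0.9)],
    user_category_affinity={"hats": 1.0},  # comes from the request context
    trending_weight=0.5,                   # e.g. set by the live A/B test arm
)
print([c.doc_id for c in results])         # -> [2, 1]
```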
I find the AWS docs too verbose; however, it’s quite possible that doing something in AWS is just more complicated than doing it in GCP. In that case the issue is not with the docs but with the complexity of AWS versus GCP. Either way, GCP has an advantage here.
The resource limits are so severely restricted on Cloud Run that I don’t think it’s fair to compare it to Fargate. The space of use cases solved well with Fargate is far, far larger.
Maximum memory per instance (8GB) is an extreme limit. Disk and CPU limits per container instance are also quite bad.
And, laughably, for any workload just a bit out of reach for Cloud Run, GCP docs immediately recommend switching to GKE (and even Anthos).
Imagine having a high-RAM workload that is just a simple RPC service. Many (probably most) machine learning services fit this model. Plenty of routine ML models require more than 8GB of RAM just to load the model, but it’s a good use case for serverless non-Lambda infra because it runs out of a custom Docker image and does nothing but serve stateless model predictions.
Needing to bring in all the machinery of GKE, or pay through the nose for Anthos, just because you need exactly the same operational model as Cloud Run but with more RAM or CPU is a really poor customer experience, one that feels deliberately set up to push you towards more expensive Kubernetes products.
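For a sense of how little is actually being asked for here, a minimal sketch of such a stateless prediction service (Flask and all names are arbitrary illustration choices; in a real service the model object is what pushes the memory footprint past the limit):

```python
from flask import Flask, jsonify, request

class PlaceholderModel:
    """Stand-in for a real model; loading a real one might need >8GB of RAM."""
    def predict(self, features):
        return sum(features)

app = Flask(__name__)
model = PlaceholderModel()   # loaded once per container instance, not per request

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]
    return jsonify({"prediction": model.predict(features)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

Nothing in that pattern needs persistent disk, sidecars, or cluster-level orchestration; the only thing that rules out Cloud Run is the memory footprint.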