When I joined, I thought they were using state-of-the-art tools to monitor millions of sources.
Turns out they just had a team of thousands in India who manually visited certain websites to check for press releases, transcripts, earnings reports, etc., and pushed them to a database if they found them.
There's plenty of other blogs / articles out there now that people are trying to "Moneyball" soccer a bit more. Like anything that has a tinge of glamour and a lot of money around it, you have to take poor earnings to "pay your dues" in the trenches. Same as other aspects of pro sports, entertainment, etc. Hopefully something motivates you other than money.
One cannot even imagine how many people in the world have the boring job of looking at a Word document and copy-pasting some of its data into an Excel spreadsheet. The other half have the job of looking at some document and effectively making yes/no decisions all day.
Disclaimer: I am the CTO.
It's not so obvious when you're looking at the breaking releases for a few stocks or companies, but historical records have at least 1 error per stock per year.
1. Data matching expectations (you do have a definition of correct, right?)
2. Log for manual review -> manual inserts or correction and placed into queue for (1)
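A minimal sketch of that two-tier flow in Python — the record shape, the validity check, and the queue are all hypothetical stand-ins for whatever "correct" means in your pipeline:

```python
from queue import Queue

review_queue: Queue = Queue()  # records awaiting manual correction


def is_valid(record: dict) -> bool:
    """Hypothetical definition of 'correct' for an earnings record."""
    return bool(record.get("ticker")) and record.get("eps") is not None


def ingest(record: dict) -> bool:
    """Insert the record if it matches expectations; otherwise queue it
    for manual review, after which it re-enters this same check (step 1)."""
    if is_valid(record):
        # insert_into_database(record)  # placeholder for the real insert
        return True
    review_queue.put(record)
    return False
```

The point of the design is exactly the augmentation described above: the machine handles the clean majority, and humans only ever see the exceptions.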
The idea was rejected; they wanted either a perfect solution or nothing. I don't know why, but for some reason the idea of computers removing humans is acceptable to management, while computers augmenting humans wasn't.
Wikipedia is some MySQL databases with read-only slaves, front-ended by Nginx caches and load balancers. That seems to get the job done. Wikipedia is the fifth-busiest web site in the world.
Netflix's web site (not the playout system) was originally a bunch of Python programs.
The article mentions a Postgres query that required a full table scan. If you're doing many queries that require full table scans, you're doing something wrong. That's what indices are for.
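To illustrate the difference (using SQLite as a stand-in for Postgres; the table and column names are invented), the same query flips from a full scan to an index search once an index exists:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE releases (ticker TEXT, published_at TEXT)")


def plan(sql: str) -> str:
    # EXPLAIN QUERY PLAN reports whether SQLite will scan or use an index
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " ".join(row[-1] for row in rows)


query = "SELECT * FROM releases WHERE ticker = 'ACME'"
before = plan(query)  # reports a full table scan
conn.execute("CREATE INDEX idx_ticker ON releases (ticker)")
after = plan(query)   # reports a search using idx_ticker
```

Postgres gives you the same visibility with `EXPLAIN`; the habit of checking the plan before shipping the query is the transferable part.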
Maybe startups don't want the absolute best of the best but rather the best of the gullible?
Edit: and/or poor
If I'm interviewing a candidate, I'm not interested in which tools they know how to use. I'm interested in whether they understand the problem those tools solve and when to apply them. Over-engineering is problem #1 with the community at large in front-end dev at the moment and you'll go a long way as an engineering team if you make it your priority to defeat that mentality.
I'd want grumpy, paranoid sysadmins who really don't want to be woken up late at night because something has gone wrong.
Starting with this nugget:
> Our gateway is on the frontline of our infrastructure. It receives thousands of request per day.
LOL! Wat???!!! Thousands of requests per day. Thousands. Not per hour or per second. Per day.
Call out the big guns, people! We need cutting edge enterprise-scale architecture! A boring rails app or a flask app will never be able to handle this kind of scale!!
I don't even need to go farther down the checklist than that. You're putting all of that complexity and infrastructure and problems-waiting-to-happen behind an API that serves thousands of requests per day.
Yeah, I'd rather be homeless than work on that team. Jesus.
Next you are going to be criticizing 3GB big data databases with dedicated sys admins.
Sometimes I wonder how much of this is some type of "jobs program". A transfer of money from the financial sector back to the grunt sector via VC.
And in your "expert" opinion, where did they go wrong here? What is the correct number of articles they should post on any particular subject in order to get the HN tick of approval?
How can it be a bad thing for developers to write about what they are working on? Not all developers are genius experts who suddenly woke up one day and instantly knew how to perfectly architect a system. People learn, people grow, people make mistakes and, most importantly, people need feedback.
Blogs like this are fantastic (a) for learning what other people are doing and (b) for giving developers an opportunity to show off and be proud of what they are working on. Especially since the rest of the company, in most cases, couldn't care less.
Our gateway is on the frontline of our infrastructure. It receives thousands
of request per day, and for that reason we chose Go when building it,
because of its performance, simplicity, and elegant solution to concurrency.
Then again, if they didn't have that beefy infrastructure, they could get DDOS'd by up to multiple requests per minute! I mean you can't be too careful when you are in that league!!1!
Also interesting (in the same way as rubbernecking a car crash) is going to their homepage and viewing source.
Probably was someone's full time job for a couple months.
When was this written? Why is there no year in the date?
More seriously, you can set this up with KPIs in a day with Zabbix. A week, if you've never dealt with Zabbix, which is admittedly somewhat complex compared to some of the others.
But setting up a data warehouse that builds out the KPIs you're looking for along with a dashboard to display them (the easy part, for sure) can certainly take over a month.
Something about that appeals to me in a huge way. Showing people my "server room", etc etc.
You're not wrong, but they are slightly more complicated than that:
They have Kafka, ElasticSearch, some workflow engine thing, and they manage a distributed object store (Swift) deployment.
Apparently in Wikimedia's case they also have a stated company policy / philosophy around self-managed infrastructure that prevents them from using public cloud for a bunch of stuff (I've heard this from a former employee but can't find any further documentation about it).
Seems that the bunch has grown slightly since then:
(from https://medium.com/netflix-techblog/announcing-ribbon-tying-... )
Or this view https://image.slidesharecdn.com/devopsisraelpdf-141030000609...
(from https://blogs.mulesoft.com/dev/microservices-dev/api-approac... )
It's simply, "think for yourself." Do you ever talk to someone about a topic (in technology or politics) and feel like you're talking to a list of commonly accepted, prepackaged talking points?
That is fascinating; I would love to read articles about this.
Working on a migration from a site that modified the WooCommerce core to a new ecommerce platform. Couldn't believe WooCommerce ever did that when I started.
Congrats on such a widely used platform though.
Which is, you know, 1 maybe 2 machines with nginx plus 1 for redundancy. Our internal services are slow because people cohost them with batch jobs, and no other reason.
1) What metrics reflect your customer experience? Monitor them, alarm on them. Target them as a priority for improvement. Check in on them weekly at a minimum.
2) What metrics make up the metrics that reflect the customer experience? Monitor them as well, and consider whether alarming on them is the right thing to do. Use these to direct what you target to solve 1.
3) What do you need for SPOF resilience? If you've got something critical, you need a minimum of two servers running it, and no more than, say, 40% total CPU usage so that you've got sufficient overhead to cope with a single host failure and any unexpected work increase.
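The 40% figure in point 3 falls out of simple arithmetic: with N servers, losing one pushes each survivor's load up by a factor of N/(N-1). A quick sketch of that calculation — the 80% post-failure ceiling is my own assumption, standing in for "sufficient overhead for unexpected work":

```python
def max_safe_utilization(n_servers: int, post_failure_ceiling: float = 0.8) -> float:
    """Highest steady-state CPU fraction per host such that losing one
    host still leaves every survivor below `post_failure_ceiling`."""
    if n_servers < 2:
        raise ValueError("SPOF resilience needs at least two servers")
    return post_failure_ceiling * (n_servers - 1) / n_servers
```

With two servers and an 80% post-failure ceiling this gives 40%, matching the rule of thumb above; with three servers you can safely run each host at roughly 53%.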
No idea what company that was, but I'd love to work in a place where that was the most pressing concern on my plate.
Honest question. I have only seen the "it's written in X, so therefore it must be re-written in something nicer even though it is working" thinking from incompetent people who were just trying to take ownership of something they didn't quite fully understand. I have never seen it in an appropriately functioning commercial setting; if management is competent, they'll immediately recognize the high costs with no concrete benefit and say no.
It's one thing to say "we have to re-write this because it uses Java applets, and Java applets are problematic because Oracle is dropping support for them, so our customers are going to be screwed soon if we don't do something." It's another thing to say "we have to re-write this because it's in Perl, and Perl is something I don't like."
I think the tendency to over-engineer and over-polish comes mostly from getting too invested in one particular project or task. The developers have "professional pride" - they want to deliver software that has good architecture, high test coverage, and easy-to-understand, maintainable code, and that is reliable, scalable, etc.
This means competent developers are very tempted to continue working on a project as long as there are possible improvements to it, even if these improvements do not make business sense. Nobody wants to admit that "cron job that fails once per month" is a sufficient solution when they can see a better solution, and go work on the next hacky cron job instead.
1. This is why you ask in the interview about it.
2. I don't know about you, but if I see certain buzzwords, I assume that the candidate merely chases the latest fads and subtract points accordingly until counter-evidence is produced.
U.N.P.H.A.T. does bring a smile to my face for trying, but if the author is reading: I'd suggest you change it to a prescriptive paragraph where the first word in each sentence becomes a letter in the acronym (e.g. B.A.M.C.I.S).
Here's my best try:
Understand the problem.
Nominate multiple solutions.
Prepare by reading relevant research papers.
Heed the historical context.
Appraise advantages versus disadvantages.
GHETTO: Getting Higher Education To Teach Others
For example, I'd be more than happy to use the same old, tried and true, boring tools to just get the job done, if it meant that I could then go play golf or otherwise not be in the office.
But if you insist that I be in my seat 8 hours a day regardless of workload, then goddamn let's take this shiny new tool for a spin!
Do I want my resume to show that I used the same tool for every job for the last ten years? Or do I want it to show some new hotness?
The industry and employers are as much to blame for this as the engineers, if not more. When you use middleman firms to find your employees, and all they understand is buzzwords, well then guess what game the devs are gonna play?
Fact is, I know Django inside out, and plenty of Python libraries. It's very rare that I'll find something that requires me to learn a new language or tech (and I will likely get things done a fair bit faster using the tech I do know). Anything else feels like resume-driven development.
They replaced it with Spark and Scala. Rewrote the whole thing and bumped into the same previously fixed problems for 8 months. It was not faster and it was a pain to extract the result data for the online website. Oh, and they had only 1 server running Spark. On top of the existing one. They also dumped IRC with very useful plugins for proprietary Slack channels.
Yeah, their resumes are very competitive now with the Big Data hype. Worst part is the higher ups liked it because they had more keywords in the bag for investors.
We are living in very strange times.
Convincing others to not chase the shiny new toy is harder than picking the right technology. The other posts under this parent are right: we are a product of the job market.
My first programming job was a 3 month contract at the maintenance department of an international airport. They had a bunch of information in large, unwieldy ERP system and wanted to automatically generate job sheets for the different maintenance crews. So I did the simplest thing possible - I generated an excel file from the ERP system, then using that file as input, I outputted different excel worksheets for the different crews.
It was a very plain GUI app that had one or two buttons. I remember being a bit worried that it wasn't nearly fancy enough for 3 months' work, but everyone seemed pretty happy with it.
Later on I found out that - before me - they had hired an experienced software developer who had worked on the same problem for 6 months, and at the end of that 6 months had apparently not produced a solution. I had done the dumbest, simplest thing - not because I had any insight or wisdom, but because it was really the only thing I had the skills to do. But I delivered.
It was a brilliant, accidental first lesson in not over-engineering.
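The core of that job is really just a group-and-split. A toy version in Python — using CSV as a stand-in for the Excel files, with an invented "crew" column:

```python
import csv
import io
from collections import defaultdict


def split_by_crew(export_text: str) -> dict:
    """Bucket the exported job rows by crew; each bucket then becomes
    that crew's own job sheet."""
    sheets = defaultdict(list)
    for row in csv.DictReader(io.StringIO(export_text)):
        sheets[row["crew"]].append(row)
    return dict(sheets)


# A tiny fake ERP export:
export = (
    "crew,task\n"
    "electrical,check runway lighting\n"
    "hvac,service terminal air handler\n"
    "electrical,inspect gate power\n"
)
sheets = split_by_crew(export)
```

Each bucket then gets written out as its own worksheet or file - the "dumbest, simplest thing" really is about this small.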
As a complete aside, you might be surprised how far you can go with Excel these days. Do you know it has a built-in in-memory columnar database now? You can have millions and millions of rows of data in there that you can use in tables and charts completely independently of the size of the grid. Pull back a huge chunk of data from the DB and slice and dice it to your heart's content locally.
I look at people buying expensive "business intelligence solutions" and I think, it's right there on your PC all along and you don't even know it...
More recently I had to solve time drift on 1000s of devices. The problem was that someone had installed Puppet to manage those devices, and Puppet uses NTP. The devices are behind firewalls, so if a customer blocks the puppet master or messes with SSL, Puppet doesn't even phone home. Or worse, it gets incorrect time from NTP peers on the network.

The solution was to throw out the shiny tool (Puppet) and just call `date`. Puppet and NTP are great in theory for getting time down to the millisecond, but they totally backfired when some devices were off by over 24 hours. For our purposes, as long as all devices were within 5 minutes, we were good.

The irony was that after disabling NTP, Puppet just started it again. And we couldn't use Puppet to fix that, since 50% of our users had it blocked. There was no choice but to throw out Puppet and start over from scratch. The guy who spent months setting up Puppet was not happy.
Everyone involved sounds like they need a lot more experience.
Our sales pitch is "these devices use plain HTTP and will work behind your corporate firewall". The blockage wasn't an issue that could be solved; it was our whole business model to work around the blocks by using simple HTTP instead of HTTPS, proxying everything through our IP, and things like that.
Even the Puppet documentation says not to run a puppet master when you have devices behind firewalls or on limited networks. The guy who added Puppet apparently didn't read that.
I wasn't the one who decided the business model, just the guy who fixed it to work as advertised while dealing with the pressure of everything crashing and burning. You're right that no one had experience, but that's not the point.
My point was that the fancier tools sometimes just add new issues without solving your real one. Despite my lack of experience, I solved the time drift using the Linux built-in `date` command to set the time. It doesn't account for network lag like NTP does, and an NTP developer would probably laugh at my solution, but now all devices are accurate to within a few minutes, and that particular problem was solved. So don't always go for the most complex tool, is all I'm saying.
For what it's worth, I do plan to bring back Puppet, but run it in "puppet agent" (offline) mode. We'll use custom scripts to copy in new Puppet configs so Puppet does not need to phone home.
1. Client records "origin" timestamp T1 when it sends a query to the server.
2. Server records "receive" timestamp T2 when it receives the query from the client.
3. Server records "transmit" timestamp T3 when it sends a response back to the client (the response includes T2 and T3).
4. Client records "destination" timestamp T4 when it receives the response from the server.
Round-trip delay = (T4 - T1) - (T3 - T2)
And that's it. You could do this with "date" too!
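The arithmetic in those four steps is tiny. A sketch in Python (the offset formula is the standard NTP companion to the delay formula above, included for completeness):

```python
def ntp_stats(t1: float, t2: float, t3: float, t4: float) -> tuple:
    """Round-trip delay and estimated clock offset from the four
    timestamps: origin (t1), receive (t2), transmit (t3), destination (t4)."""
    delay = (t4 - t1) - (t3 - t2)         # time actually spent on the wire
    offset = ((t2 - t1) + (t3 - t4)) / 2  # how far the client clock trails the server
    return delay, offset


# An 80 ms round trip with the client clock 50 ms behind the server:
delay, offset = ntp_stats(0.00, 0.09, 0.10, 0.09)  # delay ≈ 0.08 s, offset ≈ +0.05 s
```

Subtracting (T3 - T2) removes the server's processing time, so `delay` is only the network portion of the round trip; halving the remainder assumes the path is symmetric, which is NTP's core approximation.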
Distributing hardware outside of your network and using Puppet in master/client mode is obviously a bad idea; any dependency (sometimes even NTP) is difficult to manage in that situation.
However, clocks will drift. Consider ntpdate in a cron job, or an easier-to-manage SNTP client, versus ntpd, which is a little nutty.
So the point is that a tool like Puppet, if properly configured, is probably a great asset for your use case of distributing hardware, as it can help keep things working as expected.
But then it created a whole new world of problems since it violated our business model to have it phone home. Thanks for the suggestions on NTP. We'll likely add features that do require more accurate time in the future & your suggestions will probably come in handy!
I'm not saying your option wasn't correct (if NTP didn't work, you can't use it), but I don't think it's a good example of the errors described in the article.
When the devices phone home to our IP (which is whitelisted in all their firewalls), I compare to the server time and set the device to the server's time if it is off by more than 5 minutes. Our server in turn uses NTP (which is not whitelisted on the devices' networks). The devices will still be off by 30 seconds or so if there is a network delay. On one hand we could have just told them to whitelist NTP; on the other hand, we tried that, and you get one department blaming another, or worse, they just don't return our calls. Plus we originally told them they only needed to whitelist one IP.
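That check is deliberately trivial. Something like this, with the five-minute tolerance described above (the function name is invented):

```python
MAX_DRIFT = 5 * 60  # seconds; the five-minute tolerance described above


def needs_resync(device_time: float, server_time: float) -> bool:
    """True if the device clock is more than five minutes off the
    (NTP-synced) server clock when the device phones home."""
    return abs(device_time - server_time) > MAX_DRIFT
```

When it returns true, the device's clock is simply set to the server's time (e.g. via `date -s`); network delay means the result is only good to tens of seconds, which is fine for this use case.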
> you essentially had to re-implement NTP
No, I called `date -s`. I didn't re-implement NTP; my solution doesn't handle many of the things NTP does, like compensating for network delay. If I had originally written this, I would have just used NTP but proxied it through our domain. Instead I was called in when things were "on fire" and had to come up with a quick fix.
> Your solution was the more engineered one
That's your opinion. My solution took 15 minutes, while he spent months setting up Puppet. His solution resulted in devices being off by multiple days, whereas mine has proven to keep devices accurate to within +/- 5 minutes.
> he used the standard tool for syncing dates.
What "standard" says I need to use NTP? Surely the "date" command can be considered standard as well.
> don't think it's a good example of the errors described in the article.
It's exactly what's described in the article. Our problem was to run a job at a certain time. No one cared if it happened at 4:21 instead of 4:20. Big companies like Google need more accurate time and control their networks, so ntpd is a good fit for them. I need less accurate time and don't control my network, so ntpd was not a good fit. Just because Google does it doesn't mean you or I should, because sometimes we're solving a different problem than Google had to solve.
I'd guess that this guideline alone, if followed, would stop 2/3 of JS SPA framework adoptions (and 4/5 of Angular adoptions!).
People wilfully wrestle with thousands of functions worth of APIs every day and don't even notice the immense slowdown it's causing them. It's especially bad in React land, which is ironic seeing as Sebastian Markbage at Facebook has an excellent talk about reducing API surface area.
If someone would care to suggest a good title—i.e. accurate, neutral, and preferably drawn from the language of the article itself—we can change it again.
EDIT: how about combine the two?
You Are Not Google: Another "Don't Cargo Cult" Article
What's predictable is intrinsically uninteresting. The OP partly redeems itself by offering concrete ideas, though, which gives the article a bit more substance than usual.
Information theory aside, I don't agree. Observing repeated patterns of failure and error is very instructive, especially since restatements often help me see them in a better light.
If anything, HN trends heavily towards repetitions of survivor bias. Much more fun to read cool papers and imagine yourself implementing them. But the fellow on the chariot with the red-painted face still needs an attendant whispering "remember, thou art mortal" over and over.
Finally, do these forced acronyms ever help anybody else out there? I mean seriously, the "N" standing for "eNumerate?" The "P" standing for "Paper," which barely correlates to the actual meaning "consider a candidate solution."
Seems to me just saying "apply a principle of unfattening your technology decisions" would be a hell of a lot easier to remember.
There are very few technologies that can reasonably be thought of as strict upgrades, and the few that do exist (e.g. MySQL to Postgres) tend to be incremental enough that switching rarely justifies the migration costs. Instead, many solve one or two exceptionally dramatic problems (gargantuan datasets, huge write volumes, partition-tolerant master-master replication, etc.) and are willing to make equally dramatic tradeoffs to achieve that. Saying Technology X is good at Problem Y is only half the story.
In my own practice I've found that forcing myself to stop and explicitly enumerate both the pros and cons of a new technology is usually enough to get my professional intuition to kick in. And 99 times out of 100 it tells me to just use boring old SQL and move on.
or OSEMN data science - Obtain/Scrub/Explore/Model/iNterpret
Trying my hand:
- understand the DOMAIN
- find the OPTIONS
- research a CANDIDATE
- know the HISTORY
- consider the ADVANTAGES
- apply deliberate THOUGHT
1. Understand what your problem is, and the potential solutions to that problem.
2. Use your brain.
3. Make an intelligent decision.
Or really just
1. Use your brain
I mean I just don't see the need here. Could be arrogance, I guess? Is this the very problem the author was trying to solve? Create a defined process for technology choosing?
The point of most processes is to help stupid people (or, just, people without large amounts of analytical talent) make smarter decisions than their own brains would generate. Processes "raise the waterline" of an organization's aggregate behavioral intelligence, by ensuring that the stupid-est decisions being made are no more stupid than the process.
Processes also frequently serve as checklists, to ensure that smart people aren't being temporarily stupid—"did I check that I have all my surgical tools before closing?" and such.
Don't Over Engineer
(which more or less brings us back to KISS principle)
What CAN this (pre-chosen) something (insert here hardware or tool or programming language or library) do?
Let's use ALL (or most) these functionalities! (because we CAN)
Losing sight of the actual question which should be "What is actually needed"?
Can be confused with Do Over Engineer.
YSNOE (You Shalt Not Over Engineer) is more imperative but reads worse.
Maybe NOE (Never Over Engineer) would be acceptable.
And, since you clearly didn't actually read the article: he never says a word about his own technical preferences except "don't pick ones that don't fit your scale and scope".
At some point things change, the new normal changes, etc. The shift we are seeing in some areas also contributes to finally accepting the realities of distributed systems.