Taco Bell Programming (2010) (archive.org)
172 points by Jarred on Jan 3, 2016 | 109 comments

> The Taco Bell answer? xargs and wget. In the rare case that you saturate the network connection, add some split and rsync. A "distributed crawler" is really only like 10 lines of shell script.

As someone who has had to clean up the messes of people who started with this and built many hundred line dense bash scripts... please do not do this.

> I made most of a SOAP server using static files and Apache's mod_rewrite. I could have done the whole thing Taco Bell style if I had only manned up and broken out sed, but I pussied out and wrote some Python.

I feel sad for whoever inherited this person's systems.

"Write code as if whoever inherits it is a psychopath with an axe who knows where you live" is something I heard pretty early on in life and it's been pretty useful.

> "built many hundred line dense bash scripts... please do not do this"

Of course. But that's true of any language once you have many hundreds of lines of dense code.

His point is that if something can be done simply with built-in, proven tools, you should use them until you need something more.
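For reference, here is roughly what the article's "xargs and wget, plus split and rsync" crawler might look like. This is a minimal sketch, not the author's script: the function names `crawl` and `shard`, the one-URL-per-line input file, and the `FETCH` override (handy for a dry run) are all my own assumptions. It relies on `xargs -P` (present in GNU and BSD xargs) and `split -n l/N` (GNU coreutils).

```shell
# Minimal sketch of the "xargs and wget" crawler (hypothetical helper names).
# Assumes one URL per line in the input file.

# crawl FILE [PARALLELISM]: fetch every URL in FILE, up to PARALLELISM at once.
# FETCH (hypothetical) overrides the fetch command, e.g. FETCH=echo for a dry run.
crawl() {
    url_file=$1
    parallelism=${2:-8}
    # NUL-delimit the list so xargs -0 takes each line literally (no quote or
    # backslash processing); -n 1 gives each fetch exactly one URL and -P runs
    # several fetches concurrently. ${FETCH:-...} is deliberately unquoted so
    # it splits into the command and its flags.
    tr '\n' '\0' < "$url_file" |
        xargs -0 -n 1 -P "$parallelism" ${FETCH:-wget --quiet --force-directories --tries=2}
}

# shard FILE N PREFIX: split FILE into N line-aligned chunks (PREFIXaa, PREFIXab, ...)
# that can be copied to other machines (e.g. with rsync) and crawled there.
shard() {
    split -n "l/$2" "$1" "$3"
}
```

For example, `shard urls.txt 4 chunk.` would produce `chunk.aa` through `chunk.ad`; after copying a chunk to another box, `crawl chunk.aa 16` fetches that share. The brevity is the point of the article, and also why the maintenance complaints above are worth taking seriously.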

No, it's not the same.

Most experienced programmers know a little bash and enough UNIX commands to get by. This is enough to write a script that handles the happy path, but not enough to handle all error conditions correctly. There are all sorts of tricks you need to know that are commonly skipped (forgetting to use -print0, for example, and that's an easy one). The resulting script is probably okay if you run it interactively and check the output, but will blow up or silently do the wrong thing for unexpected input in production. To properly review a bash script for errors you need to be an expert.

By contrast, Go programmers with a few months of experience typically know all of Go.

The older tool is not necessarily better if it has lots of obscure sharp edges that most people don't learn.
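To make the `-print0` point concrete, here is a small demonstration, assuming a `find` and `xargs` that support `-print0`/`-0` (GNU and BSD both do). `sh -c 'echo "$#"' sh` is just a stand-in command that reports how many arguments it received.

```shell
# Create one file whose name contains a space.
demo_dir=$(mktemp -d)
touch "$demo_dir/my file.txt"

# Naive: xargs splits its input on blanks, so "my file.txt" arrives as
# two arguments ("$demo_dir/my" and "file.txt").
naive=$(find "$demo_dir" -type f | xargs sh -c 'echo "$#"' sh)

# Safe: -print0 / -0 delimit names with NUL, which cannot occur in a
# filename, so the name arrives as a single argument.
safe=$(find "$demo_dir" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

echo "naive: $naive argument(s), safe: $safe argument(s)"
rm -rf "$demo_dir"
```

The naive pipeline "works" on every filename without a blank in it, which is exactly why it survives interactive testing.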

+1 - If you thought "it works on my machine" was bad with binaries, shell scripts are so much worse.

Just like Excel "programs", shell scripts can be easily mined for requirements for a real program though.

This is a simple UNIX pipeline, not multi-hundred-line spaghetti of Korn, C, bash, or even zsh shell scripts.

No builtins were used in the example, just core utilities deployed the way they were designed.

Reinventing the wheel is completely bogus, doubly so when you ultimately make calls to those utilities anyway, as is common when 'admins-cum-programmers' start getting their hands dirty with Python.

I am generally in favor of the idea of the OP, but the core utilities do differ across environments and this will bite you sooner or later.

I was bitten recently by some pretty boring search-and-replace functionality that differs between sed on OS X and on Debian. Like, I would have had to pass a different argument to sed depending on which version of sed was installed (so I switched to Perl for the task). But this is certainly an insidious category of bug: you don't discover it until you try to run the script in another environment, and then you're potentially stuck debugging the script from the top down.
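A common instance of this (possibly, but not necessarily, the one the commenter hit) is in-place editing: GNU sed accepts a bare `-i`, while BSD sed on OS X requires a backup-suffix argument after `-i` (possibly empty, as in `-i ''`). A sketch of two workarounds follows; `replace_in_place` and `replace_in_place_perl` are hypothetical helper names. Attaching the suffix directly to the flag (`-i.bak`) is accepted by both seds, and `perl -pi -e` is the route the commenter took.

```shell
# replace_in_place EXPR FILE (hypothetical helper): portable in-place sed edit.
replace_in_place() {
    # GNU sed treats an attached suffix as the backup extension; BSD sed
    # requires a suffix. "-i.bak" satisfies both; then drop the backup.
    sed -i.bak "$1" "$2" && rm -f "$2.bak"
}

# The Perl alternative: -p loops over input lines and prints each one,
# -i edits the file in place, -e supplies the expression.
replace_in_place_perl() {
    perl -pi -e "$1" "$2"
}
```

Usage: `replace_in_place 's/world/sed/' notes.txt` behaves the same on both systems (`notes.txt` is a placeholder).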

If you need something that is only used once or for a short time, in one place, script away!

I'm actually strongly in favor of scripts; but the web is eating everything. If it has to scale, put it on the web.

> By contrast, Go programmers with a few months of experience typically know all of Go.

Not really. Just to give some examples of things that new Go programmers don't know: what the limits of JSON serialization are, how introspection works, the functionality and limits of the "virtual inheritance", how the GC handles things like goroutines... There might be a selection bias, though.

Xargs with crawled data sounds like a nightmare. Allow me to link to example.com?$(rm -rf /).

xargs doesn't pass its arguments to the shell, it directly invokes exec:

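    $ echo 'test   $(foo   bar)   test' | xargs echo
    test $(foo bar) test
    $ echo 'test   $(foo   bar)   test' | xargs python -c 'print(len(__import__("sys").argv))'
    5

The $ and ( are nothing special to xargs, nor to echo (or wget): xargs just splits the input on whitespace into four literal arguments, so sys.argv is ['-c', 'test', '$(foo', 'bar)', 'test'].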

Not to say yay xargs is always great, just that this specific counter example doesn't hold up.
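This is easy to check. A sketch, assuming a POSIX shell: `printf` stands in for `wget` so nothing touches the network, and the payload only creates a file named `pwned` in a scratch directory. The second half shows the caveat worth knowing: if you splice the input into an `sh -c` string (a common `xargs -I{}` pattern), a shell does evaluate it.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
payload='http://example.com/?q=$(touch pwned)'

# Safe: xargs execs printf directly; "$(touch pwned)" is just literal text.
printf '%s\n' "$payload" | tr '\n' '\0' | xargs -0 printf 'safe: %s\n'
safe_ran=no
[ -e pwned ] && safe_ran=yes

# Unsafe: -I{} pastes the input into a string that sh -c then parses,
# so the command substitution really runs (here it only creates ./pwned).
printf '%s\n' "$payload" | xargs -I{} sh -c 'printf "unsafe: %s\n" {}'
unsafe_ran=no
[ -e pwned ] && unsafe_ran=yes

echo "safe form ran payload: $safe_ran, unsafe form ran payload: $unsafe_ran"
cd - >/dev/null || exit 1
rm -rf "$workdir"
```

So the danger isn't xargs itself; it's reintroducing a shell between xargs and the command.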

It's also expensive in resources and, by extension, energy. Unless your data centre is powered by renewables, inefficient code can become an ethical issue.

While I agree with some comments here that "Fuck you, I had to MAINTAIN your bullshit Taco Bell system", for any project that needs to be run likely no more than once (prototypes, single-run analysis, etc.) or is never going to be checked into source control, the power of the shell cannot be overstated.

I had an intern a couple years ago. Nice guy, but he didn't listen when we said "Keep this simple". We had all the data from an A/B test he ran, and we needed to do the analysis. He broke out MapReduce on EMR and all sorts of other complexity. It was a few MB of data!

After his analysis went pretty poorly, I wrote up a shell script in a few hours (sed, awk, xargs, woo) and got us the data we needed. I'd never ask someone to maintain that madness, but I was able to break it down into simple functions, piped into each other, in a single file.
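The commenter's script isn't shown, so this is only a sketch of the style described (small functions piped into each other, in one file). The input format, a CSV with header `user_id,variant,converted` where `converted` is 0 or 1, and all function names are hypothetical.

```shell
# Hypothetical input: CSV with header user_id,variant,converted (converted is 0 or 1).

# Drop the header row.
strip_header() { sed 1d; }

# Emit "variant users conversions", one line per variant (in arbitrary order).
tally() {
    awk -F, '{ users[$2]++; conv[$2] += $3 }
             END { for (v in users) print v, users[v], conv[v] }'
}

# Append a conversion-rate column, then sort by variant for stable output.
report() {
    awk '{ printf "%s %d %d %.3f\n", $1, $2, $3, $3 / $2 }' | sort
}

# The whole analysis is just the stages piped together.
analyze() { strip_header | tally | report; }
```

Usage: `analyze < ab_test.csv` (placeholder file name). For a few MB of data this runs in well under a second, with no cluster involved.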

Using MapReduce for a few MB of data? Jeez. That's a whole new can of worms.

I remember being younger, doing things the way I found "interesting" rather than the way I found practical. Eventually for a project or two you end up with basically nothing for all that complexity because you're focusing on the wrong things, and I learned to start putting functionality first.

This article is also on Ted Dziuba's current site, where you don't need to go way back to see it:


Another, similar post:


The point of both of these posts is more that Unix style tools are extremely powerful and expressive, to the point that writing more involved code for most simple tasks (especially one-off tasks) frequently isn't worth the effort.

Complexity doesn't improve things, especially when it doesn't add value.

I found myself nodding along (if not necessarily agreeing 100%) until I got to this:

I could have done the whole thing Taco Bell style if I had only manned up and broken out sed, but I pussied out and wrote some Python.

I know this article was written over 5 years ago, but I still feel the need to say: expressions like manned up and pussied out are a huge turn-off for me. Regardless of the ideas surrounding them or the author's programming skills, the author loses a good chunk of credibility in my eyes simply by using them. Sure, it's a stylistic choice, but it's one that I feel is actively harmful to our industry, especially for those who are new to software development or interested in learning.

So if you're reading this and you're one of those new or interested people: please don't let this turn you off from discovering and learning tools like sed or Python. One is not inferior to the other—they are simply different tools that can be used for a wide range of different things. Don't be afraid of using the "wrong" tool for the job (because even experienced developers do this sometimes), just keep on learning new tools and adding them to your own tool belt. Share your work with others, and then when someone tells you you're using the "wrong" tool, ask them to explain why and to propose alternatives. Lather, rinse, repeat.

>So if you're reading this and you're one of those new or interested people: please don't let this turn you off from discovering and learning tools like sed or Python.

Your comment has exactly the same effect on me that its content warns people about. I'm not interested in an industry where people engage in policing other people's tone and word choices in such a harsh way.

For every imaginary person who, you claim, might be turned off from the industry when reading such terms, there's some other imaginary person who feels more connected and enjoys the atmosphere more, maybe because it seems more honest and less "professional".

"policing other people's tone and word choices in such a harsh way"

Feeling offended, are we?

His words no more harmed you than the original article harmed the person you're replying to, and they certainly don't have the force of police backing them up.

We all have the right to express our opinions, and we all should have the expectation to be called on it when we say stupid shit (like "pussied out" or "manned up", which are sexist in a time where at least some of us would like for our industry to be more diverse and welcoming to new people, especially people who aren't especially common in our industry, like women). The article said some stupid shit, surrounded by lots of good advice. It's OK to call out the stupid shit.

Take your own advice: Don't be so sensitive.

IMHO, stating '"manned up" is stupid shit' is at least as stupid as saying 'manned up,' but probably more so: where the latter is just a bit juvenile, and can be used deliberately for effect, the former is just childish. It shouldn't be taken any more seriously than emanations of 'Timmy is looking at me!' from the back seat.

My metric for why "manned up" and "pussied out" are the kind of language I don't want to use is: "will this make someone, particularly someone in the extreme minority in my community, feel unwelcome or unwanted or unappreciated in my community?"

So, of course, my First Amendment rights mean I can say those kinds of things, without fear of police knocking down my door and dragging me off to a gulag. But, I don't want to say them, because it makes some people feel unwelcome in the conversation. Who do you believe I've made to feel unwelcome in saying that those phrases are stupid? (If the answer is "People who enjoy saying shit that contributes to a culture of misogyny and sexism, with no one ever calling them on it," then I'm OK with it. If there's some other answer, I'd sincerely like to hear it.)

The thing is, I don't think that 'manned up' is something which 'contributes to a culture of misogyny and sexism'; on the contrary, I believe that it's a valuable phrase. It is objectively better to be a man than a child.

Like you, I don't mind making people feel unwelcome who really are unwelcome; unlike you, I consider people who derail technical discussions ostensibly to complain about perceived bias (but really to attract attention to themselves) to be unwelcome.


In your reference [1], one of the answers is http://english.stackexchange.com/a/100992:

> OED gives at least two different senses for this word. That of female genitalia is attested as early as 1699, but it's not considered for the sense meaning "coward". The other one, which I copy below, is the purported source of pussy meaning "coward". Basically, using a pet name usually given to women, like sweetheart, princess, etc. to refer to a man mockingly.

Making fun of a person by calling them names used for women is what’s not okay, regardless of whose genitalia are or aren’t involved.

The only outrage I'm seeing here is yours.

Regardless of the ancient linguistic evidence you want to trot out, "pussy" is a gendered term in this day and age. It just is. And using gendered terms as insults is going to make people uncomfortable, "social justice warriors" notwithstanding.


Those same people also understand that the word "pussy" means "vagina". I refuse to believe that everyone is truly under the impression that these are two completely independent words with no relation whatsoever. This is the same argument that leads people to believe that it's okay to call someone a faggot if you're not literally talking about gay people and I don't buy it.

Moreover I don't see what "hoops" you have to jump through to see why using the word in that way could upset some people. We should use language which isn't divisive.

Finally, the "pussy" thing is kind of secondary to me to "man up", which is also far from ideal to me in terms of phrasing.

It's always hilarious when people get offended on behalf of someone else. Particularly when the someone else is theoretical.

I am offended by language such as the one in question.

There, it’s no longer theoretical.

From the perspective of the OP, it was theoretical when the post was written. You being offended now does not change the past.

[Too late to edit] Please explain the downvotes. Logically, I conclude that I am correct.

How nice for you.

Ignoring useful information because it's conveyed using crass language is incredibly foolish. Do so at your own peril.

Is the mindset of "cats are easily scared" considered socially unacceptable and offensive nowadays? Or is the word pusillanimous a derogatory term? I'm extremely confused by your disdain for the phrase "pussied out." "Manned up" is understandable, though.

> Or is the word pusillanimous a derogatory term? I'm extremely confused by your disdain for the phrase "pussied out."

I think they meant "manning up" signifies acting like a man, while "pussied out" is used in a derogatory fashion to signify acting like a woman (especially since the term "pussy" is often used to describe a vagina). I could be wrong, though.

However, even though I am a proponent of equality (and know quite a few developers who are women), I don't find those expressions offensive (neither do they). Also, I don't even find LGBT (I'm gay) derogatory terms offensive. Being offended is, in my opinion, always a choice.

With all this said, I do believe that such language doesn't belong here, because this type of behaviour likely has deterred some people from being more active in this industry (it's irrelevant if they're "overly sensitive" or if the author lacks decency/respect/politeness/tact/diplomacy/whatever).

It doesn't refer to cats. It's a rude way to refer to women.

Do you have a reference on that? I've always interpreted as in 'scaredy cat,' since female genitalia aren't traditionally considered afraid of anything, and cats are.

I don't really want to search for a reference, but one other person posted with the same interpretation and this one got upvoted, which suggests at least some other people understood it the same way.

I'm not claiming it's the correct interpretation, and honestly my googling is coming up short because of the vast usages of that phrase on urban dictionary, but honest to god I had never heard a misogynistic interpretation of it before this thread. I never use the phrase in real life because it sounds vulgar, but it's surprising.

If my interpretation was correct, I wonder if it's sort of a 'niggardly' situation, where people stop using a word because it sounds similar to something vulgar.

Yeah, that's why I didn't google it.

Like all words its meaning depends on what most people think it means. I'm not all that surprised that you never heard of it. Maybe I just had rude friends when I grew up.

skybrian, not everyone thinks about pussy the way you or some others do. Pussy = scared cat or uncertain cat. Insert the word vagina into the author's sentence and see if he meant he took the easy way like a prolapsed vagina. Over-sensitive, usually wrong, pussy.

That's a very interesting interpretation.

>if I had only manned up and broken out sed, but I pussied out and wrote some Python

Sounds to me like the author was speaking about being tough vs. being weak. It doesn't sound like he was referring to women at all. If we asked the author whether he was talking about how weak women are, I seriously doubt he would say yes.

Yes, he wasn't talking about women specifically but it's like "manned up" or "throws like a girl". It's a figure of speech based on the assumption that women are weaker than men. At one time such metaphors were common but they're no longer considered polite. Times have changed and if you want your writing to have the effect you intended, you need to know these things.

I suspect the author wanted to sound a bit rude, though.

Is it possible that the author's use of "manned up" concerns growth and ascension, i.e. boy -> man? There is nothing wrong with being a man, some boys in this society should try it some time.

Example, from my father when I was a boy: "Son, sometimes you'll be in situations where you just need to man up and go through it alone. I won't be there, your Mother won't be there. It will be just you and any wisdom you may have."

Yes, I agree, "manned up" didn't used to be considered rude. I'm sure a lot of people still wouldn't consider it rude.

I'm not personally offended. But still, if you knew Spanish and someone gave their product a name that was rude in a common dialect of Spanish, you'd tell them, right?

Especially when the language is changing, not everyone is going to know, and opinions about what's rude sometimes differ. It's still useful as a writer to know how different people interpret things.

Stop thinking for "a lot of people"... you're doing it wrong. I speak Spanish, Hombre. I think you are over-sensitive and distracted... it's an article about Taco Bell. The author was speaking to himself, aka self-talk... so you posit he was being rude... to himself. If you and like-minded individuals consider "manned up" rude... you are either over-thinking, trying to be the thought-police, over-sensitive, or have never been in a situation where you manned up. Rude? Seriously.

Thank the stars society has someone like you to interpret someone else's words and translate to all languages censoring and removing all ambiguity.


> "if you want your writing to have the effect you intended"

And here's where we are in strong disagreement.

You know exactly what the author meant. I know exactly what the author meant. We both know that the author did not intend to offend anyone, and being a blog post, was speaking the way he would to himself or to his friends.

However, the way I see things, you, as well as others in this thread, want to make-believe that the meaning was somehow lost because the author didn't use words and phrases from the PC playbook. This is where I believe our argument lies (and I think you'd agree). Just because you are offended, or you think someone else was offended, doesn't change what the author meant, and it hasn't prevented you from knowing what the author meant.

In my opinion, the use of non-PC language is a very effective tool for weeding out people who aren't good listeners.

I don't know whether the author intended to offend anyone. The article is by Ted Dziuba who likes to publish controversial articles sometimes so I'm not sure how rude he was trying to be.

But in any case, I was just trying to clarify English usage for someone who sounded like they might actually be confused. So explaining the meaning of words (including whether they are "rude" or "PC" or not, for some audiences) was on-topic for the subthread (though not for the original discussion).

Your idea about being intentionally rude and seeing who ignores it is an odd strategy for a writer; that's one step away from trolling, which is usually done to derail conversations.


That's a whole lot of words without actually saying anything.

What credibility is the author losing? Credibility as a programmer? Credibility as someone who can speak to you personally?

How are phrases like "manned up" or "pussied out" harmful to "our industry"? I guess I'm living under a rock, because I had no idea that we were destroying the future of programming by using words that aren't on your approved list.

Your post has absolutely nothing to do with the content, the author, or the author's intentions. It has everything to do with your own insecurities. I sincerely hope that your post was actually sarcastic and I've simply jumped to conclusions here.

I truly hate this zero-tolerance generation (my own generation), and I have never met a man I respected that didn't curse.

You're living under a HUGE rock. Both phrases are disliked by anyone who also wants to increase the diversity of the people working in the field of "computers". The phrases "manned up" and "pussied out" are harmful and off-putting, and I winced when I read them as well. They're harmful to the author's credibility in the same way that saying things are "gay" would be to, well, anyone's. They show that the person uttering the phrase has, in fact, zero tolerance for diversity, the exact opposite of your claim, if they're willing to use such language.


Let me help remove that big rock even further. Using those phrases shows a lack of tolerance for diversity, because the speaker is willing to use phrases that other people (the diverse people in question, not you) find offensive.

> Using those phrases shows a lack of tolerance for diversity

Not at all. You can be tolerant and socialized with a particular vocabulary. Same old arguments from the same old SJWs.

>Both phrases are disliked by anyone who also wants to increase the diversity of the people working in the field of "computers".

I want to increase the diversity in the field of "computers". I do not dislike either phrase.

There's a huge difference between wanting more variety and political pandering. They're only tangentially related, in fact. I actually don't want to work with the kind of people who get really upset at vernacular.

You should have a look at this if you are jumping to conclusions about the way the word "gay" is used (actually watch the video; it's more informative).


Every time I hear it being used, it is in a derogatory fashion, ie "That's so gay" to something that they find distasteful or dislike. I'm not jumping to conclusions, since I saw it in my generation and in my sibling's generation (a decade of age difference) and the meaning certainly hasn't changed.

Louis CK has a hilarious bit about a very similar word that a lot of people think is homophobic and offensive but which he argues isn't.

Tolerance is not being politically correct, it's tolerating people who aren't. In recent history, it was not politically correct to be gay. In fact, things which were gay were very offensive to a lot of people. Tolerance is having the common sense to know the difference between someone who's trying to offend you and someone who's not. If you are "harmed" or "off-put" by those phrases, then you really need to go out and talk to these people that you think are so offensive. When someone doesn't mean you any offense, then none should be taken. Maybe you never played sports, or cursed, or got in fights, and that's fine. But to judge someone on their connotation rather than their meaning is irrational. I say shit's gay all the time. And I have no problem throwing around the word faggot. Does that mean I'm intolerant to gay people? No. Does it mean I'm trying to offend boys who like boys, or girls who like girls? No. They're just words. And when I use them I'm trying to express an emotion that I feel towards a certain experience or person. An emotion that is easily understood by those who are actually listening. But impossible for someone to understand if they are just looking to be offended. Words mean what the speaker intends them to mean, not what you want to interpret them to mean.

You're making a huge leap of faith in using words that you know others find offensive, even if you yourself don't find them offensive nor mean to use them in an offensive manner. The leap of faith you're making is that _everyone_ will interpret their meaning in the way you've intended them to be used. Does it seem like that will always work out? The short answer is no. No it will not.

This is _EXACTLY_ why we have conferences now enacting a code of conduct [0].

"Harassment includes offensive verbal comments related to gender, gender identity and expression, age, sexual orientation, disability, physical appearance, body size, race, ethnicity, religion, technology choices, sexual images in public spaces, deliberate intimidation, stalking, following, harassing photography or recording, sustained disruption of talks or other events, inappropriate physical contact, and unwelcome sexual attention."

The phrases used above certainly fall within the provision above.

[0] http://confcodeofconduct.com/

It's neat how "political views" is missing from that list up there. I wonder why that is.

People are people. As I tried to communicate to you before, this intolerance is no different than the intolerance used to persecute gay people all over the world. They are different, they are offensive. "It offends my religious beliefs." Morality is not about what's popular, and it's not about what offends you, it's about empathy. Failure to see that these words are not actually offensive is a direct result of failing to empathize with the speaker.

Probably going to get a fair amount of negative karma over this, but... here goes with the "in my experience" stereotypes (albeit largely also agreed to by a friend from the Gulf states who spent time in both Atlanta and Boston):

In the Northeast and much of the West coast, being tolerant is not saying a thing. In the Southeast, it's saying a thing with humor or empathy.

Example: my friend makes a joke with an African American friend in Atlanta referring to a stereotype in a humorous manner.

In the Southeast, that plays like "we both know our friendship is solid, and yes, some stereotypes we're both aware of are amusing."

In the Northeast? Surrounding people (not the friend in question, mind you), "How could you even refer to a stereotype out loud? That just shouldn't be done."

Call me crazy, but I subscribe to the Mel Brooks theory that laughing at a thing is the strongest form of rejection.

You can reject his language, but not his right to use it.

If you know you are going to be misunderstood and offend people with the words you choose, and you use them anyway, that is the exact opposite of empathy.

Failure to see that your words are offensive to some people is a direct result of failing to empathize with the listener.

That is an absolutely ridiculous statement if you look at the context of this conversation.

You are saying that I shouldn't call myself a pussy, because other people might get offended. That I should empathize with the fact that other people might get offended when I call myself names.

> if I had only manned up and broken out sed, but I pussied out and wrote some Python

Are you really suggesting that any given person should watch how they talk to themselves, because other people might get offended by how they talk to themselves?

This is lunacy. Absolute lunacy.

Censorship on how we talk to others is a joke in its own right, but now (when one puts your statements in context with the material being discussed) you are suggesting that failure to censor how we talk to ourselves represents a lack of empathy with the people who are listening in on our thoughts.

Is it really necessary that I defend my right to talk to myself how I please, and the author's right to talk to himself as he pleases? Do you honestly fail to see the absolute ridiculousness of getting offended over someone's inner monologue?

> should watch how they talk to themselves

A blog post is not talking to yourself.

It is very obvious that the author is referring to his inner monologue when he states:

> if I had only manned up and broken out sed, but I pussied out and wrote some Python

It's very frustrating when people try to take an argument out of its context.

You said "I say shit's gay all the time. And I have no problem throwing around the word faggot."

Here: https://news.ycombinator.com/item?id=10831106

That isn't about an inner monologue, and my reply is not out of context.

I fail to see how the subject of the sentence has any relevance to how offensive it is, if it indeed contains disparaging remarks. If I wrote in a blog post

> I stopped being a stupid ni _ _ er and smartened up

any well adjusted person would be offended by that statement. Even though I was referring to myself.

I don't care what their inner monologue is, but it stops being an inner monologue when they put it on a blog.

Do you think it is just a coincidence that the words gay and faggot became insults? I keep seeing this idea repeated that it isn't homophobic because these words have taken on a new meaning and the people using them aren't actually referring to homosexuals when they lob them at people or things. But the reason why they are used as insults is because of a generally understood idea that homosexuality is bad. Do you really not see why it is disparaging that words used to identify a minority of people are becoming synonyms for bad, uncool, and so on?

Getting offended by etymology is impractical. I’ll explain.

Faggot and gay are really fantastic examples because they have gone from everyday words to offensive slurs to useful emotional expressions all in the lifetime of many people who are alive today.

Are we going down the etymology route only so far as it fits your argument? Are we going down the etymology route only until someone gets offended? Because if I wanted to play that line of reasoning I would just say a faggot is a cigarette and to be gay means to be happy. Now we both know those words don't mean those things anymore. But are we both willing to admit that when I call someone a faggot or say something is gay I am in no way referring to boys who like boys?

So that argument actually works in my favor.

How did gay go from a positive word to a negative one? Because along the way it was used to refer to homosexuals, and lots of people hated (and still hate) homosexuals and want to distance themselves from that, and so everything else people hate becomes gay. Gay doesn't go from positive to negative without the hatred for homosexuals. Same goes for a word that means a cig or a bundle of sticks to become an insult.

Do words randomly change meaning over time? Sure. Are a disproportionately large number of them that have referred to homosexuals now all being used as insults? Yes, because of homophobia, and the continued use of these words as synonyms for bad is disparaging.

So you're just going to ignore my argument?

1. What's the point of getting offended over etymology? I could argue that no offense should be taken because of what the word "really" means.

2. I'm willing to admit that a word really doesn't mean X anymore. Are you willing to admit that the same word doesn't mean Y anymore?

2b. The same argument. I'm willing to admit that a word doesn't mean X when certain people say it (think British vs American use). Are you willing to admit that the same word doesn't mean Y when I say it?

You've ignored my entire argument and forced me to restate it. Please address my points.

I didn't ignore your argument, I threw it back at you.

1. Ok, here is why it's offensive. Let's use an example. Alice says: "Dota 2 is gay." That sentence only makes sense today because gay means bad. But why does it mean bad? Because gay also means homosexual. So it implies homosexuals are bad. And it is offensive because it is continuing to reinforce the association of gays to all that is bad. That's how it got its meaning after all. Same goes for faggot. She calls Bob a faggot over Skype not because she thinks he is attracted to men, but because no one wants to be a faggot or associated with faggots. Same goes for queer and all the other words that mean homosexual and are now being used as insults. Using these words in these ways is disparaging to gays because it implies they are bad.

2. Not sure what you are getting at because gay still means X and Y. It still means homosexual and it now means bad. Faggot is still a homophobic slur and now a general purpose insult. This is the very problem.

2b. I acknowledge that words have multiple meanings. Gay has multiple meanings and this is actually essential to my arguments above.

"Words mean what the speaker intends them to mean, not what you interpret them to mean."

It's both. If you understand that someone else will interpret a word in a certain way, and you choose to use that word, then you are responsible for them receiving that meaning from you - since it was what you expected them to understand from what you said.

If you don't actually know that other people will interpret words differently from the way you mean them, then you can't be responsible for the misunderstanding until you get feedback and have a dialogue to figure out the difference in understanding.

And of course it's fine to speak however you like with people who understand that you don't mean to offend them.

But you do know that people will understand certain of your words as offensive, so when you choose to use those words with people you know will be offended, you are intentionally offending them.

So yes, you actually are trying to offend people - you just don't care about it.

> Words mean what the speaker intends them to mean, not what you want to interpret them to mean.

Do you really shovel snow?


TLDR: Donald Knuth and Doug McIlroy both wrote a program that would read a textfile, determine the n most frequently used words, and print out a sorted list of those words along with their frequencies.

Knuth's was 10 pages of (very tightly/well written/literately documented) Pascal.

McIlroy's was tweetable:

   tr -cs A-Za-z '\n' |
   tr A-Z a-z |
   sort |
   uniq -c |
   sort -rn |
   sed ${1}q

Now... a programmer doesn't always have the luxury of working with a full suite of convenient tools well-suited to their problem domain (as UNIX shell tools were in this case), and the merits of Knuth's careful and literate approach can serve well across many domains.

Still, there's something about this I think should strike any developer. It seems we talk about reuse & composability a lot more than we see it done this well, and when it is done this well it's not just elegant but kind of shocking.

(And I bet the astute FP programmers see some familiar lessons at work here...)
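For anyone who wants to try it, here is the same pipeline wrapped in a shell function with each stage annotated. The name `topwords` and the sample sentence are made up for this sketch; the six commands are the ones above.

```shell
# topwords N: print the N most frequent words read from stdin, with counts.
# The function name is illustrative; the body is McIlroy's pipeline.
topwords() {
  tr -cs 'A-Za-z' '\n' |  # turn each run of non-letters into one newline: one word per line
  tr 'A-Z' 'a-z' |        # fold to lower case so "The" and "the" are counted together
  sort |                  # bring identical words next to each other
  uniq -c |               # collapse each run of duplicates into "count word"
  sort -rn |              # order by count, largest first
  sed "${1}q"             # print lines up to line N, then quit
}

printf 'The cat and the hat and the bat\n' | topwords 2
```

On that sample sentence it should list `the` with a count of 3 and `and` with a count of 2 (uniq -c pads the counts with leading spaces).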

Sometimes you have to admit that the problem you need to solve is hard. Hard problems usually can't be solved by easy solutions.

Some examples:

> I have far more faith in xargs than I do in Hadoop.

Me too, but those things are very far from comparable. You can compare xargs to Hadoop only if you have a one-node Hadoop cluster, and I'm really not sure why anybody would use Hadoop like that.
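For what it's worth, the single-node "parallel map" that xargs gives you looks roughly like this. It's a sketch: the inline `sh -c` script stands in for real per-item work such as a wget call, and the item names are made up. The -0 and -P options are supported by GNU and BSD xargs; NUL-delimited input is what keeps names with spaces intact.

```shell
# Fan items out to at most 4 concurrent workers on one machine.
# printf '%s\0' and xargs -0 separate items with NUL bytes, so names with
# spaces (or even newlines) arrive as single arguments.
# The inline sh -c script is a placeholder for the real work, e.g. a download.
printf '%s\0' 'a.txt' 'file with spaces.txt' 'c.txt' 'd.txt' |
  xargs -0 -n 1 -P 4 sh -c 'printf "processed %s\n" "$1"' _ |
  sort   # workers finish in any order; sorting makes the output stable
```

That is parallelism across cores on one box, which is exactly why it only competes with a one-node Hadoop "cluster".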

> I trust syslog to handle asynchronous message recording far more than I trust a message queue service.

You mean you trust the protocol that sends messages over UDP and can silently truncate or lose messages?

Standard Unix tools are nice and I always try to use them first, but for some tasks, they are just not the right tools.

I think the point is that most problems aren't actually hard enough to need the more complicated tools.

Actually syslog via UDP has been declared obsolete by RFC 5424.

Not only are Unix tools nice, they also change over time like all tools that are in heavy use...

It's like the old programming interview story: someone sits down and is asked to write a program that sorts a file containing a list of 100 random numbers.

The programmer enters the UNIX command: sort -n numbers
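The joke works because the shell already has the program. A small sketch that builds such a file and sorts it; the file name `numbers` comes from the story, and the awk generator is only there to produce test input. The first two lines show why -n matters: without it, sort compares text rather than values.

```shell
# Why -n matters: plain sort compares text, sort -n compares numeric values.
printf '10\n9\n100\n' | sort      # text order:    10, 100, 9
printf '10\n9\n100\n' | sort -n   # numeric order: 9, 10, 100

# Build the interview's input: 100 pseudo-random integers in a file named numbers.
# (awk is used only to generate test data here.)
awk 'BEGIN { srand(42); for (i = 0; i < 100; i++) print int(rand() * 1000) }' > numbers

# The entire "program".
sort -n numbers
```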

Off-topic: every Taco Bell restaurant has a server running "Taco Bell Linux" (or did, a few years ago, anyways -- I assume they still do).

Off-topic of off-topic: I tried to google it and came up with this instead: https://www.youtube.com/watch?v=FcAgIapM9HM and it was just too magnificent not to share

Makes you wonder if @jlgaddis set us on this path of discovery on purpose. This is hilarious, by the way! Thanks for linking it.

Hilarious and strangely addicting! We need to know what is in that Taco Bell Linux flavor.

It was OpenSUSE flavored, if memory serves. :)

So did Pizza Hut

You mean SUSE?

ssh pizzaHut.yum.com ssh tacoBell.yum.com ssh combinationPizzaHutAndTacoBell.yum.com

I was going to write a program to generate static files from our horrible WordPress site, and then I remembered: oh yeah, wget. I relate to this post because I'm a sysadmin who hopes the devs learn some everyday tricks, like fast text manipulation at scale, but keep the system calls out of production.
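For the WordPress case, a wget invocation along these lines is the usual way to get a static copy. The function name and the exact flag selection are my assumptions, not what the poster ran; each flag is a standard GNU Wget option.

```shell
# mirror_site URL: save a browsable static copy of a site (hypothetical helper).
#   --mirror            recurse with no depth limit and re-fetch only changed files
#   --page-requisites   also fetch the CSS, JS and images each page needs
#   --adjust-extension  add .html to HTML pages saved without that suffix
#   --convert-links     rewrite links to point at the local copies afterwards
#   --no-parent         never climb above the starting directory
#   --wait=1            pause one second between requests to go easy on the server
mirror_site() {
  # Refuse to run without a starting URL.
  [ -n "$1" ] || { echo "usage: mirror_site URL" >&2; return 1; }
  wget --mirror --page-requisites --adjust-extension \
       --convert-links --no-parent --wait=1 "$1"
}
```

Usage would be something like `mirror_site https://example.com/`, with the files landing in a directory named after the host.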

Hearing hooves out in the wilderness? Ah, obviously zebras escaped from the zoo. What if all the world's zebras escaped continuously from all the zoos, better make it scalable. You'll never be able to reimplement so better make it very generic... like a self reproducing herd of artificially intelligent superhuman androids. I've had to maintain systems that should have been implemented with herds of artificially intelligent androids but were not. Most importantly, this is gonna look awesome on my resume.

Or... those hooves could be a couple horses. Use the standard low effort solution of stepping out of their way. Redirect the energy that would have been wasted into something actually useful.

"Resume ability" is kind of useful when dealing with large data sets though. This kind of scripting often fails in that regard.

The tools he mentions support "resume ability".


Well, yes, for a single download. Not for crawling - a repeated pattern of download, inspect for links, download some more etc.

Not sure I understand your point. Wget does recursive crawling, and supports timestamp comparison. Perhaps you are noting that it wouldn't prioritize new pages over changed pages? Or that it doesn't support something like "don't check for changes at all unless the download is more than X days old"?

Generally, yes, there is some point at which wget would fall short of a purpose built tool. I think that point, though, is farther out than you're suggesting.

In this very specific example:

It doesn't keep the crawler state (which links have been visited, which links are discovered but not yet visited) persistently.

(You made it this specific by picking this particular example; my more general observation is that this is a common thing in command line/shell script constructs. They remain simple only until you start to care about such things.)

> It doesn't keep the crawler state

It does though. Just not in the way you're expecting.

The timestamp support would keep it from re-downloading anything already downloaded (though it would still send an HTTP HEAD request for the comparison).

Or, it also has a "no-clobber" feature that would keep it from even trying to download.

Yes, both approaches are more limited than a specific state datastore, but there is state.
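If you do want the explicit state described above (visited vs. discovered but not yet visited) to survive a restart, plain files get you surprisingly far. This is a minimal sketch, not a wget feature: the file names and the `frontier`/`mark_visited` helpers are hypothetical, and the crawl loop that would fill `discovered.txt` is left out.

```shell
# Crawler state kept in two plain text files (hypothetical names):
#   discovered.txt  every URL found in a fetched page (duplicates allowed)
#   visited.txt     every URL already fetched
# Because the state lives on disk, a killed crawl can pick up where it stopped.

# frontier: print URLs that were discovered but not yet visited.
frontier() {
  sort -u discovered.txt > discovered.sorted
  sort -u visited.txt > visited.sorted
  # comm -23 suppresses lines unique to file 2 and lines common to both,
  # leaving only lines that appear in discovered but not in visited.
  comm -23 discovered.sorted visited.sorted
  rm -f discovered.sorted visited.sorted
}

# mark_visited URL: record a URL once it has been fetched.
mark_visited() {
  printf '%s\n' "$1" >> visited.txt
}

# Tiny demo with made-up URLs.
printf '%s\n' https://example.com/ https://example.com/a https://example.com/b > discovered.txt
: > visited.txt
mark_visited https://example.com/
frontier   # prints https://example.com/a and https://example.com/b
```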

If only we could get paid to find solutions, not write lines of code.

I was once asked to create a pie chart of the number of lines of code my team wrote every week. I still haven't figured it out.

> I still haven't figured it out.

Pray you never do. This is the sort of pointless "metric" that is used by clueless management to withhold a raise for the "underperforming" programmer who spends his time tracking down critical bugs with minimal code changes rather than producing volumes of new code.

Tell them that you would like to show them a pie chart of how many lines of code your team removed. Find a team member who removed 2000 lines of code, then promote them and give them a big payrise. Nickname them "Bill".

Your experience is far off mine :).

Most (all) programmers I socialize with seem to get paid for results, not lines of code.

For some reason though, a few of them build things that are way more complex than they need to be for no better reason (as far as I can tell) than that they like doing so.

diff + sloccount?

They could figure it out but there's no value so why bother.
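Taking the "diff + sloccount" suggestion literally, counting added and removed lines needs nothing more than diff and grep. A sketch with made-up file contents; note that this counts physical lines, whereas sloccount counts source lines of code and skips blanks and comments.

```shell
# Two made-up versions of a file.
printf 'a\nb\nc\nd\n' > old.txt
printf 'a\nc\nd\ne\nf\n' > new.txt

# In diff's default output, lines starting with ">" exist only in the new file
# and lines starting with "<" exist only in the old one.
added=$(diff old.txt new.txt | grep -c '^>')
removed=$(diff old.txt new.txt | grep -c '^<')

echo "added=$added removed=$removed"   # added=2 removed=1
```

In a git repository, `git diff --numstat` reports the same added/deleted counts per file.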

> The Taco Bell answer? xargs and wget. In the rare case that you saturate the network connection, add some split and rsync.

If you're going to download millions of webpages, you'll saturate your network I/O instantly. Yes, you can split and rsync, but you'll lose proper error reporting, the ability to retry systematically, dynamic scaling, machine-failure recovery, and many other things a properly designed system would give you.
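For reference, the "split" half of "split and rsync" is just this: cut the URL list into one chunk per machine. A sketch with a made-up URL list and chunk size; the rsync-and-run step is left as comments because it needs real hosts (the host names are hypothetical), and none of this addresses the error-reporting and retry concerns above.

```shell
# Start clean so leftover chunks from an earlier run don't get mixed in.
rm -f urls.txt chunk.*

# Made-up input: ten URLs, one per line.
seq 1 10 | sed 's|^|https://example.com/page/|' > urls.txt

# At most 4 URLs per chunk: chunk.aa (4 lines), chunk.ab (4), chunk.ac (2).
split -l 4 urls.txt chunk.

wc -l chunk.*

# Next step, one chunk per host (hypothetical host names), e.g.:
#   rsync chunk.aa host1:crawl/
#   ssh host1 'cd crawl && xargs -n 1 -P 8 wget -q < chunk.aa'
```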

It often depends on what you require and expect from your solution.

Unless, of course, you read the wget documentation and realize most of that stuff is available as flags: you can log errors, and you can use GNU parallel and a bunch of other specialized tools.

It's still complexity in the end, but you just have to factor it all in. I bet you that 90% of those "big data" problems can fit in a small server's RAM. Most of the time it's just people making shit up so they'll have a job.

I've read the wget documentation, and I am a cURL contributor as well. I know the capabilities of each very well. But when you have 1M+ URLs to download and you care about proper error handling at large scale, those tools fall short. Not all errors need to be handled the same way: some need retries and some don't, depending on the HTTP response code, for example.
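A status-code-dependent retry policy like the one described can be sketched in plain sh around curl's `-w '%{http_code}'`. Which codes count as retryable is my assumption, not a standard; curl reports 000 when it got no HTTP response at all, and 3xx is treated as final here because -L (follow redirects) isn't passed. `should_retry` and `fetch` are hypothetical helper names.

```shell
# should_retry CODE: exit 0 if an HTTP status looks worth retrying.
# The policy is an illustrative assumption, not a standard:
#   000      curl got no HTTP response (DNS failure, refused, timed out)
#   408/429  request timeout / too many requests
#   5xx      server-side errors, often transient
# Anything else (404, 403, 3xx, ...) is treated as final.
should_retry() {
  case "$1" in
    000|408|429|5??) return 0 ;;
    *) return 1 ;;
  esac
}

# fetch URL OUTFILE: download with curl, making at most 3 attempts in total.
# attempt and code are ordinary globals because POSIX sh has no "local".
fetch() {
  attempt=1
  while :; do
    # -s: no progress output; -o: write the body to OUTFILE;
    # -w '%{http_code}': print just the status code to stdout.
    code=$(curl -s -o "$2" -w '%{http_code}' "$1")
    case "$code" in 2??) return 0 ;; esac
    if ! should_retry "$code" || [ "$attempt" -ge 3 ]; then
      echo "giving up on $1 (HTTP $code)" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$attempt"   # wait a little longer before each retry
  done
}
```

Usage would be something like `fetch https://example.com/ page.html`.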

Another problem you'll hit is how to make sure that the machines are saturated. How many jobs should be running at the current time depends on what's being downloaded and how much room you have to run additional downloads.

Again, it all depends on what you need from the system and how much leeway you have.

I'm not sure I agree. They may work under good circumstances, but how do you test the error cases? What about when you need to scale beyond a single node or you need an "online mode"?

Well, of course, the "online mode" is an `nc -l` at the beginning of the pipeline.

Most crawlers of any reasonable size need to use anonymizing proxies, throttling and shuffling of proxy IP addresses. All this starts to get complicated in bash.

Taco Bell programs as one-liners.

Hopefully the 80-character restriction applies.

N.B.: [2010]

A shell script that crawls the web? Surely doable, but never secure.
