
This works great unless you're rebasing and the commit hash you reference falls out of use. For PR comments on branches I expect to rebase again before release, where I don't expect changes earlier in the file I'm referencing, I often live on the edge and link to the mutable diff. Since PR comments can be edited after the fact, neither choice is necessarily game over: you can fix either the line number or the commit hash you point to later.

-----


You already know the answer to that problem: «Do not rebase commits that you have pushed to a public repository.» -- [http://git-scm.com/book/en/Git-Branching-Rebasing]

-----


Trite one-liners aside, I think you'll find in practice that some good teams rebase commits on feature branches all the time in order to keep the commit history readable. The implicit contract in that case is that nobody else considers that branch to be usable until it's merged.

It all comes down to the definition of publicity. If your team considers a feature branch to be privately owned by the requestor except for the purposes of code review, it works great.

-----


I do not mind feature branch rebasing when it does not violate the publicity rule (let's call it that). Once the history is public, rewriting it is such a dangerous endeavor as to be completely inadvisable, all the more so when the end goal is merely a linear repository history. I can't even understand the need for a (fake) linear repository history.

-----


I have come to understand the need.

If a feature branch is under development for a year, and then merged in.... you wind up with commits in master that only appeared today, but are dated (and appear in the commit history timeline) up to a year ago. It's actually a different kind of 'rewriting history', really.

This can be very confusing when trying to figure out what happened. And even worse when trying to figure out where a bug was introduced (via git bisect or manually).

Still, I, like you, try to avoid rebase history rewrites whenever possible, because they can lead to such messes. But I've come to understand why people want them; it's not just an "aesthetically pleasing commit history", which I agree sounds kind of inane.

Then "only when it's not public" rule effectively means "never do it" for most people's development practices. Anything that takes longer than an hour or so for me to do is going to be in a public repo, for the 'backup' nature of making sure i don't lose it while in progress if nothing else. Anything that more than one person collaborates on (or you want code review suggestions on) obviously is going to be in a public repo. It's pretty rare feature branch work I do that never makes it into a public repo (before it's committed to master).

-----


> The implicit contract in that case is that nobody else considers that branch to be usable until it's merged.

'Not usable', then, includes not being able to link to specific source code in that branch with a URL.

You pays your money and you takes your chance.

-----


The decision to push 4k streams would certainly carry repercussions for Netflix from both a storage and a transit standpoint, even absent paying interconnect fees to Comcast.

As a cable subscriber, I expect that when paying for N Mbps of bandwidth, I'm entitled to N Mbps of bandwidth of the content of my choosing. If Comcast's pricing model needs to change to a cost-per-gigabyte model in order to cope with the increased quantity of data customers consume, so be it. But sneaking the costs onto Netflix's tab effectively shifts Comcast's costs to all Netflix customers, allowing Comcast to artificially lower their prices relative to smaller ISPs without the market share necessary to effectively extract rent from Netflix.

-----


> As a cable subscriber, I expect that when paying for N Mbps of bandwidth, I'm entitled to N Mbps of bandwidth of the content of my choosing

100% of the time? Not going to happen.

So let's say I'm an ISP and I bring fiber to your home and give you a gigabit ethernet port. You cannot expect all current and future customers to be able to use 1 Gbit/s at the same time. You would need a big non-blocking switch with as many ports as subscribers, and the technology for this does not exist once you reach a large customer base.

Now do you want me to shape your link to 0.1 Mbit/s, because that's the only thing I can guarantee if all customers use their links at the same time? Or would you rather have the possible 1 Gbit/s of bandwidth?
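
As a rough sketch of the arithmetic behind that 0.1 Mbit/s figure (the uplink size and subscriber count below are illustrative assumptions, not the figures of any real ISP):

  # Rough numbers behind the 0.1 Mbit/s guarantee; all values are assumptions.
  uplink_gbps = 10            # shared upstream capacity
  subscribers = 100_000       # customers each sold a "1 Gbit/s" port

  guaranteed_mbps = uplink_gbps * 1000 / subscribers
  print(guaranteed_mbps)      # 0.1 -> all that can be promised if everyone transmits at once

  oversubscription = subscribers * 1 / uplink_gbps
  print(oversubscription)     # 10000.0 -> Gbit/s sold vs. Gbit/s actually available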

Which one is better:

1) Guaranteed 0.1 Mbit/s for $10, 5 Mbit/s for $100, 1 Gbit/s for $4000
2) Possible 1 Gbit/s for $20?

If you take the globalized approach, you cannot have businesses like Netflix; they destabilize the equation.

-----


I'll take the non-false dichotomy, please. Having enough capacity to meet the streaming demand of your current customers during peak times is a solvable problem that doesn't require a full gigabit for everybody, all the time. The pricing model required to support this while still looking attractive to customers I'll leave to Comcast's well-funded marketing department.

In order to make that happen, of course, we'd have to live in a fanciful world where shifting last-mile delivery cost to content providers wasn't an option so the painful process of exposing this cost to customers couldn't be hidden in a rat's nest of perverse incentives that benefit the most entrenched corporations. (Ironically, and despite its protestations, Netflix's ability to pay this rent is a barrier to entry for its own future competitors.)

-----


> Having enough capacity to meet the streaming demand of your current customers during peak times is a solvable problem that doesn't require a full gigabit for everybody, all the time.

With adaptive streaming technology, there is no such thing as "today's streaming demand".

If, whenever you raise the pipe size, it is automatically filled by a higher resolution/quality/bitrate stream, it is a never-ending game.

What I mean is that, whatever the state/quality of ISP networks is or could have been, we are doomed to face that "Netflix problem". If networks were better, then Netflix would already offer 4k or dual/triple HD streams and so there would be saturation anyway.

> (Ironically, and despite its protestations, Netflix's ability to pay this rent is a barrier to entry for its own future competitors.)

Actually, moving all Netflix traffic to private pipes will free up the shared ones, so a new freeloader can use them and undercut Netflix's prices ;)

-----


When they sell me the service, they don't sell it as possibly 1 Gbit/s. They sell me 1 Gbit/s. If they can't deliver, it is time to adjust that promise when I purchase the service.

-----


Probably what is better (or at least fairer) is a token bucket with a low guaranteed rate and a very large burst. http://blog.felter.org/post/64832783299/apropos-of-a-recent-...
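
As a sketch of that idea (the rate and burst numbers are made up, and this is just the textbook shaping algorithm, not anything from the linked post):

  # Minimal token-bucket sketch: a low guaranteed fill rate plus a large bucket
  # lets a customer burst well above the guarantee until the bucket drains.
  import time

  class TokenBucket:
      def __init__(self, rate_mbit_per_s, burst_mbit):
          self.rate = rate_mbit_per_s        # long-run guaranteed rate
          self.capacity = burst_mbit         # how far you can burst above it
          self.tokens = burst_mbit
          self.last = time.monotonic()

      def allow(self, size_mbit):
          now = time.monotonic()
          self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
          self.last = now
          if self.tokens >= size_mbit:
              self.tokens -= size_mbit       # send now, spending saved-up tokens
              return True
          return False                       # bucket empty: queue, shape, or drop

  # e.g. 5 Mbit/s sustained, with the ability to burst through 10 Gbit of credit:
  bucket = TokenBucket(rate_mbit_per_s=5, burst_mbit=10_000)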

-----


I don't think he'd dispute that he'd have more to worry about had he not been lucky enough to be on an unaffected version of OpenSSL. I believe his point was that using stunnel to terminate SSL connections mitigates some of the attack vectors that could have been used to recover customer information in the event of a compromise at the OpenSSL layer, and that the architecture of Tarsnap itself absolutely precludes recovery of customer backups in any event. And that these facts aren't an accident.

The important takeaway from this post is that it pays to employ layers of security when building software systems.

-----


I was reacting mostly to the apparent exclusion of usernames, passwords, and session cookies (all exposed in net traffic) from the category of "anything sensitive".

-----


Fair point. Perhaps I should have said that the stunnel/jail setup keeps OpenSSL bugs away from the more sensitive things.

-----


If I understand correctly, he's decoupled the SSL connection handling from the HTTP server. That seems to have all kinds of advantages to me. For example, if a vulnerability in your SSL library is found, you could quickly swap in SSL termination based on another library (e.g. GnuTLS or NSS), or if there were a vulnerability in stunnel you could swap it out for stud, or even Apache or nginx in a jail (assuming you had any or all of these things ready to go). It should also make for more flexibility with load balancing. Brilliant engineering.
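
As a toy illustration of that decoupling (a sketch in Python, not Tarsnap's actual stunnel-in-a-jail setup; the ports and certificate paths are assumptions): a separate process owns the TLS library and relays plaintext to a backend HTTP server on localhost, so swapping the TLS implementation never touches the web server.

  # Toy TLS-terminating proxy: terminate TLS here, relay plaintext to a backend.
  import socket
  import ssl
  import threading

  BACKEND_ADDR = ("127.0.0.1", 8080)   # plaintext HTTP server (assumed)
  LISTEN_ADDR = ("0.0.0.0", 8443)      # public TLS port (assumed)

  def pump(src, dst):
      # Copy bytes one way until EOF, then half-close the other side.
      try:
          while chunk := src.recv(4096):
              dst.sendall(chunk)
      except OSError:
          pass
      finally:
          try:
              dst.shutdown(socket.SHUT_WR)
          except OSError:
              pass

  def handle(tls_conn):
      backend = socket.create_connection(BACKEND_ADDR)
      threading.Thread(target=pump, args=(backend, tls_conn), daemon=True).start()
      pump(tls_conn, backend)

  ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
  ctx.load_cert_chain("server.crt", "server.key")  # assumed cert/key paths

  with socket.create_server(LISTEN_ADDR) as listener:
      with ctx.wrap_socket(listener, server_side=True) as tls_listener:
          while True:
              try:
                  conn, _addr = tls_listener.accept()  # TLS handshake happens here
              except (ssl.SSLError, OSError):
                  continue                             # bad handshake: drop and move on
              threading.Thread(target=handle, args=(conn,), daemon=True).start()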

-----


Yes. And assume there will be bugs.

-----


Painful how? It sounds like the author was genuinely moved by the game. It has to be a good thing that a AAA console title can have that effect, even if it only took 30-odd years for us to get there.

-----


Such console games "get" this effect just because they look more and more like movies, not because they are intrinsically better games than what we had before. And the examples the author gives show her clear ignorance of PC games in the '80s and '90s, so it's not surprising she didn't feel "much" in front of console games. We basically had to wait for late SNES games and PS1 games to see more console games with deeper storylines and character development. But this occurred on PC way before it happened on consoles, and you had actual strong female characters way earlier on PC too (hey, come on, the Roberta Williams games almost always had female protagonists, and were hugely popular!).

The idea that "we had to wait until 2014 to feel something when playing games" is what I considered the most painful. Because it's preposterous.

Wired has really become an over-hyped tabloid.

-----


The whole article is about how the author found a game with a character she could identify with, yet you still refer to her as "he".

I just thought it was curious.

-----


Corrected.

-----


The description of security by obscurity in this article reads a lot like Kerckhoffs's principle, which, when employed correctly, is actually a virtue. Not to defend cybercrime, but completely covering your tracks (digitally or otherwise) is a very tricky problem - one that people have long tried to solve with both malicious and benevolent intent - and failings in that vein aren't necessarily at the level of amateurishness that the term implies.

-----


Cool. In the comments from that post, there's another article that traces that inverse square root implementation back even further - almost 20 years back at this point:

http://www.beyond3d.com/content/articles/8/

-----


One caveat comes to mind about the approach this technique supports. Anonymizing your production data and distributing it to developer laptops is something you should think hard about before doing it, and approach very carefully if you do. Sometimes the sensitive information you should be protecting isn't just the users' creds and addresses. Sometimes it's the ways in which they've used your app, the content they've created, and the graph of other accounts with whom they've associated.

A typical anonymized DB dump is likely to share primary key values with production, and often an adversary knowing that user X posted something they'd rather keep private on your app will be able to simply look up their identity on your production server without privileged access.

Generating useful fake data is also hard, of course, and it won't hit your edge cases like real production data will. But then again, the code you're writing now won't really be exercised by the unwashed masses until you release it, so using production data is mostly a protection against regressions. If you've got sensitive content in your app, you should consider stronger test coverage in lieu of production dumps. Performance regressions can be hard to catch with automated testing (though you can defend against things like N+1 query problems), so YMMV.
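
One mitigation for the shared-primary-key problem is to remap every exported id through a keyed hash, so ids in the dump can't be joined back against production while foreign keys still line up within the dump. A minimal sketch (the row shape, field names, and use of HMAC are illustrative assumptions, not a complete anonymization strategy):

  # Remap ids with a per-dump secret so they stay internally consistent
  # but are meaningless outside this particular dump.
  import hashlib
  import hmac

  REMAP_KEY = b"rotate-me-for-every-dump"   # never distribute alongside the dump

  def remap_id(original_id):
      digest = hmac.new(REMAP_KEY, str(original_id).encode(), hashlib.sha256).digest()
      return int.from_bytes(digest[:8], "big")

  def anonymize_user(row):
      new_id = remap_id(row["id"])
      return {
          "id": new_id,
          "email": "user%d@example.com" % new_id,
          "name": "Test User",
          # drop or coarsen behavioral data (posting history, social-graph edges)
          # unless a feature under development genuinely needs it
      }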

-----


Absolutely, the anonymisation method is an important consideration. You can probably safely drop most user-specific data, like any tracking you've done. Unless the dev is building functionality that actually depends on that data, such as data-mining, they're unlikely to need it.

I'll make another argument in favour of production dumps though. They give developers a proper feel for how a website functions. For UI-oriented developers like myself, having just enough data to allow the website to function on a technical level isn't enough. Your search code is going to feel very different with 20 million products vs a few dozen dummy products.

-----


In order to reconcile ACID with CAP, this defines a weakened form of ACID to mean whatever-some-databases-currently-marketed-as-ACID-compliant-support, so that it can claim to offer effective ACID compliance while still choosing CA over partition tolerance (in the http://codahale.com/you-cant-sacrifice-partition-tolerance/ sense). For a lot of applications, the weakened isolation guarantees aren't, or shouldn't be, negotiable (if you try to sneak by without them, they'll cause data integrity issues at scale).

Not saying that the solution doesn't provide a valuable framework for building robust applications that can overcome those issues (necessarily pushing some of that complexity up the stack to the application developer), but the marketing seems a little bit suspicious?

Edited to add: In fairness, the article doesn't actually claim to have evaded CAP - it recognizes that HAT is a compromise. But I believe it's easy to understate the practical problems with non-serializable transactions. It becomes impossible to prevent duplicate transactions from being created on the split-brain nodes. In banking, for instance, this would be a Bad Thing, and lead to potentially hairy application-specific mop up when the nodes resync.
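
As a toy illustration of that failure mode (no particular database; the balances and amounts are made up): during a partition each replica accepts a withdrawal that locally satisfies a "balance >= 0" constraint, and the merged history violates it.

  # Two replicas of the same account, cut off from each other by a partition.
  balance_a = 100   # replica A's copy
  balance_b = 100   # replica B's copy

  def withdraw(balance, amount):
      # Highly available: decide locally, without coordinating with the other replica.
      if balance - amount >= 0:
          return balance - amount, "accepted"
      return balance, "rejected"

  balance_a, _ = withdraw(balance_a, 80)   # accepted on A -> 20
  balance_b, _ = withdraw(balance_b, 80)   # accepted on B -> 20 (B can't see A's write)

  # When the partition heals, any merge honoring both accepted withdrawals overdraws:
  merged = 100 - 80 - 80                   # -60: the hairy application-specific mop-up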

-----


Good point, and well-taken. As I mention in http://www.bailis.org/blog/hat-not-cap-introducing-highly-av... (and devote a full section to in the paper, including documented isolation anomalies like lost updates, write skew, and anti-dependency cycles), there are many guarantees that aren't achievable in a highly available environment. Our goal is to push the limits of what is achievable, and, by matching the weak isolation provided by many databases, hopefully provide a familiar programming interface.

As I tried to stress in the post, we aren't claiming to "beat CAP" or provide "100% ACID compliance"; we're attempting to strengthen the semantic limits of highly available systems. I intended "HAT, not CAP" as a play on acronyms, not as a claim to achieve the impossible.

edit: We're also certainly not claiming to have a "CA" solution, whatever that means. There's a lot of confusion between "CAP atomicity"==linearizability and "ACID atomicity"=="transactional atomicity"/"all or nothing"; see http://www.bailis.org/blog/hat-not-cap-introducing-highly-av...

-----


> matching the weak isolation provided by many databases, hopefully provide a familiar programming interface

I'm not sure it's really that familiar. Just knowing how to make requests doesn't ensure you really understand all the ways the answers could be wrong, much less have done the analysis and proven you can withstand all those failure modes. I think a lot of systems out there are quietly corrupting themselves in ways the maintainers didn't have high enough scale or good enough analytics to notice, at least not early enough to recover to a valid state.

-----


> In order to reconcile ACID with CAP, this defines a weakened form of ACID to mean whatever-some-databases-currently-marketed-as-ACID-compliant-support

Are you referring to the isolation guarantees? "Repeatable Read" (which this provides) is a pretty reasonable standard of isolation; while "Fully Serializable" is stronger, it's also more expensive. Engines like PostgreSQL that can be run in either mode are most often run in "repeatable read" mode AFAIK.

-----


Good point, repeatable read is a pretty useful guarantee, though I would be loath to give up global integrity constraints. Read committed seems to put a lot of work on the application developer's plate, though, and it's not clear to me what impact the different isolation levels have on system performance.

-----


To be fair to the article, it's pretty upfront about the constraints. From the article:

'Of course, there are several guarantees that HATs cannot provide. Not even the best of marketing teams can produce a real database that “beats CAP”; HATs cannot make guarantees on data recency during partitions, although, in the absence of partitions, data may not be very stale. HATs cannot be “100% ACID compliant” as they cannot guarantee serializability'

My concern is the pitched low-latency use case: if I understand correctly there's no way to avoid an extra round trip?

Could be very useful all the same.

-----


> if I understand correctly there's no way to avoid an extra round trip?

With HATs, you only need to contact one replica for every key. This is the goal behind our definition of "high availability" (http://www.bailis.org/blog/hat-not-cap-introducing-highly-av...).

In general, I haven't seen algorithms guaranteeing serializability or atomicity that complete without a round trip to at least one other replica (or a possibly long trip to the master). Intuitively, the impossibility results dictate that this must be the case, otherwise partitioned replicas could safely serve requests. Daniel Abadi has a great post about this latency-consistency trade-off: http://dbmsmusings.blogspot.com/2010/04/problems-with-cap-an...

-----


There's power in the concept of chain of custody. I agree that you don't want to associate negativity with this stuff, but identifying an owner who understands the code and whose job it is to fix issues, or to find the right person who can, has huge merit. I've worked on too many projects where the issue tracker balloons due to diffusion of responsibility.

-----


This has a big (but easy to fix) security flaw. Don't deploy this publicly without changing the secret token or else you are effectively publishing your app code for analysis by attackers.

https://github.com/SquareSquash/web/blob/master/config/initi...

More broadly, if you're writing an open source Rails app, please don't commit a hard-coded secret_token into the repo, or session forgery attacks are trivial.

-----


Why has the Rails team not fixed this?

Seems like a sane default would be to have a special token (used in the auto-generated config) that triggers generation of a new random key, which is then written to a second config file (which is in the default .gitignore).
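
Rails itself is Ruby, but the pattern being proposed is language-agnostic. A minimal sketch of the same idea in Python (the file path and environment variable name are assumptions): prefer an environment variable, otherwise lazily generate a random secret into a gitignored file on first boot.

  # Generate-once secret handling: env var first, then a gitignored file,
  # otherwise create a fresh random token and persist it for later boots.
  import os
  import secrets
  from pathlib import Path

  SECRET_FILE = Path("config/secret_token")   # add this path to .gitignore

  def load_secret_token():
      token = os.environ.get("SECRET_TOKEN")  # e.g. Heroku-style config vars
      if token:
          return token
      if SECRET_FILE.exists():
          return SECRET_FILE.read_text().strip()
      token = secrets.token_hex(64)           # 128 hex chars, like `rake secret`
      SECRET_FILE.parent.mkdir(parents=True, exist_ok=True)
      SECRET_FILE.write_text(token)
      return token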

-----


Because it's not a problem for private applications. You actually do want the key in your VCS in most cases since it should be stable across deploys. Rails will log out every user and throw an error for each active session when the key changes because it suspects that the session has been tampered with.

The handling for open-source applications is a little more difficult, and I must admit that I know a couple of mediocre solutions to it and no good one.

-----


I have to disagree. An application connected to a network is never "private" unless you configured a strict firewall and are sure that other running services don't contain exploits. Besides, if an employee leaves the company, this "private" app is now known to an outsider.

Also, just because something needs to be "stable across deploys" doesn't mean it needs to be in VCS. Are your application's third party passwords and API keys all stored in its version history? We picked a solution where the deployment tool configures the sensitive pieces of the application.

-----


"Private" was meant as "the source is kept private". In this case it's less of a problem, since only people with access to the source get to see the secret. In any mid-size team those people probably have access to the servers anyway, so they have access to the secret in any case; they're implicitly trusted. You're certainly right that there are other ways to store the secret, and if you read my posts in this thread you'll see that I'm aware of that; however, keeping the secret in the VCS is a simple solution that works without any further magic. If your requirements dictate that you can't do that, change it, but that doesn't mean it's not a working solution for many other people. And that's the reason things are as they currently are, and I don't expect them to change soon.

To answer your second question: depending on the case, I store 3rd party passwords and API-keys in the repo. If an employee leaves the company I'll have to change those anyways since he probably had access if he had access to the project at all.

-----


In this app's case the setup.rb is an ideal candidate for generating a new secret_token (and raising an exception on startup until you've generated a token).

-----


I actually went through the setup.rb and was surprised they don't do it. Rails itself will raise an exception if the initializer does not exist IIRC, so just removing the file and .gitignoring it would do.

-----


.gitignoring will hurt anybody deploying via git (e.g. heroku) in the absence of some ENV magic. It'd be fine to have the regenerated file checked into each user's repo as long as they have the good sense not to push that repo up to a public fork on github.

There is talk about more general solutions here:

https://groups.google.com/d/msg/rubyonrails-core/N2EFnf6X_i4...

-----


None of the solutions is actually beautiful; they all require some sort of deployment magic - env variables being set, cap symlinking files, or similar. What works for me probably doesn't work for you.

Btw: you can check the regenerated file into the repo, adding it to .gitignore just prevents you from accidentally adding it:

  Last login: Tue Jan 15 17:07:14 on ttys005
  Voice-of-Evening:~ fgilcher$ cd /tmp
  Voice-of-Evening:tmp fgilcher$ git init test
  Initialized empty Git repository in /private/tmp/test/.git/
  Voice-of-Evening:tmp fgilcher$ cd test/
  Voice-of-Evening:test fgilcher$ echo "README" >> .gitignore
  Voice-of-Evening:test fgilcher$ touch README
  Voice-of-Evening:test fgilcher$ git status
  # On branch master
  #
  # Initial commit
  #
  # Untracked files:
  #   (use "git add <file>..." to include in what will be committed)
  #
  #	.gitignore
  nothing added to commit but untracked files present (use "git add" to track)
  Voice-of-Evening:test fgilcher$ git add -f README
  Voice-of-Evening:test fgilcher$ git status
  # On branch master
  #
  # Initial commit
  #
  # Changes to be committed:
  #   (use "git rm --cached <file>..." to unstage)
  #
  #	new file:   README
  #
  # Untracked files:
  #   (use "git add <file>..." to include in what will be committed)
  #
  #	.gitignore
  Voice-of-Evening:test fgilcher$

-----


That doesn't stop you from accidentally pushing it to GitHub though.

-----


No, quite to the contrary: once you do this, you'll push it in the next commit. But in the case of an open-source app you'd probably want to create a private repo for the app anyway, since all your configuration would be public otherwise. If you deploy to Heroku, you'll have at least one private git repo on Heroku.

-----


The setup script has been updated to generate a session token. Thanks!

-----
