Hacker News | JunkDNA's comments

I've been slowly coming to this realization myself lately. I thought our situation was just outside the mainstream of what most people work on, but maybe not.

Our team builds data-intensive biomedical web applications (open source project for it all here: http://harvest.research.chop.edu). Much of our UI is data-driven, so many of the bugs we encounter are at the intersection of code, config, and unique data circumstances. While a lot of the low-level components can be unit tested, individual apps as a whole need functional testing with real (or real enough) data for us to consider them sufficiently tested. The effort required to mock things out is often higher than just cloning production and running new code on top of existing data. This gets complicated in a hurry when you also have to introduce schema migrations before you can test. It's almost as if we need to be doing integration testing far, far earlier than you normally would.
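To make the "migrate, then test on real data" shape concrete, here's a minimal sketch in plain Python (the data and the migration are hypothetical stand-ins, not code from the project):

```python
# Cloned "production" rows (hypothetical data).
rows = [{"name": "Ada Lovelace"}]

def migrate_split_name(rows):
    # Hypothetical schema migration: split `name` into first/last columns.
    for row in rows:
        first, _, last = row.pop("name").partition(" ")
        row["first"], row["last"] = first, last
    return rows

migrated = migrate_split_name(rows)
# The functional test runs against migrated real data, not a hand-built mock:
assert migrated[0] == {"first": "Ada", "last": "Lovelace"}
```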

Furthermore, the reality is that what started out as a Django app now has almost as much client-side JavaScript as it does Python code. This complicates the testing picture further, and I suspect many teams pushing things further in the direction of true web applications are starting to bump into this more and more.


prawks 5 hours ago | link

> the intersection of code, config, and unique data circumstances

I'm sure I'm way over-simplifying here, but those sound like missed edge cases? I can imagine they'd be difficult to predict, regardless.


edwinnathaniel 6 hours ago | link

Why does moving toward more client-side JS complicate testing further?


JunkDNA 6 hours ago | link

Because now you're introducing an interface when previously you had shared objects all on the server in a single codebase. For example, let's say my JavaScript client works with a "User" object delivered via REST API from the server. Let's say I change Django's User model object to modify some existing attribute. Previously, all my Python code's tests would now use this updated model object and I could simply run my tests and count on finding bugs where things that make use of "User" were broken as a result of the change. But now, with lots of client side code, a whole other JavaScript-based test suite (completely outside Django's) needs to run to make sure the new JSON representation will work.

However, this means I have to not just test the backend Django code, but also the output of the REST service, the interaction between the JavaScript code and REST API, and the internal JavaScript methods that use the User object on the client. You are now dealing with at least two completely independent testing frameworks (one in Python and one in JavaScript). If you want traditional unit tests, you need to mock the API calls and the JSON payloads in both directions. Now you've got to maintain all those mock objects for your tests so that your tests are actually testing real payloads and calls and not outdated versions. Ultimately, the only foolproof way to actually be sure it all works and you didn't miss anything is to actually deploy the whole app together and poke at the full stack through a headless browser test.
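To illustrate the drift this creates, here's a toy sketch in plain Python standing in for both sides of the interface (all names are hypothetical):

```python
# Server side: the "real" JSON representation of User after the model change.
def serialize_user(user):
    return {"id": user["id"], "full_name": user["name"]}  # attribute was renamed

# Client side: a hand-maintained mock payload written before the rename...
STALE_MOCK = {"id": 1, "name": "Ada"}

def client_display_name(payload):
    return payload.get("name", "<missing>")

# The client-only test suite still passes against its stale mock:
assert client_display_name(STALE_MOCK) == "Ada"

# ...but against what the server now actually emits, the client is broken:
real_payload = serialize_user({"id": 1, "name": "Ada"})
assert client_display_name(real_payload) == "<missing>"  # the bug the mock hid
```

Only a test that exercises the real payload (or the deployed stack) catches the break.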


wpietri 5 hours ago | link

I'm personally not a big fan of mocking; it introduces a lot of duplication into the system. On the other hand, I'm not a big fan of testing a fully integrated system unless you've TDD'd everything from scratch, because otherwise people let the system get slow enough that the tests become too slow to maintain a good pace.

If you're already in the "integrating with slow things" problem space, then one solution is to automatically generate the mocks from real responses. E.g., some testing setup code calls your Django layer to create and fetch a User object. You then persist that so that your tests run quickly.
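A minimal sketch of that record-and-replay idea in plain Python (the fixture path and payload are hypothetical; a real setup would call into the Django layer):

```python
import json
import pathlib
import tempfile

# Hypothetical fixture location; a real setup might keep these in the repo.
FIXTURE = pathlib.Path(tempfile.gettempdir()) / "user_fixture.json"
FIXTURE.unlink(missing_ok=True)  # start fresh for this demo

def fetch_user_from_server():
    # Stand-in for a real (slow) call into the Django layer.
    return {"id": 1, "name": "Ada", "email": "ada@example.com"}

def recorded_user():
    """Return the recorded real response, recording it on first use."""
    if FIXTURE.exists():
        return json.loads(FIXTURE.read_text())
    payload = fetch_user_from_server()
    FIXTURE.write_text(json.dumps(payload))
    return payload

first = recorded_user()   # slow path: records the real response
second = recorded_user()  # fast path: replays the persisted mock
assert first == second
```

Because the mock is generated from a real response, re-recording it keeps the fixtures honest instead of letting them drift.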

And yeah, I'll definitely use end-to-end smoke tests to make sure that the whole thing joins up in the obvious spots. But those approaches are slow and flaky enough that I've never managed to do more than basic testing through them.


edwinnathaniel 5 hours ago | link

1) See the testing pyramid article posted somewhere within this thread.

2) I never have to maintain "Mock" objects in my tests; my mock objects come for free (I use Mockito, I have fewer Java interfaces, and I mock my real classes and inject them in the right places).

3) Separating Django tests and JS tests shouldn't be too bad, and is often preferred.

4) You can test the JSON payload from the back-end to the front-end IN the back-end serialization code. Speaking of which, I use JAX-RS (with Jackson/JAXB), so the JSON payload is something I normally don't test, since that would mean testing a framework that is already well-tested. I also don't normally test JSON payloads coming from the front-end: it's JavaScript, it's the browser; I don't test an already-tested thing.

But I'll give you another example of object transformation from one form to another: I use Dozer to convert a domain model (Django model, ActiveRecord model) to a Data Transfer Object (plain old Java object). To test this, I write a unit test that converts it and checks the expected values.
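The same convert-and-check pattern, sketched in plain Python instead of Java/Dozer (the model and DTO shapes are hypothetical):

```python
# Hypothetical domain model and DTO, standing in for the Dozer mapping.
class User:
    def __init__(self, id, name, password_hash):
        self.id, self.name, self.password_hash = id, name, password_hash

def to_dto(user):
    # The DTO deliberately omits internal fields like password_hash.
    return {"id": user.id, "name": user.name}

# The unit test: convert and check the expected values.
dto = to_dto(User(7, "Grace", "9f3a..."))
assert dto == {"id": 7, "name": "Grace"}
assert "password_hash" not in dto
```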

5) Nobody argues against end-to-end testing :)

Check out PhantomJS, CasperJS, Selenium (especially WebDriver), and also Sauce Labs (we use them all). But end-to-end testing is very expensive, hence the testing pyramid.


jarrett 6 hours ago | link

Largely because more of the automated integration testing has to be done with a headless browser, e.g. Poltergeist.

If you have no JS in an important area of your site, you can integration test it without a headless browser. Your test process models HTTP requests as simple method calls to the web framework. (E.g. in Rails, you can simulate an HTTP request by sending the appropriate method call to Rack.) The method calls simply return the HTTP response. You can then make assertions against that response, "click links" by sending more method calls based on the links in the response, submit forms in a similar manner, etc.
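In Python the same trick works at the WSGI layer, which is Rack's direct analogue; a toy sketch with a hypothetical app:

```python
from io import BytesIO

# Toy WSGI app standing in for the framework; Rack and WSGI share this shape.
def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b'<a href="/about">About</a>']

def simulate_get(path):
    """'Send an HTTP request' as a plain method call: no server, no browser."""
    captured = {}
    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers
    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": path,
               "wsgi.input": BytesIO(b"")}
    body = b"".join(app(environ, start_response))
    return captured["status"], body

status, body = simulate_get("/")
assert status == "200 OK"
assert b'href="/about"' in body  # "click links" by reading hrefs out of the response
```

Django's test client and Rack::Test both wrap exactly this kind of in-process call.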

But if a given part of your app depends on JS, you pretty much have to integration test in a headless browser. Given the state of the tooling, that's just not as convenient as the former approach. Headless browsers tend to be slow as molasses. There are all kinds of weird edge cases, often related to asynchronous stuff. You spend a lot of time debugging tests instead of using tests to find application bugs.

Worst of all, headless browsers still can't truly test that "the user experience is correct." That's because we haven't yet found a way to define correctness. For example, a bug resulting from the interaction of JS and CSS is definitely a bug, and it can utterly break the app. But how do you assert against that? How do you define the correct visual state of the UI?


edwinnathaniel 5 hours ago | link

Yes, I've known about the headless proposition for a while.

Splitting front-end and back-end tests is desirable.

> Worst of all, headless browsers still can't truly test that "the user experience is correct."

This is the claim from the old Joel Spolsky article about automated tests, but it should not be the ultimate dealbreaker.

Nobody claims you should rely on automated tests 100%. Automated tests exercise the functionality of your software, not the look-and-feel or user experience. You have separate tests for that.

The problems between JS and CSS shouldn't be that numerous either (and, again, shouldn't become a dealbreaker). If you have tons of these, then perhaps what's broken is the tools we use? Or perhaps how we use them?

I don't test my configurations (in-code configuration, not infrastructure configuration) because configuration is one-time only. You test it manually and forget about it.


jarrett 5 hours ago | link

> Splitting front-end and back-end tests is desirable.

I don't feel confident without integration tests. An integration test should test as much of the system together as is practical. If I test the client and server sides separately, I can't know whether the client and server will work together properly.

For example, let's say I assert that the server returns a certain JSON object in response to a certain request. Then I assert that the JS does the correct thing upon receiving that JSON object.

But then, a month later, a coworker decides to change the structure of the JSON object. He updates the JS and the tests for the JS. But he forgets to update the server. (Or maybe it's a version control mistake, and he loses those changes.) Anyone running the tests will still see all tests passing, yet the app is broken.
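One partial mitigation, sketched in plain Python with hypothetical names: keep a single shared contract that both suites assert against, so a one-sided change fails at least one test:

```python
# A single shared contract both test suites assert against (hypothetical names).
USER_CONTRACT = {"id": int, "name": str}

def server_response():
    return {"id": 1, "name": "Ada"}   # what the server actually emits

def client_expectation():
    return {"id": int, "name": str}   # the shape the JS client is coded against

def matches(payload, contract):
    return set(payload) == set(contract) and all(
        isinstance(payload[key], typ) for key, typ in contract.items())

# Server-side test: the real output honors the shared contract.
assert matches(server_response(), USER_CONTRACT)
# Client-side test: the client's expectation IS the contract, so a one-sided
# change to the JSON structure fails here instead of passing silently.
assert client_expectation() == USER_CONTRACT
```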

Scenarios like that worry me, which is why integration tests are my favorite kind of test.

> Automation-tests test functionality of your software not the look-n-feel or user-experience.

It's not about the difference between a drop shadow or no drop shadow. We're not talking cosmetic stuff. We're talking elements disappearing, being positioned so they cover other important elements, etc. Stuff that breaks the UI.

> The problem between JS and CSS shouldn't be that many either

Maybe it shouldn't be, but it is. I'm not saying I encounter twelve JS-CSS bugs a day. But they do happen. And when they make it into production, clients get upset. There are strong business reasons to catch these bugs before they ship.

> If you have tons of this then perhaps what's broken is the tools we use? or perhaps how we use it?

Exactly. I think there's a tooling problem.


edwinnathaniel 2 hours ago | link

> I don't feel confident without integration tests.

Nobody does. Having said that, my unit tests are plentiful, and they test things in isolation.

My integration tests are limited to the back-end system's interaction with the database, and do not go near-end-to-end, to avoid overlap with my unit tests.

I have another set of functional tests that use Selenium, but with a minimum of test cases, written only to cover the happy path (can I create a user? can I delete a user?); there are no corner-case tests unless we find they're a must, because full-blown functional tests are expensive to maintain.
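The happy-path-only idea, sketched against an in-memory stand-in rather than a live Selenium session (all names hypothetical; a real suite would drive a browser via WebDriver):

```python
# In-memory stand-in for the app under test (hypothetical).
class UserService:
    def __init__(self):
        self.users = {}

    def create(self, name):
        uid = len(self.users) + 1
        self.users[uid] = name
        return uid

    def delete(self, uid):
        del self.users[uid]

svc = UserService()
uid = svc.create("Ada")           # happy path: can I create a user?
assert svc.users[uid] == "Ada"
svc.delete(uid)                   # happy path: can I delete a user?
assert uid not in svc.users
```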

Corner cases are done at the unit-test or integration-test level.


>This is how the internet is designed, and you can already do this today. In my case, I host my own dns, email, and my own webpages, locally on my home connection. You just have to be willing to learn, and willing to do. Once you've learned, the actual "do" is rather trivial.

That's great that it works for you. But there are lots of people who are perfectly capable of doing this who don't want the hassle (let alone the huge numbers of people for whom this is completely impossible). I spent most of my youth screwing around with computers and learning a lot about how all this stuff works and it was great fun. Now that I'm getting older, it is frankly growing tiresome. The last thing I want to do on a Saturday-- when I should be playing with my kids and enjoying life-- is fuss with some file server that is acting up, preventing my wife from posting vacation photos. I've got enough work to do around the house as it is. I don't need to be on call 24/7 for IT infrastructure support.


pwg 6 days ago | link

Very true, but also not my point.

My point was: "there is no change to the internet at all necessary for this to happen".

The original comment to which I replied implied that the commenter believed the internet needed to change in some way in order to decentralize. My point was it was already, and still is, natively decentralized. What is preventing "decentralization" is convenience, and lack of knowledge, not the underlying architecture of the internet.


plg 6 days ago | link

It's true... but I view this as a technology problem not a problem _in_principle_. After all we don't all need to be engineers and electricians to operate a refrigerator, or our automobiles, for example. Society has built up the infrastructure to support individuals owning cars. We aren't all forced to use buses.


I'm sure the author is a nice guy. It is hard to put yourself out there like he has done. That said, when you put your name on something and put it in the public space, you have to be prepared for people to write these kinds of things. Furthermore, I think tptacek's blunt and at times snarky style is necessary to make his point. It is extremely hard to write clear critiques that don't sound harsh while at the same time clearly conveying the gravity of the situation. In short, tptacek can't afford the risk that softening his natural style means a major point will be missed. It's a bit like the old quote, "Sorry this letter is so long, I didn't have the time to make it short." Politeness is a luxury one can't really afford when a book that has factual errors is already out there (and to be clear, I'm not qualified to assess whether this is true, I'm just speaking about the approach here). It is far better to write precisely what you're really thinking than to couch it in all sorts of equivocation and self-censorship.

Academic researchers get these kinds of critiques of their publications all the time. It's extremely useful to the whole academic process despite being infuriating and depressing. That said, most of those critiques happen before publication and in private. But as a book author, that's something one can control. If I were writing a book like this, my #1 worry would be that I was making claims or errors that would be held up on HN by folks like tptacek as evidence of my incompetence. I would therefore make it the highest priority to approach the people most likely to have an opinion and get them to review my draft ahead of publication. That's what people writing serious publications that have real-world consequences do. Make no mistake: crypto is in this category. It's not like writing "The 4-hour Work Week", "Web Design for Programmers", or "JavaScript for Aspiring Ninjas".


I would actually really like to see a lengthy investigative journalism piece that looks at all the factors that go into the costs of internet access around the world. I hear lots of hand-wavey stuff all the time about how Japan or Singapore have high population density so it's cheaper there relative to the US. But that can't be the total picture. Major US cities also have high population densities and that doesn't seem to be enough. Different countries have also spent public funds to build out infrastructure, which likely never goes into the quoted prices of access. Japan's price for 2 gigabit (quoted here: http://valme.io/c/technology/vkqqs/the-cost-to-connect-inter...) is a good example. What's the hidden cost? How much do taxpayers pay to maintain the parts of that infrastructure that are government owned? There are lots of countries that have higher tax rates than the US to subsidize public services. In some places, people are probably paying a lot for this stuff; parts of it are just hidden because they are indirect through taxation. However, I bet that's not totally the case in all areas (perhaps in a place like Romania, given the comments elsewhere in this thread).

Are there any places that have vibrant commercial competition for broadband in the absence of significant government intervention? Are there places where, because of government ownership of infrastructure in part of the market (e.g. fiber backhaul), there is increased consumer choice in providers, such that the fully-loaded costs are cheaper than the current US situation? We know from numerous markets that lots of competition drives down prices and drives up quality. The problem with broadband is it feels like a situation where you have a natural monopoly, so what is the right policy mix to enable competition? Or isn't there one?


kansface 16 days ago | link

I don't think any in-depth analysis is really needed for the US. Lack of density is certainly a problem, but not the main one. In nearly every major market, broadband is dominated wholly by a single company through municipally granted monopolies. Cities do this because they get kickbacks from the ISPs and other concessions, like internet for unprofitable (poor) neighborhoods. In our system, ISPs offer a palatable municipal tax (by any other name) whereby the ISP becomes the hated party. Fortunately for them, ISPs aren't elected.

If you want faster internet in the US, start with your local government.


anon4 15 days ago | link

Eastern Europe, Bulgaria in particular. I've always attributed it to lax copyright laws (or at least lax application of said laws) and low average income, but not so low that you can't afford to buy a PC and pay it off over a few months, resulting in rampant file sharing driving high demand for fast, unmetered internet access. For a while in the early 00s each ISP ran their own file sharing site even.

It also helps that you can just string cables between buildings tying them to lampposts and trees. "Can" meaning that it's against all regulations and kind of unsafe in a thunderstorm (one friend had his NIC fried once, fun times), but people don't complain - you can either have internet that you have to unplug when it rains or no internet at all because running cables and obeying the building code gets too pricey for lean ISP startups.

Today of course most cables are underground and smaller ISPs have merged or were bought by larger telecommunications operators. Also if you want internet access on your mobile phone, be prepared to get robbed in plain sight - GSM operators here operate just like everywhere else.

In the end, the actual pushing of bits around doesn't really cost anything. Just some small amount of electricity. The actual costs to an ISP are all infrastructure and maintenance of the infrastructure. It's not hard to get some money to buy routers and enough bandwidth to service a neighbourhood, then work your ass off as a sysadmin keeping everything running. The hard part is the cables. You have to run your own Ethernet and optics and anything that gets in the way of you running it drives cost up and makes it ever so harder to get into business.

In conclusion, lax building code laws and high consumer demand allowed our ISPs to start out with a minimal investment and upgrade their infrastructure as they went along to make it more and more reliable and safe. Today I'd say the price for access is as low as it can go, and the service is pretty solid, with speeds of 8-10 megabits at the lowest, and everybody gets a static, non-NAT IPv4 address for something like $15/month.


xur17 15 days ago | link

I've always wondered why some of the larger apartment complexes don't buy a fat pipe to the internet, and provide internet service to their residents (either as a perk, or a competitively priced service to make money). For example, the current complex I am in has hundreds of apartments - it seems like the perfect place for them to do so.

Is it prohibitively difficult for them to find a fast enough pipe to the internet, or is there something else stopping them that I am missing?


Makes me nostalgic for the early days of WIRED. This article was one of the best. It blew me away when I read it the first time-- so much so, I read it twice just for the sheer enjoyment of it.


I'm working on a book right now and I struggle with the syntax highlighting part with Scrivener. Overall, I really like Scrivener, but it seems really hard to integrate proper syntax highlighting and code formatting in an efficient manner.


There is another loophole that is admittedly unlikely (and the post doesn't go into any details on what the actual records contain). If these records were somehow scrubbed of HIPAA identifiers, then it would in fact not be a HIPAA violation in the US. For example: a dataset of randomly assigned IDs tied to diagnosis codes. You could uniquely ID an individual within the dataset but not know who they were in the real world.

I hear all the privacy folks lighting their torches and sharpening the pitchforks. So, for the record, yes, there are all sorts of methods and studies that show you can potentially re-identify people from all sorts of data that seems at first blush to be not that identifiable [1] and isn't part of the list of HIPAA identifiers. However, in the US, in actual practice, when you talk to compliance people, they often take a very narrow view of what "identifiable" is. The standard is often that it has to be more or less trivial to do. For example, matching on easily accessible public records.

I encounter this all the time in my capacity as a biomedical researcher and have discovered that my "geek intuition" about what is identifiable does me no good in this space. The craziest one is your DNA sequence. I'm having trouble finding the original document now, but Health and Human Services went out of their way not to make this a formal HIPAA identifier (except in very narrow cases relating to insurance companies) when they had the opportunity to do so during some recent rule-making. You would think it clearly would be one, since HIPAA allows for "other biometric identifiers," and what could be a better biometric than your DNA? But I digress...

One of the problems with HIPAA is that it leaves a lot to the eye of the beholder, and many beholders have wildly differing vision. This, as you state, is why you need a lawyer who can make sure your vision doesn't lead to decisions with a high probability of business-ending bankruptcy and going to jail.

[1] http://arstechnica.com/tech-policy/2009/09/your-secrets-live...


enjo 51 days ago | link

From the article:

"And what they uploaded was the entire shooting match—full personal medical records indexed by NHS patient number—with enough additional data (post code, address, date of birth, gender) to make de-anonymizing the records trivial."

So I'm guessing that they are not properly scrubbed.


JunkDNA 51 days ago | link

Thanks, I missed that on my initial read somehow.


ronaldx 51 days ago | link

> when you talk to compliance people, they often take a very narrow view of what "identifiable" is. The standard is often that it has to be more or less trivial to do.

Frankly, I would expect those compliance people to be disciplined for this obvious neglect of their duties.

(or whoever is responsible for the custom that 'identifiable' means 'identifiable to a 2 year old')

This reminds me strongly of ethically obviously-wrong tax avoidance schemes. "Yes, it's OK to pretend you're a used car salesman for tax purposes. There's nothing illegal about it." Let's get real.


JunkDNA 51 days ago | link

This stuff isn't done in a vacuum by compliance offices. It's done with guidance from HHS. HIPAA has a lot of stuff that is not clearly defined. As a result, it's important to keep within the spirit of the rule or HHS will come after you. The analogous healthcare loophole scenario you describe would not hold water with HHS.

Again, my perspective is from the biomedical research world for which the HIPAA privacy rule gives certain limited affordances for communicating patient data that is de-identified to other institutions. Without that safety valve of de-identification being fairly reasonable, there are tons of research studies that would not be allowed to go forward. There is a point where the very tiny risk of re-identification is vastly outweighed by the good of a research study going forward. This is what HHS and institutional review boards struggle with all the time.


rmrfrmrf 51 days ago | link

> (or whoever is responsible for the custom that 'identifiable' means 'identifiable to a 2 year old')

You mean the American people? If you don't like that definition, propose a bill and get signatures.


I for one am not totally convinced by all the hype around electric vehicles. Battery capacity has never in the past been able to make gigantic leaps. It has always been slow incremental year-over-year progress. Nothing seems to indicate that situation will change any time soon, but I'd love to be wrong about that because I certainly like the idea of having a car that doesn't need oil changes and all sorts of other routine maintenance that we take as a given with internal combustion engines.

All that being said, Elon Musk is no ordinary entrepreneur and you ignore that at your own peril. I think Wall Street is making a long-term bet that he's going to pull this whole thing off. Has this author not read "The Innovator's Dilemma"? This is straight out of the playbook. Capture a small portion of a market largely being neglected by the established players and use it to work into larger and larger pieces of the market, learning and refining as you go.


>It'd be equivalent to Oracle releasing their DB server for Linux way back in the day

I think Microsoft should look less to Apple and Google for inspiration and more to Oracle. There are a lot of similarities between the two companies: enterprise focus, pure software company, lots of legacy applications to support, etc...


Not unit tests, but a fair bit of in-line verification that the correct base has been added. There are numerous DNA proofreading mechanisms to prevent errors. If they didn't exist, we would be riddled with mutations.


