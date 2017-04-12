In patent search, there is no "search" box. Instead, the "quick search" forces you to specify two (and exactly two) text queries on the database columns, joined by an obligatory boolean operator.[1] Even if you happen to find an interesting patent, good luck linking to it (which should be the number one service they provide - GET individual patent documents). The page showing the patent document has a dozen cryptic query parameters in the URL, some of which relate to the search query you used to find the patent! No "make shareable link" button to be seen, either.
And don't get me started on the trademark search, or Trademark Electronic Search System (TESS), as they like to call it.[2] When you navigate to the front page, you get a private session identifier - in your URL, of course! And when you search for a trademark ("simple search" is intuitively known as "New User" here) and select a TM to view, you would be excused for thinking that the short URL in your browser address bar is the linkable URL of this entry. But no - it's just your session identifier along with the document's index in the results of your last search query.
When you leave the trademark site or just click "Logout" (since you're a kind person - they do, after all, ask you to "logout when you are done to release system resources allocated for you"), that URL is gone with the wind. If you shared a link to that trademark with a friend, they only get this very helpful page:
<H1>This search session has expired. Please start a search session again by clicking on the TRADEMARK icon, if you wish to continue</H1>
So no way to link to individual TM registrations here either.
1: http://patft.uspto.gov/netahtml/PTO/search-bool.html
2: http://tmsearch.uspto.gov/
During the explosion of the web, enterprises moved much of their development into web solutions using an existing workforce that had skills in different technologies - often client-server desktop-to-RDBMS tech. Java was a popular choice for these new projects. Those developers ranged in their general aptitude - some were quite good, some weren't - but they all came onto these projects with existing skills and knowledge in patterns and architectures that weren't particularly well aligned with web development. They were expected to adapt and deliver at a pace consistent with their previous projects. So they did what they could to get the job done, even though that meant their solutions were not well suited to the web environment.
That doesn't have a lot to do with Java specifically, except that it was a common language for that situation. It was far more common for enterprises to move VB/PowerBuilder/Delphi/etc. devs into Java than into something like PHP.
But Java was more likely to produce this horrible implementation for a couple of reasons.
Firstly, as a general purpose language (that also happened to have decent web capabilities) it was easy to design an application using your traditional patterns and then just shoe-horn the web UI on top. You were far less likely to do that in a web-oriented platform like PHP or ColdFusion (!).
Secondly, the Java servlet spec made sessions really easy (because they store standard, stateful Java objects) and didn't offer a lot for managing complex page state. You could spend several days trying to build something to page through database search results with query parameters, only to find that the back button broke your design. Or you could just throw it all into the session and call it "done".
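To make that trade-off concrete, here is a minimal Python sketch (not servlet code; the document IDs, titles and URL shapes are all hypothetical) contrasting result state kept in a server-side session with a link that carries the document's own identifier. It reproduces the TESS failure mode described above: once the session is dropped, the session-plus-index link no longer resolves, while the ID-based link keeps working.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical document store standing in for the database.
DOCUMENTS = {"doc-001": "Folding bicycle hinge", "doc-002": "Bicycle chain guard"}


class SessionStore:
    """'Throw it all into the session': search results live only in server memory."""

    def __init__(self):
        self._sessions = {}

    def search(self, term):
        sid = secrets.token_hex(8)
        # Store the ordered result list under a fresh session id.
        self._sessions[sid] = sorted(
            doc_id for doc_id, title in DOCUMENTS.items() if term.lower() in title.lower()
        )
        return sid

    def logout(self, sid):
        # Frees the memory, and with it every link built from this sid.
        self._sessions.pop(sid, None)

    def resolve(self, url):
        # A URL like /result?sid=...&idx=0 only means something to the live session.
        query = parse_qs(urlparse(url).query)
        results = self._sessions.get(query["sid"][0])
        if results is None:
            return "This search session has expired."
        return DOCUMENTS[results[int(query["idx"][0])]]


def session_link(sid, index):
    return "/result?" + urlencode({"sid": sid, "idx": index})


def stable_link(doc_id):
    # The document's own identifier is the URL: nothing server-side can expire.
    return "/doc/" + doc_id


def resolve_stable(url):
    return DOCUMENTS.get(urlparse(url).path.rsplit("/", 1)[-1], "Not found")
```

After `store.logout(sid)`, resolving the same `session_link(sid, 0)` returns the expired-session message, whereas `resolve_stable(stable_link("doc-001"))` still returns the document, which is the whole point of a shareable URL.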
That probably peaked about 15 years ago, but inertia is a strong force, particularly in government and large enterprises. Some of those developers are probably still working like that. Some of them trained others. And many others know that it's a bad design, but there's no budget to fix it.
And Java was disproportionately used for very large complex systems which would be expensive to replace, in any case.
https://www.youtube.com/watch?v=PH1BKPSGcxQ
Reading other comments, it's now clear that in this instance we are dealing with a Microsoft Server. [0]
[0] https://news.ycombinator.com/item?id=14173595
Most web technologies support rendering a view based on a set of values residing in a database, and use the values to populate a template. The template contains placeholders reserved for values expected to originate from a database.
The version of the page that appears in your web browser usually does not exist as a serialized HTML page on a disk connected to a server. It often exists only as a combination of two pieces of information bound to a session in memory on a server: 1. the template or server-side script, and 2. the values you are authorized to retrieve from the database, which may or may not be a SQL database.
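A minimal sketch of those two pieces, using Python's standard `string.Template` purely for illustration (the record is a made-up stand-in for a database row): the HTML the browser receives exists only as the output of combining them.

```python
from html import escape
from string import Template

# 1. The template: placeholders reserved for values expected from the data store.
PAGE = Template("<h1>$title</h1><p>Filed: $filed</p>")

# 2. Values the user may retrieve (hypothetical record standing in for a DB row).
row = {"title": "Gears & levers", "filed": "2015-03-02"}

# The rendered page exists only as the result of this call; escape() keeps
# data values from being interpreted as markup.
html_out = PAGE.substitute({key: escape(value) for key, value in row.items()})
print(html_out)  # <h1>Gears &amp; levers</h1><p>Filed: 2015-03-02</p>
```

Nothing here requires a session: the same template plus the same row yields the same page, which is why template rendering on its own does not prevent stable URLs.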
What you see in the address bar does not need to correlate with the HTML rendered in the browser window. JavaScript (which is not Java) often employs this technique, known as a "single-page web application."
This is true for PHP, Microsoft, Ruby, Python, Perl, server-side JavaScript (such as Node.js) and yes, Java too.
It is not Java-specific, but it is common to a few specific Java frameworks as well as at least one of Microsoft's older frameworks (I forget which one).
The vast majority of frameworks (including most modern Java frameworks) do the sane thing and keep application state on the server side (often in a database, as you describe) and UI state on the browser side, which allows links and forms to work the way links are generally supposed to work.
ASP.NET WebForms has a feature where the session ID, normally stored in a cookie, is stored in the request URL path instead. This was done back in 1999-2001 to support extremely rudimentary HTTP clients that did not support HTTP cookies: think of really early mobile-phone web browsers or hastily written shell scripts using cURL.
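A short sketch of why a session ID in the path is a problem for sharing links. It assumes the ASP.NET 2.0-style cookieless format, where the ID sits in a `(S(...))` path segment; the example URL and ID are made up. Whoever receives the full URL also receives the session, and stripping the segment yields the session-free address.

```python
import re

# Assumed format: cookieless ASP.NET session state puts the id in a "(S(...))"
# path segment, e.g. /app/(S(lit3py55t21z5v55vlm25s55))/default.aspx
_SESSION_SEGMENT = re.compile(r"/\(S\(([^)]+)\)\)")


def split_cookieless_url(url):
    """Return (session_id or None, url with the session segment removed)."""
    match = _SESSION_SEGMENT.search(url)
    if match is None:
        return None, url
    return match.group(1), url[: match.start()] + url[match.end():]
```

For `http://example.com/app/(S(lit3py55t21z5v55vlm25s55))/default.aspx` this returns the ID `lit3py55t21z5v55vlm25s55` and the URL `http://example.com/app/default.aspx`.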
Fortunately it was disabled by default, and the documentation explains why it's a bad idea to use it (I understand the business reasons for including it). However, this design does not break things like multiple-tab browsing: that's just what happens when you have any stateful web application, regardless of framework.
It's now gone in ASP.NET MVC (where you have full control over rendered URLs), so that horrid chapter has ended.
What's being described above isn't simply template-based rendering.
The GP didn't supply a link as an example. It is simply a non-specific anecdotal remark, with no further evidence provided.
> The bizarre practice of assigning session keys to visitors and somehow storing the page they're viewing in the server instead of in the URL is pretty common in Brazilian government, which is dominated by Java programmers.
Having dealt with just such a service today (the King County Recorder's Office public records search), I felt pretty familiar with what I was describing.
It is entirely possible I misapprehended the situation, but as this is an informal conversation on the internet about bad web development practices, I'm not going to dwell on it too much.
I don't know how anyone could think I was talking about general template rendering (but maybe I expressed myself badly; English is not my first language).
https://en.wikipedia.org/wiki/Single-page_application
https://en.wikipedia.org/wiki/Session_(computer_science)#Ser...
This existed since the Perl CGI times. "Bizarre practice", wow.
This is like your whole website existing as a single endpoint and making POSTs to the same URL to get to different parts of the site.
Many, if not most, isomorphic React/Node.js apps have a single API URL, with microservices behind the load-balanced URI, that accepts HTTP POST requests and reads a JSON object sent as the request body, responding in turn with a JSON object that tells the client-side app how to re-render.
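A minimal sketch of that single-endpoint pattern, with hypothetical action names and handlers: every request goes to the same URL, and the JSON body, not the path, decides what happens.

```python
import json


# Hypothetical handlers; a real app would call back-end services here.
def search_patents(params):
    return {"hits": ["result for " + params.get("q", "")]}


def get_patent(params):
    return {"id": params["id"], "title": "placeholder"}


HANDLERS = {"search": search_patents, "getPatent": get_patent}


def handle_post(body: str) -> str:
    """Dispatch one POST body to a handler; the request URL never changes."""
    request = json.loads(body)
    handler = HANDLERS.get(request.get("action"))
    if handler is None:
        return json.dumps({"error": "unknown action"})
    # The client re-renders from this JSON response.
    return json.dumps({"action": request["action"], "data": handler(request.get("params", {}))})
```

Note that with this design, deep links only work if the client-side app also maps its own URLs (routes) onto these actions; the server endpoint alone gives you nothing to share.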
This isn't bizarre at all. It is very, very common; if not the norm.
It could have been redeveloped in anything, but kept the original url scheme for backwards compatibility.
Do you honestly believe anything this terrible went through a rewrite? I sure don't.
Question: if it's really this bad, this site is going to have incredibly poor rate limiting or per-IP analytics/access controls, if it has these things at all.
So it probably wouldn't be impossible for someone to build a new site (maybe even an API) that talks to this one, prettifies the results, lets you copy URLs (most likely via caching¹), and so forth. (The point of my previous paragraph is that the new site would generally be hitting the old one, possibly several times a second, from one IP.)
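The caching part could look something like this minimal sketch. `fetch_upstream` is a hypothetical function that scrapes one record from the old site; the cache gives each document a stable key and only goes upstream on a miss or after the entry is older than `ttl_seconds`, which also keeps the request rate against the old site down.

```python
import time
from typing import Callable, Dict, Tuple


class PermalinkCache:
    """Serve stable per-document URLs from a cache so the upstream site is hit rarely."""

    def __init__(self, fetch_upstream: Callable[[str], str], ttl_seconds: float = 86400.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch_upstream  # hypothetical scraper for one record
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, doc_id: str) -> str:
        now = self._clock()
        cached = self._entries.get(doc_id)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # fresh enough: no upstream request
        page = self._fetch(doc_id)  # miss or stale: exactly one upstream request
        self._entries[doc_id] = (now, page)
        return page
```

The injectable `clock` is only there so expiry can be exercised without waiting; a day-long default TTL is an arbitrary choice, not a recommendation.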
You'd just need a strong mind to handle the inanities of talking to this system. :P
¹ But with the caching thing, you'd absolutely have to have a disclaimer stating that the services don't replace the USPTO website, yada yada (along with some wording buried in a policy document that carefully points out that the data is cached). I mean you'd need that anyway, but yeah.
[1] https://www.uspto.gov/blog/ebiz/search?q=
Jesus Christ.
Probably the only thing keeping this from being abused is that it's the government, it's a low-value target, and they're paying millions upon millions for someone to support this trainwreck with security patches.
The computing power needed for SSL/TLS is minimal, especially if you use newer algorithms like ECDSA, and it should not be a concern for an organization like the Patent Office.
I don't want to speculate about what's going on at the managerial/administrative level, but I notice the current administration is committed to the goal of slashing most government spending by some huge amount while simultaneously cutting taxes. It may be that the head of the USPTO got a phone call telling them not to spend a single damn penny. Now, iirc the USPTO is actually self-financing on patent application fees, but I don't think they're so independent that they can just ignore directives from higher up in the executive branch.
/s
Congress has often tried to undermine the ability of the EPA, IRS, NIH, NOAA... to do their job which then makes it seem they are ripe for disruption.
Are we talking about the same government??
They'd have that data already, so could just share it directly.
I'm just playing Devil's advocate here.
No, a third-party attacker can just look at size/timing of packets to figure out which page is being viewed, especially given it's among a limited and static corpus.
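A minimal sketch of the size-matching step, assuming the attacker has precomputed the size of every page in the public corpus and that encryption adds a small, roughly constant overhead. The page names, sizes and the 0 to 64 byte overhead window are made-up assumptions; real attacks also have to cope with compression, padding and multiple resources per page.

```python
from typing import Dict, List, Tuple


def candidate_pages(observed_bytes: int, page_sizes: Dict[str, int],
                    overhead: Tuple[int, int] = (0, 64)) -> List[str]:
    """Pages whose plaintext size plus the assumed overhead matches the observed size."""
    low, high = overhead
    return [page for page, size in page_sizes.items()
            if low <= observed_bytes - size <= high]
```

With sizes `{"a": 1000, "b": 5000, "c": 5030}`, an observed 1020-byte response points only to `a`, while 5040 bytes leaves `b` and `c` as candidates; in a static corpus most pages will have distinctive sizes.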
If third party tracking (for malicious intent or otherwise) is the main reason behind the change, why not do it how everyone else does?
It stands to reason they just don't want to deal with SSL termination anymore, for whatever reason. Though, at least in my eyes, that's a solved problem too.
It's also possible that their configuration was causing them performance problems and decreasing overhead by killing HTTPS for "unnecessary" endpoints was seen as a potential solution. Requesting a public record about a patent is not something that, at first glance, seems like it should need to be transferred over a secure protocol.
Of course, none of these are really good reasons to disable HTTPS, but they're some potential explanations.
-----
Separately, I think some people who remember HTTPS being used to secure "true secret" pages kind of resent the "HTTPS must be used anywhere and everywhere" trend that has taken hold. It's not that there aren't good reasons to do that, but it's also silly to pretend there aren't side effects of doing it.
From some perspectives, encrypting all communication can be seen as an external concern, something for a VPN tunnel to handle. End-to-end crypto is good because it, theoretically, prevents interception by anyone who can get between the server and the VPN, but it needs to be more transparent before everyone is willing to consider that a worthwhile/important tradeoff.
One side effect of HTTPS everywhere is that the site can no longer really designate some portion of traffic as "secret". If every admin in your org needs to be able to decrypt all HTTPS traffic to debug issues, you're giving some access away. Maybe some of them would've been able to get to that data anyway, but probably many of them would not.
Again, this is not to say that HTTPS shouldn't be used, but just some musings into why someone would not necessarily be enthusiastic about it. Working to integrate HTTPS more transparently for admins, and working toward the ability to mark specific information for extra "app-layer secrecy" instead of just relying on transport-layer secrecy, seem like they'd be good steps.
I know you were only trying to come up with some kind of reason, but there just isn't a valid one.
HTTPS everywhere reduces the number of teams that, in the old "HTTP-only" world, would serendipitously pitch in to help troubleshoot tickets. Now, instead of anybody on the network who can sniff HTTP packets, only one or two groups are able to troubleshoot.
In your example, terminating SSL at the LB or adding a proxy in front of the app would be either an annoyance or a major project, respectively. Small firms wouldn't think twice and would jump into action, but large organizations have too much internal inertia.
I see your point too, but the USPTO probably: a) is underfunded; and b) exhibits all the average capabilities and organizational "effectiveness" of a large bureaucracy.
Perhaps a better question is whether the USPTO would object to having their site content mirrored by a 3rd party better capable of offering features that users are complaining about (HTTPS & better search). Google has their own version[1].
[1] https://patents.google.com/
What would others expect from such a service? The USPTO does have a web team, yes? That site has been the same for over a decade AFAIR; what have they been doing?
So, even though you could setup a working system top to bottom from scratch, depending on which team you're attached to you'd have to design, explain/argue and work with other teams and their overbearing workloads and attendant baggage.
An experienced team with full ownership of load balancers, firewalls, hosts, security and applications could conceivably do this in their sleep inside of a week -- but few large organizations split their responsibilities in this manner.
I don't know anything about USPTO and am only making sweeping generalizations, but my experience with government and large network operators seems to generalize well (so far).
This is why I hope that third parties can somehow export USPTO data and make it available. Not sure if this is a potential target for the Archive Team[1] or archive.org, although their missions aren't cleanly aligned with this particular need.
[1] http://archiveteam.org/index.php?title=Main_Page
https://securityheaders.io/?q=www.uspto.gov&followRedirects=...
HSTS is 1 year at the time this comment is posted. They're in for some pain.
They are basically going to be DoSing a huge segment of users whose browsers have previously received that header... they're also violating the OMB mandate that requires TLS for all government sites... it is a rather strange move, and I can't imagine there is a good reason for it.
EDIT
If it is for portal.uspto.gov then HSTS is a non-issue but still a very bad move.
Is it? I still see HSTS from portal.uspto.gov with the same age.
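Whether the header matters for a given host comes down to two things: how long a browser remembers it (`max-age`, in seconds) and whether it extends to subdomains (`includeSubDomains`). A minimal sketch follows; the header value is an assumption matching the "1 year" figure above, not copied from USPTO, and preload lists are ignored.

```python
def parse_hsts(header: str) -> dict:
    """Parse a Strict-Transport-Security value into max-age and includeSubDomains."""
    policy = {"max_age": None, "include_subdomains": False}
    for directive in header.split(";"):
        name, _, value = directive.strip().partition("=")
        name = name.strip().lower()
        if name == "max-age":
            policy["max_age"] = int(value.strip().strip('"'))  # value may be quoted
        elif name == "includesubdomains":
            policy["include_subdomains"] = True
    return policy


def covers(policy_host: str, policy: dict, target_host: str) -> bool:
    """Does a policy received from policy_host force HTTPS on target_host?"""
    policy_host, target_host = policy_host.lower(), target_host.lower()
    if target_host == policy_host:
        return True
    return policy["include_subdomains"] and target_host.endswith("." + policy_host)
```

`max-age=31536000` works out to 31536000 / 86400 = 365 days. A policy learned from `www.uspto.gov` never covers `portal.uspto.gov` (a sibling, not a subdomain), so what matters for Public PAIR is the header served by `portal.uspto.gov` itself, or one from `uspto.gov` with `includeSubDomains`.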
doesn't seem to show anything for the domain
Has this been superseded by a new policy?
[1] https://cio.gov/resources/it-policy-library/
>Immediately after the maintenance, users will only be able to access Public PAIR through URLs beginning with HTTP, such as http://portal.uspto.gov/pair/PublicPair. Past URLs using HTTPS to access Public Pair, such as ...
A URL beginning with HTTPS ALSO begins with HTTP
HTTP/1.0 302 Found
Location: http://portal.uspto.gov/pair/PublicPair
Server: BigIP
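That response is a redirect from HTTPS down to plain HTTP (assuming the request went to the `https://` URL, as the announcement implies). A minimal sketch of a check a monitoring script might apply to any redirect, using only the standard `urllib.parse` helpers; `Location` may be relative, so it is resolved against the request URL first.

```python
from urllib.parse import urljoin, urlparse


def is_https_downgrade(request_url: str, status: int, location: str) -> bool:
    """True if a redirect sends the client from an https:// URL to a plain http:// one."""
    if not 300 <= status < 400:
        return False
    target = urljoin(request_url, location)  # resolve a relative Location header
    return urlparse(request_url).scheme == "https" and urlparse(target).scheme == "http"
```

For a request to `https://portal.uspto.gov/pair/PublicPair` answered with `302` and `Location: http://portal.uspto.gov/pair/PublicPair`, this returns `True`; a relative `Location: /pair/PublicPair` would stay on HTTPS and return `False`.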
Not unless you think "extract a lot of money from clueless execs" is secret sauce. That, and support contracts - you know, throats to choke when it all goes wrong.
Also, F5 load balancers have real hardware failover capability, and can even synchronize TCP session state across instances. That's a pretty nice feature.
Plain old TLS termination isn't license-limited as far as I know.
EDIT: to answer your question, yes, you could probably do it with HAProxy, but the added value in these appliances is iRules (Tcl hooks for all network events, so you could augment request processing, etc.) and vendor support.
I seem to remember even the entry license includes full TLS offloading so I doubt the poster above is correct that it is a cost issue.
As to if HAProxy can do the job, well, that depends. F5s are complex beasts and they can load balance application specific protocols that can be hard to find elsewhere, with the support contract that goes with it.
Or that the browser vendors that trust the CA that signed the certificate aren't just puppets for our lizard overlords.
Edit: the PDF attachment URLs are very predictable: they're the number of the patent in a weird order plus a page number, e.g.: http://pdfpiw.uspto.gov/10/292/096/2.pdf
Back-of-the-envelope calculations say all the PDFs should only take 1 to 6 TB (assuming roughly 10 million patents at 100 KB to 600 KB of PDF each, on average). Seriously, why hasn't anyone mirrored this?
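Both observations fit in a few lines of Python. The URL rule is an assumption inferred from the single example URL, not a documented scheme: the patent number zero-padded to eight digits and split from the end (last two digits, then the middle three, then the first three), which would make the example patent 9,629,210. The roughly 10 million patent count and 1 KB = 1,000 bytes are likewise assumptions.

```python
def pdf_page_url(patent_number: int, page: int) -> str:
    """Assumed pdfpiw layout: 8-digit number split as last2/middle3/first3."""
    digits = f"{patent_number:08d}"
    return f"http://pdfpiw.uspto.gov/{digits[6:]}/{digits[3:6]}/{digits[:3]}/{page}.pdf"


def estimate_tb(doc_count: int, avg_kb: int) -> float:
    """Total storage in TB for doc_count documents of avg_kb kilobytes each."""
    return doc_count * avg_kb * 1_000 / 1e12


PATENT_COUNT = 10_000_000  # assumption: roughly the number of US patents issued by 2017
for avg_kb in (100, 600):
    print(f"{avg_kb} KB average -> {estimate_tb(PATENT_COUNT, avg_kb):.0f} TB")
```

The loop prints 1 TB for the 100 KB case and 6 TB for the 600 KB case, matching the estimate above.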
[0] https://www.uspto.gov/learning-and-resources/bulk-data-produ...
[1] http://patents.reedtech.com/Public-PAIR.php
Copyright generally doesn't stop people from copying what they want to copy. Do only BigCo's have a use for patents?
I wonder if 18F can help at all; it seems agencies must contact 18F themselves.
1: https://www.digitalgov.gov/2017/04/12/dotgov-domain-registra...
Horse carriages: 1400
Automobiles: 1890
I'm claiming the technological backstepping is not nearly as drastic as the GP implies.
That's around the time carriages got suspension and were pretty much like the modern form.
I've been involved in writing a few and the result was useless drivel. Combined with the fact that there are penalties for willful infringement, I'm not sure what the benefit to reading patents in your field would be.
This however seems to be everything:
https://pairbulkdata.uspto.gov/
Looks like people tend to be satisfied with Google Patents as well.
* except uspto.gov
Are there any other .gov sites doing this? Can anyone shed more light on this?
Fortunately, there are quite a few workarounds and opportunities that minimize all the associated risks ...
And it's legal to publish a .GOV site using Drupal?
The US Federal Government is essentially the arbiter of all regulations and minimum standards within the United States. It's just surprising to see an open-source framework running on Apache/Coyote expected to run over HTTP.
It indicates what department of the USPTO manages that page of the website. All pages have a "page-owner" tag in the footer.
https://obamawhitehouse.archives.gov/blog/2015/06/08/https-e...
Does anyone have any experience with their support system? On Monday I will be calling them in an attempt to understand why they are exempt from the HTTPS Everywhere federal directive.
But to answer your other question, as part of the Department of Commerce, a "CFO Act" agency, USPTO would not be exempt.
Don't do this.
"Don't do this" is probably not very convincing.
Don't be the person that gets someone fired, arrested, jailed, and in lifetime legal trouble because some asshole thought it would be hilarious to display porn on their computer at work.
(Yes, I'm equating "meatspinning" to malware. Argue if you like about that, but the main point stands: If you meatspin someone, you're taking a chance of ruining their entire life. Don't be That Guy.)
No it doesn't.
> Yes, I'm equating "meatspinning" to malware
Don't be That Guy.
Not to defend "meatspinning" (first time I've heard the term), but don't you think the real problem here is that people can be fired/jailed/etc. without due process? This was just some drive-by malware; imagine what you could do to someone's life if you intentionally targeted them.
If he was fired for meatspin or lemonparty, he would publicly accuse them of homophobia on twitter and things would be different ;)
*https://thestack.com/security/2017/04/12/netflix-found-to-le...
So it seems that the maintenance will turn off HTTPS, not that HTTPS is merely unavailable during the maintenance.
> Immediately after the maintenance, users will only be able to access Public PAIR through URLs beginning with HTTP, such as http://portal.uspto.gov/pair/PublicPair. Past URLs using HTTPS to access Public Pair, such as https://portal.uspto.gov/pair/PublicPair, will no longer work.
> Immediately after the maintenance, users will only be able to access Public PAIR through URLs beginning with HTTP,