
Lon.gs: A URL shortener in C - efnor
http://lon.gs/
======
colemickens
Is _anyone_ suffering from lack of URL shorteners? Or does anyone really care
what their URL shortener is implemented in? Is there even any way that the
application code contributes to the request time more meaningfully than the
persistence mechanism used?

Cool hacks are cool I guess but from the sounds of it, the C code is a bit
scary. If you had to build this, why not build it in Rust? At least it
wouldn't be so terrifying from a security standpoint and you'd still get
whatever performance is supposedly needed.

~~~
q3k
This code is more than a bit scary. This code is of very poor quality, is very
poorly documented and has numerous (I'm pretty confident that they're
exploitable, too) bugs.

It really isn't a project that I'd recommend to anyone, unless I wanted a
shell on their machine. I'm quite distraught that it is getting this many
(seemingly blind) upvotes.

~~~
ryanlm
Can you define shell? Do you mean you can get a bash shell from an exploit in
this code?

~~~
msbarnett
Yes. If there's a working buffer overflow anywhere in the code (and there appear
to be several here), you can turn that into a call out to a shell, and hand it
some script code to perform.

~~~
ryanlm
Wouldn't the process just die? It would say segmentation fault or something
and return to the caller I would assume. How would you get a bash shell?

~~~
msbarnett
> Wouldn't the process just die? It would say segmentation fault or something
> and return to the caller I would assume.

A segfault only happens when you try to access a virtual memory address not
mapped to your process, or try to write to a page mapped to you but not mapped
as writable.

Remember, this is C. There's nothing to stop you from writing past the end of
some memory like an array as long as whatever memory you're writing into still
belongs to you. If you write _far enough_ , you'll eventually walk out of your
mapped memory pages and trigger a segfault, but if you don't stray toooo far,
you can overwrite some important data and absolutely nothing will complain.
When people say C isn't a safe language, they mean it. C will let you get away
with murder.

And it turns out that there's some very important data you can overwrite this
way.

> How would you get a bash shell?

Let's say there's a function foo() that writes some data to a buffer on foo's
stack, and I can control what data it writes (because it's data from a text
box on a webpage, let's say). And foo() is buggy and doesn't validate that in
all cases the data I control fits in the bounds of the buffer it writes to.

I can then overflow the buffer with my data, and take advantage of that to
overwrite the return address of foo(), because the return address for the
function happens to exist past the end of all the stack local memory for the
function. When foo() returns, it will jump to the address I wrote in there,
instead of where it was supposed to go back to. And as long as that address is
in the process's mapped memory pages, again, nothing will complain.

To get a shell, as part of the overflow I either insert the binary data
corresponding to the x86 instructions for something like a call to
execve("/bin/sh", ...) and then have foo()'s return jump to the beginning of
my instructions, or I cause the return to jump to some other code or library
that will do that for me that happens to already be in place (there are more
sophisticated versions of this exploit called Return Oriented Programming).

If you want to read up on this:
[https://www.win.tue.nl/~aeb/linux/hh/hh-10.html](https://www.win.tue.nl/~aeb/linux/hh/hh-10.html)
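
For illustration, here is a minimal sketch of the difference between the unchecked write described above and a bounded one. The names `copy_field` and `FIELD_MAX` are hypothetical and are not taken from the longs source; the unsafe pattern appears only in a comment so the block is safe to compile and run.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FIELD_MAX 16  /* hypothetical buffer size, not taken from longs */

/* The unsafe pattern would be:
 *     char buf[FIELD_MAX];
 *     strcpy(buf, input);
 * strcpy() keeps writing until it reaches the NUL in `input`, so a long
 * input runs past buf and into whatever follows it on the stack. */

/* Bounded alternative: copy only if input (plus its NUL) fits in dst.
 * Returns 0 on success, -1 if the input is too long (dst is left empty). */
int copy_field(char *dst, size_t dst_size, const char *input)
{
    assert(dst != NULL && input != NULL && dst_size > 0);

    size_t len = strlen(input);
    if (len >= dst_size) {
        dst[0] = '\0';
        return -1;
    }
    memcpy(dst, input, len + 1);  /* len + 1 also copies the terminator */
    return 0;
}
```

The point of the sketch is only that the length check happens before any byte is written; a caller that ignores the -1 return still has a bug, just not a memory-corrupting one.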

------
pslam
Looking at the coding patterns used in the C source, I am utterly horrified
this is running live on a public facing website.

I can see at least one buffer overrun dependent on database contents, and I
wouldn't be surprised if there's public-facing vulnerabilities in this thing,
but I don't want to spend another 5 minutes looking.

~~~
efnor
Judging by the code, everything that goes into the database is Base64
encoded. Where do you see a buffer overflow?

~~~
willvarfar
Long URLs make it crash. Really. Just try and put in a very long URL...

[https://github.com/riolet/longs/blob/master/longs.c#L238](https://github.com/riolet/longs/blob/master/longs.c#L238)
looks like an example of a buffer overflow.

Etc.

~~~
snerbles
SIZE is defined as 2048 bytes (line 59). Firefox can do GET requests with 8KiB
URLs.

~~~
willvarfar
The URL is base64-encoded before they try and fit it in a 2K buffer, so the
longest URL it can handle is actually shorter than 2K.
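
To put a rough number on that, here is a small sketch of the arithmetic. It assumes standard padded Base64 (4 output characters per 3 input bytes) and that the buffer also has to hold a NUL terminator; the longs code may differ on either point, and the function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Length of padded Base64 output for n input bytes:
 * every group of up to 3 input bytes becomes 4 output characters. */
size_t base64_encoded_len(size_t n)
{
    return 4 * ((n + 2) / 3);
}

/* Largest input whose padded Base64 form, plus a NUL terminator,
 * fits in a buffer of buf_size bytes. */
size_t base64_max_input(size_t buf_size)
{
    assert(buf_size > 0);
    return ((buf_size - 1) / 4) * 3;
}
```

Under those assumptions, `base64_max_input(2048)` comes out to 1533 bytes, so anything longer than roughly 1.5K would not fit once encoded.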

------
didip
I respect the desire to work in a low-level language like C. Programmers should
not be afraid of it.

But that said, building a proper HTTP stack is not trivial.

If you want to use C, then why not create an Nginx module?

Nginx already solved the hard problems:

* HTTP parser

* Distribute work via event loop on multiple workers

* Useful load balancing strategies (not as great as HAProxy, but I am satisfied with it)

* Serious effort in dealing with CVEs

* Widely used and battle tested

Here's a fine guide on how to write an Nginx module in C:
[http://www.evanmiller.org/nginx-modules-guide.html](http://www.evanmiller.org/nginx-modules-guide.html)

~~~
AdamJacobMuller
[https://kore.io/](https://kore.io/) is also a (very?) good C framework for
this stuff. I don't know enough C to honestly evaluate that though.

~~~
didip
Thanks for sharing, I haven't heard about this framework.

------
tptacek
The C code in the "framework" this thing uses is pretty scary; grep for
MAX_BUFFER_SIZE, malloc, strcpy, &c. The header parsing in particular.

~~~
q3k
Agreed.

    
    
        q3k@nihilism ~/Projects/longs $ python2 -c "print 'GET / HTTP/1.1.\r\n' + 'a'*2000 + '\r\n\r\n'" | nc 127.0.0.1 1337
    
        q3k@nihilism ~/Projects/longs $ PORT=1337 ./longs
        *** Error in `./longs': free(): invalid next size (normal): 0x000000000155a5f0 ***
        ...
    

Sounds like a fun heap exploitation challenge. Almost CTF-like.

EDIT: It also throws a whole bunch of warnings when compiling. [moved this here
from the first line after child post mention]

~~~
ryanlm
That's not a warning. That's a runtime error.

~~~
q3k
I know, this is some internal glibc allocation code failing because that
request overflowed a buffer on the heap. The bug is, as tptacek mentioned, in
the header receiving function [1].

The compilation warnings were an additional remark; I should've phrased that
post better.

[1] -
[https://github.com/riolet/longs/blob/master/wafer.c#L334-337](https://github.com/riolet/longs/blob/master/wafer.c#L334-337)

~~~
tptacek
Not the only case of this, either.

------
alternize
Hm. I tried to shorten "http://lon.gs" and then shorten the
result again, and now it redirects indefinitely...
[http://lon.gs/ack](http://lon.gs/ack)
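
One way a shortener could guard against this kind of loop is to refuse targets that point back at its own host. The sketch below is only an illustration under that assumption: `url_host_is` is a hypothetical helper, not something from the longs or WAFer source, and it deliberately ignores cases such as `user@host` userinfo or a second domain that redirects back to the service.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if the host part of `url` equals `host` (case-insensitive).
 * A leading "scheme://" is skipped if present. Hypothetical helper. */
bool url_host_is(const char *url, const char *host)
{
    assert(url != NULL && host != NULL);

    /* Only treat "://" as a scheme separator if it comes before any
     * path, query, or fragment character. */
    size_t first_delim = strcspn(url, "/?#");
    const char *sep = strstr(url, "://");
    const char *p = url;
    if (sep != NULL && (size_t)(sep - url) < first_delim)
        p = sep + 3;

    /* The host ends at a port, path, query, fragment, or end of string. */
    size_t n = strcspn(p, "/:?#");
    if (n != strlen(host))
        return false;

    for (size_t i = 0; i < n; i++) {
        if (tolower((unsigned char)p[i]) != tolower((unsigned char)host[i]))
            return false;
    }
    return true;
}
```

A submit handler could call `url_host_is(target, "lon.gs")` and reject the request when it returns true, which would stop a short link from being shortened again.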

------
kornish
Seems a bit buggy. I shortened "http://hello.com", it
returned "lon.gs/akm", and I visited the shortened URL only to be redirected
to
[https://codeandoando.com/integracion-continua-con-drone/](https://codeandoando.com/integracion-continua-con-drone/).

------
BrainInAJar
Looks like the perfect way to turn URLs into root shells on the hosting
server

------
ryan-c
From
[https://www.reddit.com/r/C_Programming/comments/4p5ung/longs...](https://www.reddit.com/r/C_Programming/comments/4p5ung/longs_we_wrote_a_url_shortener_in_c_with_wafer/d4iihi7)

    
    
        $ curl -i http://lon.gs/abf
        HTTP/1.1 301 Moved Permanently
        Location: foo
        X-Evil-Header: evilvalue
    

There were also examples of XSS and data URIs.

(they claim to have fixed this elsewhere in the thread, but I guess some of
the "evil" URLs still work)
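
The response above looks like what you get when a stored URL containing a CR/LF pair is written straight into the `Location` header, although the thread doesn't show the exact input. A minimal sketch of a check that would reject such values before they reach a header line follows; `header_value_is_safe` is a hypothetical name, not part of WAFer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if `value` can be placed on a single header line:
 * it contains no CR, LF, or other ASCII control characters. */
bool header_value_is_safe(const char *value)
{
    assert(value != NULL);

    for (const unsigned char *p = (const unsigned char *)value; *p; p++) {
        if (*p < 0x20 || *p == 0x7f)
            return false;
    }
    return true;
}
```

This only addresses header splitting. The XSS and data-URI cases mentioned above would need a separate check on the URL scheme.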

------
zoom6628
Some comments about the comments and the code:

1. The URL shortener is just a demonstration of the framework being used. For a
POC/MVP I would not expect it to be the most reliable code on the planet. I'm
OK with that.

2. The WAFer framework is a great idea for bringing a very lite server and
minimalist framework to certain devices. This has applications in SBC and IoT
devices where all resources are at a premium.

3. Yes, the C code might not be perfect, but remove every GitHub project that
isn't perfect and secure and you won't have many left. The principal objective
is to 'get it out there' and let people like the other commenters give their
input/opinion from which to make it better.

4. The concept and this lite implementation of the idea are hugely useful in
certain use cases, just like nginx is great for general-purpose servers. This
is almost a C implementation of Bottle for Python.

5. nodesocket and chadscira seem to have gotten the point of this and given
useful feedback to the authors (I'm not one of them, in case you were thinking
that).

So to sum it up: a great post, and excellent work on a tool that has a lot of
potential for specific scenarios. If deployed behind something like
CloudFlare's edge and given a bit of productizing (my day job is as a Product
Manager for a global ERP business), this could be a hit (pun intended :-J
).

~~~
tptacek
The vulnerabilities found so far are in the framework, not in the application
code. Those vulnerabilities are grave.

------
mwcampbell
I'm surprised there's still any interest in standalone URL shorteners. Didn't
Twitter make them obsolete when it implemented its own?

~~~
rsync
We[1] just launched a new shortener and it supports URLs.

That's not the primary use-case for it[2][3], but we're happy to support
people shortening URLs if they want to. The reason there's room for this is
that our URL shortening function has no ads, no third party tracking, no bloat
... no dark patterns.

That's worth something to some people.

[1] Oh By, Inc.

[2] [https://0x.co/examples.html](https://0x.co/examples.html)

[3] [https://0x.co/hnfaq.html](https://0x.co/hnfaq.html)

~~~
hk__2
> The reason there's room for this is that our URL shortening function has no
> ads, no third party tracking, no bloat ... no dark patterns.

But it’s still a closed, proprietary database and there’s no way to decode an
“Oh By Code” if the service goes down, which is the disadvantage number one
you have with URL shorteners.

> The real utility are the easily recognizable codes, prefixed with "0x".

Why not use a prefix that helps differentiate your codes from hexadecimal
numbers? How do you plan to get people to know your service if it’s not
recognizable? Let’s say I put one of those codes on my business card, I still
have to write somewhere that people should use 0x.co to access the content,
which ruins the advantage of having just one code rather than a URL to my
website (literally anyone knows how to open a URL in a browser).

~~~
rsync
"But it’s still a closed, proprietary database and there’s no way to decode an
“Oh By Code” if the service goes down, which is the disadvantage number one
you have with URL shorteners."

Yes, that's correct. In this case, "Oh By" will continue running indefinitely.
This is true for two reasons:

1) Building an extremely lightweight, ad free interface has a happy byproduct
of ... being extremely lightweight. The infrastructure requirements for even a
wildly popular "Oh By" are trivial.

2) Because Oh By is self-funded, there is no pressure from outside parties to
_break_ the pattern of an ad free, tracking free service with zero bloat.

If you're having trouble swallowing this, remember that rsync.net has been
running continuously since 2001[1]. I think that's a credible track record[2].

[1] As a feature of JohnCompanies, the first VPS provider, and then as a
standalone corporation in 2006.

[2] In fact, the rsync.net warrant canary turns 10 years old this year.

~~~
paulcole
Past performance is no indicator of future success.

There's a 0% chance this will be running "indefinitely." You're just skirting
the issue by making a promise you can't keep.

~~~
rsync
The "Oh By" shortener will be running, and codes you create today will be
functioning, twenty years from now - in 2036.

------
chadscira
CloudFlare is a perfect companion for something like this because you can hard
cache the redirects on their edges. I have a few services that run off tiny
boxes, and just leverage CloudFlare free edge caching.

~~~
ianlevesque
And handle an HN influx better than many Wordpress sites.

------
nxzero
Shortened a URL, but service doesn't appear to redirect:
[http://lon.gs/akt](http://lon.gs/akt)

EDIT: Very strange, the redirect now goes to a URL I didn't enter...
([http://www.sadfasdfasfdasdfsadfasd.com](http://www.sadfasdfasfdasdfsadfasd.com))

------
nodesocket
Seems great and performant, but not productized. Seems like things are
hard-coded in the C code. For example...

What are the API endpoints? Docs? Can you view a list of all shortened URLs?
Can you delete shortened URLs?

Can you change the base domain of lon.gs?

------
andrew3726
Website seems down (connection refused). Did you use any network library
(aside from native, I mean)?

~~~
tcdent
Is the website actually the URL shortener, or some other stack? I think it's
highly unlikely HN just C10K'd them.

~~~
fideloper
t2.nano has CPU credits, they may have gotten spent and we're now seeing the
results of throttling (limit on % cpu allowed to use)

------
616c
Perhaps I am the only one, but is anyone interested in the opposite direction:
a CLI app, along the lines of a static site generator, to do short linking?

YOURLS was popular for a while, and I tried it, but I was concerned about
running a not very popular PHP app even on shared hosting. At least Wordpress
gets decent attention. I was worried about people compromising my own YOURLS
instance against me.

[http://www.cvedetails.com/vulnerability-list.php?vendor_id=1...](http://www.cvedetails.com/vulnerability-list.php?vendor_id=11533&product_id=21232&version_id=0&page=1&hasexp=0&opdos=0&opec=0&opov=0&opcsrf=0&opgpriv=0&opsqli=0&opxss=0&opdirt=0&opmemc=0&ophttprs=0&opbyp=0&opfileinc=0&opginf=0&cvssscoremin=0&cvssscoremax=0&year=0&month=0&cweid=0&order=1&trc=2&sha=59b76e93ee62fd17aa2253076d9777cb4f7d57f6)

~~~
JonathonW
There's this, which is a script that manipulates Apache .htaccess to do short
linking:
[http://lucasgonze.github.io/shurl/](http://lucasgonze.github.io/shurl/)

No idea about equivalents for nginx, or something that could run on Github
Pages (these would probably be the same, given that nginx doesn't have an
equivalent to .htaccess out of the box).

~~~
616c
I am going to check out shurl, very interesting. It does sound familiar but I
cannot remember why. Sometimes I see these things and forget to bookmark them
later.

------
vmarsy
The URL creation and the browsing of a short URL definitely work fast. I
guess this being on the front page is a cheap way to verify if the C10k claims
are true!

------
ryanlm
I consider the fact that it is written in C a feature.

~~~
josteink
And now, reload this entire thread and check all the horrible bugs and
security vulnerabilities this so called "feature" got you.

This is why you should never use C in networked code or when working with
third-party data unless you absolutely bloody well have to: there are just too
many ways to fuck up, and most programmers will.

~~~
ryanlm
You're forgetting something about networked code. At some layer of the stack
it has to be written in C or assembly.

~~~
nomel
No he didn't forget. The lower layers, yeah it probably will be in C (but it
doesn't HAVE to be). But in the application layer, it probably shouldn't be.

If things like memory management are not _the_ first priority when writing
every line of code, then you shouldn't be using C, and that's really the only
reason you should be using C, when there's a specific need for memory
management.

Not surprisingly, this is exactly why the world has moved beyond C for the
application layer, it's painful to have to think about that stuff constantly,
so people just don't (this being an example).

------
nine_k
Let's look at this project as a cautionary tale.

------
logicallee
This is broken.

Here's the short url of
[http://news.ycombinator.com](http://news.ycombinator.com) \--> lon.gs/amk

Going to lon.gs/amk redirects me to an overstock.com address for a specific
product.

The site doesn't appear to have been hacked, at least there's no affiliate
link in the URL I was sent to. It just appears the site is broken.

~~~
longs
Thank you for raising this bug. We have fixed it.

------
nxzero
Oddly, was hoping this was a way to turn URLs into long URLs; 2,083 characters
if you want to support all the web clients.

~~~
nomel
There are many, like [http://longurlmaker.com](http://longurlmaker.com). I
find [http://shadyurl.com](http://shadyurl.com) to be more useful in getting
people to not click a link though.

------
theseoafs
If there are any aspiring crackers looking to bring down their first site,
this one should be easier than the average

------
ryanf323
This does not handle Microsoft Office URL pre-fetching...and is written in
C...

------
knicholes
Wasn't there something posted recently about how URLs that are too short are
too easily guessed, and that it's good to make sure these short URLs are
longer?

------
cmdrfred
It seems pretty responsive.

------
efficax
Why would you write something like this in C anymore? If you really need it to
have a tiny memory footprint, write it in Rust. If you need it to be fast, but
can handle a runtime taking care of memory management, write it in Go. C is
just asking for trouble anymore.

------
ratsimihah
this made my URL longer :/ lon.gs/ae9

