

Show HN: A simple url shortener written in Node - dtreacy
http://www.github.com/danieltreacy/naurls
Was bored at work so I threw this together. First bash at a Node app. Built a simple Chrome extension for it too.
======
darklajid
I realize criticism is cheap, and I like seeing small projects like this to
explore other people's minds, but:

What's the reason for the valid_url_pattern in shortener.js?

/https?:\/\/([-\w\\.]+)+(:\d+)?(\/([\w/_\\.]*(\?\S+)?)?)?/

Why? This seems to be broken by design (I can construct invalid urls that
match, and find valid urls that don't). What's the idea behind this
expression, and why is it necessary?

And looking at the code, it seems to just prepend 'http://' to the url if it
doesn't match this expression?
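The claim is easy to demonstrate: because the pattern is unanchored, `.test()`
succeeds whenever the input merely _contains_ something http-like (a quick
sketch, pasting the pattern from shortener.js):

```javascript
// The valid_url_pattern posted above, copied verbatim.
// Unanchored, so .test() matches a "http://..." substring anywhere.
var valid_url_pattern = /https?:\/\/([-\w\\.]+)+(:\d+)?(\/([\w/_\\.]*(\?\S+)?)?)?/;

console.log(valid_url_pattern.test('javascript:alert(1)//http://x.com')); // true
console.log(valid_url_pattern.test('not a url at all http://a'));         // true
console.log(valid_url_pattern.test('ftp://example.com'));                 // false
```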

~~~
dtreacy
If you're able to input invalid URLs I guess I didn't test the regex
thoroughly enough ;) I'll look into it. Prepending "http" is just a lazy hack
to handle the case where someone enters "google.com" instead of
"http://www.google.com", which would fail the regex but is a valid URL. This
would become unnecessary once the regex validation is improved. Keep in mind I
threw this together in an afternoon... But thanks for the critique ;)
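The hack described presumably amounts to something like this (a sketch, not
the actual shortener.js code; the `normalize` name is made up):

```javascript
// Made-up sketch of the "prepend http" fallback described above:
// if the input has no http(s) scheme, assume http://
function normalize(u) {
  return /^https?:\/\//.test(u) ? u : 'http://' + u;
}

console.log(normalize('google.com'));        // 'http://google.com'
console.log(normalize('http://google.com')); // left unchanged
```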

~~~
inakiabt
Maybe adding a simple "^" at the beginning of the regex would be enough :)

    
    
         /^https?:\/\/([-\w\.]+)+(:\d+)?(\/([\w/_\.]*(\?\S+)?)?)?/
    

Otherwise, it could be used for XSS. Like this url (<http://naurls.me/39696e>):

    
    
         javascript:<script>alert('HELLO world');</script>http://naurl.me/
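
With the `^` anchor added, that payload no longer passes (a quick check):

```javascript
// The anchored variant of the pattern: the scheme must come first.
var anchored = /^https?:\/\/([-\w\.]+)+(:\d+)?(\/([\w/_\.]*(\?\S+)?)?)?/;
var payload = "javascript:<script>alert('HELLO world');</script>http://naurl.me/";

console.log(anchored.test(payload));            // false: doesn't start with http(s)://
console.log(anchored.test('http://naurl.me/')); // true
```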

------
samarudge
Having a quick read through the code (so I might have missed something), it
looks like the URL is generated from a substring of a SHA hash. Doesn't this
mean that, at least in theory, URLs could be duplicated?

With a hex hash truncated to 6 chars, there are 16^6 = 16,777,216 possible
unique URLs (probably not an issue, unless you're bit.ly). But once 1,677,721
URLs have been generated, the next one has a 10% chance of colliding with an
existing URL; at 8,388,608 it's a 50% chance, and so on. (By the birthday
problem, the first duplicate is actually likely to show up far earlier than
that.) While those numbers are very high and probably not an issue, it's
worth considering that this probably wouldn't work on a large scale (though
I'm sure it wasn't designed to).
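Note the distinction between two probabilities here: the chance that the
_next_ URL generated collides with an existing one is n/N, but the chance that
_some_ pair among n URLs collides (the birthday problem) climbs much faster.
A quick sketch, assuming the 6 hex chars are uniformly random:

```javascript
// Birthday approximation: probability of at least one collision among
// n random IDs drawn from N = 16^6 possibilities: p ~= 1 - e^(-n(n-1)/2N)
var N = Math.pow(16, 6); // 16,777,216

function collisionProb(n) {
  return 1 - Math.exp(-n * (n - 1) / (2 * N));
}

console.log(collisionProb(4823));    // ~0.5: even odds of a duplicate
console.log(collisionProb(1677721)); // ~1.0: a duplicate is essentially certain
```

So with 6-char keys, the first duplicate is likely after only a few thousand
URLs, not millions.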

Not really a criticism, I still think this is a great example of how simple
Node is, just an observation

~~~
dtreacy
Good point. Yes, in theory they could be duplicated. I chose 6 characters
firstly to keep the url as _short_ as possible (I thought 5 chars was too
short, 7 too long).

I am planning on introducing a background process to clean up shortened URLs
that haven't been accessed in a given time frame (still deciding but I think
around 7 days). While it still wouldn't solve the problem you mentioned at a
certain scale, it would help. But true, I very much doubt it will ever reach
that scale.

Thanks for the critique

------
inakiabt
I'm new to Node.js, but shouldn't the "shorten" process be asynchronous?
Maybe something like this?:

    
    
       app.get('/shorten', function(req, res) {
         console.log('Shortening url...');
         shortener.shorten(req.param('u'), function(result) {
           res.send(result);
         });
       });
    

It's well known that Redis is pretty fast, but, citing "Node Web Development"
(http://www.amazon.com/Node-Web-Development-David-Herron/dp/184951514X) in its
"Architecture: Threads versus asynchronous event-driven" chapter: _"(...)
Depending on the query that pause can be quite long.
This is bad because while the entire thread is idling another request might
come in, and if all the threads are busy it will be dropped. Looks like quite
a waste. Context switching is not free either, the more threads we use the
more time the CPU spends in storing and restoring the state. Furthermore, the
execution stack for each thread takes up memory. Simply by using asynchronous,
event-driven I/O, Node removes most of this overhead while introducing very
little on its own."_

It is not a criticism but a question.

