

Google's library for dealing with URLs (C++) - coderdude
http://code.google.com/p/google-url/

======
A1kmm
I would be careful using this with data from untrusted sources unless you know
exactly what you are doing, especially if you don't enforce a hard limit on
the length of URL data from users.

e.g. within a few minutes of looking, I found this:

    
    
      void Append(const T* str, int str_len) {
        if (cur_len_ + str_len > buffer_len_) {
          if (!Grow(cur_len_ + str_len - buffer_len_))
            return;
        }
        for (int i = 0; i < str_len; i++)
          buffer_[cur_len_ + i] = str[i];
        cur_len_ += str_len;
      }
    

cur_len_, buffer_len_, and str_len are all signed ints (32 bits wide, so 31
value bits, on most modern 32- and 64-bit compilers). Say cur_len_ is 1000,
buffer_len_ is 1024, and I convince the program to call Append with a string
of length 2147483647 (2^31 - 1). Signed overflow is technically undefined
behavior, but in practice the sum wraps: cur_len_ + str_len = 2147483647 +
1000 = -2147482649, which is less than 1024. The bounds check therefore sees
nothing to grow, Grow is never called, and my data gets copied to the heap
past the end of the allocated buffer.

This particular case might just crash the program, but if I have more memory
and can control all parts of the URL, I could arrange to write only a few
bytes past the end, change the headers used by the allocator, and exploit the
system.
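
One way to close the hole in the bounds check quoted above is to rearrange the comparison so that no intermediate value can overflow. The following is only a sketch: CheckedAddLengths and FitsInBuffer are hypothetical helper names, not functions from google-url, and the sketch assumes lengths are plain non-negative ints as in the snippet.

```cpp
#include <limits>

// Hypothetical helper (not part of google-url): adds two non-negative
// lengths and reports failure instead of overflowing a signed int.
bool CheckedAddLengths(int a, int b, int* sum) {
  if (a < 0 || b < 0) return false;
  if (b > std::numeric_limits<int>::max() - a) return false;  // would overflow
  *sum = a + b;
  return true;
}

// Hypothetical predicate: true when str_len more elements fit in a buffer of
// capacity buffer_len that already holds cur_len elements. The comparison is
// rearranged as a subtraction so that no intermediate value can overflow.
bool FitsInBuffer(int cur_len, int str_len, int buffer_len) {
  if (cur_len < 0 || str_len < 0 || cur_len > buffer_len) return false;
  return str_len <= buffer_len - cur_len;
}
```

With the numbers from the example, FitsInBuffer(1000, 2147483647, 1024) returns false, so the caller would take the grow path, and CheckedAddLengths(1000, 2147483647, &sum) fails, so the caller can bail out before copying anything.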

It took me longer to write this comment than it did to find something dubious
in the codebase, so I'd strongly suggest auditing the library carefully before
using it.

------
benfrederickson
I use this library at work, and I'm happy with it, but a word of warning:
it's not trivial to integrate into your project. There is no makefile included
here, and you'll have to modify the logging.cc/logging.h code to even get this
to build under Linux.

------
coderdude
There are also some Python bindings for it:
<http://code.google.com/p/python-google-url/>

~~~
kierank
Python already has URL processing functions built in. Are the Google ones
better in any way?

~~~
coderdude
Python has urlparse, which parses a URL into its various components, but this
library also handles canonicalization, among other things. It handles URI
schemes that urlparse does not. Not to mention the speed difference between
the two libs (google-url can handle more URLs per second). I think it is also
able to distinguish between the domain name and the public suffix.

------
hendler
Really interesting that this project was last modified in 2007, but it's still
news. Still news because there's so much great, open stuff out there and even
with ... google ... it's still hard to discover.

If you know what you are looking for, Google is great, but I think this post
is a good example of why I rely so much on Hacker News.

------
zandorg
Does anyone know of a Perl-like strings library for C? Perl's string replacing
code is miles better than anything I can find for C.

For instance, the PCRE library provides Perl compatible regular expressions.
Why not the same for strings?

~~~
nuxi
Maybe not exactly Perl-like, but you may find the bstring library useful:
<http://bstring.sourceforge.net/>.

The documentation also lists alternative string handling libraries like c2lib,
wxString etc.

------
wslh
At last!

~~~
coderdude
At last, what? This project has been around since at least Aug 7th, 2008.

~~~
jacobbijani
At last he's found out about it, duh.

