

That UDID script I wrote - 350K hits... (w/source) - bbunix
http://blog.maclawran.ca/udid-it-350k-hits-on-a-little-server-in-less

======
acqq
Security-wise, the traffic to his server is plain HTTP, meaning any man-in-the-
middle can collect tuples of UDID, IP address and browser fingerprint, even if
the author himself collects nothing.

If you worry about UDIDs being tracked, I guess you should worry about the HTTP
traffic to his page too.

However, according to somebody who apparently tested it a year ago (and posted
it on HN), half of the iPhone apps pass UDID over the same http, allowing
collection of the very same or similar data.

Which makes the whole story of a government agency holding a UDID database
without other convenient data, ummm, rather less probable. IMHO most
"business entities" are able to collect much more than what was presented here.

------
TomGullen
The page size is just shy of 3 KB. At 350k hits, that's around 1 GB of data
transfer.

If he had 100 requests in one second (the average is about 8 per second over
12 hours), that's a 300 KB per second transfer rate, or roughly 2.4 Mbps.

1 GB of data transfer for a single day is relatively small. A server upload
rate of a few Mbps is also small.

If we make the page 100 KB in size, for example with a few nice images, 350k
hits turns into roughly 35 GB of data transfer. 100 requests in one second
would require a 10 MB per second (80 Mbps) transfer rate. A $15/month host
likely wouldn't be able to cope with this.

> I think the combination of Lighttpd+php5-fpm is underappreciated...

So for me the real reason the site stayed up was not because he picked PHP or
any other technology, but because the page was barebones and didn't put any
strain on the server at all. Page file size is an underappreciated advantage
for websites!
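
The arithmetic above can be checked with a short Python sketch. It assumes a page of exactly 3 KB (the comment says "just shy of" that), decimal units (1 KB = 1,000 bytes), and a burst of 100 requests in one second; the function names are made up for illustration. Note that it reports the burst rate in megabits per second, so 300 KB per second shows up as 2.4.

```python
# Back-of-the-envelope traffic estimate.
# Assumed units: 1 KB = 1,000 bytes, 1 GB = 1e9 bytes, 1 Mbit = 1e6 bits.


def total_transfer_gb(hits: int, page_kb: float) -> float:
    """Total data served for `hits` page loads, in gigabytes."""
    return hits * page_kb * 1_000 / 1e9


def burst_rate_mbit(requests_per_second: float, page_kb: float) -> float:
    """Bandwidth needed to serve a burst, in megabits per second."""
    return requests_per_second * page_kb * 1_000 * 8 / 1e6


def average_rps(hits: int, hours: float) -> float:
    """Average requests per second if hits were spread evenly."""
    return hits / (hours * 3600)


if __name__ == "__main__":
    print("average requests/s:", round(average_rps(350_000, 12), 1))
    for page_kb in (3, 100):
        print(
            f"{page_kb} KB page:",
            f"{total_transfer_gb(350_000, page_kb):.2f} GB total,",
            f"{burst_rate_mbit(100, page_kb):.1f} Mbit/s at 100 req/s",
        )
```

The point survives the unit conversion: the page weight multiplies straight through both the total transfer and the burst bandwidth.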

------
getsat
It's cool that he got so much press in such a short time, but applauding
technology choices that allowed him to sustain _8 requests per second_
(assuming this was over a 12 hour period) is ridiculous.

~~~
redslazer
I really doubt that these 350k hits were spread neatly over 12 hours. They most
likely came as one giant rush. The author would have to confirm.

Considering that lots of people's blogs die when they get 10k visitors from
HN, his small $15 VPS did fine.

~~~
nodesocket
The PHP script is super simple, so lighttpd (which is a great web server, in
the same performance realm as nginx) and php-fpm on a shared VPS provide plenty
of power and resources.

Never seen $found = `grep $udid FILE` using back-ticks in PHP land. Is that
the same as the exec() function?

~~~
thwarted
It's also grossly insecure unless you sanitize $udid, which thankfully the
author of this script does.

It's still relatively unsafe, as it invokes external programs through a shell,
so you're dependent on the shell's environment, which can diverge on different
systems. Last I checked, PHP did not have an easy, obvious way to safely
invoke another program. You can do it manually with a mix of pcntl_fork and
pcntl_exec, but capturing the output using that is more difficult.
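
For illustration, here is a minimal sketch of the kind of sanitizing meant here, written in Python rather than PHP. The thread does not show how the script actually validates its input, so this is a hypothetical allow-list check, assuming a UDID is exactly 40 hexadecimal characters. `UDID_PATTERN` and `is_valid_udid` are names invented for the example.

```python
import re

# Assumption: a UDID is exactly 40 hexadecimal characters.
UDID_PATTERN = re.compile(r"[0-9a-fA-F]{40}")


def is_valid_udid(candidate: str) -> bool:
    """Accept only strings that consist entirely of 40 hex digits.

    fullmatch() requires the whole string to match, so shell
    metacharacters, spaces, or a trailing newline all cause rejection.
    """
    return UDID_PATTERN.fullmatch(candidate) is not None
```

Rejecting anything outside a known-good alphabet is generally easier to reason about than trying to escape every dangerous character.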

~~~
fooyc
PHP provides escapeshellcmd / escapeshellarg functions for escaping when
building commands.

Is there any shell on which those functions are unsafe? I think they assume
cmd.exe on Windows and sh on others.

~~~
thwarted
Escaping is difficult to get right, especially in PHP... witness things like
magic quotes that existed for so long.

The right thing to do is provide an output capturing API that does not require
escaping because it takes a list of strings passed to the exec(2) system call.
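
As a point of comparison, Python's `subprocess.run` has this shape: it takes the arguments as a list, starts the program without a shell, and captures the output. The sketch below is only an illustration of that idea, not a PHP solution; `run_capture` is a made-up helper name, and the child process is the Python interpreter itself so the example does not depend on any external tool.

```python
import subprocess
import sys
from typing import Sequence


def run_capture(argv: Sequence[str]) -> str:
    """Run a program without a shell and return its standard output.

    Each element of argv reaches the child as one argument, unchanged,
    so no quoting or escaping is involved.
    """
    result = subprocess.run(list(argv), capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    # A string that would be dangerous if it were pasted into a shell command.
    hostile = "x; echo INJECTED"
    # The child simply echoes back its first argument.
    child_code = "import sys; print(sys.argv[1])"
    print(run_capture([sys.executable, "-c", child_code, hostile]))
```

Because no shell parses the string, the semicolon is just another character in the argument.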

------
Udo
I had basically the same experience a few years ago when the Wikileaks stuff
was first released and I hacked together a basic search engine for it.
Interestingly, I got most traffic through Twitter which I hardly ever use
personally.

Anyway, that was an EC2 Micro running PHP-FPM on Nginx, and it stayed up
without problems through millions of hits over a few days as well. It's easy
to write a well-scaling single-function site like that.

Yet a lot of blogs and more complex sites die instantly when they cross a
couple of dozen hits per second. I suspect there are multiple reasons for it,
but the most egregious one coming to mind is the typical PHP setup within
shared hosting environments where every single request means about 30 file
compilations and a lot of DB requests. Caching is probably not very popular
either.

------
mgz
My <http://pastehtml.com/udid> got 130k hits yesterday. PasteHTML.com itself
and this UDID tool are running on a moderate dedicated server,
nginx/rails/postgresql.

------
Daviey
What has impressed me most is that a pretty crap script has received so much
praise.

EDIT: Re-reading, that sounds a little bitchy.. I didn't mean it that way.
Sorry

~~~
delinka
Please describe how this script is crap. It's short and to the point, it does
its job, handles the load, and it doesn't pass bogus characters to the command
line.

Perhaps it needs commenting? Could be formatted more nicely? I'm truly
interested in your characterization because I'm just not seeing it.

~~~
reidrac
I don't know whether all 350k hits ran that script, but let's say they did.

The OP is praising the performance of lighttpd + php5-fpm, but the script
actually spawns an external process to search for the UDID. That's like
running an old CGI that needs to be executed with every request.

I may be wrong but using lighttpd and php5-fpm (with fastcgi) may not be as
relevant as it seems. I'd say that the operating system is caching everything.

EDIT: yes, php-fpm is "a simple and robust FastCGI Process Manager for PHP".
I'm not doing much PHP lately!
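
For contrast, an in-process lookup could look roughly like the Python sketch below: read the list once, keep it in a set, and answer each request with a membership test instead of spawning a process. The file layout (one UDID per line) and the names `build_index`, `load_index` and `is_listed` are assumptions for the example; the thread does not describe the real data file.

```python
from typing import Iterable, Set


def build_index(lines: Iterable[str]) -> Set[str]:
    """Normalize UDIDs (trim whitespace, lowercase) into a set.

    Blank lines are skipped. Assumes one UDID per line.
    """
    return {line.strip().lower() for line in lines if line.strip()}


def load_index(path: str) -> Set[str]:
    """Read the UDID file once, for example at process start-up."""
    with open(path, encoding="utf-8") as handle:
        return build_index(handle)


def is_listed(index: Set[str], udid: str) -> bool:
    """Constant-time membership test; no external process involved."""
    return udid.strip().lower() in index
```

Whether this is worth doing depends on where the server actually spends its time; with the file sitting in the operating system's cache, the per-request process may be cheap enough.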

~~~
delinka
Premature optimization. Obviously, this thing works well enough and handles
the load. Why worry about whether it's launching a process?

When the system begins to show signs of stress, you look at where it's
spending its time, _then_ alleviate the bottleneck.

------
xmodem
What someone needs to do - ASAP - is do another one where people whose UDIDs
were on the list can list all the apps they've bought (it should be possible
to get these out of iTunes somehow) and we can use that to narrow down which
ones most people have in common, and track down the organisation responsible
for collecting this data in the first place. I would do it myself but I really
don't have the free time at the moment.

~~~
dominicrodger
Isn't that what <http://fredericjacobs.com/identifying-the-traitor> is?

~~~
xmodem
Exactly what I was thinking of.

------
lazyjones
That demonstrates the value of launching early and "advertising" in the right
spots, not so much the capabilities of PHP or a cheap VPS.

------
FedericoElles
If your UDID script returns a positive, could you display a link to
<http://fredericjacobs.com/identifying-the-traitor> and his "identifying the
traitor" survey?

~~~
bbunix
Done.

------
zachwill
Pretty incredible that you were able to withstand that on a $15 VPS. Nice job,
man.

------
edoloughlin
The article mentions that it's difficult to oversell Xen boxes. Anyone care to
comment on why this is?

~~~
bbunix
Take a look here:
[http://www.freewebspace.net/forums/showthread.php?2246409-Xe...](http://www.freewebspace.net/forums/showthread.php?2246409-Xen-or-OpenVZ-Which-is-faster-and-which-is-better)

------
DanielShir
+1 for lighttpd. We use it extensively here at Nextpeer and it's been super so
far.

------
jasonlingx
I wonder... Could he have done it on a free heroku instance?

