
"How do I set the User-Agent string in Java?" - L. Page (1996) - frisco
http://guyro.typepad.com/blog/2008/12/google-i.html
======
donw
Why didn't he just Googl... er... uh... nevermind.

------
wlievens
This is magnificent. I imagine decades from now, web archeology will be a true
academic discipline.

~~~
pavel_lishin
You should read "A Deepness In The Sky" and "A Fire Upon The Deep"; the topic
is actually covered fairly well.

------
frisco
Original, for reference:
[http://groups.google.com/group/comp.lang.java/msg/88fa108450...](http://groups.google.com/group/comp.lang.java/msg/88fa10845061c8ba)

~~~
n-named
And with the answer:
[http://groups.google.com/group/comp.lang.java/browse_thread/...](http://groups.google.com/group/comp.lang.java/browse_thread/thread/6923c024ed392c85/88fa10845061c8ba#88fa10845061c8ba)

------
mynameishere
Can't blame him for not googling it himself.

~~~
huherto
This reminds me of Doc Brown in 1955 before he invented his time machine (in
Back to the Future).

------
sofal
I imagine his bot has undergone a couple of revisions since then.

~~~
swombat
I wonder if it still runs under Java 1.0.

~~~
borism
It seems they got rid of Java pretty early, fortunately.

~~~
axod
Why? Java is extremely well suited to crawling and parsing. Extremely well
suited to backend tasks running on servers for months on end without crashing.

It's blisteringly fast, low CPU usage, and would suit the core task of
crawling websites extremely well. The async NIO libs are fantastic for network
io.

What would you use and why? (And why would you not use java)

~~~
borism
Perhaps you're talking about Java SE 6, not the Java runtime of 10 years ago?

> _Extremely well suited to backend tasks running on servers for months on end
> without crashing._

Running for months without crashing - maybe. But by the end of that month
(well, week really) it will be so slow (due to memory leaks, ironically)
that your only option will be auto-restarting it every now and then...

AFAIK Larry and Sergey chose Perl at the beginning. Now it should be mostly
Python and C.

~~~
axod
I've been using Java for backend/net crawl tasks since about 2001. It
definitely improved drastically with the addition of NIO, and there were some
irritating segfault issues a few years ago, but nothing a rollback to an
earlier JVM didn't fix (until Sun fixed it).

You can certainly run for months without issue (memory/crash/speed) as long as
you don't have any leaks in your own code.

I'm pretty sure Java is still widely used at Google.

If I were writing the Google crawler from scratch today, I'd certainly start
with Java, then probably use Perl/Python for less critical scripting glue, and
maybe rewrite any CPU-intensive stuff in C/asm.

~~~
derefr
Would you really use Java, or just language X that runs on the JVM?

------
sown
The internet has a neat ability to remember things.

------
staunch
I thought they were using Python back then? Has anyone ever talked about the
structure of Google v1? Code made public?

~~~
basugasubaku
This is probably the closest you'll find:

<http://infolab.stanford.edu/~backrub/google.html>

~~~
revorad
Interesting excerpt on their view of advertising back then:

 _Currently, the predominant business model for commercial search engines is
advertising. The goals of the advertising business model do not always
correspond to providing quality search to users. For example, in our prototype
search engine one of the top results for cellular phone is "The Effect of
Cellular Phone Use Upon Driver Attention", a study which explains in great
detail the distractions and risk associated with conversing on a cell phone
while driving. This search result came up first because of its high importance
as judged by the PageRank algorithm, an approximation of citation importance
on the web [Page, 98]. It is clear that a search engine which was taking money
for showing cellular phone ads would have difficulty justifying the page that
our system returned to its paying advertisers. For this type of reason and
historical experience with other media [Bagdikian 83], we expect that
advertising funded search engines will be inherently biased towards the
advertisers and away from the needs of the consumers._

Looks like they solved the problem by turning it on its head.

~~~
duhprey
For me the first step in figuring out a solution is to clearly understand the
problem. It always seems like the solution is far easier than that first part.
Well... then there's implementation :)

~~~
LogicHoleFlaw
I am reminded of the Feynman Algorithm:

 _Write down the problem.

Think real hard.

Write down the solution._

 _The Feynman algorithm was facetiously suggested by Murray Gell-Mann, a
colleague of Feynman, in a New York Times interview._

The first step is usually the hardest.

------
yupbank
With Eclipse you'll never have to worry about a question like this~

------
technomancy
The hilarious bit is that they dropped Java for Python, probably due to the
massive levels of frustration encountered when trying to do simple things like
this.

~~~
qw
It was simple. This is the code he had to use (assuming he used URLConnection):

connection.setRequestProperty ("User-agent", "GoogleBot/0.01");

I had never done this before and found it out by looking at the javadoc. I
don't see how switching to Python would have made this easier.
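
For anyone who wants to try that one-liner in context, here is a minimal
sketch. It assumes a modern JDK rather than the JDK 1.0 of 1996, and the class
name UserAgentExample, the helper openWithUserAgent, and the example.com URL
are all made up for illustration. Nothing is sent over the network: the header
is only stored on the not-yet-connected URLConnection and read back.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLConnection;

public class UserAgentExample {

    // Hypothetical helper: creates a connection object for the given URL and
    // sets the User-Agent request header on it. openConnection() does not
    // contact the server; that only happens on connect() or when reading.
    static URLConnection openWithUserAgent(String url, String userAgent)
            throws IOException {
        URLConnection connection = URI.create(url).toURL().openConnection();
        // Request properties must be set before the connection is established.
        connection.setRequestProperty("User-Agent", userAgent);
        return connection;
    }

    public static void main(String[] args) throws IOException {
        URLConnection connection =
                openWithUserAgent("http://example.com/", "GoogleBot/0.01");
        // Read the header back from the unconnected connection.
        System.out.println(connection.getRequestProperty("User-Agent"));
    }
}
```

Calling setRequestProperty after the connection has been opened throws an
IllegalStateException, so the header has to be set first.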

~~~
simplegeek
Maybe older versions of Java didn't support this or didn't document it
properly? Just saying.

~~~
pyre
What version was he running?

Wikipedia's history of Java versions says 1.0, as does the request string in
the article. [<http://en.wikipedia.org/wiki/Java_version_history>]

Was URLConnection available back then?

According to the URLConnection docs it's been around since JDK 1.0.
[[http://java.sun.com/j2se/1.3/docs/api/java/net/URLConnection...](http://java.sun.com/j2se/1.3/docs/api/java/net/URLConnection.html)]

Did URLConnection.setRequestProperty exist back in JDK 1.0?

The closest I could find were the docs for JDK 1.1.8 in a downloadable zip
file, and _yes_ setRequestProperty existed back in JDK 1.1.8 at least.

Looking at the actual response and the JDK 1.1.8 docs, he would probably have
been using HttpURLConnection (I could not find HttpClient anywhere in the
JDK 1.1.8 docs), and even on the HttpURLConnection page in JDK 1.1.8 I could
not find the string 'agent' anywhere.

So yea, if the settings were there they were buried and not readily accessible
in the documentation of the time.

~~~
jdale27
[http://www.aquaphoenix.com/ref/jdk1.0.2_api/java.net.URLConn...](http://www.aquaphoenix.com/ref/jdk1.0.2_api/java.net.URLConnection.html)

setRequestProperty is there.

~~~
pyre
Yea, but there still isn't anything in those docs referring to the User-Agent
string/property.

