
The new archive.org - bpierre
https://archive.org/v2
======
aw3c2
Awww why... This looks incredibly cluttered. Infinite scrolling is a terrible
idea in an _archive_. If you use the list view instead, you get a very hard to
scan "3 narrow lines for 1" design.

The site is very broken if you don't have Javascript enabled. I am scared how
CPU intense it would be on my mobile or cheap netbook. The details of
collections don't even display any items.

Where is the list of files inside an item? Previously there was a nice table.
Now everything seems to focus on images instead. IA hosts a lot of things that
are not visual: music, texts, data. Those seem like second-class citizens now.
The cover of an album tells me .. nothing about the music itself.

For an archive, I think this is a rather bad interface. The technical
implementation seems very un-archivey and more suited to a "dumb user"
discovery interface built upon an existing well-presented archive. :(

PS: The categories and tags on the side are a nice addition.

If someone from IA reads this, I think at least
[https://archive.org/advancedsearch.php](https://archive.org/advancedsearch.php)
is not using output buffering; enabling it might make performance much better
for users.

Overall the site is very very slow.

edit: Some comparison images.

Old: [http://i.imgur.com/gJXgJhI.png](http://i.imgur.com/gJXgJhI.png)

New default: [http://i.imgur.com/JOEoAiu.png](http://i.imgur.com/JOEoAiu.png)

New list: [http://i.imgur.com/m2d1Gf4.png](http://i.imgur.com/m2d1Gf4.png)

Old: [http://i.imgur.com/X7e2s5T.png](http://i.imgur.com/X7e2s5T.png)

New: [http://i.imgur.com/9HsrQO1.png](http://i.imgur.com/9HsrQO1.png)

~~~
xanderstrike
> The site is very broken if you don't have Javascript enabled.

I agree with all of your other concerns, but it is not the responsibility of
the web designer to accommodate people who selectively disable parts of their
website. If you turn off CSS, or the color blue, your experience will degrade,
but that's 100% your fault. The same is true with JavaScript.

It runs quite well on my 1.8GHz Atom netbook and my Nexus 5. The JS they're
using doesn't seem that expensive.

~~~
charonn0
True, but no one wins if the user leaves your website frustrated. Gracefully
degrading in absence of a full-fledged scriptable web browser should be
considered best practice.

~~~
textfiles
Graceful degradation is definitely a plan.

------
sirn
My first impression: wow, looks minimal. This is a nice change. I like it!
Then I read the comment here and realized _the page hasn't fully loaded yet_.
The fact that above the fold part loads almost instantly (including top menu
which worked right away) is very nice, but I'm not quite sure about the rest.
It's slow, heavy and really hogs up CPU when it's loaded.

~~~
Mahn
Oh, wow, had you not mentioned it I would never have guessed that there _was_
a part below the fold.

------
droithomme
The page here has 13.5MB in assets for the initial load, including a 7.9MB top
html file. Takes 29.5 seconds to load, assuming all extensions and plugins
like adblock are disabled, otherwise it seems to never complete loading. In
both cases, it pins one CPU at 100% while it loads.

------
Mithaldu
The new archive.org is over 8MB of HTML, quite impressive.

Snark aside, i'd like to read a post about what they changed and why. For
example i see they almost entirely removed the previously prominent links to
the forums. Why is that?

~~~
maaarghk
It does say on the top right that this is a beta site, so it's probably a
weeee bit early to be making comments about "the new direction" and all that.

~~~
pbhjpbhj
Well beta means feature complete and considered ready to ship excepting as yet
unspotted last minute bugs from testers ... so it's way too late to have input
to the new direction, the new direction is at beta stage set in stone unless
the update is scrapped.

~~~
vertex-four
> Well beta means feature complete and considered ready to ship excepting as
> yet unspotted last minute bugs from testers

No, that's a release candidate. A beta is considered feature complete, but
potentially not, and almost definitely somewhat buggy and/or non-performant.
It's also often not undergone any major form of usability testing, and will
often need to be modified to incorporate the results of that.

------
asaddhamani
I agree with the general sentiment of this thread. This redesign was not
needed in the first place. The new website is _incredibly_ slow to load, and
the infinite scrolling thing absolutely sucks. While I can see all my uploads
with the old (better) website, the new one doesn't seem to work. It just says
"Fetching more results", and then nothing happens. With the tabs, if I try to
list my uploads by, say, text, it reloads the page and switches the tab to
collections, and then I have to click the Uploads tab again to see the
filtered results. Same when I remove a filter.

Even on my workstation computer with a nice overclocked CPU, I can see the CPU
usage jump to the top whenever I load the site. The website takes 19.90s to
load with cache disabled, with ~200 requests and 6.5MB of data transferred.
The older website takes 4.21s to load, with 18 requests and 280KB of data
transferred, in contrast.
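
For scale, the ratios implied by those two measurements can be worked out directly. This is only a sketch using the figures quoted in the comment; it assumes decimal units (1MB = 1000KB), and the "~200 requests" figure is approximate to begin with.

```python
# Figures as quoted in the comment above. Decimal units (1 MB = 1000 KB)
# are an assumption; the request count for the new site is approximate.
new_site = {"seconds": 19.90, "requests": 200, "kilobytes": 6.5 * 1000}
old_site = {"seconds": 4.21, "requests": 18, "kilobytes": 280}


def ratios(new, old):
    """Return how many times larger each new-site figure is than the old one."""
    return {key: new[key] / old[key] for key in new}


result = ratios(new_site, old_site)
for key, value in result.items():
    print(f"{key}: {value:.1f}x")
```

On those numbers the new page takes roughly 4.7 times as long, makes about 11 times as many requests, and transfers about 23 times as much data.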

Meanwhile, features like the ability to playback WARC files that are uploaded
by users don't seem to be getting any attention, but a feature like that would
make so much sense for a site like the Internet Archive. I can see they
provide a player for media files, why not provide something for WARC files
too, then?

As a heavy user of the site, the redesign (at least in its current state) will
only ever hinder my experience, I can't see it being helpful in any way.

~~~
db48x
A WARC might have been put into the Wayback Machine; do a search there for the
url. Failing that, there are proxies you can run locally that let you access
the content of a WARC as if you were browsing the original site.
[https://github.com/internetarchive/warcprox](https://github.com/internetarchive/warcprox)
is even by IA themselves, so clearly they're working on it.

A viewer that let you browse the contents of a WARC the same way you can do
for a zip would be really nice, but it's probably a separate project from
redesigning the site. In fact, browsing zip files (and a bunch of other
similar file types, of course) was added only a few months ago.

~~~
asaddhamani
True, the WARC file might have been included in the Wayback Machine, but I
meant something different. I'm specifically interested in the ability to
specify which WARC file I want to play back. I think an example would be
better than me trying to explain it. Have a look at
[https://webrecorder.io](https://webrecorder.io), it lets you play back any
WARC or ARC file, you just have to provide the URL.

I know this is outside the realm of a redesign, but if IA could add something
like this, it would be a big UX improvement.

------
ANTSANTS
This looks fancy and Web 2.0 and pinteresty and whatnot (and has the page size
and CPU requirements to prove it), but it still has the fundamental problem of
the old archive.org: it thinks portals are still relevant to the internet.
They're not. One person or organization can't meaningfully organize all of the
world's information, it's not worth trying. It's like saying "we're bringing
the Dewey Decimal System into the 21st century." No, rigid hierarchical
classification just doesn't cut it any more. Just focus on archiving the
information, improving the search mechanism[1], and _staying alive_, and let
the community handle curation and discovery.

[1] Mechanisms? Sorry to weeb out, but they should really look at Danbooru
(NSFW) sometime. Tag-based classification and search works extremely well when
(1) the users submitting the content aren't the people that made the content,
so there's no conflict of interest encouraging them to try to game the system
by spamming tags and whatnot, (2) there are strong guidelines for what is and
isn't acceptable, what is and isn't subjective, and (3) the users are as
dedicated and passionate as anime fans and archivists are. Let the users
contribute objective tags, add support for subjective/personal tags ("pools"
in booru lingo) that don't show up in search results but provide a way for
users to curate, and for the love of god let them "fave" things and see their
friends' favorites, and participation on archive.org would explode overnight.
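
The split described in this footnote, shared objective tags that drive search plus personal "pools" that do not, can be sketched as a small inverted index. This is an illustration only: the class, method names, and item identifiers are hypothetical and are not taken from Danbooru or archive.org.

```python
from collections import defaultdict


class TagIndex:
    """Toy inverted index: objective tags are searchable, pools are not."""

    def __init__(self):
        self._items_by_tag = defaultdict(set)   # tag -> item ids
        self._pools = defaultdict(set)          # (user, pool name) -> item ids

    def tag(self, item_id, *tags):
        """Attach one or more objective tags to an item."""
        for name in tags:
            self._items_by_tag[name.lower()].add(item_id)

    def add_to_pool(self, user, pool, item_id):
        """Add an item to a user's personal pool (never used by search)."""
        self._pools[(user, pool)].add(item_id)

    def search(self, *tags):
        """Return the items that carry every requested tag."""
        if not tags:
            return set()
        matches = [self._items_by_tag.get(name.lower(), set()) for name in tags]
        return set.intersection(*matches)

    def pool(self, user, pool):
        """Return a copy of one user's pool."""
        return set(self._pools.get((user, pool), set()))


# Hypothetical identifiers, for illustration only.
index = TagIndex()
index.tag("omni-sample-issue", "magazine", "science-fiction")
index.tag("radio-sample-episode", "radio", "science-fiction")
index.add_to_pool("some-user", "to-read", "omni-sample-issue")

print(index.search("science-fiction"))
print(index.search("radio", "science-fiction"))
print(index.search("to-read"))  # pools do not leak into search results
```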

------
domas
You can provide feedback/comments directly to them by clicking "exit beta" in
the top right.

------
textfiles
Jason Scott, here. Disclaimer: I work for the Internet Archive (although I
don't speak for the entire Archive) and I'm vaguely gung-ho on the place:
[https://twitter.com/textfiles/status/527549181175427072](https://twitter.com/textfiles/status/527549181175427072)

I am also, as part of my job there, one of the largest individual uploaders of
data to archive.org - I've added hundreds of thousands of individual items
(texts, movies, music, websites) since I started working there in 2011.

So, moving on.

Welcome to the new user interface beta. I'm glad to see people toying with it,
and the commentary and complaints are very, very welcome. As a smallish
organization with a lot going on, the responses from people really digging
down in the beta are very appreciated.

First, I'll say that the Beta interface is a "true" beta - it's the result of
a lot of internal work, arguments, and discussions, but nothing is 100% set in
stone. This isn't a beta like Gmail or a FPS trying to determine the rate of
firing of the chaingun weapon: this is a lot of best-approach attempts at a
whole range of goals. There's bound to be lots of responses from a lot of
camps that are now coming forward. (For example, the new site has
accessibility issues that need to be addressed.) If the term "beta" has been
wrecked, stick with "prototype".

The internal name was V2, so I tend to keep calling it that.

V2 is the first major redesign of the main archive.org site in over a decade.
And part of the conditions of this project (done by a handful of people) were
to keep the old site (retroactively called V1) running, and mostly unchanged.
That was a whole bucket of headache that isn't even obvious when you come into
the site. (Anyone who has done this knows how it can be). With over 20
petabytes of data on the site, and millions of items and objects, spanning the
whole environment without downtime is a feat in itself. So there's a whole
range of philosophies being approached, but just getting the backend into a
shape where it could sustain a new interface to it was a lot of non-obvious
work.

Moving to the site as it is now.

Definitely slow. Definitely a shock. Definitely some great choices, some which
might seem like head-scratchers. There is a designer, with a vision (his name
is David) and there's been approaches to all the intended known shortcomings
of V1 Internet Archive in this prototype.

One of the long-standing issues with Archive.org has been non-responsiveness
across different platforms - you got one site and that was it. Another was the
lack of a visual interface as an option. Now there is one.

The tagging and metadata efforts were spotty before now, because you were not
really rewarded for doing so. The V2 site uses these tags and metadata
extensively, and will continue to. This has been a nightmare for me, frankly -
I've had to add logos to the 1,200 collections of items I've been uploading,
and I'm doing descriptions as well as tags. But under the new system, the
chance for finding things has increased exponentially.

There are definitely cases where I have to swap back to V1 to get kinds of
"work done", because as an intense power-user, I do all sorts of grandiose
work. But then again, 99% of my interaction with maintaining and adding
content to the Internet Archive, I do through the API, and specifically
through a python command-line interface we've had a developer working on for
over a year:

[https://pypi.python.org/pypi/internetarchive](https://pypi.python.org/pypi/internetarchive)

I've uploaded many thousands of items, analyzed and upgraded their metadata,
and done search-and-modify runs by the hundreds with this tool. It's being
constantly updated.
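
As a rough idea of what working through the API looks like, here is a minimal sketch that reads an item's file list over HTTP. It deliberately does not use the `internetarchive` package linked above; it assumes the public `https://archive.org/metadata/<identifier>` endpoint returns JSON with a `files` list whose entries have `name`, `format`, and `size` keys. The helper names and the sample record are made up for illustration.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def metadata_url(identifier):
    """Build the metadata URL for an item identifier."""
    return "https://archive.org/metadata/" + quote(identifier, safe="")


def file_table(record):
    """Reduce a metadata record to sorted (name, format, size-in-bytes) rows."""
    rows = []
    for entry in record.get("files", []):
        rows.append((entry.get("name"), entry.get("format"), int(entry.get("size", 0))))
    return sorted(rows)


def fetch_file_table(identifier):
    """Fetch an item's metadata over the network and return its file rows."""
    with urlopen(metadata_url(identifier)) as response:
        return file_table(json.load(response))


# Made-up record in the assumed shape, so the sketch runs without network access.
sample_record = {
    "files": [
        {"name": "b.mp3", "format": "VBR MP3", "size": "2048"},
        {"name": "a.txt", "format": "Text", "size": "10"},
    ]
}
print(file_table(sample_record))
```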

In the future, I expect us to see multiple improvements to the interface - one
which is much more bandwidth and processor friendly, a version of the "view"
(we have image and list right now) that is optimum for researchers, and so on.
But I'll stress again:

- This is a prototype which was done with a pretty small team who had to keep
  the old site running as smoothly as possible, while doing essentially a decade
  of upgrade in one swoop;
- Now that it's "proven" that it works, refinement by the truckload needs to
  happen
- Your comments are not just welcome but encouraged
- Increased interest in the archive and the materials, and working together to
  find ways to access the petabytes of data in a meaningful way is not just a
  nice side benefit, but a vital core of the Archive's mission

Thanks for reading.

~~~
walterbell
Thanks for the additional detail.

> [https://pypi.python.org/pypi/internetarchive](https://pypi.python.org/pypi/internetarchive)

Can anyone use the API key, e.g. does it require auth for both upload and
download? For upload, is an archive.org userid sufficient, or is a separate
API key needed? Will the new metadata be available via the API?

Onsite searches are usually less successful than a Google search with
site:archive.org. Within the archive, it has been near impossible to create a
URL-based search query that will find all editions of a given title/work. Will
the new site/tagging help?
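
For reference, a URL-based query against the `advancedsearch.php` page mentioned earlier in the thread can be assembled like this. It is a sketch under assumptions: the `q`, `fl[]`, `rows`, and `output` parameters and the `title:` / `mediatype:` fields are taken to behave as that page's form suggests, the book title is simply one named elsewhere in this thread, and nothing here solves the "all editions of a work" problem.

```python
from urllib.parse import urlencode


def advanced_search_url(query, fields=("identifier", "title"), rows=50):
    """Build an advancedsearch.php URL that asks for JSON output."""
    params = [("q", query)]
    params += [("fl[]", field) for field in fields]   # one fl[] per returned field
    params += [("rows", rows), ("output", "json")]
    return "https://archive.org/advancedsearch.php?" + urlencode(params)


url = advanced_search_url('title:"A calendar of dinners" AND mediatype:texts')
print(url)
```

Each scan of the same work is still a separate item, so grouping editions would depend on consistent title and creator metadata.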

Thanks to the entire archive team for a precious resource.

~~~
textfiles
Anyone can generate an S3-like/API key. They have the same rights and
restrictions as someone using other methodologies. So, for example, you can
upload into the general Audio or Texts collections, but you can't upload, say,
right into the Grateful Dead archive or the CD-ROM collections we have.

In the future, we hope to have it that accounts will be assigned and
de-assigned by some credential different than the current somewhat-binary
approach we have now, but that functionality doesn't exist yet.

So basically, upload is constricted like before. Download, however, is as
unconstricted as before.

------
danbee
This is practically unusable in Safari on my MacBook Pro. It's a shame because
once it actually loads it looks quite nice!

------
TomGullen
I love archive.org. Has my first website ever on it, no way would this exist
without archive.org any more!

[http://web.archive.org/web/20020726224013/http://www.gamezon...](http://web.archive.org/web/20020726224013/http://www.gamezoner.com/)

------
bgutj
The only thing I've tested with the new beta interface is the ability to
search all books for a particular word or phrase at the top level. This has
not been implemented. If I am looking to write a biography about a particular
person, for example, who is mentioned in passing in n books in the archive and
I search for this person, I will find a very small amount of these books,
perhaps even zero.

If I go to a particular book, I can search -inside- that book for
words/phrases. I would like to do the same with the raw text from all books.

------
bane
I wish there was a word stronger than love for what I feel about archive.org.
It's one of the amazing promises of the Internet come true.

If I have one criticism of archive.org it's that things are impossible to
find, even if you know they have them - this redesign doesn't solve this
problem.

I think the principal problem is that what should be a meta-layer on the
organization, the provenance of a collection of stuff, is used as an
organizational scheme just as often as media and subject type.

An example. I'm looking to see if they have "A calendar of dinners, with 615
recipes" by Marion Harris Neil. Where would you suppose this book would be?

If I go to "eBooks and Texts" I'm simply met with a wall of collections, none
of which are subject area organized, is it under Microfilm, or Canadian
Libraries? Boston Library Consortium? Who knows? I'll never find it by
browsing and the way books are collected is pretty much useless. Unless I
_know_ there's a copy under "Canadian Libraries" I'll probably not find it.

Sure I can search for it, "A calendar of dinners" gives me 3 results! Turns
out it's buried under the following Archives:

"Toronto Public Library", "The Library of Congress", "Cornell University
Library". Notice that none of these are the crumbtrail I used to find it the
first time on accident (Canadian Libraries)!

How about Omni Magazine? Is it a "Text"? I'm not sure, even today. I _do_ know
if I go to texts and search all texts for "Omni" I get it back. But it's part
of "The Magazine Rack" and "Additional Collections" which I still have not
figured out how to just navigate to.

These are just texts; video, audio and other media types are similarly hard to
navigate and find stuff. There's little pleasure in browsing archive because
if you find something, it'll be by accident, not because you navigated to some
pocket of cool stuff.

Good luck seeing what SF books they have and browsing it. That's actually a
collection I'd care about.

I also like old radio shows, and those are scattershot all over the site.
Unless somebody basically just uploaded an entire series at once, good luck
piecing it together.

Right now, about the only way I know something is on archive.org is because
the person who uploaded the item mentions it on a podcast or something.

I'm almost tempted to just start a meta-website of some sort to start
organizing stuff I care about so that other people like me can find it.

It's kind of a mess, and it's given me a lot more respect for what librarians
have largely solved in the physical world.

~~~
db48x
Interestingly enough, at the open house last night they talked about how they
want to shift focus a bit and make it easier for people to do exactly that.
They've gotten really good at storing things, and at digitizing them, and now
they need to be better at letting people curate and organize collections.

Of course, as it stands anyone can make a website of their own that links to
and/or embeds anything stored in the archive, but I gather that hasn't
happened often.

~~~
bane
> Of course, as it stands anyone can make a website of their own that links to
> and/or embeds anything stored in the archive, but I gather that hasn't
> happened often.

Yeah, and I've even debated it enough to think about doing it myself. For me
at least, it feels kind of wrong to just put up a site that curates and
organizes somebody else's collection. I'd also be worried about going through
the effort and then having archive.org change the link-to urls or rules or
whatever (even though they're a benevolent organization) and then have to go
through all the effort again.

It _really_ is a lot of work to find stuff on archive, even when you know it's
there.

And again I attribute that to too much effort to keep track and give credit
for the provenance of an item rather than organizing it in a reasonable way.
As much as I'm glad that the Universal Library (Million Books Project) donated
their work, it doesn't do anything for me as a user when I'm trying to find
the Collected Stories of William Faulkner.

I think a better way would be to categorize the archive like any other
library, and then for each individual work, provide alternative
scans/recordings/transcriptions, etc. and a link to the donating organization
that goes to a page that _then_ gives you links to everything they donated.

But honestly, it's a fairly mild inconvenience. Most of the stuff they host is
fairly long-dwell. Once I find a book or whatever, I'll be tied up with it for
quite a few days and don't need to be bouncing all over their site several
times an hour.

~~~
db48x
Yes, probably they need to simply talk about this possibility more, or more
prominently.

If you already know what you're looking for, then feel free to simply search
for it; you don't have to browse through the categories it might be in, or the
subjects it ought to be listed in, or any of that. Likely the only reason why
it isn't in the places you looked is just that nobody has gotten around to
applying the right metadata to it. This is simply a fact of life; there are
millions (or billions, if you squint a little) of items in the archive, and
most of them don't have all the metadata that they ought to have.

I wouldn't worry about stepping on their toes by organizing things better.
It's not actually a collection until someone organizes and curates it; until
that happens it's just a pile of stuff. IA has always operated on the
assumption that it's ok to just have a pile of stuff if the alternative is to
have nothing at all. Given the size of the pile there's no way they could ever
organize everything themselves, even assuming that there's one obvious right
way to organize things.

~~~
bane
> IA has always operated on the assumption that it's ok to just have a pile of
> stuff if the alternative is to have nothing at all.

Absolutely. This definitely helps keep things in perspective. And their pile
is priceless and staggering.

It's amazing to me that there are more things on IA to keep me entertained,
for free, than I could ever possibly experience in this lifetime.

------
taejo
There's an escaping bug: at
[https://archive.org/details/movies](https://archive.org/details/movies) I see
"Arts &amp; Music", etc.

The download links for movies (and I guess other files) should set their
disposition to download rather than playing in the browser.
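
Both points can be illustrated in a few lines. The visible "&amp;amp;" pattern is what you typically get when text is HTML-escaped twice, and the download behaviour is usually controlled by a `Content-Disposition: attachment` response header. This is a generic sketch, not archive.org's code; `attachment_header` is a hypothetical helper and the filename is invented.

```python
from html import escape
from urllib.parse import quote

# Escaping once is correct; escaping the result again makes the browser
# display a literal "&amp;" instead of "&".
label = "Arts & Music"
once = escape(label)
twice = escape(once)
print(once)
print(twice)


def attachment_header(filename):
    """Build a Content-Disposition value asking the browser to save the file."""
    # Plain ASCII fallback plus an RFC 6266 style UTF-8 encoded filename.
    fallback = filename.encode("ascii", "replace").decode("ascii").replace('"', "_")
    encoded = quote(filename, safe="")
    return f"attachment; filename=\"{fallback}\"; filename*=UTF-8''{encoded}"


header = attachment_header("episode 1.mp3")
print("Content-Disposition:", header)
```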

------
72deluxe
Isn't infinite scrolling a memory hog?

~~~
acdha
It's a risk but implementers can take the sting out of it. Browsers aren't
currently smart enough to do things like unload decoded <img> memory for
things which aren't visible but you can avoid the worst of it if you use a CSS
background-image (which browsers _do_ unload) and a visibility test on scroll
to avoid loading things which aren't visible or soon to be visible. This works
as far back as IE8 so it might be worth the hassle.
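
The "visibility test on scroll" boils down to a rectangle-overlap check. In a browser this would live in a JavaScript scroll handler that assigns the background-image; the sketch below shows only the geometry, in Python, with hypothetical names, pixel values, and a made-up 200px preload margin.

```python
def should_load(item_top, item_height, scroll_top, viewport_height, margin=200):
    """True if the item overlaps the viewport extended by `margin` pixels."""
    item_bottom = item_top + item_height
    window_top = scroll_top - margin
    window_bottom = scroll_top + viewport_height + margin
    return item_bottom >= window_top and item_top <= window_bottom


def visible_items(items, scroll_top, viewport_height, margin=200):
    """Names of the (name, top, height) items that should have images loaded."""
    return [
        name
        for name, top, height in items
        if should_load(top, height, scroll_top, viewport_height, margin)
    ]


# Four tiles at made-up vertical offsets, each 300px tall.
tiles = [("a", 0, 300), ("b", 300, 300), ("c", 2000, 300), ("d", 5000, 300)]
at_top = visible_items(tiles, scroll_top=0, viewport_height=600)
scrolled = visible_items(tiles, scroll_top=1500, viewport_height=600)
print(at_top, scrolled)
```

Tiles that fall outside the extended window would have their background-image cleared again, which is what lets the browser release the decoded image memory.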

~~~
72deluxe
Thank you - very informative. I would not have known the behaviour of browsers
with regard to that - is there an established standard or accepted behaviour
documented somewhere? I'm guessing that the CSS spec only dictates how things
should be shown and not what browsers can do with off-screen items.

~~~
acdha
Correct: the only way to know for sure is to test it. I think a few minor
extensions could make that a lot easier – e.g. a CSS :visible selector and
support for background-image: attr(…)

~~~
72deluxe
Thank you. The massive amount of testing is surely what hampers web
development? Years ago in the dark ages people dreamed of cross-platform apps;
the three main players in this market (desktop only) now are Windows, Mac OSX
and Linux (to a tiny extent). Writing a cross-platform native app is no longer
a massive exercise in frustration (see wxWidgets or Qt).

Someone thought Java would solve all our problems, but it seems to have fallen
out of fashion, and everyone pins their hopes and dreams on cross-platform
apps with websites, but surely the effort is far larger: testing on 3+
browsers per OS at least!

I don't see how everyone copes.

------
glomek
Yikes! That's different!

But it still doesn't have the one feature that I've been wishing they would
implement forever. In a collection of audio files, I wish they would provide a
podcast feed. It would be so nice to be able to listen to Old Time Radio shows
as podcasts.
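
A podcast feed is essentially an RSS 2.0 document whose items carry an `<enclosure>` pointing at each audio file, so in principle one could be generated from a collection's file list. A minimal sketch, assuming MP3 files; the function name, item identifier, URLs, and byte counts are all hypothetical.

```python
import xml.etree.ElementTree as ET


def podcast_feed(title, link, episodes):
    """Return an RSS 2.0 document with one <enclosure> per audio file."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = f"Audio files from {title}"
    for episode in episodes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = episode["title"]
        ET.SubElement(
            item,
            "enclosure",
            url=episode["url"],
            length=str(episode["bytes"]),
            type="audio/mpeg",
        )
    return ET.tostring(rss, encoding="unicode")


# Hypothetical collection and files, for illustration only.
feed = podcast_feed(
    "Example Old Time Radio",
    "https://archive.org/details/example-item",
    [
        {"title": "Episode 1", "url": "https://archive.org/download/example-item/ep1.mp3", "bytes": 1000},
        {"title": "Episode 2", "url": "https://archive.org/download/example-item/ep2.mp3", "bytes": 2000},
    ],
)
print(feed)
```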

~~~
aw3c2
That's a great idea, be sure to mail them about it at info@archive.org!

------
wj
I like it. I was looking at it for thirty seconds and came across two
collections that I didn't know existed. Great to discover new stuff and I can
still search if I know what I'm looking for.

------
butwhy
I've been waiting for something like this to happen, as the old design just
looks old and doesn't encourage me to use it.

But.. I'm sure haters gonna hate.

~~~
userbinator
If you're looking at the Internet _Archive_, then you should very well expect
to see a lot of other "old" things besides the site itself.

------
frik
Direct link to the WayBack Machine:

[https://archive.org/web/](https://archive.org/web/)

------
davea37
Is the old one available anywhere? ;)

~~~
aw3c2
Just click "exit beta" in the top right corner and please leave feedback.

~~~
TheLoneWolfling
How do I leave feedback?

~~~
aw3c2
If you click the exit beta button there is a form.

~~~
TheLoneWolfling
There wasn't when I clicked it...?

Exit beta takes me straight to the old version without any form or anything
showing.

~~~
textfiles
If you aren't getting the form, you can write info@archive.org with any
thoughts or comments you have and they'll be forwarded to the dev team, who
are gleefully swimming in mail as we speak.

------
calinet6
Increase.. spacing... between top icons... and labels.. _twitch_

Sorry, designeritis.

------
ooooak
> server: nginx/1.1.19
> Powered By: PHP/5.3.10-1ubuntu3.2

why 5.3 ? and ubuntu3.2 !!!

~~~
db48x
Eh, that's not the version of Ubuntu that they are running; the whole thing is
the php version number. There is a general convention among Linux
distributions to backport security fixes to the older versions of software
that come with their older releases.

In this case, Ubuntu 12.04 (Precise Pangolin) was released with PHP 5.3.10
plus some security patches, available in the Ubuntu package repository under
the name php5 with the composite version number 5.3.10-1ubuntu3.14. Their
website doesn't list a newer version of this package
([http://packages.ubuntu.com/precise-updates/php5](http://packages.ubuntu.com/precise-updates/php5)),
so possibly
they're ahead of the official Ubuntu releases.
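
To make the "composite version number" concrete, here is a small sketch that splits such a string the way Debian-style package versions are laid out: an optional epoch before a colon, the upstream version, then the distribution's revision after the last hyphen. The function name is hypothetical and this does not implement the full version-comparison rules.

```python
def split_package_version(version):
    """Split a Debian-style version into epoch, upstream version and revision."""
    epoch = "0"
    if ":" in version:
        epoch, version = version.split(":", 1)
    upstream, hyphen, revision = version.rpartition("-")
    if not hyphen:
        # No hyphen means there is no distribution revision at all.
        upstream, revision = version, ""
    return {"epoch": epoch, "upstream": upstream, "revision": revision}


# The two strings quoted in this thread.
from_header = split_package_version("5.3.10-1ubuntu3.2")
from_repo = split_package_version("5.3.10-1ubuntu3.14")
print(from_header)
print(from_repo)
```

Both strings share the upstream version 5.3.10 and differ only in the Ubuntu revision.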

The reasoning for this is that while it might be nice to upgrade in order to
get new features, new bug fixes, and new performance enhancements, these
potential benefits are often outweighed by the very real cost of testing
everything to make sure the upgrade doesn't cause regressions. Backporting the
security fixes makes sticking with a base version possible. I imagine that
upgrading is pretty low on their list of things to do; it would have to get
them some nice benefits, and nothing about php is ever nice.

I'm a software engineer myself, and I upgrade individual libraries far more
often than I upgrade the actual programming-language runtime (or compiler),
simply because that's where you get the most benefit (usually a fix for a
specific bug, but sometimes a new feature will be tempting) for the least
risk.

