
Deep-Fried Data - udba
http://idlewords.com/talks/deep_fried_data.htm
======
leesalminen
> For the generation growing up now, the Internet is their window on the
> world. They take it for granted. It’s only us, who have seen it take shape,
> and are aware of all the ways it could have been different, who understand
> that it's fragile, contingent. The coming years will decide to what extent
> the Internet will be a medium for consumption, to what extent it will lift people
> up, and to what extent it will become a tool of social control.

I agree completely. This is something we should be cognizant of.

------
pimlottc
> Many [programmers] work jobs that are intellectually stimulating, but
> ultimately leave nothing behind. There is a large population of technical
> people who would enjoy contributing to something lasting.

This hits pretty close to home.

~~~
keyle
Same here. I'm battling with this thought a lot. Beyond jobs, I think there
should be communities of developers, designers, producers, writers, getting
together and figuring out this stuff. And I don't mean open source projects.
Let's group together smart people wanting to make a difference and have a hit
list of things we (people) actually need. A group that would organise people
into mission driven development.

I'm so fed up with getting paid to potentially make founders rich. Or to be a
small cog in a gigantic machine on a slow decline. I'm also unemployable
because I can't buy into the corporate BS anymore. And where I am, there
doesn't seem to be design/dev jobs that actually want to make a difference.
It's an economy problem.

The startup thing seems to be the best way we go about solving problems in the
world today. But if you happen to _not_ be at the right place at the right
time, meeting the right people, poof, it's gone. I can't imagine that an
advanced species would operate this way. We should be focused on solving
problems, instead of being focused on escaping the rat race, to then be able
to solve problems.

I'm glad I am not alone in seeking purpose. There comes a point where you're
technically advanced, you have itches to fix things and all you see is the
broken economy of consumerism and "let's give kids video clips and smileys,
derp".

~~~
wtracy
> And I don't mean open source projects.

Then what do you mean? You described exactly what some of the largest, most
successful FOSS projects (Firefox, KDE, Gnome, Libre Office, FreeBSD) are
already doing.

> Let's group together smart people wanting to make a difference and have a
> hit list of things we (people) actually need.

Well, the FSF maintains a list of "high priority Free Software projects" that
need help, but it's strongly colored by the FSF's politics:
[http://www.fsf.org/campaigns/priority-projects/](http://www.fsf.org/campaigns/priority-projects/)

~~~
coldtea
> _You described exactly what some of the largest, most successful FOSS
> projects (Firefox, KDE, Gnome, Libre Office, FreeBSD) are already doing._

They "make a difference"? How exactly? At best, I can understand that for
Firefox.

~~~
wtracy
By providing quality software that lets me and thousands of others get useful
work done, and not forcing us to accept onerous licensing terms in the
process?

But hey, none of those can cure cancer, so what's the point, right?

~~~
coldtea
> _By providing quality software that lets me and thousands of others get
> useful work done, and not forcing us to accept onerous licensing terms in
> the process?_

That software (desktop Linux/UNIX) comes for free, and yet, only 1% or less
(from browser stats of major traffic points) seem to opt to use it as their
desktop.

Is this the kind of difference the parent was describing? Letting a small
minority of people avoid "onerous licensing terms" that billions of others are
OK with, and can "get useful work done" under?

On the server, of course, it's a whole different story.

~~~
sangnoir
> That software (desktop Linux/UNIX) comes for free, and yet, only 1% or less
> (from browser stats of major traffic points) seem to opt to use it as their
> desktop.

> Is this the kind of difference the parent was describing?

Cancer kills 171.2 per 100,000[1]. So by your metrics, the Linux desktop folks
make a bigger difference than curing cancer as 1% > 0.1712%

1\. Cancer mortality; see [https://www.cancer.gov/about-cancer/understanding/statistics](https://www.cancer.gov/about-cancer/understanding/statistics)

~~~
coldtea
> _Cancer kills 171.2 per 100,000[1]. So by your metrics, the Linux desktop
> folks make a bigger difference than curing cancer as 1% > 0.1712%_

This is the kind of illogical result stemming from only reasoning half-way.

First of all, the cure for cancer wouldn't affect only the ones that die but
also the ones that don't but do suffer complications from current treatment,
from going broke from paying for therapy/losing their job in the process, to
severe chemo side-effects. It also hugely affects the families and loved ones
of those who currently die of cancer.

Second, ever considered the kind and magnitude of impact? Saving even 171
people (those in a single group of 100,000) from dying of cancer is, arguably,
a much bigger deal than sparing millions from having to use Windows or OS X or
some commercial UNIX.

~~~
sangnoir
>> Cancer kills 171.2 per 100,000[1]. So by your metrics, the Linux desktop
>> folks make a bigger difference than curing cancer as 1% > 0.1712%

> This is the kind of illogical result stemming from only reasoning half-way.

I'm glad you saw the flaws in your reasoning - the key phrase in that
paragraph was "by _your_ metrics".

> Second, ever considered the kind and magnitude of impact?

My point exactly! It's not just a numbers/proportions game where you get to
say "1% or less [desktop Linux usage] is not a big difference", since it means
the world to those who _depend_ on it - for example, those who cannot afford
Windows licenses or an Apple computer, or who find non-free software
unconscionable.

------
Asparagirl
_> I’ve saluted the efforts of Archive Team and the Internet Archive, but
their activity is like having a museum curator that rides around in a fire
truck, looking for burning buildings to pull antiques from. It's heroic, it's
admirable, but it’s no way to run a culture._

...but in the meantime, here's an obligatory and shameless plug for donating
to the Internet Archive[1] (tax-deductible in the US), or better yet making a
recurring monthly donation so they can more accurately forecast revenue for
the year, or better still getting your employer to make a nice big donation to
this crucial piece of the Internet's memory banks.

And as for Archive Team, we're always looking for a few good geeks.[2] Run an
instance of the Warrior on spare cloud servers, or help patch and ship code at
GitHub.[3]

[1] [http://archive.org/donate/](http://archive.org/donate/)

[2]
[http://archiveteam.org/index.php?title=Main_Page](http://archiveteam.org/index.php?title=Main_Page)

[3]
[https://github.com/ArchiveTeam/ArchiveBot](https://github.com/ArchiveTeam/ArchiveBot)

~~~
dcposch
The Archive is awesome, but the author's sensationalist description of what
they do isn't really accurate.

For the most part, archive.org is not rushing in to save stuff that's about to
be deleted.

Instead they are crawling the web 24/7, patiently maintaining a historical
record.

Check out [http://oldweb.today](http://oldweb.today)

It is amazing.

~~~
polpo
What the Archive Team saves does get uploaded to the Internet Archive, but
they aren't officially part of it. I think Maciej's description of the Archive
Team is accurate - they are the archivists of last resort. When a commercial
service is about to disappear forever, they're the ones that spring into
action and rescue as much data as possible. If companies and the people that
comprise them cared enough about their users' data, there would be no need for
the Archive Team.

------
chewxy
Regarding Maciej's fears about machine learning -

I've written about this before, and even right now I'm not sure where I stand
exactly, except that tweaking the algorithms to compensate for bias is
definitely not the right answer: if you look in the mirror and don't like
what you see, you don't draw on top of the mirror to change the reflection!
You go on a diet!

I liked the idea of data gardening, but the thought of going-to-communities is
daunting. I get tired even thinking about it.

Regarding living beyond walled gardens:

> Publish your texts as text. Let the images be images. Put them behind URLs
> and then commit to keeping them there. A URL should be a promise.

But people already do that! The real question is why people do otherwise. I
personally do not understand why people, say, post long blog posts on
Facebook, but I do understand it for services like Medium.

For example, I'm extremely tempted to write on Medium because it provides the
network effects of readers clicking on tags to read next. So the question is
how do we democratize that?

~~~
big_surprise

      So the question is how do we democratize that?
    

Commenter _wtracy_ has already linked to the FSF's list of _High Priority Free
Software Projects_... From there, look into what they have to say about free
wifi (in particular, but not limited to, the OrangeMesh package):
[http://www.fsf.org/campaigns/priority-projects/free-
software...](http://www.fsf.org/campaigns/priority-projects/free-software-
drivers-for-network-routers)

If a convincing case could be made that the benefits to National Security
outweigh the costs to the copyright cartels, I'd be willing to bet that young
secondary-schoolers would have a blast with a decently designed curriculum
that includes a working student-to-student mesh-network as one of its goals.

~~~
chewxy
I actually meant democratizing the network effect that Medium has. The "free
marketing" bit.

I mean, right now one can pretty freely go write up a blog using self-hosted
wordpress, octopress, pelican, hugo or whatever. But choosing that over Medium
can sometimes mean a lot more work to put in. But if we can democratize the
ease-of-use and the good bits of Medium/Facebook/Twitter... Maciej's end
statement about "using open standards, write text in text, images in images"
would be achieved.

The problem is that corporations now create significantly more compelling
versions (on most criteria - UX, UI, etc.) than the Free and Open alternatives
out there.

~~~
big_surprise
Right, I got that. I guess I meant to say: start young. Provide a compelling
alternative to ad-driven consumerism... Otherwise, it seems that you're just
talking about creating another new corporation with a more compelling (even if
nonsensical) reason to adopt.

------
makomk
Have to admit that I didn't expect to see that quirk of LiveJournal culture
mentioned in an article on the HN front page, let alone in a speech to the
Library of Congress. It just sort of faded away without really influencing the
current generation of social networks.

Also, it's funny how the net changes: how unthinkable it now is to have a
social network that doesn't slice up people's data and use it to advertise to
them, compared to how anti-advertising LiveJournal was back then. Not
convinced it's a change for the better.

~~~
idlewords
I'm the guy who gave this talk. To add to the funny, LiveJournal hired me to
rewrite their ad engine in 2007. I did a horrible job at it, but turned my
ineptitude into a principled and lucrative ideological stance that I have
milked ever since.

Don't be afraid to pivot.

~~~
embarcadero
Was this recorded? Link?

~~~
pauldino
There's a YouTube link at the top of the page.
[https://www.youtube.com/watch?v=8gcu2GQf7PI&feature=youtu.be...](https://www.youtube.com/watch?v=8gcu2GQf7PI&feature=youtu.be&t=5h31m50s)

------
jaywunder
I don't think I understand what the exact point of this talk was. Maybe the
thesis was stated at the end of the talk when he said that he wishes the
internet were more like a city rather than a mall. I think the internet can be
like a city, and I think a great example of a place where people with
conflicting ideas talk together is HN. Sure, HN can be an echo chamber at
times, but there are quite a few times when people with differing opinions
actually discuss their differences.

Also I don't necessarily understand Ceglowski's stance on why we shouldn't use
deep learning and should avoid surveillance on the web. I don't take issue
with becoming a datapoint in Facebook's web of people because nothing bad has
happened or can happen from me giving Facebook my data. When most people speak
out about the data that's being collected about Facebook and Google users they
say they're "worried about what could happen" but then never list any bad
things that they're actually afraid of. The speaker falls into this trap too.
Ceglowski says:

>I worry about legitimizing a culture of universal surveillance.

But then never explains what bad could happen from legitimizing that culture.
Maybe I'm completely missing the point of the talk? Please explain what I'm
missing if I'm actually missing something.

~~~
idlewords
The audience for this talk was a bunch of librarians and fellow travelers who
are bringing large archives and collections online, often at great expense. I
wanted to encourage them to find new, engaged audiences for these collections,
rather than fixate on how to analyze them with computers.

With regard to the dangers of surveillance, I've made a sustained argument
about this in other talks. It boils down to the data being collected having
great power to harm people if it is ever put to malicious use, and a lifespan
that exceeds that of institutions we know how to run. My beef is not with the
surveillance alone, but with the combination of surveillance and permanent
storage.

~~~
jaywunder
Thank you for explaining that! The context is meaningful and makes your talk
make sense.

Regarding data falling into the wrong hands, I take issue with this argument
because it's not a problem unique to personal data collection. Any data could
be hacked - bank information, addresses, whatever. But that doesn't mean we
don't use the internet for banking and so on. It means we try to make
systems that are difficult to hack. It seems like you'd want data collection
not to happen on websites like Facebook and Google, when hacking isn't a
unique problem to those websites.

~~~
idlewords
Here's a capsule summary of what I'm pushing for:
[http://idlewords.com/six_fixes.htm](http://idlewords.com/six_fixes.htm)

~~~
iyn
I agree with that, but I have a small suggestion until these points become
reality: consider enabling SSL/TLS for your blog. Thanks!

P.S. I really like your posts and your tweets are hilarious, please don't ever
stop.

~~~
rubidium
OK, this piques my curiosity. Why would one put SSL on a static personal/blog
type website (assuming one doesn't care about the google penalty)?

~~~
b2m9
For example, ISPs are not able to inspect your traffic if it's over HTTPS.
I've worked on data sets gathered by major ISPs and it's scary how much they
know about their users (especially if they also have a mobile phone with the
same company). ISPs use such intelligence for personalised marketing (either
for their own product catalogue or for 3rd parties).

~~~
odbol_
The URL isn't encrypted though, is it? Since there's no dynamic content on the
page, they already know exactly what you're looking at.

~~~
detaro
The domain isn't, the full URL is. (But content size etc probably still allow
identification of an individual page on a small site, and the context of the
domain is already valuable)

------
marklyon
I provide guidance to attorneys involved in the discovery process; "Technology
Assisted Review" is of huge interest to those teams, as it allows them to
leverage coding on a small sample of the population across a much larger set
of documents. For many cases, the cost and (occasional) time savings are
instantly attractive. Sadly, the process is hard to do well. Far too many
screw it up in new and amazing ways.

The author's concerns over machine learning are well-founded. The best option
I've been able to identify to ameliorate some of the concerns is focusing on
the population that will be suppressed. Once the model returns the desired
recall / precision, drawing samples from the excluded population with a
rigorous acceptance standard can help validate whether you've simply built a
model around your biases. Couple that with allowing an opponent to validate a
randomly-selected sample and you've cleared up a lot of the uncertainty in the
model.

It's not perfection, but perfection is a very difficult standard.
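To make the validation step concrete, here's a minimal stdlib-only sketch of sampling the suppressed population to estimate the elusion rate. The document structure, the 2% responsiveness rate, and the 5% acceptance threshold are all invented for illustration, not part of any actual TAR toolchain:

```python
import math
import random

def elusion_estimate(excluded_docs, is_responsive, sample_size, z=1.96, seed=0):
    """Sample from the documents the model suppressed, count how many a
    human reviewer marks responsive, and return the point estimate plus a
    normal-approximation upper confidence bound on the elusion rate."""
    rng = random.Random(seed)
    sample = rng.sample(excluded_docs, min(sample_size, len(excluded_docs)))
    hits = sum(1 for doc in sample if is_responsive(doc))
    p = hits / len(sample)
    upper = p + z * math.sqrt(p * (1 - p) / len(sample))
    return p, min(1.0, upper)

# Hypothetical excluded set where 2% of documents are actually responsive.
excluded = [{"id": i, "responsive": i % 50 == 0} for i in range(10_000)]
rate, upper = elusion_estimate(excluded, lambda d: d["responsive"], 400)

# Accept the model only if the upper bound clears a pre-agreed standard.
model_passes = upper < 0.05
```

The key design point is that the acceptance standard is fixed before sampling, so neither side can move the goalposts after seeing the result.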

~~~
abofh
The issue with that approach is ensuring the suppressed are represented. When
it's black vs white, you can oversample one and be done.

However, if there's any winner-take-all built into the system, there's a
strong incentive to not even acknowledge dissent.

------
pcmaffey
Machine learning does not have less bias than human researchers. The bias is
simply magnified at scale.

And that scale is exactly the state of the internet. There is so much data
available to study and understand, that we absolutely need better tools, like
machine learning or whatever we want to call it, to help us keep up. Shit's
moving faster than our human perception can handle, especially for those who
didn't grow up with the internet.

Yes, the data analytic tools we have right now are premature - like fast food
for our productized minds - but they will improve rapidly as our taste for
quality improves.

But sure, demonizing the things you don't like is one step on the path to
learning what's truly valuable.

~~~
yummyfajitas
This is simply not true. Most algorithms can and will correct for biases in
their inputs.

See this (somewhat technical) article where I go into explicit (simulations in
numpy) levels of detail:

[https://www.chrisstucchio.com/blog/2016/alien_intelligences_...](https://www.chrisstucchio.com/blog/2016/alien_intelligences_and_discriminatory_algorithms.html)

The best analogy I've come up with for the non-technical is that algorithms
are like humans trying to draw inferences on octopus society. Some octopi
might have bias against some other octopi, but it's the height of
octopusthromorphism to expect a human to reproduce that bias.

~~~
wrsh07
This is very optimistic. There are well-known and documented cases of ML
algorithm bias and its causes [1].

And it's not surprising that data itself contains some biases from the humans
creating it. Suppose police are asking machine learning where more crime is
committed - there will be a feedback loop. Where are they currently making
more arrests? If they spend more time there, the bias will be exaggerated.

The OP correctly argues that we should be cautious. Your post, I'm afraid, is
misleading at best.

[1]
[https://www.google.com/amp/s/www.technologyreview.com/s/6017...](https://www.google.com/amp/s/www.technologyreview.com/s/601775/why-
we-should-expect-algorithms-to-be-biased/amp/?client=ms-android-google)

~~~
yummyfajitas
Of course data contains biases. But again, please read the article I linked;
algorithms will have a tendency to correct that bias.

The examples in the article you link to are not algorithmic bias at all. They
consist of:

1) Humans at Facebook manipulating trending results.

2) Google's keyword algorithm (accurately) reflecting the fact that people
with black names are more likely to have arrest records.

Let's distinguish "bias" from "accurately learning things you wish it wouldn't
learn" or "accurately learning things you wish weren't true."

None of what I'm saying is remotely controversial. If I told you statistics
could detect and correct bias in a mobile phone compass, you'd just think
"cool stats bro". Is this article remotely controversial?
[https://www.chrisstucchio.com/blog/2016/bayesian_calibration...](https://www.chrisstucchio.com/blog/2016/bayesian_calibration_of_mobile_phone_compass.html)

The specific feedback loop you describe - variable detection probability =>
variable # of detections - can be directly mitigated. For a non-controversial
example drawn from sensor networks (sensors report events with a delayed
reaction; the longer you wait, the more events you detect), see here:
[https://www.chrisstucchio.com/blog/2016/delayed_reactions.ht...](https://www.chrisstucchio.com/blog/2016/delayed_reactions.html)

(You can find similar examples all over the place. I just link to the ones I
wrote because they spring immediately to mind.)

In a compass, a sensor network, adtech, or quant finance, the idea that
machine learning can fix biased inputs is not remotely controversial. The
concept that statistics suddenly stops working to fix racism is just silly
anthropomorphism.
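For what it's worth, the compass idea can be sketched in a few lines: given paired readings and ground truth, a constant additive bias is estimated from the data and removed. The 12.5-degree offset and the noise level are made-up illustration, not anything from the linked posts:

```python
import random

def calibrate_offset(readings, truths):
    """Estimate a constant additive bias from paired (reading, truth)
    samples and return a function that corrects future readings."""
    offset = sum(r - t for r, t in zip(readings, truths)) / len(readings)
    return lambda r: r - offset

rng = random.Random(42)
true_headings = [rng.uniform(0, 360) for _ in range(200)]
# A sensor with a constant +12.5 degree bias plus a little noise.
observed = [t + 12.5 + rng.gauss(0, 1.0) for t in true_headings]

fix = calibrate_offset(observed, true_headings)
# After correction the residual error is dominated by the noise,
# not by the 12.5 degree bias.
residual = sum(abs(fix(o) - t) for o, t in zip(observed, true_headings)) / 200
```

This only works, of course, when you have trustworthy ground truth to calibrate against, which is exactly what's contested in the policing examples.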

~~~
wrsh07
Aha - I think I see our miscommunication. When you say bias you mean
statistical bias.

Yes, machine learning is able to correct for that kind of bias - 538's polls
forecast is a good example of that.

But you don't get to redefine racial bias to be something innocuous. Yes,
black names are more likely to have arrest records, but that "fact" is super
misleading [1].

Finally, you're talking past me. I'm not saying that statistics is broken. I'm
saying that we should be especially mindful of the OPs point when they say
this:

> So what’s your data being fried in? These algorithms train on large
> collections that you know nothing about. Sites like Google operate on a
> scale hundreds of times bigger than anything in the humanities. Any
> irregularities in that training data end up infused into the classifier.

I think the OP author also has a related post about the kind of bias I'm
talking about:
[http://idlewords.com/talks/sase_panel.htm](http://idlewords.com/talks/sase_panel.htm)

[1]: [http://www.huffingtonpost.com/kim-farbota/black-crime-
rates-...](http://www.huffingtonpost.com/kim-farbota/black-crime-rates-your-
st_b_8078586.html)

~~~
yummyfajitas
Without getting into a dispute about the definition of "bias", I'm saying that
algorithms can accurately measure reality even if input(x=white, all else
equal) != input(x=black, all else equal).

You are saying that algorithms are accurately measuring a reality you wish
were different. I don't disagree with this.

The right thing to do is to actually answer unpleasant moral questions like
"if blacks are 4x more likely to be dangerous criminals, what should we do
about it?" But I guess overloading the word "bias" is a nice substitute for
clearly thinking things through.

~~~
eridius
The problem is you're modeling a biased reality. And accurately modeling a
biased reality may in many cases accentuate the bias. Take for example the
previously-mentioned case of using an algorithm to determine where to focus
your policing efforts. If the data you have says that more arrests are made
in a particular part of the city, then you'll want to put more police there,
right? But areas where there are more police will tend to see more arrests. So
the fact that you're putting more police in an area where you see more arrests
is just going to make the bias more extreme, causing even more arrests there.
This causes a feedback loop. So you may be accurately modeling reality, but
you're modeling a pre-existing bias and making it worse. And who knows why
that pre-existing bias was even there? The fact that there were more arrests
there may not be because that area actually has more crime committed, it could
be due to other factors, such as racial profiling by police, and in that case
your algorithm is now accidentally racist because it's perpetuating racial
profiling.
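The feedback loop described above is easy to demonstrate with a toy simulation (all numbers hypothetical): two districts with identical true crime rates, where each round's patrol allocation is set proportional to the previous round's observed arrests. With linear returns the initial 60/40 bias never corrects itself; with even mildly increasing returns to presence, it runs away:

```python
def patrol_feedback(patrol=(0.6, 0.4), crime=(1.0, 1.0), gain=1.0, rounds=60):
    """Toy model: both districts have the SAME true crime rate. Observed
    arrests scale with crime * patrol**gain, and next round's patrol
    allocation is set proportional to observed arrests."""
    p = list(patrol)
    for _ in range(rounds):
        observed = [c * share ** gain for c, share in zip(crime, p)]
        total = sum(observed)
        p = [o / total for o in observed]
    return p

persistent = patrol_feedback(gain=1.0)  # bias persists: stays at (0.6, 0.4)
runaway = patrol_feedback(gain=1.2)     # bias amplifies toward (1.0, 0.0)
```

The point of the sketch: even in the mild case the data never reveals that the two districts are identical, because arrests only measure where police were looking.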

~~~
dragonwriter
The problems are really twofold:

(1) Defining the proper goals, and

(2) Measuring the right things (such as the real goals of interest rather than
biased proxies.)

With police deployments, you are assuming the solution (rather than letting
your algorithm optimize it) by saying "I want to put more police where more
arrests occur". What you really want is probably something more like (the
exact goal may be different, of course) "I want to deploy police resources
where it will most effectively reduce the incidence of crime, weighted by some
assigned measure of severity." Then let your ML algorithm crunch the various
measurable factors and produce an optimum deployment to do that.

(But, then again with that goal -- and similar problems exist with many likely
real goals -- you run into the other problem, which is measuring the
_incidence_ of crime -- measuring crime reports may be the obvious approach,
but there's plenty of evidence that lots of factors can bias crime reports,
including communities having bad experience with police being less likely to
report crimes.)

~~~
wrsh07
Thank you. This is so much clearer than what I was saying.

As you say, proper goals and measurement can fix a lot of these problems, and
I don't think it's obvious that ML algorithms solve either of those.

------
1024core
"The names keep changing—it used to be unsupervised learning, now it’s called
big data or deep learning or AI"

Um, I'm sorry, but unsupervised learning and deep learning are not the same.

~~~
idlewords
What's the distinction?

~~~
absherwin
Unsupervised refers to whether or not the model is trained against labeled
targets. Think about the difference between "How many people will view this
webpage?" and "Divide these pages into 20 clusters." The first is supervised;
the second isn't.

Deep learning refers to a particular learning technique: specifically, a
neural network that has many hidden (intermediate) layers. Deep learning can
be used for either supervised or unsupervised learning.
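A minimal, stdlib-only illustration of the distinction (the data and helper names are invented): the first function learns from labeled examples, the second finds clusters with no labels at all.

```python
import random

def nearest_neighbor_predict(labeled, query):
    """Supervised: predict the label of the closest labeled training point."""
    return min(labeled, key=lambda pair: abs(pair[0] - query))[1]

def kmeans_1d(points, k=2, iters=20, seed=0):
    """Unsupervised: group unlabeled points into k clusters."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in points:
            groups[min(range(k), key=lambda i: abs(x - centers[i]))].append(x)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return sorted(centers)

# Supervised task: pageviews -> known label.
labeled = [(120, "low traffic"), (150, "low traffic"),
           (9000, "high traffic"), (9500, "high traffic")]
# Unsupervised task: just raw numbers, no labels anywhere.
unlabeled = [120, 150, 130, 9000, 9500, 9200]

prediction = nearest_neighbor_predict(labeled, 140)  # -> "low traffic"
centers = kmeans_1d(unlabeled)                       # two cluster centers
```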

~~~
gabrielgoh
I agree with your sentiment. It feels out of place because deep learning, AI
and big data are buzzwords, whereas unsupervised learning is a rather
technical term in machine learning, referring to a very specific class of
problems.

~~~
dkersten
Are they really buzzwords? To me, they have rather particular meanings
(although I guess others may feel differently):

 _Deep learning:_ a particular type of artificial neural network with many
hidden layers (and the associated tech to make this work/trainable)

 _AI:_ The field of computer science which aims to make computers smarter.
Like most fields, there is much overlap with others, for example, statistics.

 _Big Data:_ A buzzword. About the best definition I can find is anything
which has the 3 V's: Volume, Velocity & Variety. In general, outside its use
as a buzzword, I think big data is generally thought of as "when you need a
distributed system to process your data", be it because of volume, velocity or
variety.

 _Supervised and unsupervised learning:_ whether or not you require example
data for training

 _Machine learning:_ some people say it's the subset of AI that deals with
statistical methods; other people say it's just another word for AI.

------
dvdplm
Idlewords is one of those blogs worth dropping everything for: just take a
deep breath, dive in, and revel in the joy of clear thought expressed through
clear prose. Love it. Thanks.

------
sdenton4
"...Dim witted grad student that you can't really trust..."

Reminds me of the phrase "graduate student descent" for training neural
networks...

I've been noticing more casual dismissiveness towards grad students lately.
They are certainly often treated as the grunt laborers of academia, in areas
where career prospects are downright stupid. I generally feel it would be more
productive to at least pretend that they're being trained to be independent,
aggressive researchers in their own right, though.

~~~
gabrielgoh
Grad students are put in the same category as interns and teenagers: a naive
type of person still in the making. I don't think there's any ill will
intended.

~~~
eanzenberg
You are dim witted.

No ill will intended.

~~~
elmigranto
There's a difference between metaphors/jokes and insults addressed at you
personally.

------
Esau
"And this time it's not the government, but the commercial Internet that has
worked so hard to dismantle privacy."

So true.

------
udba
I'm currently applying for co-op jobs (internships) and while trawling the
university job board I've seen many positions requiring big data this or
machine learning that.

What's not clear to me is why companies that don't seem to have any need for
a machine learning team (e.g. a subscription box company) are looking to hire
one.

Surely part of this can be pinned down to the hype associated with ML that may
well die out, but the proliferation of these tools doesn't bode well for
Maciej's dream of a weird, creative, and interesting internet.

~~~
teej
These companies aren't looking for someone to develop new machine learning
techniques; they are just looking for someone who can slap together existing
utilities to meet their goals.

Companies that run on subscription literally live and die by their churn rate.
It is both feasible and reasonable for a subscription box company to hire
someone to use machine learning to build a predictive churn model. That may
seem trivial to you but that's the reality behind those job posts.
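For concreteness, here's roughly what "slapping together" a predictive churn model can look like: a tiny logistic regression trained by gradient descent on made-up, pre-normalized features. Tenure and support tickets are my invented examples; a real team would likely use scikit-learn and far more signals:

```python
import math

def train_churn_model(rows, epochs=500, lr=0.1):
    """Logistic regression via plain gradient descent.
    rows are (feature..., churned) tuples with features scaled to [0, 1]."""
    n = len(rows[0]) - 1
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for *x, y in rows:
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y  # gradient of the log-loss
            b -= lr * err
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w, b

def churn_probability(model, x):
    w, b = model
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

# (tenure_normalized, support_tickets_normalized, churned) -- invented data
history = [(0.1, 0.9, 1), (0.2, 0.8, 1), (0.3, 0.7, 1),
           (0.7, 0.3, 0), (0.8, 0.2, 0), (0.9, 0.1, 0)]
model = train_churn_model(history)

at_risk = churn_probability(model, (0.15, 0.85))  # new customer, many tickets
loyal = churn_probability(model, (0.85, 0.15))    # long tenure, few tickets
```

The value to the business isn't the model itself but the ranked list of at-risk subscribers it produces, which the retention team can act on.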

------
morecoffee
Giving up control is harder than the author makes it seem. It isn't so much
that you give it up, but that you give it _to_ someone else. Picking that
someone else is extremely difficult, and a wrong choice will destroy your
community.

Using machine learning on the other hand is a safe bet. It is much easier, I
would assert, to write machine learning code to organize data than to curate a
community of humans to organize data. The ML approach will do pretty well
even if it isn't the best, which is why it's what everyone is switching to.

Keeping with the author's example, is it easier to organize erotic fanfic with
a computer, or enable a community to do it without spiraling out of control?

~~~
crzwdjk
For example, it is clearly easier to write a machine learning system to find
interesting articles and highlight insightful commentary, compared to using
something so crude as a group of people on a website collectively voting on
which stories and comments they like... wait a minute...

------
skybrian
If you think the Internet is as safe and controlled as a shopping mall, you
probably should be reading Krebs on Security more.

People tend to move towards the more mall-like areas of the Internet due to
spam and abuse that they don't want to deal with. This can be low-level stuff,
or (as in the case of Krebs himself) sometimes the attackers get out the big
guns, and you need to run for cover.

And that's why we're hanging out here, after all, and not in some unmoderated
forum. And even here, post on certain subjects and conversation quickly
degenerates.

I think we do need a wider variety of spaces to hang out, though. No set of
rules works for everyone. And if you do want 4chan, you know where to find it.

~~~
aab0
> If you think the Internet is as safe and controlled as a shopping mall, you
> probably should be reading Krebs on Security more.

That's an amusing comparison, given how much of Krebs focuses on offline ATM
skimming, copying credit cards at point-of-sale terminals, hacking major
retailers' CC databases, and using stolen cards at retail and mall stores to
cash them out...

------
erichocean
> _Publish your texts as text. Let the images be images. Put them behind URLs
> and then commit to keeping them there._

It sounds like he's saying ephemeral content is worthless and should be
shunned.

I, and hundreds of millions of others, disagree. You want a bland, awful,
boring society? Easy: make everything you do stick around forever—like a
promise. And then watch the world self-police as the lifeblood drains out of
it.

You'll get…Facebook. No thanks.

~~~
idlewords
The audience for this talk was people with very large collections they're
bringing online. I was trying to encourage them to avoid exotic formats,
custom plugins, and custom software (shudder) when they put this material
online, and to make it web accessible.

For example, here is three quarters of a PETABYTE of historical American
newspapers:
[http://chroniclingamerica.loc.gov](http://chroniclingamerica.loc.gov)

~~~
erichocean
Okay, that makes sense. I 100% agree that bringing collections online in
exotic formats is a terrible idea.

------
goldmar
Thank you for this article. I have really enjoyed your writing style,
especially the creative metaphors you've been using :)

------
TranceMan
I have been having similar thoughts for a while:
[https://news.ycombinator.com/item?id=10937201](https://news.ycombinator.com/item?id=10937201)

------
qwertyuiop924
>the Internet is a shopping mall. There are two big anchor stores, Facebook
and Google, at either end. There’s an Apple store in the middle, along with a
Sharper Image where they are trying to sell us the Internet of Things. A
couple of punk kids hang out in the food court, but they don't really make
trouble. This mall is well-policed and has security cameras everywhere. And
you guys are the bookmobile in the parking lot, put there to try to make it
classy.

It's already been mentioned, but this guy needs to get out a bit more.

The internet is a city. There are the specialist shops (HN), the bustling
malls (Reddit, YT), the shady back alleys (4chan, 8chan, etc.), the historical
districts (Usenet, Archive.org), the cafes (IRC, ICQ, Slack, etc.). To his
credit, though, the author is more knowledgeable than most.

I see so many people dismiss the internet as just Facebook or YouTube, or
discuss trolling as if it were a single, recent phenomenon associated with
social media. So many think there is one internet culture: there isn't;
there is an almost infinite number of overlapping, interlinked cultures. I can
even map out the origins and historical influences of a few. There are even a
few who think that social media sites are good forums for discussion. The poor
sods: Usenet was a better discussion forum than Facebook ever was, and
Usenet's not that great.

If you really want to see what the internet is like (that isn't advice for the
author: I'm pretty sure the mall analogy doesn't encompass his internet
experience, and is merely an analogy I find odd), explore. See it all, in all
of its weird, wacky, zany, jokey, serious, offensive, manic, smart, stupid,
brilliant, insane glory. I promise you, you won't be disappointed.

People ask me why I'm not on social media. It's because social media is
boring. Unlike Reddit, 4chan, and the rest, not much interesting happens
there. Unlike HN, I'm not likely to be intellectually stimulated or learn
something new. Unlike static sites, I don't get to see the kind of wild
creativeness that personal webspace tends to invite in hackers, nerds, and
others who know what makes the web tick. I don't want to see what you ate, I
don't want to see your cat, I don't want to hear banal details about your
everyday life. I want to hear something interesting, new, and original. I want
to hear the next Ze Frank, or Tom Ridgewell, or Simon Travaglia, or Steve
Yegge, or RMS, or PG, or Ryan Dahl, and you can bet I won't find them on a
site with a signal-to-noise ratio that low.

People also ask why I'm fascinated with the internet. My response is, why
wouldn't I be? It's a catalogue of decades of human creativity and
interaction. It's open mike night at the largest club in the world, which is
also a discussion forum, and a shady back alley, and a convention. It is - to
borrow and butcher Sir Terry's words - like being blindfolded and drunk at
several different parties at once.

But, in what is rapidly becoming the sign-off on my incoherent, long-winded
ramblings that are only tangentially connected to the topic at hand, maybe I'm
just totally mad.

EDIT: tried to clarify that I wasn't trying to insult the author. Not my
intent, but it seemed to come off that way. It still does, but less so, and I
prefer not to edit my old content too much. Also, I just checked out pinboard.
Pinboard is amazing, and I am impressed.

Basically, don't take this as anything more than a tangential, incoherent
ramble started by an analogy the author used which I found unrepresentative.
Because that's what it is.

~~~
joshu
"Needs to out a bit more" given the context is hilarious. I bet a cool $20 he
is more widely traveled than you - both physically and digitally.

~~~
qwertyuiop924
Indeed, as would I, on further reflection. I wrote a postscript at the bottom
to this effect, more or less.

I write most of my HN comments on the spur of the moment. As a result, they're
often inaccurate, idiosyncratic, poorly explained, or just weird. If anybody
asks, I usually try to clear up any confusion.

This isn't necessarily a good idea, but if I thought too much before I spoke,
beyond a cursory look to see if I'm violating the rules, I'd be too afraid to
post anything interesting, or anything at all beyond polite agreement with
everybody, which is so very dull, don't you agree?

2933 votes and countless interesting discussions later, it seems to have
worked out okay for me.

~~~
AceJohnny2
Maciej/IdleWords is a bit of a sacred monster around here. He loves to hate on
the HN crowd, and he definitely has a dim opinion of the VC/get-rich-quick
internet schemes that one could characterize the Valley for. He provides a
great reality check for the kind of internet bullshit that flies around a lot.

He's featured here frequently:
[https://news.ycombinator.com/from?site=idlewords.com](https://news.ycombinator.com/from?site=idlewords.com)

On a lighter tone, I highly recommend his "Argentina on Two Steaks a Day" [1]
and "The Alameda-Weehawken Burrito Tunnel" [2], each of which had me laughing
harder than anything else in my life.

[1]
[http://idlewords.com/2006/04/argentina_on_two_steaks_a_day.h...](http://idlewords.com/2006/04/argentina_on_two_steaks_a_day.htm)

[2]
[http://idlewords.com/2007/04/the_alameda_weehawken_burrito_t...](http://idlewords.com/2007/04/the_alameda_weehawken_burrito_tunnel.htm)

~~~
qwertyuiop924
Ah. That explains a lot. He's definitely well-spoken. And I don't mind people
criticizing the VC crowd; I don't mind people criticizing stuff in general, so
long as it's well done.

------
curuinor
Rather hilariously, deep frying is already a term of art in ML, of course in a
radically different setting. Deep fried convnets
([https://arxiv.org/abs/1412.7149](https://arxiv.org/abs/1412.7149)).

~~~
argonaut
One (not especially widely read) paper's title can hardly be called a "term of
art" in ML.

------
aub3bhat
Frankly, as a grad student (the kind the author apparently considers "dim-
witted"), I find the entire article to be meaningless babbling without any
underlying theme.

I wonder if the author truly understands "Machine Learning". What are his
qualifications? A degree in Art History and some "programming experience"
aren't very reassuring. E.g.

>> "The names keep changing—it used to be unsupervised learning, now it’s
called big data or deep learning or AI"

WTF?? The author should enroll in a beginner Machine Learning course on
Udacity or Coursera before making philosophical statements about fields he has
zero clue about.

It seems the only skill the author has is piecing together meaningless
arguments that appeal to average HN users incapable of distinguishing between
informed opinions and pseudo-scientific rants. Hell, at least bad graduate
students have to take examinations, read papers, and make original
contributions that get peer reviewed (otherwise they fail, get kicked out, or
drop out). Not like this guy, who does not understand the difference between
"supervised" and "unsupervised" machine learning, yet feels comfortable making
"prophetic" statements about machine learning.

Also

>>> "These techniques are effective, but the fact that the same generic
approach works across a wide range of domains should make you suspicious about
how much insight it's adding."

What does he mean by "same generic approach"? If we assume he is implying
specific algorithms, then we have the "no free lunch" theorem, which shows
that no single algorithm is effective across all domains. Now if by "generic
approach" the author means "machine learning" in general, then it's as
ridiculous as saying

"Mathematics is effective, but the fact that the same Mathematical approach
works across a wide range of domains should make you suspicious about how much
insight it's adding."

The entire article is filled with "truthiness" and "feel-good" statements,
which fall apart on closer examination.

~~~
idlewords
My degree was in studio art, not art history.

~~~
aub3bhat
Those two are still orders of magnitude closer to each other than unsupervised
learning is to deep learning.

~~~
detaro
As someone with not particularly deep knowledge in either area, that
admittedly sounds a lot like "no, but the differences between subfields in MY
subfield are way more important than all that stuff over there", which is
similar to what you claim the article does. They are important once you care
about any details, but not for just describing changing fashions.

So I'm curious to hear a good explanation for that assertion, founded in
knowledge of both areas.

