Hacker News
Distill: a modern machine learning journal (distill.pub)
930 points by jasikpark on Mar 20, 2017 | 105 comments

I sure hope this catches on, but we should all be aware of the hurdles:

- Little incentive for researchers to do this beyond their own good will.

- Most ML researchers are bad writers, and it's unlikely that the editing team will do the work needed (which is often a larger reorganization of a paper and ideas) to improve clarity.

- Producing great writing and clear, interactive figures, and managing an ongoing github repo require nontrivial amounts of extra time, and researchers already have strained time budgets.

- It requires you to learn git, front-end web design, random javascript libraries (I for one think d3 is a nuisance), exacerbating the time suck on tangents to research.

Maybe you could convince researchers to contribute with prizes that aligned with their university's goals. Just spitballing here, but maybe for each "top paper" award, get a team together to further clarify the ideas for a public audience, collaborate with the university and their department and some pop-science writers, and get some serious publicity beyond academic circles. If that doesn't convince a university administration that the work is worth the lower publication count, what will?

In the worst case it'll be the miserable graduate students' jobs to implement all these publication efforts, and they won't be able to spend time learning how to do research.

You're absolutely right that this is a lot of work, and not many ML researchers have all the skills needed for it.

In the short term, Distill's editorial assistance will help authors produce outstanding papers, although they need to be willing to work as well.

In the longer-term, I'd like to explore match making between data visualization people who would like to get into machine learning and machine learning researchers publishing papers.

And in the very long term, I think the right solution is to add a new component to the research ecosystem. Just like we have people who specialize as research engineers, theoreticians, and experimentalists, I'd like to have a respected "research distiller" specialization. Eventually, I'd like to try and start special grants for research groups to have someone focused on this.

I fall into the longer-term category as a front end data visualization person who would like to learn more ML. Please reach out to me if you're looking for a JS volunteer to help with code review, visualization polish, or implementing new visualizations.

I already know a guy who's doing this. Although he chose to publish very short videos on various research (including many AI/ML), the concept and goal is more or less the same.

Two Minute Papers on YouTube:


Karoly does lovely work! :)

As another designer + researcher with a varied background and an interest in data viz as well as ML, I am super interested in this as a potential contributor. I have experience creating an interactive visualization interface for simple ML algorithms (which has been used by professors in the life sciences department to understand / get a new perspective on what's happening). I would LOVE to be able to be involved with Distill.

I have actually been meaning to write a paper on my findings and have been looking for journals to write for. However it doesn't quite "fit" with most journals. Distill looks like it's more catered to "professional" machine learning people, at least for now. Is there any way that somebody with my background (design+data viz+development+interest and curiosity to learn ML) could be involved with Distill?

> Is there any way that somebody with my background (design+data viz+development+interest and curiosity to learn ML) could be involved with Distill?

Absolutely. We know a number of leading ML researchers who would love to publish papers as Distill articles but don't have the design/data vis skills. We'd like to facilitate collaborations which would lead to data vis people co-authoring cutting edge research papers.

This is very exciting. How are you looking at facilitating these collaborations? Will there be a listing of sorts, where, say, ML researchers would say "I need a dataviz guy" and then dataviz specialists can apply, almost like a job (or rather more like matchmaking I guess--bad analogy)?

Or would said facilitation be done by the admins / editors / steering committee? If so, then how do you plan on finding dataviz people? I'm asking this in particular because I would imagine that people who have ML findings to talk about would probably contact you ("I researched such and such, and found such and such. Now I would love to publish in Distill"). But I wonder if data visualization specialists would do the same thing. Contacting with "hey, I love data viz, would love to collaborate with somebody looking for one" feels a little inappropriate to me.


> In the longer-term, I'd like to explore match making between data visualization people who would like to get into machine learning and machine learning researchers publishing papers.

As a data viz person, I would be absolutely thrilled to work on this, I'm trying to scratch time here and there to position myself better in that respect, learning more and trying to bridge that gap.

I left a comment on your blog announcement to this effect, but I'd love to be a "research distiller" :)

You already are. :) Love your blog.

Well, to be paid as such :)

And I rarely cover recent work.

  > In the longer-term, I'd like to explore match making
  > between data visualization people who would like to
  > get into machine learning and machine learning
  > researchers publishing papers.
I'm into data viz and interested in doing this. I'm currently plowing through the Fast.AI course, and was actually already considering creating visualisations to help test my thinking.

Thanks for bringing these points up j2kun.

I'm a junior faculty working in ML with no personal knowledge of web development, d3, etc. While the papers currently on Distill are absolutely gorgeous and will be an invaluable tool for learning advanced ML concepts, I simply cannot see myself or my students putting the time to actually create something like that.

Unless a student is especially adept at the specific tools needed to create these and especially enthusiastic at using them, I will actively discourage them from doing it. The time needed is simply not worth it right now.

I would be happy and grateful if tools for creating these articles become easier to learn and use eventually, such that even the lower-budget, time-constrained researchers could afford to create them.

From my experience, most ML researchers are in your camp. They are primarily interested in the ML, and good (not-just-in-your-head) visualizations are at best icing on the cake of their understanding.

I disagree with the first point. I'm working on a Distill article with Chris and Shan, and the major draw for this has been impact. It seems very plausible that an article on Distill has the potential to reach a far broader (and different) audience than a paper in even a top tier mathematical journal like SIAM would.

I won't deny that the time commitment needed for a Distill article is not trivial - it is far more work than a technical blog. But in terms of a pure tradeoff of time per publication, the calculus makes sense. Most of the work of research distillation and synthesis is already part of the research process, and writing a Distill article is just a matter of putting it all down on paper. Doing research is a far more time consuming and less predictable process.

I meant incentive with respect to career advancement, in the narrow sense of what metrics hiring and tenure committees use to make decisions.

To get a grad student or post-doc position, you're really just trying to convince a specific human that you're smart, useful, and to some extent personable. Metrics are a good argument for that, but having them know who you are before you apply is even better.

This applies especially if you write the distillation targeted at the lab you want to hire you.

Good points. We do believe that well-written articles save readers time on the other end, which hopefully will offset some (if not all) of the cost of producing them. We also believe that taking the time to edit your ideas not only helps your audience but helps your own thinking. Outsourcing the work to others would most likely just lead to adding a veneer to an article rather than a substantive improvement. Instead of outsourcing we're thinking about how to foster collaborations in the future.

I think you have emphasized the main point: a lot of work for a low reward. Research is more about exploring the state of the art and new avenues; popularization and graphics are more akin to book selling (for example Nielsen's open science, and other interesting books), but for a young researcher the most important and rewarding goal is to publish.

I think it depends on what type of researcher you are. In every field there are always authoritative leaders who are comfortable writing "survey papers", which is perhaps most comparable to what the "research distiller" is all about. Except, these guys know from experience that visualization of complexity is perhaps the most direct way of communicating to the brain... and the real-time interactive nature of such technologies is far beyond "book sellers", and more into how you can imagine the future of human communication more generally approaching (perhaps with support of real-time speech recognition and graphics generation AI, e.g.)... but I digress - this is most certainly a fantastic move in the right direction for the research community at large, and especially for the machine learning community where so much is happening so fast, and we really do need people to stop and help us "distill". :) I have fond memories of finally understanding LSTMs based on Christopher Olah's blog, and if we can somehow scale this up and out in other areas, I'll gladly invest time and money and energy into helping pursue the bigger opportunities here...

Well, now you need a Distill WYSIWYG, to make it usable (for most of the intended audience).

Hey let's be honest, most academics (that I know) still don't even use LaTeX (or refuse to do so). This is really cool, but requires way too many skills (in js/css3/html5/distill-extensions and node.js).

Personally, my team and I had a really great experience with sharelatex.com, even though I was the only one with any knowledge of LaTeX. I liked that it's also open source with a permissive license. I would rather host that on sandstorm.io the next time, or just pay for the comfort offered by overleaf.com (I've never seen such a beautiful collaborative LaTeX editor).

• What about vendor lock-in?

• Can you export to LaTeX, Word or PDF?

• Can you selfhost it for your team or company?

> Hey let's be honest, most academics (that I know) still don't even use LaTeX (or refuse to do so).

What field? TeX is pretty much de rigueur in Math/CS/Physics graduate schools in the U.S.

To my surprise certain subfields of CS don't use LaTeX at all or rarely and use MS-Word instead. You kind of have no choice since the conference/journal templates are only provided in one format (well you can create your own template but only if they accept PDF entries...yes some only accept Word files).

I agree with you here, and have no idea what the OP is talking about. In Math and ML, TeX is so ingrained in the culture that there are _jokes_ based on TeX puns. (Where do mathematicians go for a rational rack of ribs? The \mathbb Q)

You're right, I've myself been using git, github, keynote, ffmpeg, medium, JS, python, d3 and others to build blog posts.

I clearly don't expect people to do that much. I can only do that because I'm coming from web development, and very nice tools started to appear recently.

People in research need a design framework, like a set of templates for keynotes/PPT/JS/CSS (think about how much traction bootstrap got). Distill is doing an awesome job of showing an example of what you could do.

Maybe Distill could open-source the templates they use to build those blog posts?

They did actually! [0] The blog posts are also online on their GitHub site.

[0] https://github.com/distillpub/template


up next: a neural net that reorganizes research papers to improve clarity

Your criticism is spot on. If something like Distill existed for my own research area I would applaud it, but probably not use it because of time constraints.

On the other hand, being able to write well and to create good interactive illustrations are valuable skills. Maybe we could incorporate these things into seminars or otherwise crowdsource the creation of e.g. individual figures?

I'm not in academia, but I guess the impact (citations) you could get with a distill-like paper will be higher than the ones you get on a traditional paper-based journal.

So I guess this will help Distill get traction.

As I said in Rob's thingy, I hope you win over the tenure committees and job committees, because they don't have to respect it, but they're the ones you have to get to respect it.

All we can do is work hard to build academic support:

* In the last three weeks, we've had 80 outreach conversations with various stakeholders for Distill. The majority of these have been academic researchers. The response has been extremely positive.

* A number of ML faculty at Stanford / Berkeley / Toronto / Montreal are very excited and supportive of Distill.

* Distill's steering committee consists of recognized leaders in ML and data visualization.

* We've registered with the Library of Congress / CrossRef, dotting our "i"s and crossing our "t"s to be a serious journal. In some senses, we're more legitimate than some notable venues.

* The largest industry research groups institutionally support Distill.

My sense is that the academic community really wants to have something like this, if it can be done well. At the end of the day, we need to publish outstanding content and demonstrate that we're a high-quality venue.

Can you share a "behind the scenes" of what it took to get Distill off the ground? You hint at dotting your "i"s and crossing your "t"s, but an explicit manual would be useful. Other communities than just machine learning could benefit from something like this, and if Distill succeeds in being taken seriously by your research community, it would help to have a playbook in which to replicate that success in other research communities as well.

My concern is also that the academic and industrial support will concentrate contributions to such a journal in a few institutions. I have no doubt that Distill will have high impact and visibility among various audiences.

Yet I don't see how this will readily support possibly cutting-edge work or new research in machine learning that does not have access to visualization development, or these forged connections to Distill to facilitate the development of these visualizations.

So it seems like a likely outcome is that Distill publishes content from well-regarded institutions and increases publicity for that work, to the detriment of a vast bulk of papers which do not have access to the visualization resources to develop Distill-ed versions of their work.

Furthermore, and this is a larger disciplinary issue, but it seems inherently this could end up spotlighting more CS-y machine learning vs statistical learning due to cultural differences between disciplines and differences in computational/web development background in grad students and researchers in both fields. Are there efforts to reach out to statistical associations as well?

It varies heavily by institution and country, but CS is moving increasingly towards caring about citation metrics above anything else (with "selectivity", i.e. a high bar for peer review and low acceptance rate, being the main other factor). Unlike in most other fields, conference papers therefore hold weight, not only journal articles. This does sometimes cause trouble at higher levels of large institutions, where a CS dept strongly recommends a candidate for tenure, but when the case makes it up to the dean level, the dean, who is a physicist or biomed person, wonders how they could possibly recommend tenure for someone who has "just" a bunch of conference papers and few journal articles. But that is becoming rarer at places with top CS departments.

Anyway, as a result, I don't see a reason why an alternative-format journal would necessarily fare any worse than conferences have in terms of becoming accepted, if the reviewing standards are high and if it attracts citations.

For the hiring side (more than the tenure side), to some extent, oddly enough, the first-order decision here is in Google's hands. A lot of CS hiring committees nowadays unofficially do a first cut sifting of resumes by typing candidates into Google Scholar and looking at their Google-computed h-index, so what "counts" is basically up to Google.
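For readers who haven't met the metric: the h-index is the largest h such that an author has h papers with at least h citations each. A minimal Python sketch of that definition follows. The citation counts are made up for illustration, and this says nothing about how Google Scholar actually gathers or deduplicates its citation data.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for one candidate's papers.
example_counts = [25, 8, 5, 3, 3, 1]
print(h_index(example_counts))  # 3
```

Note that one heavily cited paper barely moves the number, which is part of why the choice of what "counts" as a citable venue matters so much.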

Agree with a lot of this. If Google wanted they could probably even give some extra boost to forms of publication they endorse. I'd love to see Open Access weighted higher in Scholar and they could add an extra boost for "interactive examples" or "available data sets". I think you're spot on that they hold quite a bit of power (high GS ranking is also an incredible citation boost for the typical "tack on citations").

I see comments like this all the time, and while what you say is correct, I think committees increasingly appreciate this sort of thing - frankly they have to or they will miss out on some of the most innovative people. There is plenty of "standard" already (nothing wrong with that of course).

With new things, what you need is at least one person on the committee to fight and convince the others why this new thing is awesome. As someone who is now on some of these committees, I would put all my weight behind something like this should I encounter it (assuming of course it has the relevant quality).

In my (incredibly limited) experience, Impact Factor is also a consideration.


Looks simply amazing and looking forward to getting deeper into it.

As a side note who made the interface design for this?:


I am very interested in getting into this space from a design perspective.

I'm one of the editors of distill and I designed the interface for the playground, along with my awesome colleague Daniel Smilkov.

It's really great work. Would love to connect with you guys and talk some more about what you are doing.

Hi Chris,

Thank you for this effort. I'm a fan of your blog articles. A question regarding Distill: is it a journal like a conventional journal, targeting new research? Or is it a journal for educational articles that explain older research better?

I hope to contribute to an effort to better explain deep learning. I don't know if that is what distill is looking for?

We're interested in both review/tutorial articles and novel research articles. :)

So would an article explaining the basics of, say, dynamic programming be of interest? Are there "page" limits? For example, would a tutorial article that is about 20-30 pages in a traditional paper format be okay?


How do I donate to this?

Just by spreading the word :)

I've been trying to read more primary source information, sort of as my own way of combatting "fake news" but before that term was coined. There's a learning curve to it, but I've found that reading S1 filings and Quarterly Earnings Reports can be more enlightening than reading a news article on any given company. Likewise, reading research papers on biology and deep learning is significantly more valuable than reading articles or educational content on those topics.

As you'd imagine though, it's really hard. Reading a two page research paper is a very different experience from reading a NYTimes or WSJ article. The information density is enormous, the vocabulary is very domain specific, and it can take days or weeks of re-reading and looking up terms to finally understand a paper.

I'm really excited about Distill, there's a lot of value in making research papers more accessible and interesting. I've noticed that the ML/AI field has been very pioneering about research publication process, some papers are now published with source code on GitHub and the authors answering questions on r/machinelearning. This seems like a really great next step, I hope other fields of science will break away from traditional journals and do the same.

I don't want to undermine visualizations, they are awesome, but one of the big problems I see with ML research is the lack of reproducibility. I know that Google, Facebook and some others already share associated source repos, but it should almost be mandatory when working with public benchmark datasets. Source + Docker images would be even better.

I worked in clinical research in a past life and studies would be highly discounted if they couldn't be reproduced. A highly detailed methods section was key. Many ML papers I see tend to have an incredibly formalized, LaTeX+Greek obsessed methods section, but fall far short of anything that would allow reproduction. Some ML papers, I swear, must have run their parameter searches 1,000 times to overfit and magically achieve 99% AUC.

Worse, I actually have tons of spare GPU farm capacity I'd love to devote to reproducing research, tweaking, trying it on adjacent datasets, etc. But the effort to reproduce is too high for most papers.

It is also disappointing to see various input datasets strewn about individuals' personal homepages, and sometimes end up broken. Sometimes the "original" dataset is in a pickled form after having already gone through multiple upstream transformations. I hope Distill can instill some good best practices to the community.

I think that having a venue that can publish non-traditional academic artifacts is an important step for reproducibility, even if it isn't our focus.

It seems clear to me that the future will involve some kind of linking reproducibility to papers. If we want to find that future, we need a way for people to experiment with what a publication is.

Jupyter notebooks are a big piece of solving ML reproducibility, it feels like.

I see this a lot, but I disagree, at least in their current form. They miss a variety of very key parts for reproducibility (which, to be fair, was not their original goal).

* Dependencies like libraries are not specified anywhere.

* Dependencies on local code are not bundled.

* Dependencies on local data are not bundled.

* Underlying requirements like LLVM are not specified (it needs to be specifically 3.9.X for llvmlite in Python, as I discovered recently).

* Perhaps most dangerously, you can run the code sections out of order, and deleted sections will leave their variables around which can interfere with the run. I've been caught out by this in my own notebooks.
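The out-of-order problem in the last bullet is at least detectable after the fact. Here is a rough Python sketch, assuming the standard nbformat 4 JSON layout, where each code cell carries an `execution_count`. The function name and the sample notebook are hypothetical. It flags code cells whose counters do not match a clean top-to-bottom run starting at 1, which also catches cells that were never executed.

```python
import json


def out_of_order_cells(notebook):
    """Return indices of code cells whose execution_count breaks a clean 1..n run."""
    problems = []
    expected = 1
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells have no execution counter
        if cell.get("execution_count") != expected:
            problems.append(index)
        expected += 1
    return problems


# A tiny hypothetical notebook in the nbformat 4 JSON layout.
sample = json.loads("""
{
  "nbformat": 4,
  "nbformat_minor": 2,
  "metadata": {},
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["# Demo"]},
    {"cell_type": "code", "metadata": {}, "execution_count": 1,
     "outputs": [], "source": ["x = 1"]},
    {"cell_type": "code", "metadata": {}, "execution_count": 5,
     "outputs": [], "source": ["print(x)"]}
  ]
}
""")

print(out_of_order_cells(sample))  # [2]
```

A check like this only tells you the saved state is suspect. It does nothing about the missing dependency, code, and data bundling in the other bullets.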

I really like jupyter notebooks, but I think some of the design decisions (correct for some ways of working) actively work against reproducible reports.

There was a recent writeup here:

> we were able to successfully execute only one of the ~25 notebooks that we downloaded.


Right, "a part" was important. Looks like the authors of that writeup agree.

> Technologies such as Jupyter and Docker present great opportunities to make digital research more reproducible, and authors who adopt them should be applauded.

I somewhat disagree that it's a big part or even really should be a part of the solution, I'm really not sure that these notebooks are the right approach to making reproducible research. The conclusion there doesn't seem supported by their findings, to me.

I think they solve a different use case well, and forcing them into a workflow they weren't designed for may just result in both less useful workbooks and a poor experience.

Edit - To expand a little, jupyter notebooks are nice to mix code and descriptions, and in essence force people to release a certain amount of their code. But other than that they actually provide fewer of the guarantees that you want from things for reproducibility. And since the goals for reproducibility generally force more restrictions on how you work, I can see there being more issues for trying to match these different ways of working.

I don't see how there are any features which are useful for the goal of making things reproducible, and as such why people keep bringing them up as a solution.

The main steps would seem to be

1. Make sure the results used are not generated on "my machine" but on a specified base run somewhere else. Just like we don't take the unit test results I run locally as gospel.

2. Unique and versioned identifiers for code, base system and data.

3. Archived code and data.

4. An agreed on format in the output data to say where it came from (which references the identifier(s) for the code, base system used and input data)
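As a rough illustration of steps 2 and 4 only, here is a Python sketch that fingerprints the code and input data with SHA-256 and embeds those identifiers in the output. The field names, the helper functions, and the 0.91 "result" are all made up for the example; there is no agreed format behind this. Steps 1 and 3 (running on a specified base system, archiving) are outside what a snippet can show.

```python
import hashlib
import json
import platform
import sys


def sha256_hex(payload: bytes) -> str:
    """Content hash used here as a unique identifier for code or data."""
    return hashlib.sha256(payload).hexdigest()


def provenance_record(code: bytes, data: bytes) -> dict:
    """Describe where a result came from: code, input data, and base system."""
    return {
        "code_sha256": sha256_hex(code),
        "data_sha256": sha256_hex(data),
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
    }


def tag_results(results: dict, code: bytes, data: bytes) -> str:
    """Serialize results together with the provenance that produced them."""
    payload = {"results": results, "provenance": provenance_record(code, data)}
    return json.dumps(payload, sort_keys=True, indent=2)


# Hypothetical stand-ins for a training script and its input dataset.
code_bytes = b"print('train model')\n"
data_bytes = b"x,y\n1,2\n3,4\n"

# The 0.91 below is a placeholder value, not a real measurement.
print(tag_results({"auc": 0.91}, code_bytes, data_bytes))
```

In practice the code identifier would more likely be a commit hash and the base system a container image digest, but the idea is the same: the output names its inputs.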

Your output might be a rendered notebook, but the notebook itself is entirely orthogonal to the process, as what a notebook provides is:

* A nice interface for entering the code

* A nice output format

* A neat way of mixing nicely written documentation along with the code

The announcements and About page indicate an emphasis on visuals and presentation, which I appreciate. But when I think of "modern machine learning," I think of open-source and reproducibility (e.g. Jupyter notebooks).

Will the papers published on Distill maintain transparency of the statistical process?

I see in the submission notes that articles are required to be in a public GitHub repo, which is a positive indicator, although the actual code itself does not seem to be a requirement.

I totally agree that this is very important. While it isn't currently our primary focus, having a publishing platform that can accommodate a variety of content types (including code and data) feels like a step in the right direction.

As a developer with a weaker background in mathematics, I face a language barrier with many modern algorithms. After lots of research I can understand and explain them in code, but I have no idea what your artistic-looking MathXML means.

Visualizations or algorithms described using code are much, much easier for me to understand and serve as a great starting point for unpacking the math explanations.

I understand where you're coming from and you raise a valid point, but the ML/AI field is heavily academic and oriented around research. The target audience is people with a very strong math background and the necessary context.

I would recommend picking up a book on Comp Sci or algorithms, even just a cursory reading helps a lot. CS is very much not just programming and it is heavily restricted by descriptions through code.

Shameless self-plug: If you like interactive explanations, check out http://explorableexplanations.com/ and the explorables subreddit: https://www.reddit.com/r/explorables/

Is there any concern about a web-native journal being less "future-proof"? I've come across quite a few interactive learning demonstrations in Flash/Java that no longer work.

This is a high-priority for us. By focusing on web-standards and avoiding proprietary plugins we're pretty confident that the content will be future-proof.

Something that could help is perhaps a choice that examples should work in (e.g.) Firefox recent.x on ubuntu, then provide a VM and archived version of firefox. Put it on a platform that archives things with C/LOCKSS and get a doi, then although you're not expecting people to use it on a daily basis, it'd cover several "worst case" kind of scenarios.

Of course that's not completely permanent, but would perhaps provide some more safety.

Also in addition to the sibling comment, the published articles will be on github under their organization.

I feel like binding the journal to GitHub means that it's less likely to exist over the long term (where long term means >100 years, which is as long as I would expect an academic article to be accessible for).

We produce "archive html" files where everything is bundled into a single file. We're looking into ensuring their long-term preservation with projects like LOCKSS.

Example: http://distill.pub/2016/augmented-rnns/index.archive.html

A simple intermediate step would be to archive with Zenodo

YC Research's (and longtime HNer!) michael_nielsen wrote an announcement here: http://blog.ycombinator.com/distill-an-interactive-visual-jo.... Hopefully he'll participate in the discussion too.

I wish there was a way to subscribe to a weekly email related to this.

There does seem to be an RSS feed: http://distill.pub/rss.xml Although it is not advertised on the website (I did view-source to find it).

Should you plug that in to IFTTT, Zapier, or something to that extent, you hopefully then have a weekly feed.
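If you'd rather roll your own, the filtering step is small. Below is a Python sketch that keeps only the items from the last seven days, assuming the feed is plain RSS 2.0 with `<item>` entries carrying `<title>`, `<link>` and `<pubDate>` (I have not checked that Distill's feed looks exactly like this). The sample feed and the function name are invented for the example, and fetching and emailing are left out.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
import xml.etree.ElementTree as ET


def items_since(rss_xml, cutoff):
    """Return (title, link) pairs for RSS 2.0 items published at or after cutoff."""
    root = ET.fromstring(rss_xml)
    recent = []
    for item in root.iter("item"):
        pub_date = item.findtext("pubDate")
        if pub_date is None:
            continue  # skip items without a publication date
        published = parsedate_to_datetime(pub_date)
        if published.tzinfo is None:
            # Treat dates without a usable offset as UTC so comparison works.
            published = published.replace(tzinfo=timezone.utc)
        if published >= cutoff:
            recent.append((item.findtext("title"), item.findtext("link")))
    return recent


# Invented sample feed standing in for the real one.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Example feed</title>
<item><title>Old post</title><link>http://example.com/old</link>
<pubDate>Wed, 01 Mar 2017 00:00:00 GMT</pubDate></item>
<item><title>New post</title><link>http://example.com/new</link>
<pubDate>Sat, 18 Mar 2017 00:00:00 GMT</pubDate></item>
</channel></rss>"""

# Fixed "now" so the example is deterministic.
cutoff = datetime(2017, 3, 20, tzinfo=timezone.utc) - timedelta(days=7)
print(items_since(SAMPLE_FEED, cutoff))  # [('New post', 'http://example.com/new')]
```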

Though I do agree, an option to signup to updates directly on the website would be much better ;)

This is great, but it would have been even better if Distill was designed to play well with the current system. The vast majority of researchers are focused on publishing at various conferences with strict deadlines. Even if they had all the skillsets and time to produce these beautiful illustrations, I highly doubt this will change.

Also, it is very likely that veterans in the field might think of this format as too verbose and too sugar coated, more appropriate for less math-savvy users and therefore not mainstream. Furthermore, I really feel TeX is irreplaceable unless you have all of its features covered. All of the historic efforts to replace TeX in research - even with the bells and whistles of WYSIWYG editors - have failed, and it's important to learn from those failures. You would be surprised how many researchers insist on printing out the paper for reading even when they have access to tablets and PCs.

Instead of being another peer reviewed journal, Distill could act as the following:

- platform to publish supplemental material and code

- platform to manage communication/issues post publication

- platform for readers to invite other readers for peer review and generate "front page" based on some sort of reviewer trust relationship.

- platform to host Python and MatLab code with web frontends without researchers having to learn new developer skills

- support pdf submissions but without all the eliteness of arxiv and using algorithms to create the "front page" based on some sort of peer reviewer rankings.

Above features are indeed sorely missing and Distill has good opportunity to become an "add-on" to current academic publishing systems as opposed to another peer reviewed journal.

This is really exciting! Chris et al: have you guys seen Keras.js (https://github.com/transcranial/keras-js)? It could probably be useful for certain interactive visualizations or papers.

How does this provide IF ratings? Probably irrelevant for industry, but publishing in academia is all about IF, no matter how bad and corrupt one might think it is.

And what about long-term stability/presence. Most top journals and their publishing houses (NPG, Elsevier, Springer) are likely to hang around for another decade (or two...), while I don't feel so sure about that for a product like GitHub. Maybe Distill is/will be officially backed (financially) by the industry names supporting it?

That being said, I'd love seeing this succeed, but there seems much to be done to get this really "off the ground" beyond being a (much?!) nicer GitXiv.

Our present JIF is undefined because we haven't existed for two years yet.

If you just apply the formulas anyways, you'll get a JIF of (6 citations)/(4 publications) = 1.5. Again, this number is really pessimistic because those publications are only a few months old and haven't had time to accumulate citations.
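For context, the usual two-year impact factor for a year is the number of citations received that year by items the journal published in the previous two years, divided by the number of citable items it published in those two years. A small Python sketch of that ratio, using the 6 and 4 quoted above (the function and parameter names are just for illustration):

```python
def two_year_impact_factor(citations, citable_items):
    """Citations to the prior two years' items divided by the count of those items."""
    if citable_items <= 0:
        raise ValueError("impact factor is undefined without citable items")
    return citations / citable_items


# Numbers from the comment above: 6 citations across 4 publications.
print(two_year_impact_factor(6, 4))  # 1.5
```

With a denominator this small, every additional citation moves the figure by 0.25, so the number says little at this stage.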

> And what about long-term stability/presence.

We aren't particularly tied to GitHub besides it being convenient. Even if the journal died, keeping it up indefinitely would be very cheap.

More than that, we're looking into joining projects like LOCKSS to ensure preservation of the academic record.

> but there seems much to be done to get this really "off the ground" beyond being a (much?!) nicer GitXiv.

We've actually done a lot of the logistics needed to legitimize a journal. We've registered as a journal with the Library of Congress, joined CrossRef, and built infrastructure to integrate our metadata with the library system.

Of course, there's a lot more to do. But the biggest thing is to just publish great content and run Distill as a serious, high-quality venue.

I for one am not so convinced GitHub is likely to be around for another decade or two. But whatever, let's just pretend that Distill can always find a free hosting solution, that is not so unlikely. Maybe that's good enough?

Re. IF, sorry if my first post wasn't as obvious as I thought it would be. I wasn't referring to how IF is calculated, much less to Distill's current IF. Rather, there are two big problems related to IF that Distill needs to "solve": not the how, but rather the when and who of IF:

Ad when: The chicken-and-egg problem. As colah3 wrote, Distill's IF will only become meaningful in two years. But if you have exciting research, you want that to be in a high-impact journal/venue now. So attracting good research as a new journal/venue is extremely difficult, and probably the one main reason why new journals fail (cf. the number of new journals/venues and the mostly non-existent change in impact rankings of the "best" places to publish). However, if you can get private researchers in industry to publish in Distill, because they are not [so] "dependent" on IF, you might accumulate sufficient impact in the first two years to get to a nice score that later makes Distill competitive with the various IEEE journals or JMLR.

Ad who: The even worse problem that (at least European, not sure about US) universities evaluate their researchers by looking up their Web of Science ranking/score. WoS in turn is controlled by Thomson Reuters (TR), who also decide which journals get ranked in WoS (and sell access to WoS to universities and governments - n/c...). If a journal is not "recognized" by WoS, the publication or its citations do not get counted by TR. Ergo, as a public researcher, your funding dries up and/or you don't get the promotions you need. For that reason alone, no researcher in public research will allow her/his students and postdocs to publish in a journal that is not indexed by TR/WoS. But again, you might get around that by behaving "like" arXiv at first, at least: Most journals now grudgingly accept that the work was first on arXiv before it got published in some high-impact journal or venue. And maybe there is even a chance that the publishing industry will have to accept Distill in their midst (i.e., index it in WoS) if some other industrial backers create enough pressure...

As might be clear from the above, I (like many researchers) am fed up with the current publishing system, so I certainly hope a "self-hosted", free solution controlled by the public [researchers] will one day break the iron fist the current (private) publishing houses exert over how research is managed and evaluated today. If Distill manages to keep itself independent from industry, but at the same time can use the political weight its current backing could bring, maybe this is a way to break this vicious cycle?

While this is very nice, I'm a bit confused about the target. What kind of material is intended to be published here in the future?

Because the blog post and title seem to be describing it as a "journal" intended to replace PDF publications, but the actual content appears to be more in the tutorial/survey category, e.g. "how to use t-SNE," etc. Is this intended to be a place to publish new research in the future, or is it meant more for enhanced "medium"-style blog posts?

Both are fine, I just find the dissonance between the announcement and the actual content a bit confusing.

I feel like science publication in general could benefit from disruption of the publishing model. I'm not sure that the toolkit that Distill has provided is quite enough to totally change the paradigm, and it is currently restricted to only one field.

I like the idea of making research approachable for the non-scientist, and the more important question of whether there is a more efficient form for research papers to take (in terms of communicating new science between scientists).

Is there any relevant work along this vector of thought that I should check out? Because I would really love to do some work on this.

Yes, check everything made by Bret Victor and his explorable explanations.

I made an awesome list recently just for this topic: github.com/sp4ke/awesome-explorables

Would saving Jupyter notebooks as .html work? PS: I have published in all of the top-4 tier ML conferences but suck at html/css/js. What is my pathway to Distill now? I, like every other researcher worth her/his salt, am always running behind the clock when it comes to deadlines and literature to review. So, yeah? Coaxing myself into investing time in css/html/js in lieu of picking up more math tools seems criminal to me. Am I alone in this?

Wow this comes with great timing!

I am a UI-developer who has been wanting to learn ML forever. I started working on

1. fast.ai

2. think bayes

3. UW data science @ scale w/ coursera

4. udacity car nano degree

I'm going to write some articles about what I learn and hopefully move into the ML field as a data engineer in 6 months. I figure I got into my current job with a visual portfolio of nicely designed css/js demos, maybe the same thing will work for AI.

I don't see it written explicitly; can anyone confirm that this journal is fully open-access?

Yes. Everything is published under Creative Commons Attribution.

(One of the members of our steering committee, Michael Nielsen, has a significant history advocating for open science. I think there's about a snowball's chance in hell he'd be involved if we weren't. :P )

It's not super clear what, if any, license is offered for code and data, e.g. from http://distill.pub/2016/misread-tsne/

> Diagrams and text are licensed under Creative Commons Attribution CC-BY 2.0, unless noted otherwise, with the source available on GitHub. The figures that have been reused from other sources don’t fall under this license and can be recognized by a note in their caption: “Figure from …”.

Ideally code and data would be unambiguously public domain (CC0-1.0) or under appropriate open source and open data licenses.

> Everything is published under Creative Commons Attribution.

this is tres bien.

same for data sets?

That would preclude most research data.

If you use Wikipedia as an input, for example, your data is CC-By-SA, not CC-By.

Seems to be here: http://distill.pub/journal/

Passages like:

"Distill articles must be released under the Creative Commons Attribution license."

With a little more flexibility to keep things private before publishing: "You can keep it private during the review process if you would like"

You should definitely assign a DOI to each article.

Distill does assign DOIs. There is a citation_doi meta tag in the page source, and you can also find a complete list here: https://search.crossref.org/?q=Distill&publication=Distill
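If you want to pull that tag out of a page programmatically, here is a rough sketch using only the Python standard library. The sample HTML and the DOI value in it are made up for illustration (they are not copied from a real Distill page), and `DoiMetaParser` / `extract_doi` are hypothetical helper names.

```python
from html.parser import HTMLParser
from typing import Optional


class DoiMetaParser(HTMLParser):
    """Records the content of the first <meta name="citation_doi"> tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.doi: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Ignore non-meta tags and anything after the first match.
        if tag != "meta" or self.doi is not None:
            return
        attributes = dict(attrs)
        if attributes.get("name") == "citation_doi":
            self.doi = attributes.get("content")


def extract_doi(html: str) -> Optional[str]:
    """Return the citation_doi value from an HTML string, or None if absent."""
    parser = DoiMetaParser()
    parser.feed(html)
    return parser.doi


# Made-up page source; the DOI below is a placeholder, not a real identifier.
SAMPLE_PAGE = """<html><head>
<meta name="citation_title" content="Example article">
<meta name="citation_doi" content="10.0000/example.00001">
</head><body></body></html>"""

print(extract_doi(SAMPLE_PAGE))
```

The returned string could then be pasted into a `doi` field of a BibTeX entry by hand.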

I agree that the DOI should be included in the BibTeX citation.

I see! Yes, this is something I miss a lot on Google Scholar (I have to go to the article page to search for the DOI field). It would be nice to also display the DOI link somewhere near the author list since it seems standard practice, but in the citation section would be good as well.

Each article currently gets a DOI. DOI: http://doi.org/10.23915/distill ISSN: 2476-0757

Uh, no? That's the DOI and ISSN for the journal, not for each article. The BibTeX code at the bottom of each article doesn't include a DOI either.

Looks very good (especially the team behind it!), but I wonder if there's a discrete step down to where you make machine learning materials accessible to the general public beyond data visualizations and clear writing. This will certainly be a more interactive experience, but it seems to cater to those who are "in-the-know" and require a bit more interactivity/clarity. It'd be nice to discuss the format changes or the "TLDR" bot of machine learning that makes machine learning research truly accessible to the general public.

This is amazing! My burning question - as has been pointed out in the thread, the effort to produce a great article on Distill - generating interactive figures, doing front end web dev etc. would require a lot of time and resources on the part of the researchers. Is it possible to include within Distill an option to connect researchers to willing-and-able developers in those domains (for example, me) to help them get it done?

I already have a nomination. The guy who wrote this blog post:


It's the only way I could get a working model of Caffe while understanding the data preparation steps. I've already retrofitted it to classify tumors.

Great stuff! I'm a fan of what's gone up on Distill so far. Question for colah and co if they're still around: When does the first issue of the journal come out? (Edit: looks like individual articles just get published when they get published, n/m.) Also, that "before/after" visualization of the gradient descent convergence is intriguing -- where's it from?

Find out in a week!

I don't know jack about machine learning, but these illustrations are gorgeous - simple, elegant, and aesthetically very pleasing.

Looking at the how-to section[1] for creating Distill articles, I can't find how to write math, or any notes on how best to reference sections of the document.

Other than that, this looks much, much easier to write than LaTeX.

[1] http://distill.pub/guide/

It would be cool to see greater diversity of thinking on the about page. Perhaps the pub is designed for insiders.

Having more research transparency is great for a community of like minds to learn from. A suggested addition is a section and team to lead a discussion of ML ethics.

I will definitely submit my first paper to Distill. It draws upon a few different fields but the foundation is definitely machine learning.

What a time to be alive!

Does anyone here have any idea if Jupyter notebook -> save as .html would do the trick?

Hopefully this won't be another ResearchGate dressed in open source clothing.
