
Readability is important to me. What can I do better? I always appreciate feedback.



It is an interesting article and some of the results are unexpected, but it is far too long and could be greatly condensed. At first glance, and speaking from how I like to lay things out in a paper:

- You do not really need to show the prompt interface. You can show it once and thereafter use a bulleted list format, or simply show the input image if it responded correctly.

- Your figures should be about half their size; they don't need to fit the width of the body.

- For comparative results against other models, you can use a table with colored cells, with the test names on the rows and the model names on the columns.

- For the dog, show a side-by-side figure with the raw image on the left and the bounding box on the right, and include the coordinates it gave you in the body.

- In your conclusion, show the full matrix table of comparative results and summarize the model's relative strengths against the others.

In terms of the writing and methods, your conclusion says little and your tests do not go into significant depth. For example, with the tire image you could show that the model succeeds when the photo is cropped, but that as the photo gets wider it begins to fail to correctly identify the text at the image's center. See the methodology and presentation this article used: https://dynomight.net/ducks/

Also, the OCR test is too simple; even a 20-year-old OCR algorithm would probably recognize that text. Experimenting with progressive degradation of the image would show the model's strengths, and analysis could show its accuracy at each level of degradation (see the sketch below).
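
To make that concrete, here is a rough Python sketch of the kind of harness I mean. It assumes Pillow is available; run_ocr is a made-up stub for whichever model is under test, and the file name and ground truth are placeholders:

    import random
    from PIL import Image, ImageFilter

    def run_ocr(img):
        """Stub: swap in a call to whichever model is under test."""
        raise NotImplementedError

    def degrade(img, level):
        """Blur harder and corrupt more pixels as `level` rises."""
        out = img.filter(ImageFilter.GaussianBlur(radius=level))
        px = out.load()
        w, h = out.size
        # Overwrite a growing fraction of pixels with random noise.
        for _ in range(int(w * h * 0.02 * level)):
            x, y = random.randrange(w), random.randrange(h)
            px[x, y] = tuple(random.randrange(256) for _ in range(3))
        return out

    def char_accuracy(pred, truth):
        """Crude per-character accuracy; edit distance would be fairer."""
        hits = sum(p == t for p, t in zip(pred, truth))
        return hits / max(len(truth), 1)

    img = Image.open("sign.png").convert("RGB")  # hypothetical test image
    truth = "STOP"                               # hypothetical ground truth
    for level in range(6):
        pred = run_ocr(degrade(img, level))
        print(f"level {level}: {char_accuracy(pred, truth):.0%}")

Plotting accuracy against degradation level for each model would say far more than a single pass/fail result.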


I'd say, trying to read this, the biggest problems are:

- tons of visual clutter: all those gradients and lines, like the header or hero image

- a floating ToC which insists on jamming in 'recommended links' (?!) the entire time

- no outlines, so every single image or screenshot blends into the actual article

- a visual summary which is hard to read because it has tiny text and looks like a correlation heatmap instead of a table

- highly inconsistent use of linking. Like, why does 'We have evaluated Gemini across four separate vision tasks:' link only 2 of the 4, and then not to the section in this article?

- highly repetitive screenshots, which add nothing. In conjunction with the lack of outlines around the images and the many outlines inside the images, they make the benchmark sections a frustrating visual jigsaw puzzle where you have to decode screenshot after screenshot to read the tiny text inside. It would be better to provide one (1) screenshot of each model's UI, which is all I need to get an idea of what it looks like, the implied workflow, and what sort of metadata/options it has, and then for each task simply show the image/prompt and each model's response as a normal blockquote or text.




All: I sincerely appreciate the time spent sharing feedback. Your notes and comments are helpful and give me tools to be a better writer.

Regarding the screenshots, I am not a fan of this approach. We adopted it because of the early trend of sharing ChatGPT screenshots, and to ensure people could see the origin of our prompting (the web interface).

I will start a discussion about screenshots with the team. This can be better.

I will discuss the layout, too. Machine learning and AI are difficult enough; to the extent that we can focus attention on the most important part of the page, the content, we should.

Thank you again for your notes! I appreciate it.


While I don't necessarily agree with all of these points,

> link only 2 of the 4, and then not to the section in this article?

This one is particularly prevalent on websites, and it's quite annoying. When a site has topic-explainer articles, the terms that refer to those topics are always linked to those other articles, presumably to increase ad impressions and keep users on the site. But when there are legitimate article-specific links (which are almost always what I want, for instance when tracking down an original source), I have no way to locate them.

Back in the day, websites would use a different link style for these sorts of "internal plug" links, which was helpful. I guess that died out because users didn't want to click them. So the solution is: make it hard to tell which ones are internal plugs!


Obviously the core content wasn't AI-generated, but the article reads like unfiltered AI output. It doesn't really 'flow', for lack of a better word.


I'll note that your article was very easy to read in Safari's reader mode.



