
Ask HN: Is there a site providing example files for “all” media types? - salzig
Hej HN.

I would love to know if there is a good site/source for files of different media types.
======
khedoros
What exactly do you want to do? A lot of the other answers assume that you
want examples of different h.264 encodings, JPEG image varieties, etc.

In my own side projects, I'm often dealing with old, badly-documented formats
with limited examples (like data files for games). I usually start with the
"file" command to try to identify the filetype, then look on Wikipedia and
filext.com to find links to format specifications. Usually, I can also find
the name of any programs that create/edit/view that file type, and that's a
jumping-off point to find examples (given the domain, it'll be anything from
another game that uses the format to a 90s-era fan page with modified or
fan-created data files).
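
For anyone without "file" handy, the same first step (checking the leading magic bytes) can be sketched in a few lines of Python. This is only a rough illustration: the `sniff` and `sniff_file` names are made up here, and the signature table covers just a handful of well-known formats, nowhere near what the real "file" database knows.

```python
# Hypothetical stand-in for the "file" command: match leading magic bytes.
# The table below is a small illustrative subset, not a complete database.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive (also docx, xlsx, jar, ...)"),
    (b"BM", "BMP image (weak two-byte signature)"),
]


def sniff(data: bytes) -> str:
    """Return a description for the first signature that matches."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first few bytes of a file and classify them."""
    with open(path, "rb") as f:
        return sniff(f.read(16))
```

Anything that comes back as "unknown" is where the Wikipedia and filext.com digging starts.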

I've used this site before too:
[https://wiki.multimedia.cx/index.php?title=Main_Page](https://wiki.multimedia.cx/index.php?title=Main_Page)

It provides links to a lot of format specifications, codec information,
sometimes the mplayer samples that other comments here have linked to, etc.

------
bariumbitmap
I started a project in this vein a while ago:

[https://github.com/nbeaver/mimetype-menagerie](https://github.com/nbeaver/mimetype-menagerie)

It organizes files by mimetype. It's not complete, but it might be a good
starting point.
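
If you want to sort your own pile of samples the same way, here is a minimal sketch using Python's standard `mimetypes` module. It is not how the repository above is built, just one possible approach. It guesses purely from the file extension, and the `group_by_mimetype` name and the fallback type are assumptions for illustration.

```python
import mimetypes
from collections import defaultdict


def group_by_mimetype(filenames):
    """Group file names by the MIME type guessed from their extension."""
    groups = defaultdict(list)
    for name in filenames:
        mime, _encoding = mimetypes.guess_type(name)
        # Assumption: lump anything unrecognized under a generic binary type.
        groups[mime or "application/octet-stream"].append(name)
    return dict(groups)
```

Since this only looks at extensions, a misnamed file ends up in the wrong group. Checking the content itself would need something like the "file" command.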

You can also look at the testcase folder for afl-fuzz, which includes
archives, images, and even an H264 compressed video:

[https://github.com/arisada/afl-fuzz/tree/master/testcases](https://github.com/arisada/afl-fuzz/tree/master/testcases)

------
Someone
You may want to look at
[http://fileformats.archiveteam.org/wiki/Category:Graphics](http://fileformats.archiveteam.org/wiki/Category:Graphics).
Collecting sample files from there is a bit of a hassle, but there are quite
a few obscure formats.

On the same site,
[http://fileformats.archiveteam.org/wiki/Encyclopedia_of_Grap...](http://fileformats.archiveteam.org/wiki/Encyclopedia_of_Graphics_File_Formats)
points to an archive.org copy of a CD ROM with sample images.

Via its BMP page, I found
[http://entropymine.com/jason/bmpsuite/](http://entropymine.com/jason/bmpsuite/),
which looks like the definitive resource on that format.

------
jewel
I've used the ffmpeg samples before:

[https://samples.ffmpeg.org/](https://samples.ffmpeg.org/)

Not nearly complete, of course. I can't find a set of test images from the
imagemagick project.

This would be an awesome github project; it wouldn't even need an associated
web page. You'd be amazed at all the different varieties of "legal" JPEG
images, for example.

~~~
niftich
Incidentally, [https://samples.ffmpeg.org/](https://samples.ffmpeg.org/) and
[https://samples.mplayerhq.hu/](https://samples.mplayerhq.hu/) are actually
mirrors of the same collection, it seems.

------
ninjakeyboard
Define "all" media types. Do you mean something like an example of each and
every one of these?
[https://svn.apache.org/repos/asf/httpd/httpd/trunk/docs/conf...](https://svn.apache.org/repos/asf/httpd/httpd/trunk/docs/conf/mime.types)
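
If that list is the target, a rough sketch for checking coverage against it could look like the following. It assumes the usual layout of that file (a MIME type followed by whitespace-separated extensions, with `#` starting a comment). The `parse_mime_types` and `missing_types` names and the tiny `SAMPLE` excerpt are made up for illustration.

```python
# Tiny hand-written excerpt in the mime.types layout, for demonstration only.
SAMPLE = """\
# This is a comment
# application/activemessage
application/pdf\t\t\tpdf
image/jpeg\t\t\tjpeg jpg jpe
video/mp4\t\t\tmp4 mp4v mpg4
"""


def parse_mime_types(text):
    """Map each MIME type to its list of extensions, skipping comments."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        mime, *extensions = line.split()
        if extensions:
            mapping[mime] = extensions
    return mapping


def missing_types(mapping, collected_extensions):
    """List MIME types for which no sample extension has been collected."""
    have = {e.lower().lstrip(".") for e in collected_extensions}
    return sorted(m for m, exts in mapping.items() if not have.intersection(exts))
```

Calling `missing_types(parse_mime_types(SAMPLE), [".pdf", "JPG"])` would report which types in the excerpt still lack an example file. Types that appear only as commented-out entries are ignored by this sketch.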

~~~
salzig
yes, kinda.

------
doykle
Apache Tika is a text extraction toolkit. They store a wide selection of file
types for their parser tests:
[https://github.com/apache/tika/tree/master/tika-parsers/src/...](https://github.com/apache/tika/tree/master/tika-parsers/src/test/resources/test-documents)

For large sets of some common media types take a look at the govdocs1 corpus:
[http://digitalcorpora.org/corp/files/govdocs1/by_type/](http://digitalcorpora.org/corp/files/govdocs1/by_type/)

For an odd example, sometimes a google search will turn something up:
[https://www.google.com/?q=filetype:xlsx](https://www.google.com/?q=filetype:xlsx)

~~~
brudgers
The test files for the parser are available at:
[https://github.com/apache/tika/tree/master/tika-parsers/src/...](https://github.com/apache/tika/tree/master/tika-parsers/src/test/resources/test-documents)

------
alexschiller
[https://github.com/alexschiller/file-format-commons](https://github.com/alexschiller/file-format-commons)

Here was my stab at that problem. Somewhere near 70 files, many are variants
on text files/code files iirc but a lot of data centric files as well. There
is a small neglected WordPress site linked to it.

------
swanson
Would like to find something like this as well.

I've used this site for videos of different sizes/filetypes before:
[http://www.sample-videos.com/](http://www.sample-videos.com/)

------
dugmartin
I've used this site in the past to test various video formats and sizes:
[http://www.sample-videos.com/](http://www.sample-videos.com/)

It is basically the "Big Buck Bunny" video in many sizes, durations and
formats.

------
a-priori
I don't know if there's any source with a wide variety of formats, but there
are various sites with some samples. Here's one for H.264 videos, mostly movie
trailers, encoded using various parameters:

[http://www.h264info.com/clips.html](http://www.h264info.com/clips.html)

I've used it before when I built a system that processed video files.

------
steveklabnik
[http://www.iana.org/assignments/media-types/media-types.xhtm...](http://www.iana.org/assignments/media-types/media-types.xhtml)
is the canonical list of types. They do not have examples though.

------
niftich
Is this about Cascading Style Sheet media types? Or MIME types for video,
audio, etc.?

~~~
salzig
video, documents, whatever.

~~~
niftich
For video and audio, the best place is by far
[https://samples.mplayerhq.hu/](https://samples.mplayerhq.hu/), from the devs
of MPlayer

------
syngrog66
dunno. if not, go please make it. sounds like a useful public resource. it's
one of those classic cross-cutting concerns. it would fit in nicely with a web
where any piece of info you'd want, or service, is sitting at a URL, just a
tab or curl away.

