The first few million IDs are plain autoincrements; they switched to the custom scheme later. If you're bored, you can dig through who used Instagram in the early days.
BTW, base(62/63/64) is a very popular way to encode IDs. Some sites use a much more custom alphabet. My favorite was Vine, which used
This is base64. Base62 would be without the "-_" (26 uppercase letters + 26 lowercase letters + 10 digits = 62; if you include two more characters it's base64, regardless of whether those characters are URL-safe like "-_" or "+/" as in standard base64 encoding).
Also worth noting: with things like base64/base62/etc. you can either encode a byte array into a string of known characters, or, if you're dealing with integer values only, compress much further by converting the base10 number to a base62 (or similar) number.
For example, the string "1000" base64-encoded yields "MTAwMA==", but if you convert the base10 number 1000 into base62 you get "G8". It's just like converting between base2 and base10 (binary to decimal), or base16 (hex) to base10.
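The difference is easy to demonstrate in code. Here's a minimal Python sketch (the alphabet ordering and helper name are mine, not from any particular site):

```python
import base64

# 0-9, A-Z, a-z: the usual 62-character alphabet
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def to_base62(n: int) -> str:
    """Convert a non-negative integer to a base62 string."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

print(base64.b64encode(b"1000"))  # b'MTAwMA==' -- encodes the four ASCII bytes
print(to_base62(1000))            # G8 -- converts the number itself
```

Note the asymmetry: base64 here operates on the string's bytes, while the base62 conversion treats 1000 as a number, which is why the result is so much shorter.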
Thank you for correcting me
Picking base 49 (a seemingly arbitrary number) is perhaps a similar security-through-obscurity thing?
I've written code to produce random (non-sequential) IDs for storage in a database, but it's surprisingly non-trivial to produce a performant INSERT that never produces collisions.
Relying on the database's AUTOINCREMENT but obfuscating the value through a baseXX encoding seems actually much easier, when the only "security" you're trying to provide is against reporters and business analysts trying to estimate the site's usage/popularity, and they're quite unlikely to bother to reverse-engineer your encoding scheme.
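As an illustration of why this is fiddly, a hedged sketch of the usual retry-on-collision approach, with an in-memory dict standing in for a table whose id column has a unique index (all names are hypothetical):

```python
import secrets

def insert_with_random_id(store: dict, row, max_attempts: int = 5) -> int:
    """Insert `row` under a fresh random ID, retrying on collision.

    In a real database you'd attempt the INSERT and catch the
    unique-constraint violation, rather than checking membership first.
    """
    for _ in range(max_attempts):
        new_id = secrets.randbelow(2 ** 48)  # assumed 48-bit ID space
        if new_id not in store:
            store[new_id] = row
            return new_id
    raise RuntimeError("too many ID collisions; is the ID space exhausted?")
```

The check-then-insert here is racy with concurrent writers, which is exactly the part that makes a performant production version non-trivial.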
We managed to get a safe ID space of around 50M per second, so we could use the shard_id to prevent collisions.
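For comparison, Instagram's published sharded-ID scheme packs a millisecond timestamp, a shard id, and a per-shard sequence into one 64-bit integer. A rough sketch, where the 41/13/10 bit split follows their engineering blog but the epoch constant and function names are my own assumptions:

```python
EPOCH_MS = 1_293_840_000_000  # assumed custom epoch: 2011-01-01 UTC

def make_id(now_ms: int, shard_id: int, seq: int) -> int:
    # 41 bits of time | 13 bits of shard | 10 bits of sequence
    ts = now_ms - EPOCH_MS
    return (ts << 23) | ((shard_id % 8192) << 10) | (seq % 1024)

def split_id(the_id: int):
    ts = (the_id >> 23) + EPOCH_MS
    shard_id = (the_id >> 10) & 0x1FFF
    seq = the_id & 0x3FF
    return ts, shard_id, seq
```

Each shard can mint 1024 IDs per millisecond, about a million per second, before its sequence wraps; multiplied across shards, that's where figures in the tens of millions per second come from.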
I don't buy this explanation. If you really cared about security, you'd encrypt the number (preferably through a block cipher). Using a weird encoding system only provides marginal security, as evidenced by this post.
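A small Feistel network is the textbook way to encrypt an integer ID while keeping it the same width. This is an illustrative sketch (SHA-256 as the round function, all names my own), not a vetted construction:

```python
import hashlib

def _round(half: int, key: bytes, i: int) -> int:
    # Keyed round function over a 32-bit half; any PRF would do.
    data = key + bytes([i]) + half.to_bytes(4, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")

def feistel_encrypt(n: int, key: bytes, rounds: int = 4) -> int:
    """Permute a 64-bit integer: same-width output, no collisions."""
    left, right = n >> 32, n & 0xFFFFFFFF
    for i in range(rounds):
        left, right = right, left ^ _round(right, key, i)
    return (left << 32) | right

def feistel_decrypt(n: int, key: bytes, rounds: int = 4) -> int:
    left, right = n >> 32, n & 0xFFFFFFFF
    for i in reversed(range(rounds)):
        left, right = right ^ _round(left, key, i), left
    return (left << 32) | right
```

Because a Feistel network is a permutation, sequential IDs map to scattered 64-bit values with zero collision risk, which a plain hash can't guarantee.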
If the “timestamp” function is not from FB, or inspired by it, then it must be a massive coincidence, because FB's IDs are almost exactly the same. They just don't encode them with base62.
EDIT: ugh. Still early for us slackers on the West coast to do basic math.
What the author refers to as the "real post #1" has a taken_at_timestamp earlier than the first picture's. It's still id #6 in the database versus #2 (the dog picture). Numeric ids are easily visible in the page source or via this not-so-secret query string: https://www.instagram.com/p/C/?__a=1
When writing a feed-based app, timestamps are one of the first things you inevitably screw up (reading from the local clock vs. the server, pulling the time from EXIF and then deciding to pull it from the upload date, forgetting a timezone, fixing a local time on a server that was out of sync, updating the timezone on edit vs. on creation, etc., etc.).
This is the reason my reddit user number is lower than spez and kn0thing's.
We may never know for sure...
- the 'noon' photo was taken at night
- the '10:26am' photo shows shadows on the windowsill from the south and west; Pier 38 runs almost west-to-east from the shore. (Something on the shore reflecting light? I'm assuming the stronger shadow is from the sun.) Popping the date and time into a shadow calculator shows the sun would have been positioned above the southeast corner of Pier 40 as seen from that window, which couldn't have cast either shadow. I used http://shadowcalculator.eu/#/lat/37.78196555892351/lng/-122.... to check. The 10:26 photo does look more like it was taken around noon.
So, as others have said, these are upload times not when the photos were taken.
It's from a fantastic movie that would be completely ruined if I told you the name; so refer to the comment below if you don't mind the huge spoiler...
> Reverse engineering, also called back engineering, is the process by which a man-made object is deconstructed to reveal its designs, architecture, or to extract knowledge from the object; similar to scientific research, the only difference being that scientific research is about a natural phenomenon
As an aside, I just reverse-engineered HN and found the first ever submission:
For the average programmer it might not seem like much. But for a normal user, seeing something and figuring out how it works, without knowing the internals beforehand, can fairly be called reverse engineering.
I guess we should start rating reverse-engineering so people can know what to expect.
So all in all, my bug wasn't that exciting, but if you think about the early chaotic days of a startup, there are often nuances in dealing with date and time that don't get thought through, particularly with newer developers being pushed or pushing themselves to get things launched. For example, if you are doing all of your testing locally, you might not find out until you get a user or customer in another time zone that, hey, we should have thought about that.
Bottom line is, and I think this is really what I was thinking this morning -- I would never sink so much time into drawing conclusions with the early data of a startup like this. Life is too short. If I am going to spend that much time on something it's going to need to be a less risky investment.
One thing I've been shocked by is how wrong the clocks often are -- to the point that our software tracks the offset from real time (our server) and adjusts all collected timestamps. It's often a minute or two off (which means the customer is not using NTP sync), but many times it's been several days or even years off. One of the things that led us to adding the time adjustment was a bug report that was initially something like "The UI says 'Data last updated in 5 years' but it was really a few minutes ago" -- that was a result of accepting data as-is from a server with a clock set 5 years in the future.
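The adjustment itself is simple once the device reports its own clock alongside the data; a minimal sketch (function names are illustrative, not our actual code):

```python
def clock_offset(server_now: float, device_reported_now: float) -> float:
    """Positive when the device clock runs ahead of the server."""
    return device_reported_now - server_now

def adjust(timestamps, offset):
    """Shift device timestamps back onto the server's timeline."""
    return [t - offset for t in timestamps]
```

This ignores network latency, which is usually noise compared to clocks that are minutes or years off.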
Another fun bug that sticks out in my head was caused by a system that sent time strings using custom formatting, where the original developer either accidentally specified hours in 12-hour (instead of 24-hour) format or forgot the "AM/PM" (I'm not sure which). On the receiving end, a fairly forgiving parsing method was used, and because there was no "AM/PM" the string was read as if it were 24-hour format, so what was really "7PM" was parsed as "7AM". Worse, this wasn't even obvious as a problem, because the data naturally followed business hours (e.g., under 12 hours of active time per day, usually without overlap) and was collected from many time zones. It was only visible if you really dug into the data, knew what to expect from the source, and checked using data collected in the afternoon of the client's timezone.
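The bug is easy to reproduce. This Python sketch uses strftime/strptime rather than the original system's custom formatter, but the failure mode is the same:

```python
from datetime import datetime

evening = datetime(2020, 1, 6, 19, 0)   # 7 PM
s = evening.strftime("%I:%M")           # 12-hour format, %p (AM/PM) forgotten: "07:00"
parsed = datetime.strptime(s, "%H:%M")  # forgiving receiver assumes 24-hour time

assert evening.hour == 19
assert parsed.hour == 7                 # 7 PM silently became 7 AM
```

Only hours outside 1-12 get mangled this way, which is why the corruption stayed hidden in data that mostly followed business hours.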
Whenever I'm venturing on an idea, I have a pretty clear goal and a pretty clear intent of use from a user's perspective. Despite that, I'm always asked by people (who, by the way, I often figure would be the target audience for the new thing I'm building), "why?" or "what's the point?" or just blatantly "no one's going to use this over [facebook/twitter/etc./etc.]".
I remember first diving into Twitter 11 years ago. I remember diving into Facebook when it opened up to the general public. I remember telling people about these platforms, and the response was always "why?" or "I don't understand what this is for or why I would use it".
I keep that in mind every time I'm asked the same question about my own projects. Not everything is going to be a massive hit, of course. But you never know when one of these wild ideas becomes successful, and it will usually be something people didn't know they wanted until you show it to them.
Pics of photoshopped faces and bodies are the quality content of Instagram.
Also getting a photo from your phone camera lens to the internet was often a multi-step process at that time. Instagram was two clicks and on your feed.
And the sharing model (all photos pushed to every follower) made people try a little harder on the images. That made it popular among amateur photographers. The profile model (grid of photos) also encouraged some level of vanity.
People discovered it by being invited by friends who like cataloging pretty things.
Dogpatch Labs was not later relocated to Dublin. The Dublin and San Francisco locations (as well as NYC) were open at the same time; San Francisco and NYC were later closed.
Maybe I misunderstood the phrase, but how can you get 28 posts from 26 letters? The article doesn't mention lowercase letters.
Compare that to HN/Reddit/Blogs before and you can basically pinpoint when and how social media changed the way we connect.