Pinterest | Hybrid @ {San Francisco, New York, or Seattle} | Full-time + internships
Pinterest’s Advanced Technologies Group (ATG) is an ML applied research organization within the company, focusing on large-scale foundation models (e.g. multimodal encoders, graph representation models, content embeddings, generative models, computer vision signals, etc.) that are deployed throughout the company. ATG is composed primarily of ML engineers and researchers, backed by a strong infrastructure team and a small product prototyping + design team for deploying new AI/ML features in Pinterest. The organization is highly collaborative, research-driven, and delivers deep impact. The team is hiring for several engineering positions:
- iOS engineer for generative AI products: we are looking for senior or staff iOS engineers who have a track record of fast prototyping in the AI space — no deep machine learning domain expertise is required, but the ideal candidate would be comfortable interfacing with ATG’s ML teams daily. An engineer in this role would be building entirely new features for Pinterest, leveraging emerging technologies across LLMs, visual models, recommendation systems, and more.
- Computer vision domain specialist: we are looking for researchers or applied engineers with industry experience in the computer vision / visual-language modeling field (e.g. multimodal representation learning, visual diffusion models, visual encoders/decoders, etc.). We encourage the team to regularly publish, and the team works in a highly collaborative, research-driven environment with full access to the Pinterest image-board-style graph for large-scale pre-training.
Please reach out to me directly (dkislyuk@pinterest.com) if you’re interested in either of these roles.
Additionally, the team is currently hiring for fall 2025 ML research internships for Master’s / PhD students, with opportunities to publish or to work on frontier models in the visual understanding and multimodal representation learning space: https://grnh.se/dad7c60e1us
This is a great characterization of self-information. I would add that the `log` term doesn't just conveniently appear to satisfy the additivity axiom, but instead is the exact historical reason why it was invented in the first place. As in, the log function was specifically defined to find a family of functions that satisfied f(xy) = f(x) + f(y).
So, self-information is uniquely defined by (1) assuming that information is a function transform of probability, (2) that no information is transmitted for an event that certainly happens (i.e. f(1) = 0), and (3) independent information is additive. h(x) = -log p(x) is the only set of functions that satisfies all of these properties.
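For concreteness, here's a minimal Python sketch of that additivity property (the event probabilities are made up for illustration):

```python
import math

def self_information(p):
    # h(x) = -log p(x); using base 2 so the result is in bits
    return -math.log2(p)

p_a, p_b = 0.5, 0.125  # two independent events (arbitrary probabilities)

# information of both events occurring vs. the sum of their individual information
print(self_information(p_a * p_b))                    # 4.0 bits
print(self_information(p_a) + self_information(p_b))  # 4.0 bits
print(self_information(1.0))                          # 0.0: a certain event carries no information
```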
I think commodification is directly tied to a perceived drop in quality. For example, if the barriers to making a video game keep going down, there will be far more attempts, and per Sturgeon's law, the majority will be of low quality. And we have a recency bias where we over-index on the last few releases that we've seen, and we only remember the good stuff from a generation or two ago. But for every multitude of low-effort, AI-generated video games out there, we still get gems like Factorio and Valheim.
Sturgeon's law was true in the '90s and is true in the '20s. There's not much point in comparing the crap to the crap. The only big difference is that it is easier to see the bottom of the barrel in your most popular storefronts with a click (even on "curated" ones these days, with PSN and the eShop) instead of going out of your way to find some shareware from a GeoCities page that barely functioned.
Thing is those high profile disasters are still supposedly the "cream of the crop". That's why they get compared to the cream of before.
Popular examples are easier to point to as well, instead of taking the time to explain what Blinx the Cat or Midnight Club are (examples of good but not genre-defining entries).
I found that looking at the original motivation of logarithms has been more elucidating than the way the topic is presented in grade-school. Thinking through the functional form that can solve the multiplication problem that Napier was facing (how to simplify multiplying large astronomical observations), f(ab) = f(a) + f(b), and why that leads to a unique family of functions, resonates a lot better with me for why logarithms show up everywhere. This is in contrast to teaching them as the inverse of the exponential function, which was not how the concept was discussed until Euler. In fact, I think learning about mathematics in this way is more fun — what original problem was the author trying to solve, and what tools were available to them at the time?
Toeplitz wrote "Calculus: The Genetic Approach" and his approach of explaining math via its historical development is apparently more widely used: https://en.wikipedia.org/wiki/Genetic_method . Felix Klein remarked: "on a small scale, a learner naturally and always has to repeat the same developments that the sciences went through on a large scale"
We could really take a page from this style for teaching advanced computing. We like to imagine that architectures just kind of come out of nowhere. Starting with mechanical computing and unit record equipment makes so much of it make more sense.
There is Mathematics for the Million by Lancelot Hogben, which not only covers math, but the history of math and why it was developed over the centuries. It starts with numbers, then geometry, arithmetic, trig, algebra, logarithms and calculus, in that order. It's a very cool book.
I was going to say the same! I got it years ago, it's hard to top a math book with a quote from a certain Al Einstein on the back cover singing its praises! Morris Kline's "Mathematics for the Nonmathematician" takes a similar approach, as I believe other books by the author do. Can also recommend "Code" by Charles Petzold and "The Information" by James Gleick, while not comprehensive they do cover the development of key mathematical insights over time.
I'm sympathetic, but there's no clear historical chronology. For instance, the ancient Egyptians dealt with both algebra and calculus (at least in part) long before Pythagoras. And that's not even starting on China and India, which had very different chronologies.
Choose a chronology that makes sense. We can see how Western ideas build, we have less clarity on how the ancient Egyptians or Chinese ideas developed, and therefore it's harder to explain to a learner.
If you're sensitive to that singular world view warping the learner's perspective, you could at each point explain similar ideas from other cultures that pre-date that chronology.
For example, once you've introduced calculus and helped a student understand it, you can then jump back and point out that the ancient Egyptians seemed to have a take on it, explain it, and ask the student to reason about whether they got there in the same way as the Western school of ideas did, and whether there is an interesting insight in that way of thinking about the world.
Another idea is to look at how ideas evolved. We know Newton and Leibniz couldn't have had access to direct Egyptian sources (hieroglyphs were a lost language in their lifetimes), but Greek ideas would have been rolling around in their heads.
Here's one that starts with the concept of a straight line and builds all the way to string theory. It's a monumental book, and it still challenges me.
Roger Penrose's The Road To Reality.
A book that doesn't expect any knowledge of mathematical notation would be a good start.
I've bought 3 math books to get into it and quit all of them within the first chapter.
Could you give a concrete example of what sort of notation caused you difficulty in the past? Asking because it seems odd to me that you feel you need to learn "all" the notation to get started.
Starting in elementary school you slowly build up topics, mathematical intuition and notation more or less in unity. E.g. starting with whole numbers, plus and minus signs before multiplication, then fractions and decimal notation. By the end of high school you may have reached integrals and matrices to work with concepts from calculus and linear algebra…
It makes little sense to confront people with notation before the corresponding concepts are being taught. So it feels like you, as a layperson, may have perspectives on notation that are no longer obvious to more advanced learners.
I want to learn the notation, just not everything at once. I need to be able to see real-world use cases, otherwise I won't be able to remember and apply the notation.
What I meant is learning the notation step by step, topic related.
Although the math in the book is relatively basic I enjoyed it tremendously because it gives the historical development for everything and even describes the characters of different mathematicians, etc. The historical context helps so much with understanding.
If you like this approach, I highly recommend Mathematics: Its Content, Methods and Meaning by Kolmogorov. He uses this same approach, but applies it to many more concepts in math (about 1,000 pages!). In fact, I think I actually heard about that book on this site, so I guess I'm paying it forward.
This approach was to align with the Soviet philosophy of dialectical materialism, which claims that all things arise from a material need. Not sure I'm fully onboard with the philosophy as a whole, but Kolmogorov's book was really eye opening.
I think this should be front and center. To that end I propose "magnitude notation"[0] (and I don't think we should use the word logarithm, which sounds like advanced math and turns people away from the basic concept, which does make math easier and more fun).
> I think this should be front and center. To that end I propose "magnitude notation"[0] (and I don't think we should use the word logarithm, which sounds like advanced math and turns people away from the basic concept, which does make math easier and more fun).
The only reason that "logarithm" sounds like advanced math is because it was so useful that mathematicians, well, used it. Since this terminology is just logarithms without saying the word, if it is more useful it, too, will probably be used by mathematicians, and then it will similarly come to sound like advanced math. So what's the point of running away from a name for what we're doing that fits with what it's actually called, if eventually we'll just have to make up a new, even less threatening name for it?
(I'd argue that "logarithm" is frightening less because it sounds like advanced math than because it's an unfamiliar and old-fashioned-sounding word. I'm not completely sure that "magnitude" avoids both these issues, but it's at least arguable that it suffers less from them.)
It's written like ^6 and said like "mag 6", which sounds like an earthquake (and this is basically the Richter scale writ large). One syllable, sounds cool, easy to type/spell, evokes largeness. "Logarithm" is 3-4 syllables, hard to pronounce, hard to spell, sounds jargon-y.
People virtually never say “logarithm” in use though. They either say “log” or they say “lun” for natural log. Notice that both log and lun are one syllable, easy to pronounce etc.
Magnitude is an existing and important concept in maths - it would be extremely confusing to just overload it to mean something else.
The log of 3.1m is 6.5. How do you say "10^6.5"? I say "mag 6.5" and it is clear. The Richter scale famously uses "mag 6.5" exactly like this. If that was ever confusing, then we've managed to work past it, and this just expands the Richter scale to cover basically everything.
There's nothing particularly special about the Richter scale in that respect. All logarithmic scales (eg dB) work that way. Both the Richter scale and decibels (and other logarithmic scales) are also famous like other nonlinear scales[1] for being widely misunderstood so I'm not sure a lot of people would think your way is clearer than the current usage, which is just to say "3.1m" if that's what you mean. That said, I like log scales and logarithms in general so if you want to campaign for this scale, knock yourself out. I don't like that you're calling it magnitude though, because magnitude means a specific thing (the first coordinate of a vector in polar or spherical form).
I have been writing the same thing by (ab)using the existing unit of measurement known as a bel (B), which is most commonly seen with the SI prefix “deci” (d) as dB or decibel. I write the speed of light as 8.5 Bm/s (“8.5 bel meters per second”), which resembles the expression 20 dBV (“20 decibel volts”) shown at https://en.wikipedia.org/wiki/Decibel.
Mag is the inverse of log10, e.g. log10(^6) = log10(10^6) = 6. We have no current shorthand for the inverse of log10 except "tentothe", which might be serviceable but is not as punchy.
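If it helps, here's a tiny Python sketch of the proposed notation (mag/unmag are hypothetical helper names, nothing standard):

```python
import math

def mag(x):
    # "mag" of a positive number is just log10(x): mag(3.1e6) is about 6.5
    return math.log10(x)

def unmag(m):
    # the inverse ("tentothe"): unmag(6.5) is about 3.16e6
    return 10 ** m

print(round(mag(3.1e6), 1))  # 6.5, i.e. "mag 6.5"
print(unmag(6))              # 1000000
```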
I often wonder about this. I also believe that mathematical pedagogy strives to attract people who are very smart and think in the abstract, like Euler, rather than operationally, meaning they will get it intuitively.
For other people, you need to swim in the original problem for a while to see the light.
I think it is a combination of factors. Mathematical pedagogy is legitimate if the end goal is to train mathematicians, so yes it is geared towards those who think in the abstract. (I'm going to ignore the comment about very smart, since I don't think mathematical ability should be used as a proxy for intelligence.)
On the other side, I don't think those who are involved in curriculum development are very skilled in the applications of mathematics. I am often reminded of an old FoxTrot comic where Jason calculated the area of a farmer's field using calculus.
Frankly, I wish I had known integral calculus going into geometry; I could tell there was a pattern behind the formulas for areas and volumes, but I couldn't for the life of me figure it out. There are worse ways to remember the formula for the volume of a sphere than banging out a quick integral!
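For what it's worth, the quick integral in question (stacking disks of radius sqrt(r^2 - x^2) along the x-axis) is only a couple of lines:

```latex
V = \int_{-r}^{r} \pi \left( r^2 - x^2 \right) dx
  = \pi \left[ r^2 x - \frac{x^3}{3} \right]_{-r}^{r}
  = \pi \left( 2r^3 - \frac{2r^3}{3} \right)
  = \frac{4}{3} \pi r^3
```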
I had known it. Thanks, Dr. Steven Giavat. The geometric shapes gave the patterns meaning. I read 'Mathematics and the Imagination' and 'Mathematics: A Human Endeavor' while I was starting algebra. Also the Time-Life book on math. All very brilliant, because they used the methods that were used to investigate it to show how it was discovered. These allowed me to fly ahead in math until I got to trig, which took a long year to get facile with, until I was able to finish my degree.
I had brilliant teachers.
Napier's bones were for adding exponents, hence multiplication. Brilliant and necessary for the development of the slide rule, and the foundation of modern engineering, until the pocket calculator.
I was recently struggling to model a financial process and solved it with Units. Once I started talking about colors of money as units, it became much easier to reason about which operations were valid.
I really disagree with the straightforward reduction of engineering to 'math but practical', but I'm finding it hard to express exactly why I feel this way.
The history of mathematical advancement is full of very grounded and practical motivations, and I don't believe that math can be separated from these motivations. That is because math itself is "just" a language for precise description, and it is made and used exactly to fit our descriptive needs.
Yes, there is the study of math for its own sake, seemingly detached from some practical concern. But even then, the relationships that comprise this study are still those that came about because we needed to describe something practical.
So I suppose my feeling is that teaching math without a use case is like teaching English by only teaching sentence construction rules. It's not that there's nothing to glean from that, but it is very divorced from its real use.
As someone who is studying maths at the moment, I don’t recognise this picture at all. Every resource I learn from stresses the practical motivation for things. My book of ODEs is full of problems involving liquids mixing, pollution dispersing through lakes, etc.; my analysis book has a whole big thing about heat diffusion to justify Fourier analysis; the course I’m following online uses differential equations in population dynamics to justify eigenvalues; etc.
Agreed, and it's such a shame! A kid goes to math class and learns, say, derivatives as this weird set of transformations that have to be memorized, and it's only later in physics class that they start to see why the transformations are useful.
I mean, imagine a programming course where students spend the whole first year studying OpenGL, and then in the second year they learn that those APIs they've been memorizing can be used to draw pictures :D
I actually prefer the straightforward "log is the inverse of exponentiation." It's more intuitive that way because I can automatically understand 10^2 * 10^3 = 10^5. Hence, if you are using log tables, addition makes sense. I didn't need an essay to explain that.
Take logs, add 2 + 3 = 5 and then raise it back to get 10^5.
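As a concrete sketch of that round trip (a toy Python stand-in for a log table; the numbers are arbitrary):

```python
import math

# multiply 200 * 50_000 using only log lookups, one addition, and an inverse lookup
a, b = 200, 50_000
log_a, log_b = math.log10(a), math.log10(b)  # "look up" the logs: 2.301... and 4.698...
product = 10 ** (log_a + log_b)              # add, then raise back
print(product)                               # ~10_000_000, i.e. 10^7
```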
This is how I've always taught logarithms to students I've tutored. I photocopy a table of various powers of ten, we use it in all sorts of ways to solve problems, and then I sneakily present an "inverse power" problem where they need to make the lookup backwards.
Almost every student gets it right away, and then I tell them looking up things backwards in the power table is called taking a logarithm.
That's how I mentally processed them when first learning them years ago. Doing operations on x and y with log(x) = y in the background somehow felt far less intuitive than thinking about 10^y = x.
I really enjoyed this author's work, BTW. Just spent several hours reading the entire first five chapters or so. What an excellent refresher for high school math in general.
This would be an interesting thing to study: How many different ways people learned about logarithms, and how they generally fared in math. I learned about logarithms by seeing my dad use his slide rule, and studying stock charts, which tended to be semi-logarithmic.
It gives the history / motivation behind logarithms, and suddenly it became so much clearer to me. Pretty much multiplying huge numbers by adding exponents, well, I think I've understood that correctly?
I think the reason I'm so interested in programming and computing is that I'm fascinated by the history of it all. It somehow acts as a motivation to understand it.
Normally, a slide rule at distance x has the value 10^x written on it (equivalently, the value x is written at distance log(x)), which allows doing multiplications by moving along the slide rule, since log(ab) = log(a) + log(b).
Now imagine a slide rule onto which values of x^2/2 are written at distance x. This also allows you to multiply two numbers, because ab = (a+b)^2/2 - (a^2/2 + b^2/2).
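A quick numerical check of that identity, in case it looks too good to be true (the values are arbitrary):

```python
def quadratic_rule_product(a, b):
    # ab = (a + b)^2 / 2 - (a^2 / 2 + b^2 / 2), since (a + b)^2 = a^2 + 2ab + b^2
    return (a + b) ** 2 / 2 - (a ** 2 / 2 + b ** 2 / 2)

print(quadratic_rule_product(7, 13))  # 91.0
print(7 * 13)                         # 91
```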
Yes, but such a property was not available to Napier, and from a teaching perspective, it requires understanding exponentials and their characterizations first. Starting from the original problem of how to simplify large multiplications seems like a more grounded way to introduce the concept.
From a teaching perspective it goes like this: first we learn additions, and to undo additions we have subtractions; then we learn repeated additions i.e. multiplications, and to undo multiplications we have divisions; finally we learn repeated multiplications, i.e. exponentiation, and to undo exponentiation we have logarithms and roots.
Right, I'm not saying it's for no reason, but the asymmetry makes it harder to keep track of which undoes exponentiation in which way.
And logs are frankly more confusing than the other operations because, more than anything else, they feel like an algebraic expression in the form of an operation. Other operations intuitively feel like a process, whereas logs feel more like a question.
Maybe that's just because I never learned them super well though, maybe they're not actually that inherently different ¯\_(ツ)_/¯
Presumably the book from this thread by Charles Petzold will be a great canonical resource, but originally there was a quote by Howard Eves that I came across that got me curious:
> One of the anomalies in the history of mathematics is the fact that logarithms were discovered before exponents were in use.
One can treat the discovery of logarithms as the search for a computation tool to turn multiplication (which was difficult in the 17th century) into addition. There were previous approaches for simplifying multiplication dating back to antiquity (quarter square multiplication, prosthaphaeresis), and A Brief History of Logarithms by R. C. Pierce covers this, where it’s framed as establishing correspondences between geometric and arithmetic sequences. Playing around with functions that could possibly fit the functional equation f(ab) = f(a) + f(b) is a good, if manual, way to convince oneself that such functions do exist and that this is the defining characteristic of the logarithm (and not just a convenient property). For example, log probability is central to information theory and thus many ML topics, and the fundamental reason is because Claude Shannon wanted a transformation on top of probability (self-information) that would turn the probability of multiple events into an addition — the aforementioned "f" is the transformation that fits this additive property (and a few others), hence log() everywhere.
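As an aside, the quarter-square method mentioned above reduces a multiplication to two table lookups and a subtraction; here's a minimal Python sketch (the table size is chosen arbitrarily):

```python
# Quarter-square multiplication: ab = floor((a+b)^2 / 4) - floor((a-b)^2 / 4).
# The identity is exact for integers, so a single precomputed table of floor(n^2 / 4) suffices.
TABLE_SIZE = 2_000
quarter_squares = [n * n // 4 for n in range(TABLE_SIZE)]

def table_multiply(a, b):
    return quarter_squares[a + b] - quarter_squares[abs(a - b)]

print(table_multiply(123, 456))  # 56088
print(123 * 456)                 # 56088
```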
Interestingly, the logarithm “algorithm” was considered quite groundbreaking at the time; Johannes Kepler, a primary beneficiary of the breakthrough, dedicated one of his books to Napier. R. C. Pierce wrote:
> Indeed, it has been postulated that logarithms literally lengthened the life spans of astronomers, who had formerly been sorely bent and often broken early by the masses of calculations their art required.
I had a slide rule in high school. It was more of a novelty item by that point in time, only one of my math teachers even knew what a slide rule was, but that didn't stop me from figuring out how it was used and how it works. It didn't take much to figure out that the sliding action was solving problems by addition, and the funky scales were logarithmic. In other words: it performed multiplication by adding logs.
That said, I did encounter references to its original applications in other places. I studied astronomy and had an interest in the history of computation.
Pinterest | San Francisco, New York, or hybrid/remote (US-only) | ML Engineer / Applied Research Scientist | Full-time
Pinterest’s Advanced Technologies Group (ATG) is hiring for an engineering position on our visual modeling team to develop Pinterest Canvas. Canvas is a foundation text-to-image model developed internally to power various visualization, inpainting, and outpainting products. In this role, you’ll get to work with Pinterest’s rich visual-text dataset to build large-scale generative models which are continuously being shipped to production. The core Canvas pod is a small group (~6 engineers) inside of ATG, which focuses on a broad variety of AI/ML initiatives, such as core computer vision, multimodal representation learning, heterogeneous graph neural networks, recommender systems, etc.
New grads are welcome to apply (preferably with a master’s or PhD). Candidates should have diffusion modeling experience (e.g. diffusion transformers, LoRA fine-tuning, complex {text, image} conditioning, style transfer, etc.) and some form of industry experience. Engineers within ATG have a lot of leeway in terms of product contribution, so both ML engineers and research scientists are welcome to apply. We encourage the team to regularly publish, and the role can be either in person (SF, NY) or hybrid, with in person preferred.
Please reach out to me directly (dkislyuk@pinterest.com) if you’re interested.
Pinterest Advanced Technologies Group | Staff Engineer, iOS and applied ML | US remote or hybrid in SF/NY | Full-time
We’re looking for strong engineers to help us build consumer AI products within Pinterest’s Advanced Technologies Group (ATG), our in-house ML research division. You’d be working with a full-stack team of ML researchers and product engineers on projects that bring LLMs, diffusion models, and other core models in the generative multimodal ML and computer vision space to life inside the Pinterest product. Projects include assistants, new ways to search, restyling of boards / pins / rooms, and many other new applications. Your work will directly impact how millions of users experience Pinterest.
Tracks:
*iOS engineer*: You’ll craft beautiful and intuitive user experiences for our new AI products. Strong command of iOS and UI/UX craftsmanship required. Bonus points if you’re an opinionated product thinker with 0-1 mentality or have experience working with ML models. Please apply here: https://www.pinterestcareers.com/jobs/5426324/staff-ios-soft...
*Applied ML*: If you think you’d be a better fit as an applied ML or research engineer with an interest in directly translating research into user-facing products, feel free to contact me directly (@dkislyuk everywhere).
The ML and product engineering teams on ATG work directly together, along with design. The team consists of long-tenured employees who care deeply about both the quality of the Pinterest experience, and taking full advantage of the new capabilities emerging in the ML space over the last two years. ATG more broadly has spent the past decade+ bringing various ML technologies into the Pinterest ecosystem, and values publishing our work, building long-term infrastructure, and a collaborative and remote-friendly culture (though we do expect everyone to join company onsites a few times a year).
Yes, exactly. ViTs need O(100M)-O(1B) images to overcome the lack of spatial priors. In that regime and beyond, they begin to generalize better than ConvNets.
Unfortunately, ImageNet hasn't been a useful benchmark for a while now, since pre-training is so important for production visual foundation models.
Rocket Men by Robert Kurson tells the Apollo 8 story in a captivating manner. Some of the passages are quite dramatic but it's justified given the litany of firsts accomplished by the mission.
I read it recently. I came away amused by Borman being such a no-nonsense person. He would warn the other guys on his crew if they joked too much or goofed off. He was a straight shooter, and didn't mind telling NASA when something wasn't being done right.
When Apollo 11 landed on the moon he considered the job done. He thought they had beaten the Russians to the moon and why would anybody want to go back?
He was devoted to his wife, who suffered from addictions due to the frazzled life of her husband's career.
Seems to go hand-in-hand too with his later recounting that he had no further interest to go back to the moon to walk on it; he was there to "beat the Russians" and loved his family too much to risk his life to "go pick up some rocks." From all accounts, he was clearly an extremely brilliant man with a sense of purpose, skill, and courage aplenty to go along with it. Glad the US put his talents to good use!
In the current world, deep learning with homogeneous computation graphs, tuned with backprop, has won the Hardware Lottery [1]. This is unfortunate for research outside of that area, but just looking at the momentum of development it seems like a sure bet to keep investing in GPU-based training and inference for the next decade. There's just too much lock-in already to this paradigm.
If a new algorithm with a novel approach appears (analog compute, heterogeneous computation graphs from genetic algorithms, quantum, much more...), there will be a whole generation of R&D + tool + framework building, which gives the major players enough time to adapt.
Moravec's paradox is the usual counterargument given to this line of reasoning. We've had far less progress in embodied robotics, where a robot has to interact with the real world in any kind of generalized, tactile way, compared to visual, audio, and language processing tasks. The history of AI is littered with predictions that <a reasoning or computation AI breakthrough> will lead to a humanoid robot, and the predictions always end up in the regime of ~real world data collection and integration is harder than we thought.
Maybe this time it's different, and maybe it's not, but that's why most recent robotics predictions fail to convince the ML industry broadly.
I don't think that's true. I recall when Amazon acquired Kiva robotics for their warehouse operations, one of the people involved said something like "If you want something that can pick assorted objects out of a box and put them in another box, I'll need a NASA research team and 5 years, but moving the boxes around, we can do that with our Kiva robots." Here we are 10 years later and Amazon does in fact have the picker arms, though I'm not sure how production-ready they are; Amazon has demoed them.
Honestly I think there's been very dramatic improvements in robotics alongside ChatGPT but ChatGPT is easy to demo with nothing but an internet connection so it's just a lot less visible.
That first quote is right on-the-money, in my experience with competitive FRC and automation. It's extremely easy to work with a limited set of parameters like a cardboard box; you can easily estimate object volume and bounding-box collision in software. Making a robot that manipulates millions of Amazon products is a suicide mission by comparison; especially if you expect it to behave consistently.
Computer vision, AI and inverse kinematics have all come a long way in the past few years. That being said, it's still easier by an order of magnitude to design the box-pushing robot.