Calibrating cameras is still important; it's only "mostly irrelevant in 2024" if you don't care about accuracy. Incidentally, tools like opencv and their ilk are also what you use if you don't care about accuracy. Modern tools like mrcal (https://mrcal.secretsauce.net/tour.html) are essential if you're trying to do long-range stereo or use wide lenses or have calibration instability or any number of other ever-present issues.
OpenCV is fine for the basic models, including the most popular one. That model is also sufficient in most cases; I have used it for subpixel 400 m stereo with a 40 cm baseline, visual odometry, high-end 3D reconstruction, etc., and it has fully functional variants for wide-angle and fisheye lenses. You just have to select the right model, calibrate without observability issues, verify the calibration, understand the camera, and so on. The better tools help a little, but they do not eliminate the need to understand what you are doing in the slightest. They are easier to use, but generally not more accurate. The spline-field variants are more accurate if you know what you are doing and the right conditions apply, but much less robust if you don't.
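To make "verify the calibration" concrete, here is a minimal sketch (not any particular project's code) of the first check I'd run after an OpenCV solve; `objpoints`, `imgpoints`, and `image_size` are assumed to be the usual per-view chessboard correspondences and image dimensions:

```python
# Minimal post-solve check: per-view RMS reprojection error.
# objpoints, imgpoints, image_size are assumed inputs (per-view 3D chessboard
# points, detected 2D corners, and (width, height)).
import cv2
import numpy as np

rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    objpoints, imgpoints, image_size, None, None)

for i, (obj, img) in enumerate(zip(objpoints, imgpoints)):
    proj, _ = cv2.projectPoints(obj, rvecs[i], tvecs[i], K, dist)
    err = np.sqrt(np.mean(np.sum((proj - img) ** 2, axis=2)))
    print(f"view {i:3d}: RMS reprojection error {err:.3f} px")
```

A low global RMS alone is not enough; per-view errors that are systematically biased in some views are exactly the kind of observability problem described above.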
OpenCV calibration is hard to get right if you don't know what you are doing, and at first glance it will look right despite being irrecoverably bad. I had to edit the docs and tutorials because the examples they had were of FAILED calibrations. Visualizing the resulting distortion field is critical to understanding whether the calibration succeeded, and that's what the better tools provide. That said, if you don't know what you are looking at, they likely don't help, and the golden rule is: if someone calibrated something and said it was easy, they didn't. If someone with a PhD in camera calibration says they calibrated their cameras but haven't used them for SfM yet, though they plan to soon, odds are maybe 20% the calibration is right.
Mrcal was written and verified for stereo tracking considerably more difficult than that, by roughly a 10-15x factor (in, say, a Fisher-information sense). It's a shame it doesn't have a better data set for verification, but I expect there are reasons for that as well. Not to say that other tools are completely inappropriate for this domain, obviously, or to denigrate anything or anyone.
For what you are describing you would have needed roughly 2 MPixel image resolution; but ignoring that, if it were true it should have been in a paper I knew about. So I checked. mrcal provides the same lens models as OpenCV, and it looks like it uses OpenCV to do the estimate. For higher-performance calibration, check out https://arxiv.org/abs/1912.02908
I thought the thread here was re: Interrogating the model and improving its calibration, as well as using different models. I know from working down the hall from Dima for a few years that mrcal has done both, and in domains beyond and more challenging than those described in this thread. So I mentioned it, that's all. Nothing to say beyond that, but I wish I had more to contribute.
That paper describes a rich splined model, very similar to the one used in mrcal. mrcal models projection (instead of unprojection like the paper does), which is better in a practical sense. Both work well to fit every lens. https://mrcal.secretsauce.net/splined-models.html
If you don't mind, I have two questions about your comment.
Why is SfM relevant to assessing the quality of your calibration?
What does the distortion field tell you besides using the wrong distortion model or having an unbalanced data set (e.g. not having enough samples at the borders of the image)?
If you have, say, a 1 MPixel image which has been rectified, a human will generally say it looks correct if the calibration errors are less than 10 pixels, perhaps as little as 5 if they know what to look for and are careful. Most camera calibrations are this bad and no one notices. If showing a person the rectified image is the goal, that's good enough. But if you want to do post-processing of any kind...
SfM tests the calibration to the extreme, and even a single pixel of calibration error will show up as a correlated reprojection error vector in most images. And that is assuming the bad calibration does not cause the system to fail outright. There is also the in-between case where the system is only able to use small parts of the image.
The distortion field often visibly tells you whether the estimation failed. It should be smooth and monotonic, you can draw it not just where there are pixels but further out and at higher resolution, and for regular lenses it is almost always highly symmetric. Looking at the distortion field you can see a lot of problems that you could not see otherwise. The most common problem is loss of monotonicity, since that constraint is very difficult to add to general optimizers. It means the distortion goes backwards, which shows up as sharp edges in the distortion field.
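For what it's worth, here is a rough sketch of how you might draw such a field yourself with OpenCV and matplotlib; `K` and `dist` are assumed to come from an earlier cv2.calibrateCamera call, and the grid step is arbitrary:

```python
# Rough sketch: visualize the distortion displacement field of a Brown model.
# K and dist are assumed to come from an earlier cv2.calibrateCamera call.
import cv2
import numpy as np
import matplotlib.pyplot as plt

def show_distortion_field(K, dist, image_size, step=40):
    w, h = image_size
    # Regular grid of pixel coordinates (extend the range to probe beyond the image)
    xs, ys = np.meshgrid(np.arange(0, w, step), np.arange(0, h, step))
    pts = np.stack([xs, ys], axis=-1).reshape(-1, 1, 2).astype(np.float64)

    # Map each distorted pixel to its undistorted location (P=K keeps pixel units)
    und = cv2.undistortPoints(pts, K, dist, P=K).reshape(-1, 2)
    dpx = und - pts.reshape(-1, 2)

    plt.quiver(pts[:, 0, 0], pts[:, 0, 1], dpx[:, 0], dpx[:, 1], angles='xy')
    plt.gca().invert_yaxis()          # image coordinates: y grows downward
    plt.title("distortion displacement field")
    plt.show()
```

Sharp folds or arrows that reverse direction between neighbouring grid points are the non-monotonicity described above.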
For those curious, keep in mind that the use case for OpenCV has been almost exclusively indoor, close range, or otherwise loosely constrained. Graph models for localization famously make use of quite primitive features but tons of observations over time to produce excellent estimates and loop closures.
There are applications that require pulling almost magical levels of signal from very small numbers of pixels, and for those extremely accurate results you really need to drill down on the camera model in particular, even modelling the environment or the atmosphere to get accurate results.
For those types of applications OpenCV is absolutely not the right choice. It just so happens that Dima has an accessible software kit that is aimed more at the latter case. All the better, because it's perfectly fine to use this tool to calibrate any camera and then go use it with any other CV pipeline. His pain, your gain.
Nonsense. OpenCV has been used widely for outdoor localisation too, in particular since outdoor is much easier than indoor.
Such algorithms always build a graph over the images, but if you mean GraphSLAM-style graph filter methods, those are substantially subpar compared to classic feature-based methods as well as the more modern dense and semi-dense methods.
OK, I was imprecise in my comments. Let me try again.
For localization work (indoor or outdoor), where you mostly want to close loops and track the 3D pose of features using measurements from, say, hundreds of meters or less (or perhaps scene recognition using effectively flat "very far" images), calibration using OpenCV probably performs very well. It's standard, and I've certainly seen much success using it before feeding into regular GTSAM etc. There are some unspoken assumptions about localization that don't necessarily translate to, say, tracking. Generally they are: many observations, close-ish range (relative to stereo baseline), mostly-correctly-pointed camera rigs (e.g., forward on a car), or perhaps assumptions about density of features, correlation across images, existence of dense prior maps, etc.
I believe the use case for something like mrcal is to improve calibration for cameras and applications that don't fit this model well. In particular, you may need to track a target at extreme range, using a pixel or two in the corner of your image, with a particularly wide FOV. These specific use cases, mentioned in the top-level comment, do require additional care, especially in calibration. Thus mrcal.
It just so happens that out of a domain where the very particular details of calibration matter, a tool emerged that helps with all calibration, and I think that's the point of bringing up mrcal in a thread discussing calibration in general.
I think that's as precise and non-controversial as I can be.
Uncertainty propagation. Richer models that fit better. Lots of feedback and metrics and visualization to evaluate the quality of the solve. Flexibility of the tool. Documentation.
Richer models that fit better are often a trap, particularly for beginners. Use the simplest model you can get away with. Unless you know what you are doing, you are better off with a simpler model, as it will be more robust to observability issues. If you don't know what those are, use a simpler model.
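In OpenCV terms, "simplest model you can get away with" can be approximated by fixing the higher-order terms and relaxing them only when the residuals demand it; a hedged sketch, assuming the usual `objpoints`/`imgpoints`/`image_size` inputs:

```python
# "Simplest model you can get away with": keep only k1, k2 by fixing k3 and
# zeroing the tangential terms. objpoints/imgpoints/image_size assumed inputs.
import cv2

flags = cv2.CALIB_FIX_K3 | cv2.CALIB_ZERO_TANGENT_DIST
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    objpoints, imgpoints, image_size, None, None, flags=flags)
```

More parameters always fit the calibration set better, so only relax these flags when an evaluation on data you didn't calibrate on actually improves.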
Uncertainty propagation is very difficult to use in vision, largely because of modelling errors: the error distributions for anything observed or reconstructed from images are either subpixel-accurate or too nonlinear.
Avoiding rich models is a reasonable thing to do if you don't model uncertainty; with uncertainty propagation, a beginner who didn't get enough useful calibration data will see the poor uncertainties reported in the results. So I now use the splined models in pretty much all applications, and there are very few downsides. In my experience, every lens fits noticeably better with the richer model (the mrcal validation shows you this explicitly). I think you should look at the tour of mrcal; it's friendly.
A problem I always run into with OpenCV is that I need to preprocess the checkerboard images such that the lighting is just right. This is odd because that's the sort of thing I'd expect a computer vision library to excel at.
Another problem I run into is that the transforms are ill-defined just outside of the screen. That means that if I want to draw e.g. a line in world coordinates onto the image from a camera, then I often get garbage if the line starts or ends outside the image (even if I divide the line into many segments).
I strongly agree with you on the first point. OpenCV provides calibration primitives which look like they'd solve the problem easily, and this gives you false confidence. In my experience, they're very low-level and little more than a wrapper around the optimization routine. You need to implement most things from scratch: correct exposure, checking for blurred frames, verifying the extracted checkerboard, ... It's weird that everyone needs to re-invent this process.
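As a rough illustration of those pre-checks (one way to do it, not OpenCV's own pipeline), something like this is a reasonable starting point; the pattern size and blur threshold are placeholders to tune per camera:

```python
# Illustrative pre-checks: reject blurred frames, equalize uneven lighting,
# then run the detector.
import cv2

def find_corners(gray, pattern=(9, 6), blur_thresh=100.0):
    # Laplacian variance as a cheap blur metric
    if cv2.Laplacian(gray, cv2.CV_64F).var() < blur_thresh:
        return None

    # Local contrast equalization helps with uneven checkerboard lighting
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    eq = clahe.apply(gray)

    ok, corners = cv2.findChessboardCorners(
        eq, pattern,
        flags=cv2.CALIB_CB_ADAPTIVE_THRESH | cv2.CALIB_CB_NORMALIZE_IMAGE)
    return corners if ok else None
```

Exposure control and verifying the extracted board (orientation, completeness) still have to be layered on top of this.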
Can you be more specific on the second point? Once you have the intrinsics, it's trivial to project them to an (undistorted) image.
Regarding the second point, I'm projecting onto the distorted image (basically the image that I get from the camera sensor) because I don't want to undistort the image for performance reasons. My problem is that the transform is basically undefined just outside the viewport. Maybe I should simply do another check to see if the transformed points make sense. But it breaks my assumption that the inverse-of-the-inverse of a transform is the identity.
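One hedged workaround, assuming a pose (`rvec`, `tvec`) and intrinsics (`K`, `dist`) from an earlier solve: transform the world points into the camera frame first and only push points that are safely in front of the camera through the distortion model, since the distortion polynomial is meaningless for rays outside the range it was fitted on:

```python
# Project only points that are safely in front of the camera; rvec, tvec,
# K, dist are assumed outputs of a prior calibration / pose estimate.
import cv2
import numpy as np

def project_visible(world_pts, rvec, tvec, K, dist, min_z=0.1):
    R, _ = cv2.Rodrigues(rvec)
    cam_pts = world_pts @ R.T + tvec.reshape(1, 3)

    keep = cam_pts[:, 2] > min_z          # drop points behind / near the camera
    if not np.any(keep):
        return np.empty((0, 2)), keep

    proj, _ = cv2.projectPoints(world_pts[keep], rvec, tvec, K, dist)
    return proj.reshape(-1, 2), keep
```

You would still clip the projected pixels against the image bounds (plus a margin) before drawing the line segments.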
The corner detection is bad. Not sure why. I also think it's gotten worse since 3.5.
I know the corner detection refinement is worse than the raw detections, so turn that shit off.
Yeah, that's a common problem if the calibration failed. It could also be that you are not cropping to what is in front of the camera, but if it's really weird, it's most likely the former.
So the default, and probably most of the camera models in OpenCV, requires the lens to be monotonic and the imaging to be bijective. The former is true unless the lens has defects on the surface, and the latter is practically a physical constraint. The problem is that these constraints are difficult to add to the estimator, so they didn't, meaning it will happily find a solution where they are not satisfied. If, say, bijectivity is violated a bit outside the image, but still in a region that counts as in front of the camera and within float accuracy, that would absolutely account for the problem you describe. It's pretty obvious what the problem is if you consider the function; it's just hard to add the constraint in OpenCV's estimator.
The solution is: 1) verify that the constraints are satisfied after estimation, and 2) make the parameters as observable as possible during calibration. That means observations spread out in the image, evenly distributed, and all the way to the edges. Also verify that it has not rotated one or more of the detected chessboards upside down or 90 degrees sideways. Finally, because it becomes harder and harder to avoid this problem with more parameters, always start with one distortion parameter, then try two, then the two variations with three, and so on. More parameters always fit better, so use an appropriate test.
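A hedged sketch of step 1 for the plain Brown model (radial terms only, coefficient order [k1, k2, p1, p2, k3] as returned by cv2.calibrateCamera); the margin beyond the image corner is an arbitrary choice:

```python
# Check that the estimated radial distortion stays monotonic out to a bit
# beyond the image corners. dist is the coefficient vector from calibrateCamera.
import numpy as np

def radial_is_monotonic(dist, K, image_size, margin=1.2, n=2000):
    d = np.zeros(5)
    flat = np.ravel(dist)
    d[:min(len(flat), 5)] = flat[:5]
    k1, k2, _, _, k3 = d

    # Normalized radius of the farthest image corner, extended by the margin
    w, h = image_size
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    r_corner = np.hypot(max(cx, w - cx) / fx, max(cy, h - cy) / fy)

    r = np.linspace(0.0, margin * r_corner, n)
    r_d = r * (1.0 + k1 * r**2 + k2 * r**4 + k3 * r**6)
    return bool(np.all(np.diff(r_d) > 0))   # strictly increasing => invertible
```

If this returns False, the model folds back on itself near or beyond the corners, which is exactly the failure mode described above.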
I've been working in AI/ML and some CV projects over the last decade.
I've seen far too many cases where the "algorithms" and modeling teams had no concern for, or even a concept of, how the input systems for the data used for training and later inference mattered to the quality of the outcomes.
In CV and computational photography cases, there was little concern for or understanding of how photography and imaging actually work, nor consideration for how variances in hardware, configuration, or the capture pipeline in general might affect those models when capturing training data in parallel. Adding another layer: there was no consideration of how variances between the capture pipeline and the actual inference pipelines might need to be accounted for when designing the overall system, approaching data collection and curation for training and evaluation, and building in robustness. (Similar thoughts apply to concepts like bias and fairness in models.)
Example: training vision models using one set of imaging hardware and configurations while applying those models on very different imaging hardware with different characteristics.
To summarize the above, not caring about calibration in CV is like not caring about how variances in feature extraction/embedding generation will affect the overall quality of your results.
I've noticed a really fun trend at the companies I've worked with. 10-15 years ago stereo vision was pretty reliable for feature tracking or dense navigation. Pose graph methods were stellar. A new grad student could make a pretty decent pipeline themselves. Then detectors became AI/CNN driven and stereo tracking became really good. Like radar good. Again, a couple of knowledgeable grad students, hard studiers, or a PhD could do magic.
But then it all crashed. Somewhere along the line we lost something. At my last job (not Shield), I watched the estimates for a truck jump in and out of existence at 50 m range with a 2 m baseline. Its orientation flickered 30 degrees or more. But because they had used an EKF somewhere, they claimed it was the best they could do and anything else was inventing signal. I haven't dug into state estimation in the last five years, but it hasn't gotten better if this is what new grads believe.
Very few know the old methods. Few were trained in them, and deep learning started working just 5 years after the classics got good and easy enough that a few people could use them for a product. So there was never the needed wave of non-deep-learning computer vision students; they all went to deep learning.
Many of the libraries are not well maintained. I needed a basic homography estimation recently and asked a minion to try OpenCV for it before anything more advanced. He got it working, but the best inlier ratio for every feature he tried with default parameters was 3%. The images were offset by 30 pixels left-right, less than a pixel in warp, and taken with different exposure times. So he used SIFT...
He argued he got it working and that this was as good as keypoint matching got... If I hadn't happened to be the guy who needed it one step up, it could have just propagated: someone adding a shitty EKF to make it smooth, then burying it in layers of heuristics and API.
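For reference, the boring baseline being described looks roughly like this (image paths and thresholds are placeholders):

```python
# SIFT keypoints, ratio-test matching, then RANSAC homography.
import cv2
import numpy as np

img1 = cv2.imread("frame_a.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_b.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Lowe ratio test to drop ambiguous matches before RANSAC
matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = [m for m, n in matcher.knnMatch(des1, des2, k=2)
           if m.distance < 0.75 * n.distance]

src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
print("inlier ratio:", inlier_mask.sum() / len(matches))
```

For two frames that differ by ~30 px of translation and a bit of exposure, the inlier ratio out of something like this should be far above 3%; if it isn't, something upstream (detection, matching, parameters) is broken.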
Hey, I’m very interested in learning more about sfm, 3d reconstruction, slam etc. primarily for robotics applications (3D vision). Currently still a hobby project but I’m more than happy to throw money at it. If you’d be willing to share some wisdom or help otherwise, shoot me an email!
Part of this is a corruption in the field/industry that results from greed and chasing a gold rush.
Many "AI/ML" teams have been doing more alchemy than science.
One memorable team "revelation" was when they figured out the hard way that many objects and human subjects, in particular, register completely differently based on the imaging spectrum used (e.g. human visible light, versus near infrared wavelengths).
People familiar with the "old ways" probably would have considered this to be obvious, but it seems that the "new ways" are often correlated with an absence of domain knowledge related to what the "new ways" are being applied to. This isn't a new story, and history seems to repeat itself.
Looking through the comments:
People are confusing the lens models with the nonlinear estimator used and with the framework for calibrating cameras. There is no such thing as "the OpenCV lens model." There is such a thing as the Brown lens model as implemented in OpenCV.
The lens model is the parametrized approximation of the lens function. Picking the right one matters for ideal accuracy.
The estimator takes observations which constrain the parameters of the lens function, and finds those parameters.
The framework helps you create those observations.
Pre-calibration on Earth is totally irrelevant if the conditions in space are then totally different. You need to calibrate in space, for different temperatures, pressure, heavy mechanical stress from liftoff, and so on.
I once messed up and input the RGB values in the wrong order. This lost me 2 mIoU in evaluation. If anything it was a pain in the ass to figure out, because the error was so subtle that I thought maybe I was imagining it. So I don't believe that camera calibration is important anymore. What you need is a dataset taken by a variety of cameras.
Heh, yeah, the fact that most deep learning stuff degrades massively if you switch from one camera model to another is telling. But calibration is still important for most things that aren't trivial DL.
> The cameras are used to capture the world around us. But frames are just representations of the world, and the actual relationship between the flat image and the real thing is not always obvious. In order to extract meaningful insights about the real objects from the images, developers and users of computer vision systems need to understand camera characteristics — with the help of the calibration solutions
Is there a good resource I can read about the last point in the article?
> We don't know where the camera will be. For example, we ask players to use the cameras of their mobile devices to explore the size and proportions of a room — to work with the augmented reality helmet. We don't know anything about the rooms, and we can't ask users to use a pattern.
Are there automatic techniques to tackle this problem?
It's not particularly hard, and there is no difference between reconstructing from uncalibrated cameras and reconstructing from many images from one camera with unknown calibration. Then you can check whether the calibration is unchanged over time, and if it is, add that as a constraint. At that point it will still take half an hour to run, but now you can add tracking that exploits motion prediction and the IMU, which in turn gives you real-time speed.
Note that extrinsic calibration (where is my camera in 3D space, and how is it oriented?) is different from intrinsic calibration (how do the pixels in my image map to ray directions relative to the front nodal point of the lens?).
Yeah, you can do live simultaneous calibration, reconstruction, and localization. It's a difficult problem, and a slightly different one for each phone, but there are plenty of variants that work for single phone models. The IMU is key to making it possible, though.
What makes it harder is that older phones have insufficient processing power, whereas newer phones have too many cameras. The phones also do post-processing in undocumented ways that are impossible to disable. There is also rolling shutter. This in turn can be simplified again by just subsampling the images to 480p and estimating camera parameters per image.
This seems like a temporary problem as AI models improve. If we can figure out how to extract meaning from distorted images, AI will be able to as well.
OCR for distorted text is a solved problem. Relatively simple math based on normalizing text size and shape can be used to calculate how to reverse distortion.
If calibration is a temporary problem, it's because the people building those AI models understand imaging really deeply, and engage with some of the ideas in the linked post. To put it another way: if you want to build AI models that understand the world in 3D via 2D images, you should probably understand projective geometry, camera calibration, etc, extremely well.
Or does it? Drawing a parallel from my experience in NLP during the early 2000s: the field was largely dominated by linguists trying to grok language and structure rules manually. However, the most significant advancements came when we shifted towards using massive datasets to train models, without requiring explicit, deep linguistic knowledge.
Similarly, maybe the next frontier in 3D vision is employing large datasets to train a black-box AI without us having to understand and get the math right.
You can't compare a real technical field with hundreds of years of real-world use to some social-science nonsense. Linguists have never made useful or testable predictions of any kind. Geometric computer vision has refined camera calibration to the point where we use stereo to find things billions of light years away.
I think we have objective information about how projective geometry works.
Ergo, I would not bet against teams that understand that objective technical field; AI is not magic, and knowing the fundamentals will always help. Better models, smaller models, faster models; the less you leave for the model to solve in latent space the better you’ll do, I think.
If that was true, transformers would have been useless for NLP and deep learning for Go.
You can argue the exact opposite: the more you leave to the model to learn by itself, the more likely you are to find a solution that was not accessible to humans and their limited feature engineering capability.
Thank you for keeping us old-school computer vision guys in business. If deep learning folks actually used good calibration, and didn't constantly argue "eh, an offset of a few pixels doesn't matter", they might actually be competitive with classic methods by now. Well, competitive accuracy-wise, but not performance-wise at least. Most VO systems localize a camera before the images have even been uploaded from RAM to the GPU, and deep learning will have a hard time getting around that kind of performance limit.
I mean, yeah, but that's because text is a fairly predictable, simple thing to extract. It has sharp edges, a limited number of lines, and is neatly spaced glyph by glyph. It's also mostly high contrast.
Structure from motion, however, has none of those luxuries. For "AI" to overcome that, it would require a model that has an innate understanding of the scale, shape, and orientation of most objects in the world. Not only that, it'd also need an innate understanding of lens distortion to work out whether the object is bent because of the image, the object, or some other effect.
I look forward to you releasing your model that does all this.
Camera calibration is almost always underestimated and is a source of both failure and algorithm degradation. Shit in, shit out. Keep the model simple, but actually understand it. Make sure it's observable. Visualize and verify.
Hot take: Offline camera calibration is largely unneeded anymore. As long as you have an approximate starting point, collect about 5 minutes of data from all sensors and you have everything you need to reverse solve for the calibration.
Not saying it's easy, but solving it in software is O(1) while calibrating each device is O(N), and it also makes you resilient to things like temperature deformations and other nasty things that can happen in the field.
There have been papers published about making robots and autopilot-like systems robust to all sorts of "system degradation" such as obstructed optics, missing signals, noisy signals, etc...
I can't find the reference any more, but one particularly impressive result was a four-wheeled robot that could tolerate the loss of a wheel, distortions of the entire chassis, and multiple faulty sensors all at the same time!
Depends on the use case. A car that requires five minutes of driving to calibrate its cameras once after exiting the factory would be completely acceptable. A drone that has a pre flight calibration when one first takes it out of the box would be okay.
Realistically, the dealership is likely to drive the car for a few minutes, if only to put it in the right place on the dealership lot, fill it with gas, and get it washed before delivering it to the customer.
If there was an issue with the camera system it could be caught at that time.
How do you jump from OP's "after exiting the factory" to your "bought a car and 5 min into driving"? Those are two rather different things. Are you just looking for a pointless argument?
If testing takes too long to be sustainable, outsourcing it to the customer seemed to be the suggestion? Admittedly perhaps a bad example, but the point still stands.
Take the other example then: "pre flight calibration when one first takes it out of the box".
Most likely that calibrates absolute scale to a margin where people can no longer tell the difference.
There is a huge difference between that and being able to do stereo at half a kilometer.
I guess you haven't actually tried to do this, or you would emphasize the crucial importance of: you still need a good initial guess, you still need to know the sync, and you still need data from the right environment with the right motions. And in the end, it will be worse.
I agree, and I don't think this is a particularly hot take; if you've been working with camera systems, you've seen calibration without explicit calibration routines get better and better over the past few years.
For critical imaging, you still want to calibrate, but the gap between explicit calibration and "implicit calibration" (just using your subject matter) is definitely getting smaller, and quickly.
A linear scale lacks aspect ratio; they'd want at least a right-angled scale.
That said, there's a long tradition of using circles and checkerboards that goes back to early BBC TV broadcasts transmitting a tuning pattern.
The coin used is a circle, likely has "interesting" patterns, and perhaps even has raised and gouged features that the stereo paired cameras can height resolve (and calibrate against for when examining rocks).
As a camera nerd, I have to weigh in with some minor annoyances: there are errors in this article that reflect common misconceptions or elisions that computer vision engineers often believe in. Let me debunk a few...
"Cameras, by design, also introduce a level of distortion": distortion is an artifact that is usually minimized in lens design if you are trying to create a rectilinear lens, though you are typically forced to trade it off versus other design criteria (like resolving power, or optical complexity). If you're trying to design a rectilinear lens, you explicitly try to remove any distortion; for "normal" focal lengths (focal length ~= image circle diagonal) you can often get vanishingly low distortion, even with relatively simple lenses. (There was a 19th century lens design that achieved "zero" distortion with just a few elements, and a precisely-chosen aperture position; the name escapes me at the moment.) If you are not designing a rectilinear lens, there are other lens mappings, in which case it's not really proper to describe the effect as distortion, though in technical literature it's often still described this way. (For further reading, check out F-theta vs. F-tan-theta lenses: https://en.wikipedia.org/wiki/Fisheye_lens#Focal_length;https://www.thorlabs.com/newgrouppage9.cfm?objectgroup_id=10....)
"... the proportions of objects closer to the edges of the frame are distorted." This is technically incorrect; objects subtend the same number of pixels in an f*theta fisheye image whether they're at the center of the frame or at the edge, for a given camera-subject distance. The apparent distortion exists because you're viewing it further away than the focal length implies, so the viewer at a "normal" viewing distance is applying a transform to the image that is not distance-preserving. If you get extremely close to the image, and if it were displayed on a curved surface (so the image's angular extent to the viewer matched the camera's FOV), you'd see no distortion. Also, we think of arranging subject matter for, e.g., a portrait, along a plane that's orthogonal to the camera's axis -- but then people closer to the edge are farther from the camera, and would naturally appear smaller. It's a weird artifact of rectilinear lenses that they're actually magnified in the resulting image so people standing in a line are all equal size; to get the same effect with a fisheye lens, you just have people arranged in an arc centered on the camera. (Most portraits are not shot on fisheye lenses though, for very good reasons, so this is more of an edge case.)
Related: "there are no straight lines" -- if you're using a camera to make an image, you're operating in projective space. Whether a line that is straight in 3D space remains straight in the projective space is a function of your projection. For rectilinear lenses -- for which the relation f*tan(theta) applies -- straight lines remain straight. But for most (all?) other mappings they do not; for example, f*theta lenses preserve angular extent: an object that subtends a particular angle from the camera's perspective will occupy the same number of pixels regardless of where it appears in the image. This is arguably superior for many computer vision tasks.
... I need to stop here, because this is already getting tedious. For anyone interested in developing a deep intuition for lenses and imaging, I would recommend playing around with a view camera, understanding the Scheimpflug relationship (https://en.wikipedia.org/wiki/Scheimpflug_principle), and really thinking through imaging from first principles if you want to understand these things more deeply.
Or just, you know, use Colmap and don't look back. That works too.
Mathematically, the camera most often used in computer vision is a pinhole camera. When we talk about "distortions" I think it's usually with regards to how the real device systematically deviates from that model.
Calibration in this context is essentially the task of finding the optimal parameters of some (usually nonlinear) function (u,v) = f(x,y) that remaps positions in the original image frame to a rectified frame, where all straight lines in the world appear straight in the image. Technically, a skewed and squashed image would also fulfill those requirements. But this is a customer-oriented blog post meant to give someone enough understanding to convince them of the importance of calibration, not a rigorous technical paper, so I actually think it's fine to skip or simplify some details.
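Concretely, that remap is what OpenCV's standard undistortion does; a sketch with placeholder intrinsics and distortion coefficients:

```python
# Placeholder intrinsics and Brown distortion coefficients; in practice these
# come from a prior calibration.
import cv2
import numpy as np

K = np.array([[800.0,   0.0, 640.0],
              [  0.0, 800.0, 360.0],
              [  0.0,   0.0,   1.0]])
dist = np.array([-0.25, 0.08, 0.0, 0.0, 0.0])

img = cv2.imread("input.png")
h, w = img.shape[:2]

# Precompute the (u,v) = f(x,y) lookup tables once, then remap every frame
map1, map2 = cv2.initUndistortRectifyMap(K, dist, np.eye(3), K, (w, h), cv2.CV_32FC1)
rectified = cv2.remap(img, map1, map2, cv2.INTER_LINEAR)
```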
You are, of course, absolutely right. I nonetheless think it may have value for some people to understand where the deviations from that simplifying assumption lie, and at least understand the stakes.
Now, I may be biased, because I work in imaging-for-humans, and I've had many a conversation with engineers about why a particular simplification doesn't work for, e.g., filmmakers. But I think that even for purely technical disciplines, understanding the assumptions that go into the pinhole model can be useful. At the margins. Which sometimes matter.
> When we talk about "distortions" I think it's usually with regards to how the real device systematically deviates from that model.
IMO this is the correct definition of distortion. However, as the parent comment said:
> If you are not designing a rectilinear lens, there are other lens mappings, in which case it's not really proper to describe the effect as distortion, though in technical literature it's often still described this way.
I think many people confuse mapping and distortion. When a fisheye lens is used, it's often described as "heavy distortion". A more accurate way to put it is that it's a different mapping/projection, and the _distortion_ a calibration measures is the difference between this ideal projection (f*theta, e.g. Kannala-Brandt, rather than f*tan(theta), i.e. pinhole) and the actual image. This can be a minuscule amount.
This means that, "undistorting" a fisheye image doesn't give you a rectilinear image, but still a fisheye image. You can of course decide to map the undistorted fisheye image to a rectilinear one, but that's conceptually a different operation than (un)distortion.
A specific example of where it's useful to know the underlying mechanics: for very wide angle lenses, you will typically get brightness falloff at the edges due to the cos^4 (cosine fourth power) law (https://nvlpubs.nist.gov/nistpubs/jres/39/jresv39n3p213_A1b....).
This is often elided by camera systems, which apply a gain to peripheral pixels to correct for it. If you understand imaging, you will expect that, and understand why, for example, your wide angle lens displays a lower signal-to-noise ratio at the edges of the image than you might otherwise expect for a given illumination value.
This is a really specific example, but there are dozens. Imaging is its own deep, technical field that is abstracted, and occasionally obscured, by the pinhole model.
The "rectilinear" or "f*tan(theta)" projection mentioned in the grandparent comment is equivalent to that pinhole camera. The former name is because the pinhole camera preserves straight lines, as noted. The latter is because if theta is the angle between the incoming ray of light and the lens's optical axis, and f is the focal length, then the pixel illuminated by that ray of light is at a distance f*tan(theta) from the center of the imager.
That rectilinear projection is indeed the most popular choice, but all projections involve tradeoffs as the FOV gets bigger, in the same way that all planar cartographic projections involve tradeoffs as the depicted region gets bigger. For example, the magnification of objects at the edges of a rectilinear projection gets extreme as the FOV approaches 180 degrees, and the projection stops existing entirely at or beyond that. That magnification is sometimes called "perspective distortion", even though it's inherent to the rectilinear projection.
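A quick numeric illustration of that blow-up (placeholder focal length, in pixels):

```python
# Radial image position of a ray at angle theta under the rectilinear
# (f*tan(theta)) and fisheye (f*theta) mappings.
import numpy as np

f = 600.0
for deg in (10, 30, 60, 80, 89):
    theta = np.radians(deg)
    print(f"theta={deg:2d} deg   rectilinear r = {f*np.tan(theta):9.1f} px"
          f"   f*theta r = {f*theta:7.1f} px")
```

The rectilinear radius diverges as theta approaches 90 degrees, while the f*theta radius grows only linearly.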
Wide-angle lenses or multi-lens arrays will often deliberately choose f*theta instead, to avoid that "perspective distortion" or support FOV >= 180 degrees. Other projections (e.g. equirectangular) are also used, especially for stuff like panoramas and VR. The concept of distortion is meaningful only with respect to a desired baseline, which is often but not always rectilinear.
Rectification is not always the best choice; it's a simple one for data processing in some cases, but often limiting. Keeping to a simple model is much more important.
Thank you for trying. I've thought about it myself; after all, someone has to teach people, so I don't get another "yeah, we collected the dataset with a calibrated camera rig and here it is. What do you mean, what was the calibration?"