IBM and NASA Open Source Largest Geospatial AI Foundation Model on Hugging Face (ibm.com)
315 points by drkommy on Aug 3, 2023 | 82 comments



I wish the press release had a bit more detail about what this model actually does and whether it's actually useful for the suggested use cases.

However, make no mistake: this is for the scientific community and will not help commercialize geospatial data. No one cares about your geospatial crop model, or that you can identify energy infrastructure, or that there's some activity around that copper mine. Well, at least no one who will actually pay you cares.

(FWIW, I cofounded a geospatial analytics company)

Satellite data is extremely idiosyncratic. It's coarse (~10m at best), infrequent (every few days at best), and oh, you have to deal with the fact that roughly 50% of the planet is covered in clouds at any moment. Satellite data works best on things that don't move, that are fairly large, and that change infrequently. If you find a use case that satisfies those conditions and want to make money, then you need to find a problem that terrestrial sensors haven't solved. And if you find that problem, the cost of building, training, and running your model (plus the cost of the data!) has to be less than the marginal value of your model. Good luck finding those use cases.

The US Government is special. We don't know what's going on in North Korea or Ukraine or the South China Sea, so we buy high-resolution imagery (30cm) over those areas at great cost. Large ag companies and oil companies know what's going on within their own facilities, and price gives them information about the rest of the supply chain.

In other words, this might be an interesting announcement for scientists, but it won't change the geospatial market at all.


PlanetScope is 3.7m and captures the full land area of the earth daily (minus cloud cover, of course): https://www.planet.com/products/planet-imagery/

Disclaimer: I work for Planet.

I also disagree with the assertion "no one will actually pay you." Read pages 26-28 of the quarterly report for more information.


I work for a customer of Planet and can confirm, we pay Planet a lot of money each year.


Planet stock is down ~70% since its IPO (from $10 at the 2022 IPO to $3 today).

NYSE: PL


And it took Amazon nearly a decade to get back to their internet bubble highs. PL got SPACed onto the stock market during the SPAC bubble.

That being said, while I do believe that Planet has one of the best business models in the industry, I do sometimes worry that they are a bit early, and their customers aren't ready for them yet.

As someone who has to work with their data (among others), Planet has some of the best APIs in the industry (it's a low bar though).


Why do you think they haven't rebounded during the last bull market (see Nvidia, MSFT, etc.)?


Not really seen as an AI play. The bull market has been mostly driven by FAANG + Nvidia.

I'm mostly just playing devil's advocate here, and the point I'm trying to make is that share price alone isn't everything. Planet is growing its revenue YoY, and they might even be profitable soon (lol).

Also, my original point was that the company I work for buys a lot of data from Planet, which is something that's just increasing as we grow.


Could you explain the use cases for your company? How do you use their data?


>Satellite data is extremely idiosyncratic. It's coarse (~10m at best)

>The best commercially available spatial resolution for optical imagery is 25 cm, which means that one pixel represents a 25-by-25-cm area on the ground—roughly the size of your laptop.

https://spectrum.ieee.org/commercial-satellite-imagery


I regularly use up to 3cm aerial imagery.

There are many very nice commercial products available.


Are there any products available for casual users, just paying for a few images?


There are many now. You can buy a single image, or even schedule a live capture at a specific time.

https://skywatch.com/earthcache/


I'm an ML engineer working for a geospatial company and I can assure you, we are looking into this.

> Satellite data is extremely idiosyncratic. It's coarse (~10m at best)

10m is the best free imagery; in the commercial domain it goes down to 30cm.

> Well, at least no one cares that will actually pay you.

There are plenty of things people will pay you for. But you gotta find those niches.

> In other words, this might be an interesting announcement for scientists, but it won't change the geospatial market at all.

Maybe. We'll definitely check if it can be fine-tuned on higher-res data. We do sometimes use Sentinel-2 (not a lot though), and this could help with those cases.


I think it is very important for people to understand that terrestrial sensors are orders of magnitude cheaper for most applications, and are typically far more accurate too. There's a reason why most remote sensing companies go out of business fairly quickly.


You have a sibling comment from PlanetScope that claims ~30-50% profit margin, depending on the quarter.


> Satellite data is extremely idiosyncratic. It's coarse (~10m at best), infrequent (every few days at best), and oh you have to deal with the fact that the planet is covered in 50% clouds at any moment.

Are the coarseness and cloud aspects going to become less of a factor now that there are commercial high-resolution synthetic aperture radar imagery providers? I'm just a hobbyist, but the imagery I've seen is *sharp*, and it even caught the NRO's attention.[1]

[1] https://spacenews.com/national-reconnaissance-office-signs-a...


Like... no?

InSAR (InSAR, SAR, whatever we're calling it these days) isn't a drop-in replacement for anything. It's really neither here nor there when it comes to the utility of other datasets. InSAR is amazing, don't get me wrong, but it stands on its own and has its own advantages/disadvantages.

The OC's point stands. Satellite data is tough because there is a shit ton of atmosphere between you and the target. That issue doesn't go away with InSAR, especially if it isn't coincidentally collected with higher-resolution spectral data. I've been in the industry for around 15 years. Things have gotten better, but really, it's important to understand the context and limitations of specific platforms. AFAIK, there is no panacea.


Super interesting. Hadn't heard of SAR before. Quickly reading about it, it seems like it works like lidar. What's the difference between the two techniques? Is SAR like "lidar for space"?


I think one difference is that with SAR the sensor needs to be moving (the "synthetic aperture" is built up from that motion), and it uses radio waves rather than laser light, so it can see through clouds.


what happened with your company?


He ran it into the ground without a vision, with excess spending on bar tabs and the startup life.


If I'm correct and philosophygeek is Mark Johnson, he cofounded Descartes Labs. It was a pretty cool company with some quite impressive technology. He (they) did a lot.

I'm not far from bashing the VC scene and the adjacent startup culture, but your overly cynical comment was too much even for me. More intellectual humility and less cheap soundbites would benefit society a lot.

If you're interested, he wrote about it: https://philosophygeek.medium.com/meditations-a-requiem-for-...


Descartes Labs folded?

Oh man. I had no idea! These guys were some of my prime competition for years.

The true cost of venture capital revealed.


Did you work there and know this as a fact?



Here are a bunch of demos of different use cases:

- https://huggingface.co/spaces/ibm-nasa-geospatial/Prithvi-10... - This demo showcases how the model was fine-tuned to detect water at a higher resolution than it was trained on (i.e. 10m versus 30m) using Sentinel-2 imagery from the sen1floods11 dataset.

- https://huggingface.co/spaces/ibm-nasa-geospatial/Prithvi-10... - This demo showcases how the model was fine-tuned to classify crops and other land use categories using multi-temporal data.

- https://huggingface.co/spaces/ibm-nasa-geospatial/Prithvi-10... - This demo showcases image reconstruction over three timestamps: the user provides a set of three HLS images, and the model randomly masks out some proportion of each image and then reconstructs it from the unmasked portions.

- https://huggingface.co/spaces/ibm-nasa-geospatial/Prithvi-10... - This demo showcases how the model was fine-tuned to detect burn scars.

More of the same information from different sources:

- From NASA - https://www.earthdata.nasa.gov/news/impact-ibm-hls-foundatio...

- From IBM - https://research.ibm.com/blog/nasa-hugging-face-ibm

- From huggingface - https://huggingface.co/ibm-nasa-geospatial
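If you'd rather poke at the raw pretrained weights than the Spaces demos, here is a minimal Python sketch of pulling the checkpoint from the Hugging Face hub. The checkpoint filename is an assumption on my part, so check the file listing on the Prithvi-100M model page first.

```python
import torch
from huggingface_hub import hf_hub_download

# Download the pretrained checkpoint from the ibm-nasa-geospatial org.
# NOTE: "Prithvi_100M.pt" is an assumed filename; verify it against the
# model page's file list before running.
ckpt_path = hf_hub_download(
    repo_id="ibm-nasa-geospatial/Prithvi-100M",
    filename="Prithvi_100M.pt",
)

# Load on CPU and inspect what the checkpoint contains before wiring it
# into a fine-tuning pipeline (flood mapping, burn scars, etc.).
ckpt = torch.load(ckpt_path, map_location="cpu")
print(type(ckpt))
if isinstance(ckpt, dict):
    # Could be a raw state_dict or a wrapper with 'model'/'state_dict' keys.
    print(list(ckpt.keys())[:10])
```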


Wondering whether a use case for a model like this would be navigation without GPS?


Not at 10m brah.


I suggest changing the link to https://huggingface.co/ibm-nasa-geospatial. The currently linked press release is an insufferable corporate PR word salad.


As someone who is not a Hugging Face user, your link is much less clear than the submitted link. Really esoteric UI. What am I even looking at? What are Spaces?


I’m not familiar with HuggingFace at all (not in the AI space) but I clicked on the demos and models on that page and was able to learn what this is all about. On the other hand, I read the press release in full and left with no idea what the model is supposed to be.



In addition to what Kiro said (Hugging Face's organization UI is hard to parse), the text on this Organizational Card is written in exactly the same word-salad style, just with even less information.

I prefer the full press release, especially since it already has the link to Hugging Face for those who want it.


I work for USGS EROS Data Center and this is basically our life: taking remote sensing data and generating various models to determine all sorts of things.

Yes, I agree with much of what people are saying. This means very little to the public except when it matters. Some of the projects I work on drive famine early warning systems across the globe so that governments can make sure they have enough food for their populations. Others depend upon the health of crops, how people are using the land (land cover), or how much cheatgrass and other invasives there are on rangelands (cheatgrass cuts the tongues of cows, for instance).

We use supercomputers (Cray HPCs with GPUs) and deep learning (PyTorch/Keras/TF). But, given all of the above, I'm not sure how this would help me do my job.


The core information:

> The model – trained jointly by IBM and NASA on Harmonized Landsat Sentinel-2 satellite data (HLS) over one year across the continental United States and fine-tuned on labeled data for flood and burn scar mapping — has demonstrated to date a 15 percent improvement over state-of-the-art techniques using half as much labeled data. With additional fine tuning, the base model can be redeployed for tasks like tracking deforestation, predicting crop yields, or detecting and monitoring greenhouse gasses. IBM and NASA researchers are also working with Clark University to adapt the model for applications such as time-series segmentation and similarity research.


This simply is not true! NASA and IBM need to do further literature review and rely less on press releases.

There are larger foundation models for geospatial imagery available. Our pre-training method, Scale-MAE [0], has 323M parameters, makes encoders robust to changes in satellite imagery resolution, and is therefore trained on satellite imagery of all resolutions. Work out of SI Analytics [1] presents a 2.4B-parameter transformer for satellite imagery.

[0] https://arxiv.org/abs/2212.14532

[1] https://arxiv.org/abs/2304.05215


If the models aren't available in a repo for download do they actually exist?


I'm interested. Are your models open sourced like the IBM model, and are they easily accessible on HuggingFace?


They are open sourced (code and weights) [0], but not accessible on HuggingFace.

[0] https://ai-climate.berkeley.edu/scale-mae-website/


You might want to think about adding them to HuggingFace, and making them HuggingFace API compliant to lower the barrier to entry.

The IBM offering is appealing not because it is "first" or the "best", but because it is accessible. A lot of institutions / enterprises (including my own) are able to leverage transformer models because HF has done so much to lower the barrier to entry with their hub, documentation (model cards) and API.

How else would I have found your models but for an IBM press release?


The demo misattributes a West Bengal, India flood to a location in Pakistan. Either the title is incorrect or the model is simply wrong. The demo should have been proofread before publishing :/

https://youtu.be/9bU9eJxFwWc?t=28


Well, it also says Indus which is in Pakistan. But, good catch, weird mistake!


This is just PR newswire hype; calling it a foundation model is a stretch. People have been training nets on satellite imagery for a long time, and I don't even see anything here that would need a foundation model. Most of them are vanilla CNNs that work relatively well; the problem has mainly been the data pipeline, where different GSDs and nadir angles make layering images difficult.


Can someone enlighten me as to how a foundation model is better suited for visual geospatial data analysis (satellite images) versus an intelligently designed "filter?"

For example: you can easily use Photoshop to distinguish between various hues, apply a mask based on the selection, and then apply a color overlay, essentially "highlighting" a particular hue with another. (I imagine a similar approach can be used with hyperspectral imaging as well.)
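For concreteness, here is roughly what such a "dumb" hue filter looks like in code: a minimal sketch assuming an RGB array with values in [0, 1]; the hue range is made up for illustration and not calibrated for any particular sensor.

```python
import numpy as np
from matplotlib.colors import rgb_to_hsv

def hue_mask(rgb, hue_min=0.25, hue_max=0.40):
    """Return a boolean mask of pixels whose hue falls in [hue_min, hue_max].

    rgb: (H, W, 3) float array in [0, 1]. The default range loosely covers
    greens; real use would tune it to the target signature.
    """
    hue = rgb_to_hsv(rgb)[..., 0]          # hue channel, also in [0, 1]
    return (hue >= hue_min) & (hue <= hue_max)

# "Color overlay" step: paint the matching pixels red.
# mask = hue_mask(image)
# image[mask] = [1.0, 0.0, 0.0]
```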

As visual data is color-based, why can't such "dumb" filters be used? Why is this an AI challenge, let alone one well suited to the flexibility of foundation models?

*Edit: I believe I may have answered my own question. I assume the advantage this approach has is in the efficient bulk analysis of geospatial data. Platform consumes visual data, and spits out numerical data based on an analysis of the images. That numerical data can then be manipulated and fed into prediction models.


So the big issue can be summed up as one present across many domains of ML, but it is particularly challenging in remote sensing. ML has seen rapid advances in spaces where foundation models capture the variance of a domain in the latent space of a model. By training on a very large dataset, the variance becomes encoded in that latent space; lots of training data implies that the latent space contains meaningful variation. This is what powers things like Stable Diffusion: a model trained on a very large dataset was connected with another dataset, in latent space. The image model wasn't starting from square zero; it had meaningful variation encoded into it.

Enter remotely sensed datasets. They are monumental in scale, so much so that the 804 chips they submitted here are pretty much laughable. Likewise, some parameters of the data have inherent meaning: the pixel dimensions correspond to real-world measurements, and the bands are specific spectral windows.

It's similar to, but far more involved than, working with RGB cell-phone camera data.
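For anyone unfamiliar with how that latent space gets filled, the reconstruction demo above describes masked-autoencoder-style pretraining: hide most of the input patches and learn to reconstruct them from the rest. Below is a minimal PyTorch sketch of just the masking step; the patch size, mask ratio, and tensor shapes are illustrative assumptions, not the actual Prithvi configuration.

```python
import torch

def random_mask_patches(images, patch=16, mask_ratio=0.75):
    """Split images into patches and randomly hide a fraction of them.

    images: (B, C, H, W) tensor; H and W must be divisible by `patch`.
    Returns the visible patches and a boolean mask (True = hidden).
    """
    B, C, H, W = images.shape
    patches = (images
               .unfold(2, patch, patch)     # (B, C, H/p, W, p)
               .unfold(3, patch, patch)     # (B, C, H/p, W/p, p, p)
               .permute(0, 2, 3, 1, 4, 5)
               .reshape(B, -1, C * patch * patch))
    num = patches.shape[1]
    keep = int(num * (1 - mask_ratio))
    ids_keep = torch.rand(B, num).argsort(dim=1)[:, :keep]   # random subset per sample
    visible = torch.gather(
        patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, patches.shape[-1]))
    mask = torch.ones(B, num, dtype=torch.bool)
    mask.scatter_(1, ids_keep, False)
    return visible, mask

# During pretraining the encoder only ever sees `visible`, and a light decoder
# is trained to reconstruct the hidden patches; that objective is what forces
# the domain's variation into the latent space.
```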


This is cool -- I built something very close to this and I think it's really useful for researchers to have access to pre-trained models. Training something like this isn't that challenging given the relatively limited scale (100M parameters), but most grad students working in GIS won't have the resources or time to do it.

I think partnering with Huggingface was a good move, because it means the interface is easy to use. Removing this difficulty in actually using pre-trained research models was one of the design goals of Moonshine[1], and I have no doubt that if it were IBM alone it wouldn't be nearly as easy to use.

Will be excited to hear if this works for people! Always cool to see your idea validated even if nobody really uses your tool :)

[1] https://github.com/moonshinelabs-ai/moonshine


We're talking to a lot of companies that are willing to pay for this, and already paying.

With the better resolutions that are being launched and current AI, there are many more feasible applications vs. when you started DL.

We've built this leveraging other foundation models, so all research is very much appreciated https://www.youtube.com/watch?v=2yz4DwPtdjE

Disclaimer: I'm a co-founder of Happyrobot. We're working on this as we speak.

Let's all meet at SmallSat https://smallsat.org/ conference next week and discuss in more depth!


This may be a silly question as I'm quite inexperienced w/ ML and a lot of the words in this article don't mean anything to me yet:

But, could this "model" be used for something like monitoring land use in a city? The specific example I'm thinking of is getting a percentage breakdown of how much land is devoted to paved surfaces (parking/roads), to vacant undeveloped lots, and to built structures. It would also be interesting to see how those percentages have changed over time.


If I skimmed the linked source and https://hls.gsfc.nasa.gov/algorithms/ correctly, the models are trained on 10m x 10m (~30 ft x 30 ft) cells.

That may not be fine enough resolution in the source data to resolve parking lots vs. undeveloped areas to the degree you require.

The alternatives are to use the nature of the models (trained to lock onto multispectral signatures) on finer source data, which may limit your options across the globe, OR

to use urban data from city land agencies which have maps from low level air photo surveys with resolution down to 10 cm (+/-) and GIS land boundaries which are often classified via metadata.

Air photos are not multispectral (usually) so you won't have access to IR bands etc.

You can get coarse city growth figures globally from sat data going back to TERRA (launched 1999 IIRC) and fine grained air photo data from well off cities going back to (say) the mid 80s (and longer for wet negatives).


The thing you are asking for does not sound hard if you have multispectral satellite data of your target area.

The typical trick is to look for areas which absorb visible red while reflecting near-infrared to identify vegetation. If all you have is RGB imagery, then you can use machine learning techniques to develop a classification system.
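For concreteness, here is a minimal sketch of that red vs. near-infrared trick (NDVI thresholding), assuming the two bands are already loaded as NumPy reflectance arrays on the same grid; the 0.3 cutoff is a common rule of thumb, not a universal constant.

```python
import numpy as np

def vegetation_mask(red, nir, threshold=0.3):
    """Boolean mask of likely vegetation from red and near-infrared bands.

    NDVI = (NIR - Red) / (NIR + Red). Healthy vegetation reflects NIR
    strongly and absorbs red, so it scores high on this index.
    """
    red = red.astype(np.float32)
    nir = nir.astype(np.float32)
    ndvi = (nir - red) / np.clip(nir + red, 1e-6, None)   # avoid divide-by-zero
    return ndvi > threshold

# Share of vegetated pixels in a scene:
# mask = vegetation_mask(red_band, nir_band)
# print(100.0 * mask.mean())
```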

It does not look like this model means a breakthrough in your application area. Definitely not out of the box; maybe with more work you can refine it to do the classification you are looking for.

Do you have access to satellite images of areas where you would be interested in these percentages?


> Do you have access to satellite images of areas where you would be interested in these percentages?

Answering my own question: it seems one can access the right type of data from the Sentinel satellites relatively freely.


I think that'd make a lot of sense to use it for. One of the demos (https://huggingface.co/spaces/ibm-nasa-geospatial/Prithvi-10...) showcases a fine-tuned version of the model that detects crop type and land usage, so hard to imagine it couldn't be done for more types that are more city-oriented.


If you are just interested in the percentage share (not a map of labeled data), all you need is random sampling. Classifying a couple of hundred coordinates is sufficient, takes at most some hours (more likely <1h), and is required anyway to train your network.
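A back-of-the-envelope sketch of why a few hundred labeled points are enough for a percentage estimate; the counts here are made up for illustration.

```python
import math
import random

# Suppose we hand-label n randomly sampled coordinates and k turn out paved.
n, k = 300, 96                       # hypothetical labels
share = k / n                        # estimated paved share, here 0.32
# Normal-approximation 95% margin of error for a proportion:
margin = 1.96 * math.sqrt(share * (1 - share) / n)
print(f"{100 * share:.0f}% +/- {100 * margin:.1f} percentage points")   # ~32% +/- 5.3

# The coordinates themselves are just uniform draws over the city's bounding
# box (drop any that fall outside the actual boundary polygon).
def sample_points(n, lat_range, lon_range, seed=0):
    rng = random.Random(seed)
    return [(rng.uniform(*lat_range), rng.uniform(*lon_range)) for _ in range(n)]
```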


Keep in mind, even though this is the largest geospatial model, it's still a tiny model, with only 100M parameters.

I'd be excited to see what a more substantial, state-of-the-art model could do with geospatial data.

Say, a model based on something like ViT-22B, with 22 billion parameters: https://arxiv.org/abs/2302.05442


Personally, I'm excited to see ML researchers doing cool stuff with small models again!

With LLMs taking over the spotlight it's easy for people to forget that not everything needs billions or trillions of parameters. Stable Diffusion fits comfortably in my 8GB of VRAM and can generate amazing images. I'd love to see more research like this in smaller models that can be used on cheap consumer hardware.


We want much larger models, not because they're "cool," but because they exhibit capabilities that tiny models don't exhibit, including the ability to perform new tasks for which they were not trained, without requiring finetuning.


I question the assumption that fine-tuning should always be avoided.

If a model is going to be used many times for a specific use case, it is far cheaper and uses far less energy to fine tune a small model once and run it on cheap low-power hardware than it is to continuously run a huge, do-everything model on expensive, high-power hardware. Enormous models are great for exploration and for general purpose applications like ChatGPT, but I think that we will find over the next few years that smaller, purpose-built models will continue to dominate in applications like geospatial analysis.


We're talking about different things. You're talking about finetuning models to tasks known in advance. I'm talking about the ability to generalize to new tasks: https://arxiv.org/pdf/2206.07682. Please don't argue against a straw-man.


I'm not attacking a straw man, it seems that I just don't understand the distinction you're drawing.

As I understand it we're contrasting two opposite approaches to ML: fine tuning small models for specific applications versus training a single large model that can generalize to new tasks without preparing them ahead of time.

I'm arguing that in fine tuning is far more useful than people are currently giving it credit for, and that generalizing a single massive model to new tasks is overrated.

Can you clarify where you're seeing a straw man?


You're talking about tasks known in advance; I'm not.


And I'm saying that any task that wasn't known in advance immediately becomes a task known in advance once it's done once.

I'm not arguing that there is no place for large, general models—they're great for exploration—just that a smaller foundation model shouldn't be dismissed offhand based solely on parameter count.


I'm not disagreeing with you but just pointing out that it's kind of wild that "100M parameters" isn't considered SotA anymore. Amazing how far we've come in such a short time.


It's the largest *published* geospatial model. There are plenty of models being developed daily with more than 100M parameters.

I think it's just that there is a dearth of publishing on the advances in geospatial ML.


"What about a bigger model" is basically feature creep for ML.


The emphasis here is obviously on market-facing customer accounts. Basic NASA Landsat data is open to anyone, including big companies. So is this not simply an IBM Watson 2.0, including vague and bombastic impressiveness claims?

Floods and fires are bad news, but how does this product actually get used to make things any different?


Land use classification is one of the features advertised. While some may be capable of putting together a classification system on their own, not everyone can (especially reliably). Land use often directly affects things like flooding and fires. Such classification can be used in conjunction with other models to compare, say, current vs. proposed changes in attempts to reduce things like flooding and fire spread. You might be talking more about mass flooding and less about poor water overflow management and the like, in which case, sure, some things are largely inactionable for humans currently. But there are cases where we can have some tangible effect, at least at the scales where we create and manage infrastructure.


Unclear which LICENSE it is.


Apache 2.0

E.g. https://huggingface.co/ibm-nasa-geospatial/Prithvi-100M

(Though with all Facebook's shenanigans I understand the need to be sceptical and search for a license when a company claims "open source")


And the fine-tuned models seem to be CC-BY-4.0 (note there is no NC clause). IBM actually isn't lying, it seems, unlike most of the other "open source" models.


ML models are not copyrightable and the licenses people choose to try to apply are irrelevant.

Yes, I actually type these by hand. My carping is 100 percent artisanal.


What makes this a “foundation model”?


They are using the word to mean that it is not single-purpose or task-specific, and that it can be fine-tuned / adapted for new use cases without retraining the whole thing.

> With additional fine tuning, the base model can be redeployed for tasks like tracking deforestation, predicting crop yields, or detecting and monitoring greenhouse gasses. IBM and NASA researchers are also working with Clark University to adapt the model for applications such as time-series segmentation and similarity research.
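Mechanically, "fine-tune without retraining the whole thing" usually means freezing most of the pretrained encoder and training a small task-specific head on top. A generic PyTorch sketch of that pattern follows; the encoder, embedding size, and the assumption that it outputs pooled (B, embed_dim) features are placeholders, not the actual Prithvi API.

```python
import torch
import torch.nn as nn

def build_finetune_model(pretrained_encoder: nn.Module,
                         embed_dim: int, num_classes: int) -> nn.Module:
    """Freeze a pretrained encoder and attach a small trainable head.

    Assumes the encoder maps an input batch to pooled (B, embed_dim) features.
    """
    for p in pretrained_encoder.parameters():
        p.requires_grad = False             # keep the foundation weights fixed

    head = nn.Sequential(                   # tiny task-specific classifier
        nn.LayerNorm(embed_dim),
        nn.Linear(embed_dim, num_classes),
    )
    return nn.Sequential(pretrained_encoder, head)

# Only the head's parameters are trainable, so adapting to a new task is cheap:
# model = build_finetune_model(encoder, embed_dim=768, num_classes=2)
# optimizer = torch.optim.AdamW(
#     (p for p in model.parameters() if p.requires_grad), lr=1e-4)
```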


“Model” is a dangerously overloaded word in 2023, possibly worse than Object in 1996.

When I worked on ML inference I would tease the researchers with the question “what is a model?” in the hope they would say something that constrained it in any way, but no, a model can be anything at all and is whatever you want it to be.

As such this press release is perfect nonsense, which is kind of appropriate for a Watson offshoot. If there is something interesting here it doesn’t succeed in telling you what it is.


If you think of a model as a function approximator with an error that can only be characterized empirically, it's not a bad term at all to describe the class of algorithms ANNs belong to.

It's a real shame that the term is being abused so badly, because it's really appropriate in a lot of these cases. Or rather, it would be if people used it mindfully.


It's a better term than "algorithm" or "learner" or "AI".

It very literally is a model of the relationships within the training data.

Moreover, statisticians and other varieties of people doing data analysis have been using the term "model" to mean something along these lines for many decades already.

If anything, it's nice to see more people calling these things "models".


So what's the better word?


What it actually is: https://huggingface.co/ibm-nasa-geospatial/Prithvi-100M

“Prithvi is a first-of-its-kind temporal Vision transformer”

That is enormously more specific and interesting, and the word model is absent.


This entirely depends on context. Replacing “model” with “temporal Vision transformer” in the title would ensure only people “in the know” understand what this even is, which severely limits the reach this post obtains. The point in posting this article is to broadly inform people that this temporal Vision transformer exists. In order to do that, you have to use words the broader audience understands even if the terms aren’t perfectly accurate.

In contrast, if you’re publishing this in a machine learning journal, then sure—you should definitely call it a temporal Vision transformer.


Transformer is overly specific. Most people won't care about the actual architecture and would just treat it as a black box.


"Transformer" is a particular type of neural network architecture, and neural networks are a class of models. The term "model" in this sense has been standard in statistics and data analysis for a very long time.


So it's some coils of wire wound around a common high permeability core?

Or some kind of weird robot-car hybrid?

I guess since it's a "Vision" transformer, it's probably actually some kind of hallucinogenic plant.


dangerous how?



