More

bfirsh · 2025-01-03T16:48:30 1735922910

Replicate (YC W20) | San Francisco, CA + Remote | https://replicate.com/

Replicate makes it easy to run AI in the cloud. You can run a big library of open source models with a few lines of code, or deploy your own models at scale.

We're an experienced team from Spotify, Docker, GitHub, Heroku, NVIDIA, and various other places. We're backed by a16z, Sequoia, NVIDIA, Andrej Karpathy, Dylan Field, Guillermo Rauch.

We're hiring:

- An infrastructure engineer

- A machine learning engineer who's an expert at image models

- An engineer who likes talking to people to look after our customers

... and more: https://replicate.com/about

bfirsh · 2024-10-14T21:11:26 1728940286

You're right -- this wasn't clear. Added another paragraph to explain what you had to do before.

bfirsh · 2024-10-07T13:10:02 1728306602

I can confirm these are dangerous. There are several of these in Berkeley and I got knocked off my bicycle on one of them for exactly the reason you describe.

I am from the UK and it makes me wonder why road design in the US is so bad. Just one minute of thinking about this as a lay person would reveal the problem with the design.

Is there some structural reason in the US that would cause it? Perhaps some lack of standards or approval process? Perhaps iteration speed is slower so they don’t get better? Some other incentives going on?

pclmulqdq · 2024-10-07T13:48:52 1728308932

My personal hypothesis on this is that the worst 5% of Americans is likely both dumber and more sociopathic than Europeans, and the behavior of the worst drivers is what creates a lot of traffic and road accidents. If that is the case, you will not have the same kind of design that works in a high-trust, more cohesive society.

bfirsh · 2024-08-29T22:30:47 1724970647

https://replicate.com/yorickvp/llava-13b :)

bfirsh · 2024-06-03T17:26:12 1717435572

Replicate (YC W20) | San Francisco, CA + Remote | https://replicate.com/

Replicate makes it easy to run AI in the cloud. You can run a big library of open source models with a few lines of code, or deploy your own models at scale.

We're an experienced team from Spotify, Docker, GitHub, Heroku, Apple, and various other places. We're backed by a16z, Sequoia, Andrej Karpathy, Dylan Field, Guillermo Rauch.

We're hiring:

- An infrastructure engineer

- An expert at deploying and optimizing language models

- An engineer who is good at humans to look after our customers

... and more: https://replicate.com/about#join-us

Email us: jobs@replicate.com

bfirsh · 2024-04-24T16:33:58 1713976438

If you want to have a conversation with it, here's a full chat app: https://arctic.streamlit.app/

Official blog post: https://www.snowflake.com/blog/arctic-open-efficient-foundat...

Weights: https://huggingface.co/Snowflake/snowflake-arctic-instruct

leblancfg · 2024-04-24T20:09:14 1713989354

Wow that is *so fast*, and from a little testing writes both rather decent prose and Python.

pixelesque · 2024-04-24T21:59:05 1713995945

I guess the chat app is under quite a bit of load?

I keep getting error traceback "responses" like this:

TypeError: This app has encountered an error. The original error message is redacted to prevent data leaks. Full error details have been recorded in the logs (if you're on Streamlit Cloud, click on 'Manage app' in the lower right of your app). Traceback:

File "/home/adminuser/venv/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 584, in _run_script exec(code, module.__dict__) File "/mount/src/snowflake-arctic-st-demo/streamlit_app.py", line 101, in <module> full_response = st.write_stream(response)

hnthrowaway9812 · 2024-04-24T23:22:23 1714000943

It claims to have a knowledge cut-off of 2021. Not sure if its hallucinating or its true.

But when I asked it about the best LLMs it suggested GPT-3, Bert and T5!

bfirsh · 2024-04-24T13:36:17 1713965777

It might not be obvious from the title, but this model is absolutely massive: 480B parameters. The largest open-source model to date, I believe.

You can try it out here: https://arctic.streamlit.app/

Weights are here: https://huggingface.co/Snowflake/snowflake-arctic-instruct

georgehill · 2024-04-24T13:46:18 1713966378

I can still change the title, but I'm not sure if that would be okay according to HN guidelines.

bfirsh · 2024-04-24T14:06:29 1713967589

Indeed – I was doing the editorialization in the comments. ;)

It's missing the name though (Arctic). That might be worth adding.

georgehill · 2024-04-24T15:11:03 1713971463

Done!

bfirsh · 2024-04-18T17:29:26 1713461366

We've got an API out here: https://replicate.com/blog/run-llama-3-with-an-api

You can also chat with it here: https://llama3.replicate.dev/

simonw · 2024-04-18T17:40:55 1713462055

The pet names example is my pelican test prompt!

bfirsh · 2024-02-17T22:43:47 1708209827

Founder of Replicate here. Our cold boots do suck (see my other comment), but you aren't charged for the boot time on Replicate, just the time that your `setup()` function runs.

Incentives are aligned for us to make it better. :)

moscicky · 2024-02-17T23:07:50 1708211270

Was not aware of that that. You should probably change the docs to better explain what you are charged for. Right now it says you do get charged for boot time:

“[…] Unlike public models, you’ll pay for boot and idle time in addition to the time it spends processing your requests.”

Apart from boot times, we actually find replicate to be an amazing platform, congrats

bfirsh · 2024-02-17T23:48:44 1708213724

Oops you're absolutely right. "Boot" ambiguously meant "setup" there. Fix deploying!

bfirsh · 2024-02-17T22:40:28 1708209628

Founder of Replicate here. Yeah, our cold boots suck.

Here's what we're doing:

- Fine-tuned models now boot fast: https://replicate.com/blog/fine-tune-cold-boots

- You can keep models switched on to avoid cold boots: https://replicate.com/docs/deployments

- We've optimized how weights are loaded into GPU memory for some of the models we maintain, and we're going to open this up to all custom models soon.

- We're going to be distributing images as individual files rather than as image layers, which makes pulling images much more efficient.

Although our cold boots do suck, the comparison in this blog post is comparing apples to oranges because Fly machines are much lower level than Replicate models. It is more like a warm boot.

It seems to be using a stopped Fly machine, which has already pulled the Docker image onto a node. When it starts, all it's doing is starting the Docker container. Creating the Fly machine or scaling it up would take much longer.

On Replicate, the models auto-scale on a cluster. The model could be running anywhere in our cluster so we have to pull the image to that node when it starts.

Something funny seems to be going on with the latency too. Our round-trip latency is about 200ms for a similar model. Would be curious to see the methodology, or maybe something was broken on our end.

But we do acknowledge the problem. It's going to get better soon.

reissbaker · 2024-02-17T23:53:32 1708214012

The warm boot numbers for Replicate are also a bit concerning, though. I know that you're contesting the 800ms latency, and saying that a similar model you tested is 200ms — but that's still 30% slower than Fly (155ms). Even if you fix the cold boot problem, it looks like you're still trailing Fly by quite a bit.

I feel like it would be worth a deep dive with your team on what's happening and maybe writing a blog post on what you found?

Also, I'll gently point out that Fly not having to pull Docker images on "cold" boot isn't something your customers think much about, since a stopped Fly machine doesn't accrue additional cost (other than a few cents a month for rootfs storage). If it's roughly the same price, and roughly the same level of effort, and ends up performing the same function for the customer (inference), whether or not it's doing Docker image pulls behind the scenes doesn't matter so much to most customers. Maybe it's worth adding a pricing tier to Replicate that charges a small amount for storage even for unused models, and results in much better cold boot time for those models since you can skip the Docker image pull — or in the future, model file download — and just attach a storage device?

(I know you're also selling the infinitely autoscaling cluster, but I think for a lot of people the tradeoff between finite-autoscaling vs extremely long cold boot times is not going to be in favor of the long cold boots — so paying a small fee for a block storage tier that can be attached quickly for autoscaling up to N instances would probably make a lot of sense, even if scaling to N+1 instances is slow again and/or requires clicking a button or running a CLI command.)

tptacek · 2024-02-18T00:45:02 1708217102

For what it's worth: creating and stopping/starting Fly Machines is the whole point of the API. If you're on-demand creating new Machines, rather than allocating AOT and then starting/stopping them JIT, you're holding it wrong. :)

(There's a lot I can say about why I think a benchmark like this is showing us unusually well! I'm not trying to argue that people should take this benchmark too seriously.)

brianjking · 2024-02-18T18:25:34 1708280734

How do you see the Replicate vs Modal.com overlap?