
Thank you! With respect to Datadog, the elephant in the room is cardinality and pricing. We charge $20/mo plus usage-based pricing for data storage and query execution time. DD is notorious for charging by metric cardinality, which can very easily blow up. Coinbase famously racked up a $50m bill from this!

With respect to the product, we believe there's a segment of the market that likes to know exactly what they are measuring. Datadog and other "telemetry" tools often install an agent and collect metrics automatically. While this makes for great onboarding UX, it does make it harder to know what data you have and how to query it. With telemetry.sh, we believe having that "manual shift" mode, where you log your own data and write your own queries, is useful in many scenarios.

I'd love to chat more if you're up for it! Shoot me an email at hi@telemetry.sh if you're open to the idea.


Generally a big fan of Zed. Super fast and quite innovative in their grep UI. My biggest current gripe is that Zed's filesystem watchers are either broken or misconfigured on Mac. If I do a `git reset --hard` via the terminal or the GitHub Desktop UI, Zed doesn't detect it and I'm forced to do a hard restart of the app to get back to a synced state.


Hey HN community! I've been working on something I think you'll find pretty neat: a Rate Limit API. It's a tool I built with the goal of tackling the challenges of API rate limiting, especially in distributed systems.

One of the core ideas behind this project was to make it dead simple to use, kind of like what Stripe did for payments. I wanted developers to be able to integrate rate limiting into their systems without the usual complexity. You'll find examples in JavaScript, Python, and Ruby to get you started in no time.

Let's talk about distributed counting – it's a tough nut to crack. In a distributed system, maintaining a consistent rate limit across multiple servers is tricky. There's a lot of coordination and data syncing involved, which can be a real headache. This API abstracts all that complexity away. It provides a centralized, consistent approach to rate limiting, so you don't have to worry about the underlying challenges.
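
To make the integration concrete, here's roughly what I picture a caller doing in Python (a sketch only; the endpoint, parameter names, and response field below are placeholders rather than the final API):

```
# Sketch of a caller integrating a hosted rate-limit check. The endpoint,
# parameter names, and "allowed" response field are placeholders.
import requests

API_URL = "https://api.ratelimitapi.com/v1/check"  # placeholder endpoint
API_KEY = "your-api-key"

def allow_request(user_id: str, limit: int = 100, duration_secs: int = 60) -> bool:
    """Ask the central service whether this caller is still under its limit."""
    resp = requests.get(
        API_URL,
        params={"key": user_id, "limit": limit, "duration": duration_secs},
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=2,
    )
    resp.raise_for_status()
    return resp.json().get("allowed", False)  # placeholder response field

# In a request handler: serve if allowed, otherwise return HTTP 429.
if not allow_request("user-123"):
    print("429 Too Many Requests")
```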

On the pricing front, it's free for up to 1 million requests per month. We've got more flexible plans for higher volumes, all aiming to keep your costs reasonable.

I'd really appreciate your thoughts on this, especially around the ease of use and the distributed counting solution. If you've ever felt the strain of managing API traffic, especially in a distributed environment, I'm keen to hear how this might fit into your workflow.

Can't wait to hear what you guys think!

JR


I like the simplicity of the landing page! For me personally, the tool would be more useful if I could adjust the duration and the rate limit count without redeploying my code, but I'd also sooner spin up my own Redis instance than use a 3rd party service - so I'm not sure if my feedback is useful :)

PS: You can simplify and make the Python example a bit more secure by using the params parameter[0] instead of building the query string manually:

``` requests.get(base_url, params=params) ```

[0] https://requests.readthedocs.io/en/latest/user/quickstart/#p...


Thank you! I just pushed an update to the website to use the params parameter.

I'd love to make this more ergonomic for you. I'm used to using configuration-propagation mechanisms where you can change configs without redeploying code (basically, the webserver subscribes to some central pub/sub config store). That paradigm works with this, since you could parameterize the duration using the config value. What would work better for you?
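
Concretely, I'm imagining something like this sketch of the pattern (the config fetch and the rate-limit call are stubbed placeholders, not a specific library):

```
# Sketch of the config-propagation pattern: a background watcher keeps a
# shared dict fresh, and every rate-limit call reads its values from it.
import threading
import time

config = {"limit": 100, "duration_secs": 3600}  # defaults until the store answers

def watch_config():
    while True:
        # config.update(fetch_from_config_store())  # hypothetical subscription/poll
        time.sleep(5)

threading.Thread(target=watch_config, daemon=True).start()

def call_rate_limit_api(key: str, limit: int, duration: int) -> bool:
    return True  # placeholder for the actual HTTP call

def allow_request(user_id: str) -> bool:
    # Reads the latest values on every request, so changing the limit in the
    # central store takes effect without a redeploy.
    return call_rate_limit_api(user_id, config["limit"], config["duration_secs"])
```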


That makes sense if you already have a pub/sub service running. But if you have a pub/sub service running, you probably also have a service that can count events so you won't need ratelimitapi.com.

In this case, I'm guessing the ability to configure the rate limits on your end would be useful. But again, I'm not your potential customer, so don't take this as customer feedback/feature request. I'm just thinking out loud :)


Touché! I guess we'll see if it's an issue in practice. My intuition is people won't change durations that frequently, e.g. OpenAI has had a limit of 40 messages per 3 hours and hasn't changed it for months.


Hi HN,

Just wrapped up the first version of my newest project: LLM Templates. It's all about making your daily grind with Large Language Models (like GPT-3.5 and GPT-4) a bit easier.

So what's the deal with LLM Templates? It lets you create and use quick templates for those repetitive tasks you do with AI, e.g.:

Deciphering code: “What does this code do? {{ code }}”

Email makeovers: “Make this email sound cooler {{ email }}”

Quick info grabs: “Need the email of {{ person }}”

Right now, it only supports GPT-3.5 and GPT-4, but I'm planning to add more models soon.
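
Under the hood the core idea is just placeholder substitution, roughly like this sketch (the model call assumes the older pre-1.0 openai Python client, purely for illustration):

```
# Sketch of the template idea: fill the {{ }} slots, then send the prompt.
import re
import openai

openai.api_key = "sk-..."  # your key

def render(template: str, **values: str) -> str:
    """Replace each {{ name }} placeholder with the supplied value."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: values[m.group(1)], template)

prompt = render("What does this code do? {{ code }}", code="print(sum(range(10)))")

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```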

I hope it could be a real time-saver for many of you. Give it a whirl and let me know what you think!

Cheers!

JR


Wow it works without sign up or BYO API token! Nice for me but that’ll cost you no doubt?


Yep that's right! I did implement IP rate limiting, so eventually power users will either have to sign up or stop using the service. I figured this way there's less friction for new users to see the value the product provides.


Hi HN,

c2p is a lightweight utility binary that lets you quickly pattern-match files in your repo and turn them into a single prompt that you can copy to your clipboard and paste into LLM tools, including ChatGPT. I built c2p because I found that ChatGPT becomes unwieldy once I want to use it for multi-file projects.

The way it works is that, under the hood, c2p turns your list of patterns/globs/etc. into a list of files. It then constructs a prompt that looks as follows:

File: {filepath}

{contents}

for every file that matches your inputted pattern.
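
If it helps to picture it, here's a rough Python equivalent of that loop (just a sketch of the idea; c2p itself is a standalone binary):

```
# Rough Python equivalent of the c2p loop: expand each pattern, then emit
# "File: <path>" followed by the file's contents, joined into one prompt.
import glob
import sys

def build_prompt(patterns: list[str]) -> str:
    parts = []
    for pattern in patterns:
        for path in sorted(glob.glob(pattern, recursive=True)):
            with open(path, "r", encoding="utf-8", errors="replace") as f:
                parts.append(f"File: {path}\n\n{f.read()}")
    return "\n\n".join(parts)

if __name__ == "__main__":
    # e.g. python c2p.py "src/*.rs" "migrations/*/*.sql" | pbcopy
    print(build_prompt(sys.argv[1:]))
```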

As an example I ran

`c2p src/*.rs migrations/*/*.sql | pbcopy`

In a small Rust webserver side-project repository, prefixed the prompt with some instructions directing ChatGPT to add a new feature, and bam, ChatGPT took in the context of the entire repository and gave me the exact changes I needed to make to ship my feature: https://chat.openai.com/share/3c674621-e526-45b7-bce8-10c38e...

I thought this was super mind-blowing so I wanted to share it with HN to see if it would be useful to others.


Can you elaborate more? Is this in web analytics, funnels or cohort retention? Are you doing basic page views or custom events? Feel free to DM or email hi@beamanalytics.io and we can get this sorted out for you!


Just basic page views by throwing the script in and that's it. Nothing fancy


> Does this mean my site won’t need a cookie alert banner?

Correct

> what happens if my hobby site gets an unexpected surge in traffic?

Nothing surprising. You don't even need to put in a CC to sign up. You'll only get billed if you explicitly upgrade. That being said, if you are consistently over for many months, we may cut off data ingestion and dashboard access!


Yes absolutely! You can use custom events in both funnels and cohort analysis. You can read more about it here https://beamanalytics.io/blog/custom-events-on-beam


TL;DR

> The way mmap handles memory makes it a bit tricky for the OS to report on a per-process level how that memory is being used. So it generally will just show it as "cache" use. That's why (AFAICT) there was some initial misunderstanding regarding the memory savings


I lost track since things move so quickly. Were there still memory savings, just not as drastic? Or no memory savings, just a speed-up?


It did somewhat reduce the total memory used. Now you can load the 30B model while only using ~20 GB of RAM, which is about the aggregate size of the 4-bit quantized weight files for that model. The real win is that you can kill the main inference binary and try another prompt, and it will start doing inference basically immediately instead of spending 10-15 seconds loading up all the weights into RAM each time.


It's neither memory savings nor a speed-up, really. The advantage of mmap is that you can treat a file on disk as a block of memory, so pages from it can be loaded (or unloaded) as necessary instead of one big upfront load into RAM. The benefit is that you can work with data that's bigger than your physical RAM, because the kernel can swap it back out to disk if needed. Another benefit could be that if only a small part of the data is needed to compute something, the OS will automatically load only those pages, but it's unclear to me whether this is the case with LLaMA.
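
A minimal illustration of the idea in generic Python (placeholder filename, nothing llama.cpp-specific):

```
# Map a large file and read a slice; only the pages that slice touches get
# faulted in. "weights.bin" is just a placeholder name.
import mmap

with open("weights.bin", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    chunk = mm[1024:2048]  # reading this range pages in only what it spans
    print(len(chunk))
    mm.close()
```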


It's still somewhat faster if you benchmark it. I assume the OS is doing good enough prefetching in the mmap case to mostly hide the loads from disk. So it's not just hiding the initial load of 30gb from disk.

Obviously, if you're swapping because you don't have enough memory to hold the model in RAM, the mmap version is going to be much faster, since you don't need to swap anything out to disk; you can just discard the page and re-read it from disk if you need it again later.


> So it's not just hiding the initial load of 30gb from disk.

The issue is typically that that initial load involves some sort of transformation - parsing, instantiating structures, etc. If you can arrange it so that the data is stored in the format you actually need it in memory, then you can skip that entire transformation phase.

I don’t know if that’s what’s been done with llama.cpp though.
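
As a generic illustration of that point: if weights sit on disk as a raw array in exactly their in-memory layout, you can map them directly and skip the parse step (numpy sketch with a placeholder filename and shape, not llama.cpp's actual format):

```
# Placeholder filename and shape; assumes the file is raw float32 laid out
# exactly as the array should appear in memory.
import numpy as np

weights = np.memmap("layer0.bin", dtype=np.float32, mode="r", shape=(4096, 4096))
row = weights[0]        # touching this faults in only the pages backing row 0
print(float(row.sum()))
```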


It is a speedup for me. When I run llama.cpp from the CLI, the first time it takes a very long time to load the model into memory. If the program exits or I stop it with Ctrl+C and start it again, it starts almost instantly.


That's down to caching. If you used your system to do something else for a while, you'd find those pages evicted and the performance back down to Earth. That's one of the things that makes mmap so useful, though. The system can take advantage of access patterns to dramatically improve performance.


Yes, makes sense. And it is great. Though honestly I'm not sure why it takes minutes to load a 23 GB model into RAM; it feels disproportionate compared to the smaller models.


Regarding your last point: No, you need all of the weights all of the time.

Edit: except embedding weights but those are not the problem.


They weren't asking about mmap, they were asking about the program itself.


It's basically paging to disk.

Not necessarily memory savings, but the improvement here is that it will run on computers with less RAM because it can page to disk.

Not necessarily the number reported (since you do need to load chunks into RAM), but still lower.


If that's the case then part of this may be the different interpretations of "memory".

One person's "paging the same memory requirement from disk to RAM" can be someone else's "requiring less memory/RAM".


Sort of, but without the duplication and initial wait to load. A traditional fat in-memory app would do this:

file (DISK) -> process active pages (RAM) -> paged-out virtual memory if saturated (elsewhere on DISK)

Using mmap typically goes something like:

file (DISK) -> process active pages (RAM) -> released and cached (RAM) -> uncached if evicted (back to the same place on DISK) OR back to process active pages from cache (RAM)

For the cost of fixing up some process page tables, the physical memory pages necessary can be brought back from the cache rather than read from disk. It's an orders-of-magnitude performance savings.


More like memory mis-reporting: when you mmap a file in, IIRC it counts against the page cache rather than memory usage (you can evict the page without causing write IO, so the memory isn't "used").


It being safe to evict is why it's correct to not report it as "memory usage".

It's part of the program's working set but measuring that is a completely different story.


Less than $2 for almost 10k users! https://twitter.com/TheBuilderJR/status/1635943930524217346

Right now it's "sponsored" by my main company https://www.beamanalytics.io/

In some sense it's a good deal, since the small percentage of people who visit Beam and pay currently covers the costs.


Wow, that's really impressive! Much lower than I expected for 10k users.

