Hacker News | erichileman's comments

Why not run something like 8 x L40's for $4,750 a month from a bare metal provider like latitude.sh? This seems far more cost efficient and flexible.

I think you're reading that page wrong, but their pricing page is so confusing that it's giving me red flags already.

It says that would cost $6.51/hr and $4,752/yr; I think you pay both of those things: the first number is the hourly cost, and the second is the annual commitment. So it's $57,028/year if you're running 24x7 (8,760 hr x $6.51) + $4,752 = $61,780/year total.


I am sorry you are finding our pricing page confusing. We recently updated that page and had a glitch in the GPU prices. In any case, the correct price for an 8 x L40S is $4,752/mo when paid a year upfront.

So, when the page now says "$6.51/hr // $4,752/mo", that's really just presenting the same actual cost you'd have to pay, across two different time metrics? As in, you pay $6.51/hr, or you pay $4,752/mo, same thing, but not both?

I think you need to consider that your competitors (e.g. AWS) generally structure annual commitments as an upfront (or monthly) payment plus a reduced per-hour/minute rate for the reserved resource; that's the lens through which most people will read this page. If that is not how you structure annual commitments, make that very clear.

If my first paragraph is correct (and again, the page is still confusing; it's not obvious to me that this interpretation is correct): you should list one price and give a dropdown at the top to recompute that price across whatever timeframe the user wants ($/hr, $/day, $/month, etc.). That would also free up some in-line space for a chip that says something like "-15% discount!".
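The single-price-with-a-timeframe-dropdown idea boils down to one conversion from an hourly base rate. A minimal sketch, assuming the common 730 hours/month convention (8,760 h / 12), which is how the two figures on the page line up:

```python
# Sketch of the suggested price display: one base rate, converted on demand.
# Assumes the convention of 730 hours per month (8,760 h / 12).
HOURS = {"hour": 1, "day": 24, "month": 730, "year": 8760}

def price_per(hourly_rate, unit):
    """Convert an hourly price into the requested billing timeframe."""
    return round(hourly_rate * HOURS[unit], 2)

# Latitude's two figures are consistent under this convention:
# $6.51/hr * 730 h/mo = $4,752.30/mo -- the same price in two units.
print(price_per(6.51, "month"))  # 4752.3
print(price_per(6.51, "year"))   # 57027.6
```

Under that reading, "$6.51/hr // $4,752/mo" really is one price shown two ways, not two charges.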


I think a single L40 costs $1,320 a month on Latitude. The L40 is also an older GPU.

Latitude.sh provides the L40S, not the L40.

The on-demand cost is $9.30/hr; it's $4,752/mo when paid a year upfront.


It's Hopper vs. Grace Hopper and, AFAIK, doesn't require custom boards, but I haven't really looked into it too much, as both are far outside my personal price range.

> "Dedicated GPU clusters for accelerated computing"

- so you have to add the price of AMD/Intel bare metal servers.

- the price of "Networking" PER TB

- and the "Additional services pricing"

https://www.latitude.sh/pricing


That's all included, you don't have to add the bare metal servers, networking or anything else.

Thanks, the webpage is not clear ...

"Bandwidth Pricing is based on the country your server is located. Packages are billed monthly and sold in increments of 10 TB."
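Selling bandwidth "in increments of 10 TB" means usage is rounded up to the next package before billing. A minimal sketch of that rounding (the $/TB rate below is hypothetical; the real rate depends on the server's country):

```python
import math

def bandwidth_bill(usage_tb, rate_per_tb):
    """Round usage up to the next 10 TB package, then price it.
    rate_per_tb is a made-up placeholder -- the real rate varies by country."""
    billed_tb = math.ceil(usage_tb / 10) * 10
    return billed_tb * rate_per_tb

# 23 TB of traffic is billed as a 30 TB package.
print(bandwidth_bill(23, 5.0))  # -> 150.0 at a hypothetical $5/TB
```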


It looks like their on-demand price would cost $81,468 per year

So even at the reserved price for a year (365 x 24 x $6.51) you're nowhere near $4,750 per year; it's closer to $57k.
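The arithmetic behind both figures, spelled out (rates from this thread, assuming 24x7 usage over a 365-day year):

```python
HOURS_PER_YEAR = 365 * 24  # 8,760

reserved_hourly = 6.51    # $/hr when reserved for a year
on_demand_hourly = 9.30   # $/hr on demand

reserved_annual = round(reserved_hourly * HOURS_PER_YEAR, 2)    # 57027.6
on_demand_annual = round(on_demand_hourly * HOURS_PER_YEAR, 2)  # 81468.0

# The reserved rate saves roughly $24k/yr over on-demand at full utilization.
print(reserved_annual, on_demand_annual, on_demand_annual - reserved_annual)
```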


Unfortunately <not supported>

Performance counter stats for 'php56 index.php':

        394.588620      task-clock (msec)         #    0.983 CPUs utilized
               226      context-switches          #    0.573 K/sec
                 2      cpu-migrations            #    0.005 K/sec
            17,447      page-faults               #    0.044 M/sec
   <not supported>      cycles
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
   <not supported>      instructions
   <not supported>      branches
   <not supported>      branch-misses
   <not supported>      L1-dcache-loads
   <not supported>      L1-dcache-load-misses
   <not supported>      LLC-loads
   <not supported>      LLC-load-misses

       0.401580145 seconds time elapsed

Working on finding the event descriptors...
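Even with the hardware counters reading <not supported> (common inside VMs, where the hypervisor hides the PMU), the software counters above are still usable. A small sketch that cross-checks the "CPUs utilized" figure from the numbers in the output pasted above:

```python
# Cross-check perf stat's "CPUs utilized" figure from its own numbers:
# task-clock is CPU time consumed; "time elapsed" is wall-clock time.
task_clock_ms = 394.588620   # task-clock (msec), from the output above
elapsed_s = 0.401580145      # seconds time elapsed

cpus_utilized = (task_clock_ms / 1000) / elapsed_s
print(round(cpus_utilized, 3))  # -> 0.983, matching perf's own figure
```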


Memory seems likely.

On the E5 php 5.6 in top we see sys at 15%.

On the E3 php 5.6 in top we see sys at 7%.

On the E5 php 7 in top we see sys at 13%.

We are exploring memory perf more now.


We have not compared on bare metal.


We thought that as well. The E5 has 4 memory channels max bandwidth of 51.2 GB/s. The E3 has 2 memory channels max bandwidth of 34.1 GB/s.
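Those peak-bandwidth figures follow directly from the memory configuration. A quick check, assuming DDR3-1600 on the E5 v2 and DDR4-2133 on the E3 v5 (which is what those numbers imply, given a 64-bit channel):

```python
def peak_bw_gb_s(channels, mt_per_s, bus_bytes=8):
    """Peak memory bandwidth: channels * transfer rate * 8-byte bus width."""
    return channels * mt_per_s * bus_bytes / 1000  # MT/s * bytes -> GB/s

print(peak_bw_gb_s(4, 1600))  # E5 v2: 4 ch of DDR3-1600 -> 51.2 GB/s
print(peak_bw_gb_s(2, 2133))  # E3 v5: 2 ch of DDR4-2133 -> ~34.1 GB/s
```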

But we see a dramatic difference in single-core tests. Our virtual machines have 2 cores assigned, and there's also a dramatic difference there. I wouldn't think that 1-2 cores would saturate 2 memory channels or 34.1 GB/s of bandwidth. If we were testing all 8 cores on the E3 vs. an 8-core E5 virtual machine, maybe, but 1-2 cores?

The L3 cache is much larger on the E5 (20 MB SmartCache) than on the E3 (8 MB SmartCache). That seems the more likely suspect, but I don't know enough about how the CPU cache is used in relation to PHP to say for sure. Hopefully, someone else does :)
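One way to probe the cache-size hypothesis is a working-set sweep: random reads into a buffer that fits comfortably in cache vs. one larger than the E3's 8 MB L3. A rough sketch only; Python's interpreter overhead can mask the effect, and C would show it far more clearly:

```python
import random
import time

def time_random_reads(size_bytes, n_reads=200_000, seed=1):
    """Time n random single-byte reads into a buffer of the given size."""
    buf = bytearray(size_bytes)
    rng = random.Random(seed)
    idx = [rng.randrange(size_bytes) for _ in range(n_reads)]
    start = time.perf_counter()
    total = 0
    for i in idx:
        total += buf[i]
    return time.perf_counter() - start

small = time_random_reads(64 * 1024)         # fits in L2 on both CPUs
large = time_random_reads(32 * 1024 * 1024)  # exceeds the E3's 8 MB L3
print(small, large)
```

If the L3 theory holds, the large working set should be measurably slower per access on the E3 than on the E5.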

Ref: http://ark.intel.com/products/88176/Intel-Xeon-Processor-E3-... http://ark.intel.com/products/64590/Intel-Xeon-Processor-E5-...


You have talked about everything but what the parent was mentioning: actual cache, as in L1 and L2. Those vary sizably among the different price tiers, somewhat understandably, for reasons related to this.

On recent IBM Power chips, there's a so-called PowerCore option that turns off half the cores and lets the remaining cores double their L2. On some workloads that's a net win. I also suspect it's there for people paying a pricey per-core or per-socket fee, where a modest 15% per-core performance gain could be very rewarding in a way that scale-out/more-cores can't replicate, but that's in a different realm than anyone I know.


See the other comment above re: perf stat. Working on the event descriptors to see and confirm the L1/L2 cache hits/misses.


The binaries are from the Remi repo. We have a template we provision from, and I used the same template on each virtual machine.

PHP 5.4.45 (cli) (built: Sep 19 2016 15:31:07)
PHP 5.5.38 (cli) (built: Nov 9 2016 17:32:11)
PHP 5.6.28 (cli) (built: Nov 9 2016 07:04:38)

The binaries are the same on each virtual machine. Are there build optimizations for E3/V5 vs E5/V2 that could make such a difference?


>Are there build optimizations for E3/V5 vs E5/V2 that could make such a difference?

Potentially, yes.

Either way, you're not comparing apples to apples using packages from a third-party repo built for a different system.

Newer processors have different instruction sets (AVX is a big one). You'd want to make sure you're not only compiling on each platform, but also using a compiler new enough to support those instruction sets.


Correct. We are going to build using the correct -march flags: https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html


gcc -march=native -Q --help=target
march=silvermont

GCC's -march=native thinks this E3 is Silvermont, a low-power SoC.


We built 5.6 with -O2 -march=broadwell -mno-avx (had to remove AVX; probably a PECL extension issue).

There was about a 15% performance gain. Nothing that would explain the large difference between the E3 and E5.


