Hacker News | erichileman's comments

Why not run something like 8 x L40's for $4,750 a month from a bare metal provider like latitude.sh? This seems far more cost efficient and flexible.

I think you're reading that page wrong, but their pricing page is so confusing that it's giving me red flags already.

It says that would cost $6.51/hr and $4,752/yr; I think you pay both of those things: the first number is the hourly cost, and the second is the annual commitment. So it's $57,028/year if you're running 24x7 (8,760 hr x $6.51) + $4,752 = $61,780/year total.


I am sorry you are finding our pricing page confusing. We recently updated that page and had a glitch in the GPU prices. In any case, the correct price for an 8 x L40S is $4,752/mo when paid a year upfront.

So, when the page now says "$6.51/hr // $4,752/mo", that's really just presenting the same actual cost you'd have to pay, across two different time metrics? As in, you pay $6.51/hr, or you pay $4,752/mo, same thing, but not both?

I think you need to consider that your competitors (e.g. AWS) generally structure annual commitments as an upfront (or monthly) payment plus a reduced per-hour/minute rate for the reserved resource; that's the lens through which most people will read this page. If that is not how you structure annual commitments, make that very clear.

If my first paragraph is correct (and again, the page is still confusing; it's not obvious to me that this interpretation is correct): you should list one price and give a dropdown at the top to recompute that price across whatever timeframe the user wants ($/hr, $/day, $/month, etc.). That would also free up some in-line space for a chip that says something like "-15% discount!".
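The single-price-with-a-timeframe-dropdown idea boils down to one conversion from an hourly base rate. A minimal sketch, assuming the common 730 hours/month convention (8,760 h / 12), which is how the two figures on the page line up:

```python
# Sketch of the suggested price display: one base rate, converted on demand.
# Assumes the convention of 730 hours per month (8,760 h / 12).
HOURS = {"hour": 1, "day": 24, "month": 730, "year": 8760}

def price_per(hourly_rate, unit):
    """Convert an hourly price into the requested billing timeframe."""
    return round(hourly_rate * HOURS[unit], 2)

# Latitude's two figures are consistent under this convention:
# $6.51/hr * 730 h/mo = $4,752.30/mo -- the same price in two units.
print(price_per(6.51, "month"))  # 4752.3
print(price_per(6.51, "year"))   # 57027.6
```

Under that reading, "$6.51/hr // $4,752/mo" really is one price shown two ways, not two charges.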


I think a single L40 costs $1,320 a month on Latitude. The L40 is also an older GPU.

Latitude.sh provides the L40S, not the L40.

The on-demand cost is $9.30/hr; it's $4,752/mo when paid a year upfront.


It's Hopper vs. Grace Hopper and, AFAIK, doesn't require custom boards, but I haven't really looked into it too much, as both are far outside my personal price range.

> "Dedicated GPU clusters for accelerated computing"

- so you have to add the price of AMD/Intel bare metal servers.

- the price of "Networking" PER TB

- and the "Additional services pricing"

https://www.latitude.sh/pricing


That's all included, you don't have to add the bare metal servers, networking or anything else.

Thanks, the webpage is not clear ...

"Bandwidth Pricing is based on the country your server is located. Packages are billed monthly and sold in increments of 10 TB."
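Selling bandwidth "in increments of 10 TB" means usage is rounded up to the next package before billing. A minimal sketch of that rounding (the $/TB rate below is hypothetical; the real rate depends on the server's country):

```python
import math

def bandwidth_bill(usage_tb, rate_per_tb):
    """Round usage up to the next 10 TB package, then price it.
    rate_per_tb is a made-up placeholder -- the real rate varies by country."""
    billed_tb = math.ceil(usage_tb / 10) * 10
    return billed_tb * rate_per_tb

# 23 TB of traffic is billed as a 30 TB package.
print(bandwidth_bill(23, 5.0))  # -> 150.0 at a hypothetical $5/TB
```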


It looks like their on-demand price would cost $81,468 per year

So even at the reserved price for a year (365 x 24 x $6.51) you're nowhere near $4,750 per year; it's closer to $57k.
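The arithmetic behind both figures, spelled out (rates from this thread, assuming 24x7 usage over a 365-day year):

```python
HOURS_PER_YEAR = 365 * 24  # 8,760

reserved_hourly = 6.51    # $/hr when reserved for a year
on_demand_hourly = 9.30   # $/hr on demand

reserved_annual = round(reserved_hourly * HOURS_PER_YEAR, 2)    # 57027.6
on_demand_annual = round(on_demand_hourly * HOURS_PER_YEAR, 2)  # 81468.0

# The reserved rate saves roughly $24k/yr over on-demand at full utilization.
print(reserved_annual, on_demand_annual, on_demand_annual - reserved_annual)
```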


Unfortunately <not supported>

Performance counter stats for 'php56 index.php':

        394.588620      task-clock (msec)         #    0.983 CPUs utilized
               226      context-switches          #    0.573 K/sec
                 2      cpu-migrations            #    0.005 K/sec
            17,447      page-faults               #    0.044 M/sec
   <not supported>      cycles
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
   <not supported>      instructions
   <not supported>      branches
   <not supported>      branch-misses
   <not supported>      L1-dcache-loads
   <not supported>      L1-dcache-load-misses
   <not supported>      LLC-loads
   <not supported>      LLC-load-misses

       0.401580145 seconds time elapsed

Working on finding the event descriptors...
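Even with the hardware counters reading <not supported> (common inside VMs, where the hypervisor hides the PMU), the software counters above are still usable. A small sketch that cross-checks the "CPUs utilized" figure from the numbers in the output pasted above:

```python
# Cross-check perf stat's "CPUs utilized" figure from its own numbers:
# task-clock is CPU time consumed; "time elapsed" is wall-clock time.
task_clock_ms = 394.588620   # task-clock (msec), from the output above
elapsed_s = 0.401580145      # seconds time elapsed

cpus_utilized = (task_clock_ms / 1000) / elapsed_s
print(round(cpus_utilized, 3))  # -> 0.983, matching perf's own figure
```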


Memory seems likely.

On the E5 php 5.6 in top we see sys at 15%.

On the E3 php 5.6 in top we see sys at 7%.

On the E5 php 7 in top we see sys at 13%.

We are exploring memory perf more now.


We have not compared on bare metal.


We thought that as well. The E5 has 4 memory channels max bandwidth of 51.2 GB/s. The E3 has 2 memory channels max bandwidth of 34.1 GB/s.
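Those peak-bandwidth figures follow directly from the memory configuration. A quick check, assuming DDR3-1600 on the E5 v2 and DDR4-2133 on the E3 v5 (which is what those numbers imply, given a 64-bit channel):

```python
def peak_bw_gb_s(channels, mt_per_s, bus_bytes=8):
    """Peak memory bandwidth: channels * transfer rate * 8-byte bus width."""
    return channels * mt_per_s * bus_bytes / 1000  # MT/s * bytes -> GB/s

print(peak_bw_gb_s(4, 1600))  # E5 v2: 4 ch of DDR3-1600 -> 51.2 GB/s
print(peak_bw_gb_s(2, 2133))  # E3 v5: 2 ch of DDR4-2133 -> ~34.1 GB/s
```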

But we see a dramatic difference in single-core tests. Our virtual machines have 2 cores assigned, and there's also a dramatic difference there. I wouldn't think that 1-2 cores would saturate 2 memory channels or 34.1 GB/s of bandwidth. If we were testing all 8 cores on the E3 vs. an 8-core E5 virtual machine, maybe, but 1-2 cores?

The L3 cache is much larger on the E5 (20 MB SmartCache) than on the E3 (8 MB SmartCache). That seems the more likely suspect, but I don't know enough about how the CPU cache is used in relation to PHP to say for sure. Hopefully, someone else does :)
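One way to probe the cache-size hypothesis is a working-set sweep: random reads into a buffer that fits comfortably in cache vs. one larger than the E3's 8 MB L3. A rough sketch only; Python's interpreter overhead can mask the effect, and C would show it far more clearly:

```python
import random
import time

def time_random_reads(size_bytes, n_reads=200_000, seed=1):
    """Time n random single-byte reads into a buffer of the given size."""
    buf = bytearray(size_bytes)
    rng = random.Random(seed)
    idx = [rng.randrange(size_bytes) for _ in range(n_reads)]
    start = time.perf_counter()
    total = 0
    for i in idx:
        total += buf[i]
    return time.perf_counter() - start

small = time_random_reads(64 * 1024)         # fits in L2 on both CPUs
large = time_random_reads(32 * 1024 * 1024)  # exceeds the E3's 8 MB L3
print(small, large)
```

If the L3 theory holds, the large working set should be measurably slower per access on the E3 than on the E5.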

Ref: http://ark.intel.com/products/88176/Intel-Xeon-Processor-E3-... http://ark.intel.com/products/64590/Intel-Xeon-Processor-E5-...


You have talked about everything but what the parent was mentioning: actual cache, as in L1 and L2. Those vary sizably among the different price tiers, somewhat understandably, for reasons related to this.

On recent IBM Power chips, there's a so-called PowerCore option that turns off half the cores and lets the remaining cores double their L2. On some workloads that's a net win. I also suspect it's there for people paying a pricey per-core or per-socket fee, where a modest 15% per-core performance gain could be very rewarding in a way that scale-out/more-cores can't replicate, but that's in a different realm than anyone I know.


See the other comment above re: perf stat. Working on the event descriptors to see and confirm the L1/L2 cache hits/misses.


The binaries are from the Remi repo. We have a template we provision from, and I used the same template on each virtual machine.

PHP 5.4.45 (cli) (built: Sep 19 2016 15:31:07)
PHP 5.5.38 (cli) (built: Nov 9 2016 17:32:11)
PHP 5.6.28 (cli) (built: Nov 9 2016 07:04:38)

The binaries are the same on each virtual machine. Are there build optimizations for E3/V5 vs E5/V2 that could make such a difference?


>Are there build optimizations for E3/V5 vs E5/V2 that could make such a difference?

Potentially, yes.

Either way, you're not comparing apples to apples using packages from a third-party repo built for a different system.

Newer processors have different instruction sets (AVX is a big one). You'd want to make sure you're not only compiling on each platform, but also using a compiler new enough to support those instruction sets.


Correct. We are going to build using the correct -march flags: https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html


gcc -march=native -Q --help=target
march=silvermont

GCC's -march=native thinks this E3 is Silvermont, a low-power SoC.


We built 5.6 with -O2 -march=broadwell -mno-avx (had to remove AVX; probably a PECL extension issue).

There was about a 15% performance gain. Nothing that would explain the large difference between the E3 and E5.


