> you should be able to install PCP from binary packages made available by the PCP development team on: ftp.pcp.io
and that threw up a red flag in my brain. Then I noticed that techblog.netflix.com doesn't redirect to https (and indeed can't serve it at all; I use https://www.eff.org/HTTPS-EVERYWHERE and did not get the content over https).
The directions _I_ saw for building from source looked pretty innocuous, but you might see a different set of directions if you're being MITMed. Observe an appropriate amount of caution.
His concerns seem plain to me. Unauthenticated channels for software distribution or software installation instructions are bad.
The techblog isn't using SSL, and the git pull URL for PCP uses the git:// protocol, which is also unauthenticated, rather than the authenticated https transport (ssh is only an option when user accounts make sense).
Someone's at a conference and follows the link over public wifi. They get the same page, but with "here's how to get PCP: ftp evil.io or git clone git://git.evil.io/pcp". Even if the webpage were SSL-enabled so that an attacker couldn't rewrite the pcp.io links, an attacker or evil network operator could still MITM git.pcp.io or ftp.pcp.io. (FTP?!)
Being in Ubuntu's repo doesn't make it safe if Ubuntu's maintainers have no (semi-)trustworthy way of getting the code.
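One concrete mitigation, for what it's worth: git can be told to rewrite every unauthenticated git:// URL to https:// before it ever touches the network (the git.pcp.io path below is just the one from the page, shown for illustration):

```shell
# Rewrite any git:// URL to https:// at fetch/clone time.
git config --global url."https://".insteadOf git://

# Confirm the rewrite rule is in place (prints: git://)
git config --global --get url."https://".insteadOf

# Now even a copy-pasted unauthenticated clone line, e.g.
#   git clone git://git.pcp.io/pcp
# is actually fetched over https://git.pcp.io/pcp instead.
```

That doesn't help against a compromised webpage telling you to clone evil.io, but it does close the "same URL, downgraded transport" hole.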
It's enormously frustrating to me that the Windows platform is so far behind.
I've been researching for a rebuild of our ecommerce site with the idea of a modern microservice architecture. Team skillset and some other considerations dictate a .Net environment.
Netflix and other companies have created a really rich platform, between logging and monitoring technologies like this; message queueing; deployment; and so on. In the Windows environment there's precious little to compare to.
One bright spot is the news story also on HN today [1] about MS announcing Docker-related Hyper-V technology - but in the next version of Windows server.
It might be that to satisfy those .Net compatibility wishes, we just go to Mono and do everything else in Linux.
Want to try https://ruxit.com for monitoring your .NET ecommerce shop on Windows? It also monitors your host and real user interactions with your site.
So I haven't looked into all the metrics they are getting out of PCP, but from an initial glance it doesn't look like they are getting a lot more information than you get out of net-snmp.
In this one case the Windows platform isn't going to hold you back too much. On top of that, the .NET stack on Windows gives you access to WMI, which can give you really simple, really deep data from your application.
Don't buy into too much hype about what some of the big guys are doing on monitoring; most of their challenges seem to be around the scale they are monitoring at, not the depth of metrics.
For the initial release, you are right that you'll find most of these metrics in net-snmp. It's the selection that's more interesting, which is guided by the USE method: identifying utilization, saturation, and error metrics for key resources. We're missing some metrics (as do all monitoring tools), and we'll be extending Vector to include them (as can other people, as it's open source).
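For anyone unfamiliar with the USE method, the bookkeeping is simple: for each resource you track a utilization, a saturation, and an error metric, and triage in that priority order. A toy sketch (resource names, metric shapes, and the 70% threshold are all made up for illustration):

```python
# Toy USE-method triage: errors first, then saturation, then high utilization.
def use_report(samples):
    """samples: {resource: {"utilization": 0..1, "saturation": float, "errors": int}}
    Returns (resource, finding, value) tuples worth investigating first."""
    findings = []
    for res, m in samples.items():
        if m["errors"] > 0:
            findings.append((res, "errors", m["errors"]))
        elif m["saturation"] > 0:
            findings.append((res, "saturated", m["saturation"]))
        elif m["utilization"] > 0.7:   # arbitrary example threshold
            findings.append((res, "high utilization", m["utilization"]))
    return findings

print(use_report({
    "cpu":  {"utilization": 0.95, "saturation": 3.0, "errors": 0},
    "disk": {"utilization": 0.20, "saturation": 0.0, "errors": 12},
}))
```

The value of the method is less in the code than in the checklist: it forces you to ask whether you even *have* a saturation and an error metric for each resource, which is exactly the selection question described above.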
One challenge is that there are some metrics that the kernel doesn't make available, so we need to either use tracers to fetch them, patch the kernel, or use custom kernel modules. In recent weeks, I've been using (or trying to use): ftrace, perf_events, SystemTap, eBPF, and LTTng. I've also used others, but not in the last few weeks. I'm really looking forward to getting these advanced tools into Vector.
I suspect that you could extend PCP (the monitoring component of this dashboard) to support Windows' performance counters. The profiling capabilities are certainly there.
PCP can export many hundreds of Windows-specific counters (which it accesses via the PDH APIs). The Windows build does need some love though, as the OP mentioned - it's out of date and needs a regular contributor to bring it up to date with the latest PCP sources.
Then Vector should Just Work (tm) for Windows servers too.
> Netflix and other companies have created a really rich platform, between logging and monitoring technologies like this; message queueing; deployment; and so on. In the Windows environment there's precious little to compare to.
Have you looked at any of the following for the various gripes you mentioned? I came from a primarily Linux background until recently, but I know .net and Windows have good tooling that extends beyond my modest list below. Looking to Microsoft to provide every option is probably not the best way to go about it, not sure if that was mostly what you were looking at so far.
The web was built, and runs on, *nix/BSD. Windows is an outlier. Windows can't even get the slashes going the right way. There's a reason 80% of internet traffic does not use Windows.
I also don't think it's strictly true. For sure the underlying networking came from Unix. But as for the web itself, once we got to real dynamic content, it seems to me that Microsoft were the ones that got things moving.
While the Unix world was mired in the awful world of CGI, Microsoft gave us high-performance ISAPI, and then Cold Fusion (also on the Windows platform) and Microsoft's ASP made programming a little more sane. While the Unix world tried to deal with JSP (which IMHO wasn't a very good solution), the Microsoft platform seems to have been the innovators for several years, until Ruby on Rails and then node.js and stuff started coming out.
Today, the Apache server powers far more sites than any other, it's true. But IIS shares the 2nd-place spot [1]. When you say "There's a reason 80% of internet traffic does not use Windows", that's pretty much true for the servers, but far off the mark when counting clients. And the reason for that is that Microsoft's strength hasn't historically been radical advances, but in figuring out ways to take the bleeding edge tech that doesn't really work quite right yet, and packaging it into commodity software that may not be as sexy as envisioned by those with the original ideas, but actually useful to the average guy.
Your examples presume there was something better than CGI at the time and the other products are better than something else. For one, I wouldn't be caught dead using any of the products you mentioned.
You back your IIS claim with a ZDNet article from two years ago, but the reality is that IIS is number three, behind Nginx and Apache. Still, even a distant #2 would be nothing to brag about.
Now you're trying to claim clients are what powers the web, but that's not the topic. What an amateur uses does not define what the professional uses. And not wanting to be on the bleeding edge is no excuse for falling behind: Firefox and Chrome knocked IE off its perch years ago precisely by being on the bleeding edge.
Likely more than that, seeing as how the majority of the load balancers and many of the routers/switches out there run some form of BSD (NetScaler) or Linux (Arista), etc.
I just tried installing this on a test VPS. PCP went well, but whenever I tried to input a "hostname" in Vector's UI to get stats, it just kept telling me it couldn't connect and to check hostname, regardless of what I put in the box. PCP was available and of course ports open and whatnot. I'm not sure what the problem was, but that was a bit frustrating. It looks like a slick product otherwise, especially for a first version! Thanks for releasing projects like this! I'll be trying again in the future, for sure.
After using Stackdriver for the last 18 months, I would never go back to rolling my own monitoring infra if I could avoid it. I had nagios, cacti, munin, and graphite all running, and two ops guys spending pretty much 80% of their time managing it.
Stackdriver with pagerduty and I have 250,000 custom metrics being published and hundreds and hundreds of graphs on dozens of dashboards.
Although I am looking at SignalFx to get an even better version of this, I manage nearly 1,000 machines with a staff of only four.
We were concerned when Google bought them, and we had many meetings with them about it. I do not believe that will happen, but if it is still an issue for you, use SignalFx.
Am I wrong in saying this is a web interface that wraps PCP? So you can't really compare it to inspeqtor or collectd since those actually do the metrics collection.
What does this have that CloudWatch enhanced metrics doesn't? From the screenshots, the metrics look pretty similar. Not a slight at all against this project (it looks awesome), I'm just curious if your infrastructure is already AWS based what would cause you to choose a non-CloudWatch option.
CloudWatch charges you per instance if you want 1 minute metrics instead of the standard (free) 5 minute metrics. Any tool that collects the data for you gets you out of that $3.50/instance/month detailed monitoring charge [1].
It's one-second metrics, and it has been designed to allow us to add other metric sources and visualizations (using, say, ftrace, perf_events, SystemTap, etc), although those are not in the initial release.
It looks like it has some metrics that aren't present in cloudwatch without a lot of scripting, such as Memory utilization. I'm new to cloud watch though, so someone please correct me if that is wrong.
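That's right: EC2 has no built-in memory metric, so the usual workaround is publishing a custom one yourself. A rough sketch (the namespace and InstanceId dimension below are made up, and the `aws` call assumes credentials are already configured):

```shell
# Compute used-memory percentage from `free` (Linux).
MEM_USED_PCT=$(free | awk '/^Mem:/ {printf "%.1f", $3/$2*100}')

# Publish it as a custom CloudWatch metric (namespace/dimension illustrative).
aws cloudwatch put-metric-data \
  --namespace "Custom/System" \
  --metric-name MemoryUtilization \
  --unit Percent \
  --value "$MEM_USED_PCT" \
  --dimensions InstanceId=i-0123456789abcdef0
```

You'd run something like this from cron; note that custom metrics have their own per-metric pricing, so it doesn't dodge all of the costs mentioned upthread.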
Honestly? Just write your own SVG code. If you inspect the elements, the output data is super simple and easy to understand. NVD3 just wraps D3.js, which in turn wraps utilities that output relatively basic data. D3.js is a data-binding system that's way too goddamned complicated if all you want to do is make some simple charts - which is how it's used 99% of the time.
I've spent more time wrestling chart libraries into doing almost the same thing - just different enough to cause pain and suffering - than it's worth. Output your own path data and it's a million times easier.
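To make "output your own path data" concrete, here's a minimal hand-rolled line chart: scale the values into the viewport and emit one `<path>` element, no charting library involved (function name and styling are my own, just for illustration):

```python
# Minimal hand-rolled SVG line chart: scale points, emit a single <path>.
def line_chart_svg(values, width=300, height=100):
    """Map values into the viewport and return a complete SVG document string."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1                      # avoid divide-by-zero on flat data
    step = width / max(len(values) - 1, 1)     # horizontal distance per sample
    pts = [(i * step, height - (v - lo) / span * height)
           for i, v in enumerate(values)]
    d = "M " + " L ".join(f"{x:.1f},{y:.1f}" for x, y in pts)
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="{width}" height="{height}">'
            f'<path d="{d}" fill="none" stroke="steelblue"/></svg>')

print(line_chart_svg([3, 7, 2, 9, 4]))
```

The whole "library" is a scaling loop plus a `d` attribute string; axes, ticks, and tooltips are where real libraries earn their keep, but for a sparkline-style chart this is genuinely all there is.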
Wow, this looks great, thanks for releasing it! Nice to see that even 'simple' stuff like this, which can help people who aren't running at Netflix scale, still gets released.
PCP is much finer-grained than Zabbix in terms of the metrics it makes available (esp. from the Linux kernel); not sure on Zabbix costs but PCP is quite light on all resources (mem, cpu, net) and very robust.
I've worked on production systems where everything else was failing (hardware, kernel, applications) but PCP kept chugging along, recording and telling the sad story to anyone that would listen.
The goal is a bit different. Vector doesn't collect and persist metrics. We needed something with as little overhead as possible so it could be deployed to all our hosts and simplify the process of analyzing those metrics.
Not really. htop doesn't visualize historical trends from numerous sources at the same time. Vector also allows us to interface with other metric sources, beyond /proc.
Doesn't htop require you to go onto the box? Also, we haven't released our custom PCP modules yet, which allow more complex visualizations such as flame graph generation from perf event sampling.
Yes, exactly - htop is a curses application that shows instantaneous samples of a few system metrics, but it mostly lists processes - this is nothing like Vector at all...
I think the reason is they have a bunch of ephemeral hosts which they don't want to put in a central web interface or collect statistics on.
They are only interested in getting some insight as to what went wrong with this host so they can fix it in the future.
PCP is happy to collect & persist metrics; Vector just doesn't happen to render them. PCP offers other ways to interact with live or archive-saved data, including other webapps.
Neither does collectd, it just gathers and transmits the data to an endpoint you specify. It's also quite lightweight (once you address the memory leaks, at least).
See http://pcp.io/docs/installation.html#deb or direct link at ftp://ftp.pcp.io/projects/pcp/download/deb/ - but I only see packages for i386 arch, so you may have to compile from source if on amd64 or other architecture.