Colony Graphs: Visualizing the Cloud

What does the cloud look like? As a customer, you may have thousands of instances running a variety of applications. As a cloud provider, you may have thousands of customers, running everything imaginable.
A process colony graph, or “ptree graph”, is a way to visualize your live application environment, based on basic process details. It illustrates the number, types, and activity of your applications, and can let you spot unusual or problem areas.
The following sections show ptree graphs at increasing scales, from a few processes to an entire cloud datacenter.
1. Processes
Examining just a few processes to start with:

Parent-child relationships are shown with arrows. The size of each process reflects recent CPU usage: bigger means busier. The color identifies the type of process: system processes are shown in light blue. These details can be adjusted; the process size could show memory footprint, for example.
2. Zone
This is what a typical cloud computing node (also known as a “zone” or “container”) looks like, in this case a web server:

The master process for the web server can be seen surrounded by its worker processes, all shown in purple. The worker processes are drawn larger, since they’re busier on CPU doing work to respond to web requests. In the middle is a gray oval representing the “init” process of the zone (the real customer zone name has been scrubbed here). The full set of system processes that make up the zone can also be seen, along with their relationships.
3. Server
Now scaling to show an entire physical server, which is running nine zones (plus one “global” zone):

Green is for language-related processes, such as php, python, java, and so on. Pink shows database processes, including MySQL, memcached, Riak, and so on. The green/purple zone is a Ruby/Apache server, and the top left zone has both mysqld and memcached. The largest pink process at the top is a busy MySQL server.
Previously I would look at lists of processes using ps or ptree to see the same data. But getting a quick sense of which processes exist, and which are busy, from pages of text output becomes unwieldy. Imagine examining the same data on a rack of servers: it can become hundreds of pages of text.
4. Rack
Visualizing all the zones in a rack:

More zone types appear and can be identified quickly. The chain of five green circles is a Perl server, with five busy perl processes. At this scale, the visualization is starting to look like a bacteria colony in a petri dish (which was the inspiration for the name: colony graphs).
5. Datacenter
Now for a datacenter, which consists of a fleet of racks. These represent an “availability zone”:

It’s the first time I’ve seen all the processes in an entire datacenter in one image. This includes over 300 servers and over 3500 zones. Gathering the process data to generate this was easier than it sounds: since this is an OS-virtualized cloud, I only needed to log in to the 300 physical servers, and not the 3500 individual zone instances, to capture all running processes.
This image can be generated automatically to look for anomalies and changes in the cloud. I’ve made many discoveries so far, with the graphs often beautiful and surprising.
Dead Zone
One of the discoveries can be seen in the middle of the graph above: six large zones that appear as concentric circles. Here is how they look zoomed in:

My jaw dropped when I first saw this. What’s happened is that this zone is running a shell program via cron (the system scheduler) that processes the output of getent. The getent process is stuck on an LDAP lookup that never completes, and so all its related processes are also stuck. Cron kept launching these mindlessly, until the zone hit its process limit.
Fortunately these were old test zones that weren’t hurting anyone.
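A runaway like this can also be looked for numerically in the same process data. The following is only a sketch, assuming a listing with the columns zone, ppid, pid, rss, pcpu, comm (the ps(1) fields described under Implementation). The function name `repeated_commands` and the threshold are hypothetical, and command names are assumed to contain no spaces.

```shell
#!/bin/sh
# Hypothetical sketch: list zone/command pairs that have many processes.
# Input (stdin): lines of "zone ppid pid rss pcpu comm".
# $1: an arbitrary threshold for "many" (defaults to 100).
repeated_commands() {
    awk -v limit="${1:-100}" '
    $2 == "PPID" || NF < 6 { next }     # skip ps(1) header and short lines
    { count[$1 " " $6]++ }              # key on zone name plus command name
    END {
        for (k in count)
            if (count[k] >= limit)
                print count[k], k       # "count zone comm"
    }' | sort -rn                       # largest counts first
}
```

For example, `repeated_commands 500 < ps-cloud.txt` (with `ps-cloud.txt` a hypothetical capture file) would print any zone running 500 or more copies of one command, such as a pile of stuck getent processes.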
Implementation
These ptree graphs are based on process ID, parent process ID, process name, and recent % CPU, collected using just ps(1):
# ps -eo zone,ppid,pid,rss,pcpu,comm
This includes a couple of extra fields: zone name (zone), for mapping any later discovered anomalies back to the origin zone, and resident set size (rss), for generating ptree graphs based on memory usage instead of CPU usage, when desired.
I’ve been using the neato program from graphviz to generate the images from this data. It reads a graph description in DOT format, and I wrote a trivial shell/awk program to convert the ps(1) output into DOT: ps2gv-p.sh, which has the companion file colors.awk.
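To give a feel for the conversion step, here is a minimal sketch of turning ps(1) output into DOT. It is not the actual ps2gv-p.sh, whose contents are not reproduced here: the function name `ps_to_dot`, the zone-prefixed node IDs, and the width formula are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: convert "zone ppid pid rss pcpu comm" lines into DOT.
# Not the author's ps2gv-p.sh; naming and sizing choices are illustrative.
ps_to_dot() {
    awk '
    BEGIN { print "digraph ptree {"; print "  node [style=filled];" }
    # Skip the ps(1) header line and anything with fewer than six fields.
    $2 == "PPID" || NF < 6 { next }
    {
        zone = $1; ppid = $2; pid = $3; pcpu = $5
        # comm may contain spaces: rejoin fields 6..NF, then keep the basename.
        comm = $6
        for (i = 7; i <= NF; i++) comm = comm " " $i
        sub(/.*\//, "", comm)
        gsub(/"/, "", comm)
        # Prefix IDs with the zone so equal PIDs in different zones stay distinct.
        id = zone "_" pid
        seen[id] = 1
        parent[id] = zone "_" ppid
        # Node width grows with recent %CPU (assumed formula).
        printf "  \"%s\" [label=\"%s\", width=%.2f];\n", id, comm, 0.3 + pcpu / 50
    }
    END {
        # Draw parent -> child arrows only when the parent was also captured.
        for (id in parent) {
            p = parent[id]
            if (p in seen) printf "  \"%s\" -> \"%s\";\n", p, id
        }
        print "}"
    }' "$@"
}
```

Used as `ps -eo zone,ppid,pid,rss,pcpu,comm | ps_to_dot > ptree.gv`, the result can be rendered with `neato -Tpng ptree.gv -o ptree.png`. Keying nodes on zone plus PID is one simple way to keep processes from different zones apart; the real script may do this differently.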
Here is an example of their use, to examine processes on my macbook, which has graphviz already installed:
$ ps -eo ppid,pid,rss,pcpu,comm | awk '{ print "-", $0 }' > ps-macbook.txt
$ ./ps2gv-p.sh ps-macbook.txt
$ neato -Tpng -Nfontsize=12 -Elen=1.9 ps-macbook.gv -o ps-macbook.png
The ps(1) on OS X doesn’t have the zone field, so I used awk to add a dummy field. If you are elsewhere, use the earlier ps(1) command.
The resulting image is:

If you look closely, you can find the bash shell that’s running the ps(1) command, like a reflection of the photographer.
Adjust -Nfontsize and -Elen (edge length) as desired. You can also customize the colors.awk file, which maps process names to colors. It uses gray if a mapping isn’t present. There is also a setting in ps2gv-p.sh, cpulimit, which adjusts the scaling of the node sizes.
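The two customizations just described (a name-to-color map with a gray fallback, and a cpulimit that controls node scaling) could work roughly as sketched below. This is an assumption about the general shape, not the contents of colors.awk or ps2gv-p.sh: the function name `node_attrs`, the specific name-to-color entries, and the width range are illustrative.

```shell
#!/bin/sh
# Hypothetical sketch of a name-to-color map and CPU-based node scaling.
# The real colors.awk and cpulimit logic may differ; all values are examples.
node_attrs() {
    # Input (stdin): "pcpu comm" per line. Output: "comm color width".
    # $1: cpulimit, the %CPU at which a node reaches its maximum width.
    awk -v cpulimit="${1:-100}" '
    BEGIN {
        color["init"] = color["cron"] = color["sshd"] = "lightblue"   # system
        color["httpd"] = color["nginx"] = "purple"                    # web servers
        color["php"] = color["python"] = color["java"] = "green"      # languages
        color["perl"] = color["ruby"] = "green"
        color["mysqld"] = color["memcached"] = "pink"                 # databases
        minw = 0.3; maxw = 2.0
    }
    {
        c = ($2 in color) ? color[$2] : "gray"   # gray when no mapping exists
        frac = $1 / cpulimit
        if (frac > 1) frac = 1                   # clamp so very busy nodes stay bounded
        printf "%s %s %.2f\n", $2, c, minw + frac * (maxw - minw)
    }'
}
```

Lowering the limit (for example `node_attrs 25`) makes moderately busy processes stand out sooner, at the cost of everything above 25% CPU being drawn at the same maximum size.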
To use this on the cloud, collect the ps(1) output from multiple servers and concatenate before processing with ps2gv-p.sh.
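One way the collection step might be scripted is sketched here. It assumes key-based ssh access to each physical server and a plain text file of hostnames, one per line. The function name `collect_ps` and the file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: gather ps(1) output from a list of servers.
# $1: file containing one hostname per line (blank lines are ignored).
# Output is a plain concatenation, header lines included, as ps(1) prints them.
collect_ps() {
    hostfile=$1
    while read -r host; do
        [ -n "$host" ] || continue
        # -n keeps ssh from swallowing the rest of the host list on stdin.
        ssh -n "$host" 'ps -eo zone,ppid,pid,rss,pcpu,comm' ||
            echo "warning: could not collect from $host" >&2
    done < "$hostfile"
}
```

Usage would be along the lines of `collect_ps hosts.txt > ps-cloud.txt` followed by `./ps2gv-p.sh ps-cloud.txt`. Each server contributes its own header line to the concatenated file, and PIDs can repeat across servers, so the zone column matters for telling processes apart. Whether ps2gv-p.sh needs any cleanup of the combined file is not covered here.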
Conclusion
Process colony graphs are a simple visualization of process parent/child relationships, and a useful way to study environments at scale, up to entire datacenters. They were created as an experimental visualization using ps(1) and graphviz, and have proven useful, finding issues that other observability tools have overlooked.
For more colony graphs, see the main page.