What is Ganglia:
- It is a highly scalable monitoring system for high performance computing.
- It can monitor a single system, a cluster of systems or a grid of clusters.
- It uses XML for data representation.
- It uses RRDtool for data storage and visualization.
- Its implementation is robust, has been ported to an extensive set of operating systems and processor architectures, and is currently in use on thousands of clusters around the world.
- It has been used to link clusters across university campuses and around the world, and it can scale to handle clusters with 2000 nodes.
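To make the XML point concrete, here is a minimal Python sketch that pulls per-host metric values out of a gmond XML dump. The sample document is trimmed and its values are made up. The element and attribute names (GANGLIA_XML, CLUSTER, HOST, METRIC, NAME, VAL) follow the format gmond emits, but a real dump also carries a DTD and many more metrics. The function name load_per_host is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical gmond dump: structure follows gmond's XML format,
# but every value below is invented for illustration.
SAMPLE_XML = """<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond">
<CLUSTER NAME="Test" LOCALTIME="1700000000" OWNER="clusterOwner" LATLONG="unspecified" URL="unspecified">
<HOST NAME="node2" IP="192.168.1.2" REPORTED="1700000000" TN="5" TMAX="20" DMAX="0" LOCATION="unspecified" GMOND_STARTED="1699990000">
<METRIC NAME="load_one" VAL="0.42" TYPE="float" UNITS=" " TN="12" TMAX="70" DMAX="0" SLOPE="both">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="load"/>
</EXTRA_DATA>
</METRIC>
<METRIC NAME="mem_free" VAL="1024000" TYPE="float" UNITS="KB" TN="12" TMAX="180" DMAX="0" SLOPE="both"/>
</HOST>
</CLUSTER>
</GANGLIA_XML>
"""


def load_per_host(xml_text):
    """Return {host_name: {metric_name: value_string}} from a gmond XML dump."""
    root = ET.fromstring(xml_text)
    result = {}
    for host in root.iter("HOST"):
        # VAL is kept as a string; TYPE says how it should be interpreted.
        result[host.get("NAME")] = {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
    return result


print(load_per_host(SAMPLE_XML))
# prints: {'node2': {'load_one': '0.42', 'mem_free': '1024000'}}
```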
Put simply, “Ganglia is a real-time cluster monitoring tool that collects information from each computer in the cluster and provides an interactive way to view the performance of the computers and of the cluster as a whole.”
Like other monitoring tools, Ganglia only provides a way to view the performance of the cluster, not to control it.
Architecture of Ganglia:
The Ganglia system consists of two daemons, gmond and gmetad, a PHP-based web frontend, and two other utilities, gmetric and gstat.
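gmetric is the utility for publishing custom metrics into the cluster, so they show up next to the built-in ones. A small Python sketch that builds such a command line. gmetric_command and the room_temp metric are hypothetical; the --name, --value, --type and --units long options and the list of types are gmetric's as far as I know, so check them against gmetric --help on your installation.

```python
import shlex

# Value types gmetric accepts for --type (to my knowledge).
GMETRIC_TYPES = {"string", "int8", "uint8", "int16", "uint16",
                 "int32", "uint32", "float", "double"}


def gmetric_command(name, value, metric_type, units=""):
    """Build the argv for a gmetric call that publishes one custom metric (hypothetical helper)."""
    if metric_type not in GMETRIC_TYPES:
        raise ValueError(f"unsupported gmetric type: {metric_type}")
    return ["gmetric", f"--name={name}", f"--value={value}",
            f"--type={metric_type}", f"--units={units}"]


cmd = gmetric_command("room_temp", 23, "int16", "Celsius")
print(shlex.join(cmd))
# prints: gmetric --name=room_temp --value=23 --type=int16 --units=Celsius
```

The printed command can be run on any node where ganglia-monitor is installed; building the argv as a list keeps it safe to pass to subprocess.run without a shell.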
What is Gmond:
Gmond runs on every node of the cluster and gathers information such as CPU, memory, network, disk and swap usage.
What is Gmetad:
Gmetad runs on the head node. It gathers data from all other nodes and stores it in round robin databases. It can poll multiple clusters and aggregate their metrics. The web frontend also uses it to generate the UI.
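A round robin database keeps a fixed number of slots per metric, so its files never grow: once the archive is full, each new sample overwrites the oldest one. The Python sketch below illustrates only that idea. RoundRobinArchive is a hypothetical class; real RRDtool files also consolidate samples (for example into averages) across several archives of different resolutions.

```python
class RoundRobinArchive:
    """Fixed-size sample store: when full, each new sample replaces the oldest."""

    def __init__(self, slots):
        self.slots = [None] * slots
        self.next = 0   # index the next sample will be written to
        self.count = 0  # number of slots currently holding data

    def add(self, value):
        self.slots[self.next] = value
        self.next = (self.next + 1) % len(self.slots)  # wrap around
        self.count = min(self.count + 1, len(self.slots))

    def values(self):
        """Return the stored samples from oldest to newest."""
        if self.count < len(self.slots):
            return self.slots[:self.count]
        return self.slots[self.next:] + self.slots[:self.next]


rra = RoundRobinArchive(slots=4)
for load in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]:
    rra.add(load)
print(rra.values())  # prints: [0.3, 0.4, 0.5, 0.6]
```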
What is PHP Web Frontend:
The Ganglia web front-end
provides a view of the gathered information via real-time dynamic web pages.
Most importantly, it displays Ganglia data in a meaningful way for system
administrators and computer users. It should be installed on the same machine
where gmetad is installed.
Ganglia Installation:
Installing Ganglia on the master node:
sudo apt-get install ganglia-monitor rrdtool gmetad ganglia-webfrontend
The above command installs gmond, gmetad and the Ganglia web UI on the node. The ganglia-webfrontend package also installs the required Apache server and PHP modules. To serve Ganglia through Apache, copy /etc/ganglia-webfrontend/apache.conf to /etc/apache2/sites-enabled/:
sudo cp /etc/ganglia-webfrontend/apache.conf /etc/apache2/sites-enabled/ganglia.conf
The /etc/ganglia-webfrontend/apache.conf file contains a simple alias that maps /ganglia to /usr/share/ganglia-webfrontend.
Installing Ganglia on the other nodes:
sudo apt-get install ganglia-monitor
The above command installs the Ganglia monitoring daemon, gmond.
Gmond configuration on the master node:
Ganglia supports two types of configuration: multicast and unicast. Here I configure Ganglia in unicast mode for an example cluster named “Test”, with 192.168.1.1 as the master node and 192.168.1.2 and 192.168.1.3 as slave nodes. The configuration below goes in the gmond configuration file (/etc/ganglia/gmond.conf on Debian/Ubuntu) on the master node.
globals {
  daemonize = yes
  setuid = yes
  user = ganglia
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  allow_extra_data = yes
  host_dmax = 0 /* secs */
  cleanup_threshold = 300 /* secs */
  gexec = no
  send_metadata_interval = 30
}
cluster {
  name = "Test"
  owner = "clusterOwner"
  latlong = "unspecified"
  url = "unspecified"
}
udp_send_channel {
  host = 192.168.1.1
  port = 8649
  ttl = 1
}
udp_recv_channel {
  port = 8649
}
tcp_accept_channel {
  port = 8649
}
Gmond configuration on other nodes:
globals {
  daemonize = yes
  setuid = yes
  user = ganglia
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  allow_extra_data = yes
  host_dmax = 0 /* secs */
  cleanup_threshold = 300 /* secs */
  gexec = no
  send_metadata_interval = 30
}
cluster {
  name = "Test"
  owner = "clusterOwner"
  latlong = "unspecified"
  url = "unspecified"
}
udp_send_channel {
  # mcast_join = 239.2.11.71
  host = 192.168.1.1
  port = 8649
  ttl = 1
}
tcp_accept_channel {
  port = 8649
}
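Apart from a commented-out mcast_join line, the two configurations differ only in their channel sections: every node sends to the master over UDP, and only the master has a udp_recv_channel to collect those packets. A Python sketch that renders those sections for either role, which can help keep many slave configurations consistent. gmond_channels is a hypothetical helper; the globals and cluster sections are left out because they are identical on all nodes.

```python
def gmond_channels(master_ip, is_master, port=8649):
    """Render the channel sections of a unicast gmond.conf (hypothetical helper)."""
    parts = [
        # Every node, master included, sends its metrics to the master.
        "udp_send_channel {\n"
        f"  host = {master_ip}\n"
        f"  port = {port}\n"
        "  ttl = 1\n"
        "}",
    ]
    if is_master:
        # Only the master listens for the other nodes' UDP packets.
        parts.append(f"udp_recv_channel {{\n  port = {port}\n}}")
    # All nodes answer XML requests over TCP.
    parts.append(f"tcp_accept_channel {{\n  port = {port}\n}}")
    return "\n".join(parts) + "\n"


print(gmond_channels("192.168.1.1", is_master=True))
print(gmond_channels("192.168.1.1", is_master=False))
```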
The gmond configuration defines the following properties.
Globals section:
- daemonize: A boolean attribute. When true, gmond will daemonize; when false, gmond will run in the foreground.
- setuid: A boolean attribute. When true, gmond will set its effective UID to the UID of the user specified by the user attribute. When false, gmond will not change its effective user.
- debug_level: An integer. When set to zero (0), gmond runs normally. A debug_level greater than zero makes gmond run in the foreground and output debugging information; the higher the debug_level, the more verbose the output.
- mute: A boolean attribute. When true, gmond will not send data, regardless of any other configuration directives.
- deaf: A boolean attribute. When true, gmond will not receive data, regardless of any other configuration directives.
- allow_extra_data: A boolean attribute. When false, gmond will not send out the EXTRA_ELEMENT and EXTRA_DATA parts of the XML. This can be useful if you use your own frontend for the metric data and would like to save some bandwidth.
- host_dmax: An integer, in seconds. When set to zero (0), gmond will never delete a host from its list, even when a remote host has stopped reporting. If host_dmax is set to a positive number, gmond will flush a host after it has not heard from it for host_dmax seconds.
- cleanup_threshold: The minimum amount of time, in seconds, before gmond will clean up any hosts or metrics where tn > dmax, i.e. expired data (tn is the number of seconds since the host or metric last reported).
- gexec: A boolean attribute that specifies whether gmond will announce the host's availability to run gexec jobs. Note: this requires that gexecd is running on the host and that the proper keys have been installed.
- send_metadata_interval: The interval at which gmond sends or resends the metadata packets that describe each enabled metric. By default it is set to 0, which means gmond only sends the metadata packets at startup and upon request from other gmond nodes running remotely. When a new machine running gmond is added to a cluster, it needs to announce itself and inform all other nodes of the metrics it currently supports. In multicast mode this is not a problem, because any node can request the metadata of all other nodes in the cluster. In unicast mode, however, a resend interval must be established, which is why the example configuration sets it to 30. The value is the minimum number of seconds between resends.
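The zero-versus-positive rules for host_dmax and send_metadata_interval are easy to misread, so here they are as two small Python predicates. This is a simplified model of the descriptions above, not gmond's code: the function names are hypothetical, and in gmond an expired host is actually removed during a cleanup pass, which runs no sooner than cleanup_threshold seconds after the previous one.

```python
def host_expired(seconds_since_heard, host_dmax):
    """host_dmax = 0 means never flush; otherwise flush after host_dmax seconds of silence."""
    return host_dmax > 0 and seconds_since_heard > host_dmax


def metadata_resend_due(seconds_since_last_send, send_metadata_interval):
    """0 means metadata is sent only at startup and on request, never on a timer."""
    return send_metadata_interval > 0 and seconds_since_last_send >= send_metadata_interval


# Values from the example configuration: host_dmax = 0, send_metadata_interval = 30.
print(host_expired(seconds_since_heard=86400, host_dmax=0))  # prints: False (never flushed)
print(metadata_resend_due(30, send_metadata_interval=30))    # prints: True (time to resend)
```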
Cluster section:
- name: The name attribute specifies the name of the cluster of machines.
- owner: The owner attribute specifies the administrators of the cluster. The name/owner pair should be unique among all clusters in the world.
- latlong: The latitude and longitude GPS coordinates of this cluster on Earth, specified to about 1 mile accuracy with two decimal places per axis, in decimal degrees.
- url: A URL for more information on the cluster, intended to give the purpose, owner, administration and account details for this cluster.
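Two decimal places per axis is about 0.01 degree, roughly a kilometre of latitude. A Python sketch that formats decimal coordinates that way. The "N32.88 W117.23" hemisphere-prefix style is an assumption based on examples I have seen for this attribute (gmond treats latlong as a free-form string), and format_latlong is a hypothetical helper.

```python
def format_latlong(lat, lon):
    """Format decimal degrees as e.g. 'N32.88 W117.23' (two decimals per axis)."""
    ns = "N" if lat >= 0 else "S"   # hemisphere letter replaces the sign
    ew = "E" if lon >= 0 else "W"
    return f"{ns}{abs(lat):.2f} {ew}{abs(lon):.2f}"


print(format_latlong(32.8801, -117.2340))  # prints: N32.88 W117.23
```

The result would go in the cluster section as latlong = "N32.88 W117.23".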
udp_send_channel:
You can define as many udp_send_channel sections as you like, within the limits of memory and file descriptors. If gmond is configured as mute, these sections are ignored.
The udp_send_channel section has a total of five attributes: mcast_join, mcast_if, host, port and ttl.
- mcast_join and mcast_if: The mcast_join and mcast_if attributes are optional. When they are specified, gmond will create the UDP socket, join the mcast_join multicast group and send data out of the interface specified by mcast_if.
- ttl: The time-to-live (TTL) value of the sent packets. With ttl = 1, as in the example configuration, packets are not forwarded beyond the local network segment.
- host and port: If only a host and port are specified, gmond will send unicast UDP messages to that host. You can specify multiple unicast hosts for redundancy, as gmond sends its UDP messages to all UDP channels.
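A simplified Python sketch of the redundancy behaviour: the same datagram is written to every unicast channel, with the TTL set on the socket. send_to_all is a hypothetical helper and the payload is arbitrary bytes; gmond's real packets are XDR-encoded and are not reproduced here.

```python
import socket


def send_to_all(channels, payload, ttl=1):
    """Send the same datagram to every (host, port) unicast channel."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Limit how many router hops the packets may cross (ttl = 1: local segment only).
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
        for host, port in channels:
            sock.sendto(payload, (host, port))
```

For example, send_to_all([("192.168.1.1", 8649), ("192.168.1.4", 8649)], b"...") would reach the master plus a second collector; 192.168.1.4 is a hypothetical backup host, not part of the example cluster.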
udp_recv_channel:
You can specify as many udp_recv_channel sections as you like, within the limits of memory and file descriptors. If gmond is configured as deaf, these sections are ignored.
The udp_recv_channel section has the following attributes: mcast_join, bind, port, mcast_if and family.
- mcast_join and mcast_if: Use mcast_join and mcast_if only if you want this UDP channel to receive multicast packets from the multicast group mcast_join on the interface mcast_if. If you do not specify multicast attributes, gmond will simply create a UDP server on the specified port.
- port: The port on which gmond creates the UDP server.
- bind: Use the bind attribute to bind to a particular local address.
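Without multicast attributes, a udp_recv_channel amounts to a UDP server on a port, optionally bound to one local address. A Python sketch of just that part; open_udp_server and receive_one are hypothetical names, and the 1472-byte buffer simply mirrors max_udp_msg_len from the example configuration.

```python
import socket


def open_udp_server(port, bind=""):
    """Create a UDP server socket on `port`.

    bind="" listens on all local IPv4 addresses (no bind attribute);
    a specific address restricts the server to that interface.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind, port))
    return sock


def receive_one(sock, bufsize=1472):
    """Block until one datagram arrives; return (payload, sender_address)."""
    return sock.recvfrom(bufsize)
```

open_udp_server(8649) corresponds to the master's udp_recv_channel { port = 8649 }; open_udp_server(8649, "192.168.1.1") to the same section with bind = 192.168.1.1.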
tcp_accept_channel:
You can specify as many tcp_accept_channel sections as you like, within the limits of memory and file descriptors. If gmond is configured to be mute, these sections are ignored.
- bind: The bind address is optional and lets you specify which local address gmond will bind to for this channel.
- port: An integer that specifies the port on which gmond answers requests for data.
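gmond answers each connection on its tcp_accept_channel by writing its full XML dump and then closing the connection; gmetad relies on this when it polls a data source. A minimal Python client under that assumption. fetch_gmond_xml is a hypothetical name, and decoding as ISO-8859-1 assumes the encoding gmond declares in its XML header.

```python
import socket


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Read gmond's XML dump from a tcp_accept_channel until gmond closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # empty read: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("iso-8859-1")
```

For example, print(fetch_gmond_xml("192.168.1.1")[:300]) shows the start of the master's dump; from a shell, telnet 192.168.1.1 8649 or nc 192.168.1.1 8649 prints the same XML.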
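Gmetad Configuration:
In the gmetad configuration file on the head node (/etc/ganglia/gmetad.conf on Debian/Ubuntu), define the data source:
data_source "Test" 15 192.168.1.1:8649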
The gmetad configuration defines the data source with the cluster name, the polling interval, and the IP address and port of the gmond to poll. In this data_source line, “Test” is the cluster name, 15 is the gmetad polling interval in seconds, and “192.168.1.1:8649” is the IP address and port of gmond on the head node.
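gmetad.conf also allows several host:port entries on one data_source line (tried as fallbacks), and an address without a port defaults to gmond's 8649. A Python sketch that splits such a line into its parts. parse_data_source is a hypothetical helper, and treating a missing interval as 15 seconds is an assumption about gmetad's default.

```python
import shlex

DEFAULT_GMOND_PORT = 8649       # used when an address has no ":port"
ASSUMED_DEFAULT_INTERVAL = 15   # assumption: gmetad's default polling interval (seconds)


def parse_data_source(line):
    """Split a data_source line into (name, interval_seconds, [(host, port), ...]).

    Handles host names and IPv4 addresses only (an IPv6 address contains colons).
    """
    tokens = shlex.split(line)  # honours the quotes around the cluster name
    if len(tokens) < 3 or tokens[0] != "data_source":
        raise ValueError(f"not a data_source line: {line!r}")
    name, rest = tokens[1], tokens[2:]
    interval = ASSUMED_DEFAULT_INTERVAL
    if rest[0].isdigit():  # the optional polling interval comes first
        interval, rest = int(rest[0]), rest[1:]
    if not rest:
        raise ValueError(f"data_source has no addresses: {line!r}")
    sources = []
    for address in rest:
        host, _, port = address.partition(":")
        sources.append((host, int(port) if port else DEFAULT_GMOND_PORT))
    return name, interval, sources


print(parse_data_source('data_source "Test" 15 192.168.1.1:8649'))
# prints: ('Test', 15, [('192.168.1.1', 8649)])
```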
Starting Ganglia:
Make sure no old gmetad or gmond processes are running on the machines.
Starting gmetad: Run the command below on the head node of the cluster.
sudo service gmetad start
Starting gmond: Run the command below on all nodes of the cluster.
sudo service ganglia-monitor start
Starting Apache Server:
Stop any old running instance of the apache2 server, then run the command below to start it.
sudo service apache2 start
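Once everything is started, a quick way to confirm the daemons are listening is to try a TCP connection to each expected port. A Python sketch; port_open, report and CHECKS are hypothetical names, and the port list is an assumption for this example setup: gmond's tcp_accept_channel on every node, gmetad's XML port on the head node (8651 is its default) and Apache on port 80.

```python
import socket

# Assumed ports for the example cluster; adjust to your own configuration.
CHECKS = [
    ("192.168.1.1", 8649), ("192.168.1.1", 8651), ("192.168.1.1", 80),
    ("192.168.1.2", 8649), ("192.168.1.3", 8649),
]


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


def report(checks):
    """Print one OK/FAIL line per (host, port) pair."""
    for host, port in checks:
        status = "OK" if port_open(host, port) else "FAIL"
        print(f"{host}:{port} {status}")
```

Calling report(CHECKS) from any machine that can reach the nodes prints one line per port; a FAIL on 8649 usually means gmond is not running on that node.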