Monit - (Autohealer + Monitor)

At work, an important part of the job is production support: making sure the application is up and running at all times and that no alerts are firing. What if a tool could keep an eye on the application on our behalf and fix the problem whenever an alert does fire? In short, Monit takes over that headache and leaves us with a more peaceful life. Let's see how it works.
Monit is a small Open Source utility for managing and monitoring Unix systems. Monit conducts automatic maintenance and repair and can execute meaningful causal actions in error situations.
CentOS
Monit is available in the EPEL repository.
sudo yum update && sudo yum install epel-release
sudo yum update && sudo yum install monit
To enable and start the daemon on CentOS 7:
sudo systemctl enable monit && sudo systemctl start monit
To enable and start the daemon on CentOS 6:
sudo chkconfig monit on && sudo service monit start
Debian / Ubuntu
Debian and Ubuntu automatically start and enable Monit after installation.
sudo apt-get update && sudo apt-get upgrade
sudo apt-get install monit
Monitrc - Configuration file
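Below is the monitrc configuration. The key directives are:
daemon - the length of one check cycle in seconds; here Monit re-runs all checks every 120 seconds
include - pulls in additional configuration files; here, one file per monitored (and auto-healed) service under /etc/monit.d/
eventqueue - if no mail server is available, Monit can queue events on the local file system and retry until the mail server recovers; slots limits the size of the queue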
/etc/monitrc
set daemon 120
set logfile syslog
set statefile /var/lib/monit/state
set idfile /var/lib/monit/id
set eventqueue
    basedir /var/lib/monit/events
    slots 100
include /etc/monit.d/*
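One thing the configuration above does not show: in Monit 5.x the command-line actions (monit status, monit summary and so on) talk to the running daemon through its built-in HTTP interface, so they typically need a set httpd stanza as well. A minimal sketch follows, assuming port 2812 bound to localhost only. The file name httpd and the helper function are assumptions for illustration, not part of the original setup.

```shell
#!/bin/sh
# Hypothetical helper: write a localhost-only httpd stanza into a Monit
# include directory (assumed to be /etc/monit.d, as in the monitrc above).
write_httpd_conf() {
    conf_dir="${1:-/etc/monit.d}"
    cat > "$conf_dir/httpd" <<'EOF'
set httpd port 2812 and
    use address localhost
    allow localhost
EOF
}
```

Run as root, write_httpd_conf /etc/monit.d drops the stanza next to the other monitors; Monit picks it up on the next reload.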
Example monitors
Monitor root disk
# cat /etc/monit.d/root_disk
check filesystem root_disk with path /dev/sda2
    if space usage > 80% for 5 times within 15 cycles then alert
Explanation - Monit monitors the filesystem on /dev/sda2 and raises an alert if space usage is above 80% in at least 5 of the last 15 cycles
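To make the "5 times within 15 cycles" condition concrete, here is a minimal shell sketch of the same idea outside Monit. It assumes a list of usage percentages, one per cycle, oldest first. The function names and any sample numbers are hypothetical, and Monit's own bookkeeping may differ in detail.

```shell
#!/bin/sh
# Print the used-space percentage (without the % sign) for the filesystem
# that holds the given path.
disk_usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# should_alert LIMIT TIMES WINDOW SAMPLE...
# Succeeds (exit 0) when at least TIMES of the last WINDOW samples exceed LIMIT.
should_alert() {
    limit=$1; times=$2; window=$3
    shift 3
    # Drop the oldest samples until only the last WINDOW remain.
    while [ "$#" -gt "$window" ]; do shift; done
    hits=0
    for pct in "$@"; do
        if [ "$pct" -gt "$limit" ]; then hits=$((hits + 1)); fi
    done
    [ "$hits" -ge "$times" ]
}
```

For example, should_alert 80 5 15 followed by the readings collected so far returns success only once five of the most recent fifteen readings are above 80, which is why a single short spike does not trigger an alert.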
Monitor ssh
# cat /etc/monit.d/sshd
check process sshd with pidfile /var/run/sshd.pid
    start program = "/usr/sbin/service sshd start"
    stop program = "/usr/sbin/service sshd stop"
Explanation - Monit tracks the sshd process through the PID recorded in /var/run/sshd.pid and, if the process is not running, restarts it automatically using the start program
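Conceptually, a pidfile-based check boils down to reading the PID and asking the kernel whether that process still exists. Here is a rough sketch of that idea in shell. It is not Monit's actual code, and pid_alive is a hypothetical helper name.

```shell
#!/bin/sh
# pid_alive PIDFILE
# Succeeds when PIDFILE holds the PID of a process that currently exists.
# Run it as root: for a process owned by another user, kill -0 fails with
# a permission error even though the process is alive.
pid_alive() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1      # no readable pidfile: not running
    pid=$(cat "$pidfile")
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;       # empty or non-numeric content
    esac
    kill -0 "$pid" 2>/dev/null         # signal 0 only tests for existence
}
```

Something like pid_alive /var/run/sshd.pid || service sshd start is the manual equivalent of what the monitor above automates on every cycle.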
Monitor td-agent and send an alert to Slack
# cat /etc/monit.d/td-agent
check process td-agent with pidfile /var/run/td-agent/td-agent.pid
    start program = "/usr/sbin/service td-agent start"
    stop program = "/usr/sbin/service td-agent stop"
    if 2 restarts within 3 cycles then exec "/usr/local/bin/slack.sh"
Explanation - Monit monitors td-agent and restarts it whenever it is down; if the service has been restarted 2 times within 3 cycles, Monit runs slack.sh to send an alert to Slack
Slack Config
# cat /usr/local/bin/slack.sh
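#!/bin/bash
# The webhook URL is kept outside the script, in /opt/slack-url
URL=$(cat /opt/slack-url)
# Use MONIT_COLOR if it is set; otherwise green for recovery events, red for the rest
COLOR=${MONIT_COLOR:-$([[ $MONIT_EVENT == *"succeeded"* ]] && echo good || echo danger)}
# JSON-encode the message so that quotes and newlines are escaped
TEXT=$(echo -e "$MONIT_SERVICE $MONIT_EVENT: $MONIT_DESCRIPTION" | python3 -c "import json,sys;print(json.dumps(sys.stdin.read()))")
MONIT_HOST=$(hostname)
PAYLOAD="{
  \"attachments\": [
    {
      \"text\": $TEXT,
      \"color\": \"$COLOR\",
      \"mrkdwn_in\": [\"text\"],
      \"fields\": [
        { \"title\": \"Date\", \"value\": \"$MONIT_DATE\", \"short\": true },
        { \"title\": \"Host\", \"value\": \"$MONIT_HOST\", \"short\": true }
      ]
    }
  ]
}"
# Post the payload to the Slack incoming webhook
curl -s -X POST --data-urlencode "payload=$PAYLOAD" "$URL"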
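Monit hands the event details to the script through environment variables such as MONIT_SERVICE, MONIT_EVENT, MONIT_DESCRIPTION and MONIT_DATE; MONIT_COLOR looks like an optional override you would set yourself. The color rule is the least obvious line of the script, so here it is pulled out into a small standalone function. This is only a sketch for reading and testing the logic, and pick_color is a hypothetical name.

```shell
#!/bin/sh
# pick_color EVENT [OVERRIDE]
# Mirrors the COLOR line of slack.sh: an explicit override wins, otherwise
# events containing "succeeded" map to "good" and everything else to "danger".
pick_color() {
    event=$1
    override=${2:-}
    if [ -n "$override" ]; then
        echo "$override"
        return
    fi
    case "$event" in
        *succeeded*) echo good ;;
        *)           echo danger ;;
    esac
}
```

To try slack.sh itself by hand, you can export the same MONIT_* variables in your shell before running it, which is handy for checking the message text without waiting for a real failure.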
Slack WebHook
# cat /opt/slack-url
https://hooks.slack.com/services/*******/******/**********************
Check Status
# monit status
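Besides status, a few other subcommands are handy after editing files under /etc/monit.d/: monit -t checks the configuration syntax, monit reload makes the running daemon re-read it, and monit summary prints a condensed status. The wrapper below is a small sketch that chains the first two. The function name and the MONIT_BIN variable are assumptions added for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper: reload Monit only if the configuration parses cleanly.
# MONIT_BIN can be overridden, e.g. to point at a non-default install path.
validate_and_reload() {
    monit_bin=${MONIT_BIN:-monit}
    "$monit_bin" -t || return 1     # syntax check of the control file
    "$monit_bin" reload             # ask the running daemon to re-read it
}
```

Checking the syntax first means a typo in a new monitor file is caught before the daemon is asked to load it.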

Remote Hosts
Perhaps you are not in DevOps at all, but a tester who works with many client sites on different hosts. Wouldn’t it be nice to respond to site outages proactively, before a client even calls? You can configure Monit to check the status of all your client sites and alert you immediately if any of them is down.
check host server with address www.foo.com
    if failed port 3000 protocol http with timeout 60 seconds then alert
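With many client sites, writing one stanza per host by hand gets tedious. The sketch below generates stanzas of the same form from a plain list. The input format (one "name address port" triple per line) and both function names are assumptions made for this example; note that each service name has to be unique within Monit.

```shell
#!/bin/sh
# gen_host_check NAME ADDRESS [PORT] [TIMEOUT]
# Prints a "check host" stanza in the same form as the example above.
gen_host_check() {
    name=$1; address=$2; port=${3:-80}; timeout=${4:-60}
    printf 'check host %s with address %s\n' "$name" "$address"
    printf '    if failed port %s protocol http with timeout %s seconds then alert\n' \
        "$port" "$timeout"
}

# Reads "name address port" lines on stdin and prints one stanza per site.
# The port is optional and falls back to 80.
gen_all_checks() {
    while read -r name address port; do
        [ -n "$name" ] || continue      # skip blank lines
        gen_host_check "$name" "$address" "$port"
        echo
    done
}
```

For instance, gen_all_checks < sites.txt > /etc/monit.d/client_sites (where sites.txt is your own list) produces a single include file covering every site.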
I hope you liked this article and found it useful. Happy reading!