Services Dying? Keep Them Alive With a Perl Script

StayinAlive - Keeping Unix Services Alive

Have you ever felt frustrated with your Unix services that keep shutting down in production unexpectedly without notifying you?  Would you like to have a script that might help restart those processes and notify you of the problem?

This little Perl script might help.  Or it might show you what happens when a longtime Unix system admin learns just enough Perl to be dangerous.  Even so, this script works well for me.

I tested it on my VPS by killing the processes associated with each service and waiting a few seconds.  My websites went down and came back up, restarted by cron, within a minute.  My email server, spam checkers, and various other services went down and came back up as expected.

However, if you blow away sshd and complain to me that it did not restart successfully, I promise to laugh at you heartily and ask why you did that.  So, think before you blow something away, and above all, don’t do anything willy-nilly on a business sensitive or enterprise sensitive production system.  The grim reaper loves people who do that.

Scripts like this are mostly for individuals and companies in start-up mode who do not have deep pockets to buy expensive enterprise-level commercial software or to pay for expensive consultants and an entire IT staff to keep services running.

This script can be a pain because it is a kluge, a quick and simple solution.  In a commercial application, you would have separate configuration files: for instance, a set of example files for different variants of Unix or Linux showing where things are usually located, so you could copy the one for your operating system into place.  Then you would have another configuration file to indicate which of these services you wanted to check and whether you wanted them to stay on or off.

And these files would be simple.  Instead of having a Perl-like structure, you might just have a tab, space, or comma delimited file where comments start with a pound sign.  For instance, the file you might modify most might only contain the service name and “on” or “off”.

# Restarter Configuration - Identify which services to keep alive
#
# Indicate which services to turn on by setting them on or off
#
sshd     on
fail2ban off
exim     off
sendmail off
postfix  on
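
Here is a minimal sketch of how a file like that might be read in Perl.  The function name parse_service_switches and the in-memory sample text are assumptions made up for illustration; the script later in this post does not read any configuration file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn "name on|off" lines into a hash of name => 1 or 0.
# Blank lines and anything after a pound sign are ignored.
sub parse_service_switches {
    my ($text) = @_;
    my %enabled;
    for my $line (split /\n/, $text) {
        $line =~ s/#.*//;                  # strip comments
        $line =~ s/^\s+|\s+$//g;           # trim leading and trailing whitespace
        next unless length $line;          # skip blank lines
        my ($name, $state) = split /[\s,]+/, $line;   # tab, space, or comma delimited
        next unless defined $state && $state =~ /^(on|off)$/i;
        $enabled{$name} = lc($state) eq 'on' ? 1 : 0;
    }
    return %enabled;
}

my $sample = <<'END';
# Restarter Configuration - Identify which services to keep alive
sshd     on
fail2ban off
postfix  on
END

my %enabled = parse_service_switches($sample);
for my $name (sort keys %enabled) {
    printf "%-10s %s\n", $name, $enabled{$name} ? 'watch' : 'ignore';
}
```

Lines that do not end in "on" or "off" are silently skipped here; a real tool would probably want to complain about them instead.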

In a separate file which you almost never touch would be specifications that almost never change.  For example,

# Restarter Configuration - Description of Possible Services to run
#
# For each service, give the following separated by colons:
#   filename found in /etc/init.d
#   path to pid file (the file containing the process id of the master process)
#   the comm field for that master process when it is running.
#       use '> ps -elf' to look for the process id
#       then use '> ps -p 1234 -o comm=' for example to see the comm field.
#
sshd:/var/run/sshd.pid:sshd
fail2ban:/var/run/fail2ban/fail2ban.pid:fail2ban-server
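
As a rough sketch, a colon-separated file like this could be loaded into the same kind of hash the script below keeps in its "services" table.  The function name parse_service_specs is hypothetical, and the hash keys (starter, pidfile, return) are borrowed from that table.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read "starter:pidfile:comm" lines into a hash of hashes
# keyed by the init script name.
sub parse_service_specs {
    my ($text) = @_;
    my %specs;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*(#|$)/;      # skip comment lines and blank lines
        $line =~ s/^\s+|\s+$//g;           # trim leading and trailing whitespace
        # Limit the split to 3 fields so a comm value that itself
        # contains a colon (e.g. "MailScanner: ma") stays in one piece.
        my ($starter, $pidfile, $comm) = split /:/, $line, 3;
        next unless defined $comm;         # ignore malformed lines
        $specs{$starter} = {
            starter => $starter,
            pidfile => $pidfile,
            return  => $comm,
        };
    }
    return %specs;
}

my $sample = <<'END';
# Restarter Configuration - Description of Possible Services to run
fail2ban:/var/run/fail2ban/fail2ban.pid:fail2ban-server
END

my %specs = parse_service_specs($sample);
for my $name (sort keys %specs) {
    print "${name}: pidfile=$specs{$name}{pidfile} comm=$specs{$name}{return}\n";
}
```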

The example script in this post is not that sophisticated, which means configuring it is more of a pain.  But it keeps things quick and dirty for a Unix systems administrator.

There are pros and cons to bringing a startup system into production with kluges like this.  A kluge can be absolute poison to enterprise-level production servers.  If you are managing a system where lives or people’s life savings depend on a service staying available and performing perfectly, you may want some rather stringent change control or change management processes, with strict rules requiring that any change go through a series of tests and approvals.  You may still use a kluge like this program early on, but only after *other* people designated by management have tested it.  And then only those permitted to put it into production would be allowed to do so, with the appropriate approvals.  There will likely be processes identified for applying a change, reversing that change, and recovering anything lost in the process.

With DevOps practices becoming popular these days, you may find that instead of applying changes to a whole system, you end up applying them to a Docker image.  An existing Docker image will be downloaded.  A container will be built upon it.  Changes will be applied, such as adding a script like this or a more sophisticated program.  The setup will be unit tested and saved back into a new image.  Or some deployment scripts will be set up to generate the changes, changes that could perhaps be applied to other Docker images.  Then the resulting scripts or the resulting new image might be uploaded to a Docker hub or repository to be pulled into a testing environment.  When that integration testing succeeds, the image may be marked OK for production and installed in parallel to the old production system.  In many cases, this will only result in a Docker container that is used as a micro-service and not a full-blown server containing everything.  You may have one for a database, say MongoDB, MySQL, Oracle, or PostgreSQL.  You may have several for nginx or Apache, and you may have another nginx used as a load balancer.  And you probably would not want your email services on those systems.

So, you may not have a need to have a more sophisticated script watching over a wide variety of services.  You may prefer to have them split into separate containers.

Daemon Process Restarter

It is not a cure-all.  It won’t diagnose, find, and fix the root cause of the problem.  You will still need to follow up and find out whether it’s a resource shortfall, a configuration problem, or a bug in a program.

But, it might help you

  • keep your email server receiving email
  • keep your marketing websites visible to your user community, and
  • notice and fix problems before your users do.

I call these scripts, “Little Helpful Scripts” because they are simple.  They don’t warrant a tremendous amount of architectural design or decomposition into modules where configuration is broken out into separate files.

These scripts should be easy to read, understand, and maintain.  However, if you are using them in production, I would advise keeping them in a user-contributed area such as /usr/local/bin and away from any area that might be replaced when upgrading the operating system.

Secondly, whenever the script is modified, it is always best to make a copy of it first so you can always have your last working version of the script available.  It is also best to make your modifications away from production and of course, if your company has strict change control policies, always follow them.  The purpose of this script is to keep services running and keep users happy.  It is not to get system administrators fired or to make the user community and management angry.

Another important thing to remember is that this script can hide a problem so it never gets fixed, and that can be a serious problem.  Monitor your log files (usually found in /var/log on Linux systems).  Or better yet, use Nagios or Splunk to help you monitor your systems and their log files.

Planned Service Downtime

Have you ever shut down a service to work on it only to find it suddenly come back up?  One nice thing about this script is that this most likely will not happen.

Why?  When you start and stop services using the “service” command or by using any of the scripts found in /etc/init.d or /etc/rc2.d and such, those scripts almost always create a special “pid” file under /var/run.  When your service crashes on its own, the “pid” file is still out there, but when you shut down the service properly, the “pid” file is almost always removed.  So, if the “pid” file is there, this program will restart the service.  If the “pid” file is gone, it will not restart the service.
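
That rule can be boiled down to a few lines of Perl.  This is only a sketch of the decision, not the script itself: the function name restart_decision and its three return strings are made up for illustration, and the "comm" value is assumed to be whatever “ps -p PID -o comm=” printed for the pid found in the file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: decide what to do for one service.
#   $pidfile_exists - true if the pid file is present
#   $actual_comm    - what "ps -p PID -o comm=" printed ('' if no such process)
#   $expected_comm  - the name the healthy master process should report
sub restart_decision {
    my ($pidfile_exists, $actual_comm, $expected_comm) = @_;
    return 'leave alone' unless $pidfile_exists;             # stopped on purpose
    return 'ok'          if $actual_comm eq $expected_comm;  # still running
    return 'restart';                                        # crashed, or pid now belongs to something else
}

print restart_decision(0, '',      'nginx'), "\n";  # leave alone
print restart_decision(1, 'nginx', 'nginx'), "\n";  # ok
print restart_decision(1, '',      'nginx'), "\n";  # restart
print restart_decision(1, 'bash',  'nginx'), "\n";  # restart
```

The last case is why the check compares the process name instead of just asking whether the pid exists: after a crash, the old pid number can be handed out to some unrelated process.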

Service Restarter Installation

  • Copy and install this file in a directory of your choosing, say
    /usr/local/src/restarter/restarter.pl
  • Customize it by editing the “services” table to add or remove services.  These should be services with startup scripts in /etc/init.d.
  • Look in /var/run to see where the “pid” file is for each process you want to watch.
  • Run “crontab -e” as root to edit your crontab file and enter something like this:
* * * * * (cd /usr/local/src/restarter; /usr/bin/perl restarter.pl) > /dev/null 2>&1

Test It with your Services!

  • First, test it on a machine where you are at liberty to play with it a little.
  • Test the script exactly as you would be using it in production.
  • Test each of the processes out.
  • Look at the process id in the appropriate /var/run files.
  • Kill that process.
  • Try killing that process and all the subprocesses.
  • Then try killing the process and only that process.
  • Run “ps” to see if the process is dead.  If so, wait a minute and see if the process automatically comes back to life.
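
If you want to rehearse the checklist above without touching a real service, here is a small sketch that fakes one.  It starts a throwaway “sleep” process, writes its pid to a temporary pid file, kills it, and shows what the “comm” check sees before and after.  The helper name comm_for_pid and the use of “sleep” as a stand-in service are assumptions for illustration, and it needs a “ps” that understands “-p” and “-o comm=”.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Return the comm field for a pid, or '' if no such process exists.
sub comm_for_pid {
    my ($pid) = @_;
    return '' unless defined $pid && $pid =~ /^\d+$/;
    my $comm = `ps -p $pid -o comm=`;
    $comm =~ s/^\s+|\s+$//g;
    return $comm;
}

# Start a throwaway "service" and record its pid in a temporary pid file.
my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    exec('sleep', '60') or die "exec failed: $!";
}

my ($fh, $pidfile) = tempfile(UNLINK => 1);
print {$fh} "$pid\n";
close $fh;

sleep 1;    # give the child a moment to become "sleep"
print "before kill: '", comm_for_pid($pid), "'\n";

# Simulate a crash: kill the process but leave the pid file behind.
kill 'KILL', $pid;
waitpid($pid, 0);
print "after kill:  '", comm_for_pid($pid), "'\n";
print "pid file is still there, so the restarter would restart this one\n"
    if -e $pidfile;
```

Because the process is killed rather than stopped through its init script, the pid file survives, which is exactly the situation the restarter looks for.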

Then go through whatever change control processes are required to implement it in production, and run whatever tests your change control process specifies to make sure you have a robust installation.

Enjoy!

The Services Restarter Program

#!/usr/bin/perl -w
###
 # Linux Daemon Process Restarter
 # @version 1.0
 # @author Daniel J. Dick http://danieljdick.com
 #
 # INSTALLATION
 #
 # Customize this program by editing the "services" table below
 # The startup scripts for your services should be found in /etc/init.d.
 # The "pid" files for your services should be found in /var/run.
 #
 # Add a line in crontab by running "crontab -e" as root, for example,
 #
 # * * * * * /usr/local/bin/restarter.pl > /dev/null 2>&1
 #
#/

use Sys::Syslog qw(:standard :macros);
our $start_command;

$service_dir = '/etc/init.d';

# For each service:
#   pidfile - where the init script records the master process id
#   starter - name of the startup script under $service_dir
#   return  - what "ps -p PID -o comm=" prints for the healthy master process
%services = (
    'postfix' => {
        'pidfile' => '/var/spool/postfix/pid/master.pid',
        'starter' => 'postfix',
        'return'  => 'master'
    },
    'dovecot' => {
        'pidfile' => '/var/run/dovecot/master.pid',
        'starter' => 'dovecot',
        'return'  => 'dovecot'
    },
    'mailscanner' => {
        'pidfile' => '/var/run/MailScanner.pid',
        'starter' => 'mailscanner',
        'return'  => 'MailScanner: ma'
    },
    'spamassassin' => {
        'pidfile' => '/var/run/spamd.pid',
        'starter' => 'spamassassin',
        'return'  => '/usr/sbin/spamd'
    },
    'spamass-milter' => {
        'pidfile' => '/var/run/spamass/spamass.pid',
        'starter' => 'spamass-milter',
        'return'  => 'spamass-milter'
    },
    'clamav-daemon' => {
        'pidfile' => '/var/run/clamav/clam.pid',
        'starter' => 'clamav-daemon',
        'return'  => 'clamav-daemon'
    },
    'clamav-fresh' => {
        'pidfile' => '/var/run/clamav/freshclam.pid',
        'starter' => 'clamav-freshclam',
        'return'  => 'freshclam'
    },
    'mysqld' => {
        'pidfile' => '/var/run/mysqld/mysqld.pid',
        'starter' => 'mysqld',    # $service_dir is prepended below, so no path here
        'return'  => 'mysqld'
    },
    'nginx' => {
        'pidfile' => '/var/run/nginx.pid',
        'starter' => 'nginx',
        'return'  => 'nginx'
    },
    'php5-fpm' => {
        'pidfile' => '/var/run/php5-fpm.pid',
        'starter' => 'php5-fpm',
        'return'  => 'php5-fpm'
    },
    'sshd' => {
        'pidfile' => '/var/run/sshd.pid',
        'starter' => 'sshd',
        'return'  => 'sshd'
    },
    'fail2ban' => {
        'pidfile' => '/var/run/fail2ban/fail2ban.pid',
        'starter' => 'fail2ban',
        'return'  => 'fail2ban-server'
    },
);

################################################
openlog("Restarter", "pid",LOG_LOCAL0);

for my $service (sort keys %services) {
    #
    # Get information from the %services table above
    #
    my $pidfile  = $services{$service}{pidfile};
    my $expected = $services{$service}{return};
    $start_command = "${service_dir}/$services{$service}{starter}";
    #
    # If the pidfile exists, get the pid and "comm" information from ps on that pid.
    # No pidfile means the service was shut down on purpose, so leave it alone.
    #
    if (-e $pidfile) {
        if (open(my $in, '<', $pidfile)) {
            my $pid = <$in>;
            close($in);
            $pid = '' unless defined $pid;
            $pid =~ s/^\s+|\s+$//g;
            # Only hand a numeric pid to the shell; anything else counts as "not running"
            my $ret = ($pid =~ /^\d+$/) ? `ps -p $pid -o comm=` : '';
            chomp $ret;
            #
            # If the "comm" value does not match what is expected,
            # restart that service
            #
            if ($ret ne $expected) {
                syslog(LOG_ERR, "Service %s: pid '%s' has comm '%s', expected '%s' - restarting",
                       $service, $pid, $ret, $expected);
                restart_proc($service, $start_command);
            }
        }
    }
}
closelog();
exit(0);


# Run the init script's "restart" action and log whatever it printed
sub restart_proc {
    my ($service, $start_command) = @_;
    syslog(LOG_INFO, "Restarting Service: %s", $start_command);
    my $msg = `$start_command restart`;
    chomp $msg;
    syslog(LOG_INFO, "Service %s returned: %s", $service, $msg);
}

