Notice: This material is excerpted from Running A Perfect Internet Site with Linux, ISBN: 0-7897-0514-1. The electronic version of this material has not been through the final proof reading stage that the book goes through before being published in printed form. Some errors may exist here that are corrected before the book is published. This material is provided "as is" without any warranty of any kind.
Copyright ©1996, Que Corporation. All rights reserved. No part of this book may be used or reproduced in any form or by any means, or stored in a database or retrieval system without prior written permission of the publisher except in the case of brief quotations embodied in critical articles and reviews. Making copies of any part of this book for any purpose other than your own personal use is a violation of United States copyright laws. For information, address Que Corporation, 201 West 103rd Street, Indianapolis, IN 46290 or at support@mcp.com.
Now that you've set your system up the way you want it, you've crossed into the next phase of being a system administrator. Instead of focusing on setting everything up, you'll now focus on maintaining and improving your site (improving is covered in Part 5, "Upgrading and Adding to Your Site"). After all, an Internet site is somewhat like a car or a house: you need to constantly attend to it, rather than wait for things to get so out of hand that major repairs are necessary.
In this chapter, you learn how to manage:
The majority of what happens on most Internet sites is done by users. Therefore, the majority of what you'll need to attend to management-wise deals with users. This is especially true if you run a site that has a user base that changes on a regular basis (e.g. a commercial site with people signing on and then leaving, or an educational site with people coming in and out each semester).
Part of being a system administrator is dealing with people. These people are the others responsible for your site (if you don't run it alone), occasionally the administrators of other sites, and the users who utilize your site. Users can be the most difficult of the three to deal with because most of your communications with them involve complaints and requests. You may want to save any compliments or thank-you's that come along for your own morale!
There are a few things to keep in mind when dealing with user calls or requests.
Be professional. Being a site administrator may be a labor of love for you, but it's also a job. If you find you dislike a user, or find one difficult to deal with, sometimes it's best to just smile and be friendly and efficient. The sooner the problem is solved, the sooner you can go on to other things.
Be patient. You will run into users who are technophobes, unsure of their abilities, or simply computer illiterate. Often with these kinds of users you have to drag the information you need out of them in order to solve their problem. You may find that a simple e-mail form that includes some of the basic questions you need answers to (e.g. what platform they're running on, what software the problem occurs with, error messages they see) helps. After all, most of the complaints you get are through e-mail or local newsgroup postings. If you tell the user you're sending the form while you're on the phone with them, do it as you're talking to them so you won't forget.
Be as prompt as possible. There is little more frustrating to a user with a serious problem than having to wait weeks or months for a solution. If you're swamped with system problems, try to organize yourself so you don't lose individual people's problems under piles of paper, perhaps by setting aside a special e-mail box and/or a special In box.
Be firm. Many system administrators work themselves to death trying to make their users happy, which includes fulfilling special requests. Remember, you're only human, and there are times the line needs to be drawn between a legitimate request and a request that goes against an important and necessary policy. Keep a copy of your Acceptable Use agreement to refer to.
Be authoritative when necessary. You will run into the occasional problem user. This person is someone who either is a problem on the Internet itself (e.g. spamming newsgroups, stalking, and so on) or is a problem on your site (e.g. is a resource hog or uses your system for illegal purposes). Be sure your Acceptable Use policy gives you the power to deal with such people in clear terms.
Spamming newsgroups means sending the same post to many unrelated groups, for example, sending an advertisement for a computer you're selling to every single newsgroup in the comp hierarchy.
For some fiction involving the exploits of a ruthless and frightening system operator, see the BOFH articles by Simon Travaglia by FTP at sunsite.unc.edu, in /pub/docs/humor/bastard-operator. You can also see them on the Web in HTML at RENAISSOFT.
Dealing with problem users on your site is likely one of your more difficult jobs. It requires a mix of patience, fairness, tact, and determination. There are a number of reasons such users are challenging to deal with.
You have to be sure the user is a problem. After all, their account may have been hacked, or the person complaining about them may not be entirely honest about the situation. It's helpful to require some sort of proof that there is a problem. This proof could consist of forwarded news posts, e-mail, IRC logs, and so on. Keep in mind, however, that this proof may be altered or false. State in your Acceptable Use policy (making this policy is discussed in more detail in chapter 5, "Setting Up Your Site For General Use") what measures you will take to verify a complaint against a user.
The user may threaten legal action. If you make sure to have a solid Acceptable Use policy stating the things you won't allow or put up with on your site, and how offenders will be dealt with, you are generally covered. If you run a large site and are concerned, you may want to have a lawyer look over your Acceptable Use policy and help you fine tune it.
Some problem users are both spiteful and computer-knowledgeable. These users are actually capable of damaging your system. If you tell a user you intend to remove their account, it is important to do so right away (more on removing users in the next section)! State in your Acceptable Use policy the number of days, hours, or minutes it will be between when you say you will remove an account and when you will remove it. You can give a flexible time or a time range.
There are times you will need to remove users, whether they're simply leaving your provider or are in violation of your Acceptable Use policy. There are a few scenarios you may want to handle differently:
A precious resource for any Internet site is hard disk space. After all, you need to be able to store all that news, mail, and general stuff that users like to collect.
It's important to keep up with your disk space management. If you let it go too long, you'll find that your site suddenly starts kicking back mail or news saying that there's no more room!
You may find at some point that a number of users on your site are using a large amount of disk space for personal file storage. You can deal with this problem, of course, by simply asking them to cut their disk usage down to a level you find more appropriate. However, this may not always help.
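Before you ask anyone to clean up, it helps to know who is actually using the space. The following is a minimal sketch, assuming your users' home directories sit directly under one parent directory such as /home. The function name biggest_dirs is made up for illustration.

```shell
#!/bin/sh
# biggest_dirs PARENT [COUNT]
# Print the COUNT largest subdirectories of PARENT (default 10),
# largest first, with sizes in kilobytes as reported by du -sk.
biggest_dirs() {
    parent=$1
    count=${2:-10}
    du -sk "$parent"/*/ 2>/dev/null | sort -rn | head -n "$count"
}

# Example (hypothetical path): biggest_dirs /home 5
```

Run it as root so that du can read every directory; as an ordinary user it silently skips what it cannot read.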
There are two solutions to this problem. You can either add more hard drive space to your site or implement disk quotas. Adding more space, while a nice solution for the users, won't solve the problem if resource hogs continue to eat up your disk space. If you set up disk quotas, give your users a bit of fair warning so that they have time to clean up their directories. Installing disk quotas is covered in detail in chapter 5, "Setting Up Your Site for General Use."
To set a quota for only one person:
Remember to turn your quotas on as discussed in chapter 5!
fs /home/joe blocks (soft=15000, hard=25000) inodes (soft=0, hard=0)
If you don't trim some of your log files from time to time, they will eventually overrun your hard drive. You can simply do this by hand whenever you're wandering through the drive to regain some lost space, or you can add items to your cron jobs (discussed in the next section) to take care of it on a regular basis.
Take a look at each log file on your system over a short period of time. See how large it gets within, say, a day and a full week. Look at the information stored there to see how helpful it is to you on a short term and a long term basis.
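Once you know how much of a log is worth keeping, one way to trim it is to keep only its most recent lines. This is a minimal sketch rather than the only approach: the function name trim_log and the default line count are illustrative, and it assumes you can afford to lose any lines written during the instant the file is being copied back.

```shell
#!/bin/sh
# trim_log FILE [LINES]
# Keep only the last LINES lines of FILE (default 1000).
# The kept lines are copied back into the original file instead of
# replacing it, so the file keeps its owner and permissions and a
# daemon that has it open goes on writing to the same file.
trim_log() {
    file=$1
    keep=${2:-1000}
    tmp=$(mktemp) || return 1
    tail -n "$keep" "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example (hypothetical log name): trim_log /var/adm/messages 5000
```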
An example of this process (done as root) is:
To edit the crontab file for an account you're not logged into, you must be logged in as root. No one else can edit a crontab for an account other than the one they're using.
0 3 * * Wed (the fields are Minute, Hour, Day, Month, and Day-of-Week)
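If the five time fields are hard to keep straight, a small helper can label them for you. The sketch below is only an illustration: explain_cron is a made-up function, not part of cron, and the command path in the example is hypothetical.

```shell
#!/bin/sh
# explain_cron "MINUTE HOUR DAY MONTH WEEKDAY COMMAND..."
# Print each field of a crontab entry next to its name.
explain_cron() {
    # read splits on whitespace without expanding the * fields.
    read -r minute hour day month weekday command <<EOF
$1
EOF
    if [ -z "$command" ]; then
        echo "expected five time fields followed by a command" >&2
        return 1
    fi
    printf 'minute:       %s\n' "$minute"
    printf 'hour:         %s\n' "$hour"
    printf 'day of month: %s\n' "$day"
    printf 'month:        %s\n' "$month"
    printf 'day of week:  %s\n' "$weekday"
    printf 'command:      %s\n' "$command"
}

# Example: explain_cron "0 3 * * Wed /usr/local/bin/trim-logs"
```

The example entry would run its command at 3:00 in the morning every Wednesday.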
You can partition your syslog file according to the items kept in it. To do this, edit the file /etc/syslog.conf. You can tell this file to save out the log data for any type of system process. The following minitable shows examples of the processes for which you can make separate log files.
When you assign a log file to a particular type of process (e.g. daemon), you also can assign what kinds of messages you want written to the file. The levels of messages are shown in the following minitable, from most to least serious (not all these levels are used by all the process types).
Two special levels are debug (include all debugging information) and none (don't include anything).
Now, you just put these items together. For example, let's say I want my mail and daemon processes plus all error messages in general to each have their own log files. Also, I need to choose the levels for each of them. The entries I might use are:
mail.info /var/admin/mail.log
Send all information produced by mail to the specified log file.
daemon.notice /var/admin/daemon.log
Send all notices produced by my daemons to the specified log.
*.err /var/admin/err.log
Send all error messages to the specified log.
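To see how entries like these combine, it can help to work out which logs a given message would land in. The sketch below is an illustration only and not part of syslog: log_targets is a made-up helper that understands simple one-selector entries like the three above, and it assumes the usual syslog rule that naming a level also catches everything more serious than that level.

```shell
#!/bin/sh
# log_targets FACILITY LEVEL [CONFFILE]
# List the log files in CONFFILE (default /etc/syslog.conf) that would
# receive a message with the given facility and level. Entries that
# combine selectors with ';' or ',' are skipped in this sketch.
log_targets() {
    awk -v fac="$1" -v lvl="$2" '
        BEGIN {
            # Lower rank means more serious.
            n = split("emerg alert crit err warning notice info debug", names, " ")
            for (i = 1; i <= n; i++) rank[names[i]] = i
        }
        /^[ \t]*#/ || NF < 2 { next }    # comments and blank lines
        $1 ~ /[;,]/ { next }             # combined selectors
        {
            split($1, sel, "[.]")
            if (sel[1] != "*" && sel[1] != fac) next
            if (sel[2] == "*") { print $2; next }
            if (!(sel[2] in rank) || !(lvl in rank)) next
            # A selector matches its own level and anything more serious.
            if (rank[lvl] <= rank[sel[2]]) print $2
        }
    ' "${3:-/etc/syslog.conf}"
}

# Example: log_targets mail err
```

With the three entries above, a mail message at the err level would go to both the mail log and the error log, while a daemon message at the info level would not be written to any of them.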
It's important to keep backups of data from your site. The extent of the backups depends on your own needs. Today, a tape drive for backups isn't all that expensive, so most people shouldn't have to resort to floppy disks. In fact, unless you're doing the most minor of backups involving only one or two files, you want to avoid floppy disks for backing up. They're simply not feasible for most sites' backup needs!
To use a tape drive for your backups, you'll need two programs. First, you need a driver to actually run the tape drive: ftape. Second, you need a program to actually do the backups. You can use tar or AFIO.
Another form of backup is keeping around an extra hard drive, and compressing your entire primary drive onto it.
See the Hardware How-To for the other backup technologies supported by Linux.
Ftape is one of the packages included with Slackware. It's the actual driver that runs your tape drive. Once you install it from the Slackware disks, that's it! No setup is required.
See the Hardware How-To to find out which brands of tape backup systems are supported by Linux.
You have two main choices when it comes to actually making your backups. You can use tar itself. However, if a small portion of any file among the data gets damaged, you won't be able to recover even a part of the file. If you don't want to worry about damaged files, the better choice is AFIO. It's designed for archiving purposes, and so has more error-handling capabilities.
To install the program, do the following:
Now, if you intend to back up to tape, you'll want to look at the tob script. To install this script, do the following:
If you have only one tape backup, you can use the first resource file on the machine with the drive and the second file on the machines connected to it.
# Resource file for tob (version 0.01 and higher), using afio'd compressed
# archives. I use "ftape.o" as loadable module, hence the PRECMD and POSTCMD.
# See the docs for a full explanation.
VERBOSE='yes'
TOBHOME="/usr/etc/tob"
BACKUPDEV="/dev/ftape"
PRECMD="insmod /sbin/ftape.o"
POSTCMD="rmmod ftape"
# Let's see what we're up to.
if [ "$TYPE" = "full" ] ; then
    echo "About to make a FULL backup of volume $VOLUMENAME."
elif [ "$TYPE" = "diff" ] ; then
    echo "About to make a DIFFERENTIAL backup of volume $VOLUMENAME."
elif [ "$TYPE" = "inc" ] ; then
    echo "About to make INCREMENTAL backup of volume $VOLUMENAME."
fi
/
/ /home
.*/tmp/.*
The first part of the filename must match with the item in Step 10 that it refers to.
- 12. Create the directory TOBHOME/listings. In my case, it would be /usr/sbin/listings.
Making Backups
Now you need to come up with the line to add to root's crontab file to handle your backups. You can invoke tob without any arguments, but all that does is list usage information. The available arguments are listed in the following table.
Table 13.x Available Arguments
Argument                Description
rc rcfilename           Use this if you want to select an rc file that isn't tob.rc. Make sure to use the full path since this file isn't necessarily in one of the directories tob will expect it in otherwise. If you use this switch, it must be the first one on the command line.
backups                 Displays which backups were made and when, finding the data in TOBHOME/lists directory.
check                   Tells tob to check its settings and report any errors. Tob checks the environment settings and its resource file to determine this information.
full volumename         Start a full backup of the volume listed (e.g. everything).
fullcount volumename    Tells tob to report the size a full backup of the volume (e.g., everything) listed will be.
diff volumename         Start a differential backup of the volume listed. At least one full backup must be done prior to a differential backup.
diffcount volumename    Tells tob to report the size a differential backup of the volume listed will be.
restore item            Restores everything that matches up with item (e.g., /home/ralph/*).
find item               Tells tob to scan its files and list anything that matches the item. Item in this case can contain regular expressions.
verbose                 Lists the contents of the backup device.
If you want your standard backup to be a differential backup, you'll want to do the following:
- Put a tape in your tape drive.
- Type tob -full volumename (e.g. tob -full everything). This command will do a full backup of your hard drive, and likely take a while. It's best to do this at a time when things are very slow or not being used at all.
- Now, use crontab -e to enter a cron job for your differential backups.
- Save and exit the crontab file.
Make sure to keep a tape in the drive for the differential backups! Do them fairly frequently, at least every couple of days, if not every day.
A way to ensure you have a good backup system going is to follow the son, father, grandfather tape backup method (which requires 10 tapes). You use the son tapes, one per day, every day of the week except one (Sunday, for example). On the seventh day (Sunday in this case) you use a father tape, a different one each Sunday. Then you go back to using the son tapes. Once a month, you use the grandfather tape instead of a father tape. So, your schedule may look like this:
Reuse tapes as the time comes.
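If you'd rather not work out the rotation in your head each morning, a few lines of shell can name the tape for any given day. This is a sketch under stated assumptions, not a fixed rule of the method: it takes Sunday as the father day, uses the grandfather tape on the first Sunday of each month, and arrives at 10 tapes by counting six sons, three fathers (reused in turn), and one grandfather. The function name tape_for and the tape labels are made up.

```shell
#!/bin/sh
# tape_for WEEKDAY DAY_OF_MONTH
# WEEKDAY is 1 (Monday) through 7 (Sunday), as printed by date +%u.
# DAY_OF_MONTH is 1 through 31, as printed by date +%d.
# Prints the label of the tape to load for that day.
tape_for() {
    weekday=$1
    day=${2#0}                       # drop a leading zero such as "08"
    if [ "$weekday" -ne 7 ]; then
        echo "son-$weekday"          # six son tapes, Monday to Saturday
        return 0
    fi
    week=$(( (day - 1) / 7 + 1 ))    # which Sunday of the month this is
    if [ "$week" -eq 1 ]; then
        echo "grandfather"           # assumed: first Sunday of the month
    else
        # Assumed: three father tapes, reused in turn on later Sundays.
        echo "father-$(( (week - 2) % 3 + 1 ))"
    fi
}

# Example for today: tape_for "$(date +%u)" "$(date +%d)"
```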
You may occasionally want to back up a file listing for your site with all of the permissions intact (type ls -lR > listing to save this to a file called listing). Then, if something happens that messes up your permissions, you have something to look back to!
The major site resources you'll need to watch over are your network resources. Without close management, you'll find that your system will get annoyingly bogged down at times. By network resources, I mean your actual CPU and bandwidth.
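A saved listing is most useful when you can compare it against a fresh one. Here is a minimal sketch of that idea, with made-up function names (save_listing and listing_changed). It relies only on ls and diff, so any change in permissions, owners, sizes, or dates will show up as a difference.

```shell
#!/bin/sh
# save_listing DIR OUTFILE
# Record a recursive long listing of DIR, permissions included.
save_listing() {
    ls -lR "$1" > "$2"
}

# listing_changed OLD NEW
# Print the differences between two saved listings and succeed
# (exit status 0) when there are any.
listing_changed() {
    ! diff "$1" "$2"
}

# Example (hypothetical file names):
#   save_listing /home listing.old
#   ...later...
#   save_listing /home listing.new
#   listing_changed listing.old listing.new
```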
One thing that can slow your system down is the processes people are running on it. There are, however, ways for the system administrator to adjust the load so that the system runs most processes at a speed that's comfortable for both yourself and your users.
If you feel the response time of your system is slow, it's good to find the culprit process. First, take a look at how badly bogged down your system is by using the uptime command. This program lists the current time, how long the system has been up, how many users are logged on, and the load average over the last 1, 5, and 15 minutes. You generally want to keep the load average below 3 or so.
If you feel your load average is too high, you can get a listing of the top processes running to see what's happening. To get a listing of, say, the 10 processes using the most CPU, type ps aux | sort -rn -k 3 | head -10.
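If you'd like a script to keep an eye on the load for you, the check can be sketched as below. The function names are made up, and the sketch assumes your uptime prints its figures after the words "load average:" as the Linux version does.

```shell
#!/bin/sh
# one_minute_load
# Read uptime-style output on standard input and print the first
# (one-minute) load average.
one_minute_load() {
    sed -n 's/.*load average: *\([0-9.]*\).*/\1/p'
}

# load_is_high LOAD [LIMIT]
# Succeed (exit status 0) when LOAD is above LIMIT (default 3).
load_is_high() {
    awk -v load="$1" -v limit="${2:-3}" 'BEGIN { exit !(load + 0 > limit + 0) }'
}

# Example:
#   if load_is_high "$(uptime | one_minute_load)"; then
#       echo "Load average is above 3"
#   fi
```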
Typing ps aux by itself lists all processes.
Take a look at the process listing and see how much of your system's effort the top ones are taking up (percent CPU). Any process taking more than 50 percent of your CPU time for 30 seconds or longer is a problem.
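Picking the heavy processes out of a long listing by eye gets tedious, so here is a minimal sketch that does the filtering. The function name cpu_hogs is made up, and it assumes the usual ps aux column order, with the PID in column 2, %CPU in column 3, and the command starting in column 11. A single listing can't tell you how long a process has been busy, so run it a few times before deciding that something is a problem.

```shell
#!/bin/sh
# cpu_hogs [LIMIT]
# Read ps aux output on standard input and print the PID, %CPU, and
# command name of every process above LIMIT percent CPU (default 50).
cpu_hogs() {
    awk -v limit="${1:-50}" 'NR > 1 && $3 + 0 > limit + 0 { print $2, $3, $11 }'
}

# Example: ps aux | cpu_hogs 50
```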
If you find that your system is constantly having load problems, it may be time to add more RAM or upgrade the CPU.
You can start a CPU-intensive process with the nice command to make sure it takes up less of the processor. You can also change the nice value of a process that's already running.
In Linux, the nice range is from -20 to 19, with -20 being the highest priority and 19 the lowest. The default nice value is zero, and if you run nice without specifying a value, the process's nice value goes up by 10. For example, if you wanted to start the calculate process and really weren't in much of a hurry, you could give it a nice of 15 to slow it down: nice -n 15 calculate. Or, if you were starting the rush process and wanted to make sure it got done quickly, you might nice it to -20 (only root can assign negative values): nice -n -20 rush.
If you see a process running that is taking up too much of your CPU time, you can use the renice command as root to change its nice value. Get the process number from the ps listing, and decide on the new nice value for it. For example, if rush was process 235 and you realized it was just bogging things down too much, you could use: renice 15 -p 235. Remember that a higher nice value means a lower priority.
All users can nice their own processes. If you have users who are constantly running CPU-intensive processes, they may be willing to nice them for you.
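Here is a small sketch of both commands at work that you can try as an ordinary user. It makes a few assumptions: sleep 30 is only a stand-in for a real CPU-heavy job, nice_of is a made-up helper, and the nice value is read from the Linux-specific /proc file system.

```shell
#!/bin/sh
# nice_of PID
# Print the nice value of a process, taken from field 19 of
# /proc/PID/stat (Linux only; assumes the command name has no spaces).
nice_of() {
    awk '{ print $19 }' "/proc/$1/stat"
}

# Start the placeholder job in the background at a nice value of 15.
nice -n 15 sleep 30 >/dev/null 2>&1 &
pid=$!
sleep 1    # give the job a moment to start

before=$(nice_of "$pid")

# Raising the nice value of your own process is always allowed;
# lowering it again would require root.
renice 19 -p "$pid" >/dev/null
after=$(nice_of "$pid")

echo "nice value went from $before to $after"
kill "$pid"    # clean up the placeholder job
```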
Sometimes a process just plain dies, or hangs. In this case, you may need to kill it so it's not wasting your CPU's time. Once again, you'll need a process's id (PID) number to do this.
Killing a process can take its child processes down with it. If you kill one of your shells, you will usually also kill the processes you were running in that shell.
You'll want to try killing a process in the following order (for example, process number 345):
This is a harsher kill, but probably the one you'll use most often. Since the process it's killing doesn't get to clean up after itself, the files involved may be in a bit of a mess.
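The gentle-first approach can be wrapped up in a small function. This is a sketch, assuming the usual escalation from the polite TERM signal to the harsher KILL signal. The function name stop_process and the five-second grace period are made up for illustration.

```shell
#!/bin/sh
# stop_process PID [GRACE]
# Ask a process to exit with SIGTERM, wait up to GRACE seconds
# (default 5), and send SIGKILL only if it is still there.
stop_process() {
    pid=$1
    grace=${2:-5}
    kill -TERM "$pid" 2>/dev/null || return 0    # nothing to kill
    waited=0
    while [ "$waited" -lt "$grace" ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # it has gone away
        sleep 1
        waited=$((waited + 1))
    done
    kill -KILL "$pid" 2>/dev/null                # the harsher kill
    return 0
}

# Example: stop_process 345
```

Giving the process a few seconds to respond means the harsh kill, and the mess it can leave behind, is only used when the polite request has been ignored.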
Managing bandwidth isn't as simple as managing other features of your site. A few tips are: