Chapter 12
Using Sockets
Perl offers a host of functions for accessing the socket-based interprocess communication facilities of UNIX systems. Because these system facilities are available to the Perl programmer, they can be called directly from Perl scripts. Given the information in this chapter, you should be able to write your own client/server Perl applications using sockets.

A Very Brief Introduction to Sockets

To understand how Perl's socket functions fit into the interprocess communication model, you first need to know some of that model's important features. Given the limited space in a chapter, I cannot possibly cover every aspect of socket programming. However, I will cover the basics, and you should have enough information from this chapter to develop your own client/server model using Perl.

The absolute best reference on network programming is UNIX Network Programming by W. Richard Stevens (Prentice Hall, ISBN 0-13-949876-1). It covers all aspects of network programming in excruciating detail, with complete programs and C code samples. BSD UNIX provides many ways to open, use, and close sockets, and that book covers them all. The examples presented in this chapter are derived from its C code. Perl is great in that you can often map cookbook socket and network calls in C line by line to equivalent calls in Perl. (Someone ought to write a C-to-Perl translator!) Please refer to Stevens's book for a more detailed discussion of internetworking with sockets.

In general, socket programming is based on the client/server model. A client is a program that sends requests to a server and receives its responses. A server is simply another program that responds to these requests. The server can reside either on the same computer as the client or on another computer somewhere on a connecting network.
The protocol is the set of rules by which clients and servers send requests and receive replies to transfer data. A common protocol in the UNIX world is TCP/IP. The Internet Protocol (IP) handles the transfer of data from one computer to another over a network. The Transmission Control Protocol (TCP) adds reliability and connection functions that IP does not offer. When one computer sends a message via TCP/IP, the other computer acknowledges it. Acknowledging sent messages increases reliability, but at the cost of efficiency.

The User Datagram Protocol (UDP) also sends messages over IP, but it has no acknowledgment feature. This makes UDP faster in cases where confirming delivery is not a high priority. UDP is less reliable than TCP because a UDP sender has no guarantee that the message even reached the remote site.

Think of UDP as the regular U.S. Mail service and TCP/IP as a registered letter service. Although the U.S. Mail is generally very reliable, a sender does not really know whether the recipient actually received the letter. As far as you are concerned, the letter got to its destination, and if it didn't, the recipient will request another. UDP works the same way: you send a message, and if it doesn't reach its destination, that's no problem, because the recipient will ask again. When sending important documents, though, you would most likely want a confirmation from the recipient. In that case, you'd normally use a registered letter service, which returns a signed receipt. The signed receipt is your "acknowledgment" in TCP/IP.

Most applications using TCP or UDP talk on a specific port number to get the service they want. On a given machine, each port number is assigned to only one application at a time. Port numbers are standardized enough to identify the type of service being used.
On UNIX systems, the file /etc/services maintains a list of the services offered on each port. Port numbers 1 through 255 are reserved for standard, well-known applications. Everyone recognizes some of these well-known ports: for example, port 80 for the World Wide Web's server daemons, the nameserver on port 42, sendmail at port 25, and so on. In all cases, avoid using port 0, since it's interpreted differently on different systems.

Two computers talk to each other via a network circuit. Each end of a circuit is identified by a combination of two numbers called a socket. Basically, a socket is the IP address of the machine plus the port number used by the TCP software. A circuit is uniquely identified by the pair of sockets at its two ends.

There are two ways of defining the address. If the two processes talking to each other are on the same machine, the "family" of protocols is referred to as AF_UNIX. If they are on different machines, this is referred to as the AF_INET family. In the AF_UNIX family, sockets are assigned a pathname in the directory tree. In the AF_INET family, they are assigned an IP address and a port number. You can also use AF_INET to talk to processes on the same machine, but AF_UNIX works only on the same machine.

There are two types of sockets about which you should know:

SOCK_STREAM: stream sockets, which provide a reliable, connection-oriented byte stream (TCP in the AF_INET family).
SOCK_DGRAM: datagram sockets, which send individual, unacknowledged messages without a connection (UDP in the AF_INET family).
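To make the two families and two socket types concrete, here is a minimal sketch that uses the standard Socket module instead of hard-coded constants. First, it sends one UDP datagram (AF_INET, SOCK_DGRAM) across the loopback interface. Second, it passes a line through a connected AF_UNIX stream pair created with socketpair(). Binding to port 0 asks the kernel to pick any free port, and that special meaning is exactly why 0 should never be used as a fixed service port. Over loopback the datagram will almost always arrive. The point is that nothing in UDP itself confirms delivery.

```perl
#!/usr/bin/perl
# Sketch: one AF_INET datagram (UDP) exchange and one AF_UNIX stream pair.
use strict;
use warnings;
use Socket;        # AF_*/PF_* and SOCK_* constants, sockaddr helpers
use IO::Handle;    # for autoflush()

# --- AF_INET + SOCK_DGRAM (UDP): connectionless, unacknowledged ---
my $udp = getprotobyname('udp') || 0;    # 0 = default protocol for the type
socket(my $receiver, PF_INET, SOCK_DGRAM, $udp) or die "socket: $!";
# Port 0 in bind() asks the kernel to choose any free port for us.
bind($receiver, pack_sockaddr_in(0, INADDR_LOOPBACK)) or die "bind: $!";
my $dest = getsockname($receiver);       # the packed address we were given

socket(my $sender, PF_INET, SOCK_DGRAM, $udp) or die "socket: $!";
# send() hands the datagram to the network; nothing confirms delivery.
defined(send($sender, "hello via UDP", 0, $dest)) or die "send: $!";
defined(recv($receiver, my $datagram, 512, 0)) or die "recv: $!";
print "datagram: $datagram\n";

# --- AF_UNIX + SOCK_STREAM: a connected pair on this machine only ---
socketpair(my $left, my $right, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
$left->autoflush(1);                     # print is buffered; flush each write
print {$left} "hello via AF_UNIX stream\n";
my $line = <$right>;
print "stream: $line";
```

The socketpair() sockets have no pathname. They are handy for talking to a forked child. Named AF_UNIX sockets are created with bind(), which the socket() and bind() sections below describe.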
There is a socket on both the sending and the receiving machine. Clients send on their sockets, and servers listen on their sockets and accept connections when necessary. The IP address of each machine is unique by design, and port numbers are unique within each machine. Therefore, socket numbers, which combine these two unique numbers, are also unique across the network. This allows two applications to communicate using unique socket numbers.

With Perl, you can access these socket and network functions. Most of this chapter has a UNIX slant to it. On Windows NT machines, you'll be dealing with the Remote Access Server and WinSock. Please refer to Microsoft's technical notes for more information on WinSock programming.

Perl Functions for Working with Protocols

The protocols available on your UNIX system are listed in the /etc/protocols file. Reading this file takes three functions. The setprotoent() function starts the listing process. The getprotoent() function reads one line from /etc/protocols and returns its fields to you, and successive calls read successive lines. Finally, a call to endprotoent() stops the listing process. A simple way to see all the protocols available to your Perl script is to use the script shown in Listing 12.1.

Listing 12.1. Showing available protocols.

    #!/usr/bin/perl

The output should be similar to what is shown here:

    Name=ip, Aliases=IP, Protocol=0

To keep the file open between successive calls to getprotoent(), call the setprotoent() function with a nonzero parameter. To stop querying the file, use the endprotoent() call.

To determine whether a specific protocol is present, you can use the getprotobyname or getprotobynumber function. An empty list (or undef in scalar context) indicates that the protocol is not there. Lookups match the names and aliases in /etc/protocols exactly. However, the uppercase form of each name is usually listed as an alias, so both tcp and TCP typically work.
Therefore, to list the name, aliases, and protocol number for TCP, you can use this:

    if (($name, $aliases, $protonum) = getprotobyname('tcp')) {
        print "Name=$name, Aliases=$aliases, Protocol=$protonum\n";
    }

A comparable set of calls, which query the /etc/services file, tells you what services are available on your machine. Listing 12.2 illustrates how to use them. The setservent call with a nonzero parameter rewinds the index into the services file for you. The getservent call gets the four items in each service entry (name, aliases, port, and protocol), and the endservent call terminates the lookup. The output from this script can be a bit lengthy. It is shown in Listing 12.2 starting at line 15.

In Listing 12.2, lines 1 and 2 clear the screen and show the contents of the showme.pl file that holds the script. At line 13, we execute this script. Your output may differ from the one shown in Listing 12.2, depending on what services you have installed on your system.

Listing 12.2. Listing server services.

    1 $ clear
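Because Listings 12.1 and 12.2 survive here only as fragments, the following is a minimal sketch of the same idea rather than the original listings. It walks /etc/protocols with setprotoent/getprotoent/endprotoent and /etc/services with setservent/getservent/endservent, then performs single lookups with getprotobyname and getservbyname. The helper names list_protocols and list_services are hypothetical, chosen for this example.

```perl
#!/usr/bin/perl
# Sketch: walk /etc/protocols and /etc/services, then look up single entries.
use strict;
use warnings;

# Hypothetical helper: collect every protocol entry (name, aliases, number).
sub list_protocols {
    my @rows;
    setprotoent(1);          # nonzero: keep the file open between calls
    while (my ($name, $aliases, $number) = getprotoent()) {
        push @rows, "Name=$name, Aliases=$aliases, Protocol=$number";
    }
    endprotoent();
    return @rows;
}

# Hypothetical helper: collect every service entry (name, aliases, port, proto).
sub list_services {
    my @rows;
    setservent(1);           # nonzero: rewind and keep the file open
    while (my ($name, $aliases, $port, $proto) = getservent()) {
        push @rows, "Name=$name, Aliases=$aliases, Port=$port, Protocol=$proto";
    }
    endservent();
    return @rows;
}

print "$_\n" for list_protocols();
print "$_\n" for list_services();

# Single lookups return an empty list when the entry is not present.
if (my ($name, $aliases, $number) = getprotobyname('tcp')) {
    print "tcp is protocol number $number\n";
}
if (my ($name, $aliases, $port) = getservbyname('smtp', 'tcp')) {
    print "smtp/tcp is on port $port\n";
}
```

The exact rows printed depend entirely on the contents of your system's /etc/protocols and /etc/services files.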
Perl also lets you look up a host name by address in your /etc/hosts file with the gethostbyaddr call. This function takes two parameters: the address to look up, in packed binary form, and the value of AF_INET. On most systems, AF_INET is 2, but you can look it up in the /usr/include/sys/socket.h file. The gethostbyname("hostname") function returns the same values as the gethostbyaddr() call. The parameter passed into the function is the name of the host being looked up. Listing 12.3 illustrates how to do this.

In the program shown in Listing 12.3, the code in line 4 gets the host name and aliases given the address 204.251.103.2. You would use a different address, of course, because the address shown here is specific to my machine. Lines 6 through 10 print the components of the information you get back from the gethostbyaddr function call. In lines 12 and 13, you get the same information back using the node name instead of an IP address. Lines 14 through 19 print these values.

Listing 12.3. Sample listing to show usage of gethostbyname and gethostbyaddr.

    1 #!/usr/bin/perl
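Listing 12.3 also survives only as its first line. As a minimal sketch of the same two lookups, the following uses the Socket module's inet_aton() and inet_ntoa() to convert between dotted-quad strings and the packed 4-byte form. It queries localhost and 127.0.0.1 rather than the author's address, so it should resolve on most machines without a network.

```perl
#!/usr/bin/perl
# Sketch: forward and reverse host lookups using Socket.pm helpers.
use strict;
use warnings;
use Socket qw(AF_INET inet_aton inet_ntoa);

# Forward lookup: name -> (name, aliases, address type, length, addresses).
if (my ($name, $aliases, $type, $len, @addrs) = gethostbyname('localhost')) {
    print "Name=$name, Aliases=$aliases, Type=$type, Length=$len\n";
    print "  Address: ", inet_ntoa($_), "\n" for @addrs;   # packed -> dotted
}

# Reverse lookup: gethostbyaddr() wants the PACKED 4-byte address plus AF_INET.
my $packed = inet_aton('127.0.0.1');
if (my ($name) = gethostbyaddr($packed, AF_INET)) {
    print "127.0.0.1 is known as $name\n";
}
```

Passing the dotted string '127.0.0.1' directly to gethostbyaddr would not work. The function expects the binary address, which is why inet_aton() is needed.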
Socket Primitives

Enough already about getting information on your system. Let's see what socket functions are available to you. Depending on your site and the extensions you have for Perl, you may have more functions available, so check the socket man pages for more information. The most common ones you'll use are socket(), bind(), listen(), accept(), and connect().
I cover these functions in the following sections. However, there are some constants that must be defined before I continue. These constants are used in all function calls and scripts in this chapter. Feel free to change them to reflect your own system's peculiarities. Here's a list of the constants:
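The numeric values of these constants come from your system's C headers and are not the same everywhere. For example, SOCK_STREAM is 1 on Linux and the BSDs but 2 on Solaris. As a sketch of an alternative to hard-coding them, the standard Socket module exports the correct values for the platform Perl was built on:

```perl
#!/usr/bin/perl
# Sketch: let Socket.pm supply platform-correct constant values.
use strict;
use warnings;
use Socket qw(AF_UNIX AF_INET SOCK_STREAM SOCK_DGRAM);

# Keys are plain strings (=> quotes them); values are the Socket.pm constants.
my %const = (
    AF_UNIX     => AF_UNIX,
    AF_INET     => AF_INET,
    SOCK_STREAM => SOCK_STREAM,
    SOCK_DGRAM  => SOCK_DGRAM,
);
printf "%-12s = %d\n", $_, $const{$_} for sort keys %const;
```

If you prefer the $AF_INET-style scalars used in this chapter, you can initialize them from these constants, for example $AF_INET = AF_INET;.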
socket()

The socket() system call creates a socket for the client or the server. The socket function is defined as this:

    socket(SOCKET_HANDLE, $FAMILY, $TYPE, $PROTOCOL);

The function returns true on success and false on failure. If there was an error, check $! for the error message. The call to open a socket looks like this:

    socket(MY_HANDLE, $AF_UNIX, $STREAMS, $PROTOCOL) ||
        die "\n$0: Cannot open socket: $!";

Before binding an AF_UNIX socket, it's a good idea to remove any file left over from a previously opened socket with the unlink call:

    if (-e $my_tst_srvr) {
        unlink($my_tst_srvr) || die "\n$0: No permissions";
    }

You'll use the socket descriptor MY_HANDLE to refer to this socket in all subsequent network function calls in your program.

Sockets are created without a name, but clients use the name of the socket in order to read or write to it. This is where the bind function comes in.

The bind() System Call

The bind() system call assigns a name to an unnamed socket:

    bind(SOCKET_HANDLE, $nameAsAString);

The first item is the socket descriptor you just created. The second parameter is the name that refers to this socket if you are using AF_UNIX, or its address if you are using AF_INET. In both cases, it must be packed into the binary socket address structure that the system expects. For AF_UNIX, the Socket module's pack_sockaddr_un() function packs the pathname for you, so the call to bind looks like this:

    use Socket;
    bind(MY_HANDLE, pack_sockaddr_un("./mysocket")) || die "Cannot bind $!\n";

In AF_INET, it looks like this:

    $port = 6666

The pack() function used here probably needs some explanation. It takes two parameters: a template describing the formats to use, and a list of values to pack into one continuous string of bytes. In our case, the bind() call expects a sockaddr_in structure of the following form in C:

    struct sockaddr_in {
        short          sin_family;   /* AF_INET */
        unsigned short sin_port;     /* port, in network byte order */
        struct in_addr sin_addr;     /* 4-byte IP address */
        char           sin_zero[8];  /* unused padding */
    };

The template passed as the first parameter to pack can use the values listed in Table 12.1. Check the man page for pack for more details. Here, pack creates the socket address structure for you.
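Therefore, the script uses the template 'S n a4 x8' to pack an unsigned short in native byte order (the address family), an unsigned short in network byte order (the port), the 4-byte packed IP address, and eight null bytes of padding:

    $sockaddr = pack('S n a4 x8', $AF_INET, $port, $addr);

Here $addr must already be the 4-byte binary address, such as the one returned by gethostbyname. The socket type ($STREAMS) is not part of the address structure, because it was already given to socket(). If you prefer to supply the four address bytes as separate numbers, use C4 in place of a4 and pass four values.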
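The following sketch shows both kinds of addresses being built. For AF_UNIX, it removes any stale file and binds to a pathname packed with pack_sockaddr_un(). The pathname mysocket inside a temporary directory is a hypothetical choice for this example. For AF_INET, it builds a sockaddr_in by hand with the template above and again with the Socket module's pack_sockaddr_in(). On 4.4BSD-derived systems, such as the BSDs and macOS, the structure begins with a one-byte sin_len field followed by a one-byte family, so the hand-built version may not match there. That difference is the main argument for letting Socket.pm build it.

```perl
#!/usr/bin/perl
# Sketch: building socket addresses for bind() with Socket.pm.
use strict;
use warnings;
use Socket;
use File::Temp qw(tempdir);

# --- AF_UNIX: the "name" is a pathname packed into a sockaddr_un ---
my $dir  = tempdir(CLEANUP => 1);       # scratch directory for this demo
my $path = "$dir/mysocket";             # hypothetical socket pathname
socket(my $unix_sock, PF_UNIX, SOCK_STREAM, 0) or die "$0: socket: $!";
if (-e $path) {                         # remove a stale socket file first
    unlink $path or die "$0: cannot remove $path: $!";
}
bind($unix_sock, pack_sockaddr_un($path)) or die "$0: bind: $!";
print "bound AF_UNIX socket at $path\n" if -S $path;

# --- AF_INET: a sockaddr_in built by hand and by Socket.pm ---
my $port = 6666;
my $addr = inet_aton('127.0.0.1');      # 4-byte packed address
# family (native short), port (network order), address, 8 padding bytes
my $manual   = pack('S n a4 x8', AF_INET, $port, $addr);
my $portable = pack_sockaddr_in($port, $addr);
my ($got_port, $got_addr) = unpack_sockaddr_in($portable);
printf "len=%d port=%d addr=%s manual_matches=%s\n",
    length($manual), $got_port, inet_ntoa($got_addr),
    ($manual eq $portable ? 'yes' : 'no');
```

On Linux the two AF_INET structures are normally byte-for-byte identical. Elsewhere, trust pack_sockaddr_in().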
Now that you have bound an address for your server or client, you can connect to it or listen for connections with it. If your program is a server, it will set itself up to listen and accept connections.
Now let's look at the functions available for use in a server.

The listen() and accept() System Calls

The server uses the listen() system call to listen for connections. Once it is listening, the server can honor requests for connections with the accept system call. The listen call is defined as this:

    listen(SOCKET_HANDLE, $queueSize);

SOCKET_HANDLE is the descriptor of the socket you created. $queueSize is the number of pending connections allowed at one time before new ones are rejected, and 5 is the traditional value. A false return value indicates an error. The call to listen normally looks like this:

    listen(MY_HANDLE,5) || die "Cannot listen $!\n";

If this call is successful, you can accept connections with the accept function, which looks like this:

    accept(NEWSOCKET, ORIGINAL_SOCKET);

The server uses the accept() system call to accept incoming connection requests made by clients' connect() calls. Be aware that this function blocks: it does not return until a connection arrives. As requests come off the queue set up by the listen() call, the accept function handles them by assigning each one to a new socket. It returns the client's packed address, or false on failure. NEWSOCKET is created by accept with the same properties as ORIGINAL_SOCKET, but NEWSOCKET is the one used to communicate with the client, while ORIGINAL_SOCKET keeps listening for further connections. At this point, most servers fork off (fork()) a child process to handle the client and go back to wait for more connections.

Before I get into that, let's see how connections are originated, using the connect() call to connect to a server.

The connect() System Call

Clients use the connect() system call to connect to a server in a connection-oriented system. The client calls connect() after creating its socket with socket(). Binding the client socket first is optional, because the system assigns it a local address automatically if you don't. There are two ways to call connect(): one for AF_UNIX, using the pathname of the socket, and the other for AF_INET, using a packed address:
    connect(SOCKET_HANDLE, pack_sockaddr_un("pathname")); # for AF_UNIX (pack_sockaddr_un is from Socket.pm)

Connection-Oriented Servers in Perl

With this background on gathering socket information, creating sockets, and so on, you are now ready to write your own server using Perl. Listing 12.4 presents a sample server.

Listing 12.4. Server side for connection-oriented protocol.

    #!/usr/bin/perl
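In the case of connection-oriented protocols, the server performs the following steps. It creates a socket with socket(), gives it a name or address with bind(), and sets up a connection queue with listen(). It then waits for and accepts a connection with accept(), typically forking a child process to handle it. Finally, it exchanges data with the client and closes the connection when it is done.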
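Since Listing 12.4 is truncated, here is a minimal sketch of those server steps. It is not the original listing. It is a forking echo server built on the Socket module. The helper names make_listener and serve are hypothetical. SO_REUSEADDR is set so that a restarted server can rebind its port right away. The SIGCHLD handler reaps finished children, and because that signal can interrupt accept(), the loop retries when $! is EINTR.

```perl
#!/usr/bin/perl
# Sketch of a forking, connection-oriented echo server (not Listing 12.4).
use strict;
use warnings;
use Socket;
use IO::Handle;
use POSIX qw(WNOHANG);

# Reap exited children so they don't pile up as zombies.
$SIG{CHLD} = sub { 1 while waitpid(-1, WNOHANG) > 0 };

# Hypothetical helper: socket() + bind() + listen() on $port (0 = any port).
sub make_listener {
    my ($port) = @_;
    my $proto = getprotobyname('tcp') || 0;
    socket(my $server, PF_INET, SOCK_STREAM, $proto) or die "socket: $!";
    setsockopt($server, SOL_SOCKET, SO_REUSEADDR, pack('l', 1))
        or die "setsockopt: $!";
    bind($server, pack_sockaddr_in($port, INADDR_ANY)) or die "bind: $!";
    listen($server, 5) or die "listen: $!";
    return $server;
}

# Hypothetical helper: accept $count connections, forking a child per client.
sub serve {
    my ($server, $count) = @_;
    my $served = 0;
    while ($served < $count) {
        my $paddr = accept(my $client, $server);
        unless ($paddr) {
            next if $!{EINTR};            # interrupted by SIGCHLD; retry
            die "accept: $!";
        }
        $served++;
        my $pid = fork();
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {                  # child: handle this client only
            close $server;
            $client->autoflush(1);
            while (defined(my $line = <$client>)) {
                print {$client} "echo: $line";
            }
            exit 0;
        }
        close $client;                    # parent: the child owns it now
    }
}
```

A real server might call serve(make_listener(6666), 100) with a fixed, agreed-upon port. A production server would loop forever instead of stopping after a fixed count.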
Once the server has accepted a connection, the client and server can exchange data. To read from the socket, use the function call

    sysread(MY_SOCKET, $buffer, $length);

where MY_SOCKET is the socket you are reading from and $buffer receives up to $length bytes of data. Perl's read() function also works, but it goes through Perl's own input buffering. To write to the socket, use the function call

    syswrite(MY_SOCKET, $buffer, $length);

Note that Perl's write() function is not the write system call. It outputs formats, so use syswrite() for raw socket writes. For sending just text data, you can use print instead. For example, the following code writes text to the socket (note that there is no comma after the filehandle):

    print MY_SOCKET "Hello, ..";

Because print is buffered, turn on autoflush for the socket, for example with select((select(MY_SOCKET), $| = 1)[0]);. Otherwise, the other side may not see the data right away.

Once a connection has served its purpose, it has to be closed so that other clients can use the system resources. To close the socket, your server and clients should call the close() function:

    close(MY_SOCKET);

The shutdown() function allows you to selectively shut down sends and receives on a socket. Here's the function call:

    shutdown(MY_SOCKET,HOW);

When the HOW parameter is 0, no more data is received on this socket. If HOW is set to 1, no more data will be sent out on this socket. If it is set to 2, no data is received or sent on this socket. (You still have to close the socket, even if you shut it down for sending and receiving.)

Listing 12.5 presents a sample of the client side of things.

Listing 12.5. The client side.

    #!/usr/bin/perl

The client for connection-oriented communication also takes the following steps: it creates a socket with socket(), connects to the server with connect(), exchanges data with the server, and closes the socket with close() when it is done.
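The client steps and the read, write, and shutdown calls can be seen together in the following minimal sketch. To keep it runnable without a separate server process, it opens a throwaway listener on the loopback interface in the same process. A real client would instead connect to a server running elsewhere. The kernel completes the TCP handshake as soon as connect() is called, so accept() returns immediately. The client writes with syswrite() and then calls shutdown() with HOW set to 1. The server side reads with sysread() until it returns 0, which is how end of file appears on a socket.

```perl
#!/usr/bin/perl
# Sketch: client steps (socket, connect, write, shutdown, close) against a
# listener in the same process, so it runs without a separate server.
use strict;
use warnings;
use Socket;

my $proto = getprotobyname('tcp') || 0;

# A throwaway listener on the loopback interface (kernel-chosen port).
socket(my $listener, PF_INET, SOCK_STREAM, $proto) or die "socket: $!";
bind($listener, pack_sockaddr_in(0, INADDR_LOOPBACK)) or die "bind: $!";
listen($listener, 5) or die "listen: $!";
my ($port) = unpack_sockaddr_in(getsockname($listener));

# Client: socket() then connect(); no bind() needed.
socket(my $client, PF_INET, SOCK_STREAM, $proto) or die "socket: $!";
connect($client, pack_sockaddr_in($port, INADDR_LOOPBACK)) or die "connect: $!";

# The connection is already queued, so accept() returns immediately.
accept(my $conn, $listener) or die "accept: $!";

# syswrite/sysread bypass Perl's buffering, so no flushing is needed.
my $msg = "Hello, server\n";
defined(syswrite($client, $msg, length $msg)) or die "syswrite: $!";
shutdown($client, 1) or die "shutdown: $!";   # HOW=1: no more sends

# Read until sysread returns 0, which signals end of file.
my $received = '';
while (1) {
    my $n = sysread($conn, my $chunk, 1024);
    die "sysread: $!" unless defined $n;
    last if $n == 0;
    $received .= $chunk;
}
print "server side received: $received";

close $conn;
close $client;
close $listener;
```

Without the shutdown() call, the reading loop would block waiting for more data, because the peer would still see an open connection.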
That's about it for a client program. Any processing that has to be done is done while the connection is open. Client programs can be written to keep a connection open for a long time while large amounts of data are transferred. If there are long delays between successive messages, however, a client can instead open a socket connection, send the message or messages, and close the connection as soon as the acknowledgment, if any, arrives. This way, sockets are opened only on an as-needed basis and do not use up socket resources while both the server and client are idle.

The h2ph Script

If you read more documentation on Perl and sockets, you'll see references to the socket.ph file. If you cannot find this file anywhere on your system, it's because you have not run the h2ph program on your include directories. The h2ph program converts C header files to Perl header files. The safest way to ensure that all the files are converted to Perl headers is to issue the following statements while logged in as root:

    $ cd /usr/include

You may run into some problems while running this script. For instance, it may report that it's creating a .ph file from a .h file, yet after execution the .ph file does not exist! Check the h2ph script to see where $perlincl is pointing and whether you have read/write permission there. A common repository is the /usr/local/lib/perl5 or the /usr/lib/perl5 directory. Also remember that the @INC array in your Perl scripts must include the location where the .ph files are placed.

Using Socket.pm

The standard Perl 5 distribution comes with the Socket.pm module, which greatly speeds up Perl code development. See the documentation in the /usr/lib/perl5/Socket.pm file (or run perldoc Socket) for more information. This module requires dynamic loading, so make sure your system supports it.

Summary

Perl offers very powerful features for using sockets on UNIX machines.
The system calls available offer enough power and flexibility to create client/server applications in Perl without having to write code in a compiled language. The functions available in Perl include those for querying available system services and protocols, IP addresses, and other host name information. Here are a few key points to remember when working with Perl and sockets: