Subject: Access to netlab machines

It turns out that there is no easy way to access the netlab machines directly from outside the dept. This procedure should work from anywhere in the world.

   telnet yoda.cs.clemson.edu
      :
   telnet netlab-gw
      :
   telnet netlab3

This presumes that you have an ID on the CS network and that you know what your password is. If you don't know what your password is, go to the CS help desk in G-23 (I think) with a picture ID and one of the student administrators can reset it for you. Remember to use yppasswd to set your password.

Subject: Assignment 1 hints

Your program is a "regular" application process. It should enter an infinite (while (1)) loop. It is fine if the only way for it to end is for the user who started it to type Ctl-C.

----

On the SIGUSR2 issue:

You can use the getpid() function to obtain the process ID of your program and then print it out when you start it up. Suppose your process has pid 13456. Then the unix command (issued by you from another session (xterm))

   kill -USR2 13456

will send a USR2 "signal" to your process. You should establish a signal handler which catches such signals and carries out the required functions. See "man signal" for how to do this.

----

It is IMPERATIVE that all programs check the value returned by recvfrom(). If this value is <= 0 your program should print a message and exit, and you should figure out why it is broken!

mw

Subject: Assn1

The assignment says that your program needs to take a particular action if an ICMP packet should be returned in which the ORIGINAL IP header contained header options.

You are NOT required to submit a program that generates packets containing header options. You are NOT even expected to test your program to see if such packets are handled correctly. However, if I find a way to create such a test, you are required to submit a program which deals with the test as specified!

mw

Subject: ICMP packet length..
(This message is being sent to all 826 students)

I have received a number of queries to the effect: "Will ICMP error packets always contain exactly N bytes following the IP header?" To this I respond:

ICMP software is written by humans and is not divinely ordained. Thus it is virtually CERTAIN that two separate implementations of ICMP will NOT always act in precisely the same way. Furthermore, if you read the internet standards (RFCs) you will find MANY instances of MUST/MAY phrasing, e.g. X MUST do Y and MAY (or may not) do Z.

Thus any time you write a piece of software that will fail if the other end does not do exactly what you expect, you have just written a broken piece of software. Unfortunately, many such broken pieces of software exist and are regularly exploited by hackers sending INTENTIONALLY NON-COMPLIANT packets to gain unauthorized access.

----

The bottom line:

If the packet contains at least 8 bytes of data beyond the header you MUST print them. If it contains more than 8 bytes you MAY print the extras ... I do not care. If, for some reason, the error message FAILS to contain even 8 bytes, your program should NOT "print" the non-existent data and it should NOT fail.

jmw

Subject: ICMP packet length..
(This message is being sent to all 826 students)

Your program should NOT terminate after receiving SIGUSR2. It should be able to catch many SIGUSR2's and continue to operate normally. It should terminate on a SIGINT (normally Ctl-C).

Subject: How to turn in assignment
(This message is being sent to all 826 students)

Suppose your userid happens to be anshulg. Then I have created a directory named:

   /local/share/home/turnin/a1/anshulg

which may be accessed from your account on the netlab machines.

Copy your (1) makefile (2) source code and (3) header files (if any) to the appropriate directory (replace anshulg by your personal id unless you happen to be Anshul Gupta.. in which case use anshulg).
Do NOT copy (1) core dumps, (2) old test versions of the program, or other extraneous materials to the turnin directory.

I recommend that you then cd to /local/share/home/turnin/a1/anshulg, re-make the program, and verify that it still runs correctly.

---------------------

One additional note: Please remember that there is NO BACKUP of the netlab directories. You should ALWAYS keep a current copy of your assignments archived on the dept Sun network.

mw

Subject: Excessive ping traffic...
(This message is being sent to all 826 students)

I have received a concern/complaint from network administrators regarding high volumes of ICMP traffic. It would be nice if you guys could tone it down a bit if you are generating it. Here are some suggestions:

1 - Do initial testing between netlab nodes (e.g. on netlab2 use: a.out netlab2 netlab3 .....)
2 - Don't run more than 32 pings in a single test
3 - Don't use delays shorter than 50 msec
4 - (and most important) Make DAMN sure you don't leave a broken orphan thread transmitting huge amounts of pings. (You should be able to run your own icmpd program and determine if something on your host is spewing pings.)

thanks,
mw

Subject: Thread creation. .
(This message is being sent to all 826 students)

When compiling a program that uses threads it is necessary to include the threads library:

   gcc -o xxx xxx.c -lpthread

Subject: Clarifications....
(This message is being sent to all 826 students)

The ping-delay is the delay between individual pings.

Each sample consists of 4 packets (1 short and 1 long to each of 2 destinations).

Number of pings is the total number of pings sent. Thus the number of samples is the number of pings / 4.

Assume an RFC 894 link header.

Subject: netlab
(This message is being sent to all 826 students)

Some "wizards" in Prof. Geist's OS class were apparently unaware of which machine their keyboard was connected to and rebooted the netlab gw and may have reconfigured it.. I am now trying to repair this damage.
Until I do, I recommend you STAY OFF ALL NETLAB MACHINES as home directories may not be available and reboots may occur without warning.

Ugh!

Subject: netlab
(This message is being sent to all 826 students)

I think things should be back to normal now. I have physically unplugged the keyboards to netlab-gw (= netlab1) and netlab2. Thus if you need to use the keyboard you will have to reconnect them.

Subject: assn1
(This message is being sent to all 826 students)

I have been working on getting assignment 1 graded today so that I can give you some feedback prior to submission of assignment 2.

I regret to report that I have encountered 2 distinct cases in which 2 students collaborated on the construction of (defective) "makefiles". This represents a serious and complete disregard for my rules on non-collaboration. I tried to stress to you guys that this could have dire consequences up to and including dismissal from graduate school.

Subject: Assn 2
(This message is being sent to all 826 students)

If the number of pings supplied is not a multiple of 4, round it up to the next multiple of 4.

If you encounter a condition in which negative or infinite bandwidth would be computed, print * (an asterisk) in place of the bandwidth.

Your program should be able to handle both DNS names and dotted decimal IP addresses for both endpoints of any run. However, since gethostbyname() resolves both of these cases correctly and transparently, there is nothing special needed to make this work.

Subject: Assn 2
(This message is being sent to all 826 students)

I was reminded this evening that I had failed to create the a2 turnin directory. I have now done so... let me know if you have trouble accessing it.

I also have not returned a1 results due to the problem mentioned yesterday. Thus the due date for a2 is deferred until Monday 24 Sept at 11:59 pm.
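
The two Assn 2 rules stated a couple of messages back (round the ping count up to the next multiple of 4, and print * when the bandwidth would be negative or infinite) might be sketched in C as follows. The function names, the buffer interface, and the %.3f output format are illustrative assumptions, not part of the assignment handout.

```c
#include <assert.h>
#include <math.h>
#include <stdio.h>

/* Round a ping count up to the next multiple of 4, since each sample
 * uses 4 packets (1 short and 1 long to each of 2 destinations).
 * Hypothetical helper name. */
unsigned round_up_pings(unsigned pings)
{
    return (pings + 3u) / 4u * 4u;
}

/* Write the bandwidth into buf, or "*" when the value is negative,
 * infinite, or not a number (e.g. after a division by zero).
 * The "%.3f" format is an assumption made for this sketch. */
void format_bandwidth(double bw, char *buf, size_t len)
{
    assert(buf != NULL && len >= 2);
    if (!isfinite(bw) || bw < 0.0)
        snprintf(buf, len, "*");
    else
        snprintf(buf, len, "%.3f", bw);
}
```

A caller would pass the user-supplied count through round_up_pings() once at startup and run every computed bandwidth through format_bandwidth() before printing it.
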
Subject: Assn 2
(This message is being sent to all 826 students)

A student asks:

   Also while printing latencies can we use the format specifier %6.3f cause if we use %6.1f we are getting latencies of 0.

This is a good idea... so good that I will make it a requirement and defer the due date again until Wednesday at 11:59. While ms time scales work well for remote hosts, usec scales are better for local ones.

I don't really care if the output goes to stderr or stdout.

Subject: Assn 3
(This message is being sent to all 826 students)

assn3.f01 incorrectly mentioned an "examples" subdirectory. It should have said "source" subdirectory. assn3.f01 has now been fixed.

Subject: Assn 3 (Q and A)
(This message is being sent to all 826 students)

What I did not understand from your server code is that

1. Once the fork is done and a new socket is created for the connection, how is this intimated to the client?

Careful review of the server code will reveal that the server sends the connection response using the new socket. Since the new socket is not bound by the server, the system will pick an available port and implicitly bind it.

2. As there are 2 sockets after fork, how does the client program figure out that it needs to send to a particular port?

When the client does a recvfrom() to read the connection response, the sockaddr structure that is filled in will contain the port that is bound to the new socket on the server. WHEN SENDING DATA, the client must use this port AND NOT THE ONE TO WHICH THE CONNECTION REQUEST WAS SENT.

Subject: Bit rate control program
(This message is being sent to all 826 students)

There is a program now available for controlling the bit rate on netlab2 through netlab6. It is NOT for use on netlab1, which contains a different breed of ATM adapter than the others.

To set the raw bit rate of an adapter just specify the rate in 1000's of bits per second, as in:

   /local/share/apetqcfg 600

This will set the raw transmit bit rate to 600,000 bits per second.
Useful values range from 400 to 26000.

For test purposes I recommend running your servers on netlab1 and clients on netlab2-6. The bandwidth available between the switch and netlab1 is about 5 times that between netlab2-6 and the switch. Thus you should be able to test in parallel this way without interfering with each other.

Subject: Assn 2 evaluations

I am about to send evaluations to the person in whose directory the assignment was turned in.. please pass them on to your teammate.

Subject: netlab
(This message is being sent to all 826 students)

The netlab machines are now back in operation. We were again victimized by the 822 students! This time one of them cleverly used his or her feet to unplug the power strip into which the ATM switch and 4 of the machines were plugged!!

Subject: Loss of precision
(This message is being sent to all 826 students)

Some teams received deductions for converting their time units to milliseconds BEFORE computing latencies and delay. In the following example I will endeavor to show why it is always a bad idea to discard significant digits in a numerical problem such as this prior to performing a computation.

   Suppose the tk's are: 36.450 ms and 32.160 ms
   Suppose the tj's are:  2.610 ms and  1.410 ms

   and k = 10000
       j =  1000

Suppose these results are highly repeatable. Now if you compute the bandwidth using the numbers as written you will get 5825 bits / msec.

However, if you pre-round to milliseconds you will have

   tk's are: 36 and 32
   tj's are:  3 and  1

and you will get 9000 bits / msec... as you can see, a significant error.

You might also have a case like:

   Suppose the tk's are: 36.450 ms and 36.010 ms
   Suppose the tj's are:  2.220 ms and  2.180 ms

If you solve this you will get 72,000 bits / msec.

However, if you first round to whole ms you will have

   tk's are: 36 and 36
   tj's are:  2 and  2

and you will get a zero divide when you try to solve!!
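
As a rough illustration of the first example above, the C sketch below recomputes it at full resolution and again after rounding to whole milliseconds. The formula is an assumption: it is not stated in this message and was chosen only because it reproduces the 5825 and 9000 figures. The function names are hypothetical, and k and j are taken to be the long and short packet sizes in bits.

```c
#include <assert.h>

/* ASSUMED formula (not given in the message); it reproduces the
 * 5825 and 9000 figures of the first example.
 *   tk1, tk2: long-packet (k bits) times to the two destinations, in ms
 *   tj1, tj2: short-packet (j bits) times to the two destinations, in ms
 * Returns bits per msec. With IEEE doubles the result is infinite
 * when the denominator is exactly 0. */
double bandwidth_bits_per_ms(double tk1, double tk2,
                             double tj1, double tj2,
                             double k, double j)
{
    assert(k > j);
    double denom = (tk1 - tk2) - (tj1 - tj2);
    return 2.0 * (k - j) / denom;
}

/* Round a non-negative time to the nearest whole millisecond,
 * i.e. the premature rounding that the message warns against. */
double to_whole_ms(double t_ms)
{
    return (double)(long)(t_ms + 0.5);
}
```

Called with the times from the first example, the function returns about 5825 at full resolution and 9000 once each time has been passed through to_whole_ms(). With the rounded times from the second example the denominator is 0 and the result is infinite, which is the zero divide described above.
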
So EVEN WHEN final units are desired in bits/msec you get much better accuracy using the highest resolution time you have available.

I hope this makes the problem more clear.

mw

Subject: Netlab broken
(This message is being sent to all 826 students)

NIS is not functioning correctly on some or all of the netlab machines. I am endeavoring to correct it. Correcting it may require rebooting multiple machines. I recommend that you don't try to log in to ANY machine until I send out another message..

mw

Subject: Netlab fixed
(This message is being sent to all 826 students)

I think everything is OK now. The problem was caused by a dumb move made by me when restarting networking on the file server.

However, it has been demonstrated that it is possible to induce networking failure in netlab machines via a self-inflicted Denial of Service attack in which excessive UDP traffic causes some sort of problem. Therefore it is important to start with a relatively slow rate such as 100 Kbps and build up to whatever actual rate is available.. If you try to start sending at something like 300 Mbps you CAN wedge the system!

mw

Subject: netlab

I see that a number of machines were wedged again. I restarted networking but they are still acting a bit flaky.. Thus I think a reboot is in order.

Subject: More netlab woes...
(This message is being sent to all CPSC 826 students)

I have some guests visiting this weekend and will not be able to monitor the health of the netlab on a regular basis nor fix problems until late Sunday.. Thus I am granting a two day extension on the assignment.

Before I left today I rebooted netlab2, 3, and 5. I was unable to remotely reboot netlab4 and 6. When I tried to get into Jordan, I discovered it was locked and I don't have a key!!

If any of the netlab machines becomes so degraded that you cannot rlogin to netlab1, feel free to reboot that machine.
The preferred method of reboot is to go to Jordan G-30 (ECE students, like me, may have trouble getting in). First BE SURE that the keyboard is attached to a netlab machine (quickly typing CTL-CTL toggles between netlab and 822lab machines.. If you can login, you are on the netlab). Then you can type CTL-ALT-DEL to reboot. This is a safe way to reboot Linux.

If the system becomes so wedged that CTL-ALT-DEL doesn't work, then it is also OK to hard reset the computer by pressing the reset button on the system unit.

Have a good weekend!

mw

Subject: Netlab notice
(This message is being sent to all 826 students)

I am about to endeavor to uncover the source of netlab instabilities. Thus netlab machines may be subject to instant rebooting while I am working. I will be finished by 7:15 pm today (Sunday).

mw

Subject: Netlab notice
(This message is being sent to all 826 students)

I have now finished trying to give those machines a personality transplant. Feel free to resume torturing them.

However, you should NOT run your SERVER on any netlab except 1. You may run your server on a dept Sun machine as well. This would provide a decent test for portability of code and for paths with longer prop delays.

mw

Subject: netlab
(This message is being sent to all CPSC 826 students)

I am going to be running some additional tests on netlab2 today. Please do not use it until further notice. netlab3, 4, 5, and 6 will remain available.

For additional test points you may also run servers or clients on waco-lane. You can rlogin to this machine from netlab-gw. It has a 155 Mbps interface, but there is a 10 Mbps ethernet segment between it and the netlabs.

mw

Subject: Netlab update
(This message is being sent to all 826 students)

I never would have guessed that it would be this difficult to configure an environment that "dropped packets reliably". At any rate my latest attempt is now deployed on netlab2-6.

I am running a test of 3 concurrent clients sending 100,000,000 packets each on netlab6..
So far we have reached about 70,000,000 packets with no signs of trouble, but you guys seem to have a real gift for eliciting it! So, take your best shot!

lec0  Link encap:Ethernet  HWaddr 00:04:AC:6C:29:19
      inet addr:192.168.8.16  Bcast:192.168.8.255  Mask:255.255.255.0
      UP BROADCAST RUNNING MULTICAST  MTU:1492  Metric:1
      RX packets:48425 errors:0 dropped:0 overruns:0 frame:0
      TX packets:70371537 errors:0 dropped:72569 overruns:0 carrier:0
      collisions:0 txqueuelen:100

mw

Subject: Interesting problem(s)
(This message is being sent to all CPSC 826 students)

Netlab6 survived the three concurrent 100,000,000 packet clients with no crashes or lockups. However, one problem noted earlier by Praveen and Harish seems to persist under intense workloads:

   bad sequence number 64591443 last was 64591445
   bad sequence number 64591442 last was 64591445
   bad sequence number 68179434 last was 68179436
   bad sequence number 68179433 last was 68179436
   bad sequence number 68470327 last was 68470329
   bad sequence number 68470326 last was 68470329
   bad sequence number 69118087 last was 69118089
   bad sequence number 69118086 last was 69118089
   bad sequence number 69062829 last was 69062831
   bad sequence number 69062828 last was 69062831
   bad sequence number 76830170 last was 76830172
   bad sequence number 76830169 last was 76830172
   bad sequence number 92070209 last was 92070211
   bad sequence number 92070208 last was 92070211
   bad sequence number 96277647 last was 96277649
   bad sequence number 96277646 last was 96277649
   bad sequence number 95916357 last was 95916359
   bad sequence number 95916356 last was 95916359
   bad sequence number 99927344 last was 99927346
   bad sequence number 99927343 last was 99927346

These messages are relatively rare, and thus it is extremely hard to identify their cause, but they persist. I have spent about 8 hours today trying to identify the cause..
First I put the following code in the tx side of the 25 Mbps driver used in the clients:

   wloc = (unsigned long *)skb->data;
   wloc += 4 + 5 + 2;            /* Link hdr + IPH + UDPH */
   if (ntohl(*wloc) == 0x1a2b3c4d)
   {
      int seq;
      wloc += 2;                 /* Point to seq # */
      seq = ntohl(*wloc);
      if (seq < lastseq)         /* seq went backwards: out of order */
      {
         printk("OoO in drive %d %d \n", seq, lastseq);
      }
      else if (seq != lastseq)   /* seq skipped ahead: gap */
      {
         printk("Drops in drive %d %d \n", seq, lastseq);
      }
      /* printk("Seq is %d \n", seq); */
      lastseq = seq + 1;         /* next expected seq # */
   }

This code lives right before the packet (skbuff) is scheduled on the hardware. What I saw was lots of the second printk but none of the first one.

This led me to suspect the receive side... so I further instrumented the server to print the sequence number of EVERY packet received to standard error. When I see something like:

   bad sequence number 19836 last was 19852

the corresponding sequence number log looks like:

   seq
   19807
   19814
   19821
   19829
   19851   <--- So 19851 really WAS received before 19836
   19836   <---
   19844
   19858
   19866
   19873
   19881
   19887

This led me to suspect problems in the transmit or receive side hardware or the receive driver. So I put the code block shown above in the receive side logic of the ia5515 driver, in the section that handles DMA transfer complete interrupts. This code lies just before the packet is pushed to the LANE layer above the device driver... What I got was the following:

The column labeled seq is the sequence number of the packet that just came off the board and is about to go to the LANE layer. The column labeled lastseq is (seq of last packet recvd + 1). The column on the far right is a copy of what udpserv spit out.
                    seq     lastseq   udpserv seq
   Drops in recv    19821   19815     19807
   Drops in recv    19829   19822     19814
   Drops in recv    19836   19830     19821
   Drops in recv    19844   19837     19829
   Drops in recv    19851   19845     19851
   Drops in recv    19858   19852     19836
   Drops in recv    19866   19859     19844
   Drops in recv    19873   19867     19858
   Drops in recv    19881   19874     19866
   Drops in recv    19887   19882     19873
   Drops in recv    19892   19888     19881
   Drops in recv    19899   19893     19887
   Drops in recv    19905   19900

Note that:

(1) The values of seq seen by the device driver match those seen by the application in the area of the trouble.
(2) Packet 19836 appears to have been processed by the driver and handed off to LANE before packet 19851.
(3) Packet 19851 appears to have been received by the application before packet 19836!!

15 points extra credit will be awarded to the first team that can provide a complete and believable explanation of the cause of this problem. (Driver sources are in class/826/ia5515 and class/826/it25)

-------------------------------------------------------------

Another problem reported by Chinmay was receipt of a negative bit rate. This is commonly caused by integer overflows, but all values used in the computation are double precision as far as I can tell. I have also not been able to reproduce the problem. 5 points extra credit are available for finding/fixing whatever is going on here. (To claim this prize you must be able to produce the problem and show that your fix makes it go away.)

Subject: One more time...
(This message is being sent to all CPSC 826 students)

It turns out that the netlab machines do have at least one more "feature" needing correction. Bandwidth adjustment may or may not work correctly. I will be warm-restarting 2-5 to fix this.

mw

Subject: One more time...
(This message is being sent to all CPSC 826 students)

In trying to fix this remotely I have managed to crash netlab5, and now the 822 lab is in progress..
I will try to get the changeover done by 6:30pm.

mw

Subject: Assignment update
(This message is being sent to all 826 students)

I finally got around to doing this assignment yesterday. I am using 1250 byte packets.

It turns out that it shouldn't be too hard to achieve 20 Mbps at 0.1% drop rate (1 per 1000) with the adapter at full rate.

At constrained rates it should also be possible to achieve nearly full rate with the drop rate at 1 per 1000.

For full rate I test with 100,000 packets. I use 10,000 packets for 400 Kbit rates.

mw

Subject: netlab recovered
(This message is being sent to all 826 students)

netlab1 seems to have suffered a severe system crash last night but it is now back in business. If you find it necessary to reboot one of the systems and you can still login to the console, it is a good idea to enter the "sync" command before initiating the reboot. Last night's attempted reboot of netlab1 failed because of file system corruption.

Off campus networking appears to be non-functional at the moment. Please don't confuse this problem with a netlab problem and initiate unneeded reboots.

mw

Subject: Home directories on netlab1
(This message is being sent to all 826 students)

I failed to remount the home directories on netlab1 when I rebooted ... sorry about that. It should be OK now.

Subject: Service restored + Safety guidelines
(This message is being sent to all 826 students)

The latest outage resulted from someone rebooting all 6 machines 30 minutes ago..

netlab1 is used as a gateway to reach the NIS server by 2-6. So.. if netlab1 is not fully functional when 2-6 are rebooted they will come up in a TOTALLY BROKEN STATE. Therefore BE SURE that netlab1 is up and running BEFORE rebooting the others.

If it should become necessary to reboot all machines it is imperative that netlab1 be rebooted and allowed to come fully back up before any other machine is rebooted.
A good rule is DON'T REBOOT netlab1 if you can log into it and successfully ping any of the other netlab machines.

Also, please be as careful as you can in testing your programs. A broken variant of one of these programs is FULLY EQUIVALENT to what is known in the network security world as a DENIAL OF SERVICE ATTACK. The reason it is called an ATTACK is because it can damage stuff!!

Thanks for your cooperation.

mw

Subject: One more word to the wise...
(This message is being sent to all 826 students)

In perusing the system logs on netlab1 it appears that there are a LARGE number of concurrent users. If 8 or 10 of you are simultaneously running tests, not only does it greatly increase the likelihood of a DoS induced failure, it also will likely render your own performance numbers meaningless. I think two or three simultaneous tests sourced from different hosts are not likely to cause problems, but 8 to 10 definitely will!

Subject: fw assignment
(This message is being sent to all 826 students)

I have now enabled the insmod, lsmod, and rmmod commands to be run by all users on netlab2-6. I have also created the following firewall devices on the specified hosts:

   netlab   devices
   2        /dev/fwa /dev/fwb
   3        /dev/fwc /dev/fwd
   4        /dev/fwe /dev/fwf
   5        /dev/fwa /dev/fwb /dev/fwc
   6        /dev/fwd /dev/fwe /dev/fwf

Major device numbers for the /dev nodes are

   /dev/fwa 181
   /dev/fwb 182
   /dev/fwc 183
   /dev/fwd 184
   /dev/fwe 185
   /dev/fwf 186

Recall that teams are identified as a, b, c, d, e, and f. Please work ONLY on a machine on which your team's fw device exists.

Subject: fw module installation
(This message is being sent to all 826 students)

The insmod command is a bit fussy when it comes to inserting modules from an nfs filesystem.
Fortunately, there is an easy way around this:

   cp fw.o /tmp/fw.o
   insmod /tmp/fw.o

Subject: Assignment reminders
(This message is being sent to all 826 students)

(1) Please send me a concise statement of the command line parameters used in starting your assignment 3 udp client.

(2) Please also include with your firewall submission a brief description of the capabilities of your firewall and how to configure it (add/modify/delete/list rules).

Subject: Assignment considerations
(This message is being sent to all 826 students)

You should be aware that it is entirely possible for you to (mis)configure a correct firewall in such a way that it is no longer possible to reach the home directory server or the NIS server!

It is also the case that a moderately broken firewall may screw up networking without actually crashing the system. As an example, ypbind on netlab2 became non-functional and refused to restart. The only way that I could get it to restart was to reboot the system.

Subject: local port.
(This message is being sent to all 826 students)

The "local port number" refers to the port number ON THE HOST THAT IS RUNNING THE FIREWALL. Thus

   local port = dest port on input packets and
   local port = source port on output packets.

You don't need to concern yourself with forwarding rules on this assignment.

mw

Subject: Answers to various questions
(This message is being sent to all 826 students)

1 - I have been out of town for a couple of days. It has been reported that some particularly gruesome crashes have resulted in unrebootable systems. I will investigate that situation Sunday. You can minimize the chances of such crashes by issuing the "sync" command and waiting a few seconds before installing/testing a new version of your firewall.

2 - You are perfectly free to modify any and all data structures used by the existing firewall. You may add new fields and/or new flag bits. Part of your evaluation will be related to how sensibly you do so.
3 - I have also received a number of questions regarding how can I/should I format/display "stuff". The answer is "in the way you would do it if you were trying to sell your program". Exceptionally innovative approaches will be rewarded, competent but not inspiring displays will not be penalized, and broken/ugly/unfriendly displays will be subject to deductions.

4 - Some folks have also said that "skbuff" pointers were thought to point to "junk" in the transport layer headers. This may well be because the skbuff "pointers" point to a union in which things are not necessarily as well-constructed as they might be. It is my experience that if you work directly from the iph pointer in the firewall you will find all that you need in the place you expect it.

5 - One team was having a problem using port number 123456. Since port numbers only reach 65535 (64K - 1) and port numbers < 1024 are restricted to root access, port numbers outside the range 1024 to 65535 should be avoided.

Subject: Schedule for remainder of class
(This message is being sent to all 826 students)

   Wed - Nov 28: Last quiz
   Fri - Nov 30: Assignment 4 due date in order to receive grades on Monday.
   Mon - Dec 3:  Course evaluations. Grade averages to date will be provided.
   Wed - Dec 5:  No class
   Fri - Dec 7:  Two makeup quizzes will be given. (Students who are content with whatever grade they receive on Monday 3 December are exempt.)
   Thu - Dec 13: Final exam (score on final exam cannot harm final avg).

Subject: File server
(This message is being sent to all 826 students)

The file server has been rebooted, as far as I can tell, without loss of data. However, I remind you that files are NOT BACKED UP. Thus you should ALWAYS keep a copy of your work on a departmental server.

Subject: A4 turnin directory
(This message is being sent to all 826 students)

The directory has now been created.
Please send me an email when you have turned in a4.

mw

Subject: Grades
(This message is being sent to all 826 students)

In the envelope that Kowshik provides for you today you will find a grade report.

The leftmost column contains my view of your average to date. Following that are your individual quiz grades (normalized to a scale of 100 pts), your quiz average, and your assignment average. One team has yet to submit the last assignment and thus their assignment average contains a 0 at the moment.

   Averages >= 89 ensure a grade of A for the course
   Averages >= 78 ensure a grade of at least B for the course
   Averages >= 70 ensure a grade of at least C for the course

Questions/discrepancies/etc. may be addressed when I return on Thursday.

jmw