Network+ Ch. 21, Network Troubleshooting
Caution: No matter what the problem, always consider the safety of your data first. Ask yourself this question before performing any troubleshooting action:
"Can what I'm about to do potentially damage my data?"
Troubleshooting process: Identify the problem: Determine if anything has changed: Determine if anything has changed on the network recently that might have caused the problem. You may not have to ask many questions before the person using the problem system can tell you what has changed, but, in some cases, establishing if anything has changed can take quite a bit of time and involve further work behind the scenes. Here are some examples of questions to ask: -"What exactly was happening when the problem occurred?" -"Has anything been changed on the system recently?" -"Has the system been moved recently?" Notice the way I've tactfully avoided the word you, as in "Have you changed anything on the system recently?" This is a deliberate tactic to avoid any implied blame on the part of the user. Being nice never hurts, and it makes the whole troubleshooting process more friendly. You should also internally ask yourself some isolating questions, such as
"Was that machine involved in the software push last night?" or "Didn't a tech visit that machine this morning?" Note you will only be able to answer these questions if your documentation is up to date. Sometimes, isolating a problem may require you to check system and hardware logs (such as those stored by some routers and other network devices), so make sure you know how to do this.
The Windows version of the traceroute utility is
"tracert"
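A minimal sketch, assuming Python is installed on the troubleshooting machine, of picking the right command name for the operating system. The helper name traceroute_command is hypothetical and only builds the argument list; it does not run the trace.

```python
import platform

def traceroute_command(host, system=None):
    """Return the argument list for tracing the route to host.

    Windows names the utility tracert; Linux and macOS name it traceroute.
    """
    system = system or platform.system()
    tool = "tracert" if system == "Windows" else "traceroute"
    return [tool, host]

print(traceroute_command("example.com", system="Windows"))  # ['tracert', 'example.com']
print(traceroute_command("example.com", system="Linux"))    # ['traceroute', 'example.com']
```

The returned list could be handed to subprocess.run on a machine where the utility is installed.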
Troubleshooting process: Identify the problem: Determine if anything has changed: Determine if anything has changed on the network recently that might have caused the problem. You may not have to ask many questions before the person using the problem system can tell you what has changed, but, in some cases, establishing if anything has changed can take quite a bit of time and involve further work behind the scenes. Here are some examples of questions to ask: (3)
-"What exactly was happening when the problem occurred?" -"Has anything been changed on the system recently?" -"Has the system been moved recently?"
So far, everything seems to confirm that the local office cannot get to the remote server. Just to be able to say he tried everything, Terry runs the mtr utility from a Linux box and lets it run for an extended time. At the same time, he runs the pathping utility from a Windows computer. Neither utility can contact the server. He tries all of these utilities on some other company resources and Internet sites and has no problems connecting. Confident that the reported symptom is confirmed, Terry puts in a call to the remote site to ask about the status. The virtual PBX sends Terry to voicemail for every extension that he calls. This could point to a network disconnection at the site or to everyone being out of the office there. Since it is 3:00 a.m. at the remote site, Terry does not have a clear answer. The next quick test to perform is to see if the site is reachable from outside of the local office. This will confirm or eliminate his theory of a local issue, such as incorrect host-based firewall settings. Terry sits down at a computer and searches on Google for a looking glass site. He selects one from the results list and browses to the site. Once in the site, he selects the location of a source router to perform a diagnostic test, and then he selects the type of test to run; in this case, he chooses a ping test. He enters the target server address of the company remote server and submits the test parameters. After a moment, the looking glass server sends a set of pings, none of which receives a response. He tries the test from a few other source router locations and gets the same results. To complete his tests, Terry uses the looking glass site to ping some additional hosts at the remote site and is pleased to discover that they are all reachable. Now Terry knows that the site is accessible, so it must be that the server is down. When the office opens, he will contact the technician there and offer whatever help and information that he can.
In the meantime, he informs the rest of the organization of the server's status. Narrowing the problem to a single source—an apparently down server—doesn't get all the way to the bottom of the problem (although it certainly helps!). What could cause an unresponsive server? (7)
-Local power outage, like a blown circuit breaker -Failed NIC on the server -Network cable disconnected -Improper network configuration on the server -A changed patch cable location in the rack -Failed component in the server -Server shutdown -A whole lot of other possibilities
The CompTIA Network+ exam objectives contain a detailed troubleshooting methodology that provides a good starting point for our discussion. Here are the basic steps in the troubleshooting process: (not expanded)
1. Identify the problem. 2. Establish a theory of probable cause. 3. Test the theory to determine the cause. 4. Establish a plan of action to resolve the problem and identify potential effects. 5. Implement the solution or escalate as necessary. 6. Verify full system functionality and, if applicable, implement preventative measures. 7. Document findings, actions, and outcomes.
The CompTIA Network+ exam objectives contain a detailed troubleshooting methodology that provides a good starting point for our discussion. Here are the basic steps in the troubleshooting process: 1.
1. Identify the problem. a. Gather information. b. Duplicate the problem, if possible. c. Question users. d. Identify symptoms. e. Determine if anything has changed. f. Approach multiple problems individually.
Exam tip: Memorize these problem analysis steps: (7)
1. Identify the problem. a. Gather information. b. Duplicate the problem, if possible. c. Question users. d. Identify symptoms. e. Determine if anything has changed. f. Approach multiple problems individually. 2. Establish a theory of probable cause. a. Question the obvious. b. Consider multiple approaches: i. Top-to-bottom/bottom-to-top OSI model ii. Divide and conquer 3. Test the theory to determine cause. a. Once theory is confirmed, determine next steps to resolve problem. b. If theory is not confirmed, reestablish new theory or escalate. 4. Establish a plan of action to resolve the problem and identify potential effects. 5. Implement the solution or escalate as necessary. 6. Verify full system functionality and, if applicable, implement preventative measures. 7. Document findings, actions, and outcomes.
The CompTIA Network+ exam objectives contain a detailed troubleshooting methodology that provides a good starting point for our discussion. Here are the basic steps in the troubleshooting process: (write out on word document)
1. Identify the problem. a. Gather information. b. Duplicate the problem, if possible. c. Question users. d. Identify symptoms. e. Determine if anything has changed. f. Approach multiple problems individually. 2. Establish a theory of probable cause. a. Question the obvious. b. Consider multiple approaches: i. Top-to-bottom/bottom-to-top OSI model ii. Divide and conquer 3. Test the theory to determine the cause. a. Once the theory is confirmed, determine the next steps to resolve the problem. b. If the theory is not confirmed, reestablish a new theory or escalate. 4. Establish a plan of action to resolve the problem and identify potential effects. 5. Implement the solution or escalate as necessary. 6. Verify full system functionality and, if applicable, implement preventative measures. 7. Document findings, actions, and outcomes.
Troubleshooting process: Establish a Theory of Probable Cause: Consider multiple approaches when tackling problems. This will keep you from locking your imagination into a single train of thought. You can use the OSI seven-layer model as a troubleshooting tool in several ways to help with this process. Here's a scenario to work through. Martha can't access the database server to start her workday. The problem manifests this way: She opens the database client on her computer, then clicks on recent documents, one of which is the current project that management has assigned to her team. Nothing happens. Normally, the database client will connect to the database that resides on the server on the other side of the network. Try a top-to-bottom or bottom-to-top OSI model approach to the problem. Sometimes it pays to try both. Here are some ideas on how this might help. What layer? (A disconnected cable or dead NIC can make for a bad day.)
1. Physical
WAN Problems: ISPs and MTUs: I discussed the maximum transmission unit (MTU) in Chapter 7, "Routing." Back in the dark ages (before Windows Vista), Microsoft users often found themselves with terrible connection problems because IP packets were too big to fit into certain network protocols. The largest Ethernet packet is 1500 bytes, so some earlier versions of Windows set their MTU size to a value less than 1500 to minimize the fragmentation of packets. The problem cropped up when you tried to connect to a technology other than Ethernet, such as DSL. Some DSL carriers couldn't handle an MTU size greater than
1400. When your network's packets are so large that they must be fragmented to fit into your ISP's packets, we call it an MTU mismatch.
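The arithmetic behind an MTU mismatch can be sketched as follows. This is an illustration under stated assumptions, not a packet-level tool: it assumes IPv4 with a 20-byte header and no options, and the function name fragments_needed is hypothetical.

```python
import math

IP_HEADER = 20  # bytes, assuming an IPv4 header with no options

def fragments_needed(packet_size, link_mtu):
    """Count the IPv4 fragments needed to carry packet_size bytes over link_mtu."""
    if packet_size <= link_mtu:
        return 1
    payload = packet_size - IP_HEADER
    # Every fragment except the last carries a payload that is a multiple of 8 bytes.
    per_fragment = (link_mtu - IP_HEADER) // 8 * 8
    return math.ceil(payload / per_fragment)

print(fragments_needed(1500, 1500))  # 1
print(fragments_needed(1500, 1400))  # 2
```

A full-size 1500-byte packet crosses a 1500-byte link whole, but has to be split in two on a link that tops out at 1400.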
The largest Ethernet packet is __(#) bytes
1500
Troubleshooting process: Establish a Theory of Probable Cause: Consider multiple approaches when tackling problems. This will keep you from locking your imagination into a single train of thought. You can use the OSI seven-layer model as a troubleshooting tool in several ways to help with this process. Here's a scenario to work through. Martha can't access the database server to start her workday. The problem manifests this way: She opens the database client on her computer, then clicks on recent documents, one of which is the current project that management has assigned to her team. Nothing happens. Normally, the database client will connect to the database that resides on the server on the other side of the network. Try a top-to-bottom or bottom-to-top OSI model approach to the problem. Sometimes it pays to try both. Here are some ideas on how this might help. What layer? (The MAC address of the database server or Martha's machine might be blacklisted.)
2. Data Link
The CompTIA Network+ exam objectives contain a detailed troubleshooting methodology that provides a good starting point for our discussion. Here are the basic steps in the troubleshooting process: 2.
2. Establish a theory of probable cause. a. Question the obvious. b. Consider multiple approaches: i. Top-to-bottom/bottom-to-top OSI model ii. Divide and conquer
WAN Problems: ISPs and MTUs: I discussed the maximum transmission unit (MTU) in Chapter 7, "Routing." Back in the dark ages (before Windows Vista), Microsoft users often found themselves with terrible connection problems because IP packets were too big to fit into certain network protocols. The largest Ethernet packet is 1500 bytes, so some earlier versions of Windows set their MTU size to a value less than 1500 to minimize the fragmentation of packets. The problem cropped up when you tried to connect to a technology other than Ethernet, such as DSL. Some DSL carriers couldn't handle an MTU size greater than 1400. When your network's packets are so large that they must be fragmented to fit into your ISP's packets, we call it an MTU mismatch. As a result, techs would tweak their MTU settings to improve throughput by matching up the MTU sizes between the ISP and their own network. This usually required a manual registry setting adjustment. Around __(year), Path MTU Discovery (PMTU), a method to determine the best MTU setting automatically, was created. PMTU works by adding a new feature called the "Don't Fragment (DF) flag" to the IP packet. A PMTU-aware operating system can automatically send a series of fixed-size ICMP packets (basically just pings) with the DF flag set to another device to see if it works. If it doesn't work, the system lowers the MTU size and tries again until the ping is successful. Imagine the hassle of incrementing the MTU size manually. That's the beauty of PMTU—you can automatically set your MTU size to the perfect amount.
2007
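The lower-and-retry loop described on this card can be simulated without touching a real network. In this sketch the path is just a list of per-link MTU values, and the names probe_fits and discover_path_mtu, the 10-byte step, and the 576-byte floor are all assumptions made for illustration.

```python
def probe_fits(size, path_mtus):
    """Simulate a ping with DF set: it succeeds only if every link can carry it."""
    return all(size <= mtu for mtu in path_mtus)

def discover_path_mtu(path_mtus, start=1500, step=10, floor=576):
    """Lower the probe size until a DF probe crosses the whole path."""
    size = start
    while size > floor and not probe_fits(size, path_mtus):
        size -= step
    return size

print(discover_path_mtu([1500, 1400, 1500]))  # 1400
```

Because the probe shrinks in fixed steps, the result is only as precise as the step size; a smaller step finds a tighter value at the cost of more probes.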
Exam tip: The ARP table functions at Layer __(#), mapping IP addresses to MAC addresses.
3
Troubleshooting process: Establish a Theory of Probable Cause: Consider multiple approaches when tackling problems. This will keep you from locking your imagination into a single train of thought. You can use the OSI seven-layer model as a troubleshooting tool in several ways to help with this process. Here's a scenario to work through. Martha can't access the database server to start her workday. The problem manifests this way: She opens the database client on her computer, then clicks on recent documents, one of which is the current project that management has assigned to her team. Nothing happens. Normally, the database client will connect to the database that resides on the server on the other side of the network. Try a top-to-bottom or bottom-to-top OSI model approach to the problem. Sometimes it pays to try both. Here are some ideas on how this might help. What layer? (Someone might have changed the IP address of the database server.)
3. Network
The CompTIA Network+ exam objectives contain a detailed troubleshooting methodology that provides a good starting point for our discussion. Here are the basic steps in the troubleshooting process: 3.
3. Test the theory to determine the cause. a. Once the theory is confirmed, determine the next steps to resolve the problem. b. If the theory is not confirmed, reestablish a new theory or escalate.
The CompTIA Network+ exam objectives contain a detailed troubleshooting methodology that provides a good starting point for our discussion. Here are the basic steps in the troubleshooting process: 4.
4. Establish a plan of action to resolve the problem and identify potential effects.
Troubleshooting process: Establish a Theory of Probable Cause: Consider multiple approaches when tackling problems. This will keep you from locking your imagination into a single train of thought. You can use the OSI seven-layer model as a troubleshooting tool in several ways to help with this process. Here's a scenario to work through. Martha can't access the database server to start her workday. The problem manifests this way: She opens the database client on her computer, then clicks on recent documents, one of which is the current project that management has assigned to her team. Nothing happens. Normally, the database client will connect to the database that resides on the server on the other side of the network. Try a top-to-bottom or bottom-to-top OSI model approach to the problem. Sometimes it pays to try both. Here are some ideas on how this might help. What layer? (Perhaps extreme traffic on the network could block an acknowledgment segment? This seems a bit of a reach, but worth considering.)
4. Transport
The CompTIA Network+ exam objectives contain a detailed troubleshooting methodology that provides a good starting point for our discussion. Here are the basic steps in the troubleshooting process: 5.
5. Implement the solution or escalate as necessary.
Troubleshooting process: Establish a Theory of Probable Cause: Consider multiple approaches when tackling problems. This will keep you from locking your imagination into a single train of thought. You can use the OSI seven-layer model as a troubleshooting tool in several ways to help with this process. Here's a scenario to work through. Martha can't access the database server to start her workday. The problem manifests this way: She opens the database client on her computer, then clicks on recent documents, one of which is the current project that management has assigned to her team. Nothing happens. Normally, the database client will connect to the database that resides on the server on the other side of the network. Try a top-to-bottom or bottom-to-top OSI model approach to the problem. Sometimes it pays to try both. Here are some ideas on how this might help. What layer? (Could a database authentication failure be preventing access? Again, this could be the problem, but Martha would probably see an error message here as well.)
5. Session
Troubleshooting process: Establish a Theory of Probable Cause: Consider multiple approaches when tackling problems. This will keep you from locking your imagination into a single train of thought. You can use the OSI seven-layer model as a troubleshooting tool in several ways to help with this process. Here's a scenario to work through. Martha can't access the database server to start her workday. The problem manifests this way: She opens the database client on her computer, then clicks on recent documents, one of which is the current project that management has assigned to her team. Nothing happens. Normally, the database client will connect to the database that resides on the server on the other side of the network. Try a top-to-bottom or bottom-to-top OSI model approach to the problem. Sometimes it pays to try both. Here are some ideas on how this might help. What layer? (Could there be a problem with encryption between the application and the database server? Maybe, but Martha would probably see an error message rather than nothing.)
6. Presentation
The CompTIA Network+ exam objectives contain a detailed troubleshooting methodology that provides a good starting point for our discussion. Here are the basic steps in the troubleshooting process: 6.
6. Verify full system functionality and, if applicable, implement preventative measures.
Punchdown tools (Figure 21-5) put UTP wires into
66- and 110-blocks. The only time you would use a punchdown tool in a diagnostic environment is a quick repunch of a connection to make sure all the contacts are properly set.
Troubleshooting process: Establish a Theory of Probable Cause: Consider multiple approaches when tackling problems. This will keep you from locking your imagination into a single train of thought. You can use the OSI seven-layer model as a troubleshooting tool in several ways to help with this process. Here's a scenario to work through. Martha can't access the database server to start her workday. The problem manifests this way: She opens the database client on her computer, then clicks on recent documents, one of which is the current project that management has assigned to her team. Nothing happens. Normally, the database client will connect to the database that resides on the server on the other side of the network. Try a top-to-bottom or bottom-to-top OSI model approach to the problem. Sometimes it pays to try both. Here are some ideas on how this might help. What layer? (Could there be a problem with the API that enables the database application to connect to the database server? Sure.)
7. Application
The CompTIA Network+ exam objectives contain a detailed troubleshooting methodology that provides a good starting point for our discussion. Here are the basic steps in the troubleshooting process: 7.
7. Document findings, actions, and outcomes.
Ethernet networks (traditionally) don't scale easily. If you have a Gigabit Ethernet connection between the main switch and a very busy file server, that connection by definition can handle up to 1 Gbps bandwidth. If that connection becomes saturated, the only way to bump up the bandwidth cap on that single connection would be to upgrade both the switch and the server NIC to the next higher Ethernet standard, 10-Gigabit Ethernet. That's a big jump and an expensive one, plus it's an upgrade of 1000 percent! What if you needed to bump bandwidth up by only 20 percent? The scaling issue became obvious early on, so manufacturers came up with ways to use multiple NICs in tandem to increase bandwidth in smaller increments, what's called link aggregation or NIC teaming. Numerous protocols enable two or more connections to work together simultaneously, such as the vendor-neutral IEEE __ specification Link Aggregation Control Protocol (LACP) and the Cisco-proprietary Port Aggregation Protocol (PAgP). Let's focus on the former for a common network issue scenario.
802.3ad
Troubleshooting process: Establish a Theory of Probable Cause: Consider multiple approaches when tackling problems. This will keep you from locking your imagination into a single train of thought. You can use the OSI seven-layer model as a troubleshooting tool in several ways to help with this process. Here's a scenario to work through. Martha can't access the database server to start her workday. The problem manifests this way: She opens the database client on her computer, then clicks on recent documents, one of which is the current project that management has assigned to her team. Nothing happens. Normally, the database client will connect to the database that resides on the server on the other side of the network. Try a top-to-bottom or bottom-to-top OSI model approach to the problem. Sometimes it pays to try both. Here are some ideas on how this might help. 1: Physical:
A disconnected cable or dead NIC can make for a bad day.
While you are asking the user problem-isolating questions, what else should you be doing? A. Asking yourself if there is anything on your side of the network that could be causing the problem. B. Nothing; just keep asking the user questions. C. Using an accusatory tone with the user. D. Playing solitaire.
A. Ask yourself if anything could have happened on your side of the network.
What are tone probes and tone generators used for? A. Locating a particular cable B. Testing the dial tone on a PBX system C. A long-duration ping test D. As safety equipment when working in crawl spaces
A. Tone probes are only used for locating individual cables.
When trying to establish symptoms over the phone, what kind of questions should you ask of a novice or confused user? A. You should ask open-ended questions and let the user explain the problem in his or her own words. B. You should ask detailed, close-ended questions to try and narrow down the possible causes. C. Leading questions are your best choice for pointing the user in the right direction. D. None; ask the user to bring the machine in because it is useless to troubleshoot over the phone.
A. With novice or confused users, ask open-ended questions so the user can explain the problem in his or her own words.
Computers use the Address Resolution Protocol (ARP) utility to resolve IP addresses to MAC addresses. As the computer learns various MAC addresses on its LAN, it jots them down in the
ARP table. When Computer A wants to send a message to Computer B, it determines B's IP address and then checks the ARP table for a corresponding MAC address.
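As a rough illustration of the lookup, an ARP table can be modeled as a dictionary keyed by IP address. The addresses below are made up, and lookup_mac is a hypothetical helper, not part of any operating system's ARP tooling.

```python
# Made-up entries for illustration: IP address -> MAC address.
arp_table = {
    "192.168.1.10": "00:11:22:33:44:55",
    "192.168.1.20": "66:77:88:99:aa:bb",
}

def lookup_mac(ip, table):
    """Return the cached MAC for ip, or None if no entry has been learned yet."""
    return table.get(ip)

print(lookup_mac("192.168.1.10", arp_table))  # 00:11:22:33:44:55
print(lookup_mac("192.168.1.99", arp_table))  # None
```

A result of None corresponds to the case where the computer has not yet learned that MAC address and would have to ask the LAN for it.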
ARP stands for
Address Resolution Protocol (ARP)
Computers use the __ utility to resolve IP addresses to MAC addresses.
Address Resolution Protocol (ARP)
Troubleshooting process: Establish a Theory of Probable Cause: Consider multiple approaches when tackling problems. This will keep you from locking your imagination into a single train of thought. You can use the OSI seven-layer model as a troubleshooting tool in several ways to help with this process. Here's a scenario to work through. Martha can't access the database server to start her workday. The problem manifests this way: She opens the database client on her computer, then clicks on recent documents, one of which is the current project that management has assigned to her team. Nothing happens. Normally, the database client will connect to the database that resides on the server on the other side of the network. Try a top-to-bottom or bottom-to-top OSI model approach to the problem. Sometimes it pays to try both. Here are some ideas on how this might help. 5. Session. Could a database authentication failure be preventing access?
Again, this could be the problem, but Martha would probably see an error message here as well.
LAN Problems: Link Aggregation Problems: Ethernet networks (traditionally) don't scale easily. If you have a Gigabit Ethernet connection between the main switch and a very busy file server, that connection by definition can handle up to 1 Gbps bandwidth. If that connection becomes saturated, the only way to bump up the bandwidth cap on that single connection would be to upgrade both the switch and the server NIC to the next higher Ethernet standard, 10-Gigabit Ethernet. That's a big jump and an expensive one, plus it's an upgrade of 1000 percent! What if you needed to bump bandwidth up by only 20 percent? The scaling issue became obvious early on, so manufacturers came up with ways to use multiple NICs in tandem to increase bandwidth in smaller increments, what's called link aggregation or NIC teaming. Numerous protocols enable two or more connections to work together simultaneously, such as the vendor-neutral IEEE 802.3ad specification Link Aggregation Control Protocol (LACP) and the Cisco-proprietary Port Aggregation Protocol (PAgP). Let's focus on the former for a common network issue scenario. To enable LACP between two devices, such as the switch and file server just noted, each device needs two or more interconnected network interfaces configured for LACP. When the two devices interact, they will make sure they can communicate over multiple physical ports at the same speeds and form a single logical port that takes advantage of the full combined bandwidth (Figure 21-12). Those ports can be in one of two modes: active or passive. Active ports want to use LACP and send special frames out trying to initiate creating an aggregated logical port. Passive ports wait for active ports to initiate the conversation before they will respond. So here's the common network error with LACP setups.
An aggregated connection set to active on both ends (active-active) automatically talks, negotiates, and works. One set to active on one end and passive on the other (active-passive) will talk, negotiate, and work. But if you set both sides to passive (passive-passive), neither will initiate the conversation and LACP will not engage. Setting both ends to passive when you want to use LACP is an example of NIC teaming misconfiguration.
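The three mode combinations above reduce to a single rule: at least one end must be active. A minimal sketch of that rule follows; the function name lacp_engages is hypothetical and this models only the outcome, not the LACP frames themselves.

```python
def lacp_engages(side_a, side_b):
    """Return True if an aggregated link forms for the two port modes."""
    modes = {"active", "passive"}
    if side_a not in modes or side_b not in modes:
        raise ValueError("mode must be 'active' or 'passive'")
    # At least one end has to be active to start the negotiation.
    return "active" in (side_a, side_b)

for pair in [("active", "active"), ("active", "passive"), ("passive", "passive")]:
    print(pair, lacp_engages(*pair))
```

Only the passive-passive pair prints False, which matches the misconfiguration described on this card.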
As you'll recall from back in Chapter 18, "Managing Risk," a port scanner is a program that probes ports on another system, logging the state of the scanned ports. These tools are used to look for unintentionally opened ports that might make a system vulnerable to attack. As you might imagine, they also are used by hackers to break into systems. The most famous of all port scanners is probably the powerful and free Nmap. Nmap was originally designed to work on UNIX systems, so Windows folks used alternatives like
Angry IP Scanner by Anton Keks (Figure 21-7). Nmap has been ported to just about every operating system these days, however, so you can find it for Windows.
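To make "probing a port and logging its state" concrete, here is a minimal TCP connect check written with Python's standard socket module. It is a sketch of the basic idea only, not how Nmap or Angry IP Scanner work internally, and the helper name port_is_open and the port list are assumptions.

```python
import socket

def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0

if __name__ == "__main__":
    # Scan only systems you are authorized to test; localhost is used here.
    for port in (22, 80, 443):
        print(port, port_is_open("127.0.0.1", port))
```

The output depends on which services happen to be listening on the machine where it runs.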
Troubleshooting process: Identify the problem: Approach Multiple Problems Individually: If you encounter a complicated scenario, with various machines off the network and potential server room or wiring problems, break it down. __ to sort out root causes. Methodically tackle them and you'll eventually have a list of one or more problems identified. Then you can move on to the next step.
Approach multiple problems individually
What does nslookup do? A. Retrieves the name space for the network B. Queries DNS for the IP address of the supplied host name C. Performs a reverse IP lookup D. Lists the current running network services on localhost
B. The nslookup command queries DNS and returns the IP address of the supplied host name.
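A rough Python analogue of the same name-to-address lookup is sketched below. It uses the standard library's socket.gethostbyname, which goes through the operating system's resolver (hosts file as well as DNS) rather than querying a DNS server directly the way nslookup does. The wrapper name resolve and its resolver parameter are hypothetical.

```python
import socket

def resolve(hostname, resolver=socket.gethostbyname):
    """Return the IPv4 address for hostname, or None if the lookup fails."""
    try:
        return resolver(hostname)
    except socket.gaierror:
        return None

if __name__ == "__main__":
    print(resolve("localhost"))
```

Passing the resolver as a parameter keeps the function easy to exercise without a live network.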
When should you use a cable tester to troubleshoot a network cable? A. When you have a host experiencing a very slow connection B. When you have an intermittent connection problem C. When you have a dead connection and you suspect a broken cable D. When you are trying to find the correct cable up in the plenum
C. Cable testers can only show that you have a broken or poorly wired cable, not if the cable is up to proper specification.
What is the last step in the troubleshooting process? A. Implementing the solution B. Testing the solution C. Documenting the solution D. Closing the help ticket
C. Documenting the solution is the last and, in many ways, the most important step in the troubleshooting process.
Which command shows you detailed IP information, including DNS server addresses and MAC addresses? A. ipconfig B. ipconfig -a C. ipconfig /all D. ipconfig /dns
C. ipconfig /all displays detailed IP configuration information.
Network technicians use three different devices to deal with broken cables. __ can tell you if you have a continuity problem or if a wire map isn't correct (Figure 21-1).
Cable testers
Network technicians use three different devices to deal with broken cables.
Cable testers can tell you if you have a continuity problem or if a wire map isn't correct (Figure 21-1). Time domain reflectometers (TDRs) and optical time domain reflectometers (OTDRs) can tell you where the break is on the cable (Figure 21-2). A TDR works with copper cables and an OTDR works with fiber optics, but otherwise they share the same function. If a problem shows itself as a disconnect and you've first checked easier issues that would manifest as disconnects, such as loss of permissions, an unplugged cable, or a server shut off, then think about using these tools.
__ test a cable to ensure that it can handle its rated amount of capacity.
Certifiers
Cisco ASA stands for
Cisco Adaptive Security Appliance (ASA)
WAN Problems: Appliance Problems: Many of the boxes that people refer to as "routers" contain many features, such as routing, Network Address Translation (NAT), switching, an intrusion detection system (IDS), a firewall, and more. These complex boxes, such as the __, are called network appliances.
Cisco Adaptive Security Appliance (ASA)
LAN Problems: Link Aggregation Problems: NIC teaming provides many more benefits than just increasing bandwidth, such as redundancy. You can team two NICs in a logical unit, but set them up with one NIC as the primary—live—and the second as the hot spare—standby. If the first NIC goes down, the traffic will automatically flow through the second NIC. In a simple network setup for redundancy, you'd make one connection live and the other as standby on each device. Switch A has a live and a standby, Switch B has a live and a standby, and so on. The key here is that multicast traffic to the various devices needs to be enabled on every device through which that traffic might pass. If Switch C doesn't play nice with multicast and it's connected to Switch B, this can cause multicast traffic to stop. One "fix" for this in a Cisco network is to turn off a feature called IGMP snooping, which is enabled by default on
Cisco switches. IGMP snooping is normally a good thing, because it helps the switches keep track of devices that use multicast and filter traffic away from devices that don't.
Troubleshooting process: Establish a Theory of Probable Cause: Consider multiple approaches when tackling problems. This will keep you from locking your imagination into a single train of thought. You can use the OSI seven-layer model as a troubleshooting tool in several ways to help with this process. Here's a scenario to work through. Martha can't access the database server to start her workday. The problem manifests this way: She opens the database client on her computer, then clicks on recent documents, one of which is the current project that management has assigned to her team. Nothing happens. Normally, the database client will connect to the database that resides on the server on the other side of the network. Try a top-to-bottom or bottom-to-top OSI model approach to the problem. Sometimes it pays to try both. Here are some ideas on how this might help. 5: Session:
Could a database authentication failure be preventing access? Again, this could be the problem, but Martha would probably see an error message here as well.
Troubleshooting process: Establish a Theory of Probable Cause: Consider multiple approaches when tackling problems. This will keep you from locking your imagination into a single train of thought. You can use the OSI seven-layer model as a troubleshooting tool in several ways to help with this process. Here's a scenario to work through. Martha can't access the database server to start her workday. The problem manifests this way: She opens the database client on her computer, then clicks on recent documents, one of which is the current project that management has assigned to her team. Nothing happens. Normally, the database client will connect to the database that resides on the server on the other side of the network. Try a top-to-bottom or bottom-to-top OSI model approach to the problem. Sometimes it pays to try both. Here are some ideas on how this might help. 6: Presentation:
Could there be a problem with encryption between the application and the database server? Maybe, but Martha would probably see an error message rather than nothing.
Troubleshooting process: Establish a Theory of Probable Cause: Consider multiple approaches when tackling problems. This will keep you from locking your imagination into a single train of thought. You can use the OSI seven-layer model as a troubleshooting tool in several ways to help with this process. Here's a scenario to work through. Martha can't access the database server to start her workday. The problem manifests this way: She opens the database client on her computer, then clicks on recent documents, one of which is the current project that management has assigned to her team. Nothing happens. Normally, the database client will connect to the database that resides on the server on the other side of the network. Try a top-to-bottom or bottom-to-top OSI model approach to the problem. Sometimes it pays to try both. Here are some ideas on how this might help. 7: Application:
Could there be a problem with the API that enables the database application to connect to the database server? Sure.
What is Wireshark? Protocol analyzer Packet sniffer Packet analyzer All of the above
D. All of the above; Wireshark can sniff and analyze all the network traffic that enters the computer's NIC.
One of your users calls you with a complaint that he can't reach the site www.google.com. You try and access the site and discover you can't connect either but you can ping the site with its IP address. What is the most probable culprit? The workgroup switch is down. Google is down. The gateway is down. The DNS server is down.
D. In this case, the DNS system is probably at fault. By pinging the site with its IP address, you have established that the site is up and your LAN and gateway are functioning properly.
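The reasoning in that answer can be sketched in a few lines of Python. This is only an illustration: the function names and the three result labels are assumptions made up for this example, and the resolver check relies on the standard library's socket.getaddrinfo rather than on ping.

```python
import socket


def name_resolves(hostname: str) -> bool:
    """Return True if the local resolver can turn hostname into an address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def classify_reachability(resolves: bool, ip_reachable: bool) -> str:
    """Map two observations onto the likely culprit (labels are hypothetical)."""
    if ip_reachable and not resolves:
        # The site answers by IP address, so the LAN, gateway, and site are up.
        return "DNS"
    if not ip_reachable:
        # Could be the switch, the gateway, the remote site, or blocked ICMP.
        return "connectivity"
    return "ok"
```

In the scenario above, the name lookup fails while the ping by IP address succeeds, so classify_reachability(False, True) returns "DNS".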
What will the command route print return on a Windows system? The results of the last tracert The gateway's router table The routes taken by a concurrent connection The current system's route table
D. The route print command returns the local system's routing table.
LAN Problems: Server misconfigurations: Misconfigurations of server settings can block all or some access to resources on a LAN. Misconfigured DHCP settings on a host above can cause problems, but they will be limited to the host. If these settings are misconfigured on the DHCP server, however, many more machines and people can be affected. A misconfigured DNS server might direct hosts to incorrect sites or no sites at all. It might appear as an unresponsive service and just do nothing. Misconfigured DNS settings on a client result in names not resolving and cause the network to appear to be down for the user. You'll be clued into such misconfiguration by using ping and other tools. If you can ping a file server by IP address but not by name, this points to DNS issues. Similarly, if a computer fails in discovering neighboring devices/nodes, like connecting to a networked printer, __ can be the culprit.
DHCP or DNS misconfiguration
LAN Problems: Server misconfigurations: Misconfigurations of server settings can block all or some access to resources on a LAN. Misconfigured DHCP settings on a host above can cause problems, but they will be limited to the host. If these settings are misconfigured on the DHCP server, however, many more machines and people can be affected. A misconfigured DNS server might direct hosts to incorrect sites or no sites at all. It might appear as an unresponsive service and just do nothing. Misconfigured DNS settings on a client result in names not resolving and cause the network to appear to be down for the user. You'll be clued into such misconfiguration by using ping and other tools. If you can ping a file server by IP address but not by name, this points to
DNS issues. Similarly, if a computer fails in discovering neighboring devices/nodes, like connecting to a networked printer, DHCP or DNS misconfiguration can be the culprit. To fix the issue, go into the network configuration for the client or the server and find the misconfigured settings.
The nslookup (all operating systems) and dig (macOS/UNIX/Linux) utilities help diagnose
DNS problems. These tools are very powerful, but the CompTIA Network+ exam won't ask you more than basic questions, such as how to use them to see if a DNS server is working. When working on Windows systems, the nslookup utility is your only choice by default. On macOS/UNIX/Linux systems, you should prefer the dig utility. Both utilities will help in troubleshooting your DNS issues, but dig provides more verbose output by default. You need to be comfortable working with both utilities when troubleshooting modern networks.
The ipconfig (Windows), ifconfig (macOS and UNIX), and ip (Linux) utilities tell you almost anything you want to know about a computer's IP settings. Make sure you know that typing ipconfig alone only gives basic information. Typing ipconfig /all gives detailed information (like __ and __).
DNS servers; MAC address
WAN Problems: ISPs and MTUs: I discussed the maximum transmission unit (MTU) in Chapter 7, "Routing." Back in the dark ages (before Windows Vista), Microsoft users often found themselves with terrible connection problems because IP packets were too big to fit into certain network protocols. The largest Ethernet packet is 1500 bytes, so some earlier versions of Windows set their MTU size to a value less than 1500 to minimize the fragmentation of packets. The problem cropped up when you tried to connect to a technology other than Ethernet, such as
DSL. Some DSL carriers couldn't handle an MTU size greater than 1400. When your network's packets are so large that they must be fragmented to fit into your ISP's packets, we call it an MTU mismatch.
WAN Problems: ISPs and MTUs: I discussed the maximum transmission unit (MTU) in Chapter 7, "Routing." Back in the dark ages (before Windows Vista), Microsoft users often found themselves with terrible connection problems because IP packets were too big to fit into certain network protocols. The largest Ethernet packet is 1500 bytes, so some earlier versions of Windows set their MTU size to a value less than 1500 to minimize the fragmentation of packets. The problem cropped up when you tried to connect to a technology other than Ethernet, such as DSL. Some DSL carriers couldn't handle an MTU size greater than 1400. When your network's packets are so large that they must be fragmented to fit into your ISP's packets, we call it an MTU mismatch. As a result, techs would tweak their MTU settings to improve throughput by matching up the MTU sizes between the ISP and their own network. This usually required a manual registry setting adjustment. Around 2007, Path MTU Discovery (PMTU), a method to determine the best MTU setting automatically, was created. PMTU works by adding a new feature called the "__" to the IP packet. A PMTU-aware operating system can automatically send a series of fixed-size ICMP packets (basically just pings) with the DF flag set to another device to see if it works. If it doesn't work, the system lowers the MTU size and tries again until the ping is successful. Imagine the hassle of incrementing the MTU size manually. That's the beauty of PMTU—you can automatically set your MTU size to the perfect amount.
Don't Fragment (DF) flag
The extremely transparent fiber-optic cables allow light to shine but have some inherent impurities in the glass that can reduce light transmission. (3)__ can also degrade the strength of light pulses as they travel through a fiber-optic run.
Dust, poor connections, and light leakage
Troubleshooting: First, identify the problem. That means grasping the true problem, rather than what someone tells you. A user might call in and complain that he can't access the Internet from his workstation, for example, which could be the only problem. But the problem could also be that the entire wing of the office just went down and you've got a much bigger problem on your hands. You need to gather information, duplicate the problem (if possible), question users, identify symptoms, determine if anything has changed on the network, and approach multiple problems individually. Following these steps will help you get to the root of the problem. Gather information about the situation. If you are working directly on the affected system and not relying on somebody on the other end of a telephone to guide you, you will identify symptoms through your observation of what is (or isn't) happening. If you're troubleshooting over the telephone (always a joy, in my experience), you will need to question users. These questions can be close-ended, which is to say there can only be a yes-or-no-type answer, such as, "Can you see a light on the front of the monitor?" You can also ask open-ended questions, such as, "What have you already tried in attempting to fix the problem?" The type of question you ask at any given moment depends on what information you need and on the user's knowledge level. If, for example, the user seems to be technically oriented, you will probably be able to ask more close-ended questions because they will know what you are talking about. If, on the other hand, the user seems to be confused about what's happening, open-ended questions will allow him or her to explain in his or her own words what is going on. One of the first steps in trying to determine the cause of a problem is to understand the extent of the problem. Is it specific to one user or is it network-wide?
Sometimes this entails trying the task yourself, both from the user's machine and from your own or another machine. For example, if a user is experiencing problems logging into the network, you might need to go to that user's machine and try to use his or her user name to log in. In other words, try to duplicate the problem. Doing this tells you whether the problem is a user error of some kind, as well as enables you to see the symptoms of the problem yourself. Next, you probably want to try logging in with your own user name from that machine, or have the user try to log in from another machine. In some cases, you can ask other users in the area if they are experiencing the same problem to see if the issue is affecting more than one user. Depending on the size of your network, you should find out whether the problem is occurring in only one part of your company or across the entire network. What does all of this tell you?
Essentially, it tells you how big the problem is. If nobody in an entire remote office can log in, you may be able to assume that the problem is the network link or router connecting that office to the server. If nobody in any office can log in, you may be able to assume the server is down or not accepting logins. If only that one user in that one location can't log in, the problem may be with that user, that machine, or that user's account.
Troubleshooting process: Document Findings, Actions, and Outcomes: It is vital that you document findings, actions, and outcomes of all support calls, for two reasons:
First, you're creating a support database to serve as a knowledge base for future reference, enabling everyone on the support team to identify new problems as they arise and know how to deal with them quickly, without having to duplicate someone else's research efforts. Second, documentation enables you to track problem trends and anticipate future workloads, or even to identify a particular brand or model of an item, such as a printer or a NIC, that seems to be less reliable or that creates more work for you than others. Don't skip this step—it really is essential!
Troubleshooting is a dynamic, fluid process that requires you to make snap judgments and act on them to try and make the network go. Any attempt to cover every possible scenario here would be futile at best, and probably also not in your best interest. If an exhaustive listing of all network problems is impossible, then how do you decide what to do and in what order? Before you touch a single console or cable, you should remember two basic rules.
For starters, to paraphrase the Hippocratic Oath, "First, do no harm." If at all possible, don't make a network problem bigger than it was originally. This is a rule I've broken thousands of times, and you will too. But if I change the good doctor's phrase a bit, it's possible to formulate a rule you can actually live with: "First, do not trash the data!" My gosh, if I had a dollar for every megabyte of irreplaceable data I've destroyed, I'd be rich! I've learned my lesson, and you should learn from my mistakes. The second rule is: "Always make good backups!" Computers can be replaced; data that is not backed up is, at best, expensive to recover and, at worst, gone forever.
Hands-on Problems: Aside from obvious physical problems, other hands-on problems you can fix manifest as some sort of misconfiguration. An incorrect IP configuration, such as setting a PC to a static IP address that's not on the same network ID as other resources, would result in a "dead-to-me" network. A similar fate would result from inputting incorrect default gateway IP address information. The same is true with an incorrect netmask setting, that is, an inaccurate subnet mask. The system will go nowhere, fast. The fix for these sorts of problems should be pretty obvious to you at this point.
Go into the network configuration for the device and put in correct numbers. Figure 21-9 shows TCP/IP settings for a Windows Server machine.
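Before retyping the numbers, it can help to confirm which one is wrong. The sketch below uses Python's standard ipaddress module to test whether a default gateway falls inside the network ID implied by a host's address and mask. The function name and the sample addresses are assumptions chosen for illustration, not values from Figure 21-9.

```python
import ipaddress


def gateway_on_local_network(host_cidr: str, gateway: str) -> bool:
    """Return True if gateway lies inside the host's network ID.

    host_cidr is the host address plus prefix length, e.g. "192.168.4.27/24".
    """
    interface = ipaddress.ip_interface(host_cidr)
    return ipaddress.ip_address(gateway) in interface.network


# Hypothetical example values:
# gateway_on_local_network("192.168.4.27/24", "192.168.4.1")  is True
# gateway_on_local_network("192.168.4.27/24", "192.168.5.1")  is False
```

Note that the mask changes the answer: with a /16 prefix instead of /24, the second gateway above would count as local, which is one way an incorrect netmask hides a misconfiguration.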
__ refer to things that you can fix at the workstation, work area, or server. These include physical problems and configuration problems.
Hands-on problems
Network problems fall into several basic categories, and most of these problems you or a network tech in the proper place can fix. Fixing problems at the workstation, work area, or server is a network tech's bread and butter. The same is true of connecting to resources on the LAN. Problems connecting to a WAN can often be resolved at the local level, but sometimes need to get escalated. The knowledge from the previous chapters combined with the tools and methods you've learned in this chapter should enable you to fix just about any network! There are a couple of stumbling blocks when it comes to resolving network issues. First, at almost any level of problem, the result—as far as the end user is concerned—is the same.
He or she can't access resources beyond the local machine. Whether a user tries to access the local file server or do a Google search, if the attempt fails, "the network is down!" You need to fall back on the most important question a tech can ask: What can cause this problem? Then methodically work through the troubleshooting steps and tools to narrow possibilities. Let's look at a scenario to illustrate the narrowing process.
HSRP stands for
Hot Standby Router Protocol (HSRP).
Almost every new networking person I teach will, at some point, ask me: "What tools do I need to buy?" My answer shocks them: "None. Don't buy a thing." It's not so much that you don't need tools, but rather that different networking jobs require wildly different tools. Plenty of network techs never crimp a cable. An equal number never open a system. Some techs do nothing all day but pull cable. The tools you need are defined by your job. This answer is especially true with software tools. Almost all the network problems I encounter in established networks don't require me to use any tools other than the classic ones provided by the operating system. I've fixed more network problems with ping, for example, than with any other single tool. As you gain skill in this area, you'll find yourself hounded by vendors trying to sell you the latest and greatest networking diagnostic tools. You may like these tools. All I can say is that
I've never needed a software diagnostics tool that I had to purchase.
The traceroute utility (the command in Windows is tracert) is used to trace all of the routers between two points. Use traceroute to diagnose where the problem lies when you have problems reaching a remote system. If a traceroute stops at a certain router, you know the problem is either the next router or the connections between them. When sending a traceroute, it's important to keep a significant difference between Windows and UNIX/Linux/Cisco systems in mind. Windows tracert sends only __ packets
ICMP
WAN Problems: ISPs and MTUs: I discussed the maximum transmission unit (MTU) in Chapter 7, "Routing." Back in the dark ages (before Windows Vista), Microsoft users often found themselves with terrible connection problems because IP packets were too big to fit into certain network protocols. The largest Ethernet packet is 1500 bytes, so some earlier versions of Windows set their MTU size to a value less than 1500 to minimize the fragmentation of packets. The problem cropped up when you tried to connect to a technology other than Ethernet, such as DSL. Some DSL carriers couldn't handle an MTU size greater than 1400. When your network's packets are so large that they must be fragmented to fit into your ISP's packets, we call it an MTU mismatch. As a result, techs would tweak their MTU settings to improve throughput by matching up the MTU sizes between the ISP and their own network. This usually required a manual registry setting adjustment. Around 2007, Path MTU Discovery (PMTU), a method to determine the best MTU setting automatically, was created. PMTU works by adding a new feature called the "Don't Fragment (DF) flag" to the IP packet. A PMTU-aware operating system can automatically send a series of fixed-size ICMP packets (basically just pings) with the DF flag set to another device to see if it works. If it doesn't work, the system lowers the MTU size and tries again until the ping is successful. Imagine the hassle of incrementing the MTU size manually. That's the beauty of PMTU—you can automatically set your MTU size to the perfect amount. Unfortunately, PMTU runs under
ICMP; most routers have firewall features that, by default, are configured to block ICMP requests, making PMTU worthless. This is called a PMTU or MTU black hole. If you're having terrible connection problems and you've checked everything else, you need to consider this issue. In many cases, going into the router and turning off ICMP blocking in the firewall is all you need to do to fix the problem.
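The probe-and-lower loop described above can be sketched as a small simulation. Everything here is an assumption for illustration: the probe callable stands in for a ping sent with the DF flag set, and the function names, starting size, floor, and step are made up rather than taken from any operating system's implementation.

```python
from typing import Callable, Optional

# An ICMP echo payload is the MTU minus the 20-byte IPv4 header
# and the 8-byte ICMP header.
IP_ICMP_OVERHEAD = 28


def discover_path_mtu(
    probe: Callable[[int], bool],
    start: int = 1500,
    floor: int = 576,
    step: int = 10,
) -> Optional[int]:
    """Lower the candidate MTU until a DF-flagged probe gets through.

    probe(mtu) must return True when a packet of that size arrives unfragmented.
    Returns None if no size works, which is what an MTU black hole looks like.
    """
    mtu = start
    while mtu >= floor:
        if probe(mtu):
            return mtu
        mtu -= step
    return None


def make_fake_path(path_mtu: int) -> Callable[[int], bool]:
    """Simulated link (hypothetical): packets larger than path_mtu are dropped."""
    return lambda size: size <= path_mtu
```

With a simulated DSL link, discover_path_mtu(make_fake_path(1400)) settles on 1400, while a probe that always fails (ICMP blocked everywhere) returns None. To try a single probe by hand, Windows ping accepts -f (set DF) with -l for the payload size, and Linux ping accepts -M do with -s; in both cases the payload size is the MTU minus IP_ICMP_OVERHEAD.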
The ping utility uses Internet Control Message Protocol (ICMP) packets to query by
IP address or by name.
LAN Problems: Most clients will use DHCP for __, __, and __ settings.
IP address, subnet mask, and default gateway
WAN Problems: ISPs and MTUs: I discussed the maximum transmission unit (MTU) in Chapter 7, "Routing." Back in the dark ages (before Windows Vista), Microsoft users often found themselves with terrible connection problems because
IP packets were too big to fit into certain network protocols. The largest Ethernet packet is 1500 bytes, so some earlier versions of Windows set their MTU size to a value less than 1500 to minimize the fragmentation of packets.
Exam tip: The iptables utility in Linux enables command-line control over __, rules that determine what happens with an IPv4 packet when it encounters a firewall.
IPv4 tables
The ping utility uses Internet Control Message Protocol (ICMP) packets to query by IP address or by name. It works across routers, so it's generally the first tool used to check if a system is reachable. Unfortunately, many devices block ICMP packets, so a failed ping doesn't always point to an offline system. The ping utility defaults to (IPv4 or IPv6)
IPv4, but also functions well in an IPv6 network.
The traceroute utility (the command in Windows is tracert) is used to trace all of the routers between two points. Use traceroute to diagnose where the problem lies when you have problems reaching a remote system. If a traceroute stops at a certain router, you know the problem is either the next router or the connections between them. When sending a traceroute, it's important to keep a significant difference between Windows and UNIX/Linux/Cisco systems in mind. Windows tracert sends only ICMP packets, while UNIX/Linux/Cisco traceroute can send either ICMP packets or UDP datagrams, but sends UDP datagrams by default. Because many routers block ICMP packets, if your traceroute fails from a Windows system, running it on a Linux or UNIX system may return more complete results. The traceroute command defaults to
IPv4, but also functions well in an IPv6 network. In Windows, use the command with the -6 switch: tracert -6. In UNIX/Linux, use traceroute6 (or traceroute -6 in some variants of Linux).
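The per-platform command names above can be captured in a small helper. This is a sketch only: the function name and its system parameter are assumptions for illustration, and it builds the argument list without running anything.

```python
import platform
from typing import List, Optional


def trace_command(target: str, ipv6: bool = False,
                  system: Optional[str] = None) -> List[str]:
    """Build the traceroute argument list for the given operating system.

    system defaults to platform.system(); pass "Windows" or "Linux"
    explicitly to see what another platform would use.
    """
    system = system or platform.system()
    if system == "Windows":
        # Windows tracert sends ICMP packets; -6 forces IPv6.
        return ["tracert", "-6", target] if ipv6 else ["tracert", target]
    # UNIX/Linux: traceroute sends UDP datagrams by default.
    # Some Linux variants accept "traceroute -6" in place of traceroute6.
    return ["traceroute6", target] if ipv6 else ["traceroute", target]
```

The resulting list can be handed to subprocess.run on a machine where the utility is installed; the helper itself makes no claim about which utilities are present.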
End-to-End Connectivity: The end-to-end principle meant originally that applications and work should happen only at the endpoints in a network. In the early days of networking, this made a lot of sense. Connections weren't always fully reliable and thus were not good for real-time activity. So the work should get done by the computers at the ends of a network connection. The Internet was founded on the end-to-end principle. With modern networks like the Internet, the end-to-end concept has had to evolve. Clearly, anything you do over the Internet goes through many different machines. So, perhaps end-to-end means that the intermediary devices simply don't change the essential data in packets that flow through them. Add in today, though, the fact that plenty of intermediaries want to do a lot of things to your data as it flows through their devices. Thieves want to steal information. Merchants want to sell you things. Advertisers want to intrude on your monitor. Government agencies want to control what you can see or do, or simply want to monitor what you do for later, perhaps benign purposes. Other intermediaries help create trust bonds between your computer and a secure site so that e-commerce can function. That dynamic between the fundamental principle of work only happening on the ends of the connection and all the intermediaries facilitating, pilfering, or punctuating is the current state of the Internet. It's the basic tension between
ISP companies that want to build in tiered profit structures and the consumers and creators who want Net Neutrality.
The vast majority of cabling problems occur when the network is first installed or when a change is made. Once a cable has been made, installed, and tested, the chances of it failing are pretty small compared to all of the other network problems that might take place. If you're having trouble connecting to a resource or experiencing performance problems after making a connection, a bad cable likely isn't the culprit. Broken cables don't make intermittent problems, and they don't slow down data. They make permanent disconnects. Network techs define a "broken" cable in numerous ways. First, a broken cable might have an open circuit, where one or more of the wires in a cable simply don't connect from one end of the cable to the other. The signal lacks continuity. Second, a cable might have a short, where one or more of the wires in a cable connect to another wire in the cable. (Within a normal cable, no wires connect to other wires.) Third, a cable might have a wire map problem, where one or more of the wires in a cable don't connect to the proper location on the jack or plug. This can be caused by improperly crimping a cable, for example. Fourth, the cable might experience crosstalk, where the electrical signal bleeds from one wire pair to another, creating interference. Fifth, a broken cable might pick up noise, spurious signals usually caused by faulty hardware or poorly crimped jacks. Finally, a broken cable might have __. Impedance is the natural electrical resistance of a cable. When cables of different types—think thickness, composition of the metal, and so on—connect and the flow of electrons is not uniform, it can cause a unique type of electrical noise, called an echo.
Impedance mismatch
WAN Problems: Router Problems: Routers enable networks to connect to other networks, which you know well by now. Problems with routers simply make those connections not work. (Recall that physical problems with routers or router interface modules were covered in Chapter 8 and Chapter 13.) Loss of power or a bad module can certainly wreck a tech's day, but the fixes are pretty simple: provide power or replace the module. Router configuration issues can be a bit trickier. The ways to mess up a router are many. You can specify the wrong routing protocol, for example, or misconfigure the right routing protocol. An access control list (ACL) might include addresses to block that shouldn't be blocked or allow access to network resources for nodes that shouldn't have it. __ can lead to blocked TCP/UDP ports that shouldn't be blocked. A misconfiguration can lead to missing IP routes so that some destinations just aren't there for users.
Incorrect ACL settings
The ping utility uses __ packets to query by IP address or by name.
Internet Control Message Protocol (ICMP)
Exam tip: The ARP table functions at Layer 3, mapping IP addresses to MAC addresses. The ARP table therefore would be stored on a Layer 3 device. A MAC address table, in contrast, maps MAC addresses to ports, and thus lives on a
Layer 2 device, a switch.
Exam tip: The ARP table functions at Layer 3, mapping IP addresses to MAC addresses. The ARP table therefore would be stored on a __ device.
Layer 3
Ethernet networks (traditionally) don't scale easily. If you have a Gigabit Ethernet connection between the main switch and a very busy file server, that connection by definition can handle up to 1 Gbps bandwidth. If that connection becomes saturated, the only way to bump up the bandwidth cap on that single connection would be to upgrade both the switch and the server NIC to the next higher Ethernet standard, 10-Gigabit Ethernet. That's a big jump and an expensive one, plus it's an upgrade of 1000 percent! What if you needed to bump bandwidth up by only 20 percent? The scaling issue became obvious early on, so manufacturers came up with ways to use multiple NICs in tandem to increase bandwidth in smaller increments, what's called link aggregation or NIC teaming. Numerous protocols enable two or more connections to work together simultaneously, such as the vendor-neutral IEEE 802.3ad specification __and the Cisco-proprietary Port Aggregation Protocol (PAgP). Let's focus on the former for a common network issue scenario.
Link Aggregation Control Protocol (LACP)
LACP stands for
Link Aggregation Control Protocol (LACP)
Ethernet networks (traditionally) don't scale easily. If you have a Gigabit Ethernet connection between the main switch and a very busy file server, that connection by definition can handle up to 1 Gbps bandwidth. If that connection becomes saturated, the only way to bump up the bandwidth cap on that single connection would be to upgrade both the switch and the server NIC to the next higher Ethernet standard, 10-Gigabit Ethernet. That's a big jump and an expensive one, plus it's an upgrade of 1000 percent! What if you needed to bump bandwidth up by only 20 percent? The scaling issue became obvious early on, so manufacturers came up with ways to use multiple NICs in tandem to increase bandwidth in smaller increments, what's called link aggregation or NIC teaming. Numerous protocols enable two or more connections to work together simultaneously, such as the vendor-neutral IEEE 802.3ad specification __ and the Cisco-proprietary __. Let's focus on the former for a common network issue scenario.
Link Aggregation Control Protocol (LACP); Port Aggregation Protocol (PAgP)
Windows still comes with netstat, but the ss utility has completely eclipsed it on the __ (OS) side.
Linux
Sometimes you need to perform a ping or traceroute from a location outside of the local environment. __ are remote servers accessible with a browser that contain common collections of diagnostic tools such as ping and traceroute, plus some Border Gateway Protocol (BGP) query tools.
Looking glass sites
Exam tip: The ARP table functions at Layer 3, mapping IP addresses to MAC addresses. The ARP table therefore would be stored on a Layer 3 device. A __, in contrast, maps MAC addresses to ports, and thus lives on a Layer 2 device, a switch.
MAC address table
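One way to keep the two tables straight is to picture each as a simple lookup. The sketch below models them as Python dictionaries; every address and port name is invented for illustration, and in practice the two tables live on different devices, so chaining them in one function is purely conceptual.

```python
from typing import Dict, Optional

# ARP table (Layer 3): IP address -> MAC address. Hypothetical entries.
arp_table: Dict[str, str] = {
    "192.168.4.1": "00:11:22:33:44:55",
    "192.168.4.27": "66:77:88:99:aa:bb",
}

# MAC address table (Layer 2, on a switch): MAC address -> switch port.
mac_address_table: Dict[str, str] = {
    "00:11:22:33:44:55": "port 1",
    "66:77:88:99:aa:bb": "port 7",
}


def port_for_ip(ip: str) -> Optional[str]:
    """Follow an IP address to a MAC address, then to a switch port."""
    mac = arp_table.get(ip)
    if mac is None:
        return None  # no ARP entry, so the MAC address is unknown
    return mac_address_table.get(mac)
```

Looking up an address with no ARP entry returns None at the first step, before the MAC address table is ever consulted.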
WAN Problems: ISPs and MTUs: I discussed the maximum transmission unit (MTU) in Chapter 7, "Routing." Back in the dark ages (before Windows Vista), Microsoft users often found themselves with terrible connection problems because IP packets were too big to fit into certain network protocols. The largest Ethernet packet is 1500 bytes, so some earlier versions of Windows set their MTU size to a value less than 1500 to minimize the fragmentation of packets. The problem cropped up when you tried to connect to a technology other than Ethernet, such as DSL. Some DSL carriers couldn't handle an MTU size greater than 1400. When your network's packets are so large that they must be fragmented to fit into your ISP's packets, we call it an
MTU mismatch.
Which operating systems is the dig command available on?
macOS/UNIX/Linux
ipconfig equivalents for macOS, UNIX, and Linux
macOS: ifconfig; UNIX: ifconfig; Linux: ip
Troubleshooting process: Establish a Theory of Probable Cause: Consider multiple approaches when tackling problems. This will keep you from locking your imagination into a single train of thought. You can use the OSI seven-layer model as a troubleshooting tool in several ways to help with this process. Here's a scenario to work through. Martha can't access the database server to start her workday. The problem manifests this way: She opens the database client on her computer, then clicks on recent documents, one of which is the current project that management has assigned to her team. Nothing happens. Normally, the database client will connect to the database that resides on the server on the other side of the network. Try a top-to-bottom or bottom-to-top OSI model approach to the problem. Sometimes it pays to try both. Here are some ideas on how this might help. 6. Presentation. Could there be a problem with encryption between the application and the database server?
Maybe, but Martha would probably see an error message rather than nothing.
Throughput testers enable you to measure the data flow in a network. Which tool is appropriate depends on the type of network throughput you want to test. Most techs use one of several speed-test sites for checking an Internet connection's throughput, such as
MegaPath's Speakeasy Speed Test (Figure 21-8): www.speakeasy.net/speedtest.
__ has a utility called pathping that combines the functions of ping and traceroute and adds some additional functions.
Microsoft
__ test voltage (both AC and DC), resistance, and continuity.
Multimeters
mtr stands for
My Traceroute
__ is a dynamic (keeps running) equivalent to traceroute. Windows does not support __.
My Traceroute (mtr)
LAN Problems: Link Aggregation Problems: NIC teaming provides many more benefits than just increasing bandwidth, such as redundancy. You can team two NICs in a logical unit, but set them up with one NIC as the primary—live—and the second as the hot spare—standby. If the first NIC goes down, the traffic will automatically flow through the second NIC. In a simple network setup for redundancy, you'd make one connection live and the other as standby on each device. Switch A has a live and a standby, Switch B has a live and a standby, and so on. The key here is that multicast traffic to the various devices needs to be enabled on every device through which that traffic might pass. If Switch C doesn't play nice with multicast and it's connected to Switch B, this can cause multicast traffic to stop. One "fix" for this in a Cisco network is to turn off a feature called IGMP snooping, which is enabled by default on Cisco switches. IGMP snooping is normally a good thing, because it helps the switches keep track of devices that use multicast and filter traffic away from devices that don't. The problem with turning off IGMP snooping is that the switches won't map and filter multicast traffic. Instead of only sending to the devices that are set up to receive multicast, the switches will treat multicast messages as broadcast messages and send them to everybody. This is a __ misconfiguration that can seriously degrade network performance.
NIC teaming
LAN Problems: Link Aggregation Problems: Ethernet networks (traditionally) don't scale easily. If you have a Gigabit Ethernet connection between the main switch and a very busy file server, that connection by definition can handle up to 1 Gbps bandwidth. If that connection becomes saturated, the only way to bump up the bandwidth cap on that single connection would be to upgrade both the switch and the server NIC to the next higher Ethernet standard, 10-Gigabit Ethernet. That's a big jump and an expensive one, plus it's an upgrade of 1000 percent! What if you needed to bump bandwidth up by only 20 percent? The scaling issue became obvious early on, so manufacturers came up with ways to use multiple NICs in tandem to increase bandwidth in smaller increments, what's called link aggregation or NIC teaming. Numerous protocols enable two or more connections to work together simultaneously, such as the vendor-neutral IEEE 802.3ad specification Link Aggregation Control Protocol (LACP) and the Cisco-proprietary Port Aggregation Protocol (PAgP). Let's focus on the former for a common network issue scenario. To enable LACP between two devices, such as the switch and file server just noted, each device needs two or more interconnected network interfaces configured for LACP. When the two devices interact, they will make sure they can communicate over multiple physical ports at the same speeds and form a single logical port that takes advantage of the full combined bandwidth (Figure 21-12). Those ports can be in one of two modes: active or passive. Active ports want to use LACP and send special frames out trying to initiate creating an aggregated logical port. Passive ports wait for active ports to initiate the conversation before they will respond. So here's the common network error with LACP setups. An aggregated connection set to active on both ends (active-active) automatically talks, negotiates, and works. 
One set to active on one end and passive on the other (active-passive) will talk, negotiate, and work. But if you set both sides to passive (passive-passive), neither will initiate the conversation and LACP will not engage. Setting both ends to passive when you want to use LACP is an example of
NIC teaming misconfiguration.
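The active/passive rule can be written down as a tiny truth table. This is a sketch of the rule as stated in the cards above, not of any vendor's implementation; the function name is made up for illustration.

```python
def lacp_will_negotiate(side_a: str, side_b: str) -> bool:
    """Return True if an LACP bundle forms for the two port modes."""
    modes = {"active", "passive"}
    if side_a not in modes or side_b not in modes:
        raise ValueError("mode must be 'active' or 'passive'")
    # At least one end has to send LACP frames to start the conversation.
    return "active" in (side_a, side_b)


if __name__ == "__main__":
    for pair in [("active", "active"), ("active", "passive"), ("passive", "passive")]:
        print(pair, lacp_will_negotiate(*pair))
```

Only the passive-passive pair returns False, which is the misconfiguration the card describes.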
LAN Problems: Time Issues: Most devices these days rely on the __ time servers on the Internet to regulate time. Every once in a while (like on the CompTIA Network+ exam), you'll see a scenario where machines, isolated from the Internet (and thus removed from a time server), will get out of sync. This can result in incorrect time issues that stop services from working properly. Did I mention that this is rare?
NIST
As you'll recall from back in Chapter 18, "Managing Risk," a port scanner is a program that probes ports on another system, logging the state of the scanned ports. These tools are used to look for unintentionally opened ports that might make a system vulnerable to attack. As you might imagine, they also are used by hackers to break into systems. The most famous of all port scanners is probably the powerful and free
Nmap. Nmap was originally designed to work on UNIX systems, so Windows folks used alternatives like Angry IP Scanner by Anton Keks (Figure 21-7). Nmap has been ported to just about every operating system these days, however, so you can find it for Windows.
Troubleshooting process: Establish a Theory of Probable Cause: Once you've identified one or more problems, try to figure out what could have happened. In other words, establish a theory of probable cause. Just keep in mind that a theory is not a fact. You might need to chuck the theory out the window later in the process and establish a revised theory. This step comes down to experience, or good use of the support tools at your disposal, such as your knowledge base. You need to select the most probable cause from all the possible causes, so the solution you choose fixes the problem the first time. This may not always happen, but whenever possible, you want to avoid spending a whole day stabbing in the dark while the problem snores softly to itself in some cozy, neglected corner of your network. Don't forget to question the obvious. If Bob can't print to the networked printer, for example, check to see that the printer is plugged in and turned on. Consider multiple approaches when tackling problems. This will keep you from locking your imagination into a single train of thought. You can use the __ as a troubleshooting tool in several ways to help with this process.
OSI seven-layer model
WAN Problems: ISPs and MTUs: I discussed the maximum transmission unit (MTU) in Chapter 7, "Routing." Back in the dark ages (before Windows Vista), Microsoft users often found themselves with terrible connection problems because IP packets were too big to fit into certain network protocols. The largest Ethernet packet is 1500 bytes, so some earlier versions of Windows set their MTU size to a value less than 1500 to minimize the fragmentation of packets. The problem cropped up when you tried to connect to a technology other than Ethernet, such as DSL. Some DSL carriers couldn't handle an MTU size greater than 1400. When your network's packets are so large that they must be fragmented to fit into your ISP's packets, we call it an MTU mismatch. As a result, techs would tweak their MTU settings to improve throughput by matching up the MTU sizes between the ISP and their own network. This usually required a manual registry setting adjustment. Around 2007, Path MTU Discovery (PMTU), a method to determine the best MTU setting automatically, was created. PMTU works by adding a new feature called the "Don't Fragment (DF) flag" to the IP packet. A PMTU-aware operating system can automatically send a series of fixed-size ICMP packets (basically just pings) with the DF flag set to another device to see if it works. If it doesn't work, the system lowers the MTU size and tries again until the ping is successful. Imagine the hassle of incrementing the MTU size manually. That's the beauty of PMTU—you can automatically set your MTU size to the perfect amount. Unfortunately, PMTU runs under ICMP; most routers have firewall features that, by default, are configured to block ICMP requests, making PMTU worthless. This is called a
PMTU or MTU black hole. If you're having terrible connection problems and you've checked everything else, you need to consider this issue. In many cases, going into the router and turning off ICMP blocking in the firewall is all you need to do to fix the problem.
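The probe-and-lower loop described above can be simulated in a few lines of Python. This is a sketch only: the `probe` callable stands in for sending a DF-flagged ping of a given size, and the starting size, step, and floor values are assumptions chosen for illustration.

```python
from typing import Callable, Optional


def discover_path_mtu(
    probe: Callable[[int], bool],
    start: int = 1500,   # assumed starting size (Ethernet MTU)
    floor: int = 576,    # assumed lower bound at which to give up
    step: int = 10,      # assumed decrement per failed probe
) -> Optional[int]:
    """Lower the probe size until one gets through; None if none do."""
    size = start
    while size >= floor:
        if probe(size):  # stand-in for a DF-flagged ping of this size
            return size
        size -= step
    return None


if __name__ == "__main__":
    # A path whose narrowest link carries at most 1400 bytes.
    print(discover_path_mtu(lambda size: size <= 1400))
    # A black hole: ICMP is blocked, so every probe appears to fail.
    print(discover_path_mtu(lambda size: False))
```

In the black-hole case the loop never gets a success, so discovery returns nothing useful, which matches the symptom on the card.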
PMTU stands for
Path MTU Discovery (PMTU)
WAN Problems: ISPs and MTUs: I discussed the maximum transmission unit (MTU) in Chapter 7, "Routing." Back in the dark ages (before Windows Vista), Microsoft users often found themselves with terrible connection problems because IP packets were too big to fit into certain network protocols. The largest Ethernet packet is 1500 bytes, so some earlier versions of Windows set their MTU size to a value less than 1500 to minimize the fragmentation of packets. The problem cropped up when you tried to connect to a technology other than Ethernet, such as DSL. Some DSL carriers couldn't handle an MTU size greater than 1400. When your network's packets are so large that they must be fragmented to fit into your ISP's packets, we call it an MTU mismatch. As a result, techs would tweak their MTU settings to improve throughput by matching up the MTU sizes between the ISP and their own network. This usually required a manual registry setting adjustment. Around 2007, __, a method to determine the best MTU setting automatically, was created. __ works by adding a new feature called the "Don't Fragment (DF) flag" to the IP packet. A __-aware operating system can automatically send a series of fixed-size ICMP packets (basically just pings) with the DF flag set to another device to see if it works. If it doesn't work, the system lowers the MTU size and tries again until the ping is successful. Imagine the hassle of incrementing the MTU size manually. That's the beauty of __—you can automatically set your MTU size to the perfect amount.
Path MTU Discovery (PMTU)
Troubleshooting process: Establish a Theory of Probable Cause: Consider multiple approaches when tackling problems. This will keep you from locking your imagination into a single train of thought. You can use the OSI seven-layer model as a troubleshooting tool in several ways to help with this process. Here's a scenario to work through. Martha can't access the database server to start her workday. The problem manifests this way: She opens the database client on her computer, then clicks on recent documents, one of which is the current project that management has assigned to her team. Nothing happens. Normally, the database client will connect to the database that resides on the server on the other side of the network. Try a top-to-bottom or bottom-to-top OSI model approach to the problem. Sometimes it pays to try both. Here are some ideas on how this might help. 4: Transport:
Perhaps extreme traffic on the network could block an acknowledgment segment? This seems a bit of a reach, but worth considering.
Exam tip: The ping command has the word "__" in the output.
Pinging
Ethernet networks (traditionally) don't scale easily. If you have a Gigabit Ethernet connection between the main switch and a very busy file server, that connection by definition can handle up to 1 Gbps bandwidth. If that connection becomes saturated, the only way to bump up the bandwidth cap on that single connection would be to upgrade both the switch and the server NIC to the next higher Ethernet standard, 10-Gigabit Ethernet. That's a big jump and an expensive one, plus it's an upgrade of 1000 percent! What if you needed to bump bandwidth up by only 20 percent? The scaling issue became obvious early on, so manufacturers came up with ways to use multiple NICs in tandem to increase bandwidth in smaller increments, what's called link aggregation or NIC teaming. Numerous protocols enable two or more connections to work together simultaneously, such as the vendor-neutral IEEE 802.3ad specification Link Aggregation Control Protocol (LACP) and the Cisco-proprietary __. Let's focus on the former for a common network issue scenario.
Port Aggregation Protocol (PAgP)
PAgP stands for
Port Aggregation Protocol (PAgP)
__ is the process of making remotely connected computers truly act as though they are on the same LAN as local computers.
Proxy ARP
__ (Figure 21-5) put UTP wires into 66- and 110-blocks.
Punchdown tools
LAN Problems: An expired IP address can cause a system not to connect. (what to do)
Release/renew to obtain a proper IP address from the DHCP server. If the DHCP server's scope of IP addresses has been claimed, that release/renew won't work. You'll get an error that points to an exhausted DHCP scope. The only fix for this is to make changes at the DHCP server.
WAN Problems: Router Problems: Routers enable networks to connect to other networks, which you know well by now. Problems with routers simply make those connections not work. (Recall that physical problems with routers or router interface modules were covered in Chapter 8 and Chapter 13.) Loss of power or a bad module can certainly wreck a tech's day, but the fixes are pretty simple: provide power or replace the module. __ issues can be a bit trickier. The ways to mess up a router are many. You can specify the wrong routing protocol, for example, or misconfigure the right routing protocol.
Router configuration
Troubleshooting process: Establish a Theory of Probable Cause: Consider multiple approaches when tackling problems. This will keep you from locking your imagination into a single train of thought. You can use the OSI seven-layer model as a troubleshooting tool in several ways to help with this process. Here's a scenario to work through. Martha can't access the database server to start her workday. The problem manifests this way: She opens the database client on her computer, then clicks on recent documents, one of which is the current project that management has assigned to her team. Nothing happens. Normally, the database client will connect to the database that resides on the server on the other side of the network. Try a top-to-bottom or bottom-to-top OSI model approach to the problem. Sometimes it pays to try both. Here are some ideas on how this might help. 3: Network:
Someone might have changed the IP address of the database server.
Troubleshooting process: Establish a Theory of Probable Cause: Consider multiple approaches when tackling problems. This will keep you from locking your imagination into a single train of thought. You can use the OSI seven-layer model as a troubleshooting tool in several ways to help with this process. Here's a scenario to work through. Martha can't access the database server to start her workday. The problem manifests this way: She opens the database client on her computer, then clicks on recent documents, one of which is the current project that management has assigned to her team. Nothing happens. Normally, the database client will connect to the database that resides on the server on the other side of the network. Try a top-to-bottom or bottom-to-top OSI model approach to the problem. Sometimes it pays to try both. Here are some ideas on how this might help. 7. Application. Could there be a problem with the API that enables the database application to connect to the database server?
Sure
Troubleshooting process: Establish a Theory of Probable Cause: Consider multiple approaches when tackling problems. This will keep you from locking your imagination into a single train of thought. You can use the OSI seven-layer model as a troubleshooting tool in several ways to help with this process. Here's a scenario to work through. Martha can't access the database server to start her workday. The problem manifests this way: She opens the database client on her computer, then clicks on recent documents, one of which is the current project that management has assigned to her team. Nothing happens. Normally, the database client will connect to the database that resides on the server on the other side of the network. Try a top-to-bottom or bottom-to-top OSI model approach to the problem. Sometimes it pays to try both. Here are some ideas on how this might help. 2: Data Link:
The MAC address of the database server or Martha's machine might be blacklisted.
Troubleshooting process: Establish a Theory of Probable Cause: Consider multiple approaches when tackling problems. This will keep you from locking your imagination into a single train of thought. You can use the OSI seven-layer model as a troubleshooting tool in several ways to help with this process. Here's a scenario to work through. Martha can't access the database server to start her workday. The problem manifests this way: She opens the database client on her computer, then clicks on recent documents, one of which is the current project that management has assigned to her team. Nothing happens. Normally, the database client will connect to the database that resides on the server on the other side of the network. Try a top-to-bottom or bottom-to-top OSI model approach to the problem. Sometimes it pays to try both. Here are some ideas on how this might help. 4. Transport. Perhaps extreme traffic on the network could block an acknowledgment segment?
This seems a bit of a reach, but worth considering.
__ enable you to measure the data flow in a network.
Throughput testers
TDR stands for
Time domain reflectometer
Network technicians use three different devices to deal with broken cables. Cable testers can tell you if you have a continuity problem or if a wire map isn't correct (Figure 21-1). __ and __ can tell you where the break is on the cable (Figure 21-2).
Time domain reflectometers (TDRs) and optical time domain reflectometers (OTDRs)
__ and their partners, __, have only one job: to help you locate a particular cable.
Tone probes; tone generators
The traceroute utility (the command in Windows is tracert) is used to trace all of the routers between two points. Use traceroute to diagnose where the problem lies when you have problems reaching a remote system. If a traceroute stops at a certain router, you know the problem is either the next router or the connections between them. When sending a traceroute, it's important to keep a significant difference between Windows and UNIX/Linux/Cisco systems in mind. Windows tracert sends only ICMP packets, while UNIX/Linux/Cisco traceroute can send either ICMP packets or UDP datagrams, but sends __ by default.
UDP datagrams
The traceroute utility (the command in Windows is tracert) is used to trace all of the routers between two points. Use traceroute to diagnose where the problem lies when you have problems reaching a remote system. If a traceroute stops at a certain router, you know the problem is either the next router or the connections between them. When sending a traceroute, it's important to keep a significant difference between Windows and UNIX/Linux/Cisco systems in mind. Windows tracert sends only ICMP packets, while UNIX/Linux/Cisco traceroute can send either ICMP packets or
UDP datagrams, but sends UDP datagrams by default. Because many routers block ICMP packets, if your traceroute fails from a Windows system, running it on a Linux or UNIX system may return more complete results.
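A small Python sketch of picking the right command per platform. The helper name is hypothetical; `tracert`, `traceroute`, and the `-I` option (ask UNIX/Linux traceroute to use ICMP instead of UDP) are the real command names and flag, though `-I` may need elevated privileges on some systems.

```python
import platform
from typing import List


def trace_command(target: str, system: str = "", use_icmp: bool = False) -> List[str]:
    """Build the route-tracing command line for the given OS."""
    system = system or platform.system()
    if system == "Windows":
        # tracert always sends ICMP, so there is nothing to switch.
        return ["tracert", target]
    cmd = ["traceroute"]
    if use_icmp:
        cmd.append("-I")  # switch from the UDP default to ICMP echo
    cmd.append(target)
    return cmd


if __name__ == "__main__":
    print(trace_command("example.com", system="Windows"))
    print(trace_command("example.com", system="Linux"))
```

The resulting list can be handed to `subprocess.run` if you want to actually execute the trace.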
LAN Problems: Adding VLANs: When you add VLANs into the network mix, all sorts of fun network issues can crop up. As an example, suppose Bill has a 24-port managed switch segmented into four VLANs, one for each group in the office: Management, Sales, Marketing, and Development (Figure 21-11). Bill thought he'd assigned six ports to each VLAN when he set up the switch, but by mistake he assigned seven ports to VLAN 1 and only five ports to VLAN 2. Merrily plugging in the patch cables for each group of users, Bill gets called up by his boss asking why Cindy over in Sales suddenly can see resources reserved for management. This obviously points to an interface misconfiguration that resulted in a __.
VLAN mismatch
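Bill's mistake is easy to catch with a quick count of ports per VLAN. The sketch below assumes the port-to-VLAN assignments have already been exported into a dictionary; the exact port numbers are made up to match the seven/five split in the card.

```python
from collections import Counter
from typing import Dict


def miscounted_vlans(port_to_vlan: Dict[int, int], expected: int) -> Dict[int, int]:
    """Return {vlan: port_count} for every VLAN whose count differs from expected."""
    counts = Counter(port_to_vlan.values())
    return {vlan: n for vlan, n in sorted(counts.items()) if n != expected}


# Hypothetical layout of Bill's 24-port switch: VLAN 1 got seven ports,
# VLAN 2 only five, VLANs 3 and 4 the intended six each.
bills_switch: Dict[int, int] = {}
for port in range(1, 25):
    if port <= 7:
        bills_switch[port] = 1
    elif port <= 12:
        bills_switch[port] = 2
    elif port <= 18:
        bills_switch[port] = 3
    else:
        bills_switch[port] = 4

if __name__ == "__main__":
    print(miscounted_vlans(bills_switch, expected=6))
```

A count check like this only flags the imbalance; finding which port landed in the wrong VLAN still means checking the switch configuration.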
Beyond Local - Escalate: Proxy ARP: Proxy ARP is the process of making remotely connected computers truly act as though they are on the same LAN as local computers. Proxy ARP is done in a number of different ways, with a Virtual Private Network (VPN) as the classic example. If a laptop in an airport connects to a network through a VPN, that computer takes on the network ID of your local network. In order for all of this to work, the VPN concentrator needs to allow some very LAN-type traffic to go through it that would normally never get through a router. ARP is a great example. If your VPN client wants to talk to another computer on the LAN, it has to send an ARP request to get the MAC address. Your VPN device is designed to act as a proxy for all that type of data. Almost all proxy ARP problems take place on the
VPN concentrator. With misconfigured proxy ARP settings, the VPN concentrator can send what looks like a denial of service (DoS) attack on the LAN. (A DoS attack is usually directed at a server exposed on the Internet, like a Web server. See Chapter 19, "Protecting Your Network," for more details on these and other malicious attacks.) If your clients start receiving a large number of packets from the VPN concentrator, assume you have a proxy ARP problem and escalate by getting the person in charge of the VPN to fix it.
Beyond Local - Escalate: Proxy ARP: Proxy ARP is the process of making remotely connected computers truly act as though they are on the same LAN as local computers. Proxy ARP is done in a number of different ways, with a __ as the classic example.
Virtual Private Network (VPN)
VRRP stands for
Virtual Router Redundancy Protocol (VRRP)
Exam tip: As you'll recall from Chapter 18, if you want to prevent downtime due to a failure on your default gateway, you should consider implementing __ or, if you are a Cisco shop, __.
Virtual Router Redundancy Protocol (VRRP); Hot Standby Router Protocol (HSRP).
WAN Problems: Router Problems: Routers enable networks to connect to other networks, which you know well by now. Problems with routers simply make those connections not work. (Recall that physical problems with routers or router interface modules were covered in Chapter 8 and Chapter 13.) Loss of power or a bad module can certainly wreck a tech's day, but the fixes are pretty simple: provide power or replace the module. Router configuration issues can be a bit trickier. The ways to mess up a router are many. You can specify the wrong routing protocol, for example, or misconfigure the right routing protocol. An access control list (ACL) might include addresses to block that shouldn't be blocked or allow access to network resources for nodes that shouldn't have it. Incorrect ACL settings can lead to blocked TCP/UDP ports that shouldn't be blocked. A misconfiguration can lead to missing IP routes so that some destinations just aren't there for users. Improperly configured routers aren't going to send packets to the proper destination. The symptoms are clear: every system that uses the misconfigured router as a default gateway is either not able to get packets out or not able to get packets in, or sometimes both. __ don't come up, __ suddenly disappear, and __ can't access their servers. In these cases, you need to verify first that everything in your area of responsibility works. If that is true, then escalate the problem and find the person responsible for the router.
Web pages; FTP servers; e-mail clients
Network problems fall into several basic categories, and most of these problems you or a network tech in the proper place can fix. Fixing problems at the workstation, work area, or server is a network tech's bread and butter. The same is true of connecting to resources on the LAN. Problems connecting to a WAN can often be resolved at the local level, but sometimes need to get escalated. The knowledge from the previous chapters combined with the tools and methods you've learned in this chapter should enable you to fix just about any network! There are a couple of stumbling blocks when it comes to resolving network issues. First, at almost any level of problem, the result—as far as the end user is concerned—is the same. He or she can't access resources beyond the local machine. Whether a user tries to access the local file server or do a Google search, if the attempt fails, "the network is down!" You need to fall back on the most important question a tech can ask:
What can cause this problem? Then methodically work through the troubleshooting steps and tools to narrow possibilities. Let's look at a scenario to illustrate the narrowing process.
My Traceroute (mtr) is a dynamic (keeps running) equivalent to traceroute. __ does not support mtr.
Windows
Some problems you can fix at the local machine don't point to messed-up hardware or invalid settings, but reflect the current mix of wired and wireless networks in the same place. Here's a scenario that applies to Windows versions before Windows 10. Tina has a wireless network connection to the Internet. She gets a shiny new printer with an Ethernet port, but with no Wi-Fi capability. She wants to print from both her PC and her laptop, so she creates a small LAN: a couple of Ethernet cables and a switch. She plugs everything in, installs drivers, and all is well. She can print from both machines. Unfortunately, as soon as she prints, her Internet connection goes down. The funny part is that the Internet connection didn't go anywhere, but her simultaneous wired/wireless connections created a network failure. The wired and wireless NICs can't actually operate simultaneously and, by default, the wired connection takes priority in the order in which devices are accessed by network services. Exam tip: __ does not have this simultaneous wired/wireless connection issue at all, so the problem is irrelevant as long as your clients have updated computers.
Windows 10
The traceroute utility (the command in Windows is tracert) is used to trace all of the routers between two points. Use traceroute to diagnose where the problem lies when you have problems reaching a remote system. If a traceroute stops at a certain router, you know the problem is either the next router or the connections between them. When sending a traceroute, it's important to keep a significant difference between Windows and UNIX/Linux/Cisco systems in mind.
Windows tracert sends only ICMP packets, while UNIX/Linux/Cisco traceroute can send either ICMP packets or UDP datagrams, but sends UDP datagrams by default. Because many routers block ICMP packets, if your traceroute fails from a Windows system, running it on a Linux or UNIX system may return more complete results.
A packet sniffer, as you'll recall from Chapter 20, intercepts and logs network packets. You have many choices when it comes to packet sniffers. Some sniffers come as programs you run on a computer, while others manifest as dedicated hardware devices. Most packet sniffers come bundled with a protocol analyzer, the tool that takes the sniffed information and figures out what's happening on the network. Arguably, the most popular GUI packet sniffer and protocol analyzer is
Wireshark (Figure 21-6).
Beyond Local - Escalate: Switching Loops: Also known as a bridging loop, a switching loop is when you connect and configure multiple switches together in such a way that causes a circular path to appear. Switching loops are rare because all switches use the Spanning Tree Protocol (STP), but they do happen. The symptoms are identical to
a broadcast storm: every computer on the broadcast domain can no longer access the network.
Certifiers test a cable to ensure that it can handle its rated amount of capacity. When __, turn to a certifier.
a cable is not broken but it's not moving data the way it should
When a cable is not broken but it's not moving data the way it should, turn to
a certifier.
The ping utility uses Internet Control Message Protocol (ICMP) packets to query by IP address or by name. It works across routers, so it's generally the first tool used to check if a system is reachable. Unfortunately, many devices block ICMP packets, so
a failed ping doesn't always point to an offline system.
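When ping gets no answer, one way to double-check is to try a TCP connection to a port the system is expected to serve. This is a minimal sketch using Python's standard `socket` module; the function name, the default timeout, and the choice of port are assumptions you would adapt to the service in question.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical check against a Web server that ignores pings.
    print(tcp_reachable("example.com", 443))
```

A successful connection shows the host is up even though ICMP is filtered; a failure here still doesn't prove the host is down, only that this port isn't answering.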
The extremely transparent fiber-optic cables allow light to shine but have some inherent impurities in the glass that can reduce light transmission. Dust, poor connections, and light leakage can also degrade the strength of light pulses as they travel through a fiber-optic run. To measure the amount of light loss, technicians use an optical power meter, also referred to as a light meter (see Figure 21-3). The light meter system uses __ at one end of a run and __ at the other end. This measures the amount of light that reaches the detector.
a high-powered source of light; a calibrated detector
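Light loss is normally reported in decibels, computed from the power sent by the source and the power that reaches the detector. The sketch below assumes both readings are in the same linear unit (for example milliwatts); the function name and sample values are illustrative.

```python
import math


def optical_loss_db(source_power: float, detected_power: float) -> float:
    """Return the loss over a fiber run in dB (positive means light was lost)."""
    if source_power <= 0 or detected_power <= 0:
        raise ValueError("power readings must be positive")
    return 10 * math.log10(source_power / detected_power)


if __name__ == "__main__":
    # Half the light arriving at the detector is roughly a 3 dB loss.
    print(round(optical_loss_db(1.0, 0.5), 2))
```

Comparing the result against the loss budget for the run tells you whether dust, a poor connection, or a bad splice is costing too much light.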
Everyone in the local office appears to have full access to local and Internet Web sites. No one, however, can reach a company-operated server at a particular remote site in Istanbul. There has been a recent change to the firewall configuration, so it is up to technician Terry to determine if the firewall change is the culprit or if the problem lies elsewhere. Terry has come up with three possible theories: the remote server is down, the remote site is inaccessible, or the local firewall is preventing communication with the server. He elects to test his theories with the "quickest to test" approach. His first test is to confirm that all of the local office workstations cannot reach the remote server. Using different hosts, he uses the ping and ping6 utilities. First he pings localhost to confirm the workstation has a working IP stack, then he attempts to ping the remote server and gets no response. Next, he tries the tracert and traceroute utilities on the different hosts. Traceroute shows a functional path to the router that connects the remote office to the Internet, but does not get a response from the server. So far, everything seems to confirm that the local office cannot get to the remote server. Just to be able to say he tried everything, Terry runs the mtr utility from a Linux box and lets it run for an extended time. At the same time, he runs the pathping utility from a Windows computer. Neither utility can contact the server. He tries all of these utilities on some other company resources and Internet sites and has no problems connecting. Confident that the reported symptom is confirmed, Terry puts in a call to the remote site to ask about the status. The virtual PBX sends Terry to voicemail for every extension that he calls. This could point to
a network disconnection at the site or to everyone being out of the office there. Since it is 3:00 a.m. at the remote site, Terry does not have a clear answer.
As you'll recall from back in Chapter 18, "Managing Risk," a port scanner is
a program that probes ports on another system, logging the state of the scanned ports. These tools are used to look for unintentionally opened ports that might make a system vulnerable to attack. As you might imagine, they also are used by hackers to break into systems.
Punchdown tools (Figure 21-5) put UTP wires into 66- and 110-blocks. The only time you would use a punchdown tool in a diagnostic environment is
a quick repunch of a connection to make sure all the contacts are properly set.
Beyond Local - Escalate: Broadcast Storms: A broadcast storm is the result of one or more devices sending a nonstop flurry of broadcast frames on the network. The first sign of a broadcast storm is when every computer on the broadcast domain suddenly can't connect to the rest of the network. There are usually no clues other than network applications freezing or presenting "can't connect to ..." types of error messages. Every activity light on every node is solidly on. Computers on other broadcast domains work perfectly well. The trick is to isolate; that's where escalation comes in. You need to break down the network quickly by unplugging devices until you can find the one causing trouble. Getting a packet analyzer to work can be difficult, but at least try. If you can scoop up one packet, you'll know what node is causing the trouble. The second the bad node is disconnected, the network returns to normal. But if you have a lot of machines to deal with and a bunch of users who can't get on the network yelling at you, you'll need help. Call
a supervisor to get support to solve the crisis as quickly as possible.
No single person is truly in control of an entire Internet-connected network. Large organizations split network support duties into very skill-specific areas: routers, cable infrastructure, user administration, and so on. Even in a tiny network with a single network support person, problems will arise that go beyond the tech's skill level or that involve equipment the organization doesn't own (usually it's their ISP's gear). In these situations, the tech needs to identify the problem and, instead of trying to fix it on his or her own, escalate the issue. In network troubleshooting, problem escalation should occur when you face a problem that falls outside the scope of your skills and you need help. In large organizations, escalation problems have very clear procedures, such as who to call and what to document. In small organizations, escalation often is nothing more than
a technician realizing that he or she needs help. The CompTIA Network+ exam objectives define some classic networking situations that CompTIA feels should be escalated. Here's how to recognize broadcast storms, switching loops, routing problems, and proxy ARP.
The CompTIA Network+ exam objectives contain a detailed troubleshooting methodology that provides a good starting point for our discussion. Here are the basic steps in the troubleshooting process: 1. Identify the problem. a.
a. Gather information.
The CompTIA Network+ exam objectives contain a detailed troubleshooting methodology that provides a good starting point for our discussion. Here are the basic steps in the troubleshooting process: 3. Test the theory to determine the cause. a.
a. Once the theory is confirmed, determine the next steps to resolve the problem.
The CompTIA Network+ exam objectives contain a detailed troubleshooting methodology that provides a good starting point for our discussion. Here are the basic steps in the troubleshooting process: 2. Establish a theory of probable cause. a.
a. Question the obvious.
WAN Problems: Router Problems: Routers enable networks to connect to other networks, which you know well by now. Problems with routers simply make those connections not work. (Recall that physical problems with routers or router interface modules were covered in Chapter 8 and Chapter 13.) Loss of power or a bad module can certainly wreck a tech's day, but the fixes are pretty simple: provide power or replace the module. Router configuration issues can be a bit trickier. The ways to mess up a router are many. You can specify the wrong routing protocol, for example, or misconfigure the right routing protocol. An __ might include addresses to block that shouldn't be blocked or allow access to network resources for nodes that shouldn't have it.
access control list (ACL)
Exam tip: Avoid aggressive or __ questions when trying to get information from a user.
accusatory
The netstat utility displays information on the current state of all the running IP processes on a system. It shows what sessions are
active and can also provide statistics based on ports or protocols (TCP, UDP, and so on). Typing netstat by itself only shows current sessions. Typing netstat -r shows the routing table (100 percent identical to route print). If you want to know about your current sessions, netstat is the tool to use.
LAN Problems: Link Aggregation Problems: Ethernet networks (traditionally) don't scale easily. If you have a Gigabit Ethernet connection between the main switch and a very busy file server, that connection by definition can handle up to 1 Gbps bandwidth. If that connection becomes saturated, the only way to bump up the bandwidth cap on that single connection would be to upgrade both the switch and the server NIC to the next higher Ethernet standard, 10-Gigabit Ethernet. That's a big jump and an expensive one, plus it's a tenfold increase! What if you needed to bump bandwidth up by only 20 percent? The scaling issue became obvious early on, so manufacturers came up with ways to use multiple NICs in tandem to increase bandwidth in smaller increments, what's called link aggregation or NIC teaming. Numerous protocols enable two or more connections to work together simultaneously, such as the vendor-neutral IEEE 802.3ad specification Link Aggregation Control Protocol (LACP) and the Cisco-proprietary Port Aggregation Protocol (PAgP). Let's focus on the former for a common network issue scenario. To enable LACP between two devices, such as the switch and file server just noted, each device needs two or more interconnected network interfaces configured for LACP. When the two devices interact, they will make sure they can communicate over multiple physical ports at the same speeds and form a single logical port that takes advantage of the full combined bandwidth (Figure 21-12). Those ports can be in one of two modes:
active or passive. Active ports want to use LACP and send special frames out trying to initiate creating an aggregated logical port. Passive ports wait for active ports to initiate the conversation before they will respond.
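The active/passive rule boils down to one check: somebody has to start the conversation. The sketch below models that rule only; the function name `lacp_forms_aggregate` is hypothetical, and it ignores everything else LACP negotiates (speed, duplex, and so on).

```python
def lacp_forms_aggregate(mode_a: str, mode_b: str) -> bool:
    """Return True if two LACP ports in the given modes can negotiate.

    At least one side must be active, because passive ports only
    respond and never initiate.
    """
    modes = {mode_a.lower(), mode_b.lower()}
    if not modes <= {"active", "passive"}:
        raise ValueError("mode must be 'active' or 'passive'")
    return "active" in modes
```

Two passive ports sit waiting for each other forever, which is why a passive/passive pairing is a classic cause of a link aggregation group that never comes up.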
WAN Problems: ISPs and MTUs: I discussed the maximum transmission unit (MTU) in Chapter 7, "Routing." Back in the dark ages (before Windows Vista), Microsoft users often found themselves with terrible connection problems because IP packets were too big to fit into certain network protocols. The largest Ethernet packet is 1500 bytes, so some earlier versions of Windows set their MTU size to a value less than 1500 to minimize the fragmentation of packets. The problem cropped up when you tried to connect to a technology other than Ethernet, such as DSL. Some DSL carriers couldn't handle an MTU size greater than 1400. When your network's packets are so large that they must be fragmented to fit into your ISP's packets, we call it an MTU mismatch. As a result, techs would tweak their MTU settings to improve throughput by matching up the MTU sizes between the ISP and their own network. This usually required a manual registry setting adjustment. Path MTU Discovery (PMTU), a method to determine the best MTU setting automatically (standardized for IPv4 in RFC 1191), removes the need for that manual tuning. PMTU works by
setting the "Don't Fragment (DF) flag" in the IP header. A PMTU-aware operating system can automatically send a series of fixed-size ICMP packets (basically just pings) with the DF flag set to another device to see if they get through. If they don't, the system lowers the MTU size and tries again until the ping is successful. Imagine the hassle of adjusting the MTU size manually. That's the beauty of PMTU: your system can set its MTU size to the best value automatically.
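The "lower the MTU and try again" loop can be sketched as a small simulation. Nothing here sends real packets: `link_mtus` is a made-up list standing in for the MTU of each hop on the path, and the `step` and `floor` values are assumptions chosen for illustration. Real implementations typically also use the ICMP error a router sends back rather than stepping down blindly.

```python
def discover_path_mtu(link_mtus, start_mtu=1500, step=4, floor=576):
    """Simulate probing with DF-flagged packets until one fits every link."""
    bottleneck = min(link_mtus)  # smallest MTU on the simulated path
    mtu = start_mtu
    while mtu > floor:
        if mtu <= bottleneck:
            return mtu  # probe crossed the whole path without fragmenting
        mtu -= step  # probe was too big for some link; try a smaller size
    return floor  # give up at the assumed minimum


# A path with a DSL hop limited to 1400 bytes.
path = [1500, 1400, 1500]
print(discover_path_mtu(path))
```

With the default step of 4 the loop lands exactly on the 1400-byte bottleneck; a coarser step settles on the first probe size that fits, which may be slightly below the true limit.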
Beyond Local - Escalate: Switching Loops: Also known as a bridging loop, a switching loop is when you connect and configure multiple switches together in such a way that causes a circular path to appear. Switching loops are rare because
all switches use the Spanning Tree Protocol (STP), but they do happen. The symptoms are identical to a broadcast storm: every computer on the broadcast domain can no longer access the network.
The ipconfig (Windows), ifconfig (macOS and UNIX), and ip (Linux) utilities tell you
almost anything you want to know about a computer's IP settings. Make sure you know that typing ipconfig alone only gives basic information. Typing ipconfig /all gives detailed information (like DNS servers and MAC address).
The end-to-end principle meant originally that
applications and work should happen only at the endpoints in a network. In the early days of networking, this made a lot of sense. Connections weren't always fully reliable and thus were not good for real-time activity. So the work should get done by the computers at the ends of a network connection. The Internet was founded on the end-to-end principle.
The __ utility enables you to view and change the ARP table on a computer.
arp
Make the CompTIA Network+ exam (and real life) easier by separating your software tools into two groups: those that come built into every operating system and those that are third-party tools. Typical built-in tools are tracert/traceroute, ipconfig/ifconfig/ip, arp, ping, arping, pathping, nslookup/dig, route, and netstat/ss. Third-party tools fall into the categories of packet sniffers, port scanners, throughput testers, and looking glass sites. The CompTIA Network+ exam tests your ability to recognize the output from all of the built-in tools (except __ and __). Take some time to memorize example outputs from all of these tools.
arping; ss
Troubleshooting process: Identify the problem: Determine if anything has changed: Determine if anything has changed on the network recently that might have caused the problem. You may not have to ask many questions before the person using the problem system can tell you what has changed, but, in some cases, establishing if anything has changed can take quite a bit of time and involve further work behind the scenes. Here are some examples of questions to ask: -"What exactly was happening when the problem occurred?" -"Has anything been changed on the system recently?" -"Has the system been moved recently?" Notice the way I've tactfully
avoided the word you, as in "Have you changed anything on the system recently?" This is a deliberate tactic to avoid any implied blame on the part of the user. Being nice never hurts, and it makes the whole troubleshooting process more friendly.
The CompTIA Network+ exam objectives contain a detailed troubleshooting methodology that provides a good starting point for our discussion. Here are the basic steps in the troubleshooting process: 2. Establish a theory of probable cause. b.
b. Consider multiple approaches: i. Top-to-bottom/bottom-to-top OSI model ii. Divide and conquer
The CompTIA Network+ exam objectives contain a detailed troubleshooting methodology that provides a good starting point for our discussion. Here are the basic steps in the troubleshooting process: 1. Identify the problem. b.
b. Duplicate the problem, if possible.
The CompTIA Network+ exam objectives contain a detailed troubleshooting methodology that provides a good starting point for our discussion. Here are the basic steps in the troubleshooting process: 3. Test the theory to determine the cause. b.
b. If the theory is not confirmed, reestablish a new theory or escalate.
If you're having trouble connecting to a resource or experiencing performance problems after making a connection, a __ likely isn't the culprit.
bad cable
Certifiers test a cable to ensure that it can handle its rated amount of capacity. When a cable is not broken but it's not moving data the way it should, turn to a certifier. Look for problems that cause a cable to underperform. A __ might increase crosstalk, attenuation, or interference.
bad installation
Throughput testers enable you to measure the data flow in a network. Which tool is appropriate depends on the type of network throughput you want to test. Most techs use one of several speed-test sites for checking an Internet connection's throughput, such as MegaPath's Speakeasy Speed Test (Figure 21-8): www.speakeasy.net/speedtest. The CompTIA Network+ exam objectives refer to throughput testers as
bandwidth speed testers.
WAN Problems: Router Problems: Routers enable networks to connect to other networks, which you know well by now. Problems with routers simply make those connections not work. (Recall that physical problems with routers or router interface modules were covered in Chapter 8 and Chapter 13.) Loss of power or a bad module can certainly wreck a tech's day, but the fixes are pretty simple: provide power or replace the module. Router configuration issues can be a bit trickier. The ways to mess up a router are many. You can specify the wrong routing protocol, for example, or misconfigure the right routing protocol. An access control list (ACL) might include addresses to block that shouldn't be blocked or allow access to network resources for nodes that shouldn't have it. Incorrect ACL settings can lead to __ that shouldn't be blocked. A misconfiguration can lead to missing IP routes so that some destinations just aren't there for users.
blocked TCP/UDP ports
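To see how one bad ACL entry ends up blocking a port, here is a minimal sketch of top-to-bottom rule matching. The rule format, the function name `acl_decision`, and the implicit deny at the end are assumptions for illustration (they mirror how many routers behave, but this is not any vendor's syntax).

```python
import ipaddress


def acl_decision(acl, src_ip, protocol, port):
    """Return the action of the first rule that matches, else 'deny'.

    acl is a list of (action, network, protocol, port) tuples checked
    in order; a rule port of None matches any port.
    """
    address = ipaddress.ip_address(src_ip)
    for action, network, rule_protocol, rule_port in acl:
        if (
            address in ipaddress.ip_network(network)
            and rule_protocol == protocol
            and rule_port in (None, port)
        ):
            return action
    return "deny"  # assumed implicit deny when nothing matches


acl = [
    ("deny", "10.0.0.0/24", "tcp", 443),   # overly broad block
    ("permit", "10.0.0.0/8", "tcp", None),
]
print(acl_decision(acl, "10.0.0.5", "tcp", 443))  # deny
print(acl_decision(acl, "10.1.2.3", "tcp", 443))  # permit
```

Because the first match wins, the order of the rules matters as much as their content when you are hunting for the entry that blocks legitimate traffic.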
WAN Problems: Company Security Policy: Implemented company security policies can make routine WAN connectivity actions completely fail. Here's a scenario. Mike is the head of his company's IT department and he has a big problem: the amount of traffic running between the two company locations is on a dedicated connection and is blowing his bandwidth out of the water! It's so bad that data moving between the two offices will often drop to a crawl four to five times per day. Why are people using so much bandwidth? As he inspects the problem, Mike realizes that the sales department is the culprit. Most of the data is composed of massive video files the sales department uses in their advertising campaign. He needs to make some security policy decisions. First, he needs to set up a throttling policy that defines in terms of megabits per second the maximum amount of bandwidth any single department can use per day. Second, he needs to add a __. If anyone goes over this limit, the company will block all traffic of that type for a certain amount of time (one hour). Third, he needs to update his company's __ or __ security policies to reflect these new limits. This lets employees, especially those pesky sales folks, know what the new rules are.
blocking policy; fair access policy; utilization limits
Beyond Local - Escalate: Broadcast Storms: A broadcast storm is the result of one or more devices sending a nonstop flurry of broadcast frames on the network. The first sign of a broadcast storm is when every computer on the broadcast domain suddenly can't connect to the rest of the network. There are usually no clues other than network applications freezing or presenting "can't connect to ..." types of error messages. Every activity light on every node is solidly on. Computers on other broadcast domains work perfectly well. The trick is to isolate; that's where escalation comes in. You need to
break down the network quickly by unplugging devices until you can find the one causing trouble. Getting a packet analyzer to work can be difficult, but at least try. If you can scoop up one packet, you'll know what node is causing the trouble. The second the bad node is disconnected, the network returns to normal. But if you have a lot of machines to deal with and a bunch of users who can't get on the network yelling at you, you'll need help. Call a supervisor to get support to solve the crisis as quickly as possible.
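If you do manage to capture some frames, finding the noisy node is a counting exercise. The sketch below assumes the capture has already been reduced to (source MAC, destination MAC) pairs; that input format and the function name `top_broadcaster` are hypothetical, not the output of any particular packet analyzer.

```python
from collections import Counter

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


def top_broadcaster(frames):
    """Return (src_mac, count) for the node sending the most broadcasts.

    frames is an iterable of (src_mac, dst_mac) pairs. Returns None if
    no broadcast frames were captured.
    """
    counts = Counter(
        src for src, dst in frames if dst.lower() == BROADCAST_MAC
    )
    if not counts:
        return None
    return counts.most_common(1)[0]
```

The MAC address that comes back is the first candidate to unplug.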
Beyond Local - Escalate: Switching Loops: Also known as a __, a switching loop is when you connect and configure multiple switches together in such a way that causes a circular path to appear.
bridging loop
LAN Problems: Most clients will use DHCP for IP address, subnet mask, and default gateway settings. With manual configuration, on the other hand, errors can creep in and cause a device to fail to connect to network resources. A typical scenario is with a __ environment, where an employee will bring in a manually configured laptop—that he didn't remember was tuned to his home network—and complain about not being able to access the LAN or the Internet.
bring your own device (BYOD)
Beyond Local - Escalate: Broadcast Storms: A broadcast storm is the result of one or more devices sending a nonstop flurry of broadcast frames on the network. The first sign of a broadcast storm is when every computer on the broadcast domain suddenly can't connect to the rest of the network. There are usually no clues other than network applications freezing or presenting "can't connect to ..." types of error messages. Every activity light on every node is solidly on. Computers on other __ work perfectly well.
broadcast domains
Beyond Local - Escalate: A __ is the result of one or more devices sending a nonstop flurry of broadcast frames on the network.
broadcast storm
Beyond Local - Escalate: No single person is truly in control of an entire Internet-connected network. Large organizations split network support duties into very skill-specific areas: routers, cable infrastructure, user administration, and so on. Even in a tiny network with a single network support person, problems will arise that go beyond the tech's skill level or that involve equipment the organization doesn't own (usually it's their ISP's gear). In these situations, the tech needs to identify the problem and, instead of trying to fix it on his or her own, escalate the issue. In network troubleshooting, problem escalation should occur when you face a problem that falls outside the scope of your skills and you need help. In large organizations, escalation problems have very clear procedures, such as who to call and what to document. In small organizations, escalation often is nothing more than a technician realizing that he or she needs help. The CompTIA Network+ exam objectives define some classic networking situations that CompTIA feels should be escalated. Here's how to recognize __(4)
broadcast storms, switching loops, routing problems, and proxy ARP.
The CompTIA Network+ exam objectives contain a detailed troubleshooting methodology that provides a good starting point for our discussion. Here are the basic steps in the troubleshooting process: 1. Identify the problem. c.
c. Question users.
Hands-on Problems: Outside invisible forces can cause problems with copper cabling. You've read about electromagnetic interference (EMI) and radio frequency interference (RFI) previously in the book. EMI and RFI can disrupt signaling on a copper cable, especially with the very low voltages used today on those cables. These are crazy things to troubleshoot. An interference problem might manifest in a scenario like this one. John can use e-mail on his laptop successfully over the company's wireless network. When he plugs in at his desk in his cubicle, however, e-mail messages just don't get through. Typically, you'd test everything before suspecting EMI or RFI causing this problem. Test the NIC on the laptop by plugging into a known-good port. You'd use a cable tester on the cable. You'd check for continuity between the port in his office to the switch. You'd glance at the cabling certification documents to see that yes, the cable worked when installed. Only then might a creative tech at her wit's end notice the recently installed, high-powered WAP on the wall outside John's office. RFI strikes! If the installation is new and unproven, a perfectly fine network device might be unreachable because of interface errors, meaning that the installer didn't install the wall jack correctly. The resulting incorrect termination might be a mismatched standard (568A rather than 568B, for example). The __ might be bad or might be a crossover cable rather than straight-through cable.
cable from the wall to the workstation
LAN Problems: Adding VLANs: When you add VLANs into the network mix, all sorts of fun network issues can crop up. As an example, suppose Bill has a 24-port managed switch segmented into four VLANs, one for each group in the office: Management, Sales, Marketing, and Development (Figure 21-11). Bill thought he'd assigned six ports to each VLAN when he set up the switch, but by mistake he assigned seven ports to VLAN 1 and only five ports to VLAN 2. Merrily plugging in the patch cables for each group of users, Bill gets called up by his boss asking why Cindy over in Sales suddenly can see resources reserved for management. This obviously points to an interface misconfiguration that resulted in a VLAN mismatch. Similarly, after fixing his initial mistake and getting the VLANs set up properly, Bill needs to plug the right patch cables into the right ports. If he messes up and plugs the patch cable for Cindy's computer into a VLAN 1 port, the intrepid salesperson would again have access to the management resources. Such __ show up pretty quickly and are readily fixed. Keep proper records of patch cable assignments and plug the cables into the proper ports.
cable placement errors
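Bill's seven-and-five mistake is the kind of thing a quick audit of the port assignments would catch. This is a minimal sketch that assumes the switch configuration has been exported into a plain port-to-VLAN dictionary; the data layout and the function name `find_vlan_mismatches` are hypothetical.

```python
from collections import Counter


def find_vlan_mismatches(port_to_vlan, expected_per_vlan):
    """Return {vlan: (actual, expected)} for VLANs with the wrong port count."""
    counts = Counter(port_to_vlan.values())
    return {
        vlan: (counts[vlan], expected)
        for vlan, expected in expected_per_vlan.items()
        if counts[vlan] != expected
    }


# Bill's 24-port switch: port 7 landed in VLAN 1 instead of VLAN 2.
ports = {port: 1 for port in range(1, 8)}
ports.update({port: 2 for port in range(8, 13)})
ports.update({port: 3 for port in range(13, 19)})
ports.update({port: 4 for port in range(19, 25)})

print(find_vlan_mismatches(ports, {1: 6, 2: 6, 3: 6, 4: 6}))
# {1: (7, 6), 2: (5, 6)}
```

The same port-to-VLAN record doubles as the documentation you need when checking that each patch cable is plugged into a port on the right VLAN.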
A __ or __ (Figure 21-4) helps you to make UTP cables. You'll need a crimping tool (a crimper) as well.
cable stripper; snip
Hands-on Problems: Outside invisible forces can cause problems with copper cabling. You've read about electromagnetic interference (EMI) and radio frequency interference (RFI) previously in the book. EMI and RFI can disrupt signaling on a copper cable, especially with the very low voltages used today on those cables. These are crazy things to troubleshoot. An interference problem might manifest in a scenario like this one. John can use e-mail on his laptop successfully over the company's wireless network. When he plugs in at his desk in his cubicle, however, e-mail messages just don't get through. Typically, you'd test everything before suspecting EMI or RFI causing this problem. Test the NIC on the laptop by plugging into a known-good port. You'd use a __ on the cable.
cable tester
Multimeters test voltage (both AC and DC), resistance, and continuity. They are the unsung heroes of cabling infrastructures because no other tool can tell you how much voltage is on a line. They are also a great fallback for continuity testing when you don't have a __ handy.
cable tester
In multiple chapters in this book, you've read about tools used to configure a network. These hardware tools include (10)__. Some of these tools can also be used in troubleshooting scenarios to help you eliminate or narrow down the possible causes of certain problems. Let's review the tools as listed in the CompTIA Network+ exam objectives (plus a couple I think you should know).
cable testers, TDRs, OTDRs, certifiers, voltage event recorders, protocol analyzers, cable strippers, multimeters, tone probes/generators, and punchdown tools
Hands-on Problems: Outside invisible forces can cause problems with copper cabling. You've read about electromagnetic interference (EMI) and radio frequency interference (RFI) previously in the book. EMI and RFI can disrupt signaling on a copper cable, especially with the very low voltages used today on those cables. These are crazy things to troubleshoot. An interference problem might manifest in a scenario like this one. John can use e-mail on his laptop successfully over the company's wireless network. When he plugs in at his desk in his cubicle, however, e-mail messages just don't get through. Typically, you'd test everything before suspecting EMI or RFI causing this problem. Test the NIC on the laptop by plugging into a known-good port. You'd use a cable tester on the cable. You'd check for continuity between the port in his office to the switch. You'd glance at the
cabling certification documents to see that yes, the cable worked when installed.
Exam tip: Sometimes a GUI tool like Wireshark won't work because a server has no GUI installed. In situations like this, tcpdump is the go-to choice. This great command-line tool not only enables you to monitor and filter packets in the terminal, but
can also create files you can open in Wireshark for later analysis. Even better, it's installed by default on most UNIX/Linux systems.
LAN Problems: Incorrect configuration of any number of options in devices can stop a device from accessing resources over a LAN. These problems can be simple to fix, although tracking down the culprit can take time and patience. One of the most obvious errors occurs when you're duplicating machines and using static IP addresses. As soon as you plug in the duplicated machine with its duplicate IP address, the network will howl. No two computers can have the same IP address on a broadcast domain. The fix for the problem—after the face-palm—is to
change the IP address on the new machine either to an unused static IP or to DHCP.
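If you keep a list of static assignments, you can spot a duplicate before the network howls. The sketch below assumes a simple inventory of (hostname, IP address) pairs; the hostnames, addresses, and the function name `find_duplicate_ips` are invented for illustration.

```python
from collections import defaultdict


def find_duplicate_ips(inventory):
    """Return {ip: [hostnames]} for every IP assigned to more than one host."""
    hosts_by_ip = defaultdict(list)
    for hostname, ip in inventory:
        hosts_by_ip[ip].append(hostname)
    return {ip: hosts for ip, hosts in hosts_by_ip.items() if len(hosts) > 1}


inventory = [
    ("fileserver", "192.168.1.10"),
    ("workstation-a", "192.168.1.50"),
    ("workstation-a-clone", "192.168.1.50"),  # duplicated machine
]
print(find_duplicate_ips(inventory))
# {'192.168.1.50': ['workstation-a', 'workstation-a-clone']}
```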
Troubleshooting process: Test the Theory to Determine Cause: With the third step, you need to test the theory to determine the cause but do so without
changing anything or risking any repercussions. If you have determined that the probable cause for Bob not being able to print is that the printer is turned off, go look. If that's the case, then you should plan out your next step to resolve the problem. Do not act yet! That comes next.
The ping utility uses Internet Control Message Protocol (ICMP) packets to query by IP address or by name. It works across routers, so it's generally the first tool used to
check if a system is reachable. Unfortunately, many devices block ICMP packets, so a failed ping doesn't always point to an offline system.
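A reachability check is easy to script around the built-in ping command. This is a minimal sketch, not a replacement for reading ping's output: the function names are made up, the count flag is `-n` on Windows and `-c` on Linux and macOS, and treating a zero exit code as "reachable" is a simplifying assumption. As noted above, a failed ping may only mean ICMP is blocked.

```python
import platform
import subprocess


def build_ping_command(host, count=4, system=None):
    """Build the ping command line for the given (or current) OS."""
    system = system or platform.system()
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, str(count), host]


def is_reachable(host, count=4):
    """Run ping and report whether it exited successfully."""
    result = subprocess.run(
        build_ping_command(host, count),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

Calling `is_reachable("localhost")` mirrors the first step in many troubleshooting scenarios: confirm the local IP stack works before pinging anything remote.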
Troubleshooting process: Identify the problem: Determine if anything has changed: Determine if anything has changed on the network recently that might have caused the problem. You may not have to ask many questions before the person using the problem system can tell you what has changed, but, in some cases, establishing if anything has changed can take quite a bit of time and involve further work behind the scenes. Here are some examples of questions to ask: -"What exactly was happening when the problem occurred?" -"Has anything been changed on the system recently?" -"Has the system been moved recently?" Notice the way I've tactfully avoided the word you, as in "Have you changed anything on the system recently?" This is a deliberate tactic to avoid any implied blame on the part of the user. Being nice never hurts, and it makes the whole troubleshooting process more friendly. You should also internally ask yourself some isolating questions, such as "Was that machine involved in the software push last night?" or "Didn't a tech visit that machine this morning?" Note you will only be able to answer these questions if your documentation is up to date. Sometimes, isolating a problem may require you to
check system and hardware logs (such as those stored by some routers and other network devices), so make sure you know how to do this.
Network technicians use three different devices to deal with broken cables. Cable testers can tell you if you have a continuity problem or if a wire map isn't correct (Figure 21-1). Time domain reflectometers (TDRs) and optical time domain reflectometers (OTDRs) can tell you where the break is on the cable (Figure 21-2). A TDR works with copper cables and an OTDR works with fiber optics, but otherwise they share the same function. If a problem shows itself as a disconnect and you've first __, then think about using these tools.
checked easier issues that would manifest as disconnects, such as loss of permissions, an unplugged cable, or a server shut off
Troubleshooting process: Establish a Theory of Probable Cause: Once you've identified one or more problems, try to figure out what could have happened. In other words, establish a theory of probable cause. Just keep in mind that a theory is not a fact. You might need to
chuck the theory out the window later in the process and establish a revised theory.
Troubleshooting: First, identify the problem. That means grasping the true problem, rather than what someone tells you. A user might call in and complain that he can't access the Internet from his workstation, for example, which could be the only problem. But the problem could also be that the entire wing of the office just went down and you've got a much bigger problem on your hands. You need to gather information, duplicate the problem (if possible), question users, identify symptoms, determine if anything has changed on the network, and approach multiple problems individually. Following these steps will help you get to the root of the problem. Gather information about the situation. If you are working directly on the affected system and not relying on somebody on the other end of a telephone to guide you, you will identify symptoms through your observation of what is (or isn't) happening. If you're troubleshooting over the telephone (always a joy, in my experience), you will need to question users. These questions can be __, which is to say there can only be a yes-or-no-type answer, such as, "Can you see a light on the front of the monitor?" You can also ask __ questions, such as, "What have you already tried in attempting to fix the problem?"
close-ended; open-ended
Everyone in the local office appears to have full access to local and Internet Web sites. No one, however, can reach a company-operated server at a particular remote site in Istanbul. There has been a recent change to the firewall configuration, so it is up to technician Terry to determine if the firewall change is the culprit or if the problem lies elsewhere. Terry has come up with three possible theories: the remote server is down, the remote site is inaccessible, or the local firewall is preventing communication with the server. He elects to test his theories with the "quickest to test" approach. His first test is to
confirm that all of the local office workstations cannot reach the remote server. Using different hosts, he uses the ping and ping6 utilities. First he pings localhost to confirm the workstation has a working IP stack, then he attempts to ping the remote server and gets no response. Next, he tries the tracert and traceroute utilities on the different hosts. Traceroute shows a functional path to the router that connects the remote office to the Internet, but does not get a response from the server.
The end-to-end principle meant originally that applications and work should happen only at the endpoints in a network. In the early days of networking, this made a lot of sense. Connections weren't always fully reliable and thus were not good for real-time activity. So the work should get done by the computers at the ends of a network connection. The Internet was founded on the end-to-end principle. With modern networks like the Internet, the end-to-end concept has had to evolve. Clearly, anything you do over the Internet goes through many different machines. So, perhaps end-to-end means that the intermediary devices simply don't change the essential data in packets that flow through them. Add in today, though, the fact that plenty of intermediaries want to do a lot of things to your data as it flows through their devices. Thieves want to steal information. Merchants want to sell you things. Advertisers want to intrude on your monitor. Government agencies want to control what you can see or do, or simply want to monitor what you do for later, perhaps benign purposes. Other intermediaries help create trust bonds between your computer and a secure site so that e-commerce can function. That dynamic between the fundamental principle of work only happening on the ends of the connection and all the intermediaries facilitating, pilfering, or punctuating is the current state of the Internet. It's the basic tension between ISP companies that want to build in tiered profit structures and the consumers and creators who want Net Neutrality. As a common issue, end-to-end connectivity refers to connecting users with essential resources within a smaller network, such as a LAN or a private WAN. In such a scenario, the job of the tech is to ensure
connections happen fully. Make sure the proper ports are open on an application server. Make sure the right people have the right permissions to access resources and that white list and black list ACLs are set up correctly.
Hands-on problems refer to things that you can fix at the workstation, work area, or server. These include physical problems and configuration problems. A power failure or power anomalies, such as dips and surges, can make a network device unreachable. We've addressed the fixes for such issues a couple of times already in this book: manage the power to the network device in question and install an uninterruptible power supply (UPS). A hardware failure can certainly make a network device unreachable. Fall back on your CompTIA A+ training for troubleshooting. Check the link lights on the NIC. Try another NIC if the machine seems functional in every other aspect. Ping the localhost. Pay attention to link lights when you have a "hardware failure." The network connection LED status indicators—link lights—can quickly point to a
connectivity issue. Try known good cables/NICs if you run into this issue.
The ping and traceroute utilities are excellent examples of __, applications that enable you to determine if a connection can be made between two computers.
connectivity software
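Because the command names and flags differ by operating system (tracert on Windows, traceroute elsewhere; ping -n on Windows, ping -c on macOS/UNIX/Linux), a script that wraps these utilities has to pick the right form. The sketch below only builds the command lists; the function name connectivity_commands and its parameters are illustrative assumptions.

```python
import platform


def connectivity_commands(target, count=4, system=None):
    """Build ping and traceroute command lists for the given OS.

    system defaults to the OS this script runs on; pass "Windows" or
    "Linux" explicitly to see the other form.
    """
    system = system or platform.system()
    if system == "Windows":
        return {
            "ping": ["ping", "-n", str(count), target],
            "traceroute": ["tracert", target],
        }
    # macOS, Linux, and other UNIX-like systems
    return {
        "ping": ["ping", "-c", str(count), target],
        "traceroute": ["traceroute", target],
    }
```

Each list can be handed to subprocess.run as-is; a nonzero return code from ping generally means the target did not answer.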
Hands-on Problems: Outside invisible forces can cause problems with copper cabling. You've read about electromagnetic interference (EMI) and radio frequency interference (RFI) previously in the book. EMI and RFI can disrupt signaling on a copper cable, especially with the very low voltages used today on those cables. These are crazy things to troubleshoot. An interference problem might manifest in a scenario like this one. John can use e-mail on his laptop successfully over the company's wireless network. When he plugs in at his desk in his cubicle, however, e-mail messages just don't get through. Typically, you'd test everything before suspecting EMI or RFI causing this problem. Test the NIC on the laptop by plugging into a known-good port. You'd use a cable tester on the cable. You'd check for
continuity between the port in his office and the switch. You'd glance at the cabling certification documents to see that yes, the cable worked when installed.
Multimeters test voltage (both AC and DC), resistance, and continuity. They are the unsung heroes of cabling infrastructures because no other tool can tell you how much voltage is on a line. They are also a great fallback for __ when you don't have a cable tester handy.
continuity testing
Multimeters test voltage (both AC and DC), resistance, and continuity. They are the unsung heroes of cabling infrastructures because no other tool can tell you how much voltage is on a line. They are also a great fallback for
continuity testing when you don't have a cable tester handy.
The vast majority of cabling problems occur when the network is first installed or when a change is made. Once a cable has been made, installed, and tested, the chances of it failing are pretty small compared to all of the other network problems that might take place. If you're having trouble connecting to a resource or experiencing performance problems after making a connection, a bad cable likely isn't the culprit. Broken cables don't make intermittent problems, and they don't slow down data. They make permanent disconnects. Network techs define a "broken" cable in numerous ways. First, a broken cable might have an open circuit, where one or more of the wires in a cable simply don't connect from one end of the cable to the other. The signal lacks
continuity. Second, a cable might have a short, where one or more of the wires in a cable connect to another wire in the cable. (Within a normal cable, no wires connect to other wires.)
Network technicians use three different devices to deal with broken cables. Cable testers can tell you if you have a continuity problem or if a wire map isn't correct (Figure 21-1). Time domain reflectometers (TDRs) and optical time domain reflectometers (OTDRs) can tell you where the break is on the cable (Figure 21-2). A TDR works with __ cables
copper
Hands-on Problems: Outside invisible forces can cause problems with copper cabling. You've read about electromagnetic interference (EMI) and radio frequency interference (RFI) previously in the book. EMI and RFI can disrupt signaling on a copper cable, especially with the very low voltages used today on those cables. These are
crazy things to troubleshoot.
A cable stripper or snip (Figure 21-4) helps you to make UTP cables. You'll need a __ (a __) as well.
crimping tool; crimper
Hands-on Problems: Outside invisible forces can cause problems with copper cabling. You've read about electromagnetic interference (EMI) and radio frequency interference (RFI) previously in the book. EMI and RFI can disrupt signaling on a copper cable, especially with the very low voltages used today on those cables. These are crazy things to troubleshoot. An interference problem might manifest in a scenario like this one. John can use e-mail on his laptop successfully over the company's wireless network. When he plugs in at his desk in his cubicle, however, e-mail messages just don't get through. Typically, you'd test everything before suspecting EMI or RFI causing this problem. Test the NIC on the laptop by plugging into a known-good port. You'd use a cable tester on the cable. You'd check for continuity between the port in his office and the switch. You'd glance at the cabling certification documents to see that yes, the cable worked when installed. Only then might a creative tech at her wit's end notice the recently installed, high-powered WAP on the wall outside John's office. RFI strikes! If the installation is new and unproven, a perfectly fine network device might be unreachable because of interface errors, meaning that the installer didn't install the wall jack correctly. The resulting incorrect termination might be a mismatched standard (568A rather than 568B, for example). The cable from the wall to the workstation might be bad or might be a __ rather than __.
crossover cable; straight-through cable
The vast majority of cabling problems occur when the network is first installed or when a change is made. Once a cable has been made, installed, and tested, the chances of it failing are pretty small compared to all of the other network problems that might take place. If you're having trouble connecting to a resource or experiencing performance problems after making a connection, a bad cable likely isn't the culprit. Broken cables don't make intermittent problems, and they don't slow down data. They make permanent disconnects. Network techs define a "broken" cable in numerous ways. First, a broken cable might have an open circuit, where one or more of the wires in a cable simply don't connect from one end of the cable to the other. The signal lacks continuity. Second, a cable might have a short, where one or more of the wires in a cable connect to another wire in the cable. (Within a normal cable, no wires connect to other wires.) Third, a cable might have a wire map problem, where one or more of the wires in a cable don't connect to the proper location on the jack or plug. This can be caused by improperly crimping a cable, for example. Fourth, the cable might experience __, where the electrical signal bleeds from one wire pair to another, creating interference.
crosstalk
Certifiers test a cable to ensure that it can handle its rated amount of capacity. When a cable is not broken but it's not moving data the way it should, turn to a certifier. Look for problems that cause a cable to underperform. A bad installation might increase __, __, or __.
crosstalk, attenuation, or interference.
The netstat utility displays information on the current state of all the running IP processes on a system. It shows what sessions are active and can also provide statistics based on ports or protocols (TCP, UDP, and so on). Typing netstat by itself only shows
current sessions.
The CompTIA Network+ exam objectives contain a detailed troubleshooting methodology that provides a good starting point for our discussion. Here are the basic steps in the troubleshooting process: 1. Identify the problem. d.
d. Identify symptoms.
A packet sniffer, as you'll recall from Chapter 20, intercepts and logs network packets. You have many choices when it comes to packet sniffers. Some sniffers come as programs you run on a computer, while others manifest as
dedicated hardware devices.
LAN Problems: Link Aggregation Problems: NIC teaming provides many more benefits than just increasing bandwidth, such as redundancy. You can team two NICs in a logical unit, but set them up with one NIC as the primary—live—and the second as the hot spare—standby. If the first NIC goes down, the traffic will automatically flow through the second NIC. In a simple network setup for redundancy, you'd make one connection live and the other as standby on each device. Switch A has a live and a standby, Switch B has a live and a standby, and so on. The key here is that multicast traffic to the various devices needs to be enabled on every device through which that traffic might pass. If Switch C doesn't play nice with multicast and it's connected to Switch B, this can cause multicast traffic to stop. One "fix" for this in a Cisco network is to turn off a feature called IGMP snooping, which is enabled by default on Cisco switches. IGMP snooping is normally a good thing, because it helps the switches keep track of devices that use multicast and filter traffic away from devices that don't. The problem with turning off IGMP snooping is that the switches won't map and filter multicast traffic. Instead of only sending to the devices that are set up to receive multicast, the switches will treat multicast messages as broadcast messages and send them to everybody. This is a NIC teaming misconfiguration that can seriously
degrade network performance.
WAN Problems: ISPs and MTUs: I discussed the maximum transmission unit (MTU) in Chapter 7, "Routing." Back in the dark ages (before Windows Vista), Microsoft users often found themselves with terrible connection problems because IP packets were too big to fit into certain network protocols. The largest standard Ethernet payload is 1500 bytes, so some earlier versions of Windows set their MTU size to a value less than 1500 to minimize the fragmentation of packets. The problem cropped up when you tried to connect to a technology other than Ethernet, such as DSL. Some DSL carriers couldn't handle an MTU size greater than 1400. When your network's packets are so large that they must be fragmented to fit into your ISP's packets, we call it an MTU mismatch. As a result, techs would tweak their MTU settings to improve throughput by matching up the MTU sizes between the ISP and their own network. This usually required a manual registry setting adjustment. Path MTU Discovery (PMTUD), a method to __, addresses this problem; it was defined in RFC 1191 (1990) and is enabled by default in modern operating systems.
determine the best MTU setting automatically
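The cost of an MTU mismatch can be made concrete with a little arithmetic. The sketch below counts how many IPv4 fragments a packet needs to cross a link with a smaller MTU. It assumes a 20-byte IPv4 header with no options, and it relies on the IPv4 rule that every fragment except the last carries a payload that is a multiple of 8 bytes; the function name fragment_count is an illustrative assumption.

```python
import math

IPV4_HEADER = 20  # bytes; assumes an IPv4 header without options


def fragment_count(packet_size, link_mtu):
    """Return how many IPv4 fragments a packet needs to fit a link's MTU."""
    if packet_size <= link_mtu:
        return 1  # fits as-is, no fragmentation
    payload = packet_size - IPV4_HEADER
    # Each fragment repeats the header; payload per fragment is rounded
    # down to a multiple of 8 because offsets are counted in 8-byte units.
    per_fragment = (link_mtu - IPV4_HEADER) // 8 * 8
    return math.ceil(payload / per_fragment)
```

With the numbers from the passage, a full 1500-byte packet sent over a link limited to 1400 bytes splits into two fragments, which doubles the per-packet header overhead for every full-size packet.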
Troubleshooting process: Test the Theory to Determine Cause: With the third step, you need to test the theory to determine the cause but do so without changing anything or risking any repercussions. If you have determined that the probable cause for Bob not being able to print is that the printer is turned off, go look. If that's the case, then you should plan out your next step to resolve the problem. Do not act yet! That comes next. If the theory is not confirmed, you need to reestablish a new theory or escalate the problem. Go back to step two and determine a new probable cause. Once you have another idea, test it. The reason you should hesitate to act at this third step is that you might not have permission to make the fix or the fix might cause repercussions you don't fully understand yet. For example, if you walk over to the print server room to see if the printer is powered up and online and find the door padlocked, that's a whole different level of problem. Sure, the printer is turned off, but management has done it for a reason. In this sort of situation, you need to escalate the problem. To escalate has two meanings: either to inform other parties about a problem for guidance or to pass the job off to another authority who has control over the device/issue that's most probably causing the problem. Let's say you have a server with a bad NIC. This server is used heavily by the accounting department, and taking it down may cause problems you don't even know about. You need to inform the accounting manager to consult with them. Alternatively, you'll come across problems over which you have no control or authority. A badly acting server across the country (hopefully) has another person in charge to whom you need to hand over the job. Regardless of how many times you need to go through this process, you'll eventually reach a theory that seems right. Once the theory is confirmed,
determine the next steps you need to take to resolve the problem.
Computers use the Address Resolution Protocol (ARP) utility to resolve IP addresses to MAC addresses. As the computer learns various MAC addresses on its LAN, it jots them down in the ARP table. When Computer A wants to send a message to Computer B, it
determines B's IP address and then checks the ARP table for a corresponding MAC address.
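The table lookup described above can be modeled with a dictionary. This is a toy sketch, not how an operating system implements ARP: the addresses are made up, and broadcast_request is a hypothetical callback standing in for the ARP broadcast a real host sends when the table has no entry.

```python
def resolve_mac(ip, arp_table, broadcast_request):
    """Return the MAC address for ip, consulting the ARP table first."""
    mac = arp_table.get(ip)
    if mac is None:
        # Cache miss: ask the LAN who owns this IP address.
        mac = broadcast_request(ip)
        if mac is not None:
            arp_table[ip] = mac  # jot the answer down for next time
    return mac


# Hypothetical LAN used only for illustration.
lan_hosts = {"192.168.1.20": "aa:bb:cc:00:00:20"}
table = {"192.168.1.1": "aa:bb:cc:00:00:01"}

resolve_mac("192.168.1.20", table, lan_hosts.get)
```

After the call, table also holds the entry for 192.168.1.20, so the next message to that host needs no broadcast. On a real system, arp -a shows the equivalent table.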
Almost every new networking person I teach will, at some point, ask me: "What tools do I need to buy?" My answer shocks them: "None. Don't buy a thing." It's not so much that you don't need tools, but rather that
different networking jobs require wildly different tools. Plenty of network techs never crimp a cable. An equal number never open a system. Some techs do nothing all day but pull cable. The tools you need are defined by your job.
The nslookup (all operating systems) and dig (macOS/UNIX/Linux) utilities help diagnose DNS problems. These tools are very powerful, but the CompTIA Network+ exam won't ask you more than basic questions, such as how to use them to see if a DNS server is working. When working on Windows systems, the nslookup utility is your only choice by default. On macOS/UNIX/Linux systems, you should prefer the __ utility.
dig
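When neither tool is at hand, or when the check needs to run from a script, the operating system's own resolver can answer the basic question of whether a name resolves. The sketch below uses Python's socket.getaddrinfo; the function name resolve_name is an assumption. Note that it consults the local hosts file as well as DNS, so it is not a drop-in substitute for querying a specific DNS server.

```python
import socket


def resolve_name(hostname):
    """Return the sorted list of addresses the system resolver gives for hostname."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # Resolution failed: no such name, or no resolver reachable.
        return []
    # Each entry's sockaddr starts with the address string; drop duplicates.
    return sorted({info[4][0] for info in infos})
```

An empty list for a name that should exist, combined with a successful ping to the same server by IP address, points toward DNS rather than basic connectivity.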
LAN Problems: Misconfigurations of server settings can block all or some access to resources on a LAN. Misconfigured DHCP settings on a host above can cause problems, but they will be limited to the host. If these settings are misconfigured on the DHCP server, however, many more machines and people can be affected. A misconfigured DNS server might
direct hosts to incorrect sites or no sites at all. It might appear as an unresponsive service and just do nothing. Misconfigured DNS settings on a client result in names not resolving and cause the network to appear to be down for the user.
Network technicians use three different devices to deal with broken cables. Cable testers can tell you if you have a continuity problem or if a wire map isn't correct (Figure 21-1). Time domain reflectometers (TDRs) and optical time domain reflectometers (OTDRs) can tell you where the break is on the cable (Figure 21-2). A TDR works with copper cables and an OTDR works with fiber optics, but otherwise they share the same function. If a problem shows itself as a __ and you've first checked easier issues that would manifest as __, such as loss of permissions, an unplugged cable, or a server shut off, then think about using these tools.
disconnect
Certifiers test a cable to ensure that it can handle its rated amount of capacity. When a cable is not broken but it's not moving data the way it should, turn to a certifier. Look for problems that cause a cable to underperform. A bad installation might increase crosstalk, attenuation, or interference. A certifier can pick up an impedance mismatch as well. Most of these problems show up at installation, but running a certifier to eliminate cabling as a problem is never a bad idea. Don't use a certifier for
disconnects, only slowdowns. All certifiers need some kind of loopback adapter on the other end of the cable run to provide termination and return of a signal. A loopback adapter is a small device with a single port.
LAN Problems: Server misconfigurations: Misconfigurations of server settings can block all or some access to resources on a LAN. Misconfigured DHCP settings on a host above can cause problems, but they will be limited to the host. If these settings are misconfigured on the DHCP server, however, many more machines and people can be affected. A misconfigured DNS server might direct hosts to incorrect sites or no sites at all. It might appear as an unresponsive service and just do nothing. Misconfigured DNS settings on a client result in names not resolving and cause the network to appear to be down for the user. You'll be clued in to such misconfiguration by using ping and other tools. If you can ping a file server by IP address but not by name, this points to DNS issues. Similarly, if a computer fails in __, like connecting to a networked printer, DHCP or DNS misconfiguration can be the culprit. To fix the issue, go into the network configuration for the client or the server and find the misconfigured settings.
discovering neighboring devices/nodes
The route utility enables you to
display and edit the local system's routing table. To show the routing table, just type route print or netstat -r.
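Reading a routing table is easier once you know how the system picks a row: it uses the matching route with the longest prefix, and the default route (0.0.0.0/0) only wins when nothing more specific matches. The sketch below shows that selection with Python's ipaddress module; the three routes and gateway addresses are hypothetical, and real tables carry extra columns such as interface and metric.

```python
import ipaddress

# Hypothetical routing table: (destination prefix, gateway).
ROUTES = [
    ("0.0.0.0/0", "192.168.1.1"),        # default route
    ("192.168.1.0/24", "on-link"),       # the local subnet
    ("10.10.0.0/16", "192.168.1.254"),   # a static route to another site
]


def next_hop(destination, routes=ROUTES):
    """Return the gateway of the most specific route matching destination."""
    dest = ipaddress.ip_address(destination)
    best = None
    for prefix, gateway in routes:
        net = ipaddress.ip_network(prefix)
        # Keep the match with the longest prefix seen so far.
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, gateway)
    return best[1] if best else None
```

If a destination unexpectedly falls through to the default gateway, a missing or mistyped specific route is a likely suspect.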
Troubleshooting process: Establish a Theory of Probable Cause: Consider multiple approaches when tackling problems. This will keep you from locking your imagination into a single train of thought. You can use the OSI seven-layer model as a troubleshooting tool in several ways to help with this process. Here's a scenario to work through. Martha can't access the database server to start her workday. The problem manifests this way: She opens the database client on her computer, then clicks on recent documents, one of which is the current project that management has assigned to her team. Nothing happens. Normally, the database client will connect to the database that resides on the server on the other side of the network. Try a top-to-bottom or bottom-to-top OSI model approach to the problem. Sometimes it pays to try both. Here are some ideas on how this might help. You might imagine the reverse model in some situations. If the network was newly installed, for example, running through some of the basic connectivity at Layers 1 and 2 might be a good first approach. Another option for tackling multiple options is to use the __ approach.
divide and conquer
Troubleshooting process: Implement the Solution or Escalate as Necessary: Once you think you have isolated the cause of the problem, you should decide what you think is the best way to fix it and then implement the solution, whether that's giving advice over the phone to a user, installing a replacement part, or adding a software patch. Or, if the solution you propose requires either more skill than you possess at the moment or falls into someone else's purview, escalate as necessary to get the fix implemented. If you're the implementer, follow these guidelines. All the way through implementation, try only one likely solution at a time. There's no point in installing several patches at once, because then you can't tell which one fixed the problem. Similarly, there's no point in replacing several items of hardware (such as a hard disk and its controller cable) at the same time, because then you can't tell which part (or parts) was faulty. As you try each possibility, always
document what you do and what results you get. This isn't just for a future problem either—during a lengthy troubleshooting process, it's easy to forget exactly what you tried two hours before or which thing you tried produced a particular result. Although being methodical may take longer, it will save time the next time—and it may enable you to pinpoint what needs to be done to stop the problem from recurring at all, thereby reducing future call volume to your support team—and as any support person will tell you, that's definitely worth the effort!
Troubleshooting process: Identify the problem: Determine if anything has changed: Determine if anything has changed on the network recently that might have caused the problem. You may not have to ask many questions before the person using the problem system can tell you what has changed, but, in some cases, establishing if anything has changed can take quite a bit of time and involve further work behind the scenes. Here are some examples of questions to ask: -"What exactly was happening when the problem occurred?" -"Has anything been changed on the system recently?" -"Has the system been moved recently?" Notice the way I've tactfully avoided the word you, as in "Have you changed anything on the system recently?" This is a deliberate tactic to avoid any implied blame on the part of the user. Being nice never hurts, and it makes the whole troubleshooting process more friendly. You should also internally ask yourself some isolating questions, such as "Was that machine involved in the software push last night?" or "Didn't a tech visit that machine this morning?" Note you will only be able to answer these questions if your
documentation is up to date. Sometimes, isolating a problem may require you to check system and hardware logs (such as those stored by some routers and other network devices), so make sure you know how to do this.
DF stands for
don't fragment
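In an IPv4 header, DF is a single bit in the 16-bit field that holds three flag bits followed by a 13-bit fragment offset. The sketch below decodes that field from its raw integer value, following the layout in RFC 791; the function name decode_flags and the dictionary keys are illustrative assumptions.

```python
DF_MASK = 0x4000      # don't fragment
MF_MASK = 0x2000      # more fragments follow
OFFSET_MASK = 0x1FFF  # 13-bit fragment offset, counted in 8-byte units


def decode_flags(flags_and_offset):
    """Decode the 16-bit IPv4 flags/fragment-offset field."""
    return {
        "DF": bool(flags_and_offset & DF_MASK),
        "MF": bool(flags_and_offset & MF_MASK),
        "fragment_offset_bytes": (flags_and_offset & OFFSET_MASK) * 8,
    }
```

A value of 0x4000, common in packet captures, means DF is set and the packet is not a fragment. A router that cannot forward such a packet without fragmenting it drops the packet instead.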
Exam tip: CompTIA continues to include __ as a common network issue, although that's not how networks work today. Every NIC, switch, and router features autosensing and autonegotiating ports. You plug two devices in and, as long as they're not otherwise misconfigured, they'll run at the same speed—most likely at full duplex.
duplex/speed mismatch
LAN Problems: Incorrect configuration of any number of options in devices can stop a device from accessing resources over a LAN. These problems can be simple to fix, although tracking down the culprit can take time and patience. One of the most obvious errors occurs when you're duplicating machines and using static IP addresses. As soon as you plug in the duplicated machine with its __, the network will howl.
duplicate IP address
LAN Problems: Incorrect configuration of any number of options in devices can stop a device from accessing resources over a LAN. These problems can be simple to fix, although tracking down the culprit can take time and patience. One of the most obvious errors occurs when you're duplicating machines and using static IP addresses. As soon as you plug in the duplicated machine with its duplicate IP address, the network will howl. No two computers can have the same IP address on a broadcast domain. The fix for the problem—after the face-palm—is to change the IP address on the new machine either to an unused static IP or to DHCP. A related issue comes from duplicate MAC addresses, something that can happen when working with virtual machines or, rarely, as a result of a manufacturing error. The effect is the same as
duplicate IP addresses. Either put the devices on different VLANs or swap out NICs to avoid duplication.
LAN Problems: Incorrect configuration of any number of options in devices can stop a device from accessing resources over a LAN. These problems can be simple to fix, although tracking down the culprit can take time and patience. One of the most obvious errors occurs when you're duplicating machines and using static IP addresses. As soon as you plug in the duplicated machine with its duplicate IP address, the network will howl. No two computers can have the same IP address on a broadcast domain. The fix for the problem—after the face-palm—is to change the IP address on the new machine either to an unused static IP or to DHCP. A related issue comes from __, something that can happen when working with virtual machines or, rarely, as a result of a manufacturing error.
duplicate MAC addresses
Troubleshooting: First, identify the problem. That means grasping the true problem, rather than what someone tells you. A user might call in and complain that he can't access the Internet from his workstation, for example, which could be the only problem. But the problem could also be that the entire wing of the office just went down and you've got a much bigger problem on your hands. You need to gather information, duplicate the problem (if possible), question users, identify symptoms, determine if anything has changed on the network, and approach multiple problems individually. Following these steps will help you get to the root of the problem. Gather information about the situation. If you are working directly on the affected system and not relying on somebody on the other end of a telephone to guide you, you will identify symptoms through your observation of what is (or isn't) happening. If you're troubleshooting over the telephone (always a joy, in my experience), you will need to question users. These questions can be close-ended, which is to say there can only be a yes-or-no-type answer, such as, "Can you see a light on the front of the monitor?" You can also ask open-ended questions, such as, "What have you already tried in attempting to fix the problem?" The type of question you ask at any given moment depends on what information you need and on the user's knowledge level. If, for example, the user seems to be technically oriented, you will probably be able to ask more close-ended questions because they will know what you are talking about. If, on the other hand, the user seems to be confused about what's happening, open-ended questions will allow him or her to explain in his or her own words what is going on. One of the first steps in trying to determine the cause of a problem is to understand the extent of the problem. Is it specific to one user or is it network-wide?
Sometimes this entails trying the task yourself, both from the user's machine and from your own or another machine. For example, if a user is experiencing problems logging into the network, you might need to go to that user's machine and try to use his or her user name to log in. In other words, try to
duplicate the problem. Doing this tells you whether the problem is a user error of some kind, as well as enables you to see the symptoms of the problem yourself. Next, you probably want to try logging in with your own user name from that machine, or have the user try to log in from another machine.
LAN Problems: Incorrect configuration of any number of options in devices can stop a device from accessing resources over a LAN. These problems can be simple to fix, although tracking down the culprit can take time and patience. One of the most obvious errors occurs when you're
duplicating machines and using static IP addresses. As soon as you plug in the duplicated machine with its duplicate IP address, the network will howl. No two computers can have the same IP address on a broadcast domain. The fix for the problem—after the face-palm—is to change the IP address on the new machine either to an unused static IP or to DHCP.
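If you keep a list of static assignments in your documentation, a duplicate can be caught before the cloned machine is ever plugged in. The sketch below groups hosts by IP address and reports any address claimed more than once; the host names and addresses are made up, and find_duplicate_ips is a hypothetical helper.

```python
from collections import defaultdict


def find_duplicate_ips(assignments):
    """Return {ip: [hosts]} for every IP address assigned to more than one host."""
    hosts_by_ip = defaultdict(list)
    for host, ip in assignments:
        hosts_by_ip[ip].append(host)
    return {ip: hosts for ip, hosts in hosts_by_ip.items() if len(hosts) > 1}


# Hypothetical static assignments; ws-02 was cloned from ws-01.
static_ips = [
    ("ws-01", "192.168.1.50"),
    ("ws-02", "192.168.1.50"),
    ("printer", "192.168.1.60"),
]

conflicts = find_duplicate_ips(static_ips)
```

Here conflicts maps 192.168.1.50 to both workstations. The same grouping works for MAC addresses if you feed it (host, MAC) pairs instead.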
The CompTIA Network+ exam objectives contain a detailed troubleshooting methodology that provides a good starting point for our discussion. Here are the basic steps in the troubleshooting process: 1. Identify the problem. e.
e. Determine if anything has changed
The vast majority of cabling problems occur when the network is first installed or when a change is made. Once a cable has been made, installed, and tested, the chances of it failing are pretty small compared to all of the other network problems that might take place. If you're having trouble connecting to a resource or experiencing performance problems after making a connection, a bad cable likely isn't the culprit. Broken cables don't make intermittent problems, and they don't slow down data. They make permanent disconnects. Network techs define a "broken" cable in numerous ways. First, a broken cable might have an open circuit, where one or more of the wires in a cable simply don't connect from one end of the cable to the other. The signal lacks continuity. Second, a cable might have a short, where one or more of the wires in a cable connect to another wire in the cable. (Within a normal cable, no wires connect to other wires.) Third, a cable might have a wire map problem, where one or more of the wires in a cable don't connect to the proper location on the jack or plug. This can be caused by improperly crimping a cable, for example. Fourth, the cable might experience crosstalk, where the electrical signal bleeds from one wire pair to another, creating interference. Fifth, a broken cable might pick up noise, spurious signals usually caused by faulty hardware or poorly crimped jacks. Finally, a broken cable might have impedance mismatch. Impedance is the natural electrical resistance of a cable. When cables of different types—think thickness, composition of the metal, and so on—connect and the flow of electrons is not uniform, it can cause a unique type of electrical noise, called an
echo.
Troubleshooting process: Test the Theory to Determine Cause: With the third step, you need to test the theory to determine the cause but do so without changing anything or risking any repercussions. If you have determined that the probable cause for Bob not being able to print is that the printer is turned off, go look. If that's the case, then you should plan out your next step to resolve the problem. Do not act yet! That comes next. If the theory is not confirmed, you need to reestablish a new theory or escalate the problem. Go back to step two and determine a new probable cause. Once you have another idea, test it. The reason you should hesitate to act at this third step is that you might not have permission to make the fix or the fix might cause repercussions you don't fully understand yet. For example, if you walk over to the print server room to see if the printer is powered up and online and find the door padlocked, that's a whole different level of problem. Sure, the printer is turned off, but management has done it for a reason. In this sort of situation, you need to escalate the problem. To escalate has two meanings:
either to inform other parties about a problem for guidance or to pass the job off to another authority who has control over the device/issue that's most probably causing the problem. Let's say you have a server with a bad NIC. This server is used heavily by the accounting department, and taking it down may cause problems you don't even know about. You need to inform the accounting manager to consult with them. Alternatively, you'll come across problems over which you have no control or authority. A badly acting server across the country (hopefully) has another person in charge to whom you need to hand over the job.
Hands-on Problems: Outside invisible forces can cause problems with copper cabling. You've read about __ and __ previously in the book.
electromagnetic interference (EMI); radio frequency interference (RFI)
The ping and traceroute utilities are excellent examples of connectivity software, applications that
enable you to determine if a connection can be made between two computers.
The end-to-end principle meant originally that applications and work should happen only at the endpoints in a network. In the early days of networking, this made a lot of sense. Connections weren't always fully reliable and thus were not good for real-time activity. So the work should get done by the computers at the ends of a network connection. The Internet was founded on the end-to-end principle. With modern networks like the Internet, the end-to-end concept has had to evolve. Clearly, anything you do over the Internet goes through many different machines. So, perhaps end-to-end means that the intermediary devices simply don't change the essential data in packets that flow through them. Add in today, though, the fact that plenty of intermediaries want to do a lot of things to your data as it flows through their devices. Thieves want to steal information. Merchants want to sell you things. Advertisers want to intrude on your monitor. Government agencies want to control what you can see or do, or simply want to monitor what you do for later, perhaps benign purposes. Other intermediaries help create trust bonds between your computer and a secure site so that e-commerce can function. That dynamic between the fundamental principle of work only happening on the ends of the connection and all the intermediaries facilitating, pilfering, or punctuating is the current state of the Internet. It's the basic tension between ISP companies that want to build in tiered profit structures and the consumers and creators who want Net Neutrality. As a common issue, __ refers to connecting users with essential resources within a smaller network, such as a LAN or a private WAN. In such a scenario, the job of the tech is to ensure connections happen fully. Make sure the proper ports are open on an application server. Make sure the right people have the right permissions to access resources and that white list and black list ACLs are set up correctly.
end-to-end connectivity
The __ meant originally that applications and work should happen only at the endpoints in a network. In the early days of networking, this made a lot of sense. Connections weren't always fully reliable and thus were not good for real-time activity. So the work should get done by the computers at the ends of a network connection. The Internet was founded on the __.
end-to-end principle
LAN Problems: Most clients will use DHCP for IP address, subnet mask, and default gateway settings. With manual configuration, on the other hand,
errors can creep in and cause a device to fail to connect to network resources. A typical scenario is with a bring your own device (BYOD) environment, where an employee will bring in a manually configured laptop—that he didn't remember was tuned to his home network—and complain about not being able to access the LAN or the Internet.
Troubleshooting process: Implement the Solution or Escalate as Necessary: Once you think you have isolated the cause of the problem, you should decide what you think is the best way to fix it and then implement the solution, whether that's giving advice over the phone to a user, installing a replacement part, or adding a software patch. Or, if the solution you propose requires either more skill than you possess at the moment or falls into someone else's purview,
escalate as necessary to get the fix implemented.
WAN Problems: Router Problems: Routers enable networks to connect to other networks, which you know well by now. Problems with routers simply make those connections not work. (Recall that physical problems with routers or router interface modules were covered in Chapter 8 and Chapter 13.) Loss of power or a bad module can certainly wreck a tech's day, but the fixes are pretty simple: provide power or replace the module. Router configuration issues can be a bit trickier. The ways to mess up a router are many. You can specify the wrong routing protocol, for example, or misconfigure the right routing protocol. An access control list (ACL) might include addresses to block that shouldn't be blocked or allow access to network resources for nodes that shouldn't have it. Incorrect ACL settings can lead to blocked TCP/UDP ports that shouldn't be blocked. A misconfiguration can lead to missing IP routes so that some destinations just aren't there for users. Improperly configured routers aren't going to send packets to the proper destination. The symptoms are clear: every system that uses the misconfigured router as a default gateway is either not able to get packets out or not able to get packets in, or sometimes both. Web pages don't come up, FTP servers suddenly disappear, and e-mail clients can't access their servers. In these cases, you need to verify first that everything in your area of responsibility works. If that is true, then
escalate the problem and find the person responsible for the router.
Troubleshooting process: Test the Theory to Determine Cause: With the third step, you need to test the theory to determine the cause but do so without changing anything or risking any repercussions. If you have determined that the probable cause for Bob not being able to print is that the printer is turned off, go look. If that's the case, then you should plan out your next step to resolve the problem. Do not act yet! That comes next. If the theory is not confirmed, you need to reestablish a new theory or escalate the problem. Go back to step two and determine a new probable cause. Once you have another idea, test it. The reason you should hesitate to act at this third step is that you might not have permission to make the fix or the fix might cause repercussions you don't fully understand yet. For example, if you walk over to the print server room to see if the printer is powered up and online and find the door padlocked, that's a whole different level of problem. Sure, the printer is turned off, but management has done it for a reason. In this sort of situation, you need to
escalate the problem.
Troubleshooting process: Establish a Plan of Action and Identify Potential Effects: By this point, you should have some ideas as to what the problem might be. It's time to "look before you leap" and
establish a plan of action to resolve the problem. An action plan defines how you are going to fix this problem. Most problems are simple, but if the problem is complex, you need to write down the steps. As you do this, think about what else might happen as you go about the repair. Identify the potential effects of the actions you're about to take, especially the unintended ones. If you take out a switch without a replacement switch at hand, the users might experience excessive downtime while you hunt for a new switch and move them over. If you replace a router, can you restore all the old router's settings to the new one or will you have to rebuild from scratch?
Troubleshooting process: Identify the problem: Determine if anything has changed: Determine if anything has changed on the network recently that might have caused the problem. You may not have to ask many questions before the person using the problem system can tell you what has changed, but, in some cases,
establishing if anything has changed can take quite a bit of time and involve further work behind the scenes.
WAN Problems: Router Problems: Routers enable networks to connect to other networks, which you know well by now. Problems with routers simply make those connections not work. (Recall that physical problems with routers or router interface modules were covered in Chapter 8 and Chapter 13.) Loss of power or a bad module can certainly wreck a tech's day, but the fixes are pretty simple: provide power or replace the module. Router configuration issues can be a bit trickier. The ways to mess up a router are many. You can specify the wrong routing protocol, for example, or misconfigure the right routing protocol. An access control list (ACL) might include addresses to block that shouldn't be blocked or allow access to network resources for nodes that shouldn't have it. Incorrect ACL settings can lead to blocked TCP/UDP ports that shouldn't be blocked. A misconfiguration can lead to missing IP routes so that some destinations just aren't there for users. Improperly configured routers aren't going to send packets to the proper destination. The symptoms are clear:
every system that uses the misconfigured router as a default gateway is either not able to get packets out or not able to get packets in, or sometimes both. Web pages don't come up, FTP servers suddenly disappear, and e-mail clients can't access their servers. In these cases, you need to verify first that everything in your area of responsibility works. If that is true, then escalate the problem and find the person responsible for the router.
LAN Problems: An expired IP address can cause a system not to connect. Release/renew to obtain a proper IP address from the DHCP server. If the DHCP server's scope of IP addresses has been claimed, that release/renew won't work. You'll get an error that points to an
exhausted DHCP scope. The only fix for this is to make changes at the DHCP server.
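The idea of an exhausted scope can be illustrated with a small sketch. The function name and the addresses below are hypothetical, and a real DHCP server tracks leases in its own database; this only shows why a release/renew cannot succeed once every address in the range is claimed.

```python
import ipaddress

def free_addresses(scope_start, scope_end, leased):
    """Return the addresses in the scope that are not currently leased.

    scope_start/scope_end are the first and last addresses of a
    hypothetical DHCP scope; leased is an iterable of address strings.
    """
    start = int(ipaddress.ip_address(scope_start))
    end = int(ipaddress.ip_address(scope_end))
    pool = {ipaddress.ip_address(n) for n in range(start, end + 1)}
    taken = {ipaddress.ip_address(a) for a in leased}
    return sorted(pool - taken)

# An empty result means the scope is exhausted: a client has nothing to renew into.
remaining = free_addresses(
    "192.168.1.100", "192.168.1.102",
    ["192.168.1.100", "192.168.1.101", "192.168.1.102"],
)
print("exhausted" if not remaining else f"{len(remaining)} addresses free")
```

If the list comes back empty, the only remedies are on the server side, such as widening the scope or shortening lease times.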
Troubleshooting process: Establish a Theory of Probable Cause: Once you've identified one or more problems, try to figure out what could have happened. In other words, establish a theory of probable cause. Just keep in mind that a theory is not a fact. You might need to chuck the theory out the window later in the process and establish a revised theory. This step comes down to
experience—or good use of the support tools at your disposal, such as your knowledge base. You need to select the most probable cause from all the possible causes, so the solution you choose fixes the problem the first time. This may not always happen, but whenever possible, you want to avoid spending a whole day stabbing in the dark while the problem snores softly to itself in some cozy, neglected corner of your network.
WAN Problems: Certificate Problems: SSL/TLS certificates have __ and companies need to maintain them properly.
expiration dates
LAN Problems: An __ can cause a system not to connect. Release/renew to obtain a proper IP address from the DHCP server.
expired IP address
Troubleshooting: First, identify the problem. That means grasping the true problem, rather than what someone tells you. A user might call in and complain that he can't access the Internet from his workstation, for example, which could be the only problem. But the problem could also be that the entire wing of the office just went down and you've got a much bigger problem on your hands. You need to gather information, duplicate the problem (if possible), question users, identify symptoms, determine if anything has changed on the network, and approach multiple problems individually. Following these steps will help you get to the root of the problem. Gather information about the situation. If you are working directly on the affected system and not relying on somebody on the other end of a telephone to guide you, you will identify symptoms through your observation of what is (or isn't) happening. If you're troubleshooting over the telephone (always a joy, in my experience), you will need to question users. These questions can be close-ended, which is to say there can only be a yes-or-no-type answer, such as, "Can you see a light on the front of the monitor?" You can also ask open-ended questions, such as, "What have you already tried in attempting to fix the problem?" The type of question you ask at any given moment depends on what information you need and on the user's knowledge level. If, for example, the user seems to be technically oriented, you will probably be able to ask more close-ended questions because they will know what you are talking about. If, on the other hand, the user seems to be confused about what's happening, open-ended questions will allow him or her to explain in his or her own words what is going on. One of the first steps in trying to determine the cause of a problem is to understand the
extent of the problem. Is it specific to one user or is it network-wide? Sometimes this entails trying the task yourself, both from the user's machine and from your own or another machine.
The CompTIA Network+ exam objectives contain a detailed troubleshooting methodology that provides a good starting point for our discussion. Here are the basic steps in the troubleshooting process: 1. Identify the problem. f.
f. Approach multiple problems individually.
LAN Problems: Most clients will use DHCP for IP address, subnet mask, and default gateway settings. With manual configuration, on the other hand, errors can creep in and cause a device to fail to connect to network resources. A typical scenario is with a bring your own device (BYOD) environment, where an employee will bring in a manually configured laptop—that he didn't remember was tuned to his home network—and complain about not being able to access the LAN or the Internet. Anything that doesn't match the LAN settings will cause a client to
fail to connect. An IP address that doesn't match the subnet, for example, will bring no love. An error in the subnet mask settings—an incorrect netmask issue in CompTIA speak—will stop client access cold. A DNS server setting that's not accurate can cause name resolution failure. If the default gateway address is incorrect—an incorrect gateway issue—then there's no Internet for the client.
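The mismatches described above can be checked mechanically. The following is a minimal sketch using Python's standard `ipaddress` module; the function name and all addresses are made up for illustration, and the gateway test only confirms the gateway sits on the local subnet, not that it is the right router.

```python
import ipaddress

def check_static_config(ip, netmask, gateway, lan_cidr):
    """Return a list of problems found in a manually configured client.

    lan_cidr is the network the LAN actually uses, e.g. "10.10.0.0/16".
    """
    problems = []
    lan = ipaddress.ip_network(lan_cidr)
    # Compare the configured mask with the mask the LAN really uses.
    if ipaddress.ip_address(netmask) != lan.netmask:
        problems.append("incorrect netmask")
    # The host address must fall inside the LAN's subnet.
    if ipaddress.ip_address(ip) not in lan:
        problems.append("IP address not on the LAN subnet")
    # A gateway outside the local subnet can never be reached directly.
    if ipaddress.ip_address(gateway) not in lan:
        problems.append("incorrect gateway")
    return problems

# A laptop still carrying home-network settings on a 10.10.0.0/16 office LAN.
print(check_static_config("192.168.1.50", "255.255.255.0", "192.168.1.1", "10.10.0.0/16"))
```

An empty list means the three settings are at least consistent with the LAN; DNS server settings would need a separate check.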
The vast majority of cabling problems occur when the network is first installed or when a change is made. Once a cable has been made, installed, and tested, the chances of it failing are pretty small compared to all of the other network problems that might take place. If you're having trouble connecting to a resource or experiencing performance problems after making a connection, a bad cable likely isn't the culprit. Broken cables don't make intermittent problems, and they don't slow down data. They make permanent disconnects. Network techs define a "broken" cable in numerous ways. First, a broken cable might have an open circuit, where one or more of the wires in a cable simply don't connect from one end of the cable to the other. The signal lacks continuity. Second, a cable might have a short, where one or more of the wires in a cable connect to another wire in the cable. (Within a normal cable, no wires connect to other wires.) Third, a cable might have a wire map problem, where one or more of the wires in a cable don't connect to the proper location on the jack or plug. This can be caused by improperly crimping a cable, for example. Fourth, the cable might experience crosstalk, where the electrical signal bleeds from one wire pair to another, creating interference. Fifth, a broken cable might pick up noise, spurious signals usually caused by (2)
faulty hardware or poorly crimped jacks. Finally, a broken cable might have impedance mismatch. Impedance is the natural electrical resistance of a cable. When cables of different types—think thickness, composition of the metal, and so on—connect and the flow of electrons is not uniform, it can cause a unique type of electrical noise, called an echo.
Network technicians use three different devices to deal with broken cables. Cable testers can tell you if you have a continuity problem or if a wire map isn't correct (Figure 21-1). Time domain reflectometers (TDRs) and optical time domain reflectometers (OTDRs) can tell you where the break is on the cable (Figure 21-2). A TDR works with copper cables and an OTDR works with
fiber optics, but otherwise they share the same function. If a problem shows itself as a disconnect and you've first checked easier issues that would manifest as disconnects, such as loss of permissions, an unplugged cable, or a server shut off, then think about using these tools.
Troubleshooting process: Establish a Theory of Probable Cause: Once you've identified one or more problems, try to
figure out what could have happened. In other words, establish a theory of probable cause. Just keep in mind that a theory is not a fact. You might need to chuck the theory out the window later in the process and establish a revised theory.
Troubleshooting process: Implement the Solution or Escalate as Necessary: Once you think you have isolated the cause of the problem, you should decide what you think is the best way to fix it and then implement the solution, whether that's giving advice over the phone to a user, installing a replacement part, or adding a software patch. Or, if the solution you propose requires either more skill than you possess at the moment or falls into someone else's purview, escalate as necessary to get the fix implemented. If you're the implementer, follow these guidelines. All the way through implementation, try only one likely solution at a time. There's no point in installing several patches at once, because then you can't tell which one fixed the problem. Similarly, there's no point in replacing several items of hardware (such as a hard disk and its controller cable) at the same time, because then you can't tell which part (or parts) was faulty. As you try each possibility, always document what you do and what results you get. This isn't just for a future problem either—during a lengthy troubleshooting process, it's easy to
forget exactly what you tried two hours before or which thing you tried produced a particular result. Although being methodical may take longer, it will save time the next time—and it may enable you to pinpoint what needs to be done to stop the problem from recurring at all, thereby reducing future call volume to your support team—and as any support person will tell you, that's definitely worth the effort!
As you'll recall from back in Chapter 18, "Managing Risk," a port scanner is a program that probes ports on another system, logging the state of the scanned ports. These tools are used to look for unintentionally opened ports that might make a system vulnerable to attack. As you might imagine, they also are used by hackers to break into systems. The most famous of all port scanners is probably the powerful and __ Nmap.
free
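To make the idea of "probing ports and logging their state" concrete, here is a minimal TCP connect scan in Python. It is a sketch only and is in no way how Nmap itself works internally; the function name is hypothetical, and a port reported as closed may simply be filtered by a firewall. Only scan systems you are authorized to test.

```python
import socket

def scan_ports(host, ports, timeout=0.5):
    """Return {port: "open" | "closed or filtered"} using plain TCP connects."""
    results = {}
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising an exception.
            is_open = sock.connect_ex((host, port)) == 0
        results[port] = "open" if is_open else "closed or filtered"
    return results

if __name__ == "__main__":
    # Check a few common service ports on the local machine.
    for port, state in scan_ports("127.0.0.1", [22, 80, 443]).items():
        print(port, state)
```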
Troubleshooting process: First, identify the problem. That means grasping the true problem, rather than what someone tells you. A user might call in and complain that he can't access the Internet from his workstation, for example, which could be the only problem. But the problem could also be that the entire wing of the office just went down and you've got a much bigger problem on your hands. You need to (6)__. Following these steps will help you get to the root of the problem.
gather information, duplicate the problem (if possible), question users, identify symptoms, determine if anything has changed on the network, and approach multiple problems individually.
Beyond Local - Escalate: Switching Loops: Also known as a bridging loop, a switching loop is when you connect and configure multiple switches together in such a way that causes a circular path to appear. Switching loops are rare because all switches use the Spanning Tree Protocol (STP), but they do happen. The symptoms are identical to a broadcast storm: every computer on the broadcast domain can no longer access the network. The good part about switching loops is that they rarely take place on a well-running network. Someone had to break something, and that means someone, somewhere is messing with the switch configuration. Escalate the problem, and
get the team to help you find the person making changes to the switches.
Beyond Local - Escalate: Proxy ARP: Proxy ARP is the process of making remotely connected computers truly act as though they are on the same LAN as local computers. Proxy ARP is done in a number of different ways, with a Virtual Private Network (VPN) as the classic example. If a laptop in an airport connects to a network through a VPN, that computer takes on the network ID of your local network. In order for all of this to work, the VPN concentrator needs to allow some very LAN-type traffic to go through it that would normally never get through a router. ARP is a great example. If your VPN client wants to talk to another computer on the LAN, it has to send an ARP request to get the MAC address. Your VPN device is designed to act as a proxy for all that type of data. Almost all proxy ARP problems take place on the VPN concentrator. With misconfigured proxy ARP settings, the VPN concentrator can send what looks like a denial of service (DoS) attack on the LAN. (A DoS attack is usually directed at a server exposed on the Internet, like a Web server. See Chapter 19, "Protecting Your Network," for more details on these and other malicious attacks.) If your clients start receiving a large number of packets from the VPN concentrator, assume you have a proxy ARP problem and escalate by
getting the person in charge of the VPN to fix it.
LAN Problems: Server misconfigurations: Misconfigurations of server settings can block all or some access to resources on a LAN. Misconfigured DHCP settings on a host, as described above, can cause problems, but they will be limited to the host. If these settings are misconfigured on the DHCP server, however, many more machines and people can be affected. A misconfigured DNS server might direct hosts to incorrect sites or no sites at all. It might appear as an unresponsive service and just do nothing. Misconfigured DNS settings on a client result in names not resolving and cause the network to appear to be down for the user. You'll be clued into such misconfiguration by using ping and other tools. If you can ping a file server by IP address but not by name, this points to DNS issues. Similarly, if a computer fails in discovering neighboring devices/nodes, like connecting to a networked printer, DHCP or DNS misconfiguration can be the culprit. To fix the issue,
go into the network configuration for the client or the server and find the misconfigured settings.
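The "works by IP address but not by name" test can be written down as a small decision helper. This is a sketch with hypothetical function names: `can_resolve` uses the standard `socket.getaddrinfo` call, while the reachability result is assumed to come from your own ping test.

```python
import socket

def can_resolve(name):
    """Return True if the local resolver can turn the name into an address."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except socket.gaierror:
        return False

def classify(reachable_by_ip, name_resolves):
    """Interpret the two observations the way the text describes."""
    if reachable_by_ip and not name_resolves:
        return "likely DNS problem"
    if not reachable_by_ip:
        return "connectivity problem: check IP settings, cabling, or routing first"
    return "name resolution and connectivity both look fine"

# Example: the file server answers pings by address, but its name fails to resolve.
print(classify(reachable_by_ip=True, name_resolves=False))
```

In practice you would feed `classify` the outcome of a ping to the server's IP address and the result of `can_resolve` on its host name.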
WAN Problems: ISPs and MTUs: I discussed the maximum transmission unit (MTU) in Chapter 7, "Routing." Back in the dark ages (before Windows Vista), Microsoft users often found themselves with terrible connection problems because IP packets were too big to fit into certain network protocols. The largest standard Ethernet payload is 1500 bytes, so some earlier versions of Windows set their MTU size to a value less than 1500 to minimize the fragmentation of packets. The problem cropped up when you tried to connect to a technology other than Ethernet, such as DSL. Some DSL carriers couldn't handle an MTU size greater than 1400. When your network's packets are so large that they must be fragmented to fit into your ISP's packets, we call it an MTU mismatch. As a result, techs would tweak their MTU settings to improve throughput by matching up the MTU sizes between the ISP and their own network. This usually required a manual registry setting adjustment. Path MTU Discovery (PMTU), a method to determine the best MTU setting automatically, removes the need for that manual tuning. PMTU relies on the "Don't Fragment (DF) flag" in the IP header. A PMTU-aware operating system can automatically send a series of fixed-size ICMP packets (basically just pings) with the DF flag set to another device to see if it works. If it doesn't work, the system lowers the MTU size and tries again until the ping is successful. Imagine the hassle of adjusting the MTU size manually. That's the beauty of PMTU: you can automatically set your MTU size to the perfect amount. Unfortunately, PMTU runs under ICMP; most routers have firewall features that, by default, are configured to block ICMP requests, making PMTU worthless. This is called a PMTU or MTU black hole. If you're having terrible connection problems and you've checked everything else, you need to consider this issue. In many cases, __ is all you need to do to fix the problem.
going into the router and turning off ICMP blocking in the firewall
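The lower-and-retry loop described in this card can be simulated without touching the network. In the sketch below, `probe` is a hypothetical stand-in for "send a DF-flagged ping of this size and see whether it gets through"; the starting size, floor, and step are illustrative assumptions, not values any operating system is known to use.

```python
def discover_path_mtu(probe, start_mtu=1500, floor=576, step=10):
    """Lower the candidate MTU until probe(size) succeeds.

    Returns the first size that gets through, or None if nothing at or
    above the floor works (the black hole case, e.g. ICMP is blocked).
    """
    mtu = start_mtu
    while mtu >= floor:
        if probe(mtu):
            return mtu
        mtu -= step
    return None

# Simulated path: the smallest hop MTU (a DSL link at 1400) limits the whole path.
hop_mtus = [1500, 1400, 1500]
print(discover_path_mtu(lambda size: size <= min(hop_mtus)))

# Simulated black hole: every probe is silently dropped.
print(discover_path_mtu(lambda size: False))
```

The second call shows why blocked ICMP defeats the process: with no probe ever succeeding, the loop has nothing to converge on.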
In multiple chapters in this book, you've read about tools used to configure a network. These __ tools include cable testers, TDRs, OTDRs, certifiers, voltage event recorders, protocol analyzers, cable strippers, multimeters, tone probes/generators, and punchdown tools.
hardware
Hands-on problems refer to things that you can fix at the workstation, work area, or server. These include physical problems and configuration problems. A power failure or power anomalies, such as dips and surges, can make a network device unreachable. We've addressed the fixes for such issues a couple of times already in this book: manage the power to the network device in question and install an uninterruptible power supply (UPS). A ___ can certainly make a network device unreachable. Fall back on your CompTIA A+ training for troubleshooting. Check the link lights on the NIC. Try another NIC if the machine seems functional in every other aspect. Ping the localhost.
hardware failure
Troubleshooting process: Implement the Solution or Escalate as Necessary: Once you think you have isolated the cause of the problem, you should decide what you think is the best way to fix it and then implement the solution, whether that's giving advice over the phone to a user, installing a replacement part, or adding a software patch. Or, if the solution you propose requires either more skill than you possess at the moment or falls into someone else's purview, escalate as necessary to get the fix implemented. If you're the implementer, follow these guidelines. All the way through implementation, try only one likely solution at a time. There's no point in installing several patches at once, because then you can't tell which one fixed the problem. Similarly, there's no point in replacing several items of hardware (such as a hard disk and its controller cable) at the same time, because then you can't tell which part (or parts) was faulty. As you try each possibility, always document what you do and what results you get. This isn't just for a future problem either—during a lengthy troubleshooting process, it's easy to forget exactly what you tried two hours before or which thing you tried produced a particular result. Although being methodical may take longer, it will save time the next time—and it may enable you to pinpoint what needs to be done to stop the problem from recurring at all, thereby reducing future call volume to your support team—and as any support person will tell you, that's definitely worth the effort! Then you need to test the solution. This is the part everybody hates. Once you think you've fixed a problem, you should try to make it happen again. If you can't, great! But sometimes you will be able to re-create the problem, and then you know you haven't finished the job at hand. 
Many techs want to slide away quietly as soon as everything seems to be fine, but trust me on this, it won't impress your customer when her problem flares up again 30 seconds after you've left the building—not to mention that you get the joy of another two-hour car trip the next day to fix the same problem, for an even more unhappy client! In the scenario where you are providing support to someone else rather than working directly on the problem, you should have
her try to re-create the problem. This tells you whether she understands what you have been telling her and educates her at the same time, lessening the chance that she'll call you back later and ask, "Can we just go through that one more time?"
The nslookup (all operating systems) and dig (macOS/UNIX/Linux) utilities help diagnose DNS problems. These tools are very powerful, but the CompTIA Network+ exam won't ask you more than basic questions, such as
how to use them to see if a DNS server is working. When working on Windows systems, the nslookup utility is your only choice by default. On macOS/UNIX/Linux systems, you should prefer the dig utility. Both utilities will help in troubleshooting your DNS issues, but dig provides more verbose output by default. You need to be comfortable working with both utilities when troubleshooting modern networks.
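A basic "is this DNS server answering?" query looks slightly different in the two tools. The sketch below builds the command line for each; the function name is hypothetical, and 192.0.2.53 is a documentation-only address standing in for your DNS server.

```python
import platform

def dns_query_command(hostname, server=None, system=None):
    """Build an nslookup (Windows) or dig (macOS/UNIX/Linux) command as a list."""
    system = system or platform.system()
    if system == "Windows":
        # nslookup takes the name first, then an optional server.
        cmd = ["nslookup", hostname]
        if server:
            cmd.append(server)
    else:
        # dig takes the server as @server ahead of the name.
        cmd = ["dig"]
        if server:
            cmd.append("@" + server)
        cmd.append(hostname)
    return cmd

print(" ".join(dns_query_command("example.com", "192.0.2.53", system="Windows")))
print(" ".join(dns_query_command("example.com", "192.0.2.53", system="Linux")))
```

The resulting list can be passed to `subprocess.run` on a machine that has the tool installed; a timeout or "server can't find" style response suggests the DNS server is not working for that name.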
Troubleshooting process: Establish a Plan of Action and Identify Potential Effects: By this point, you should have some ideas as to what the problem might be. It's time to "look before you leap" and establish a plan of action to resolve the problem. An action plan defines
how you are going to fix this problem. Most problems are simple, but if the problem is complex, you need to write down the steps. As you do this, think about what else might happen as you go about the repair. Identify the potential effects of the actions you're about to take, especially the unintended ones. If you take out a switch without a replacement switch at hand, the users might experience excessive downtime while you hunt for a new switch and move them over. If you replace a router, can you restore all the old router's settings to the new one or will you have to rebuild from scratch?
The CompTIA Network+ exam objectives contain a detailed troubleshooting methodology that provides a good starting point for our discussion. Here are the basic steps in the troubleshooting process: 2. Establish a theory of probable cause. b. Consider multiple approaches: i.
i. Top-to-bottom/bottom-to-top OSI model
Troubleshooting: First, identify the problem. That means grasping the true problem, rather than what someone tells you. A user might call in and complain that he can't access the Internet from his workstation, for example, which could be the only problem. But the problem could also be that the entire wing of the office just went down and you've got a much bigger problem on your hands. You need to gather information, duplicate the problem (if possible), question users, identify symptoms, determine if anything has changed on the network, and approach multiple problems individually. Following these steps will help you get to the root of the problem. Gather information about the situation. If you are working directly on the affected system and not relying on somebody on the other end of a telephone to guide you, you will __ through your observation of what is (or isn't) happening.
identify symptoms
No single person is truly in control of an entire Internet-connected network. Large organizations split network support duties into very skill-specific areas: routers, cable infrastructure, user administration, and so on. Even in a tiny network with a single network support person, problems will arise that go beyond the tech's skill level or that involve equipment the organization doesn't own (usually it's their ISP's gear). In these situations, the tech needs to
identify the problem and, instead of trying to fix it on his or her own, escalate the issue.
Troubleshooting process: First,
identify the problem. That means grasping the true problem, rather than what someone tells you. A user might call in and complain that he can't access the Internet from his workstation, for example, which could be the only problem. But the problem could also be that the entire wing of the office just went down and you've got a much bigger problem on your hands. You need to gather information, duplicate the problem (if possible), question users, identify symptoms, determine if anything has changed on the network, and approach multiple problems individually. Following these steps will help you get to the root of the problem.
Network technicians use three different devices to deal with broken cables. Cable testers can tell you
if you have a continuity problem or if a wire map isn't correct (Figure 21-1).
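What a cable tester reports can be modeled in a few lines. This sketch assumes a hypothetical data format in which each near-end pin maps to the set of far-end pins that show continuity with it; it then labels opens, shorts, and wire map errors for an eight-wire straight-through cable.

```python
# Expected result for a straight-through cable: pin n connects only to pin n.
STRAIGHT_THROUGH = {pin: pin for pin in range(1, 9)}

def check_wire_map(measured, expected=STRAIGHT_THROUGH):
    """Return {pin: fault} for every pin that does not test clean."""
    faults = {}
    for pin, want in expected.items():
        seen = measured.get(pin, set())
        if not seen:
            faults[pin] = "open"            # no continuity end to end
        elif len(seen) > 1:
            faults[pin] = "short"           # wire touches another wire
        elif seen != {want}:
            faults[pin] = "wire map error"  # lands on the wrong pin
    return faults

# A badly crimped cable: pins 1 and 2 swapped, pin 3 open, pins 4 and 5 shorted.
bad = {p: {p} for p in range(1, 9)}
bad.update({1: {2}, 2: {1}, 3: set(), 4: {4, 5}, 5: {4, 5}})
print(check_wire_map(bad))
```

Passing a different `expected` mapping would let the same check validate a crossover cable instead.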
The CompTIA Network+ exam objectives contain a detailed troubleshooting methodology that provides a good starting point for our discussion. Here are the basic steps in the troubleshooting process: 2. Establish a theory of probable cause. b. Consider multiple approaches: ii.
ii. Divide and conquer
Certifiers test a cable to ensure that it can handle its rated amount of capacity. When a cable is not broken but it's not moving data the way it should, turn to a certifier. Look for problems that cause a cable to underperform. A bad installation might increase crosstalk, attenuation, or interference. A certifier can pick up an __ as well.
impedance mismatch
Troubleshooting process: Verify Full System Functionality and Implement Preventative Measures: Okay, now that you have changed something on the system in the process of solving one problem, you must think about the wider repercussions of what you have done. If you've replaced a faulty NIC in a server, for instance, will the fact that the MAC address has changed (remember, it's built into the NIC) affect anything else, such as the logon security controls or your network management and inventory software? If you've installed a patch on a client PC, will this change the default protocol or any other default settings that may affect other functionality? If you've changed a user's security settings, will this affect his or her ability to access other network resources? This is part of testing your solution to make sure it works properly, but it also makes you think about the impact of your work on the system as a whole. Make sure you verify full system functionality. If you think you fixed the problem between Martha's workstation and the database server, have her open the database while you're still there. That way you don't have to make a second tech call to resolve an outstanding issue. This saves time and money and helps your customer do his or her job better. Everybody wins. Also at this time, if applicable,
implement preventative measures to avoid a repeat of the problem. If that means you need to educate the user to do or not do something, teach him or her tactfully. If you need to install software or patch a system, do it now.
The vast majority of cabling problems occur when the network is first installed or when a change is made. Once a cable has been made, installed, and tested, the chances of it failing are pretty small compared to all of the other network problems that might take place. If you're having trouble connecting to a resource or experiencing performance problems after making a connection, a bad cable likely isn't the culprit. Broken cables don't make intermittent problems, and they don't slow down data. They make permanent disconnects. Network techs define a "broken" cable in numerous ways. First, a broken cable might have an open circuit, where one or more of the wires in a cable simply don't connect from one end of the cable to the other. The signal lacks continuity. Second, a cable might have a short, where one or more of the wires in a cable connect to another wire in the cable. (Within a normal cable, no wires connect to other wires.) Third, a cable might have a wire map problem, where one or more of the wires in a cable don't connect to the proper location on the jack or plug. This can be caused by __, for example.
improperly crimping a cable
The extremely transparent fiber-optic cables allow light to shine but have some inherent
impurities in the glass that can reduce light transmission. Dust, poor connections, and light leakage can also degrade the strength of light pulses as they travel through a fiber-optic run. To measure the amount of light loss, technicians use an optical power meter, also referred to as a light meter (see Figure 21-3).
Hands-on Problems: Aside from obvious physical problems, other hands-on problems you can fix manifest as some sort of misconfiguration. An __, such as setting a PC to a static IP address that's not on the same network ID as other resources, would result in a "dead-to-me" network.
incorrect IP configuration
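To see why a static address on the wrong network ID leaves a PC cut off, here is a minimal sketch using Python's standard ipaddress module. The function name same_network and every address shown are hypothetical and chosen only for illustration.

```python
import ipaddress


def same_network(host_cidr: str, other_ip: str) -> bool:
    """Return True if other_ip lies inside the network implied by host_cidr."""
    iface = ipaddress.ip_interface(host_cidr)
    return ipaddress.ip_address(other_ip) in iface.network


# Hypothetical file server at 192.168.1.10 on a /24 LAN.
print(same_network("192.168.1.50/24", "192.168.1.10"))  # PC configured for the LAN
print(same_network("192.168.5.50/24", "192.168.1.10"))  # PC with a mistyped static address
```

The first call reports that the server is local to the PC; the second shows that the PC would treat the same server as being on a different network.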
Hands-on Problems: Outside invisible forces can cause problems with copper cabling. You've read about electromagnetic interference (EMI) and radio frequency interference (RFI) previously in the book. EMI and RFI can disrupt signaling on a copper cable, especially with the very low voltages used today on those cables. These are crazy things to troubleshoot. An interference problem might manifest in a scenario like this one. John can use e-mail on his laptop successfully over the company's wireless network. When he plugs in at his desk in his cubicle, however, e-mail messages just don't get through. Typically, you'd test everything before suspecting EMI or RFI causing this problem. Test the NIC on the laptop by plugging into a known-good port. You'd use a cable tester on the cable. You'd check for continuity between the port in his office to the switch. You'd glance at the cabling certification documents to see that yes, the cable worked when installed. Only then might a creative tech at her wit's end notice the recently installed, high-powered WAP on the wall outside Tom's office. RFI strikes! If the installation is new and unproven, a perfectly fine network device might be unreachable because of interface errors, meaning that the installer didn't install the wall jack correctly. The resulting incorrect termination might be a mismatched standard (568A rather than 568B, for example). The cable from the wall to the workstation might be bad or might be a crossover cable rather than straight-through cable. That's an __, according to the CompTIA Network+ objectives. Try another cable.
incorrect cable type
Hands-on Problems: Aside from obvious physical problems, other hands-on problems you can fix manifest as some sort of misconfiguration. An incorrect IP configuration, such as setting a PC to a static IP address that's not on the same network ID as other resources, would result in a "dead-to-me" network. A similar fate would result from inputting __ information.
incorrect default gateway IP address
LAN Problems: Most clients will use DHCP for IP address, subnet mask, and default gateway settings. With manual configuration, on the other hand, errors can creep in and cause a device to fail to connect to network resources. A typical scenario is with a bring your own device (BYOD) environment, where an employee will bring in a manually configured laptop—that he didn't remember was tuned to his home network—and complain about not being able to access the LAN or the Internet. Anything that doesn't match the LAN settings will cause a client to fail to connect. An IP address that doesn't match the subnet, for example, will bring no love. An error in the subnet mask settings—an incorrect netmask issue in CompTIA speak—will stop client access cold. A DNS server setting that's not accurate can cause name resolution failure. If the default gateway address is incorrect—an __ issue—then there's no Internet for the client.
incorrect gateway
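The BYOD scenario above can be sketched as a comparison between a client's manual settings and what the LAN expects. This is only an illustration: check_client, the issue labels, and all addresses are assumptions, not part of any real tool.

```python
import ipaddress


def check_client(ip, netmask, gateway, lan_cidr, lan_gateway):
    """Compare a manually configured client against the LAN's expected settings."""
    lan = ipaddress.ip_network(lan_cidr)
    issues = []
    if ipaddress.ip_address(ip) not in lan:
        issues.append("incorrect IP address")
    if ipaddress.ip_address(netmask) != lan.netmask:
        issues.append("incorrect netmask")
    if ipaddress.ip_address(gateway) != ipaddress.ip_address(lan_gateway):
        issues.append("incorrect gateway")
    return issues


# Hypothetical laptop still tuned to a home network (192.168.0.0/24),
# plugged into an office LAN that uses 10.10.0.0/16.
print(check_client("192.168.0.23", "255.255.255.0", "192.168.0.1",
                   "10.10.0.0/16", "10.10.0.1"))
```

A client that matches the LAN settings returns an empty list, while the home-configured laptop trips all three checks.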
Everyone in the local office appears to have full access to local and Internet Web sites. No one, however, can reach a company-operated server at a particular remote site in Istanbul. There has been a recent change to the firewall configuration, so it is up to technician Terry to determine if the firewall change is the culprit or if the problem lies elsewhere. Terry has come up with three possible theories: the remote server is down, the remote site is inaccessible, or the local firewall is preventing communication with the server. He elects to test his theories with the "quickest to test" approach. His first test is to confirm that all of the local office workstations cannot reach the remote server. Using different hosts, he uses the ping and ping6 utilities. First he pings localhost to confirm the workstation has a working IP stack, then he attempts to ping the remote server and gets no response. Next, he tries the tracert and traceroute utilities on the different hosts. Traceroute shows a functional path to the router that connects the remote office to the Internet, but does not get a response from the server. So far, everything seems to confirm that the local office cannot get to the remote server. Just to be able to say he tried everything, Terry runs the mtr utility from a Linux box and lets it run for an extended time. At the same time, he runs the pathping utility from a Windows computer. Neither utility can contact the server. He tries all of these utilities on some other company resources and Internet sites and has no problems connecting. Confident that the reported symptom is confirmed, Terry puts in a call to the remote site to ask about the status. The virtual PBX sends Terry to voicemail for every extension that he calls. This could point to a network disconnection at the site or to everyone being out of the office there. Since it is 3:00 a.m. at the remote site, Terry does not have a clear answer. 
The next quick test to perform is to see if the site is reachable from outside of the local office. This will confirm or eliminate his theory of a local __ issue.
incorrect host-based firewall settings
WAN Problems: Appliance Problems: Many of the boxes that people refer to as "routers" contain many features, such as routing, Network Address Translation (NAT), switching, an intrusion detection system (IDS), a firewall, and more. These complex boxes, such as the Cisco Adaptive Security Appliance (ASA), are called network appliances. One common issue with network appliances is technician error. By default, for example, NAT rules take precedence over an appliance's routing table entries. If the tech fails to set the NAT rule order correctly, traffic that should be routed to go out one interface—like to the DMZ network—can go out an __—like to the inside network.
incorrect interface
LAN Problems: Most clients will use DHCP for IP address, subnet mask, and default gateway settings. With manual configuration, on the other hand, errors can creep in and cause a device to fail to connect to network resources. A typical scenario is with a bring your own device (BYOD) environment, where an employee will bring in a manually configured laptop—that he didn't remember was tuned to his home network—and complain about not being able to access the LAN or the Internet. Anything that doesn't match the LAN settings will cause a client to fail to connect. An IP address that doesn't match the subnet, for example, will bring no love. An error in the subnet mask settings—an __ issue in CompTIA speak—will stop client access cold. A DNS server setting that's not accurate can cause name resolution failure. If the default gateway address is incorrect—an incorrect gateway issue—then there's no Internet for the client.
incorrect netmask
Hands-on Problems: Aside from obvious physical problems, other hands-on problems you can fix manifest as some sort of misconfiguration. An incorrect IP configuration, such as setting a PC to a static IP address that's not on the same network ID as other resources, would result in a "dead-to-me" network. A similar fate would result from inputting incorrect default gateway IP address information. The same is true with an __—that is, the subnet mask isn't accurate. The system will go nowhere, fast.
incorrect netmask setting
Hands-on Problems: Outside invisible forces can cause problems with copper cabling. You've read about electromagnetic interference (EMI) and radio frequency interference (RFI) previously in the book. EMI and RFI can disrupt signaling on a copper cable, especially with the very low voltages used today on those cables. These are crazy things to troubleshoot. An interference problem might manifest in a scenario like this one. John can use e-mail on his laptop successfully over the company's wireless network. When he plugs in at his desk in his cubicle, however, e-mail messages just don't get through. Typically, you'd test everything before suspecting EMI or RFI causing this problem. Test the NIC on the laptop by plugging into a known-good port. You'd use a cable tester on the cable. You'd check for continuity between the port in his office to the switch. You'd glance at the cabling certification documents to see that yes, the cable worked when installed. Only then might a creative tech at her wit's end notice the recently installed, high-powered WAP on the wall outside Tom's office. RFI strikes! If the installation is new and unproven, a perfectly fine network device might be unreachable because of interface errors, meaning that the installer didn't install the wall jack correctly. The resulting __ might be a mismatched standard (568A rather than 568B, for example).
incorrect termination
LAN Problems: Time Issues: Most devices these days rely on the NIST time servers on the Internet to regulate time. Every once in a while (like on the CompTIA Network+ exam), you'll see a scenario where machines, isolated from the Internet (and thus removed from a time server), will get out of sync. This can result in __ issues that stop services from working properly. Did I mention that this is rare?
incorrect time
Troubleshooting process: Test the Theory to Determine Cause: With the third step, you need to test the theory to determine the cause but do so without changing anything or risking any repercussions. If you have determined that the probable cause for Bob not being able to print is that the printer is turned off, go look. If that's the case, then you should plan out your next step to resolve the problem. Do not act yet! That comes next. If the theory is not confirmed, you need to establish a new theory or escalate the problem. Go back to step two and determine a new probable cause. Once you have another idea, test it. The reason you should hesitate to act at this third step is that you might not have permission to make the fix or the fix might cause repercussions you don't fully understand yet. For example, if you walk over to the print server room to see if the printer is powered up and online and find the door padlocked, that's a whole different level of problem. Sure, the printer is turned off, but management has done it for a reason. In this sort of situation, you need to escalate the problem. To escalate has two meanings: either to inform other parties about a problem for guidance or to pass the job off to another authority who has control over the device/issue that's most probably causing the problem. Let's say you have a server with a bad NIC. This server is used heavily by the accounting department, and taking it down may cause problems you don't even know about. You need to
inform the accounting manager to consult with them. Alternatively, you'll come across problems over which you have no control or authority. A badly acting server across the country (hopefully) has another person in charge to whom you need to hand over the job.
The netstat utility displays
information on the current state of all the running IP processes on a system. It shows what sessions are active and can also provide statistics based on ports or protocols (TCP, UDP, and so on). Typing netstat by itself only shows current sessions. Typing netstat -r shows the routing table (100 percent identical to route print). If you want to know about your current sessions, netstat is the tool to use.
Certifiers test a cable to ensure that it can handle its rated amount of capacity. When a cable is not broken but it's not moving data the way it should, turn to a certifier. Look for problems that cause a cable to underperform. A bad installation might increase crosstalk, attenuation, or interference. A certifier can pick up an impedance mismatch as well. Most of these problems show up at
installation, but running a certifier to eliminate cabling as a problem is never a bad idea. Don't use a certifier for disconnects, only slowdowns. All certifiers need some kind of loopback adapter on the other end of the cable run to provide termination and return of a signal. A loopback adapter is a small device with a single port.
LAN Problems: Link Aggregation Problems: Ethernet networks (traditionally) don't scale easily. If you have a Gigabit Ethernet connection between the main switch and a very busy file server, that connection by definition can handle up to 1 Gbps bandwidth. If that connection becomes saturated, the only way to bump up the bandwidth cap on that single connection would be to upgrade both the switch and the server NIC to the next higher Ethernet standard, 10-Gigabit Ethernet. That's a big jump and an expensive one, plus it's an upgrade of 1000 percent! What if you needed to bump bandwidth up by only 20 percent? The scaling issue became obvious early on, so manufacturers came up with ways to use multiple NICs in tandem to increase bandwidth in smaller increments, what's called link aggregation or NIC teaming. Numerous protocols enable two or more connections to work together simultaneously, such as the vendor-neutral IEEE 802.3ad specification Link Aggregation Control Protocol (LACP) and the Cisco-proprietary Port Aggregation Protocol (PAgP). Let's focus on the former for a common network issue scenario. To enable LACP between two devices, such as the switch and file server just noted, each device needs two or more
interconnected network interfaces configured for LACP. When the two devices interact, they will make sure they can communicate over multiple physical ports at the same speeds and form a single logical port that takes advantage of the full combined bandwidth (Figure 21-12).
Hands-on Problems: Outside invisible forces can cause problems with copper cabling. You've read about electromagnetic interference (EMI) and radio frequency interference (RFI) previously in the book. EMI and RFI can disrupt signaling on a copper cable, especially with the very low voltages used today on those cables. These are crazy things to troubleshoot. An interference problem might manifest in a scenario like this one. John can use e-mail on his laptop successfully over the company's wireless network. When he plugs in at his desk in his cubicle, however, e-mail messages just don't get through. Typically, you'd test everything before suspecting EMI or RFI causing this problem. Test the NIC on the laptop by plugging into a known-good port. You'd use a cable tester on the cable. You'd check for continuity between the port in his office to the switch. You'd glance at the cabling certification documents to see that yes, the cable worked when installed. Only then might a creative tech at her wit's end notice the recently installed, high-powered WAP on the wall outside Tom's office. RFI strikes! If the installation is new and unproven, a perfectly fine network device might be unreachable because of
interface errors, meaning that the installer didn't install the wall jack correctly. The resulting incorrect termination might be a mismatched standard (568A rather than 568B, for example). The cable from the wall to the workstation might be bad or might be a crossover cable rather than straight-through cable. That's an incorrect cable type, according to the CompTIA Network+ objectives. Try another cable.
LAN Problems: Adding VLANS: When you add VLANs into the network mix, all sorts of fun network issues can crop up. As an example, suppose Bill has a 24-port managed switch segmented into four VLANs, one for each group in the office: Management, Sales, Marketing, and Development (Figure 21-11). Bill thought he'd assigned six ports to each VLAN when he set up the switch, but by mistake he assigned seven ports to VLAN 1 and only five ports to VLAN 2. Merrily plugging in the patch cables for each group of users, Bill gets called up by his boss asking why Cindy over in Sales suddenly can see resources reserved for management. This obviously points to an __ that resulted in a VLAN mismatch.
interface misconfiguration
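Bill's mistake is the kind of thing a quick audit of the port-to-VLAN assignments can catch. The sketch below assumes the assignments are available as a simple Python dictionary; the function find_miscounts and the way ports are numbered are hypothetical, not taken from any switch vendor's tooling.

```python
from collections import Counter


def find_miscounts(port_to_vlan, planned):
    """Return {vlan: (planned_count, actual_count)} for VLANs that differ from the plan."""
    actual = Counter(port_to_vlan.values())
    return {
        vlan: (count, actual.get(vlan, 0))
        for vlan, count in planned.items()
        if actual.get(vlan, 0) != count
    }


# Plan: six ports for each of the four VLANs on a 24-port switch.
planned = {1: 6, 2: 6, 3: 6, 4: 6}

# What was actually configured: seven ports on VLAN 1, five on VLAN 2.
port_to_vlan = {}
for port in range(1, 25):
    if port <= 7:
        port_to_vlan[port] = 1
    elif port <= 12:
        port_to_vlan[port] = 2
    elif port <= 18:
        port_to_vlan[port] = 3
    else:
        port_to_vlan[port] = 4

print(find_miscounts(port_to_vlan, planned))
```

Only VLANs 1 and 2 are reported, which points straight at the extra port that let a Sales user land in the Management VLAN.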
WAN Problems: Appliance Problems: Many of the boxes that people refer to as "routers" contain many features, such as routing, Network Address Translation (NAT), switching, an intrusion detection system (IDS), a firewall, and more. These complex boxes, such as the Cisco Adaptive Security Appliance (ASA), are called network appliances. One common issue with network appliances is technician error. By default, for example, NAT rules take precedence over an appliance's routing table entries. If the tech fails to set the NAT rule order correctly, traffic that should be routed to go out one interface—like to the DMZ network—can go out an incorrect interface—like to the inside network. Users on the outside would expect a response from something but instead get nothing, all because of a NAT
interface misconfiguration.
The vast majority of cabling problems occur when the network is first installed or when a change is made. Once a cable has been made, installed, and tested, the chances of it failing are pretty small compared to all of the other network problems that might take place. If you're having trouble connecting to a resource or experiencing performance problems after making a connection, a bad cable likely isn't the culprit. Broken cables don't make __, and they don't __. They make permanent disconnects.
intermittent problems; slow down data
The ipconfig (Windows), ifconfig (macOS and UNIX), and ip (Linux) utilities tell you almost anything you want to know about a computer's IP settings. Make sure you know that typing ipconfig alone only gives basic information. Typing "__" gives detailed information (like DNS servers and MAC address).
ipconfig /all
Exam tip: The __ utility in Linux enables command-line control over IPv4 tables, rules that determine what happens with an IPv4 packet when it encounters a firewall.
iptables
Hands-on problems refer to things that you can fix at the workstation, work area, or server. These include physical problems and configuration problems. A power failure or power anomalies, such as dips and surges, can make a network device unreachable. We've addressed the fixes for such issues a couple of times already in this book: manage the power to the network device in question and install an uninterruptible power supply (UPS). A hardware failure can certainly make a network device unreachable. Fall back on your CompTIA A+ training for troubleshooting. Check the link lights on the NIC. Try another NIC if the machine seems functional in every other aspect. Ping the localhost. Pay attention to link lights when you have a "hardware failure." The network connection LED status indicators—link lights—can quickly point to a connectivity issue. Try __ if you run into this issue.
known good cables/NICs
Hands-on problems refer to things that you can fix at the workstation, work area, or server. These include physical problems and configuration problems. A power failure or power anomalies, such as dips and surges, can make a network device unreachable. We've addressed the fixes for such issues a couple of times already in this book: manage the power to the network device in question and install an uninterruptible power supply (UPS). A hardware failure can certainly make a network device unreachable. Fall back on your CompTIA A+ training for troubleshooting. Check the link lights on the NIC. Try another NIC if the machine seems functional in every other aspect. Ping the localhost. Pay attention to link lights when you have a "hardware failure." The network connection LED status indicators—link lights—can quickly point to a connectivity issue. Try known good cables/NICs if you run into this issue. Hot-swappable transceivers (which you read about way back in Chapter 4, "Modern Ethernet") can go bad. The key when working with small form-factor pluggable (SFP) or the much older gigabit interface converter (GBIC) transceivers is that you need to check both the media and the module. In other words, a seemingly bad SFP/GBIC could be the cable connected to it or the transceiver. As with other hardware issues, try
known-good components to troubleshoot.
Exam tip: The CompTIA Network+ exam objectives use the term __. The more accurate term in this context is either power meter or optical power meter. You may see any of these terms on the exam.
light meter
optical power meter also referred to as a
light meter
The extremely transparent fiber-optic cables allow light to shine but have some inherent impurities in the glass that can reduce light transmission. Dust, poor connections, and light leakage can also degrade the strength of light pulses as they travel through a fiber-optic run. To measure the amount of light loss, technicians use an optical power meter, also referred to as a
light meter (see Figure 21-3).
Ethernet networks (traditionally) don't scale easily. If you have a Gigabit Ethernet connection between the main switch and a very busy file server, that connection by definition can handle up to 1 Gbps bandwidth. If that connection becomes saturated, the only way to bump up the bandwidth cap on that single connection would be to upgrade both the switch and the server NIC to the next higher Ethernet standard, 10-Gigabit Ethernet. That's a big jump and an expensive one, plus it's an upgrade of 1000 percent! What if you needed to bump bandwidth up by only 20 percent? The scaling issue became obvious early on, so manufacturers came up with ways to use multiple NICs in tandem to increase bandwidth in smaller increments, what's called __ or __.
link aggregation; NIC teaming
Hands-on problems refer to things that you can fix at the workstation, work area, or server. These include physical problems and configuration problems. A power failure or power anomalies, such as dips and surges, can make a network device unreachable. We've addressed the fixes for such issues a couple of times already in this book: manage the power to the network device in question and install an uninterruptible power supply (UPS). A hardware failure can certainly make a network device unreachable. Fall back on your CompTIA A+ training for troubleshooting. Check the link lights on the NIC. Try another NIC if the machine seems functional in every other aspect. Ping the localhost. Pay attention to __ when you have a "hardware failure."
link lights
So here's the common network error with LACP setups. An aggregated connection set to active on both ends (active-active) automatically talks, negotiates, and works. One set to active on one end and passive on the other (active-passive) will talk, negotiate, and work. But if you set both sides to passive (passive-passive), neither will initiate the conversation and LACP will not engage. Setting both ends to passive when you want to use LACP is an example of NIC teaming misconfiguration. NIC teaming provides many more benefits than just increasing bandwidth, such as redundancy. You can team two NICs in a logical unit, but set them up with one NIC as the primary—__—and the second as the hot spare—__.
live; standby
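The active/passive rule described above boils down to one condition: at least one end has to be active. Here is a minimal sketch of that logic; the function name lacp_forms is hypothetical, and it models only the mode combinations described in the text, not the full protocol.

```python
def lacp_forms(side_a: str, side_b: str) -> bool:
    """Return True if an LACP aggregation can negotiate for the given mode pair."""
    modes = {"active", "passive"}
    if side_a not in modes or side_b not in modes:
        raise ValueError("mode must be 'active' or 'passive'")
    # A passive end only answers; an active end must start the conversation.
    return "active" in (side_a, side_b)


for pair in [("active", "active"), ("active", "passive"), ("passive", "passive")]:
    print(pair, lacp_forms(*pair))
```

Only the passive-passive pair fails to negotiate, which is the NIC teaming misconfiguration the text describes.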
Network problems fall into several basic categories, and most of these problems you or a network tech in the proper place can fix. Fixing problems at the workstation, work area, or server is a network tech's bread and butter. The same is true of connecting to resources on the LAN. Problems connecting to a WAN can often be resolved at the __ level, but sometimes need to get escalated.
local
Everyone in the local office appears to have full access to local and Internet Web sites. No one, however, can reach a company-operated server at a particular remote site in Istanbul. There has been a recent change to the firewall configuration, so it is up to technician Terry to determine if the firewall change is the culprit or if the problem lies elsewhere. Terry has come up with three possible theories: the remote server is down, the remote site is inaccessible, or the local firewall is preventing communication with the server. He elects to test his theories with the "quickest to test" approach. His first test is to confirm that all of the local office workstations cannot reach the remote server. Using different hosts, he uses the ping and ping6 utilities. First he pings __ to confirm the workstation has a working IP stack, then he attempts to ping the remote server and gets no response.
localhost
Troubleshooting: First, identify the problem. That means grasping the true problem, rather than what someone tells you. A user might call in and complain that he can't access the Internet from his workstation, for example, which could be the only problem. But the problem could also be that the entire wing of the office just went down and you've got a much bigger problem on your hands. You need to gather information, duplicate the problem (if possible), question users, identify symptoms, determine if anything has changed on the network, and approach multiple problems individually. Following these steps will help you get to the root of the problem. Gather information about the situation. If you are working directly on the affected system and not relying on somebody on the other end of a telephone to guide you, you will identify symptoms through your observation of what is (or isn't) happening. If you're troubleshooting over the telephone (always a joy, in my experience), you will need to question users. These questions can be close-ended, which is to say there can only be a yes-or-no-type answer, such as, "Can you see a light on the front of the monitor?" You can also ask open-ended questions, such as, "What have you already tried in attempting to fix the problem?" The type of question you ask at any given moment depends on what information you need and on the user's knowledge level. If, for example, the user seems to be technically oriented, you will probably be able to ask more close-ended questions because they will know what you are talking about. If, on the other hand, the user seems to be confused about what's happening, open-ended questions will allow him or her to explain in his or her own words what is going on. One of the first steps in trying to determine the cause of a problem is to understand the extent of the problem. Is it specific to one user or is it network-wide?
Sometimes this entails trying the task yourself, both from the user's machine and from your own or another machine. For example, if a user is experiencing problems logging into the network, you might need to go to that user's machine and try to use his or her user name to log in. In other words, try to duplicate the problem. Doing this tells you whether the problem is a user error of some kind, as well as enables you to see the symptoms of the problem yourself. Next, you probably want to try
logging in with your own user name from that machine, or have the user try to log in from another machine. In some cases, you can ask other users in the area if they are experiencing the same problem to see if the issue is affecting more than one user. Depending on the size of your network, you should find out whether the problem is occurring in only one part of your company or across the entire network.
LAN Problems: Link Aggregation Problems: Ethernet networks (traditionally) don't scale easily. If you have a Gigabit Ethernet connection between the main switch and a very busy file server, that connection by definition can handle up to 1 Gbps bandwidth. If that connection becomes saturated, the only way to bump up the bandwidth cap on that single connection would be to upgrade both the switch and the server NIC to the next higher Ethernet standard, 10-Gigabit Ethernet. That's a big jump and an expensive one, plus it's an upgrade of 1000 percent! What if you needed to bump bandwidth up by only 20 percent? The scaling issue became obvious early on, so manufacturers came up with ways to use multiple NICs in tandem to increase bandwidth in smaller increments, what's called link aggregation or NIC teaming. Numerous protocols enable two or more connections to work together simultaneously, such as the vendor-neutral IEEE 802.3ad specification Link Aggregation Control Protocol (LACP) and the Cisco-proprietary Port Aggregation Protocol (PAgP). Let's focus on the former for a common network issue scenario. To enable LACP between two devices, such as the switch and file server just noted, each device needs two or more interconnected network interfaces configured for LACP. When the two devices interact, they will make sure they can communicate over multiple physical ports at the same speeds and form a single
logical port that takes advantage of the full combined bandwidth (Figure 21-12).
Certifiers test a cable to ensure that it can handle its rated amount of capacity. When a cable is not broken but it's not moving data the way it should, turn to a certifier. Look for problems that cause a cable to underperform. A bad installation might increase crosstalk, attenuation, or interference. A certifier can pick up an impedance mismatch as well. Most of these problems show up at installation, but running a certifier to eliminate cabling as a problem is never a bad idea. Don't use a certifier for disconnects, only slowdowns. All certifiers need some kind of __ on the other end of the cable run to provide termination and return of a signal.
loopback adapter
Network technicians use three different devices to deal with broken cables. Cable testers can tell you if you have a continuity problem or if a wire map isn't correct (Figure 21-1). Time domain reflectometers (TDRs) and optical time domain reflectometers (OTDRs) can tell you where the break is on the cable (Figure 21-2). A TDR works with copper cables and an OTDR works with fiber optics, but otherwise they share the same function. If a problem shows itself as a disconnect and you've first checked easier issues that would manifest as disconnects, such as (3)__, then think about using these tools.
loss of permissions, an unplugged cable, or a server shut off
WAN Problems: Certificate Problems: SSL/TLS certificates have expiration dates and companies need to
maintain them properly. If you get complaints from clients that the company Web site is giving their browsers untrusted SSL certificate errors, chances are that the certificate has expired. The fix for that is pretty simple—update the certificate.
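Rather than waiting for complaints, a tech can check how long a certificate has left. The sketch below assumes the expiration date is available as a string in the format Python's ssl module uses for the notAfter field returned by SSLSocket.getpeercert(); the function days_until_expiry and the sample dates are hypothetical.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return days remaining before the certificate expires (negative if expired)."""
    expires = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch, UTC
    return (expires - now.timestamp()) / 86400


# Hypothetical certificate expiration string and check times.
not_after = "Jan  5 09:34:43 2018 GMT"
before = datetime(2018, 1, 4, 9, 34, 43, tzinfo=timezone.utc)
after = datetime(2018, 2, 1, 0, 0, 0, tzinfo=timezone.utc)

print(days_until_expiry(not_after, before))
print(days_until_expiry(not_after, after) < 0)
```

A negative result means the certificate has already expired and needs to be updated.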
LAN Problems: An expired IP address can cause a system not to connect. Release/renew to obtain a proper IP address from the DHCP server. If the DHCP server's scope of IP addresses has been claimed, that release/renew won't work. You'll get an error that points to an exhausted DHCP scope. The only fix for this is to
make changes at the DHCP server.
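As a rough illustration of what "exhausted" means, the sketch below counts the addresses in a scope and compares that to the number of active leases. The scope boundaries, lease counts, and function names are all hypothetical; a real DHCP server tracks this itself.

```python
import ipaddress


def scope_size(first: str, last: str) -> int:
    """Number of addresses in an inclusive DHCP scope range."""
    return int(ipaddress.ip_address(last)) - int(ipaddress.ip_address(first)) + 1


def scope_exhausted(first: str, last: str, active_leases: int) -> bool:
    """True when every address in the scope has been claimed."""
    return active_leases >= scope_size(first, last)


# Hypothetical scope of 50 addresses.
print(scope_size("192.168.1.100", "192.168.1.149"))
print(scope_exhausted("192.168.1.100", "192.168.1.149", 50))
```

Once the scope is full, a release/renew on the client cannot help; typical server-side changes are enlarging the range or shortening the lease time.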
Troubleshooting process: Implement the Solution or Escalate as Necessary: Once you think you have isolated the cause of the problem, you should decide what you think is the best way to fix it and then implement the solution, whether that's giving advice over the phone to a user, installing a replacement part, or adding a software patch. Or, if the solution you propose requires either more skill than you possess at the moment or falls into someone else's purview, escalate as necessary to get the fix implemented. If you're the implementer, follow these guidelines. All the way through implementation, try only one likely solution at a time. There's no point in installing several patches at once, because then you can't tell which one fixed the problem. Similarly, there's no point in replacing several items of hardware (such as a hard disk and its controller cable) at the same time, because then you can't tell which part (or parts) was faulty. As you try each possibility, always document what you do and what results you get. This isn't just for a future problem either—during a lengthy troubleshooting process, it's easy to forget exactly what you tried two hours before or which thing you tried produced a particular result. Although being methodical may take longer, it will save time the next time—and it may enable you to pinpoint what needs to be done to stop the problem from recurring at all, thereby reducing future call volume to your support team—and as any support person will tell you, that's definitely worth the effort! Then you need to test the solution. This is the part everybody hates. Once you think you've fixed a problem, you should try to
make it happen again. If you can't, great! But sometimes you will be able to re-create the problem, and then you know you haven't finished the job at hand. Many techs want to slide away quietly as soon as everything seems to be fine, but trust me on this, it won't impress your customer when her problem flares up again 30 seconds after you've left the building—not to mention that you get the joy of another two-hour car trip the next day to fix the same problem, for an even more unhappy client!
So here's the common network error with LACP setups. An aggregated connection set to active on both ends (active-active) automatically talks, negotiates, and works. One set to active on one end and passive on the other (active-passive) will talk, negotiate, and work. But if you set both sides to passive (passive-passive), neither will initiate the conversation and LACP will not engage. Setting both ends to passive when you want to use LACP is an example of NIC teaming misconfiguration. NIC teaming provides many more benefits than just increasing bandwidth, such as redundancy. You can team two NICs in a logical unit, but set them up with one NIC as the primary—live—and the second as the hot spare—standby. If the first NIC goes down, the traffic will automatically flow through the second NIC. In a simple network setup for redundancy, you'd
make one connection live and the other as standby on each device. Switch A has a live and a standby, Switch B has a live and a standby, and so on.
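The active/passive rule described in this card can be written as a one-line check. This is only a sketch of the rule as stated above; the function name `lacp_engages` is made up for illustration and is not part of any switch or NIC API.

```python
def lacp_engages(side_a: str, side_b: str) -> bool:
    """Return True if LACP would negotiate, given each end's mode."""
    modes = {"active", "passive"}
    if side_a not in modes or side_b not in modes:
        raise ValueError("mode must be 'active' or 'passive'")
    # At least one end has to initiate the conversation.
    return "active" in (side_a, side_b)


for pair in [("active", "active"), ("active", "passive"), ("passive", "passive")]:
    print(pair, lacp_engages(*pair))
```

Only the passive-passive pairing returns False, which matches the misconfiguration the card warns about.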
Hands-on problems refer to things that you can fix at the workstation, work area, or server. These include physical problems and configuration problems. A power failure or power anomalies, such as dips and surges, can make a network device unreachable. We've addressed the fixes for such issues a couple of times already in this book:
manage the power to the network device in question and install an uninterruptible power supply (UPS).
WAN Problems: ISPs and MTUs: I discussed the maximum transmission unit (MTU) in Chapter 7, "Routing." Back in the dark ages (before Windows Vista), Microsoft users often found themselves with terrible connection problems because IP packets were too big to fit into certain network protocols. The largest Ethernet packet is 1500 bytes, so some earlier versions of Windows set their MTU size to a value less than 1500 to minimize the fragmentation of packets. The problem cropped up when you tried to connect to a technology other than Ethernet, such as DSL. Some DSL carriers couldn't handle an MTU size greater than 1400. When your network's packets are so large that they must be fragmented to fit into your ISP's packets, we call it an MTU mismatch. As a result, techs would tweak their MTU settings to improve throughput by matching up the MTU sizes between the ISP and their own network. This usually required a
manual registry setting adjustment.
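To see what an MTU mismatch costs, the arithmetic can be sketched as follows. This assumes IPv4 with a 20-byte header and no IP options; the function name `fragments_needed` is hypothetical, and the 1400-byte figure is simply the DSL limit mentioned in the card.

```python
import math

IPV4_HEADER = 20  # bytes; assumes no IP options


def fragments_needed(packet_len: int, link_mtu: int) -> int:
    """Count the IPv4 fragments needed to carry one packet over a link."""
    if packet_len <= link_mtu:
        return 1  # fits as-is, no fragmentation
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    per_fragment = (link_mtu - IPV4_HEADER) // 8 * 8
    if per_fragment <= 0:
        raise ValueError("link MTU too small to carry any payload")
    payload = packet_len - IPV4_HEADER
    return math.ceil(payload / per_fragment)


print(fragments_needed(1500, 1400))  # full-size Ethernet packet over a 1400-byte link
print(fragments_needed(1400, 1400))  # packet already sized to the smaller MTU
```

Under these assumptions a 1500-byte packet becomes two fragments on a 1400-byte link, while a packet sized to 1400 bytes goes through whole, which is why techs matched the MTU sizes.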
The ping utility uses Internet Control Message Protocol (ICMP) packets to query by IP address or by name. It works across routers, so it's generally the first tool used to check if a system is reachable. Unfortunately,
many devices block ICMP packets, so a failed ping doesn't always point to an offline system.
Because __, if your traceroute fails from a Windows system, running it on a Linux or UNIX system may return more complete results.
many routers block ICMP packets
The traceroute utility (the command in Windows is tracert) is used to trace all of the routers between two points. Use traceroute to diagnose where the problem lies when you have problems reaching a remote system. If a traceroute stops at a certain router, you know the problem is either the next router or the connections between them. When sending a traceroute, it's important to keep a significant difference between Windows and UNIX/Linux/Cisco systems in mind. Windows tracert sends only ICMP packets, while UNIX/Linux/Cisco traceroute can send either ICMP packets or UDP datagrams, but sends UDP datagrams by default. Because __, if your traceroute fails from a Windows system, running it on a Linux or UNIX system may return more complete results.
many routers block ICMP packets
MTU stands for
maximum transmission unit (MTU)
Throughput testers enable you to
measure the data flow in a network. Which tool is appropriate depends on the type of network throughput you want to test. Most techs use one of several speed-test sites for checking an Internet connection's throughput, such as MegaPath's Speakeasy Speed Test (Figure 21-8): www.speakeasy.net/speedtest. The CompTIA Network+ exam objectives refer to throughput testers as bandwidth speed testers.
LAN Problems: Misconfigurations of server settings can block all or some access to resources on a LAN. Misconfigured DHCP settings on a host can cause problems, but they will be limited to that host. If these settings are misconfigured on the DHCP server, however, many more machines and people can be affected. A __ server might direct hosts to incorrect sites or no sites at all. It might appear as an unresponsive service and just do nothing. Misconfigured DNS settings on a client result in names not resolving and cause the network to appear to be down for the user.
misconfigured DNS
WAN Problems: Router Problems: Routers enable networks to connect to other networks, which you know well by now. Problems with routers simply make those connections not work. (Recall that physical problems with routers or router interface modules were covered in Chapter 8 and Chapter 13.) Loss of power or a bad module can certainly wreck a tech's day, but the fixes are pretty simple: provide power or replace the module. Router configuration issues can be a bit trickier. The ways to mess up a router are many. You can specify the wrong routing protocol, for example, or misconfigure the right routing protocol. An access control list (ACL) might include addresses to block that shouldn't be blocked or allow access to network resources for nodes that shouldn't have it. Incorrect ACL settings can lead to blocked TCP/UDP ports that shouldn't be blocked. A misconfiguration can lead to __ so that some destinations just aren't there for users.
missing IP routes
Exam tip: Sometimes a GUI tool like Wireshark won't work because a server has no GUI installed. In situations like this, tcpdump is the go-to choice. This great command-line tool not only enables you to monitor and filter packets in the terminal, but can also create files you can open in Wireshark for later analysis. Even better, it's installed by default on __
most UNIX/Linux systems.
Troubleshooting process: Establish a Theory of Probable Cause: Consider multiple approaches when tackling problems. This will keep you from locking your imagination into a single train of thought. You can use the OSI seven-layer model as a troubleshooting tool in several ways to help with this process. Here's a scenario to work through. Martha can't access the database server to start her workday. The problem manifests this way: She opens the database client on her computer, then clicks on recent documents, one of which is the current project that management has assigned to her team. Nothing happens. Normally, the database client will connect to the database that resides on the server on the other side of the network. Try a top-to-bottom or bottom-to-top OSI model approach to the problem. Sometimes it pays to try both. Here are some ideas on how this might help. You might imagine the reverse model in some situations. If the network was newly installed, for example, running through some of the basic connectivity at Layers 1 and 2 might be a good first approach. Another option for tackling multiple options is to use the divide and conquer approach. On its face, divide and conquer appears to be a compromise between top-to-bottom OSI troubleshooting and bottom-to-top OSI troubleshooting. But it's better than a compromise. If we arbitrarily always perform top-to-bottom troubleshooting, we'll waste a lot of time at Layers 7 through 3 to troubleshoot Data Link layer and Physical layer issues. Divide and conquer is a time saver that comes into play as part of developing a theory of probable cause. As you gather information for troubleshooting, a general sense of where the problem lies should manifest. Place this likely cause at the appropriate layer of the OSI model and begin to test the theory and related theories at that layer. If the theory bears out, follow the appropriate troubleshooting steps. If the theory is wrong,
move up or down the OSI model with new theories of probable causes.
abbreviation for My Traceroute
mtr
LAN Problems: Link Aggregation Problems: So here's the common network error with LACP setups. An aggregated connection set to active on both ends (active-active) automatically talks, negotiates, and works. One set to active on one end and passive on the other (active-passive) will talk, negotiate, and work. But if you set both sides to passive (passive-passive), neither will initiate the conversation and LACP will not engage. Setting both ends to passive when you want to use LACP is an example of NIC teaming misconfiguration. NIC teaming provides many more benefits than just increasing bandwidth, such as redundancy. You can team two NICs in a logical unit, but set them up with one NIC as the primary—live—and the second as the hot spare—standby. If the first NIC goes down, the traffic will automatically flow through the second NIC. In a simple network setup for redundancy, you'd make one connection live and the other as standby on each device. Switch A has a live and a standby, Switch B has a live and a standby, and so on. The key here is that
multicast traffic to the various devices needs to be enabled on every device through which that traffic might pass. If Switch C doesn't play nice with multicast and it's connected to Switch B, this can cause multicast traffic to stop. One "fix" for this in a Cisco network is to turn off a feature called IGMP snooping, which is enabled by default on Cisco switches. IGMP snooping is normally a good thing, because it helps the switches keep track of devices that use multicast and filter traffic away from devices that don't.
Note: There's an old adage used by carpenters and other craftspeople that goes, "Never buy cheap tools." Cheap tools save you money at the beginning, but they often break more readily than higher-quality tools and, more importantly, make it harder to get the job done. This adage definitely applies to __! You might be tempted to go for the $10 model that looks pretty much like the $25 model, but chances are the leads will break or the readings will lie on the cheaper model. Buy a decent tool, and you'll never have to worry about it.
multimeters
LAN Problems: Misconfigurations of server settings can block all or some access to resources on a LAN. Misconfigured DHCP settings on a host can cause problems, but they will be limited to that host. If these settings are misconfigured on the DHCP server, however, many more machines and people can be affected. A misconfigured DNS server might direct hosts to incorrect sites or no sites at all. It might appear as an unresponsive service and just do nothing. Misconfigured DNS settings on a client result in __ and cause the network to appear to be down for the user.
names not resolving
If you want to know about your current sessions, __ is the tool to use.
netstat
The __ utility displays information on the current state of all the running IP processes on a system.
netstat
The netstat utility displays information on the current state of all the running IP processes on a system. It shows what sessions are active and can also provide statistics based on ports or protocols (TCP, UDP, and so on). Typing netstat by itself only shows current sessions. Typing __ shows the routing table (100 percent identical to route print).
netstat -r
Many of the boxes that people refer to as "routers" contain many features, such as routing, Network Address Translation (NAT), switching, an intrusion detection system (IDS), a firewall, and more. These complex boxes, such as the Cisco Adaptive Security Appliance (ASA), are called __
network appliances.
Beyond Local - Escalate: A broadcast storm is the result of one or more devices sending a nonstop flurry of broadcast frames on the network. The first sign of a broadcast storm is when every computer on the broadcast domain suddenly can't connect to the rest of the network. There are usually no clues other than
network applications freezing or presenting "can't connect to ..." types of error messages. Every activity light on every node is solidly on. Computers on other broadcast domains work perfectly well.
Hands-on problems refer to things that you can fix at the workstation, work area, or server. These include physical problems and configuration problems. A power failure or power anomalies, such as dips and surges, can make a network device unreachable. We've addressed the fixes for such issues a couple of times already in this book: manage the power to the network device in question and install an uninterruptible power supply (UPS). A hardware failure can certainly make a network device unreachable. Fall back on your CompTIA A+ training for troubleshooting. Check the link lights on the NIC. Try another NIC if the machine seems functional in every other aspect. Ping the localhost. Pay attention to link lights when you have a "hardware failure." The __—link lights—can quickly point to a connectivity issue. Try known good cables/NICs if you run into this issue.
network connection LED status indicators
A packet sniffer, as you'll recall from Chapter 20, intercepts and logs
network packets.
Multimeters test voltage (both AC and DC), resistance, and continuity. They are the unsung heroes of cabling infrastructures because
no other tool can tell you how much voltage is on a line. They are also a great fallback for continuity testing when you don't have a cable tester handy.
The vast majority of cabling problems occur when the network is first installed or when a change is made. Once a cable has been made, installed, and tested, the chances of it failing are pretty small compared to all of the other network problems that might take place. If you're having trouble connecting to a resource or experiencing performance problems after making a connection, a bad cable likely isn't the culprit. Broken cables don't make intermittent problems, and they don't slow down data. They make permanent disconnects. Network techs define a "broken" cable in numerous ways. First, a broken cable might have an open circuit, where one or more of the wires in a cable simply don't connect from one end of the cable to the other. The signal lacks continuity. Second, a cable might have a short, where one or more of the wires in a cable connect to another wire in the cable. (Within a normal cable, no wires connect to other wires.) Third, a cable might have a wire map problem, where one or more of the wires in a cable don't connect to the proper location on the jack or plug. This can be caused by improperly crimping a cable, for example. Fourth, the cable might experience crosstalk, where the electrical signal bleeds from one wire pair to another, creating interference. Fifth, a broken cable might pick up __, spurious signals usually caused by faulty hardware or poorly crimped jacks.
noise
WAN Problems: Router Problems: Routers enable networks to connect to other networks, which you know well by now. Problems with routers simply make those connections not work. (Recall that physical problems with routers or router interface modules were covered in Chapter 8 and Chapter 13.) Loss of power or a bad module can certainly wreck a tech's day, but the fixes are pretty simple: provide power or replace the module. Router configuration issues can be a bit trickier. The ways to mess up a router are many. You can specify the wrong routing protocol, for example, or misconfigure the right routing protocol. An access control list (ACL) might include addresses to block that shouldn't be blocked or allow access to network resources for nodes that shouldn't have it. Incorrect ACL settings can lead to blocked TCP/UDP ports that shouldn't be blocked. A misconfiguration can lead to missing IP routes so that some destinations just aren't there for users. Improperly configured routers aren't going to send packets to the proper destination. The symptoms are clear: every system that uses the misconfigured router as a default gateway is either
not able to get packets out or not able to get packets in, or sometimes both. Web pages don't come up, FTP servers suddenly disappear, and e-mail clients can't access their servers. In these cases, you need to verify first that everything in your area of responsibility works. If that is true, then escalate the problem and find the person responsible for the router.
The nslookup (all operating systems) and dig (macOS/UNIX/Linux) utilities help diagnose DNS problems. These tools are very powerful, but the CompTIA Network+ exam won't ask you more than basic questions, such as how to use them to see if a DNS server is working. When working on Windows systems, the __ utility is your only choice by default.
nslookup
The __ (all operating systems) and __ (macOS/UNIX/Linux) utilities help diagnose DNS problems.
nslookup; dig
Troubleshooting process: Implement the Solution or Escalate as Necessary: Once you think you have isolated the cause of the problem, you should decide what you think is the best way to fix it and then implement the solution, whether that's giving advice over the phone to a user, installing a replacement part, or adding a software patch. Or, if the solution you propose requires either more skill than you possess at the moment or falls into someone else's purview, escalate as necessary to get the fix implemented. If you're the implementer, follow these guidelines. All the way through implementation, try only
one likely solution at a time. There's no point in installing several patches at once, because then you can't tell which one fixed the problem. Similarly, there's no point in replacing several items of hardware (such as a hard disk and its controller cable) at the same time, because then you can't tell which part (or parts) was faulty.
The vast majority of cabling problems occur when the network is first installed or when a change is made. Once a cable has been made, installed, and tested, the chances of it failing are pretty small compared to all of the other network problems that might take place. If you're having trouble connecting to a resource or experiencing performance problems after making a connection, a bad cable likely isn't the culprit. Broken cables don't make intermittent problems, and they don't slow down data. They make permanent disconnects. Network techs define a "broken" cable in numerous ways. First, a broken cable might have an open circuit, where one or more of the wires in a cable simply don't connect from one end of the cable to the other. The signal lacks continuity. Second, a cable might have a short, where
one or more of the wires in a cable connect to another wire in the cable. (Within a normal cable, no wires connect to other wires.)
The vast majority of cabling problems occur when the network is first installed or when a change is made. Once a cable has been made, installed, and tested, the chances of it failing are pretty small compared to all of the other network problems that might take place. If you're having trouble connecting to a resource or experiencing performance problems after making a connection, a bad cable likely isn't the culprit. Broken cables don't make intermittent problems, and they don't slow down data. They make permanent disconnects. Network techs define a "broken" cable in numerous ways. First, a broken cable might have an open circuit, where one or more of the wires in a cable simply don't connect from one end of the cable to the other. The signal lacks continuity. Second, a cable might have a short, where one or more of the wires in a cable connect to another wire in the cable. (Within a normal cable, no wires connect to other wires.) Third, a cable might have a wire map problem, where
one or more of the wires in a cable don't connect to the proper location on the jack or plug. This can be caused by improperly crimping a cable, for example. Fourth, the cable might experience crosstalk, where the electrical signal bleeds from one wire pair to another, creating interference.
The vast majority of cabling problems occur when the network is first installed or when a change is made. Once a cable has been made, installed, and tested, the chances of it failing are pretty small compared to all of the other network problems that might take place. If you're having trouble connecting to a resource or experiencing performance problems after making a connection, a bad cable likely isn't the culprit. Broken cables don't make intermittent problems, and they don't slow down data. They make permanent disconnects. Network techs define a "broken" cable in numerous ways. First, a broken cable might have an open circuit, where
one or more of the wires in a cable simply don't connect from one end of the cable to the other. The signal lacks continuity. Second, a cable might have a short, where one or more of the wires in a cable connect to another wire in the cable. (Within a normal cable, no wires connect to other wires.)
Hands-on Problems: Some problems you can fix at the local machine don't point to messed-up hardware or invalid settings, but reflect the current mix of wired and wireless networks in the same place. Here's a scenario that applies to Windows versions before Windows 10. Tina has a wireless network connection to the Internet. She gets a shiny new printer with an Ethernet port, but with no Wi-Fi capability. She wants to print from both her PC and her laptop, so she creates a small LAN: a couple of Ethernet cables and a switch. She plugs everything in, installs drivers, and all is well. She can print from both machines. Unfortunately, as soon as she prints, her Internet connection goes down. The funny part is that the Internet connection didn't go anywhere, but her simultaneous wired/wireless connections created a network failure. The wired and wireless NICs can't actually operate simultaneously and, by default, the wired connection takes priority in the order in which devices are accessed by network services. To fix this problem,
open Network Connections in the Control Panel. Press the alt key to activate the menu bar, then select Advanced | Advanced Settings (Figure 21-10). Change the connection priority in the Advanced Settings options by selecting the one Tina wants to take priority and clicking the up arrow to move it up the list.
The vast majority of cabling problems occur when the network is first installed or when a change is made. Once a cable has been made, installed, and tested, the chances of it failing are pretty small compared to all of the other network problems that might take place. If you're having trouble connecting to a resource or experiencing performance problems after making a connection, a bad cable likely isn't the culprit. Broken cables don't make intermittent problems, and they don't slow down data. They make permanent disconnects. Network techs define a "broken" cable in numerous ways. First, a broken cable might have an __, where one or more of the wires in a cable simply don't connect from one end of the cable to the other.
open circuit
Exam tip: The CompTIA Network+ exam objectives use the terms open/short. More commonly, techs would refer to these issues as
open circuits and short circuits.
Troubleshooting process: Verify Full System Functionality and Implement Preventative Measures: Okay, now that you have changed something on the system in the process of solving one problem, you must think about the wider repercussions of what you have done. If you've replaced a faulty NIC in a server, for instance, will the fact that the MAC address has changed (remember, it's built into the NIC) affect anything else, such as the logon security controls or your network management and inventory software? If you've installed a patch on a client PC, will this change the default protocol or any other default settings that may affect other functionality? If you've changed a user's security settings, will this affect his or her ability to access other network resources? This is part of testing your solution to make sure it works properly, but it also makes you think about the impact of your work on the system as a whole. Make sure you verify full system functionality. If you think you fixed the problem between Martha's workstation and the database server, have her
open the database while you're still there. That way you don't have to make a second tech call to resolve an outstanding issue. This saves time and money and helps your customer do his or her job better. Everybody wins.
The extremely transparent fiber-optic cables allow light to shine but have some inherent impurities in the glass that can reduce light transmission. Dust, poor connections, and light leakage can also degrade the strength of light pulses as they travel through a fiber-optic run. To measure the amount of light loss, technicians use an
optical power meter, also referred to as a light meter (see Figure 21-3).
OTDR stands for
optical time domain reflectometer (OTDR)
Beyond Local - Escalate: Broadcast Storms: A broadcast storm is the result of one or more devices sending a nonstop flurry of broadcast frames on the network. The first sign of a broadcast storm is when every computer on the broadcast domain suddenly can't connect to the rest of the network. There are usually no clues other than network applications freezing or presenting "can't connect to ..." types of error messages. Every activity light on every node is solidly on. Computers on other broadcast domains work perfectly well. The trick is to isolate; that's where escalation comes in. You need to break down the network quickly by unplugging devices until you can find the one causing trouble. Getting a __ to work can be difficult, but at least try. If you can scoop up one packet, you'll know what node is causing the trouble. The second the bad node is disconnected, the network returns to normal. But if you have a lot of machines to deal with and a bunch of users who can't get on the network yelling at you, you'll need help. Call a supervisor to get support to solve the crisis as quickly as possible.
packet analyzer
Make the CompTIA Network+ exam (and real life) easier by separating your software tools into two groups: those that come built into every operating system and those that are third-party tools. Typical built-in tools are tracert/traceroute, ipconfig/ifconfig/ip, arp, ping, arping, pathping, nslookup/dig, route, and netstat/ss. Third-party tools fall into the categories of (4)
packet sniffers, port scanners, throughput testers, and looking glass sites.
Microsoft has a utility called __ that combines the functions of ping and traceroute and adds some additional functions.
pathping
The vast majority of cabling problems occur when the network is first installed or when a change is made. Once a cable has been made, installed, and tested, the chances of it failing are pretty small compared to all of the other network problems that might take place. If you're having trouble connecting to a resource or experiencing performance problems after making a connection, a bad cable likely isn't the culprit. Broken cables don't make intermittent problems, and they don't slow down data. They make
permanent disconnects.
Hands-on problems refer to things that you can fix at the workstation, work area, or server. These include __ and __.
physical problems; configuration problems
The __ utility uses Internet Control Message Protocol (ICMP) packets to query by IP address or by name.
ping
Command to run ping in IPv6 on Windows
ping -6
LAN Problems: Server misconfigurations: Misconfigurations of server settings can block all or some access to resources on a LAN. Misconfigured DHCP settings on a host can cause problems, but they will be limited to that host. If these settings are misconfigured on the DHCP server, however, many more machines and people can be affected. A misconfigured DNS server might direct hosts to incorrect sites or no sites at all. It might appear as an unresponsive service and just do nothing. Misconfigured DNS settings on a client result in names not resolving and cause the network to appear to be down for the user. You'll be clued into such misconfiguration by using ping and other tools. If you can __, this points to DNS issues.
ping a file server by IP address but not by name
LAN Problems: Server misconfigurations: Misconfigurations of server settings can block all or some access to resources on a LAN. Misconfigured DHCP settings on a host can cause problems, but they will be limited to that host. If these settings are misconfigured on the DHCP server, however, many more machines and people can be affected. A misconfigured DNS server might direct hosts to incorrect sites or no sites at all. It might appear as an unresponsive service and just do nothing. Misconfigured DNS settings on a client result in names not resolving and cause the network to appear to be down for the user. You'll be clued into such misconfiguration by using
ping and other tools. If you can ping a file server by IP address but not by name, this points to DNS issues. Similarly, if a computer fails to discover neighboring devices/nodes, such as a networked printer, DHCP or DNS misconfiguration can be the culprit. To fix the issue, go into the network configuration for the client or the server and find the misconfigured settings.
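The ping-by-IP versus ping-by-name reasoning can be captured as a small decision function. This is a sketch only: the function name `diagnose` and the wording of its results are invented here, and the two booleans stand for ping tests you would run yourself.

```python
def diagnose(ping_by_ip_ok: bool, ping_by_name_ok: bool) -> str:
    """Interpret two ping results for the same server."""
    if ping_by_ip_ok and ping_by_name_ok:
        return "reachable by IP and by name"
    if ping_by_ip_ok and not ping_by_name_ok:
        # The host answers, so the path works; the name is what fails.
        return "suspect DNS"
    # No reply by IP: host down, path problem, or ICMP blocked along the way.
    return "inconclusive, check connectivity first"


print(diagnose(True, False))
```

The last branch is deliberately cautious, because a failed ping by IP address does not always mean the system is offline.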
Command to run ping in IPv6 on Linux
ping6
Command to run ping in IPv6 on UNIX
ping6
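The per-platform IPv6 ping commands from the last three cards can be gathered into one helper. This is a sketch, not a complete tool: `build_ping_command` is a hypothetical name, it only builds the argument list without running anything, and the address shown comes from the IPv6 documentation range. The count flags used are `-n` on Windows and `-c` on Linux/UNIX.

```python
from typing import List


def build_ping_command(host: str, ipv6: bool, system: str, count: int = 4) -> List[str]:
    """Build (but do not run) a ping command line for the given platform."""
    windows = system.lower() == "windows"
    count_flag = "-n" if windows else "-c"
    if not ipv6:
        return ["ping", count_flag, str(count), host]
    if windows:
        return ["ping", "-6", count_flag, str(count), host]  # Windows: ping -6
    return ["ping6", count_flag, str(count), host]  # Linux/UNIX: ping6


print(build_ping_command("2001:db8::1", True, "Windows"))
print(build_ping_command("2001:db8::1", True, "Linux"))
```

The returned list could be passed to `subprocess.run` if you actually want to send the pings.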
Everyone in the local office appears to have full access to local and Internet Web sites. No one, however, can reach a company-operated server at a particular remote site in Istanbul. There has been a recent change to the firewall configuration, so it is up to technician Terry to determine if the firewall change is the culprit or if the problem lies elsewhere. Terry has come up with three possible theories: the remote server is down, the remote site is inaccessible, or the local firewall is preventing communication with the server. He elects to test his theories with the "quickest to test" approach. His first test is to confirm that all of the local office workstations cannot reach the remote server. Using different hosts, he uses the __ and __ utilities.
ping; ping6
Microsoft has a utility called pathping that combines the functions of __ and __ and adds some additional functions.
ping; traceroute
Sometimes you need to perform a ping or traceroute from a location outside of the local environment. Looking glass sites are remote servers accessible with a browser that contain common collections of diagnostic tools such as __ and __, plus some __ query tools.
ping; traceroute; Border Gateway Protocol (BGP)
Hands-on Problems: Outside invisible forces can cause problems with copper cabling. You've read about electromagnetic interference (EMI) and radio frequency interference (RFI) previously in the book. EMI and RFI can disrupt signaling on a copper cable, especially with the very low voltages used today on those cables. These are crazy things to troubleshoot. An interference problem might manifest in a scenario like this one. John can use e-mail on his laptop successfully over the company's wireless network. When he plugs in at his desk in his cubicle, however, e-mail messages just don't get through. Typically, you'd test everything before suspecting EMI or RFI causing this problem. Test the NIC on the laptop by
plugging into a known-good port. You'd use a cable tester on the cable. You'd check for continuity between the port in his office to the switch. You'd glance at the cabling certification documents to see that yes, the cable worked when installed.
As you'll recall from back in Chapter 18, "Managing Risk," a __ is a program that probes ports on another system, logging the state of the scanned ports.
port scanner
The netstat utility displays information on the current state of all the running IP processes on a system. It shows what sessions are active and can also provide statistics based on
ports or protocols (TCP, UDP, and so on). Typing netstat by itself only shows current sessions. Typing netstat -r shows the routing table (100 percent identical to route print). If you want to know about your current sessions, netstat is the tool to use.
Exam tip: The ARP table functions at Layer 3, mapping IP addresses to MAC addresses. The ARP table therefore would be stored on a Layer 3 device. A MAC address table, in contrast, maps MAC addresses to
ports, and thus lives on a Layer 2 device, a switch.
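The difference between the two tables can be sketched as two lookups. This is only an illustration: the addresses and port names below are made-up sample data, and `port_for_ip` is a hypothetical helper, not a real device command.

```python
# Hypothetical sample data; addresses and port names are invented for illustration.
arp_table = {            # Layer 3 view: IP address -> MAC address
    "192.168.1.10": "00-11-22-33-44-55",
    "192.168.1.20": "66-77-88-99-AA-BB",
}
mac_address_table = {    # Layer 2 view (on the switch): MAC address -> switch port
    "00-11-22-33-44-55": "Gi0/1",
    "66-77-88-99-AA-BB": "Gi0/2",
}

def port_for_ip(ip):
    """Chain the two lookups: IP -> MAC (ARP table), then MAC -> port (MAC table)."""
    mac = arp_table.get(ip)
    if mac is None:
        return None          # no ARP entry, so the MAC is unknown
    return mac_address_table.get(mac)
```

Calling `port_for_ip("192.168.1.20")` walks both tables in turn, which is roughly what a tech does by hand when tracing a host to its switch port.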
Hands-on problems refer to things that you can fix at the workstation, work area, or server. These include physical problems and configuration problems. A __ or __, such as dips and surges, can make a network device unreachable.
power failure; power anomalies
Exam tip: The CompTIA Network+ exam objectives use the term light meter. The more accurate term in this context is either __ or __. You may see any of these terms on the exam.
power meter; optical power meter
The vast majority of cabling problems occur when the network is first installed or when a change is made. Once a cable has been made, installed, and tested, the chances of it failing are __ compared to all of the other network problems that might take place.
pretty small
Certifiers test a cable to ensure that it can handle its rated amount of capacity. When a cable is not broken but it's not moving data the way it should, turn to a certifier. Look for
problems that cause a cable to underperform. A bad installation might increase crosstalk, attenuation, or interference. A certifier can pick up an impedance mismatch as well. Most of these problems show up at installation, but running a certifier to eliminate cabling as a problem is never a bad idea. Don't use a certifier for disconnects, only slowdowns. All certifiers need some kind of loopback adapter on the other end of the cable run to provide termination and return of a signal. A loopback adapter is a small device with a single port.
No single person is truly in control of an entire Internet-connected network. Large organizations split network support duties into very skill-specific areas: routers, cable infrastructure, user administration, and so on. Even in a tiny network with a single network support person, problems will arise that go beyond the tech's skill level or that involve equipment the organization doesn't own (usually it's their ISP's gear). In these situations, the tech needs to identify the problem and, instead of trying to fix it on his or her own, escalate the issue. In network troubleshooting, problem escalation should occur when you face a problem that falls outside the scope of your skills and you need help. In large organizations, escalation problems have very clear
procedures, such as who to call and what to document. In small organizations, escalation often is nothing more than a technician realizing that he or she needs help. The CompTIA Network+ exam objectives define some classic networking situations that CompTIA feels should be escalated. Here's how to recognize broadcast storms, switching loops, routing problems, and proxy ARP.
LAN Problems: Adding VLANs: When you add VLANs into the network mix, all sorts of fun network issues can crop up. As an example, suppose Bill has a 24-port managed switch segmented into four VLANs, one for each group in the office: Management, Sales, Marketing, and Development (Figure 21-11). Bill thought he'd assigned six ports to each VLAN when he set up the switch, but by mistake he assigned seven ports to VLAN 1 and only five ports to VLAN 2. Merrily plugging in the patch cables for each group of users, Bill gets called up by his boss asking why Cindy over in Sales suddenly can see resources reserved for management. This obviously points to an interface misconfiguration that resulted in a VLAN mismatch. Similarly, after fixing his initial mistake and getting the VLANs set up properly, Bill needs to plug the right patch cables into the right ports. If he messes up and plugs the patch cable for Cindy's computer into a VLAN 1 port, the intrepid salesperson would again have access to the management resources. Such cable placement errors show up pretty quickly and are readily fixed. Keep
proper records of patch cable assignments and plug the cables into the proper ports.
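Bill's mistake is the kind of thing a quick audit of the port records can catch. The sketch below counts ports per VLAN and reports any VLAN that does not have the planned number. The VLAN numbers 3 and 4 and the port numbering are assumptions for illustration; the card only names VLAN 1 and VLAN 2.

```python
from collections import Counter

def find_vlan_mismatches(port_to_vlan, expected_per_vlan):
    """Return {vlan: actual_count} for every VLAN whose port count differs from the plan."""
    counts = Counter(port_to_vlan.values())
    return {
        vlan: counts.get(vlan, 0)
        for vlan, expected in expected_per_vlan.items()
        if counts.get(vlan, 0) != expected
    }

# Rebuild Bill's misconfigured 24-port switch: 7 ports in VLAN 1, 5 in VLAN 2,
# and (assumed) 6 each in VLANs 3 and 4.
assignment = {}
port = 1
for vlan, count in [(1, 7), (2, 5), (3, 6), (4, 6)]:
    for _ in range(count):
        assignment[port] = vlan
        port += 1

mismatches = find_vlan_mismatches(assignment, {1: 6, 2: 6, 3: 6, 4: 6})
```

Here `mismatches` flags VLAN 1 and VLAN 2, the two VLANs Bill got wrong.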
A packet sniffer, as you'll recall from Chapter 20, intercepts and logs network packets. You have many choices when it comes to packet sniffers. Some sniffers come as programs you run on a computer, while others manifest as dedicated hardware devices. Most packet sniffers come bundled with a
protocol analyzer, the tool that takes the sniffed information and figures out what's happening on the network. Arguably, the most popular GUI packet sniffer and protocol analyzer is Wireshark (Figure 21-6). You've already seen Wireshark in the book, but here's a screen to jog your memory.
Certifiers test a cable to ensure that it can handle its rated amount of capacity. When a cable is not broken but it's not moving data the way it should, turn to a certifier. Look for problems that cause a cable to underperform. A bad installation might increase crosstalk, attenuation, or interference. A certifier can pick up an impedance mismatch as well. Most of these problems show up at installation, but running a certifier to eliminate cabling as a problem is never a bad idea. Don't use a certifier for disconnects, only slowdowns. All certifiers need some kind of loopback adapter on the other end of the cable run to
provide termination and return of a signal. A loopback adapter is a small device with a single port.
Beyond Local - Escalate: Proxy ARP: Proxy ARP is the process of making remotely connected computers truly act as though they are on the same LAN as local computers. Proxy ARP is done in a number of different ways, with a Virtual Private Network (VPN) as the classic example. If a laptop in an airport connects to a network through a VPN, that computer takes on the network ID of your local network. In order for all of this to work, the VPN concentrator needs to allow some very LAN-type traffic to go through it that would normally never get through a router. ARP is a great example. If your VPN client wants to talk to another computer on the LAN, it has to send an ARP request to get the MAC address. Your VPN device is designed to act as a
proxy for all that type of data.
A cable stripper or snip (Figure 21-4) helps you to make UTP cables. You'll need a crimping tool (a crimper) as well. You don't need these tools to punch down 66- or 110-blocks. You would use a __ for that (as described in a bit).
punchdown tool
LAN Problems: Incorrect configuration of any number of options in devices can stop a device from accessing resources over a LAN. These problems can be simple to fix, although tracking down the culprit can take time and patience. One of the most obvious errors occurs when you're duplicating machines and using static IP addresses. As soon as you plug in the duplicated machine with its duplicate IP address, the network will howl. No two computers can have the same IP address on a broadcast domain. The fix for the problem—after the face-palm—is to change the IP address on the new machine either to an unused static IP or to DHCP. A related issue comes from duplicate MAC addresses, something that can happen when working with virtual machines or, rarely, as a result of a manufacturing error. The effect is the same as duplicate IP addresses. Either
put the devices on different VLANs or swap out NICs to avoid duplication.
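If you keep an inventory of static assignments, duplicate addresses can be spotted before the network howls. This is a minimal sketch over a hypothetical inventory list; the host names and addresses are invented sample data.

```python
from collections import defaultdict

def find_duplicates(hosts, field):
    """Group host names by the given field ('ip' or 'mac') and keep values used more than once."""
    seen = defaultdict(list)
    for host in hosts:
        seen[host[field]].append(host["name"])
    return {value: names for value, names in seen.items() if len(names) > 1}

# Hypothetical inventory: pc-02 was cloned, static IP and (virtual) MAC included.
inventory = [
    {"name": "pc-01", "ip": "192.168.1.50", "mac": "00-11-22-33-44-55"},
    {"name": "pc-02", "ip": "192.168.1.51", "mac": "00-11-22-33-44-66"},
    {"name": "pc-02-clone", "ip": "192.168.1.51", "mac": "00-11-22-33-44-66"},
]

duplicate_ips = find_duplicates(inventory, "ip")
duplicate_macs = find_duplicates(inventory, "mac")
```

The same helper covers both cases on the card, since duplicate IP and duplicate MAC addresses have the same effect.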
Troubleshooting process: Establish a Theory of Probable Cause: Once you've identified one or more problems, try to figure out what could have happened. In other words, establish a theory of probable cause. Just keep in mind that a theory is not a fact. You might need to chuck the theory out the window later in the process and establish a revised theory. This step comes down to experience—or good use of the support tools at your disposal, such as your knowledge base. You need to select the most probable cause from all the possible causes, so the solution you choose fixes the problem the first time. This may not always happen, but whenever possible, you want to avoid spending a whole day stabbing in the dark while the problem snores softly to itself in some cozy, neglected corner of your network. Don't forget to __. If Bob can't print to the networked printer, for example, check to see that the printer is plugged in and turned on.
question the obvious
Troubleshooting: First, identify the problem. That means grasping the true problem, rather than what someone tells you. A user might call in and complain that he can't access the Internet from his workstation, for example, which could be the only problem. But the problem could also be that the entire wing of the office just went down and you've got a much bigger problem on your hands. You need to gather information, duplicate the problem (if possible), question users, identify symptoms, determine if anything has changed on the network, and approach multiple problems individually. Following these steps will help you get to the root of the problem. Gather information about the situation. If you are working directly on the affected system and not relying on somebody on the other end of a telephone to guide you, you will identify symptoms through your observation of what is (or isn't) happening. If you're troubleshooting over the telephone (always a joy, in my experience), you will need to
question users. These questions can be close-ended, which is to say there can only be a yes-or-no-type answer, such as, "Can you see a light on the front of the monitor?" You can also ask open-ended questions, such as, "What have you already tried in attempting to fix the problem?"
Beyond Local - Escalate: Proxy ARP: Proxy ARP is the process of making remotely connected computers truly act as though they are on the same LAN as local computers. Proxy ARP is done in a number of different ways, with a Virtual Private Network (VPN) as the classic example. If a laptop in an airport connects to a network through a VPN, that computer takes on the network ID of your local network. In order for all of this to work, the VPN concentrator needs to allow some very LAN-type traffic to go through it that would normally never get through a router. ARP is a great example. If your VPN client wants to talk to another computer on the LAN, it has to send an ARP request to get the MAC address. Your VPN device is designed to act as a proxy for all that type of data. Almost all proxy ARP problems take place on the VPN concentrator. With misconfigured proxy ARP settings, the VPN concentrator can send what looks like a denial of service (DoS) attack on the LAN. (A DoS attack is usually directed at a server exposed on the Internet, like a Web server. See Chapter 19, "Protecting Your Network," for more details on these and other malicious attacks.) If your clients start __, assume you have a proxy ARP problem and escalate by getting the person in charge of the VPN to fix it.
receiving a large number of packets from the VPN concentrator
Hands-on Problems: Outside invisible forces can cause problems with copper cabling. You've read about electromagnetic interference (EMI) and radio frequency interference (RFI) previously in the book. EMI and RFI can disrupt signaling on a copper cable, especially with the very low voltages used today on those cables. These are crazy things to troubleshoot. An interference problem might manifest in a scenario like this one. John can use e-mail on his laptop successfully over the company's wireless network. When he plugs in at his desk in his cubicle, however, e-mail messages just don't get through. Typically, you'd test everything before suspecting EMI or RFI causing this problem. Test the NIC on the laptop by plugging into a known-good port. You'd use a cable tester on the cable. You'd check for continuity between the port in his office to the switch. You'd glance at the cabling certification documents to see that yes, the cable worked when installed. Only then might a creative tech at her wit's end notice the
recently installed, high-powered WAP on the wall outside John's office. RFI strikes!
LAN Problems: Link Aggregation Problems: Those ports can be in one of two modes: active or passive. Active ports want to use LACP and send special frames out trying to initiate creating an aggregated logical port. Passive ports wait for active ports to initiate the conversation before they will respond. So here's the common network error with LACP setups. An aggregated connection set to active on both ends (active-active) automatically talks, negotiates, and works. One set to active on one end and passive on the other (active-passive) will talk, negotiate, and work. But if you set both sides to passive (passive-passive), neither will initiate the conversation and LACP will not engage. Setting both ends to passive when you want to use LACP is an example of NIC teaming misconfiguration. NIC teaming provides many more benefits than just increasing bandwidth, such as
redundancy. You can team two NICs in a logical unit, but set them up with one NIC as the primary—live—and the second as the hot spare—standby. If the first NIC goes down, the traffic will automatically flow through the second NIC. In a simple network setup for redundancy, you'd make one connection live and the other as standby on each device. Switch A has a live and a standby, Switch B has a live and a standby, and so on.
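The active/passive rule for LACP described on this card boils down to one check: at least one end has to be active. A minimal sketch, with `lacp_engages` as a hypothetical helper name:

```python
def lacp_engages(mode_a, mode_b):
    """Return True if two LACP ends will negotiate: at least one side must be active."""
    valid = {"active", "passive"}
    if mode_a not in valid or mode_b not in valid:
        raise ValueError("mode must be 'active' or 'passive'")
    # Passive ports only answer; they never start the conversation.
    return "active" in (mode_a, mode_b)
```

Only the passive-passive pairing returns False, which matches the misconfiguration the card warns about.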
Troubleshooting process: Test the Theory to Determine Cause: With the third step, you need to test the theory to determine the cause but do so without changing anything or risking any repercussions. If you have determined that the probable cause for Bob not being able to print is that the printer is turned off, go look. If that's the case, then you should plan out your next step to resolve the problem. Do not act yet! That comes next. If the theory is not confirmed, you need to
reestablish a new theory or escalate the problem. Go back to step two and determine a new probable cause. Once you have another idea, test it.
Sometimes you need to perform a ping or traceroute from a location outside of the local environment. Looking glass sites are
remote servers accessible with a browser that contain common collections of diagnostic tools such as ping and traceroute, plus some Border Gateway Protocol (BGP) query tools.
Troubleshooting process: Establish a Plan of Action and Identify Potential Effects: By this point, you should have some ideas as to what the problem might be. It's time to "look before you leap" and establish a plan of action to resolve the problem. An action plan defines how you are going to fix this problem. Most problems are simple, but if the problem is complex, you need to write down the steps. As you do this, think about what else might happen as you go about the repair. Identify the potential effects of the actions you're about to take, especially the unintended ones. If you take out a switch without a replacement switch at hand, the users might experience excessive downtime while you hunt for a new switch and move them over. If you replace a router, can you
restore all the old router's settings to the new one or will you have to rebuild from scratch?
The __ utility enables you to display and edit the local system's routing table.
route
The netstat utility displays information on the current state of all the running IP processes on a system. It shows what sessions are active and can also provide statistics based on ports or protocols (TCP, UDP, and so on). Typing netstat by itself only shows current sessions. Typing netstat -r shows the routing table (100 percent identical to "__")
route print
The route utility enables you to display and edit the local system's routing table. To show the routing table, just type "__" or "__".
route print; netstat -r
WAN Problems: Problems that stop users from accessing content across a WAN, like the Internet, can originate at the local machine, switches within the LAN, routers that interconnect the WAN, switches within the distant network, and the distant machine itself. As you might infer from the opening scenario, some of these common network problems you can fix, and some you cannot. We discussed many remote connectivity problems and solutions way back in Chapter 13, so I won't rehash them here. This section starts with (5)___. The following sections go into bigger problems that require escalation. The chapter wraps up with end-to-end connectivity.
router configuration issues, issues with ISPs and frame sizes, problems with misconfigured multi-layer network appliances, issues with certificates, and company security policies
Many of the boxes that people refer to as "routers" contain many features, such as (5)__. These complex boxes, such as the Cisco Adaptive Security Appliance (ASA), are called network appliances.
routing, Network Address Translation (NAT), switching, an intrusion detection system (IDS), a firewall, and more
WAN Problems: Appliance Problems: Many of the boxes that people refer to as "routers" contain many features, such as routing, Network Address Translation (NAT), switching, an intrusion detection system (IDS), a firewall, and more. These complex boxes, such as the Cisco Adaptive Security Appliance (ASA), are called network appliances. One common issue with network appliances is technician error. By default, for example, NAT rules take precedence over an appliance's routing table entries. If the tech fails to set the NAT rule order correctly, traffic that should be routed to go out one interface—like to the DMZ network—can go out an incorrect interface—like to the inside network. Users on the outside would expect a response from something but instead get nothing, all because of a NAT interface misconfiguration. The fix for such problems is to set up your network appliance correctly. Know the capabilities of the network appliance and the relationships among its services. Examine
rules and settings carefully.
WAN Problems: Router Problems: Routers enable networks to connect to other networks, which you know well by now. Problems with routers simply make those connections not work. (Recall that physical problems with routers or router interface modules were covered in Chapter 8 and Chapter 13.) Loss of power or a bad module can certainly wreck a tech's day, but the fixes are pretty simple: provide power or replace the module. Router configuration issues can be a bit trickier. The ways to mess up a router are many. You can specify the wrong routing protocol, for example, or misconfigure the right routing protocol. An access control list (ACL) might include addresses to block that shouldn't be blocked or allow access to network resources for nodes that shouldn't have it. Incorrect ACL settings can lead to blocked TCP/UDP ports that shouldn't be blocked. A misconfiguration can lead to missing IP routes so that some destinations just aren't there for users. Improperly configured routers aren't going to send packets to the proper destination. The symptoms are clear: every system that uses the misconfigured router as a default gateway is either not able to get packets out or not able to get packets in, or sometimes both. Web pages don't come up, FTP servers suddenly disappear, and e-mail clients can't access their servers. In these cases, you need to verify first that everything in your area of responsibility works. If that is true, then escalate the problem and find the person responsible for the router. One excellent tool for determining a router problem beyond your LAN is tracert/traceroute. Run traceroute to your default gateway. (You can also use ping to check connectivity.) If that fails, you know you have a local issue and can potentially do something about it. If the traceroute comes back positive,
run it to a site on the Internet. A solid connection should return something like Figure 21-13. A failed route will return a failed response.
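The two-stage traceroute check (gateway first, then an Internet site) can be written down as a small decision helper. This is a sketch only: the function names and the wording of the results are assumptions, and the code just builds the command line rather than running it.

```python
def trace_command(target, system):
    """Build the traceroute command line; the Windows name of the utility is 'tracert'."""
    tool = "tracert" if system == "Windows" else "traceroute"
    return [tool, target]

def triage(gateway_ok, internet_ok):
    """Interpret the two traceroute results in the order the card runs them."""
    if not gateway_ok:
        return "local issue: check your own area of responsibility"
    if not internet_ok:
        return "problem beyond your LAN: escalate to the person responsible for the router"
    return "route looks good"
```

In practice the `system` argument could come from `platform.system()`, which returns "Windows" on Windows machines.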
The netstat utility displays information on the current state of all the
running IP processes on a system. It shows what sessions are active and can also provide statistics based on ports or protocols (TCP, UDP, and so on). Typing netstat by itself only shows current sessions. Typing netstat -r shows the routing table (100 percent identical to route print). If you want to know about your current sessions, netstat is the tool to use.
Certifiers test a cable to ensure that it can handle its rated amount of capacity. When a cable is not broken but it's not moving data the way it should, turn to a certifier. Look for problems that cause a cable to underperform. A bad installation might increase crosstalk, attenuation, or interference. A certifier can pick up an impedance mismatch as well. Most of these problems show up at installation, but
running a certifier to eliminate cabling as a problem is never a bad idea. Don't use a certifier for disconnects, only slowdowns. All certifiers need some kind of loopback adapter on the other end of the cable run to provide termination and return of a signal. A loopback adapter is a small device with a single port.
LAN Problems: Link Aggregation Problems: Ethernet networks (traditionally) don't __ easily.
scale
Beyond Local - Escalate: Broadcast Storms: A broadcast storm is the result of one or more devices sending a nonstop flurry of broadcast frames on the network. The first sign of a broadcast storm is when every computer on the broadcast domain suddenly can't connect to the rest of the network. There are usually no clues other than network applications freezing or presenting "can't connect to ..." types of error messages. Every activity light on every node is solidly on. Computers on other broadcast domains work perfectly well. The trick is to isolate; that's where escalation comes in. You need to break down the network quickly by unplugging devices until you can find the one causing trouble. Getting a packet analyzer to work can be difficult, but at least try. If you can
scoop up one packet, you'll know what node is causing the trouble. The second the bad node is disconnected, the network returns to normal. But if you have a lot of machines to deal with and a bunch of users who can't get on the network yelling at you, you'll need help. Call a supervisor to get support to solve the crisis as quickly as possible.
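If the packet analyzer does manage to capture some frames, tallying broadcast frames by source address points at the bad node. The sketch assumes the capture has already been reduced to (source MAC, destination MAC) pairs; that input format and the sample addresses are hypothetical.

```python
from collections import Counter

BROADCAST = "ff:ff:ff:ff:ff:ff"   # Ethernet broadcast destination address

def top_broadcaster(frames):
    """Return (source_mac, count) for the node sending the most broadcast frames, or None."""
    counts = Counter(src for src, dst in frames if dst.lower() == BROADCAST)
    if not counts:
        return None
    return counts.most_common(1)[0]

# Invented sample capture: one node floods broadcasts, another sends normal traffic.
capture = (
    [("aa:aa:aa:aa:aa:01", BROADCAST)] * 50
    + [("aa:aa:aa:aa:aa:02", BROADCAST)] * 2
    + [("aa:aa:aa:aa:aa:02", "aa:aa:aa:aa:aa:03")] * 10
)
suspect = top_broadcaster(capture)
```

A single captured frame is often enough during a storm, as the card says; counting just makes the result less dependent on which frame you happened to catch.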
LAN Problems: Link Aggregation Problems: NIC teaming provides many more benefits than just increasing bandwidth, such as redundancy. You can team two NICs in a logical unit, but set them up with one NIC as the primary—live—and the second as the hot spare—standby. If the first NIC goes down, the traffic will automatically flow through the second NIC. In a simple network setup for redundancy, you'd make one connection live and the other as standby on each device. Switch A has a live and a standby, Switch B has a live and a standby, and so on. The key here is that multicast traffic to the various devices needs to be enabled on every device through which that traffic might pass. If Switch C doesn't play nice with multicast and it's connected to Switch B, this can cause multicast traffic to stop. One "fix" for this in a Cisco network is to turn off a feature called IGMP snooping, which is enabled by default on Cisco switches. IGMP snooping is normally a good thing, because it helps the switches keep track of devices that use multicast and filter traffic away from devices that don't. The problem with turning off IGMP snooping is that the switches won't map and filter multicast traffic. Instead of only sending to the devices that are set up to receive multicast, the switches will treat multicast messages as broadcast messages and send them to everybody. This is a NIC teaming misconfiguration that can seriously degrade network performance. A better fix would be to
send a couple of network techs to change settings on Switch C and make it send multicast packets properly.
WAN Problems: ISPs and MTUs: I discussed the maximum transmission unit (MTU) in Chapter 7, "Routing." Back in the dark ages (before Windows Vista), Microsoft users often found themselves with terrible connection problems because IP packets were too big to fit into certain network protocols. The largest Ethernet packet is 1500 bytes, so some earlier versions of Windows set their MTU size to a value less than 1500 to minimize the fragmentation of packets. The problem cropped up when you tried to connect to a technology other than Ethernet, such as DSL. Some DSL carriers couldn't handle an MTU size greater than 1400. When your network's packets are so large that they must be fragmented to fit into your ISP's packets, we call it an MTU mismatch. As a result, techs would tweak their MTU settings to improve throughput by matching up the MTU sizes between the ISP and their own network. This usually required a manual registry setting adjustment. Path MTU Discovery (PMTU), a method to determine the best MTU setting automatically, removes the need for this manual tuning. PMTU works by using the "Don't Fragment (DF) flag" in the IP packet header. A PMTU-aware operating system can automatically
send a series of fixed-size ICMP packets (basically just pings) with the DF flag set to another device to see if it works. If it doesn't work, the system lowers the MTU size and tries again until the ping is successful. Imagine the hassle of incrementing the MTU size manually. That's the beauty of PMTU—you can automatically set your MTU size to the perfect amount.
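The lower-and-retry loop described on this card can be simulated without touching a real network. In the sketch below the path is modeled as a list of per-link MTUs, and a probe "succeeds" when it fits the smallest link. The step size and the 576-byte floor are assumptions chosen for illustration, not values from the card.

```python
def discover_path_mtu(link_mtus, start_mtu=1500, step=10, floor=576):
    """Simulate probing with the DF flag set: shrink the MTU until a probe fits every link."""
    path_limit = min(link_mtus)      # the tightest link decides what gets through
    mtu = start_mtu
    while mtu >= floor:
        if mtu <= path_limit:        # probe not fragmented, so it arrives intact
            return mtu
        mtu -= step                  # probe was too big: lower the MTU and try again
    return None                      # nothing at or above the floor fits this path
```

With an Ethernet LAN and a DSL hop limited to 1400 bytes, `discover_path_mtu([1500, 1400, 1500])` settles on 1400, which is the value a tech once had to find by hand.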
Beyond Local - Escalate: Proxy ARP: Proxy ARP is the process of making remotely connected computers truly act as though they are on the same LAN as local computers. Proxy ARP is done in a number of different ways, with a Virtual Private Network (VPN) as the classic example. If a laptop in an airport connects to a network through a VPN, that computer takes on the network ID of your local network. In order for all of this to work, the VPN concentrator needs to allow some very LAN-type traffic to go through it that would normally never get through a router. ARP is a great example. If your VPN client wants to talk to another computer on the LAN, it has to
send an ARP request to get the MAC address. Your VPN device is designed to act as a proxy for all that type of data.
Beyond Local - Escalate: Proxy ARP: Proxy ARP is the process of making remotely connected computers truly act as though they are on the same LAN as local computers. Proxy ARP is done in a number of different ways, with a Virtual Private Network (VPN) as the classic example. If a laptop in an airport connects to a network through a VPN, that computer takes on the network ID of your local network. In order for all of this to work, the VPN concentrator needs to allow some very LAN-type traffic to go through it that would normally never get through a router. ARP is a great example. If your VPN client wants to talk to another computer on the LAN, it has to send an ARP request to get the MAC address. Your VPN device is designed to act as a proxy for all that type of data. Almost all proxy ARP problems take place on the VPN concentrator. With misconfigured proxy ARP settings, the VPN concentrator can
send what looks like a denial of service (DoS) attack on the LAN. (A DoS attack is usually directed at a server exposed on the Internet, like a Web server. See Chapter 19, "Protecting Your Network," for more details on these and other malicious attacks.) If your clients start receiving a large number of packets from the VPN concentrator, assume you have a proxy ARP problem and escalate by getting the person in charge of the VPN to fix it.
Networks need the proper temperature and adequate power, but most network techs tend to view these issues as outside of the normal places to look for problems. That's too bad, because both heat and power problems invariably manifest themselves as intermittent problems. Look for problems that might point to heat or power issues: (2)
server rooms that get too hot at certain times of the day, switches that fail whenever an air conditioning system kicks on, and so on.
WAN Problems: Appliance Problems: Many of the boxes that people refer to as "routers" contain many features, such as routing, Network Address Translation (NAT), switching, an intrusion detection system (IDS), a firewall, and more. These complex boxes, such as the Cisco Adaptive Security Appliance (ASA), are called network appliances. One common issue with network appliances is technician error. By default, for example, NAT rules take precedence over an appliance's routing table entries. If the tech fails to set the NAT rule order correctly, traffic that should be routed to go out one interface—like to the DMZ network—can go out an incorrect interface—like to the inside network. Users on the outside would expect a response from something but instead get nothing, all because of a NAT interface misconfiguration. The fix for such problems is to
set up your network appliance correctly. Know the capabilities of the network appliance and the relationships among its services. Examine rules and settings carefully.
The vast majority of cabling problems occur when the network is first installed or when a change is made. Once a cable has been made, installed, and tested, the chances of it failing are pretty small compared to all of the other network problems that might take place. If you're having trouble connecting to a resource or experiencing performance problems after making a connection, a bad cable likely isn't the culprit. Broken cables don't make intermittent problems, and they don't slow down data. They make permanent disconnects. Network techs define a "broken" cable in numerous ways. First, a broken cable might have an open circuit, where one or more of the wires in a cable simply don't connect from one end of the cable to the other. The signal lacks continuity. Second, a cable might have a __, where one or more of the wires in a cable connect to another wire in the cable. (Within a normal cable, no wires connect to other wires.)
short
The vast majority of cabling problems occur when the network is first installed or when a change is made. Once a cable has been made, installed, and tested, the chances of it failing are pretty small compared to all of the other network problems that might take place. If you're having trouble connecting to a resource or experiencing performance problems after making a connection, a bad cable likely isn't the culprit. Broken cables don't make intermittent problems, and they don't slow down data. They make permanent disconnects. Network techs define a "broken" cable in numerous ways. First, a broken cable might have an open circuit, where one or more of the wires in a cable simply don't connect from one end of the cable to the other. The signal lacks continuity. Second, a cable might have a short, where one or more of the wires in a cable connect to another wire in the cable. (Within a normal cable, no wires connect to other wires.) Third, a cable might have a wire map problem, where one or more of the wires in a cable don't connect to the proper location on the jack or plug. This can be caused by improperly crimping a cable, for example. Fourth, the cable might experience crosstalk, where the electrical signal bleeds from one wire pair to another, creating interference. Fifth, a broken cable might pick up noise, spurious
signals usually caused by faulty hardware or poorly crimped jacks. Finally, a broken cable might have impedance mismatch. Impedance is the natural electrical resistance of a cable. When cables of different types—think thickness, composition of the metal, and so on—connect and the flow of electrons is not uniform, it can cause a unique type of electrical noise, called an echo.
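The first three kinds of "broken" (open circuit, short, wire map problem) can be told apart from a simple pin-to-pin continuity record. The sketch assumes a straight-through 8-wire cable and a hypothetical input format: for each pin at one end, the set of pins it reaches at the other end.

```python
def classify_wire_map(connections, pins=8):
    """Return a list of (pin, fault) pairs for a straight-through cable test."""
    faults = []
    for pin in range(1, pins + 1):
        reached = connections.get(pin, set())
        if not reached:
            faults.append((pin, "open"))               # wire doesn't connect end to end
        elif len(reached) > 1:
            faults.append((pin, "short"))              # wire touches another wire
        elif reached != {pin}:
            faults.append((pin, "wire map problem"))   # wire lands on the wrong pin
    return faults

# Invented test results: pins 2 and 3 shorted, pin 4 open, pins 7 and 8 swapped.
bad_cable = {
    1: {1}, 2: {2, 3}, 3: {2, 3}, 4: set(),
    5: {5}, 6: {6}, 7: {8}, 8: {7},
}
report = classify_wire_map(bad_cable)
```

Crosstalk, noise, and impedance mismatch don't show up in a continuity record like this, which is why those call for a certifier rather than a basic cable tester.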
No matter how complex and fancy, any troubleshooting process can be broken down into
simple steps. Having a sequence of steps to follow makes the entire troubleshooting process simpler and easier, because you have a clear set of goals to achieve in a specific sequence.
Hands-on Problems: Some problems you can fix at the local machine don't point to messed-up hardware or invalid settings, but reflect the current mix of wired and wireless networks in the same place. Here's a scenario that applies to Windows versions before Windows 10. Tina has a wireless network connection to the Internet. She gets a shiny new printer with an Ethernet port, but with no Wi-Fi capability. She wants to print from both her PC and her laptop, so she creates a small LAN: a couple of Ethernet cables and a switch. She plugs everything in, installs drivers, and all is well. She can print from both machines. Unfortunately, as soon as she prints, her Internet connection goes down. The funny part is that the Internet connection didn't go anywhere, but her __ created a network failure. The wired and wireless NICs can't actually operate simultaneously and, by default, the wired connection takes priority in the order in which devices are accessed by network services.
simultaneous wired/wireless connections
Certifiers test a cable to ensure that it can handle its rated amount of capacity. When a cable is not broken but it's not moving data the way it should, turn to a certifier. Look for problems that cause a cable to underperform. A bad installation might increase crosstalk, attenuation, or interference. A certifier can pick up an impedance mismatch as well. Most of these problems show up at installation, but running a certifier to eliminate cabling as a problem is never a bad idea. Don't use a certifier for disconnects, only slowdowns. All certifiers need some kind of loopback adapter on the other end of the cable run to provide termination and return of a signal. A loopback adapter is a small device with a
single port.
Almost every new networking person I teach will, at some point, ask me: "What tools do I need to buy?" My answer shocks them: "None. Don't buy a thing." It's not so much that you don't need tools, but rather that different networking jobs require wildly different tools. Plenty of network techs never crimp a cable. An equal number never open a system. Some techs do nothing all day but pull cable. The tools you need are defined by your job. This answer is especially true with
software tools. Almost all the network problems I encounter in established networks don't require me to use any tools other than the classic ones provided by the operating system. I've fixed more network problems with ping, for example, than with any other single tool. As you gain skill in this area, you'll find yourself hounded by vendors trying to sell you the latest and greatest networking diagnostic tools. You may like these tools. All I can say is that I've never needed a software diagnostics tool that I had to purchase.
Beyond Local - Escalate: A broadcast storm is the result of one or more devices sending a nonstop flurry of broadcast frames on the network. The first sign of a broadcast storm is when every computer on the broadcast domain suddenly can't connect to the rest of the network. There are usually no clues other than network applications freezing or presenting "can't connect to ..." types of error messages. Every activity light on every node is
solidly on. Computers on other broadcast domains work perfectly well.
WAN Problems: ISPs and MTUs: I discussed the maximum transmission unit (MTU) in Chapter 7, "Routing." Back in the dark ages (before Windows Vista), Microsoft users often found themselves with terrible connection problems because IP packets were too big to fit into certain network protocols. The largest Ethernet packet is 1500 bytes, so
some earlier versions of Windows set their MTU size to a value less than 1500 to minimize the fragmentation of packets.
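The fragmentation cost of a too-small MTU can be sketched with a little arithmetic. This is a minimal illustration, assuming IPv4 with a 20-byte header and no options; the function name fragments_needed is hypothetical and not taken from the chapter.

```python
import math

IPV4_HEADER_BYTES = 20  # assumption: IPv4 header with no options


def fragments_needed(payload_bytes: int, mtu: int) -> int:
    """Return how many IPv4 fragments are needed to carry payload_bytes over a link with this MTU."""
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    per_fragment = ((mtu - IPV4_HEADER_BYTES) // 8) * 8
    if per_fragment <= 0:
        raise ValueError("MTU too small to carry any payload")
    return max(1, math.ceil(payload_bytes / per_fragment))


if __name__ == "__main__":
    # A payload that fills a 1500-byte Ethernet MTU exactly...
    print(fragments_needed(1480, 1500))
    # ...no longer fits once the path MTU drops slightly (1492 is a common PPPoE value).
    print(fragments_needed(1480, 1492))
```

A sender that assumes 1500 bytes everywhere ends up splitting packets on any link with a smaller MTU, which is the problem a lower default MTU was meant to sidestep.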
Beyond Local - Escalate: Switching Loops: Also known as a bridging loop, a switching loop is when you connect and configure multiple switches together in such a way that causes a circular path to appear. Switching loops are rare because all switches use the Spanning Tree Protocol (STP), but they do happen. The symptoms are identical to a broadcast storm: every computer on the broadcast domain can no longer access the network. The good part about switching loops is that they rarely take place on a well-running network. Someone had to break something, and that means
someone, somewhere is messing with the switch configuration. Escalate the problem, and get the team to help you find the person making changes to the switches.
Throughput testers enable you to measure the data flow in a network. Which tool is appropriate depends on the type of network throughput you want to test. Most techs use one of several __ for checking an Internet connection's throughput,
speed-test sites
Troubleshooting process: Establish a Theory of Probable Cause: Once you've identified one or more problems, try to figure out what could have happened. In other words, establish a theory of probable cause. Just keep in mind that a theory is not a fact. You might need to chuck the theory out the window later in the process and establish a revised theory. This step comes down to experience—or good use of the support tools at your disposal, such as your knowledge base. You need to select the most probable cause from all the possible causes, so the solution you choose fixes the problem the first time. This may not always happen, but whenever possible, you want to avoid
spending a whole day stabbing in the dark while the problem snores softly to itself in some cozy, neglected corner of your network.
Also known as a bridging loop, a __ is when you connect and configure multiple switches together in such a way that causes a circular path to appear.
switching loop
Exam tip: Sometimes a GUI tool like Wireshark won't work because a server has no GUI installed. In situations like this, __ is the go-to choice. This great command-line tool not only enables you to monitor and filter packets in the terminal, but can also create files you can open in Wireshark for later analysis. Even better, it's installed by default on most UNIX/Linux systems.
tcpdump
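A capture on a GUI-less server might be assembled along these lines. This is only a sketch: the interface name, file name, and helper function are hypothetical, while -i (interface), -c (packet count), and -w (write to file) are standard tcpdump options.

```python
import shlex


def tcpdump_capture_command(interface: str, pcap_path: str, packet_count: int,
                            capture_filter: str = "") -> list:
    """Build a tcpdump argument list that writes packets to a file Wireshark can open."""
    cmd = ["tcpdump", "-i", interface, "-c", str(packet_count), "-w", pcap_path]
    if capture_filter:
        # The filter expression (for example "icmp") goes at the end of the command.
        cmd += shlex.split(capture_filter)
    return cmd


if __name__ == "__main__":
    # Hypothetical values; running the command normally requires root privileges.
    print(" ".join(tcpdump_capture_command("eth0", "capture.pcap", 100, "icmp")))
```

The resulting list can be handed to subprocess.run on the server, and the .pcap file copied to a workstation for analysis in Wireshark.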
WAN Problems: Appliance Problems: Many of the boxes that people refer to as "routers" contain many features, such as routing, Network Address Translation (NAT), switching, an intrusion detection system (IDS), a firewall, and more. These complex boxes, such as the Cisco Adaptive Security Appliance (ASA), are called network appliances. One common issue with network appliances is
technician error. By default, for example, NAT rules take precedence over an appliance's routing table entries. If the tech fails to set the NAT rule order correctly, traffic that should be routed to go out one interface—like to the DMZ network—can go out an incorrect interface—like to the inside network.
Exam tip: Always __ before you walk away from the job!
test a solution
Hands-on Problems: Outside invisible forces can cause problems with copper cabling. You've read about electromagnetic interference (EMI) and radio frequency interference (RFI) previously in the book. EMI and RFI can disrupt signaling on a copper cable, especially with the very low voltages used today on those cables. These are crazy things to troubleshoot. An interference problem might manifest in a scenario like this one. John can use e-mail on his laptop successfully over the company's wireless network. When he plugs in at his desk in his cubicle, however, e-mail messages just don't get through. Typically, you'd
test everything before suspecting EMI or RFI causing this problem. Test the NIC on the laptop by plugging into a known-good port. You'd use a cable tester on the cable. You'd check for continuity from the port in his office to the switch. You'd glance at the cabling certification documents to see that yes, the cable worked when installed.
Troubleshooting process: Implement the Solution or Escalate as Necessary: Once you think you have isolated the cause of the problem, you should decide what you think is the best way to fix it and then implement the solution, whether that's giving advice over the phone to a user, installing a replacement part, or adding a software patch. Or, if the solution you propose requires either more skill than you possess at the moment or falls into someone else's purview, escalate as necessary to get the fix implemented. If you're the implementer, follow these guidelines. All the way through implementation, try only one likely solution at a time. There's no point in installing several patches at once, because then you can't tell which one fixed the problem. Similarly, there's no point in replacing several items of hardware (such as a hard disk and its controller cable) at the same time, because then you can't tell which part (or parts) was faulty. As you try each possibility, always document what you do and what results you get. This isn't just for a future problem either—during a lengthy troubleshooting process, it's easy to forget exactly what you tried two hours before or which thing you tried produced a particular result. Although being methodical may take longer, it will save time the next time—and it may enable you to pinpoint what needs to be done to stop the problem from recurring at all, thereby reducing future call volume to your support team—and as any support person will tell you, that's definitely worth the effort! Then you need to
test the solution. This is the part everybody hates. Once you think you've fixed a problem, you should try to make it happen again. If you can't, great! But sometimes you will be able to re-create the problem, and then you know you haven't finished the job at hand. Many techs want to slide away quietly as soon as everything seems to be fine, but trust me on this, it won't impress your customer when her problem flares up again 30 seconds after you've left the building—not to mention that you get the joy of another two-hour car trip the next day to fix the same problem, for an even more unhappy client!
Certifiers test a cable to ensure
that it can handle its rated amount of capacity. When a cable is not broken but it's not moving data the way it should, turn to a certifier. Look for problems that cause a cable to underperform. A bad installation might increase crosstalk, attenuation, or interference. A certifier can pick up an impedance mismatch as well. Most of these problems show up at installation, but running a certifier to eliminate cabling as a problem is never a bad idea. Don't use a certifier for disconnects, only slowdowns. All certifiers need some kind of loopback adapter on the other end of the cable run to provide termination and return of a signal. A loopback adapter is a small device with a single port.
LAN Problems: An expired IP address can cause a system not to connect. Release/renew to obtain a proper IP address from the DHCP server. If the DHCP server's scope of IP addresses has been claimed,
that release/renew won't work. You'll get an error that points to an exhausted DHCP scope. The only fix for this is to make changes at the DHCP server.
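What "exhausted" means can be sketched by comparing a scope's address range against its current leases. This is an illustrative model only; the addresses and the free_addresses helper are hypothetical, and a real DHCP server tracks leases in its own database.

```python
import ipaddress


def free_addresses(scope_start: str, scope_end: str, leased: set) -> list:
    """Return the addresses in the scope (inclusive range) that are not currently leased."""
    start = int(ipaddress.IPv4Address(scope_start))
    end = int(ipaddress.IPv4Address(scope_end))
    candidates = (str(ipaddress.IPv4Address(n)) for n in range(start, end + 1))
    return [address for address in candidates if address not in leased]


if __name__ == "__main__":
    leases = {"192.168.1.10", "192.168.1.11", "192.168.1.12"}
    remaining = free_addresses("192.168.1.10", "192.168.1.12", leases)
    # An empty list means the scope is exhausted: a release/renew has nothing to hand out.
    print(remaining)
```

When the list comes back empty, the fix lives at the server: enlarge the scope, shorten lease times, or clear stale leases.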
Troubleshooting: First, identify the problem. That means grasping the true problem, rather than what someone tells you. A user might call in and complain that he can't access the Internet from his workstation, for example, which could be the only problem. But the problem could also be that the entire wing of the office just went down and you've got a much bigger problem on your hands. You need to gather information, duplicate the problem (if possible), question users, identify symptoms, determine if anything has changed on the network, and approach multiple problems individually. Following these steps will help you get to the root of the problem. Gather information about the situation. If you are working directly on the affected system and not relying on somebody on the other end of a telephone to guide you, you will identify symptoms through your observation of what is (or isn't) happening. If you're troubleshooting over the telephone (always a joy, in my experience), you will need to question users. These questions can be close-ended, which is to say there can only be a yes-or-no-type answer, such as, "Can you see a light on the front of the monitor?" You can also ask open-ended questions, such as, "What have you already tried in attempting to fix the problem?" The type of question you ask at any given moment depends on what information you need and on the user's knowledge level. If, for example, the user seems to be technically oriented, you will probably be able to ask more close-ended questions because he or she will know what you are talking about. If, on the other hand, the user seems to be confused about what's happening, open-ended questions will allow him or her to explain in his or her own words what is going on. One of the first steps in trying to determine the cause of a problem is to understand the extent of the problem. Is it specific to one user or is it network-wide?
Sometimes this entails trying the task yourself, both from the user's machine and from your own or another machine. For example, if a user is experiencing problems logging into the network, you might need to go to that user's machine and try to use his or her user name to log in. In other words, try to duplicate the problem. Doing this tells you whether the problem is a user error of some kind, as well as enables you to see the symptoms of the problem yourself. Next, you probably want to try logging in with your own user name from that machine, or have the user try to log in from another machine. In some cases, you can ask other users in the area if they are experiencing the same problem to see if the issue is affecting more than one user. Depending on the size of your network, you should find out whether the problem is occurring in only one part of your company or across the entire network. What does all of this tell you? Essentially, it tells you how big the problem is. If nobody in an entire remote office can log in, you may be able to assume that the problem is the network link or router connecting that office to the server. If nobody in any office can log in, you may be able to assume the server is down or not accepting logins. If only that one user in that one location can't log in, the problem may be with (3)
that user, that machine, or that user's account.
The extremely transparent fiber-optic cables allow light to shine but have some inherent impurities in the glass that can reduce light transmission. Dust, poor connections, and light leakage can also degrade the strength of light pulses as they travel through a fiber-optic run. To measure the amount of light loss, technicians use an optical power meter, also referred to as a light meter (see Figure 21-3). The light meter system uses a high-powered source of light at one end of a run and a calibrated detector at the other end. This measures
the amount of light that reaches the detector.
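Light loss is conventionally expressed in decibels, comparing the power launched into the run with the power that reaches the detector. The sketch below shows that calculation; the function name and the sample power values are hypothetical.

```python
import math


def optical_loss_db(source_mw: float, received_mw: float) -> float:
    """Return the loss over a fiber run in dB, given launched and received power in milliwatts."""
    if source_mw <= 0 or received_mw <= 0:
        raise ValueError("power values must be positive")
    return 10 * math.log10(source_mw / received_mw)


if __name__ == "__main__":
    # Hypothetical reading: half of the launched light reaches the detector.
    print(round(optical_loss_db(1.0, 0.5), 2))
```

Losing half the light works out to roughly 3 dB, a handy rule of thumb when comparing a measured run against its loss budget.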
Hands-on problems refer to things that you can fix at the workstation, work area, or server. These include physical problems and configuration problems. A power failure or power anomalies, such as dips and surges, can make a network device unreachable. We've addressed the fixes for such issues a couple of times already in this book: manage the power to the network device in question and install an uninterruptible power supply (UPS). A hardware failure can certainly make a network device unreachable. Fall back on your CompTIA A+ training for troubleshooting. Check the link lights on the NIC. Try another NIC if the machine seems functional in every other aspect. Ping the localhost. Pay attention to link lights when you have a "hardware failure." The network connection LED status indicators—link lights—can quickly point to a connectivity issue. Try known good cables/NICs if you run into this issue. Hot-swappable transceivers (which you read about way back in Chapter 4, "Modern Ethernet") can go bad. The key when working with small form-factor pluggable (SFP) or the much older gigabit interface converter (GBIC) transceivers is that you need to check both the media and the module. In other words, a seemingly bad SFP/GBIC could be __ or __. As with other hardware issues, try known-good components to troubleshoot.
the cable connected to it; the transceiver
The ping utility uses Internet Control Message Protocol (ICMP) packets to query by IP address or by name. It works across routers, so it's generally the first tool used to check if a system is reachable. Unfortunately, many devices block ICMP packets, so a failed ping doesn't always point to an offline system. The ping utility defaults to IPv4, but also functions well in an IPv6 network. In Windows, use __. In UNIX/Linux, use __.
the command with the -6 switch: ping -6; ping6
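The platform difference can be captured in a small helper that picks the right command. This is a sketch under stated assumptions: the function name and the sample target are hypothetical, and it uses the common count options (-n on Windows, -c on UNIX/Linux).

```python
def ping_command(os_name: str, target: str, ipv6: bool = False, count: int = 4) -> list:
    """Build a ping argument list for the given operating system."""
    if os_name.lower() == "windows":
        cmd = ["ping", "-n", str(count)]
        if ipv6:
            cmd.append("-6")  # Windows: same command, -6 switch
    else:
        # UNIX/Linux: a separate ping6 command for IPv6
        cmd = ["ping6" if ipv6 else "ping", "-c", str(count)]
    return cmd + [target]


if __name__ == "__main__":
    # platform.system() returns "Windows" or "Linux" and could supply os_name automatically.
    print(ping_command("Windows", "server1.example.com", ipv6=True))
    print(ping_command("Linux", "server1.example.com", ipv6=True))
```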
The traceroute command defaults to IPv4, but also functions well in an IPv6 network. In Windows, use
the command with the -6 switch: tracert -6. In UNIX/Linux, use traceroute6 (or traceroute -6 in some variants of Linux).
The vast majority of cabling problems occur when the network is first installed or when a change is made. Once a cable has been made, installed, and tested, the chances of it failing are pretty small compared to all of the other network problems that might take place. If you're having trouble connecting to a resource or experiencing performance problems after making a connection, a bad cable likely isn't the culprit. Broken cables don't make intermittent problems, and they don't slow down data. They make permanent disconnects. Network techs define a "broken" cable in numerous ways. First, a broken cable might have an open circuit, where one or more of the wires in a cable simply don't connect from one end of the cable to the other. The signal lacks continuity. Second, a cable might have a short, where one or more of the wires in a cable connect to another wire in the cable. (Within a normal cable, no wires connect to other wires.) Third, a cable might have a wire map problem, where one or more of the wires in a cable don't connect to the proper location on the jack or plug. This can be caused by improperly crimping a cable, for example. Fourth, the cable might experience crosstalk, where
the electrical signal bleeds from one wire pair to another, creating interference.
Troubleshooting process: Verify Full System Functionality and Implement Preventative Measures: Okay, now that you have changed something on the system in the process of solving one problem, you must think about the wider repercussions of what you have done. If you've replaced a faulty NIC in a server, for instance, will the fact that the MAC address has changed (remember, it's built into the NIC) affect anything else, such as the logon security controls or your network management and inventory software? If you've installed a patch on a client PC, will this change the default protocol or any other default settings that may affect other functionality? If you've changed a user's security settings, will this affect his or her ability to access other network resources? This is part of testing your solution to make sure it works properly, but it also makes you think about
the impact of your work on the system as a whole.
Hands-on Problems: Outside invisible forces can cause problems with copper cabling. You've read about electromagnetic interference (EMI) and radio frequency interference (RFI) previously in the book. EMI and RFI can disrupt signaling on a copper cable, especially with the very low voltages used today on those cables. These are crazy things to troubleshoot. An interference problem might manifest in a scenario like this one. John can use e-mail on his laptop successfully over the company's wireless network. When he plugs in at his desk in his cubicle, however, e-mail messages just don't get through. Typically, you'd test everything before suspecting EMI or RFI causing this problem. Test the NIC on the laptop by plugging into a known-good port. You'd use a cable tester on the cable. You'd check for continuity from the port in his office to the switch. You'd glance at the cabling certification documents to see that yes, the cable worked when installed. Only then might a creative tech at her wit's end notice the recently installed, high-powered WAP on the wall outside John's office. RFI strikes! If the installation is new and unproven, a perfectly fine network device might be unreachable because of interface errors, meaning that
the installer didn't install the wall jack correctly. The resulting incorrect termination might be a mismatched standard (568A rather than 568B, for example). The cable from the wall to the workstation might be bad or might be a crossover cable rather than straight-through cable. That's an incorrect cable type, according to the CompTIA Network+ objectives. Try another cable.
Hands-on problems refer to things that you can fix at the workstation, work area, or server. These include physical problems and configuration problems. A power failure or power anomalies, such as dips and surges, can make a network device unreachable. We've addressed the fixes for such issues a couple of times already in this book: manage the power to the network device in question and install an uninterruptible power supply (UPS). A hardware failure can certainly make a network device unreachable. Fall back on your CompTIA A+ training for troubleshooting. Check
the link lights on the NIC. Try another NIC if the machine seems functional in every other aspect. Ping the localhost.
Exam tip: CompTIA continues to include duplex/speed mismatch as a common network issue, although that's not how networks work today. Every NIC, switch, and router features autosensing and autonegotiating ports. You plug two devices in and, as long as they're not otherwise misconfigured, they'll run at the same speed—most likely at full duplex. It's important to note that if the speeds on the two NICs are mismatched,
the link will not come up, but if it's just the duplex that's mismatched, the link will come up but the connection will be erratic. Look for this "common error" on the exam, but not in the real world.
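The rule in this exam tip reduces to a short decision function. The sketch below simply encodes the two outcomes described above; the function name and the way settings are represented are hypothetical.

```python
def link_outcome(speed_a: int, duplex_a: str, speed_b: int, duplex_b: str) -> str:
    """Describe the expected link behavior for two manually configured ports."""
    if speed_a != speed_b:
        # Mismatched speeds: the link never comes up.
        return "link down"
    if duplex_a != duplex_b:
        # Matching speed but mismatched duplex: link lights on, connection erratic.
        return "link up but erratic"
    return "link up"


if __name__ == "__main__":
    print(link_outcome(100, "full", 1000, "full"))
    print(link_outcome(1000, "full", 1000, "half"))
```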
WAN Problems: Problems that stop users from accessing content across a WAN, like the Internet, can originate at (6)
the local machine, switches within the LAN, routers that interconnect the WAN, switches within the distant network, and the distant machine itself. As you might infer from the opening scenario, some of these common network problems you can fix, and some you cannot. We discussed many remote connectivity problems and solutions way back in Chapter 13, so I won't rehash them here.
Troubleshooting process: Establish a Theory of Probable Cause: Once you've identified one or more problems, try to figure out what could have happened. In other words, establish a theory of probable cause. Just keep in mind that a theory is not a fact. You might need to chuck the theory out the window later in the process and establish a revised theory. This step comes down to experience—or good use of the support tools at your disposal, such as your knowledge base. You need to select
the most probable cause from all the possible causes, so the solution you choose fixes the problem the first time. This may not always happen, but whenever possible, you want to avoid spending a whole day stabbing in the dark while the problem snores softly to itself in some cozy, neglected corner of your network.
The vast majority of cabling problems occur when the network is first installed or when a change is made. Once a cable has been made, installed, and tested, the chances of it failing are pretty small compared to all of the other network problems that might take place. If you're having trouble connecting to a resource or experiencing performance problems after making a connection, a bad cable likely isn't the culprit. Broken cables don't make intermittent problems, and they don't slow down data. They make permanent disconnects. Network techs define a "broken" cable in numerous ways. First, a broken cable might have an open circuit, where one or more of the wires in a cable simply don't connect from one end of the cable to the other. The signal lacks continuity. Second, a cable might have a short, where one or more of the wires in a cable connect to another wire in the cable. (Within a normal cable, no wires connect to other wires.) Third, a cable might have a wire map problem, where one or more of the wires in a cable don't connect to the proper location on the jack or plug. This can be caused by improperly crimping a cable, for example. Fourth, the cable might experience crosstalk, where the electrical signal bleeds from one wire pair to another, creating interference. Fifth, a broken cable might pick up noise, spurious signals usually caused by faulty hardware or poorly crimped jacks. Finally, a broken cable might have impedance mismatch. Impedance is
the natural electrical resistance of a cable. When cables of different types—think thickness, composition of the metal, and so on—connect and the flow of electrons is not uniform, it can cause a unique type of electrical noise, called an echo.
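How strong the echo is depends on how far apart the two impedances are. The standard transmission-line reflection coefficient, (Z2 - Z1) / (Z2 + Z1), gives the fraction of signal voltage reflected at the junction. The sketch below applies it; the function name and the sample impedance values are hypothetical and not taken from the chapter.

```python
def reflection_coefficient(z_cable_ohms: float, z_next_ohms: float) -> float:
    """Return the fraction of signal voltage reflected where two impedances meet."""
    if z_cable_ohms <= 0 or z_next_ohms <= 0:
        raise ValueError("impedances must be positive")
    return (z_next_ohms - z_cable_ohms) / (z_next_ohms + z_cable_ohms)


if __name__ == "__main__":
    # Matched cable segments: nothing is reflected.
    print(reflection_coefficient(100.0, 100.0))
    # Hypothetical mismatch: a 100-ohm run joined to a 150-ohm segment.
    print(reflection_coefficient(100.0, 150.0))
```

A result of zero means no echo; the farther the value moves from zero, the more of the signal bounces back toward the sender.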
Beyond Local - Escalate: Proxy ARP: Proxy ARP is the process of making remotely connected computers truly act as though they are on the same LAN as local computers. Proxy ARP is done in a number of different ways, with a Virtual Private Network (VPN) as the classic example. If a laptop in an airport connects to a network through a VPN, that computer takes on
the network ID of your local network. In order for all of this to work, the VPN concentrator needs to allow some very LAN-type traffic to go through it that would normally never get through a router. ARP is a great example. If your VPN client wants to talk to another computer on the LAN, it has to send an ARP request to get the MAC address. Your VPN device is designed to act as a proxy for all that type of data.
The vast majority of cabling problems occur when
the network is first installed or when a change is made. Once a cable has been made, installed, and tested, the chances of it failing are pretty small compared to all of the other network problems that might take place. If you're having trouble connecting to a resource or experiencing performance problems after making a connection, a bad cable likely isn't the culprit. Broken cables don't make intermittent problems, and they don't slow down data. They make permanent disconnects.
Troubleshooting: First, identify the problem. That means grasping the true problem, rather than what someone tells you. A user might call in and complain that he can't access the Internet from his workstation, for example, which could be the only problem. But the problem could also be that the entire wing of the office just went down and you've got a much bigger problem on your hands. You need to gather information, duplicate the problem (if possible), question users, identify symptoms, determine if anything has changed on the network, and approach multiple problems individually. Following these steps will help you get to the root of the problem. Gather information about the situation. If you are working directly on the affected system and not relying on somebody on the other end of a telephone to guide you, you will identify symptoms through your observation of what is (or isn't) happening. If you're troubleshooting over the telephone (always a joy, in my experience), you will need to question users. These questions can be close-ended, which is to say there can only be a yes-or-no-type answer, such as, "Can you see a light on the front of the monitor?" You can also ask open-ended questions, such as, "What have you already tried in attempting to fix the problem?" The type of question you ask at any given moment depends on what information you need and on the user's knowledge level. If, for example, the user seems to be technically oriented, you will probably be able to ask more close-ended questions because he or she will know what you are talking about. If, on the other hand, the user seems to be confused about what's happening, open-ended questions will allow him or her to explain in his or her own words what is going on. One of the first steps in trying to determine the cause of a problem is to understand the extent of the problem. Is it specific to one user or is it network-wide?
Sometimes this entails trying the task yourself, both from the user's machine and from your own or another machine. For example, if a user is experiencing problems logging into the network, you might need to go to that user's machine and try to use his or her user name to log in. In other words, try to duplicate the problem. Doing this tells you whether the problem is a user error of some kind, as well as enables you to see the symptoms of the problem yourself. Next, you probably want to try logging in with your own user name from that machine, or have the user try to log in from another machine. In some cases, you can ask other users in the area if they are experiencing the same problem to see if the issue is affecting more than one user. Depending on the size of your network, you should find out whether the problem is occurring in only one part of your company or across the entire network. What does all of this tell you? Essentially, it tells you how big the problem is. If nobody in an entire remote office can log in, you may be able to assume that the problem is
the network link or router connecting that office to the server. If nobody in any office can log in, you may be able to assume the server is down or not accepting logins. If only that one user in that one location can't log in, the problem may be with that user, that machine, or that user's account.
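The scoping logic above can be written down as a small decision function. This is only a sketch of the reasoning in the text, not a diagnostic tool; the function name, the office names, and the data layout are hypothetical.

```python
def login_failure_theory(results: dict, user_office: str) -> str:
    """Suggest where to look first, given login test outcomes per office.

    results maps an office name to a list of booleans (True = login succeeded)
    for the users tested there; user_office is the office of the reporting user.
    """
    nobody_anywhere = all(not any(outcomes) for outcomes in results.values())
    if nobody_anywhere:
        return "server down or not accepting logins"
    if not any(results[user_office]):
        return "network link or router connecting that office"
    return "that user, that machine, or that user's account"


if __name__ == "__main__":
    tests = {"HQ": [True, True], "Remote": [False, False]}
    print(login_failure_theory(tests, "Remote"))
```

The output is a theory of probable cause, not a verdict; it still has to be tested.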
Beyond Local - Escalate: Broadcast Storms: A broadcast storm is the result of one or more devices sending a nonstop flurry of broadcast frames on the network. The first sign of a broadcast storm is when every computer on the broadcast domain suddenly can't connect to the rest of the network. There are usually no clues other than network applications freezing or presenting "can't connect to ..." types of error messages. Every activity light on every node is solidly on. Computers on other broadcast domains work perfectly well. The trick is to isolate; that's where escalation comes in. You need to break down the network quickly by unplugging devices until you can find the one causing trouble. Getting a packet analyzer to work can be difficult, but at least try. If you can scoop up one packet, you'll know what node is causing the trouble. The second the bad node is disconnected,
the network returns to normal. But if you have a lot of machines to deal with and a bunch of users who can't get on the network yelling at you, you'll need help. Call a supervisor to get support to solve the crisis as quickly as possible.
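If a packet analyzer does manage to capture some frames, finding the offending node amounts to counting who sends the most broadcasts. The sketch below assumes the capture has already been reduced to (source MAC, destination MAC) pairs; the function name and the sample addresses are hypothetical.

```python
from collections import Counter

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


def top_broadcaster(frames):
    """Return (source_mac, frame_count) for the node sending the most broadcast frames, or None."""
    counts = Counter(
        source.lower()
        for source, destination in frames
        if destination.lower() == BROADCAST_MAC
    )
    if not counts:
        return None
    return counts.most_common(1)[0]


if __name__ == "__main__":
    captured = [
        ("00:11:22:33:44:55", "ff:ff:ff:ff:ff:ff"),
        ("00:11:22:33:44:55", "FF:FF:FF:FF:FF:FF"),
        ("66:77:88:99:aa:bb", "00:11:22:33:44:55"),
    ]
    print(top_broadcaster(captured))
```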
Troubleshooting process: Establish a Theory of Probable Cause: Consider multiple approaches when tackling problems. This will keep you from locking your imagination into a single train of thought. You can use the OSI seven-layer model as a troubleshooting tool in several ways to help with this process. Here's a scenario to work through. Martha can't access the database server to start her workday. The problem manifests this way: She opens the database client on her computer, then clicks on recent documents, one of which is the current project that management has assigned to her team. Nothing happens. Normally, the database client will connect to the database that resides on the server on the other side of the network. Try a top-to-bottom or bottom-to-top OSI model approach to the problem. Sometimes it pays to try both. Here are some ideas on how this might help. You might imagine the reverse model in some situations. If __, for example, running through some of the basic connectivity at Layers 1 and 2 might be a good first approach.
the network was newly installed
The traceroute utility (the command in Windows is tracert) is used to trace all of the routers between two points. Use traceroute to diagnose where the problem lies when you have problems reaching a remote system. If a traceroute stops at a certain router, you know the problem is either
the next router or the connections between them.
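The rule above (the trace stops at a router, so suspect the next router or the link to it) can be sketched in Python. This is a minimal illustration, not a replacement for reading the output yourself. The function name `last_responding_hop` and the sample output are hypothetical, and the parser assumes UNIX-style IPv4 traceroute output in which unanswered hops appear as asterisks.

```python
import re

def last_responding_hop(traceroute_output):
    """Return (hop_number, address) for the last hop that answered, or None.

    Assumes each hop line starts with the hop number and that hops
    which did not answer contain no IPv4 address (for example "* * *").
    """
    last = None
    for line in traceroute_output.splitlines():
        match = re.match(r"\s*(\d+)\s+(.*)", line)
        if not match:
            continue  # header lines and blank lines are skipped
        hop_number, rest = int(match.group(1)), match.group(2)
        address = re.search(r"\d{1,3}(?:\.\d{1,3}){3}", rest)
        if address:
            last = (hop_number, address.group(0))
    return last

# Hypothetical output: hops 1 and 2 answer, hops 3 and 4 time out.
SAMPLE = """\
traceroute to server.example.com (203.0.113.10), 30 hops max
 1  192.168.1.1  1.2 ms  1.0 ms  1.1 ms
 2  10.20.0.1  8.4 ms  8.1 ms  8.3 ms
 3  * * *
 4  * * *
"""

if __name__ == "__main__":
    hop = last_responding_hop(SAMPLE)
    print("Last hop that answered:", hop)
    print("Suspect the next router after it, or the connection between them.")
```

Keep in mind that some routers simply do not answer traceroute probes, so a row of asterisks is a hint about where to look rather than proof of a failure.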
Troubleshooting process: Establish a Plan of Action and Identify Potential Effects: By this point, you should have some ideas as to what the problem might be. It's time to "look before you leap" and establish a plan of action to resolve the problem. An action plan defines how you are going to fix this problem. Most problems are simple, but if the problem is complex, you need to write down the steps. As you do this, think about what else might happen as you go about the repair. Identify
the potential effects of the actions you're about to take, especially the unintended ones. If you take out a switch without a replacement switch at hand, the users might experience excessive downtime while you hunt for a new switch and move them over. If you replace a router, can you restore all the old router's settings to the new one or will you have to rebuild from scratch?
Beyond Local - Escalate: Proxy ARP: Proxy ARP is
the process of making remotely connected computers truly act as though they are on the same LAN as local computers. Proxy ARP is done in a number of different ways, with a Virtual Private Network (VPN) as the classic example. If a laptop in an airport connects to a network through a VPN, that computer takes on the network ID of your local network. In order for all of this to work, the VPN concentrator needs to allow some very LAN-type traffic to go through it that would normally never get through a router. ARP is a great example. If your VPN client wants to talk to another computer on the LAN, it has to send an ARP request to get the MAC address that goes with that computer's IP address. Your VPN device is designed to act as a proxy for all that type of data.
The end-to-end principle meant originally that applications and work should happen only at the endpoints in a network. In the early days of networking, this made a lot of sense. Connections weren't always fully reliable and thus were not good for real-time activity. So the work should get done by the computers at the ends of a network connection. The Internet was founded on the end-to-end principle. With modern networks like the Internet, the end-to-end concept has had to evolve. Clearly, anything you do over the Internet goes through many different machines. So, perhaps end-to-end means that the intermediary devices simply don't change the essential data in packets that flow through them. Add in today, though, the fact that plenty of intermediaries want to do a lot of things to your data as it flows through their devices. Thieves want to steal information. Merchants want to sell you things. Advertisers want to intrude on your monitor. Government agencies want to control what you can see or do, or simply want to monitor what you do for later, perhaps benign purposes. Other intermediaries help create trust bonds between your computer and a secure site so that e-commerce can function. That dynamic between the fundamental principle of work only happening on the ends of the connection and all the intermediaries facilitating, pilfering, or punctuating is the current state of the Internet. It's the basic tension between ISP companies that want to build in tiered profit structures and the consumers and creators who want Net Neutrality. As a common issue, end-to-end connectivity refers to connecting users with essential resources within a smaller network, such as a LAN or a private WAN. In such a scenario, the job of the tech is to ensure connections happen fully. Make sure (2)
the proper ports are open on an application server. Make sure the right people have the right permissions to access resources and that white list and black list ACLs are set up correctly.
Networks need __ and __, but most network techs tend to view these issues as outside of the normal places to look for problems. That's too bad, because both heat and power problems invariably manifest themselves as intermittent problems.
the proper temperature; adequate power
Everyone in the local office appears to have full access to local and Internet Web sites. No one, however, can reach a company-operated server at a particular remote site in Istanbul. There has been a recent change to the firewall configuration, so it is up to technician Terry to determine if the firewall change is the culprit or if the problem lies elsewhere. Terry has come up with three possible theories:
the remote server is down, the remote site is inaccessible, or the local firewall is preventing communication with the server. He elects to test his theories with the "quickest to test" approach. His first test is to confirm that all of the local office workstations cannot reach the remote server. Using different hosts, he uses the ping and ping6 utilities. First he pings localhost to confirm the workstation has a working IP stack, then he attempts to ping the remote server and gets no response. Next, he tries the tracert and traceroute utilities on the different hosts. Traceroute shows a functional path to the router that connects the remote office to the Internet, but does not get a response from the server.
Beyond Local - Escalate: A broadcast storm is
the result of one or more devices sending a nonstop flurry of broadcast frames on the network. The first sign of a broadcast storm is when every computer on the broadcast domain suddenly can't connect to the rest of the network. There are usually no clues other than network applications freezing or presenting "can't connect to ..." types of error messages. Every activity light on every node is solidly on. Computers on other broadcast domains work perfectly well.
The netstat utility displays information on the current state of all the running IP processes on a system. It shows what sessions are active and can also provide statistics based on ports or protocols (TCP, UDP, and so on). Typing netstat by itself only shows current sessions. Typing netstat -r shows
the routing table (100 percent identical to route print). If you want to know about your current sessions, netstat is the tool to use.
Caution: No matter what the problem, always consider __ first. __
the safety of your data; Ask yourself this question before performing any troubleshooting action: "Can what I'm about to do potentially damage my data?"
Troubleshooting: First, identify the problem. That means grasping the true problem, rather than what someone tells you. A user might call in and complain that he can't access the Internet from his workstation, for example, which could be the only problem. But the problem could also be that the entire wing of the office just went down and you've got a much bigger problem on your hands. You need to gather information, duplicate the problem (if possible), question users, identify symptoms, determine if anything has changed on the network, and approach multiple problems individually. Following these steps will help you get to the root of the problem. Gather information about the situation. If you are working directly on the affected system and not relying on somebody on the other end of a telephone to guide you, you will identify symptoms through your observation of what is (or isn't) happening. If you're troubleshooting over the telephone (always a joy, in my experience), you will need to question users. These questions can be close-ended, which is to say there can only be a yes-or-no-type answer, such as, "Can you see a light on the front of the monitor?" You can also ask open-ended questions, such as, "What have you already tried in attempting to fix the problem?" The type of question you ask at any given moment depends on what information you need and on the user's knowledge level. If, for example, the user seems to be technically oriented, you will probably be able to ask more close-ended questions because they will know what you are talking about. If, on the other hand, the user seems to be confused about what's happening, open-ended questions will allow him or her to explain in his or her own words what is going on. One of the first steps in trying to determine the cause of a problem is to understand the extent of the problem. Is it specific to one user or is it network-wide?
Sometimes this entails trying the task yourself, both from the user's machine and from your own or another machine. For example, if a user is experiencing problems logging into the network, you might need to go to that user's machine and try to use his or her user name to log in. In other words, try to duplicate the problem. Doing this tells you whether the problem is a user error of some kind, as well as enables you to see the symptoms of the problem yourself. Next, you probably want to try logging in with your own user name from that machine, or have the user try to log in from another machine. In some cases, you can ask other users in the area if they are experiencing the same problem to see if the issue is affecting more than one user. Depending on the size of your network, you should find out whether the problem is occurring in only one part of your company or across the entire network. What does all of this tell you? Essentially, it tells you how big the problem is. If nobody in an entire remote office can log in, you may be able to assume that the problem is the network link or router connecting that office to the server. If nobody in any office can log in, you may be able to assume
the server is down or not accepting logins. If only that one user in that one location can't log in, the problem may be with that user, that machine, or that user's account.
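The scoping logic above (one user, one office, or every office) can be written down as a small decision function. This is only a sketch of the reasoning. The function name `likely_scope`, the office names, and the user names are all made up for illustration, and a real network will have cases this does not cover.

```python
def likely_scope(all_users, failing_users):
    """Suggest where to look first, based on who cannot log in.

    Both arguments map an office name to a set of user names.
    """
    # Offices where every known user is failing.
    down_offices = [
        office for office, users in all_users.items()
        if users and failing_users.get(office, set()) >= users
    ]
    total_failing = sum(len(users) for users in failing_users.values())

    if all_users and len(down_offices) == len(all_users):
        return "server down or not accepting logins"
    if down_offices:
        return "network link or router for: " + ", ".join(sorted(down_offices))
    if total_failing == 1:
        return "that user, that machine, or that user's account"
    return "unclear: gather more information"

if __name__ == "__main__":
    offices = {"HQ": {"ann", "bob"}, "Remote": {"cy", "di"}}
    print(likely_scope(offices, {"Remote": {"cy", "di"}}))
    print(likely_scope(offices, {"HQ": {"bob"}}))
```

The result is a theory to test, not a diagnosis. As the text says, you "may be able to assume" the cause from the scope.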
Troubleshooting process: Establish a Theory of Probable Cause: Once you've identified one or more problems, try to figure out what could have happened. In other words, establish a theory of probable cause. Just keep in mind that a theory is not a fact. You might need to chuck the theory out the window later in the process and establish a revised theory. This step comes down to experience—or good use of the support tools at your disposal, such as your knowledge base. You need to select the most probable cause from all the possible causes, so
the solution you choose fixes the problem the first time. This may not always happen, but whenever possible, you want to avoid spending a whole day stabbing in the dark while the problem snores softly to itself in some cozy, neglected corner of your network.
As you'll recall from back in Chapter 18, "Managing Risk," a port scanner is a program that probes ports on another system, logging
the state of the scanned ports. These tools are used to look for unintentionally opened ports that might make a system vulnerable to attack. As you might imagine, they also are used by hackers to break into systems.
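To make "probes ports on another system, logging the state" concrete, here is a very small TCP connect probe in Python. It is a sketch for illustration and not how any particular port scanner is implemented. The function name `probe_ports` and the state labels are assumptions, and it should only be pointed at systems you are authorized to test.

```python
import socket

def probe_ports(host, ports, timeout=0.5):
    """Try a TCP connection to each port and record the result.

    Returns a dict mapping port number to "open" or "closed or filtered".
    A plain connect test cannot tell a closed port from a firewalled one.
    """
    states = {}
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success and an error code otherwise.
            result = sock.connect_ex((host, port))
            states[port] = "open" if result == 0 else "closed or filtered"
    return states

if __name__ == "__main__":
    # Probe a few common ports on the local machine only.
    for port, state in probe_ports("127.0.0.1", [22, 80, 443]).items():
        print(port, state)
```

An unexpectedly open port in the result is the kind of finding the text describes: a service listening that nobody meant to expose.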
LAN Problems: Link Aggregation Problems: NIC teaming provides many more benefits than just increasing bandwidth, such as redundancy. You can team two NICs in a logical unit, but set them up with one NIC as the primary—live—and the second as the hot spare—standby. If the first NIC goes down, the traffic will automatically flow through the second NIC. In a simple network setup for redundancy, you'd make one connection live and the other as standby on each device. Switch A has a live and a standby, Switch B has a live and a standby, and so on. The key here is that multicast traffic to the various devices needs to be enabled on every device through which that traffic might pass. If Switch C doesn't play nice with multicast and it's connected to Switch B, this can cause multicast traffic to stop. One "fix" for this in a Cisco network is to turn off a feature called IGMP snooping, which is enabled by default on Cisco switches. IGMP snooping is normally a good thing, because it helps the switches keep track of devices that use multicast and filter traffic away from devices that don't. The problem with turning off IGMP snooping is that the switches won't map and filter multicast traffic. Instead of only sending to the devices that are set up to receive multicast,
the switches will treat multicast messages as broadcast messages and send them to everybody. This is a NIC teaming misconfiguration that can seriously degrade network performance.
LAN Problems: Link Aggregation Problems: NIC teaming provides many more benefits than just increasing bandwidth, such as redundancy. You can team two NICs in a logical unit, but set them up with one NIC as the primary—live—and the second as the hot spare—standby. If the first NIC goes down, the traffic will automatically flow through the second NIC. In a simple network setup for redundancy, you'd make one connection live and the other as standby on each device. Switch A has a live and a standby, Switch B has a live and a standby, and so on. The key here is that multicast traffic to the various devices needs to be enabled on every device through which that traffic might pass. If Switch C doesn't play nice with multicast and it's connected to Switch B, this can cause multicast traffic to stop. One "fix" for this in a Cisco network is to turn off a feature called IGMP snooping, which is enabled by default on Cisco switches. IGMP snooping is normally a good thing, because it helps the switches keep track of devices that use multicast and filter traffic away from devices that don't. The problem with turning off IGMP snooping is that
the switches won't map and filter multicast traffic. Instead of only sending to the devices that are set up to receive multicast, the switches will treat multicast messages as broadcast messages and send them to everybody. This is a NIC teaming misconfiguration that can seriously degrade network performance.
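The difference IGMP snooping makes can be shown with a toy model of one switch deciding where to send a multicast frame. This is a deliberately simplified sketch, not vendor behavior. The function name `egress_ports` and the port numbers are hypothetical, and real switches also handle router ports, querier elections, and timeouts.

```python
def egress_ports(all_ports, ingress_port, group_members, snooping_enabled):
    """Return the set of ports a multicast frame is sent out of.

    all_ports: every port on the switch
    ingress_port: the port the frame arrived on (never sent back out)
    group_members: ports with a device that joined the multicast group
    """
    candidates = set(all_ports) - {ingress_port}
    if snooping_enabled:
        # With snooping, only ports that joined the group get the frame.
        return candidates & set(group_members)
    # Without snooping, the frame is flooded like a broadcast.
    return candidates

if __name__ == "__main__":
    ports = range(1, 9)  # an 8-port switch
    print(sorted(egress_ports(ports, 1, {3, 5}, snooping_enabled=True)))
    print(sorted(egress_ports(ports, 1, {3, 5}, snooping_enabled=False)))
```

With snooping on, two ports receive the traffic. With it off, all seven other ports do, which is where the performance hit comes from.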
Sometimes you need to perform a ping or traceroute from a location outside of the local environment. Looking glass sites are remote servers accessible with a browser that contain common collections of diagnostic tools such as ping and traceroute, plus some Border Gateway Protocol (BGP) query tools. Most looking glass sites allow you to select where the diagnostic process will originate from a list of locations, as well as (3)
the target destination, which diagnostic, and sometimes the version of IP to test. A Google search for "looking glass sites" will provide a large selection from which to choose.
Everyone in the local office appears to have full access to local and Internet Web sites. No one, however, can reach a company-operated server at a particular remote site in Istanbul. There has been a recent change to the firewall configuration, so it is up to technician Terry to determine if the firewall change is the culprit or if the problem lies elsewhere. Terry has come up with three possible theories: the remote server is down, the remote site is inaccessible, or the local firewall is preventing communication with the server. He elects to test his theories with the "quickest to test" approach. His first test is to confirm that all of the local office workstations cannot reach the remote server. Using different hosts, he uses the ping and ping6 utilities. First he pings localhost to confirm the workstation has a working IP stack, then he attempts to ping the remote server and gets no response. Next, he tries
the tracert and traceroute utilities on the different hosts. Traceroute shows a functional path to the router that connects the remote office to the Internet, but does not get a response from the server.
So here's the common network error with LACP setups. An aggregated connection set to active on both ends (active-active) automatically talks, negotiates, and works. One set to active on one end and passive on the other (active-passive) will talk, negotiate, and work. But if you set both sides to passive (passive-passive), neither will initiate the conversation and LACP will not engage. Setting both ends to passive when you want to use LACP is an example of NIC teaming misconfiguration. NIC teaming provides many more benefits than just increasing bandwidth, such as redundancy. You can team two NICs in a logical unit, but set them up with one NIC as the primary—live—and the second as the hot spare—standby. If the first NIC goes down,
the traffic will automatically flow through the second NIC. In a simple network setup for redundancy, you'd make one connection live and the other as standby on each device. Switch A has a live and a standby, Switch B has a live and a standby, and so on.
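The three LACP mode combinations described in this card reduce to one rule: at least one end must be active. A minimal Python sketch of that rule follows. The function name `lacp_engages` is made up for illustration, and it models only the negotiation rule, not a real switch.

```python
def lacp_engages(mode_a, mode_b):
    """Return True if two link ends with these LACP modes will negotiate.

    Each mode must be "active" or "passive". A passive end only
    answers, so two passive ends never start the conversation.
    """
    valid = {"active", "passive"}
    if mode_a not in valid or mode_b not in valid:
        raise ValueError("mode must be 'active' or 'passive'")
    return "active" in (mode_a, mode_b)

if __name__ == "__main__":
    for pair in [("active", "active"), ("active", "passive"), ("passive", "passive")]:
        print(pair, "->", "works" if lacp_engages(*pair) else "LACP will not engage")
```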
Throughput testers enable you to measure the data flow in a network. Which tool is appropriate depends on
the type of network throughput you want to test. Most techs use one of several speed-test sites for checking an Internet connection's throughput, such as MegaPath's Speakeasy Speed Test (Figure 21-8): www.speakeasy.net/speedtest. The CompTIA Network+ exam objectives refer to throughput testers as bandwidth speed testers.
Troubleshooting process: Establish a Plan of Action and Identify Potential Effects: By this point, you should have some ideas as to what the problem might be. It's time to "look before you leap" and establish a plan of action to resolve the problem. An action plan defines how you are going to fix this problem. Most problems are simple, but if the problem is complex, you need to write down the steps. As you do this, think about what else might happen as you go about the repair. Identify the potential effects of the actions you're about to take, especially
the unintended ones. If you take out a switch without a replacement switch at hand, the users might experience excessive downtime while you hunt for a new switch and move them over. If you replace a router, can you restore all the old router's settings to the new one or will you have to rebuild from scratch?
Troubleshooting process: Verify Full System Functionality and Implement Preventative Measures: Okay, now that you have changed something on the system in the process of solving one problem, you must think about the wider repercussions of what you have done. If you've replaced a faulty NIC in a server, for instance, will the fact that the MAC address has changed (remember, it's built into the NIC) affect anything else, such as the logon security controls or your network management and inventory software? If you've installed a patch on a client PC, will this change the default protocol or any other default settings that may affect other functionality? If you've changed a user's security settings, will this affect his or her ability to access other network resources? This is part of testing your solution to make sure it works properly, but it also makes you think about the impact of your work on the system as a whole. Make sure you verify full system functionality. If you think you fixed the problem between Martha's workstation and the database server, have her open the database while you're still there. That way you don't have to make a second tech call to resolve an outstanding issue. This saves time and money and helps your customer do his or her job better. Everybody wins. Also at this time, if applicable, implement preventative measures to avoid a repeat of the problem. If that means you need to educate
the user to do or not do something, teach him or her tactfully. If you need to install software or patch a system, do it now.
Troubleshooting process: Verify Full System Functionality and Implement Preventative Measures: Okay, now that you have changed something on the system in the process of solving one problem, you must think about
the wider repercussions of what you have done. If you've replaced a faulty NIC in a server, for instance, will the fact that the MAC address has changed (remember, it's built into the NIC) affect anything else, such as the logon security controls or your network management and inventory software? If you've installed a patch on a client PC, will this change the default protocol or any other default settings that may affect other functionality? If you've changed a user's security settings, will this affect his or her ability to access other network resources? This is part of testing your solution to make sure it works properly, but it also makes you think about the impact of your work on the system as a whole.
No single person is truly in control of an entire Internet-connected network. Large organizations split network support duties into very skill-specific areas: routers, cable infrastructure, user administration, and so on. Even in a tiny network with a single network support person, problems will arise that go beyond the tech's skill level or that involve equipment the organization doesn't own (usually it's __'s gear). In these situations, the tech needs to identify the problem and, instead of trying to fix it on his or her own, escalate the issue.
their ISP
Beyond Local - Escalate: Switching Loops: Also known as a bridging loop, a switching loop is when you connect and configure multiple switches together in such a way that causes a circular path to appear. Switching loops are rare because all switches use the Spanning Tree Protocol (STP), but they do happen. The symptoms are identical to a broadcast storm: every computer on the broadcast domain can no longer access the network. The good part about switching loops is that
they rarely take place on a well-running network. Someone had to break something, and that means someone, somewhere is messing with the switch configuration. Escalate the problem, and get the team to help you find the person making changes to the switches.
Network technicians use three different devices to deal with broken cables. Cable testers can tell you if you have a continuity problem or if a wire map isn't correct (Figure 21-1). Time domain reflectometers (TDRs) and optical time domain reflectometers (OTDRs) can tell you where the break is on the cable (Figure 21-2). A TDR works with copper cables and an OTDR works with fiber optics, but otherwise
they share the same function. If a problem shows itself as a disconnect and you've first checked easier issues that would manifest as disconnects, such as loss of permissions, an unplugged cable, or a server shut off, then think about using these tools.
LAN Problems: Misconfigurations of server settings can block all or some access to resources on a LAN. Misconfigured DHCP settings on a host above can cause problems, but
they will be limited to the host. If these settings are misconfigured on the DHCP server, however, many more machines and people can be affected. A misconfigured DNS server might direct hosts to incorrect sites or no sites at all. It might appear as an unresponsive service and just do nothing. Misconfigured DNS settings on a client result in names not resolving and cause the network to appear to be down for the user.
Hands-on problems refer to
things that you can fix at the workstation, work area, or server. These include physical problems and configuration problems.
WAN Problems: Router Problems: Routers enable networks to connect to other networks, which you know well by now. Problems with routers simply make
those connections not work. (Recall that physical problems with routers or router interface modules were covered in Chapter 8 and Chapter 13.) Loss of power or a bad module can certainly wreck a tech's day, but the fixes are pretty simple: provide power or replace the module.
Make the CompTIA Network+ exam (and real life) easier by separating your software tools into two groups:
those that come built into every operating system and those that are third-party tools. Typical built-in tools are tracert/traceroute, ipconfig/ifconfig/ip, arp, ping, arping, pathping, nslookup/dig, route, and netstat/ss. Third-party tools fall into the categories of packet sniffers, port scanners, throughput testers, and looking glass sites.
LAN Problems: Misconfigurations of server settings can block all or some access to resources on a LAN. Misconfigured DHCP settings on a host above can cause problems, but they will be limited to the host. If these settings are misconfigured on the DHCP server, however, many more machines and people can be affected. A misconfigured DNS server might direct hosts to incorrect sites or no sites at all. It might appear as an unresponsive service and just do nothing. Misconfigured DNS settings on a client result in names not resolving and cause the network to appear
to be down for the user.
Tone probes and their partners, tone generators, have only one job:
to help you locate a particular cable. You'll never use a tone probe without a tone generator.
Beyond Local - Escalate: Broadcast Storms: A broadcast storm is the result of one or more devices sending a nonstop flurry of broadcast frames on the network. The first sign of a broadcast storm is when every computer on the broadcast domain suddenly can't connect to the rest of the network. There are usually no clues other than network applications freezing or presenting "can't connect to ..." types of error messages. Every activity light on every node is solidly on. Computers on other broadcast domains work perfectly well. The trick is
to isolate; that's where escalation comes in. You need to break down the network quickly by unplugging devices until you can find the one causing trouble. Getting a packet analyzer to work can be difficult, but at least try. If you can scoop up one packet, you'll know what node is causing the trouble. The second the bad node is disconnected, the network returns to normal. But if you have a lot of machines to deal with and a bunch of users who can't get on the network yelling at you, you'll need help. Call a supervisor to get support to solve the crisis as quickly as possible.
LAN Problems: Link Aggregation Problems: So here's the common network error with LACP setups. An aggregated connection set to active on both ends (active-active) automatically talks, negotiates, and works. One set to active on one end and passive on the other (active-passive) will talk, negotiate, and work. But if you set both sides to passive (passive-passive), neither will initiate the conversation and LACP will not engage. Setting both ends to passive when you want to use LACP is an example of NIC teaming misconfiguration. NIC teaming provides many more benefits than just increasing bandwidth, such as redundancy. You can team two NICs in a logical unit, but set them up with one NIC as the primary—live—and the second as the hot spare—standby. If the first NIC goes down, the traffic will automatically flow through the second NIC. In a simple network setup for redundancy, you'd make one connection live and the other as standby on each device. Switch A has a live and a standby, Switch B has a live and a standby, and so on. The key here is that multicast traffic to the various devices needs to be enabled on every device through which that traffic might pass. If Switch C doesn't play nice with multicast and it's connected to Switch B, this can cause multicast traffic to stop. One "fix" for this in a Cisco network is
to turn off a feature called IGMP snooping, which is enabled by default on Cisco switches. IGMP snooping is normally a good thing, because it helps the switches keep track of devices that use multicast and filter traffic away from devices that don't.
You'll never use a tone probe without a
tone generator.
Troubleshooting process: Establish a Theory of Probable Cause: Once you've identified one or more problems, try to figure out what could have happened. In other words, establish a theory of probable cause. Just keep in mind that a theory is not a fact. You might need to chuck the theory out the window later in the process and establish a revised theory. This step comes down to experience—or good use of the support tools at your disposal, such as your knowledge base. You need to select the most probable cause from all the possible causes, so the solution you choose fixes the problem the first time. This may not always happen, but whenever possible, you want to avoid spending a whole day stabbing in the dark while the problem snores softly to itself in some cozy, neglected corner of your network. Don't forget to question the obvious. If Bob can't print to the networked printer, for example, check to see that the printer is plugged in and turned on. Consider multiple approaches when tackling problems. This will keep you from locking your imagination into a single train of thought. You can use the OSI seven-layer model as a troubleshooting tool in several ways to help with this process. Here's a scenario to work through. Martha can't access the database server to start her workday. The problem manifests this way: She opens the database client on her computer, then clicks on recent documents, one of which is the current project that management has assigned to her team. Nothing happens. Normally, the database client will connect to the database that resides on the server on the other side of the network. Try a __ or __ approach to the problem. Sometimes it pays to try both. Here are some ideas on how this might help.
top-to-bottom; bottom-to-top OSI model
The traceroute utility (the command in Windows is tracert) is used to
trace all of the routers between two points. Use traceroute to diagnose where the problem lies when you have problems reaching a remote system. If a traceroute stops at a certain router, you know the problem is either the next router or the connections between them.
My Traceroute (mtr) is a dynamic (keeps running) equivalent to
traceroute
The __ utility is used to trace all of the routers between two points.
traceroute
The traceroute command defaults to IPv4, but also functions well in an IPv6 network. In Windows, use the command with the -6 switch: tracert -6. In UNIX/Linux, use
traceroute6 (or traceroute -6 in some variants of Linux).
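The per-platform command names in this card can be captured in a small helper that builds the right command line. This is a sketch under stated assumptions: the function name `traceroute_command` and the target host are hypothetical, and on non-Windows systems it always picks traceroute6 for IPv6, even though some Linux variants use traceroute -6 instead.

```python
import platform

def traceroute_command(target, ipv6=False, system=None):
    """Build the traceroute command as a list of arguments.

    system defaults to the current OS name from platform.system().
    """
    system = system or platform.system()
    if system == "Windows":
        # Windows uses tracert, with the -6 switch for IPv6.
        return ["tracert"] + (["-6"] if ipv6 else []) + [target]
    # UNIX/Linux: traceroute for IPv4, traceroute6 for IPv6.
    return (["traceroute6"] if ipv6 else ["traceroute"]) + [target]

if __name__ == "__main__":
    print(traceroute_command("server.example.com", ipv6=True, system="Windows"))
    print(traceroute_command("server.example.com", ipv6=True, system="Linux"))
```

The returned list can be passed to subprocess.run if you want to actually run the trace.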
Make the CompTIA Network+ exam (and real life) easier by separating your software tools into two groups: those that come built into every operating system and those that are third-party tools. Typical built-in tools are (9)
tracert/traceroute, ipconfig/ifconfig/ip, arp, ping, arping, pathping, nslookup/dig, route, and netstat/ss. Third-party tools fall into the categories of packet sniffers, port scanners, throughput testers, and looking glass sites.
WAN Problems: Router Problems: Routers enable networks to connect to other networks, which you know well by now. Problems with routers simply make those connections not work. (Recall that physical problems with routers or router interface modules were covered in Chapter 8 and Chapter 13.) Loss of power or a bad module can certainly wreck a tech's day, but the fixes are pretty simple: provide power or replace the module. Router configuration issues can be a bit trickier. The ways to mess up a router are many. You can specify the wrong routing protocol, for example, or misconfigure the right routing protocol. An access control list (ACL) might include addresses to block that shouldn't be blocked or allow access to network resources for nodes that shouldn't have it. Incorrect ACL settings can lead to blocked TCP/UDP ports that shouldn't be blocked. A misconfiguration can lead to missing IP routes so that some destinations just aren't there for users. Improperly configured routers aren't going to send packets to the proper destination. The symptoms are clear: every system that uses the misconfigured router as a default gateway is either not able to get packets out or not able to get packets in, or sometimes both. Web pages don't come up, FTP servers suddenly disappear, and e-mail clients can't access their servers. In these cases, you need to verify first that everything in your area of responsibility works. If that is true, then escalate the problem and find the person responsible for the router. One excellent tool for determining a router problem beyond your LAN is
tracert/traceroute.
LAN Problems: Incorrect configuration of any number of options in devices can stop a device from accessing resources over a LAN. These problems can be simple to fix, although
tracking down the culprit can take time and patience.
Hands-on problems refer to things that you can fix at the workstation, work area, or server. These include physical problems and configuration problems. A power failure or power anomalies, such as dips and surges, can make a network device unreachable. We've addressed the fixes for such issues a couple of times already in this book: manage the power to the network device in question and install an uninterruptible power supply (UPS). A hardware failure can certainly make a network device unreachable. Fall back on your CompTIA A+ training for troubleshooting. Check the link lights on the NIC. Try another NIC if the machine seems functional in every other aspect. Ping the localhost. Pay attention to link lights when you have a "hardware failure." The network connection LED status indicators—link lights—can quickly point to a connectivity issue. Try known good cables/NICs if you run into this issue. Hot-swappable __ (which you read about way back in Chapter 4, "Modern Ethernet") can go bad.
transceivers
WAN Problems: ISPs and MTUs: I discussed the maximum transmission unit (MTU) in Chapter 7, "Routing." Back in the dark ages (before Windows Vista), Microsoft users often found themselves with terrible connection problems because IP packets were too big to fit into certain network protocols. The largest Ethernet packet is 1500 bytes, so some earlier versions of Windows set their MTU size to a value less than 1500 to minimize the fragmentation of packets. The problem cropped up when you tried to connect to a technology other than Ethernet, such as DSL. Some DSL carriers couldn't handle an MTU size greater than 1400. When your network's packets are so large that they must be fragmented to fit into your ISP's packets, we call it an MTU mismatch. As a result, techs would
tweak their MTU settings to improve throughput by matching up the MTU sizes between the ISP and their own network. This usually required a manual registry setting adjustment.
As you'll recall from back in Chapter 18, "Managing Risk," a port scanner is a program that probes ports on another system, logging the state of the scanned ports. These tools are used to look for
unintentionally opened ports that might make a system vulnerable to attack. As you might imagine, they also are used by hackers to break into systems.
LAN Problems: Misconfigurations of server settings can block all or some access to resources on a LAN. Misconfigured DHCP settings on a host can cause problems, but they will be limited to the host. If these settings are misconfigured on the DHCP server, however, many more machines and people can be affected. A misconfigured DNS server might direct hosts to incorrect sites or no sites at all. It might appear as an __ and just do nothing. Misconfigured DNS settings on a client result in names not resolving and cause the network to appear to be down for the user.
unresponsive service
WAN Problems: Certificate Problems: SSL/TLS certificates have expiration dates and companies need to maintain them properly. If you get complaints from clients that the company Web site is giving their browsers __ errors, chances are that the certificate has expired. The fix for that is pretty simple—update the certificate.
untrusted SSL certificate
WAN Problems: Certificate Problems: SSL/TLS certificates have expiration dates and companies need to maintain them properly. If you get complaints from clients that the company Web site is giving their browsers untrusted SSL certificate errors, chances are that the certificate has expired. The fix for that is pretty simple—
update the certificate.
Ethernet networks (traditionally) don't scale easily. If you have a Gigabit Ethernet connection between the main switch and a very busy file server, that connection by definition can handle up to 1 Gbps bandwidth. If that connection becomes saturated, the only way to bump up the bandwidth cap on that single connection would be to
upgrade both the switch and the server NIC to the next higher Ethernet standard, 10-Gigabit Ethernet. That's a big jump and an expensive one, plus it's an upgrade of 1000 percent! What if you needed to bump bandwidth up by only 20 percent?
LAN Problems: Link Aggregation Problems: Ethernet networks (traditionally) don't scale easily. If you have a Gigabit Ethernet connection between the main switch and a very busy file server, that connection by definition can handle up to 1 Gbps bandwidth. If that connection becomes saturated, the only way to bump up the bandwidth cap on that single connection would be to upgrade both the switch and the server NIC to the next higher Ethernet standard, 10-Gigabit Ethernet. That's a big jump and an expensive one, plus it's an upgrade of 1000 percent! What if you needed to bump bandwidth up by only 20 percent? The scaling issue became obvious early on, so manufacturers came up with ways to use multiple NICs in tandem to increase bandwidth in smaller increments, what's called link aggregation or NIC teaming. Numerous protocols enable two or more connections to work together simultaneously, such as the vendor-neutral IEEE 802.3ad specification Link Aggregation Control Protocol (LACP) and the Cisco-proprietary Port Aggregation Protocol (PAgP). Let's focus on the former for a common network issue scenario. To enable LACP between two devices, such as the switch and file server just noted, each device needs two or more interconnected network interfaces configured for LACP. When the two devices interact, they will make sure they can communicate over multiple physical ports at the same speeds and form a single logical port that takes advantage of the full combined bandwidth (Figure 21-12). Those ports can be in one of two modes: active or passive. Active ports want to
use LACP and send special frames out trying to initiate creating an aggregated logical port. Passive ports wait for active ports to initiate the conversation before they will respond.
Ethernet networks (traditionally) don't scale easily. If you have a Gigabit Ethernet connection between the main switch and a very busy file server, that connection by definition can handle up to 1 Gbps bandwidth. If that connection becomes saturated, the only way to bump up the bandwidth cap on that single connection would be to upgrade both the switch and the server NIC to the next higher Ethernet standard, 10-Gigabit Ethernet. That's a big jump and an expensive one, plus it's an upgrade of 1000 percent! What if you needed to bump bandwidth up by only 20 percent? The scaling issue became obvious early on, so manufacturers came up with ways to
use multiple NICs in tandem to increase bandwidth in smaller increments, what's called link aggregation or NIC teaming. Numerous protocols enable two or more connections to work together simultaneously, such as the vendor-neutral IEEE 802.3ad specification Link Aggregation Control Protocol (LACP) and the Cisco-proprietary Port Aggregation Protocol (PAgP). Let's focus on the former for a common network issue scenario.
Terry has come up with three possible theories: the remote server is down, the remote site is inaccessible, or the local firewall is preventing communication with the server. He elects to test his theories with the "quickest to test" approach. His first test is to confirm that all of the local office workstations cannot reach the remote server. Using different hosts, he uses the ping and ping6 utilities. First he pings localhost to confirm the workstation has a working IP stack, then he attempts to ping the remote server and gets no response. Next, he tries the tracert and traceroute utilities on the different hosts. Traceroute shows a functional path to the router that connects the remote office to the Internet, but does not get a response from the server. So far, everything seems to confirm that the local office cannot get to the remote server. Just to be able to say he tried everything, Terry runs the mtr utility from a Linux box and lets it run for an extended time. At the same time, he runs the pathping utility from a Windows computer. Neither utility can contact the server. He tries all of these utilities on some other company resources and Internet sites and has no problems connecting. Confident that the reported symptom is confirmed, Terry puts in a call to the remote site to ask about the status. The virtual PBX sends Terry to voicemail for every extension that he calls. This could point to a network disconnection at the site or to everyone being out of the office there. Since it is 3:00 a.m. at the remote site, Terry does not have a clear answer. The next quick test to perform is to see if the site is reachable from outside of the local office. This will confirm or eliminate his theory of a local incorrect host-based firewall settings issue. Terry sits down at a computer and searches on Google for a looking glass site. He selects one from the results list and browses to the site.
Once in the site, he selects the location of a source router to perform a diagnostic test, and then he selects the type of test to run; in this case, he chooses a ping test. He enters the target server address of the company remote server and submits the test parameters. After a moment, the looking glass server sends a set of pings, none of which receives a response. He tries the test from a few other source router locations and gets the same results. To complete his tests, Terry
uses the looking glass site to ping some additional hosts at the remote site and is pleased to discover that they are all reachable. Now Terry knows that the site is accessible, so it must be that the server is down. When the office opens, he will contact the technician there and offer whatever help and information that he can. In the meantime, he informs the rest of the organization of the server's status.
Exam tip: Eliminating __ is one of the first tools in your arsenal of diagnostic techniques.
variables
WAN Problems: Router Problems: Routers enable networks to connect to other networks, which you know well by now. Problems with routers simply make those connections not work. (Recall that physical problems with routers or router interface modules were covered in Chapter 8 and Chapter 13.) Loss of power or a bad module can certainly wreck a tech's day, but the fixes are pretty simple: provide power or replace the module. Router configuration issues can be a bit trickier. The ways to mess up a router are many. You can specify the wrong routing protocol, for example, or misconfigure the right routing protocol. An access control list (ACL) might include addresses to block that shouldn't be blocked or allow access to network resources for nodes that shouldn't have it. Incorrect ACL settings can lead to blocked TCP/UDP ports that shouldn't be blocked. A misconfiguration can lead to missing IP routes so that some destinations just aren't there for users. Improperly configured routers aren't going to send packets to the proper destination. The symptoms are clear: every system that uses the misconfigured router as a default gateway is either not able to get packets out or not able to get packets in, or sometimes both. Web pages don't come up, FTP servers suddenly disappear, and e-mail clients can't access their servers. In these cases, you need to
verify first that everything in your area of responsibility works. If that is true, then escalate the problem and find the person responsible for the router.
The arp utility enables you to
view and change the ARP table on a computer.
Multimeters test (3)
voltage (both AC and DC), resistance, and continuity. They are the unsung heroes of cabling infrastructures because no other tool can tell you how much voltage is on a line. They are also a great fallback for continuity testing when you don't have a cable tester handy.
Networks need the proper temperature and adequate power, but most network techs tend to view these issues as outside of the normal places to look for problems. That's too bad, because both heat and power problems invariably manifest themselves as intermittent problems. Look for problems that might point to heat or power issues: server rooms that get too hot at certain times of the day, switches that fail whenever an air conditioning system kicks on, and so on. You can use a __ and a __ to monitor server rooms over time to detect and record issues with electricity or heat, respectively.
voltage quality recorder; temperature monitor
LAN Problems: Link Aggregation Problems: Ethernet networks (traditionally) don't scale easily. If you have a Gigabit Ethernet connection between the main switch and a very busy file server, that connection by definition can handle up to 1 Gbps bandwidth. If that connection becomes saturated, the only way to bump up the bandwidth cap on that single connection would be to upgrade both the switch and the server NIC to the next higher Ethernet standard, 10-Gigabit Ethernet. That's a big jump and an expensive one, plus it's an upgrade of 1000 percent! What if you needed to bump bandwidth up by only 20 percent? The scaling issue became obvious early on, so manufacturers came up with ways to use multiple NICs in tandem to increase bandwidth in smaller increments, what's called link aggregation or NIC teaming. Numerous protocols enable two or more connections to work together simultaneously, such as the vendor-neutral IEEE 802.3ad specification Link Aggregation Control Protocol (LACP) and the Cisco-proprietary Port Aggregation Protocol (PAgP). Let's focus on the former for a common network issue scenario. To enable LACP between two devices, such as the switch and file server just noted, each device needs two or more interconnected network interfaces configured for LACP. When the two devices interact, they will make sure they can communicate over multiple physical ports at the same speeds and form a single logical port that takes advantage of the full combined bandwidth (Figure 21-12). Those ports can be in one of two modes: active or passive. Active ports want to use LACP and send special frames out trying to initiate creating an aggregated logical port. Passive ports
wait for active ports to initiate the conversation before they will respond.
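The active/passive rule above can be sketched as a tiny Python check. This is a minimal illustration, not vendor configuration: the function name `lacp_link_forms` and the string mode names are hypothetical, and the only rule it encodes is the one stated in the text (a passive port never initiates, so at least one side must be active).

```python
def lacp_link_forms(mode_a: str, mode_b: str) -> bool:
    """Return True if two LACP ports can negotiate an aggregated link.

    Hypothetical helper: modes are plain strings, "active" or "passive".
    """
    valid_modes = {"active", "passive"}
    if mode_a not in valid_modes or mode_b not in valid_modes:
        raise ValueError("mode must be 'active' or 'passive'")
    # Passive ports only respond; someone has to start the conversation.
    return "active" in (mode_a, mode_b)


if __name__ == "__main__":
    for pair in [("active", "active"), ("active", "passive"), ("passive", "passive")]:
        print(pair, "->", lacp_link_forms(*pair))
```

The passive/passive case is the one worth remembering when troubleshooting: both ends are configured for LACP, yet neither ever starts the negotiation.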
Troubleshooting process: Establish a Theory of Probable Cause: Consider multiple approaches when tackling problems. This will keep you from locking your imagination into a single train of thought. You can use the OSI seven-layer model as a troubleshooting tool in several ways to help with this process. Here's a scenario to work through. Martha can't access the database server to start her workday. The problem manifests this way: She opens the database client on her computer, then clicks on recent documents, one of which is the current project that management has assigned to her team. Nothing happens. Normally, the database client will connect to the database that resides on the server on the other side of the network. Try a top-to-bottom or bottom-to-top OSI model approach to the problem. Sometimes it pays to try both. Here are some ideas on how this might help. You might imagine the reverse model in some situations. If the network was newly installed, for example, running through some of the basic connectivity at Layers 1 and 2 might be a good first approach. Another option for tackling multiple options is to use the divide and conquer approach. On its face, divide and conquer appears to be a compromise between top-to-bottom OSI troubleshooting and bottom-to-top OSI troubleshooting. But it's better than a compromise. If we arbitrarily always perform top-to-bottom troubleshooting,
we'll waste a lot of time at Layers 7 through 3 to troubleshoot Data Link layer and Physical layer issues. Divide and conquer is a time saver that comes into play as part of developing a theory of probable cause. As you gather information for troubleshooting, a general sense of where the problem lies should manifest. Place this likely cause at the appropriate layer of the OSI model and begin to test the theory and related theories at that layer. If the theory bears out, follow the appropriate troubleshooting steps. If the theory is wrong, move up or down the OSI model with new theories of probable causes.
Exam tip: The iptables utility in Linux enables command-line control over IPv4 tables, rules that determine
what happens with an IPv4 packet when it encounters a firewall.
Troubleshooting process: Implement the Solution or Escalate as Necessary: Once you think you have isolated the cause of the problem, you should decide what you think is the best way to fix it and then implement the solution, whether that's giving advice over the phone to a user, installing a replacement part, or adding a software patch. Or, if the solution you propose requires either more skill than you possess at the moment or falls into someone else's purview, escalate as necessary to get the fix implemented. If you're the implementer, follow these guidelines. All the way through implementation, try only one likely solution at a time. There's no point in installing several patches at once, because then you can't tell which one fixed the problem. Similarly, there's no point in replacing several items of hardware (such as a hard disk and its controller cable) at the same time, because then you can't tell which part (or parts) was faulty. As you try each possibility, always document what you do and what results you get. This isn't just for a future problem either—during a lengthy troubleshooting process, it's easy to forget exactly what you tried two hours before or which thing you tried produced a particular result. Although being methodical may take longer, it will save time the next time—and it may enable you to pinpoint
what needs to be done to stop the problem from recurring at all, thereby reducing future call volume to your support team—and as any support person will tell you, that's definitely worth the effort!
Troubleshooting process: First, identify the problem. That means grasping the true problem, rather than
what someone tells you. A user might call in and complain that he can't access the Internet from his workstation, for example, which could be the only problem. But the problem could also be that the entire wing of the office just went down and you've got a much bigger problem on your hands. You need to gather information, duplicate the problem (if possible), question users, identify symptoms, determine if anything has changed on the network, and approach multiple problems individually. Following these steps will help you get to the root of the problem.
A packet sniffer, as you'll recall from Chapter 20, intercepts and logs network packets. You have many choices when it comes to packet sniffers. Some sniffers come as programs you run on a computer, while others manifest as dedicated hardware devices. Most packet sniffers come bundled with a protocol analyzer, the tool that takes the sniffed information and figures out
what's happening on the network. Arguably, the most popular GUI packet sniffer and protocol analyzer is Wireshark (Figure 21-6). You've already seen Wireshark in the book, but here's a screen to jog your memory.
While working through the process of finding a problem's cause, you sometimes need tools. These tools are the software and hardware tools that provide information about your network and enact repairs. I covered a number of tools already: hardware tools like cable testers and crimpers and software utilities like ping and tracert. The trick is knowing
when and how to use these tools to solve your network problems.
Beyond Local - Escalate: A broadcast storm is the result of one or more devices sending a nonstop flurry of broadcast frames on the network. The first sign of a broadcast storm is
when every computer on the broadcast domain suddenly can't connect to the rest of the network. There are usually no clues other than network applications freezing or presenting "can't connect to ..." types of error messages. Every activity light on every node is solidly on. Computers on other broadcast domains work perfectly well.
Beyond Local - Escalate: Switching Loops: Also known as a bridging loop, a switching loop is
when you connect and configure multiple switches together in such a way that causes a circular path to appear. Switching loops are rare because all switches use the Spanning Tree Protocol (STP), but they do happen. The symptoms are identical to a broadcast storm: every computer on the broadcast domain can no longer access the network.
Network technicians use three different devices to deal with broken cables. Cable testers can tell you if you have a continuity problem or if a wire map isn't correct (Figure 21-1). Time domain reflectometers (TDRs) and optical time domain reflectometers (OTDRs) can tell you
where the break is on the cable (Figure 21-2). A TDR works with copper cables and an OTDR works with fiber optics, but otherwise they share the same function. If a problem shows itself as a disconnect and you've first checked easier issues that would manifest as disconnects, such as loss of permissions, an unplugged cable, or a server shut off, then think about using these tools.
Sometimes you need to perform a ping or traceroute from a location outside of the local environment. Looking glass sites are remote servers accessible with a browser that contain common collections of diagnostic tools such as ping and traceroute, plus some Border Gateway Protocol (BGP) query tools. Most looking glass sites allow you to select
where the diagnostic process will originate from a list of locations, as well as the target destination, which diagnostic, and sometimes the version of IP to test. A Google search for "looking glass sites" will provide a large selection from which to choose.
The traceroute utility (the command in Windows is tracert) is used to trace all of the routers between two points. Use traceroute to diagnose
where the problem lies when you have problems reaching a remote system. If a traceroute stops at a certain router, you know the problem is either the next router or the connections between them.
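As a rough illustration of reading a trace, the Python sketch below finds the last hop that answered. The function name and the input shape (a list with one entry per hop, `None` for a hop that timed out) are assumptions made for this example; it does not run or parse real traceroute output. Keep in mind that some routers simply don't answer traceroute probes, so a silent hop in the middle of an otherwise complete trace is not necessarily a fault.

```python
from typing import Optional, Sequence


def last_responding_hop(hops: Sequence[Optional[str]]) -> Optional[int]:
    """Return the 1-based number of the last hop that replied, or None.

    `hops` is a hypothetical, already-parsed trace: an address string for
    each hop that answered and None for each hop that timed out.
    """
    last = None
    for number, address in enumerate(hops, start=1):
        if address is not None:
            last = number
    return last


if __name__ == "__main__":
    # Example addresses are made up for illustration.
    trace = ["192.168.1.1", "10.20.0.1", "203.0.113.9", None, None]
    hop = last_responding_hop(trace)
    print(f"Trace stops after hop {hop}; suspect the next router or the link to it.")
```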
The vast majority of cabling problems occur when the network is first installed or when a change is made. Once a cable has been made, installed, and tested, the chances of it failing are pretty small compared to all of the other network problems that might take place. If you're having trouble connecting to a resource or experiencing performance problems after making a connection, a bad cable likely isn't the culprit. Broken cables don't make intermittent problems, and they don't slow down data. They make permanent disconnects. Network techs define a "broken" cable in numerous ways. First, a broken cable might have an open circuit, where one or more of the wires in a cable simply don't connect from one end of the cable to the other. The signal lacks continuity. Second, a cable might have a short, where one or more of the wires in a cable connect to another wire in the cable. (Within a normal cable, no wires connect to other wires.) Third, a cable might have a __, where one or more of the wires in a cable don't connect to the proper location on the jack or plug. This can be caused by improperly crimping a cable, for example.
wire map problem
LAN Problems: Incorrect configuration of any number of options in devices can stop a device from accessing resources over a LAN. These problems can be simple to fix, although tracking down the culprit can take time and patience. One of the most obvious errors occurs when you're duplicating machines and using static IP addresses. As soon as you plug in the duplicated machine with its duplicate IP address, the network will howl. No two computers can have the same IP address on a broadcast domain. The fix for the problem—after the face-palm—is to change the IP address on the new machine either to an unused static IP or to DHCP. A related issue comes from duplicate MAC addresses, something that can happen when
working with virtual machines or, rarely, as a result of a manufacturing error. The effect is the same as duplicate IP addresses. Either put the devices on different VLANs or swap out NICs to avoid duplication.
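One way to spot both kinds of duplicates is to cross-check IP/MAC pairs collected from the network. The sketch below is a minimal example under stated assumptions: the function name `find_conflicts` and the input format (a list of `(ip, mac)` tuples gathered by some other means, such as ARP tables from several hosts) are hypothetical. A MAC address that shows up with several IP addresses is only a hint, since a single NIC can legitimately hold more than one address.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_conflicts(
    entries: Iterable[Tuple[str, str]]
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Group (ip, mac) pairs and report suspicious overlaps.

    Returns two dicts:
      - IPs claimed by more than one MAC (likely duplicate IP address)
      - MACs seen with more than one IP (possible duplicate MAC; verify)
    """
    macs_by_ip = defaultdict(set)
    ips_by_mac = defaultdict(set)
    for ip, mac in entries:
        mac = mac.lower()  # MAC notation is case-insensitive
        macs_by_ip[ip].add(mac)
        ips_by_mac[mac].add(ip)
    duplicate_ips = {ip: sorted(m) for ip, m in macs_by_ip.items() if len(m) > 1}
    shared_macs = {mac: sorted(i) for mac, i in ips_by_mac.items() if len(i) > 1}
    return duplicate_ips, shared_macs


if __name__ == "__main__":
    # Made-up sample data: a cloned machine kept its static IP.
    seen = [
        ("192.168.1.10", "AA:BB:CC:00:00:01"),
        ("192.168.1.10", "AA:BB:CC:00:00:02"),
        ("192.168.1.11", "AA:BB:CC:00:00:03"),
    ]
    print(find_conflicts(seen))
```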
Troubleshooting process: Establish a Plan of Action and Identify Potential Effects: By this point, you should have some ideas as to what the problem might be. It's time to "look before you leap" and establish a plan of action to resolve the problem. An action plan defines how you are going to fix this problem. Most problems are simple, but if the problem is complex, you need to
write down the steps. As you do this, think about what else might happen as you go about the repair. Identify the potential effects of the actions you're about to take, especially the unintended ones. If you take out a switch without a replacement switch at hand, the users might experience excessive downtime while you hunt for a new switch and move them over. If you replace a router, can you restore all the old router's settings to the new one or will you have to rebuild from scratch?
No single person is truly in control of an entire Internet-connected network. Large organizations split network support duties into very skill-specific areas: routers, cable infrastructure, user administration, and so on. Even in a tiny network with a single network support person, problems will arise that go beyond the tech's skill level or that involve equipment the organization doesn't own (usually it's their ISP's gear). In these situations, the tech needs to identify the problem and, instead of trying to fix it on his or her own, escalate the issue. In network troubleshooting, problem escalation should occur when
you face a problem that falls outside the scope of your skills and you need help. In large organizations, escalation problems have very clear procedures, such as who to call and what to document. In small organizations, escalation often is nothing more than a technician realizing that he or she needs help. The CompTIA Network+ exam objectives define some classic networking situations that CompTIA feels should be escalated. Here's how to recognize broadcast storms, switching loops, routing problems, and proxy ARP.
WAN Problems: Router Problems: Routers enable networks to connect to other networks, which you know well by now. Problems with routers simply make those connections not work. (Recall that physical problems with routers or router interface modules were covered in Chapter 8 and Chapter 13.) Loss of power or a bad module can certainly wreck a tech's day, but the fixes are pretty simple: provide power or replace the module. Router configuration issues can be a bit trickier. The ways to mess up a router are many. You can specify the wrong routing protocol, for example, or misconfigure the right routing protocol. An access control list (ACL) might include addresses to block that shouldn't be blocked or allow access to network resources for nodes that shouldn't have it. Incorrect ACL settings can lead to blocked TCP/UDP ports that shouldn't be blocked. A misconfiguration can lead to missing IP routes so that some destinations just aren't there for users. Improperly configured routers aren't going to send packets to the proper destination. The symptoms are clear: every system that uses the misconfigured router as a default gateway is either not able to get packets out or not able to get packets in, or sometimes both. Web pages don't come up, FTP servers suddenly disappear, and e-mail clients can't access their servers. In these cases, you need to verify first that everything in your area of responsibility works. If that is true, then escalate the problem and find the person responsible for the router. One excellent tool for determining a router problem beyond your LAN is tracert/traceroute. Run traceroute to your default gateway. (You can also use ping to check connectivity.) If that fails, you know
you have a local issue and can potentially do something about it. If the traceroute comes back positive, run it to a site on the Internet. A solid connection should return something like Figure 21-13. A failed route will return a failed response.
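The two-step check described above boils down to a small decision table. The Python sketch below is only an illustration; the function name `next_step` and its return labels are hypothetical, and the two booleans stand in for the results of whatever traceroute or ping tests you actually ran.

```python
def next_step(gateway_reachable: bool, internet_reachable: bool) -> str:
    """Map the two test results to a troubleshooting direction.

    "local"    -> the default gateway can't be reached; look at your own LAN
    "escalate" -> the gateway answers but the Internet site doesn't;
                  the problem is likely beyond your area of responsibility
    "ok"       -> both tests succeeded
    """
    if not gateway_reachable:
        return "local"
    if not internet_reachable:
        return "escalate"
    return "ok"


if __name__ == "__main__":
    print(next_step(gateway_reachable=True, internet_reachable=False))
```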
Troubleshooting process: Test the Theory to Determine Cause: With the third step, you need to test the theory to determine the cause but do so without changing anything or risking any repercussions. If you have determined that the probable cause for Bob not being able to print is that the printer is turned off, go look. If that's the case, then you should plan out your next step to resolve the problem. Do not act yet! That comes next. If the theory is not confirmed, you need to reestablish a new theory or escalate the problem. Go back to step two and determine a new probable cause. Once you have another idea, test it. The reason you should hesitate to act at this third step is that
you might not have permission to make the fix or the fix might cause repercussions you don't fully understand yet. For example, if you walk over to the print server room to see if the printer is powered up and online and find the door padlocked, that's a whole different level of problem. Sure, the printer is turned off, but management has done it for a reason. In this sort of situation, you need to escalate the problem.
Hands-on problems refer to things that you can fix at the workstation, work area, or server. These include physical problems and configuration problems. A power failure or power anomalies, such as dips and surges, can make a network device unreachable. We've addressed the fixes for such issues a couple of times already in this book: manage the power to the network device in question and install an uninterruptible power supply (UPS). A hardware failure can certainly make a network device unreachable. Fall back on your CompTIA A+ training for troubleshooting. Check the link lights on the NIC. Try another NIC if the machine seems functional in every other aspect. Ping the localhost. Pay attention to link lights when you have a "hardware failure." The network connection LED status indicators—link lights—can quickly point to a connectivity issue. Try known good cables/NICs if you run into this issue. Hot-swappable transceivers (which you read about way back in Chapter 4, "Modern Ethernet") can go bad. The key when working with small form-factor pluggable (SFP) or the much older gigabit interface converter (GBIC) transceivers is that
you need to check both the media and the module. In other words, a seemingly bad SFP/GBIC could be the cable connected to it or the transceiver. As with other hardware issues, try known-good components to troubleshoot.
WAN Problems: ISPs and MTUs: I discussed the maximum transmission unit (MTU) in Chapter 7, "Routing." Back in the dark ages (before Windows Vista), Microsoft users often found themselves with terrible connection problems because IP packets were too big to fit into certain network protocols. The largest Ethernet packet is 1500 bytes, so some earlier versions of Windows set their MTU size to a value less than 1500 to minimize the fragmentation of packets. The problem cropped up when you tried to connect to a technology other than Ethernet, such as DSL. Some DSL carriers couldn't handle an MTU size greater than 1400. When your network's packets are so large that they must be fragmented to fit into your ISP's packets, we call it an MTU mismatch. As a result, techs would tweak their MTU settings to improve throughput by matching up the MTU sizes between the ISP and their own network. This usually required a manual registry setting adjustment. Path MTU Discovery (PMTU), a method to determine the best MTU setting automatically, removes the need for that manual tuning. PMTU relies on the "Don't Fragment (DF) flag" in the IP packet header. A PMTU-aware operating system can automatically send a series of fixed-size ICMP packets (basically just pings) with the DF flag set to another device to see if it works. If it doesn't work, the system lowers the MTU size and tries again until the ping is successful. Imagine the hassle of incrementing the MTU size manually. That's the beauty of PMTU—you can automatically set your MTU size to the perfect amount. Unfortunately, PMTU depends on ICMP; many routers have firewall features that are configured to block ICMP messages, making PMTU worthless. This is called a PMTU or MTU black hole. If you're having terrible connection problems and you've checked everything else,
you need to consider this issue. In many cases, going into the router and turning off ICMP blocking in the firewall is all you need to do to fix the problem.
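The lower-and-retry idea behind PMTU can be sketched in a few lines of Python. This is only an illustration, not how any operating system actually implements it: the probe function, the 10-byte step, the 576-byte floor, and the simulated 1400-byte path limit are all assumptions made up for the example. The one piece of fixed arithmetic is that a ping payload must leave room for a 20-byte IPv4 header (without options) and an 8-byte ICMP header.

```python
IP_HEADER = 20    # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def ping_payload_for_mtu(mtu):
    """Largest ping payload that fits in one unfragmented IPv4 packet."""
    return mtu - IP_HEADER - ICMP_HEADER


def discover_path_mtu(probe, start_mtu=1500, floor_mtu=576, step=10):
    """Lower the candidate MTU until probe(mtu) reports success.

    probe is a hypothetical callable: it stands in for "send one packet of
    this total size with the DF flag set and report whether it got through".
    floor_mtu and step are assumed values chosen for the example.
    """
    mtu = start_mtu
    while mtu >= floor_mtu:
        if probe(mtu):
            return mtu
        mtu -= step
    return None  # nothing got through: possibly an MTU black hole


def simulated_probe(mtu, path_limit=1400):
    """Pretend path whose narrowest link carries 1400-byte packets."""
    return mtu <= path_limit


found = discover_path_mtu(simulated_probe)
print(found, ping_payload_for_mtu(found))
```

To try the same test by hand, a DF-flagged ping with a 1472-byte payload (1500 minus the 28 bytes of headers) is typically sent with "ping -f -l 1472" on Windows or "ping -M do -s 1472" on Linux, followed by the target address.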
WAN Problems: Router Problems: Routers enable networks to connect to other networks, which you know well by now. Problems with routers simply break those connections. (Recall that physical problems with routers or router interface modules were covered in Chapter 8 and Chapter 13.) Loss of power or a bad module can certainly wreck a tech's day, but the fixes are pretty simple: provide power or replace the module. Router configuration issues can be a bit trickier. The ways to mess up a router are many. You can specify the wrong routing protocol, for example, or misconfigure the right routing protocol. An access control list (ACL) might include addresses to block that shouldn't be blocked or allow access to network resources for nodes that shouldn't have it. Incorrect ACL settings can lead to blocked TCP/UDP ports that shouldn't be blocked. A misconfiguration can lead to missing IP routes so that some destinations just aren't there for users. Improperly configured routers aren't going to send packets to the proper destination. The symptoms are clear: every system that uses the misconfigured router as a default gateway is either not able to get packets out or not able to get packets in, or sometimes both. Web pages don't come up, FTP servers suddenly disappear, and e-mail clients can't access their servers. In these cases, you need to verify first that everything in your area of responsibility works. If that is true, then escalate the problem and find the person responsible for the router. One excellent tool for determining a router problem beyond your LAN is tracert/traceroute. Run traceroute to
your default gateway. (You can also use ping to check connectivity.) If that fails, you know you have a local issue and can potentially do something about it. If the traceroute succeeds, run it to a site on the Internet. A solid connection should return something like Figure 21-13. A failed route will return a failed response.
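The two-step check (gateway first, then an Internet site) can be written down as a small decision helper. This is a rough Python sketch rather than a real diagnostic tool: the function names and the wording of the results are invented for the example, and 192.0.2.1 is just a placeholder documentation address. It does not run anything on the network; it only builds the command for your operating system and maps the two outcomes to a next step.

```python
import platform


def trace_command(target, system=None):
    """Return the traceroute command for this OS as a list of arguments.

    Windows names the utility tracert; other systems use traceroute.
    """
    system = system or platform.system()
    tool = "tracert" if system == "Windows" else "traceroute"
    return [tool, target]


def triage(gateway_ok, internet_ok):
    """Map the two trace results onto a next step (hypothetical wording)."""
    if not gateway_ok:
        return "local issue: check what is in your own area of responsibility"
    if not internet_ok:
        return "problem beyond your LAN: escalate to whoever runs the router"
    return "path is healthy: look elsewhere"


print(trace_command("192.0.2.1", system="Windows"))
print(triage(gateway_ok=True, internet_ok=False))
```

The command list can be handed to subprocess.run if you want Python to launch the trace for you; reading the output and deciding whether it counts as a success is left to the tech.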
Almost every new networking person I teach will, at some point, ask me: "What tools do I need to buy?" My answer shocks them: "None. Don't buy a thing." It's not so much that you don't need tools, but rather that different networking jobs require wildly different tools. Plenty of network techs never crimp a cable. An equal number never open a system. Some techs do nothing all day but pull cable. The tools you need are defined by
your job.