CCNA 4 chapter 8 Network Troubleshooting
Using IP SLA
Cisco IP Service Level Agreements (SLAs) generate traffic to measure network performance. Additional benefits include SLA monitoring, measurement, and verification; measurement of jitter, latency, or packet loss in the network; IP service network health assessment; and edge-to-edge network availability.
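The three metrics named above can be illustrated with a small Python sketch. This is not how Cisco IOS computes its IP SLA statistics; it is a simplified illustration that assumes a list of round-trip times in milliseconds, with None marking a lost probe, and that treats jitter as the mean absolute difference between consecutive received probes. The function name and sample values are hypothetical.

```python
from statistics import mean


def summarize_probes(rtts_ms):
    """Summarize probe results.

    rtts_ms is a list of round-trip times in milliseconds; None marks a
    lost probe. The jitter definition used here (mean absolute difference
    between consecutive received RTTs) is a simplifying assumption.
    """
    if not rtts_ms:
        raise ValueError("at least one probe result is required")
    received = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    avg_latency = mean(received) if received else None
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter = mean(diffs) if diffs else 0.0
    return {
        "loss_pct": loss_pct,
        "avg_latency_ms": avg_latency,
        "jitter_ms": jitter,
    }


# Hypothetical sample: five probes, the third one lost.
summary = summarize_probes([20.0, 22.0, None, 21.0, 25.0])
print(summary)
```

Comparing such a summary against the documented baseline is what turns raw probe data into a health assessment.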
Network Documentation 2
Common Cisco IOS commands used for data collection. Manual collection of data should be reserved for smaller networks or limited to mission-critical network devices. Sophisticated network management software is typically used to baseline large and complex networks.
Isolating the Issue Using Layered Models
After gathering symptoms, the network administrator compares the characteristics of the problem to the logical layers of the network to isolate and solve the issue.
Troubleshooting IP Connectivity
Common bottom-up troubleshooting steps for end-to-end connectivity:
Step 1. Check physical connectivity.
Step 2. Check for duplex mismatches.
Step 3. Check data link and network layer addressing.
Step 4. Verify that the default gateway is correct.
Step 5. Ensure that devices are determining the correct path from the source to the destination.
Step 6. Verify the transport layer is functioning properly.
Step 7. Verify that there are no ACLs blocking traffic.
Step 8. Ensure that DNS settings are correct.
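The point of the bottom-up order is that a failure at a lower layer makes results from higher layers unreliable, so the first failing step is the one to fix. A minimal Python sketch of that idea follows; the function name and the results dictionary are hypothetical, and the checks themselves (which would be done with commands or tools) are assumed to have been performed already.

```python
# The eight steps, in the bottom-up order given above.
STEPS = [
    "Check physical connectivity",
    "Check for duplex mismatches",
    "Check data link and network layer addressing",
    "Verify that the default gateway is correct",
    "Ensure that devices are determining the correct path",
    "Verify the transport layer is functioning properly",
    "Verify that there are no ACLs blocking traffic",
    "Ensure that DNS settings are correct",
]


def first_failing_step(results):
    """Return (step_number, step_name) for the first step that did not pass.

    results maps a step name to True (passed) or False (failed). A step
    missing from results is treated as not yet verified, so the walk stops
    there. Returns None when every step passed.
    """
    for number, step in enumerate(STEPS, start=1):
        if not results.get(step, False):
            return number, step
    return None


# Hypothetical outcome: everything passes except the duplex check.
results = {step: True for step in STEPS}
results["Check for duplex mismatches"] = False
print(first_failing_step(results))
```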
Network Documentation
Documentation is critical to being able to monitor and troubleshoot a network. Documentation includes:
Configuration files, including network configuration files and end-system configuration files
Physical and logical topology diagrams
Baseline performance levels
To establish and capture an initial network baseline, perform the following steps:
Step 1. Determine what types of data to collect.
Step 2. Identify devices and ports of interest.
Step 3. Determine the baseline duration.
Physical Layer Troubleshooting 3
Cabling faults - Many problems can be corrected by simply reseating cables that have become partially disconnected. When performing a physical inspection, look for damaged cables, improper cable types, and poorly crimped RJ-45 connectors. Suspect cables should be tested or exchanged with a known functioning cable.
Attenuation - Attenuation can occur if a cable length exceeds the design limit for the media, or when there is a poor connection resulting from a loose cable or dirty or oxidized contacts. If attenuation is severe, the receiving device cannot always successfully distinguish one bit in the data stream from another bit.
Noise - Local electromagnetic interference (EMI) is commonly known as noise. Noise can be generated by many sources, such as FM radio stations, police radio, building security, avionics for automated landing, crosstalk (noise induced by other cables in the same pathway or adjacent cables), nearby electric cables, devices with large electric motors, or anything that includes a transmitter more powerful than a cell phone.
Interface configuration errors - Many things can be misconfigured on an interface to cause it to go down, such as an incorrect clock rate, an incorrect clock source, or the interface not being turned on. This causes a loss of connectivity with attached network segments.
Exceeding design limits - A component may be operating suboptimally at the physical layer because it is being utilized beyond specifications or configured capacity. When troubleshooting this type of problem, it becomes evident that resources for the device are operating at or near the maximum capacity and there is an increase in the number of interface errors.
CPU overload - Symptoms include processes with high CPU utilization percentages, input queue drops, slow performance, SNMP timeouts, no remote access, or services such as DHCP, Telnet, and ping that are slow or fail to respond. On a switch, the following could occur: spanning tree reconvergence, EtherChannel link bounces, UDLD flapping, and IP SLA failures. On routers, there could be no routing updates, route flapping, or HSRP flapping. One of the causes of CPU overload in a router or switch is high traffic. If one or more interfaces are regularly overloaded with traffic, consider redesigning the traffic flow in the network or upgrading the hardware.
Troubleshooting Tools
Common hardware troubleshooting tools include:
Digital multimeters - Test instruments that are used to directly measure electrical values of voltage, current, and resistance.
Cable testers - Specialized, handheld devices designed for testing the various types of data communication cabling. These devices send signals along the cable and wait for them to be reflected. The time between sending the signal and receiving it back is converted into a distance measurement.
Cable analyzers - Multifunctional handheld devices that are used to test and certify copper and fiber cables for different services and standards.
Portable network analyzers - Can be plugged in anywhere in the network and used for troubleshooting.
Network Analysis Module (NAM) - Can capture and decode packets and track response times to pinpoint an application problem to a particular network or server.
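The time-to-distance conversion that a cable tester performs can be sketched as follows. The signal travels to the fault and back, so the one-way distance is half of the propagation speed multiplied by the round-trip time. The velocity factor of 0.66 used as a default here is only an illustrative assumption; the real value depends on the cable and should come from its datasheet. The function name is hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def distance_to_reflection_m(round_trip_s, velocity_factor=0.66):
    """Estimate the distance in meters to the point where a signal reflected.

    round_trip_s is the time between sending the pulse and receiving the
    reflection. velocity_factor is the fraction of the speed of light at
    which the signal travels in the cable (0.66 is an assumed example).
    """
    if round_trip_s < 0:
        raise ValueError("round-trip time cannot be negative")
    if not 0 < velocity_factor <= 1:
        raise ValueError("velocity factor must be in the range (0, 1]")
    # Divide by 2 because the measured time covers the trip out and back.
    return SPEED_OF_LIGHT_M_PER_S * velocity_factor * round_trip_s / 2


# Hypothetical reading: a reflection arrives 500 nanoseconds after the pulse.
print(round(distance_to_reflection_m(500e-9), 1), "meters")
```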
Troubleshooting Tools
Common software troubleshooting tools include:
Network management system tools - Include device-level monitoring, configuration, and fault-management tools. These tools can be used to investigate and correct network problems.
Knowledge bases - Knowledge bases from device vendors, combined with Internet search engines like Google, are used by network administrators to access a vast pool of experience-based information. Tools & Resources at http://www.cisco.com provide information on Cisco-related hardware and software.
Baselining tools - Can draw network diagrams, help keep network software and hardware documentation up-to-date, and help to cost-effectively measure baseline network bandwidth use.
Protocol analyzers - Useful for investigating packet content as it flows through the network.
Data Link Layer Troubleshooting 1
Common symptoms of network problems at the data link layer include:
No functionality or connectivity at the network layer or above - Some Layer 2 problems can stop the exchange of frames across a link, while others only cause network performance to degrade.
Network is operating below baseline performance levels - There are two distinct types of suboptimal Layer 2 operation that can occur in a network. First, the frames take a suboptimal path to their destination but do arrive. In this case, the network might experience high-bandwidth usage on links that should not have that level of traffic. Second, some frames are dropped. These problems can be identified through error counter statistics and console error messages that appear on the switch or router. In an Ethernet environment, an extended or continuous ping also reveals if frames are being dropped.
Excessive broadcasts - Operating systems use broadcasts and multicasts extensively to discover network services and other hosts. Generally, excessive broadcasts result from one of the following situations: poorly programmed or configured applications, large Layer 2 broadcast domains, or underlying network problems, such as STP loops or route flapping.
Console messages - In some instances, a router recognizes that a Layer 2 problem has occurred and sends alert messages to the console. Typically, a router does this when it detects a problem with interpreting incoming frames (encapsulation or framing problems) or when keepalives are expected but do not arrive. The most common console message that indicates a Layer 2 problem is a line protocol down message.
Issues at the data link layer that commonly result in network connectivity or performance problems include:
Network Layer Troubleshooting
Common symptoms of network problems at the network layer include:
Network failure - Network failure is when the network is nearly or completely non-functional, affecting all users and applications on the network. These failures are usually noticed quickly by users and network administrators, and are obviously critical to the productivity of a company.
Suboptimal performance - Network optimization problems usually involve a subset of users, applications, destinations, or a particular type of traffic. Optimization issues can be difficult to detect and even harder to isolate and diagnose. This is because they usually involve multiple layers, or even a single host computer. Determining that the problem is a network layer problem can take time.
In most networks, static routes are used in combination with dynamic routing protocols. Improper configuration of static routes can lead to less than optimal routing. In some cases, improperly configured static routes can create routing loops which make parts of the network unreachable.
Troubleshooting dynamic routing protocols requires a thorough understanding of how the specific routing protocol functions. Some problems are common to all routing protocols, while other problems are particular to the individual routing protocol.
There is no single template for solving Layer 3 problems. Routing problems are solved with a methodical process, using a series of commands to isolate and diagnose the problem.
Physical Layer Troubleshooting 2
Console error messages - Error messages reported on the device console could indicate a physical layer problem.
Issues that commonly cause network problems at the physical layer include:
Power-related - Power-related issues are the most fundamental reason for network failure. Also, check the operation of the fans, and ensure that the chassis intake and exhaust vents are clear. If other nearby units have also powered down, suspect a power failure at the main power supply.
Hardware faults - Faulty network interface cards (NICs) can be the cause of network transmission errors due to late collisions, short frames, and jabber. Jabber is often defined as the condition in which a network device continually transmits random, meaningless data onto the network. Other likely causes of jabber are faulty or corrupt NIC driver files, bad cabling, or grounding problems.
Data Link Layer Troubleshooting 2
Encapsulation errors - An encapsulation error occurs because the bits placed in a particular field by the sender are not what the receiver expects to see. This condition occurs when the encapsulation at one end of a WAN link is configured differently from the encapsulation used at the other end.
Address mapping errors - In topologies such as point-to-multipoint or broadcast Ethernet, it is essential that an appropriate Layer 2 destination address be given to the frame. This ensures its arrival at the correct destination. To achieve this, the network device must match a destination Layer 3 address with the correct Layer 2 address using either static or dynamic maps. In a dynamic environment, the mapping of Layer 2 and Layer 3 information can fail because devices may have been specifically configured not to respond to ARP requests, the Layer 2 or Layer 3 information that is cached may have physically changed, or invalid ARP replies are received because of a misconfiguration or a security attack.
Framing errors - Frames usually work in groups of 8-bit bytes. A framing error occurs when a frame does not end on an 8-bit byte boundary. When this happens, the receiver may have problems determining where one frame ends and another frame starts. Too many invalid frames may prevent valid keepalives from being exchanged. Framing errors can be caused by a noisy serial line, an improperly designed cable (too long or not properly shielded), a faulty NIC, a duplex mismatch, or an incorrectly configured channel service unit (CSU) line clock.
STP failures or loops - The purpose of the Spanning Tree Protocol (STP) is to resolve a redundant physical topology into a tree-like topology by blocking redundant ports. Most STP problems are related to one of the following: forwarding loops, which occur when no ports in a redundant topology are blocked and traffic is forwarded in circles indefinitely; excessive flooding because of a high rate of STP topology changes; or slow STP convergence or re-convergence. A topology change should be a rare event in a well-configured network. When a link between two switches goes up or down, there is eventually a topology change when the STP state of the port changes to or from forwarding. However, when a port is flapping (oscillating between up and down states), this causes repetitive topology changes and flooding. Slow STP convergence or re-convergence can be caused by a mismatch between the real and documented topology, a configuration error such as an inconsistent configuration of STP timers, an overloaded switch CPU during convergence, or a software defect.
Summary 1
For network administrators to be able to monitor and troubleshoot a network, they must have a complete set of accurate and current network documentation, including configuration files, physical and logical topology diagrams, and a baseline performance level. The three major stages of troubleshooting are gathering symptoms, isolating the problem, and correcting the problem. The OSI model or the TCP/IP model can be applied to a network problem. A network administrator can use the bottom-up method, the top-down method, or the divide-and-conquer method. Common software tools that can help with troubleshooting include network management system tools, knowledge bases, baselining tools, host-based protocol analyzers, and Cisco IOS EPC. Hardware troubleshooting tools include a NAM, digital multimeters, cable testers, cable analyzers, and portable network analyzers. Cisco IOS log information can also be used to identify potential problems. There are characteristic physical layer, data link layer, network layer, transport layer, and application layer symptoms and problems of which the network administrator should be aware. The administrator may need to pay particular attention to physical connectivity, default gateways, MAC address tables, NAT, and routing information.
Network Layer Troubleshooting 2
Here are some areas to explore when diagnosing a possible problem involving routing protocols:
General network issues - Often a change in the topology, such as a down link, may have effects on other areas of the network that might not be obvious at the time. This may include the installation of new routes, static or dynamic, or removal of other routes. Determine whether anything in the network has recently changed, and if there is anyone currently working on the network infrastructure.
Connectivity issues - Check for any equipment and connectivity problems, including power problems such as outages and environmental problems (for example, overheating). Also check for Layer 1 problems, such as cabling problems, bad ports, and ISP problems.
Routing table - Check the routing table for anything unexpected, such as missing routes or unexpected routes. Use debug commands to view routing updates and routing table maintenance.
Neighbor issues - If the routing protocol establishes an adjacency with a neighbor, check to see if there are any problems with the routers forming neighbor adjacencies.
Topology database - If the routing protocol uses a topology table or database, check the table for anything unexpected, such as missing entries or unexpected entries.
Physical Layer Troubleshooting
Performance lower than baseline - The most common reasons for slow or poor performance include overloaded or underpowered servers, unsuitable switch or router configurations, traffic congestion on a low-capacity link, and chronic frame loss.
Loss of connectivity - If a cable or device fails, the most obvious symptom is a loss of connectivity between the devices that communicate over that link or with the failed device or interface. This can be confirmed with a simple ping test. Intermittent loss of connectivity can indicate a loose or oxidized connection.
Network bottlenecks or congestion - If a router, interface, or cable fails, routing protocols may redirect traffic to other routes that are not designed to carry the extra load. This can result in congestion or bottlenecks in those parts of the network.
High CPU utilization rates - High CPU utilization rates are a symptom that a device, such as a router, switch, or server, is operating at or exceeding its design limits. If not addressed quickly, CPU overloading can cause a device to shut down or fail.
Transport Layer Troubleshooting 2
Selection of transport layer protocol - When configuring ACLs, it is important that only the correct transport layer protocols be specified. Many network administrators, when unsure whether a particular traffic flow uses a TCP port or a UDP port, configure both. Specifying both opens a hole through the firewall, possibly giving intruders an avenue into the network. It also introduces an extra element into the ACL, so the ACL takes longer to process, introducing more latency into network communications.
Source and destination ports - Properly controlling the traffic between two hosts requires symmetric access control elements for inbound and outbound ACLs. Address and port information for traffic generated by a replying host is the mirror image of address and port information for traffic generated by the initiating host.
Use of the established keyword - The established keyword increases the security provided by an ACL. However, if the keyword is applied incorrectly, unexpected results may occur.
Uncommon protocols - Misconfigured ACLs often cause problems for protocols other than TCP and UDP. Uncommon protocols that are gaining popularity are VPN and encryption protocols.
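The "mirror image" relationship between request and reply traffic described above can be made concrete with a short Python sketch. The Flow type, the reply_flow function, and the documentation-range addresses are all hypothetical and exist only for illustration; the sketch does not model any ACL syntax.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    """Addressing information that an access control entry would match on."""
    protocol: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def reply_flow(flow: Flow) -> Flow:
    """Return the flow a replying host generates: source and destination swapped."""
    return Flow(flow.protocol, flow.dst_ip, flow.dst_port, flow.src_ip, flow.src_port)


# Hypothetical client request to an HTTPS server.
request = Flow("tcp", "192.0.2.10", 49152, "198.51.100.5", 443)
print(reply_flow(request))
```

An entry written for the request direction therefore needs a counterpart in the other direction with the source and destination fields exchanged, which is easy to get wrong when ports are specified on only one side.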
Verifying IP SLA Configuration
show ip sla configuration
show ip sla statistics
IP SLA Configuration
Step 1. Device> enable
Step 2. Device# configure terminal
Step 3. Device(config)# ip sla 6
Step 4. Device(config-ip-sla)# icmp-echo 172.29.139.134
Step 5. Device(config-ip-sla-echo)# frequency 300
Step 6. Device(config-ip-sla-echo)# end
Transport Layer Troubleshooting 1
There are several areas where misconfigurations commonly occur:
Selection of traffic flow - Traffic is defined by both the router interface through which the traffic is traveling and the direction in which this traffic is traveling. An ACL must be applied to the correct interface, and the correct traffic direction must be selected to function properly.
Order of access control entries - The entries in an ACL should be from specific to general. Although an ACL may have an entry to specifically permit a particular traffic flow, packets never match that entry if they are being denied by another entry earlier in the list. If the router is running both ACLs and NAT, the order in which each of these technologies is applied to a traffic flow is important. Inbound traffic is processed by the inbound ACL before being processed by outside-to-inside NAT. Outbound traffic is processed by the outbound ACL after being processed by inside-to-outside NAT.
Implicit deny any - When high security is not required on the ACL, this implicit access control element can be the cause of an ACL misconfiguration.
Addresses and IPv4 wildcard masks - Complex IPv4 wildcard masks provide significant improvements in efficiency, but are more subject to configuration errors. An example of a complex wildcard mask is using the IPv4 address 10.0.32.0 and wildcard mask 0.0.32.15 to select the first 15 host addresses in either the 10.0.0.0 network or the 10.0.32.0 network.
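The wildcard mask example above can be checked with a few lines of Python. In a wildcard mask, a 0 bit means "this bit must match" and a 1 bit means "ignore this bit". The sketch below applies that rule using the standard ipaddress module; the function name is hypothetical, and the code models only the bit comparison, not a full ACL.

```python
import ipaddress


def wildcard_match(address, base, wildcard):
    """Return True if address matches base under an IPv4 wildcard mask.

    Bits set to 1 in the wildcard mask are ignored; bits set to 0 must be
    identical in address and base.
    """
    addr_bits = int(ipaddress.IPv4Address(address))
    base_bits = int(ipaddress.IPv4Address(base))
    ignore_bits = int(ipaddress.IPv4Address(wildcard))
    care_bits = ~ignore_bits & 0xFFFFFFFF
    return (addr_bits & care_bits) == (base_bits & care_bits)


# The example from the text: address 10.0.32.0, wildcard mask 0.0.32.15.
for candidate in ["10.0.0.1", "10.0.0.15", "10.0.32.7", "10.0.32.16", "10.0.64.1"]:
    print(candidate, wildcard_match(candidate, "10.0.32.0", "0.0.32.15"))
```

The 32 in the third octet leaves one bit free, which is why both 10.0.0.x and 10.0.32.x match. The 15 in the last octet leaves the low four bits free, covering the values 0 through 15; the value 0 is the network address in a typical subnet, which leaves the 15 host addresses the text refers to.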
Application Layer Troubleshooting 1
The most widely known and implemented TCP/IP application layer protocols include:
SSH/Telnet - Enables users to establish terminal session connections with remote hosts.
HTTP - Supports the exchanging of text, graphic images, sound, video, and other multimedia files on the web.
FTP - Performs interactive file transfers between hosts.
TFTP - Performs basic interactive file transfers, typically between hosts and networking devices.
SMTP - Supports basic message delivery services.
POP - Connects to mail servers and downloads email.
Simple Network Management Protocol (SNMP) - Collects management information from network devices.
DNS - Maps IP addresses to the names assigned to network devices.
Network File System (NFS) - Enables computers to mount drives on remote hosts and operate them as if they were local drives. Originally developed by Sun Microsystems, it combines with two other application layer protocols, external data representation (XDR) and remote-procedure call (RPC), to allow transparent access to remote network resources.
The types of symptoms and causes depend upon the actual application itself. Application layer problems prevent services from being provided to application programs. A problem at the application layer can result in unreachable or unusable resources when the physical, data link, network, and transport layers are functional. It is possible to have full network connectivity, but the application simply cannot provide data.
Another type of problem at the application layer occurs when the physical, data link, network, and transport layers are functional, but the data transfer and requests for network services from a single network service or application do not meet the normal expectations of a user. A problem at the application layer may cause users to complain that the network or the particular application that they are working with is sluggish or slower than usual when transferring data or requesting network services.
Common Cisco IOS commands used to gather the symptoms of a network problem:
ping {host | ip-address} - Sends an echo request packet to an address and then waits for a reply. The {host | ip-address} variable is the IP alias or IP address of the target system.
traceroute {host | ip-address} - Identifies the path that a packet takes through the network. The {host | ip-address} variable is the IP alias or IP address of the target system.
telnet {host | ip-address} - Connects to an IP device using the Telnet application. The {host | ip-address} variable is the IP alias or IP address of the target system.
show ip interface [brief] - Displays a summary of the status of all interfaces on a device.
show ip route - Displays the current state of the IP routing table.
show running-config - Displays the contents of the currently running configuration file.
[no] debug ? - Displays a list of options for enabling or disabling debugging events on a device.