CCNA4 Chapter 8 Network Troubleshooting


IP SLA Concepts

A network administrator must continually monitor and test the network. The goal is to discover a network failure as early as possible. A useful tool for this task is the Cisco IOS IP Service Level Agreement (SLA) feature. IP SLAs use generated traffic to measure network performance between two networking devices, multiple network locations, or across multiple network paths. In the example in the figure, R1 is the IP SLA source that monitors the connection to the DNS server by periodically sending ICMP requests to the server.
Network engineers use IP SLAs to simulate network data and IP services to collect network performance information in real time. Performance monitoring can be done anytime, anywhere, without deploying a physical probe.
Note: Ping and traceroute are probe tools. A physical probe is different: it is a device that can be inserted somewhere in the network to collect and monitor traffic. The use of physical probes is beyond the scope of this course.
Measurements provided by the various IP SLA operations can be used for troubleshooting networks, because they provide consistent, reliable measurements that immediately identify problems and save troubleshooting time. Additional benefits of using IP SLAs include:
Service-level agreement monitoring, measurement, and verification
Network performance monitoring to provide continuous, reliable, and predictable measurements of jitter, latency, or packet loss in the network
IP service network health assessment to verify that the existing QoS is sufficient for new IP services
Edge-to-edge network availability monitoring for proactive connectivity verification of network resources
Multiple IP SLA operations can be running on the network, or on a device, at any time. IP SLA information can be displayed using CLI commands or through SNMP.
Note: SNMP notifications based on the data gathered by an IP SLA operation are beyond the scope of this course.
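To make the three metrics named above (latency, jitter, and packet loss) concrete, here is a minimal Python sketch that summarizes a series of echo probes the way an IP SLA-style monitor might. It is not the IOS IP SLA algorithm: the function name summarize_probes, the use of None for an unanswered probe, and the simplified jitter definition (mean absolute difference between consecutive replies) are all assumptions for illustration.

```python
from statistics import mean


def summarize_probes(rtts_ms):
    """Summarize echo probe results (hypothetical helper, not an IOS API).

    rtts_ms: round-trip times in milliseconds; None marks a probe that
    received no reply.
    """
    replies = [rtt for rtt in rtts_ms if rtt is not None]
    sent = len(rtts_ms)
    # Packet loss: share of probes that never got a reply.
    loss_pct = 100.0 * (sent - len(replies)) / sent if sent else 0.0
    # Latency: average round-trip time of the probes that did reply.
    avg_latency_ms = mean(replies) if replies else None
    # Simplified jitter: mean absolute change between consecutive replies.
    deltas = [abs(b - a) for a, b in zip(replies, replies[1:])]
    jitter_ms = mean(deltas) if deltas else 0.0
    return {
        "sent": sent,
        "loss_pct": loss_pct,
        "avg_latency_ms": avg_latency_ms,
        "jitter_ms": jitter_ms,
    }


print(summarize_probes([10, 12, None, 11, 15]))
```

Real IP SLA operations report these statistics directly; a sketch like this is mainly useful for reasoning about what the reported numbers mean.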

Common Software Troubleshooting Tools

Network management system (NMS) tools - include device-level monitoring, configuration, and fault management.
Host-based protocol analyzer - analyzes network traffic, specifically source and destination frames.
Baseline establishment tools - document tasks, draw network diagrams, and establish network performance statistics.
Knowledge bases - online repositories of experience-based information.

Commands Used for Measuring Data

8.1.1.7
show version - uptime and information about the device software and hardware.
show interfaces - detailed settings and status of device interfaces.
show ip route - the routing table.
show ip interface brief - summary of the up/down status of all device interfaces.
show arp - the ARP (address resolution) table.
show vlan - summary of VLANs and access ports on a switch.
show ip cache flow - summary of NetFlow accounting statistics.
show running-config - current configuration of the device.

Troubleshooting Methods

8.1.3.5

Step 2 - Check for Duplex Mismatches

Another common cause of interface errors is a mismatched duplex mode between the two ends of an Ethernet link. In many Ethernet-based networks, point-to-point connections are now the norm, and the use of hubs and the associated half-duplex operation is becoming less common. This means that most Ethernet links today operate in full-duplex mode. Whereas collisions were once considered normal on an Ethernet link, collisions today often indicate that duplex negotiation has failed and the link is not operating in the correct duplex mode.
The IEEE 802.3ab Gigabit Ethernet standard mandates the use of autonegotiation for speed and duplex. In addition, although it is not strictly mandatory, practically all Fast Ethernet NICs also use autonegotiation by default. The use of autonegotiation for speed and duplex is the current recommended practice. However, if duplex negotiation fails for some reason, it might be necessary to set the speed and duplex manually on both ends. Typically, this means setting the duplex mode to full-duplex on both ends of the connection. If this does not work, running half-duplex on both ends is preferred over a duplex mismatch.
Duplex configuration guidelines include:
Autonegotiation of speed and duplex is recommended.
If autonegotiation fails, manually set the speed and duplex on both interconnecting ends.
Point-to-point Ethernet links should always run in full-duplex mode.
Half-duplex is uncommon and typically encountered only when legacy hubs are used.
Troubleshooting Example
In the previous scenario, the network administrator needed to add users to the network. To incorporate these new users, the network administrator installed a second switch, S2, and connected it to the first switch, S1. Soon after S2 was added to the network, users on both switches began experiencing significant performance problems connecting with devices on the other switch, as shown in Figure 1.
The network administrator notices a console message on switch S2:
*Mar 1 00:45:08.756: %CDP-4-DUPLEX_MISMATCH: duplex mismatch discovered on FastEthernet0/20 (not half duplex), with Switch FastEthernet0/20 (half duplex).
Using the show interfaces fa 0/20 command, the network administrator examines the interface on S1 used to connect to S2 and notices that it is set to full-duplex, as shown in Figure 2. The network administrator then examines the other side of the connection, the port on S2. Figure 3 shows that this side of the connection has been configured for half-duplex. The network administrator changes the setting to duplex auto so that the port negotiates the duplex automatically. Because the port on S1 is set to full-duplex, S2 also uses full-duplex. The users report that there are no longer any performance problems.
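Console messages like the one above are easy to miss in a busy log. As a minimal sketch, the following Python function pulls the interface names and reported duplex modes out of a %CDP-4-DUPLEX_MISMATCH line. The regular expression is an assumption built only from the single message shown above (the neighbor field is the CDP device ID, Switch in that message); the wording may vary between IOS releases, so treat it as illustrative.

```python
import re

# Assumed pattern, derived only from the example message in this card.
DUPLEX_MISMATCH_RE = re.compile(
    r"%CDP-4-DUPLEX_MISMATCH: duplex mismatch discovered on "
    r"(?P<local_if>\S+) \((?P<local_duplex>[^)]+)\), "
    r"with (?P<neighbor>\S+) (?P<remote_if>\S+) \((?P<remote_duplex>[^)]+)\)"
)


def parse_duplex_mismatch(line):
    """Return the fields of a CDP duplex mismatch message, or None if absent."""
    match = DUPLEX_MISMATCH_RE.search(line)
    return match.groupdict() if match else None


message = ("*Mar 1 00:45:08.756: %CDP-4-DUPLEX_MISMATCH: duplex mismatch discovered "
           "on FastEthernet0/20 (not half duplex), with Switch FastEthernet0/20 (half duplex).")
print(parse_duplex_mismatch(message))
```

Run over a saved log, a helper like this quickly lists every link whose two ends disagree about duplex.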

Components of Troubleshooting End-to-End Connectivity

Diagnosing and solving problems is an essential skill for network administrators. There is no single recipe for troubleshooting, and a particular problem can be diagnosed in many different ways. However, by employing a structured approach to the troubleshooting process, an administrator can reduce the time it takes to diagnose and solve a problem.
Throughout this topic, the following scenario is used. The client host PC1 is unable to access applications on Server SRV1 or Server SRV2. The figure shows the topology of this network. PC1 uses SLAAC with EUI-64 to create its IPv6 global unicast address. EUI-64 creates the Interface ID from the 48-bit Ethernet MAC address by inserting FFFE between the first three bytes and the last three bytes, and flipping the seventh bit of the first byte (the universal/local bit).
When there is no end-to-end connectivity, and the administrator chooses to troubleshoot with a bottom-up approach, these are common steps the administrator can take:
Step 1. Check physical connectivity at the point where network communication stops. This includes cables and hardware. The problem might be a faulty cable or interface, or it might involve misconfigured or faulty hardware.
Step 2. Check for duplex mismatches.
Step 3. Check data link and network layer addressing on the local network. This includes IPv4 ARP tables, IPv6 neighbor tables, MAC address tables, and VLAN assignments.
Step 4. Verify that the default gateway is correct.
Step 5. Ensure that devices are determining the correct path from the source to the destination. Manipulate the routing information if necessary.
Step 6. Verify that the transport layer is functioning properly. Telnet can be used to test transport layer connections from the command line.
Step 7. Verify that there are no ACLs blocking traffic.
Step 8. Ensure that DNS settings are correct. There should be a DNS server that is accessible.
The outcome of this process is operational, end-to-end connectivity. If all of the steps have been performed without any resolution, the network administrator may either repeat the previous steps or escalate the problem to a senior administrator.
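The EUI-64 procedure described above (insert FFFE, flip the seventh bit) is simple bit manipulation. The following Python sketch applies it to a 48-bit MAC address and then appends the result to a /64 prefix, as SLAAC does. The function names, the accepted MAC formats (colon- or hyphen-separated only), and the example prefix 2001:db8:acad:1::/64 are assumptions for illustration.

```python
import ipaddress


def eui64_interface_id(mac):
    """Return the modified EUI-64 interface ID for a 48-bit MAC address."""
    octets = [int(part, 16) for part in mac.replace("-", ":").split(":")]
    if len(octets) != 6 or any(octet > 0xFF for octet in octets):
        raise ValueError("expected a 48-bit MAC such as 00:1B:44:11:3A:B7")
    octets[0] ^= 0x02  # flip the seventh bit (universal/local bit)
    eui = octets[:3] + [0xFF, 0xFE] + octets[3:]  # insert FFFE in the middle
    return ":".join(f"{eui[i]:02x}{eui[i + 1]:02x}" for i in range(0, 8, 2))


def slaac_address(prefix, mac):
    """Combine a /64 prefix with the EUI-64 interface ID."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("SLAAC with EUI-64 expects a /64 prefix")
    interface_id = int(eui64_interface_id(mac).replace(":", ""), 16)
    return network.network_address + interface_id


print(eui64_interface_id("FC-99-47-75-CE-E0"))  # fe99:47ff:fe75:cee0
print(slaac_address("2001:db8:acad:1::/64", "FC-99-47-75-CE-E0"))  # 2001:db8:acad:1:fe99:47ff:fe75:cee0
```

Note how FC becomes FE: only the universal/local bit changes, which is why EUI-64 interface IDs often look almost identical to the MAC address.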

Documenting the Network

For network administrators to be able to monitor and troubleshoot a network, they must have a complete set of accurate and current network documentation. This documentation includes:
Configuration files, including network configuration files and end-system configuration files
Physical and logical topology diagrams
Baseline performance levels
Network documentation allows network administrators to efficiently diagnose and correct network problems, based on the network design and the expected performance of the network under normal operating conditions. All network documentation should be kept in a single location, either as hard copy or on the network on a protected server. Backup documentation should be maintained and kept in a separate location.
Network Configuration Files
Network configuration files contain accurate, up-to-date records of the hardware and software used in a network. Within the network configuration files, a table should exist for each network device used on the network, containing all relevant information about that device. For example, Figure 1 displays a sample network configuration table for two routers, while Figure 2 displays a similar table for a LAN switch. Information that could be captured within a device table includes:
Type of device, model designation
IOS image name
Device network hostname
Location of the device (building, floor, room, rack, panel)
If modular, each module type and slot number
Data link layer addresses
Network layer addresses
Any additional important information about physical aspects of the device
End-system Configuration Files
End-system configuration files focus on the hardware and software used in end-system devices, such as servers, network management consoles, and user workstations. An incorrectly configured end system can have a negative impact on the overall performance of a network. For this reason, a baseline record of the hardware and software used on these devices, recorded in end-system documentation as shown in Figure 3, can be very useful when troubleshooting. For troubleshooting purposes, the following information could be documented within the end-system configuration table:
Device name (purpose)
Operating system and version
IPv4 and IPv6 addresses
Subnet mask and prefix length
Default gateway and DNS server
Any high-bandwidth network applications used on the end system

Benefits for Establishing a Network Baseline

Identify where the most errors occur.
Locate the areas of the network that are most heavily used.
Determine whether the network can meet the identified policies and usage requirements.
Identify the parts of the network that are least used.
Establish traffic patterns and loads for a normal, average day.

Step 4 - Verify Default Gateway

If there is no specific route on the router, or if the host is configured with the wrong default gateway, then communication between two endpoints in different networks does not work. Figure 1 illustrates that PC1 uses R1 as its default gateway. Similarly, R1 uses R2 as its default gateway, or gateway of last resort.
If a host needs access to resources beyond the local network, the default gateway must be configured. The default gateway is the first router on the path to destinations beyond the local network.
Troubleshooting Example 1
Figure 2 shows the show ip route Cisco IOS command and the route print Windows command being used to verify the presence of the IPv4 default gateway. In this example, the R1 router has the correct default gateway, which is the IPv4 address of the R2 router. However, PC1 has the wrong default gateway. PC1 should use R1's address, 10.1.10.1, as its default gateway. This must be corrected manually if the IPv4 addressing information was manually configured on PC1. If the IPv4 addressing information was obtained automatically from a DHCPv4 server, then the configuration on the DHCP server must be examined. A configuration problem on a DHCP server usually affects multiple clients.
Troubleshooting Example 2
In IPv6, the default gateway can be configured manually, by using stateless address autoconfiguration (SLAAC), or by using DHCPv6. With SLAAC, the default gateway is advertised by the router to hosts using ICMPv6 Router Advertisement (RA) messages. The default gateway in the RA message is the link-local IPv6 address of a router interface. If the default gateway is configured manually on the host, which is uncommon, it can be set either to the global IPv6 address or to the link-local IPv6 address.
As shown in Figure 3, use the show ipv6 route Cisco IOS command to check for the IPv6 default route on R1, and use the ipconfig Windows command to verify whether a PC has an IPv6 default gateway. R1 has a default route via router R2, but the ipconfig output reveals that PC1 has neither an IPv6 global unicast address nor an IPv6 default gateway. PC1 is enabled for IPv6 because it has an IPv6 link-local address, which the device creates automatically. Checking the network documentation, the network administrator confirms that hosts on this LAN should be receiving their IPv6 address information from the router using SLAAC.
Note: In this example, other devices on the same LAN using SLAAC would also have the same problem receiving IPv6 address information.
The output of the show ipv6 interface GigabitEthernet 0/0 command in Figure 4 shows that although the interface has an IPv6 address, it is not a member of the All-IPv6-Routers multicast group FF02::2. This means the router is not enabled as an IPv6 router. Therefore, it is not sending ICMPv6 RAs on this interface.
In Figure 5, R1 is enabled as an IPv6 router using the ipv6 unicast-routing command. The show ipv6 interface GigabitEthernet 0/0 command now reveals that R1 is a member of FF02::2, the All-IPv6-Routers multicast group. To verify that PC1 has the default gateway set, use the ipconfig command on a Microsoft Windows PC or the ifconfig command on Linux and Mac OS X. In Figure 6, PC1 has an IPv6 global unicast address and an IPv6 default gateway. The default gateway is set to the link-local address of router R1, FE80::1.
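The gateway problems in this card can be expressed as simple address checks. This Python sketch uses the standard ipaddress module to test whether an IPv4 default gateway lies in the host's own subnet, whether an IPv6 gateway is a link-local address (as it is when learned from an RA), and whether an interface has joined FF02::2. The function names, the host address 10.1.10.10/24, the wrong gateway 10.1.100.1, and the multicast group lists are hypothetical; only 10.1.10.1, FE80::1, and FF02::2 come from the text.

```python
import ipaddress

ALL_IPV6_ROUTERS = ipaddress.IPv6Address("ff02::2")


def ipv4_gateway_on_link(host_cidr, gateway):
    """True if the IPv4 default gateway is inside the host's own subnet."""
    interface = ipaddress.IPv4Interface(host_cidr)
    return ipaddress.IPv4Address(gateway) in interface.network


def ipv6_gateway_is_link_local(gateway):
    """SLAAC hosts learn the router's link-local (FE80::/10) address from RAs."""
    return ipaddress.IPv6Address(gateway).is_link_local


def is_ipv6_router(joined_groups):
    """An interface acting as an IPv6 router has joined FF02::2."""
    return ALL_IPV6_ROUTERS in {ipaddress.IPv6Address(g) for g in joined_groups}


print(ipv4_gateway_on_link("10.1.10.10/24", "10.1.10.1"))   # True: R1 is on-link
print(ipv4_gateway_on_link("10.1.10.10/24", "10.1.100.1"))  # False: hypothetical wrong gateway
print(ipv6_gateway_is_link_local("FE80::1"))                 # True
print(is_ipv6_router(["FF02::1", "FF02::1:FF00:1"]))         # False: no FF02::2, so no RAs
```

An on-link check cannot prove the gateway is the right router, only that it is reachable without routing; it still catches the common typo case.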

Other Troubleshooting Methods

In addition to the systematic, layered approach to troubleshooting, there are also less-structured troubleshooting approaches.
One approach is based on an educated guess by the network administrator, based on the symptoms of the problem. This method is more successfully implemented by seasoned network administrators, because they rely on their extensive knowledge and experience to decisively isolate and solve network issues. With a less-experienced network administrator, this method may amount to random troubleshooting.
Another approach involves comparing a working and a non-working situation and spotting significant differences, including:
Configurations
Software versions
Hardware and other device properties
Using this method may lead to a working solution without clearly revealing the cause of the problem. This method can be helpful when the network administrator lacks expertise in a particular area, or when the problem needs to be resolved quickly. After the fix has been implemented, the network administrator can do further research on the actual cause of the problem.
Substitution is another quick troubleshooting methodology. It involves swapping the problematic device with a known, working one. If the problem is fixed, then the network administrator knows the problem is with the removed device. If the problem remains, then the cause may be elsewhere. In specific situations, this can be an ideal method for quick problem resolution, such as when a critical single point of failure, like a border router, goes down. It may be more beneficial to simply replace the device and restore service, rather than troubleshoot the issue.
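Comparing a working and a non-working configuration is often done by eye, but a line-by-line diff makes the differences explicit. The Python sketch below uses the standard difflib module to list only the lines that were removed from or added to the known-good configuration. The function name and both configuration excerpts are hypothetical.

```python
import difflib


def config_differences(working, non_working):
    """Return the '-' (removed) and '+' (added) lines between two configs."""
    diff = list(difflib.unified_diff(
        working.splitlines(), non_working.splitlines(),
        fromfile="working", tofile="non-working", lineterm="",
    ))
    # diff[0:2] are the ---/+++ file headers; @@ lines mark hunks and are skipped.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


working = "interface FastEthernet0/20\n duplex auto\n speed auto"
non_working = "interface FastEthernet0/20\n duplex half\n speed auto"
for line in config_differences(working, non_working):
    print(line)
```

As with the manual comparison, the diff shows what is different, not why the difference breaks the network; that research can follow the fix.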

Questioning End Users

In many cases the problem is reported by an end user. The information may often be vague or misleading, such as, "The network is down" or "I cannot access my email". In these cases, the problem must be better defined. This may require asking questions of the end users. Use effective questioning techniques when asking the end users about a network problem they may be experiencing. This will help you to get the information required to document the symptoms of a problem.

Application Layer Troubleshooting

Most application layer protocols provide user services. Application layer protocols are typically used for network management, file transfer, distributed file services, terminal emulation, and email. New user services are often added, such as VPNs and VoIP. The most widely known and implemented TCP/IP application layer protocols, shown in the figure, include:
SSH/Telnet - Enables users to establish terminal session connections with remote hosts.
HTTP - Supports the exchange of text, graphic images, sound, video, and other multimedia files on the web.
FTP - Performs interactive file transfers between hosts.
TFTP - Performs basic interactive file transfers, typically between hosts and networking devices.
SMTP - Supports basic message delivery services.
POP - Connects to mail servers and downloads email.
Simple Network Management Protocol (SNMP) - Collects management information from network devices.
DNS - Maps IP addresses to the names assigned to network devices.
Network File System (NFS) - Enables computers to mount drives on remote hosts and operate them as if they were local drives. Originally developed by Sun Microsystems, it combines with two other application layer protocols, External Data Representation (XDR) and Remote Procedure Call (RPC), to allow transparent access to remote network resources.
The types of symptoms and causes depend on the actual application itself. Application layer problems prevent services from being provided to application programs. A problem at the application layer can result in unreachable or unusable resources even when the physical, data link, network, and transport layers are functional. It is possible to have full network connectivity while the application simply cannot provide data.
Another type of problem at the application layer occurs when the physical, data link, network, and transport layers are functional, but the data transfer and requests for network services from a single network service or application do not meet the normal expectations of a user. A problem at the application layer may cause users to complain that the network, or the particular application they are working with, is sluggish or slower than usual when transferring data or requesting network services.
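One quick way to separate an application layer fault from a lower-layer one is to check whether the service's TCP port accepts connections, much like using Telnet to a specific port from the command line. The Python sketch below does this with the standard socket module. The function name and the port table are assumptions for illustration; the table lists well-known TCP ports only, because TFTP and SNMP run over UDP and cannot be tested this way. A successful handshake shows that the transport layer is reachable, not that the application is returning correct data.

```python
import socket

# Well-known TCP ports for some of the protocols listed above.
WELL_KNOWN_TCP_PORTS = {"FTP": 21, "SSH": 22, "Telnet": 23, "SMTP": 25, "HTTP": 80, "POP3": 110}


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP three-way handshake to host:port completes in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, host unreachable, or timed out
        return False
```

For example, tcp_port_open("192.0.2.10", WELL_KNOWN_TCP_PORTS["HTTP"]) would test a hypothetical web server at 192.0.2.10. If the port is open but users still cannot get pages, the problem is most likely in the application itself.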

Transport Layer Troubleshooting - ACLs

Network problems can arise from transport layer problems on the router, particularly at the edge of the network where traffic is examined and modified. Two of the most commonly implemented transport layer technologies are access control lists (ACLs) and Network Address Translation (NAT), as shown in Figure 1.
The most common issues with ACLs are caused by improper configuration, as shown in Figure 2. Problems with ACLs may cause otherwise working systems to fail. There are several areas where misconfigurations commonly occur:
Selection of traffic flow - Traffic is defined by both the router interface through which the traffic is traveling and the direction in which it is traveling. To function properly, an ACL must be applied to the correct interface, and the correct traffic direction must be selected.
Order of access control entries - The entries in an ACL should be ordered from specific to general. Although an ACL may have an entry to specifically permit a particular traffic flow, packets never match that entry if they are denied by another entry earlier in the list. If the router is running both ACLs and NAT, the order in which each of these technologies is applied to a traffic flow is important. Inbound traffic is processed by the inbound ACL before being processed by outside-to-inside NAT. Outbound traffic is processed by the outbound ACL after being processed by inside-to-outside NAT.
Implicit deny any - When high security is not required on the ACL, this implicit access control element can be the cause of an ACL misconfiguration.
Addresses and IPv4 wildcard masks - Complex IPv4 wildcard masks provide significant improvements in efficiency, but are more subject to configuration errors. An example of a complex wildcard mask is using the IPv4 address 10.0.32.0 and wildcard mask 0.0.32.15 to select the first 15 host addresses in either the 10.0.0.0 network or the 10.0.32.0 network.
Selection of transport layer protocol - When configuring ACLs, it is important that only the correct transport layer protocols be specified. Many network administrators, when unsure whether a particular traffic flow uses a TCP port or a UDP port, configure both. Specifying both opens a hole through the firewall, possibly giving intruders an avenue into the network. It also introduces an extra element into the ACL, so the ACL takes longer to process, introducing more latency into network communications.
Source and destination ports - Properly controlling the traffic between two hosts requires symmetric access control elements for inbound and outbound ACLs. Address and port information for traffic generated by a replying host is the mirror image of address and port information for traffic generated by the initiating host.
Use of the established keyword - The established keyword increases the security provided by an ACL. However, if the keyword is applied incorrectly, unexpected results may occur.
Uncommon protocols - Misconfigured ACLs often cause problems for protocols other than TCP and UDP. Uncommon protocols that are gaining popularity are VPN and encryption protocols.
The log keyword is useful for viewing ACL operation on individual ACL entries. This keyword instructs the router to place an entry in the system log whenever that entry condition is matched. The logged event includes details of the packet that matched the ACL element. The log keyword is especially useful for troubleshooting and also provides information on intrusion attempts being blocked by the ACL.
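Two of the misconfiguration areas above, wildcard masks and entry order with the implicit deny, can be checked by hand with a small model. This Python sketch models a simplified standard ACL that matches source addresses only; it is not how IOS implements ACLs, and the function names and sample entries are hypothetical. A wildcard bit of 0 means "this bit must match" and a 1 means "ignore this bit", so 10.0.32.0 0.0.32.15 matches both 10.0.0.0 through 10.0.0.15 and 10.0.32.0 through 10.0.32.15.

```python
import ipaddress


def wildcard_match(addr, base, wildcard):
    """True if addr equals base in every bit where the wildcard bit is 0."""
    a = int(ipaddress.IPv4Address(addr))
    b = int(ipaddress.IPv4Address(base))
    care = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF
    return (a & care) == (b & care)


def evaluate_acl(entries, source):
    """Check entries top-down; the first match wins, else the implicit deny."""
    for action, base, wildcard in entries:
        if wildcard_match(source, base, wildcard):
            return action
    return "deny"  # implicit deny any at the end of every ACL


print(wildcard_match("10.0.0.5", "10.0.32.0", "0.0.32.15"))   # True
print(wildcard_match("10.0.16.1", "10.0.32.0", "0.0.32.15"))  # False
acl = [
    ("deny", "10.0.0.0", "0.0.255.255"),
    ("permit", "10.0.32.5", "0.0.0.0"),  # shadowed for 10.0.32.5
]
print(evaluate_acl(acl, "10.0.32.5"))  # deny: the earlier, broader entry wins
```

Swapping the two entries (specific before general) makes 10.0.32.5 permitted, which is the ordering rule described above.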

Commands for Data Collection

show version
show ip interface brief
show ipv6 interface brief
show interfaces
show ip route
show ipv6 route
show arp
show ipv6 neighbors
show running-config
show port
show vlan
show tech-support
show ip cache flow

Physical Layer Troubleshooting

The physical layer transmits bits from one computer to another and regulates the transmission of a stream of bits over the physical medium. The physical layer is the only layer with physically tangible properties, such as wires, cards, and antennas.
Issues on a network often present as performance problems. Performance problems mean that there is a difference between the expected behavior and the observed behavior, and the system is not functioning as could reasonably be expected. Failures and suboptimal conditions at the physical layer not only inconvenience users but can impact the productivity of the entire company. Networks that experience these kinds of conditions usually shut down. Because the upper layers of the OSI model depend on the physical layer to function, a network administrator must be able to effectively isolate and correct problems at this layer.
Common symptoms of network problems at the physical layer include:
Performance lower than baseline - The most common reasons for slow or poor performance include overloaded or underpowered servers, unsuitable switch or router configurations, traffic congestion on a low-capacity link, and chronic frame loss.
Loss of connectivity - If a cable or device fails, the most obvious symptom is a loss of connectivity between the devices that communicate over that link, or with the failed device or interface. This can be detected with a simple ping test. Intermittent loss of connectivity can indicate a loose or oxidized connection.
Network bottlenecks or congestion - If a router, interface, or cable fails, routing protocols may redirect traffic to other routes that are not designed to carry the extra capacity. This can result in congestion or bottlenecks in those parts of the network.
High CPU utilization rates - High CPU utilization rates are a symptom that a device, such as a router, switch, or server, is operating at or exceeding its design limits. If not addressed quickly, CPU overloading can cause a device to shut down or fail.
Console error messages - Error messages reported on the device console could indicate a physical layer problem.
Issues that commonly cause network problems at the physical layer include:
Power-related - Power-related issues are the most fundamental reason for network failure. Also check the operation of the fans, and ensure that the chassis intake and exhaust vents are clear. If other nearby units have also powered down, suspect a power failure at the main power supply.
Hardware faults - Faulty network interface cards (NICs) can cause network transmission errors due to late collisions, short frames, and jabber. Jabber is often defined as the condition in which a network device continually transmits random, meaningless data onto the network. Other likely causes of jabber are faulty or corrupt NIC driver files, bad cabling, or grounding problems.
Cabling faults - Many problems can be corrected by simply reseating cables that have become partially disconnected. When performing a physical inspection, look for damaged cables, improper cable types, and poorly crimped RJ-45 connectors. Suspect cables should be tested or exchanged with a known functioning cable.
Attenuation - Attenuation can occur if a cable length exceeds the design limit for the media, or when there is a poor connection resulting from a loose cable or dirty or oxidized contacts. If attenuation is severe, the receiving device cannot always successfully distinguish one bit in the data stream from another.
Noise - Local electromagnetic interference (EMI) is commonly known as noise. Noise can be generated by many sources, such as FM radio stations, police radio, building security, avionics for automated landing, crosstalk (noise induced by other cables in the same pathway or by adjacent cables), nearby electric cables, devices with large electric motors, or anything that includes a transmitter more powerful than a cell phone.
Interface configuration errors - Many things can be misconfigured on an interface to cause it to go down, such as an incorrect clock rate, an incorrect clock source, or the interface not being turned on. This causes a loss of connectivity with attached network segments.
Exceeding design limits - A component may be operating suboptimally at the physical layer because it is being utilized beyond specifications or configured capacity. When troubleshooting this type of problem, it becomes evident that resources for the device are operating at or near maximum capacity and that the number of interface errors is increasing.
CPU overload - Symptoms include processes with high CPU utilization percentages, input queue drops, slow performance, SNMP timeouts, no remote access, or services such as DHCP, Telnet, and ping that are slow or fail to respond. On a switch, the following could occur: spanning tree reconvergence, EtherChannel link bouncing, UDLD flapping, and IP SLA failures. On routers, there could be missing routing updates, route flapping, or HSRP flapping. One of the causes of CPU overload in a router or switch is high traffic. If one or more interfaces are regularly overloaded with traffic, consider redesigning the traffic flow in the network or upgrading the hardware.

Establishing a Network Baseline

The purpose of network monitoring is to watch network performance in comparison to a predetermined baseline. A baseline is used to establish normal network or system performance. Establishing a network performance baseline requires collecting performance data from the ports and devices that are essential to network operation. The figure shows several questions that a baseline should answer. Measuring the initial performance and availability of critical network devices and links allows a network administrator to determine the difference between abnormal behavior and proper network performance as the network grows or traffic patterns change. The baseline also provides insight into whether the current network design can meet business requirements. Without a baseline, no standard exists to measure the optimum nature of network traffic and congestion levels. Analysis after an initial baseline also tends to reveal hidden problems. The collected data shows the true nature of congestion or potential congestion in a network. It may also reveal areas in the network that are underutilized and quite often can lead to network redesign efforts, based on quality and capacity observations.
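A baseline only helps if later measurements are compared against it. As a minimal sketch, the Python code below summarizes a set of baseline samples as a mean and a sample standard deviation and flags a new measurement that falls more than k standard deviations away. The function names, the sample values, the choice of link utilization percentage as the metric, and the threshold k = 3 are assumptions for illustration, not recommended values.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize normal behavior as (mean, sample standard deviation)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate variation")
    return mean(samples), stdev(samples)


def deviates_from_baseline(value, baseline, k=3.0):
    """True if value is more than k standard deviations from the baseline mean."""
    average, spread = baseline
    return abs(value - average) > k * spread


# Hypothetical link utilization samples (percent) from a normal day.
baseline = build_baseline([20, 22, 19, 21, 23, 20, 22])
print(deviates_from_baseline(24, baseline))  # False: within normal variation
print(deviates_from_baseline(35, baseline))  # True: worth investigating
```

A single number per metric is a simplification; in practice baselines are usually kept per time of day, since normal load at 9:00 differs from normal load at midnight.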

Hardware Troubleshooting Tools

There are multiple types of hardware troubleshooting tools. Common hardware troubleshooting tools include:
Digital Multimeters - Digital multimeters (DMMs), such as the Fluke 179 shown in Figure 1, are test instruments used to directly measure electrical values of voltage, current, and resistance. In network troubleshooting, most tests that need a multimeter involve checking power supply voltage levels and verifying that network devices are receiving power.
Cable Testers - Cable testers are specialized, handheld devices designed for testing the various types of data communication cabling. Figure 2 displays the Fluke LinkRunner AT Network Auto-Tester. Cable testers can be used to detect broken wires, crossed-over wiring, shorted connections, and improperly paired connections. These devices can be inexpensive continuity testers, moderately priced data cabling testers, or expensive time-domain reflectometers (TDRs). TDRs are used to pinpoint the distance to a break in a cable. These devices send signals along the cable and wait for them to be reflected. The time between sending the signal and receiving it back is converted into a distance measurement. The TDR function is normally packaged with data cabling testers. TDRs used to test fiber-optic cables are known as optical time-domain reflectometers (OTDRs).
Cable Analyzers - Cable analyzers, such as the Fluke DTX Cable Analyzer in Figure 3, are multifunctional handheld devices used to test and certify copper and fiber cables for different services and standards. The more sophisticated tools include advanced troubleshooting diagnostics that measure the distance to a performance defect such as near-end crosstalk (NEXT) or return loss (RL), identify corrective actions, and graphically display crosstalk and impedance behavior. Cable analyzers also typically include PC-based software. After field data is collected, the data from the handheld device can be uploaded so that the network administrator can create up-to-date reports.
Portable Network Analyzers - Portable devices like the Fluke OptiView in Figure 4 are used for troubleshooting switched networks and VLANs. By plugging the network analyzer in anywhere on the network, a network engineer can see the switch port to which the device is connected and the average and peak utilization. The analyzer can also be used to discover VLAN configuration, identify top network talkers, analyze network traffic, and view interface details. The device can typically output to a PC that has network monitoring software installed for further analysis and troubleshooting.
Network Analysis Module - The Cisco NAM, shown in Figure 5, is available as a hardware device or as software. It provides an embedded browser-based interface that generates reports on the traffic that consumes critical network resources. It displays a graphical representation of traffic from local and remote switches and routers, as seen in Figure 6. In addition, the NAM can capture and decode packets and track response times to pinpoint an application problem to a particular network or server.
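The TDR conversion from time to distance described above is a short calculation: the pulse travels to the fault and back, so the one-way distance is half of the round-trip time multiplied by the signal's propagation speed. That speed is the cable's nominal velocity of propagation (NVP), a fraction of the speed of light published by the cable manufacturer. The Python sketch below uses a hypothetical function name and assumes an NVP of 0.7 purely as an example value.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def tdr_distance_m(round_trip_s, nvp):
    """Distance in meters from the tester to a reflection (break, short, open)."""
    if not 0 < nvp <= 1:
        raise ValueError("NVP must be a fraction of the speed of light")
    if round_trip_s < 0:
        raise ValueError("round-trip time cannot be negative")
    # The pulse goes out and comes back, so only half the path is the distance.
    return round_trip_s * nvp * SPEED_OF_LIGHT_M_PER_S / 2


# A reflection seen 1 microsecond after the pulse, with an assumed NVP of 0.7.
print(round(tdr_distance_m(1e-6, 0.7), 1))  # 104.9
```

Because the result scales directly with NVP, an incorrect NVP setting on the tester shifts every reported fault location by the same proportion.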

Sample IP SLA Configuration

To help understand how to configure a simple IP SLA, refer to the topology in Figure 1. The configuration in Figure 2 creates an IP SLA operation with an operation number of 1. Multiple IP SLA operations may be configured on a device, and each operation is referred to by its operation number. The icmp-echo command identifies the destination address to be monitored; in the example, it is set to monitor R3's S1 interface. The frequency command sets the IP SLA rate to 30-second intervals. The ip sla schedule command schedules IP SLA operation number 1 to start immediately (now) and continue until manually cancelled (forever).

Note: Use the no ip sla schedule operation-number command to cancel the SLA operation. The SLA operation configuration is preserved and can be rescheduled when needed.
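The frequency and schedule settings in the sample configuration can be pictured as a timeline of probes. The sketch below is a simplified model, not behavior verified on a device: it assumes the first probe is sent as soon as the schedule starts (start-time now), that probes then repeat every frequency seconds, and that life_s=None stands for life forever. The horizon_s parameter is a hypothetical bound added only so the simulation ends.

```python
from typing import Iterator, Optional


def probe_times(frequency_s: int, life_s: Optional[int], horizon_s: int) -> Iterator[int]:
    """Yield offsets in seconds (after start-time now) at which ICMP echo probes are sent.

    life_s=None models 'life forever'; horizon_s stops the simulation.
    The range check runs when iteration begins, because this is a generator.
    """
    if not 1 <= frequency_s <= 604800:
        raise ValueError("frequency must be 1-604800 seconds")
    end = horizon_s if life_s is None else min(life_s, horizon_s)
    t = 0
    while t < end:
        yield t
        t += frequency_s


# Operation 1 from the example: frequency 30, life forever, viewed over two minutes.
print(list(probe_times(30, None, 120)))  # [0, 30, 60, 90]
```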

Guidelines for Selecting a Troubleshooting Method

To quickly resolve network problems, take the time to select the most effective network troubleshooting method. The figure illustrates this process. The following is an example of how to choose a troubleshooting method based on a specific problem: Two IP routers are not exchanging routing information. The last time this type of problem occurred it was a protocol issue. Therefore, choose the divide-and-conquer troubleshooting method. Analysis reveals that there is connectivity between the routers. Start the troubleshooting process at the physical or data link layer. Confirm connectivity and begin testing the TCP/IP-related functions at the next layer up in the OSI model, the network layer.
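The divide-and-conquer method named above can be sketched as a search over OSI layers: test one layer, move up if it works, and move down if it fails. In this minimal sketch the layer_ok callback is a hypothetical stand-in for real tests (ping for Layer 3, for example), and the starting layer is a parameter, defaulting to the network layer; the example above instead begins at the physical or data link layer, which corresponds to passing a lower start value.

```python
from typing import Callable, Optional

OSI_LAYERS = {1: "physical", 2: "data link", 3: "network", 4: "transport",
              5: "session", 6: "presentation", 7: "application"}


def divide_and_conquer(layer_ok: Callable[[int], bool], start: int = 3) -> Optional[int]:
    """Return the layer where testing first finds a fault, or None if every layer passes.

    layer_ok(n) is a hypothetical test for OSI layer n.
    """
    if layer_ok(start):
        # The start layer works, so assume the layers below it work and move up.
        for layer in range(start + 1, 8):
            if not layer_ok(layer):
                return layer
        return None
    # The start layer fails: move down until a layer works; the one above it is at fault.
    for layer in range(start - 1, 0, -1):
        if layer_ok(layer):
            return layer + 1
    return 1  # nothing works, not even the physical layer


failing = divide_and_conquer(lambda n: n < 3)
print(failing, OSI_LAYERS[failing])  # 3 network
```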

Data Link Layer Troubleshooting

Troubleshooting Layer 2 problems can be a challenging process. The configuration and operation of these protocols are critical to creating a functional, well-tuned network. Layer 2 problems cause specific symptoms that, when recognized, help identify the problem quickly. Common symptoms of network problems at the data link layer include:

No functionality or connectivity at the network layer or above - Some Layer 2 problems can stop the exchange of frames across a link, while others only cause network performance to degrade.
Network is operating below baseline performance levels - Two distinct types of suboptimal Layer 2 operation can occur in a network. First, frames take a suboptimal path to their destination but do arrive; in this case, the network might experience high bandwidth usage on links that should not carry that level of traffic. Second, some frames are dropped. These problems can be identified through error counter statistics and console error messages that appear on the switch or router. In an Ethernet environment, an extended or continuous ping also reveals whether frames are being dropped.
Excessive broadcasts - Operating systems use broadcasts and multicasts extensively to discover network services and other hosts. Generally, excessive broadcasts result from one of the following situations: poorly programmed or configured applications, large Layer 2 broadcast domains, or underlying network problems, such as STP loops or route flapping.
Console messages - In some instances, a router recognizes that a Layer 2 problem has occurred and sends alert messages to the console. Typically, a router does this when it detects a problem interpreting incoming frames (encapsulation or framing problems) or when keepalives are expected but do not arrive. The most common console message that indicates a Layer 2 problem is a line protocol down message.
Issues at the data link layer that commonly result in network connectivity or performance problems include:

Encapsulation errors - An encapsulation error occurs when the bits the sender places in a particular field are not what the receiver expects to see. This condition occurs when the encapsulation at one end of a WAN link is configured differently from the encapsulation used at the other end.
Address mapping errors - In topologies such as point-to-multipoint or broadcast Ethernet, it is essential that the frame be given an appropriate Layer 2 destination address so that it arrives at the correct destination. To achieve this, the network device must match a destination Layer 3 address with the correct Layer 2 address using either static or dynamic maps. In a dynamic environment, the mapping of Layer 2 and Layer 3 information can fail because devices may have been specifically configured not to respond to ARP requests, the cached Layer 2 or Layer 3 information may have physically changed, or invalid ARP replies are received because of a misconfiguration or a security attack.
Framing errors - Frames are normally made up of whole 8-bit bytes. A framing error occurs when a frame does not end on an 8-bit byte boundary, which makes it difficult for the receiver to determine where one frame ends and the next begins. Too many invalid frames may prevent valid keepalives from being exchanged. Framing errors can be caused by a noisy serial line, an improperly designed cable (too long or not properly shielded), a faulty NIC, a duplex mismatch, or an incorrectly configured channel service unit (CSU) line clock.
STP failures or loops - The purpose of the Spanning Tree Protocol (STP) is to resolve a redundant physical topology into a tree-like topology by blocking redundant ports. Most STP problems are related to these issues:
  Forwarding loops, which occur when no ports in a redundant topology are blocked and traffic is forwarded in circles indefinitely.
  Excessive flooding because of a high rate of STP topology changes. A topology change should be a rare event in a well-configured network. When a link between two switches goes up or down, there is eventually a topology change as the STP state of the port changes to or from forwarding. However, when a port is flapping (oscillating between up and down states), it causes repetitive topology changes and flooding.
  Slow STP convergence or re-convergence, which can be caused by a mismatch between the real and documented topology, a configuration error such as an inconsistent configuration of STP timers, an overloaded switch CPU during convergence, or a software defect.
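A flapping port, as described above, shows up as many up/down transitions packed into a short time. The sketch below flags that pattern with a sliding window over transition timestamps (for example, times pulled from link up/down log messages). The 60-second window and the limit of four transitions are hypothetical thresholds chosen for illustration, not Cisco defaults.

```python
from typing import Sequence


def is_flapping(transition_times: Sequence[float], window_s: float = 60.0,
                max_transitions: int = 4) -> bool:
    """Return True if more than max_transitions link changes fall within any window_s span."""
    times = sorted(transition_times)
    start = 0
    for end, t in enumerate(times):
        # Advance the window's left edge until it spans at most window_s seconds.
        while t - times[start] > window_s:
            start += 1
        if end - start + 1 > max_transitions:
            return True
    return False


print(is_flapping([0, 10, 20, 30, 40]))       # True: five changes within 40 s
print(is_flapping([0, 100, 200, 300, 400]))   # False: one change per window
```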

General Troubleshooting Procedures

Troubleshooting takes a large portion of network administrators' and support personnel's time. Using efficient troubleshooting techniques shortens overall troubleshooting time when working in a production environment. There are three major stages to the troubleshooting process:

Stage 1. Gather symptoms - Troubleshooting begins with gathering and documenting symptoms from the network, end systems, and users (Figure 1). In addition, the network administrator determines which network components have been affected and how the functionality of the network has changed compared to the baseline. Symptoms may appear in many different forms, including alerts from the network management system, console messages, and user complaints. While gathering symptoms, the network administrator should ask questions and investigate the issue in order to localize the problem to a smaller range of possibilities. For example, is the problem restricted to a single device, a group of devices, or an entire subnet or network of devices?
Stage 2. Isolate the problem - Isolating is the process of eliminating variables until a single problem, or a set of related problems, has been identified as the cause (Figure 2). To do this, the network administrator examines the characteristics of the problems at the logical layers of the network so that the most likely cause can be selected. At this stage, the network administrator may gather and document more symptoms, depending on the characteristics that are identified.
Stage 3. Implement corrective action - Having identified the cause of the problem, the network administrator works to correct it by implementing, testing, and documenting possible solutions (Figure 3). After finding the problem and determining a solution, the network administrator may need to decide whether the solution can be implemented immediately or must be postponed. This depends on the impact of the changes on the users and the network. The severity of the problem should be weighed against the impact of the solution. For example, if a critical server or router must be offline for a significant amount of time, it may be better to wait until the end of the workday to implement the fix. Sometimes, a workaround can be created until the actual problem is resolved. This is typically part of a company's change control procedures.

If the corrective action creates another problem or does not solve the problem, the attempted solution is documented, the changes are removed, and the network administrator returns to gathering symptoms and isolating the issue.

These stages are not mutually exclusive. At any point in the process, it may be necessary to return to previous stages. For instance, the network administrator may need to gather more symptoms while isolating a problem. Additionally, an attempt to correct a problem could create another problem; in this instance, remove the changes and begin troubleshooting again.

A troubleshooting policy, including change control procedures that document each change made and who made it, should be established for each stage. A policy provides a consistent way to perform each stage, and part of the policy should include documenting every important piece of information. When the problem has been resolved, communicate this to the users and anyone involved in the troubleshooting process, and inform other IT team members of the solution. Appropriate documentation of the cause and the fix will assist other support technicians in preventing and solving similar problems in the future.
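The loop between the three stages can be sketched as a small driver. The gather, isolate, and correct callbacks are hypothetical placeholders for the human work each stage involves; the sketch assumes correct() removes its own changes when a fix fails, which mirrors the rule above that failed changes are removed before returning to gathering symptoms. The max_rounds cutoff is an illustrative assumption.

```python
from typing import Callable, List


def troubleshoot(gather: Callable[[], List[str]],
                 isolate: Callable[[List[str]], str],
                 correct: Callable[[str], bool],
                 max_rounds: int = 5) -> List[str]:
    """Cycle through the three stages until a fix works; return a log of each attempt."""
    log: List[str] = []
    for round_no in range(1, max_rounds + 1):
        symptoms = gather()        # Stage 1: gather symptoms
        cause = isolate(symptoms)  # Stage 2: isolate the problem
        if correct(cause):         # Stage 3: implement corrective action
            log.append(f"round {round_no}: fixed {cause}")
            return log
        # The fix failed: correct() is assumed to have removed its changes,
        # so document the attempt and go back to gathering symptoms.
        log.append(f"round {round_no}: {cause} not fixed, changes removed")
    log.append("no fix found: escalate")
    return log


causes = iter(["duplex mismatch", "bad cable"])
for line in troubleshoot(lambda: ["CRC errors"], lambda s: next(causes),
                         lambda c: c == "bad cable"):
    print(line)
```

The returned log doubles as the documentation the troubleshooting policy asks for: every attempted solution is recorded, including the ones that were rolled back.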

Gathering Symptoms

When gathering symptoms, the administrator should gather facts and evidence to progressively eliminate possible causes and eventually identify the root cause of the issue. By analyzing the information, the network administrator formulates a hypothesis that proposes possible causes and solutions while eliminating others. As shown in Figure 1, there are five information-gathering steps:

Step 1. Gather information - Gather information from the trouble ticket, users, or end systems affected by the problem to form a definition of the problem.
Step 2. Determine ownership - If the problem is within the control of the organization, move on to the next step. If the problem is outside the boundary of the organization's control (for example, lost Internet connectivity outside of the autonomous system), contact an administrator for the external system before gathering additional network symptoms.
Step 3. Narrow the scope - Determine whether the problem is at the core, distribution, or access layer of the network. At the identified layer, analyze the existing symptoms and use your knowledge of the network topology to determine which piece of equipment is the most likely cause.
Step 4. Gather symptoms from suspect devices - Using a layered troubleshooting approach, gather hardware and software symptoms from the suspect devices. Start with the most likely possibility and use knowledge and experience to determine whether the problem is more likely a hardware or a software configuration problem.
Step 5. Document symptoms - Sometimes the problem can be solved using the documented symptoms. If not, begin the isolating stage of the general troubleshooting process.

To gather symptoms from suspected networking devices, use Cisco IOS commands and other tools, including:

ping, traceroute, and telnet commands
show and debug commands
packet captures
device logs

The table in Figure 2 describes common Cisco IOS commands used to gather the symptoms of a network problem.
Note: Although the debug command is an important tool for gathering symptoms, it generates a large amount of console message traffic and the performance of a network device can be noticeably affected. If the debug must be performed during normal working hours, warn network users that a troubleshooting effort is underway and that network performance may be affected. Remember to disable debugging when you are done.

Step 3 - Verify Layer 2 and Layer 3 Addressing on the Local Network

When troubleshooting end-to-end connectivity, it is useful to verify mappings between destination IP addresses and Layer 2 Ethernet addresses on individual segments. In IPv4, this functionality is provided by ARP. In IPv6, the ARP functionality is replaced by the neighbor discovery process and ICMPv6. The neighbor table caches IPv6 addresses and their resolved Ethernet physical (MAC) addresses.

IPv4 ARP Table
The arp Windows command displays and modifies entries in the ARP cache, which stores IPv4 addresses and their resolved Ethernet physical (MAC) addresses. As shown in Figure 1, the arp -a Windows command lists all devices that are currently in the ARP cache. The information displayed for each device includes the IPv4 address, physical (MAC) address, and the type of addressing (static or dynamic). If the network administrator wants to repopulate the cache with updated information, the cache can be cleared by using the arp -d Windows command.
Note: The arp commands in Linux and Mac OS X have a similar syntax.

IPv6 Neighbor Table
As shown in Figure 2, the netsh interface ipv6 show neighbor Windows command lists all devices that are currently in the neighbor table. The information displayed for each device includes the IPv6 address, physical (MAC) address, and the type of addressing. By examining the neighbor table, the network administrator can verify that destination IPv6 addresses map to the correct Ethernet addresses. The IPv6 link-local addresses on all of R1's interfaces have been manually configured to FE80::1. Similarly, R2 has been configured with the link-local address FE80::2 on its interfaces, and R3 with the link-local address FE80::3 on its interfaces. Remember, link-local addresses only have to be unique on the link or network.
Note: On Linux, the neighbor table can be displayed using the ip neigh show command; on Mac OS X, use the ndp -a command.
Figure 3 shows an example of the neighbor table on a Cisco IOS router, displayed using the show ipv6 neighbors command.
Note: The neighbor states for IPv6 are more complex than the ARP table states in IPv4. Additional information is contained in RFC 4861.

Switch MAC Address Table
When a destination MAC address is found in its MAC address table, a switch forwards the frame only to the port connected to the device with that MAC address. The MAC address table lists the MAC addresses learned on each port. Use the show mac address-table command to display the MAC address table on the switch. An example of a switch MAC address table is shown in Figure 4. Notice how the MAC address for PC1, a device in VLAN 10, has been discovered along with the S1 switch port to which PC1 attaches. Remember, a switch's MAC address table contains only Layer 2 information, including the Ethernet MAC address and the port number; IP address information is not included.

VLAN Assignment
Another issue to consider when troubleshooting end-to-end connectivity is VLAN assignment. In a switched network, each port in a switch belongs to a VLAN. Each VLAN is considered a separate logical network, and packets destined for stations that do not belong to the VLAN must be forwarded through a device that supports routing. If a host in one VLAN sends a broadcast Ethernet frame, such as an ARP request, all hosts in the same VLAN receive the frame; hosts in other VLANs do not. Even if two hosts are in the same IP network, they cannot communicate if they are connected to ports assigned to two separate VLANs. Additionally, if the VLAN to which a port belongs is deleted, the port becomes inactive, and all hosts attached to ports in the deleted VLAN are unable to communicate with the rest of the network. Commands such as show vlan can be used to validate VLAN assignments on a switch.
Troubleshooting Example
Refer to the topology in Figure 5. To improve the wire management in the wiring closet, the cables connecting to S1 were reorganized. Almost immediately afterward, users started calling the support desk stating that they could no longer reach devices outside their own network. An examination of PC1's ARP table using the arp Windows command shows that the ARP table no longer contains an entry for the default gateway 10.1.10.1, as shown in Figure 6. There were no configuration changes on the router, so S1 is the focus of the troubleshooting. The MAC address table for S1, as shown in Figure 7, shows that the MAC address for R1 is on a different VLAN than the rest of the 10.1.10.0/24 devices, including PC1. During the re-cabling, R1's patch cable was moved from Fa 0/4 on VLAN 10 to Fa 0/1 on VLAN 1. After the network administrator configured S1's Fa 0/1 port to be on VLAN 10, as shown in Figure 8, the problem was resolved. As shown in Figure 9, the MAC address table now shows VLAN 10 for the MAC address of R1 on port Fa 0/1.
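The check performed by hand in this example (comparing the VLAN each MAC address was learned on against the VLAN its subnet should use) can be sketched as follows. The input format is an assumption: a list of (MAC, VLAN, port) tuples, such as one might transcribe from show mac address-table, plus a hypothetical mapping from each known MAC address to its expected VLAN. The MAC addresses and PC1's port are made up for illustration.

```python
from typing import Dict, List, Tuple

# (MAC address, VLAN, port), as transcribed from a switch MAC address table.
MacEntry = Tuple[str, int, str]


def vlan_mismatches(entries: List[MacEntry], expected_vlan: Dict[str, int]) -> List[str]:
    """Report MAC addresses learned on a VLAN other than the one their subnet should use."""
    problems = []
    for mac, vlan, port in entries:
        want = expected_vlan.get(mac)
        if want is not None and vlan != want:
            problems.append(f"{mac} on {port} is in VLAN {vlan}, expected VLAN {want}")
    return problems


# Hypothetical MACs: PC1 on Fa0/2, and R1 after its cable moved to Fa0/1 (VLAN 1).
table = [("aaaa.aaaa.0001", 10, "Fa0/2"), ("aaaa.aaaa.00fe", 1, "Fa0/1")]
expected = {"aaaa.aaaa.0001": 10, "aaaa.aaaa.00fe": 10}  # both belong to 10.1.10.0/24
print(vlan_mismatches(table, expected))
```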

Commands for Gathering Symptoms

ping
traceroute
show ipv6 interface brief
show running-config
show ipv6 route
debug ?
show protocols
telnet

Verifying an IP SLA Configuration

Use the show ip sla configuration [operation-number] command to display configuration values, including all defaults, for all IP SLA operations or for a specific operation. In the example, the show ip sla configuration command displays the IP SLA ICMP Echo configuration. Use the show ip sla statistics [operation-number] command to display the IP SLA operation monitoring statistics.
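The monitoring statistics boil down to counting probe outcomes. The sketch below summarizes a list of ICMP echo results into success and failure counts, the latest round-trip time, and a loss percentage, similar in spirit to what show ip sla statistics reports. The input format (RTT in milliseconds, None for a timed-out probe) and the field names are assumptions for illustration, not IOS output.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SlaSummary:
    successes: int
    failures: int
    latest_rtt_ms: Optional[float]  # None if the most recent probe timed out
    loss_percent: float


def summarize(rtts_ms: List[Optional[float]]) -> SlaSummary:
    """Summarize ICMP echo probe results; None marks a probe that timed out."""
    successes = sum(1 for rtt in rtts_ms if rtt is not None)
    failures = len(rtts_ms) - successes
    latest = rtts_ms[-1] if rtts_ms else None
    loss = 100.0 * failures / len(rtts_ms) if rtts_ms else 0.0
    return SlaSummary(successes, failures, latest, loss)


print(summarize([12.0, None, 15.0, 14.0]))
```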

Identify the OSI Layer Associated with a Network Issue

Identify the OSI layer associated with each network issue:

Layer 1 - Traffic is congested on low-capacity links and frames are lost.
Layer 2 - STP loops and route flapping are generating a broadcast storm.
Layer 3 - The routing table is missing routes and has unknown networks listed.
Layer 4 - ACLs are misconfigured and blocking all web traffic. SNMP messages are unable to traverse NAT.
Layers 5, 6, and 7 - SSH error messages display unknown/untrusted certificates. The DNS server is not configured with the correct URL.

Protocol Analyzers

Protocol analyzers are useful for investigating packet contents as they flow through the network. A protocol analyzer decodes the various protocol layers in a recorded frame and presents this information in a relatively easy-to-use format. The figure shows a screen capture of the Wireshark protocol analyzer. The information displayed by a protocol analyzer includes physical layer, data link layer, and protocol details, along with a description of each frame. Most protocol analyzers can filter traffic that meets certain criteria so that, for example, all traffic to and from a particular device can be captured. Protocol analyzers such as Wireshark can help troubleshoot network performance problems. It is important to have both a good understanding of TCP/IP and the skills to use a protocol analyzer to inspect information at each TCP/IP layer.
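To show what "decoding the protocol layers" of a frame means, the sketch below decodes the Ethernet II header and, when the EtherType is 0x0800, the fixed 20-byte IPv4 header, using only Python's standard struct and ipaddress modules. It is a teaching sketch, far simpler than Wireshark: it ignores VLAN tags, IPv4 options, and every protocol above Layer 3. The sample frame is hand-built with hypothetical addresses.

```python
import ipaddress
import struct


def format_mac(raw: bytes) -> str:
    """Render 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{octet:02x}" for octet in raw)


def decode(frame: bytes) -> dict:
    """Decode the Ethernet II header and, for IPv4 payloads, the fixed IPv4 header."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    fields = {"eth_dst": format_mac(dst), "eth_src": format_mac(src),
              "ethertype": hex(ethertype)}
    if ethertype == 0x0800 and len(frame) >= 34:  # IPv4 with a full 20-byte header
        (ver_ihl, _tos, total_len, _ident, _frag, ttl, proto, _csum,
         src_ip, dst_ip) = struct.unpack("!BBHHHBBH4s4s", frame[14:34])
        fields.update({
            "ip_version": ver_ihl >> 4,
            "ip_header_len": (ver_ihl & 0x0F) * 4,
            "ip_total_len": total_len,
            "ttl": ttl,
            "protocol": proto,  # 1 = ICMP, 6 = TCP, 17 = UDP
            "ip_src": str(ipaddress.IPv4Address(src_ip)),
            "ip_dst": str(ipaddress.IPv4Address(dst_ip)),
        })
    return fields


# Hand-built broadcast frame carrying an IPv4 header (hypothetical addresses).
sample = bytes.fromhex("ffffffffffff" "001122334455" "0800") + struct.pack(
    "!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, 64, 1, 0,
    bytes([10, 1, 10, 10]), bytes([172, 16, 1, 100]))
print(decode(sample))
```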

Software Troubleshooting Tools

A wide variety of software and hardware tools are available to make troubleshooting easier. These tools may be used to gather and analyze symptoms of network problems, and they often provide monitoring and reporting functions that can be used to establish the network baseline. Common software troubleshooting tools include:

Network Management System Tools - Network management system (NMS) tools include device-level monitoring, configuration, and fault-management tools. Figure 1 shows an example display from the WhatsUp Gold NMS software. These tools can be used to investigate and correct network problems. Network monitoring software graphically displays a physical view of network devices, allowing network managers to monitor remote devices continuously and automatically. Device management software provides dynamic device status, statistics, and configuration information for key network devices.
Knowledge Bases - Online network device vendor knowledge bases have become indispensable sources of information. When vendor-based knowledge bases are combined with Internet search engines like Google, a network administrator has access to a vast pool of experience-based information. Figure 2 shows the Cisco Tools & Resources page found at http://www.cisco.com. This page provides information on Cisco-related hardware and software, including troubleshooting procedures, implementation guides, and original white papers on most aspects of networking technology.
Baselining Tools - Many tools for automating the network documentation and baselining process are available. Figure 3 shows a screen capture of the SolarWinds Network Performance Monitor 12 baseline view. Baselining tools help with common documentation tasks. For example, they can draw network diagrams, help keep network software and hardware documentation up to date, and help to cost-effectively measure baseline network bandwidth use.
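One way a baseline becomes useful during troubleshooting is as a reference for spotting abnormal readings. The sketch below computes a simple baseline (mean and population standard deviation of past bandwidth samples) and flags new samples far above it. The "k standard deviations" rule and the sample values are assumptions for illustration; commercial baselining tools use their own, usually more elaborate, methods.

```python
from statistics import mean, pstdev
from typing import List


def baseline_outliers(baseline_mbps: List[float], new_mbps: List[float],
                      k: float = 3.0) -> List[float]:
    """Return new samples more than k standard deviations above the baseline mean."""
    threshold = mean(baseline_mbps) + k * pstdev(baseline_mbps)
    return [sample for sample in new_mbps if sample > threshold]


# Hypothetical link utilization samples in Mbps.
history = [10, 12, 11, 9, 13, 10, 12, 11]   # mean 11, threshold about 14.7
print(baseline_outliers(history, [12, 14, 30]))  # [30]
```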

Using Layered Models for Troubleshooting

After all symptoms are gathered, if no solution is identified, the network administrator compares the characteristics of the problem to the logical layers of the network to isolate and solve the issue. Logical networking models, such as the OSI and TCP/IP models, separate network functionality into modular layers. These layered models can be applied to the physical network to isolate network problems when troubleshooting. For example, if the symptoms suggest a physical connection problem, the network technician can focus on troubleshooting the circuit that operates at the physical layer. If that circuit functions as expected, the technician looks at areas within another layer that could be causing the problem.

OSI Reference Model
The OSI reference model provides a common language for network administrators and is commonly used in troubleshooting networks. Problems are typically described in terms of a given OSI model layer. The OSI reference model describes how information from a software application in one computer moves through a network medium to a software application in another computer. The upper layers (5 to 7) of the OSI model deal with application issues and generally are implemented only in software. The application layer is closest to the end user. Both users and application layer processes interact with software applications that contain a communications component. The lower layers (1 to 4) of the OSI model handle data-transport issues. Layers 3 and 4 are generally implemented only in software. The physical layer (Layer 1) and data link layer (Layer 2) are implemented in hardware and software. The physical layer is closest to the physical network medium, such as the network cabling, and is responsible for actually placing information on the medium. Figure 1 shows some common devices and the OSI layers that must be examined during the troubleshooting process for each device. Notice that routers and multilayer switches are shown at Layer 4, the transport layer. Although routers and multilayer switches usually make forwarding decisions at Layer 3, ACLs on these devices can be used to make filtering decisions using Layer 4 information.

TCP/IP Model
Similar to the OSI networking model, the TCP/IP networking model divides networking architecture into modular layers. Figure 2 shows how the TCP/IP networking model maps to the layers of the OSI networking model. It is this close mapping that allows the TCP/IP suite of protocols to communicate successfully with so many networking technologies. The application layer in the TCP/IP suite combines the functions of three OSI model layers: session, presentation, and application. The application layer provides communication between applications, such as FTP, HTTP, and SMTP, on separate hosts. The transport layers of TCP/IP and OSI correspond directly in function; the transport layer is responsible for exchanging segments between devices on a TCP/IP network. The TCP/IP Internet layer relates to the OSI network layer and is responsible for the addressing used to transfer data from source to destination. The TCP/IP network access layer corresponds to the OSI physical and data link layers. The network access layer communicates directly with the network media and provides an interface between the architecture of the network and the Internet layer.
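The OSI-to-TCP/IP mapping described above fits in a small lookup table, which makes it easy to translate a problem described in one model into the other. This is a minimal sketch of that mapping only; the dictionary and function names are illustrative.

```python
from typing import List

# OSI layer number -> TCP/IP model layer that covers it.
TCPIP_FOR_OSI = {
    7: "application", 6: "application", 5: "application",
    4: "transport",
    3: "internet",
    2: "network access", 1: "network access",
}


def osi_layers_for(tcpip_layer: str) -> List[int]:
    """List the OSI layer numbers that a TCP/IP model layer covers."""
    return sorted(n for n, name in TCPIP_FOR_OSI.items() if name == tcpip_layer)


print(osi_layers_for("application"))  # [5, 6, 7]
print(TCPIP_FOR_OSI[3])               # internet
```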

Step 1 - Verify the Physical Layer

All network devices are specialized computer systems. At a minimum, these devices consist of a CPU, RAM, and storage space, which allow the device to boot and run the operating system and interfaces, and so to receive and transmit network traffic. When a network administrator determines that a problem exists on a given device, and that problem might be hardware-related, it is worthwhile to verify the operation of these generic components. The most commonly used Cisco IOS commands for this purpose are show processes cpu, show memory, and show interfaces. This topic discusses the show interfaces command.

When troubleshooting performance-related issues and hardware is suspected to be at fault, the show interfaces command can be used to verify the interfaces through which the traffic passes. The output of the show interfaces command in the figure lists a number of important statistics that can be checked:

Input queue drops - Input queue drops (and the related ignored and throttle counters) signify that at some point, more traffic was delivered to the router than it could process. This does not necessarily indicate a problem; it could be normal during traffic peaks. However, it could indicate that the CPU cannot process packets in time, so if this number is consistently high, it is worth trying to spot when these counters increase and how this relates to CPU usage.
Output queue drops - Output queue drops indicate that packets were dropped due to congestion on the interface. Seeing output drops is normal at any point where the aggregate input traffic exceeds the capacity of the output interface. During traffic peaks, packets are dropped if traffic is delivered to the interface faster than it can be sent out. However, even if this is considered normal behavior, it leads to packet drops and queuing delays, so applications that are sensitive to those, such as VoIP, might suffer from performance issues. Consistently seeing output drops can indicate that you need to implement an advanced queuing mechanism or modify QoS.
Input errors - Input errors indicate errors that are experienced during the reception of the frame, such as CRC errors. High numbers of CRC errors could indicate cabling problems, interface hardware problems, or, in an Ethernet-based network, duplex mismatches.
Output errors - Output errors indicate errors, such as collisions, during the transmission of a frame. In most Ethernet-based networks today, full-duplex transmission is the norm and half-duplex transmission is the exception. In full-duplex operation, collisions cannot occur; therefore, collisions, and especially late collisions, often indicate duplex mismatches.
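The counters described above can be pulled out of show interfaces text and turned into hints. This sketch assumes the output follows the common IOS layout shown in the sample (for example "Input queue: 0/75/12/0 (size/max/drops/flushes)" and "25 input errors, 25 CRC"); the exact wording varies by platform and interface type, so the regular expressions are illustrative rather than a complete parser. The hint rules restate the interpretations given in the text.

```python
import re
from typing import Dict, List

PATTERNS = {
    "input_drops": r"Input queue: \d+/\d+/(\d+)/\d+",
    "output_drops": r"Total output drops: (\d+)",
    "input_errors": r"(\d+) input errors",
    "crc": r"(\d+) CRC",
    "output_errors": r"(\d+) output errors",
    "collisions": r"output errors, (\d+) collisions",
    "late_collisions": r"(\d+) late collision",
}


def parse_counters(output: str) -> Dict[str, int]:
    """Extract selected counters from show interfaces text (0 if a counter is absent)."""
    counters = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, output)
        counters[name] = int(match.group(1)) if match else 0
    return counters


def hints(c: Dict[str, int]) -> List[str]:
    """Map non-zero counters to the likely causes discussed above."""
    notes = []
    if c["crc"]:
        notes.append("CRC errors: check cabling, interface hardware, or duplex settings")
    if c["collisions"] or c["late_collisions"]:
        notes.append("collisions on a full-duplex link: suspect a duplex mismatch")
    if c["input_drops"]:
        notes.append("input queue drops: correlate with CPU usage (show processes cpu)")
    if c["output_drops"]:
        notes.append("output drops: congestion; consider QoS if persistent")
    return notes


# Abbreviated, hypothetical sample of show interfaces output.
sample = """GigabitEthernet0/0 is up, line protocol is up
  Input queue: 0/75/12/0 (size/max/drops/flushes); Total output drops: 7
     25 input errors, 25 CRC, 0 frame, 0 overrun, 0 ignored
     0 output errors, 3 collisions, 0 interface resets
     0 babbles, 2 late collision, 0 deferred"""
for note in hints(parse_counters(sample)):
    print(note)
```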

IP SLA Configuration

Instead of using ping manually, a network engineer can use the IP SLA ICMP Echo operation to test the availability of network devices. A network device can be any device with IP capabilities (router, switch, PC, server, etc.). The IP SLA ICMP Echo operation provides the following measurements:

Availability monitoring (packet loss statistics)
Performance monitoring (latency and response time)
Network operation (end-to-end connectivity)

To verify that the desired IP SLA operation is supported on the source device, use the show ip sla application privileged EXEC mode command. The output generated in the figure confirms that R1 is capable of supporting IP SLA; however, there are currently no sessions configured.

To create an IP SLA operation and enter IP SLA configuration mode, use the ip sla operation-number global configuration command. The operation number is a unique number used to identify the operation being configured.

From IP SLA configuration mode, you can configure the IP SLA operation as an ICMP Echo operation and enter ICMP echo configuration mode using the following command:

Router(config-ip-sla)# icmp-echo {dest-ip-address | dest-hostname} [source-ip {ip-address | hostname} | source-interface interface-id]

Next, set the rate at which a specified IP SLA operation repeats using the frequency seconds command. The range is from 1 to 604800 seconds, and the default is 60 seconds.

To schedule the IP SLA operation, use the following global configuration command:

Router(config)# ip sla schedule operation-number [life {forever | seconds}] [start-time {hh:mm[:ss] [month day | day month] | pending | now | after hh:mm:ss}] [ageout seconds] [recurring]
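When many devices need the same monitoring, the commands above are often generated rather than typed. The sketch below renders one ICMP Echo operation as the lines it would occupy in a running configuration, scheduled with life forever and start-time now. The IcmpEchoSla class is a hypothetical helper, not a Cisco tool; it enforces only the frequency range and default stated above, and the destination 192.0.2.1 is a documentation address used as a placeholder.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IcmpEchoSla:
    """Hypothetical helper describing one IP SLA ICMP Echo operation."""
    operation: int
    destination: str
    frequency_s: int = 60  # IOS default frequency
    source_interface: Optional[str] = None

    def commands(self) -> List[str]:
        """Return running-config style lines, scheduled to start now and run forever."""
        if self.operation < 1:
            raise ValueError("operation number must be positive")
        if not 1 <= self.frequency_s <= 604800:
            raise ValueError("frequency must be 1-604800 seconds")
        echo = f" icmp-echo {self.destination}"
        if self.source_interface:
            echo += f" source-interface {self.source_interface}"
        return [
            f"ip sla {self.operation}",
            echo,
            f" frequency {self.frequency_s}",
            f"ip sla schedule {self.operation} life forever start-time now",
        ]


print("\n".join(IcmpEchoSla(1, "192.0.2.1", frequency_s=30).commands()))
```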

Network Topology Diagrams

Network topology diagrams keep track of the location, function, and status of devices on the network. There are two types of network topology diagrams: the physical topology and the logical topology.

Physical Topology
A physical network topology shows the physical layout of the devices connected to the network. It is necessary to know how devices are physically connected to troubleshoot physical layer problems. Information recorded on the diagram typically includes:

Device type
Model and manufacturer
Operating system version
Cable type and identifier
Cable specification
Connector type
Cabling endpoints

Figure 1 shows a sample physical network topology diagram.

Logical Topology
A logical network topology illustrates how devices are logically connected to the network, meaning how devices actually transfer data across the network when communicating with other devices. Symbols are used to represent network elements, such as routers, servers, hosts, VPN concentrators, and security devices. Additionally, connections between multiple sites may be shown, but they do not represent actual physical locations. Information recorded on a logical network diagram may include:

Device identifiers
IP addresses and prefix lengths
Interface identifiers
Connection type
Frame Relay DLCI for virtual circuits (if applicable)
Site-to-site VPNs
Routing protocols
Static routes
Data link protocols
WAN technologies used

Network Layer Troubleshooting

Network layer problems include any problem that involves a Layer 3 protocol, including both routed protocols (such as IPv4 or IPv6) and routing protocols (such as EIGRP and OSPF). Common symptoms of network problems at the network layer include:

Network failure - A network failure occurs when the network is nearly or completely non-functional, affecting all users and applications on the network. These failures are usually noticed quickly by users and network administrators, and they are obviously critical to the productivity of a company.
Suboptimal performance - Network optimization problems usually involve a subset of users, applications, destinations, or a particular type of traffic. Optimization issues can be difficult to detect and even harder to isolate and diagnose, because they usually involve multiple layers, or even a single host computer. Determining that the problem is a network layer problem can take time.

In most networks, static routes are used in combination with dynamic routing protocols. Improper configuration of static routes can lead to less than optimal routing. In some cases, improperly configured static routes can create routing loops that make parts of the network unreachable.

Troubleshooting dynamic routing protocols requires a thorough understanding of how the specific routing protocol functions. Some problems are common to all routing protocols, while others are particular to an individual routing protocol.

There is no single template for solving Layer 3 problems. Routing problems are solved with a methodical process, using a series of commands to isolate and diagnose the problem. Here are some areas to explore when diagnosing a possible problem involving routing protocols:

General network issues - Often a change in the topology, such as a down link, may have effects on other areas of the network that are not obvious at the time. This may include the installation of new routes, static or dynamic, or the removal of other routes. Determine whether anything in the network has recently changed, and whether anyone is currently working on the network infrastructure.
Connectivity issues - Check for any equipment and connectivity problems, including power problems such as outages and environmental problems (for example, overheating). Also check for Layer 1 problems, such as cabling problems, bad ports, and ISP problems.
Routing table - Check the routing table for anything unexpected, such as missing routes or unexpected routes. Use debug commands to view routing updates and routing table maintenance.
Neighbor issues - If the routing protocol establishes an adjacency with a neighbor, check whether there are any problems with the routers forming neighbor adjacencies.
Topology database - If the routing protocol uses a topology table or database, check the table for anything unexpected, such as missing entries or unexpected entries.

Useful Information-Gathering Cisco IOS Commands

ping
traceroute
telnet
ssh -l
show ip interface brief
show ipv6 interface brief
show ip route
show ipv6 route
show running-config
[no] debug
show protocols

Step 7 - Verify ACLs

On routers, ACLs may be configured that prohibit protocols from passing through an interface in the inbound or outbound direction. Use the show ip access-lists command to display the contents of all IPv4 ACLs and the show ipv6 access-list command to display the contents of all IPv6 ACLs configured on a router. To display a specific ACL, enter the ACL name or number as an option for the command. The show ip interface and show ipv6 interface commands display IPv4 and IPv6 interface information, including whether any IP ACLs are applied to the interface.

Troubleshooting Example

To prevent spoofing attacks, the network administrator decided to implement an ACL that prevents devices with a source network address of 172.16.1.0/24 from entering the inbound S0/0/1 interface on R3, as shown in Figure 1. All other IP traffic should be allowed. However, shortly after the ACL was implemented, users on the 10.1.10.0/24 network were unable to connect to devices on the 172.16.1.0/24 network, including SRV1. The show ip access-lists command shows that the ACL is configured correctly, as shown in Figure 2. The show ip interface serial 0/0/1 command reveals that the ACL was never applied inbound on Serial 0/0/1. Further investigation reveals that the ACL was accidentally applied to the G0/0 interface, blocking all outbound traffic from the 172.16.1.0/24 network. After the IPv4 ACL is correctly placed inbound on the Serial 0/0/1 interface, as shown in Figure 3, devices are able to connect to the server successfully.
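The ACL behavior in this step comes down to two rules: entries are checked top-down with the first match winning, and anything that matches no entry hits the implicit deny at the end. The following is a minimal Python sketch of that logic, not IOS itself. It models only source-network matching (no protocols, ports, or wildcard masks), and the names `AclEntry`, `evaluate_acl`, and `ANTI_SPOOF` are hypothetical. It also assumes the misapplied ACL was filtering traffic arriving on G0/0 from the 172.16.1.0/24 LAN.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class AclEntry:
    action: str                      # "permit" or "deny"
    source: ipaddress.IPv4Network    # source network this entry matches


def evaluate_acl(entries, src_ip: str) -> str:
    """Return the action of the first entry whose source network contains src_ip.

    Entries are evaluated top-down and processing stops at the first match.
    If nothing matches, the implicit deny at the end of every ACL applies.
    """
    addr = ipaddress.ip_address(src_ip)
    for entry in entries:
        if addr in entry.source:
            return entry.action
    return "deny"  # implicit deny


# Hypothetical model of the anti-spoofing ACL in the example:
# deny source 172.16.1.0/24, permit everything else.
ANTI_SPOOF = [
    AclEntry("deny", ipaddress.ip_network("172.16.1.0/24")),
    AclEntry("permit", ipaddress.ip_network("0.0.0.0/0")),
]

# Intended placement (inbound S0/0/1): spoofed sources are dropped,
# genuine outside sources such as 10.1.10.0/24 are allowed.
print(evaluate_acl(ANTI_SPOOF, "172.16.1.50"))   # deny
print(evaluate_acl(ANTI_SPOOF, "10.1.10.5"))     # permit

# Misplaced on G0/0, the same ACL sees SRV1's own packets and drops them.
print(evaluate_acl(ANTI_SPOOF, "172.16.1.100"))  # deny
```

The same entries give opposite outcomes depending on which traffic they are applied to. That is why the show ip interface check for placement matters as much as show ip access-lists.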

Steps to Establish a Network Baseline

Step 1. Determine what types of data to collect. When conducting the initial baseline, start by selecting a few variables that represent the defined policies. If too many data points are selected, the amount of data can be overwhelming, making analysis of the collected data difficult. Start out simply and fine-tune along the way. Some good starting measures are interface utilization and CPU utilization.

Step 2. Identify devices and ports of interest. Use the network topology to identify the devices and ports for which performance data should be measured. Devices and ports of interest include:
Network device ports that connect to other network devices
Servers
Key users
Anything else considered critical to operations
A logical network topology diagram can be useful in identifying key devices and ports to monitor. For example, in Figure 1 the network administrator has highlighted the devices and ports of interest to monitor during the baseline test. The devices of interest include PC1 (the Admin terminal) and SRV1 (the Web/TFTP server). The ports of interest include the ports on R1, R2, and R3 that connect to the other routers or to switches, and, on R2, the port that connects to SRV1 (G0/0). Shortening the list of polled ports keeps the results concise and minimizes the network management load. Remember that an interface on a router or switch can be a virtual interface, such as a switch virtual interface (SVI).

Step 3. Determine the baseline duration. The length of time and the baseline information being gathered must be sufficient to establish a typical picture of the network. It is important to monitor daily trends of network traffic, and also trends that occur over longer periods, such as weekly or monthly. For this reason, when capturing data for analysis, the period specified should be at least seven days long.

Figure 2 shows several screenshots of CPU utilization trends captured over daily, weekly, monthly, and yearly periods. In this example, notice that the work week trends are too short to reveal the recurring utilization surge every Saturday evening, when a database backup operation consumes network bandwidth. This recurring pattern is revealed in the monthly trend. A yearly trend, as shown in the example, may be too long a duration to provide meaningful baseline performance details. However, it may help identify long-term patterns that should be analyzed further. Typically, a baseline needs to last no more than six weeks unless specific long-term trends need to be measured. Generally, a two-to-four-week baseline is adequate.

Baseline measurements should not be performed during times of unique traffic patterns, because the data would provide an inaccurate picture of normal network operations. Baseline analysis of the network should be conducted on a regular basis. Perform an annual analysis of the entire network, or baseline different sections of the network on a rotating basis. Regular analysis is needed to understand how the network is affected by growth and other changes.
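The point about capture duration can be made concrete. A weekly pattern such as the Saturday-evening backup only appears if the samples span at least a full week and are grouped by weekday. The following is a minimal Python sketch under stated assumptions: samples are `(timestamp, percent)` pairs already exported from some monitoring tool, the function name `weekday_profile` is hypothetical, and the data below is synthetic rather than taken from the figure.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")
MIN_DURATION = timedelta(days=7)  # minimum capture period suggested in the text


def weekday_profile(samples):
    """Average utilization per weekday from (timestamp, percent) samples.

    Raises ValueError when the samples span less than seven days, because a
    shorter capture cannot reveal weekly patterns.
    """
    if not samples:
        raise ValueError("no samples")
    times = [t for t, _ in samples]
    if max(times) - min(times) < MIN_DURATION:
        raise ValueError("capture covers less than seven days")
    by_day = defaultdict(list)
    for t, pct in samples:
        by_day[DAY_NAMES[t.weekday()]].append(pct)
    return {day: mean(values) for day, values in by_day.items()}


# Synthetic data: two weeks of hourly CPU samples at 20%, with 85% samples
# at 20:00 through 23:00 every Saturday (modeling the backup in the example).
start = datetime(2024, 1, 1)  # a Monday
samples = []
for hour in range(14 * 24):
    t = start + timedelta(hours=hour)
    surge = t.weekday() == 5 and t.hour >= 20
    samples.append((t, 85.0 if surge else 20.0))

profile = weekday_profile(samples)
print(max(profile, key=profile.get))  # Saturday
```

A three-day slice of the same data is rejected outright. This mirrors the text's warning that short captures miss recurring weekly surges.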

Using a Syslog Server for Troubleshooting

Syslog is a simple protocol used by an IP device, known as a syslog client, to send text-based log messages to another IP device, the syslog server. Syslog is currently defined in RFC 5424. Implementing a logging facility is an important part of both network security and network troubleshooting. Cisco devices can log information regarding configuration changes, ACL violations, interface status, and many other types of events. Cisco devices can send log messages to several different facilities. Event messages can be sent to one or more of the following:

Console - Console logging is on by default. Messages are logged to the console and can be viewed when modifying or testing the router or switch with terminal emulation software while connected to the console port of the network device.
Terminal lines - Enabled EXEC sessions can be configured to receive log messages on any terminal line. As with console logging, these messages are not stored by the network device and, therefore, are only valuable to the user on that line.
Buffered logging - Buffered logging is more useful as a troubleshooting tool because log messages are stored in memory for a time. However, the messages are cleared when the device is rebooted.
SNMP traps - Certain thresholds can be preconfigured on routers and other devices. Router events, such as exceeding a threshold, can be processed by the router and forwarded as SNMP traps to an external SNMP network management station. SNMP traps are a viable security logging facility but require the configuration and maintenance of an SNMP system.
Syslog - Cisco routers and switches can be configured to forward log messages to an external syslog service. This service can reside on any number of servers or workstations, including Microsoft Windows and Linux-based systems. Syslog is the most popular message logging facility because it provides long-term log storage and a central location for all router messages.
Cisco IOS log messages fall into one of eight levels, shown in Figure 1. The lower the level number, the higher the severity. By default, all messages from level 0 to 7 are logged to the console. While viewing logs on a central syslog server is helpful in troubleshooting, sifting through a large amount of data can be overwhelming. The logging trap level command limits the messages sent to the syslog server based on severity, where level is the name or number of the severity level. Only messages whose level is equal to or numerically lower than the specified level are logged. In the example in Figure 2, system messages from level 0 (emergencies) to 5 (notifications) are sent to the syslog server at 209.165.200.225.
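The filtering rule described for logging trap can be expressed in a few lines. The following is a minimal Python sketch, not IOS code. It assumes messages carry the usual `%FACILITY-SEVERITY-MNEMONIC:` tag and uses a simplified regular expression for it. The helper names `to_level` and `passes_trap_level` are hypothetical, and the sample messages are illustrative rather than captured from a device.

```python
import re

# Cisco IOS severity levels; a lower number means a more severe event.
SEVERITY_NAMES = {
    0: "emergencies", 1: "alerts", 2: "critical", 3: "errors",
    4: "warnings", 5: "notifications", 6: "informational", 7: "debugging",
}
LEVEL_BY_NAME = {name: num for num, name in SEVERITY_NAMES.items()}

# Simplified pattern for the "%FACILITY-SEVERITY-MNEMONIC:" tag.
MESSAGE_TAG = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+):"
)


def to_level(level) -> int:
    """Accept a severity as a number (5) or as a keyword ("notifications")."""
    return level if isinstance(level, int) else LEVEL_BY_NAME[level]


def passes_trap_level(message: str, trap_level) -> bool:
    """Keep messages whose severity is equal to or numerically lower than trap_level."""
    tag = MESSAGE_TAG.search(message)
    if tag is None:
        return False  # assumption: drop anything without a recognizable tag
    return int(tag.group("severity")) <= to_level(trap_level)


# Illustrative sample messages (levels 3, 5, and 6).
messages = [
    "%LINK-3-UPDOWN: Interface Serial0/0/1, changed state to down",
    "%SYS-5-CONFIG_I: Configured from console by console",
    "%SEC-6-IPACCESSLOGP: list 100 denied tcp packet",
]
forwarded = [m for m in messages if passes_trap_level(m, "notifications")]
print(len(forwarded))  # 2
```

With the trap level set to notifications (5), as in the Figure 2 example, the informational ACL log message is held back. The interface and configuration messages are still sent.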

Step 8 - Verify DNS

DNS is a distributed database that maps hostnames to IP addresses. When DNS is configured on the device, you can use a hostname in place of an IP address with all IP commands, such as ping or telnet. To display the DNS configuration information on the switch or router, use the show running-config command. When no DNS server is available, name-to-IP mappings can be entered directly into the switch or router configuration. Use the ip host command to enter a name-to-IPv4 mapping on the switch or router, and the ipv6 host command for the equivalent IPv6 mapping. These commands are demonstrated in Figure 1. Because IPv6 addresses are long and difficult to remember, DNS is even more important for IPv6 than for IPv4. To display name-to-IP-address mapping information on a Microsoft Windows PC, use the nslookup command.

Troubleshooting Example

The output in Figure 2 indicates that either the client was unable to reach the DNS server or the DNS service on the 10.1.1.1 device was not running. At this point, troubleshooting needs to focus on communication with the DNS server, or on verifying that the DNS server is running properly. DNS should be configured for IPv4, IPv6, or both. DNS can provide IPv4 and IPv6 addresses at the same time, regardless of the protocol used to access the DNS server. Because domain names and DNS are a vital component of accessing servers on the network, users often think the "network is down" when the problem is actually with the DNS server.
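Static host entries behave like a tiny local name table. One name can hold both IPv4 and IPv6 addresses, and a name with no entry simply fails to resolve. The following is a minimal Python sketch of that idea. It is only an analogy to the ip host and ipv6 host commands, not their implementation. The `HostTable` class is hypothetical, and the IPv6 address for SRV1 is an assumed value inside the 2001:DB8:ACAD:4::/64 network mentioned elsewhere in these notes.

```python
import ipaddress


class HostTable:
    """Hypothetical static name-to-address table, similar in purpose to
    the ip host and ipv6 host commands. Names are compared case-insensitively."""

    def __init__(self):
        self._entries = {}

    def add(self, name, address):
        # ip_address() raises ValueError for a malformed IPv4/IPv6 address.
        ip = ipaddress.ip_address(address)
        self._entries.setdefault(name.lower(), []).append(ip)

    def resolve(self, name, version=None):
        """Return addresses for name, optionally only IPv4 (4) or IPv6 (6)."""
        addrs = self._entries.get(name.lower(), [])
        if version is not None:
            addrs = [a for a in addrs if a.version == version]
        return [str(a) for a in addrs]


hosts = HostTable()
hosts.add("SRV1", "172.16.1.100")
hosts.add("SRV1", "2001:DB8:ACAD:4::100")  # hypothetical IPv6 address
print(hosts.resolve("srv1"))     # ['172.16.1.100', '2001:db8:acad:4::100']
print(hosts.resolve("SRV1", 6))  # ['2001:db8:acad:4::100']
print(hosts.resolve("SRV2"))     # []
```

The empty result for an unknown name is the local-table equivalent of the failed lookup in the troubleshooting example. The name cannot be used even though the network path may be fine.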

Step 5 - Verify Correct Path

Troubleshooting the Network Layer

When troubleshooting, it is often necessary to verify the path to the destination network. Figure 1 shows the reference topology indicating the intended path for packets from PC1 to SRV1. In Figure 2, the show ip route command is used to examine the IPv4 routing table. The IPv4 and IPv6 routing tables can be populated by the following methods:
Directly connected networks
Local host or local routes
Static routes
Dynamic routes
Default routes

Forwarding of IPv4 and IPv6 packets is based on the longest bit match, or longest prefix match. The routing table process attempts to forward the packet using the routing table entry with the greatest number of leftmost matching bits. The number of matching bits is indicated by the route's prefix length.

Figure 3 shows a similar scenario with IPv6. To verify that the current IPv6 path matches the desired path to reach destinations, use the show ipv6 route command on a router to examine the routing table. The IPv6 routing table shows that R1 has a path to 2001:DB8:ACAD:4::/64 via R2 at FE80::2.

The following list, along with Figure 4, describes the process for both the IPv4 and IPv6 routing tables. If the destination address in a packet:
Does not match an entry in the routing table, the default route is used. If no default route is configured, the packet is discarded.
Matches a single entry in the routing table, the packet is forwarded through the interface that is defined in this route.
Matches more than one entry in the routing table and the entries have the same prefix length, the packets for this destination can be distributed among the routes defined in the routing table.
Matches more than one entry in the routing table and the entries have different prefix lengths, the packets for this destination are forwarded out of the interface associated with the route that has the longer prefix match.

Troubleshooting Example

Devices are unable to connect to the server SRV1 at 172.16.1.100. Using the show ip route command, the administrator should check whether a routing entry exists for network 172.16.1.0/24. If the routing table does not have a specific route to SRV1's network, the administrator must then check for a default or summary route entry in the direction of the 172.16.1.0/24 network. If none exists, the problem may be with routing, and the administrator must verify that the network is included in the dynamic routing protocol configuration or add a static route.
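The four forwarding cases above are all consequences of one rule: pick the matching entries with the longest prefix. The following is a minimal Python sketch of longest prefix match using the standard `ipaddress` module. The routing table below is hypothetical and loosely based on the topology in the text. Next hops are plain labels rather than real interfaces, and `lookup` is an illustrative name, not an IOS function.

```python
import ipaddress


def lookup(routes, destination):
    """Return the next hops of the longest-prefix route(s) matching destination.

    routes is a list of (prefix, next_hop) pairs. More than one result means
    equal-length matches that may share the load; an empty list means no entry
    (not even a default route) matched, so the packet would be discarded.
    """
    dest = ipaddress.ip_address(destination)
    best_len, best = -1, []
    for prefix, next_hop in routes:
        net = ipaddress.ip_network(prefix)
        if net.version != dest.version or dest not in net:
            continue  # different address family, or no match
        if net.prefixlen > best_len:
            best_len, best = net.prefixlen, [next_hop]  # longer match wins
        elif net.prefixlen == best_len:
            best.append(next_hop)  # equal-length match
    return best


# Hypothetical routing table.
ROUTES = [
    ("0.0.0.0/0", "default via ISP"),
    ("172.16.0.0/16", "summary via R2"),
    ("172.16.1.0/24", "via R2"),
    ("2001:db8:acad:4::/64", "via FE80::2"),
]

print(lookup(ROUTES, "172.16.1.100"))          # ['via R2']
print(lookup(ROUTES, "172.16.9.9"))            # ['summary via R2']
print(lookup(ROUTES, "198.51.100.7"))          # ['default via ISP']
print(lookup(ROUTES, "2001:db8:acad:4::100"))  # ['via FE80::2']
```

Removing the default route from the table makes the last IPv4 lookup return an empty list. This corresponds to the "packet is discarded" case, and it is also why the troubleshooting example checks for a default or summary route when the specific route is missing.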

Step 6 - Verify the Transport Layer

Troubleshooting the Transport Layer

If the network layer appears to be functioning as expected but users are still unable to access resources, the network administrator must begin troubleshooting the upper layers. Two of the most common issues that affect transport layer connectivity are ACL configurations and NAT configurations. A common tool for testing transport layer functionality is the Telnet utility.

Caution: While Telnet can be used to test the transport layer, SSH should be used to remotely manage and configure devices for security reasons.

A network administrator is troubleshooting a problem where someone cannot send email through a particular SMTP server. The administrator pings the server, and it responds. This means that the network layer, and all layers below it, are operational between the user and the server. The administrator knows the issue is at Layer 4 or above and must start troubleshooting those layers.

Although the Telnet server application runs on its own well-known port number, 23, and Telnet clients connect to this port by default, a different port number can be specified on the client to connect to any TCP port that must be tested. The result indicates whether the connection is accepted (as indicated by the word "Open" in the output), refused, or times out. From any of those responses, further conclusions can be drawn about the connectivity. If the application uses an ASCII-based session protocol, it might even display an application banner, and it may be possible to trigger responses from the server by typing certain keywords, as with SMTP, FTP, and HTTP.

Given the previous scenario, the administrator Telnets from PC1 to the server HQ using IPv6, and the Telnet session is successful, as shown in Figure 1. In Figure 2, the administrator attempts to Telnet to the same server using port 80. The output verifies that the transport layer is connecting successfully from PC1 to HQ; however, the server is not accepting connections on port 80. The example in Figure 3 shows a successful Telnet connection from R1 to R3 over IPv6. Figure 4 shows a similar Telnet attempt using port 80. Again, the output verifies a successful transport layer connection, but R3 is refusing the connection on port 80.
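The three Telnet outcomes described above (accepted, refused, or timed out) can be reproduced with a plain TCP connect attempt. The following is a minimal Python sketch using the standard `socket.create_connection`. It is an analogy to `telnet host port`, not a replacement for it. The function name `probe_tcp_port` and the 3-second default timeout are assumptions, and the server address in the commented usage is only an example.

```python
import socket


def probe_tcp_port(host, port, timeout=3.0):
    """Try a TCP handshake to host:port, similar to 'telnet host port'.

    Returns "open" if the connection is accepted, "refused" if the host
    actively rejects it, "timeout" if nothing answers in time, or an
    "error: ..." string for other failures (for example, no route to host).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"


# Example usage against a server (address assumed for illustration):
# print(probe_tcp_port("172.16.1.100", 25))   # SMTP
# print(probe_tcp_port("172.16.1.100", 80))   # HTTP
```

A "refused" result still shows that packets reached the host and that it answered. This matches the text's reading of the port 80 attempts: the path works, but nothing is accepting connections on that port. A "timeout" points more toward filtering (for example, an ACL) or a path problem.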

End-to-End Connectivity Problem Initiates Troubleshooting

Usually what initiates a troubleshooting effort is the discovery of a problem with end-to-end connectivity. Two of the most common utilities used to verify a problem with end-to-end connectivity are ping and traceroute, as shown in Figure 1.

Ping is probably the most widely known connectivity-testing utility in networking and has always been part of Cisco IOS Software. It sends out requests for responses from a specified host address. The ping command uses ICMP, a Layer 3 protocol that is part of the TCP/IP suite. Ping uses ICMP echo request and ICMP echo reply packets: if the host at the specified address receives the ICMP echo request, it responds with an ICMP echo reply packet. Ping can be used to verify end-to-end connectivity for both IPv4 and IPv6. Figure 2 shows a successful ping from PC1 to SRV1 at address 172.16.1.100.

The traceroute command in Figure 3 illustrates the path the IPv4 packets take to reach their destination. Like the ping command, the Cisco IOS traceroute command can be used for both IPv4 and IPv6. On Windows operating systems, the equivalent command is tracert. The trace generates a list of the hops, router IP addresses, and final destination IP address that are successfully reached along the path. This list provides important verification and troubleshooting information. If the data reaches the destination, the trace lists the interface on every router in the path. If the data fails at some hop along the way, the address of the last router that responded to the trace is known. This address is an indication of where the problem or security restrictions reside.

As stated, the ping and traceroute utilities can be used to test and diagnose end-to-end IPv6 connectivity by providing an IPv6 address as the destination address. When these utilities are used, Cisco IOS recognizes whether the address is an IPv4 or IPv6 address and uses the appropriate protocol to test connectivity.
Figure 4 shows the ping and traceroute commands on router R1 used to test IPv6 connectivity.

Note: The traceroute command is commonly performed when the ping command fails. If the ping succeeds, the traceroute command is commonly not needed because the technician already knows that connectivity exists.
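To make "ICMP echo request" concrete, the following is a minimal Python sketch that builds an ICMPv4 echo request (type 8, code 0) with the 16-bit ones' complement Internet checksum described in RFC 1071. It only builds the bytes. Actually sending the packet requires a raw socket and usually administrator privileges, which this sketch deliberately leaves out. The function names and the identifier, sequence, and payload values are arbitrary choices for illustration. ICMPv6 echo requests differ: they use type 128, and their checksum also covers an IPv6 pseudo-header, so this sketch does not apply to them unchanged.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum used by ICMP (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


ECHO_REQUEST = 8  # ICMPv4 echo request type (an echo reply is type 0)


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Build an ICMPv4 echo request: type, code, checksum, identifier, sequence, data."""
    header = struct.pack("!BBHHH", ECHO_REQUEST, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", ECHO_REQUEST, 0, checksum, identifier, sequence) + payload


packet = build_echo_request(identifier=0x1234, sequence=1, payload=b"ping")
print(internet_checksum(packet))  # 0: a valid packet re-checksums to zero
```

The receiving host performs the same checksum over the packet it receives. A result of zero tells it the echo request arrived intact before it builds the matching echo reply.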

Measuring Data

When documenting the network, it is often necessary to gather information directly from routers and switches. Obviously useful network documentation commands include ping, traceroute, and telnet, as well as the following show commands:
The show ip interface brief and show ipv6 interface brief commands display the up or down status and IP address of all interfaces on a device.
The show ip route and show ipv6 route commands display the routing table in a router, which reveals the directly connected neighbors, more remote devices (through learned routes), and the routing protocols that have been configured.
The show cdp neighbors detail command obtains detailed information about directly connected Cisco neighbor devices.
The figure lists some of the most common Cisco IOS commands used for data collection.

Manual data collection using show commands on individual network devices is extremely time consuming and is not a scalable solution. Manual collection of data should be reserved for smaller networks or limited to mission-critical network devices. For simpler network designs, baseline tasks typically use a combination of manual data collection and simple network protocol inspectors. Sophisticated network management software is typically used to baseline large and complex networks. These software packages enable administrators to automatically create and review reports, compare current performance levels with historical observations, automatically identify performance problems, and create alerts for applications that do not provide expected levels of service.

Establishing an initial baseline or conducting a performance-monitoring analysis may require many hours or days to accurately reflect network performance. Network management software or protocol inspectors and sniffers often run continuously over the course of the data collection process.
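One small step toward less manual collection is turning show command output into structured records. These can then be compared over time instead of being read by eye. The following is a minimal Python sketch that parses the tabular output of show ip interface brief. It assumes the usual column order (Interface, IP-Address, OK?, Method, Status, Protocol). The sample text is hand-written for illustration, not captured from a device. How the output is retrieved (for example, over SSH) is outside this sketch, and `parse_ip_int_brief` is a hypothetical name.

```python
def parse_ip_int_brief(output):
    """Turn 'show ip interface brief' text into a list of dicts."""
    rows = []
    for line in output.strip().splitlines()[1:]:  # skip the column header
        fields = line.split()
        if len(fields) < 6:
            continue  # ignore blank or unexpected lines
        rows.append({
            "interface": fields[0],
            "ip_address": fields[1],
            "ok": fields[2],
            "method": fields[3],
            # Status may be two words, e.g. "administratively down".
            "status": " ".join(fields[4:-1]),
            "protocol": fields[-1],
        })
    return rows


# Hand-written sample output (addresses are illustrative).
SAMPLE = """\
Interface              IP-Address      OK? Method Status                Protocol
GigabitEthernet0/0     172.16.1.1      YES manual up                    up
Serial0/0/1            10.1.1.2        YES manual administratively down down
"""

rows = parse_ip_int_brief(SAMPLE)
print([r["interface"] for r in rows if r["protocol"] != "up"])  # ['Serial0/0/1']
```

Splitting on whitespace and treating the last field as the protocol keeps the two-word "administratively down" status intact without relying on fixed column widths.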

Transport Layer Troubleshooting - NAT for IPv4

There are a number of problems with NAT, such as its failure to interact with services like DHCP and tunneling. These problems can include a misconfigured NAT inside interface, NAT outside interface, or ACL. Other issues include interoperability with other network technologies, especially those that contain or derive information from host network addressing in the packet. Some of these technologies include:

BOOTP and DHCP - Both protocols manage the automatic assignment of IPv4 addresses to clients. Recall that the first packet a new client sends is a DHCPDISCOVER broadcast IPv4 packet, which has a source IPv4 address of 0.0.0.0. Because NAT requires both a valid destination and a valid source IPv4 address, BOOTP and DHCP can have difficulty operating over a router running either static or dynamic NAT. Configuring the IPv4 helper feature can help solve this problem.
DNS - Because a router running dynamic NAT changes the relationship between inside and outside addresses regularly as table entries expire and are recreated, a DNS server outside the NAT router does not have an accurate representation of the network inside the router. Configuring the IPv4 helper feature can help solve this problem.
SNMP - As with DNS packets, NAT is unable to alter the addressing information stored in the data payload of the packet. Because of this, an SNMP management station on one side of a NAT router may not be able to contact SNMP agents on the other side of the NAT router. Configuring the IPv4 helper feature can help solve this problem.
Tunneling and encryption protocols - Encryption and tunneling protocols often require that traffic be sourced from a specific UDP or TCP port, or use a protocol at the transport layer that cannot be processed by NAT. For example, IPsec tunneling protocols and the generic routing encapsulation (GRE) protocol used by VPN implementations cannot be processed by NAT.
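The SNMP point rests on one detail: NAT rewrites the addresses in the IP header, but an address carried inside the payload passes through unchanged. The following is a minimal Python sketch that models only that header-versus-payload distinction, as the paragraph describes it. It is not how IOS implements NAT. The `Packet` type, the `STATIC_NAT` mapping, the addresses, and the `agent-address=` payload format are all hypothetical. Passing packets with no mapping through untranslated is also an assumption of the sketch.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    payload: str


# Hypothetical static NAT mapping: inside local -> inside global address.
STATIC_NAT = {"192.168.10.5": "209.165.200.230"}


def nat_outbound(packet: Packet) -> Packet:
    """Rewrite the source address in the header; never inspect the payload."""
    inside_global = STATIC_NAT.get(packet.src)
    if inside_global is None:
        return packet  # assumption: no mapping, forwarded untranslated
    return replace(packet, src=inside_global)


# An SNMP-style message whose payload carries the agent's own inside address.
original = Packet(src="192.168.10.5", dst="209.165.201.1",
                  payload="agent-address=192.168.10.5")
translated = nat_outbound(original)
print(translated.src)      # 209.165.200.230
print(translated.payload)  # agent-address=192.168.10.5
```

The management station receives a packet from the public address, but the payload still names the private inside address, which it cannot reach. That mismatch is the interoperability problem described above.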

Common Hardware Troubleshooting Tools, Part 2

Network analysis module - displays traffic from remote switches and routers in graphical form.
Digital multimeter - measures electrical values of voltage, current, and resistance.
Cable tester - tests data connection cabling for broken wires, crossed wiring, and shorted connections.
Cable analyzer - tests and certifies cabling for different services using a handheld device.
Portable network analyzer - discovers VLAN configurations and measures average and peak bandwidth utilization using a portable device.

Troubleshooting Methods

Using the layered models, there are three primary methods for troubleshooting networks:
Bottom-up
Top-down
Divide-and-conquer
Each approach has its advantages and disadvantages. This topic describes the three methods and provides guidelines for choosing the best method for a specific situation.

Bottom-Up Troubleshooting Method

In bottom-up troubleshooting, you start with the physical components of the network and move up through the layers of the OSI model until the cause of the problem is identified, as shown in Figure 1. Bottom-up troubleshooting is a good approach to use when the problem is suspected to be a physical one. Most networking problems reside at the lower levels, so implementing the bottom-up approach is often effective. The disadvantage of the bottom-up approach is that it requires you to check every device and interface on the network until the possible cause of the problem is found. Remember that each conclusion and possibility must be documented, so there can be a lot of paperwork associated with this approach. A further challenge is to determine which devices to start examining first.

Top-Down Troubleshooting Method

In Figure 2, top-down troubleshooting starts with the end-user applications and moves down through the layers of the OSI model until the cause of the problem has been identified. End-user applications of an end system are tested before the more specific networking pieces are tackled. Use this approach for simpler problems, or when you think the problem is with a piece of software. The disadvantage of the top-down approach is that it requires checking every network application until the possible cause of the problem is found. Each conclusion and possibility must be documented. The challenge is to determine which application to start examining first.

Divide-and-Conquer Troubleshooting Method

Figure 3 shows the divide-and-conquer approach to troubleshooting a networking problem. The network administrator selects a layer and tests in both directions from that layer. In divide-and-conquer troubleshooting, you start by collecting user experiences of the problem and documenting the symptoms, and then use that information to make an informed guess as to which OSI layer to start your investigation. When a layer is verified to be functioning properly, it can be assumed that the layers below it are functioning, and the administrator can work up the OSI layers. If an OSI layer is not functioning properly, the administrator can work down the OSI layer model. For example, if users cannot access the web server but they can ping the server, the problem is above Layer 3. If pinging the server is unsuccessful, the problem is likely at a lower OSI layer.
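The divide-and-conquer rule has two parts: a passing layer implies the layers below it work, so move up, and a failing layer means move down. This rule can be written as a small search. The following is a minimal Python sketch in which `layer_ok(n)` is a hypothetical stand-in for whatever real test (ping, Telnet to a port, opening the web page) verifies layer n. The function name `find_failing_layer` and the test results in the scenario are assumptions for illustration.

```python
from typing import Callable, Optional

OSI_LAYERS = ("Physical", "Data Link", "Network", "Transport",
              "Session", "Presentation", "Application")


def find_failing_layer(start: int, layer_ok: Callable[[int], bool]) -> Optional[int]:
    """Return the lowest failing OSI layer (1-7), or None if every layer passes.

    A passing layer is taken to mean the layers below it also work, so the
    search moves up; a failing layer moves the search down until the layer
    beneath it passes (or Layer 1 is reached).
    """
    if layer_ok(start):
        for layer in range(start + 1, 8):
            if not layer_ok(layer):
                return layer
        return None
    layer = start
    while layer > 1 and not layer_ok(layer - 1):
        layer -= 1
    return layer


# Scenario from the text: ping works (Layer 3 passes) but the web server is
# unreachable. Hypothetical results: a Layer 4 test (e.g. port 80) fails.
results = {1: True, 2: True, 3: True, 4: False, 5: False, 6: False, 7: False}
failing = find_failing_layer(3, results.__getitem__)
print(failing, OSI_LAYERS[failing - 1])  # 4 Transport
```

Starting at Layer 3 matches the informed guess the text describes. Because ping already passed, the search never revisits Layers 1 and 2.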

Commands for Gathering Symptoms

Step 1. Gather information - Gather information from the trouble ticket, users, or end systems affected by the problem to form a definition of the problem.
Step 2. Determine ownership - If the problem is within the control of the organization, move on to the next stage. If the problem is outside the boundary of the organization's control (for example, lost Internet connectivity outside of the autonomous system), contact an administrator for the external system before gathering additional network symptoms.
Step 3. Narrow the scope - Determine whether the problem is at the core, distribution, or access layer of the network. At the identified layer, analyze the existing symptoms and use your knowledge of the network topology to determine which piece of equipment is the most likely cause.
Step 4. Gather symptoms from suspect devices - Using a layered troubleshooting approach, gather hardware and software symptoms from the suspect devices. Start with the most likely possibility and use knowledge and experience to determine whether the problem is more likely a hardware or software configuration problem.
Step 5. Document symptoms - Sometimes the problem can be solved using the documented symptoms. If not, begin the isolating stage of the general troubleshooting process.

To gather symptoms from a suspected networking device, use Cisco IOS commands and other tools, including:
ping, traceroute, and telnet commands
show and debug commands
packet captures
device logs

