Describe how to use Network Watcher to diagnose connectivity problems with Azure Load Balancer.
Question
Describe how to use Network Watcher to diagnose connectivity problems with Azure Load Balancer.
Brief Answer
Azure Network Watcher is crucial for diagnosing connectivity problems with Azure Load Balancers by helping you pinpoint issues related to network paths, security rules (NSGs), and routing (UDRs). The key tools for Load Balancer troubleshooting include:
- IP Flow Verify: Tests whether a specific packet (e.g., client traffic or a health probe arriving at a backend VM) is allowed or denied by the NSG rules on that VM’s network interface and subnet, and names the rule that made the decision. It does not evaluate routes, so use Next Hop for UDRs. It’s a good first check for backend reachability.
- Connection Troubleshoot: Provides a comprehensive, VM-centric analysis of network path connectivity to a specific endpoint (e.g., a backend VM to a database or another service). This is vital for checking if backend VMs can reach their dependencies or if health probes are failing due to internal VM issues.
- Next Hop: Helps verify that traffic is correctly routed towards the Load Balancer, ensuring that UDRs or system routes aren’t inadvertently bypassing it.
- Security Group View: Consolidates all effective NSG rules applied to a VM or subnet, allowing you to quickly audit if necessary ports (e.g., HTTP/HTTPS) are open for the Load Balancer and its backend pool.
- Packet Capture: For deep-dive analysis, capture actual network traffic on a backend VM to diagnose complex application-level or intermittent issues that other tools might not reveal.
Practical Approach: Start with IP Flow Verify against a backend VM to confirm that NSG rules allow both client traffic and health probes. Then, use Connection Troubleshoot from a backend VM to its dependencies. Confirm routing with Next Hop and audit NSGs with Security Group View. If the cause is still unclear, finish with Packet Capture. This systematic approach helps isolate and resolve Load Balancer connectivity problems efficiently.
Super Brief Answer
Use Azure Network Watcher’s IP Flow Verify to check whether NSG rules allow client and health-probe traffic to reach the backend VMs. Employ Connection Troubleshoot for backend VM connectivity to dependencies. Verify routing with Next Hop, audit NSGs with Security Group View, and use Packet Capture for deep diagnostics. These tools collectively pinpoint network, security, or routing issues affecting your Load Balancer.
Detailed Answer
Azure Network Watcher is an essential service for monitoring, diagnosing, and gaining insights into your Azure network performance. When dealing with an Azure Load Balancer, which distributes incoming traffic among healthy virtual machines (VMs) in a backend pool, connectivity issues can be complex. Network Watcher offers powerful tools to accurately pinpoint the root cause of these problems, whether they lie with network security groups (NSGs), user-defined routes (UDRs), or the load balancer’s configuration itself.
At a high level, you can use Network Watcher’s IP Flow Verify and Connection Troubleshoot to diagnose Azure Load Balancer connectivity issues by checking network paths, security rules, and VM connectivity.
Key Network Watcher Capabilities for Load Balancer Troubleshooting
Network Watcher provides a suite of tools that are invaluable for diagnosing connectivity problems related to Azure Load Balancers. Here are the primary capabilities:
IP Flow Verify
IP Flow Verify checks whether a packet to or from a VM is allowed or denied, given its direction, protocol, and local and remote IP addresses and ports, and it identifies the Network Security Group (NSG) rule that made the decision. This makes it a quick way to find out why traffic forwarded by the load balancer is not reaching a backend VM.
It evaluates the effective security rules on the VM’s network interface and subnet. It does not evaluate User-Defined Routes (UDRs), so pair it with Next Hop when you suspect a routing problem.
Example Scenario: We were troubleshooting intermittent connectivity issues to our e-commerce website. Users were reporting sporadic errors, and our monitoring showed connection drops. Running IP Flow Verify against a backend VM, with an affected client’s address as the remote IP and port 443 as the local port, we pinpointed the culprit: an overly restrictive Network Security Group rule on the backend subnet. The rule was blocking traffic from certain IP ranges, explaining the intermittent nature of the problem.
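A related check is whether the load balancer’s health probes can get through the NSGs at all. The sketch below wraps the Azure CLI form of IP Flow Verify in a small helper. The resource group, VM name, and private IP are hypothetical placeholders. It relies on the fact that Azure Load Balancer health probes originate from 168.63.129.16, and it assumes IP Flow Verify accepts that address as the remote IP.

```shell
# Hypothetical values; replace with your own.
RG="MyResourceGroup"
VM="MyBackendVm"
VM_IP="10.0.0.4"           # private IP of the backend VM's NIC (assumption)
PROBE_SRC="168.63.129.16"  # address Azure Load Balancer health probes come from

# Ask IP Flow Verify whether an inbound TCP packet is allowed by the NSGs.
# $1 = remote (source) IP, $2 = local (destination) port on the VM.
# '*' matches any remote port, which is permitted for inbound checks.
check_inbound() {
  az network watcher test-ip-flow \
    --resource-group "$RG" \
    --vm "$VM" \
    --direction Inbound \
    --protocol TCP \
    --local "${VM_IP}:$2" \
    --remote "$1:*" \
    --output json
}
```

For example, `check_inbound "$PROBE_SRC" 80` tests the probe path to port 80, and `check_inbound 203.0.113.10 443` tests a sample client address. The JSON result contains `access` (Allow or Deny) and `ruleName`, the rule that made the decision.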
Connection Troubleshoot
Connection Troubleshoot diagnoses connectivity issues from a VM’s perspective, providing a comprehensive view of the network path and identifying where the problem lies. It can pinpoint whether the issue is within the VM itself, the load balancer, or the broader network infrastructure.
This tool is particularly useful for checking if a VM in the backend pool can reach required endpoints, such as a database server or an external API.
Example Scenario: During the deployment of a new microservice, we observed that while the health probes were passing, the application wasn’t receiving traffic. We used Connection Troubleshoot from a VM in the backend pool, targeting the database server. The results revealed that the VM’s internal DNS was misconfigured, preventing it from resolving the database hostname. This highlighted an issue within the VM itself, not the load balancer or network.
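A check like the one in this scenario could be sketched with the Azure CLI as follows. The resource group, VM name, and the database hostname used in the usage note are hypothetical, and the source VM is assumed to have the Network Watcher Agent VM extension installed, which Connection Troubleshoot requires.

```shell
# Hypothetical values; replace with your own.
RG="MyResourceGroup"
SRC_VM="MyBackendVm"   # a VM in the load balancer's backend pool

# Run Connection Troubleshoot from the backend VM to a dependency.
# $1 = destination address (IP or FQDN), $2 = destination TCP port.
# The query keeps only the overall status and the average latency.
check_dependency() {
  az network watcher test-connectivity \
    --resource-group "$RG" \
    --source-resource "$SRC_VM" \
    --dest-address "$1" \
    --dest-port "$2" \
    --protocol Tcp \
    --query "{status: connectionStatus, avgLatencyMs: avgLatencyInMs}" \
    --output json
}
```

Calling `check_dependency db.internal.example 1433` would report whether the database endpoint is reachable from the backend VM. Dropping the `--query` argument returns the full result, including the per-hop details.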
Next Hop
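The Next Hop feature reports the next hop type (for example, Internet, VirtualAppliance, or VirtualNetworkGateway), the next hop IP address, and the route table responsible for traffic leaving a specific VM toward a given destination. This is crucial for verifying that traffic follows the path you expect, such as toward an internal load balancer’s frontend IP, and is not being misrouted due to incorrect UDRs or system routes.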
Example Scenario: We migrated our load balancer to a different virtual network. Afterwards, some VMs weren’t routing traffic correctly. Using Network Watcher’s Next Hop feature, we checked the next hop for VMs in the backend pool. We discovered that some VMs still had their old routing table, pointing to the previous virtual network’s gateway instead of the new load balancer. This helped us quickly identify and correct the misconfigured routing.
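The routing check described here might look like the following with the Azure CLI. This is a minimal sketch, and the resource group, VM name, and IP addresses in the usage note are hypothetical.

```shell
# Hypothetical values; replace with your own.
RG="MyResourceGroup"
VM="MyBackendVm"

# Ask Next Hop how traffic from the VM is routed toward a destination.
# $1 = source IP (the VM's private IP), $2 = destination IP.
show_route() {
  az network watcher show-next-hop \
    --resource-group "$RG" \
    --vm "$VM" \
    --source-ip "$1" \
    --dest-ip "$2" \
    --output json
}
```

For instance, `show_route 10.0.0.4 10.0.1.10` returns `nextHopType`, `nextHopIpAddress`, and `routeTableId`. An unexpected `routeTableId`, or a next hop type of `None`, is a strong hint that a stale or incorrect UDR is in effect.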
Security Group View
Security Group View provides a consolidated list of all effective security rules applied to a VM, combining rules from Network Security Groups at the subnet level and network interface level. This helps verify that traffic to and from the load balancer is explicitly allowed by the NSGs, preventing unexpected blocks.
Example Scenario: Our monitoring alerted us to dropped connections on port 8080 for our internal API. We used Security Group View on a backend pool VM and found that while we intended to allow traffic on port 8080, an inherited security rule from a higher-level NSG was blocking it. Identifying this conflict allowed us to rectify the NSG configuration and restore connectivity.
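An audit of this kind can also be run from the Azure CLI. The sketch below uses hypothetical resource names and prints the raw result. No filtering is applied, because the rules of interest vary by deployment.

```shell
# Hypothetical values; replace with your own.
RG="MyResourceGroup"
VM="MyBackendVm"

# Dump the security group view for the VM: the NSG rules from both the
# subnet and the network interface, including the default rules.
show_effective_rules() {
  az network watcher show-security-group-view \
    --resource-group "$RG" \
    --vm "$VM" \
    --output json
}
```

Running `show_effective_rules > nsg-view.json` saves the result for review. Searching that file for the port in question (8080 in the scenario above) shows which rule, at which level, applies to it.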
Packet Capture
Packet Capture allows you to capture network traffic directly on an Azure VM. This provides granular, real-time insights into network communications, which is invaluable for deep dives into complex connectivity issues that other tools might not fully diagnose.
It’s particularly useful for analyzing application-level protocols or identifying specific packet drops that indicate subtle network or application misconfigurations.
Example Scenario: We encountered a complex connectivity issue where the load balancer’s health probes were passing, but application traffic was intermittently failing. To investigate further, we used Packet Capture on a backend VM. Analyzing the captured packets revealed that the application was sending responses to a non-existent source port, leading to the connection drops. This level of detail, unavailable through other diagnostic tools, was crucial for identifying and resolving the root cause.
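A capture like this can be started from the Azure CLI. The following is a sketch under stated assumptions: the resource group, VM, and storage account names are hypothetical, the 120-second limit is an arbitrary choice, and the VM needs the Network Watcher Agent VM extension installed.

```shell
# Hypothetical values; replace with your own.
RG="MyResourceGroup"
VM="MyBackendVm"
STORAGE="mystorageaccount"   # storage account that receives the capture file

# Start a time-limited capture of TCP traffic on one local port.
# $1 = capture session name, $2 = local port to filter on.
start_capture() {
  az network watcher packet-capture create \
    --resource-group "$RG" \
    --vm "$VM" \
    --name "$1" \
    --storage-account "$STORAGE" \
    --time-limit 120 \
    --filters "[{\"protocol\":\"TCP\",\"localPort\":\"$2\"}]"
}
```

For example, `start_capture lb-debug 8080` captures traffic on port 8080 for up to two minutes. The resulting capture file can then be downloaded from the storage account and opened in a tool such as Wireshark.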
Practical Scenarios & Best Practices
When troubleshooting Azure Load Balancer connectivity, consider these practical approaches:
- Troubleshooting Frontend Connectivity (Client Perspective)
Run IP Flow Verify against a backend VM, using the client’s public IP as the remote address and the backend port as the local port, to confirm that NSG rules allow the traffic the load balancer forwards. IP Flow Verify only evaluates Azure VMs, so a failure that persists after this check passes points toward your on-premises network, the internet connection, or the load balancer’s frontend configuration.
“In a recent project, we experienced intermittent connectivity issues to our public-facing web application. To isolate the problem, we ran IP Flow Verify against a backend VM, using an external client’s IP as the remote address and port 443 as the local port. This allowed us to rule out the NSG rules on the Azure side and narrow the issue down to our on-premises network, the internet connection, or the load balancer’s frontend.”
- Diagnosing Backend Connectivity (Backend VM Perspective)
Utilize Connection Troubleshoot from a VM in the backend pool to the desired endpoint (e.g., a database server, another microservice). This helps diagnose backend connectivity issues, identify latency, or determine if the backend VMs can properly communicate with their dependencies.
“We had a situation where our web application, fronted by a load balancer, was experiencing slow response times. Using Connection Troubleshoot from a VM in the backend pool, targeting the database server, revealed high network latency. This pointed to a performance bottleneck between the backend VMs and the database, allowing us to focus our troubleshooting efforts on that specific segment of the network.”
- Verifying Routing with Next Hop
Employ Network Watcher’s Next Hop feature to verify that traffic is routed correctly to the load balancer. Misconfigured routing tables or UDRs can bypass the load balancer entirely, leading to connectivity failures that are hard to spot otherwise.
“After a network restructuring, we noticed that some client requests weren’t reaching our load balancer. We leveraged Network Watcher’s Next Hop feature on a client VM and discovered that the default route was incorrectly pointing to an old gateway, bypassing the load balancer. This misconfiguration explained the connectivity issue and allowed us to quickly fix the routing tables.”
- Auditing Security Rules with Security Group View
Examine the Security Group View for VMs in the backend pool and the load balancer’s subnet to ensure that traffic to and from the load balancer isn’t inadvertently blocked by NSG rules. Pay close attention to specific port and protocol configurations, such as allowing HTTP on port 80 or HTTPS on port 443.
“During a security audit, we needed to verify that only HTTPS traffic was allowed to our web servers behind the load balancer. By examining the Security Group View on the backend VMs, we confirmed that port 443 was open for inbound traffic while port 80 (HTTP) was blocked. This ensured compliance with our security policy and confirmed that the NSG configuration was correctly applied.”
Code Sample
While Network Watcher operations are typically performed via the Azure Portal, CLI, or PowerShell/SDKs, here’s an example CLI command for IP Flow Verify:
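# Check whether inbound TCP traffic from 13.107.21.200:53 to 10.0.0.4:80
# is allowed by the NSGs that apply to MyVm.
# --local and --remote each take a single IP:PORT value.
az network watcher test-ip-flow \
  --direction Inbound \
  --protocol TCP \
  --local 10.0.0.4:80 \
  --remote 13.107.21.200:53 \
  --resource-group MyResourceGroup \
  --vm MyVm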

