Selenium WebDriver
Broken Link Detection
Learn to identify and validate broken links in web applications using HTTP response codes and URL connection testing.
Understanding Broken Links
What are Broken Links?
Broken links (also called invalid links) are hyperlinks that do not work: they lead to non-existent pages, return server errors, or fail to load because of network or configuration problems.
Impact on Users
- Poor user experience
- Loss of credibility
- Reduced engagement
- Navigation frustration
SEO Impact
- Lower search rankings
- Reduced crawl efficiency
- Negative site quality signals
- Lost link equity
HTTP Response Code Classification
Response Code Categories
HTTP response codes are grouped into series that indicate different types of responses from the server.
✅ Valid Links (Response < 400)
1xx Series - Informational
The request was received and processing continues; this is an interim response, not a final result
Example: 100 Continue, 101 Switching Protocols
2xx Series - Success
Request was successful
Example: 200 OK, 201 Created, 204 No Content
3xx Series - Redirection
The server redirects the request from one URL to another
Example: 301 Moved Permanently, 302 Found
❌ Invalid Links (Response ≥ 400)
4xx Series - Client Errors
The request itself is at fault (for example, a wrong URL or missing permissions)
Example: 404 Not Found, 403 Forbidden, 400 Bad Request
5xx Series - Server Errors
The server failed to fulfil an apparently valid request
Example: 500 Internal Server Error, 502 Bad Gateway
📝 Quick Rule
• Response code < 400 = Valid link
• Response code ≥ 400 = Invalid/Broken link
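The quick rule can be sketched as a small helper. This is a minimal illustration, not code from the tutorial: the class name `LinkStatus` and its methods are hypothetical. It also adds one assumption beyond the rule above: `HttpURLConnection.getResponseCode()` returns -1 when the response is not valid HTTP, so the sketch treats non-positive codes as broken rather than letting them pass the `< 400` test.

```java
/** Hypothetical helper illustrating the "< 400 is valid" rule. */
class LinkStatus {

    /** True when the code is a real HTTP status below 400. */
    static boolean isValid(int responseCode) {
        // Added guard (assumption): -1 means no usable HTTP response was received.
        return responseCode > 0 && responseCode < 400;
    }

    /** Maps a status code to the name of its series. */
    static String series(int responseCode) {
        switch (responseCode / 100) {
            case 1: return "Informational";
            case 2: return "Success";
            case 3: return "Redirection";
            case 4: return "Client Error";
            case 5: return "Server Error";
            default: return "Unknown";
        }
    }

    public static void main(String[] args) {
        int[] samples = {200, 301, 404, 500};
        for (int code : samples) {
            String verdict = isValid(code) ? "valid" : "broken";
            System.out.println(code + " " + series(code) + " -> " + verdict);
        }
    }
}
```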
Single Link Validation
6-Step Process for Link Validation
Learn the systematic approach to validate individual links using Java's URL and HttpURLConnection classes.
Step 1: Create URL Object
URL url = new URL("https://www.facebook.com");
Step 2: Open Connection
URLConnection urlCon = url.openConnection();
Step 3: Type Cast to HttpURLConnection
HttpURLConnection httpCon = (HttpURLConnection)urlCon;
Step 4: Connect to URL
httpCon.connect();
Step 5: Get Response Code
int responseCode = httpCon.getResponseCode();
Step 6: Validate Link
if (responseCode < 400) {
    System.out.println("Link is valid");
} else {
    System.out.println("Link is invalid");
}
Complete Single Link Example
package Tutorial18;

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

public class Demo1 {

    public static void main(String[] args) throws IOException {

        // Step 1: Create URL object
        URL url = new URL("http://www.deadlinkcity.com/error-page.asp?e=404");

        // Step 2: Open connection
        URLConnection urlCon = url.openConnection();

        // Step 3: Type cast to HttpURLConnection
        HttpURLConnection httpCon = (HttpURLConnection) urlCon;

        // Step 4: Connect
        httpCon.connect();

        // Step 5: Get response code
        int responseCode = httpCon.getResponseCode();

        // Step 6: Validate link
        if (responseCode < 400) {
            System.out.println("Link is valid");
        } else {
            System.out.println("Link is invalid");
        }
    }
}
Knowledge Check
What HTTP response code series indicates valid links?
Key Points Summary
🎯 Link Validation Essentials
- Response codes < 400 = Valid links
- Response codes ≥ 400 = Broken links
- Validate only HTTP/HTTPS links
- Handle null and empty href attributes
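The last two points can be sketched as a filter that runs before any connection is opened. This is a minimal illustration under stated assumptions: the class `LinkFilter` and its method names are hypothetical, and the input is assumed to be a list of raw `href` strings already collected from the page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical pre-filter: keeps only hrefs worth sending an HTTP request to. */
class LinkFilter {

    /** True for non-null, non-blank hrefs that use the http or https scheme. */
    static boolean isCheckable(String href) {
        if (href == null || href.trim().isEmpty()) {
            return false; // missing or empty href attribute
        }
        String lower = href.trim().toLowerCase(Locale.ROOT);
        // Skips mailto:, tel:, javascript: and other non-HTTP schemes.
        return lower.startsWith("http://") || lower.startsWith("https://");
    }

    /** Returns the checkable hrefs in their original order. */
    static List<String> checkable(List<String> hrefs) {
        List<String> result = new ArrayList<>();
        for (String href : hrefs) {
            if (isCheckable(href)) {
                result.add(href.trim());
            }
        }
        return result;
    }
}
```

Note that this sketch also drops relative hrefs such as `/about`; those need to be resolved against the page URL before filtering.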
⚡ Implementation Tips
- Use try-catch for robust error handling
- Set appropriate connection timeouts
- Filter links before validation
- Implement proper logging mechanisms
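The first two tips might look like this when applied to the single-link check. It is a sketch rather than a definitive implementation: the class `SafeLinkCheck`, the method `fetchStatus`, and the convention of returning -1 on failure are assumptions made for illustration, and `System.err` stands in for a real logging framework.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

/** Hypothetical wrapper around the six-step check with timeouts and error handling. */
class SafeLinkCheck {

    /** Returns the HTTP status code, or -1 if the link could not be checked at all. */
    static int fetchStatus(String link, int timeoutMillis) {
        HttpURLConnection con = null;
        try {
            con = (HttpURLConnection) new URL(link).openConnection();
            con.setConnectTimeout(timeoutMillis); // limit time spent establishing the connection
            con.setReadTimeout(timeoutMillis);    // limit time spent waiting for the response
            return con.getResponseCode();
        } catch (IOException | ClassCastException e) {
            // IOException covers malformed URLs, unknown hosts and timeouts;
            // ClassCastException covers URLs that are not HTTP/HTTPS.
            System.err.println("Could not check " + link + ": " + e);
            return -1;
        } finally {
            if (con != null) {
                con.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        int code = fetchStatus("http://www.deadlinkcity.com/error-page.asp?e=404", 5000);
        boolean valid = code > 0 && code < 400;
        System.out.println(code + (valid ? " valid" : " broken"));
    }
}
```

Returning -1 keeps one failing link from aborting a whole run; the caller then reports it as broken instead of letting it pass a bare `< 400` comparison.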
📈 Scalability Considerations
- Use thread pools for parallel processing
- Implement rate limiting for large sites
- Cache results to avoid duplicate checks
- Consider using headless browsers
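A thread pool combined with de-duplication could be sketched as follows. Everything named here is hypothetical: `ParallelLinkChecker`, `checkAll`, and the `fetchStatus` parameter, which stands for whatever function performs the actual HTTP check and returns a status code. Rate limiting is not shown.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToIntFunction;

/** Hypothetical parallel checker: each distinct link is checked once on a fixed pool. */
class ParallelLinkChecker {

    static Map<String, Integer> checkAll(Collection<String> links,
                                         ToIntFunction<String> fetchStatus,
                                         int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Map<String, Future<Integer>> pending = new LinkedHashMap<>();
        Map<String, Integer> results = new LinkedHashMap<>();
        try {
            // The LinkedHashSet drops duplicates so no link is requested twice.
            for (String link : new LinkedHashSet<>(links)) {
                Callable<Integer> task = () -> fetchStatus.applyAsInt(link);
                pending.put(link, pool.submit(task));
            }
            for (Map.Entry<String, Future<Integer>> entry : pending.entrySet()) {
                try {
                    results.put(entry.getKey(), entry.getValue().get());
                } catch (ExecutionException e) {
                    results.put(entry.getKey(), -1); // the check itself threw
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    results.put(entry.getKey(), -1);
                }
            }
        } finally {
            pool.shutdown();
        }
        return results;
    }
}
```

Passing the check in as a function keeps the threading logic separate from the network code, so it can be exercised with a stub such as `link -> 200` before pointing it at a real site.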
🚨 Common Pitfalls
- Not handling relative URLs properly
- Ignoring JavaScript-generated links
- Missing timeout configurations
- Not considering authentication requirements
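For the first pitfall, one possible approach is to resolve each href against the URL of the page it was found on, using `java.net.URI.resolve`. The class `RelativeLinks`, the method `toAbsolute`, and the example.com addresses are illustrative assumptions.

```java
import java.net.URI;

/** Hypothetical helper that turns a relative href into an absolute URL. */
class RelativeLinks {

    /**
     * Resolves href against the page URL. Absolute hrefs are returned unchanged.
     * URI.create throws IllegalArgumentException for hrefs that are not valid URIs
     * (for example, ones containing spaces), so callers may want to catch that.
     */
    static String toAbsolute(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href.trim()).toString();
    }

    public static void main(String[] args) {
        String page = "https://example.com/docs/index.html";
        System.out.println(toAbsolute(page, "guide.html"));    // https://example.com/docs/guide.html
        System.out.println(toAbsolute(page, "/contact"));      // https://example.com/contact
        System.out.println(toAbsolute(page, "../about.html")); // https://example.com/about.html
    }
}
```

The page URL should include a path (at least a trailing `/`), since resolving against a base with an empty path can produce unexpected results.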