Finding Links
Demo. Link finder:
We want to write a program that finds and extracts all of the links in an HTML file. See FindLinks.
To do this, we need to know how a link is defined in an HTML file:
<a href="the URL">link </a>
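The URL sits inside the opening tag, as the value of its href attribute. So we need to find each opening "<a ...>" tag, which runs from "<a " to the next ">"; the closing "</a>" only marks the end of the link text.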
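Convert the string that is the HTML file to lowercase (used only for searching, so that "<A" and "<a" both match).
String links = "";
Find the first position of "<a "
While there is a link remaining (i.e., position is not -1)
    Find the first ">" after that position
    Append the text from "<a " through ">" to links
    Find the next position of "<a ", starting from that ">"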
// Extract all the links from a web page
private String findLinks( String fullpage ) {
    int tagPos,   // Start of <A tag specification
        tagEnd;   // Position of first ">" after tag start

    // A lower case only version of the page for searching
    String lowerpage = fullpage.toLowerCase();

    // Text of one A tag
    String tag;

    // The A tags found so far
    String links = "";

    // Paste stuff on end of page to ensure searches succeed
    fullpage = fullpage + " >";

    tagPos = lowerpage.indexOf("<a ",0);
    while (tagPos >= 0 ) {
        tagEnd = fullpage.indexOf(">",tagPos+1);
        tag = fullpage.substring(tagPos, tagEnd+1);
        links = links + tag + "\n";
        tagPos = lowerpage.indexOf("<a ", tagEnd);
    }

    return links;
}
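findLinks returns each whole opening tag. If only the URL is wanted, the same indexOf/substring technique can be applied once more inside each tag. The following is a minimal sketch, not part of FindLinks: the class name HrefExtractor and the methods extractHref and findUrls are hypothetical names chosen for illustration. It assumes that href values are enclosed in double quotes and that lowercasing does not shift character positions (true for ASCII pages).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not part of the original FindLinks program.
class HrefExtractor {

    // Return the double-quoted href value inside one <a ...> tag,
    // or null if the tag has no such attribute.
    static String extractHref(String tag) {
        // Search a lowercase copy so that HREF and href both match.
        String lower = tag.toLowerCase();
        int attr = lower.indexOf("href=\"");
        if (attr < 0) {
            return null;
        }
        int start = attr + "href=\"".length();
        // Take the value from the original tag to preserve the URL's case.
        int end = tag.indexOf('"', start);
        if (end < 0) {
            return null;
        }
        return tag.substring(start, end);
    }

    // Scan a page for <a ...> tags and collect their href values in order.
    static List<String> findUrls(String page) {
        List<String> urls = new ArrayList<>();
        String lower = page.toLowerCase();
        int tagPos = lower.indexOf("<a ");
        while (tagPos >= 0) {
            int tagEnd = page.indexOf(">", tagPos + 1);
            if (tagEnd < 0) {
                break; // unterminated tag: stop scanning
            }
            String url = extractHref(page.substring(tagPos, tagEnd + 1));
            if (url != null) {
                urls.add(url);
            }
            tagPos = lower.indexOf("<a ", tagEnd);
        }
        return urls;
    }
}
```

Instead of appending " >" to the page as findLinks does, this sketch stops when no closing ">" is found; either approach keeps substring from being called with a negative index.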