Reading from a URL
How different are Web pages from files? There are two obvious differences: the text of a Web page is HTML, and the page is usually stored on a different machine.
The fact that the text in a Web page is HTML is only an issue if we want to display it nicely, like a Web browser does. HTML is, in fact, just text but with a funny syntax, in the same way that a Java file is just text with a funny syntax.
What we care about is the second point: the Web page is probably on a different machine. Because of this, when we read a Web page in Java we can't use a FileReader as above. Fortunately, Java makes it easy for us to read Web pages with a URL object. A URL object can be constructed using new URL(urlString) or new URL(currentURL, urlString), where currentURL is the URL of the current Web page (the second form is used when urlString may be a relative URL).
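To make the two constructors concrete, here is a minimal sketch of how a link found on a page might be resolved against that page's own URL. The class name RelativeUrlDemo, the helper resolve, and the example.com addresses are made up for illustration; only the two URL constructors come from the text above.

```java
import java.net.MalformedURLException;
import java.net.URL;

class RelativeUrlDemo {
    // Hypothetical helper: resolve a link against the URL of the page it was found on.
    static URL resolve(String currentPage, String urlString) throws MalformedURLException {
        // One-argument form: urlString must be a complete (absolute) URL.
        URL currentURL = new URL(currentPage);
        // Two-argument form: urlString may be relative to currentURL.
        return new URL(currentURL, urlString);
    }

    public static void main(String[] args) throws MalformedURLException {
        String page = "http://www.example.com/course/index.html";
        // Relative to the directory of the current page.
        System.out.println(resolve(page, "notes/week1.html"));
        // Relative to the root of the current host.
        System.out.println(resolve(page, "/about.html"));
        // Already absolute, so the current page is ignored.
        System.out.println(resolve(page, "http://www.example.org/"));
    }
}
```

Run as written, this should print http://www.example.com/course/notes/week1.html, http://www.example.com/about.html, and http://www.example.org/. Note that on recent JDKs (20 and later) these constructors still work but compile with a deprecation warning.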
We need to construct the stream reader differently, but after we do that the code to read from the stream is identical. Here it is:
    // Note the different way of constructing a stream to read a URL
    // over the network
    pageReader = new BufferedReader(new InputStreamReader(url.openStream()));

    // Read the first line
    nextLine = pageReader.readLine();

    // Loop until all the lines are read
    while (nextLine != null) {
        // Code to process next line omitted
        nextLine = pageReader.readLine();
    }

    // Close the stream
    pageReader.close();
HTMLLinkFinder w/ BufferedReader shows a complete example reading a Web page in this way.
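For readers who want something to compile right away, the fragment above can be wrapped into a small self-contained class, sketched below. The names PageLines and readLines are invented for this sketch and are not part of HTMLLinkFinder. It also swaps the explicit close() call for a try-with-resources block, which closes the reader even if reading fails partway through. The default address in main is only a placeholder.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

class PageLines {
    // Hypothetical helper: return every line of text available at the given URL.
    static List<String> readLines(URL url) throws IOException {
        List<String> lines = new ArrayList<>();
        // Same stream construction as in the fragment; try-with-resources
        // closes pageReader automatically when the block exits.
        try (BufferedReader pageReader =
                 new BufferedReader(new InputStreamReader(url.openStream()))) {
            // Read the first line, then loop until readLine() returns null.
            String nextLine = pageReader.readLine();
            while (nextLine != null) {
                lines.add(nextLine);
                nextLine = pageReader.readLine();
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder address; pass a real URL as the first argument.
        URL url = new URL(args.length > 0 ? args[0] : "http://www.example.com/");
        for (String line : readLines(url)) {
            System.out.println(line);
        }
    }
}
```

Because openStream() also works for file: URLs, the same method can be tried without a network connection by passing something like file:///tmp/page.html.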