Reading from a URL

How different are Web pages from files? There are two obvious differences:

The Web page contains HTML, not just simple text.
The Web page is probably not on our machine.

The fact that the text in a Web page is HTML is only an issue if we want to display it nicely, like a Web browser does. HTML is, in fact, just text but with a funny syntax, in the same way that a Java file is just text with a funny syntax.

What we care about is the second point: the Web page is probably on a different machine. Because of this, when we read a Web page in Java we can't use a FileReader as above. Fortunately, Java makes it easy for us to read Web pages with a URL object. A URL object can be constructed using new URL(urlString) or new URL(currentURL, urlString), where currentURL is the URL of the current Web page (this second form is used when urlString may be a relative URL).
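To make the two constructors concrete, here is a small sketch of how a link found on a page might be resolved against the URL of that page. The class name UrlConstruction, the helper resolve, and the example.com addresses are made up for illustration and are not part of the course code. Recent Java releases mark these constructors as deprecated, but they still behave as described here.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UrlConstruction {
    // Hypothetical helper: resolve a link found on a page against the
    // URL of that page. If the link is already absolute, it is used as is.
    static URL resolve(String currentPage, String link) throws MalformedURLException {
        URL currentURL = new URL(currentPage);   // absolute URL from a string
        return new URL(currentURL, link);        // link may be relative or absolute
    }

    public static void main(String[] args) throws MalformedURLException {
        // Placeholder addresses, chosen only for the example
        String page = "http://www.example.com/courses/index.html";

        // Relative link: resolved against the directory of the current page
        System.out.println(resolve(page, "notes/week1.html"));
        // prints http://www.example.com/courses/notes/week1.html

        // Absolute link: the current page is ignored
        System.out.println(resolve(page, "http://www.example.org/other.html"));
        // prints http://www.example.org/other.html
    }
}
```

Both constructors throw MalformedURLException when the string cannot be parsed as a URL, so the calling code has to catch it or declare it.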

We need to construct the stream reader differently, but after we do that the code to read from the stream is identical. Here it is:

// Note the different way of constructing a stream to read a URL
// over the network
pageReader = new BufferedReader(new InputStreamReader(url.openStream()));

// Read the first line
nextLine = pageReader.readLine();

// Loop until all the lines are read
while (nextLine != null) {
   // Code to process next line omitted

   nextLine = pageReader.readLine();
}
// Close the stream
pageReader.close();
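
The fragment above leaves out the declarations and the exception handling. A self-contained version might look like the following sketch. The class name PageLines and the method readLines are invented for this example, and collecting the lines into a list is just one way to stand in for the omitted processing code. It also uses try-with-resources (Java 7 and later) in place of the explicit close() call, so the stream is closed even if reading fails.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class PageLines {
    // Hypothetical helper: read every line available from the given URL
    // and return the lines in order.
    static List<String> readLines(URL url) throws IOException {
        List<String> lines = new ArrayList<>();

        // Same construction as above; try-with-resources closes the reader
        try (BufferedReader pageReader =
                 new BufferedReader(new InputStreamReader(url.openStream()))) {
            // Read the first line
            String nextLine = pageReader.readLine();

            // Loop until all the lines are read
            while (nextLine != null) {
                lines.add(nextLine);   // stand-in for "process next line"
                nextLine = pageReader.readLine();
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // The address to read is taken from the command line
        URL url = new URL(args[0]);
        for (String line : readLines(url)) {
            System.out.println(line);
        }
    }
}
```

Because url.openStream() works for any URL Java knows how to open, the same method can be tried on a local file by passing a file: URL, which is convenient for testing without a network connection.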

HTMLLinkFinder w/ BufferedReader shows a complete example reading a Web page in this way.
