A reader writes to me: “I work with HumaniNet, an organization that assists humanitarian field teams with their communications needs, mostly over satellite. Bandwidth is typically very expensive, in the dollars per minute range. Am I correct to assume that Lynx doesn’t even request images or other forms of ‘rich content’ like Flash, and would therefore save dramatically on bandwidth costs? In other words, Lynx doesn’t download anything beyond (x)html, right?”
First off, let me compliment your organization on fulfilling an important and most likely unappreciated role in the world of humanitarian efforts. With the advent of modern high-speed communications, news about humanitarian crises travels quite fast, and being able to give those people who are on the ground the ability to communicate is a wonderful thing!
In terms of your question, you are absolutely correct.
If you think of Lynx as the lowest-common-denominator text-only Web browser, you’d definitely be on the right track. It just gets the base page from the URL you’ve specified and stops. That isn’t to say that you can’t use it to download images, Flash content, and so on, and save them to disk, but by default it just lists all the links, lets you navigate to them (with TAB or the arrow keys), and requests one only when you press ENTER or RETURN.
From a command-line perspective, you can see how this works with the -dump flag. When I type in lynx -dump http://www.askdavetaylor.com/ I get all the text from the home page in neat paragraphs, followed by a numbered list like this:
References
1. http://www.askdavetaylor.com/index.rdf
2. http://www.askdavetaylor.com/
3. http://www.askdavetaylor.com/cat_unix_linux_basics.html
4. http://www.sysadminmag.com/
On and on, for almost 400 entries. And that’s without the graphics, which I can add to the list with the command-line flag -image_links. Used with an interactive invocation of the program, that flag offers the chance to download any image link with a single keystroke.
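If you only want that numbered list as plain URLs, say to count the links or hand them to another tool, a small filter over the -dump output will do it. Here's a minimal sketch. The function name list_dump_refs is my own invention for illustration, and it assumes the dump ends with a "References" heading followed by numbered lines like the ones shown above.

```shell
# list_dump_refs (hypothetical helper): read `lynx -dump` output on stdin
# and print only the URLs from the numbered "References" section.
list_dump_refs() {
  awk '
    /^References/ { in_refs = 1; next }          # start of the link list
    in_refs && $1 ~ /^[0-9]+\.$/ { print $2 }    # "  12. http://..." -> URL
  '
}
```

You'd use it as lynx -dump http://www.askdavetaylor.com/ | list_dump_refs | wc -l to count the entries without scrolling through them.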
To extract a list of all image references you’d want to do a little bit of scripting. Since I can’t resist, here’s a top-of-the-head command-line solution that would extract all the images referenced on a Web page through Lynx:
$ lynx -source http://www.intuitive.com/ | tr '<' '\n' | grep -i 'src="' | tr ' ' '\n' | grep -i src= | sed 's/src="//g;s/"//g' | sort | uniq
Rather ugly because of the hassle of translating a character into a newline on the command line, but when run, here's its useful output:
/Graphics/header_right2.jpg
/images/bio.jpg
/images/consulting.jpg
/images/contact.jpg
/images/header_left.jpg
/images/left_bottom.jpg
/images/left_space1.jpg
/images/left_space2.jpg
/images/left_space3.jpg
/images/left_space4.jpg
/images/left_space5.jpg
/images/left_space6.jpg
/images/left_space7.jpg
/images/left_top.jpg
/images/middle_bottom.jpg
/images/photography.jpg
/images/right_bottom.jpg
/images/right_top.jpg
/images/speaking.jpg
/images/teaching.jpg
/images/top_spacer.jpg
/images/weblog.jpg
/images/writing.jpg
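If you find yourself running that pipeline more than once, it's easier to live with as a shell function. What follows is only a sketch under a few assumptions: the name extract_img_srcs is made up for this example, it looks only at img tags (the one-liner above picks up any tag with a src attribute), and it expects double-quoted src values with no spaces in them.

```shell
# extract_img_srcs (hypothetical helper): read HTML on stdin and print the
# unique src values of <img> tags, one per line.
extract_img_srcs() {
  tr '<' '\n' |                          # put each tag on its own line
    grep -i '^img[[:space:]]' |          # keep only img tags
    tr ' ' '\n' |                        # put each attribute on its own line
    grep -i '^src="' |                   # keep only the src attributes
    sed 's/^[Ss][Rr][Cc]="//;s/".*$//' | # strip src=" and everything from the closing quote on
    sort -u                              # sort and drop duplicates
}
```

With that defined, lynx -source http://www.intuitive.com/ | extract_img_srcs should give a list much like the one above.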
Anyway, that's beyond what you're asking. The long and short of it is, yes, Lynx skips all referenced graphics and other media elements. 🙂