

Update to Wicked Cool Shell Script #62: define.sh

This is regarding script #62 (define a word): Looks like WordNet has changed their online version again and I tried the following replacement for the url=

http://wordnet.princeton.edu/perl/webwn?s=

But the script returns nothing and just drops back to the prompt. I tried the URL with a word in lynx and I got the source. Could you push me in a general direction?


Dave's Answer:

While most of the content of my popular book Wicked Cool Shell Scripts has weathered the passage of time well, the scripts that scrape specific content off Web sites have had a harder time with the inevitable redesigns, restructuring and general changes. In general, scraping content is fraught with risk anyway because you're very dependent on the current information architecture which can change without warning.

Nonetheless, let's dig into this. First off, if you'd like to follow along and don't have my book (Shocking! Hey, just buy a copy at Amazon, it's well worth it) you can view the script here: Script #62: define.sh.

The problem is that when you go to the given URL, you find out that: "WordNet 2.0 is no longer available." Fortunately the message goes on to explain that you can access the latest version of this nifty utility at http://wordnet.princeton.edu/perl/webwn, so let's go there and enter a search query with a standard Web browser like Firefox. I'll search for "harmonious" because that's just a word on my mind today. :-)

The resultant URL from the Princeton tool is rather scary long:

http://wordnet.princeton.edu/perl/webwn?s=harmonious&sub=Search+WordNet&o2=&o0=1&o7=&o5=&o1=1&o6=&o4=&o3=&h=

As with most of these, however, you can axe out any name=value pair where there's no value specified, which immediately trims it down to:

http://wordnet.princeton.edu/perl/webwn?s=harmonious&sub=Search+WordNet&o0=1&o1=1
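If you'd rather not trim those pairs by hand, here's a minimal sketch of the same idea in the shell. The function name strip_empty_params is my own invention for illustration (it isn't in the book's script), and it assumes the URL contains exactly one "?" followed by name=value pairs joined with "&".

```shell
#!/bin/sh
# strip_empty_params: hypothetical helper, not part of the original define.sh.
# Drops every name=value pair whose value is empty from a URL's query string.
# Assumes the URL has a single "?" followed by "&"-separated pairs.
strip_empty_params() {
  url=$1
  base=${url%%\?*}      # everything before the "?"
  query=${url#*\?}      # everything after the "?"
  # Put one pair per line, discard pairs that end in "=", rejoin with "&".
  kept=$(printf '%s\n' "$query" | tr '&' '\n' | grep -v '=$' | paste -s -d '&' -)
  printf '%s?%s\n' "$base" "$kept"
}

strip_empty_params 'http://wordnet.princeton.edu/perl/webwn?s=harmonious&sub=Search+WordNet&o2=&o0=1&o7=&o5=&o1=1&o6=&o4=&o3=&h='
```

Run against the long URL above, this should print the same trimmed URL shown just before it. It only removes pairs with empty values; deciding which of the remaining pairs are optional still takes some fiddling by hand.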

A little more fiddling reveals that if we want the default behavior (and we do), the URL can be hacked down to:

http://wordnet.princeton.edu/perl/webwn?s=harmonious

A result that is, well, more harmonious. :-)
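To plug the shortened URL into a script, something along these lines might do. Treat it as a rough sketch: WORDNET_BASE, wordnet_url and fetch_definition_page are names I've made up here, the word is assumed to be a single token that needs no URL encoding, and the fetch step assumes lynx is installed and the Princeton address still answers.

```shell
#!/bin/sh
# Hypothetical names for illustration: WORDNET_BASE, wordnet_url,
# fetch_definition_page. None of them come from the original define.sh.

# Base address of the lookup tool; override it in the environment if the
# service has moved.
WORDNET_BASE=${WORDNET_BASE:-http://wordnet.princeton.edu/perl/webwn}

# Build the lookup URL for one word (assumed to need no URL encoding).
wordnet_url() {
  printf '%s?s=%s\n' "$WORDNET_BASE" "$1"
}

# Fetch the raw HTML for a word. Needs network access and lynx, so it is
# defined here but not called.
fetch_definition_page() {
  lynx -source "$(wordnet_url "$1")"
}

wordnet_url harmonious
```

Keeping the base address in a variable means a future move of the service only needs a one-line change, or no change at all if you set WORDNET_BASE before running the script.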

Now we can at least get definitions again with the script, but as for parsing the result to display it attractively within the shell, well, I think I'd do it differently now. To understand the challenge, here's the WordNet definition of baroque:

[Screenshot: Princeton University's WordNet service, definition of 'baroque']

The goal is to display both of these definition groups, but omit the material above and below them (as I have neatly done with the screenshot).

Here's the good news, however, gleaned by reading the source code: parts of speech are signified by <h3> headers, so part of the source of the above is <h3>Noun</h3>. We can search for that, and that gives us the beginning of the definition.

The end turns out to be easy too: the line after the last definition line is:

<a href="http://wordnet.princeton.edu">WordNet home page</a>

so we can use that as the end marker too and let "sed" do the dirty work of chopping out what we don't want to see.

That's done, as readers would know, with something like:

sed -n "/<h3>/,/wordnet/p"
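For the curious, here's one way the rest could go, offered as a sketch rather than the finished script. The sample page below is a made-up stand-in for the real WordNet HTML (the gloss text is placeholder wording, and the real page has far more markup), and extract_definitions and sample_page are hypothetical names. It applies the sed range above, drops the end-marker line, then strips the remaining tags and blank lines.

```shell
#!/bin/sh
# extract_definitions: hypothetical helper that reads HTML on stdin.
extract_definitions() {
  # 1. Keep lines from the first <h3> through the first later line that
  #    contains "wordnet" (lowercase, so it matches the href in the marker).
  # 2. Drop the end-marker line itself.
  # 3. Strip the remaining HTML tags and delete lines left empty.
  sed -n '/<h3>/,/wordnet/p' |
    grep -v 'WordNet home page' |
    sed -e 's/<[^>]*>//g' -e '/^[[:space:]]*$/d'
}

# sample_page: made-up stand-in for the real page, for demonstration only.
sample_page() {
  cat <<'EOF'
<html><body>
<form>Word to search for: <input name="s"></form>
<h3>Noun</h3>
<ul>
<li>S: (n) baroque (placeholder gloss for the noun sense)</li>
</ul>
<h3>Adjective</h3>
<ul>
<li>S: (adj) baroque (placeholder gloss for the adjective sense)</li>
</ul>
<a href="http://wordnet.princeton.edu">WordNet home page</a>
</body></html>
EOF
}

sample_page | extract_definitions
```

On the sample input this leaves four lines: each part of speech followed by its entry. One caveat: the end pattern is case sensitive and stops at the first lowercase "wordnet" after the opening header, so a link with that text inside a definition on the live page would cut the output short.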

The rest, I'll leave as an exercise for enthused readers. :-)









Comments

just tried http://wordnet.princeton.edu/perl/webwn?s=harmonious and got a 404. Thought you'd like to know.

Posted by: Steve O at July 31, 2009 1:32 PM

wordnet.princeton is now wordnetweb.princeton, so the url is now http://wordnetweb.princeton.edu/perl/webwn?s=harmonious if anyone is interested.

Posted by: Oval at October 27, 2009 6:57 AM


© 2002 - 2012 by Dave Taylor. All Rights Reserved.

Note: This web site is for the purpose of disseminating information for educational purposes, free of charge, for the benefit of all visitors. We take great care to provide quality information. However, we do not guarantee, and accept no legal liability whatsoever arising from or connected to, the accuracy, reliability, currency or completeness of any material contained on this web site or on any linked site.

"Ask Dave Taylor®" is a registered trademark of Intuitive Systems, LLC.