
Update to Wicked Cool Shell Script #62: define.sh

This is regarding script #62 (define a word): Looks like WordNet has changed their online version again, and I tried the following replacement for the url= http://wordnet.princeton.edu/perl/webwn?s= But the script doesn't return anything; it just goes back to the prompt. I tried the URL with a word in lynx and I got the source. Could you push me in a general direction?

While most of the content of my popular book Wicked Cool Shell Scripts has weathered the passage of time well, the scripts that scrape specific content off Web sites have had a harder time with the inevitable redesigns, restructuring and general changes. Scraping content is fraught with risk anyway, because you're dependent on the site's current information architecture, which can change without warning.

Nonetheless, let's dig into this. First off, if you'd like to follow along and don't have my book (Shocking! Hey, just buy a copy at Amazon, it's well worth it) you can view the script here: Script #62: define.sh.

The problem is that when you go to the given URL, you find out that: "WordNet 2.0 is no longer available." Fortunately the message goes on to explain that you can access the latest version of this nifty utility at http://wordnet.princeton.edu/perl/webwn, so let's go there and enter a search query with a standard Web browser like Firefox. I'll search for "harmonious" because that's just a word on my mind today. :-)

The resultant URL from the Princeton tool is rather scary long:

http://wordnet.princeton.edu/perl/webwn?s=harmonious&sub=Search+WordNet&o2=&o0=1&o7=&o5=&o1=1&o6=&o4=&o3=&h=

As with most of these, however, you can axe out any name=value pair where there's no value specified, which immediately trims it down to:

http://wordnet.princeton.edu/perl/webwn?s=harmonious&sub=Search+WordNet&o0=1&o1=1

A little more fiddling reveals that if we want the default behavior (and we do), the URL can be hacked down to:

http://wordnet.princeton.edu/perl/webwn?s=harmonious
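The pair-trimming step above can be sketched as a small shell function. This is only an illustration, not part of the original define.sh: the name strip_empty_params is made up here, and it assumes the URL contains a single ? followed by &-separated name=value pairs.

```shell
#!/bin/sh
# Sketch only: strip_empty_params is a hypothetical helper, not from define.sh.
# Assumes the URL has exactly one '?' followed by '&'-separated name=value pairs.

strip_empty_params() {
  sep_base=${1%%\?*}     # everything before the first '?'
  sep_query=${1#*\?}     # everything after the first '?'

  # Put each pair on its own line, keep only pairs with at least one
  # character after the '=', then glue the survivors back together with '&'.
  sep_kept=$(printf '%s\n' "$sep_query" | tr '&' '\n' | grep '=.' | paste -sd '&' -)

  printf '%s?%s\n' "$sep_base" "$sep_kept"
}

strip_empty_params 'http://wordnet.princeton.edu/perl/webwn?s=harmonious&sub=Search+WordNet&o2=&o0=1&o7=&o5=&o1=1&o6=&o4=&o3=&h='
```

Run as written, this should print the four-parameter form of the URL. Which of the surviving values are merely defaults still has to be found by experiment, which is how the s=-only form was reached.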
A result that is, well, more harmonious. :-)

Now we can at least get definitions again with the script, but as for parsing the result to display it attractively within the shell, well, I think I'd do it differently now. To understand the challenge, here's the WordNet definition of baroque:

[Screenshot: WordNet definition of baroque]

The goal is to display both of these definition groups, but omit the material above and below them (as I have neatly done with the screen shot). Here's the good news, gleaned by reading the source code: parts of speech are signified by <h3> headers, so part of the source of the above is <h3>Noun</h3>. We can search for that, and that gives us the beginning of the definition. The end turns out to be easy too. The line after the last definition line is:

<a href="http://wordnet.princeton.edu">WordNet home page</a>

so we can use that as the end marker and let "sed" do the dirty work of chopping out what we don't want to see. That's done, as readers would know, with something like:

sed -n "/<h3>/,/wordnet/p"
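To see the range trick in action without depending on the live site, here is a minimal sketch run against a made-up page. The function names sample_page and extract_definitions are hypothetical, and the HTML is invented to mimic the structure described above (<h3> headers, then the "WordNet home page" link); the real markup may well differ. The second sed pass, which drops the marker line and strips the tags, is one possible cleanup and not necessarily how the book's script does it.

```shell
#!/bin/sh
# Sketch only: sample_page and extract_definitions are hypothetical names,
# and the HTML below is invented to mimic the structure the article describes.

sample_page() {
  cat <<'EOF'
<html><body>
<form>Word to search for: <input name="s"></form>
<h3>Noun</h3>
<ul><li>S: (n) first sample definition</li></ul>
<h3>Adjective</h3>
<ul><li>S: (adj) second sample definition</li></ul>
<a href="http://wordnet.princeton.edu">WordNet home page</a>
</body></html>
EOF
}

extract_definitions() {
  # Pass 1: print only the lines from the first <h3> through the first
  #         line containing "wordnet" (the end marker).
  # Pass 2: drop the marker line, strip any remaining HTML tags, and
  #         delete lines that end up empty.
  sed -n '/<h3>/,/wordnet/p' |
    sed -e '/wordnet/d' -e 's/<[^>]*>//g' -e '/^[[:space:]]*$/d'
}

sample_page | extract_definitions
```

Against a real page, the same function could sit at the end of a pipeline such as lynx -source "$url" | extract_definitions, assuming the URL is still reachable. Note that a lowercase "wordnet" anywhere inside the definitions would end the range early, so a more specific end pattern may be safer.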
The rest, I'll leave as an exercise for enthused readers. :-)
just tried http://wordnet.princeton.edu/perl/webwn?s=harmonious and got a 404. Thought you'd like to know.
Posted by: Steve O at July 31, 2009 1:32 PM

wordnet.princeton is now wordnetweb.princeton, so the url is now http://wordnetweb.princeton.edu/perl/webwn?s=harmonious if anyone is interested.
Posted by: Oval at October 27, 2009 6:57 AM