Ask Dave Taylor

Can I track an RSS feed with a shell script?

November 15, 2004 / Dave Taylor / Linux Shell Script Programming, Wordpress Help / 6 Comments

More than once, readers have written to me, asking if it was possible to track an RSS feed from a Weblog or news site with a shell script. Sounds kinda wacky, but in fact, it’s a very good use of a shell script, as the following rather extensive entry — including source code! — demonstrates. If you’re a bit confused by the following, you might want to consider picking up a copy of my best-selling Wicked Cool Shell Scripts.

The following article originally appeared at MacDevCenter and is reprinted with permission.

Tapping RSS with Shell Scripts

If you’re like me, you want to keep up with the latest news and information.
Shell scripts help me do just that. In this article I’ll show you how
I wrote a shell script that watches the news at Slashdot.org
and automatically shows me the latest story headlines every time I launch
a Terminal application.

First Things First

Before any shell script work begins, the first step is to figure out
the URL of the RSS page on Slashdot.

TIP: RSS is Really Simple Syndication,
an XML-format data stream that’s much more easily parsed
and tracked than HTML pages, at least programmatically.

The Slashdot home page doesn’t make it particularly easy to find, but
the very bottom line, the very rightmost link, is "rss", and
the URL behind that link is http://slashdot.org/index.rss.

To look at it from within the Terminal, I’m going to utilize the powerful
curl application, piping the output to head to ensure that I’m not drowned
in output:

$ curl --silent 'http://slashdot.org/index.rss' | head
<?xml version="1.0" encoding="ISO-8859-1"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns="http://purl.org/rss/1.0/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/"
xmlns:admin="http://webns.net/mvcb/"
xmlns:syn="http://purl.org/rss/1.0/modules/syndication/"

Yes, this looks fairly scary as output goes, I admit, but with a little
help from the grep utility, this can quickly become a lot more user-friendly.
In this case, let’s just pull out the lines that are tagged as either
the <title> or the <description>:

$ curl --silent 'http://slashdot.org/index.rss' | grep -E '(title>|description>)' | head
<title>Slashdot</title>
<description>News for nerds, stuff that matters</description>
<title>Slashdot</title>
<title>Yahoo To Charge For Search Listings</title>
<description>ibi writes "Yahoo will start taking payments
to "tilt the playing field" for companies that want their
listings given more prominence by Yahoo's search engine. ...</description>
<title>Infinium Labs Threatens HardOCP Again</title>
<description>XBox4Evr writes "In a follow-up from two weeks ago,
Infinium Labs is again threatening the tech web site HardOCP
with legal action. This in itself, is no big ...</description>
<title>SCO Postpones Lawsuit, Now Threatening Two</title>
<description>zzxc writes "In a surprise turn of events, SCO says
that they need more time to prepare an announcement of who
they are going to sue. According to SCO, the ...</description>
<title>Gyroscopic Wireless Mouse</title>

Not bad. In fact, that’s really almost all we need. So let’s turn this
into a shell script.
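
If you’d like to experiment with the grep filter without hitting the network, here is a minimal sketch that runs it against a tiny made-up feed. The sample_feed and filter_feed names, and the sample data itself, are my own assumptions for illustration; the real Slashdot feed contains many more elements.

```shell
#!/bin/sh
# sample_feed: print a tiny, hypothetical RSS fragment (made-up data,
# far smaller than the real Slashdot feed).
sample_feed() {
  cat <<'EOF'
<channel>
<title>Slashdot</title>
<link>http://slashdot.org/</link>
<description>News for nerds, stuff that matters</description>
</channel>
<item>
<title>Sample Headline</title>
<link>http://example.com/story</link>
<description>Sample description text ...</description>
</item>
EOF
}

# filter_feed: keep only lines that contain a title or description tag.
filter_feed() {
  grep -E '(title>|description>)'
}

sample_feed | filter_feed
```

Only the title and description lines survive; the link and container lines are dropped because they match neither alternative in the pattern.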

Headlines Only

To turn this command line into a shell script is a breeze: just open
up your favorite Terminal command-line editor. (I use vi, but I’ve been
trapped in Unix since 1980 so it’s already subverted my neural pathways.
You might prefer pico or even BBEdit or similar.) Whichever you choose,
type in the following, a standard shell script preamble:

#!/bin/sh

This tells the operating system that when this particular file is executed,
it should be given to the shell (sh) to be run. Then let’s create a
variable that contains the URL:

url="http://slashdot.org/index.rss"

Now we can reference $url and the entire script has become more portable
and easily modified. The next line is the entire command:

curl --silent "$url" | grep -E '(title>|description>)'

NOTE: If you get a “command not found” error with curl, you might need
to specify a full path. In Panther, the curl command can be found at
/usr/bin/curl in standard installations.
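
Rather than guessing, a script can check for curl before using it. The following is a minimal sketch, not part of the original script; the have_cmd helper name is hypothetical, while command -v is the standard shell way to ask where a command lives.

```shell
#!/bin/sh
# have_cmd NAME: succeed if NAME resolves to a command in the current PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd curl; then
  echo "curl found at: $(command -v curl)"
else
  echo "curl is not in PATH; try the full path, e.g. /usr/bin/curl" >&2
fi
```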

This script produces the output already seen, so let’s make two tweaks
to it so it’s more useful. First off, the first three lines of output,
the Slashdot title and description, never change, so we might as well
strip them out of the output. This can be done a variety of
ways, but I’m going to turn to the sed command, which has many hidden
powers. One of them is the ‘-n’ flag, which tells sed not to output
any of its input unless told to. The value of this? Then we can specify
a range of lines, here line 4 through the last line ($), and use the p
command to output only those lines.
Like this:

curl --silent "$url" | grep -E '(title>|description>)' | \
sed -n '4,$p'

Notice the trailing backslash here: rather than have our command pipe
stretch longer and longer, the backslash (which must be the very last
character on the line) lets me wrap the command to multiple lines and
make it generally more readable.
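
To see what the sed line range does on its own, here is a small sketch using five throwaway input lines. The skip_header name is hypothetical, chosen just for this illustration.

```shell
#!/bin/sh
# skip_header: print line 4 through the last line ($) of standard input.
# -n suppresses sed's automatic printing; p prints the addressed lines.
skip_header() {
  sed -n '4,$p'
}

printf '%s\n' one two three four five | skip_header
```

This prints only "four" and "five". If you find the sed form cryptic, tail -n +4 selects the same lines.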

We’re getting close to trying the script. The only other tweak worth
making is to strip out the <title>, </title>, <description>,
and </description> tags themselves. This too can be done with
sed, in a typically Unix-y fashion:

curl --silent "$url" | grep -E '(title>|description>)' | \
sed -n '4,$p' | \
sed -e 's/<title>//' -e 's/<\/title>//' -e 's/<description>/   /' \
-e 's/<\/description>//'

The XML tags are effectively stripped out, except the <description>
tag is replaced by three spaces, just for formatting. The result, assuming
you’ve saved this as slash-rss.sh, as I have:

$ sh slash-rss.sh | head -4
Yahoo To Charge For Search Listings
ibi writes "Yahoo will start taking payments to "tilt the
playing field" for companies that want their listings given more
prominence by Yahoo's search engine. ...
Infinium Labs Threatens HardOCP Again
XBox4Evr writes "In a follow up from two weeks ago, Infinium Labs
is again threatening the tech web site HardOCP with legal action. This in
itself, is no big ...

This shows the top two stories (4 lines = two titles + two descriptions).
Not bad. Not beautiful, but certainly functional for a first script.
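
Here is the tag-stripping step isolated as a sketch you can run on two sample lines. The strip_tags name and the sample text are assumptions for illustration; the sed expressions are the same ones used in the script.

```shell
#!/bin/sh
# strip_tags: remove the title tags, remove the closing description tag,
# and swap the opening description tag for three spaces of indentation.
strip_tags() {
  sed -e 's/<title>//' -e 's/<\/title>//' \
      -e 's/<description>/   /' -e 's/<\/description>//'
}

printf '%s\n' \
  '<title>Sample Headline</title>' \
  '<description>Sample description text ...</description>' | strip_tags
```

Each substitution replaces only the first match on a line, which is fine here because each tag appears at most once per line.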

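I always spend way too much time fine-tuning scripts to get just the
output I want, so let’s continue working on this to ensure that the
output is more readable, shall we? All it takes is one more command at
the end of the pipe, fmt, which rewraps long lines to fit the screen.
It’s so easy, you’ll be amazed: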

curl --silent "$url" | grep -E '(title>|description>)' | \
sed -n '4,$p' | \
sed -e 's/<title>//' -e 's/<\/title>//' -e 's/<description>/   /' \
-e 's/<\/description>//' | \
fmt

The results, piped through head again:

$ sh slash-rss.sh | head
Yahoo To Charge For Search Listings
ibi writes "Yahoo will start taking payments to "tilt the playing
field" for companies that want their listings given more prominence
by Yahoo's search engine. ...
Infinium Labs Threatens HardOCP Again
XBox4Evr writes "In a follow up from two weeks ago, Infinium
Labs is again threatening the tech web site HardOCP with legal
action. This in itself, is no big ...
SCO Postpones Lawsuit, Now Threatening Two
zzxc writes "In a surprise turn of events, SCO says that they

The problem now is that the head needs to be between the sed invocations
and the fmt command, since we have no way of knowing how many lines
each description is going to produce when fed through fmt. The solution
is to build the next generation of this script!

Headlines, As Many As You Want

The obvious solution is to add a command flag that lets you specify how
many headlines you want: multiply it by two and you’ll know what value
to feed head within the script. Because the flag is typed with a leading
dash (-1, -2 and so on), the doubled value keeps its dash and can be
handed straight to head. Here’s how that looks as part of a shell
script ($# is the number of arguments and $1 is the first argument):

#!/bin/sh
url="http://slashdot.org/index.rss"
if [ $# -eq 1 ] ; then
  headarg=$(( $1 * 2 ))  # arithmetic expansion: -1 becomes -2 (two lines per headline)
else
  headarg="-8"           # default is four headlines
fi
curl --silent "$url" | grep -E '(title>|description>)' | \
  sed -n '4,$p' | \
  sed -e 's/<title>//' -e 's/<\/title>//' -e 's/<description>/   /' \
      -e 's/<\/description>//' | \
  head $headarg | fmt

Now I can specify that I only want the top headline, the newest entry
on the Slashdot site, by simply specifying ‘-1’ when I invoke the script:

$ sh slash-rss.sh -1
Yahoo To Charge For Search Listings
ibi writes "Yahoo will start taking payments to "tilt the playing
field" for companies that want their listings given more prominence
by Yahoo's search engine. ...

That’s pretty cool, I think. I could tweak it forever, but let’s stop
here and see how to turn this into a Unix command just like ls and cd.

TIP: You can download
this shell script
in finished form.
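
The script above trusts whatever you type as the flag. If you’d like it to reject something like "abc", one possible approach is sketched below. This is not part of the original script: the headline_lines helper and its hl_count variable are hypothetical names, and it returns a positive count meant for head -n rather than the dashed form used above.

```shell
#!/bin/sh
# headline_lines FLAG: turn a flag such as -3 (or a plain 3) into the
# number of lines to keep, two per headline. Fails on anything else.
headline_lines() {
  hl_count=${1#-}                 # strip one leading dash, if present
  case "$hl_count" in
    ''|0*|*[!0-9]*) return 1 ;;   # empty, zero or leading zero, or a non-digit
  esac
  echo $(( $hl_count * 2 ))
}

# Default to four headlines when no flag is given.
if lines=$(headline_lines "${1:--4}"); then
  echo "would run: head -n $lines"
else
  echo "usage: $0 [-count]" >&2
fi
```

Leading zeros are rejected on purpose, since the shell would otherwise try to read a value like 08 as octal.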

Turning It Into a Command

There are two ways to turn a shell script into a command: create an alias
or make the script executable and ensure it’s in your PATH. If you’re
using Bash, an alias can be created like this:

alias slashdot="sh slash-rss.sh"

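Then you can see the headlines by just typing slashdot on your command
line. If the alias can’t find the script when you’re in another
directory, give the full path to slash-rss.sh in the alias.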

To make the shell script itself executable, first make sure you’ve saved
it in a directory that’s in your PATH by typing:

$ echo $PATH
/bin:/sbin:/usr/bin:/usr/sbin:/sw/bin:/usr/X11R6/bin:
/usr/local/bin:/Users/dt/bin:/sw/bin

You can see that my PATH includes /Users/dt/bin – that’s where I save
this script and similar. Once it’s in the right place, you’ll need to
make it executable by using the chmod command:

$ chmod +x slash-rss.sh

Optionally, you could rename the script to be a bit more friendly, of
course.
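
If you’d rather not eyeball the PATH output, a small test can do the checking. This is a sketch under my own assumptions: the in_path helper is a hypothetical name, and $HOME/bin is just an example directory.

```shell
#!/bin/sh
# in_path DIR [LIST]: succeed if DIR is an entry in the colon-separated
# LIST (defaults to the current $PATH).
in_path() {
  case ":${2:-$PATH}:" in
    *":$1:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

if in_path "$HOME/bin"; then
  echo "$HOME/bin is in PATH"
else
  echo "$HOME/bin is not in PATH"
fi
```

Wrapping the list in extra colons lets a single pattern match the first, last, or any middle entry without matching partial directory names.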

Finally, Having It Auto-Execute Upon Terminal Launch

If you’re running the Bash shell, which you probably are if you’re in
Panther, then it’s a breeze: move to your home directory and append
an invocation of the script to your .bash_login file:

$ cd
$ echo "sh slash-rss.sh -2" >> .bash_login

Make extra sure that you use two >>, not one, on that last command! A single > would overwrite your .bash_login instead of appending to it.
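
Running that echo command twice would add the line twice. One way to guard against that is sketched below; the append_once helper is a hypothetical name, and the demonstration uses a scratch file rather than your real .bash_login.

```shell
#!/bin/sh
# append_once LINE FILE: append LINE to FILE unless an identical line
# is already there. Creates FILE if it does not exist.
append_once() {
  grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Demonstration on a scratch file.
demo_file=$(mktemp)
append_once "sh slash-rss.sh -2" "$demo_file"
append_once "sh slash-rss.sh -2" "$demo_file"   # second call adds nothing
cat "$demo_file"
rm -f "$demo_file"
```

The grep flags ask for a quiet (-q), whole-line (-x), fixed-string (-F) match, so the line is appended only when no exact copy exists.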

Now the next time you start up a Terminal application window, you’ll
see:

Last login: Tue Mar  2 23:09:36 on ttyp3
Welcome to Darwin!
Yahoo To Charge For Search Listings
ibi writes "Yahoo will start taking payments to "tilt the playing
field" for companies that want their listings given more prominence
by Yahoo's search engine. ...
Infinium Labs Threatens HardOCP Again
XBox4Evr writes "In a follow up from two weeks ago, Infinium
Labs is again threatening the tech web site HardOCP with legal
action. This in itself, is no big ...
$ 

It’s also worth noting that this use of shell scripts to parse and format
XML has more applications. For example, go to http://www.casino-bookstore.com/ and have a close look at the "Latest Gambling News" box: it’s
using an almost identical script to keep track of the gambling news
XML feed from about.com. Another example? Go to http://www.healthy-bookstore.com/ and look at the medicinenet news feed. Again, it’s using curl and sed to turn the XML data into HTML data.
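
To give a flavor of the XML-to-HTML idea, here is a minimal sketch that turns title lines into an HTML list. It is an illustration under my own assumptions, not the script those sites use; the titles_to_html name and the sample titles are made up.

```shell
#!/bin/sh
# titles_to_html: read RSS lines on stdin and print the titles as an
# HTML unordered list, one list item per title line.
titles_to_html() {
  echo "<ul>"
  grep '<title>' | sed -e 's/<title>/  <li>/' -e 's/<\/title>/<\/li>/'
  echo "</ul>"
}

printf '%s\n' \
  '<title>First Sample Story</title>' \
  '<description>Ignored here ...</description>' \
  '<title>Second Sample Story</title>' | titles_to_html
```

In a real page you would feed it the curl output instead of the printf sample, and probably wrap each title in a link as well.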

About the Author: Dave Taylor has been involved with the online world since the early days of the Internet. Author of over 20 technical books, he runs the popular AskDaveTaylor.com tech help site. You can also find his gadget reviews on YouTube and chat with him on Twitter as @DaveTaylor.


6 comments on “Can I track an RSS feed with a shell script?”

  1. ZZTech says:
    July 15, 2008 at 10:04 am

    Nice article Dave, as always.
    A question: http://www.casino-bookstore.com/ and http://www.healthy-bookstore.com/ don’t seem to have a news feed anymore. Was there a reason that you took them off, or am I missing something?

  2. Hiho says:
    July 22, 2007 at 9:45 am

    Hello, Is it possible to add more sites with rss feed than one in the same script?

  3. Dave Taylor says:
    April 19, 2007 at 11:59 am

    Your general approach, checking $?, is correct. What’s wrong with your script?

  4. Selvam says:
    April 17, 2007 at 7:19 am

    In a shell script I am running a command. If the command fails, I want to rerun it until it succeeds. Can anyone give some suggestions on how to do it?
    Ans:
    for example
    (   # subshell, so exit 1 does not end the whole script
    echo $selvam
    exit 1
    )
    if [ $? -eq 0 ] # check whether the command above ran successfully
    then
    echo "Your command ran successfully"
    else
    echo "An error occurred while running the command"
    fi

  5. Ajit Kumar Sahoo says:
    February 26, 2007 at 11:08 pm

    In a shell script I am running a command. If the command fails, I want to rerun it until it succeeds. Can anyone give some suggestions on how to do it?

  6. jot says:
    May 20, 2006 at 12:00 pm

    Hi Dave, it is nice to have a script that monitors RSS feeds, but I think it is even nicer if you can output it on your desktop, for example with conky or torsmo.
    You can also use your Gmail account’s Atom feed to monitor your inbox, just by using wget and sed. I’d like to ask you how to keep my Gmail password as secure as possible when it is stored in a shell script.
    Thank You in advance
    jot


© 2023 by Dave Taylor. "Ask Dave Taylor®" is a registered trademark of Intuitive Systems, LLC.