How do I find out what searches people did to end up on my Web site?
I've been trying to figure out whether there's a way that I can automate digging through the "referrals" on my site so I can see what searches people did to end up on one of my Web site pages. I'm running a Linux server and have Apache installed, so I get a huge log file with tons of info. But what I'd love is a simple script that will let me get email once a week with a sorted list of what searches people did to get to me. Doable?
There are lots of great applications that you can install on your server to get traffic statistics, programs that are going to do a far better job letting you visualize what's going on than anything you can cobble together in a shell script. Further, there are also great utilities like Google Analytics that are free and quite easy to hook in (see: adding Google Analytics to your Web site).
You had a pretty specific request, however, so let's have a look at how we could dig through the Apache log file to identify which hits are directly from Google and then how to extract them so that you get a clean summary in your mailbox.
First off, to have something run on a regular schedule, we'll use the cron facility in Linux. It's one of the very best features of a Linux system and if you have a Linux system, learning crontab is time very, very well spent.
But let's start at the beginning. You'll need to find where Apache is storing your log files, then you can just start out by searching for "google.com" with "grep". The output lines are llooonnnggg:
$ grep google.com /home/taylor/www/logs/askdavetaylor.com-access_log | head -1
22.214.171.124 - - [01/Jun/2009:18:01:00 -0600] "GET /how_does_ebay_actually_work.html HTTP/1.1" 200 34599
"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.10) Gecko/2009042316 Firefox/3.0.10"
(I've added line breaks so it's more readable, but the output is one long line in reality.)
As you can see, there are many fields in this output, separated by spaces. If you count, space by space, you'll see that the REFERRER field is #11, so we can isolate it by using the "cut" command:
$ grep google.com /home/taylor/www/logs/askdavetaylor.com-access_log | cut -f11 -d\ | head -1
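If you want to check the field counting without touching a real log, here's a minimal sketch that runs the same "cut" on a single made-up line. The IP address, page name, referrer and browser string below are all invented for illustration; only the layout follows the log format shown above.

```shell
# A made-up log line in the same layout as the Apache output above.
# Every value here is hypothetical.
line='192.0.2.1 - - [01/Jun/2009:18:01:00 -0600] "GET /page.html HTTP/1.1" 200 34599 "http://www.google.com/search?hl=en&q=convert+wma+to+mp3" "Mozilla/5.0 (X11; Linux)"'

# Split on single spaces and keep field 11, the quoted referrer.
referrer=$(printf '%s\n' "$line" | cut -d' ' -f11)
echo "$referrer"
```

Note that the referrer comes back with its surrounding double quotes still attached, since "cut" only knows about spaces. The count of 11 also assumes every earlier field is free of extra spaces, which holds for this layout but is worth re-checking if your log format is customized.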
That's a bit more readable. Now let's go further and observe that Google queries are name=value pairs separated by an ampersand (as are, of course, all CGI query URLs). Let's break the URL down and see what we get:
$ grep google.com /home/taylor/www/logs/askdavetaylor.com-access_log | cut -f11 -d\ | head -1 | tr '&' '\012'
One more step and I think, by George, we have something:
$ grep google.com /home/taylor/www/logs/askdavetaylor.com-access_log |cut -f11 -d\ | head -1 | tr '&' '\012' | grep "q="
One heck of a command for a small bit of output, but once we change the "head -1" that has limited us to a single match, we can quickly see, say, the first 20 searches in the log ("head -20"):
Uh oh, looks like that "grep" pattern isn't sufficiently isolating. Instead we'll try "^q=" and the results are more what we seek:
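On a made-up referrer, the difference between the two patterns is easy to see. This is only a sketch: the URLs and their parameters are hypothetical stand-ins, chosen so that more than one parameter name ends in "q".

```shell
# Hypothetical referrer; the parameter names are only examples.
referrer='http://www.google.com/search?hl=en&q=virtual+memory+too+low&aq=f&oq=virtual'

# Unanchored: matches any parameter whose name merely ends in "q".
loose=$(printf '%s\n' "$referrer" | tr '&' '\012' | grep 'q=')

# Anchored: matches only the parameter named exactly "q".
strict=$(printf '%s\n' "$referrer" | tr '&' '\012' | grep '^q=')

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"

# Caveat: if q is the first parameter, it sits right after the "?" and
# "^q=" misses it. Translating "?" as well as "&" is one way around that.
first='http://www.google.com/search?q=google+address+book&hl=en'
first_param=$(printf '%s\n' "$first" | tr '?&' '\012\012' | grep '^q=')
echo "$first_param"
```

The last three lines go a step beyond the commands in this article: splitting on "?" too is my suggestion for catching searches where "q" leads the query string, not something the original pipeline does.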
Interesting, but what about getting a useful report from it? We need to clean things up a bit (remove the "q=" and replace '+' with ' ') and we need to sort and tally things so that we can see the most common searches rather than every single search. This is done with "sed" and the power combination of "sort | uniq -c | sort -rn":
$ grep google.com /home/taylor/www/logs/askdavetaylor.com-access_log | cut -f11 -d\ | head -20 | tr '&' '\012' | grep "^q=" | sed 's/q=//;s/+/ /g' | sort | uniq -c | sort -rn
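To see what each stage of the cleanup and tally contributes, here's a small sketch that feeds a handful of made-up "q=" lines through the same "sed" and "sort | uniq -c | sort -rn" chain. The input lines are invented; in the real pipeline they'd come from the "grep" step.

```shell
# Hypothetical "q=" lines, standing in for what the grep step would emit.
tally=$(printf '%s\n' \
    'q=convert+wma+to+mp3' \
    'q=myspace+at+school' \
    'q=convert+wma+to+mp3' \
    'q=google+address+book' \
    'q=convert+wma+to+mp3' \
    'q=myspace+at+school' |
  sed 's/^q=//; s/+/ /g' |   # drop the "q=" prefix, turn + back into spaces
  sort |                     # put identical searches next to each other
  uniq -c |                  # collapse each run into "count search"
  sort -rn)                  # biggest counts first
echo "$tally"
```

The first "sort" matters because "uniq" only collapses adjacent duplicates. The result is three lines, counts 3, 2 and 1, each with some leading padding that varies between systems.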
Still a few things to tweak, but let's finally strip out that "head" and look at all the searches people have done to get to the site:
108 convert wma to mp3
43 myspace at school
34 windows security alert
34 how to convert wma to mp3
33 virtual memory too low
26 how do i delete my myspace
26 google address book
26 comcast remote codes
24 converting wma to mp3
23 how to install windows on mac
Nice. That's great information and ready to use. At least, ready enough for this quick and dirty solution.
My resultant script, when I take the command sequence and drop it into a Bourne shell script file, is:
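#!/bin/sh
# Referrers - shell script generates a report of popular referrer searches from Google

logfile="/home/taylor/www/logs/askdavetaylor.com-access_log"   # same log as in the examples above
max=20   # how many searches to list; this value is an assumption, adjust to taste

echo "Log file analysis for $(basename "$logfile"):"
grep google.com "$logfile" | \
cut -f11 -d\ | tr '&' '\012' | \
grep "^q=" | sed 's/^q=//;s/+/ /g' | \
sort | uniq -c | sort -rn | head -n "$max"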
Now, finally, use "crontab -e" to add a line to cron that invokes this new script on a weekly basis. It brings up your favorite $EDITOR with your cron file in it, if you have one. Crontab entries are in the form: minute, hour, day-of-month, month, day-of-week, command, so let's pick midnight on Mondays as our desired date and time.
In crontab, that looks like:
0 0 * * Mon command
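If the five schedule fields blur together, a quick way to double-check an entry is to let the shell label them. This is just a sketch: it uses the numeric day-of-week (cron counts from 0 for Sunday, so 1 is Monday) and a placeholder where the real command would go.

```shell
# A weekly entry: midnight, any day of the month, any month, Monday.
# "command" is a placeholder for whatever cron should run.
entry='0 0 * * 1 command'

# read splits on whitespace; the last variable soaks up the rest of the line.
read -r minute hour dom month dow cmd <<EOF
$entry
EOF

echo "minute=$minute hour=$hour day-of-month=$dom month=$month day-of-week=$dow command=$cmd"
```

This only labels the fields; it doesn't validate them the way cron does, so treat it as a reading aid rather than a checker.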
There are two ways we can structure the command itself. We can just invoke the script, in which case the script itself will have to deal with turning the output into an email message, or we can do that within the crontab entry itself:
sh $SCRIPTS/referrers.sh | mail -s "Referrer report" taylor
That's all there is to it. Make sure "SCRIPTS" is defined earlier in the crontab file, save and quit the edits, and you're done. Since cron treats 0:00 as the very start of Monday, the report will be waiting in your inbox Monday morning.
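For the other option, where the script sends its own email and the crontab entry just runs it, a rough sketch might look like the following. The function names "build_report" and "send_report" are my own placeholders, and "build_report" only prints a header here; the real grep, cut, tr, sed and sort pipeline would go in its place.

```shell
# Sketch: let the script mail its own report instead of piping in crontab.
# Function names and the recipient are placeholders for illustration.
recipient=taylor

build_report() {
    echo "Log file analysis for access_log:"
    # ...the grep | cut | tr | grep | sed | sort pipeline would go here...
}

send_report() {
    # mail reads the message body from standard input.
    build_report | mail -s "Referrer report" "$recipient"
}

# Uncomment to actually send the message:
# send_report
```

With this layout the crontab command shrinks to just the "sh $SCRIPTS/referrers.sh" part. The trade-off is that the recipient and subject now live in the script rather than in the crontab file, so pick whichever place you're more likely to remember to look.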