My boss is traveling and sent me a Web page to review, but when I open it up, it’s a massive jumble with text on top of other text. Completely unreadable. How can I decipher it so I can respond without having to ask for help?
While HTML (hypertext markup language) started out as a very basic markup language that let you add links, put words in bold and break up text into paragraphs, it’s gotten quite a bit more sophisticated over time. In my experience, where this can become a real mess is when a page uses separate “containers” (frames, as they’re called) for different content, but then can’t find the source for those frames. The page is written to have the top contain one item, the left side another, and the main material in the middle, but if those sources aren’t accessible, things can collapse or end up overwriting each other. Then add external style sheets (known as “.css” files for the cascading style sheet rules they contain) that could be missing in action and it’s a recipe for chaos and disaster!
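If you’re curious which outside pieces a saved page is looking for, a few lines of Python can list them. This is only a rough sketch: it assumes the page pulls in its frames with frame or iframe tags and its style sheets with link tags, and the names DependencyFinder and find_dependencies are made up for this example.

```python
from html.parser import HTMLParser


class DependencyFinder(HTMLParser):
    """Collect the frame sources and style sheets a page refers to."""

    def __init__(self):
        super().__init__()
        self.frames = []
        self.stylesheets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("frame", "iframe") and attrs.get("src"):
            # A frame whose src can't be loaded is left blank or collapses.
            self.frames.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            # Only <link rel="stylesheet"> entries are style sheets.
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.stylesheets.append(attrs["href"])


def find_dependencies(html):
    """Return (frame sources, style sheet links) found in an HTML string."""
    finder = DependencyFinder()
    finder.feed(html)
    finder.close()
    return finder.frames, finder.stylesheets
```

Feed it the text of the saved file and compare the results against what actually came along with the attachment. Anything listed that you don’t have is a likely suspect for the jumble.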
I encountered a similar situation while helping my daughter with an assignment in her literature class. She was supposed to read an excerpt of the great book The Adventures of Don Quixote of La Mancha, so she saved the HTML source file (actually, it was ‘.htm’ but it’s the same thing) and then emailed it to me as an attachment. Since the source is behind a login, I couldn’t just go to the page on my computer to grab a clean copy, so instead I was stuck looking at this when I used the Mac preview feature:
Clearly not pretty at all, but maybe Google Chrome (my default browser, which is why it’s showing up on the button above) can do a better job of opening it and displaying the contents?
Nope. Sadly it too is completely baffled:
So what can I do? Turns out that there are some pretty neat tools that do a great job of basically ignoring all of the messed-up HTML markup codes and just dumping out the text itself. My favorite is within Microsoft Word, which has a remarkably sophisticated filter system.
Fire up Microsoft Word if you have it, then go to the Edit menu:
That “Paste Special…” is like a secret superpower of the Microsoft Word app and well worth remembering. Choose it after you’ve gone into the Web browser, selected the contents of the jumbled page, and copied it all. Easiest is to click in the window, then choose Edit > Select All then Edit > Copy. On the Mac the shortcuts are Command-A then Command-C.
Here’s what the Paste Special window looks like:
As you can see, it’s detected that the material in the Clipboard is in HTML format and is by default offering to paste it into the Word document, translating HTML format into Word format as appropriate. But that’s not what you want! Instead choose “Unformatted Text” and it’ll be pasted clean, clear and without any formatting whatsoever:
Now it’s just a matter of going through the lines of text content to find what you want. In my case, it was about halfway down page two of this newly readable page that I found the Don Quixote content my daughter wanted me to peruse.
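If you’re comfortable running a short script, the same “ignore the markup, keep the text” idea can be sketched in Python using nothing but its built-in HTML parser. To be clear, this is not how Word does it, just a minimal illustration, and the names TextExtractor and html_to_text are invented for the example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect readable text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            # Collapse runs of whitespace and drop empty fragments.
            text = " ".join(data.split())
            if text:
                self.chunks.append(text)


def html_to_text(html):
    """Return the text of an HTML string, one fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

To try it on a saved page, read the file into a string and print the result, for example print(html_to_text(open("page.htm", encoding="utf-8", errors="replace").read())), where page.htm is a stand-in for whatever your attachment is called. Every tag starts a new line in the output, so it won’t be pretty, but it will be readable.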
Oh, and if you’re on a Mac and don’t have Microsoft Word, you can always use “Paste and Match Formatting” in Stickies, of all apps, to get a similarly unscrambled result:
There ya go, a couple of ways to unravel the mystery of the page your boss sent. Now, is it worth reading? That I can’t help you with!