Grammar Cracker: Bots with a Limited Vocabulary

A recent article in the New York Times pointed out an interesting trend: newspapers are beginning to alter their prose style to entice the search engine bots of Google, Yahoo, and MSN.

One of the biggest changes involves headline-writing. To attract readers, a headline writer often tries to use a witty quip or a clever play on words. Literary or cinematic allusions and puns are common. But such nuances are lost on machines. A bot is trying to figure out the content of an article as quickly as possible, and wordplay just gets in the way. This dilemma is known in A.I. circles as “the problem of synonymity.” When a writer pens the line, “A horse of a different color,” a machine doesn’t know that she’s not talking about horses. The bot might accidentally slot that story into the sports section, even if the piece is actually about politics.
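To make the horse example concrete, here is a minimal sketch of how a very naive bot might sort headlines by keyword. Everything in it is an assumption made up for illustration: the `SECTION_KEYWORDS` lists, the `guess_section` function, and the sample headlines. Real search engine bots are far more sophisticated, and this is not how any particular one works.

```python
# Hypothetical keyword lists, invented for illustration only.
SECTION_KEYWORDS = {
    "sports": {"horse", "race", "team", "score"},
    "politics": {"senate", "election", "candidate", "vote"},
}


def guess_section(headline: str) -> str:
    """Pick the section whose keywords overlap most with the headline's words."""
    # Lowercase each word and strip surrounding punctuation.
    words = {w.strip(".,:;!?\"'").lower() for w in headline.split()}
    # Count how many of each section's keywords appear in the headline.
    scores = {section: len(words & kws) for section, kws in SECTION_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    # With no keyword hits at all, the bot has nothing to go on.
    return best if scores[best] > 0 else "unknown"


witty = "A Horse of a Different Color"
terse = "Senate Candidate Changes Position on Election Reform"

print(guess_section(witty))  # the idiom's literal "horse" pulls it toward sports
print(guess_section(terse))  # plain keywords point to politics
```

The witty headline matches only the literal word "horse," so this toy bot files a political story under sports, while the terse version lands where it belongs.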

Granted, most newsbots are capable of sophisticated language processing that parses complex word relationships, but they do not always get it right. The upshot is that many news web sites – including the BBC – have begun to put two different headlines on each article. One is literary and intended to draw in human readers; the other is terse and written for bots.

Technology has always affected the way journalists write. The advent of the telegraph created the inverted-pyramid style: Since journalists weren't sure how much text they'd be able to transmit before the fragile and expensive line went dead, they wrote the most crucial facts in the first paragraph or two, and less-critical ones as they went on. If they were cut off after 60 words, the gist of the story would still be there. Now, as the Times article notes, search engine algorithms may drive even more changes in how we write.

Thursday, April 20, 2006
