
Web Moves Blog

Web Moves News and Information


Use With Caution: Google Books Ngram Viewer

When Google introduced its new tool, the Google Books Ngram Viewer, several days ago, many were enthusiastic about it as the ultimate resource for etymological research. After all, the Ngram Viewer lets users search millions of books (Google Books, of course) and then check, track, and analyze the appearances of any word across several centuries.

Users were enthusiastic at first, but it turned out that the tool is far from perfect. According to a recent review, Ngram Viewer reports contain many problems and inaccuracies, both expected and unexpected. A very basic issue is OCR (Optical Character Recognition). Even for modern books and fonts, occasional mistakes occur: the best OCR programs report an error rate of just under 1% of recognized words. For books from the 16th and 17th centuries, with their ornate typefaces, the error rate is sure to be higher. One example is the letter “s”, which is confused with “f” on numerous occasions.
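To see why the “s”/“f” confusion matters for a search, here is a minimal Python sketch that produces the misread form of a word so both spellings can be queried. It assumes the confusion comes from the old long s, which printers did not normally use at the end of a word, so the final letter is left alone. The function names `long_s_misreading` and `query_forms` are made up for this illustration and are not part of any Google tool.

```python
from typing import List


def long_s_misreading(word: str) -> str:
    """Return the form OCR might produce if every non-final "s" is read as "f".

    Assumption: only lowercase "s" before the last letter is affected,
    because the long s was not normally printed at the end of a word.
    """
    if len(word) < 2:
        return word
    head, last = word[:-1], word[-1]
    return head.replace("s", "f") + last


def query_forms(word: str) -> List[str]:
    """List the spellings worth searching: the word itself plus its misread form."""
    variant = long_s_misreading(word)
    return [word] if variant == word else [word, variant]


if __name__ == "__main__":
    for w in ["best", "case", "moon"]:
        print(w, "->", query_forms(w))
```

Note that some misread forms, such as “cafe” for “case”, are real words in their own right, so a hit for the variant still has to be checked by hand.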

Another problem concerns finding a word's first occurrence. The Google Books Ngram Viewer does not take into account how language develops over time, so you have to search for several forms of the word used throughout the centuries to find the actual first usage. Then there are the reprints. Many Google Books titles are labeled with the year of printing instead of the year of the original manuscript, which makes a search produce more hits for “recent” years.
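The point about spelling variants can be sketched in a few lines of Python. The example assumes you have already collected yearly counts for each form of the word by hand (for instance, by running one Ngram Viewer query per spelling). The function name `first_attested_year` and the input layout are hypothetical, and the numbers in the usage example are invented purely to show the mechanics.

```python
from typing import Dict, Optional


def first_attested_year(counts_by_form: Dict[str, Dict[int, int]]) -> Optional[int]:
    """Return the earliest year in which any spelling has a nonzero count.

    counts_by_form maps each spelling to a {year: count} dictionary.
    Returns None when no spelling has any hits.
    """
    years = [
        year
        for counts in counts_by_form.values()
        for year, n in counts.items()
        if n > 0
    ]
    return min(years) if years else None


if __name__ == "__main__":
    # Invented counts, for illustration only.
    example = {
        "music": {1700: 0, 1750: 4},
        "musick": {1650: 2, 1700: 5},
    }
    print(first_attested_year(example))
```

Because of the reprint problem, the year this returns is only as good as the dates attached to the books, so an early hit is a lead to verify and not a finding.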

Overall, the Google Books Ngram Viewer is not bad. It is just not as reliable as one might think. It is suitable for occasional queries, but it cannot be considered a reliable tool for serious academic research.