Why do some books, as simple as they may be, succeed in becoming worldwide sensations? Do their authors treat the language differently? How do printed symbols lure us into epic worlds? I had to dig in.
I picked the most successful book series of the last 20 years and applied text mining techniques, looking for patterns and, well, a way to reverse engineer an author’s mind while writing.
I analyzed the Harry Potter books by J. K. Rowling, the Game of Thrones books (ok nerd, “A Song of Ice and Fire”) by George R. R. Martin, the Hunger Games trilogy by Suzanne Collins, and the Lord of the Rings trilogy plus The Hobbit by J. R. R. Tolkien.
4 authors. 19 books. 3,896,568 words.
Contents
- Common phrases
- Top nouns
- Top verbs
- Top adverbs
- Top adjectives
- Lexical density
- Understandability
1) Phrases
The first thing that comes to mind when messing with natural language processing on books is to isolate the most frequent phrases, usually counted as bigrams, trigrams, and longer n-grams (runs of n consecutive words). You may not find many phrases shared across authors, but you do get a hint about the story and the significance of some key concepts, such as the ring in LOTR or the arena in Hunger Games. Displayed are the top 2-grams to 7-grams.
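For the curious, here is a minimal sketch of how such n-gram counts can be computed. It is not the exact pipeline behind the charts: the function names `tokenize` and `top_ngrams`, the simple regex tokenizer, and the toy sample sentence are all assumptions made for illustration, and only the Python standard library is used.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes as words."""
    return re.findall(r"[a-z']+", text.lower())


def top_ngrams(tokens, n, k=10):
    """Return the k most frequent n-grams as (phrase, count) pairs."""
    # Zipping n shifted copies of the token list yields every run of n consecutive words.
    grams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(" ".join(gram) for gram in grams)
    return counts.most_common(k)


# Toy sample text (an assumption for illustration, not taken from any of the books).
sample = "The ring is mine. The ring is precious. The ring must be destroyed."
tokens = tokenize(sample)

# Print the top phrases for 2-grams up to 7-grams, as in the charts.
for n in range(2, 8):
    print(n, top_ngrams(tokens, n, k=3))
```

Note that this naive version lets phrases run across sentence boundaries and ignores punctuation entirely. Splitting the text into sentences first would be a reasonable refinement for real books.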
