AN ARTIFICIAL INTELLIGENCE EXPERIMENT USING GOOGLE'S GEMINI, PART ONE: FROM MARILYN MONROE TO PETER NOONE
This blog post originated when I was sitting on the toilet (perhaps too much detail there) reading a newspaper's review of a biography of Marilyn Monroe - James Patterson and Imogen Edwards-Jones' The Last Days of Marilyn Monroe. The book review wasn't a favourable one, not least because the reviewer, Anne Billson, felt there was no need for yet another biography of Marilyn Monroe. She noted there have been “more than 3,000 books” already written about Marilyn Monroe.
That figure sparked a thought about who has had the most biographies (including autobiographies) written about them, which I thought would make for an interesting blog post. But the plot thickened when I checked how many biographies had been written about Marilyn Monroe. Using Google's artificial intelligence (AI) search tool, Gemini, I came across two very different answers, neither of which came close to the 3,000 books that Anne Billson mentioned. My first search came up with 600 books, and my second search, a few days later, came up with “at least 37 books”, with Goodreads cited as the source for the figure.
I was confused. Surely, given that a book can easily be counted, and given that most published books have ISBNs and so are very easy to trace and count, why were there such discrepancies in the number of biographies written about Marilyn Monroe?
I decided to change tack with my planned blog post. Though I was still interested in who has the most books written about them, I decided that I would use only AI, and specifically Google's Gemini, to determine my league table.
Going back to that book review, it was published in The Times on 19 July 2025. It's only now, seven months later and after compiling well over 200 pages of notes, that I've completed my experiment. To make things more manageable, and more readable, I'm going to split the blog post into two parts - this first part outlines the design of my AI experiment, and the second part will reveal the results.
Also, to help reduce the length of the two parts, I'm going to do what historians do and use footnotes, or rather endnotes. I'll periodically cite an endnote in this blog post so that those who are interested can read it as a comment posted below, rather than clogging up this blog post with words. I don't want to be accused of being wordy!
So my plan was to compile a league table of lived and living (that is, real) persons who have had the most biographies (including autobiographies) written about them, using Google's Gemini AI tool as the basis for the league table - see Endnote 1 on non-lived/living persons.
To avoid discrepancies, I asked Google's Gemini the same question every time. The search question I asked was 'the number of biographies and autobiographies about', followed by the name of a person. I was very consistent with this, as I know that slightly differently worded questions with the same meaning can yield different search results.
For this experiment, I sampled and therefore searched well over 10,000 names of people (currently standing at 10,474). My sample is very large - see Endnote 2 on sampling.
My searches took me far and wide and all over the world, from politicians to porn stars, from inventors to conquistadors, from musicians to criminals, from business leaders to playwrights, from novelists to engineers, from artists to soldiers, from scientists to courtesans, from religious leaders to bankers, from kings and queens to spies, from sportspersons to comedians, from philosophers to poets, and from actors to architects. My first search, obviously, was for Marilyn Monroe, and my last search, less obviously, was for Peter Noone, the lead singer of the 1960s English pop band Herman's Hermits.
In my searches, I went down some serious rabbit holes. Eastern Europe was a big rabbit hole, as was Hollywood. But European empires were by far the biggest rabbit hole; it was one humongous rabbit warren!
There was a time when I thought I might not finish this project, as I was thinking of new people to search for faster than I could search for them. There was also the problem of knowing when to stop - how long is a piece of string? I stopped searching when the people I searched for consistently ceased troubling the leaderboard. I'm as confident as I can be that I've included everyone who has had many, and certainly the most, books written about them - see Endnote 3 on what is a book and Endnote 4 on books written in different languages.
Caveats aside, my retirement project was a quantitative and not a qualitative research project. I wanted the numbers of books written about the people I searched for. I naively assumed from the outset that AI, and Google's Gemini AI tool in particular, would be geared up to producing numbers. I was asking it to count the books, not to read them. I felt I was challenging AI on its own terms. Yet Google's Gemini was very keen to give its qualitative opinions on the books it had found - see Endnote 5 on Gemini adjectives. I only wanted Gemini to count!
Which people do you think will have had the most books written about them?
My top twenty before doing this experiment, for what it's worth, was, in order: Jesus Christ, Prophet Muhammad, William Shakespeare, Napoleon Bonaparte, Adolf Hitler, Donald Trump, John F Kennedy, Winston Churchill, Queen Victoria, Queen Elizabeth II, George Washington, Joseph Stalin, Nelson Mandela, Marilyn Monroe, Confucius, Abraham Lincoln, Leonardo da Vinci, Michelangelo, Karl Marx and Albert Einstein.
This list isn't the same as a list of the most significant, influential or powerful figures in history. The number of books published is just one metric, and it favours certain people over others. The metric certainly favours the West, where the printing press was invented and first rolled out.
The second part of this blog post, to be posted soon, will reveal the search results and provide a list of people who have had the most books written about them according to Google's Gemini. The results are very surprising but also very disappointing.


ENDNOTE 1: NON-LIVED/LIVING PERSONS
I'm only interested in real lived or living persons, so the following fictional characters have no place in this experiment, presented in alphabetical order of surname wherever possible: Adam and Eve, Aphrodite, King Arthur, Ted Baker, Jack Bauer, Adam Bede, James Bond, Madame Bovary, David Brent, Charlie Brown, Harry Callahan, Santa Claus, Frank Columbo, Lee Cooper, Frasier Crane, Betty Crocker, Lara Croft, Robinson Crusoe, Dan Dare, Dirk Diggler, Eliza Doolittle, Moll Flanders, Jessica Fletcher, Phileas Fogg, Dorothy Gale, Don Giovanni, Dorian Gray, Rachel Green and Ross Geller, Charlie Hebdo, Helen of Troy, Heracles and Hercules, William Sherlock Scott Holmes, Dr Henry Jekyll and Mr Edward Hyde, Bridget Jones, Davy Jones, Clark Kent and Lois Lane, Kunta Kinte, Aitana López, Ned Ludd, Perry Mason, Ally McBeal, Romeo Montague and Juliet Capulet, Hilda Ogden, Scarlett O'Hara, Alan Partridge, Emma Peel, Emily Pellegrini, Dorothy Perkins, Hercule Poirot, Mary Poppins, Harry Potter, Austin Powers, Don Quixote, Roy Race, Eleanor Rigby, Romulus and Remus, Sienna Rose, George Smiley, Tony Soprano, Anastasia Steele and Christian Grey, Ann Summers, William Tell, Friar Tuck and Maid Marian, Bruce Wayne, J D Wetherspoon, and Danny Zuko. I'm sure there are books written about these fictional characters. I must admit that I thought Ned Ludd, Friar Tuck and Maid Marian, all based in Nottinghamshire, were real and not fictional characters. After seeing the *Roots* television series, I also thought Kunta Kinte was real, but I now realise that Alex Haley, who travelled to The Gambia, from where Kunta Kinte was allegedly enslaved, to trace his roots, was largely making things up and it was just a novel he wrote! I felt a bit conned, but Gambians still dine out on his tale! And there have been figures whom I thought were fictitious but who turned out to be real, like Jimmy Cricket, a Northern Irish comedian, and Lazarus of Bethany, the one brought back from the dead by Jesus Christ!
And I'm still at a loss as to whether Baron Munchausen, after whom Munchausen syndrome is named, was a real person or a fictional character.
ENDNOTE 2: SAMPLING
My sampling method was a mix of purposive sampling and snowball sampling. I wasn't interested in constructing a representative sample. I wanted to compile a definitive list of all people, whether historical or contemporary, who had the most books written about them. I was sampling for that purpose, hence the term purposive sampling. As a Brit, I realised my search list would be skewed - I know of far more British, other European and American people likely to have written autobiographies or been the subject of biographies than, say, African, Asian and South American people. Also, because of my interests, I was likely to capture more artists, film and television stars, politicians, musicians, scientists and sportspersons. I tried very hard to correct these biases - I did have many months in which to do so.
And that's where snowball sampling came in. Snowball sampling is when you stumble across someone as a result of your initial sampling and you follow the snowball. A lot of people in my sample, my guess is about a half, were snowball-sampled.
ENDNOTE 3: WHAT IS A BOOK
One of the first issues I encountered was what constituted a biography or an autobiography of a person. I had no problem including unofficial biographies or ghostwritten autobiographies (which are effectively official biographies). But I had more problems with pamphlets and journal/magazine articles. I decided a pamphlet is a book and a journal/magazine article isn't, even though journal articles are often longer than pamphlets. A pamphlet is stand-alone, unlike a journal/magazine article. On the same basis, I did not count encyclopaedia entries (including Wikipedia entries) as books.
I included self-published books in my count though I'm not confident that Google's Gemini could easily find such books.
Perhaps controversially, I excluded audiobooks and ebooks, plus films and television documentaries. I know I'm coming across very legacy, and how I hate that dismissive term. But I wanted to count hard copy books.
Probably my biggest quandaries were exhibition catalogues and autobiographical novels. I decided to include both exhibition catalogues and autobiographical novels, which meant certain artists and novelists got a higher score than perhaps they should have done. It's difficult to draw methodological lines around categories particularly as many novels, especially first novels, contain autobiographical detail. Qualitative judgement is often required when doing quantitative research.
ENDNOTE 4: BOOKS WRITTEN IN DIFFERENT LANGUAGES
I didn't care whether books were written in the English language or not. A book is a book regardless of the language it's written in.
This reminded me of a question that international students whose first language wasn't English often used to ask me when I was a university lecturer. They would ask whether it was okay to cite a source not written in English in their essays. I always said that it's fine to cite non-English sources; it's not as if only knowledge written in the English language counts as knowledge.
But throughout my searches, the language issue became a problem, and perhaps a big problem. It was never clear from my searches whether Google's AI was picking up only books written in English or books written in any language. I suspect Gemini missed many books written in languages other than English, thus introducing bias and error into the final count.
ENDNOTE 5: GEMINI ADJECTIVES
Though I expected AI to be better at quantitative research (numbers) than qualitative research (words), Google's Gemini was very free with its qualitative opinions about its quantitative findings. It frequently used the following qualitative adjectives to describe its quantitative findings: academic, adult, authentic, authoritative, best-selling, brief, clear, commercial, complete, comprehensive, contemporary, critical, critically acclaimed, dedicated, definitive, direct, formal, full, fully researched, generally accepted, independent, intellectual, introductory, juvenile, lesser-known, main, mainstream, major, notable, primary, prominent, scholarly, significant, traditional, universally recognized, well-known, well-regarded, widely documented, widely known and widely recognised. Given that AI cannot read as humans do, just where does Google's Gemini get these qualitative adjectives from? They're quite telling.