This video shows the first (or fourth, depending on your point of view) Star Wars film — Episode IV: A New Hope — recut so that its dialogue is sorted into alphabetical order. Each word is displayed on screen along with a count of its occurrences in the script. It runs 43 minutes and change, although you probably wouldn’t want to sit through the entire thing:
Everyone associates lightsabers with Star Wars, so you might be surprised to learn that the word is spoken only once in the film:
Even Wedge, one of the few rebels who survive the entire original trilogy, gets mentioned more often than lightsabers in the first film:
This is a project you wouldn’t want to do purely by hand. If I were assigned this task, I’d write a program that uses the film’s subtitle file to extract every word of dialogue, sort the words alphabetically, and record the approximate time (give or take a fraction of a second) when each word is spoken. Courtesy of the people behind the Matroska video file format, here’s a sample from a subtitle file in the SubRip (.srt) format. Each entry pairs a start and end time, written as hours:minutes:seconds,milliseconds, with the dialogue shown on screen during that span, which should give you an idea of the information these files hold. Oddly enough, it features dialogue from another Star Wars film:
00:02:17,440 --> 00:02:20,375
Senator, we’re making our final approach into Coruscant.

00:02:20,476 --> 00:02:22,501
Very good, Lieutenant.
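Here’s a rough idea of what such a program might look like: a minimal Python sketch, not the method actually used to make the video. It parses SubRip cues, splits each cue’s dialogue into lowercase words, and builds an alphabetical index mapping each word to the times it’s spoken. Two assumptions are worth flagging. First, a subtitle cue only tells you when a whole line is on screen, so the sketch guesses each word’s time by assuming the words in a cue are spoken at an even pace. Second, real .srt files normally put a sequence number before each cue and a blank line between cues, so the hypothetical `parse_srt` helper skips those counter lines. The `SAMPLE` string reuses the dialogue above in that fuller layout.

```python
import re
from collections import defaultdict

# Matches a SubRip timing line such as "00:02:17,440 --> 00:02:20,375".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})"
)
# A "word" here is a run of letters, optionally with inner apostrophes (we're).
WORD = re.compile(r"[a-z]+(?:'[a-z]+)*")


def to_seconds(h, m, s, ms):
    """Convert hour, minute, second, millisecond strings into float seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(text):
    """Return a list of [start, end, dialogue_lines] cues, times in seconds."""
    cues = []
    for raw in text.splitlines():
        line = raw.strip()
        match = TIMING.fullmatch(line)
        if match:
            # A digits-only line just before a timing line is the new cue's
            # sequence number, not dialogue, so drop it from the previous cue.
            if cues and cues[-1][2] and cues[-1][2][-1].isdigit():
                cues[-1][2].pop()
            g = match.groups()
            cues.append([to_seconds(*g[:4]), to_seconds(*g[4:]), []])
        elif line and cues:
            cues[-1][2].append(line)
    return cues


def index_words(text):
    """Map each word to the estimated times (in seconds) when it's spoken.

    Assumption: words within a cue are spoken at an even pace, so word k
    of n is placed at start + (end - start) * k / n.
    """
    index = defaultdict(list)
    for start, end, lines in parse_srt(text):
        # Normalize curly apostrophes so "we’re" and "we're" are one word.
        dialogue = " ".join(lines).replace("\u2019", "'").lower()
        words = WORD.findall(dialogue)
        for k, word in enumerate(words):
            index[word].append(start + (end - start) * k / len(words))
    return index


SAMPLE = """\
1
00:02:17,440 --> 00:02:20,375
Senator, we’re making our final approach into Coruscant.

2
00:02:20,476 --> 00:02:22,501
Very good, Lieutenant.
"""

if __name__ == "__main__":
    index = index_words(SAMPLE)
    # Alphabetical order, as in the recut: word, count, first estimated time.
    for word in sorted(index):
        times = index[word]
        print(f"{word:<12} {len(times):>3}  {times[0]:8.2f}s")
```

To actually recut the film, you’d hand each word’s estimated times to a video tool to cut out short clips, which is where the “give or take some fractions of a second” comes in: the even-pace guess will be off whenever a character pauses or drags out a word.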