Fun With Words

Letter Frequencies


Etaoin Shrdlu is a somewhat infamous phrase among language enthusiasts. It is pronounced "eh-tay-oh-in shird-loo" and is believed to be the twelve most common letters in English, in order from most to least frequently used. The expression came about from Linotype typesetting machines. Were one to run a finger down the first and then the second left-hand vertical banks of six keys on a Linotype machine, it would produce the words etaoin shrdlu. Linotype machines were sometimes tested in this manner. Once in a while, a careless operator would fail to throw his test lines away, and the phrase would mysteriously show up in published material. The full sequence is etaoin shrdlu cmfgyp wbvkxj qz.

When analyzing the frequency with which letters appear in English, it's important to decide whether you are factoring in how often individual words are used. For example, the letter h is found in comparatively few English words, but because it appears in several of the most commonly used words, such as "the," "then," "there," and "that," it appears more often in everyday speech and writing than it does in a list of dictionary words. The "etaoin shrdlu" sequence given above is based on the frequency of letters as they appear in speech and writing.
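The difference between the two ways of counting can be sketched in a few lines of Python. This is only an illustration: the function name letter_counts, the five-word list, and the usage counts are all invented for the example and are not taken from any real corpus.

```python
from collections import Counter

def letter_counts(words, weights=None):
    """Count letters across words.

    weights (hypothetical) maps a word to how often it is used;
    when omitted, every word counts once, as in a dictionary list.
    """
    counts = Counter()
    for word in words:
        w = 1 if weights is None else weights.get(word, 1)
        for ch in word.lower():
            if ch.isalpha():
                counts[ch] += w
    return counts

# Toy data: the usage counts below are made up for illustration only.
words = ["the", "that", "cat", "dog", "pig"]
usage = {"the": 50, "that": 10, "cat": 1, "dog": 1, "pig": 1}

unweighted = letter_counts(words)        # each word counted once
weighted = letter_counts(words, usage)   # each word counted by its usage

for label, counts in (("dictionary", unweighted), ("usage-weighted", weighted)):
    share = 100 * counts["h"] / sum(counts.values())
    print(f"{label}: h is {share:.1f}% of all letters")
```

With this toy data, h makes up a much larger share of the letters once the heavily used words "the" and "that" are weighted by their usage, which is the effect described above.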

The following tables measure the frequency of letters as found in several thousand English words, without giving preference to the most commonly used words. The first table shows the frequency of letters as they appear in 18584 base words; no plural words or words with common suffixes are included in the list. The second table analyzes a larger pool of 45406 words, which includes plurals and words with common suffixes. In each table, the first pair of columns gives the number of times a letter occurs and its share of all letters counted, and the second pair gives the number of words that contain the letter and their share of all words in the list. (Notice how the letter s appears more frequently when considering plural words, and how the letters i, n, and g appear more frequently when considering verbs ending in ing.)

Analysis of 18584 Common Base Words

 

Letter   # of Occurrences   % of Letters   # of Words   % of Words

e        16782              11.42%         11991        64.52%
a        12574               8.56%         10050        54.08%
i        11674               7.94%          9364        50.39%
r        11042               7.51%          9337        50.24%
t        10959               7.46%          8929        48.05%
o        10466               7.12%          8259        44.44%
n         9413               6.41%          7948        42.77%
s         8154               5.55%          6859        36.91%
l         8114               5.52%          6882        37.03%
c         6968               4.74%          6028        32.44%
u         5373               3.66%          4910        26.42%
p         4809               3.27%          4283        23.05%
m         4735               3.22%          4241        22.82%
d         4596               3.13%          4186        22.52%
h         4058               2.76%          3724        20.04%
g         3380               2.30%          3061        16.47%
b         3121               2.12%          2918        15.70%
y         2938               2.00%          2815        15.15%
f         2157               1.47%          1899        10.22%
v         1574               1.07%          1531         8.24%
w         1388               0.94%          1328         7.15%
k         1235               0.84%          1183         6.37%
x          507               0.35%           505         2.72%
z          356               0.24%           309         1.66%
q          343               0.23%           343         1.85%
j          220               0.15%           217         1.17%

Analysis of 45406 Common Words

 

Letter   # of Occurrences   % of Letters   # of Words   % of Words

e        42689              11.74%         30254        66.63%
i        31450               8.65%         23875        52.58%
s        29639               8.15%         22697        49.99%
a        28965               7.97%         23408        51.55%
r        27045               7.44%         22642        49.87%
n        26975               7.42%         21644        47.67%
t        24599               6.76%         20040        44.14%
o        21588               5.94%         17776        39.15%
l        19471               5.35%         16289        35.87%
c        15002               4.13%         13142        28.94%
d        13849               3.81%         12334        27.16%
u        11715               3.22%         10894        23.99%
g        10339               2.84%          9426        20.76%
p        10063               2.77%          8952        19.72%
m         9803               2.70%          8871        19.54%
h         7808               2.15%          7372        16.24%
b         7368               2.03%          6880        15.15%
y         6005               1.65%          5881        12.95%
f         4926               1.35%          4385         9.66%
v         3971               1.09%          3884         8.55%
k         3209               0.88%          3091         6.81%
w         3073               0.85%          2997         6.60%
z         1631               0.45%          1555         3.42%
x         1053               0.29%          1046         2.30%
j          727               0.20%           727         1.60%
q          682               0.19%           681         1.50%
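
For readers who would like to run the same kind of analysis on a word list of their own, here is a minimal Python sketch. It is not the program used to produce the tables above; the function name analyze and the sample words are made up for illustration. It assumes that the first percentage is a letter's share of all letters counted and the second is the share of words containing that letter at least once, which is consistent with the figures in both tables.

```python
from collections import Counter

def analyze(words):
    """Return one row per letter, most frequent first:
    (letter, occurrences, % of all letters, words containing it, % of words).
    """
    occurrences = Counter()  # total times each letter appears
    containing = Counter()   # number of words containing the letter at least once
    for word in words:
        letters = [ch for ch in word.lower() if "a" <= ch <= "z"]
        occurrences.update(letters)
        containing.update(set(letters))  # set() so a word counts once per letter

    total_letters = sum(occurrences.values())
    rows = []
    # Sort by occurrences, descending; ties are broken alphabetically.
    for letter, n in sorted(occurrences.items(), key=lambda kv: (-kv[1], kv[0])):
        rows.append((
            letter,
            n,
            100 * n / total_letters,
            containing[letter],
            100 * containing[letter] / len(words),
        ))
    return rows

# Hypothetical sample list; substitute any list of words.
sample = ["bee", "bat"]
for letter, n, pct_letters, n_words, pct_words in analyze(sample):
    print(f"{letter}  {n:5d}  {pct_letters:6.2f}%  {n_words:5d}  {pct_words:6.2f}%")
```

Counting a word only once per letter is what lets the two columns differ: in the first table, for example, e occurs 16782 times but in only 11991 words, because many words contain it more than once.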