

Letter frequencies

The frequency of letters in text messages has often been studied for use in cryptography, and frequency analysis in particular. An exact analysis of this is not possible, as each person writes slightly differently; however, an approximate order of frequency is ETAOIN SHRDL UCMFG YPWBV KXJQZ.
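As a rough illustration of how such an ordering can be obtained, the sketch below counts the letters a to z in a piece of text and sorts them from most to least common. The function names (letter_frequencies, frequency_order) are made up for this example, and the handling of case and non-letter characters is an assumption, not a standard.

```python
from collections import Counter
import string


def letter_frequencies(text):
    """Return {letter: relative frequency} for the letters a-z found in text."""
    # Assumption: fold case and ignore everything that is not a-z.
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    counts = Counter(letters)
    total = len(letters)
    if total == 0:
        return {ch: 0.0 for ch in string.ascii_lowercase}
    return {ch: counts[ch] / total for ch in string.ascii_lowercase}


def frequency_order(text):
    """Letters that occur in text, most common first (ties broken alphabetically)."""
    freqs = letter_frequencies(text)
    present = [ch for ch in freqs if freqs[ch] > 0]
    return "".join(sorted(present, key=lambda ch: (-freqs[ch], ch)))


sample = "The frequency of letters in text has often been studied"
print(frequency_order(sample).upper())
```

On a short sample the order will differ noticeably from ETAOIN SHRDLU; it only settles toward a stable order as the amount of text grows.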

An analysis based on all the words in the Cambridge Encyclopedia gave a letter frequency order quite unlike the one found in most lists. From most common to least common, it gave EATIN ORSLH DCMUF PGBYW VKXJZQ. Note that more A's appeared than T's. The author stated that the variance from standard lists could be due to the many foreign words often repeated within articles. Note, too, that X is more frequent than J in this work.

This brings up an interesting point. Letter frequencies, like word frequencies, tend to vary, both by writer and by subject. You cannot talk about x-rays without using frequent x's, and you cannot use a letter at all if its key on your keyboard is broken. Letter, digraph, trigraph, and word frequencies can be used to prove or disprove authorship. Measures such as average word length and average sentence length are also used. Everyone writes differently. Hemingway is not Faulkner, and so on. A precise average usage could only be gleaned from a huge mass of differing inputs: by analyzing usage in, say, a number of different chatrooms, or by covertly checking email, or something of that order.
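The two length measures mentioned above are easy to compute. The following is a minimal sketch, assuming that a word is a run of letters and that a sentence ends at a period, exclamation mark, or question mark; both rules are simplifications chosen for this example, and the function names are hypothetical.

```python
import re


def average_word_length(text):
    """Mean number of letters per word (a word is a run of a-z letters)."""
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)


def average_sentence_length(text):
    """Mean number of words per sentence (sentences split on . ! ?)."""
    # Assumption: abbreviations such as "e.g." are not handled specially.
    pieces = re.split(r"[.!?]+", text)
    sentences = [s for s in pieces if re.search(r"[A-Za-z]", s)]
    if not sentences:
        return 0.0
    word_counts = [len(re.findall(r"[A-Za-z]+", s)) for s in sentences]
    return sum(word_counts) / len(sentences)


text = "Everyone writes differently. Hemingway is not Faulkner, and so on."
print(average_word_length(text), average_sentence_length(text))
```

Numbers like these only become meaningful when compared across long samples from each candidate author.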

Relative Frequencies of Text

By Letter              By Frequency
Letter  Frequency      Letter  Frequency
a       0.08167        e       0.12702
b       0.01492        t       0.09056
c       0.02782        a       0.08167
d       0.04253        o       0.07507
e       0.12702        i       0.06966
f       0.02228        n       0.06749
g       0.02015        s       0.06327
h       0.06094        h       0.06094
i       0.06966        r       0.05987
j       0.00153        d       0.04253
k       0.00772        l       0.04025
l       0.04025        c       0.02782
m       0.02406        u       0.02758
n       0.06749        m       0.02406
o       0.07507        w       0.02360
p       0.01929        f       0.02228
q       0.00095        g       0.02015
r       0.05987        y       0.01974
s       0.06327        p       0.01929
t       0.09056        b       0.01492
u       0.02758        v       0.00978
v       0.00978        k       0.00772
w       0.02360        j       0.00153
x       0.00150        x       0.00150
y       0.01974        q       0.00095
z       0.00074        z       0.00074
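One way figures like these are used in frequency analysis is to score candidate decryptions of a simple substitution. The sketch below is only an illustration under stated assumptions: it uses a Caesar (fixed shift) cipher as the example and a chi-squared statistic as the measure of fit, neither of which is prescribed by this article. The frequencies are copied from the table above; the function names are hypothetical.

```python
# Relative letter frequencies, copied from the table above.
ENGLISH_FREQ = {
    "a": 0.08167, "b": 0.01492, "c": 0.02782, "d": 0.04253, "e": 0.12702,
    "f": 0.02228, "g": 0.02015, "h": 0.06094, "i": 0.06966, "j": 0.00153,
    "k": 0.00772, "l": 0.04025, "m": 0.02406, "n": 0.06749, "o": 0.07507,
    "p": 0.01929, "q": 0.00095, "r": 0.05987, "s": 0.06327, "t": 0.09056,
    "u": 0.02758, "v": 0.00978, "w": 0.02360, "x": 0.00150, "y": 0.01974,
    "z": 0.00074,
}


def caesar_shift(text, k):
    """Lowercase text and shift each letter a-z forward by k places."""
    out = []
    for ch in text.lower():
        if ch in ENGLISH_FREQ:
            out.append(chr((ord(ch) - ord("a") + k) % 26 + ord("a")))
        else:
            out.append(ch)  # spaces, digits and punctuation pass through
    return "".join(out)


def chi_squared(text):
    """Chi-squared statistic of text's letter counts against the table."""
    letters = [ch for ch in text.lower() if ch in ENGLISH_FREQ]
    n = len(letters)
    if n == 0:
        return float("inf")
    score = 0.0
    for ch, p in ENGLISH_FREQ.items():
        expected = p * n
        observed = letters.count(ch)
        score += (observed - expected) ** 2 / expected
    return score


def best_shift(ciphertext):
    """Shift (0-25) whose reversal makes ciphertext look most like the table."""
    return min(range(26), key=lambda k: chi_squared(caesar_shift(ciphertext, -k)))


plain = ("letter frequencies tend to vary both by writer and by subject "
         "everyone writes differently hemingway is not faulkner and so on")
secret = caesar_shift(plain, 3)
guess = best_shift(secret)
print(guess, caesar_shift(secret, -guess))
```

Because letter frequencies vary by writer and subject, a score of this kind can pick the wrong shift on very short or unusual texts.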


Top 10 Beginning of Word Letters

Letter  Frequency
t       0.1594
a       0.155
i       0.0823
s       0.0775
o       0.0712
c       0.0597
m       0.0426
f       0.0408
p       0.040
w       0.0382

Top 10 End of Word Letters

Letter  Frequency
e       0.1917
s       0.1435
d       0.0923
t       0.0864
n       0.0786
y       0.0730
r       0.0693
o       0.0467
l       0.0456
f       0.0408
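Lists like the two above can be produced by counting only the first and last letter of each word. This is a minimal sketch, assuming a word is any run of the letters a to z (so "x's" counts as two words); it is not necessarily how the figures above were obtained, and the function name is made up for the example.

```python
import re
from collections import Counter


def edge_letter_frequencies(text, top=10):
    """Return (first_letters, last_letters), each a list of (letter, share)."""
    # Assumption: a word is a maximal run of a-z after lowercasing.
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return [], []
    total = len(words)

    def ranked(counter):
        # Most common first; ties broken alphabetically.
        ordered = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        return [(ch, n / total) for ch, n in ordered[:top]]

    first = Counter(w[0] for w in words)
    last = Counter(w[-1] for w in words)
    return ranked(first), ranked(last)


first, last = edge_letter_frequencies("The cat sat on the mat")
print(first)
print(last)
```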

Most Common Digrams (in order)

th, he, in, en, nt, re, er, an, ti, es, on, at, se, nd, or, ar, al, te, co, de, to, ra, et, ed, it, sa, em, ro.

Most Common Trigrams (in order)

the, and, tha, ent, ing, ion, tio, for, nde, has, nce, edt, tis, oft, sth, men
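Digram and trigram lists can be built by sliding a window of two or three letters over the text. The trigram list above contains entries such as "edt" and "oft" that can only arise across word boundaries, which suggests the spaces were removed before counting; the sketch below therefore does the same by default. That reading of the list, the function name, and its parameters are assumptions made for this example.

```python
import re
from collections import Counter


def top_ngrams(text, n, k=10, across_words=True):
    """Return the k most common n-letter sequences in text."""
    lowered = text.lower()
    if across_words:
        # Drop everything but a-z so windows can span word boundaries.
        chunks = ["".join(re.findall(r"[a-z]", lowered))]
    else:
        # Count within each word separately.
        chunks = re.findall(r"[a-z]+", lowered)
    counts = Counter(
        chunk[i:i + n]
        for chunk in chunks
        for i in range(len(chunk) - n + 1)
    )
    # Most common first; ties broken alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:k]]


sample = "The frequency of letters in text has often been studied"
print(top_ngrams(sample, 2))
print(top_ngrams(sample, 3))
```

Passing across_words=False restricts the count to sequences inside single words, which removes entries like "edt" at the cost of ignoring how words follow one another.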

See Also

ETAOIN SHRDLU








Copyright © 2000-2008 Slider.com. All rights reserved.
Content is distributed under the GNU Free Documentation License.