SOCR LetterFrequencyData

From SOCR
Revision as of 16:34, 1 November 2009 by IvoDinov (talk | contribs) (Data Table: minor formatting)

SOCR Data - Letter Frequency Data

Data Description

The data table below presents the average frequencies of the 26 most common Latin letters for different languages. Letter frequencies in text are studied in cryptography. No exact letter frequency distribution underlies a given language, since all writers write slightly differently. Modern International Morse code encodes the most frequent letters with the shortest symbols: arranging the Morse alphabet into groups of letters that require equal amounts of time to transmit, and then sorting these groups in increasing order, gives an ordering that roughly follows letter frequency. Similar ideas are used in modern data-compression techniques such as Huffman coding.
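To make the link between letter frequency and code length concrete, the following is a minimal sketch of Huffman code construction in Python, applied to the rounded English column of the data table. The function name huffman_code_lengths and the choice to track only code lengths (not the actual bit strings) are illustrative assumptions, not part of the SOCR resource.

```python
import heapq

# English column of the data table (values rounded to two decimals).
english = {
    "a": 0.08, "b": 0.01, "c": 0.03, "d": 0.04, "e": 0.13, "f": 0.02,
    "g": 0.02, "h": 0.06, "i": 0.07, "j": 0.00, "k": 0.01, "l": 0.04,
    "m": 0.02, "n": 0.07, "o": 0.08, "p": 0.02, "q": 0.00, "r": 0.06,
    "s": 0.06, "t": 0.09, "u": 0.03, "v": 0.01, "w": 0.02, "x": 0.00,
    "y": 0.02, "z": 0.00,
}

def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code (hypothetical helper)."""
    # Heap entries: (weight, unique tie-breaker, symbols in this subtree).
    heap = [(w, i, [s]) for i, (s, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    lengths = {s: 0 for s in freqs}
    counter = len(heap)
    while len(heap) > 1:
        # Repeatedly merge the two lightest subtrees.
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        # Merging pushes every symbol in both subtrees one level deeper.
        for s in syms1 + syms2:
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, counter, syms1 + syms2))
        counter += 1
    return lengths

lengths = huffman_code_lengths(english)
for letter in sorted(lengths, key=lambda s: (lengths[s], s)):
    print(letter, lengths[letter])
```

As with Morse code, the most frequent letter (e) receives one of the shortest codes, while rare letters such as q and x receive long ones. Because the table values are rounded, the exact lengths should be read as illustrative only.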

Letter frequencies, like word frequencies, tend to vary by writer, subject and language. Accurate average letter frequencies are obtained by analyzing large amounts of representative text.
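As an illustration of how such frequencies can be estimated from text, here is a small Python sketch. The function name letter_frequencies and the sample sentence are assumptions made for this example; a realistic estimate would use a large, representative corpus rather than a single sentence.

```python
from collections import Counter
import string

def letter_frequencies(text):
    """Relative frequency of each letter a-z among the a-z letters in text."""
    # Keep only the 26 basic Latin letters, ignoring case.
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    counts = Counter(letters)
    total = len(letters)
    if total == 0:
        return {ch: 0.0 for ch in string.ascii_lowercase}
    return {ch: counts[ch] / total for ch in string.ascii_lowercase}

# Hypothetical sample text; far too short for a reliable estimate.
sample = "The quick brown fox jumps over the lazy dog"
freqs = letter_frequencies(sample)
for letter, f in sorted(freqs.items(), key=lambda kv: -kv[1])[:5]:
    print(letter, round(f, 2))
```

This sketch divides by the number of a-z letters only, so its frequencies always sum to 1. Accented or other non-basic letters are simply dropped, which may differ from how the table values were computed.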

Sources

Data Table

Letter English French German Spanish Portuguese Esperanto Italian Turkish Swedish Polish Toki_Pona Dutch Average
a 0.08 0.08 0.07 0.13 0.15 0.12 0.12 0.12 0.09 0.08 0.17 0.07 0.11
b 0.01 0.01 0.02 0.01 0.01 0.01 0.01 0.03 0.01 0.01 0.00 0.02 0.01
c 0.03 0.03 0.03 0.05 0.04 0.01 0.05 0.01 0.01 0.04 0.00 0.01 0.03
d 0.04 0.04 0.05 0.06 0.05 0.03 0.04 0.05 0.05 0.03 0.00 0.06 0.04
e 0.13 0.15 0.17 0.14 0.13 0.09 0.12 0.09 0.10 0.07 0.07 0.19 0.12
f 0.02 0.01 0.02 0.01 0.01 0.01 0.01 0.00 0.02 0.00 0.00 0.01 0.01
g 0.02 0.01 0.03 0.01 0.01 0.01 0.02 0.01 0.03 0.01 0.00 0.03 0.02
h 0.06 0.01 0.05 0.01 0.01 0.00 0.02 0.01 0.02 0.01 0.00 0.02 0.02
i 0.07 0.08 0.08 0.06 0.06 0.10 0.11 0.08 0.05 0.07 0.15 0.07 0.08
j 0.00 0.01 0.00 0.00 0.00 0.04 0.00 0.00 0.01 0.02 0.03 0.01 0.01
k 0.01 0.00 0.01 0.00 0.00 0.04 0.00 0.05 0.03 0.03 0.05 0.02 0.02
l 0.04 0.05 0.03 0.05 0.03 0.06 0.07 0.06 0.05 0.03 0.10 0.04 0.05
m 0.02 0.03 0.03 0.03 0.05 0.03 0.03 0.04 0.04 0.02 0.04 0.02 0.03
n 0.07 0.07 0.10 0.07 0.05 0.08 0.07 0.07 0.09 0.05 0.12 0.10 0.08
o 0.08 0.05 0.03 0.09 0.11 0.09 0.10 0.02 0.04 0.07 0.08 0.06 0.07
p 0.02 0.03 0.01 0.03 0.03 0.03 0.03 0.01 0.02 0.02 0.04 0.02 0.02
q 0.00 0.01 0.00 0.01 0.01 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00
r 0.06 0.07 0.07 0.07 0.07 0.06 0.06 0.07 0.08 0.04 0.00 0.06 0.06
s 0.06 0.08 0.07 0.08 0.08 0.06 0.05 0.03 0.06 0.04 0.04 0.04 0.06
t 0.09 0.07 0.06 0.05 0.05 0.05 0.06 0.03 0.09 0.02 0.05 0.07 0.06
u 0.03 0.06 0.04 0.04 0.05 0.03 0.03 0.03 0.02 0.02 0.03 0.02 0.03
v 0.01 0.02 0.01 0.01 0.02 0.02 0.02 0.01 0.02 0.00 0.00 0.03 0.01
w 0.02 0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.04 0.03 0.02 0.01
x 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
y 0.02 0.00 0.00 0.01 0.00 0.00 0.00 0.03 0.01 0.03 0.00 0.00 0.01
z 0.00 0.00 0.01 0.01 0.00 0.01 0.00 0.02 0.00 0.05 0.00 0.01 0.01
total 1.00 0.97 1.00 1.00 1.00 0.98 1.00 0.88 0.94 0.80 1.00 1.00 0.96
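The last column appears to be the arithmetic mean of the twelve language columns. The Python sketch below checks this for three rows copied from the table; the helper name check_row is hypothetical, and the interpretation of the last column as an unweighted mean is an assumption that the page itself does not state.

```python
# Three rows copied verbatim from the data table above.
rows = """\
a 0.08 0.08 0.07 0.13 0.15 0.12 0.12 0.12 0.09 0.08 0.17 0.07 0.11
e 0.13 0.15 0.17 0.14 0.13 0.09 0.12 0.09 0.10 0.07 0.07 0.19 0.12
t 0.09 0.07 0.06 0.05 0.05 0.05 0.06 0.03 0.09 0.02 0.05 0.07 0.06
"""

def check_row(line):
    """Return (letter, recomputed mean rounded to 2 decimals, listed average)."""
    fields = line.split()
    letter = fields[0]
    values = [float(v) for v in fields[1:-1]]  # the 12 language columns
    listed = float(fields[-1])                 # the last (average) column
    mean = sum(values) / len(values)
    return letter, round(mean, 2), listed

results = [check_row(line) for line in rows.splitlines()]
for letter, computed, listed in results:
    print(letter, computed, listed)
```

For these three rows the recomputed mean, rounded to two decimals, agrees with the listed value.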


