SOCR LetterFrequencyData

From SOCR
Revision as of 14:45, 31 May 2010 by IvoDinov (talk | contribs) (Data Description: typo)

SOCR Data - Latin Letters Frequency Distributions in Different Languages

Data Description

The data table below presents the average frequencies of the 26 most common Latin letters in different languages. Letter frequencies in text are studied in cryptography. No exact letter frequency distribution underlies a given language, since all writers write slightly differently. Modern International Morse code encodes the most frequent letters with the shortest symbols: when the Morse alphabet is arranged into groups of letters that require equal amounts of time to transmit and these groups are sorted in increasing order, frequent letters such as e and t appear in the earliest groups. Similar ideas are used in modern data-compression techniques such as Huffman coding.
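To illustrate the link between letter frequency and code length, here is a minimal Python sketch that builds binary Huffman code lengths from the English column of the data table on this page. The function name `huffman_code_lengths` and the tie-breaking order are assumptions made for illustration; because the table values are rounded to two decimals, the resulting lengths are only indicative.

```python
import heapq
from itertools import count

# English column of the data table (values rounded to two decimals).
ENGLISH = {
    "a": 0.08, "b": 0.01, "c": 0.03, "d": 0.04, "e": 0.13, "f": 0.02,
    "g": 0.02, "h": 0.06, "i": 0.07, "j": 0.00, "k": 0.01, "l": 0.04,
    "m": 0.02, "n": 0.07, "o": 0.08, "p": 0.02, "q": 0.00, "r": 0.06,
    "s": 0.06, "t": 0.09, "u": 0.03, "v": 0.01, "w": 0.02, "x": 0.00,
    "y": 0.02, "z": 0.00,
}


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a binary Huffman code."""
    tie = count()  # tie-breaker so the heap never has to compare the lists
    heap = [(weight, next(tie), [symbol]) for symbol, weight in freqs.items()]
    heapq.heapify(heap)
    lengths = {symbol: 0 for symbol in freqs}
    while len(heap) > 1:
        # Merge the two lightest groups into one node of the code tree.
        w1, _, group1 = heapq.heappop(heap)
        w2, _, group2 = heapq.heappop(heap)
        for symbol in group1 + group2:
            lengths[symbol] += 1  # each merge adds one bit to these symbols
        heapq.heappush(heap, (w1 + w2, next(tie), group1 + group2))
    return lengths


lengths = huffman_code_lengths(ENGLISH)
for letter in "etqz":
    print(letter, lengths[letter])
```

Frequent letters such as e end up with shorter codes than rare letters such as q, which mirrors the Morse code design choice described above.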

Letter frequencies, like word frequencies, tend to vary by writer, subject and language. Accurate average letter frequencies are obtained by analyzing large amounts of representative text.
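As a rough sketch of how such frequencies can be estimated from text, the Python snippet below counts the letters a to z in a sample string. The function name `letter_frequencies` and the sample sentence are hypothetical, and the snippet simply ignores every character outside a to z (it has no counterpart of the "Others" row in the table). A single short sentence is far too little text for accurate estimates; a large, representative corpus would be needed.

```python
from collections import Counter
import string


def letter_frequencies(text):
    """Relative frequency of each letter a-z among the a-z letters in text."""
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    counts = Counter(letters)
    total = len(letters)
    if total == 0:
        # No letters at all: report zero for every letter.
        return {ch: 0.0 for ch in string.ascii_lowercase}
    return {ch: counts[ch] / total for ch in string.ascii_lowercase}


sample = "The quick brown fox jumps over the lazy dog"
freqs = letter_frequencies(sample)

# Show the five most frequent letters in the sample.
for letter, freq in sorted(freqs.items(), key=lambda kv: -kv[1])[:5]:
    print(letter, round(freq, 2))
```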

Sources

Data Table

Letter English French German Spanish Portuguese Esperanto Italian Turkish Swedish Polish Toki_Pona Dutch Average
a 0.08 0.08 0.07 0.13 0.15 0.12 0.12 0.12 0.09 0.08 0.17 0.07 0.11
b 0.01 0.01 0.02 0.01 0.01 0.01 0.01 0.03 0.01 0.01 0.00 0.02 0.01
c 0.03 0.03 0.03 0.05 0.04 0.01 0.05 0.01 0.01 0.04 0.00 0.01 0.03
d 0.04 0.04 0.05 0.06 0.05 0.03 0.04 0.05 0.05 0.03 0.00 0.06 0.04
e 0.13 0.15 0.17 0.14 0.13 0.09 0.12 0.09 0.10 0.07 0.07 0.19 0.12
f 0.02 0.01 0.02 0.01 0.01 0.01 0.01 0.00 0.02 0.00 0.00 0.01 0.01
g 0.02 0.01 0.03 0.01 0.01 0.01 0.02 0.01 0.03 0.01 0.00 0.03 0.02
h 0.06 0.01 0.05 0.01 0.01 0.00 0.02 0.01 0.02 0.01 0.00 0.02 0.02
i 0.07 0.08 0.08 0.06 0.06 0.10 0.11 0.08 0.05 0.07 0.15 0.07 0.08
j 0.00 0.01 0.00 0.00 0.00 0.04 0.00 0.00 0.01 0.02 0.03 0.01 0.01
k 0.01 0.00 0.01 0.00 0.00 0.04 0.00 0.05 0.03 0.03 0.05 0.02 0.02
l 0.04 0.05 0.03 0.05 0.03 0.06 0.07 0.06 0.05 0.03 0.10 0.04 0.05
m 0.02 0.03 0.03 0.03 0.05 0.03 0.03 0.04 0.04 0.02 0.04 0.02 0.03
n 0.07 0.07 0.10 0.07 0.05 0.08 0.07 0.07 0.09 0.05 0.12 0.10 0.08
o 0.08 0.05 0.03 0.09 0.11 0.09 0.10 0.02 0.04 0.07 0.08 0.06 0.07
p 0.02 0.03 0.01 0.03 0.03 0.03 0.03 0.01 0.02 0.02 0.04 0.02 0.02
q 0.00 0.01 0.00 0.01 0.01 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00
r 0.06 0.07 0.07 0.07 0.07 0.06 0.06 0.07 0.08 0.04 0.00 0.06 0.06
s 0.06 0.08 0.07 0.08 0.08 0.06 0.05 0.03 0.06 0.04 0.04 0.04 0.06
t 0.09 0.07 0.06 0.05 0.05 0.05 0.06 0.03 0.09 0.02 0.05 0.07 0.06
u 0.03 0.06 0.04 0.04 0.05 0.03 0.03 0.03 0.02 0.02 0.03 0.02 0.03
v 0.01 0.02 0.01 0.01 0.02 0.02 0.02 0.01 0.02 0.00 0.00 0.03 0.01
w 0.02 0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.04 0.03 0.02 0.01
x 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
y 0.02 0.00 0.00 0.01 0.00 0.00 0.00 0.03 0.01 0.03 0.00 0.00 0.01
z 0.00 0.00 0.01 0.01 0.00 0.01 0.00 0.02 0.00 0.05 0.00 0.01 0.01
Others 0.00 0.03 0.00 0.00 0.00 0.02 0.00 0.12 0.06 0.20 0.00 0.00 0.04
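The last column appears to be the arithmetic mean of the twelve language columns, rounded to two decimals. This is an assumption, since the page does not state how the column was computed, but the short Python check below reproduces the tabulated value for three rows copied from the table. The names `ROWS`, `TABULATED` and `row_average` are illustrative only.

```python
from statistics import mean

# Three rows copied from the table: twelve language columns, in table order.
ROWS = {
    "a": [0.08, 0.08, 0.07, 0.13, 0.15, 0.12, 0.12, 0.12, 0.09, 0.08, 0.17, 0.07],
    "e": [0.13, 0.15, 0.17, 0.14, 0.13, 0.09, 0.12, 0.09, 0.10, 0.07, 0.07, 0.19],
    "Others": [0, 0.03, 0, 0, 0, 0.02, 0, 0.12, 0.06, 0.2, 0, 0],
}

# Values of the last column for the same rows, as given in the table.
TABULATED = {"a": 0.11, "e": 0.12, "Others": 0.04}


def row_average(values):
    """Arithmetic mean across languages, rounded to two decimals."""
    return round(mean(values), 2)


for name, values in ROWS.items():
    print(name, row_average(values), TABULATED[name])
```

Because every entry in the table is itself rounded, a mean recomputed this way could in principle differ from the tabulated value in the last digit for some rows.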

Graphs

  • Histogram (HistogramChartDemo7) of the English letters
SOCR Data Dinov EnglishLetterFrequency.png
SOCR Data Dinov EnglishLetterFrequency1.png




