Difference between revisions of "SOCR LetterFrequencyData"

From SOCR

Revision as of 13:49, 31 May 2010

SOCR Data - Latin Letters Frequency Distributions in Different Languages

Data Description

The data table below presents the average frequencies of the 26 most common Latin letters in different languages. Letter frequencies in text are studied in cryptography. The exact letter frequency distribution underlying a given language is unknown and varies over time, since writers all write slightly differently and are influenced by their culture. Modern International Morse code encodes the most frequent letters with the shortest symbols, so arranging the Morse alphabet into groups of letters that require equal amounts of time to transmit, and then sorting these groups in increasing order, roughly orders the letters from most to least frequent. Similar ideas are used in modern data-compression techniques such as Huffman coding.
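The idea of giving frequent letters shorter codes can be sketched with a small Huffman construction. This is only an illustration and not part of the SOCR resource: it assumes the rounded English column of the data table on this page as input, and the name `huffman_code_lengths` is hypothetical. It computes the code length (in bits) of each letter rather than the bit strings themselves.

```python
import heapq
from itertools import count

# Rounded English letter frequencies, copied from the data table on this page.
ENGLISH = {
    "a": 0.08, "b": 0.01, "c": 0.03, "d": 0.04, "e": 0.13, "f": 0.02,
    "g": 0.02, "h": 0.06, "i": 0.07, "j": 0.00, "k": 0.01, "l": 0.04,
    "m": 0.02, "n": 0.07, "o": 0.08, "p": 0.02, "q": 0.00, "r": 0.06,
    "s": 0.06, "t": 0.09, "u": 0.03, "v": 0.01, "w": 0.02, "x": 0.00,
    "y": 0.02, "z": 0.00,
}


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over freqs."""
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(f, next(tiebreak), [sym]) for sym, f in freqs.items()]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in freqs}
    while len(heap) > 1:
        # Merge the two lightest subtrees; every symbol inside them
        # moves one level deeper, so its code grows by one bit.
        f1, _, syms1 = heapq.heappop(heap)
        f2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (f1 + f2, next(tiebreak), syms1 + syms2))
    return lengths


lengths = huffman_code_lengths(ENGLISH)
for letter in sorted(lengths, key=lambda s: (lengths[s], s)):
    print(letter, lengths[letter])
```

With these inputs, a frequent letter such as e should never receive a longer code than a rare letter such as x or z. The exact lengths depend on how ties among the rounded values are broken.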

Letter frequencies, like word frequencies, tend to vary by writer, subject and language. Accurate average letter frequencies are obtained by analyzing large amounts of representative text.
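As a rough sketch of how such frequencies can be estimated from a text sample, the following counts the letters a to z and divides by the total number of letters. The function name `letter_frequencies` and the sample sentence are illustrative assumptions. Unlike the data table, which has an "Others" row, this sketch simply ignores every character outside a to z.

```python
from collections import Counter
import string


def letter_frequencies(text):
    """Relative frequency of each letter a-z in text (other characters ignored)."""
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    total = len(letters)
    if total == 0:
        # No letters at all: report zero for every letter instead of dividing by 0.
        return {ch: 0.0 for ch in string.ascii_lowercase}
    counts = Counter(letters)
    return {ch: counts[ch] / total for ch in string.ascii_lowercase}


# A single sentence is far too small to be representative; it only shows the mechanics.
sample = "The quick brown fox jumps over the lazy dog"
freqs = letter_frequencies(sample)
for letter, f in sorted(freqs.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{letter} {f:.3f}")
```

For estimates comparable to the table, the same function would need to be applied to a large, representative body of text in the language of interest.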

Sources

Data Table

Letter English French German Spanish Portuguese Esperanto Italian Turkish Swedish Polish Toki_Pona Dutch Average
a 0.08 0.08 0.07 0.13 0.15 0.12 0.12 0.12 0.09 0.08 0.17 0.07 0.11
b 0.01 0.01 0.02 0.01 0.01 0.01 0.01 0.03 0.01 0.01 0.00 0.02 0.01
c 0.03 0.03 0.03 0.05 0.04 0.01 0.05 0.01 0.01 0.04 0.00 0.01 0.03
d 0.04 0.04 0.05 0.06 0.05 0.03 0.04 0.05 0.05 0.03 0.00 0.06 0.04
e 0.13 0.15 0.17 0.14 0.13 0.09 0.12 0.09 0.10 0.07 0.07 0.19 0.12
f 0.02 0.01 0.02 0.01 0.01 0.01 0.01 0.00 0.02 0.00 0.00 0.01 0.01
g 0.02 0.01 0.03 0.01 0.01 0.01 0.02 0.01 0.03 0.01 0.00 0.03 0.02
h 0.06 0.01 0.05 0.01 0.01 0.00 0.02 0.01 0.02 0.01 0.00 0.02 0.02
i 0.07 0.08 0.08 0.06 0.06 0.10 0.11 0.08 0.05 0.07 0.15 0.07 0.08
j 0.00 0.01 0.00 0.00 0.00 0.04 0.00 0.00 0.01 0.02 0.03 0.01 0.01
k 0.01 0.00 0.01 0.00 0.00 0.04 0.00 0.05 0.03 0.03 0.05 0.02 0.02
l 0.04 0.05 0.03 0.05 0.03 0.06 0.07 0.06 0.05 0.03 0.10 0.04 0.05
m 0.02 0.03 0.03 0.03 0.05 0.03 0.03 0.04 0.04 0.02 0.04 0.02 0.03
n 0.07 0.07 0.10 0.07 0.05 0.08 0.07 0.07 0.09 0.05 0.12 0.10 0.08
o 0.08 0.05 0.03 0.09 0.11 0.09 0.10 0.02 0.04 0.07 0.08 0.06 0.07
p 0.02 0.03 0.01 0.03 0.03 0.03 0.03 0.01 0.02 0.02 0.04 0.02 0.02
q 0.00 0.01 0.00 0.01 0.01 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00
r 0.06 0.07 0.07 0.07 0.07 0.06 0.06 0.07 0.08 0.04 0.00 0.06 0.06
s 0.06 0.08 0.07 0.08 0.08 0.06 0.05 0.03 0.06 0.04 0.04 0.04 0.06
t 0.09 0.07 0.06 0.05 0.05 0.05 0.06 0.03 0.09 0.02 0.05 0.07 0.06
u 0.03 0.06 0.04 0.04 0.05 0.03 0.03 0.03 0.02 0.02 0.03 0.02 0.03
v 0.01 0.02 0.01 0.01 0.02 0.02 0.02 0.01 0.02 0.00 0.00 0.03 0.01
w 0.02 0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.04 0.03 0.02 0.01
x 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
y 0.02 0.00 0.00 0.01 0.00 0.00 0.00 0.03 0.01 0.03 0.00 0.00 0.01
z 0.00 0.00 0.01 0.01 0.00 0.01 0.00 0.02 0.00 0.05 0.00 0.01 0.01
Others 0.00 0.03 0.00 0.00 0.00 0.02 0.00 0.12 0.06 0.20 0.00 0.00 0.04
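The last column appears to be the plain (unweighted) mean of the twelve language columns. The source does not state this, so it is treated here as an assumption. The sketch below checks it for three rows copied from the table; the helper name `matches_listed_average` is hypothetical.

```python
from statistics import mean

# (twelve language values in table order, listed value in the last column)
ROWS = {
    "a": ([0.08, 0.08, 0.07, 0.13, 0.15, 0.12, 0.12, 0.12, 0.09, 0.08, 0.17, 0.07], 0.11),
    "e": ([0.13, 0.15, 0.17, 0.14, 0.13, 0.09, 0.12, 0.09, 0.10, 0.07, 0.07, 0.19], 0.12),
    "t": ([0.09, 0.07, 0.06, 0.05, 0.05, 0.05, 0.06, 0.03, 0.09, 0.02, 0.05, 0.07], 0.06),
}


def matches_listed_average(values, listed, tolerance=0.005):
    """True if the unweighted mean agrees with the listed value to two decimals."""
    return abs(mean(values) - listed) < tolerance


for letter, (values, listed) in ROWS.items():
    print(letter, f"{mean(values):.4f}", listed, matches_listed_average(values, listed))
```

Because every entry in the table is already rounded to two decimals, agreement within half a unit in the last digit is the most this check can establish.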

Graphs

  • Histogram (HistogramChartDemo7) of the English letters
[Figures: SOCR Data Dinov EnglishLetterFrequency.png, SOCR Data Dinov EnglishLetterFrequency1.png]
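For readers without access to the SOCR charts, a comparable text histogram of the English column can be sketched as follows. This does not reproduce the SOCR HistogramChartDemo7 applet; the function name `text_histogram` and the bar width are illustrative assumptions.

```python
# Rounded English letter frequencies, copied from the data table on this page.
ENGLISH = {
    "a": 0.08, "b": 0.01, "c": 0.03, "d": 0.04, "e": 0.13, "f": 0.02,
    "g": 0.02, "h": 0.06, "i": 0.07, "j": 0.00, "k": 0.01, "l": 0.04,
    "m": 0.02, "n": 0.07, "o": 0.08, "p": 0.02, "q": 0.00, "r": 0.06,
    "s": 0.06, "t": 0.09, "u": 0.03, "v": 0.01, "w": 0.02, "x": 0.00,
    "y": 0.02, "z": 0.00,
}


def text_histogram(freqs, width=50):
    """Return one line per letter, with a bar scaled so the largest value spans width."""
    peak = max(freqs.values())
    lines = []
    for letter, f in freqs.items():
        bar = "#" * round(f / peak * width) if peak > 0 else ""
        lines.append(f"{letter} {f:.2f} {bar}")
    return lines


lines = text_histogram(ENGLISH)
print("\n".join(lines))
```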




