Difference between revisions of "SOCR LetterFrequencyData"
Revision as of 16:34, 1 November 2009
SOCR Data - Letter Frequency Data
Data Description
The data table below presents the average frequencies of the 26 most common Latin letters in different languages. Letter frequencies in text are studied in cryptography. No exact letter frequency distribution underlies a given language, since all writers write slightly differently. Modern International Morse code encodes the most frequent letters with the shortest symbols: arranging the Morse alphabet into groups of letters that require equal amounts of time to transmit, and then sorting these groups in increasing order, gives an ordering that roughly follows letter frequency. Similar ideas are used in modern data-compression techniques such as Huffman coding.
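The idea that frequent letters should receive short codes can be illustrated with a minimal Huffman-coding sketch. It is not part of the SOCR materials: the function name `huffman_code_lengths` is hypothetical, and the six-letter alphabet is an assumption made for brevity, with weights taken from the English column of the data table.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code built from freqs."""
    tiebreak = count()  # unique counter so the heap never has to compare dicts
    heap = [(weight, next(tiebreak), {sym: 0}) for sym, weight in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single symbol still needs one bit.
        return {sym: 1 for sym in freqs}
    while len(heap) > 1:
        # Merge the two lightest subtrees; every symbol inside moves one level deeper.
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {sym: depth + 1 for sym, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


# Illustrative subset of the English column (assumption: only six letters are used).
english = {"e": 0.13, "t": 0.09, "a": 0.08, "h": 0.06, "b": 0.01, "k": 0.01}
lengths = huffman_code_lengths(english)
for letter in sorted(lengths, key=lengths.get):
    print(letter, lengths[letter])
```

In this sketch the frequent letters end up with codes no longer than those of the rare letters, which is the same principle Morse code applies by hand.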
Letter frequencies, like word frequencies, tend to vary by writer, subject and language. Accurate average letter frequencies are obtained by analyzing large amounts of representative text.
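As a rough illustration of how such frequencies might be estimated from text, the following sketch counts the basic letters a to z in a string. The function name `letter_frequencies` and the sample sentence are hypothetical. Ignoring accented and other non-basic letters is a simplifying assumption, and a single sentence is far too short to be representative.

```python
from collections import Counter
import string


def letter_frequencies(text):
    """Relative frequency of each basic Latin letter a-z in text (case-insensitive)."""
    # Assumption: characters outside a-z (digits, punctuation, accented letters) are ignored.
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    counts = Counter(letters)
    total = len(letters)
    if total == 0:
        return {ch: 0.0 for ch in string.ascii_lowercase}
    return {ch: counts[ch] / total for ch in string.ascii_lowercase}


sample = "The quick brown fox jumps over the lazy dog"
freqs = letter_frequencies(sample)
for letter, freq in sorted(freqs.items(), key=lambda item: -item[1])[:5]:
    print(f"{letter}: {freq:.2f}")
```

Applying the same function to a large, representative corpus rather than one sentence is what would make the estimates meaningful.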
Sources
Data Table
Letter | English | French | German | Spanish | Portuguese | Esperanto | Italian | Turkish | Swedish | Polish | Toki Pona | Dutch | Average |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
a | 0.08 | 0.08 | 0.07 | 0.13 | 0.15 | 0.12 | 0.12 | 0.12 | 0.09 | 0.08 | 0.17 | 0.07 | 0.11 |
b | 0.01 | 0.01 | 0.02 | 0.01 | 0.01 | 0.01 | 0.01 | 0.03 | 0.01 | 0.01 | 0.00 | 0.02 | 0.01 |
c | 0.03 | 0.03 | 0.03 | 0.05 | 0.04 | 0.01 | 0.05 | 0.01 | 0.01 | 0.04 | 0.00 | 0.01 | 0.03 |
d | 0.04 | 0.04 | 0.05 | 0.06 | 0.05 | 0.03 | 0.04 | 0.05 | 0.05 | 0.03 | 0.00 | 0.06 | 0.04 |
e | 0.13 | 0.15 | 0.17 | 0.14 | 0.13 | 0.09 | 0.12 | 0.09 | 0.10 | 0.07 | 0.07 | 0.19 | 0.12 |
f | 0.02 | 0.01 | 0.02 | 0.01 | 0.01 | 0.01 | 0.01 | 0.00 | 0.02 | 0.00 | 0.00 | 0.01 | 0.01 |
g | 0.02 | 0.01 | 0.03 | 0.01 | 0.01 | 0.01 | 0.02 | 0.01 | 0.03 | 0.01 | 0.00 | 0.03 | 0.02 |
h | 0.06 | 0.01 | 0.05 | 0.01 | 0.01 | 0.00 | 0.02 | 0.01 | 0.02 | 0.01 | 0.00 | 0.02 | 0.02 |
i | 0.07 | 0.08 | 0.08 | 0.06 | 0.06 | 0.10 | 0.11 | 0.08 | 0.05 | 0.07 | 0.15 | 0.07 | 0.08 |
j | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.04 | 0.00 | 0.00 | 0.01 | 0.02 | 0.03 | 0.01 | 0.01 |
k | 0.01 | 0.00 | 0.01 | 0.00 | 0.00 | 0.04 | 0.00 | 0.05 | 0.03 | 0.03 | 0.05 | 0.02 | 0.02 |
l | 0.04 | 0.05 | 0.03 | 0.05 | 0.03 | 0.06 | 0.07 | 0.06 | 0.05 | 0.03 | 0.10 | 0.04 | 0.05 |
m | 0.02 | 0.03 | 0.03 | 0.03 | 0.05 | 0.03 | 0.03 | 0.04 | 0.04 | 0.02 | 0.04 | 0.02 | 0.03 |
n | 0.07 | 0.07 | 0.10 | 0.07 | 0.05 | 0.08 | 0.07 | 0.07 | 0.09 | 0.05 | 0.12 | 0.10 | 0.08 |
o | 0.08 | 0.05 | 0.03 | 0.09 | 0.11 | 0.09 | 0.10 | 0.02 | 0.04 | 0.07 | 0.08 | 0.06 | 0.07 |
p | 0.02 | 0.03 | 0.01 | 0.03 | 0.03 | 0.03 | 0.03 | 0.01 | 0.02 | 0.02 | 0.04 | 0.02 | 0.02 |
q | 0.00 | 0.01 | 0.00 | 0.01 | 0.01 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
r | 0.06 | 0.07 | 0.07 | 0.07 | 0.07 | 0.06 | 0.06 | 0.07 | 0.08 | 0.04 | 0.00 | 0.06 | 0.06 |
s | 0.06 | 0.08 | 0.07 | 0.08 | 0.08 | 0.06 | 0.05 | 0.03 | 0.06 | 0.04 | 0.04 | 0.04 | 0.06 |
t | 0.09 | 0.07 | 0.06 | 0.05 | 0.05 | 0.05 | 0.06 | 0.03 | 0.09 | 0.02 | 0.05 | 0.07 | 0.06 |
u | 0.03 | 0.06 | 0.04 | 0.04 | 0.05 | 0.03 | 0.03 | 0.03 | 0.02 | 0.02 | 0.03 | 0.02 | 0.03 |
v | 0.01 | 0.02 | 0.01 | 0.01 | 0.02 | 0.02 | 0.02 | 0.01 | 0.02 | 0.00 | 0.00 | 0.03 | 0.01 |
w | 0.02 | 0.00 | 0.02 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.04 | 0.03 | 0.02 | 0.01 |
x | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
y | 0.02 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.03 | 0.01 | 0.03 | 0.00 | 0.00 | 0.01 |
z | 0.00 | 0.00 | 0.01 | 0.01 | 0.00 | 0.01 | 0.00 | 0.02 | 0.00 | 0.05 | 0.00 | 0.01 | 0.01 |
total | 1.00 | 0.97 | 1.00 | 1.00 | 1.00 | 0.98 | 1.00 | 0.88 | 0.94 | 0.80 | 1.00 | 1.00 | 0.96 |
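
The Average column appears to be the arithmetic mean of the twelve language columns, rounded to two decimals. The sketch below checks that reading for two rows copied from the table. The names `rows` and `row_average` are hypothetical, and the interpretation of the column as a plain mean is an assumption rather than something the page states.

```python
# Rows "a" and "e" copied from the table above, in column order:
# English, French, German, Spanish, Portuguese, Esperanto,
# Italian, Turkish, Swedish, Polish, Toki Pona, Dutch.
rows = {
    "a": [0.08, 0.08, 0.07, 0.13, 0.15, 0.12, 0.12, 0.12, 0.09, 0.08, 0.17, 0.07],
    "e": [0.13, 0.15, 0.17, 0.14, 0.13, 0.09, 0.12, 0.09, 0.10, 0.07, 0.07, 0.19],
}


def row_average(values):
    """Arithmetic mean of one letter's frequencies, rounded to two decimals like the table."""
    return round(sum(values) / len(values), 2)


averages = {letter: row_average(values) for letter, values in rows.items()}
print(averages)
```

Because the table entries are themselves rounded to two decimals, a recomputed mean can differ from the published value in the last digit for some rows.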
- SOCR Home page: http://www.socr.ucla.edu