
The Hundredth Post

2016-01-16, post № 100

programming, Pygame, Python, #100, #100th, #char, #char count, #character, #character distribution, #characters, #chars, #data, #data plot, #distribution, #graph, #graphics, #hundredth, #plot, #plotted, #plotted data, #representating, #Zipf, #Zipf curve

The J-Blog has reached its hundredth post: this one. To mark the occasion, I want to review my previous posts. But rather than going through the content of my previous 𝟫𝟫 posts by hand, I decided to write a program to do this for me.

The program goes through a list of posts fetched from my Table of contents page, looks up each post's HTML code, distills the post text and counts how often each character occurs. [1] It then outputs its results as a text-based table (shown below).
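The counting and table-printing steps can be sketched roughly as follows. This is a minimal sketch and not the original the-hundredth-post.py: it assumes the post text has already been distilled from the HTML, the names `count_characters` and `format_table` are hypothetical, and truncating the percentages to three decimals is an assumption made to mimic the figures in the table.

```python
from collections import Counter


def count_characters(text):
    """Count how often each character occurs in the given text."""
    return Counter(text)


def format_table(counts):
    """Render the counts as a text table, most frequent character first."""
    total = sum(counts.values())
    border = "+------+-------+---------+"
    lines = [border, "| char | count | percent |", border]
    lines.append("| all  | %5d | 100.0   |" % total)
    # Sort by descending count; ties are broken by character (an assumption).
    for char, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        # Truncate (rather than round) the percentage to three decimals.
        percent = (100000 * count // total) / 1000
        lines.append("| %-4s | %5d | %7.3f |" % (repr(char), count, percent))
    lines.append(border)
    return "\n".join(lines)


if __name__ == "__main__":
    # Stand-in text; the real program would use the distilled post contents.
    print(format_table(count_characters("hello world")))
```

Using `collections.Counter` keeps the counting to a single call; a plain dictionary with manual increments would work just as well.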

To further visualize the data, I wrote another program to display it graphically. The results look like this: on the left, the characters are sorted alphabetically (ASCII order); on the right, they are sorted by number of occurrences (you may click on the images to view them more closely).

the-hundredth-post_alphabetically.png
the-hundredth-post_highest-count.png
+------+-------+---------+    +------+-------+---------+
| char | count | percent |    | char | count | percent |
+------+-------+---------+    +------+-------+---------+
| all  | 33746 | 100.0   |    | all  | 33746 | 100.0   |
| '!'  |     8 |   0.023 |    | ' '  |  5693 |  16.870 |
| ' '  |  5693 |  16.870 |    | 'e'  |  3323 |   9.847 |
| '%'  |     2 |   0.005 |    | 't'  |  2422 |   7.177 |
| '&'  |     1 |   0.002 |    | 'o'  |  2013 |   5.965 |
| ')'  |   109 |   0.323 |    | 'a'  |  1949 |   5.775 |
| '('  |   109 |   0.323 |    | 'i'  |  1912 |   5.665 |
| '-'  |    45 |   0.133 |    | 'n'  |  1839 |   5.449 |
| ','  |   206 |   0.610 |    | 's'  |  1782 |   5.280 |
| '.'  |   364 |   1.078 |    | 'r'  |  1765 |   5.230 |
| '1'  |    33 |   0.097 |    | 'h'  |  1165 |   3.452 |
| '0'  |    88 |   0.260 |    | 'l'  |  1126 |   3.336 |
| '3'  |    12 |   0.035 |    | 'c'  |  1035 |   3.067 |
| '2'  |    20 |   0.059 |    | 'g'  |   802 |   2.376 |
| '5'  |    11 |   0.032 |    | 'd'  |   799 |   2.367 |
| '4'  |    21 |   0.062 |    | 'm'  |   714 |   2.115 |
| '7'  |     9 |   0.026 |    | 'u'  |   681 |   2.018 |
| '6'  |    14 |   0.041 |    | 'p'  |   656 |   1.943 |
| '9'  |     1 |   0.002 |    | 'y'  |   487 |   1.443 |
| '8'  |     4 |   0.011 |    | 'f'  |   466 |   1.380 |
| ';'  |     1 |   0.002 |    | 'w'  |   437 |   1.294 |
| ':'  |     9 |   0.026 |    | '.'  |   364 |   1.078 |
| '='  |     2 |   0.005 |    | 'b'  |   287 |   0.850 |
| '?'  |     2 |   0.005 |    | 'v'  |   233 |   0.690 |
| 'A'  |    40 |   0.118 |    | 'k'  |   215 |   0.637 |
| 'C'  |    58 |   0.171 |    | ','  |   206 |   0.610 |
| 'B'  |    32 |   0.094 |    | 'I'  |   151 |   0.447 |
| 'E'  |    10 |   0.029 |    | 'T'  |   134 |   0.397 |
| 'D'  |    15 |   0.044 |    | '('  |   109 |   0.323 |
| 'G'  |     8 |   0.023 |    | ')'  |   109 |   0.323 |
| 'F'  |    17 |   0.050 |    | 'x'  |    94 |   0.278 |
| 'I'  |   151 |   0.447 |    | '0'  |    88 |   0.260 |
| 'H'  |    13 |   0.038 |    | 'C'  |    58 |   0.171 |
| 'K'  |     1 |   0.002 |    | 'S'  |    48 |   0.142 |
| 'J'  |     9 |   0.026 |    | '-'  |    45 |   0.133 |
| 'M'  |    20 |   0.059 |    | 'A'  |    40 |   0.118 |
| 'L'  |    27 |   0.080 |    | '1'  |    33 |   0.097 |
| 'O'  |     6 |   0.017 |    | 'U'  |    33 |   0.097 |
| 'N'  |     5 |   0.014 |    | 'B'  |    32 |   0.094 |
| 'Q'  |     1 |   0.002 |    | 'L'  |    27 |   0.080 |
| 'P'  |    19 |   0.056 |    | 'z'  |    26 |   0.077 |
| 'S'  |    48 |   0.142 |    | 'q'  |    24 |   0.071 |
| 'R'  |    19 |   0.056 |    | '4'  |    21 |   0.062 |
| 'U'  |    33 |   0.097 |    | '2'  |    20 |   0.059 |
| 'T'  |   134 |   0.397 |    | 'M'  |    20 |   0.059 |
| 'W'  |    19 |   0.056 |    | 'P'  |    19 |   0.056 |
| 'V'  |     7 |   0.020 |    | 'R'  |    19 |   0.056 |
| 'Y'  |    14 |   0.041 |    | 'W'  |    19 |   0.056 |
| 'X'  |     4 |   0.011 |    | 'F'  |    17 |   0.050 |
| '_'  |     2 |   0.005 |    | 'j'  |    17 |   0.050 |
| 'a'  |  1949 |   5.775 |    | 'D'  |    15 |   0.044 |
| 'c'  |  1035 |   3.067 |    | '6'  |    14 |   0.041 |
| 'b'  |   287 |   0.850 |    | 'Y'  |    14 |   0.041 |
| 'e'  |  3323 |   9.847 |    | 'H'  |    13 |   0.038 |
| 'd'  |   799 |   2.367 |    | '3'  |    12 |   0.035 |
| 'g'  |   802 |   2.376 |    | '5'  |    11 |   0.032 |
| 'f'  |   466 |   1.380 |    | 'E'  |    10 |   0.029 |
| 'i'  |  1912 |   5.665 |    | ':'  |     9 |   0.026 |
| 'h'  |  1165 |   3.452 |    | 'J'  |     9 |   0.026 |
| 'k'  |   215 |   0.637 |    | '7'  |     9 |   0.026 |
| 'j'  |    17 |   0.050 |    | '!'  |     8 |   0.023 |
| 'm'  |   714 |   2.115 |    | 'G'  |     8 |   0.023 |
| 'l'  |  1126 |   3.336 |    | 'V'  |     7 |   0.020 |
| 'o'  |  2013 |   5.965 |    | 'O'  |     6 |   0.017 |
| 'n'  |  1839 |   5.449 |    | 'N'  |     5 |   0.014 |
| 'q'  |    24 |   0.071 |    | '8'  |     4 |   0.011 |
| 'p'  |   656 |   1.943 |    | 'X'  |     4 |   0.011 |
| 's'  |  1782 |   5.280 |    | '%'  |     2 |   0.005 |
| 'r'  |  1765 |   5.230 |    | '='  |     2 |   0.005 |
| 'u'  |   681 |   2.018 |    | '?'  |     2 |   0.005 |
| 't'  |  2422 |   7.177 |    | '_'  |     2 |   0.005 |
| 'w'  |   437 |   1.294 |    | '&'  |     1 |   0.002 |
| 'v'  |   233 |   0.690 |    | '~'  |     1 |   0.002 |
| 'y'  |   487 |   1.443 |    | '9'  |     1 |   0.002 |
| 'x'  |    94 |   0.278 |    | ';'  |     1 |   0.002 |
| 'z'  |    26 |   0.077 |    | 'K'  |     1 |   0.002 |
| '~'  |     1 |   0.002 |    | 'Q'  |     1 |   0.002 |
+------+-------+---------+    +------+-------+---------+
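Both orderings can be derived from the same counts by changing only the sort key. The following is a minimal sketch under assumptions: the counts are held in a plain dictionary, the names `by_character` and `by_count` are hypothetical, and the tie-breaking rule is a choice made here rather than something taken from the original programs. The sample values are a few rows copied from the table.

```python
def by_character(counts):
    """Order (char, count) pairs by code point, which is ASCII order here."""
    return sorted(counts.items())


def by_count(counts):
    """Order (char, count) pairs by descending count, ties by code point."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# A handful of rows taken from the table above.
sample = {"e": 3323, " ": 5693, "t": 2422, "(": 109, ")": 109}

print(by_character(sample))
print(by_count(sample))
```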

The main program (shell-based): the-hundredth-post.py
The visualization program: the-hundredth-post_data-visualization.py

Footnotes

  1. [2020-07-17] Note that the data was compiled around the post’s publication date; whilst moving the blog content over, punctuation and occasionally sentences were changed, leading to a probable discrepancy between the presented data and the current blog’s content.
Jonathan Frech's blog; built 2024/04/13 20:55:09 CEST