Hoisting HTTP headers home

2021-10-02, post № 249

web, #http, #header

While developing my new minimalistic HTTP backend (a web server called vanadium), scouring MDN’s HTTP header documentation and inspecting the headers sent by web servers serving public websites, I first stumbled upon the header “Server” and decided for my server to tell the world its name. Yet most servers send a lot of headers, many of which are non-standard and most of which serve a purpose unclear to me. This discovery made me think: HTTP headers may be the most commonly sent textual data that is virtually invisible to most computer operators, owing to common web browsers’ failure to communicate them. Thus I will in this post shine a light on my findings from examining various web servers’ initial banter.

Interrogating web servers hinges on using a capable client. Every web browser is one, since without the server’s headers it could not fulfill its role as an HTTP client, yet this communication is usually hidden from the user. One client which allows viewing headers is curl, although one has to read its man page thoroughly:

$ curl -fsSLD- -o/dev/null https://...

Above, curl is invoked to fail silently on HTTP errors (-f), be silent about its network progress (-s) yet still Show errors (-S), follow Location redirects (-L) and Dump the HTTP response headers to standard output (-D-, the single dash denoting standard output). Furthermore, the response’s body is output to the null device (-o/dev/null).
Note that web servers may respond differently to different user agents, about which intel is acquired via the HTTP header “User-Agent” sent by the connecting client. One may specifically remove or set this header using $ curl -A '' or the same with a non-empty flag argument. Since I did neither, curl sent its own name together with its version, for me “curl/7.77.0” and “curl/7.74.0”.
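For readability, the same invocation can be spelled with curl's long options and wrapped in a small shell function. This is only a sketch: the function name headers is my own invention for illustration, while the long options are the documented equivalents of the short flags used above.

```shell
# Hypothetical helper: the invocation from above, spelled with curl's long options.
# All arguments (further options and the URL) are passed through unchanged.
headers() {
    curl --fail --silent --show-error --location \
         --dump-header - --output /dev/null "$@"
}
```

With it, $ headers --user-agent '' https://... would request the page without a “User-Agent” header, and a non-empty argument would set the header to that value.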

Amusing HTTP headers

Looking at the sometimes dozens of headers sent by various websites, among the cookies and entity tags one finds sparsely sprinkled innocent items of information. Heise for one, a German publishing house operating “heise.de” and “ct.de”, configured some of their Apache and nginx servers to spew a whopping four non-standard inert HTTP headers at unsuspecting surfers:

$ curl -fsSLD- -o/dev/null heise.de | grep ^X
X-Cobbler: servo65.heise.de
X-Pect: The Spanish Inquisition
X-Clacks-Overhead: GNU Terry Pratchett

Calling those who neglect to be wary of brutal Catholics foolish is wise in any situation, even when serving web pages. Some find referencing an overused popular culture integer to be fulfilling and panicking is seldom advisable. The first header is a total mystery; it seems to have nothing to do with the build and deployment system written in slow snake speak and I neither get the joke nor manage to find anything conclusive about it. The second-from-last header, however, has a real story behind it.
First seeing “X-Clacks-Overhead: GNU Terry Pratchett”, I thought it referenced the hoofed non-Unix animal. Yet on closer inspection, it is a different beast entirely: the three consecutive uppercase letters instruct a fictional network to keep its own author’s name and thereby legacy alive in the real world. [1] [2] [3] However, I fear the commands G, N and U are never truly executed, since HTTP servers generate these bytes, which are then probably not understood by any client and promptly discarded.

Rather than paying tribute to a deceased fantasy writer by sending commands to machines which cannot process them, Automattic appears to employ an unconventional hiring strategy, setting “x-hacker”:

$ curl -fsSLD- -o/dev/null wordpress.com | grep ^x-h
x-hacker: If you're reading this, you should visit automattic.com/jobs and apply to join the fun, mention this header.

Whilst I find it original and somewhat charming, I am doubtful of the effectiveness of this targeted advertising campaign. Though considering the prevalence of WordPress instances, the above message may be one of the most-viewed ads of mankind, if overwhelmingly read by non-humans.
Doing as suggested, one is instructed to go where one already is (“x-hacker” is sent with the same message), yet redirected to “/work-with-us” and faced with a rather quirky header:

$ curl -fsSLD- -o/dev/null http://automattic.com/jobs | grep ^x-n | uniq
x-nananana: Batcache

Most bizarrely, “wordpress.org” seems to reference a character from Disney’s Frozen, yet I can neither confirm this interpretation nor make sense of it.

$ curl -fsSLD- -o/dev/null wordpress.org | grep ^x-o
x-olaf: ⛄
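
The greps in this section match on exact case, which works because I looked at each server's output first. HTTP header names are case-insensitive, though, and the examples above show both “X-Pect” and “x-hacker” in the wild. A minimal sketch of a filter which catches either spelling follows; the name xheaders is made up for illustration. It also strips the carriage returns with which curl's header dump ends each line.

```shell
# Hypothetical filter: from a `curl -D-` dump on standard input, keep only
# header lines whose name starts with "x-", regardless of letter case.
xheaders() {
    tr -d '\r' | grep -i '^x-'
}
```

It would be used as $ curl -fsSLD- -o/dev/null heise.de | xheaders in place of the grep above.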

Functionally intriguing HTTP headers

When I first polled “mit.edu”, I was certain I had found a typo in one of the headers:

$ curl -fsSLD- -o/dev/null mit.edu | grep ^X-C
X-Cnection: close

However, some brief search-engining revealed it to be an F5 custom header [4], very reminiscent of HTTP’s standard “Referer” header spelling oddity [5].

Visiting “gnu.org”, I was surprised to both learn about the existence of and see in action the standard HTTP header “Content-Location”, allowing a web server to inform clients about alternate URLs to the requested resource.

$ curl -fsSLD- -o/dev/null gnu.org | grep ^Content-Lo
Content-Location: home.html

And both “gnu.org” as well as “gnu.org/home.html”, to which “gnu.org/index.html” redirects, serve the GNU homepage.
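Whether two URLs really serve the same page can be checked by comparing their bodies. The following is a rough sketch under assumptions: same_body is a hypothetical helper name, the comparison goes through shell variables (so trailing newlines are ignored and binary content is not handled), and a dynamically generated page may legitimately differ between any two requests.

```shell
# Hypothetical helper: succeeds if both URLs currently serve the same body.
# Returns 2 if either download fails.
same_body() {
    same_body_a=$(curl --fail --silent --show-error --location "$1") || return 2
    same_body_b=$(curl --fail --silent --show-error --location "$2") || return 2
    [ "$same_body_a" = "$same_body_b" ]
}
```

For the case above, one would call $ same_body gnu.org gnu.org/home.html and inspect the exit status.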

Yet not only standardized headers which are used by nigh no one are sent; deprecated non-standard headers are as well: “google.com” defines the following P3P policy:

$ curl -fsSLD- -o/dev/null google.com | grep ^P
P3P: CP="This is not a P3P policy! See g.co/p3phelp for more info."

Which led me to ask what “P3P” is in the first place. And no wonder I had never heard about it, since it had already been retired in late summer of 2018. [6] It seems to have attempted to standardize websites’ privacy policies. A noble goal indeed, if one potentially doomed to be nothing but a pipe dream in the modern usage of the web.
Google’s goals, however, are unclear to me. One may be led to believe that political symbolism is at play. After all, this neither is a P3P policy nor would it make sense to send such a policy in 2021.

Cookie clusters

One of the most infuriating findings is many websites’ handling of cookies. It truly is the worst of both worlds: upon loading, the websites clutter themselves with a cookie banner or lock themselves behind a cookie screen even though cookies have already been set using the “Set-Cookie” header.
But most irritatingly, the Bundesbeauftragte für den Datenschutz und die Informationsfreiheit (BfDI, Germany’s Federal Commissioner for Data Protection and Freedom of Information) operates a web server which sets two cookies via HTTP headers, does not show a cookie banner as far as I can tell and writes in its privacy notice “8. Sonstige Informationen ; Es besteht hinsichtlich der Datenverarbeitung des BfDI kein Beschwerderecht bei einer Aufsichtsbehörde. Eine automatisierte Entscheidungsfindung findet nicht statt.” [7], which translates to “8. Other information; With regard to data processing by the BfDI, there is no right to lodge a complaint with a supervisory authority. No automated decision-making takes place.”
I find the entire situation somewhat shady.

$ curl -fsSLD- -o/dev/null bfdi.bund.de | grep -i cookie
Set-Cookie: AL_SESS-S=AeWgHd5rUVOyjSxVj2CjbYUabfPTQS8Ed_n3FbXJzoAhYqQCn1gJDf91LTcH_AgT0bze; Path=/; Secure; HttpOnly; SameSite=Lax
Set-Cookie: AL_SESS-S=AWtabijCgd7AlQ!92sxZTaxs5Pr_5Clc3Rnn3mlt7Pn3qXTbuqO2i1vExDt4C3qceYd!; Path=/; Secure; HttpOnly; SameSite=Lax
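
To see at a glance which cookies a server tries to set, the names can be pulled out of such a dump. Below is a minimal sketch; cookie_names is a hypothetical helper, and it assumes every “Set-Cookie” line has the usual name=value form. Note that, since curl follows redirects here, the dump may contain one block of headers per response, which could explain the same cookie name appearing twice above; that is my guess, not something I verified.

```shell
# Hypothetical helper: list the distinct cookie names found in the
# Set-Cookie lines of a `curl -D-` dump read from standard input.
cookie_names() {
    tr -d '\r' \
        | grep -i '^set-cookie:' \
        | sed 's/^[^:]*:[[:space:]]*//; s/=.*//' \
        | sort -u
}
```

Usage would be $ curl -fsSLD- -o/dev/null bfdi.bund.de | cookie_names.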

Terse toilers

Lastly, I would like to mention websites run by reserved servers not eager to force their metadata, cookies or jokes onto potential passersby. Of note for their brevity I found “openbsd.org”, “oeis.org” and “fefe.de”. Contrastingly, “ibm.com” sets an obscene number of headers.

$ curl -fsSLD- -o/dev/null www.openbsd.org | sed '1d;$d' | wc -l

$ curl -fsSLD- -o/dev/null oeis.org | sed '1d;$d' | wc -l

$ curl -fsSLD- -o/dev/null www.fefe.de | sed '1d;$d' | wc -l

$ curl -fsSLD- -o/dev/null https://www.ibm.com/de-de | sed '1d;$d' | wc -l
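
A caveat about the counting above: sed '1d;$d' drops the first line (the status line) and the last line (the empty line ending the headers), but since curl follows redirects, the dump can contain several responses, whose status lines and separators are then counted too. The following sketch counts only the headers of the final response; count_final_headers is a name I made up for illustration.

```shell
# Hypothetical helper: count the header lines of the last response
# in a `curl -D-` dump read from standard input.
count_final_headers() {
    tr -d '\r' | awk '
        /^HTTP\// { n = 0; next }   # a status line starts a new response
        NF        { n++ }           # every other non-empty line is a header
        END       { print n + 0 }
    '
}
```

It would replace the sed and wc stages, as in $ curl -fsSLD- -o/dev/null oeis.org | count_final_headers.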

Closing thoughts

I am routinely fascinated by what RFCs have defined in the HTTP standard over the years: features I never knew existed or which nearly no browser supports. Yet as I started to look at running servers’ responses, I saw a whole parallel world of non-standard headers, with their meaning and aspirations as inspiring as they are wacky and waning. A whole world seen by only the fewest of people, with you now amongst them.

Jonathan Frech's blog; built 2024/05/27 06:43:58 CEST