{"id":126,"date":"2007-08-17T09:54:31","date_gmt":"2007-08-17T13:54:31","guid":{"rendered":"http:\/\/www.eyoungwon.com\/journal\/?p=126"},"modified":"2007-08-17T21:46:08","modified_gmt":"2007-08-18T01:46:08","slug":"caching-tutorial-web","status":"publish","type":"post","link":"https:\/\/eyoungwon.com\/journal\/caching-tutorial-web\/","title":{"rendered":"Caching Tutorial &#8211; Web"},"content":{"rendered":"<h2><a title=\"DEFINITION\" name=\"DEFINITION\" id=\"DEFINITION\"><\/a>What\u2019s a Web Cache? Why do people use them?<\/h2>\n<p>A <em>Web cache<\/em> sits between one or more Web servers (also known as <em>origin servers<\/em>) and a client or many clients, and watches requests come by, saving copies of the responses \u2014 like HTML pages, images and files (collectively known as <em>representations<\/em>) \u2014 for itself. Then, if there is another request for the same URL, it can use the response that it has, instead of asking the origin server for it again.<\/p>\n<p>There are two main reasons that Web caches are used:<\/p>\n<ul>\n<li>To <strong>reduce latency<\/strong> \u2014 Because the request is satisfied from the cache (which is closer to the client) instead of the origin server, it takes less time for it to get the representation and display it. This makes the Web seem more responsive.<\/li>\n<li>To <strong>reduce network traffic<\/strong> \u2014 Because representations are reused, it reduces the amount of bandwidth used by a client. 
This saves money if the client is paying for traffic, and keeps their bandwidth requirements lower and more manageable.<\/li>\n<\/ul>\n<p><!--more--><\/p>\n<h2><a title=\"KINDS\" name=\"KINDS\" id=\"KINDS\"><\/a>Kinds of Web Caches<\/h2>\n<h3><a title=\"BROWSER\" name=\"BROWSER\" id=\"BROWSER\"><\/a>Browser Caches<\/h3>\n<p>If you examine the preferences dialog of any modern Web browser (like Internet Explorer, Safari or Mozilla), you\u2019ll probably notice a \u201ccache\u201d setting. This lets you set aside a section of your computer\u2019s hard disk to store representations that you\u2019ve seen, just for you. The browser cache works according to fairly simple rules. It will check to make sure that the representations are fresh, usually once a session (that is, once in the current invocation of the browser).<\/p>\n<p>This cache is especially useful when users hit the \u201cback\u201d button or click a link to see a page they\u2019ve just looked at. Also, if you use the same navigation images throughout your site, they\u2019ll be served from browsers\u2019 caches almost instantaneously.<\/p>\n<h3><a title=\"PROXY\" name=\"PROXY\" id=\"PROXY\"><\/a>Proxy Caches<\/h3>\n<p>Web proxy caches work on the same principle, but on a much larger scale. 
Proxies serve hundreds or thousands of users in the same way; large corporations and ISPs often set them up on their firewalls, or as standalone devices (also known as <em>intermediaries<\/em>).<\/p>\n<p>Because proxy caches aren\u2019t part of the client or the origin server, but instead are out on the network, requests have to be routed to them somehow. One way to do this is to use your browser\u2019s proxy setting to manually tell it what proxy to use; another is using interception. <em>Interception proxies<\/em> have Web requests redirected to them by the underlying network itself, so that clients don\u2019t need to be configured for them, or even know about them.<\/p>\n<p>Proxy caches are a type of <em>shared cache<\/em>; rather than just having one person using them, they usually have a large number of users, and because of this they are very good at reducing latency and network traffic. That\u2019s because popular representations are reused a number of times.<\/p>\n<h3><a title=\"GATEWAY\" name=\"GATEWAY\" id=\"GATEWAY\"><\/a>Gateway Caches<\/h3>\n<p>Also known as \u201creverse proxy caches\u201d or \u201csurrogate caches,\u201d gateway caches are also intermediaries, but instead of being deployed by network administrators to save bandwidth, they\u2019re typically deployed by Webmasters themselves, to make their sites more scalable, reliable and better performing.<\/p>\n<p>Requests can be routed to gateway caches by a number of methods, but typically some form of load balancer is used to make one or more of them look like the origin server to clients.<\/p>\n<p><em>Content delivery networks<\/em> (CDNs) distribute gateway caches throughout the Internet (or a part of it) and sell caching to interested Web sites. 
<a href=\"http:\/\/www.speedera.com\/\" class=\"offsite\">Speedera<\/a> and <a href=\"http:\/\/www.akamai.com\/\" class=\"offsite\">Akamai<\/a> are examples of CDNs.<\/p>\n<p>This tutorial focuses mostly on browser and proxy caches, although some of the information is suitable for those interested in gateway caches as well.<\/p>\n<h2><a title=\"WHY\" name=\"WHY\" id=\"WHY\"><\/a>Aren\u2019t Web Caches bad for me? Why should I help them?<\/h2>\n<p>Web caching is one of the most misunderstood technologies on the Internet. Webmasters in particular fear losing control of their site, because a proxy cache can \u201chide\u201d their users from them, making it difficult to see who\u2019s using the site.<\/p>\n<p>Unfortunately for them, even if Web caches didn\u2019t exist, there are too many variables on the Internet to assure that they\u2019ll be able to get an accurate picture of how users see their site. If this is a big concern for you, this tutorial will teach you how to get the statistics you need without making your site cache-unfriendly.<\/p>\n<p>Another concern is that caches can serve content that is out of date, or <em>stale<\/em>. However, this tutorial can show you how to configure your server to control how your content is cached.<\/p>\n<p class=\"callout right\"><acronym title=\"Content Delivery Networks\">CDNs<\/acronym> are an interesting development, because unlike many proxy caches, their gateway caches are aligned with the interests of the Web site being cached, so that these problems aren\u2019t seen. However, even when you use a CDN, you still have to consider that there will be proxy and browser caches downstream.<\/p>\n<p>On the other hand, if you plan your site well, caches can help your Web site load faster, and save load on your server and Internet link. The difference can be dramatic; a site that is difficult to cache may take several seconds to load, while one that takes advantage of caching can seem instantaneous in comparison. 
Users will appreciate a fast-loading site, and will visit more often.<\/p>\n<p>Think of it this way; many large Internet companies are spending millions of dollars setting up farms of servers around the world to replicate their content, in order to make it as fast to access as possible for their users. Caches do the same for you, and they\u2019re even closer to the end user. Best of all, you don\u2019t have to pay for them.<\/p>\n<p>The fact is that proxy and browser caches will be used whether you like it or not. If you don\u2019t configure your site to be cached correctly, it will be cached using whatever defaults the cache\u2019s administrator decides upon.<\/p>\n<h2><a title=\"WORK\" name=\"WORK\" id=\"WORK\"><\/a>How Web Caches Work<\/h2>\n<p>All caches have a set of rules that they use to determine when to serve a representation from the cache, if it\u2019s available. Some of these rules are set in the protocols (HTTP 1.0 and 1.1), and some are set by the administrator of the cache (either the user of the browser cache, or the proxy administrator).<\/p>\n<p>Generally speaking, these are the most common rules that are followed (don\u2019t worry if you don\u2019t understand the details, it will be explained below):<\/p>\n<p class=\"ol\">&nbsp;<\/p>\n<ol>\n<li>If the response\u2019s headers tell the cache not to keep it, it won\u2019t.<\/li>\n<li>If the request is authenticated or secure, it won\u2019t be cached.<\/li>\n<li>If no validator (an <code>ETag<\/code> or <code>Last-Modified<\/code> header) is present on a response, <em>and<\/em> it doesn&#8217;t have any explicit freshness information, it will be considered uncacheable.<\/li>\n<li>A cached representation is considered <em>fresh<\/em> (that is, able to be sent to a client without checking with the origin server) if:\n<ul>\n<li>It has an expiry time or other age-controlling header set, and is still within the fresh period.<\/li>\n<li>If a browser cache has already seen the representation, and has been 
set to check once a session.<\/li>\n<li>If a proxy cache has seen the representation recently, and it was modified relatively long ago.<\/li>\n<\/ul>\n<p>Fresh representations are served directly from the cache, without checking with the origin server.<\/p><\/li>\n<li>If a representation is stale, the origin server will be asked to <em>validate<\/em> it, or tell the cache whether the copy that it has is still good.<\/li>\n<\/ol>\n<p>Together, <em>freshness<\/em> and <em>validation<\/em> are the most important ways that a cache works with content. A fresh representation will be available instantly from the cache, while a validated representation will avoid sending the entire representation over again if it hasn\u2019t changed.<\/p>\n<h2><a title=\"CONTROL\" name=\"CONTROL\" id=\"CONTROL\"><\/a>How (and how not) to Control Caches<\/h2>\n<p>There are several tools that Web designers and Webmasters can use to fine-tune how caches will treat their sites. It may require getting your hands a little dirty with your server\u2019s configuration, but the results are worth it. For details on how to use these tools with your server, see the <a href=\"http:\/\/www.mnot.net\/cache_docs\/#IMP-SERVER\">Implementation<\/a> sections below.<\/p>\n<h3><a title=\"META\" name=\"META\" id=\"META\"><\/a>HTML Meta Tags and HTTP Headers<\/h3>\n<p>HTML authors can put tags in a document\u2019s &lt;HEAD&gt; section that describe its attributes. These <em>meta tags<\/em> are often used in the belief that they can mark a document as uncacheable, or expire it at a certain time.<\/p>\n<p>Meta tags are easy to use, but aren\u2019t very effective. That\u2019s because they\u2019re only honored by a few browser caches (which actually read the HTML), not proxy caches (which almost never read the HTML in the document). 
While it may be tempting to put a Pragma: no-cache meta tag into a Web page, it won\u2019t necessarily cause it to be kept fresh.<\/p>\n<p class=\"callout right\">If your site is hosted at an ISP or hosting farm and they don\u2019t give you the ability to set arbitrary HTTP headers (like <code>Expires<\/code> and <code>Cache-Control<\/code>), complain loudly; these are tools necessary for doing your job.<\/p>\n<p>On the other hand, true <em>HTTP headers<\/em> give you a lot of control over how both browser caches and proxies handle your representations. They can\u2019t be seen in the HTML, and are usually automatically generated by the Web server. However, you can control them to some degree, depending on the server you use. In the following sections, you\u2019ll see what HTTP headers are interesting, and how to apply them to your site.<\/p>\n<p>HTTP headers are sent by the server before the HTML, and only seen by the browser and any intermediate caches. Typical HTTP 1.1 response headers might look like this:<\/p>\n<pre class=\"example\">HTTP\/1.1 200 OK\nDate: Fri, 30 Oct 1998 13:19:41 GMT\nServer: Apache\/1.3.3 (Unix)\nCache-Control: max-age=3600, must-revalidate\nExpires: Fri, 30 Oct 1998 14:19:41 GMT\nLast-Modified: Mon, 29 Jun 1998 02:28:12 GMT\nETag: \"3e86-410-3596fbbc\"\nContent-Length: 1040\nContent-Type: text\/html<\/pre>\n<p>The HTML would follow these headers, separated by a blank line. See the <a href=\"http:\/\/www.mnot.net\/cache_docs\/#IMP-SERVER\">Implementation<\/a> sections for information about how to set HTTP headers.<\/p>\n<h3><a title=\"PRAGMA\" name=\"PRAGMA\" id=\"PRAGMA\"><\/a>Pragma HTTP Headers (and why they don\u2019t work)<\/h3>\n<p>Many people believe that assigning a <code>Pragma: no-cache<\/code> HTTP header to a representation will make it uncacheable. 
This is not necessarily true; the HTTP specification does not set any guidelines for Pragma response headers; instead, Pragma request headers (the headers that a browser sends to a server) are discussed. Although a few caches may honor this header, the majority won\u2019t, and it won\u2019t have any effect. Use the headers below instead.<\/p>\n<h3><a title=\"EXPIRES\" name=\"EXPIRES\" id=\"EXPIRES\"><\/a>Controlling Freshness with the Expires HTTP Header<\/h3>\n<p>The <code>Expires<\/code> HTTP header is a basic means of controlling caches; it tells all caches how long the associated representation is fresh for. After that time, caches will always check back with the origin server to see if a document has changed. <code>Expires<\/code> headers are supported by practically every cache.<\/p>\n<p>Most Web servers allow you to set <code>Expires<\/code> response headers in a number of ways. Commonly, they will allow setting an absolute time to expire, a time based on the last time that the client saw the representation (last <em>access time<\/em>), or a time based on the last time the document changed on your server (last <em>modification time<\/em>).<\/p>\n<p><code>Expires<\/code> headers are especially good for making static images (like navigation bars and buttons) cacheable. Because they don\u2019t change much, you can set extremely long expiry times on them, making your site appear much more responsive to your users. They\u2019re also useful for controlling caching of a page that is regularly changed. For instance, if you update a news page once a day at 6am, you can set the representation to expire at that time, so caches will know when to get a fresh copy, without users having to hit \u2018reload\u2019.<\/p>\n<p>The <strong>only<\/strong> value valid in an <code>Expires<\/code> header is an HTTP date; anything else will most likely be interpreted as \u2018in the past\u2019, so that the representation is uncacheable. 
Also, remember that the time in an HTTP date is Greenwich Mean Time (GMT), not local time.<\/p>\n<p>For example:<\/p>\n<pre><span class=\"example\">Expires: Fri, 30 Oct 1998 14:19:41 GMT<\/span><\/pre>\n<p class=\"callout right\">It\u2019s important to make sure that your Web server\u2019s clock is accurate if you use the <code>Expires<\/code> header. One way to do this is using the <a href=\"http:\/\/www.ntp.org\/\" class=\"offsite\">Network Time Protocol<\/a> (NTP); talk to your local system administrator to find out more.<\/p>\n<p>Although the <code>Expires<\/code> header is useful, it has some limitations. First, because there\u2019s a date involved, the clocks on the Web server and the cache must be synchronised; if they have a different idea of the time, the intended results won\u2019t be achieved, and caches might wrongly consider stale content as fresh.<\/p>\n<p>Another problem with <code>Expires<\/code> is that it\u2019s easy to forget that you\u2019ve set some content to expire at a particular time. If you don\u2019t update an <code>Expires<\/code> time before it passes, each and every request will go back to your Web server, increasing load and latency.<\/p>\n<h3><a title=\"CACHE-CONTROL\" name=\"CACHE-CONTROL\" id=\"CACHE-CONTROL\"><\/a>Cache-Control HTTP Headers<\/h3>\n<p>HTTP 1.1 introduced a new class of headers, <code>Cache-Control<\/code> response headers, to give Web publishers more control over their content, and to address the limitations of <code>Expires<\/code>.<\/p>\n<p>Useful <code>Cache-Control<\/code> response headers include:<\/p>\n<ul>\n<li><strong><code>max-age=<\/code><\/strong>[seconds] \u2014 specifies the maximum amount of time that a representation will be considered fresh. Similar to <code>Expires<\/code>, except that this directive is relative to the time of the request, rather than absolute. 
[seconds] is the number of seconds from the time of the request you wish the representation to be fresh for.<\/li>\n<li><strong><code>s-maxage=<\/code><\/strong>[seconds] \u2014 similar to <code>max-age<\/code>, except that it only applies to shared (e.g., proxy) caches.<\/li>\n<li><strong><code>public<\/code><\/strong> \u2014 marks authenticated responses as cacheable; normally, if HTTP authentication is required, responses are automatically uncacheable.<\/li>\n<li><strong><code>no-cache<\/code><\/strong> \u2014 forces caches to submit the request to the origin server for validation before releasing a cached copy, every time. This is useful to assure that authentication is respected (in combination with public), or to maintain rigid freshness, without sacrificing all of the benefits of caching.<\/li>\n<li><strong><code>no-store<\/code><\/strong> \u2014 instructs caches not to keep a copy of the representation under any conditions.<\/li>\n<li><strong><code>must-revalidate<\/code><\/strong> \u2014 tells caches that they must obey any freshness information you give them about a representation. HTTP allows caches to serve stale representations under special conditions; by specifying this header, you\u2019re telling the cache that you want it to strictly follow your rules.<\/li>\n<li><strong><code>proxy-revalidate<\/code><\/strong> \u2014 similar to <code>must-revalidate<\/code>, except that it only applies to proxy caches.<\/li>\n<\/ul>\n<p><a href=\"http:\/\/www.mnot.net\/cache_docs\/\" target=\"_blank\" title=\"Caching Tutorial\">Read More<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>What\u2019s a Web Cache? Why do people use them? A Web cache sits between one or more Web servers (also known as origin servers) and a client or many clients, and watches requests come by, saving copies of the responses \u2014 like HTML pages, images and files (collectively known as representations) \u2014 for itself. 
Then, [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-126","post","type-post","status-publish","format-standard","hentry","category-web-development"],"_links":{"self":[{"href":"https:\/\/eyoungwon.com\/journal\/wp-json\/wp\/v2\/posts\/126","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/eyoungwon.com\/journal\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/eyoungwon.com\/journal\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/eyoungwon.com\/journal\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/eyoungwon.com\/journal\/wp-json\/wp\/v2\/comments?post=126"}],"version-history":[{"count":0,"href":"https:\/\/eyoungwon.com\/journal\/wp-json\/wp\/v2\/posts\/126\/revisions"}],"wp:attachment":[{"href":"https:\/\/eyoungwon.com\/journal\/wp-json\/wp\/v2\/media?parent=126"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/eyoungwon.com\/journal\/wp-json\/wp\/v2\/categories?post=126"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/eyoungwon.com\/journal\/wp-json\/wp\/v2\/tags?post=126"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}