From: Stephane Bortzmeyer <bortzmeyer@pasteur.fr>
To: renater-cache@univ-rennes1.fr
Cc: cubaud@cnam.fr, courtois@cnam.fr
Subject: A bit of theory and some measurements
Date: Wed, 29 Nov 95 10:44:00 +0100
Message-Id: <199511290944.KAA25655@josephine.sis.pasteur.fr>

On the production cache server at Pasteur, a Harvest 1.3 with neither
parent nor neighbor, we observe over the last three days an average of
5000 requests per day and a "hit" ratio of 37 %. (Could everyone post
their own figures for comparison?)

Let us go over a bit of cache theory again.
If t is the service time on a "hit" and T the service time of a direct
request that does not use the cache (T is roughly constant for RAM
accesses, for example, but not on the Internet), then t+T is the service
time on a "miss" (barring parallelism).

The average service time with the cache is therefore t+(T*M), where M
is the fraction of "misses". Here, for various T, is the miss percentage
M at which the cache breaks even (it pays off for any lower M):

T          M
---------------
2*t        50 %
3*t        67 %
10*t       90 %

(Note that I have defined profitability in terms of the average service
time. Users may also care about the best and worst times.)

If we take a miss percentage of 63 %, as at Pasteur, we see that the
cache pays off if T > 2.7 * t, which is certainly true for any server
outside Renater.

I made some measurements (TCP echo of 2000 bytes, hence a workload that
resembles serving an HTML page) which indicate that, at present, the
median service time is:

Localhost :  0.10 s
Renater :    0.13 s
Europe :     0.71 s
World :      4.03 s

That is, T = 5.5 * t between Renater and Europe and T = 31 * t between
Renater and the rest of the world! The cache is therefore amply
justified.

Annex 1:

For lovers of "one-liners" (percentage of hits among hits plus misses):

	echo " scale = 1; \
		(100 * `grep TCP_HIT cache.access.log | wc -l `) \
		/ (`grep TCP_HIT cache.access.log | wc -l ` \
		   + `grep TCP_MISS cache.access.log | wc -l `) " | \
	bc

Annex 2: the proofs

To find M from r (r = T/t), start from:

t+(T*M) < T
  =>  T*M < T-t
  =>  M < (T-t)/T

If T = r*t  =>  M < (r-1)/r

Or, to find r from M:

t+(T*M) < T
  =>  (T*M) - T < -t
  =>  T * (M-1) < -t
  =>  T > -t/(M-1)     (dividing by M-1, which is negative, reverses
                        the inequality)
  =>  T > t/(1-M)
  =>  r > 1/(1-M)

Annex 3: measuring TCP service time

Measuring "realistic" response times on the Internet

We see quite often in the media and in advertising (synonym?)
the idea that the Internet "knows no borders", "the world is available
at a mouse click", "you get from Tokyo to Paris at once", which clearly
does not correspond to the daily experience of Net users. I wanted to
know more precisely how big the real delays on the Net are.

Measurements of response times on the Internet seem (does anyone have a
good set of references?) to be done most of the time with tools like
ping or traceroute, which involve only low-level, "kernel" functions of
Unix machines. Trying to deduce the response time of, say, a Web server
from these measurements is quite difficult. For instance, if a Unix
machine is paging heavily, ping response times will still be fair while
Web clients will suffer. ping is fine if you're interested in the
network, for instance if you plan line extensions. But it's not what the
user wants to know.

The (modest) experiment I'm currently performing uses the echoping
program <ftp://ftp.pasteur.fr/pub/Network/echoping>, which transfers a
given amount of data over a TCP connection. Of course, it's not a real
Web or FTP transfer, for many reasons, and my experiment is therefore
not as realistic as it could be.

More precisely, the "modus operandi" is to open one connection to the
"echo" port, send 2000 bytes and wait for the response; this is repeated
five times at four-second intervals, every hour, for several days. The
median (not the average) of the five tests is stored.

The tests are conducted from a Unix workstation at the Pasteur
Institute, connected to the French research and education network,
Renater, itself connected to the Ebone network, whose transatlantic line
is completely overloaded at the time of the tests.

Of course, permission was requested from the postmaster of each target
domain. Thanks to those who accepted.
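
The procedure described above (one TCP echo exchange, repeated five
times, median kept) could be sketched roughly as follows. This is not
the echoping source, only an illustration in Python; the function names,
the timeout and the choice to include connection setup in the measured
time are assumptions. Port 7 is the standard TCP echo port.

```python
import socket
import statistics
import time


def echo_probe(host, port=7, size=2000, timeout=30.0):
    """Time one TCP echo exchange: connect, send `size` bytes, read them back.

    Assumption: the measured time includes connection setup.
    """
    payload = b"x" * size
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        received = 0
        while received < size:
            chunk = sock.recv(4096)
            if not chunk:
                raise ConnectionError("connection closed before full echo")
            received += len(chunk)
    return time.perf_counter() - start


def median_probe(probe, repeats=5, interval=4.0):
    """Call `probe` `repeats` times, `interval` seconds apart; return the median."""
    times = []
    for i in range(repeats):
        if i:
            time.sleep(interval)  # pause between probes, not before the first
        times.append(probe())
    return statistics.median(times)
```

A hypothetical hourly job would then store the value of
median_probe(lambda: echo_probe("echo.example.org")), where the host
name is a placeholder. Keeping the median rather than the mean means
that a single stalled probe out of five does not distort the stored
value.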
Here are the Web servers used:

localhost (for calibration, the test station being used for daily tasks
as well)

Renater :

www.obspm.fr
www.cnusc.fr
www.urec.fr
www.lirmm.fr
www.ircam.fr
www.cnam.fr

Europe, including French servers not on Renater :

www.fuw.edu.pl
www.pipex.net
www.ilog.fr
www.rain.fr
www.iway.fr
www.culture.fr
www.cs.hut.fi
www.Uni-Koeln.DE
www.UU.SE

World :

www.nc.u-tokyo.ac.jp
www.kaist.ac.kr
www.ibm.com
www.arl.mil
www.mit.csu.edu.au

First results, after a two-week run.

First, to put an end to the legend of "the Internet: everywhere in the
world at a glance", here are the typical times for a connection (there
are between 40 and 200 measurements per line; the median of all these
measurements is shown here):

localhost :             0.099253 s
www.obspm.fr :          0.10747 s
www.cnusc.fr :          0.121875 s
www.cnam.fr :           0.130918 s
www.lirmm.fr :          0.135402 s
www.urec.fr :           0.183676 s
www.rain.fr :           0.3830335 s
www.iway.fr :           0.385808 s
www.UU.SE :             0.395647 s
www.culture.fr :        0.403446 s
www.cs.hut.fi :         0.5642985 s
www.Uni-Koeln.DE :      0.775515 s
www.ilog.fr :           1.408909 s
www.pipex.net :         1.5268515 s
www.arl.mil :           1.915163 s
www.ibm.com :           2.7588775 s
www.nc.u-tokyo.ac.jp :  3.020771 s
www.fuw.edu.pl :        5.059695 s
www.mit.csu.edu.au :    7.1465595 s
www.kaist.ac.kr :       14.334049 s

As you can see, distance (connectivity distance, not geographical
distance) matters.

If we group the servers (after all, a particular server can be close
but still be a lousy or overloaded machine):

Localhost :  0.099253 s
Renater :    0.132705 s
Europe :     0.7053655 s
World :      4.030711 s

On Renater, the delay is almost entirely due to the machines: localhost
and a remote server reply almost equally fast.

(In the context of the current Internet, "World" means for us "go to the
USA first", so the load of the transatlantic line affects every
non-European server.)

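
To relate these grouped medians to the cache break-even rule derived
earlier in this message (M < (r-1)/r with r = T/t), here is a small
sketch in Python. It assumes, as the message does, that the Renater
median stands for t (a hit served by a nearby cache) and the Europe or
World median for T; the function and variable names are illustrative.
With the unrounded medians r comes out slightly lower than the 5.5 and
31 quoted earlier, which were computed from the rounded values 0.13,
0.71 and 4.03.

```python
def break_even_miss_ratio(r):
    """Largest miss fraction M for which t + T*M < T still holds, with r = T/t."""
    if r <= 1:
        raise ValueError("the cache can only pay off when T > t")
    return (r - 1) / r


# Grouped medians in seconds, as listed above.
grouped_medians = {"Renater": 0.132705, "Europe": 0.7053655, "World": 4.030711}

t = grouped_medians["Renater"]  # assumption: hit time ~ Renater response time
for zone in ("Europe", "World"):
    r = grouped_medians[zone] / t
    threshold = 100 * break_even_miss_ratio(r)
    print(f"{zone}: r = {r:.1f}, cache pays off while misses < {threshold:.0f} %")
```

Both thresholds are well above the 63 % miss rate observed at Pasteur,
which is the sense in which the cache is justified for traffic leaving
Renater.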
If we break the results down by time of day, we see that response times
are almost constant for localhost or Renater sites but vary for the
others:

------Europe
00 : 0.587718 s
01 : 0.594954 s
02 : 0.59318 s
03 : 0.577895 s
04 : 0.580328 s
05 : 0.404226 s
06 : 0.559920 s
07 : 0.568026 s
08 : 0.534300 s
09 : 0.684192 s
10 : 0.6020565 s
11 : 0.832936 s
12 : 0.722441 s
13 : 1.075504 s
14 : 0.970610 s
15 : 1.076384 s
16 : 0.861313 s
17 : 1.1373505 s
18 : 0.810293 s
19 : 1.242318 s
20 : 0.773346 s
21 : 0.772900 s
22 : 0.642569 s
23 : 0.666611 s

------World
00 : 4.041986 s
01 : 2.112686 s
02 : 2.709370 s
03 : 2.015590 s
04 : 2.061537 s
05 : 2.006816 s
06 : 1.779218 s
07 : 2.0697335 s
08 : 2.327038 s
09 : 3.2259855 s
10 : 5.637663 s
11 : 5.2748205 s
12 : 4.413374 s
13 : 9.117312 s
14 : 6.403580 s
15 : 6.589361 s
16 : 8.1418555 s
17 : 8.0477195 s
18 : 6.570412 s
19 : 6.779137 s
20 : 3.454967 s
21 : 3.507964 s
22 : 5.475293 s
23 : 4.270582 s

There is a clear increase in response time after 10 h and until 20-21 h.
The Internet never sleeps but our daily cycle is clearly visible.

(GIF or PostScript available on request.)

Stephane Bortzmeyer           Institut Pasteur
bortzmeyer@pasteur.fr         Service d'Informatique Scientifique
                              Paris, France
+33 1 40 61 34 62
http://web.cnam.fr/personnes/bortzmeyer/home_page.dom
