<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-4391686554026969818</id><updated>2011-11-27T17:12:11.406-08:00</updated><title type='text'>Koder3's Technology Forum</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://koder3.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4391686554026969818/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://koder3.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Koder3</name><uri>http://www.blogger.com/profile/03333594866588263978</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>3</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-4391686554026969818.post-9182841285697893152</id><published>2009-09-23T20:08:00.000-07:00</published><updated>2009-09-23T22:54:54.876-07:00</updated><title type='text'>Paper Review - Binary Translation Using Peephole Superoptimizers</title><content type='html'>This post is a review of a paper that was published in the Operating Systems Design and Implementation conference (OSDI) 2008:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Binary Translation Using Peephole Superoptimizers&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Sorav Bansal and Alex Aiken, Stanford University&lt;/span&gt;&lt;br /&gt;URL: &lt;a href="http://www.usenix.org/events/osdi08/tech/full_papers/bansal/bansal.pdf"&gt;http://www.usenix.org/events/osdi08/tech/full_papers/bansal/bansal.pdf&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;This is one of my new favorite papers from OSDI. They propose a novel application of techniques found in superoptimizers: automatic learning of translation rules that can be used to construct binary translators. They provide a method of determining optimal register mappings between architectures, and implement their approach. They use the implementation to generate translation rules, thereby constructing a static binary translator. PowerPC is used as the source architecture, and the target instruction set is x86. Experimental results are shown, that measure the runtime of translated code, compared with the native execution. Micro-benchmarks (lmbench) and larger benchmarks (specint) are the experimental workloads.&lt;br /&gt;&lt;br /&gt;The following are some rough notes that I made just after reading this paper. I'll try to tweak my original notes a little, so they seem more coherent to the reader:&lt;br /&gt;&lt;br /&gt;One really interesting thing about this work is that they use qemu on a linux-x86 host to compare performance and results of instruction sequences executing on an emulated PowerPC platform. Although the paper didn't say much about how they actually ran their experiments, I assume that they  batch two phases: 1) train with sequences found on the powerpc platform, and record all of the state change and performance results, 2) feed the results from (1) into the x86 half of their implementation, discovering the translation rules. They didn't say much about how certain information about the architectures is encoded, such as register layout, flags, but did mention how they handled the stack and heap. I'm assuming the architectural details and general info about the instruction set encoding could be represented in some high-level interpreted language or configuration file.&lt;br /&gt;&lt;br /&gt;Their approach is computationally complex, so it takes about a week to get a good set of rules. One source for additional research would be to look into optimizations for the approach. Pruning the search tree of possible sequences and finding register mapping more efficiently are particular examples. However, once the translation rules are found, the binary translator can be used with minimal overhead (avg ~67% native in their case).&lt;br /&gt;&lt;br /&gt;Interestingly, the translated code often performed better than the native on microbenchmarks. This, they mention, could be due the fact that code is fundamentally easier to optimize on a RISC architecture such as PowerPC. When that highly optimized code is then translated to x86, a CISC architecture, the translated code is better optimized than the native x86 instructions used in the benchmarks. One thing to note here is that it is very difficult to compare performance across different architectures, so in general, this is a touchy subject. However, the approach in this paper is using relative execution costs, to find the best possible translation sequence of instructions such that performance is optimized and correctness is ensured. Also, I'm assuming all comparisons are done between translated x86 code and native x86 code from the x86 versions of the benchmarks (with some unspecified level of optimization).&lt;br /&gt;&lt;br /&gt;Experiments were only performed on a static translator, although they do explain that their method could be theoretically applied to a dynamic translator. They calculate the overhead that would result from dynamic translation, and try to compare, but I'm not quite sure their analysis is correct, have to check this. I think they are ignoring the fact that there will be repeated&lt;br /&gt;invocations of the algorithm that determines the best register mapping, as the register mappings change with different sequences of instructions. It seems like a bad idea to keep changing the register mappings, because the memory layout of the translation cache and representation of the virtual cpu will change continuously. However from my experience implementing a dynamic translator, the translation overhead is typically very low compared to the overhead due to low instruction quality (execution time of the instructions from the cache). The only difference in my case was that my source and destination were the same architecture (roughly), so the translation module itself was very efficient, also since many of the instructions had identical translations (and it was a RISC architecture). The point here is, that I think it is an open problem whether or not a performant dynamic translator would result from this work, although I am optimistic.&lt;br /&gt;&lt;br /&gt;What I mostly learned from this paper is the material on superoptimizers. It makes sense that this would be a natural application, and I think that it's a good overlap between OS and Programming Languages. Also, it was good to get a review on issues in binary translation, which I have a lot of knowledge of (certainly, from implementing a full-system dynamic binary translator from scratch!). The paper touched on a large numbers of the problems I had to deal with during implementation. They don't go into too much depth on any one of these, but it definitely jogged my memory and made me remember the additional things you have to worry about when the translator is dynamic.&lt;br /&gt;&lt;br /&gt;Unfortunately, I don't think their contribution can be applied to writing a same-architecture translator that includes translation of kernel code (full-system), with much reduction on the engineering effort. This is because much of the engineering complexity in this case has to do with branch handling, spilling/filling registers, exception handling and redirection, emulating memory accesses to protected data or memory mapped I/O, emulation of interrupt logic and low-level devices (ie, timer, serial port), and synchonizing resources among emulated threads. Additionally, a translator that translates between the same source and target instruction sets trivializes many of the translation rules, because many non-priveleged instructions will have identical translations. But, for a basic survey of fundamental binary translation issues for mostly user-level code between different instruction sets, this is a good paper.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4391686554026969818-9182841285697893152?l=koder3.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://koder3.blogspot.com/feeds/9182841285697893152/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://koder3.blogspot.com/2009/09/paper-review-binary-translation-using.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4391686554026969818/posts/default/9182841285697893152'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4391686554026969818/posts/default/9182841285697893152'/><link rel='alternate' type='text/html' href='http://koder3.blogspot.com/2009/09/paper-review-binary-translation-using.html' title='Paper Review - Binary Translation Using Peephole Superoptimizers'/><author><name>Koder3</name><uri>http://www.blogger.com/profile/03333594866588263978</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4391686554026969818.post-3241839045461502899</id><published>2009-05-28T16:50:00.000-07:00</published><updated>2009-05-29T13:35:43.951-07:00</updated><title type='text'>Programming the Garcia Robot</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_WI5MNKtG0TI/Sh9btLFZoyI/AAAAAAAAAAk/q7YMlk9wnk8/s1600-h/garcia.jpg"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 400px; height: 188px;" src="http://3.bp.blogspot.com/_WI5MNKtG0TI/Sh9btLFZoyI/AAAAAAAAAAk/q7YMlk9wnk8/s400/garcia.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5341088514712380194" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Over the past few years, I've received emails from various students at universities around the world, wanting to find out how to write software for their Garcia Robot. The product is made by Acroname, Inc. and comes with a wide variety of options. Most of what I focus on here is the software aspect of the system, which requires an embedded computing platform. Acroname ships the Garcia with a Stargate board, sold by Crossbow Solutions, equipped with a USB port, a PCMCIA slot for wireless networking, two serial ports for interfacing with sensors/actuators, and an ARMv5-based Intel XScale CPU.&lt;br /&gt;&lt;br /&gt;Besides the main computer described above, the Garcia also includes a micro-controller board, called the Brainstem, that provides a serial interface for controlling 2 wheel motors and accessing data from infra-red sensors and the ledge detector. Knowing the various ways of interfacing with the Brainstem is key to being able to write complex programs to control every aspect of the robot's behavior.&lt;br /&gt;&lt;br /&gt;The Stargate platform runs a provided distribution of Linux. It is therefore possible to develop applications using the system call APIs (ie, read()/write()) in order to receive and send data over a serial port to the Brainstem. This is a cumbersome but useful way to gain precise control over all of the operations of the Brainstem, but is usually only a good option if predictability is needed for timeliness experiments. Also, this method requires the developer to understand many details about the Brainstem architecture and configuration.&lt;br /&gt;&lt;br /&gt;Included with the software is a cross-compiler toolchain. Various toolchains for ARM-based platforms can be downloaded from the web, but they all provide an assembler, compiler, linker and other tools for generating ARM binaries using an x86-based pc for the development environment. Once a program is compiled, it can be uploaded to the Stargate via a wireless network, and subsequently executed.&lt;br /&gt;&lt;h3&gt; Brainstem Commands &lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;A Brainstem command is a variable-length sequence of bytes, with the following format:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_WI5MNKtG0TI/Sh8wdbhm-WI/AAAAAAAAAAc/wJ3C9A894rY/s1600-h/brainstem-format.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 102px;" src="http://1.bp.blogspot.com/_WI5MNKtG0TI/Sh8wdbhm-WI/AAAAAAAAAAc/wJ3C9A894rY/s400/brainstem-format.gif" alt="" id="BLOGGER_PHOTO_ID_5341040965247760738" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;The &lt;span style="font-style: italic;"&gt;module&lt;/span&gt; field specifies the address of one of the two available modules on the Brainstem: the GP and the MOTO. The MOTO controls the motors and some of the infra-red sensors, and the GP controls the rest of the sensors. Refer to manuals on Acroname's website for details on which module is associated with each of the sensors/actuators. By default, the module address of the GP is 2, and the MOTO is configured at address 4.&lt;br /&gt;&lt;br /&gt;Following the &lt;span style="font-style: italic;"&gt;module&lt;span style="font-style: italic;"&gt; address&lt;/span&gt;&lt;/span&gt;, the second byte of the command specifies the &lt;span style="font-style: italic;"&gt;length&lt;/span&gt; in bytes of the command packet, not including the first two bytes for the module and length. Next, the &lt;span style="font-style: italic;"&gt;command&lt;/span&gt; field specifies one of the many possible commands to execute. Acroname maintains a list of these commands and their argument formats in their Brainstem reference. Each command has a different number of arguments with varying semantics, so following the &lt;span style="font-style: italic;"&gt;command&lt;/span&gt; field there are zero or more &lt;span style="font-style: italic;"&gt;argument&lt;/span&gt; bytes.&lt;br /&gt;&lt;br /&gt;The following is an example of a command packet that tells the Brainstem to set the velocity of one of the motors to the value 10: 4 4 62 0 0 10. Here, the first byte of value 4 specifies that this packet is for the MOTO module. The third byte of value 62, specifies the command number that sets the motor velocity, and the following bytes are the command's arguments, which specify to set the motor with index 0 to a velocity of 10 units. The command and its arguments have a combined size of 4 bytes, thus 4 is specified as the length field in the second byte of the packet.&lt;br /&gt;&lt;br /&gt;In many cases, after a command is sent to the Brainstem, a reply will be sent back through the serial port in the form a &lt;span style="font-style: italic;"&gt;debug packet&lt;/span&gt;. For example, after informing the GP module to do an infra-red sensor reading, a debug packet containing the value read from the sensor can be obtained via a &lt;span style="font-style: italic;"&gt;read()&lt;/span&gt; call on the serial port's file descriptor.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt; An Example Program &lt;/h3&gt;&lt;br /&gt;The code for a simple console program is listed below, as an example of how to send Brainstem commands over the Stargate's serial port. The code uses standard APIs and assumes that the serial port device file is located at /dev/cua0. This code is intended for illustrative purposes only, and may need to be tweaked for use in practice. The values for the command bytes are read from the standard input and sent in binary format using the &lt;span style="font-style: italic;"&gt;write()&lt;/span&gt; system call on the file descriptor of the serial port.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;span class="pre"&gt;#include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;#include &amp;lt;stdlib.h&amp;gt;&lt;br /&gt;&lt;br /&gt;#include &amp;lt;unistd.h&amp;gt;&lt;br /&gt;#include &amp;lt;sys/types.h&amp;gt;&lt;br /&gt;#include &amp;lt;fcntl.h&amp;gt;&lt;br /&gt;&lt;br /&gt;#define MAX_COMMAND_BYTES 50&lt;br /&gt;#define SERIAL_PORT "/dev/cua0"&lt;br /&gt;&lt;/span&gt;&lt;span class="keyword"&gt;&lt;br /&gt;main&lt;/span&gt;&lt;span class="operator"&gt;()&lt;br /&gt;{&lt;/span&gt;&lt;span class="type"&gt;&lt;br /&gt;   int&lt;/span&gt; i&lt;span class="operator"&gt;;&lt;/span&gt;&lt;span class="type"&gt;&lt;br /&gt;&lt;br /&gt;   int&lt;/span&gt; fd&lt;span class="operator"&gt;;&lt;/span&gt;&lt;br /&gt;   size_t bytes_written&lt;span class="operator"&gt;;&lt;/span&gt;&lt;span class="type"&gt;&lt;br /&gt;   char&lt;/span&gt; command_bytes&lt;span class="operator"&gt;[&lt;/span&gt;MAX_COMMAND_BYTES&lt;span class="operator"&gt;];&lt;/span&gt;&lt;span class="comment"&gt;&lt;br /&gt;&lt;br /&gt;   // Open the serial port for writing&lt;br /&gt;&lt;/span&gt;   fd&lt;span class="operator"&gt; =&lt;/span&gt; open&lt;span class="operator"&gt;(&lt;/span&gt;SERIAL_PORT&lt;span class="operator"&gt;,&lt;/span&gt; O_RDWR&lt;span class="operator"&gt; |&lt;/span&gt; O_SYNC&lt;span class="operator"&gt;);&lt;/span&gt;&lt;span class="flow"&gt;&lt;br /&gt;&lt;br /&gt;   if&lt;/span&gt;&lt;span class="operator"&gt;(&lt;/span&gt;fd&lt;span class="operator"&gt; &amp;lt;&lt;/span&gt;&lt;span class="int"&gt; 0&lt;/span&gt;&lt;span class="operator"&gt;)&lt;br /&gt;   {&lt;/span&gt;&lt;br /&gt;      printf&lt;span class="operator"&gt;(&lt;/span&gt;&lt;span class="string"&gt;"Failed to open serial port\n"&lt;/span&gt;&lt;span class="operator"&gt;);&lt;/span&gt;&lt;br /&gt;      exit&lt;span class="operator"&gt;(&lt;/span&gt;&lt;span class="int"&gt;1&lt;/span&gt;&lt;span class="operator"&gt;);&lt;br /&gt;   }&lt;/span&gt;&lt;span class="flow"&gt;&lt;br /&gt;&lt;br /&gt;   while&lt;/span&gt;&lt;span class="operator"&gt;(&lt;/span&gt;&lt;span class="int"&gt;1&lt;/span&gt;&lt;span class="operator"&gt;)&lt;br /&gt;   {&lt;/span&gt;&lt;span class="comment"&gt;&lt;br /&gt;      // Collect command byte values from user&lt;br /&gt;&lt;/span&gt;      printf&lt;span class="operator"&gt;(&lt;/span&gt;&lt;span class="string"&gt;"Enter Command &amp;gt;&amp;gt; "&lt;/span&gt;&lt;span class="operator"&gt;);&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;      scanf&lt;span class="operator"&gt;(&lt;/span&gt;&lt;span class="string"&gt;"%d %d"&lt;/span&gt;&lt;span class="operator"&gt;, &amp;amp;&lt;/span&gt;command_bytes&lt;span class="operator"&gt;[&lt;/span&gt;&lt;span class="int"&gt;0&lt;/span&gt;&lt;span class="operator"&gt;], &amp;amp;&lt;/span&gt;command_bytes&lt;span class="operator"&gt;[&lt;/span&gt;&lt;span class="int"&gt;1&lt;/span&gt;&lt;span class="operator"&gt;]);&lt;/span&gt;&lt;span class="flow"&gt;&lt;br /&gt;&lt;br /&gt;      for&lt;/span&gt;&lt;span class="operator"&gt;(&lt;/span&gt;i&lt;span class="operator"&gt; =&lt;/span&gt;&lt;span class="int"&gt; 0&lt;/span&gt;&lt;span class="operator"&gt;;&lt;/span&gt; i&lt;span class="operator"&gt; &amp;lt;&lt;/span&gt; command_bytes&lt;span class="operator"&gt;[&lt;/span&gt;&lt;span class="int"&gt;1&lt;/span&gt;&lt;span class="operator"&gt;];&lt;/span&gt; i&lt;span class="operator"&gt;++)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;         scanf&lt;span class="operator"&gt;(&lt;/span&gt;&lt;span class="string"&gt;"%d"&lt;/span&gt;&lt;span class="operator"&gt;, &amp;amp;&lt;/span&gt;command_bytes&lt;span class="operator"&gt;[&lt;/span&gt;i&lt;span class="operator"&gt;+&lt;/span&gt;&lt;span class="int"&gt;2&lt;/span&gt;&lt;span class="operator"&gt;]);&lt;/span&gt;&lt;span class="comment"&gt;&lt;br /&gt;&lt;br /&gt;      // Send command packet to serial port&lt;br /&gt;&lt;/span&gt;      bytes_written&lt;span class="operator"&gt; =&lt;/span&gt;&lt;span class="int"&gt; 0&lt;/span&gt;&lt;span class="operator"&gt;;&lt;/span&gt;&lt;span class="flow"&gt;&lt;br /&gt;&lt;br /&gt;      do&lt;/span&gt;&lt;span class="operator"&gt;&lt;br /&gt;      {&lt;/span&gt;&lt;br /&gt;         bytes_written&lt;span class="operator"&gt; =&lt;/span&gt; write&lt;span class="operator"&gt;(&lt;/span&gt;fd&lt;span class="operator"&gt;, (&lt;/span&gt;&lt;span class="type"&gt;void&lt;/span&gt;&lt;span class="operator"&gt; *)(&lt;/span&gt;command_bytes&lt;span class="operator"&gt; +&lt;/span&gt; bytes_written&lt;span class="operator"&gt;),&lt;br /&gt;                               (&lt;/span&gt;size_t&lt;span class="operator"&gt;)(&lt;/span&gt;command_bytes&lt;span class="operator"&gt;[&lt;/span&gt;&lt;span class="int"&gt;1&lt;/span&gt;&lt;span class="operator"&gt;] +&lt;/span&gt;&lt;span class="int"&gt; 2&lt;/span&gt;&lt;span class="operator"&gt;) -&lt;/span&gt; bytes_written&lt;span class="operator"&gt;);&lt;/span&gt;&lt;span class="flow"&gt;&lt;br /&gt;&lt;br /&gt;         if&lt;/span&gt;&lt;span class="operator"&gt;(&lt;/span&gt;bytes_written&lt;span class="operator"&gt; &amp;lt;&lt;/span&gt;&lt;span class="int"&gt; 0&lt;/span&gt;&lt;span class="operator"&gt;)&lt;br /&gt;         {&lt;/span&gt;&lt;br /&gt;            printf&lt;span class="operator"&gt;(&lt;/span&gt;&lt;span class="string"&gt;"Error writing to serial port: %d\n"&lt;/span&gt;&lt;span class="operator"&gt;,&lt;/span&gt; bytes_written&lt;span class="operator"&gt;);&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;            exit&lt;span class="operator"&gt;(&lt;/span&gt;&lt;span class="int"&gt;1&lt;/span&gt;&lt;span class="operator"&gt;);&lt;br /&gt;         }&lt;br /&gt;      }&lt;/span&gt;&lt;span class="flow"&gt; while&lt;/span&gt;&lt;span class="operator"&gt;(&lt;/span&gt;bytes_written&lt;span class="operator"&gt; &amp;lt; (&lt;/span&gt;size_t&lt;span class="operator"&gt;)(&lt;/span&gt;command_bytes&lt;span class="operator"&gt;[&lt;/span&gt;&lt;span class="int"&gt;1&lt;/span&gt;&lt;span class="operator"&gt;] +&lt;/span&gt;&lt;span class="int"&gt; 2&lt;/span&gt;&lt;span class="operator"&gt;));&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;      printf&lt;span class="operator"&gt;(&lt;/span&gt;&lt;span class="string"&gt;"Sent command to module %d, length %d:"&lt;/span&gt;&lt;span class="operator"&gt;,&lt;/span&gt; command_bytes&lt;span class="operator"&gt;[&lt;/span&gt;&lt;span class="int"&gt;0&lt;/span&gt;&lt;span class="operator"&gt;],&lt;/span&gt;&lt;br /&gt;             command_bytes&lt;span class="operator"&gt;[&lt;/span&gt;&lt;span class="int"&gt;1&lt;/span&gt;&lt;span class="operator"&gt;]);&lt;/span&gt;&lt;span class="flow"&gt;&lt;br /&gt;&lt;br /&gt;      for&lt;/span&gt;&lt;span class="operator"&gt;(&lt;/span&gt;i&lt;span class="operator"&gt; =&lt;/span&gt;&lt;span class="int"&gt; 2&lt;/span&gt;&lt;span class="operator"&gt;;&lt;/span&gt; i&lt;span class="operator"&gt; &amp;lt; (&lt;/span&gt;command_bytes&lt;span class="operator"&gt;[&lt;/span&gt;&lt;span class="int"&gt;1&lt;/span&gt;&lt;span class="operator"&gt;] +&lt;/span&gt;&lt;span class="int"&gt; 2&lt;/span&gt;&lt;span class="operator"&gt;);&lt;/span&gt; i&lt;span class="operator"&gt;++)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;         printf&lt;span class="operator"&gt;(&lt;/span&gt;&lt;span class="string"&gt;" %d"&lt;/span&gt;&lt;span class="operator"&gt;,&lt;/span&gt; command_bytes&lt;span class="operator"&gt;[&lt;/span&gt;i&lt;span class="operator"&gt;]);&lt;/span&gt;&lt;br /&gt;      printf&lt;span class="operator"&gt;(&lt;/span&gt;&lt;span class="string"&gt;"\n"&lt;/span&gt;&lt;span class="operator"&gt;);&lt;br /&gt;   }&lt;br /&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;References&lt;/h3&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-size:85%;"&gt;Acroname, Inc.: &lt;a href="http://www.acroname.com/"&gt;http://www.acroname.com&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-size:85%;"&gt;Crossbow Solutions XScale Platform: &lt;a href="http://blog.xbow.com/xblog/stargate_xscale_platform/"&gt;http://blog.xbow.com/xblog/stargate_xscale_platform/&lt;/a&gt;&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-size:85%;"&gt;Stargate Resources: &lt;a href="http://platformx.sourceforge.net/Links/resource.html"&gt;http://platformx.sourceforge.net/Links/resource.html &lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4391686554026969818-3241839045461502899?l=koder3.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://koder3.blogspot.com/feeds/3241839045461502899/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://koder3.blogspot.com/2009/05/programming-garcia-robot.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4391686554026969818/posts/default/3241839045461502899'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4391686554026969818/posts/default/3241839045461502899'/><link rel='alternate' type='text/html' href='http://koder3.blogspot.com/2009/05/programming-garcia-robot.html' title='Programming the Garcia Robot'/><author><name>Koder3</name><uri>http://www.blogger.com/profile/03333594866588263978</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_WI5MNKtG0TI/Sh9btLFZoyI/AAAAAAAAAAk/q7YMlk9wnk8/s72-c/garcia.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4391686554026969818.post-1950148327602553909</id><published>2009-05-14T18:09:00.000-07:00</published><updated>2009-05-14T18:13:00.123-07:00</updated><title type='text'>A Real-time Systems Tutorial</title><content type='html'>Real-time systems design is an increasingly important topic in systems research communities as well as the software industries.  Real-time applications and their requirements can be found in almost every area of operating systems and networking research.  An incomplete list of such domains includes distributed systems, embedded systems, network protocol processing, aircraft design, spacecraft design..., and the list goes on.  This page is intended to give the reader a brief, however incomplete, introduction to the concept of real-time tasks and their implications in the design of operating systems. &lt;br /&gt;&lt;br&gt;&lt;br&gt;&lt;br /&gt;&lt;h3 id="section2"&gt;Table of Contents:&lt;/h3&gt;&lt;br /&gt;&lt;a href="#section1"&gt;1. Introduction&lt;/a&gt;&lt;br&gt;&lt;br /&gt;&lt;a href="#section2"&gt;2. Table of contents&lt;/a&gt;&lt;br&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="#section3"&gt;3. Real-time task models&lt;/a&gt;&lt;br&gt;&lt;br /&gt;&lt;a href="#section4"&gt;4. Real-time scheduling&lt;/a&gt;&lt;br&gt;&lt;br /&gt;&lt;a href="#section5"&gt;5. Hard vs. soft real-time&lt;/a&gt;&lt;br&gt;&lt;br /&gt;&lt;a href="#section6"&gt;6. Real-time networking&lt;/a&gt;&lt;br&gt;&lt;br /&gt;&lt;a href="#section7"&gt;7. A survey of real-time systems&lt;/a&gt;&lt;br&gt;&lt;br /&gt;&lt;a href="#section8"&gt;8. Conclusion&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;h3 id="section3"&gt;Real-time Task Models&lt;/h3&gt;&lt;br /&gt;In order to understand the functions of a real-time system, it is necessary to develop models of real-time activities and study their characteristics. &lt;br /&gt;In this section, two fundamental real-time task models are described: &lt;i&gt; periodic &lt;/i&gt; and &lt;i&gt; aperiodic &lt;/i&gt; real-time tasks.&lt;br /&gt;&lt;br /&gt;&lt;h4&gt; Periodic Real-time Tasks &lt;/h4&gt; &lt;br /&gt;In general, a real-time task requires a specified amount of particular resources during specified periods of time.  A periodic task is a task that requests resources at time values representing a periodic function.  That is, there is a continuous and deterministic pattern of time intervals between requests of a resource.  In addition to this requirement, a &lt;i&gt; real-time &lt;/i&gt; periodic task must complete processing by a specified deadline relative to the time that it acquires the processor (or some other resource).&lt;br /&gt;&lt;br&gt;&lt;br&gt;&lt;br /&gt;For simplicity, assume that a real-time task has a constant request period (ie, must begin execution on the processor every n milliseconds).  We can characterize such tasks as a tuple of values, (C, T), where T denotes the &lt;i&gt;request period&lt;/i&gt;, which corresponds to the amount of time between temporally adjacent requests, and C represents the &lt;i&gt;service time&lt;/i&gt;, or the amount of time for which the task must execute (in total) before a deadline, which is typically assumed to occur at the start of the next request period. &lt;br /&gt;&lt;br&gt;&lt;br&gt;&lt;br /&gt;For example, a robotics application may consist of a number of periodic real-time tasks, which perform activities such as sensor data collection or regular network transmissions.  Suppose the robot runs a task that must collect infrared sensor data to determine if a barrier is nearby at regular time intervals.  If the configuration of this task requires that every 5 milliseconds it must complete 2 milliseconds of collecting and processing the sensor data, then the task is a periodic real-time task, and can be characterized in this model as the tuple: (C=2ms, T=5ms).&lt;br /&gt;&lt;h4&gt; Aperiodic Real-time Tasks &lt;/h4&gt;&lt;br /&gt;&lt;br /&gt;The &lt;i&gt;aperiodic real-time task&lt;/i&gt; model involves real-time activities that request a resource during non-deterministic request periods.  There may either be no bound or only a statistical bound on the arrival period of&lt;br /&gt;individual requests (task instances).  Each task instance is also associated with a specified deadline, which represents the time necessary for it to complete its execution.&lt;br /&gt;&lt;br&gt;&lt;br&gt;&lt;br /&gt;Examples of aperiodic real-time tasks are found in event-driven real-time systems, such as ejection of a pilot seat when the command is given to the navigation system in a jet fighter.  Many, less time-sensitive, applications also arise in distributed systems involving real-time streaming media (ie, end-host routing over a logical overlay). &lt;br /&gt;&lt;br /&gt;&lt;h3 id="section4"&gt;Real-time Scheduling&lt;/h3&gt;&lt;br /&gt;One of the most important responsibilities of a real-time system is to schedule tasks according to their deadlines in order to guarantee that all real-time activities achieve the required service level.  Many scheduling algorithms exist for a variety of task models, but fundamental to many of these are the &lt;i&gt;earliest deadline first&lt;/i&gt; (EDF) and &lt;i&gt;rate-monotonic&lt;/i&gt; (RM) scheduling policies. A schedule for a set of tasks is said to be &lt;i&gt;feasible&lt;/i&gt; if a proof exists that every task instance in the set will complete processing by its associated deadline.&lt;br /&gt;Also, a task set is &lt;i&gt;schedulable&lt;/i&gt; if there exists a feasible schedule for the set.  The &lt;i&gt;utilization&lt;/i&gt; associated with a given task schedule and resource (ie, CPU) is the fraction of time that the resource is allocated over the time that the scheduler is active.&lt;br /&gt; &lt;br /&gt;&lt;br /&gt;&lt;h4&gt; Earliest Deadline First &lt;/h4&gt;&lt;br /&gt;The EDF scheduling algorithm is a preemptive and dynamic priority scheduler that executes tasks in order of the time remaining before their corresponding deadlines.  Tasks with the least time remaining before their deadline are executed before tasks with more remaining time.  At each invocation of the scheduler, the remaining time is calculated for each waiting task, and the task with the least remaining time is dispatched.  If a task set is schedulable, the EDF algorithm results in a schedule that achieves optimal resource utilization.  However, EDF is shown to be unpredictable if the required utilization exceeds 100%, known as an &lt;i&gt;overload condition&lt;/i&gt;.  EDF is useful for scheduling aperiodic tasks, since the dynamic priorities of tasks do not depend on the determinism of request periods.&lt;br /&gt;&lt;br /&gt;&lt;h4&gt; Rate Monotonic &lt;/h4&gt;&lt;br /&gt;&lt;br /&gt;Recall the parameterization of a real-time periodic task as a tuple (C, T).  Under the static-priority rate monotonic scheduling algorithm, tasks are ordered according to the value of their request period, T. Tasks with shorter request periods are assigned higher priority than those with longer periods.  Liu and Layland proved that a feasible schedule is found under rate monotonic scheduling if the total requested utilization is less than or equal to ln(2), which is approximately 69.3%.   &lt;br /&gt;&lt;br&gt;&lt;br&gt;&lt;br /&gt;RM scheduling is optimal with respect to maximum utilization over all static-priority schedulers.  However, this scheduling policy only supports tasks that fit the periodic task model, since priorities depend upon request periods.  Because the request times of aperiodic tasks are not always predictable, these tasks are not supported by the RM algorithm, but are instead typically scheduled using a dynamic priority scheduler such as EDF. &lt;br /&gt;&lt;br /&gt;&lt;h3 id="section5"&gt;Hard vs. Soft Real-time &lt;/h3&gt;&lt;br /&gt;The issue of predictability is of key concern in real-time systems.  In fact, the term "predictable" can often be found in the multitude of definitions of real-time in the literature. Both periodic and aperiodic tasks have strict timing requirements in the form of deadlines.  That is, the scheduler must be able to guarantee that each task is allocated a resource by a particular point in time, based on the task's parameters.  In order to accomplish this, every allocation of any resource to any task must incur a latency that is deterministic or that can at least be predicted within a statistical margin of error.  &lt;br /&gt;&lt;br&gt;&lt;br&gt;&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Hard real-time&lt;/i&gt; tasks are required to meet &lt;i&gt;all&lt;/i&gt; deadlines for &lt;i&gt;every&lt;/i&gt; instance, and for these activities the failure to meet even a single deadline is considered catastrophic.  Examples are found in flight navigation, automobile, and spacecraft systems.  In contrast, &lt;i&gt;soft real-time&lt;/i&gt; tasks allow for a statistical bound on the number of deadlines missed, or on the allowable lateness of completing processing for an instance in relation to a deadline.  Soft real-time applications include media streaming in distributed systems and non-mission-ciritical tasks in control systems.     &lt;br /&gt;&lt;br /&gt;&lt;h3 id="section6"&gt;Real-time Networking&lt;/h3&gt;&lt;br /&gt;The growth in number of real-time distributed applications has fueled interest in methods for providing predictable communication-related services.  Consider a wireless sensor network designed to locate injured civilians in disaster areas, or a networked application that allows a surgeon to control medical instruments from a distance.  Many researchers are developing real-time network protocols and operating system mechanisms to ensure predictability of data distribution for such applications.&lt;br /&gt;&lt;br&gt;&lt;br&gt;&lt;br /&gt;The real-time transport protocol (RTP) is a protocol based on UDP and is typically used in video/audio streaming applications in which there is a high tolerance for packet loss and low tolerance for delay variation.  RTP is used in both multicast and unicast scenarios along with RTCP for transmission of control messages that carry quality feedback data.&lt;br /&gt;&lt;br /&gt;&lt;br&gt;&lt;br&gt;&lt;br /&gt;Even if real-time networking protocols are employed, predictable service may not be possible unless end-host operating systems are equipped with appropriate mechanisms for processing incoming/outgoing network packets.  These services typically dispatch event handlers in interrupt context in order to respond to events on the network, such as a packet received or ready for transmission.  Systems such as RTLinux and RTAI, when used in conjunction with extensions like RTNet, provide deterministic timing guarantees on network packet processing and thus allow for real-time communication assuming that the low level network protocols in use are sufficiently predictable. &lt;br /&gt;&lt;br /&gt;&lt;h3 id="section7"&gt;A Survey of Real-time Systems&lt;/h3&gt;&lt;br /&gt;The following is a list of some well-known real-time systems and links to more information:&lt;br /&gt;&lt;br&gt;&lt;br&gt;&lt;br /&gt;&lt;a href="http://www.qnx.com/products/rtos/index.html"&gt;QNX Neutrino Realtime Operating System&lt;/a&gt;&lt;br /&gt;&lt;br&gt;&lt;br /&gt;&lt;a href="http://www.windriver.com/vxworks"&gt;VxWorks RTOS&lt;/a&gt;&lt;br /&gt;&lt;br&gt;&lt;br /&gt;&lt;a href="http://www.fsmlabs.com/rtlinuxpro.html"&gt;RTLinuxPro&lt;/a&gt;&lt;br /&gt; and &lt;a href="http://www.fsmlabs.com/rtlinuxfree.html"&gt;RTLinuxFree&lt;/a&gt; at &lt;a href="http://www.fsmlabs.com"&gt;FSMLabs&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br&gt;&lt;br /&gt;&lt;a href="https://www.rtai.org"&gt;RTAI&lt;/a&gt; - the RealTime Application Interface for Linux&lt;br /&gt;&lt;br&gt;&lt;br /&gt;&lt;a href="http://www.mvista.com/products/realtime.html"&gt;Montavista Real-time Linux&lt;/a&gt;&lt;br /&gt;&lt;br&gt;&lt;br /&gt;&lt;a href="http://www.xenomai.org"&gt;Xenomai&lt;/a&gt; - Real-time framework for Linux&lt;br /&gt;&lt;br&gt;&lt;br /&gt;&lt;a href="http://www.rts.uni-hannover.de/rtnet/"&gt;RTnet&lt;/a&gt; - Hard Real-Time Networking for Real-Time Linux&lt;br /&gt;&lt;h3 id="section8"&gt;Conclusion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;As real-time applications become ubiquitous in various areas of computing research, there is motivation to extend existing systems with the mechanisms and policies necessary for providing predictable service.  Also, many new real-time task models are being investigated in order to devise suitable scheduling algorithms for these new applications.  Computer scientists will continue to analyze these new methods and apply them in situations where predictability and timeliness is of key consideration.  Real-time systems is a rich domain for further study and will most likely continue to thrive for years to come.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4391686554026969818-1950148327602553909?l=koder3.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://koder3.blogspot.com/feeds/1950148327602553909/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://koder3.blogspot.com/2009/05/real-time-systems-tutorial.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4391686554026969818/posts/default/1950148327602553909'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4391686554026969818/posts/default/1950148327602553909'/><link rel='alternate' type='text/html' href='http://koder3.blogspot.com/2009/05/real-time-systems-tutorial.html' title='A Real-time Systems Tutorial'/><author><name>Koder3</name><uri>http://www.blogger.com/profile/03333594866588263978</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
