Report

Event-driven Programming for Robust Software

To view this page ensure that Adobe Flash Player version 9.0.124 or greater is installed.

Get Adobe Flash player
Please login or register to make a comment!

Event-driven Programming for Robust Software Frank Dabek, Nickolai Zeldovich, Frans Kaashoek, David Mazi` eres, Robert Morris MIT Laboratory for Computer Science { fdabek,kolya,kaashoek,rtm } @lcs.mit.edu, dm@scs.cs.nyu.edu Abstract Events are a better means of managing I/O concurrency in server software than threads: events help avoid bugs caused by the unnecessary CPU concurrency introduced by threads. Event-based programs also tend to have more sta- ble performance under heavy load than threaded programs. We argue that our libasync non-blocking I/O library makes event-based programming convenient and evaluate exten- sions to the library that allow event-based programs to take advantage of multi-processors.

We conclude that events pro- vide all the bene ts of threads, with substantially less com- plexity; the result is more robust software. 1 Introduction The debate over whether threads or events are best suited to systems software has been raging for decades. The cen- tral question of this debate is whether threads or events should be used to manage concurrent I/O.

Many prefer threads because they preserve the appearance of serial pro- gramming and can take advantage of multi-processor hard- ware. Programmers nd programming with threads dif - cult, however, and as a result produce buggy software. This paper contributes to the debate ... more. less.

by showing that event- based programming can provide a convenient programming model, that it is naturally robust, and that it can also be ex- tended to take advantage of multi-processors.<br><br> For these rea- sons, we argue that there is no reason to use threads for managing concurrent I/O in system programs 4events pro- vide all the bene ts that threads provide, but in a more ro- bust way. Thread-based programs use multiple threads of control within a single program in a single address space [3]. Threaded programs achieve I/O concurrency by suspend- ing a thread blocked on I/O and resuming execution in a different thread.<br><br> Under this model, the programmer must carefully protect shared data structures with locks and use condition variables to coordinate the execution of threads. Event-based programs are organized around the process- ing of events. When a program cannot complete an opera- tion immediately because it has to wait for an event (e.g., the arrival of a packet or the completion of a disk transfer), it registers a callback 4a function that will be invoked when the event occurs.<br><br> Event-based programs are typically driven by a loop that polls for events and executes the appropriate callback when the event occurs. A callback executes indi- visibly until it hits a blocking operation, at which point it registers a new callback and then returns. In this position paper, we show that writing programs in the event-based model can be convenient and show how to extend the event-based model to exploit multi-processors in a way that requires programmers to make only minor changes to their code.<br><br> This multi-processor event model makes it easy to capture the levels of CPU concurrency typically available to systems programs, but does not re- quire the programmer to insert locks or condition vari- ables. The multi-processor support is part of the libasync asynchronous programming library. Unlike programming with threads, libasync doesn 9t force programmers to man- age thread synchronization, doesn 9t force programmers to guess how much space to reserve for stacks, provides high performance under high load, and doesn 9t have hidden per- formance costs.<br><br> As a result, programs written with libasync tend to be robust. Based on our experience with the library, we believe there is no reason to use threads for managing concurrent I/O. 2 Threads lead to unreliable software The main advantage of threads is that they allow the pro- grammer to overlap I/O and computation while preserving the appearance of a serial programming model.<br><br> The main disadvantage of threads is that they introduce concurrent ex- ecution even where it is not needed. This concurrency un- necessarily forces programmers to cope with synchroniza- tion between threads. In practice, threaded programs almost always have latent data races and deadlocks and, as a re- sult, they are not robust.<br><br> In our view, programmers must be spared the complexities of concurrency to make software more robust. 1 The problems with thread-based programming have long been recognized. Ousterhout argues that the convenience of threads is not worth the errors they encourage, except for multi-processor code [11].<br><br> Engler et al. demonstrate that synchronization errors, particularly potential deadlocks, are common in the Linux kernel [5]. Savage et al.<br><br> [13] found races in both student code and production servers. While it might seem that one could eliminate synchro- nization by moving to non-pre-emptive threads on a unipro- cessor, errors will still arise if a thread blocks (yields the CPU) unexpectedly. In complex systems, it is easy for pro- grammers of one module not to understand the exact block- ing behavior of code in other modules.<br><br> Thus, one can eas- ily call a blocking function within a critical section with- out realizing that pre-emption may occur [1]. Deadlocks are also a possibility when using non-preemptive threads, as one still needs to hold locks across known yields. Adya et al.<br><br> propose the use of non-preemptive threads augmented with runtime checks to detect unexpected yields in submod- ules [2] and describe a new taxonomy for I/O concurrency techniques. They also show how to integrate code written in the event-driven style (described in their taxonomy as co- operative task management and manual stack management) with non-preemptive threads (cooperative task management and automatic stack mangement) in the same program. Another complication of threads is the need to predict the maximum size of a stack.<br><br> Most systems conservatively use one or two pages of memory per thread stack 4far more than the few hundred bytes typically required per callback. ThedesignersoftheMachkernelfoundstackmemoryover- head so high that they restructured the kernel to use ccontin- uations, d essentially rewriting the kernel in an event-driven style [4]. Stack memory overhead is especially burdensome in embedded environments where memory is scarce.<br><br> The standard practice of allocating full pages for thread stacks also causes additional TLB pressure and cache pressure, particularly with direct-mapped caches [4]. Cohort schedul- ing attempts to increase data and instruction locality in threaded programs by running related computations, orga- nized as cstages, d consecutively [7]. Robust software must gracefully handle overload con- ditions.<br><br> Both Pai et al. [12] and Welsh et al. [14] explore the advantages of event-driven programs under high load.<br><br> Welsh demonstrates that the throughput of a simple server using kernel-supported threads degrades rapidly as many concurrent threads are required. Pai extends the traditional event-driven architecture to overcome the lack of support for non-blocking disk I/O in most UNIX implementations by using helper processes to handle disk reads. On work- loads involving many concurrent clients, Pai 9s event-based Web server provides higher performance than a server using kernel-based threads.<br><br> Writing programs in the event-driven style alleviates all of the problems mentioned above. Data races are not a con- cern since event-based programs use a single thread of exe- cution. Event-driven programs need only allocate the mem- ory required to hold a callback function pointer and its argu- ments, not a whole thread stack; this reduces overall mem- ory usage.<br><br> In addition, these pointers and arguments can be allocated roughly contiguously, reducing TLB pressure. The livelock-like behavior of threaded servers under high load is avoided by event driven programs that queue events that cannot be serviced rather than dedicating resources to them [10]. Lauer and Needham observe that programs based on message passing (which corresponds to the event-driven model) have a dual constructed in the procedure-oriented (threaded) model (and vice versa) [8].<br><br> Based on this obser- vation, the authors conclude that neither model is inherently preferable. The model described by Lauer and Needham does not exploit the fact that coordination is considerably simpler when using non-preemptive scheduling. As a result the authors 9 conclusion neglects the advantages of the lower synchronization burden offered by the event-driven model.<br><br> While this paper focuses on user-level servers, the ar- guments apply to operating system kernels as well. In this realm, the event-driven architecture corresponds to a kernel driven by transient interrupts and traps. Ford et al.<br><br> compare event-driven kernels with threaded kernels in the context of Fluke [6]. 3 Making asynchronous programming easy The most cited drawback of the event-driven model is programming dif culty. Threaded programs can be struc- tured as a single ow of control, using standard linguis- tic constructs such as loops across blocking calls.<br><br> Event- driven programs, in contrast, require a series of small call- back functions, one for each blocking operation. Any stack- allocated variables disappear across callbacks. Thus, event- driven programs rely heavily on dynamic memory alloca- tion and are more prone to memory errors in low-level lan- guages such as C and C++.<br><br> As an example, consider the following hypothetical asynchronous write function: void awrite (int fd, char *buf, size_t size, void (*cb) (void *), void *arg); awrite might return after arranging for the following to happen: as soon as the le descriptor becomes writeable, write the contents of buf to the descriptor, and then call cb with arg . arg is state to preserve across the callback 4 state that likely would be stack-allocated in a threaded pro- gram. A number of bugs arise with functions like awrite .<br><br> For example, awrite probably assumes buf will remain intact until the callback, while a programmer might be 2 tempted to use a stack-allocated buffer. Moreover, the cast of arg to void pointer and back is not type safe. The C++ non-blocking I/O library in use by the au- thors, libasync , provides several features to eliminate such memory problems.<br><br> It supplies a generic reference-counted garbage collector so as to free programmers from worrying about which function is responsible for deallocating what data. libasync also provides a type-safe method of passing state between callbacks. A heavily overloaded template function, wrap ,allowstheprogrammer topassdatabetween callbacks with function currying: wrap takes as arguments a function or method pointer and one or more arguments and returns a function object accepting the original func- tion 9s remaining arguments.<br><br> Thus, the state of an operation can be bundled as arguments to successive callbacks; the arguments are type-checked at compile time. Finally, the library also provides classes to help deal with the complications of short I/O operations (i.e., when kernel buffers ll up and a system call such as writev only writes part of the data). The suio class can hold onto a reference counted object until the cprinted d data is consumed by an output call.<br><br> Experience with libasync shows that it is easy to learn and use. We use the library day-to-day when implement- ing network applications. Students have used the library to complete laboratory assignments including web proxies and cryptographic le systems.<br><br> 4 Multi-processor event programming We have modi ed the asynchronous programming li- brary described in Section 3 to take advantage of multi- processors. The modi ed library ( libasync-mp ) provides a simple but effective model for running threads on multiple CPUs, but avoids most of the synchronization complexity of threaded programming models. Programs written with libasync-mp take advantage of multi-processors with a simple concurrency mechanism: each callback is assigned a color by the programmer, and the system guarantees that no two callbacks with the same color will run concurrently.<br><br> Because callbacks are assigned a default color, libasync-mp is backwards compatible with existing event-based applications. This allows the program- mer to incrementally add parallelism to an application by focusing on just the event callbacks that are likely to bene t from parallel execution. In contrast, typical uses of threads involve parallelizing the entire computation, by (for exam- ple) creating one server thread per client; this means that all mutable non-thread-private data must be synchronized.<br><br> The concurrency control model provided by libasync-mp also avoids deadlocks: a callback has only one color, so cy- cles can 9t arise; even if multi-color callbacks were allowed, the colors are pre-declared, so deadlock avoidance would be easy. The libasync-mp model is more restrictive than many thread synchronization models; for example, there is cur- rently no concept of a read-only color. However, our expe- riencesuggeststhatthemodelissuf cienttoobtainaboutas much parallel speedup as could typically be obtained from threads.<br><br> libasync-mp maintains a single queue of pending call- backs. The library uses one kernel thread per CPU to ex- ecute callbacks. Each kernel thread repeatedly dequeues a callback that is eligible to run under the color constraints and executes it.<br><br> An additional callback, inserted by the li- brary, calls the select () system call to add new call- backs to the queue when the events they correspond to oc- cur. Ideal support for multi-processor programming would make it easy to port existing programs to a multi-processor, and would make it easy to modify programs to get good par- allel speedup. Thus we are interested in two metrics: per- formance and ease of programming.<br><br> We evaluate the two criteria in an example application: the SFS le server [9]. Our performance results were obtained on a 700 MHz 4-processor Pentium III Xeon system running Linux ker- nel 2.4.18. Processor scaling results were obtained by com- pletely disabling all but a certain number of processors on the server while running the benchmark.<br><br> All communication between the SFS server and clients is encrypted using a symmetric stream cipher, and authenti- cated with a keyed cryptographic hash. Because of its use of encryption, the SFS server is compute-bound under heavy workloads and therefore we expect that by using libasync- mp we can extract signi cant multiprocessor speedup. The modi cations to the SFS server are concentrated in the code that encrypts, decrypts, and authenticates data sent to and received from the clients.<br><br> The callback respon- sible for sending data to the client is representative of how we parallelized this server: we split the callback into three smaller callbacks. The rst and last remain synchronized with the rest of the server code (i.e. have the default color), and copy data to be transmitted into and out of a per-client buffer.<br><br> The second callback encrypts the data in the client buffer, and runs in parallel with other callbacks (i.e., has a different color for each client). The amount of effort re- quired to achieve this parallel speedup was approximately 90 lines of changed code, out of a total of roughly 12,000 in the SFS server. We measured the total throughput of the le server to all clients, in bits per second, when multiple clients read a 200 MByte le whose contents remained in the server 9s disk buffer cache.<br><br> We repeated this experiment for different numbers of processors. Thebarslabeled clibasync-mp dinFigure1showtheper- 3 0 1 2 3 4 Number of CPUs 0 10 20 30 Throughput (MBytes/second) libasync-mp N-copy Figure 1. Performance of the SFS le server using different numbers of CPUs, relative to the performance on one CPU.<br><br> formance of the parallelized SFS server on the throughput test. On a single CPU, the parallelized server is 0.95 times as fast as the original uniprocessor server. The parallelized server is 1.62, 2.18, and 2.55 times as fast as the original uniprocessor server on two, three and four CPUs, respec- tively.<br><br> Further parallelization of the SFS server code would allow it to incrementally take advantage of more processors. To explore the performance limits imposed by the hard- ware and operating system, we also measured the total per- formance of multiple independent copies (as many as CPUs were available) of the original libasync SFS server code. In practice, such a con guration would not work unless each server were serving a distinct le system.<br><br> An SFS server maintains mutable per- le-system state, such as attribute leases, that would require synchronization among the server processes. This test gives an upper bound on the perfor- mance that SFS with libasync-mp could achieve. The results of this test are labeled cN-copy d in Figure 1.<br><br> The SFS server implemented using libasync-mp closely fol- lows the aggregate performance of multiple independent server copies for up to three CPUs. The performance differ- ence for the 2- and 3-processor cases is due to the penalty incurred due to shared state maintained by the server, such as le lease data, user ID mapping tables, and so on. 5 Conclusions The traditional wisdom regarding event-driven program- ming holds that it offers superior performance but is too dif- cult to program and cannot take advantage of SMP hard- ware.<br><br> We have shown that, by using libasync-mp , program- mers can easily write event-driven applications and take ad- vantage of multiple processors. References [1] The Amoeba reference manual: Programming guide. http://www.cs.vu.nl/pub/amoeba/manuals/pro.pdf.<br><br> [2] A DYA , A., H OWELL , J., T HEIMER , M., B OLOSKY , W. J., AND D OUCEUR , J. R.<br><br> Cooperative task management with- out manual stack management or, event-driven programming isnottheoppositeofthreadedprogramming. In Proc.Usenix Technical Conference (2002). [3] B IRRELL , A.<br><br> D. An introduction to programming with threads. Tech.<br><br> Rep. SRC 35, Digital SRC, 1989. [4] D RAVES , R.<br><br> P., B ERSHAD , B. N., R ASHID , R. F., AND D EAN , R.<br><br> W. Using continuations to implement thread management and communication in operating systems. In Proceedings of the 13th ACM Symposium on Operating Sys- tems Principles (1991), Association for Computing Machin- ery SIGOPS, pp.<br><br> 122 3136. [5] E NGLER , D., C HELF , B., C HOU , A., AND H ALLEM , S. Checking system rules using system-speci c, programmer- written compiler extensions.<br><br> In Usenix Symposium on Oper- ating Systems Design and Implementation (OSDI) (2000). [6] F ORD , B., H IBLER , M., L EPREAU , J., M C G RATH , R., AND T ULLMANN , P. Interface and execution models in the Fluke kernel.<br><br> In Operating Systems Design and Implemen- tation (1999), pp. 101 3115. [7] L ARUS , J.<br><br> R., AND P ARKES , M. Using cohort scheduling to enhance server performance. In Proc.<br><br> Usenix Technical Conference (2002). [8] L AUER , H. C., AND N EEDHAM , R.<br><br> M. On the duality of operating system structures. In Proc.<br><br> Second Interna- tional Symposium on Operating Systems, IRIA (Oct. 1978). Reprinted in Operating Systems Review, Vol.<br><br> 12, Number 2, April 1979. [9] M AZI ` ERES , D., K AMINSKY , M., K AASHOEK , M. F., AND W ITCHEL , E.<br><br> Separating key management from le system security. In Proceedings of the 17th ACM Symposium on Operating Systems Principles (SOSP 999) (Kiawah Island, South Carolina, December 1999). [10] M OGUL , J.<br><br> C., AND R AMAKRISHNAN , K. K. Eliminating receive livelock in an interrupt-driven kernel.<br><br> ACM Trans. Comput. Syst.<br><br> 15 , 3 (Aug. 1997), 217 3252. [11] O USTERHOUT , J.<br><br> K. Why threads are a bad idea (for most purposes). Invited talk at the 1996 USENIX technical con- ference, 1996.<br><br> [12] P AI , V., D RUSCHEL , P., AND Z WAENEPOEL , W. Flash: An ef cient and portable web server. In Proceedings of the 1999 Annual Usenix Technical Conference (June 1999).<br><br> [13] S AVAGE , S., B URROWS , M., N ELSON , G., S OBALVARRO , P., AND A NDERSON , T. Eraser: A dynamic data race detec- tor for multithreaded programs. ACM Transactions on Com- puter Systems 15 , 4 (1997), 391 3411.<br><br> [14] W ELSH , M., C ULLER , D., AND B REWER , E. SEDA: An architecture for well-conditioned, scalable Internet services. In Proceedings of the 18th ACM Symposium on Operating Systems Principles (Oct.<br><br> 2001), pp. 230 3243. 4<br><br>

less

Copyright © 2010 beepdf.com. All rights reserved.