===========================
Unreliable Guide To Locking
===========================
4
5:Author: Rusty Russell
6
7Introduction
8============
9
Welcome to Rusty's Remarkably Unreliable Guide to Kernel Locking
issues. This document describes the locking systems in the Linux Kernel
in 2.6.
13
14With the wide availability of HyperThreading, and preemption in the
15Linux Kernel, everyone hacking on the kernel needs to know the
16fundamentals of concurrency and locking for SMP.
17
18The Problem With Concurrency
19============================
20
21(Skip this if you know what a Race Condition is).
22
23In a normal program, you can increment a counter like so:
24
25::
26
27 very_important_count++;
28
29
30This is what they would expect to happen:
31
32+------------------------------------+------------------------------------+
33| Instance 1 | Instance 2 |
34+====================================+====================================+
35| read very_important_count (5) | |
36+------------------------------------+------------------------------------+
37| add 1 (6) | |
38+------------------------------------+------------------------------------+
39| write very_important_count (6) | |
40+------------------------------------+------------------------------------+
41| | read very_important_count (6) |
42+------------------------------------+------------------------------------+
43| | add 1 (7) |
44+------------------------------------+------------------------------------+
45| | write very_important_count (7) |
46+------------------------------------+------------------------------------+
47
48Table: Expected Results
49
50This is what might happen:
51
52+------------------------------------+------------------------------------+
53| Instance 1 | Instance 2 |
54+====================================+====================================+
55| read very_important_count (5) | |
56+------------------------------------+------------------------------------+
57| | read very_important_count (5) |
58+------------------------------------+------------------------------------+
59| add 1 (6) | |
60+------------------------------------+------------------------------------+
61| | add 1 (6) |
62+------------------------------------+------------------------------------+
63| write very_important_count (6) | |
64+------------------------------------+------------------------------------+
65| | write very_important_count (6) |
66+------------------------------------+------------------------------------+
67
68Table: Possible Results
69
70Race Conditions and Critical Regions
71------------------------------------
72
This overlap, where the result depends on the relative timing of
multiple tasks, is called a race condition. The piece of code containing
the concurrency issue is called a critical region. Especially since
Linux started running on SMP machines, race conditions have become one
of the major issues in kernel design and implementation.
78
79Preemption can have the same effect, even if there is only one CPU: by
80preempting one task during the critical region, we have exactly the same
81race condition. In this case the thread which preempts might run the
82critical region itself.
83
84The solution is to recognize when these simultaneous accesses occur, and
85use locks to make sure that only one instance can enter the critical
86region at any time. There are many friendly primitives in the Linux
87kernel to help you do this. And then there are the unfriendly
88primitives, but I'll pretend they don't exist.
89
90Locking in the Linux Kernel
91===========================
92
93If I could give you one piece of advice: never sleep with anyone crazier
94than yourself. But if I had to give you advice on locking: *keep it
95simple*.
96
97Be reluctant to introduce new locks.
98
99Strangely enough, this last one is the exact reverse of my advice when
100you *have* slept with someone crazier than yourself. And you should
101think about getting a big dog.
102
103Two Main Types of Kernel Locks: Spinlocks and Mutexes
104-----------------------------------------------------
105
106There are two main types of kernel locks. The fundamental type is the
107spinlock (``include/asm/spinlock.h``), which is a very simple
108single-holder lock: if you can't get the spinlock, you keep trying
109(spinning) until you can. Spinlocks are very small and fast, and can be
110used anywhere.
111
112The second type is a mutex (``include/linux/mutex.h``): it is like a
113spinlock, but you may block holding a mutex. If you can't lock a mutex,
114your task will suspend itself, and be woken up when the mutex is
115released. This means the CPU can do something else while you are
116waiting. There are many cases when you simply can't sleep (see
117`What Functions Are Safe To Call From Interrupts? <#sleeping-things>`__),
118and so have to use a spinlock instead.
119
120Neither type of lock is recursive: see
121`Deadlock: Simple and Advanced <#deadlock>`__.
122
123Locks and Uniprocessor Kernels
124------------------------------
125
For kernels compiled without ``CONFIG_SMP``, and without
``CONFIG_PREEMPT``, spinlocks do not exist at all. This is an excellent
128design decision: when no-one else can run at the same time, there is no
129reason to have a lock.
130
131If the kernel is compiled without ``CONFIG_SMP``, but ``CONFIG_PREEMPT``
132is set, then spinlocks simply disable preemption, which is sufficient to
133prevent any races. For most purposes, we can think of preemption as
134equivalent to SMP, and not worry about it separately.
135
136You should always test your locking code with ``CONFIG_SMP`` and
137``CONFIG_PREEMPT`` enabled, even if you don't have an SMP test box,
138because it will still catch some kinds of locking bugs.
139
140Mutexes still exist, because they are required for synchronization
141between user contexts, as we will see below.
142
143Locking Only In User Context
144----------------------------
145
146If you have a data structure which is only ever accessed from user
147context, then you can use a simple mutex (``include/linux/mutex.h``) to
148protect it. This is the most trivial case: you initialize the mutex.
149Then you can call :c:func:`mutex_lock_interruptible()` to grab the
150mutex, and :c:func:`mutex_unlock()` to release it. There is also a
151:c:func:`mutex_lock()`, which should be avoided, because it will
152not return if a signal is received.
153
154Example: ``net/netfilter/nf_sockopt.c`` allows registration of new
155:c:func:`setsockopt()` and :c:func:`getsockopt()` calls, with
156:c:func:`nf_register_sockopt()`. Registration and de-registration
157are only done on module load and unload (and boot time, where there is
158no concurrency), and the list of registrations is only consulted for an
159unknown :c:func:`setsockopt()` or :c:func:`getsockopt()` system
160call. The ``nf_sockopt_mutex`` is perfect to protect this, especially
161since the setsockopt and getsockopt calls may well sleep.
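
For illustration, here is a minimal sketch of this pattern (the counter
and the function name are made up)::

    #include <linux/mutex.h>

    static DEFINE_MUTEX(counter_lock);      /* hypothetical lock */
    static unsigned long counter;           /* hypothetical shared data */

    int counter_add(unsigned long amount)
    {
            int ret;

            ret = mutex_lock_interruptible(&counter_lock);
            if (ret)
                    return ret;             /* a signal arrived while waiting */

            counter += amount;
            mutex_unlock(&counter_lock);
            return 0;
    }
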
162
163Locking Between User Context and Softirqs
164-----------------------------------------
165
166If a softirq shares data with user context, you have two problems.
167Firstly, the current user context can be interrupted by a softirq, and
168secondly, the critical region could be entered from another CPU. This is
169where :c:func:`spin_lock_bh()` (``include/linux/spinlock.h``) is
170used. It disables softirqs on that CPU, then grabs the lock.
171:c:func:`spin_unlock_bh()` does the reverse. (The '_bh' suffix is
172a historical reference to "Bottom Halves", the old name for software
interrupts. It should really be called 'spin_lock_softirq()' in a
perfect world).
175
176Note that you can also use :c:func:`spin_lock_irq()` or
177:c:func:`spin_lock_irqsave()` here, which stop hardware interrupts
178as well: see `Hard IRQ Context <#hardirq-context>`__.
179
180This works perfectly for UP as well: the spin lock vanishes, and this
181macro simply becomes :c:func:`local_bh_disable()`
182(``include/linux/interrupt.h``), which protects you from the softirq
183being run.
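
Here is a minimal sketch of this arrangement; the ``struct item``, the
queue and the function names are all made up. User context takes the
``_bh`` variant, while the softirq side can take the plain lock::

    #include <linux/list.h>
    #include <linux/spinlock.h>

    struct item {
            struct list_head list;
            int value;
    };

    static DEFINE_SPINLOCK(queue_lock);     /* protects 'queue' */
    static LIST_HEAD(queue);                /* shared with a softirq */

    /* User context: block softirqs on this CPU, then grab the lock. */
    void queue_add(struct item *item)
    {
            spin_lock_bh(&queue_lock);
            list_add_tail(&item->list, &queue);
            spin_unlock_bh(&queue_lock);
    }

    /* Softirq: already in a softirq, so spin_lock() is enough. */
    struct item *queue_pop(void)
    {
            struct item *item = NULL;

            spin_lock(&queue_lock);
            if (!list_empty(&queue)) {
                    item = list_first_entry(&queue, struct item, list);
                    list_del(&item->list);
            }
            spin_unlock(&queue_lock);
            return item;
    }
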
184
185Locking Between User Context and Tasklets
186-----------------------------------------
187
188This is exactly the same as above, because tasklets are actually run
189from a softirq.
190
191Locking Between User Context and Timers
192---------------------------------------
193
194This, too, is exactly the same as above, because timers are actually run
195from a softirq. From a locking point of view, tasklets and timers are
196identical.
197
198Locking Between Tasklets/Timers
199-------------------------------
200
201Sometimes a tasklet or timer might want to share data with another
202tasklet or timer.
203
204The Same Tasklet/Timer
205~~~~~~~~~~~~~~~~~~~~~~
206
207Since a tasklet is never run on two CPUs at once, you don't need to
208worry about your tasklet being reentrant (running twice at once), even
209on SMP.
210
211Different Tasklets/Timers
212~~~~~~~~~~~~~~~~~~~~~~~~~
213
If another tasklet/timer wants to share data with your tasklet or
timer, you will both need to use :c:func:`spin_lock()` and
216:c:func:`spin_unlock()` calls. :c:func:`spin_lock_bh()` is
217unnecessary here, as you are already in a tasklet, and none will be run
218on the same CPU.
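
A minimal sketch, using the classic tasklet API and made-up names: both
tasklet functions take the plain spinlock, because no other softirq can
run on the same CPU while they are running::

    #include <linux/interrupt.h>
    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(shared_lock);    /* protects shared_events */
    static unsigned int shared_events;

    static void tasklet_a_fn(unsigned long data)
    {
            spin_lock(&shared_lock);        /* no _bh needed inside a tasklet */
            shared_events++;
            spin_unlock(&shared_lock);
    }

    static void tasklet_b_fn(unsigned long data)
    {
            spin_lock(&shared_lock);
            shared_events = 0;
            spin_unlock(&shared_lock);
    }

    static DECLARE_TASKLET(tasklet_a, tasklet_a_fn, 0);
    static DECLARE_TASKLET(tasklet_b, tasklet_b_fn, 0);
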
219
220Locking Between Softirqs
221------------------------
222
223Often a softirq might want to share data with itself or a tasklet/timer.
224
225The Same Softirq
226~~~~~~~~~~~~~~~~
227
228The same softirq can run on the other CPUs: you can use a per-CPU array
229(see `Per-CPU Data <#per-cpu>`__) for better performance. If you're
230going so far as to use a softirq, you probably care about scalable
231performance enough to justify the extra complexity.
232
233You'll need to use :c:func:`spin_lock()` and
234:c:func:`spin_unlock()` for shared data.
235
236Different Softirqs
237~~~~~~~~~~~~~~~~~~
238
239You'll need to use :c:func:`spin_lock()` and
240:c:func:`spin_unlock()` for shared data, whether it be a timer,
241tasklet, different softirq or the same or another softirq: any of them
242could be running on a different CPU.
243
244Hard IRQ Context
245================
246
247Hardware interrupts usually communicate with a tasklet or softirq.
248Frequently this involves putting work in a queue, which the softirq will
249take out.
250
251Locking Between Hard IRQ and Softirqs/Tasklets
252----------------------------------------------
253
254If a hardware irq handler shares data with a softirq, you have two
255concerns. Firstly, the softirq processing can be interrupted by a
256hardware interrupt, and secondly, the critical region could be entered
257by a hardware interrupt on another CPU. This is where
258:c:func:`spin_lock_irq()` is used. It is defined to disable
259interrupts on that cpu, then grab the lock.
260:c:func:`spin_unlock_irq()` does the reverse.
261
The irq handler does not need to use :c:func:`spin_lock_irq()`, because
263the softirq cannot run while the irq handler is running: it can use
264:c:func:`spin_lock()`, which is slightly faster. The only exception
265would be if a different hardware irq handler uses the same lock:
266:c:func:`spin_lock_irq()` will stop that from interrupting us.
267
268This works perfectly for UP as well: the spin lock vanishes, and this
269macro simply becomes :c:func:`local_irq_disable()`
270(``include/asm/smp.h``), which protects you from the softirq/tasklet/BH
271being run.
272
273:c:func:`spin_lock_irqsave()` (``include/linux/spinlock.h``) is a
274variant which saves whether interrupts were on or off in a flags word,
275which is passed to :c:func:`spin_unlock_irqrestore()`. This means
that the same code can be used inside a hard irq handler (where
277interrupts are already off) and in softirqs (where the irq disabling is
278required).
279
280Note that softirqs (and hence tasklets and timers) are run on return
281from hardware interrupts, so :c:func:`spin_lock_irq()` also stops
282these. In that sense, :c:func:`spin_lock_irqsave()` is the most
283general and powerful locking function.
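
Here is a sketch of data shared between a hardware interrupt handler and
user context; the device, counter and function names are all invented.
As described above, the handler itself can take the plain lock, while
user context must use the irqsave variant::

    #include <linux/interrupt.h>
    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(stats_lock);     /* protects rx_packets */
    static unsigned long rx_packets;

    /* Hard irq handler: interrupts are already off on this CPU. */
    static irqreturn_t card_interrupt(int irq, void *dev_id)
    {
            spin_lock(&stats_lock);
            rx_packets++;
            spin_unlock(&stats_lock);
            return IRQ_HANDLED;
    }

    /* User context: must keep the irq handler out while holding the lock. */
    unsigned long read_rx_packets(void)
    {
            unsigned long flags, val;

            spin_lock_irqsave(&stats_lock, flags);
            val = rx_packets;
            spin_unlock_irqrestore(&stats_lock, flags);
            return val;
    }
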
284
285Locking Between Two Hard IRQ Handlers
286-------------------------------------
287
288It is rare to have to share data between two IRQ handlers, but if you
289do, :c:func:`spin_lock_irqsave()` should be used: it is
290architecture-specific whether all interrupts are disabled inside irq
291handlers themselves.
292
293Cheat Sheet For Locking
294=======================
295
296Pete Zaitcev gives the following summary:
297
- If you are in a process context (any syscall) and want to lock other
  processes out, use a mutex. You can take a mutex and sleep
  (``copy_from_user*()`` or ``kmalloc(x, GFP_KERNEL)``).
301
302- Otherwise (== data can be touched in an interrupt), use
303 :c:func:`spin_lock_irqsave()` and
304 :c:func:`spin_unlock_irqrestore()`.
305
306- Avoid holding spinlock for more than 5 lines of code and across any
307 function call (except accessors like :c:func:`readb()`).
308
309Table of Minimum Requirements
310-----------------------------
311
312The following table lists the *minimum* locking requirements between
313various contexts. In some cases, the same context can only be running on
314one CPU at a time, so no locking is required for that context (eg. a
particular thread can only run on one CPU at a time, but if it needs
to share data with another thread, locking is required).
317
318Remember the advice above: you can always use
319:c:func:`spin_lock_irqsave()`, which is a superset of all other
320spinlock primitives.
321
============== ============= ============= ========= ========= ========= ========= ======= ======= ============== ==============
.              IRQ Handler A IRQ Handler B Softirq A Softirq B Tasklet A Tasklet B Timer A Timer B User Context A User Context B
============== ============= ============= ========= ========= ========= ========= ======= ======= ============== ==============
IRQ Handler A  None
IRQ Handler B  SLIS          None
Softirq A      SLI           SLI           SL
Softirq B      SLI           SLI           SL        SL
Tasklet A      SLI           SLI           SL        SL        None
Tasklet B      SLI           SLI           SL        SL        SL        None
Timer A        SLI           SLI           SL        SL        SL        SL        None
Timer B        SLI           SLI           SL        SL        SL        SL        SL      None
User Context A SLI           SLI           SLBH      SLBH      SLBH      SLBH      SLBH    SLBH    None
User Context B SLI           SLI           SLBH      SLBH      SLBH      SLBH      SLBH    SLBH    MLI            None
============== ============= ============= ========= ========= ========= ========= ======= ======= ============== ==============

Table: Table of Locking Requirements
338
339+--------+----------------------------+
340| SLIS | spin_lock_irqsave |
341+--------+----------------------------+
342| SLI | spin_lock_irq |
343+--------+----------------------------+
344| SL | spin_lock |
345+--------+----------------------------+
346| SLBH | spin_lock_bh |
347+--------+----------------------------+
348| MLI | mutex_lock_interruptible |
349+--------+----------------------------+
350
351Table: Legend for Locking Requirements Table
352
353The trylock Functions
354=====================
355
There are functions that try to acquire a lock only once and immediately
return a value telling you whether they succeeded. They can be used if
you need no access to the data protected by the lock while some other
thread is holding it; if you do need that access, you must acquire the
lock properly and wait for it.
361
362:c:func:`spin_trylock()` does not spin but returns non-zero if it
363acquires the spinlock on the first try or 0 if not. This function can be
364used in all contexts like :c:func:`spin_lock()`: you must have
365disabled the contexts that might interrupt you and acquire the spin
366lock.
367
368:c:func:`mutex_trylock()` does not suspend your task but returns
369non-zero if it could lock the mutex on the first try or 0 if not. This
370function cannot be safely used in hardware or software interrupt
371contexts despite not sleeping.
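
A minimal sketch of :c:func:`spin_trylock()`; the names are invented,
and skipping the update when the lock is busy is an assumed design
choice, not a requirement::

    #include <linux/spinlock.h>
    #include <linux/types.h>

    static DEFINE_SPINLOCK(log_lock);       /* protects log_entries */
    static unsigned int log_entries;

    /*
     * Opportunistic update: if someone else holds the lock, give up
     * instead of spinning. The caller must tolerate occasional misses.
     */
    bool log_event_maybe(void)
    {
            if (!spin_trylock(&log_lock))
                    return false;           /* lock busy: skipped this event */
            log_entries++;
            spin_unlock(&log_lock);
            return true;
    }
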
372
373Common Examples
374===============
375
376Let's step through a simple example: a cache of number to name mappings.
377The cache keeps a count of how often each of the objects is used, and
378when it gets full, throws out the least used one.
379
380All In User Context
381-------------------
382
383For our first example, we assume that all operations are in user context
384(ie. from system calls), so we can sleep. This means we can use a mutex
385to protect the cache and all the objects within it. Here's the code::
386
387 #include <linux/list.h>
388 #include <linux/slab.h>
389 #include <linux/string.h>
390 #include <linux/mutex.h>
391 #include <asm/errno.h>
392
393 struct object
394 {
395 struct list_head list;
396 int id;
397 char name[32];
398 int popularity;
399 };
400
401 /* Protects the cache, cache_num, and the objects within it */
402 static DEFINE_MUTEX(cache_lock);
403 static LIST_HEAD(cache);
404 static unsigned int cache_num = 0;
405 #define MAX_CACHE_SIZE 10
406
407 /* Must be holding cache_lock */
408 static struct object *__cache_find(int id)
409 {
410 struct object *i;
411
412 list_for_each_entry(i, &cache, list)
413 if (i->id == id) {
414 i->popularity++;
415 return i;
416 }
417 return NULL;
418 }
419
420 /* Must be holding cache_lock */
421 static void __cache_delete(struct object *obj)
422 {
423 BUG_ON(!obj);
424 list_del(&obj->list);
425 kfree(obj);
426 cache_num--;
427 }
428
429 /* Must be holding cache_lock */
430 static void __cache_add(struct object *obj)
431 {
432 list_add(&obj->list, &cache);
433 if (++cache_num > MAX_CACHE_SIZE) {
434 struct object *i, *outcast = NULL;
435 list_for_each_entry(i, &cache, list) {
436 if (!outcast || i->popularity < outcast->popularity)
437 outcast = i;
438 }
439 __cache_delete(outcast);
440 }
441 }
442
443 int cache_add(int id, const char *name)
444 {
445 struct object *obj;
446
447 if ((obj = kmalloc(sizeof(*obj), GFP_KERNEL)) == NULL)
448 return -ENOMEM;
449
450 strlcpy(obj->name, name, sizeof(obj->name));
451 obj->id = id;
452 obj->popularity = 0;
453
454 mutex_lock(&cache_lock);
455 __cache_add(obj);
456 mutex_unlock(&cache_lock);
457 return 0;
458 }
459
460 void cache_delete(int id)
461 {
462 mutex_lock(&cache_lock);
463 __cache_delete(__cache_find(id));
464 mutex_unlock(&cache_lock);
465 }
466
467 int cache_find(int id, char *name)
468 {
469 struct object *obj;
470 int ret = -ENOENT;
471
472 mutex_lock(&cache_lock);
473 obj = __cache_find(id);
474 if (obj) {
475 ret = 0;
476 strcpy(name, obj->name);
477 }
478 mutex_unlock(&cache_lock);
479 return ret;
480 }
481
482Note that we always make sure we have the cache_lock when we add,
483delete, or look up the cache: both the cache infrastructure itself and
484the contents of the objects are protected by the lock. In this case it's
485easy, since we copy the data for the user, and never let them access the
486objects directly.
487
488There is a slight (and common) optimization here: in
489:c:func:`cache_add()` we set up the fields of the object before
490grabbing the lock. This is safe, as no-one else can access it until we
put it in the cache.
492
493Accessing From Interrupt Context
494--------------------------------
495
496Now consider the case where :c:func:`cache_find()` can be called
497from interrupt context: either a hardware interrupt or a softirq. An
example would be a timer which deletes objects from the cache.
499
500The change is shown below, in standard patch format: the ``-`` are lines
501which are taken away, and the ``+`` are lines which are added.
502
503::
504
505 --- cache.c.usercontext 2003-12-09 13:58:54.000000000 +1100
506 +++ cache.c.interrupt 2003-12-09 14:07:49.000000000 +1100
507 @@ -12,7 +12,7 @@
508 int popularity;
509 };
510
511 -static DEFINE_MUTEX(cache_lock);
512 +static DEFINE_SPINLOCK(cache_lock);
513 static LIST_HEAD(cache);
514 static unsigned int cache_num = 0;
515 #define MAX_CACHE_SIZE 10
516 @@ -55,6 +55,7 @@
517 int cache_add(int id, const char *name)
518 {
519 struct object *obj;
520 + unsigned long flags;
521
522 if ((obj = kmalloc(sizeof(*obj), GFP_KERNEL)) == NULL)
523 return -ENOMEM;
524 @@ -63,30 +64,33 @@
525 obj->id = id;
526 obj->popularity = 0;
527
528 - mutex_lock(&cache_lock);
529 + spin_lock_irqsave(&cache_lock, flags);
530 __cache_add(obj);
531 - mutex_unlock(&cache_lock);
532 + spin_unlock_irqrestore(&cache_lock, flags);
533 return 0;
534 }
535
536 void cache_delete(int id)
537 {
538 - mutex_lock(&cache_lock);
539 + unsigned long flags;
540 +
541 + spin_lock_irqsave(&cache_lock, flags);
542 __cache_delete(__cache_find(id));
543 - mutex_unlock(&cache_lock);
544 + spin_unlock_irqrestore(&cache_lock, flags);
545 }
546
547 int cache_find(int id, char *name)
548 {
549 struct object *obj;
550 int ret = -ENOENT;
551 + unsigned long flags;
552
553 - mutex_lock(&cache_lock);
554 + spin_lock_irqsave(&cache_lock, flags);
555 obj = __cache_find(id);
556 if (obj) {
557 ret = 0;
558 strcpy(name, obj->name);
559 }
560 - mutex_unlock(&cache_lock);
561 + spin_unlock_irqrestore(&cache_lock, flags);
562 return ret;
563 }
564
Note that :c:func:`spin_lock_irqsave()` will turn off interrupts
if they are on, and otherwise do nothing (if we are already in an
interrupt handler); hence these functions are safe to call from any
context.
569
570Unfortunately, :c:func:`cache_add()` calls :c:func:`kmalloc()`
571with the ``GFP_KERNEL`` flag, which is only legal in user context. I
572have assumed that :c:func:`cache_add()` is still only called in
573user context, otherwise this should become a parameter to
574:c:func:`cache_add()`.
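
Here is a sketch of that variant, reusing the definitions from the
example above: the caller chooses the allocation flags, so a caller who
cannot sleep passes ``GFP_ATOMIC`` instead::

    int cache_add(int id, const char *name, gfp_t gfp)
    {
            struct object *obj;
            unsigned long flags;

            obj = kmalloc(sizeof(*obj), gfp);
            if (!obj)
                    return -ENOMEM;

            strlcpy(obj->name, name, sizeof(obj->name));
            obj->id = id;
            obj->popularity = 0;

            spin_lock_irqsave(&cache_lock, flags);
            __cache_add(obj);
            spin_unlock_irqrestore(&cache_lock, flags);
            return 0;
    }
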
575
576Exposing Objects Outside This File
577----------------------------------
578
579If our objects contained more information, it might not be sufficient to
580copy the information in and out: other parts of the code might want to
581keep pointers to these objects, for example, rather than looking up the
582id every time. This produces two problems.
583
584The first problem is that we use the ``cache_lock`` to protect objects:
585we'd need to make this non-static so the rest of the code can use it.
586This makes locking trickier, as it is no longer all in one place.
587
588The second problem is the lifetime problem: if another structure keeps a
589pointer to an object, it presumably expects that pointer to remain
590valid. Unfortunately, this is only guaranteed while you hold the lock,
591otherwise someone might call :c:func:`cache_delete()` and even
592worse, add another object, re-using the same address.
593
594As there is only one lock, you can't hold it forever: no-one else would
595get any work done.
596
597The solution to this problem is to use a reference count: everyone who
598has a pointer to the object increases it when they first get the object,
599and drops the reference count when they're finished with it. Whoever
600drops it to zero knows it is unused, and can actually delete it.
601
602Here is the code::
603
604 --- cache.c.interrupt 2003-12-09 14:25:43.000000000 +1100
605 +++ cache.c.refcnt 2003-12-09 14:33:05.000000000 +1100
606 @@ -7,6 +7,7 @@
607 struct object
608 {
609 struct list_head list;
610 + unsigned int refcnt;
611 int id;
612 char name[32];
613 int popularity;
614 @@ -17,6 +18,35 @@
615 static unsigned int cache_num = 0;
616 #define MAX_CACHE_SIZE 10
617
618 +static void __object_put(struct object *obj)
619 +{
620 + if (--obj->refcnt == 0)
621 + kfree(obj);
622 +}
623 +
624 +static void __object_get(struct object *obj)
625 +{
626 + obj->refcnt++;
627 +}
628 +
629 +void object_put(struct object *obj)
630 +{
631 + unsigned long flags;
632 +
633 + spin_lock_irqsave(&cache_lock, flags);
634 + __object_put(obj);
635 + spin_unlock_irqrestore(&cache_lock, flags);
636 +}
637 +
638 +void object_get(struct object *obj)
639 +{
640 + unsigned long flags;
641 +
642 + spin_lock_irqsave(&cache_lock, flags);
643 + __object_get(obj);
644 + spin_unlock_irqrestore(&cache_lock, flags);
645 +}
646 +
647 /* Must be holding cache_lock */
648 static struct object *__cache_find(int id)
649 {
650 @@ -35,6 +65,7 @@
651 {
652 BUG_ON(!obj);
653 list_del(&obj->list);
654 + __object_put(obj);
655 cache_num--;
656 }
657
658 @@ -63,6 +94,7 @@
659 strlcpy(obj->name, name, sizeof(obj->name));
660 obj->id = id;
661 obj->popularity = 0;
662 + obj->refcnt = 1; /* The cache holds a reference */
663
664 spin_lock_irqsave(&cache_lock, flags);
665 __cache_add(obj);
666 @@ -79,18 +111,15 @@
667 spin_unlock_irqrestore(&cache_lock, flags);
668 }
669
670 -int cache_find(int id, char *name)
671 +struct object *cache_find(int id)
672 {
673 struct object *obj;
674 - int ret = -ENOENT;
675 unsigned long flags;
676
677 spin_lock_irqsave(&cache_lock, flags);
678 obj = __cache_find(id);
679 - if (obj) {
680 - ret = 0;
681 - strcpy(name, obj->name);
682 - }
683 + if (obj)
684 + __object_get(obj);
685 spin_unlock_irqrestore(&cache_lock, flags);
686 - return ret;
687 + return obj;
688 }
689
690We encapsulate the reference counting in the standard 'get' and 'put'
691functions. Now we can return the object itself from
692:c:func:`cache_find()` which has the advantage that the user can
now sleep holding the object (eg. to :c:func:`copy_to_user()` the
name to userspace).
695
696The other point to note is that I said a reference should be held for
697every pointer to the object: thus the reference count is 1 when first
698inserted into the cache. In some versions the framework does not hold a
699reference count, but they are more complicated.
700
701Using Atomic Operations For The Reference Count
702~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
703
704In practice, ``atomic_t`` would usually be used for refcnt. There are a
705number of atomic operations defined in ``include/asm/atomic.h``: these
706are guaranteed to be seen atomically from all CPUs in the system, so no
707lock is required. In this case, it is simpler than using spinlocks,
708although for anything non-trivial using spinlocks is clearer. The
709:c:func:`atomic_inc()` and :c:func:`atomic_dec_and_test()`
710are used instead of the standard increment and decrement operators, and
711the lock is no longer used to protect the reference count itself.
712
713::
714
715 --- cache.c.refcnt 2003-12-09 15:00:35.000000000 +1100
716 +++ cache.c.refcnt-atomic 2003-12-11 15:49:42.000000000 +1100
717 @@ -7,7 +7,7 @@
718 struct object
719 {
720 struct list_head list;
721 - unsigned int refcnt;
722 + atomic_t refcnt;
723 int id;
724 char name[32];
725 int popularity;
726 @@ -18,33 +18,15 @@
727 static unsigned int cache_num = 0;
728 #define MAX_CACHE_SIZE 10
729
730 -static void __object_put(struct object *obj)
731 -{
732 - if (--obj->refcnt == 0)
733 - kfree(obj);
734 -}
735 -
736 -static void __object_get(struct object *obj)
737 -{
738 - obj->refcnt++;
739 -}
740 -
741 void object_put(struct object *obj)
742 {
743 - unsigned long flags;
744 -
745 - spin_lock_irqsave(&cache_lock, flags);
746 - __object_put(obj);
747 - spin_unlock_irqrestore(&cache_lock, flags);
748 + if (atomic_dec_and_test(&obj->refcnt))
749 + kfree(obj);
750 }
751
752 void object_get(struct object *obj)
753 {
754 - unsigned long flags;
755 -
756 - spin_lock_irqsave(&cache_lock, flags);
757 - __object_get(obj);
758 - spin_unlock_irqrestore(&cache_lock, flags);
759 + atomic_inc(&obj->refcnt);
760 }
761
762 /* Must be holding cache_lock */
763 @@ -65,7 +47,7 @@
764 {
765 BUG_ON(!obj);
766 list_del(&obj->list);
767 - __object_put(obj);
768 + object_put(obj);
769 cache_num--;
770 }
771
772 @@ -94,7 +76,7 @@
773 strlcpy(obj->name, name, sizeof(obj->name));
774 obj->id = id;
775 obj->popularity = 0;
776 - obj->refcnt = 1; /* The cache holds a reference */
777 + atomic_set(&obj->refcnt, 1); /* The cache holds a reference */
778
779 spin_lock_irqsave(&cache_lock, flags);
780 __cache_add(obj);
781 @@ -119,7 +101,7 @@
782 spin_lock_irqsave(&cache_lock, flags);
783 obj = __cache_find(id);
784 if (obj)
785 - __object_get(obj);
786 + object_get(obj);
787 spin_unlock_irqrestore(&cache_lock, flags);
788 return obj;
789 }
790
791Protecting The Objects Themselves
792---------------------------------
793
794In these examples, we assumed that the objects (except the reference
795counts) never changed once they are created. If we wanted to allow the
796name to change, there are three possibilities:
797
798- You can make ``cache_lock`` non-static, and tell people to grab that
799 lock before changing the name in any object.
800
801- You can provide a :c:func:`cache_obj_rename()` which grabs this
802 lock and changes the name for the caller, and tell everyone to use
803 that function.
804
805- You can make the ``cache_lock`` protect only the cache itself, and
806 use another lock to protect the name.
807
808Theoretically, you can make the locks as fine-grained as one lock for
809every field, for every object. In practice, the most common variants
810are:
811
812- One lock which protects the infrastructure (the ``cache`` list in
813 this example) and all the objects. This is what we have done so far.
814
815- One lock which protects the infrastructure (including the list
816 pointers inside the objects), and one lock inside the object which
817 protects the rest of that object.
818
819- Multiple locks to protect the infrastructure (eg. one lock per hash
820 chain), possibly with a separate per-object lock.
821
822Here is the "lock-per-object" implementation:
823
824::
825
826 --- cache.c.refcnt-atomic 2003-12-11 15:50:54.000000000 +1100
827 +++ cache.c.perobjectlock 2003-12-11 17:15:03.000000000 +1100
828 @@ -6,11 +6,17 @@
829
830 struct object
831 {
832 + /* These two protected by cache_lock. */
833 struct list_head list;
834 + int popularity;
835 +
836 atomic_t refcnt;
837 +
838 + /* Doesn't change once created. */
839 int id;
840 +
841 + spinlock_t lock; /* Protects the name */
842 char name[32];
843 - int popularity;
844 };
845
846 static DEFINE_SPINLOCK(cache_lock);
847 @@ -77,6 +84,7 @@
848 obj->id = id;
849 obj->popularity = 0;
850 atomic_set(&obj->refcnt, 1); /* The cache holds a reference */
851 + spin_lock_init(&obj->lock);
852
853 spin_lock_irqsave(&cache_lock, flags);
854 __cache_add(obj);
855
856Note that I decide that the popularity count should be protected by the
857``cache_lock`` rather than the per-object lock: this is because it (like
858the :c:type:`struct list_head <list_head>` inside the object)
859is logically part of the infrastructure. This way, I don't need to grab
860the lock of every object in :c:func:`__cache_add()` when seeking
861the least popular.
862
863I also decided that the id member is unchangeable, so I don't need to
864grab each object lock in :c:func:`__cache_find()` to examine the
865id: the object lock is only used by a caller who wants to read or write
866the name field.
867
868Note also that I added a comment describing what data was protected by
869which locks. This is extremely important, as it describes the runtime
870behavior of the code, and can be hard to gain from just reading. And as
871Alan Cox says, “Lock data, not code”.
872
873Common Problems
874===============
875
876Deadlock: Simple and Advanced
877-----------------------------
878
879There is a coding bug where a piece of code tries to grab a spinlock
880twice: it will spin forever, waiting for the lock to be released
881(spinlocks, rwlocks and mutexes are not recursive in Linux). This is
882trivial to diagnose: not a
883stay-up-five-nights-talk-to-fluffy-code-bunnies kind of problem.
884
885For a slightly more complex case, imagine you have a region shared by a
886softirq and user context. If you use a :c:func:`spin_lock()` call
887to protect it, it is possible that the user context will be interrupted
888by the softirq while it holds the lock, and the softirq will then spin
889forever trying to get the same lock.
890
Both of these are called deadlock, and as shown above, they can occur even
892with a single CPU (although not on UP compiles, since spinlocks vanish
893on kernel compiles with ``CONFIG_SMP``\ =n. You'll still get data
894corruption in the second example).
895
896This complete lockup is easy to diagnose: on SMP boxes the watchdog
897timer or compiling with ``DEBUG_SPINLOCK`` set
898(``include/linux/spinlock.h``) will show this up immediately when it
899happens.
900
901A more complex problem is the so-called 'deadly embrace', involving two
902or more locks. Say you have a hash table: each entry in the table is a
903spinlock, and a chain of hashed objects. Inside a softirq handler, you
904sometimes want to alter an object from one place in the hash to another:
905you grab the spinlock of the old hash chain and the spinlock of the new
906hash chain, and delete the object from the old one, and insert it in the
907new one.
908
909There are two problems here. First, if your code ever tries to move the
910object to the same chain, it will deadlock with itself as it tries to
911lock it twice. Secondly, if the same softirq on another CPU is trying to
912move another object in the reverse direction, the following could
913happen:
914
915+-----------------------+-----------------------+
916| CPU 1 | CPU 2 |
917+=======================+=======================+
918| Grab lock A -> OK | Grab lock B -> OK |
919+-----------------------+-----------------------+
920| Grab lock B -> spin | Grab lock A -> spin |
921+-----------------------+-----------------------+
922
923Table: Consequences
924
925The two CPUs will spin forever, waiting for the other to give up their
926lock. It will look, smell, and feel like a crash.
927
928Preventing Deadlock
929-------------------
930
931Textbooks will tell you that if you always lock in the same order, you
932will never get this kind of deadlock. Practice will tell you that this
933approach doesn't scale: when I create a new lock, I don't understand
934enough of the kernel to figure out where in the 5000 lock hierarchy it
935will fit.
936
937The best locks are encapsulated: they never get exposed in headers, and
938are never held around calls to non-trivial functions outside the same
939file. You can read through this code and see that it will never
940deadlock, because it never tries to grab another lock while it has that
941one. People using your code don't even need to know you are using a
942lock.
943
944A classic problem here is when you provide callbacks or hooks: if you
945call these with the lock held, you risk simple deadlock, or a deadly
946embrace (who knows what the callback will do?). Remember, the other
947programmers are out to get you, so don't do this.
948
949Overzealous Prevention Of Deadlocks
950~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
951
952Deadlocks are problematic, but not as bad as data corruption. Code which
953grabs a read lock, searches a list, fails to find what it wants, drops
954the read lock, grabs a write lock and inserts the object has a race
955condition.
956
957If you don't see why, please stay the fuck away from my code.
958
959Racing Timers: A Kernel Pastime
960-------------------------------
961
962Timers can produce their own special problems with races. Consider a
963collection of objects (list, hash, etc) where each object has a timer
964which is due to destroy it.
965
966If you want to destroy the entire collection (say on module removal),
967you might do the following::
968
969 /* THIS CODE BAD BAD BAD BAD: IF IT WAS ANY WORSE IT WOULD USE
970 HUNGARIAN NOTATION */
971 spin_lock_bh(&list_lock);
972
973 while (list) {
974 struct foo *next = list->next;
975 del_timer(&list->timer);
976 kfree(list);
977 list = next;
978 }
979
980 spin_unlock_bh(&list_lock);
981
982
983Sooner or later, this will crash on SMP, because a timer can have just
984gone off before the :c:func:`spin_lock_bh()`, and it will only get
985the lock after we :c:func:`spin_unlock_bh()`, and then try to free
986the element (which has already been freed!).
987
988This can be avoided by checking the result of
989:c:func:`del_timer()`: if it returns 1, the timer has been deleted.
990If 0, it means (in this case) that it is currently running, so we can
991do::
992
993 retry:
994 spin_lock_bh(&list_lock);
995
996 while (list) {
997 struct foo *next = list->next;
998 if (!del_timer(&list->timer)) {
999 /* Give timer a chance to delete this */
1000 spin_unlock_bh(&list_lock);
1001 goto retry;
1002 }
1003 kfree(list);
1004 list = next;
1005 }
1006
1007 spin_unlock_bh(&list_lock);
1008
1009
1010Another common problem is deleting timers which restart themselves (by
1011calling :c:func:`add_timer()` at the end of their timer function).
1012Because this is a fairly common case which is prone to races, you should
1013use :c:func:`del_timer_sync()` (``include/linux/timer.h``) to
1014handle this case. It returns the number of times the timer had to be
1015deleted before we finally stopped it from adding itself back in.
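
A sketch of that case, using the classic ``timer_list`` API and a
made-up ``struct foo``::

    #include <linux/slab.h>
    #include <linux/timer.h>

    struct foo {
            struct timer_list timer;
            unsigned long hits;
    };

    static void foo_timer_fn(unsigned long data)
    {
            struct foo *f = (struct foo *)data;

            f->hits++;
            mod_timer(&f->timer, jiffies + HZ);     /* re-adds itself */
    }

    void foo_start(struct foo *f)
    {
            setup_timer(&f->timer, foo_timer_fn, (unsigned long)f);
            mod_timer(&f->timer, jiffies + HZ);
    }

    void foo_destroy(struct foo *f)
    {
            /* Waits for a running handler and stops it re-adding itself. */
            del_timer_sync(&f->timer);
            kfree(f);
    }
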
1016
1017Locking Speed
1018=============
1019
1020There are three main things to worry about when considering speed of
1021some code which does locking. First is concurrency: how many things are
1022going to be waiting while someone else is holding a lock. Second is the
1023time taken to actually acquire and release an uncontended lock. Third is
1024using fewer, or smarter locks. I'm assuming that the lock is used fairly
1025often: otherwise, you wouldn't be concerned about efficiency.
1026
1027Concurrency depends on how long the lock is usually held: you should
1028hold the lock for as long as needed, but no longer. In the cache
1029example, we always create the object without the lock held, and then
1030grab the lock only when we are ready to insert it in the list.
1031
1032Acquisition times depend on how much damage the lock operations do to
1033the pipeline (pipeline stalls) and how likely it is that this CPU was
1034the last one to grab the lock (ie. is the lock cache-hot for this CPU):
1035on a machine with more CPUs, this likelihood drops fast. Consider a
1036700MHz Intel Pentium III: an instruction takes about 0.7ns, an atomic
1037increment takes about 58ns, a lock which is cache-hot on this CPU takes
1038160ns, and a cacheline transfer from another CPU takes an additional 170
1039to 360ns. (These figures from Paul McKenney's `Linux Journal RCU
1040article <http://www.linuxjournal.com/article.php?sid=6993>`__).
1041
1042These two aims conflict: holding a lock for a short time might be done
1043by splitting locks into parts (such as in our final per-object-lock
1044example), but this increases the number of lock acquisitions, and the
1045results are often slower than having a single lock. This is another
1046reason to advocate locking simplicity.
1047
1048The third concern is addressed below: there are some methods to reduce
1049the amount of locking which needs to be done.
1050
1051Read/Write Lock Variants
1052------------------------
1053
1054Both spinlocks and mutexes have read/write variants: ``rwlock_t`` and
1055:c:type:`struct rw_semaphore <rw_semaphore>`. These divide
1056users into two classes: the readers and the writers. If you are only
1057reading the data, you can get a read lock, but to write to the data you
1058need the write lock. Many people can hold a read lock, but a writer must
1059be sole holder.
1060
1061If your code divides neatly along reader/writer lines (as our cache code
1062does), and the lock is held by readers for significant lengths of time,
1063using these locks can help. They are slightly slower than the normal
1064locks though, so in practice ``rwlock_t`` is not usually worthwhile.
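
For reference, a minimal sketch of the ``rwlock_t`` API (the table and
function names are made up)::

    #include <linux/spinlock.h>

    #define TABLE_SIZE 16                   /* hypothetical */

    static DEFINE_RWLOCK(table_lock);
    static int table[TABLE_SIZE];

    int table_read(unsigned int idx)
    {
            int val;

            read_lock(&table_lock);         /* many readers may hold this */
            val = table[idx];
            read_unlock(&table_lock);
            return val;
    }

    void table_write(unsigned int idx, int val)
    {
            write_lock(&table_lock);        /* excludes readers and writers */
            table[idx] = val;
            write_unlock(&table_lock);
    }
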
1065
1066Avoiding Locks: Read Copy Update
1067--------------------------------
1068
1069There is a special method of read/write locking called Read Copy Update.
1070Using RCU, the readers can avoid taking a lock altogether: as we expect
1071our cache to be read more often than updated (otherwise the cache is a
1072waste of time), it is a candidate for this optimization.
1073
1074How do we get rid of read locks? Getting rid of read locks means that
1075writers may be changing the list underneath the readers. That is
1076actually quite simple: we can read a linked list while an element is
1077being added if the writer adds the element very carefully. For example,
1078adding ``new`` to a single linked list called ``list``::
1079
1080 new->next = list->next;
1081 wmb();
1082 list->next = new;
1083
1084
1085The :c:func:`wmb()` is a write memory barrier. It ensures that the
1086first operation (setting the new element's ``next`` pointer) is complete
1087and will be seen by all CPUs, before the second operation is (putting
1088the new element into the list). This is important, since modern
1089compilers and modern CPUs can both reorder instructions unless told
1090otherwise: we want a reader to either not see the new element at all, or
1091see the new element with the ``next`` pointer correctly pointing at the
1092rest of the list.
1093
1094Fortunately, there is a function to do this for standard
1095:c:type:`struct list_head <list_head>` lists:
1096:c:func:`list_add_rcu()` (``include/linux/list.h``).
1097
1098Removing an element from the list is even simpler: we replace the
1099pointer to the old element with a pointer to its successor, and readers
1100will either see it, or skip over it.
1101
1102::
1103
1104 list->next = old->next;
1105
1106
1107There is :c:func:`list_del_rcu()` (``include/linux/list.h``) which
1108does this (the normal version poisons the old object, which we don't
1109want).
1110
1111The reader must also be careful: some CPUs can look through the ``next``
1112pointer to start reading the contents of the next element early, but
don't realize that the pre-fetched contents are wrong when the ``next``
1114pointer changes underneath them. Once again, there is a
1115:c:func:`list_for_each_entry_rcu()` (``include/linux/list.h``)
1116to help you. Of course, writers can just use
1117:c:func:`list_for_each_entry()`, since there cannot be two
1118simultaneous writers.
1119
1120Our final dilemma is this: when can we actually destroy the removed
1121element? Remember, a reader might be stepping through this element in
1122the list right now: if we free this element and the ``next`` pointer
1123changes, the reader will jump off into garbage and crash. We need to
1124wait until we know that all the readers who were traversing the list
1125when we deleted the element are finished. We use
1126:c:func:`call_rcu()` to register a callback which will actually
1127destroy the object once all pre-existing readers are finished.
1128Alternatively, :c:func:`synchronize_rcu()` may be used to block
until all pre-existing readers are finished.
1130
1131But how does Read Copy Update know when the readers are finished? The
1132method is this: firstly, the readers always traverse the list inside
1133:c:func:`rcu_read_lock()`/:c:func:`rcu_read_unlock()` pairs:
1134these simply disable preemption so the reader won't go to sleep while
1135reading the list.
1136
1137RCU then waits until every other CPU has slept at least once: since
1138readers cannot sleep, we know that any readers which were traversing the
1139list during the deletion are finished, and the callback is triggered.
1140The real Read Copy Update code is a little more optimized than this, but
1141this is the fundamental idea.
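
Before applying this to the cache, here is a minimal standalone sketch
of an RCU-protected list; the ``struct entry`` and the function names
are invented::

    #include <linux/list.h>
    #include <linux/rcupdate.h>
    #include <linux/spinlock.h>
    #include <linux/types.h>

    struct entry {
            struct list_head list;
            int id;
    };

    static LIST_HEAD(entries);
    static DEFINE_SPINLOCK(entries_lock);   /* still serializes the writers */

    void entry_add(struct entry *e)
    {
            spin_lock(&entries_lock);
            list_add_rcu(&e->list, &entries);
            spin_unlock(&entries_lock);
    }

    /* Readers take no lock at all: just an RCU read-side critical section. */
    bool entry_exists(int id)
    {
            struct entry *e;
            bool found = false;

            rcu_read_lock();
            list_for_each_entry_rcu(e, &entries, list) {
                    if (e->id == id) {
                            found = true;
                            break;
                    }
            }
            rcu_read_unlock();
            return found;
    }

The patch below applies the same transformation to our cache example.
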
1142
1143::
1144
1145 --- cache.c.perobjectlock 2003-12-11 17:15:03.000000000 +1100
1146 +++ cache.c.rcupdate 2003-12-11 17:55:14.000000000 +1100
1147 @@ -1,15 +1,18 @@
1148 #include <linux/list.h>
1149 #include <linux/slab.h>
1150 #include <linux/string.h>
1151 +#include <linux/rcupdate.h>
1152 #include <linux/mutex.h>
1153 #include <asm/errno.h>
1154
1155 struct object
1156 {
1157 - /* These two protected by cache_lock. */
1158 + /* This is protected by RCU */
1159 struct list_head list;
1160 int popularity;
1161
1162 + struct rcu_head rcu;
1163 +
1164 atomic_t refcnt;
1165
1166 /* Doesn't change once created. */
1167 @@ -40,7 +43,7 @@
1168 {
1169 struct object *i;
1170
1171 - list_for_each_entry(i, &cache, list) {
1172 + list_for_each_entry_rcu(i, &cache, list) {
1173 if (i->id == id) {
1174 i->popularity++;
1175 return i;
1176 @@ -49,19 +52,25 @@
1177 return NULL;
1178 }
1179
1180 +/* Final discard done once we know no readers are looking. */
1181 +static void cache_delete_rcu(void *arg)
1182 +{
1183 + object_put(arg);
1184 +}
1185 +
1186 /* Must be holding cache_lock */
1187 static void __cache_delete(struct object *obj)
1188 {
1189 BUG_ON(!obj);
1190 - list_del(&obj->list);
1191 - object_put(obj);
1192 + list_del_rcu(&obj->list);
1193 cache_num--;
1194 + call_rcu(&obj->rcu, cache_delete_rcu);
1195 }
1196
1197 /* Must be holding cache_lock */
1198 static void __cache_add(struct object *obj)
1199 {
1200 - list_add(&obj->list, &cache);
1201 + list_add_rcu(&obj->list, &cache);
1202 if (++cache_num > MAX_CACHE_SIZE) {
1203 struct object *i, *outcast = NULL;
1204 list_for_each_entry(i, &cache, list) {
1205 @@ -104,12 +114,11 @@
1206 struct object *cache_find(int id)
1207 {
1208 struct object *obj;
1209 - unsigned long flags;
1210
1211 - spin_lock_irqsave(&cache_lock, flags);
1212 + rcu_read_lock();
1213 obj = __cache_find(id);
1214 if (obj)
1215 object_get(obj);
1216 - spin_unlock_irqrestore(&cache_lock, flags);
1217 + rcu_read_unlock();
1218 return obj;
1219 }
1220
1221Note that the reader will alter the popularity member in
1222:c:func:`__cache_find()`, and now it doesn't hold a lock. One
1223solution would be to make it an ``atomic_t``, but for this usage, we
1224don't really care about races: an approximate result is good enough, so
1225I didn't change it.
1226
1227The result is that :c:func:`cache_find()` requires no
1228synchronization with any other functions, so is almost as fast on SMP as
1229it would be on UP.
1230
1231There is a further optimization possible here: remember our original
1232cache code, where there were no reference counts and the caller simply
1233held the lock whenever using the object? This is still possible: if you
1234hold the lock, no one can delete the object, so you don't need to get
1235and put the reference count.
1236
1237Now, because the 'read lock' in RCU is simply disabling preemption, a
1238caller which always has preemption disabled between calling
1239:c:func:`cache_find()` and :c:func:`object_put()` does not
1240need to actually get and put the reference count: we could expose
1241:c:func:`__cache_find()` by making it non-static, and such
1242callers could simply call that.
1243
1244The benefit here is that the reference count is not written to: the
1245object is not altered in any way, which is much faster on SMP machines
1246due to caching.
1247
1248Per-CPU Data
1249------------
1250
1251Another technique for avoiding locking which is used fairly widely is to
1252duplicate information for each CPU. For example, if you wanted to keep a
1253count of a common condition, you could use a spin lock and a single
1254counter. Nice and simple.
1255
1256If that was too slow (it's usually not, but if you've got a really big
1257machine to test on and can show that it is), you could instead use a
1258counter for each CPU, then none of them need an exclusive lock. See
1259:c:func:`DEFINE_PER_CPU()`, :c:func:`get_cpu_var()` and
1260:c:func:`put_cpu_var()` (``include/linux/percpu.h``).
1261
1262Of particular use for simple per-cpu counters is the ``local_t`` type,
1263and the :c:func:`cpu_local_inc()` and related functions, which are
1264more efficient than simple code on some architectures
1265(``include/asm/local.h``).
1266
1267Note that there is no simple, reliable way of getting an exact value of
1268such a counter, without introducing more locks. This is not a problem
1269for some uses.
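
A minimal sketch of such a per-CPU counter (the names are made up); note
that the total is only approximate, as discussed above::

    #include <linux/percpu.h>

    static DEFINE_PER_CPU(unsigned long, hit_count);

    void note_hit(void)
    {
            get_cpu_var(hit_count)++;       /* disables preemption, bumps this CPU's copy */
            put_cpu_var(hit_count);
    }

    /* Approximate total: no lock, so the sum may be slightly stale. */
    unsigned long total_hits(void)
    {
            unsigned long sum = 0;
            int cpu;

            for_each_possible_cpu(cpu)
                    sum += per_cpu(hit_count, cpu);
            return sum;
    }
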
1270
Data Which Is Mostly Used By An IRQ Handler
-------------------------------------------
1273
1274If data is always accessed from within the same IRQ handler, you don't
1275need a lock at all: the kernel already guarantees that the irq handler
1276will not run simultaneously on multiple CPUs.
1277
1278Manfred Spraul points out that you can still do this, even if the data
1279is very occasionally accessed in user context or softirqs/tasklets. The
irq handler doesn't use a lock, and all other accesses are done as follows::
1281
1282 spin_lock(&lock);
1283 disable_irq(irq);
1284 ...
1285 enable_irq(irq);
1286 spin_unlock(&lock);
1287
1288The :c:func:`disable_irq()` prevents the irq handler from running
1289(and waits for it to finish if it's currently running on other CPUs).
1290The spinlock prevents any other accesses happening at the same time.
1291Naturally, this is slower than just a :c:func:`spin_lock_irq()`
1292call, so it only makes sense if this type of access happens extremely
1293rarely.
1294
1295What Functions Are Safe To Call From Interrupts?
1296================================================
1297
1298Many functions in the kernel sleep (ie. call schedule()) directly or
1299indirectly: you can never call them while holding a spinlock, or with
1300preemption disabled. This also means you need to be in user context:
1301calling them from an interrupt is illegal.
1302
1303Some Functions Which Sleep
1304--------------------------
1305
1306The most common ones are listed below, but you usually have to read the
1307code to find out if other calls are safe. If everyone else who calls it
1308can sleep, you probably need to be able to sleep, too. In particular,
1309registration and deregistration functions usually expect to be called
1310from user context, and can sleep.
1311
1312- Accesses to userspace:
1313
1314 - :c:func:`copy_from_user()`
1315
1316 - :c:func:`copy_to_user()`
1317
1318 - :c:func:`get_user()`
1319
1320 - :c:func:`put_user()`
1321
1322- ``kmalloc(GFP_KERNEL)``
1323
1324- :c:func:`mutex_lock_interruptible()` and
1325 :c:func:`mutex_lock()`
1326
1327 There is a :c:func:`mutex_trylock()` which does not sleep.
1328 Still, it must not be used inside interrupt context since its
1329 implementation is not safe for that. :c:func:`mutex_unlock()`
1330 will also never sleep. It cannot be used in interrupt context either
1331 since a mutex must be released by the same task that acquired it.
1332
1333Some Functions Which Don't Sleep
1334--------------------------------
1335
1336Some functions are safe to call from any context, or holding almost any
1337lock.
1338
1339- :c:func:`printk()`
1340
1341- :c:func:`kfree()`
1342
1343- :c:func:`add_timer()` and :c:func:`del_timer()`
1344
1345Mutex API reference
1346===================
1347
1348.. kernel-doc:: include/linux/mutex.h
1349 :internal:
1350
1351.. kernel-doc:: kernel/locking/mutex.c
1352 :export:
1353
1354Futex API reference
1355===================
1356
1357.. kernel-doc:: kernel/futex.c
1358 :internal:
1359
1360Further reading
1361===============
1362
1363- ``Documentation/locking/spinlocks.txt``: Linus Torvalds' spinlocking
1364 tutorial in the kernel sources.
1365
1366- Unix Systems for Modern Architectures: Symmetric Multiprocessing and
1367 Caching for Kernel Programmers:
1368
1369 Curt Schimmel's very good introduction to kernel level locking (not
1370 written for Linux, but nearly everything applies). The book is
1371 expensive, but really worth every penny to understand SMP locking.
1372 [ISBN: 0201633388]
1373
1374Thanks
1375======
1376
1377Thanks to Telsa Gwynne for DocBooking, neatening and adding style.
1378
1379Thanks to Martin Pool, Philipp Rumpf, Stephen Rothwell, Paul Mackerras,
1380Ruedi Aschwanden, Alan Cox, Manfred Spraul, Tim Waugh, Pete Zaitcev,
1381James Morris, Robert Love, Paul McKenney, John Ashby for proofreading,
1382correcting, flaming, commenting.
1383
1384Thanks to the cabal for having no influence on this document.
1385
1386Glossary
1387========
1388
1389preemption
1390 Prior to 2.5, or when ``CONFIG_PREEMPT`` is unset, processes in user
1391 context inside the kernel would not preempt each other (ie. you had that
1392 CPU until you gave it up, except for interrupts). With the addition of
1393 ``CONFIG_PREEMPT`` in 2.5.4, this changed: when in user context, higher
1394 priority tasks can "cut in": spinlocks were changed to disable
1395 preemption, even on UP.
1396
1397bh
1398 Bottom Half: for historical reasons, functions with '_bh' in them often
1399 now refer to any software interrupt, e.g. :c:func:`spin_lock_bh()`
1400 blocks any software interrupt on the current CPU. Bottom halves are
1401 deprecated, and will eventually be replaced by tasklets. Only one bottom
1402 half will be running at any time.
1403
1404Hardware Interrupt / Hardware IRQ
1405 Hardware interrupt request. :c:func:`in_irq()` returns true in a
1406 hardware interrupt handler.
1407
1408Interrupt Context
1409 Not user context: processing a hardware irq or software irq. Indicated
1410 by the :c:func:`in_interrupt()` macro returning true.
1411
1412SMP
1413 Symmetric Multi-Processor: kernels compiled for multiple-CPU machines.
1414 (``CONFIG_SMP=y``).
1415
1416Software Interrupt / softirq
1417 Software interrupt handler. :c:func:`in_irq()` returns false;
1418 :c:func:`in_softirq()` returns true. Tasklets and softirqs both
1419 fall into the category of 'software interrupts'.
1420
1421 Strictly speaking a softirq is one of up to 32 enumerated software
1422 interrupts which can run on multiple CPUs at once. Sometimes used to
1423 refer to tasklets as well (ie. all software interrupts).
1424
1425tasklet
1426 A dynamically-registrable software interrupt, which is guaranteed to
1427 only run on one CPU at a time.
1428
1429timer
1430 A dynamically-registrable software interrupt, which is run at (or close
1431 to) a given time. When running, it is just like a tasklet (in fact, they
1432 are called from the TIMER_SOFTIRQ).
1433
1434UP
1435 Uni-Processor: Non-SMP. (CONFIG_SMP=n).
1436
1437User Context
1438 The kernel executing on behalf of a particular process (ie. a system
1439 call or trap) or kernel thread. You can tell which process with the
``current`` macro. Not to be confused with userspace. Can be
1441 interrupted by software or hardware interrupts.
1442
1443Userspace
1444 A process executing its own code outside the kernel.