yohhoyの日記

A diary for jotting down technical notes

Objective-C atomic properties and inter-thread synchronization

Notes on what LLVM/Objective-C atomic properties actually guarantee, how they compare with atomic variable access in C (C11), and whether they can be used in multithreaded code.

TL;DR: Always specify the nonatomic attribute in Objective-C @property declarations. atomic properties are too advanced for humankind. Never use them. Abandon all hope, ye who enter here.

@interface SomeClass {
  _Atomic(BOOL) atomicFlag;  // C11 atomic instance variable
}

@property (atomic) BOOL atomicProp;  // scalar-typed atomic property
// or, equivalently (atomic is the default)
@property BOOL atomicProp;

@property (atomic) id atomicObject;  // reference-typed atomic property
@end

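Note: The content of this article is based on analysis inferred from the output of the LLVM/Clang compiler. As far as I know, no rigorous language specification for Objective-C exists (?).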

Summary:

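  • A scalar-typed atomic property in Objective-C guarantees only the weakest form of atomicity. This is a weaker guarantee than relaxed access to a C11 atomic variable.*1
    • A read or write of a scalar atomic property is guaranteed only to be compiled into a memory load/store instruction that is literally indivisible (atomic) (→id:yohhoy:20121016). Atomicity of RMW (read-modify-write) operations is not guaranteed.*2
  • Access to a scalar atomic property guarantees no ordering, so it cannot be used as an inter-thread synchronization mechanism (→id:yohhoy:20140808).
    • Example 1: Thread A performs "write data to variable X, then write YES to a BOOL atomic property", and thread B performs "read YES from the same atomic property, then read data from variable X". This inter-thread synchronization is not guaranteed to work as the programmer expects. The compiler or the processor may reorder the writes and reads performed on threads A and B.
    • Example 2: A spinlock whose lock state is represented by an atomic property does not work as mutual exclusion, because no ordering is guaranteed. The compiler or the processor may move memory access instructions out of the locked range (the critical region).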
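  • If you need a so-called "atomic variable" in an Objective-C program, use the C11 atomic type _Atomic(T). In an Objective-C++ program, use the C++11 atomic type std::atomic<T>.
    • These can be used transparently in source code, just like ordinary variables, and because default access to an atomic variable is sequentially consistent, they are guaranteed to behave as the programmer expects (→id:yohhoy:20141221).
    • Concurrent operations that span multiple variables still require locking with the @synchronized construct, NSLock, NSCondition, or similar.
  • Note: I consider Objective-C atomic properties not merely useless but actively harmful. Besides a small memory access overhead, they are a risk factor that can become a latent cause of multithreading bugs. The Objective-C language defines atomic as the default property attribute. As has been widely pointed out already, every @property declaration should explicitly specify nonatomic.*3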
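Examples 1 and 2 above can be sketched with C11 atomics roughly as follows. This is a minimal sketch, not code from the original analysis: the names payload, ready, publish, try_consume, lock_flag, spin_trylock, spin_lock and spin_unlock are all made up for illustration. It spells out release/acquire explicitly to show where the ordering comes from; plain assignment and plain reads of an _Atomic variable (seq_cst by default) would also be correct.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Example 1: message passing (all names here are hypothetical). */
static int payload;            /* ordinary data, the "variable X" of the text */
static _Atomic(bool) ready;    /* flag, zero-initialized to false */

/* Thread A: write the data first, then publish the flag with release. */
void publish(int value) {
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Thread B: the acquire load pairs with the release store above, so once
   it observes true, the earlier write to payload is visible as well. */
bool try_consume(int *out) {
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}

/* Example 2: spinlock built on atomic_flag (names are hypothetical). */
static atomic_flag lock_flag = ATOMIC_FLAG_INIT;

/* Returns true if the lock was free and is now held by the caller.
   Acquire keeps critical-region accesses from moving above the lock. */
bool spin_trylock(void) {
    return !atomic_flag_test_and_set_explicit(&lock_flag, memory_order_acquire);
}

void spin_lock(void) {
    while (!spin_trylock()) {
        /* busy-wait until the holder releases the lock */
    }
}

/* Release keeps critical-region accesses from moving below the unlock. */
void spin_unlock(void) {
    atomic_flag_clear_explicit(&lock_flag, memory_order_release);
}
```

In real use, publish would run on thread A and try_consume on thread B, and spin_lock/spin_unlock would bracket the critical region on every thread. An atomic property gives none of these ordering guarantees, which is why the same patterns written against it are not reliable.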

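The LLVM IR instructions generated by LLVM/Clang are shown below. In each case a load atomic/store atomic instruction is emitted, and only the ordering differs: "unordered" for Objective-C atomic properties, "seq_cst" for default access to C11 atomic variables, the directly corresponding "acquire"/"release" for memory_order_acquire/memory_order_release, and "monotonic" for memory_order_relaxed.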

$ clang --version
Apple LLVM version 8.0.0 (clang-800.0.42.1)
Target: x86_64-apple-darwin16.4.0
Thread model: posix
$ clang -c -S -emit-llvm source.{m,c} 
// source.m
@interface SomeClass
@property (atomic) int atomicInt;  // atomic property
@end

@implementation SomeClass
- (int)atomicInt;
  // load atomic i32, i32* %N unordered, align 4
- (void)setAtomicInt:(int);
  // store atomic i32 %M, i32* %N unordered, align 4
@end
// source.c
#include <stdatomic.h>

_Atomic(int) atomicInt;  // C11 atomic variable

int val = atomicInt;
  // load atomic i32, i32* %N seq_cst, align 4
atomicInt = 42;
  // store atomic i32 42, i32* %N seq_cst, align 4

int val = atomic_load_explicit(&atomicInt, memory_order_acquire);
  // load atomic i32, i32* %N acquire, align 4
atomic_store_explicit(&atomicInt, 42, memory_order_release);
  // store atomic i32 %M, i32* %N release, align 4

int val = atomic_load_explicit(&atomicInt, memory_order_relaxed);
  // load atomic i32, i32* %N monotonic, align 4
atomic_store_explicit(&atomicInt, 42, memory_order_relaxed);
  // store atomic i32 %M, i32* %N monotonic, align 4
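
The source.c listing above is schematic (statements at file scope, val declared repeatedly). To reproduce the IR yourself, something like the following compilable variant could be used. The function names are my own and purely illustrative; each function isolates one access so that its load atomic/store atomic instruction is easy to find in the output.

```c
#include <stdatomic.h>

_Atomic(int) atomicInt;  /* C11 atomic variable, as in source.c above */

/* Default access: plain reads and writes of an _Atomic object
   use memory_order_seq_cst. */
int load_default(void) { return atomicInt; }
void store_default(int v) { atomicInt = v; }

/* Explicit acquire load and release store. */
int load_acquire(void) {
    return atomic_load_explicit(&atomicInt, memory_order_acquire);
}
void store_release(int v) {
    atomic_store_explicit(&atomicInt, v, memory_order_release);
}

/* Explicit relaxed access ("monotonic" in LLVM IR terms). */
int load_relaxed(void) {
    return atomic_load_explicit(&atomicInt, memory_order_relaxed);
}
void store_relaxed(int v) {
    atomic_store_explicit(&atomicInt, v, memory_order_relaxed);
}
```

Compiling this with the same clang -S -emit-llvm invocation should show the orderings listed above, although the exact IR text (pointer types, register names) varies between Clang versions.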

The descriptions of "unordered" and "monotonic" are quoted below from the LLVM Language Reference Manual, Atomic Memory Ordering Constraints (underlined parts are emphasis added).

unordered
The set of values that can be read is governed by the happens-before partial order. A value cannot be read unless some operation wrote it. This is intended to provide a guarantee strong enough to model Java's non-volatile shared variables. This ordering cannot be specified for read-modify-write operations; it is not strong enough to make them atomic in any interesting way.
monotonic
In addition to the guarantees of unordered, there is a single total order for modifications by monotonic operations on each address. All modification orders must be compatible with the happens-before order. There is no guarantee that the modification orders can be combined to a global total order for the whole program (and this often will not be possible). The read in an atomic read-modify-write operation (cmpxchg and atomicrmw) reads the value in the modification order immediately before the value it writes. If one atomic read happens before another atomic read of the same address, the later read must see the same value or a later value in the address's modification order. This disallows reordering of monotonic (or stronger) operations on the same address. If an address is written monotonic-ally by one thread, and other threads monotonic-ally read that address repeatedly, the other threads must eventually see the write. This corresponds to the C++0x/C1x memory_order_relaxed.

The descriptions of "Unordered" and "Monotonic" are quoted below from the LLVM Atomic Instructions and Concurrency Guide.

Unordered
Unordered is the lowest level of atomicity. It essentially guarantees that races produce somewhat sane results instead of having undefined behavior. It also guarantees the operation to be lock-free, so it does not depend on the data being part of a special atomic structure or depend on a separate per-process global lock. Note that code generation will fail for unsupported atomic operations; if you need such an operation, use explicit locking.

Relevant standard
This is intended to match the Java memory model for shared variables.
Notes for frontends
This cannot be used for synchronization, but is useful for Java and other "safe" languages which need to guarantee that the generated code never exhibits undefined behavior. Note that this guarantee is cheap on common platforms for loads of a native width, but can be expensive or unavailable for wider loads, like a 64-bit store on ARM. (A frontend for Java or other "safe" languages would normally split a 64-bit store on ARM into two 32-bit unordered stores.)
Notes for optimizers
In terms of the optimizer, this prohibits any transformation that transforms a single load into multiple loads, transforms a store into multiple stores, narrows a store, or stores a value which would not be stored otherwise. Some examples of unsafe optimizations are narrowing an assignment into a bitfield, rematerializing a load, and turning loads and stores into a memcpy call. Reordering unordered operations is safe, though, and optimizers should take advantage of that because unordered operations are common in languages that need them.
Notes for code generation
These operations are required to be atomic in the sense that if you use unordered loads and unordered stores, a load cannot see a value which was never stored. A normal load or store instruction is usually sufficient, but note that an unordered load or store cannot be split into multiple instructions (or an instruction which does multiple memory operations, like LDRD on ARM without LPAE, or not naturally-aligned LDRD on LPAE ARM).

Monotonic
Monotonic is the weakest level of atomicity that can be used in synchronization primitives, although it does not provide any general synchronization. It essentially guarantees that if you take all the operations affecting a specific address, a consistent ordering exists.

Relevant standard
This corresponds to the C++11/C11 memory_order_relaxed; see those standards for the exact definition.
Notes for frontends
If you are writing a frontend which uses this directly, use with caution. The guarantees in terms of synchronization are very weak, so make sure these are only used in a pattern which you know is correct. Generally, these would either be used for atomic operations which do not protect other memory (like an atomic counter), or along with a fence.
Notes for optimizers
In terms of the optimizer, this can be treated as a read+write on the relevant memory location (and alias analysis will take advantage of that). In addition, it is legal to reorder non-atomic and Unordered loads around Monotonic loads. CSE/DSE and a few other optimizations are allowed, but Monotonic operations are unlikely to be used in ways which would make those optimizations useful.
Notes for code generation
Code generation is essentially the same as that for unordered for loads and stores. No fences are required. cmpxchg and atomicrmw are required to appear as a single operation.

Translator's note: LPAE = Large Physical Address Extension; CSE = Common Subexpression Elimination; DSE = Dead Store Elimination.
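
The Monotonic notes quoted above name two legitimate uses of relaxed operations: atomic operations that protect no other memory (such as a counter), and relaxed operations combined with a fence. A rough C11 sketch of both follows. All identifiers (hits, count_hit, data, flag, produce, consume) are hypothetical and chosen only for this illustration.

```c
#include <stdatomic.h>

/* Use 1: a counter that protects no other memory. Relaxed is enough
   because only the counter's own modification order matters. */
static _Atomic(unsigned) hits;

unsigned count_hit(void) {
    /* fetch_add returns the previous value, so add 1 for the new one */
    return atomic_fetch_add_explicit(&hits, 1u, memory_order_relaxed) + 1u;
}

/* Use 2: relaxed flag accesses paired with explicit fences. */
static int data;            /* ordinary, non-atomic payload */
static _Atomic(int) flag;   /* zero-initialized */

void produce(int v) {
    data = v;
    /* The release fence orders the write to data before the flag store. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
}

int consume(int *out) {
    if (atomic_load_explicit(&flag, memory_order_relaxed) == 0)
        return 0;
    /* The acquire fence orders the flag load before the read of data. */
    atomic_thread_fence(memory_order_acquire);
    *out = data;
    return 1;
}
```

Without the two fences, the relaxed flag would say nothing about when data becomes visible. An unordered access, which is what an atomic property compiles to, is weaker still.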


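*1: An Objective-C atomic property has the same level of atomicity as an ordinary (non-volatile) member variable in Java. To take a concrete example from smartphone app development, mapping a volatile member variable in Android/Java to an atomic property in iOS/Objective-C is inappropriate.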

*2: With C11 atomic variables, atomicity is also guaranteed for RMW operations such as increment and decrement.

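*3: Objective-C atomic properties are even harder to use correctly than relaxed access to C11 atomic variables. Especially on the ARM architecture, which has a weak hardware memory model, you should avoid atomic properties until you understand precisely the memory model that LLVM IR provides... though even with that understanding I would not want to use them.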
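
To illustrate footnote *2, here is a small C11 sketch contrasting an atomic RMW with a separate load followed by a store. The names counter, increment, increment_explicit and increment_racy are assumptions made for this example only.

```c
#include <stdatomic.h>

static _Atomic(int) counter;  /* hypothetical shared counter */

/* ++ on an _Atomic object is one indivisible read-modify-write
   (seq_cst by default); the expression yields the new value. */
int increment(void) {
    return ++counter;
}

/* The same operation through the explicit API.
   atomic_fetch_add returns the value held before the addition. */
int increment_explicit(void) {
    return atomic_fetch_add(&counter, 1) + 1;
}

/* NOT an atomic RMW: the load and the store are each atomic, but
   another thread may modify counter between them, losing an update.
   A getter call followed by a setter call on a property has this shape. */
int increment_racy(void) {
    int seen = atomic_load(&counter);
    atomic_store(&counter, seen + 1);
    return seen + 1;
}
```

On a single thread all three behave identically. The difference shows up only under contention, where increment_racy can drop increments and the first two cannot.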