2019-08-10

フライングWindows API待機関数

Windows

Windows APIのタイムアウト指定待機関数では、指定期間よりも僅かに早くタイムアウト発生する。この振る舞いは仕様通り(by design)とのこと。

WaitForSingleObject, WaitForSingleObjectEx
WaitForMultipleObjects, WaitForMultipleObjectsEx
MsgWaitForMultipleObjects, MsgWaitForMultipleObjectsEx
etc.

Windows OSのデフォルトタイマ割込間隔は 64 [Hz]＝15.625 [msec] となっているため、平均的に数ミリ秒だけ早くタイムアウトが発生しうる。

Wait Functions and Time-out Intervals
The accuracy of the specified time-out interval depends on the resolution of the system clock. The system clock "ticks" at a constant rate. If the time-out interval is less than the resolution of the system clock, the wait may time out in less than the specified length of time. If the time-out interval is greater than one tick but less than two, the wait can be anywhere between one and two ticks, and so on.
https://docs.microsoft.com/windows/win32/sync/wait-functions

関連URL

2019-07-28

nan("is Not-a-Number")

C C++

プログラミング言語C/C++における浮動小数点数 NaN(Not-a-Number)*1 について。

C/C++標準ライブラリはquiet NaN*2を返すnan関数を提供し、同関数では処理系定義(implementation-defined)のタグ文字列を受け取る。一般的には空文字列""を指定するが、ライブラリ仕様上は任意の文字列*3を指定可能。

// C
#include <math.h>
double d1 = nan("42");   // OK
double d2 = nan("wtf");  // OK

// C++
#include <cmath>
double d1 = std::nan("42");   // OK
double d2 = std::nan("wtf");  // OK

IEC 60559浮動小数点型のバイナリ表現は {符号S, 指数部E, 仮数部T} の組で表され、指数部Eが全ビット1かつ仮数部Tが非ゼロのとき NaN と解釈される。上記で指定するタグは仮数部(significand)ビット列*4に反映される。ただし、ISO C/C++およびIEC 60559いずれの仕様でもその意味を規定しないため、実動作は処理系に依存する。

C11 7.12.11.2/p1-2, 7.22.1.3/p3-4より一部引用（下線部は強調）。C++標準ライブラリに含まれるC標準ライブラリはC言語仕様に従う。*5

#include <math.h>
double nan(const char *tagp);
float nanf(const char *tagp);
long double nanl(const char *tagp);
2 The call nan("n-char-sequence") is equivalent to strtod("NAN(n-char-sequence)", (char**) NULL); the call nan("") is equivalent to strtod("NAN()", (char**) NULL). If tagp does not point to an n-char sequence or an empty string, the call is equivalent to strtod("NAN", (char**) NULL). Calls to nanf and nanl are equivalent to the corresponding calls to strtof and strtold.

3 The expected form of the subject sequence is an optional plus or minus sign, then one of the following:

(snip)

NAN or NAN(n-char-sequence_opt), ignoring case in the NAN part, where:

　　n-char-sequence:
　　　　digit
　　　　nondigit
　　　　n-char-sequence digit
　　　　n-char-sequence nondigit
(snip)
4 (snip) A character sequence NAN or NAN(n-char-sequence_opt) is interpreted as a quiet NaN, if supported in the return type, else like a subject sequence part that does not have the expected form; the meaning of the n-char sequence is implementation-defined.²⁹³⁾(snip)
脚注293) An implementation may use the n-char sequence to determine extra information to be represented in the NaN's significand.

C99 (PDF)Rationale 5.2.4.2.2, 7.20.1.3より一部引用。

5.2.4.2.2 Characteristics of floating types <float.h>
NaNs
The primary utility of quiet NaNs, as stated in IEC 60559, "to handle otherwise intractable situations, such as providing a default value for 0.0/0.0," is supported by C99.
Other applications of NaNs may prove useful. Available parts of NaNs have been used to encode auxiliary information, for example about the NaN’s origin. Signaling NaNs might be candidates for filling uninitialized storage; and their available parts could distinguish uninitialized floating objects. IEC 60559 signaling NaNs and trap handlers potentially provide hooks for maintaining diagnostic information or for implementing special arithmetics.
However, C support for signaling NaNs, or for auxiliary information that could be encoded in NaNs, is problematic. Trap handling varies widely among implementations. Implementation mechanisms may trigger signaling NaNs, or fail to, in mysterious ways. The IEC 60559 floating-point standard recommends that NaNs propagate; but it does not require this and not all implementations do. And the floating-point standard fails to specify the contents of NaNs through format conversion. Making signaling NaNs predictable imposes optimization restrictions that anticipated benefits don’t justify. For these reasons this standard does not define the behavior of signaling NaNs nor specify the interpretation of NaN significands.

7.20.1.3 The strtod, strtof, and strtold functions
So much regarding NaN significands is unspecified because so little is portable. Attaching meaning to NaN significands is problematic, even for one implementation, even an IEC 60559 one. For example, the IEC 60559 floating-point standard does not specify the effect of format conversions on NaN significands. Conversions, perhaps generated by the compiler, may alter NaN significands in obscure ways.

IEEE 754 2.1, 6.2, 6.2.1より一部引用。

2.1 Definitions
payload: The diagnostic information contained in a NaN, encoded in part of its trailing significand field.

6.2 Operations with NaNs
Two different kinds of NaN, signaling and quiet, shall be supported in all floating-point operations. Signaling NaNs afford representations for uninitialized variables and arithmetic-like enhancements (such as complex-affine infinities or extremely wide range) that are not in the scope of this standard. Quiet NaNs should, by means left to the implementer's discretion, afford retrospective diagnostic information inherited from invalid or unavailable data and results. To facilitate propagation of diagnostic information contained in NaNs, as much of that information as possible should be preserved in NaN results of operations.

6.2.1 NaN encodings in binary formats
All binary NaN bit strings have all the bits of the biased exponent field E set to 1 (see 3.4). A quiet NaN bit string should be encoded with the first bit (d₁) of the trailing significand field T being 1. A signaling NaN bit string should be encoded with the first bit of the trailing significand field being 0. If the first bit of the trailing significand field is 0, some other bit of the trailing significand field must be non-zero to distinguish the NaN from infinity. In the preferred encoding just described, a signaling NaN shall be quieted by setting d1 to 1, leaving the remaining bits of T unchanged.
For binary formats, the payload is encoded in the p-2 least significant bits of the trailing significand field.

関連URL

*1:https://ja.wikipedia.org/wiki/NaN

*2:C/C++言語仕様およびIEC 60559仕様（旧IEC 559）では、2種類のNaN "quiet NaN" と "signaling NaN" を規定している。

*3:厳密にはアンダースコア(_)、アルファベット(a-z A-Z)、数値(0-9)の組み合わせが許容される。（C11 6.4.2.1/p1）

*4:仮数部(significand) MSB 2ビットはquiet NaN/signaling Nanの区別に利用するため、以降のビット列がペイロード(payload)となる。

*5:C++17 20.2/p2: "The descriptions of many library functions rely on the C standard library for the semantics of those functions. In some cases, the signatures specified in this International Standard may be different from the signatures in the C standard library, and additional overloads may be declared in this International Standard, but the behavior and the preconditions (including any preconditions implied by the use of an ISO C restrict qualifier) are the same unless otherwise stated."

2019-06-30

CUDA同期メモリ転送関数 != 同期動作

C CUDA

CUDAメモリ転送系関数の Async サフィックス有無*1と、実際の同期(synchronous)／非同期(asynchronous)動作は1:1対応しない。Asyncサフィックス無しメモリ転送関数でも、条件によっては非同期動作となる可能性がある。

API synchronization behavior
The API provides memcpy/memset functions in both synchronous and asynchronous forms, the latter having an "Async" suffix. This is a misnomer as each function may exhibit synchronous or asynchronous behavior depending on the arguments passed to the function.
https://docs.nvidia.com/cuda/cuda-runtime-api/api-sync-behavior.html

シングルスレッド処理のホストかつ単一デバイス利用という単純構成では、特に意識しなくとも問題にはならない（たぶん）。一方、ホスト側がマルチスレッド処理であったり、複数GPU間でのデバイス間メモリ転送を行う場合には、非同期動作には十分留意する必要がある。罠すぎる（#＾ω＾）

後者環境では、CUDAストリーム(cudaStream_tやCUstream)とAsyncサフィックス付き関数を組み合わせ、明示的な同期関数(cudaStreamSynchronizeやcuStreamSynchronize)呼び出しによってメモリ転送完了を待機する。*2

Memcpy
In the reference documentation, each memcpy function is categorized as synchronous or asynchronous, corresponding to the definitions below.
Synchronous

All transfers involving Unified Memory regions are fully synchronous with respect to the host.

For transfers from pageable host memory to device memory, a stream sync is performed before the copy is initiated. The function will return once the pageable buffer has been copied to the staging memory for DMA transfer to device memory, but the DMA to final destination may not have completed.

For transfers from pinned host memory to device memory, the function is synchronous with respect to the host.

For transfers from device to either pageable or pinned host memory, the function returns only once the copy has completed.

For transfers from device memory to device memory, no host-side synchronization is performed.

For transfers from any host memory to any host memory, the function is fully synchronous with respect to the host.

Asynchronous

For transfers from device memory to pageable host memory, the function will return only once the copy has completed.

For transfers from any host memory to any host memory, the function is fully synchronous with respect to the host.

For all other transfers, the function is fully asynchronous. If pageable memory must first be staged to pinned memory, this will be handled asynchronously with a worker thread.

https://docs.nvidia.com/cuda/cuda-runtime-api/api-sync-behavior.html

関連URL

jcuda.org - JCuda

*1:例：CUDA Runtime API の cudaMemcpy と cudaMemcpyAsync、CUDA Driver API の cuMemcpy と cuMemcpyAsync など。

*2:厳密には、転送元／先メモリ区分に応じて同期／非同期動作が決定する。現実的には細かいルールを覚えて慎重にコーデイングするより、最初からCUDAストリーム×Asyncサフィックス付き関数を使うのがベターと思う。

2019-05-31

Hidden Friends

C++

プログラミング言語C++におけるライブラリ設計で「ADL(Argument Dependent Lookup)経由でのみ呼び出し可能な非メンバ関数／演算子オーバーロード定義」を実現するテクニック。

2019-12-03追記：2019年11月会合にて (PDF)P1965R0 が採択され、C++標準規格上の用語(term)として "hidden friends" への言及が明記された。

下記説明コードのみではメリットがわかりづらいが、非メンバ関数インタフェース追加による名前空間汚染の抑止と、プログラマが意図しない関数呼び出しによるトラブル回避が主目的。

namespace NS1 {
  struct C { /*...*/ };

  // 非メンバ begin/end 関数
  inline C::iterator begin(C& c) { return c.begin(); }
  inline C::iterator end(C& c) { return c.end(); }
}
namespace NS2 {
  struct C {
    /*...*/
    // "Hidden Friends" begin/end 関数
    friend iterator begin(C& c) { return c.begin(); }
    friend iterator end(C& c) { return c.end(); }
  };
}

NS1::C c1;
auto b1 = begin(c1);     // OK: ADL経由でNS1::begin関数を見つける
auto e1 = NS1::end(c1);  // OK: 完全修飾名でNS1::end関数を指定

NS2::C c2;
auto b2 = begin(c2);     // OK: ADL経由でNS2::begin関数を見つける
auto e2 = NS2::end(c2);  // NG: 完全修飾名ではNS2::end関数を呼び出せない

トラブル事例

LWG3065 よりC++標準ライブラリで実際に問題となったコードを引用する。名前空間std::filesystemの取り込み(using)によりoperator==(const path&, const path&) が導入され、左辺ではコンストラクタpath(const wchar_t(&)[N])*1が右辺では変換コンストラクタpath(string&&)*2が暗黙に呼び出されることで、文字列比較ではなくパス名(pathname)比較 path(L"a//b") == path("a/b") が行われる*3。この問題は Clang 7.0.0 にて再現確認できた。

#include <assert.h>
#include <string>
#include <filesystem>

using namespace std;
using namespace std::filesystem;

int main() {
  bool b = L"a//b" == std::string("a/b");
  assert(b); // passes. What?!
  return b;
}

同種の問題は LWG2989 でも報告、標準ライブラリ修正されている。

提案文書(PDF)P1601R0 Recommendations for Specifying "Hidden Friends"より一部引用（下線部は強調）。

When there is no additional, out-of-class/namespace-scope, declaration of the befriended entity, such an entity has become known as a hidden friend of the class granting friendship. Were there such an out-of-class/namespace-scope declaration, the entity would be no longer hidden, as the second declaration would make the name visible to qualified and to unqualified lookup.

There have been recent discussions about employing this hidden friend technique, where applicable, throughout the standard library so that the declared entities (typically operator functions such as the new spaceship operator) would be found via ADL only. Because the library has not previously deliberately restricted lookup in this way, there is no precedent for specifying such a requirement. The remainder of this paper provides specification guidance to proposal authors who intend to impose such a requirement.

C++17 6.4.2/p4, 10.3.1.2/p3より一部引用（下線部は強調）。

When considering an associated namespace, the lookup is the same as the lookup performed when the associated namespace is used as a qualifier (6.4.3.2) except that:

Any using-directives in the associated namespace are ignored.

Any namespace-scope friend functions or friend function templates declared in associated classes are visible within their respective namespaces even if they are not visible during an ordinary lookup (14.3).

All names except those of (possibly overloaded) functions and function templates are ignored.

If a friend declaration in a non-local class first declares a class, function, class template or function template the friend is a member of the innermost enclosing namespace. The friend declaration does not by itself make the name visible to unqualified lookup (6.4.1) or qualified lookup (6.4.3). [Note: The name of the friend will be visible in its namespace if a matching declaration is provided at namespace scope (either before or after the class definition granting friendship). -- end note] If a friend function or function template is called, its name may be found by the name lookup that considers functions from namespaces and classes associated with the types of the function arguments (6.4.2). If the name in a friend declaration is neither qualified nor a template-id and the declaration is a function or an elaborated-type-specifier, the lookup to determine whether the entity has been previously declared shall not consider any scopes outside the innermost enclosing namespace. [Note: The other forms of friend declarations cannot declare a new member of the innermost enclosing namespace and thus follow the usual lookup rules. -- end note] [Example:
// Assume f and g have not yet been declared.
void h(int);
template <class T> void f2(T);
namespace A {
  class X {
    friend void f(X);         // A::f(X) is a friend
    class Y {
      friend void g();        // A::g is a friend
      friend void h(int);     // A::h is a friend
                              // ::h not considered
      friend void f2<>(int);  // ::f2<>(int) is a friend
    };
  };

  // A::f, A::g and A::h are not visible here
  X x;
  void g() { f(x); }       // definition of A::g
  void f(X) { /*...*/ }    // definition of A::f
  void h(int) { /*...*/ }  // definition of A::h
  // A::f, A::g and A::h are visible here and known to be friends
}

using A::x;

void h() {
  A::f(x);
  A::X::f(x);    // error: f is not a member of A::X
  A::X::Y::g();  // error: g is not a member of A::X::Y
}
-- end example]

関連URL

https://twitter.com/yohhoy/status/1125671584897744902
cppreference std::filesystem::path, cpprefjp std::filesystem::path

*1:厳密には template<class Source> path(const Source& source, format fmt = auto_format) テンプレートコンストラクタが選択され（30.10.8.3, 30.10.8.4.1/p7）POSIXベースOSでは未規定(unspecified)なエンコード変換が行われる（30.10.8.2.2）

*2:POSIXベースOSでは path::string_type 型は basic_string<char> となり（30.10.8/p5）、厳密には path(string_type&& source, format fmt = auto_format) コンストラクタが選択される（30.10.8.4.1/p4）

*3:30.10.8.1/p2: "Except in a root-name, multiple successive directory-separator characters are considered to be the same as one directory-separator character."

2019-05-28

定数式を要求するコンセプト

C++ C++2a

C++2a(C++20) Conceptを利用した「ある式が定数式であること」を要求する制約式の定義。

型パラメータTに対して「T::size()がコンパイル時に評価されること」を要求するコンセプトHasConstantSizeの定義例。requires式(requires-expression)中の typename type-name; 構文(type-requirement)は、type-name が有効な型であることを表明する。例示コードのように、クラステンプレート特殊化に対しては完全型が要求されない。*1

// C++2a
template<auto> struct require_constant;

template<class T>
concept HasConstantSize = requires {
  typename require_constant<T::size()>;
};

C++2a標準Rangeライブラリ std::range::split_view 定義で同テクニックを利用している。N4810*2 24.7.8.2より一部引用。

namespace std::ranges {
  template<auto> struct require-constant;  // exposition only

  template<class R>
  concept tiny-range =  // exposition only
    SizedRange<R> &&
    requires { typename require-constant<remove_reference_t<R>::size()>; } &&
    (remove_reference_t<R>::size() <= 1);

  // (snip)
}

関連URL

C++ Concepts（P0734R0） - yohhoyの日記

*1:N4810 7.5.7.2: "A type-requirement that names a class template specialization does not require that type to be complete."

*2:http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/n4810.pdf

2019-05-04

書式指定子の入れ子

Python

プログラミング言語 Pythonの str.format や f-string *1 において、書式指定子部のみ１段階の入れ子が許容される。下記コードではいずれも文字列 Hello!!!!! が得られる。

msg = 'Hello'
'{:!<10}'.format(msg)
f'{msg:!<10}'

f, a, w = '!', '<', 10
'{:{}{}{}}'.format(msg, f, a, w)
f'{msg:{f}{a}{w}}'

# 各typeフィールドを明示
f'{msg:{f:s}{a:s}{w:d}s}'

Top-level format specifiers may include nested replacement fields. These nested fields may include their own conversion fields and format specifiers, but may not include more deeply-nested replacement fields. The format specifier mini-language is the same as that used by the string .format() method.
Lexical analysis, Formatted string literals

*1:formatted string literal

2019-04-20

オーバーロード解決の優先順位制御

C++

プログラミング言語C++における関数テンプレートのオーバーロードにおいて、SFINAEと組み合わせてオーバーロード解決の優先順を制御するテクニック。

選択候補が2個のケース

2つの型T, Uに対して、1) 演算 T / U が定義されていれば同演算子を、2) そうでなければ T * (1/U) を選択するケースを考える。*1

// 実装関数 第1候補
template <typename T, typename U>
auto f_impl(T a, U b, int) -> decltype(a / b)
{ return a / b; }

// 実装関数 第2候補
template <typename T, typename U>
auto f_impl(T a, U b, char) -> decltype(a * (U{1} / b))
{ return a * (U{1} / b); }

// 公開関数
template <typename T, typename U>
auto f(T a, U b)
{ return f_impl(a, b, 0); }

実装関数テンプレートの第3引数に 1)int型、2)char型を指定し、f_impl呼び出し側で 0 指定により優先順制御を行う。関数のオーバーロード解決(overload resolution)において、リテラル 0 がint型として扱われる規則が、char型として扱われる規則より優先されるため。*2

選択候補がN個のケース

優先順位付けを表現するrank<I>ヘルパクラスを導入する。公開関数で指定するrank<2>実引数に対して、オーバーロード解決時には基底クラスrank<1>よりもrank<2>へと優先的にマッチするため。同様にrank<0>よりもrank<1>が優先する。

template <int I> struct rank : rank<I-1> {};
template <> struct rank<0> {};

// 実装関数 第1候補
template <typename T, typename U>
auto f_impl(T a, U b, rank<2>) -> decltype(a / b)
{ return a / b; }

// 実装関数 第2候補
template <typename T, typename U>
auto f_impl(T a, U b, rank<1>) -> decltype(a * (U(1) / b))
{ return a * (U(1) / b); }

// 実装関数 第3候補
template <typename T, typename U>
int f_impl(T a, U b, rank<0>)
{ return 0; }

// 公開関数
template <typename T, typename U>
auto f(T a, U b)
{ return f_impl(a, b, rank<2>{}); }

メモ：Web上で本テクニックを利用するコードをちらほら見かける。名前の付いたIdiomではないのだろうか？

関連URL

*1:ここでは公開関数 f でのみ通常関数での戻り値型推論を利用している。実装関数 f_impl の末尾戻り値型宣言を省略するとSFINAEが機能せずに、式 T / U の有効有無にかかわらず1)にオーバーロード解決されてしまう。

*2:リテラル 0 はC++14 2.14.2/p2よりint型と解釈される。優先順位は13.3.3.1.1のstandard conversion sequenceにある1) No conversions required → Eact Match Rank と2) Integral conversions → Conversion Rank に基づく…ような気がする。オーバーロード解決規則は難解すぎて理解できない。