2019-05-31

Hidden Friends

C++

A technique in C++ library design for defining non-member functions and operator overloads that can be called only via ADL (Argument Dependent Lookup).

Addendum 2019-12-03: (PDF)P1965R0 was adopted at the November 2019 meeting, and the C++ standard now explicitly mentions "hidden friends" as a term.

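The benefit is hard to see from the sample code below alone. The main goals are to limit the namespace pollution caused by adding non-member function interfaces, and to avoid trouble caused by function calls the programmer did not intend.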

namespace NS1 {
  struct C { /*...*/ };

  // Non-member begin/end functions
  inline C::iterator begin(C& c) { return c.begin(); }
  inline C::iterator end(C& c) { return c.end(); }
}
namespace NS2 {
  struct C {
    /*...*/
    // "Hidden Friends" begin/end functions
    friend iterator begin(C& c) { return c.begin(); }
    friend iterator end(C& c) { return c.end(); }
  };
}

NS1::C c1;
auto b1 = begin(c1);     // OK: NS1::begin is found via ADL
auto e1 = NS1::end(c1);  // OK: NS1::end is named with a fully qualified name

NS2::C c2;
auto b2 = begin(c2);     // OK: NS2::begin is found via ADL
auto e2 = NS2::end(c2);  // error: NS2::end cannot be called with a fully qualified name
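The lookup rule can also be checked at compile time. The following is a minimal sketch, not part of the original sample: the names demo, Widget, Convertible, get_id, has_adl_get_id and widget_id_via_adl are all hypothetical. It assumes C++17 (for std::void_t) and shows that a hidden friend is found only when the class granting friendship is itself an associated class of an argument, so an implicitly convertible type in the same namespace does not reach it.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

namespace demo {
  struct Widget {
    int id;
    // Hidden friend: declared and defined only inside the class body,
    // so it is visible to ADL but not to qualified lookup.
    friend int get_id(const Widget& w) { return w.id; }
  };

  // Same namespace, implicitly convertible to Widget,
  // but Widget is not an associated class of Convertible.
  struct Convertible {
    operator Widget() const { return Widget{7}; }
  };
}

// Detects whether the unqualified call get_id(T) is well-formed.
template <class T, class = void>
struct has_adl_get_id : std::false_type {};
template <class T>
struct has_adl_get_id<T, std::void_t<decltype(get_id(std::declval<T>()))>>
  : std::true_type {};

static_assert(has_adl_get_id<demo::Widget>::value,
              "found via ADL on the Widget argument");
static_assert(!has_adl_get_id<demo::Convertible>::value,
              "not found: no implicit conversion sneaks in");

inline int widget_id_via_adl() {
  demo::Widget w{42};
  // Writing demo::get_id(w) instead would be ill-formed.
  return get_id(w);  // OK: ADL finds the hidden friend
}
```

The second static_assert is the property that matters for library design: the function takes part in overload resolution only for calls that really involve Widget.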

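A real-world problem

The following code, quoted from LWG3065, caused an actual problem in the C++ standard library. Importing namespace std::filesystem with a using-directive introduces operator==(const path&, const path&). The left operand is then implicitly converted by the constructor path(const wchar_t(&)[N])*1 and the right operand by the converting constructor path(string&&)*2, so what gets performed is not a string comparison but the pathname comparison path(L"a//b") == path("a/b")*3. I was able to reproduce the problem with Clang 7.0.0.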

#include <assert.h>
#include <string>
#include <filesystem>

using namespace std;
using namespace std::filesystem;

int main() {
  bool b = L"a//b" == std::string("a/b");
  assert(b); // passes. What?!
  return b;
}

A similar problem was reported as LWG2989 and has also been fixed in the standard library.

Partial quote from the proposal (PDF)P1601R0 Recommendations for Specifying "Hidden Friends".

When there is no additional, out-of-class/namespace-scope, declaration of the befriended entity, such an entity has become known as a hidden friend of the class granting friendship. Were there such an out-of-class/namespace-scope declaration, the entity would be no longer hidden, as the second declaration would make the name visible to qualified and to unqualified lookup.

There have been recent discussions about employing this hidden friend technique, where applicable, throughout the standard library so that the declared entities (typically operator functions such as the new spaceship operator) would be found via ADL only. Because the library has not previously deliberately restricted lookup in this way, there is no precedent for specifying such a requirement. The remainder of this paper provides specification guidance to proposal authors who intend to impose such a requirement.

Partial quotes from C++17 6.4.2/p4 and 10.3.1.2/p3.

When considering an associated namespace, the lookup is the same as the lookup performed when the associated namespace is used as a qualifier (6.4.3.2) except that:

Any using-directives in the associated namespace are ignored.

Any namespace-scope friend functions or friend function templates declared in associated classes are visible within their respective namespaces even if they are not visible during an ordinary lookup (14.3).

All names except those of (possibly overloaded) functions and function templates are ignored.

If a friend declaration in a non-local class first declares a class, function, class template or function template the friend is a member of the innermost enclosing namespace. The friend declaration does not by itself make the name visible to unqualified lookup (6.4.1) or qualified lookup (6.4.3). [Note: The name of the friend will be visible in its namespace if a matching declaration is provided at namespace scope (either before or after the class definition granting friendship). -- end note] If a friend function or function template is called, its name may be found by the name lookup that considers functions from namespaces and classes associated with the types of the function arguments (6.4.2). If the name in a friend declaration is neither qualified nor a template-id and the declaration is a function or an elaborated-type-specifier, the lookup to determine whether the entity has been previously declared shall not consider any scopes outside the innermost enclosing namespace. [Note: The other forms of friend declarations cannot declare a new member of the innermost enclosing namespace and thus follow the usual lookup rules. -- end note] [Example:
// Assume f and g have not yet been declared.
void h(int);
template <class T> void f2(T);
namespace A {
  class X {
    friend void f(X);         // A::f(X) is a friend
    class Y {
      friend void g();        // A::g is a friend
      friend void h(int);     // A::h is a friend
                              // ::h not considered
      friend void f2<>(int);  // ::f2<>(int) is a friend
    };
  };

  // A::f, A::g and A::h are not visible here
  X x;
  void g() { f(x); }       // definition of A::g
  void f(X) { /*...*/ }    // definition of A::f
  void h(int) { /*...*/ }  // definition of A::h
  // A::f, A::g and A::h are visible here and known to be friends
}

using A::x;

void h() {
  A::f(x);
  A::X::f(x);    // error: f is not a member of A::X
  A::X::Y::g();  // error: g is not a member of A::X::Y
}
-- end example]

Related URLs

https://twitter.com/yohhoy/status/1125671584897744902
cppreference std::filesystem::path, cpprefjp std::filesystem::path

*1: Strictly speaking, the constructor template template<class Source> path(const Source& source, format fmt = auto_format) is selected (30.10.8.3, 30.10.8.4.1/p7), and on POSIX-based operating systems an unspecified encoding conversion is performed (30.10.8.2.2).

*2: On POSIX-based operating systems path::string_type is basic_string<char> (30.10.8/p5), so strictly speaking the constructor path(string_type&& source, format fmt = auto_format) is selected (30.10.8.4.1/p4).

*3:30.10.8.1/p2: "Except in a root-name, multiple successive directory-separator characters are considered to be the same as one directory-separator character."

2019-05-28

A concept that requires a constant expression

C++ C++2a

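How to define, with C++2a (C++20) Concepts, a constraint that requires "a given expression is a constant expression".

Below is an example definition of the concept HasConstantSize, which requires of a type parameter T that "T::size() is evaluated at compile time". Inside a requires-expression, the syntax typename type-name; (a type-requirement) asserts that type-name is a valid type. As in the sample code, a class template specialization named this way is not required to be a complete type.*1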

// C++2a
template<auto> struct require_constant;

template<class T>
concept HasConstantSize = requires {
  typename require_constant<T::size()>;
};

The C++2a standard Ranges library uses the same technique in the definition of std::ranges::split_view. Partial quote from N4810*2 24.7.8.2.

namespace std::ranges {
  template<auto> struct require-constant;  // exposition only

  template<class R>
  concept tiny-range =  // exposition only
    SizedRange<R> &&
    requires { typename require-constant<remove_reference_t<R>::size()>; } &&
    (remove_reference_t<R>::size() <= 1);

  // (snip)
}

Related URLs

C++ Concepts（P0734R0） - yohhoyの日記

*1:N4810 7.5.7.2: "A type-requirement that names a class template specialization does not require that type to be complete."

*2:http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/n4810.pdf

2019-05-04

Nested format specifiers

Python

In Python's str.format and f-strings*1, one level of nesting is allowed, and only inside the format specifier part. Every expression in the code below yields the string Hello!!!!!.

msg = 'Hello'
'{:!<10}'.format(msg)
f'{msg:!<10}'

f, a, w = '!', '<', 10
'{:{}{}{}}'.format(msg, f, a, w)
f'{msg:{f}{a}{w}}'

# Spell out the type field of each nested field
f'{msg:{f:s}{a:s}{w:d}s}'

Top-level format specifiers may include nested replacement fields. These nested fields may include their own conversion fields and format specifiers, but may not include more deeply-nested replacement fields. The format specifier mini-language is the same as that used by the string .format() method.
Lexical analysis, Formatted string literals

*1:formatted string literal

2019-04-20

Controlling the priority of overload resolution

C++

A technique for overloaded function templates in C++ that combines SFINAE with an extra parameter to control the order in which overload resolution prefers the candidates.

The two-candidate case

Consider the case where, for two types T and U, we want to select 1) the operation T / U if it is defined, and 2) T * (1/U) otherwise.*1

// Implementation function, 1st candidate
template <typename T, typename U>
auto f_impl(T a, U b, int) -> decltype(a / b)
{ return a / b; }

// Implementation function, 2nd candidate
template <typename T, typename U>
auto f_impl(T a, U b, char) -> decltype(a * (U{1} / b))
{ return a * (U{1} / b); }

// Public function
template <typename T, typename U>
auto f(T a, U b)
{ return f_impl(a, b, 0); }

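The third parameter of the implementation function templates is 1) int and 2) char, and the caller of f_impl controls the priority by passing 0. This works because, in overload resolution, passing the literal 0 to an int parameter (no conversion required) ranks better than converting it to a char parameter.*2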
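To see both candidates being chosen, here is a self-contained sketch assuming C++14 or later. It repeats f_impl and f from above and adds a hypothetical type Length (not part of the original article) that supports scaling with operator* but has no operator/.

```cpp
#include <cassert>

// Hypothetical type: can be scaled by a double, but cannot be divided.
struct Length { double meters; };
inline Length operator*(Length l, double k) { return Length{l.meters * k}; }

// 1st candidate: viable only if a / b is a valid expression.
template <typename T, typename U>
auto f_impl(T a, U b, int) -> decltype(a / b)
{ return a / b; }

// 2nd candidate: viable only if a * (U{1} / b) is a valid expression.
template <typename T, typename U>
auto f_impl(T a, U b, char) -> decltype(a * (U{1} / b))
{ return a * (U{1} / b); }

// Public function: the literal 0 prefers the int overload when both are viable.
template <typename T, typename U>
auto f(T a, U b)
{ return f_impl(a, b, 0); }
```

With these definitions f(6.0, 2.0) uses the first candidate (plain division), while f(Length{6.0}, 2.0) falls back to the second one because Length / double is not a valid expression.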

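The N-candidate case

Introduce a helper class rank<I> that expresses the priority ordering. For the rank<2> argument passed by the public function, overload resolution prefers an exact match to rank<2> over a match to its base class rank<1>. In the same way, rank<1> is preferred over rank<0>.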

template <int I> struct rank : rank<I-1> {};
template <> struct rank<0> {};

// Implementation function, 1st candidate
template <typename T, typename U>
auto f_impl(T a, U b, rank<2>) -> decltype(a / b)
{ return a / b; }

// Implementation function, 2nd candidate
template <typename T, typename U>
auto f_impl(T a, U b, rank<1>) -> decltype(a * (U(1) / b))
{ return a * (U(1) / b); }

// Implementation function, 3rd candidate
template <typename T, typename U>
int f_impl(T a, U b, rank<0>)
{ return 0; }

// Public function
template <typename T, typename U>
auto f(T a, U b)
{ return f_impl(a, b, rank<2>{}); }
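The following sketch makes the selection order observable. It is a variation on the code above rather than the original: pick and which_candidate are hypothetical names, each candidate simply returns its own number, and Length is a hypothetical type that has operator* but no operator/. It assumes C++14 or later.

```cpp
#include <cassert>
#include <string>

template <int I> struct rank : rank<I-1> {};
template <> struct rank<0> {};

// Hypothetical type: can be scaled by a double, but cannot be divided.
struct Length { double meters; };
inline Length operator*(Length l, double k) { return Length{l.meters * k}; }

// 1st candidate: requires a / b to be valid.
template <typename T, typename U>
auto pick(T a, U b, rank<2>) -> decltype(void(a / b), 1)
{ return 1; }

// 2nd candidate: requires a * (U(1) / b) to be valid.
template <typename T, typename U>
auto pick(T a, U b, rank<1>) -> decltype(void(a * (U(1) / b)), 2)
{ return 2; }

// 3rd candidate: unconstrained fallback.
template <typename T, typename U>
int pick(T, U, rank<0>)
{ return 3; }

// Starts at the highest rank; derived-to-base conversions rank the rest.
template <typename T, typename U>
int which_candidate(T a, U b)
{ return pick(a, b, rank<2>{}); }
```

Here which_candidate(6.0, 2.0) selects the first candidate, which_candidate(Length{6.0}, 2.0) the second, and which_candidate(std::string("x"), 2) only has the fallback left.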

Memo: I occasionally come across code on the Web that uses this technique. Isn't it an idiom with a name?


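*1: Here, return type deduction for normal functions is used only in the public function f. If the trailing return type of the implementation functions f_impl were omitted, SFINAE would not take effect, and overload resolution would select 1) regardless of whether the expression T / U is valid.

*2: By C++14 2.14.2/p2 the literal 0 is interpreted as int. The priority is based, I think, on the standard conversion sequences in 13.3.3.1.1: 1) no conversions required → Exact Match rank, and 2) integral conversions → Conversion rank. The overload resolution rules are too arcane for me to fully understand.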

2019-04-03

Customization Point Object

C++ C++2a

Notes on Customization Point Objects, which are introduced into the C++2a (C++20) standard library.*1

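Summary:

Customization Point == a predefined point at which behavior can be customized for user-defined types. It is a name that is called from the source code of a concrete implementation.
- Addendum 2021-06-29: Some CPOs in the C++20 standard library do not specify a Customization Point. Apart from the fact that behavior cannot be customized through the CPO name, they have the same properties as the "true CPOs" described in this article. The naming could have been a bit better...
Customization Point Object (CPO) == a global function object that the library defines for the above purpose.
It is a mechanism for first "checking type constraints with concepts" and then "finding the appropriate function overload via ADL".
- The function overloading approach used as of C++17 has two problems: [A] the risk of misuse is high, and [B] type constraints expressed with concepts cannot be enforced.
The argument types of a CPO call satisfy the concepts required by the CPO. If the argument type list does not satisfy the concepts, the call is removed from the overload resolution candidates.
The return type of a CPO call satisfies the concept specified for the CPO.*2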

C++17

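In the C++17 standard library, the functions swap, begin, and end are Customization Points*3. For a customization for a user-defined type to work correctly, the code must "bring the name swap from namespace std into scope (using)" and then "call swap with an unqualified name". Understanding this behavior requires intricate knowledge, so C++ programmers easily get it wrong.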

namespace NS {
  struct S;
  // Custom behavior for the user-defined type
  void swap(S&, S&);
}

// OK: the correct way to call swap
template <typename T>
void func0(T& a, T& b) {
  using std::swap;
  swap(a, b);
  // If type T belongs to namespace NS, ADL selects
  // the customized function (NS::swap)
}

// NG: call with a fully qualified name
template <typename T>
void func1(T& a, T& b) {
  std::swap(a, b);
  // The customized version is not called for T=NS::S
}

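Furthermore, C++2a makes it possible to express template parameter constraints with concepts, but this raises a new problem: a user-defined type can bypass the type constraints that the library designed in advance.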

namespace std {
  // Hypothetical concept: Fooable
  template <typename T> concept Fooable = /*...*/;
  // Hypothetical Customization Point: type T should satisfy Fooable
  template <Fooable T> void foo(T x) { /*...*/ }
}

namespace NS {
  // User-defined type that does not satisfy the std::Fooable concept
  struct S;
  // Custom behavior for the user-defined type (ignores the type constraint)
  void foo(S);
}

NS::S x;
using std::foo;
foo(x);  // ★OK!?
// The std::Fooable requirement check is bypassed and
// the call to the customized NS::foo succeeds

C++2a

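Introducing CPOs into the C++ standard library aims to solve the two existing problems. (The name foo is used below as a Customization Point for the sake of explanation.)

A fully qualified call std::foo(a); and an unqualified call using std::foo; foo(a); behave the same way.
Even with using std::foo; foo(a);, the type constraints required by the CPO foo are not bypassed (ignored).

An example CPO implementation follows.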

namespace std {
  // Hypothetical concept: Fooable
  template <typename T> concept Fooable = /*...*/;
  // Assume that at least int and double satisfy the Fooable concept

  namespace detail {
    // Define the library's default foo behavior...
    void foo(int);     // #1
    void foo(double);  // #2

    struct foo_cpo {
      // Type parameter T should satisfy the Fooable concept.
      // operator() is const so that the constexpr object below can be called.
      template <Fooable T>
      void operator()(T a) const {
        // The unqualified call resolves to detail::foo()
        // or to a function found by ADL
        foo(a);
      }
    };
  }

  // Define the CPO as a global function object
  inline constexpr detail::foo_cpo foo{};
}

namespace NS {
  // User-defined type NS::BadS that does not satisfy the std::Fooable concept
  struct BadS;
  void foo(BadS);  // #3
  // User-defined type NS::FooU that satisfies the std::Fooable concept
  struct FooU;
  void foo(FooU);  // #4
}

int val = 42;
NS::BadS bad;
NS::FooU good;
// Calls with a fully qualified name
{
  std::foo(val);  // OK: calls #1
  std::foo(bad);  // NG: ill-formed
  std::foo(good); // OK: calls #4
}
// using + calls with an unqualified name
{
  using std::foo;
  foo(val);  // OK: calls #1
  foo(bad);  // NG: ill-formed
  foo(good); // OK: calls #4
}

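As of WD N4810, the C++2a standard library introduces the following CPOs as part of the Ranges library (namespace std is omitted). To preserve backward compatibility, the existing swap, begin, and end keep their C++17 library specification.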

ranges::swap
ranges::iter_move, ranges::iter_swap
ranges::(c)(r)begin, ranges::(c)(r)end*4
ranges::size
ranges::empty
ranges::data, ranges::cdata
view::single
view::iota
view::all
view::filter
view::transform
view::take
view::join
view::split
view::counted
view::common
view::reverse

Addendum 2021-01-28: In addition to the above, the C++20 standard library also defines ranges::ssize and, directly in namespace std, {strong,weak,partial}_order and compare_{strong,weak,partial}_order_fallback as CPOs.

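Addendum 2021-06-29: Whether a given CPO allows its behavior to be customized directly for a user-defined type depends on the specification of that CPO.

CPOs that specify a Customization Point:
- strong_order, weak_order, partial_order in namespace std
- swap, begin, end, rbegin, rend, size, data, empty in namespace std::ranges*5
For example, ranges::{cbegin, cend} and the range adaptor objects in namespace views are Customization Point Objects by the standard library's definition, but their behavior cannot be customized directly for user-defined types.*6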


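*1: This article was written as an output of the event C++20を相談しながら調べる会 #1. I did not put everything together during the event itself, but the event was a strong motivation.

*2: The library defines a name foo and a concept C, and uses the expression foo(x) wherever its internal implementation requires the concept C.

*3: The term "Customization Point" does not appear directly in the C++17 language specification, but the specification requires the function templates swap, begin, and end to be called with unqualified names. This library specification ensures that customized versions for user-defined types are used appropriately.

*4: Shorthand for begin, cbegin, rbegin, crbegin. The same applies to end.

*5: The CPOs data and empty can be customized only through member functions of the user-defined type. The other CPOs support customization through non-member functions called via ADL.

*6: For example, the behavior of cbegin and cend can be customized indirectly and partially through the customizable CPOs begin and end.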

2019-03-21

swap(T, U), is_swappable_with<T, U>, and vector<bool>

C++ gcc Clang MSVC

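The C++17 standard library provides the metafunctions std::is_(nothrow_)swappable_with<T, U>, which determine whether values can be swapped between variables of different types. Swapping is generally performed between variables of the same type (swap(T&, T&)), but special cases such as proxy types require swapping between variables of different types (swap(T, U)).*1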

// <type_traits> header
namespace std {
  template <class T, class U>
  struct is_swappable_with;

  template <class T, class U>
  struct is_nothrow_swappable_with;
}

A code example partly adapted from the Example in C++17 20.5.3.2/p5:

#include <cassert>
#include <type_traits>
#include <utility>

namespace N {
  struct A { int m; };
  struct Proxy { A* a; };
  Proxy proxy(A& a) { return Proxy{ &a }; }
  void swap(A& x, Proxy p) {
    std::swap(x.m, p.a->m);
  }
  void swap(Proxy p, A& x) { swap(x, p); }
}

N::A a1 = { 1 }, a2 = { 2 };
auto p2 = N::proxy(a2);

// Values can be swapped between N::A& and N::Proxy
static_assert(std::is_swappable_with_v<N::A&, N::Proxy>);

swap(a1, p2);  // OK
assert(a1.m == 2 && a2.m == 1);

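The std::vector<bool> container

The (in some circles notorious) container class std::vector<bool> uses such a proxy type. The container can use memory efficiently by managing bool values bit by bit*2, but the price is that it cannot directly return a reference bool& to a bool element. Element accesses such as the subscript access v[0] are therefore implemented to return the proxy type vector<bool>::reference.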

#include <type_traits>
#include <vector>
#include <utility>

std::vector<bool> v{ true };
bool b = false;

static_assert(
  std::is_swappable_with_v<std::vector<bool>::reference, bool&>
);  // OK?

swap(v[0], b);  // OK?

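The code above expects a vector<bool> proxy and a bool variable to be swappable, but the standard library specification as of C++17 does not guarantee the behavior of this code. It compiles and runs with GCC/libstdc++*3 and Clang/libc++*4, but fails to compile with MSVC 19.16.*5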
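When portability matters, one way to sidestep the question is to exchange the values through plain bool, relying only on the proxy's conversion to bool and its assignment from bool. This is a sketch of a workaround, not something the article or the standard prescribes; swap_bit and demo_swap_bit are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: swaps a vector<bool> element with a bool variable
// without depending on a swap overload for the proxy type.
inline void swap_bit(std::vector<bool>::reference bit, bool& other) {
  const bool tmp = bit;  // proxy -> bool conversion reads the bit
  bit = other;           // proxy assignment writes the bit
  other = tmp;
}

inline bool demo_swap_bit() {
  std::vector<bool> v{ true };
  bool b = false;
  swap_bit(v[0], b);
  // After the exchange the element is false and b is true.
  return !v[0] && b;
}
```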

This problem is mentioned in P0022R2, 3.1 Proxy Iterator problems.

For all its problems, vector<bool> works surprisingly well in practice, despite the fact that fairly trivial code such as below is not portable.
std::vector<bool> v{true, false, true};
auto i = v.begin();
bool b = false;
using std::swap;
swap(*i, b);  // Not guaranteed to work.
Because of the fact that this code is underspecified, it is impossible to say with certainty which algorithms work with vector<bool>. That fact that many do is due largely to the efforts of implementors and to the fact that bool is a trivial, copyable type that hides many of the nastier problems with proxy references. For more interesting proxy reference types, the problems are impossible to hide.


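*1: This article was prompted by a question raised at the nakameguro_feature.cpp vol.17 study meeting.

*2: On typical C++ implementations, memory consumption can be reduced by packing eight bool values into one byte.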

*3:https://github.com/gcc-mirror/gcc/commit/5345c53733c161a7781dd55559a4e1458751da1d

*4:https://github.com/llvm-mirror/libcxx/blob/bc8d3f97eb5c958007f2713238472e0c1c8fe02c/include/__bit_reference#L75-L93

*5:https://gcc.godbolt.org/z/Ts_vLF

2019-03-17

Goodbye "bit" in C++, (Partially)

C++ C++2a

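In the wording of the C++2a (C++20) language specification, use of the term "bit" is avoided as much as possible (it does not disappear completely). This is a consequence of the C++2a language change that guarantees two's complement representation for signed integer types.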

Quoted from the opening of the proposal P1236R1 Alternative Wording for P0907R4 Signed Integers are Two's Complement.

This paper presents alternative wording for P0907R3 Signed Integers are Two's Complement by Jean François Bastien, avoiding talking about unobservable bits as much as possible.
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1236r1.html

To express the size of integer types, the dedicated term "range exponent" is used. Partial quote from N4800 (C++2a WD) 6.7.1/p4.

Table 10 -- Minimum range exponent

Type          Minimum range exponent N
signed char   8
short         16
int           16
long          32
long long     64

The range exponent of each signed integer type shall not be less than the values specified in Table 10. The value representation of a signed or unsigned integer type comprises N bits, where N is the respective range exponent. Each set of values for any padding bits (6.7) in the object representation are alternative representations of the value specified by the value representation. (snip)

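The minimums in Table 10 can be checked against an implementation. The sketch below rests on one assumption that is mine, not the quoted text's: that the range exponent N of a type can be recovered from std::numeric_limits as digits plus one for signed types (digits does not count the sign) and as digits for unsigned types. The function name range_exponent is hypothetical. It assumes C++17 or later.

```cpp
#include <cassert>
#include <limits>

// Hypothetical helper: range exponent N derived from numeric_limits.
// digits excludes the sign, so a signed type gets digits + 1.
template <class T>
constexpr int range_exponent() {
  return std::numeric_limits<T>::digits +
         (std::numeric_limits<T>::is_signed ? 1 : 0);
}

// Minimum range exponents from Table 10.
static_assert(range_exponent<signed char>() >= 8);
static_assert(range_exponent<short>() >= 16);
static_assert(range_exponent<int>() >= 16);
static_assert(range_exponent<long>() >= 32);
static_assert(range_exponent<long long>() >= 64);
```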
