Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Lambda Capture Lists #117

Closed
stephentoub opened this issue Jan 28, 2015 · 29 comments
Closed

Proposal: Lambda Capture Lists #117

stephentoub opened this issue Jan 28, 2015 · 29 comments
Assignees
Labels
Area-Language Design Feature Request Feature Specification Language-C# Resolution-Won't Fix A real bug, but Triage feels that the issue is not impactful enough to spend time on.

Comments

@stephentoub
Copy link
Member

(Note: this proposal was briefly discussed in #98, the C# design notes for Jan 21, 2015. It has not been updated further based on the discussion that's already occurred on that thread.)

Background

Today, lambdas automatically capture any referenced state into a closure object. These captures happen by reference, in that the "local" variable that’s closed over is compiled as a field onto the "display class," with an instance of that class used both by the containing method and by the lambda defined in it.

// Original code
public static void ContainingMethod()
{
    int i = 42;
    Action a = () => {
       Console.WriteLine(i);
    };
}

// Approximate compiled equivalent
public static void ContainingMethod()
{
    var class2 = new <>c__DisplayClass1();
    class2.i = 42;
    Action action = new Action(class2.<ContainingMethod>b__0);
}
private sealed class <>c__DisplayClass1
{
    public int i;
    public void <ContainingMethod>b__0()
    {
        Console.WriteLine(this.i);
    }
}

The ability to write such concise code and have the compiler generate all of the necessary boilerplate is a huge productivity win.

Problem

While this is a productivity win, it also hides some key aspects of how the mechanism works, in particular how the data makes its way into the lambda and where that data is stored, namely in an allocated object.

Solution: Explicitly Specifying What Gets Captured

When C++11 introduced lambda support, it explicitly disabled the implicit automatic capturing of any referenced state. Instead, developers are forced to state their intentions for what they want captured by using a "capture list." C# today behaves as if all captured state is captured by reference, and we would retain that by default, but we could also allow developers (optionally) to use capture lists to be more explicit about what they want captured. Using a capture list, the previously explored ContainingMethod example could be written:

public static void ContainingMethod()
{
    int i = 42;
    Action a = [i]() => {
        Console.WriteLine(i);
    };
}

This states that the lambda captures the ‘i’ variable and nothing else. As such, this code is the exact equivalent of the previous example and will result in exactly the same code being generated by the compiler. However, now that we’ve specified a capture list, any attempt by the method to use a value not included in the capture list will be met with an error. This verification helps the developer not only better understand what state is being used, it also helps to enable compiler-verification that no allocations are involved in a closure. If a closure is instead written with an empty capture list, the developer can be assured that the lambda won’t capture any state, won’t require a display class, and thus won’t result in an allocation (in most situations, the compiler will then also be able to statically cache the delegate, so that the delegate will only be allocated once for the program rather than once per method call):

public static void ContainingMethod()
{
    int i = 42;
    Action a = []() => {
        Console.WriteLine(i); // Error: can’t access non-captured ‘i'
    };
}

Additional Support: Capturing By Value

Today if a developer wants the equivalent of capturing by value instead of capturing by reference, they must first make a copy outside of the closure and reference that copy, e.g

public static void ContainingMethod()
{
    int i = 42; // variable to be captured by value
    ...
    int iCopy = i;
    Action a = () => {
        Console.WriteLine(iCopy); 
    };
}

In this example, since the lambda closes over iCopy rather than i, it's effectively capturing a copy by reference, and thus has the same semantics as if capturing i by value. This, however, is verbose and error prone, in that a developer must ensure that iCopy is only used inside the lambda and not elsewhere, and in particular not inside of another lambda that might close over the same value. Instead, we could support assignment inside of a capture list:

public static void ContainingMethod()
{
    int i = 42; // variable to be captured by value
    ...
    Action a = [int iCopy = i]() => {
        Console.WriteLine(iCopy); 
    };
}

Now, only iCopy and not i can be used inside of the lambda, and iCopy is not available outside of the scope of the lambda.

// Compiled equivalent
public static void ContainingMethod()
{
    int i = 42;
    var class2 = new <>c__DisplayClass1();
    class2.iCopy = i;
    Action action = new Action(class2.<ContainingMethod>b__0);
}
private sealed class <>c__DisplayClass1
{
    public int iCopy;
    public void <ContainingMethod>b__0()
    {
       Console.WriteLine(this.iCopy);
    }
}

With the developer explicitly specifying what to capture and how to capture it, the effects of the lambda capture are made much clearer for both the developer writing the code and someone else reading the code, improving the ability to catch errors in code reviews.

Alternate Approach

Instead of or in addition to support for lambda capture lists, we should consider adding support for placing attributes on lambdas. This would allow for a wide-range of features, but in particular would allow for static analysis tools and diagnostics to separately implement features like what lambda capture lists are trying to enable.

For example, if the "[]" support for specifying an empty capture list is unavailable but a developer was able to apply attributes to lambdas, they could create a "NoClosure" attribute and an associated Roslyn diagnostic that would flag cases where a lambda annotated with [NoClosure] actually captured something:

public static void ContainingMethod()
{
    int i = 42;
    Action a = [NoClosure]() => { // diagnostic error: lambda isn't allowed to capture
       Console.WriteLine(i);
    };
}
@giggio
Copy link

giggio commented Jan 28, 2015

I like it and I prefer the main approach over the alternate. It provides more control.

@ghost
Copy link

ghost commented Jan 28, 2015

Capturing all enclosed references by value may be common enough to warrant a shorthand syntax. This can be described using "skinny arrow" notation.

int value = 0;
Func<int> getValue = () -> value;
value++;
Console.WriteLine(getValue()); // Output is 0
Console.WriteLine(value); // Output is 1

@giggio
Copy link

giggio commented Jan 28, 2015

@Romoku I like it, but it would present 2 different syntaxes for the problem, and that might be confusing. One for copying, and another for which variables you would be closing upon.

@ghost
Copy link

ghost commented Jan 28, 2015

@giggio I hesitate to suggest making pass by value semantics the default for lambda expressions. However, considering there is code already to workaround the pass by reference semantics it may turn out to be a minor issue.

@ryancerium
Copy link

I'm inventing new keywords here, and ref may not work in this context since it should be legal to have a parameter by ref anyway:

int i = 42;
int j = 13;
Func<int> getValue1 = () => 42;
Func<int> getValue2 = (noscope) => 42;
// Func<int> getValue2 = (noscope) => i; // Error capture disallowed
Func<int> getValue3 = (ref i) => i++;
Func<int> getValue4 = (value i) => i--;
// Func<int> getValue4 = (value i) => i + j; // Error, j isn't captured

getValue1(); // 42
getValue2(); // 42
getValue3(); // 42 and increments global i
var is43 = i == 43; // true
getValue4(); // 42 and decrements getValue4.i
getValue4(); // 41 and decrements getValue4.i

@sharwell
Copy link
Member

You could always go the way of C++ and capture by value or &reference:

int x = 0;
int y = 0;
Action action = [x, &y]() =>
  {
    Console.WriteLine($"x={x}, y={y}");
    x = 1;
    y = 2;
    Console.WriteLine($"x={x}, y={y}");
  };
action();
Console.WriteLine($"x={x}, y={y}");

Outputs:

x=0, y=0
x=1, y=2
x=0, y=2

@ryancerium
Copy link

@sharwell

You could always go the way of C++ and capture by value or &reference:

The original discussion mentioned a strong distaste for the C++ syntax: "One problem is that it frankly looks horrible."

@sharwell
Copy link
Member

@ryancerium I was going to add a 😆 emoji, but then I figured for better or worse it actually is a valid option for consideration. Perhaps the merits of this approach could influence the final feature, even if it's not taken as-is.

@HaloFour
Copy link

I do kind of like the C++ capture list syntax although I agree that it doesn't quite jive with C#. Perhaps something like the following:

int i = 1;
int j = 2;
Action a = [i, ref j]() => {
    i = 3;
    j = 4;
};
a();
Debug.Assert(i == 1);
Debug.Assert(j == 4);

This doesn't really address the functionality in C++ that allows for the capture list to specify that the lambda can close any variables either by value or by reference via [=] and [&] respectively. Perhaps [*] or [ref *]? The latter is kind of redundant since that is the behavior given no capture list.

Of course if there ever is a consideration of adding support for attributes to the lambda method then using [] for the capture list might also be a problem.

@migueldeicaza
Copy link

Two bits of feedback:

= This =

It would be very useful to also have the ability to explicitly restrict capturing "this" (or alternatively, to require "this" to be specified).

This would help in scenarios where lambdas are used in managed libraries that interoperate with unmanaged libraries that can cause strong cycles to be kept in memory. We have struggled with those from the Linux/Gtk+ days to this day.

= Weak Captures =
Additionally, it would be nice if there was a way to annotate captured variables as being done with a WeakReference, automatically surfacing a captured variable as a WeakReference, also for preventing strong cycles.

@HaloFour
Copy link

Good point. If specifying a capture list then specifying this should be compulsory (except if capture lists support wildcards as mentioned in my previous post.)

Action a0 = () => {
    this.DoSomething(); // good, supported today
};

Action a1 = []() => {
    this.DoSomething(); // compiler error, with no capture list method treated as static
};

int i = 123;
Action a2 = [ref i]() => {
    this.DoSomething(); // compiler error, this not defined, doesn't even reference generated display class
};

Action a3 = [this]() => {
    this.DoSomething(); // good
};

Action a4 = [*]() => { // equiv to C++ [=]
    this.DoSomething(); // good
};

Action a5 = [ref *]() => { // equiv to C++ [&], equiv to no capture list
    this.DoSomething(); // good
};

Action a6 = [ref this]() => { // compiler error, this can't be ref
    this.DoSomething(); 
};

@stephentoub
Copy link
Member Author

@migueldeicaza, thanks!

Regarding 'this', without introducing new syntax for lambdas, I don't know that much could be done to restrict the use of 'this' by default, as that'd be a breaking change; however, with this proposal, once you specify a capture list, you would have to include 'this' in it if you wanted to be able to access it.

Regarding capturing by weak reference, with the proposal as written you could do something like:

object o = ...;
Action a = [var wro = new WeakReference(o)]() => {
    ... // can use wro here, but not o
    object local;
    if (wro.TryGetTarget(out local)) { ... }
};

If the language itself had a keyword "weak" or something that could be used on variable declarations of reference types, then that would naturally translate to this proposal as well, e.g.

object o = ...;
Action a = [weak o]() => {
    object local = o;
    if (o != null) { ... }
};

@mikedn
Copy link

mikedn commented Jan 28, 2015

+1 for the C++ syntax, I don't see anything wrong with it except that ref should be used instead of & as HaloFour already pointed out.

@tomasr
Copy link

tomasr commented Jan 29, 2015

I don't mind the C++ syntax, though, honestly, I'm not sure I like the proposed syntax for By-Value Capture. I can't quite say what, but just doesn't seem very right to me (probably because it's way too verbose for more than a single variable).

I think the option of allowing attributes on lambdas is intriguing, but I don't see it necessarily as a replacement or alternative for explicit captures (as you mentioned, it could be used for other things). Based on that, I wonder if perhaps choosing a syntax for explicit captures that doesn't clash with the regular attribute syntax wouldn't be a better option, so that, if desired, lambda attributes could be introduced later on without causing conflicts (or making parsing uglier).

Thought experiment: Could the syntax for generics (covariance/contravariance) be reused to accomplish the same thing, using in/ref as hints for by-value/by-ref capture? Something like:

public static void ContainingMethod()
{
    int i = 42;
    int j = 50;
    Action a = <in i, ref j>() =>  // i captured by-value, j captured by-ref
    {
        Console.WriteLine(i);
        Console.WriteLine(j);
    };
}

@ryancerium
Copy link

Action a = [var wro = new WeakReference(o)]() => {

@stephentoub That syntax has the strong advantage of looking like default parameters. I like it. You could probably elide the var if you wanted to closer match the current lambda parameter declarations.

@ryancerium
Copy link

Tossing out wild syntax ideas this morning. Yes, I realize that b and c are unused in the lambda, I added them for idea-flinging.

// Doesn't capture anything from the enclosing scope
[](a) => a.Frob;
<>(a) => a.Frob;
{}(a) => a.Frob;

(a)[] => a.Frob;
(a)<> => a.Frob;
(a){} => a.Frob;

(a, noscope) => a.Frob;

// Captures a bunch of stuff from the enclosing scope in a variety of ways
[wro = new WeakReference(o), ref b, value c](a) => a.frob(wro); // C++ style
<wro = new WeakReference(o), ref b, value c>(a) => a.frob(wro); 
{wro = new WeakReference(o), ref b, value c}(a) => a.frob(wro); // Looks like an anonymous object with a delegate(?) call

(a)[wro = new WeakReference(o), ref b, value c] => a.frob(wro);
(a)<wro = new WeakReference(o), ref b, value c> => a.frob(wro); // Looks like a generic
(a){wro = new WeakReference(o), ref b, value c} => a.frob(wro); // Looks like an anonymous type

(a, wro = new WeakReference(o), ref b, value c) => a.frob(wro); // Looks like a lambda function

@MgSam
Copy link

MgSam commented Feb 4, 2015

I don't like the idea of lambda capture lists or adding a value-capture syntax at all. There is a good reason C# didn't have this to begin with- it adds a lot of complexity without adding a lot of benefit.

Developers have been using implicit capture in C# lambdas successfully for many years now- I see absolutely no reason to suddenly add a bunch of ugly syntax into the language to make it explicit. Tools like ReSharper already warn you when you're doing an implicit capture- presumably a Roslyn analyzer will be able to do the same thing to bring this capability to everyone.

If there's really a demand, I could live with an optional attribute but even then it seems like language design work for a largely unnecessary feature. If you want to ensure there are no captures, make a separate method, the same as you have to do today.

Right now lambdas are simple, easy to use, and beautiful. Please don't turn this language into C++. If it ain't broke, don't fix it.

@mikedn
Copy link

mikedn commented Feb 4, 2015

Developers have been using implicit capture in C# lambdas successfully for many years now

It seems that you forgot the foreach scope hack that was added because people just couldn't get how things work or just wished they work differently. And the problem still persists for for loops and can't be solved by hacking the for loop variable scope. Value capture solves this cleanly.

Please don't turn this language into C++.

That's a dubious request. C++ may have its (perceived) shortcomings but it also did a lot of things right and this is one of them. Copying a feature from C++ doesn't mean that C# suddenly transforms in C++.

@axel-habermaier
Copy link
Contributor

@MgSam, @mikedn: Also, this feature would be completely optional for backwards compatibility reasons. If you don't like it, don't use it. I for one would find it useful, occasionally.

@HaloFour
Copy link

HaloFour commented Feb 4, 2015

This was one of those features that when I saw the lambda specs for C++ I thought, "damn, that's a really nice idea." But I do agree that most of the time what the compiler does today (even with the foreach scope trick) is generally what people would intuitively expect so allowing it to remain optional (which it would have to be in order to not break existing code) is also a good thing.

As for the syntax being tossed around, most of it seems fine but I don't know how I feel about the inline copy-variable declarations. That just seems really verbose to me and could cause someone to lose track of exactly what they're reading since it feels like normal variable declarations. Does that buy anything aside from scoping the declared name to the lambda?

@giggio
Copy link

giggio commented Feb 7, 2015

I agree with it being optional, and I don't think they have an option here, they have to make it optional so they don't break everyone's code. I like the idea of having more control and this is a very welcome feature.

@alrz
Copy link
Member

alrz commented Mar 18, 2016

I'd like to suggest using lambda capture list syntax to omit parameter list, repeating my comment from #9834:

// currently
Action      action = delegate() {};
Action<int> action = delegate(int arg) {};

Action      action = delegate {};
Action<int> action = delegate {};

// capture lists
Action      action = []() => {};
Action<int> action = [](arg) => {};

// omitted
Action      action = [] => {};
Action<int> action = [] => {};

void F(Action      action) {}
void F(Action<int> action) {}
void G(Action<int> action) {}

F(delegate {}); // ERROR
F([] => {});    // ERROR

G(delegate {}); // OK
G([] => {});    // OK

@fsacer
Copy link
Contributor

fsacer commented May 4, 2016

I would really like this feature, because it allows to limit scope of a function which is hard to achieve otherwise in C#. I would like to know if this is planned for any release of C#.

@alrz
Copy link
Member

alrz commented Jan 19, 2017

Can this be applied to local functions?

@iam3yal
Copy link

iam3yal commented Jan 19, 2017

@alrz I think that it would be better to write a proposal so people could discuss it in the context of local functions.

@gafter
Copy link
Member

gafter commented Mar 17, 2017

The LDM discussed this yesterday and we do not think we would take this on as a championed proposal. We might do something more narrowly focused to distinguish lambdas that capture nothing, or only this.

@gafter gafter closed this as completed Mar 17, 2017
@gafter gafter added Resolution-Won't Fix A real bug, but Triage feels that the issue is not impactful enough to spend time on. and removed 2 - Ready labels Mar 17, 2017
@iam3yal
Copy link

iam3yal commented Mar 18, 2017

@gafter Can you guys share the reasons for this at some point? is it going be to be dcoumented as part of the LDM notes?

@Artfunkel
Copy link

I would also like to know why this idea was dismissed. It's not just an academic language consideration: the lack of explicit capture in C# creates real-world bugs that the suggested alternatives cannot fix. Consider this program, a distilled example of a problem that my company faced yesterday:

class Program
{
	static void Main(string[] args)
	{
		MyMethod(new byte[1024 * 1024 * 1024], string.Empty);

		GC.Collect();
		System.Diagnostics.Debugger.Break();
	}

	static List<Action> m_deferredActions = new List<Action>();

	static void WorkWithLargeObject(object largeObject, int iteration) { }
	static void WorkWithSmallObject(object smallObject) { }

	static void MyMethod(object largeObject, object smallObject)
	{
		Parallel.For(0, 10, (i) => WorkWithLargeObject(largeObject, i));

		m_deferredActions.Add(() => WorkWithSmallObject(smallObject));
	}
}

Did you spot the bug? As the precautionary GC.Collect might suggest, the 1GB array isn't freed. This makes no sense until you disassemble the compiler-generated display classes and discover that both lambdas are merged into the same display class. This class holds a reference to both largeObject and smallObject, and thus our lightweight and long-lived deferred action owns an enormous chunk of memory that it never accesses. In the original code the two lambdas were much further apart and even in different scopes, making the problem far harder to spot.

The fix that I applied to our codebase was creating a local copy of largeObject just above the creation of the first lambda, causing its capture to be moved into the display class for just that scope. (In the example above I'd need to add a dummy scope around the copy and lambda to replicate this.) The other option is to explicitly create my own display class equivalent. Both require writing copious comments to prevent someone from simplifying the code and re-introducing the bug later, and neither meet the goals of C#'s lambda support. The fact that I had to use a third-party disassembly tool to work out what was happening is pretty poor too.

"Lambdas that capture nothing, or only this" won't help here. De-optimising the way in which the compiler generates display classes would, at the cost of runtime performance across the board. The best solution is allowing C++-style explicit lambda capture as suggested in this proposal.

@uecasm
Copy link

uecasm commented Feb 25, 2022

I recently hit a similar problem, where a lambda that was queued off was retaining a huge XmlDocument unexpectedly, because a different lambda in the same method (that was executed immediately without queuing, so was not itself performance-incorrect) had one programmer be a little lazy and perform an element query inside the lambda instead of outside, thus capturing the entire document instead of a single integer value.

While this was in a .NET Framework app, and so no language improvement is likely to fix this for me, it does seem like something the language should definitely be able to do (and preferably, with a compiler/analyzer setting to forbid or warn on implicit captures in a project).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Language Design Feature Request Feature Specification Language-C# Resolution-Won't Fix A real bug, but Triage feels that the issue is not impactful enough to spend time on.
Projects
None yet
Development

No branches or pull requests