Proposal for non-nullable references (and safe nullable references) #227

Closed
Neil65 opened this issue Feb 4, 2015 · 112 comments

Comments

@Neil65

Neil65 commented Feb 4, 2015

#1. Overview

This is my concept for non-nullable references (and safe nullable references) in C#. I have tried to keep my points brief and clear so I hope you will be interested in having a look through my proposal.

I will begin with an extract from the C# Design Meeting Notes for Jan 21, 2015 (#98):

There's a long-standing request for non-nullable reference types, where the type system helps you ensure that a value can't be null, and therefore is safe to access. Importantly such a feature might go along well with proper safe nullable reference types, where you simply cannot access the members until you've checked for null.

This is my proposal for how this could be designed. The types of references in the language would be:

  • General References (Dog) - the traditional references we have always had.
  • Mandatory References (Dog!)
  • Nullable References (Dog?)

Important points about this proposal:

  1. There are no language syntax changes other than the addition of the '!' and '?' syntax when declaring (or casting) references.
  2. Null reference exceptions are impossible if the new style references are used throughout the code.
  3. There are no changes to the actual code compilation, by which I mean we are only adding compiler checks - we are not changing anything about the way that the compiled code is generated. The compiled IL code will be identical whether traditional (general) references or the new types of references are used.
  4. It follows from this last point that the runtime will not need to know anything about the new types of references. Once the code is compiled, references are references.
  5. All existing code will continue to compile, and the new types of references can interact reasonably easily with existing code.
  6. The '!' and '?' can be added to existing code and, if that existing code is 'null safe' already, the code will probably just compile and work as it is. If there are compiler errors, these will indicate where the code is not 'null safe' (or possibly where the 'null safe-ness' of the code is expressed in a way that is too obscure). The compiler errors will be able to be fixed using the same 'plain old C#' constructs that we have always used to enforce 'null safe-ness'.
    Conversely, code will continue to behave identically if the '!' and '?' are removed (but the code will not be protected against any future code changes that are not 'null safe').
  7. No doubt there are ideas in here that have been said by others, but I haven't seen this exact concept anywhere. However if I have reproduced someone else's concept it was not intentional! (Edit: I now realise that I have unintentionally stolen the core concept from Kotlin - see http://kotlinlang.org/docs/reference/null-safety.html).

The Design Meeting Notes cite a blog post by Eric Lippert (http://blog.coverity.com/2013/11/20/c-non-nullable-reference-types/#.VM_yZmiUe2E) which points out some of the thorny issues that arise when considering non-nullable reference types. I respond to some of his points in this post.

Here is the Dog class that is used in the examples:

public class Dog
{
    public string Name { get; private set; }

    public Dog(string name)
    {
        Name = name;
    }

    public void Bark()
    {
    }
}

#2. Background

I will add a bit of context that will hopefully make the intention of the idea clearer.

I have thought about this topic on and off over the years and my thinking has been along the lines of this type of construct (with a new 'check' keyword):

Dog? nullableDog = new Dog("Nullable");

nullableDog.Bark(); // Compiler Error - cannot dereference nullable reference (yet).

check (nullableDog)
{
    // This code branch is executed if the reference is non-null. The compiler will allow methods to be called and properties to be accessed.
    nullableDog.Bark(); // OK.
}
else
{
    nullableDog.Bark(); // Compiler Error - we know the reference is null in this context.
}

The 'check' keyword does two things:

  1. It checks whether the reference is null and then switches the control flow just like an 'if' statement.
  2. It signals to the compiler to apply certain rules within the code blocks that follow it (most importantly, rules about whether or not nullable references can be dereferenced).

It then occurred to me that since it is easy to achieve the first objective using the existing C# language, why invent a new syntax and/or keyword just for the sake of the second objective? We can achieve the second objective by teaching the compiler to apply its rules wherever it detects this common construct:

if (nullableDog != null)

Furthermore it occurred to me that we could extend the idea by teaching the compiler to detect other simple ways of doing null checks that already exist in the language, such as the ternary (?:) operator.

This line of thinking is developed in the explanation below.
#3. Mandatory References

As the name suggests, mandatory references can never be null:

Dog! mandatoryDog = null; // Compiler Error.

However the good thing about mandatory references is that the compiler lets us dereference them (i.e. use their methods and properties) any time we want, because it knows at compile time that a null reference exception is impossible:

Dog! mandatoryDog = new Dog("Mandatory");
mandatoryDog.Bark(); // OK - can call method on mandatory reference.
string name = mandatoryDog.Name; // OK - can access property on mandatory reference.

(See my additional post for more details.)
#4. Nullable References

As the name suggests, nullable references can be null:

Dog? nullableDog = null; // OK.

However the compiler will not allow us (except in circumstances described later) to dereference nullable references, as it can't guarantee that the reference won't be null at runtime:

Dog? nullableDog = new Dog("Nullable");
nullableDog.Bark(); // Compiler Error - cannot call method on nullable reference.
string name = nullableDog.Name; // Compiler Error - cannot access property on nullable reference

This may make nullable references sound pretty useless, but there are further details to follow.
#5. General References

General references are the references that C# has always had. Nothing is changed about them.

Dog generalDog1 = null; // OK.
Dog generalDog2 = new Dog("General"); // OK.

generalDog2.Bark(); // OK at compile time, fingers crossed at runtime.

#6. Using Nullable References

So if you can't call methods or access properties on a nullable reference, what's the use of them?

Well, if you do the appropriate null reference check (I mean just an ordinary null reference check using traditional C# syntax), the compiler will detect that the reference can be safely used, and the nullable reference will then behave (within the scope of the check) as if it were a mandatory reference.

In the example below the compiler detects the null check and this affects the way that the nullable reference can be used within the 'if' block and 'else' block:

Dog? nullableDog = new Dog("Nullable");

nullableDog.Bark(); // Compiler Error - cannot dereference nullable reference (yet).

if (nullableDog != null)
{
    // The compiler knows that the reference cannot be null within this scope.
    nullableDog.Bark(); // OK - the reference behaves like a mandatory reference.
}
else
{
    // The compiler knows that the reference is null within this scope.
    nullableDog.Bark(); // Compiler Error - the reference still behaves as a nullable reference.
}

The compiler will also recognise this sort of null check:

if (nullableDog == null)
{
    return;
}

// The compiler knows that if the reference was null, this code would never be reached.
nullableDog.Bark(); // OK - reference behaves like a mandatory reference.

And this:

if (nullableDog == null)
{
    throw new Exception("Where is my dog?");
}

// The compiler knows that if the reference was null, this code would never be reached.
nullableDog.Bark(); // OK - reference behaves like a mandatory reference.

The compiler will also recognise when you do the null check using other language features:

string name1 = (nullableDog != null ? nullableDog.Name : null); // OK
string name2 = nullableDog?.Name; // OK

Hopefully it is now clear that if the new style references are used throughout the code, null reference exceptions are actually impossible. However once the effort has been made to convert the code to the new style references, it is important to guard against the accidental use of general references, as this compromises null safety. There needs to be an attribute such as this to tell the compiler to prevent the use of general references:

[assembly: AllowGeneralReferences(false)] // Defaults to true

This attribute could also be applied at the class level, so you could for example forbid general references for the assembly but then allow them for a class (if the class has not yet been converted to use the new style references):

[AllowGeneralReferences(true)]
public class MyClass
{
}
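
A minimal sketch of how such an attribute type might be declared (the proposal does not spell out the class itself, so the shape below is an assumption):

[AttributeUsage(AttributeTargets.Assembly | AttributeTargets.Class)]
public sealed class AllowGeneralReferencesAttribute : Attribute
{
    // True if traditional (general) references may be used in the annotated scope.
    public bool Allowed { get; private set; }

    public AllowGeneralReferencesAttribute(bool allowed)
    {
        Allowed = allowed;
    }
}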

(See my additional post for more details.)
#7. Can we develop a reasonable list of null check patterns that the compiler can recognise?

I have not listed every possible way that a developer could do a null check; there are any number of complex and obscure ways of doing it. The compiler can't be expected to handle cases like this:

if (MyMethodForCheckingNonNull(nullableDog))
{
}

However the fact that the compiler will not handle every case is a feature, not a bug. We don't want the compiler to detect every obscure type of null check construct. We want it to detect a finite list of null checking patterns that reflect clear coding practices and appropriate use of the C# language. If the programmer steps outside this list, it will be very clear to them because the compiler will not let them dereference their nullable references, and the compiler will in effect be telling them to express their intention more simply and clearly in their code.

So is it possible to develop a reasonable list of null checking constructs that the compiler can enforce? Characteristics of such a list would be:

  1. It must be possible for compiler writers to implement.
  2. It must be intuitive, i.e. a reasonable programmer should never have to even think about the list, because any sensible code will 'just work'.
  3. It must not seem arbitrary, i.e. there must not be situations where a certain null check construct is detected and another that seems just as reasonable is not detected.

I think the list of null check patterns in the previous section, combined with some variations that I am going to put in a more advanced post, is an appropriate and intuitive list. But I am interested to hear what others have to say.

Am I expecting compiler writers to perform impossible magic here? I hope not - I think that the patterns here are reasonably clear, and the logic is hopefully of the same order of difficulty as the logic in existing compiler warnings and in code checking tools such as ReSharper.
#8. Converting Between Mandatory, Nullable and General References

The principles presented so far lead on to rules about conversions between the three types of references. You don't have to take in every detail of this section to get the general idea of what I'm saying - just skim over it if you want.

Let's define some references to use in the examples that follow.

Dog! myMandatoryDog = new Dog("Mandatory");
Dog? myNullableDog = new Dog("Nullable");
Dog myGeneralDog = new Dog("General");

Firstly, any reference can be assigned to another reference if it is the same type of reference:

Dog! yourMandatoryDog = myMandatoryDog; // OK.
Dog? yourNullableDog = myNullableDog; // OK.
Dog yourGeneralDog = myGeneralDog; // OK.

Here are all the other possible conversions. Note that when I talk about 'intent' I mean the idea that a traditional (general) reference is conceptually either mandatory or nullable at any given point in the code. This intent is explicit and self-documenting in the new style references, but it still exists implicitly in general references (e.g. "I know this reference can't be null because I wrote a null check", or "I know that this reference can't or at least shouldn't be null from my knowledge of the business domain").

Dog! mandatoryDog1 = myNullableDog; // Compiler Error - the nullable reference may be null.
Dog! mandatoryDog2 = myGeneralDog; // Compiler Error - the general reference may be null.
Dog? nullableDog1 = myMandatoryDog; // OK.
Dog? nullableDog2 = myGeneralDog; // Compiler Error - makes an assumption about the intent of the general reference (maybe it is conceptually mandatory, rather than conceptually nullable as assumed here).
Dog generalDog1 = myMandatoryDog; // Compiler Error - loses information about the intent of the mandatory reference (the general reference may be conceptually mandatory, or may be conceptually nullable if the intent is that it could later be made null).
Dog generalDog2 = myNullableDog; // Compiler Error - loses the safety of the nullable reference.

There has to be some compromise in the last three cases as our code has to interact with existing code that uses general references. These three cases are allowed if an explicit cast is used to make the compromise visible (and perhaps there should also be a compiler warning).

Dog? nullableDog2 = (Dog?)myGeneralDog; // OK (perhaps with compiler warning).
Dog generalDog1 = (Dog)myMandatoryDog; // OK (perhaps with compiler warning).
Dog generalDog2 = (Dog)myNullableDog; // OK (perhaps with compiler warning).

Some of the conversions that were not possible by direct assignment can be achieved slightly less directly using existing language features:

Dog! mandatoryDog1 = myNullableDog ?? new Dog("Mandatory"); // OK.
Dog! mandatoryDog2 = (myNullableDog != null ? myNullableDog : new Dog("Mandatory")); // OK.

Dog! mandatoryDog3 = (Dog!)myGeneralDog ?? new Dog("Mandatory"); // OK, but requires a cast to indicate that we are making an assumption about the intent of the general reference.
Dog! mandatoryDog4 = (myGeneralDog != null ? (Dog!)myGeneralDog : new Dog("Mandatory")); // OK, but requires a cast for the same reason as above.

#9. Class Libraries

As mentioned previously, the compiled IL code will be the same whether you use the new style references or not. If you compile an assembly, the resulting binary will not know what type of references were used in its source code.

This is fine for executables, but in the case of a class library, where the goal is obviously re-use, the compiler will need a way of knowing the types of references used in the public method and public property signatures of the library.

I don't know much about the internal structure of DLLs, but maybe there could be some metadata embedded in the class library which provides this information.

Or even better, maybe reflection could be used - an enum property indicating the type of reference could be added to the ParameterInfo class. Note that the reflection would be used by the compiler to get the information it needs to do its checks - there would be no reflection imposed at runtime. At runtime everything would be exactly the same as if traditional (general) references were used.
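
A sketch of what that reflection surface might look like (the enum and the property are hypothetical - they are not an existing .NET API):

public enum ReferenceKind
{
    General,
    Mandatory,
    Nullable
}

// Imagined addition to System.Reflection.ParameterInfo, read only by the compiler:
// public ReferenceKind ReferenceKind { get; }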

Now say we have an assembly that has not yet been converted to use the new style references, but which needs to use a library that does use the new style references. There needs to be a way of turning off the mechanism described above so that the library appears as a traditional library with only general references. This could be achieved with an attribute like this:

[assembly: IgnoreNewStyleReferences("SomeThirdPartyLibrary")]

Perhaps this attribute could also be applied at a class level. The class could remain completely unchanged except for the addition of the attribute, but still be able to make use of a library which uses the new style references.

(See my additional post for more details.)
#10. Constructors

Eric Lippert's post (see reference in the introduction to this post) also raises thorny issues about constructors. Eric points out that "the type system absolutely guarantees that ...[class] fields always contain a valid string reference or null".

A simple (but compromised) way of addressing this may be for mandatory references to behave like nullable references within the scope of a constructor. It is the programmer's responsibility to ensure safety within the constructor, as has always been the case. This is a significant compromise but may be worth it if the thorny constructor issues would otherwise kill off the idea of the new style references altogether.

It could be argued that there is a similar compromise for readonly fields which can be set multiple times in a constructor.

A better option would be to prevent any access to the mandatory field (and to the 'this' reference, which can be used to access it) until the field is initialised:

public class Car
{
    public Engine! Engine { get; private set; }

    public Car(Engine! engine)
    {
        Engine.Start(); // Compiler Error
        CarInitializer.Initialize(this); // Compiler Error - the 'this' reference could be used to access Engine methods and properties
        Engine = engine;
        // Can now use Engine and 'this' at will
    }
}

Note that it is not an issue if this forces adjustment of existing code - the programmer has chosen to introduce the new style references and thus will inevitably be adjusting the code in various ways as described earlier in this post.

And what if the programmer initializes the property in some way that still makes everything safe but is a bit more obscure and thus more difficult for the compiler to recognise? Well, the general philosophy of this entire proposal is that the compiler recognises a finite list of sensible constructs, and if you step outside of these you will get a compiler error and you will have to make your code simpler and clearer.
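
For example, initialisation hidden behind a private method is actually safe but would presumably fall outside the recognised list (a sketch; SetEngine is a made-up helper):

public class Car
{
    public Engine! Engine { get; private set; }

    public Car(Engine! engine)
    {
        SetEngine(engine); // Compiler Error - the compiler does not look inside SetEngine
                           // to see that it assigns the Engine property.
    }

    private void SetEngine(Engine! engine)
    {
        Engine = engine;
    }
}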
#11. Generics

Using mandatory and nullable references in generics seems to be generally ok if we are prepared to have a class constraint on the generic class:

class GenericClass<T>
    where T : class // Need class constraint to use mandatory and nullable references
{
    public void TestMethod(T? nullableRef)
    {
        T! mandatoryRef = null; // Compiler Error - mandatory reference cannot be null
        string s = nullableRef.ToString(); // Compiler Error - cannot dereference nullable reference
    }
}

However there is more to think about with generics - see comments below.
#12. Var

This is the way that I think var would work:

var dog1 = new Dog("Sam"); // var is Dog! (the compiler will keep things as 'tight' as possible unless we tell it otherwise).
var! dog2 = new Dog("Sam"); // var is Dog!
var? dog3 = new Dog("Sam"); // var is Dog?
var dog4 = (Dog)new Dog("Sam"); // var is Dog (see conversion rules - needs cast)

var dog1 = MethodReturningMandatoryRef(); // var is Dog!
var! dog2 = MethodReturningMandatoryRef(); // var is Dog!
var? dog3 = MethodReturningMandatoryRef(); // var is Dog? (see conversion rules)
var dog4 = (Dog)MethodReturningMandatoryRef(); // var is Dog (see conversion rules - needs cast)

var dog1 = MethodReturningNullableRef(); // var is Dog?
var! dog2 = MethodReturningNullableRef(); // Compiler Error (see conversion rules)
var? dog3 = MethodReturningNullableRef(); // var is Dog?
var dog4 = (Dog)MethodReturningNullableRef(); // var is Dog (see conversion rules - needs cast)

var dog1 = MethodReturningGeneralRef(); // var is Dog
var! dog2 = MethodReturningGeneralRef(); // Compiler Error (see conversion rules)
var? dog3 = (Dog)MethodReturningGeneralRef(); // var is Dog? (see conversion rules - needs cast)

The first case in each group would be clearer if we had a suffix to indicate a general reference (say #), rather than having no suffix due to the need for backwards compatibility. This would make it clear that 'var#' would be a general reference whereas 'var' can be mandatory, nullable or general depending on the context.
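
Under that hypothetical suffix, the first case of each group could state its intent explicitly, e.g.:

var# dog5 = (Dog)new Dog("Sam"); // var# is unambiguously a general reference (the cast is presumably still needed under the conversion rules).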
#13. More Cases

In the process of thinking through this idea as thoroughly as possible, I have come up with some other cases that are mostly variations on what is presented above, and which would just have cluttered up this post if I had put them all in. I'll put these in a separate post in case anyone is keen enough to read them.

@Neil65
Author

Neil65 commented Feb 4, 2015

Follow-on Post

1. Introduction

This follows on from my previous post, which contained the main body of the proposal. This post lists some other cases which are mostly variations on what is presented in the original post, and which would just have cluttered up the original post if I had put them all in.
Section numbering is not contiguous because it matches the numbering for the equivalent topics in the original post.

3. Mandatory References

Should an uninitialised mandatory reference trigger an error? No, because there are situations where you need more complex initialisation. But the reference can't be used until it is initialised.

Dog! mandatoryDog; // OK, but the compiler is keeping a close eye on you. It wants the variable initialised asap.

mandatoryDog.Bark(); // Compiler Error - you can't do anything with the reference until it is initialised.
Dog! anotherMandatoryDog = mandatoryDog; // Compiler Error - you can't do anything with the reference until it is initialised.

// There is some complexity in how the variable is initialised (which is why it wasn't initialised when it was declared).
if (getNameFromFile)
{
    using (var stream = new StreamReader("DogName.txt"))
    {
        string name = stream.ReadLine();
        mandatoryDog = new Dog(name);
    }
}
else
{
    mandatoryDog = new Dog("Mandatory");
}

mandatoryDog.Bark(); // OK - compiler knows that the reference has definitely been initialised

See also the Constructors section of my original post which attempts to address similar issues in the context of constructors.

6. Using Nullable References

The original post showed how to use an 'if' / 'else' statement to apply a null check to a nullable reference so that the compiler would let us use that reference inside the 'if' block. Note that when you are in the 'else' block, there is no point actually using the nullable reference because you know it is null in this context. You might as well just use the constant 'null' as this is clearer. I would like to see this as a compiler error:

if (nullableDog != null)
{
    // Can do stuff here with nullableDog
}
else
{
    Dog? myNullableDog1 = nullableDog; // Compiler Error - it is pointless and misleading to use the variable when it is definitely null.
    Dog? myNullableDog2 = null; // OK - achieves the same thing but is clearer.
}

Note that even though the same reasoning applies to traditional (general) references, we can't enforce this rule or we would break existing code:

if (generalDog != null)
{
    // Can do stuff here with generalDog (actually we can do stuff anywhere because it is a general reference).
}
else
{
    Dog myGeneralDog1 = generalDog; // OK - otherwise would break existing code.
}

Now, here are some common variations on the use of the 'if' / 'else' statement that the compiler recognises:

Firstly, you do not have to have the 'else' block if you don't need to handle the null case:

if (nullableDog != null)
{
    nullableDog.Bark(); // OK - the reference behaves like a mandatory reference.
}

Also you can check for null rather than non-null:

if (nullableDog == null)
{
    nullableDog.Bark(); // Compiler Error - the reference still behaves as a nullable reference.
}
else
{
    nullableDog.Bark(); // OK - the reference behaves like a mandatory reference.
}

You can also have 'else if' blocks in which case the reference behaves the same in each 'else if' block as it would in a plain 'else' block:

if (nullableDog != null)
{
    nullableDog.Bark(); // OK - the reference behaves like a mandatory reference.
}
else if (someOtherCondition)
{
    nullableDog.Bark(); // Compiler Error - the reference still behaves as a nullable reference.
}
else
{
    nullableDog.Bark(); // Compiler Error - the reference still behaves as a nullable reference.
}

You can also have 'else if' with a check for null rather than a check for non-null:

if (nullableDog == null)
{
    nullableDog.Bark(); // Compiler Error - the reference still behaves as a nullable reference.
}
else if (someOtherCondition)
{
    nullableDog.Bark(); // OK - the reference behaves like a mandatory reference.
}
else
{
    nullableDog.Bark(); // OK - the reference behaves like a mandatory reference.
}

You can also have additional conditions in the 'if' statement ('AND' or 'OR'):

if (nullableDog != null && thereIsSomethingToBarkAt)
{
    nullableDog.Bark(); // OK - the reference behaves like a mandatory reference in this scope.
}
else
{
    nullableDog.Bark(); // Compiler Error - reference still behaves as a nullable reference in this scope (we don't know whether it is null or not, as we don't know which condition made us reach here).
    Dog? myNullableDog = nullableDog; // OK - unlike the example at the top of this section, it does make sense to use the nullableDog reference here because it could be non-null.
}
if (nullableDog != null || someOtherCondition)
{
    nullableDog.Bark(); // Compiler Error - reference still behaves as a nullable reference in this scope (we don't know whether it is null or not, as we don't know which condition made us reach here).
    Dog? myNullableDog = nullableDog; // OK - unlike the example at the top of this section, it does make sense to use the nullableDog reference here because it could be non-null.
}
else
{
    nullableDog.Bark(); // Compiler Error - reference still behaves as a nullable reference in this scope (in fact we know for certain it is null).
    Dog? myNullableDog = nullableDog; // Compiler Error - as in the example at the top of this section, it doesn't make sense to use the nullableDog reference here because we know it is null.
}

You can also have multiple checks in the same 'if' statement:

if (nullableDog1 != null && nullableDog2 != null)
{
    nullableDog1.Bark(); // OK - the reference behaves like a mandatory reference in this scope.
    nullableDog2.Bark(); // OK - the reference behaves like a mandatory reference in this scope.
}

Note that when you are in the context of a null check, you can do anything with your nullable reference that you would be able to do with a mandatory reference (not only accessing methods and properties, but anything else that a mandatory reference can do):

if (nullableDog != null)
{
    nullableDog.Bark(); // OK - the reference behaves like a mandatory reference.
    Dog! mandatoryDog = nullableDog; // OK - the reference behaves like a mandatory reference.
}

On a slightly different note - we have established that we can use the following language features to allow a nullable reference to be dereferenced:

string name1 = (nullableDog != null ? nullableDog.Name : null); // OK
string name2 = nullableDog?.Name; // OK

But it is pointless to apply these constructs to a mandatory reference, so the following will generate compiler errors:

string name3 = (mandatoryDog != null ? mandatoryDog.Name : null); // Compiler Error - it is a mandatory reference so it can't be null.
string name4 = mandatoryDog?.Name; // Compiler Error - it is a mandatory reference so it can't be null.

In fact a mandatory reference cannot be compared to null in any circumstances.
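
So, presumably, even a bare comparison would be rejected:

if (mandatoryDog == null) // Compiler Error - a mandatory reference can never be null, so the check is meaningless.
{
}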

9. Class Libraries

What if you have an existing assembly, compiled with an older version of the C# compiler, and you want it to use a class library which has new style references? There should be no issue here as the older compiler will not look at the new property on ParameterInfo (because it doesn't even know that the new property exists), and in a state of blissful ignorance will treat the library as if it only had traditional (general) references.

On another note, in order to facilitate rapid adoption of the new style references an attribute like this could be introduced:

[assembly: IgnoreNewStyleReferencesInternally]

This would mean that the ParameterInfo properties would be generated, but the new style references would be ignored internally within the library. That way, library writers could get a version of their library with the new style references to market more rapidly. The code within the library would of course not be null reference safe, but would be no less safe than it already was. They could then make their library null safe internally for a later release.


@Miista

Miista commented Feb 4, 2015

I'm all for this. However, in example 3 you declare a mandatory reference but then do not initialise it. Wouldn't it be better to require a mandatory reference to be initialised the moment it is declared, kind of like the way Kotlin does it?

@Neil65
Author

Neil65 commented Feb 4, 2015

Hi Miista, I hadn't heard of Kotlin before, but having now read its documentation at http://kotlinlang.org/docs/reference/null-safety.html, I realise that I have (unintentionally) pretty much stolen its null safety paradigm :-)

Regarding initialisation, I have tried to allow programmers a bit of flexibility to do the sort of initialisation that cannot be done on a single line. It would be possible to be stricter and say that if they do want to do this they have to wrap their initialisation code in a method:

Dog! mandatoryDog = MyInitialisationMethod(); // The method does all the complex stuff and returns a mandatory reference
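
For instance, the file-reading example from my follow-on post could be wrapped like this (a sketch under the proposed rules):

Dog! MyInitialisationMethod()
{
    if (getNameFromFile)
    {
        using (var stream = new StreamReader("DogName.txt"))
        {
            return new Dog(stream.ReadLine()); // every path returns a mandatory reference
        }
    }
    return new Dog("Mandatory");
}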

This may be seen as being too dictatorial about coding style - but it's something worthy of discussion.

@Neil65
Author

Neil65 commented Feb 4, 2015

Having read the article by Craig Gidney (http://twistedoakstudios.com/blog/Post330_non-nullable-types-vs-c-fixing-the-billion-dollar-mistake), I now realise that I was on the wrong track saying that "the different types of references are not different 'types' in the way that int and int? are different types". I have amended my original post to remove this statement and I also rewrote the section on 'var' due to this realisation.

@Neil65
Author

Neil65 commented Feb 5, 2015

By the way you can vote for my proposal on UserVoice if you want: https://visualstudio.uservoice.com/forums/121579-visual-studio/suggestions/7049944-consider-my-detailed-proposal-for-non-nullable-ref

As well as voting for my specific proposal you can also vote for the general idea of adding non-nullable references (this has quite a lot of votes): https://visualstudio.uservoice.com/forums/121579-visual-studio/suggestions/2320188-add-non-nullable-reference-types-in-c

@Neil65
Author

Neil65 commented Feb 6, 2015

Craig Gidney's article (mentioned above) raises the very valid question - what is the compiler meant to do when asked to create an array of mandatory references?

var nullableDogs = new Dog?[10]; // OK.
var mandatoryDogs = new Dog![10]; // Not OK - what does the compiler initially fill the array with?

He explains: "The fundamental problem here is an assumption deeply ingrained into C#: the assumption that every type has a default value".

This problem can be dealt with using the same principle that has been used previously in this proposal - teaching the compiler to detect a finite list of clear and intuitive 'null-safe' code structures, and having the compiler generate a compiler error if the programmer steps outside that list.

So what would the list look like in this situation?

Obviously the compiler will be happy if the array is declared and populated on the same line (as long as no elements are set to null):

Dog![] dogs1 = { new Dog("Ben"), new Dog("Sophie"), new Dog("Rex") }; // OK.
Dog![] dogs2 = { new Dog("Ben"), null, null }; // Compiler Error - nulls not allowed.

The following syntax variations are also ok:

var dogs3 = new Dog![] { new Dog("Ben"), new Dog("Sophie"), new Dog("Rex") }; // OK.
Dog![] dogs4 = new [] { new Dog("Ben"), new Dog("Sophie"), new Dog("Rex") }; // OK.

The compiler will also be happy if we populate the array using a loop, but the loop must be of the exact structure shown below (because the compiler needs to know at compile time that all elements will be populated):

int numberOfDogs = 3;
Dog![] dogs5 = new Dog[numberOfDogs];
for (int i = 0; i < dogs5.Length; i++)
{
    dogs5[i] = new Dog("Dog " + i);
}

The compiler won't let you use the array in between the declaration and the loop:

int numberOfDogs = 3;
Dog![] dogs5 = new Dog[numberOfDogs];
Dog![] myDogs = dogs5; // Compiler Error - cannot use the array in any way.
for (int i = 0; i < dogs5.Length; i++)
{
    dogs5[i] = new Dog("Dog " + i);
}

The compiler will also be ok if we copy from an existing array of mandatory references:

Dog![] dogs6 = new Dog[numberOfDogs];
Array.Copy(dogs5, dogs6, dogs6.Length);

Similarly to the previous case, the array cannot be used in between being declared and being populated. Also note that the above code could throw an exception if the source array is not long enough, but this has nothing to do with the subject of mandatory references.

The compiler will also allow us to clone an existing array of mandatory references:

Dog![] dogs7 = (Dog![])dogs6.Clone();

This seems to me like a reasonable list of recognised safe code structures but people may be able to think of others.

@MadsTorgersen
Contributor

It is great to see some thinking on non-nullable and safely nullable reference types. This gist is another take on it - adding only non-nullable reference types, not safely nullable ones.

Not only Kotlin but also Swift has an approach to this. Of course, many functional languages, such as F#, don't even have the issue in the first place. Indeed their approach of using T (never nullable) and Option<T> (where the T can only be gotten at through a matching operation that checks for null) is probably the best inspiration we can get for how to address the problem.

I want to point out a few difficulties and possible solutions.

Guards and mutability

The proposal above uses "guards" to establish non-nullness; i.e. it recognizes checks for null and remembers that a given variable was not null. This does have benefits, such as relying on existing language constructs, but it also has limitations. First of all, variables are mutable by default, and in order to trust that they don't change between the null check and the access the compiler would need to also make sure the variable isn't assigned to. That is only really feasible to do for local variables, so for anything more complex, say a field access (this.myDog) the value would need to first be captured in a local before the test in order for it to be recognized.
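
For example, under the guard approach a field would have to be copied into a local before the test (a sketch using the proposed syntax):

Dog? local = this.myDog;   // capture the field in a local
if (local != null)
{
    local.Bark();          // OK - the local cannot change between the check and the call.
    this.myDog.Bark();     // Still unsafe - the field could have been reassigned by another thread or a callee.
}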

I think a better approach is to follow the functional languages and use simple matching techniques that test and simultaneously introduce a new variable, guaranteed to contain a non-null value. Following the syntax proposed in #206, something like:

if (o is Dog! d) { ... d.Bark(); ... }

Default values

The fact that every type has a default value is really fundamental to the CLR, and it is an uphill battle to deal with that. Eric Lippert's blog post points to some surprisingly deep issues around ensuring that a field is always definitely assigned. But the real kicker is arrays. How do you ensure that the contents of an array are never observed before they are assigned? You can look for code patterns, as proposed above. But it will be too restrictive.

Say I'm building a List<T> type wrapping a T[]. Say it gets instantiated with Dog!. The discipline of the methods on List<T> will likely ensure that no element in the array is ever accessed without having been assigned to at some point earlier. But no reasonable compile time analysis can ensure this.

Say that the same List<T> type has a TryGet method with an out parameter out T value. The method needs to definitely assign the out parameter. What value should it assign to value when T is Dog!?
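
A sketch of that TryGet problem (hypothetical List<T> internals, with _items and _count as the backing fields):

public bool TryGet(int index, out T value)
{
    if (index >= 0 && index < _count)
    {
        value = _items[index];
        return true;
    }
    value = default(T); // What should this be when T is Dog!? There is no valid default.
    return false;
}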

One option here is to just not allow arrays of non-nullable reference types - people will have to use Dog[] instead of Dog![] and just check every time they get a value. Similarly, maybe we just shouldn't allow List<Dog!>. After all, even unconstrained type parameters allow default(T) and T[] today. Or we need to come up with an "anti-constraint" that is even more permissive than no constraint, where you can say that you take all type arguments - even nullable reference types.

Library compatibility

Think of a library method

public string GetName(Dog d);

In the light of this feature you might want to edit the API. It may throw on a null argument, so you want to annotate the parameter with a !:

public string GetName(Dog! d);

Depending on your conversion rules, this may or may not be a breaking change for a consumer of the library:

Dog dog = ...;
var name = GetName(dog);

If we use the "safe" rule that Dog doesn't implicitly convert to Dog!, this code will now turn into an error. The potential for that break would in turn mean that a responsible API owner would not be able to strengthen their API in this way, which is a shame.

Instead we could consider allowing an implicit conversion from Dog to Dog!. After all Dog is the "unsafe" type already, and when you can implicitly access members of it at the risk of a runtime exception, maybe you should be allowed to implicitly convert it to a non-nullable reference type at the risk of a runtime exception?

On the other end of the API there's also a problem. Assume that the method never returns null, it should be completely safe to add ! to the return type, right?

Not quite. Notice that the consumer stores the result in a var name. Does that now infer name to be string! instead of string? That would be breaking for the subsequent line I didn't tell you about, that does name = null;. Again, we may have to consider suboptimal rules for type inference in order to protect against breaks.
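
That is, the inference break would look like this (a sketch):

var name = GetName(dog); // previously inferred as string
name = null;             // breaks if name is now inferred as string!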

@Miista

Miista commented Feb 11, 2015

The way I see it Dog! dog = ...; simply means that dog can never be null. This doesn't mean that the fields of dog cannot be null. In other words public string GetName(Dog! d); can still return a string that is null. Like so:

    public string GetName(Dog! dog) { ... }

    Dog! dog = ...;
    var name = GetName(dog); // May return null

If you wanted to return a non-nullable string you would have to say that GetName returns string! instead.

    public string! GetName(Dog! dog) { ... }

    Dog! dog = ...;
    var name = GetName(dog); // Will never return null

The non-nullability in the last example could even be enforced by the compiler (to some extent – there may be some edge cases I can't think of).

@Miista

Miista commented Feb 11, 2015

In order to maintain backwards compatibility I believe the type should be inferred as the loosest (may be the wrong term) possible type, e.g. the result of public string GetName(Dog! dog) would be inferred as string and that of public string! GetName(Dog! dog) would be inferred as string!.

Trying to set a non-nullable reference to null should not compile.

@dotnetchris

Yes yes yes, my god yes.

The billion dollar mistake infuriates me. It's absolutely insane that references allow null by default. Since I know C# will never be willing to fix the billion dollar mistake, this is at least a viable alternative. And it removes the need to use the stupid ?. operator.

@gafter
Member

gafter commented Feb 26, 2015

The "billion dollar mistake" was having a type system with null at all. This would not fix that, it just makes it slightly less painful.

@dotnetchris

@gafter what I want most is for C# to drop nulls entirely unless a reference is specifically marked nullable, but I know that will never happen.

@gafter
Member

gafter commented Feb 27, 2015

@dotnetchris There is no way to shoehorn that into the existing IL or C# semantics.

@HaloFour

I think there is value in stepping back and watching the Swift and Obj-C communities battle it out over this issue. Apparently, despite the slick appearance of optionals in Swift, they create a number of severe nuisance scenarios, particularly in writing the initialization of a class:

Swift Initialization and the Pain of Optionals

My concern has always been that without null you end up with sentinels which, in the reference world, often make absolutely no sense. Sure, the developers can further declare their intent but then everyone is required to do the additional dance to unwrap the optional. Compiler heuristics could help there but I'm sure that there will always be those corner cases.

Ultimately, in my opinion, non-nullable references feels a little like Java checked exceptions. Sure, it seems great on paper, and even better with perfectly idiomatic examples, but it also creates obnoxious barriers in practice which encourage developers to take the easy/lazy way out thus defeating the entire purpose. It feels like erecting guard-rails along a hairpin curve on the edge of a cliff. Sure, the purpose is safety, but perceived safety can encourage recklessness, and I think that developers should be learning how to code more defensively (not just for simple validation but to also never trust your inputs) not assuming that someone else will relieve them of that burden.

Just a devil's advocate rebuttal by someone who would probably welcome them to the language if done well. 😄

@dotnetchris

@HaloFour checked exceptions are the only thing I have ever had positive to say about Java, other than ICloneable actually being, you know, about cloning.

@paulomorgado

I really can't understand what the problem is about null!

Look at a string as a box of char. I can either have a box or not (null). If I have a box, it can either be empty ("") or not ("something").

I don't know that much about F# but I can't see Option doing anything better here. It's still about either having a box or not. But what guarantee does F# give me that I still have a box on the table just because when I asked before there was one?

Sure it's a pain to have to be looking out for null all the time or be bitten when not doing that, but hiding the problem is not solving it.

The ?. operator introduced in C# 6 solves a lot of issues and the proposed pattern matching for C# 7 (#206) will solve a lot of others.

@Neil65

So far, the compiler does everything it can to generate code that will behave as intended at run time. For that, it relies on the CLR.

What you are proposing goes more in the direction of "looks good on paper, hope it goes well at run time".

Having the compiler yield warnings just because your intention is not verifiable, even at compile time, is a very bad idea. Compiler warnings should not be issued for something that the developer cannot do anything about.

There has to be some compromise in the last three cases as our code has to interact with existing code that uses general references. These three cases are allowed if an explicit cast is used to make the compromise visible (and perhaps there should also be a compiler warning).

Dog? nullableDog2 = (Dog?)myGeneralDog; // OK (perhaps with compiler warning).
Dog generalDog1 = (Dog)myMandatoryDog; // OK (perhaps with compiler warning).
Dog generalDog2 = (Dog)myNullableDog; // OK (perhaps with compiler warning).

The cast should be either possible or not.

Regarding var, why do mandatory and nullable references need to qualify var when the same is not needed for nullable value types?

Is Dog![] a mandatory array of Dog or an array of mandatory Dog?

@dotnetchris

@paulomorgado what it fundamentally boils down to is that I, as the author of code, should have the authority to determine whether null is valid or absolutely invalid. If I state this code may never allow null, the compiler should within all reason attempt to enforce this statically.

While the ?. operator is something, it still doesn't eliminate if (x == null) throw ... guards for cases where null is never a valid value. Null reference exceptions are likely the number one unhandled exception of all time, both in .NET and in programming as a whole. Compiler-enforced static analysis would easily prevent this problem from existing.

It being labeled "The Billion Dollar mistake" is not hyperbole, I actually expect it to have cost multiple billions if not tens of billions at this point.

@paulomorgado

What costs billions of dollars are bad programmers and bad practices, not null by itself.

The great thing with null reference exceptions is that you can always point to where the problem was and fix it.

Changing the compiler without having runtime guarantees will be just fooling yourself and, when bitten by the mistake, you might not be able to know where it is or fix it.

Sure I'd like to have non nullable reference types as much as I wanted nullable value types. But I want it done right. And I don't see how that will ever be possible without changing the runtime, like it happened with nullable value types.

@Neil65
Author

Neil65 commented Mar 6, 2015

Thanks everyone for engaging in discussion on this topic. I have some responses to what people have said but I haven't had time yet to write them down due to working day and night to meet a deadline. I'll try and post something over the weekend.

@GeirGrusom

On Generics:

I don't think generics should be allowed to be invoked with non-nullable references unless there is a constraint on the generic method.

public static void InvalidGeneric<T>(out T result) { result = default(T); }

public static void OkGeneric<T>(out T result) where T : class! { result = Arbitrary(); } // Arbitrary() is assumed to return a non-null T

public static void Bar()
{
    string! input;
    InvalidGeneric(out input); // Illegal as it would return a mandatory with a null reference
    OkGeneric(out input); // OK.
}

@ssylvan

ssylvan commented Apr 23, 2015

IMO converting a mandatory reference to a "weaker" one should always be allowed and implicit. I.e. if a function takes a nullable reference as an argument you should be able to pass in a mandatory reference. Same if the function takes a legacy reference. You're not losing anything here, the code is expecting weaker guarantees than you can provide. If your code works with a nullable or general reference, then clearly it wouldn't break if I pass it a mandatory reference (it will just always go down any non-null checks inside).
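
In other words (a sketch using the proposal's syntax; Feed is a made-up method):

void Feed(Dog? dog) { /* handles the null case */ }

Dog! rex = new Dog("Rex");
Feed(rex); // implicitly allowed - code prepared for null trivially handles a value that is never null.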

I also think nullable to general and vice versa should be allowed. They're effectively the same except for compile time vs runtime checking. So dealing with older code would be painful if you couldn't use good practices (nullable references) in YOUR code without having to make the interface to legacy code ugly. Chances are people will just keep using general references if you add a bunch of friction to that. Make it easy and encouraged to move to safer styles little by little, IMO.

This last case may warrant a warning ("this will turn compile time check into runtime check"). The first two cases (mandatory/nullable implicitly converts to general) seems like perfectly reasonable code that you would encourage people to do in order to transition to a safer style. You don't want to penalize that.

@MikeyBurkman

@paulomorgado As much as I hate to bring Java up, its lack of reified types means that generic type information is not around at runtime, and yet in 10 years I've never once accidentally added an integer to a list of strings. (Don't get me wrong, not having reified types causes other issues, usually around reflection, but reflection can cause all sorts of bad things if you don't know what you're doing.)

While runtime checking may sound like a good sanity check, it comes at a cost, and it's by no means required to make your system verifiably correct. (Assuming of course you aren't hacking the innards of your classes through reflection.)

Re: empty string vs non empty string: Those are two different types, and should be treated as such. You couldn't do any compile-time verification that you didn't pass an empty string to the constructor of NonEmptyString, but you'd at least catch it at the place of instantiation, rather than later classes doing the check and making it difficult to trace back to where the illegal value originated. The same theory goes for converting nullable types to non-null types.

By the way, Ceylon does something very similar to this proposal. Might be worthwhile looking at them.

@kalleguld

Is the var handling backwards compatible? Seems like this would be valid C# code now, but it shouldn't compile with the proposed rules:

var dog1 = new Dog("Sam"); //dog1 is Dog!
dog1 = null; //dog1 cannot be null

@olmobrutall

Nice that the top 1 requested C# feature, non-nullable reference types, is alive again. We discussed the topic in depth some months ago.

It's a hard topic, with many different alternatives and implications. Consequently it is easier to write a comment with a naive proposal than to read and understand the other solutions already proposed.

In order to work in the same direction, I think it is important to share a common basis about the problem.

On top of the current proposal, I think these links are important.

Back to the topic. I think the concept explained here lacks a solution for two hard problems:

Generics

It's explained how to use the feature in a generic type (using T? inside a List<T>), but the most important problem to solve is how to let client code use the feature when using arbitrary generic types (List<string!>).

This problem is really important to solve because generics are often used for collections, and in 99% of the cases you don't want nulls in your collection.

It's also challenging because it is a must that it works transparently on non-nullable references and nullable value types, even if they are implemented in completely different ways at the binary level. We already have many collections to choose from (List<T>, Collection<T>, ReadOnlyCollection<T>, ImmutableCollection<T>...) without multiplying the options for the different constraints on the type (ListValue<T>, ListReference<T>, ListNonNullReference<T>).

I think unifying the type system is really important, but this has the consequence that Nullable<T> should allow nesting and class references, making string? mean a nullable reference to a string with a HasValue boolean.

Library compatibility

This solution focuses on local variables, but the challenge is in integrating with other libraries, legacy or not, written in C# or other languages.

It's important that the tooling is strong in these cases, and that safety is preserved. Unfortunately this requires run-time checks.

Also, it is important that library writers (including the BCL) can choose to use the features without fear of undesired consequences for their client code. I propose three compilation flags: strict, transitory and legacy (similar to /warnaserror). This allows teams to graduate how strictly they want to work.

As Lucian made me see, this solution is better than branching C# into two different languages: one where string is nullable (legacy) and one where string is non-nullable, with a #pragma option or something like it (similar to Option Strict in Visual Basic).

@Pzixel

Pzixel commented Apr 6, 2016

@HaloFour but it's weird. I'm absolutely sure that I don't want to check whether a passed value is null when I said that it's not null. Imagine the code:

public static void Foo(IBar! bar)
{
   bar.Bark();
}

it would be VERY strange if I got a NullReferenceException here.

I see only two possibilities here: either the compiler automatically adds not-null checks everywhere in the code, or it will just be a CLR feature, so it won't be possible to reference C# 8.0 (or whatever version) assemblies from an earlier one.

I think they will use the first approach because, as I said, there is a branch predictor, so the extra not-null check will be predicted and skipped most of the time, and it also makes it easier to implement: for example, if we leave it as a compiler feature, it's hard to make reflection work with it. And if we have runtime checks in methods, reflection needs no special handling.

@HaloFour

HaloFour commented Apr 6, 2016

@Pzixel

I am only relaying the proposal as it currently stands. Both of those approaches have already been discussed and, as of now, neither is being implemented. This will be purely a compiler/analyzer feature. It won't even result in compiler errors, just warnings, which can be disabled and intentionally worked around.

I believe the latest version of this proposal is here: #5032

As mentioned, this is at least an additional C# version out, so it's all subject to change.

@MikeyBurkman

@Pzixel I assume (hope) that IBar! would be implemented as a different underlying type than IBar, and so it would never even be an issue. (Kind of like how int and Nullable<int> are different underlying types, and the compiler just allows for nice syntactic sugar.) Putting a null check in that method would be akin to adding a check that an argument of type int is not actually a string.

@HaloFour

HaloFour commented Apr 6, 2016

@MikeyBurkman

Actually, the non-nullable version would be IBar, and the nullable version would be IBar?. The only difference between the two would be an attribute, they would be the same underlying type.

@Pzixel

Pzixel commented Apr 6, 2016

@HaloFour I don't know if T? is good syntax because it breaks the entire existing codebase. No, I totally agree that it's more consistent than mixing ?, ! and so on, but if we are looking for backward compatibility, it will break everything in the code. And of course it should be an error, not a warning. Why? Because it's a type mismatch, and that is clearly an error. We should get CS1503 and that's all. It's weird to get a null when I said that I can't get null. If I want a warning I can use a [NotNull] attribute rather than introducing a whole new type. And that makes sense.

@HaloFour

HaloFour commented Apr 6, 2016

@Pzixel

I don't think T? is good syntax because it breaks all existing code. No, I totally agree that it's more consistent than ?, ! and so on, but if we are looking for backward compatibility, it will break everything in the code

I've already made that argument, but it seems that this is the direction that the team wants to go anyway. I believe that the justification is that the vast majority case is wanting a non-nullable type, so having to explicitly decorate them would lead to a lot of unnecessary noise. Pull the band-aid off once.

And of course it should be an error, not a warning. Why? Because it's a type mismatch, and that is clearly an error. We should get CS1503 and that's all. It's weird to get a null when I said that I can't get null.

Primarily because of how much of a change it is to the language and because it can never be achieved with 100% certainty. I'm not particularly interested in rehashing all of the comments already on these proposals, but justifications are listed there.

@MikeyBurkman

I'm pretty sure you're going to have to break backwards compatibility anyways, or make type inference useless.

// Pretend this is some legacy code
var x = new Bar(); // Line A (assume Bar implements IBar)
...
x = null; // Line B

What is the inferred type of x on Line A? If it's inferred as non-null (the expected type), then our code at line B will no longer compile. If we infer on Line A that x is nullable, then everything compiles as it used to, but now your type inference is inferring a less useful type.

Either devs won't use non-null types, or devs will stop using type inference. I can imagine which of those two options will win out...

@Pzixel

Pzixel commented Apr 6, 2016

@HaloFour

Primarily because of how much of a change it is to the language and because it can never be achieved with 100% certainty. I'm not particularly interested in rehashing all of the comments already on these proposals, but justifications are listed there.

You don't need to rehash anything; basically I only want to get a type mismatch when there is one, instead of warnings and so on. Whether it emits checks or is purely a compiler feature is a topic for discussion, but if we are talking about the interface, a type mismatch should definitely be an error.

@gafter
Member

gafter commented Apr 6, 2016

@MikeyBurkman re

What is the inferred type of x on Line A? If it's inferred as non-null (the expected type), then our code at line B will no longer compile. If we infer on Line A that x is nullable, then everything compiles as it used to, but now your type inference is inferring a less useful type.

Local variables will have a nullable type state that can be different from one point in the program to another, based on the flow of the code. The state at any given point can be computed by flow analysis. You won't need to use nullable annotations on local variables, because it can be inferred.
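
For example, the nullable state of a single local could differ from one line to the next (a sketch only; GetValue is a hypothetical method declared to return string?):

var s = GetValue();   // GetValue() returns string?, so s starts out may-be-null
if (s != null)
{
    int a = s.Length; // flow analysis: s cannot be null here, no warning
}
int b = s.Length;     // s may be null again here, so this warns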

@Pzixel

Pzixel commented Apr 6, 2016

@gafter var is used to infer the type at the point of declaration; we shouldn't have to analyze any flow after that.

@HaloFour

HaloFour commented Apr 6, 2016

@Pzixel "Nullability" isn't being treated as a separate type, it's a hint to the compiler. The flow analysis is intentional to prevent the need for extraneous casts when the compiler can be sure that the value would not be null, e.g.:

public int? GetLength(string? s) {
    if (s == null) {
        return null;
    }
    // because of the previous null check the compiler knows
    // that the variable s cannot be null here so it will not
    // warn about the dereference
    return s.Length;
}

@MikeyBurkman

So I'm still a bit confused. @gafter's comment insinuated that flow analysis would go upwards, while @HaloFour's example demonstrates it going downwards. Downwards flow analysis would be pretty much required in any implementation, and in fact R# already does that sort of analysis with the [NotNull] attributes. However, without the upwards flow analysis, I don't think type inference would be able to provide much benefit, unless breaking backwards compatibility was an option.

@Pzixel

Pzixel commented Apr 6, 2016

@HaloFour int and int? are completely different types. I really want the same UX for reference types. I can use attributes like [NotNull], [Pure] and so on for a warning. I want to be absolutely sure that I CAN'T receive null if it is marked as not null. So, in the provided example:

public int? GetLength(string? s) {
    string notNullS = s; // compiler error: cannot implicitly cast `string?` to `string`. 
    return GetLength(notNullS); 
}
public int GetLength(string s) {    
    return s.Length;
}

Of course, ideally I'd like to see something like unwrap from Rust, but an explicit cast is good enough.

@HaloFour

HaloFour commented Apr 6, 2016

@Pzixel

I want to be absolutely sure that I CAN'T receive null if it is marked as not null

Simply put, that wouldn't be possible. Even if massive CLR changes were on the table it probably couldn't be done. The notion of a default value is too baked in. Generics, arrays, etc., there's no way to get around the fact that null can sneak in by virtue of being the default value.

Flow analysis is a compromise, one that can be fitted onto the existing run time and one that can work with a language that has 15 years of legacy that it needs to support. It follows the Eiffel route: know where you can't make your guarantees and solve it through flow analysis. Even then, sometimes the developer can (and should) override.
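
One shape such an override might take (a sketch only, borrowing the postfix ! "trust me" operator floated in these discussions; Find is a hypothetical lookup that may return null):

string? s = Find("answer");
int len = s!.Length; // developer asserts s is not null, silencing the warning
                     // (if s really is null, this still fails at runtime)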

@HaloFour

HaloFour commented Apr 6, 2016

@MikeyBurkman

IIRC the type inferred by var is neither necessarily nullable nor non-nullable; it's a superposition of both potential states depending on how the variable is used. From @gafter's comment it sounds like that applies to any local even if the type is explicitly stated, e.g.:

string s1 = null; // no warning?
int i1 = s1.Length; // warning of potential null dereference

string? s2 = "foo";
int i2 = s2.Length; // no warning

@Pzixel

Pzixel commented Apr 6, 2016

Simply put, that wouldn't be possible. Even if massive CLR changes were on the table it probably couldn't be done. The notion of a default value is too baked in. Generics, arrays, etc., there's no way to get around the fact that null can sneak in by virtue of being the default value.

Generics was introduced once, another major change is possible too. Nobody says that it's easy, but they have to do it to implement this properly. It's the only way to make a strong type system; hints and warnings are just nothing. Internally it can still be null; I don't think significant changes are required to accomplish these requirements. It's just another type; it's not even the CLR's concern. The compiler just checks that the type is not-null, and the only way to pass a null value would be reflection. Thus we'd need to change reflection, but that's easy too: since string and string? are different types, there would be a type mismatch.

Now I see that it's even simpler than I thought. Just treat them as other types and that's all. Reflection throws a mismatch error at runtime, the compiler does it at compile time, and everyone is happy. And it would still be a compile-time feature. The only problem is with older versions of C#, but changes to reflection would be changes to the runtime, so it would be a feature for the next .NET.

We could do a compatible version with runtime checks: for example, when we target .NET 4.6 and below, the compiler emits runtime checks (if blabla != null); with .NET 4.7 we assume that reflection does its job at runtime and remove them from the code. An elegant solution.
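
A sketch of what such an emitted guard could look like on the older runtimes (Process and value are hypothetical names; string! is the not-null syntax from this discussion):

// source: void Process(string! value)
// what the compiler could emit on .NET 4.6 and below:
void Process(string value)
{
    if (value == null)
        throw new ArgumentNullException(nameof(value)); // inserted guard
    // ... the body can rely on value being non-null
}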

@HaloFour

HaloFour commented Apr 6, 2016

Generics was introduced once, another major change is possible too.

Generics was additive and worked entirely within the existing semantics of the run time.

Just treat them as other types and that's all.

That "other" type can't prevent the reference type that it contains from being null. Either it is a reference type that itself can be null (and wastes an allocation when it's not) or it's a struct container that contains a reference type where the default value of the struct is that the reference is null. Either way, you're back to square one. Furthermore, since the majority of methods within the BCL accept reference types that should be null you're talking about a massive breaking change to all existing programs. This solution has already been proposed.

@Pzixel

Pzixel commented Apr 6, 2016

@HaloFour

Generics was additive and worked entirely within the existing semantics of the run time.

Okay, but what about the DLR? We got a whole new runtime to support dynamics.

That "other" type can't prevent the reference type that it contains from being null. Either it is a reference type that itself can be null (and wastes an allocation when it's not) or it's a struct container that contains a reference type where the default value of the struct is that the reference is null. Either way, you're back to square one. Furthermore, since the majority of methods within the BCL accept reference types that should be null you're talking about a massive breaking change to all existing programs. This solution has already been proposed.
Treat errors as warnings is not a "solution". There is nothing about memory allocation and so on, it just a check IN COMPILE TIME that reference is initialized somewhere. In runtime we should change nothing. Maybe there will be issues with syntax, so we should use '!' instead of '?', so existing code won't break. But i am completly sure that i want errors, not "warnings". As I said, if I want warnings I write an attribute, and we don't need this feature at all, it already exists as attribute feature, and we don't need any syntax sugar for it.

But if we get whole new types, we can program safely, as in functional languages. There are no nulls there, only Options with None. And a (not-null!) value is the default, while an Option must be declared explicitly. And this is a good way to do things.

Yes, C# has legacy, but it's only about syntax, not about its spirit or internals.

Again, we do not need to change anything in the CLR, and we do not need to change anything in C# or reflection; we just add new types with a couple of rules about upcasts and downcasts, and several compiler errors. That's enough to implement it in full power.

@lukasf

lukasf commented Apr 6, 2016

@Pzixel
The thing is, if you make this a different type, then this would break gazmillions of lines of code and libraries. You would basically create a new language where IBar has a totally different meaning than IBar from an older C# version. Existing code would break, code samples would stop working, interoperability with older libraries would either break or suffer massively. Every time someone puts C# code online, he would have to clarify if this is "old" C# or "new" C#. All the samples out there would suddenly be in doubt. This would kill the language and I agree with the language design team that this approach is not an option.

Warnings have the huge benefit that you can just ignore or even suppress them. All legacy code and all samples continue to work. With this approach, you can use all the benefits of null safety, but you don't have to. If you write all your code using this new feature, you would solve nearly all your NREs. You won't ever get 100% safety anyways, because there are still things like COM interop where evil C++ might null something, and you have unsafe C# where someone could sneak nulls into your non-nullable fields. So since 100% safety is not possible anyways, and breaking changes are off the table, this is our best option to still get close to 100% safety.

It might be worth discussing whether it is also possible to have the compiler automatically insert null checks when referencing libraries which were not developed with the null safety feature, as an optional compiler switch in addition to the already discussed warning switch for referenced libraries. This would solve some more corner cases and bring you closer to 100%.
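
For illustration, such an inserted check at a legacy boundary might behave like this (LegacyApi.GetName is a hypothetical method from an unannotated library):

string name = LegacyApi.GetName(); // legacy library, no nullability info
// with the proposed switch, the compiler would in effect insert:
if (name == null)
    throw new InvalidOperationException("LegacyApi.GetName() returned null.");
// from here on, name can safely be treated as non-nullable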

@HaloFour

HaloFour commented Apr 6, 2016

@Pzixel

Okay, what's about DLR? We have a whole new runtime to support dynamics.

The DLR is completely separate from the CLR and isn't relevant to this discussion.

But if we get whole new types, we can program safely, as in functional languages. There are no nulls there, only Options with None. And a (not-null!) value is the default, while an Option must be declared explicitly. And this is a good way to do things.

The Option<T> type in F# is a normal reference type like any other. Ask the CLR to make an array of Option<T> and they're all null. Same with default(Option<T>), it's null. Not to mention, Some(null) is perfectly legal.

Yes, C# has legacy, but it's only about syntax, not about its spirit or internals.

And 15 years worth of applications/libraries that you're asking to be broken.

Again, we do not need to change anything in the CLR, and we do not need to change anything in C# or reflection; we just add new types with a couple of rules about upcasts and downcasts, and several compiler errors. That's enough to implement it in full power.

Which creates a solution which is just as leaky (since you cannot possibly define a wrapping type that actually enforces this behavior) but has the added benefit of breaking every written piece of code.

@Pzixel

Pzixel commented Apr 6, 2016

And 15 years worth of applications/libraries that you're asking to be broken.

How? The not-null type gets the suffix !, while the plain type keeps its old meaning. So string is just a nullable string, while string! is a not-null string. It's a bit weird, but it's still much better than ugly "oops, we have a null here" warnings. Nothing is broken.

@lukasf again, we don't change the meaning of existing keywords; we just add another one, where ! means that a type with this suffix cannot be null in any case.

Which creates a solution which is just as leaky (since you cannot possibly define a wrapping type that actually enforces this behavior) but has the added benefit of breaking every written piece of code.

What does it break? Old code still has the same meaning in the new language. Of course, if we replaced T with non-nullable T it would break everything, but that's not the point.

Something like this.

@lukasf

lukasf commented Apr 7, 2016

@Pzixel
Oh okay. Well, the current "official" proposal does not use "!". Instead, IBar is non-nullable and IBar? is nullable, just like with value types. Using a different notation for classes than the one already in place for value types would be very confusing. This is what it is going to be:

Non-Nullable:
int a
IBar b

Nullable:
int? a
IBar? b

Even when using "!" to create a new, non-nullable type, you'd still get massive problems, especially with libraries. Update one library to non-nullable and BOOM all projects and libraries referencing that library would stop working, because they all see different, unknown types now. So once you upgrade one lib you basically would need to update all libs. Again you would create kind of a different language where you could not use new libs from old projects, and you would have lots of trouble using old libs with new code.

If the language were created from the ground up, I would surely push for full null safety as can be seen in other new languages. But C# has been out there for more than a decade, and there is lots of legacy code, lots of libs, lots of samples and knowledge. You cannot introduce such a radical breaking change into an existing and well-established language. It's sad, I'd love to see a really strong nullability concept, but it is not going to happen. So now we'd better look at what the realistic options are. Better to take an almost safe nullability system than no system at all.

@yaakov-h
Member

yaakov-h commented Apr 7, 2016

Update one library to non-nullable and BOOM all projects and libraries referencing that library would stop working, because they all see different, unknown types now.

@lukasf Use a modopt or modreq to create an overload, and you can have backwards compatibility. Sadly this proposal does not seem to be heading in that direction.

@Pzixel

Pzixel commented Apr 7, 2016

@lukasf yes, we have legacy, thus we must choose between two evils. The first is a slightly confusing syntax; with the second you receive nothing except warning noise. Warnings were never a guarantee, while I want to be SURE that if I write a not-null parameter, it will NEVER be null. It's bizarre to write a not-null parameter and then check internally whether it's really not null. In that case we don't even need this extra syntax, since attributes do this very thing: NotNullAttribute will warn you if you pass a possible null. Why would we need this syntax, for locals only? Well, locals are local enough that we don't really need the feature there. It is useful, but there are plenty of more significant features to be implemented.

About the BOOM: nobody blamed Microsoft for making nullables completely different types, because it is logically correct: it's not the type, it's a wrapper for the type. When we are talking about not-null types, we require the casts to be one-way, like this:

string! s = "hello!";
string nullableS = s;
string! anotherS = nullableS; // compile-time error: cannot implicitly convert `string` to `string!`, use a cast
string! finalS = (string!) nullableS; // throws NullReferenceException when nullableS is null.

It could be implemented like this:

public struct NotNullReference<T> where T : class
{
    public T Value { get; }
    public NotNullReference(T value)
    {
        if (value == null)
            throw new ArgumentNullException(nameof(value), "Cannot initialize NotNullReference with null!");
        Value = value;
    }

    public static explicit operator NotNullReference<T>(T reference)
    {
        return new NotNullReference<T>(reference);
    }

    public static implicit operator T(NotNullReference<T> reference)
    {
        return reference.Value;
    }
}
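
Usage of this wrapper would then mirror the string!/string example above (a sketch):

NotNullReference<string> s = (NotNullReference<string>)"hello"; // explicit: validated at runtime
string plain = s;                                               // implicit: always safe
var bad = (NotNullReference<string>)(string)null;               // throws ArgumentNullException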

It's just a concept, and it requires extra space for the struct etc., but it CAN be done right now! We only need some syntax sugar for it, and internally it could be managed in some other way. But the idea is the same: we CAN'T get a null reference when we said that we don't want one. If we want to be warned instead, welcome to the attributes world: NotNull, CanBeNull, Pure and so on.

@lukasf

lukasf commented Apr 7, 2016

You can always turn on warnings-as-errors and you will get your errors at compile time. If you ignore warnings and then complain that you have not been warned about problems, well, that does not really make sense. The "!" syntax is problematic. The normal case should be that a variable is not nullable. Only very few variables are really meant to be nullable. It does not make sense to annotate > 97% of all variables with a "!". This is useless clutter. The default must be non-nullable, with only the few exceptions getting specially marked. Also, with your concept you would not only add clutter to almost all variables, but you would also add one struct per reference, which is again unnecessary memory and runtime overhead.
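
To illustrate the clutter (a sketch; Order and LoadOrders are hypothetical, and the '!' placement extrapolates the proposed opt-in syntax):

// with an opt-in '!', nearly every declaration needs annotating:
Dictionary<string!, List<Order!>!>! ordersByCustomer = LoadOrders();

// with non-nullable as the default, only the rare nullable cases are marked:
Dictionary<string, List<Order>> ordersByCustomer2 = LoadOrders();
string? optionalNote = null;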

I think that the general direction has already been decided by the language team, so it's not much use to continue this discussion. The "!" operator is not going to come, and a strong type system is also not going to come. C# was just not made with this in mind, and it would cause trouble at various points if a strong system were now somehow forced onto it. The warning approach feels very natural, it's easy to use, has the right syntax as already known from value types, and does not cause breaking changes or other compatibility issues. When used properly, it will lead to the same safety as you would get from a strong system. Null safety is always going to be a compromise for an existing language. I think that the warning approach here is a good compromise. You can't have it all, not on C#. Maybe at some point we will get a new language where all this is taken care of from the beginning, M#, D#, whatever...

@HaloFour

HaloFour commented Apr 7, 2016

@yaakov-h

It'd have to be modopt because modreq is not CLS-compliant. The CLS offers no guidelines as to how a consuming language should handle modopt, other than that it must be copied into the signature. So while the CLR allows for overloading based on modopt, the potential for the various languages to do so successfully isn't that great. Not to mention, you're asking them to basically double the size of the BCL rather than keep the intent of the existing methods.

@Pzixel

Nope, a struct wrapper doesn't work:

// all structs have a default constructor that zero-inits the struct
NotNullReference<string> s1 = new NotNullReference<string>();

// but explicit construction isn't necessary anyway, since the stack is zero-inited
NotNullReference<string> s2 = default(NotNullReference<string>);

// and the CLR has to zero-init array allocations
NotNullReference<string>[] s3 = new NotNullReference<string>[10];

// and then you have generics
public T Foo1<T>() { return default(T); }
public T Foo2<T>() where T : new() { return new T(); }

NotNullReference<string> s4 = Foo1<NotNullReference<string>>();
NotNullReference<string> s5 = Foo2<NotNullReference<string>>();
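
In every one of those cases the value never passes through the checked constructor, so the wrapper's supposed guarantee is silently broken:

Console.WriteLine(s2.Value == null); // True: the "not null" wrapper holds null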

@yaakov-h
Member

yaakov-h commented Apr 7, 2016

@HaloFour there's also always FormatterServices.GetUninitializedObject...

@Pzixel

Pzixel commented Apr 12, 2016

@HaloFour as I said, it's a concept. Sure, there could be some workarounds around it, in the same manner that immutable strings are not immutable (you can always pin a string and change its contents!). But that's what I call fair use. You do not blame Microsoft for strings being modifiable, so why do you think this is worse?

And again, it's a concept and it might not work quite as expected, but the C# devs don't have such limits. They can even get CLR support for any feature they request, if it's worth it.

@gafter
Member

gafter commented Mar 27, 2017

This is now tracked at dotnet/csharplang#36.

@gafter gafter closed this as completed Mar 27, 2017