Rules for Translating Code from C# to C++: Class Members and Control Structures

In this article, we will explore how our translator converts class members, variables, fields, operators, and C# control structures. We will also touch on the use of the translator support library for the correct conversion of .NET Framework types into C++.

Class members

Class methods map directly onto C++. This also applies to static methods and constructors. In some cases, additional code may appear—for example, to emulate calls to static constructors. Extension methods and operators are translated into static methods and are called explicitly. Finalizers become destructors.

C# instance fields become C++ instance fields. Static fields also remain unchanged, except in cases where the initialization order is important—this is implemented by translating such fields as singletons.
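The singleton approach for order-sensitive static fields can be sketched roughly as follows. This is a minimal illustration that assumes a function-local static as the mechanism; the names Config, Logger, DefaultName, and Prefix are hypothetical, and the code the translator actually emits may look different.

```cpp
#include <cassert>
#include <string>

// Hypothetical translated class: the C# static field is exposed through
// an accessor that owns a function-local static.
class Config
{
public:
    static std::string& DefaultName()
    {
        // Initialized on the first call (thread-safe since C++11).
        static std::string value = "default";
        return value;
    }
};

class Logger
{
public:
    static std::string& Prefix()
    {
        // Depends on Config::DefaultName(); calling the accessor forces that
        // field to be initialized first, regardless of translation unit order.
        static std::string value = "[" + Config::DefaultName() + "] ";
        return value;
    }
};
```

Because each field is created on first access, a dependent field can never observe an uninitialized one, which is the guarantee that plain namespace-scope statics in different translation units do not give.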

Properties are split into a getter method and a setter method, or just one if the second method is absent. For auto-properties, a private value field is also added. Static properties are split into a static getter and setter. Indexers are processed using the same logic.

Events are translated into fields, the type of which corresponds to the required specialization of System::Event. Translation in the form of three methods (add, remove, and invoke) would be more correct and, moreover, would allow supporting abstract and virtual events. Possibly, in the future, we will come to such a model, but at the moment the Event class option fully covers our needs.

The following example illustrates the above rules:

public abstract class Generic<T>
{
    private T m_value;
    public Generic(T value)
    {
        m_value = value;
    }
    ~Generic()
    {
        m_value = default(T);
    }
    public string Property { get; set; }
    public abstract int Property2 { get; }
    public T this[int index]
    {
        get
        {
            return index == 0 ? m_value : default(T);
        }
        set
        {
            if (index == 0)
                m_value = value;
            else
                throw new ArgumentException();
        }
    }
    public event Action<int, int> IntIntEvent;
}

C++ translation result (insignificant code removed):

template<typename T>
class Generic : public System::Object
{
public:
    System::String get_Property()
    {
        return pr_Property;
    }
    void set_Property(System::String value)
    {
        pr_Property = value;
    }
    
    virtual int32_t get_Property2() = 0;
    
    Generic(T value) : m_value(T())
    {
        m_value = value;
    }
    
    T idx_get(int32_t index)
    {
        return index == 0 ? m_value : System::Default<T>();
    }
    void idx_set(int32_t index, T value)
    {
        if (index == 0)
        {
            m_value = value;
        }
        else
        {
            throw System::ArgumentException();
        }
    }
    
    System::Event<void(int32_t, int32_t)> IntIntEvent;
    
    virtual ~Generic()
    {
        m_value = System::Default<T>();
    }

private:
    T m_value;
    System::String pr_Property;
};
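To make the event field above more concrete, here is a minimal sketch of what a class specialized on a function signature might look like. EventSketch and its Add, Remove, and Invoke methods are assumptions made for illustration; this is not the System::Event implementation from the support library.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

template <typename Signature>
class EventSketch;

// Specialization for handlers returning void, e.g. EventSketch<void(int, int)>.
template <typename... Args>
class EventSketch<void(Args...)>
{
public:
    using Handler = std::function<void(Args...)>;

    // Counterpart of C# "+=": returns a token that identifies the handler.
    int Add(Handler handler)
    {
        handlers_.emplace_back(nextToken_, std::move(handler));
        return nextToken_++;
    }

    // Counterpart of C# "-=": removes the handler registered under the token.
    void Remove(int token)
    {
        for (auto it = handlers_.begin(); it != handlers_.end(); ++it)
        {
            if (it->first == token)
            {
                handlers_.erase(it);
                return;
            }
        }
    }

    // Calls every subscribed handler in subscription order.
    void Invoke(Args... args) const
    {
        for (const auto& entry : handlers_)
        {
            entry.second(args...);
        }
    }

private:
    std::vector<std::pair<int, Handler>> handlers_;
    int nextToken_ = 0;
};
```

A field declared as EventSketch<void(int, int)> then plays the role of the C# event: subscribers call Add, and the owning class calls Invoke.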

Variables and fields

Constant and static fields are translated into static fields, static constants (in some cases, constexpr), or into static methods that provide access to a singleton. C# instance fields are converted into C++ instance fields. Any complex initializers are moved to constructors, and sometimes it is necessary to explicitly add default constructors where they did not exist in C#. Local (stack) variables are carried over as is. Method arguments are also carried over as is, except that both ref and out arguments become references (fortunately, C# prohibits overloads that differ only in ref versus out, so the two cannot collide).

The types of fields and variables are replaced with their C++ equivalents. In most cases, such equivalents are generated by the translator itself from the C# source code. Library types, including .NET Framework types and some others, are written by us in C++ and are part of the translator support library, which is supplied along with the converted products. var is translated into auto, except in cases where explicit type indication is needed to smooth out differences in behavior.

Furthermore, reference types are wrapped in SmartPtr. Value types are substituted as is. Since type arguments can be either value or reference types, they are also substituted as is, but when instantiated, reference arguments are wrapped in SharedPtr. Thus, List<int> is translated as List<int32_t>, but List<Object> becomes List<SmartPtr<Object>>. In some exceptional cases, reference types are translated as value types. For example, our implementation of System::String is based on the UnicodeString type from ICU and optimized for stack storage.
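The difference between value and reference type arguments can be sketched as follows. This is an illustration only: ListSketch is a hypothetical stand-in for a translated container, and std::shared_ptr stands in for the support library's smart pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

struct Object
{
    virtual ~Object() = default;
};

// Stand-in for a translated List<T>; the type argument T is used as is.
template <typename T>
class ListSketch
{
public:
    void Add(const T& item)
    {
        items_.push_back(item);
    }
    const T& idx_get(int32_t index) const
    {
        return items_[static_cast<std::size_t>(index)];
    }

private:
    std::vector<T> items_;
};

// C# List<int>: elements are stored by value.
using IntList = ListSketch<int32_t>;
// C# List<Object>: the argument is wrapped in a pointer at instantiation.
using ObjectList = ListSketch<std::shared_ptr<Object>>;
```

The container template itself stays the same in both cases; only the instantiation site decides whether the element is stored directly or through a pointer.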

To illustrate, let's translate the following class:

public class Variables
{
    public int m_int;
    private string m_string = new StringBuilder().Append("foobazz").ToString();
    private Regex m_regex = new Regex("foo|bar");
    public object Foo(int a, out int b)
    {
        b = a + m_int;
        return m_regex.Match(m_string);
    }
}

After translation, it takes the following form (insignificant code removed):

class Variables : public System::Object
{
public:
    int32_t m_int;
    System::SharedPtr<System::Object> Foo(int32_t a, int32_t& b);
    Variables();
private:
    System::String m_string;
    System::SharedPtr<System::Text::RegularExpressions::Regex> m_regex;
};
System::SharedPtr<System::Object> Variables::Foo(int32_t a, int32_t& b)
{
    b = a + m_int;
    return m_regex->Match(m_string);
}
Variables::Variables()
    : m_int(0)
    , m_regex(System::MakeObject<System::Text::RegularExpressions::Regex>(u"foo|bar"))
{
    this->m_string = System::MakeObject<System::Text::StringBuilder>()->
        Append(u"foobazz")->ToString();
}

Control structures

The similarity of the main control structures played into our hands. Statements such as if, else, switch, while, do-while, for, try-catch, return, break, and continue are mostly carried over as is. The only real exception in this list is switch, which needs special treatment in two cases. First, C# allows switching on a string; in C++ we generate a sequence of if-else if statements instead. Second, recent versions of C# allow matching the switch expression against a type pattern, which is likewise easily unfolded into a sequence of ifs.
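Both switch cases can be sketched as plain if chains. The example below is a hand-written approximation rather than real translator output: std::string stands in for System::String, dynamic_cast stands in for the library's type checks, and all type and function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// C#: switch (name) { case "debug": return 0; case "info": return 1; default: return -1; }
int LevelFromName(const std::string& name)
{
    if (name == "debug")
    {
        return 0;
    }
    else if (name == "info")
    {
        return 1;
    }
    else
    {
        return -1;  // default branch
    }
}

struct Shape
{
    virtual ~Shape() = default;
};
struct Circle : Shape
{
    double radius = 1.0;
};
struct Square : Shape
{
    double side = 2.0;
};

// C#: switch (shape) { case Circle c: ...; case Square s: ...; default: ... }
double Area(const Shape* shape)
{
    if (const Circle* c = dynamic_cast<const Circle*>(shape))
    {
        return 3.14159 * c->radius * c->radius;
    }
    else if (const Square* s = dynamic_cast<const Square*>(shape))
    {
        return s->side * s->side;
    }
    else
    {
        return 0.0;  // default branch, also taken for a null pointer
    }
}
```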

Constructs that have no counterpart in C++ are more interesting. The using statement guarantees that the Dispose() method is called upon exiting the context. In C++, we emulate this behavior by creating a guard object on the stack, which calls the required method in its destructor. Before that, however, it is necessary to catch the exception thrown by the code that formed the body of using and to store the exception_ptr in a field of the guard: if Dispose() does not throw an exception of its own, the stored one is rethrown. This is one of those rare cases where throwing an exception from a destructor is justified and is not an error. The finally block is translated according to a similar scheme, except that instead of the Dispose() method, the guard calls a lambda function into which the translator has wrapped the block's body.
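The guard scheme for using can be sketched as follows. This is a simplified illustration under stated assumptions: Resource, DisposeGuard, SetException, and UseResource are hypothetical names, and the real support library code may be organized differently.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical disposable resource that records what happens to it.
struct Resource
{
    std::vector<std::string>& log;
    void Dispose()
    {
        log.push_back("Dispose");
    }
};

class DisposeGuard
{
public:
    explicit DisposeGuard(Resource& resource) : resource_(resource) {}

    // Remembers the exception thrown by the body of "using".
    void SetException(std::exception_ptr e)
    {
        pending_ = e;
    }

    // noexcept(false) is required: destructors are noexcept by default.
    ~DisposeGuard() noexcept(false)
    {
        resource_.Dispose();  // if this throws, the stored exception is dropped
        if (pending_)
        {
            std::rethrow_exception(pending_);
        }
    }

private:
    Resource& resource_;
    std::exception_ptr pending_;
};

// C#: using (res) { log "body"; if (fail) throw ...; }  log "after";
void UseResource(std::vector<std::string>& log, bool fail)
{
    Resource res{log};
    {
        DisposeGuard guard(res);
        try
        {
            log.push_back("body");
            if (fail)
            {
                throw std::runtime_error("boom");
            }
        }
        catch (...)
        {
            guard.SetException(std::current_exception());
        }
    }  // the guard's destructor runs here, outside of stack unwinding
    log.push_back("after");
}
```

Since the body's exception is caught before the guard is destroyed, the destructor never throws during stack unwinding, so rethrowing from it does not terminate the program.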

Another statement that is not present in C++ and that we are forced to emulate is foreach. Initially, we translated it into an equivalent while loop that calls the MoveNext() method of the enumerator, which is universal but quite slow. Since most C++ implementations of .NET containers use STL data structures, we have come to use their native iterators where possible, converting foreach to a range-based for. In cases where native iterators are not available (for example, the container is implemented in pure C#), wrapper iterators are used, which work with enumerators internally. Previously, choosing the right iteration method was the responsibility of an external function written using the SFINAE technique; now we are close to having correct versions of the begin and end methods in all containers, including translated ones.
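A wrapper iterator over an enumerator might look roughly like this. The sketch assumes a .NET-style enumerator with MoveNext() and get_Current(); CountingEnumerator, EnumeratorIterator, CountingRange, and Collect are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical .NET-style enumerator that yields 0, 1, ..., limit - 1.
class CountingEnumerator
{
public:
    explicit CountingEnumerator(int limit) : limit_(limit) {}
    bool MoveNext()
    {
        return ++current_ < limit_;
    }
    int get_Current() const
    {
        return current_;
    }

private:
    int limit_;
    int current_ = -1;
};

// Wrapper iterator: a null enumerator pointer marks the end of the sequence.
class EnumeratorIterator
{
public:
    EnumeratorIterator() = default;
    explicit EnumeratorIterator(CountingEnumerator* e) : e_(e)
    {
        Advance();
    }
    int operator*() const
    {
        return e_->get_Current();
    }
    EnumeratorIterator& operator++()
    {
        Advance();
        return *this;
    }
    bool operator!=(const EnumeratorIterator& other) const
    {
        return e_ != other.e_;
    }

private:
    void Advance()
    {
        if (e_ != nullptr && !e_->MoveNext())
        {
            e_ = nullptr;  // exhausted: now compares equal to end()
        }
    }
    CountingEnumerator* e_ = nullptr;
};

// Container exposing begin() and end(), so range-based for works on it.
class CountingRange
{
public:
    explicit CountingRange(int limit) : enumerator_(limit) {}
    EnumeratorIterator begin()
    {
        return EnumeratorIterator(&enumerator_);
    }
    EnumeratorIterator end()
    {
        return EnumeratorIterator();
    }

private:
    CountingEnumerator enumerator_;
};

// C#: foreach (int v in range) result.Add(v);
std::vector<int> Collect(int limit)
{
    std::vector<int> result;
    CountingRange range(limit);
    for (int v : range)
    {
        result.push_back(v);
    }
    return result;
}
```

This sketch supports a single pass only, which matches how a foreach loop consumes an enumerator.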

Operators

As with control structures, most operators (at least arithmetic, logical, and assignment) do not require special processing. However, there is a subtle point: in C#, the order of evaluation of parts of an expression is deterministic, whereas in C++ there can be undefined behavior in some cases. For example, the following translated code behaves differently after compilation by different tools:

auto offset32 = block[i++] + block[i++] * 256 + block[i++] * 256 * 256 +
    block[i++] * 256 * 256 * 256;

Fortunately, such problems are quite rare. We plan to teach the translator to handle such cases, but because the analysis needed to identify expressions with side effects is complex, this has not yet been implemented.
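Until then, such expressions can be made deterministic by hand. The sketch below is a manual rewrite, not translator output: it assumes block is a byte buffer, introduces the hypothetical function ReadUInt32LE, and uses uint32_t instead of auto to keep the arithmetic unsigned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reads four bytes starting at index i as a little-endian 32-bit value.
uint32_t ReadUInt32LE(const uint8_t* block, std::size_t& i)
{
    // Each read is a separate full expression, so the increments of i are
    // sequenced in the same left-to-right order that C# guarantees.
    const uint32_t b0 = block[i++];
    const uint32_t b1 = block[i++];
    const uint32_t b2 = block[i++];
    const uint32_t b3 = block[i++];
    return b0 + b1 * 256u + b2 * 256u * 256u + b3 * 256u * 256u * 256u;
}
```

Splitting the reads into separate statements removes the unsequenced modifications of i, so every conforming compiler produces the same result.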

However, even the simplest operators require special processing when applied to properties. As shown above, properties are split into getters and setters, and the translator has to insert the necessary calls depending on the context:

obj1.Property = obj2.Property;
string s = GetObj().Property += "suffix";

C++ translation result:

obj1->set_Property(obj2->get_Property());
System::String s = System::setter_add_wrap(static_cast<MyClass*>(GetObj().GetPointer()),
    &MyClass::get_Property, &MyClass::set_Property, u"suffix");

In the first line, the replacement turned out to be trivial. In the second, it was necessary to use the setter_add_wrap wrapper, ensuring that the GetObj() function is called only once, and the result of concatenating the call to get_Property() and the string literal is passed not only to the set_Property() method (which returns void), but also further for use in the expression. The same approach is applied when accessing indexers.
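The idea behind such a wrapper can be sketched as follows. This is an assumption about how a helper of this kind could work, not the library's actual setter_add_wrap; SetterAddWrapSketch is a hypothetical name, and std::string stands in for System::String.

```cpp
#include <cassert>
#include <string>

// Computes getter() + rhs, passes the result to the setter, and returns it
// so that the compound assignment can be used inside a larger expression.
template <typename Obj, typename T, typename Arg>
T SetterAddWrapSketch(Obj* obj, T (Obj::*getter)(), void (Obj::*setter)(T), const Arg& rhs)
{
    T result = (obj->*getter)() + rhs;
    (obj->*setter)(result);
    return result;
}

// Hypothetical translated class with a property split into getter and setter.
class MyClass
{
public:
    std::string get_Property()
    {
        return pr_Property;
    }
    void set_Property(std::string value)
    {
        pr_Property = value;
    }

private:
    std::string pr_Property = "prefix-";
};
```

Because the object expression is passed as a function argument, it is evaluated exactly once, even though both the getter and the setter are called on it.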

C# operators that are not in C++: as, is, typeof, default, ??, ?., and so on, are emulated using translator support library functions. In cases where it is necessary to avoid double evaluation of arguments, for example, to not unfold GetObj()?.Invoke() into GetObj() ? GetObj().Invoke() : nullptr, an approach similar to the one shown above is used.
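Single evaluation for the ?. operator can be sketched in the same spirit. The helper below is hypothetical (NullConditionalSketch is not a support library function), uses std::shared_ptr in place of the library's smart pointer, and covers only the case where the member returns something default-constructible, such as another pointer.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Calls `call` on the target if it is not null; otherwise returns a
// default-constructed result (a null pointer for pointer results).
template <typename T, typename F>
auto NullConditionalSketch(const std::shared_ptr<T>& target, F call)
    -> decltype(call(*target))
{
    using Result = decltype(call(*target));
    return target ? call(*target) : Result();
}

// Hypothetical class whose Invoke() returns a reference-type result.
struct Widget
{
    std::shared_ptr<std::string> Invoke()
    {
        return std::make_shared<std::string>("done");
    }
};
```

A call such as NullConditionalSketch(GetObj(), [](Widget& w) { return w.Invoke(); }) evaluates GetObj() once, because the result is bound to the target parameter before the null check.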

The member access operator (.) may be replaced by an equivalent from C++ depending on the context: the scope resolution operator (::) or the “arrow” (->). Such a replacement is not required when accessing members of structures.
