Rules for Translating Code from C# to C++: Basics

Let's discuss how our translator converts syntactic constructs from the C# language to C++. We'll explore the translation specifics and the limitations that arise during this process.

Projects and compilation units

Translation occurs on a per-project basis. One C# project is converted into one or two C++ projects. The first project mirrors the C# project, while the second is a googletest application that runs the tests, if the original project contains any. A CMakeLists.txt file is generated for each input project, from which CMake can produce project files for most build systems.

Usually, one .cs file corresponds to one .h file and one .cpp file. Typically, type definitions go into header files, while method definitions reside in source code files. Template types are the exception: all of their code stays in header files. Header files containing at least one public definition end up in the include directory, accessible to dependent projects and end users. Header files with only internal definitions go into the source directory.

In addition to the code files obtained by translating the original C# code, the translator generates extra files containing service code. Configuration files that record which header file declares each type of this project are also placed in the output directory. This information is needed when translating dependent assemblies. Finally, a comprehensive translation log is stored in the output directory.

General structure of source code

  1. C# namespaces are mapped to C++ namespaces. C# using directives are transformed into their C++ equivalents.
  2. Comments are transferred as-is, except for type and method documentation, which is handled separately.
  3. Formatting is partially preserved.
  4. Preprocessor directives are not transferred: the values of all preprocessor symbols must be fixed when the syntax tree is constructed, so the translated code reflects only the configuration selected at that point.
  5. Each file begins with a list of included files, followed by a list of forward declarations of types. These lists are generated based on the types mentioned in the current file so that the list of inclusions is as minimal as possible.
  6. Type metadata is generated as special data structures accessible at runtime. Because unconditional metadata generation significantly increases the size of compiled libraries, it is enabled manually, only for the types that need it.
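
As a rough illustration of the namespace and forward-declaration rules above, the sketch below shows how a C# namespace, a using directive, and a forward declaration might look in C++. The names Shapes, Drawing, Circle, Canvas, and DemoNamespaces are hypothetical and are not taken from translator output.

```cpp
#include <cassert>
#include <string>

// Hypothetical C# original:
//   namespace Shapes  { public class Circle { public double Radius = 1.0; } }
//   namespace Drawing { using Shapes; public class Canvas { ... } }

// Forward declaration: sufficient for code that only passes the type
// by reference, so the full definition need not be included here.
namespace Shapes { class Circle; }

namespace Drawing {

// C++ counterpart of the C# 'using Shapes;' directive.
using namespace Shapes;

class Canvas
{
public:
    // Only a reference is used, so the forward declaration is enough.
    std::string Describe(const Circle& c) const;
};

} // namespace Drawing

// The full definition is required only where members are accessed.
namespace Shapes {

class Circle
{
public:
    double Radius = 1.0;
};

} // namespace Shapes

namespace Drawing {

std::string Canvas::Describe(const Circle& c) const
{
    return c.Radius > 0.0 ? "circle" : "empty";
}

} // namespace Drawing

// Small self-check showing how the pieces fit together.
inline void DemoNamespaces()
{
    Shapes::Circle c;
    Drawing::Canvas canvas;
    assert(canvas.Describe(c) == "circle");
}
```

Keeping the declaration of Canvas dependent only on a forward declaration is what lets a header avoid including the full definition of every type it mentions.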

Type definitions

  1. Type aliases are translated using the syntax using <typename> = ...
  2. C# enumerations are mapped to C++ scoped enumerations (the enum class syntax, available since C++11).
  3. Delegates are transformed into aliases for specializations of the System::MulticastDelegate class:

C# code:

public delegate int IntIntDlg(int n);

C++ code:

using IntIntDlg = System::MulticastDelegate<int32_t(int32_t)>;

  4. C# classes and structures are represented as C++ classes. Interfaces become abstract classes. The inheritance structure mirrors that of C#, and implicit inheritance from System.Object becomes explicit.
  5. Properties and indexers are split into separate methods for getters and setters.
  6. Virtual functions in C# correspond to virtual functions in C++. Interface implementation is also achieved using the mechanism of virtual functions.
  7. Generic types and methods are transformed into C++ templates.
  8. Finalizers are converted into destructors.
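
To make the enumeration and property rules more concrete, here is a minimal sketch of how a C# enumeration and two auto-properties could be expressed in plain C++. The accessor names get_Width, set_Width, get_Fill, and set_Fill, the Rectangle and Color types, and the helper function are assumptions made for illustration; the translator's actual output and naming may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical C# original:
//   public enum Color { Red, Green, Blue }
//   public class Rectangle
//   {
//       public int Width { get; set; }
//       public Color Fill { get; set; }
//   }

// A C# enum maps naturally onto a scoped enumeration.
enum class Color
{
    Red,
    Green,
    Blue
};

class Rectangle
{
public:
    // The Width property becomes a getter/setter pair (assumed naming).
    int32_t get_Width() const { return width_; }
    void set_Width(int32_t value) { width_ = value; }

    // The Fill property is split the same way.
    Color get_Fill() const { return fill_; }
    void set_Fill(Color value) { fill_ = value; }

private:
    // Backing fields that C# creates implicitly for auto-properties.
    int32_t width_ = 0;
    Color fill_ = Color::Red;
};

// C#:  r.Width = r.Width * 2;  r.Fill = Color.Blue;
inline void DoubleWidthAndPaintBlue(Rectangle& r)
{
    r.set_Width(r.get_Width() * 2);
    r.set_Fill(Color::Blue);
    assert(r.get_Fill() == Color::Blue);
}
```

A property read in C# turns into a getter call and a property assignment into a setter call, which is why a compound expression such as r.Width = r.Width * 2 involves both accessors.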

Limitations

Taken together, these translation rules impose several limitations:

  1. Translation of virtual generic methods is not supported.
  2. Interface method implementations are virtual, even if they were not in the original C# code.
  3. Hiding an existing virtual or interface method by declaring a new method with the same name and signature (the C# new modifier) is not possible. However, the translator allows you to rename such methods.
  4. If base class methods are used to implement interfaces in a derived class, additional definitions appear in the derived class that were not present in C#.
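  5. Virtual method calls made during construction and finalization behave differently after translation and should be avoided: in C++ such a call resolves to the implementation in the class whose constructor or destructor is currently running, not to the most derived override as in C#.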
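
The last limitation follows from a C++ language rule rather than from the translator itself. The sketch below uses plain C++ with hypothetical types Base and Derived (unrelated to the classes in the example further down) to show what happens when a constructor calls a virtual method.

```cpp
#include <cassert>
#include <string>

class Base
{
public:
    // Virtual call during construction: the Derived part of the object
    // does not exist yet, so C++ dispatches to Base::Name().
    Base() { nameSeenInConstructor = Name(); }
    virtual ~Base() = default;

    virtual std::string Name() const { return "Base"; }

    std::string nameSeenInConstructor;
};

class Derived : public Base
{
public:
    std::string Name() const override { return "Derived"; }
};

// Small self-check: the constructor saw the base implementation,
// while a call on the finished object reaches the override.
inline void DemoVirtualCallInConstructor()
{
    Derived d;
    assert(d.nameSeenInConstructor == "Base");
    assert(d.Name() == "Derived");
}
```

In the equivalent C# program the constructor of the base class would invoke the override in the derived class, so code that relies on this behavior changes meaning after translation.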

We understand that strictly mimicking C# behavior would require a somewhat different approach. Nevertheless, we chose this logic because it aligns the API of the converted libraries more closely with C++ paradigms. The example below illustrates these features:

C# code:

using System;

public class Base
{
    public virtual void Foo1()
    { }
    public void Bar()
    { }
}
public interface IFoo
{
    void Foo1();
    void Foo2();
    void Foo3();
}
public interface IBar
{
    void Bar();
}
public class Child : Base, IFoo, IBar
{
    public void Foo2()
    { }
    public virtual void Foo3()
    { }
    public T Bazz<T>(object o) where T : class
    {
        if (o is T)
            return (T)o;
        else
            return default(T);
    }
}

C++ header file:

#pragma once

#include <system/object_ext.h>
#include <system/exceptions.h>
#include <system/default.h>
#include <system/constraints.h>

class Base : public virtual System::Object
{
    typedef Base ThisType;
    typedef System::Object BaseType;
    
    typedef ::System::BaseTypesInfo<BaseType> ThisTypeBaseTypesInfo;
    RTTI_INFO_DECL();
    
public:

    virtual void Foo1();
    void Bar();
};

class IFoo : public virtual System::Object
{
    typedef IFoo ThisType;
    typedef System::Object BaseType;
    
    typedef ::System::BaseTypesInfo<BaseType> ThisTypeBaseTypesInfo;
    RTTI_INFO_DECL();
    
public:

    virtual void Foo1() = 0;
    virtual void Foo2() = 0;
    virtual void Foo3() = 0;
};

class IBar : public virtual System::Object
{
    typedef IBar ThisType;
    typedef System::Object BaseType;
    
    typedef ::System::BaseTypesInfo<BaseType> ThisTypeBaseTypesInfo;
    RTTI_INFO_DECL();
    
public:

    virtual void Bar() = 0;
};

class Child : public Base, public IFoo, public IBar
{
    typedef Child ThisType;
    typedef Base BaseType;
    typedef IFoo BaseType1;
    typedef IBar BaseType2;
    
    typedef ::System::BaseTypesInfo<BaseType, BaseType1, BaseType2> ThisTypeBaseTypesInfo;
    RTTI_INFO_DECL();
    
public:

    void Foo1() override;
    void Bar() override;
    void Foo2() override;
    void Foo3() override;
    template <typename T>
    T Bazz(System::SharedPtr<System::Object> o)
    {
        assert_is_cs_class(T);
        
        if (System::ObjectExt::Is<T>(o))
        {
            return System::StaticCast<typename T::Pointee_>(o);
        }
        else
        {
            return System::Default<T>();
        }
    }
};

C++ source code:

#include "Class1.h"
RTTI_INFO_IMPL_HASH(788057553u, ::Base, ThisTypeBaseTypesInfo);
void Base::Foo1()
{
}
void Base::Bar()
{
}
RTTI_INFO_IMPL_HASH(1733877629u, ::IFoo, ThisTypeBaseTypesInfo);
RTTI_INFO_IMPL_HASH(1699913226u, ::IBar, ThisTypeBaseTypesInfo);
RTTI_INFO_IMPL_HASH(3787596220u, ::Child, ThisTypeBaseTypesInfo);
void Child::Foo1()
{
    Base::Foo1();
}
void Child::Bar()
{
    Base::Bar();
}
void Child::Foo2()
{
}
void Child::Foo3()
{
}

The series of aliases and macros at the beginning of each translated class is used to emulate certain C# mechanisms, primarily the GetType method and the typeof and is operators. The hash codes in the .cpp file are used for efficient type comparison. All functions implementing interfaces are virtual, even though this differs from C# behavior.
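
To give a feel for how hash-based type comparison can work in principle, here is a deliberately simplified sketch. It is not the library's implementation: TypeInfo, Type(), Is<T>(), the Object, Animal, and Dog classes, and the hash values 1u, 2u, and 3u are all hypothetical, and only single inheritance is handled, whereas the generated code above records several base types per class.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-class type record: a hash plus a link to the base type.
struct TypeInfo
{
    uint32_t hash;
    const TypeInfo* base; // nullptr for the root type
};

class Object
{
public:
    virtual ~Object() = default;
    static const TypeInfo& Type() { static const TypeInfo t{1u, nullptr}; return t; }
    // Rough analogue of C# GetType(): reports the dynamic type.
    virtual const TypeInfo& GetType() const { return Type(); }
};

class Animal : public Object
{
public:
    static const TypeInfo& Type() { static const TypeInfo t{2u, &Object::Type()}; return t; }
    const TypeInfo& GetType() const override { return Type(); }
};

class Dog : public Animal
{
public:
    static const TypeInfo& Type() { static const TypeInfo t{3u, &Animal::Type()}; return t; }
    const TypeInfo& GetType() const override { return Type(); }
};

// Rough analogue of the C# 'is' operator: walk the chain of base types
// and compare integer hashes instead of type names.
template <typename T>
bool Is(const Object& o)
{
    for (const TypeInfo* t = &o.GetType(); t != nullptr; t = t->base)
    {
        if (t->hash == T::Type().hash)
        {
            return true;
        }
    }
    return false;
}

// Small self-check of the sketch.
inline void DemoIs()
{
    Dog d;
    assert(Is<Animal>(d));
    assert(!Is<Dog>(Animal()));
}
```

Comparing fixed-width integers is cheap, which is the motivation for precomputing a hash per type instead of comparing type names at runtime.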
