From C# to C++: How We Have Automated Project Conversion – Part 2

Development

The C# to C++ code translator was designed and developed solely by CodePorting. The work required extensive research and experiments with several approaches that differed in memory model and other aspects. In the end, two solutions were chosen. One of them is currently used for the C++ releases of Aspose products.

Technologies

Now it's time to explain the technologies we use in the code translator. The translator is a console application written in C#, which makes it easy to embed into scripts that run typical sequences such as translate-compile-test. There is also a GUI component that lets you do the same with a few button clicks.

Syntax analysis is performed by the NRefactory library in the older generation of the translator and by Roslyn in the new one.

The translator makes several passes over the AST to collect information and generate the output C++ code. No AST is built for the C++ code; instead, we handle the output as plain text.

There are many cases when extra information is required to fine-tune the translator. This information is passed via options and attributes. Options apply to the whole project. Typically, they specify the class export macro name or the C# conditional symbols used when parsing the code. Attributes are applied to types and entities and provide specific information about them, e.g., which class members require const or mutable qualifiers in the translated code, or which entities should be excluded from translation.
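As a rough sketch of what such settings affect, the C++ class below shows an export macro together with const and mutable qualifiers. The macro name MYLIB_API, the MYLIB_SHARED switch, and the Document class are hypothetical and chosen only for illustration; the code the translator actually emits may look different.

```cpp
#include <string>
#include <utility>

// Hypothetical export macro; in this sketch its name would come from a
// project-wide translator option.
#if defined(_WIN32) && defined(MYLIB_SHARED)
#  define MYLIB_API __declspec(dllexport)
#else
#  define MYLIB_API
#endif

class MYLIB_API Document {
public:
    explicit Document(std::string text) : text_(std::move(text)) {}

    // An attribute on the C# method could request the const qualifier here.
    int GetLength() const {
        ++read_count_;  // legal in a const method because the field is mutable
        return static_cast<int>(text_.size());
    }

    int GetReadCount() const { return read_count_; }

private:
    std::string text_;
    mutable int read_count_ = 0;  // an attribute could request mutable here
};
```

C# has no direct counterpart of const methods, which is why this information has to be supplied separately rather than derived from the source.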

C# classes and structures are converted into C++ classes, and their members and source code into the closest analogs. Generic types and methods are mapped to C++ templates. C# references are translated into smart pointers (shared or weak). Reference classes are defined in the Library. Other internal details of the code translator will be described in a separate article.
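As a rough illustration of this mapping, the sketch below shows how a small C# class with a back-reference and a generic class might look after translation. It uses std::shared_ptr and std::weak_ptr as stand-ins; the translator emits the Library's own smart pointer classes instead, and the names Node and Box are hypothetical.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// C# original (hypothetical):
//   class Node { public string Name; public Node Parent; public List<Node> Children; }
// In this sketch, owning references become std::shared_ptr and the
// back-reference becomes std::weak_ptr, so a parent and its children do not
// keep each other alive forever.
class Node {
public:
    explicit Node(std::string name) : name_(std::move(name)) {}

    const std::string& Name() const { return name_; }
    std::shared_ptr<Node> Parent() const { return parent_.lock(); }
    const std::vector<std::shared_ptr<Node>>& Children() const { return children_; }

    // Links child under parent: shared reference downward, weak reference upward.
    static void AddChild(const std::shared_ptr<Node>& parent,
                         const std::shared_ptr<Node>& child) {
        child->parent_ = parent;
        parent->children_.push_back(child);
    }

private:
    std::string name_;
    std::weak_ptr<Node> parent_;                   // weak: breaks the ownership cycle
    std::vector<std::shared_ptr<Node>> children_;  // shared: owns the children
};

// C# original (hypothetical): class Box<T> { ... }
// A generic class maps to a class template.
template <typename T>
class Box {
public:
    explicit Box(T value) : value_(std::move(value)) {}
    const T& Get() const { return value_; }

private:
    T value_;
};
```

With both directions declared as shared references, the parent and child in this sketch would never be destroyed, which is why each reference field has to be classified as shared or weak.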

So, a project translated from C# to C++ depends on our Library instead of the .NET libraries:

[Figure: C# to C++]

To build the code translator's Library and the translated projects, we use CMake. Currently, we support the VS 2017 and 2019 compilers on Windows, and GCC and Clang on Linux.

As already mentioned, most of our .NET implementations are thin adapters over third-party libraries, including:

  • Skia — graphics support.
  • Botan — encryption functions.
  • ICU — strings, codepages, and cultures support.
  • Libxml2 — XML operations.
  • PCRE2 — regular expressions support.
  • zlib — compression functions.
  • Boost — different purposes.
  • A few other libraries.
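To illustrate what a "thin adapter" means here, the following sketch exposes a .NET-like Regex class whose methods simply forward to an existing engine. It uses std::regex from the standard library only to stay self-contained; per the list above, the Library relies on PCRE2, and the class layout shown is an assumption. Pattern syntax differences between .NET and the underlying engine are ignored in this sketch.

```cpp
#include <regex>
#include <string>

// Hypothetical adapter mimicking a small part of the .NET Regex class.
// std::regex stands in for the third-party engine (PCRE2 in the Library).
class Regex {
public:
    explicit Regex(const std::string& pattern) : impl_(pattern) {}

    // Mirrors Regex.IsMatch(string): true if the pattern occurs anywhere in input.
    bool IsMatch(const std::string& input) const {
        return std::regex_search(input, impl_);
    }

    // Mirrors Regex.Replace(string, string): replaces every match.
    std::string Replace(const std::string& input,
                        const std::string& replacement) const {
        return std::regex_replace(input, impl_, replacement);
    }

private:
    std::regex impl_;  // the wrapped engine; the adapter adds no logic of its own
};
```

Translated code can then keep calling IsMatch and Replace with the same shape as in C#, while the matching itself is delegated to the wrapped library.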

Both the Translator and the Library are covered by many tests. Library tests use the GoogleTest framework. Translator tests are mostly written with NUnit/xUnit and are split into several categories, which ensure that:

  • The translator's output matches the expected output on specific input data.
  • The output of translated programs matches the expected output.
  • NUnit/xUnit tests from the input projects are translated into GoogleTest ones and pass.
  • The API of translated projects works correctly in C++.
  • Translator options and attributes work as expected.

We use GitLab as a version control system. For CI, we use Jenkins. Translated products are available as NuGet packages and downloadable archives.

Issues

While working on this project, we faced a lot of different problems. Some of them were expected, and others were uncovered on the way:

  1. Type system differences between .NET and C++.
    C++ has no equivalent of the Object type, and most library classes don't have RTTI. This makes it impossible to map .NET types directly to STL ones.
  2. Translation algorithms are complicated.
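    Many nontrivial nuances have to be accounted for in the translated code. For example, C# defines the order in which a method's arguments are evaluated, while C++ leaves this order unspecified (and before C++17, conflicting unsequenced side effects in arguments were undefined behavior).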
  3. Troubleshooting is hard.
    Debugging translated code requires specific skills. Nuances like the one described above can critically affect a program's behavior, producing hard-to-explain errors. They can just as easily turn into hidden bugs that go unnoticed for a long time.
  4. Memory management systems differ.
    C++ doesn't have garbage collection. Due to that, more resources are required to make the translated code behave like the original one.
  5. Discipline is required for C# developers.
    C# developers have to get used to the limitations imposed by the code translation process. The reasons for these limitations are:
    • The C# language version must be supported by the translator's syntax analyzer.
    • Code constructs not supported by the translator are forbidden (e.g., yield).
    • Code style is constrained by the structure of the translated code (e.g., each reference field must unambiguously be either a weak reference or a shared reference, which is not necessarily the case for arbitrary C# code).
    • The C++ language imposes its own restrictions (e.g., in C# static variables aren't deleted before all foreground threads finish, while in C++ this is not the case).
  6. A large amount of work.
    The subset of the .NET library used by our products is quite large, and implementing all of its classes and methods takes a lot of time.
  7. Special requirements for developers.
    The need to dig deep into complicated platform internals and to work with two or more programming languages limits the number of suitable candidates. On the other hand, developers interested in compiler theory or other exotic disciplines easily find their place in the project.
  8. Fragility of the system.
    Although we have thousands of tests and millions of lines of code to test the translator, we sometimes find that a change made to fix the compilation of one project breaks it for another. For example, this may happen with rare syntax constructs and specific code styles in projects.
  9. High entry barriers.
    Most tasks in the code translator project require deep analysis. Because of the large number of subsystems and scenarios, each new task means spending a long time getting familiar with new aspects of the project.
  10. Intellectual property protection issues.
    While there are a lot of ready-made solutions for obfuscating C# code effectively, in C++ much information is preserved in class headers. Moreover, some definitions can't be removed from public headers without consequences. Mapping generic classes and methods to templates creates another vulnerability: template code usually has to be shipped in headers, which reveals the algorithms.
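The evaluation-order issue from item 2 can be sketched as follows. C# evaluates arguments left to right, so one way to preserve that behavior in C++ is to hoist each argument into a named temporary before the call. This illustrates the general technique and is not necessarily what our translator emits; Counter, Subtract, and both function names are hypothetical.

```cpp
// Hypothetical stateful source of values, like a C# method with side effects.
struct Counter {
    int value = 0;
    int Next() { return ++value; }
};

int Subtract(int a, int b) { return a - b; }

// Direct translation of the C# call Subtract(c.Next(), c.Next()).
// C++ does not specify which argument is evaluated first, so the result
// may be -1 or 1 depending on the compiler.
int TranslatedNaively(Counter& c) {
    return Subtract(c.Next(), c.Next());
}

// Order-preserving translation: each argument is evaluated in its own
// statement, left to right, as C# would do it.
int TranslatedWithTemporaries(Counter& c) {
    const int arg0 = c.Next();
    const int arg1 = c.Next();
    return Subtract(arg0, arg1);
}
```

On a fresh Counter, the C# original always returns -1. The second function returns the same value on every compiler, while the first is only guaranteed to return either -1 or 1.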

Despite all of that, the code translator project is very interesting from a technical point of view, and its academic complexity forces us to learn something new all the time.

Conclusion

While working on the code translator project, we have succeeded in implementing a system that solves the academically interesting problem of code translation. We have set up monthly releases of Aspose libraries for a language they were never meant to work with.

We plan to publish more articles about the code translator. The next one will explain the conversion process in detail, including how specific C# constructs are mapped onto C++ ones. Another one will cover the memory management model.

We will do our best to answer any questions. If readers are interested in other aspects of code translator development, we may consider writing more articles about it.
