Story of creating an advanced C# to C++ translator

Customers value Aspose products, which allow manipulating protocols and files in popular formats. Most of these products were initially developed for .NET. At the same time, business applications that work with file formats run in many different environments. This article describes how we set up releases of Aspose products for C++ by building a framework for code translation from C#. Preserving the functionality of the .NET versions of these products was technically challenging.

We developed the necessary infrastructure ourselves, enabling code translation between languages and emulation of .NET library functions. By doing so, we solved a problem that is typically considered academic. This allowed us to begin releasing our .NET products for C++ on a monthly basis, obtaining the code for each release from the corresponding version of the C# code. Additionally, the tests that cover the original C# code are translated alongside it, so the functionality of the resulting solution is verified just as it would be by tests written specifically in C++.

Background

The C# to C++ code translator builds on the experience the CodePorting team gained while setting up automated C# to Java code translation. That framework transformed C# classes into Java classes while replacing system library calls with appropriate equivalents.

Different approaches were considered for the framework. Developing pure Java versions from scratch would have required too many resources. One option was marshaling calls from Java code to the .NET environment, but this would have limited the set of platforms we could support in the future: back then, .NET was available on Windows only. Call marshaling is convenient when calls are infrequent and carry widely used data types. However, it becomes unwieldy when working with large numbers of objects and custom data types.

Instead, we looked into fully translating the existing code to a new platform. This was a pressing issue because code migration had to be done monthly and for all products, producing a synchronized flow of releases with matching feature sets.

The solution was split into two parts:

Translator: an application that transforms C# syntax into Java syntax, replacing .NET types and methods with suitable substitutes from the target language's libraries.
Library: a component that emulates the parts of the .NET library that cannot be mapped to Java directly. To simplify the task, available third-party components could be used.
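
The split between the two parts can be illustrated with a small substitution table. The sketch below is only an assumption about how such a lookup might look, not the actual translator code: the function name `MapType` and the `emulated.` prefix are hypothetical, while the Java class names on the right-hand side are real. Types with a direct counterpart are replaced by it, and everything else is routed to the emulation Library.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical lookup used by a translator: returns the target-side name
// for a .NET type. Names with a direct Java counterpart come from the table;
// anything else is assumed to live in an emulation library, marked here
// with the made-up "emulated." prefix.
std::string MapType(const std::string& dotnetType) {
    static const std::unordered_map<std::string, std::string> direct = {
        {"System.String", "java.lang.String"},
        {"System.Text.StringBuilder", "java.lang.StringBuilder"},
        {"System.Collections.Generic.List", "java.util.ArrayList"},
    };

    auto it = direct.find(dotnetType);
    if (it != direct.end()) {
        return it->second;  // handled by the Translator alone
    }
    return "emulated." + dotnetType;  // must be provided by the Library
}
```

With this table, `MapType("System.String")` yields `java.lang.String`, while a type without a direct counterpart, such as `System.Drawing.Bitmap`, falls through to the emulated namespace.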

The following arguments confirmed that the plan was technically viable:

C# and Java follow a similar philosophy, at least when it comes to type structure and the memory management model.
We had to translate libraries only, so moving GUIs to a different platform was not a concern.
The translated libraries mostly contained business logic and low-level file operations, with the most complex dependencies being System.Net and System.Drawing.
From the very beginning, the libraries were developed to work on a wide range of .NET versions (including Framework, Standard, and even Xamarin). Therefore, minor platform differences could be ignored.

We won't go into further detail about the C# to Java translator, as that would require dedicated articles. To summarize, converting C# products to Java became the company's regular practice thanks to the translator. The translator grew from a simple rule-driven text transformer into a complex code generator that works with an AST representation of the source code.

The success of the C# to Java translator helped us enter the Java market, and the question arose of releasing for C++ using the same approach.

Requirements

To release C++ versions of our products, we needed a framework that would let us translate C# code to C++, compile it, test it, and ship it to the customer. The code was a set of libraries, each up to a few million lines long. The Library component of the code translator had to do the following:

Emulate .NET environment for the translated code.
Adapt the translated code for C++: type structure, memory management, etc.
Move from C# code style to C++ style, so that developers not familiar with .NET paradigms can use the code easily.

Many readers are likely to ask why we didn't consider using existing solutions, such as the Mono project. There were several reasons not to:

This would not cover the second and third requirements.
Mono is implemented in C# and depends on its runtime.
Adapting third-party code to our needs (API, type system, memory management model, optimization, etc.) would require an amount of time comparable to creating our own solution.
Our products do not require a full .NET implementation. If we had one, it would be hard to tell which methods and classes we need and which ones we do not, and we would spend much time fixing features we never use.

Theoretically, we could have used our translator to convert an existing solution to C++. However, this would have required a fully functional translator from the very beginning, because it is impossible to debug translated code without a system library. Besides, optimization would matter even more than for the translated products' code, because system library calls tend to become bottlenecks.

Let's come back to our requirements for the code translator. Because .NET types cannot be mapped directly onto STL types, we decided to use custom Library types as substitutes. The library was developed as a set of adapters that expose third-party libraries' features through a .NET-like API (the same approach as for Java).
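
To make the adapter idea more concrete, here is a minimal sketch of a .NET-like list built on top of `std::vector`. It is an illustration under stated assumptions, not the actual Library code: the namespace `sketch`, the class layout, and the accessor names `get_Count` and `idx_get` are all hypothetical choices made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

namespace sketch {

// Hypothetical adapter: exposes a small subset of the .NET List<T> API
// while delegating storage to std::vector.
template <typename T>
class List {
public:
    // Mirrors List<T>.Add(item).
    void Add(const T& item) { items_.push_back(item); }

    // Mirrors the Count property (a C# int, hence the cast).
    int get_Count() const { return static_cast<int>(items_.size()); }

    // Mirrors List<T>.Contains(item).
    bool Contains(const T& item) const {
        return std::find(items_.begin(), items_.end(), item) != items_.end();
    }

    // Mirrors the indexer getter; throws on a bad index, much as .NET
    // throws ArgumentOutOfRangeException.
    const T& idx_get(int index) const {
        if (index < 0 || static_cast<std::size_t>(index) >= items_.size()) {
            throw std::out_of_range("index");
        }
        return items_[static_cast<std::size_t>(index)];
    }

private:
    std::vector<T> items_;
};

}  // namespace sketch
```

Translated code can then call `list.Add(3)` or `list.get_Count()` much as the C# original did, while the work itself is done by the standard container underneath.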

As we were translating libraries with an existing API, an important requirement for the translated code was that it should run inside any customer's application. Therefore, we couldn't use garbage collection for the translated code, as it would have to cover the whole application. Instead, our memory management model had to be clear to C++ developers. Smart pointers were chosen as a compromise. We describe how we changed the memory model in a separate article.
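
As a rough illustration of what replacing garbage-collected references with smart pointers can look like, consider the sketch below. The text does not say which smart pointer type is used, so `std::shared_ptr` is only a stand-in here, and the `Ref` alias, the `Document` and `Page` classes, and the `MakePage` helper are hypothetical names invented for the example.

```cpp
#include <memory>
#include <string>
#include <utility>

// C# original, for comparison (hypothetical):
//   class Document { public string Title { get; } ... }
//   class Page     { public Document Owner { get; } ... }

// Assumption: C# references become reference-counted pointers.
template <typename T>
using Ref = std::shared_ptr<T>;

class Document {
public:
    explicit Document(std::string title) : title_(std::move(title)) {}
    const std::string& get_Title() const { return title_; }

private:
    std::string title_;
};

class Page {
public:
    explicit Page(Ref<Document> owner) : owner_(std::move(owner)) {}

    // Returning a Ref copies the pointer and shares ownership,
    // just as copying a reference does in C#.
    Ref<Document> get_Owner() const { return owner_; }

private:
    Ref<Document> owner_;  // keeps the document alive while the page exists
};

// Stand-in for the C# expression `new Page(doc)`.
Ref<Page> MakePage(const Ref<Document>& doc) {
    return std::make_shared<Page>(doc);
}
```

Objects are destroyed deterministically when the last pointer to them goes away, so no collector has to run inside the customer's application. Reference counting alone does not reclaim cyclic references, which is one of the reasons the memory model deserves a separate discussion.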

CodePorting has a strong test coverage culture, and the ability to apply the tests written for C# code to C++ products would simplify troubleshooting significantly. The code translator had to be able to translate the tests too.

Initially, manually fixing the translated Java code allowed us to speed up development and product releases. However, in the long run, this significantly raised the cost of preparing each version for release, as every translation error had to be fixed each time it appeared. This could have been managed by applying patches to the resulting Java code, calculated as the difference between the translator's outputs for two consecutive C# code revisions, instead of converting it from scratch each time. Nevertheless, for C++ we decided to prioritize fixing the framework over fixing the resulting code, so that each translation error is fixed only once.
