Customers value Aspose products, which allow manipulating protocols and files in popular formats. Most of them were initially developed for .NET. At the same time, business applications that work with these file formats run in many different environments. This article describes how we set up releases of Aspose.BarCode, Aspose.Email, Aspose.Font, Aspose.Page, Aspose.PDF, Aspose.PUB, Aspose.Slides, Aspose.Tasks, Aspose.TeX, and Aspose.Words for C++ by building a framework for code translation from C#. Preserving the functionality of the .NET versions of these products was technically challenging.
The success of the C# to C++ code translator builds on the experience the CodePorting team gained while setting up automated C# to Java code translation. That framework transformed C# classes into Java ones and replaced system library calls appropriately.
Several approaches were considered for the framework. Developing pure Java versions from scratch would have required too many resources. One option was marshaling calls from Java code to the .NET environment, but this would have limited the set of programming platforms we could support in the future, because back then .NET was available on Windows only. Call marshaling is convenient when calls are infrequent and carry widely used data types. However, it becomes overwhelming when working with many objects and custom data types.
Instead, we wondered how to fully translate existing code to a new platform. This was a pressing issue because code migration had to be done monthly and for all products, producing a synchronized flow of releases with matching features.
The solution was split into two parts:
The following arguments confirmed that the plan was technically viable:
We won't go into further details of the C# to Java translator; that would require dedicated articles. To summarize, converting C# products to Java became the company's regular practice thanks to the code translator. The translator grew from a simple rule-driven text transformer into a sophisticated code generator that works with an AST representation of the source code.
The success of the C# to Java translator helped us enter the Java market, and the question arose of releasing for C++ using the same scenario.
To release C++ versions of our products, we needed a framework that would let us translate C# code to C++, compile it, test it, and ship it to customers. The code was a set of libraries, each up to a few million lines of code. The Library component of the code translator had to cover the following:
Many readers are likely to ask why we didn't consider using existing solutions, such as the Mono project. There were several reasons for this:
Theoretically, we could have used our translator to convert an existing solution to C++. However, this would have required a fully functional translator from the very beginning, because it is impossible to debug translated code without a system library. Besides, optimization would matter even more than for the translated product code, because system library calls tend to become bottlenecks.
Let's come back to our requirements for the code translator. Because .NET types cannot be mapped directly onto STL ones, we decided to use custom Library types as substitutes. The Library was developed as a set of adapters that expose third-party libraries' features through a .NET-like API (as was done for Java).
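To illustrate the adapter idea, here is a minimal sketch of a .NET-style list built on top of `std::vector`. The `List` class, its member names (`Add`, `get_Count`, `idx_get`), and the `CountLongerThan` helper are hypothetical and chosen for illustration; they are not the actual Library API, and the bounds check uses `assert` as a simplification.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical adapter: a .NET-like List<T> surface over std::vector<T>.
// Translated code calls these members instead of the STL directly.
template <typename T>
class List {
public:
    // Mirrors List<T>.Add in .NET.
    void Add(const T& item) { items_.push_back(item); }

    // Mirrors the Count property; .NET uses a signed int here.
    int get_Count() const { return static_cast<int>(items_.size()); }

    // Mirrors IndexOf: returns -1 when the item is absent.
    int IndexOf(const T& item) const {
        auto it = std::find(items_.begin(), items_.end(), item);
        return it == items_.end() ? -1 : static_cast<int>(it - items_.begin());
    }

    // Mirrors Contains.
    bool Contains(const T& item) const { return IndexOf(item) >= 0; }

    // Stands in for the C# indexer; assert is a simplification of
    // the exception a .NET list would throw.
    const T& idx_get(int index) const {
        assert(index >= 0 && index < get_Count());
        return items_[static_cast<std::size_t>(index)];
    }

private:
    std::vector<T> items_;  // the wrapped STL container
};

// Example of code written against the .NET-like surface only.
inline int CountLongerThan(const List<std::string>& names, int length) {
    int count = 0;
    for (int i = 0; i < names.get_Count(); ++i) {
        if (names.idx_get(i).size() > static_cast<std::size_t>(length)) {
            ++count;
        }
    }
    return count;
}
```

Code written this way keeps the shape of the original C# (signed counts, `-1` for a missing item) while the storage is delegated to the standard library.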
As we were translating libraries with an existing API, an important requirement for the translated code was that it should run inside any customer's application. Therefore, we couldn't use garbage collection for the translated code, as it would have to cover the whole application. Instead, our memory management model had to be clear to C++ developers. Smart pointers were chosen as a compromise. We will describe how we changed the memory model in a separate article.
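The following sketch shows why a smart-pointer model needs care where garbage collection would not. It uses `std::shared_ptr` and `std::weak_ptr` as stand-ins for the translator's own pointer classes, and the `Document` and `Page` classes are hypothetical. A back-reference from child to parent is held weakly so that the two objects do not keep each other alive.

```cpp
#include <cassert>
#include <memory>
#include <vector>

class Document;

// Hypothetical child object that refers back to its owner.
class Page {
public:
    explicit Page(int number) : number_(number) { assert(number > 0); }

    void set_Owner(const std::shared_ptr<Document>& owner) { owner_ = owner; }

    // Returns an empty pointer once the owning document is gone.
    std::shared_ptr<Document> get_Owner() const { return owner_.lock(); }

    int get_Number() const { return number_; }

private:
    int number_;
    std::weak_ptr<Document> owner_;  // weak: breaks the ownership cycle
};

// Hypothetical parent object that owns its pages.
class Document : public std::enable_shared_from_this<Document> {
public:
    // Requires that this Document is itself held by a shared_ptr.
    std::shared_ptr<Page> AddPage() {
        auto page = std::make_shared<Page>(static_cast<int>(pages_.size()) + 1);
        page->set_Owner(shared_from_this());
        pages_.push_back(page);
        return page;
    }

    int get_PageCount() const { return static_cast<int>(pages_.size()); }

private:
    std::vector<std::shared_ptr<Page>> pages_;  // strong: document owns pages
};
```

If `owner_` were a `std::shared_ptr`, a document and its pages would reference each other and never be freed, which is exactly the case a garbage collector handles automatically in C#.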
CodePorting has a strong test coverage culture, and the ability to apply the tests written for C# code to C++ products would simplify troubleshooting significantly. The code translator had to be able to translate the tests too.
Initially, manually fixing translated Java code allowed us to speed up development and product releases. However, in the long run, this significantly raised the cost of preparing each version for release, as every translation error had to be fixed each time it appeared. This could be managed by patching the resulting Java code with the difference between the translator's outputs for two consecutive C# code revisions instead of converting it from scratch each time. For the C++ framework, however, we decided to prioritize fixing the translator over fixing its output, so that each translation error is fixed only once.
The design and development of the C# to C++ code translator were performed solely by CodePorting. It required extensive investigation and testing of multiple approaches that differed in memory model and other aspects. In the end, two solutions were chosen. One of them is currently being used for C++ releases of Aspose.BarCode, Aspose.Email, Aspose.Pdf, Aspose.Slides, Aspose.Tasks and Aspose.Words products.
Now it's time to explain the technologies we use in the code translator. The translator is a console application written in C#, which makes it easy to embed into scripts performing typical sequences like ‘translate-compile-test’. There is also a GUI component that lets you do the same with a few button clicks.
The translator makes several passes over the AST to collect information and generate the output C++ code. No AST representation is built for the C++ code; instead, we handle the output code as plain text.
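The multi-pass idea can be sketched on a toy model. The real translator is written in C# and works on a full C# syntax tree; the `ClassDecl` and `FieldDecl` structures and the two functions below are hypothetical simplifications, written in C++ only to show a collecting pass followed by an emitting pass that produces plain text.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Toy stand-ins for AST nodes (assumed shapes, not the real tree).
struct FieldDecl {
    std::string type;
    std::string name;
};

struct ClassDecl {
    std::string name;
    std::vector<FieldDecl> fields;
};

// Pass 1: collect the names of all classes declared in the project.
std::set<std::string> CollectClassNames(const std::vector<ClassDecl>& ast) {
    std::set<std::string> names;
    for (const ClassDecl& cls : ast) {
        assert(!cls.name.empty());
        names.insert(cls.name);
    }
    return names;
}

// Pass 2: emit C++ as text. A field whose type is a known class is a
// reference in C#, so it is written out as a smart pointer.
std::string EmitCpp(const std::vector<ClassDecl>& ast,
                    const std::set<std::string>& classes) {
    std::ostringstream out;
    for (const std::string& name : classes) {
        out << "class " << name << ";\n";  // forward declarations first
    }
    for (const ClassDecl& cls : ast) {
        out << "class " << cls.name << " {\npublic:\n";
        for (const FieldDecl& field : cls.fields) {
            const bool isReference = classes.count(field.type) > 0;
            const std::string type =
                isReference ? "std::shared_ptr<" + field.type + ">" : field.type;
            out << "    " << type << " " << field.name << ";\n";
        }
        out << "};\n";
    }
    return out.str();
}
```

The second pass cannot decide how to write a field until the first pass has seen every class, which is one reason a single traversal is not enough.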
There are many cases when extra information is required to fine-tune the translator. This information is passed via options and attributes. Options are applied to the whole project. Typically, they are used to specify the class export macro name or the C# conditional symbols used when parsing the code. Attributes are applied to types and entities and provide specific information about them. For example, they mark which class members require ‘const’ or ‘mutable’ qualifiers in the translated code, or which entities should be excluded from translation.
C# classes and structures are converted into C++ classes, and their members and source code into the closest analogs. Generic types and methods are mapped to C++ templates. C# references are translated into smart pointers (shared or weak). Reference classes are defined in the Library. Other internal details of the code translator will be described in a separate article.
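As a rough illustration of these mappings, the sketch below shows a small hypothetical C# input in comments and one possible C++ counterpart. It is not actual translator output: `std::shared_ptr` and `std::vector` stand in for the Library's own pointer and collection classes, and all class names are invented for the example.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical C# input:
//   class Shape { public virtual double Area() { return 0; } }
//   class Square : Shape {
//       private double side;
//       public Square(double side) { this.side = side; }
//       public override double Area() { return side * side; }
//   }
//   class Registry<T> where T : Shape {
//       private List<T> items = new List<T>();
//       public void Add(T item) { items.Add(item); }
//       public double TotalArea() { ... }
//   }

// C# class -> C++ class; virtual methods stay virtual.
class Shape {
public:
    virtual ~Shape() = default;
    // 'const' has no C# equivalent; here it is simply assumed to have
    // been requested for this member.
    virtual double Area() const { return 0.0; }
};

class Square : public Shape {
public:
    explicit Square(double side) : side_(side) {}
    double Area() const override { return side_ * side_; }

private:
    double side_;
};

// C# generic class -> C++ template; C# references -> smart pointers.
template <typename T>
class Registry {
public:
    void Add(const std::shared_ptr<T>& item) {
        assert(item != nullptr);
        items_.push_back(item);
    }

    double TotalArea() const {
        double total = 0.0;
        for (const auto& item : items_) {
            total += item->Area();
        }
        return total;
    }

private:
    std::vector<std::shared_ptr<T>> items_;
};
```

Note that the C# generic constraint `where T : Shape` has no direct counterpart in the template above; it is only implied by the call to `Area()`.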
So, the project translated from C# to C++ depends on our Library instead of .NET libraries:
To build the code translator Library and the translated projects, we use CMake. Currently, we support Visual Studio 2017 and 2019 on Windows, and GCC and Clang on Linux.
As already mentioned, most of our implementations of .NET types are thin adapters over third-party libraries, including:
Both the Translator and Library are covered with many tests. Library tests use the GoogleTest framework. Translator tests are mostly written in NUnit/xUnit and are split into several categories, which ensure that:
While working on this project, we faced a lot of different problems. Some of them were expected, and others were uncovered on the way:
Despite all of that, the code translator project is very interesting from a technical point of view, and its academic complexity forces us to learn something new all the time.
While working on the code translator project, we have succeeded in implementing a system that solves an interesting academic task of code translation. We have organized monthly releases of Aspose libraries for a language they were never designed to work with.
We plan to publish more articles about the code translator. The next one will explain the conversion process in detail, including how specific C# constructs are mapped onto C++ ones. Another one will cover the memory management model.
We will try our best to reply to the questions asked. If the readers are interested in other aspects of code translator development, we may consider writing more articles on it.