22 November 2024
Creating an efficient code translator between languages such as C# and C++ is a complex task. During the development of the CodePorting.Translator Cs2Cpp tool, we encountered numerous challenges related to the differences in syntax, semantics, and programming paradigms of these two languages. This article will discuss the key difficulties we faced and possible ways to overcome them.
Some C# constructs have no direct counterpart in C++. This pertains to constructs such as using and yield:
using (var resource = new Resource())
{
    // Resource usage
}

public IEnumerable<int> GetAllNumbers()
{
    for (int i = 0; i < int.MaxValue; i++)
    {
        yield return i;
    }
}
In such cases, we either write fairly complex code in both the translator and the library to emulate the behavior of the original code, as in the first case, or abandon support for the construct, as in the second.
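As a rough illustration of what such emulation can look like on the C++ side, the sketch below maps a using block to a scope guard and writes out by hand the state that a yield-based method keeps implicitly. The names Resource, UsingGuard, and NumberEnumerator are hypothetical and are not taken from the Cs2Cpp library; this is one possible approach, not the translator's actual output.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-in for a C# IDisposable type.
struct Resource {
    bool* disposed_flag;
    explicit Resource(bool* flag) : disposed_flag(flag) {}
    void Dispose() { *disposed_flag = true; }
};

// Scope guard that calls Dispose() when the scope exits, including during
// exception unwinding, which is what a C# using block guarantees.
template <typename T>
class UsingGuard {
public:
    explicit UsingGuard(std::shared_ptr<T> obj) : obj_(std::move(obj)) {}
    ~UsingGuard() { if (obj_) obj_->Dispose(); }
    UsingGuard(const UsingGuard&) = delete;
    UsingGuard& operator=(const UsingGuard&) = delete;
    T* operator->() const { return obj_.get(); }
private:
    std::shared_ptr<T> obj_;
};

// Hand-written equivalent of a method that does "yield return i" in a loop:
// the loop variable becomes explicit state kept between MoveNext() calls.
class NumberEnumerator {
public:
    explicit NumberEnumerator(int limit) : limit_(limit) {}
    bool MoveNext() {
        if (next_ >= limit_) return false;
        current_ = next_++;
        return true;
    }
    int Current() const { return current_; }
private:
    int limit_;
    int next_ = 0;
    int current_ = 0;
};

// Helper that exercises the guard: the resource is disposed at scope exit.
bool DisposedAfterScope() {
    bool disposed = false;
    {
        UsingGuard<Resource> guard(std::make_shared<Resource>(&disposed));
        // Resource usage
    }
    return disposed;
}

// Helper that exercises the enumerator: sums 0 .. count-1.
int SumFirst(int count) {
    NumberEnumerator e(count);
    int sum = 0;
    while (e.MoveNext()) sum += e.Current();
    return sum;
}
```

Even in this reduced form, the enumerator shows why yield is costly to support: every local variable that lives across a yield return has to become a field, and a method with several yield points needs an explicit state machine.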
Some C# code has no workable C++ translation as written. For example, the original code may contain virtual generic methods or constructors that call virtual functions:
public class A
{
    public virtual T GenericMethod<T>(T param)
    {
        return param;
    }
}

public class A
{
    public A()
    {
        VirtualMethod();
    }
    public virtual void VirtualMethod()
    {
    }
}
public class B : A
{
    public override void VirtualMethod()
    {
    }
}
In such cases, we have no choice but to rewrite the problematic code in terms that allow conversion to C++. Fortunately, such cases are rare and usually involve small code fragments.
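The constructor case is a problem because the two languages dispatch differently. In C#, the call made in A's constructor reaches B's override; in C++, a virtual call made while A's constructor is running resolves to A's own version, because the B part of the object does not exist yet. C++ also does not allow member function templates to be virtual, so GenericMethod<T> has no direct counterpart. The sketch below demonstrates the constructor behavior; the log member is added purely for illustration.

```cpp
#include <string>

class A {
public:
    A() { VirtualMethod(); }   // during A's construction this calls A::VirtualMethod
    virtual ~A() = default;
    virtual void VirtualMethod() { log += "A"; }
    std::string log;           // illustrative only: records which version ran
};

class B : public A {
public:
    void VirtualMethod() override { log += "B"; }
};

// Returns the dispatch log after construction followed by one ordinary call.
std::string ConstructAndCall() {
    B b;                 // the constructor appends "A", not "B"
    b.VirtualMethod();   // ordinary virtual dispatch appends "B"
    return b.log;
}
```

A direct translation of the C# class would therefore compile but silently change behavior, which is why the original code has to be restructured, for example by moving the virtual call out of the constructor.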
C# code also often relies on facilities of the .NET environment. This includes resources, reflection, dynamic assembly loading, and function imports:
static void Main()
{
    var rm = new ResourceManager("MyApp.Resources", typeof(Program).Assembly);
    var value = rm.GetString("MyResource");
}

static void Main()
{
    var type = typeof(MyClass);
    var method = type.GetMethod("MyMethod");
    var result = method.Invoke(null, null);
    Console.WriteLine(result);
}
public class MyClass
{
    public static string MyMethod()
    {
        return "Hello, World!";
    }
}

static void Main()
{
    var assembly = Assembly.Load("MyDynamicAssembly");
    var type = assembly.GetType("MyDynamicAssembly.MyClass");
    var instance = Activator.CreateInstance(type);
    var method = type.GetMethod("MyMethod");
    method.Invoke(instance, null);
}
In such cases, we have to emulate the corresponding mechanisms. This includes support for resources (embedded in the assembly as static arrays and read through specialized stream implementations) and reflection. Obviously, directly linking .NET assemblies to C++ code or importing functions from dynamic Windows libraries when running on another platform is not possible, so such code has to be trimmed or rewritten.
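To make the resource approach more concrete, here is a minimal sketch of a resource embedded as a static array and read through a small stream class. The array contents and the names kMyResource, ArrayStream, and LoadMyResource are assumptions made for this example; the actual library reads resources through its own stream implementations, which are not shown here.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical embedded resource. In a real build a tool would generate this
// array from the original resource data; here it holds the bytes of "Hello".
static const unsigned char kMyResource[] = { 'H', 'e', 'l', 'l', 'o' };

// Minimal read-only stream over a byte array (illustrative only, not the
// stream interface of the actual library).
class ArrayStream {
public:
    ArrayStream(const unsigned char* data, std::size_t size)
        : data_(data), size_(size) {}

    // Copies up to count bytes into buffer and returns the number copied.
    std::size_t Read(unsigned char* buffer, std::size_t count) {
        const std::size_t remaining = size_ - pos_;
        const std::size_t n = count < remaining ? count : remaining;
        if (n > 0) std::memcpy(buffer, data_ + pos_, n);
        pos_ += n;
        return n;
    }
private:
    const unsigned char* data_;
    std::size_t size_;
    std::size_t pos_ = 0;
};

// Reads the whole embedded resource as a string, two bytes at a time.
std::string LoadMyResource() {
    ArrayStream stream(kMyResource, sizeof(kMyResource));
    std::string result;
    unsigned char chunk[2];
    std::size_t n;
    while ((n = stream.Read(chunk, sizeof(chunk))) > 0) {
        result.append(reinterpret_cast<const char*>(chunk), n);
    }
    return result;
}
```

Because the data is compiled into the binary, no file lookup is needed at run time, at the cost of a larger executable.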
In this case, we implement the necessary behavior, usually using implementations from third-party libraries whose licenses do not prohibit use in a commercial product.
In some cases, differences between our library's behavior and that of the original .NET classes come down to simple implementation errors that are usually easy to fix. Much worse, however, is when the difference in behavior lies at the level of the subsystem used by the library code.
For example, many of our libraries actively use classes from the System.Drawing library built on GDI+. The versions of these classes we developed for C++ use Skia as the graphics engine. Skia's behavior often differs from that of GDI+, especially on Linux, and achieving consistent rendering requires significant resources. Similarly, libxml2, on which our System::Xml implementation is based, behaves differently in some cases, and we have to patch it or complicate our wrappers.
C# programmers optimize their code for the environment in which it executes. Many constructs, however, run slower once moved to an unfamiliar environment.
For instance, creating a large number of small objects in C# generally works faster than in C++ due to different heap management schemes (even considering garbage collection). Dynamic type casting in C++ is also somewhat slower. Reference counting when copying pointers is another overhead source absent in C#. Finally, using translated concepts from C# (enumerators) instead of built-in, optimized C++ ones (iterators) also slows down code performance.
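The reference-counting cost can be observed directly. The sketch below uses std::shared_ptr as a stand-in for the library's own smart pointer, which is an assumption made for illustration; the function names are hypothetical as well.

```cpp
#include <memory>

struct Node { int value = 0; };

// Passing by value copies the pointer: the reference count (typically an
// atomic counter) is incremented on entry and decremented on exit.
long CountWhenPassedByValue(std::shared_ptr<Node> p) {
    return p.use_count();
}

// Passing by const reference leaves the reference count untouched.
long CountWhenPassedByReference(const std::shared_ptr<Node>& p) {
    return p.use_count();
}
```

With a single owner, the first function observes a count of 2 and the second a count of 1. In C# the equivalent call just copies a reference, so code that passes objects around freely pays a per-call cost after translation that it never paid originally.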
The way to eliminate bottlenecks largely depends on the situation. While library code can be optimized relatively easily, retaining the behavior of translated concepts while improving their performance in an unfamiliar environment can be quite challenging.
For example, public APIs might have methods that accept SharedPtr<Object>, containers lack iterators, and stream-handling methods accept System::IO::Stream instead of istream, ostream, or iostream, and so on.
We continuously expand the translator and library to make our code convenient for C++ programmers. For instance, the translator can already generate begin-end methods and overloads that work with standard streams.
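The following sketch suggests what adding begin and end methods to a translated-style container buys the C++ user. The List class here is a simplified, hypothetical stand-in written for this example, not the library's actual container, and its member names are assumptions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical translated-style container with a C#-like API.
template <typename T>
class List {
public:
    void Add(const T& item) { items_.push_back(item); }
    std::size_t get_Count() const { return items_.size(); }
    const T& idx_get(std::size_t index) const { return items_.at(index); }

    // Added C++-style accessors: these make range-based for and standard
    // algorithms work without going through an enumerator object.
    typename std::vector<T>::const_iterator begin() const { return items_.begin(); }
    typename std::vector<T>::const_iterator end() const { return items_.end(); }
private:
    std::vector<T> items_;
};

// Sums the elements using a range-based for loop, which relies on begin()/end().
int SumAll(const List<int>& list) {
    int sum = 0;
    for (int v : list) sum += v;
    return sum;
}
```

Without the two accessors, the same loop would have to be written against get_Count and idx_get or an enumerator, which is exactly the kind of code C++ programmers find unidiomatic.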
C++ header files contain types and names of private fields, as well as the complete code of template methods. This information is usually obfuscated when releasing .NET assemblies.
We strive to exclude unnecessary information using third-party tools and special modes of the translator itself, but this is not always possible. For example, removing private static fields and non-virtual methods does not affect client code operation; however, it is impossible to remove or rename virtual methods without losing functionality. Fields can be renamed, and their types can be replaced with stubs of the same size, provided constructors and destructors are exported from the code compiled with full header files. At the same time, it is impossible to hide the code of public template methods.
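The idea of replacing a field's type with a stub of the same size can be sketched as follows. Both structures and all field names are hypothetical; the example only shows the layout constraint, not how the translator or any third-party tool actually produces the reduced header.

```cpp
#include <cstddef>
#include <string>

// Layout as seen when compiling the library itself (full header).
struct FullLayout {
    std::string connection_secret;   // hypothetical private field
    int retry_count;
};

// Layout exposed in the public header: the field names are gone and the
// string type is replaced with an opaque byte array of the same size and
// alignment.
struct PublicLayout {
    alignas(std::string) unsigned char field1[sizeof(std::string)];
    int field2;
};

// The substitution is only safe while both views agree on size and alignment.
static_assert(sizeof(FullLayout) == sizeof(PublicLayout), "size mismatch");
static_assert(alignof(FullLayout) == alignof(PublicLayout), "alignment mismatch");
```

The stub carries no constructor or destructor logic, which is why construction and destruction must happen inside code compiled against the full headers and be exported from there.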
Products for C++ created using our framework have been released successfully for many years. Initially, we released reduced versions of the products, but now we manage to maintain much more complete functionality.
At the same time, there is still plenty of room for improvements and corrections. This includes supporting previously omitted syntactic constructs and library parts, as well as enhancing the ease of using the translator.
Apart from resolving current issues and planned improvements, we are working on migrating the translator to the modern Roslyn syntax analyzer. Until recently, we used the NRefactory analyzer, which was limited to supporting C# versions up to 5.0. Transitioning to Roslyn will allow us to support constructs introduced in later versions of the language.
Finally, we plan to expand the number of supported languages, both target and source. Adapting Roslyn-based solutions for reading VB code will be relatively easy, especially since the libraries for C++ and Java already exist. The approach we used to support Python, on the other hand, is much simpler, and other scripting languages such as PHP can be supported in the same way.