My research interests are at the intersection of Systems and Language runtimes (e.g., the Java Virtual Machine, JavaScript engines, WebAssembly runtimes, etc.), Data analytics (e.g., SQL query compilation and optimization, data serialization and de-serialization, etc.), Parallel, Distributed and Concurrent programming (e.g., GPU programming, SIMD parallelization, shared-memory concurrency, etc.), and, broadly speaking, Cloud computing (e.g., serverless). In general, I like to build things, and I am interested in all performance-related aspects of the topics above.
All offered projects are challenging and require good systems programming skills and knowledge. I assume that at the Bachelor’s level you have taken (or are familiar with some of the topics of) Computer Organization, Advanced Network Programming, Compiler Construction, Operating Systems, and Concurrency and Multithreading. At the Master’s level, you have taken (or are familiar with) at least some of Programming Large-Scale Parallel Systems, Systems Seminar, and Programming Multi-core and Many-core Systems. Very good knowledge of Java, C, C++, Rust, or comparable languages is strongly recommended.
The projects below can be either an MSc thesis (typically paired with a literature study) or a large (12 ECTS) MSc research project. Some of the projects can be adapted to BSc level (for ambitious BSc students!).
[User-level eBPF scheduling for the JVM]. The new sched_ext functionality in the Linux kernel enables user-space applications to customize OS CPU scheduling, allowing applications to implement domain-specific scheduling policies. sched_ext is based on eBPF and can easily be exposed to high-level language runtimes such as the Java Virtual Machine. In this project, you will study the impact of user-level, domain-specific scheduling on serverless applications. The goal of the project is to implement different user-level scheduling policies targeting the Java VM, and to evaluate the impact of different scheduling decisions on the performance properties of the application. Related work (1) (2).
[Stream-loading of large WebAssembly binary files]. WebAssembly modules can often be relatively large (on the order of 100s of MBs). Loading such big files in a WebAssembly runtime can take several seconds. This is often not acceptable in a Web browser or in microservices. In this project we want to explore how data streaming techniques can be used in a WebAssembly runtime to improve application startup. In particular, we want to implement “lazy” loading of WebAssembly binary sources from a remote server, effectively enabling a WebAssembly application to start running before all sources have been downloaded. Related work (1) (2).
[Compilation service for WebAssembly]. Modern WebAssembly runtimes such as wasmer provide in-process just-in-time compilation. JIT compilers normally run as concurrent threads in the same process as the application. In this project we want to study an alternative runtime design, where the JIT compiler runs as an external “service”, i.e., in a different process, potentially on a different machine. Such a client-server design would further reduce the memory consumption of the WebAssembly runtime, and would also enable distributed code caching and other optimizations. Related work (1).
[Fast access to DuckDB/SQLite raw data]. In-process database systems such as SQLite or DuckDB have gained significant popularity in recent years. When used from programming languages such as JavaScript, Python or Java, such systems suffer from high overhead caused by copying data between the language VM memory space and the binary data representation of the database storage format. In this project we want to explore how to access data stored in the DuckDB or SQLite binary formats without this overhead. To this end, we will investigate how to memory-map the data stored in a DuckDB/SQLite file directly into the memory space of a language VM. Related work (1).
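As a rough illustration of the starting point (a sketch under my own assumptions, not part of DuckDB or SQLite), the JVM can already memory-map a file into its address space via `java.nio`; a real reader would then interpret the mapped bytes according to the database’s on-disk page format. The class name and the 16-byte header length below are hypothetical:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapSketch {
    // Map the first n bytes of a file into the JVM's address space and read
    // them without copying through an InputStream. Real SQLite files start
    // with the magic string "SQLite format 3\0"; a real reader would parse
    // pages directly out of the mapped region.
    public static byte[] readHeader(Path file, int n) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, Math.min(n, ch.size()));
            byte[] header = new byte[buf.remaining()];
            buf.get(header);
            return header;
        }
    }
}
```

The interesting (open) part of the project is the step after this: exposing the mapped bytes to the VM as first-class objects without a de-serialization copy.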
[Profile-guided inlining for WebAssembly]. Profile-guided optimization (PGO) is a compiler optimization technique where profiling information and runtime traces from previous executions of a given application are used to optimize future executions of the same application. The intuition is that the same application will most likely behave in the same way across multiple runs, with the consequence that knowledge from past runs can be used to optimize future executions. In this project, we want to explore the usage of PGO-style optimizations for WebAssembly, starting with function inlining. Related work (1).
[Profile-driven Garbage Collection]. The garbage collection runtime of modern language VMs such as V8 and the Java VM is triggered by runtime heuristics based on the current and predicted memory utilization. Such heuristics are often not optimal. In this project, we want to explore the usage of PGO-style optimizations to improve the performance of garbage collection. The intuition is that runtime traces and profiles from previous garbage collections can be used to fine-tune and improve existing language VMs. Related work (1) (2).
[Accelerated JSON parsing]. JSON parsing is typically a very CPU-heavy operation, often resulting in major performance bottlenecks in web services and Data analytics applications. In this project we want to explore using accelerators (such as NPUs, FPGAs or others) to speed up JSON parsing performance. Related work (1).
[Accelerated RegEx engines]. Regular expressions are natively supported in many high-level programming languages such as JavaScript, Java or Python. RegEx execution performance is often critical, and state-of-the-art language VMs employ advanced JIT compilation techniques or (SIMD) parallel processing to optimize RegEx performance. In this project we want to explore using accelerators (such as FPGAs, NPUs or others) to speed up RegEx matching performance.
[Many-tiers compilation]. Language VMs such as Google V8 or the JVM have multiple JIT compilers (typically from 2 to 5/6). Each of these compilers can internally be configured to perform a very different set of optimizations. Typically, each compiler is configured once and then used in the same configuration for all methods it has to compile. In principle, however, a JIT compiler could be “tailored” to use a different configuration for each method to be compiled. In this project, we want to extend an existing JIT compiler to allow fine-grained, per-method configuration.
[Human-readable dump of GraalVM native image binary files]. The GraalVM native image framework can be used to (ahead-of-time) compile Java applications into highly-optimized binary files with great startup performance. In this project we want to develop a tool to disassemble GraalVM native image binary files, making it possible to reverse-engineer their content and produce (where possible) a human-readable output. Related work (1).
[Snapshot-repeat compilation caching]. Advanced compiler optimizations are often implemented with graph traversal and graph rewriting algorithms, transforming a graph data structure (the compiler IR) after each optimization is applied. Given the often complex nature of such graphs, a typical compiler always performs all optimizations on all graphs, without caching. In this project we want to explore the usage of alternative data structures that would allow caching of compiler IRs. In this way, the overall runtime cost of JIT compilation would be significantly reduced, as the JIT compiler could re-use graphs from previous compilations. Related work (1).
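The core idea can be sketched in a few lines (a deliberately simplified illustration, not how any real JIT is structured): if the compiler can compute a canonical fingerprint of an incoming IR, it can reuse the result of an earlier compilation of the same IR. Here the IR is modeled as a flat list of opcode strings and codegen is a placeholder; both are assumptions for illustration only:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IrCacheSketch {
    // Maps an IR fingerprint to its (placeholder) compiled code.
    private final Map<String, String> cache = new HashMap<>();
    private int misses = 0;

    // Stand-in for a structural hash of a real IR graph.
    static String fingerprint(List<String> ir) {
        return String.join(";", ir);
    }

    // "Compile" the IR, reusing a cached result when the same IR was seen before.
    public String compile(List<String> ir) {
        return cache.computeIfAbsent(fingerprint(ir), k -> {
            misses++; // count how often real compilation work would run
            return "machine-code(" + k + ")";
        });
    }

    public int misses() { return misses; }
}
```

The open research question is precisely the part this sketch hides: finding an IR representation whose fingerprint is cheap to compute and stable across compilations of graph-shaped, mutable IRs.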
[Combining CRIU and GraalVM native-image]. GraalVM native image is an open-source language technology that can be used to create optimized, cloud-ready binary executables for Java. By leveraging ahead-of-time compilation of Java code, GraalVM native images can significantly reduce applications’ startup time, leading to reduced cold starts in Cloud deployments such as AWS Lambda. CRIU is an emerging Linux technology aimed at the same goal: minimizing application startup time. Unlike GraalVM native image, CRIU leverages “user-space” snapshotting. In this project we want to explore how the two technologies can be combined to minimize the startup latency of Cloud applications even further. Related work (1).
[Compressed in-memory objects in a language VM]. Language VMs for programming languages such as Java or JavaScript store objects in the VM heap memory space. Some objects are used very often, while other objects are accessed only at a specific time (e.g., during startup). In this project we want to explore how compression algorithms could be used in the context of the Java VM to reduce memory consumption. The key intuition behind the project is that certain objects could be stored in compressed form, with the goal of minimizing memory consumption.
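To make the intuition concrete, here is a minimal sketch of the compress-on-store / decompress-on-access round trip using the JDK’s built-in `Deflater`/`Inflater` (the class name is hypothetical; a real VM would hook this into the heap and GC rather than into application code):

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class CompressedObjectSketch {
    // Compress a cold object's payload so it occupies less heap space.
    public static byte[] compress(byte[] data) throws Exception {
        Deflater d = new Deflater();
        d.setInput(data);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!d.finished()) out.write(buf, 0, d.deflate(buf));
        d.end();
        return out.toByteArray();
    }

    // Decompress the payload on (rare) access.
    public static byte[] decompress(byte[] data) throws Exception {
        Inflater i = new Inflater();
        i.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!i.finished()) out.write(buf, 0, i.inflate(buf));
        i.end();
        return out.toByteArray();
    }
}
```

The project-level challenge is everything around this round trip: deciding (cheaply) which objects are cold, and making the decompression transparent to the running program.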
[Learned data structures in a language VM]. Learned data structures have been used successfully in the context of database systems. In this project we want to explore using such data structures to implement some of the internal components of a modern language VM such as Google V8 or the Java Virtual Machine. Related work (1).
[Adaptive SQL compilation in Apache Spark]. Data processing systems such as Apache Spark perform runtime JIT compilation of SQL to machine code. Such compilation is typically static: once compiled, the code is never modified. In this project we want to instead explore “dynamic” compilation techniques in the context of Apache Spark SQL. Related work (1).
[Predictive JSON parsing]. ML-based JSON data access: JSON parsing libraries such as jackson do not make any assumptions about the actual structure of the JSON data being parsed. However, very often multiple JSON objects have a somewhat “similar” structure. In this project we want to explore using Machine Learning models to speed up JSON data analytics. Specifically, we want to develop a JSON parsing library that “learns” the most likely structure of the JSON data being parsed, and generates a JSON parser that is optimized for the predicted JSON data. The expectation is that such a “learned” JSON parser should be significantly faster than a general-purpose one. Related work (1).
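A toy version of the “learning” step might look as follows (my own sketch, far simpler than what the project envisions): record the field-name sequence of each observed flat JSON object as its “shape”, and predict the most frequent one; a specialized parser could then fast-path that shape and fall back to generic parsing on mismatch. The regex-based key scan only works for flat objects without nesting and is purely illustrative:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JsonShapeLearner {
    // Counts how often each field-name sequence ("shape") has been observed.
    private final Map<List<String>, Integer> shapeCounts = new HashMap<>();
    // Naive key extractor: a quoted string followed by a colon.
    private static final Pattern KEY = Pattern.compile("\"([^\"]+)\"\\s*:");

    static List<String> shapeOf(String flatJson) {
        List<String> keys = new ArrayList<>();
        Matcher m = KEY.matcher(flatJson);
        while (m.find()) keys.add(m.group(1));
        return keys;
    }

    public void observe(String flatJson) {
        shapeCounts.merge(shapeOf(flatJson), 1, Integer::sum);
    }

    // The shape a specialized parser would be generated for.
    public List<String> predictedShape() {
        return shapeCounts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(List.of());
    }
}
```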
[GPU-based parsing of a general-purpose programming language]. Parsing languages such as Java, Python or JavaScript is typically implemented as a single-threaded, sequential operation. Depending on the language, it is actually possible to implement “parallel”, much faster parsing. In this project, we want to explore using GPUs to speed up the parsing step of one of these very popular programming languages.
[Accelerating Parallel Byte-code interpreters on GPUs]. Managed programming languages like Java offer robust environments with automatic memory management, cross-platform capabilities, and a large ecosystem of libraries and tools. However, the performance of bytecode execution in these languages can be a bottleneck, especially for compute-intensive applications. In optimizing runtimes and language Virtual Machines (VMs), the runtime itself collects profiling information about the program; this information is then used by a Just-in-Time compiler to produce optimized code for the hot spots of the input program at a later stage of the execution. However, while the profiles are being collected, the application still runs on a single-core bytecode interpreter. This project will explore how to accelerate bytecode execution by making use of Graphics Processing Units (GPUs). Related work (1), (2), (3). This project will be co-supervised with Dr Juan Fumero, University of Manchester (UK).
[GPU-based garbage collection acceleration]. Language VMs such as Google V8 or the Java Virtual Machine implement very advanced garbage collection techniques. Most of these GCs have a parallel, multithreaded implementation. In this project, we want to study how GPUs can be used to speed up garbage collection instead of using multiple threads. This project will be co-supervised with Dr Juan Fumero, University of Manchester (UK).
Depending on the topic, most of the projects below can be extended to more challenging MSc-level projects.
[Data access pattern performance impact on Apache Arrow]. In this project we want to answer and understand a simple question: does the order in which we access large data files stored in the Apache Arrow format have an impact on performance? And, if so, can the application and/or the language VM optimize the way data is accessed to improve performance? Related work (1).
[DPDK bindings for Node.js]. The DPDK user-space networking stack is the state-of-the-art solution for high-performance networking. Unfortunately, using DPDK from high-level programming languages such as Python or JavaScript is not always easy. In this project we want to develop Node.js native bindings for the DPDK library, so as to enable high-performance networking in Node.js. Such bindings already exist for other languages (e.g., Go); in this project you will implement them for Node.js. Related work (1) (2).
[Performance analysis of Scatter/gather NICs in Java or Node.js]. Modern NICs with scatter/gather capabilities are often found in commodity hardware. Programming languages and frameworks used to write networking applications, such as Node.js or Java (NIO), feature explicit APIs to access the scatter/gather capabilities of a NIC. However, little is understood about the actual performance of using such NICs from a language VM like the JVM or V8. In this project we want to study and characterize the performance of networking applications developed using such languages on modern scatter/gather NICs. Related work (1).
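For context, the NIO API shape in question is the gathering `write(ByteBuffer[])` of `GatheringByteChannel`, which lets the kernel (and, underneath, a capable NIC) pick up several fragments in one operation. The sketch below uses a `FileChannel` as a stand-in for a socket channel purely so it is self-contained; the class name is hypothetical:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class GatherWriteSketch {
    // Gathering write: hand several buffers (e.g., a protocol header and a
    // payload) to the channel in one call, instead of copying them into one
    // contiguous buffer first. SocketChannel exposes the same write(ByteBuffer[])
    // method via the GatheringByteChannel interface.
    public static long writeAll(Path out, ByteBuffer... parts) throws IOException {
        try (FileChannel ch = FileChannel.open(out,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            return ch.write(parts);
        }
    }
}
```

The project would measure what actually happens below this API on real scatter/gather hardware, which the sketch deliberately says nothing about.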
[Performance analysis of SIMD support in WebAssembly]. SIMD instructions are at the core of modern data processing. For example, several data ingestion operations (e.g., importing data from a CSV file) require UTF-8 validation: the input data needs to be analyzed (one character at a time) to ensure that the input text is valid. Parallel execution techniques can be employed to speed up the validation of large text files. In this project, we want to explore the performance of existing WebAssembly runtimes in the context of SIMD execution, looking at common operations such as UTF-8 validation. The goal of the project is to implement multiple SIMD-based data processing techniques targeting WebAssembly (either in WebAssembly itself, or by leveraging emscripten or alternative WASM compilers), to assess the current performance of WebAssembly on such data-parallel operations. Related work (1) (2).
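To show what SIMD techniques would be vectorizing, here is the scalar, one-byte-at-a-time baseline for UTF-8 validation (written in Java here only for illustration; the project targets WebAssembly). This sketch checks lead/continuation byte structure and truncation, and deliberately omits the overlong-encoding and surrogate checks a complete validator needs:

```java
public class Utf8Validator {
    // Scalar baseline: one branchy pass over the input. This is exactly the
    // loop that SIMD validation algorithms replace with wide, branch-free
    // byte-classification operations.
    public static boolean isValidUtf8(byte[] in) {
        int i = 0;
        while (i < in.length) {
            int b = in[i] & 0xFF;
            int len;
            if (b < 0x80) len = 1;                 // ASCII
            else if ((b & 0xE0) == 0xC0) len = 2;  // 2-byte sequence
            else if ((b & 0xF0) == 0xE0) len = 3;  // 3-byte sequence
            else if ((b & 0xF8) == 0xF0) len = 4;  // 4-byte sequence
            else return false;                     // stray continuation / invalid lead
            if (i + len > in.length) return false; // truncated sequence
            for (int j = 1; j < len; j++)
                if ((in[i + j] & 0xC0) != 0x80) return false; // bad continuation
            i += len;
        }
        return true;
    }
}
```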
[Performance analysis of the Java Vector API for JSON data processing]. Upcoming versions of the Java language will feature built-in support for SIMD programming. simdjson is one of the most popular high-performance JSON processing libraries; it is implemented in highly-optimized C code, and leverages advanced SIMD instructions to perform data de-serialization in parallel. Achieving similar performance using a managed programming language such as Java would simply be impossible without access to SIMD parallelism. The recently-introduced Java Vector API promises to bring high-performance SIMD parallel programming to the Java language. In this project we want to explore the usage of this new API in the context of JSON data processing. In particular, the goal of the project is to implement the simdjson library using the new Java Vector API, and to assess its performance compared against the native, highly-optimized simdjson implementation. Related work (1) (2).
[Power consumption profile and analysis of Java serverless]. Java and the Java Virtual Machine are among the most popular technologies on the planet, with billions of daily users. Despite such great popularity, we still know very little about the carbon footprint of a language such as Java compared to statically-compiled languages such as C. In this project, we want to explore and understand the impact of language runtimes such as the JVM on energy consumption, ideally identifying guidelines and recommendations for the most energy-friendly configuration of a Java Virtual Machine. Related work (1) (2).
[Performance analysis of Database connectors]. Database connectors are software components responsible for receiving query result data from a remote database system and exposing it to a given programming language. As such, they have to serialize and de-serialize raw binary data in order to make it available to the language VM. In this project, we want to explore the performance impact of this serialization/de-serialization step, and how language VMs can optimize or reduce the associated performance overheads. Related work (1).
[Performance evaluation of popular binary and textual encoding formats]. Data is often encoded using binary formats such as Parquet, Cap’n Proto, Protocol Buffers, Arrow, FlatBuffers etc. In this project, we want to build a comprehensive performance evaluation suite (i.e., a benchmark) to compare the performance of popular encoding formats on popular programming languages such as Java, JavaScript, Python, C/C++ and Rust.
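A starting point for such a suite is a measurement harness around each format’s encode/decode round trip; the sketch below is my own minimal stand-in (a serious evaluation would use a dedicated framework such as JMH, which handles warmup, dead-code elimination and statistics properly). The class name and the trivial workload are assumptions:

```java
public class EncodeBench {
    // Warm up, then report the average time per repetition of an operation.
    // Format-specific codecs (Parquet, Protocol Buffers, Arrow, ...) would be
    // plugged in as the Runnable.
    public static long timeNanos(Runnable op, int warmup, int reps) {
        for (int i = 0; i < warmup; i++) op.run();   // let the JIT warm up
        long start = System.nanoTime();
        for (int i = 0; i < reps; i++) op.run();
        return (System.nanoTime() - start) / reps;   // average ns per op
    }
}
```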
[Energy-aware JIT compilation]. Just-in-time (JIT) compilation is often one of the most energy-intensive operations performed by modern language runtimes such as the Java VM and Google’s V8 JavaScript engine. Despite the importance of JIT compilation in modern programming languages, very little is known about the energy consumption of JIT compilers. In this project we want to analyze the performance and energy consumption of modern JIT compilers [POW]. The goal of this project is to enable fine-grained power consumption measurements in a language VM, allowing one to understand the energy costs of individual compiler operations and optimizations (e.g., “how expensive is inlining?”, “does energy consumption grow with the number of methods being compiled?”, etc).
[Performance evaluation of the new CPython JIT compiler]. The CPython engine is the most popular language VM for Python. In recent weeks, the engine was finally extended with a JIT compiler. In this project, we want to assess the performance impact of the new JIT, and want to understand what its properties and limitations are. Related: (1).
Literature studies are often paired with an MSc thesis. In exceptional cases (e.g., an MSc thesis with industry), I am happy to supervise literature studies in all areas related to the BSc and MSc topics above.
These projects have been completed and are no longer offered. If you find a topic very interesting, I am happy to discuss possible follow-up projects on the same topics.
[Compressed strings in a language VM]. Modern language VMs such as Google V8 or the Java Virtual Machine (JVM) represent in-memory strings using advanced runtime techniques such as Ropes. In this project we want to investigate if/how compression algorithms can be used in the context of language VMs to reduce memory consumption for large in-memory strings without negative performance impacts. Related work (1).
[ML-based inlining for the Graal compiler]. Advanced dynamic JIT compilers such as Graal rely on several heuristics-based algorithms to perform compiler optimizations. Such heuristics are often hand-tuned. In this project we want to explore using Machine Learning to replace the inlining heuristics used in a modern compiler with an ML model, with the goal of improving performance.
[Binary layout performance impact]. The way an executable file is stored on disk has a potential impact on the application’s startup performance. For example, large binary files might result in hundreds of page faults depending on the order in which functions are declared and executed. In this project we want to develop automatic compiler-driven techniques to optimize the layout of binary files with the goal of improving application startup.