Proposed internship topics
  1. Random number generator: testing and whitening. Andrei Romashchenko and Alexander Shen. Description and presentation.

    Abstract: Generation of random bits is a classical problem, known both in the context of pseudo-random generators and in connection with truly random physical processes (there exist electronic devices that produce random bits from unpredictable physical noise or from intrinsically nondeterministic quantum phenomena). However, the quality of physical generators of random bits remains poorly founded and poorly tested. The first objective of this project is an experimental study of the validity and quality of several physical random number generators.

    When we talk about the quality of random or pseudo-random generators, we have to use randomness tests. The second objective of the project is an inventory and revision of statistical tests for random and pseudo-random generators. We propose to improve the quality of statistical tests and to develop new "whitening" techniques that improve the quality of non-ideal sources of random bits. Another axis of the project is the conversion of various probabilistic proofs into unconventional randomness tests.
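    To make these two ingredients concrete, here is a minimal Python sketch of a classical statistical test (the frequency, or monobit, test) together with the von Neumann extractor, a textbook whitening procedure. The biased source simulated at the end is a purely illustrative assumption and not part of the project description.

      import math
      import random  # used here only to simulate a biased "physical" source


      def monobit_test(bits):
          """Frequency (monobit) test: p-value for the hypothesis that zeros
          and ones are equally likely in the sequence."""
          n = len(bits)
          s = sum(1 if b else -1 for b in bits)
          # Under the null hypothesis, s / sqrt(n) is approximately standard normal.
          return math.erfc(abs(s) / math.sqrt(2 * n))


      def von_neumann_whiten(bits):
          """Von Neumann extractor: read non-overlapping pairs, output 0 for the
          pair 01, output 1 for 10, and discard 00 and 11.  The output is unbiased
          whenever the input bits are independent, even if they are biased."""
          out = []
          for i in range(0, len(bits) - 1, 2):
              a, b = bits[i], bits[i + 1]
              if a != b:
                  out.append(a)
          return out


      if __name__ == "__main__":
          # Simulated source with a 60% bias towards 1 (illustrative only).
          raw = [1 if random.random() < 0.6 else 0 for _ in range(100_000)]
          print("raw p-value      :", monobit_test(raw))
          print("whitened p-value :", monobit_test(von_neumann_whiten(raw)))

    On such a biased but independent source the raw stream fails the test (the p-value is essentially zero) while the whitened stream passes; real physical sources may of course violate the independence assumption, which is precisely what more refined tests should be able to detect.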


  2. Combinatorial properties of random and pseudo-random graphs. Andrei Romashchenko and Alexander Shen. Description.

    Abstract: The study of random graphs is a large and well-developed branch of graph theory. It deals with the typical properties that hold with high probability for a randomly chosen graph. For example, it is known that the vast majority of graphs of fixed degree satisfy the definition of an expander. Expander graphs possess several interesting properties: high vertex expansion, strong connectivity, fast mixing, etc. Graphs of this type have many important applications in computer science and in coding theory.

    Thus, for several practical applications we need expander graphs. But though almost any randomly chosen graph is an expander, it is rather hard to produce such a graph explicitly, in a deterministic way. On the other hand, generating and storing a large random graph is an expensive procedure that consumes a lot of resources.

    We propose to study an intermediate approach that combines the usual deterministic and probabilistic paradigms: producing graphs using (classical) pseudo-random number generators. The idea is to compare the properties of pseudo-random graphs with the typical properties of truly random graphs. More technically, we propose to estimate theoretically some specific combinatorial properties of truly random graphs (expansion rate, spectral gap) and then compare them with the properties of pseudo-random graphs in numerical experiments. Both positive and negative results would be interesting: either we obtain efficient constructions that produce pseudo-random graphs with interesting combinatorial properties (useful for applications), or we discover new tests to distinguish truly random sequences from pseudo-random ones.
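    As a first taste of such experiments, the following Python sketch builds a d-regular (multi)graph as the union of d/2 random permutations and their inverses, drives the construction with a toy linear congruential generator, and measures the second eigenvalue of the adjacency matrix against the Ramanujan bound 2*sqrt(d-1). The particular graph model, the PRNG and all the names below are illustrative assumptions, not part of the project description.

      import numpy as np


      def lcg(seed, a=1664525, c=1013904223, m=2**32):
          """A toy linear congruential generator (an illustrative choice of PRNG)."""
          x = seed
          while True:
              x = (a * x + c) % m
              yield x


      def random_permutation(n, rand_ints):
          """Fisher-Yates shuffle driven by an arbitrary stream of random integers."""
          p = list(range(n))
          for i in range(n - 1, 0, -1):
              j = next(rand_ints) % (i + 1)
              p[i], p[j] = p[j], p[i]
          return p


      def regular_multigraph_adjacency(n, d, rand_ints):
          """d-regular multigraph on n vertices: the union of d/2 random
          permutations and their inverses (d must be even; self-loops and
          parallel edges are allowed)."""
          A = np.zeros((n, n))
          for _ in range(d // 2):
              p = random_permutation(n, rand_ints)
              for i, j in enumerate(p):
                  A[i, j] += 1
                  A[j, i] += 1
          return A


      if __name__ == "__main__":
          n, d = 500, 8
          A = regular_multigraph_adjacency(n, d, lcg(seed=12345))
          ev = np.linalg.eigvalsh(A)           # eigenvalues in ascending order
          lam2 = max(abs(ev[0]), abs(ev[-2]))  # largest nontrivial eigenvalue
          print(f"second eigenvalue: {lam2:.3f}  (Ramanujan bound: {2 * (d - 1) ** 0.5:.3f})")

    Replacing the toy generator by a cryptographic one, or by truly random bits from a physical device, and repeating the measurement over many samples gives the kind of comparison proposed above.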


  3. Efficient synchronization of remote files. Andrei Romashchenko and Alexander Shen. Description.

    Abstract: It is well known that some programs (e.g., rsync, btsync, or some cloud services) can synchronize huge files in a fraction of the time required to transmit the file, exploiting the fact that the previous version of the file is available on the other side and only a few changes were made in the new version. These programs try to find the unchanged parts of the files, use digital fingerprints to check that these parts are indeed the same in both versions of the file, and avoid transmitting them in full.

    Is there some theoretical model for this approach? Can it be used to improve the existing tools? The answer to the first question is yes: there are several results in the theory of communication complexity that show how one could synchronize remote files with a small edit distance between them. Still, the existing theoretical approaches are quite limited and, on the other hand, the practical tools are often written ad hoc, without any theoretical groundwork.

    So the question is twofold. Is it possible to extend the existing theoretical results and obtain faster algorithms for a wider class of situations (types of changes between the two file versions)? Is it possible to improve the existing practical tools using known theoretical approaches or, possibly, new ideas? A good starting point is to try random hashing instead of ad hoc fingerprints in existing tools and to investigate whether it helps in practice. This internship can be followed by a longer research project, with either theoretical or more engineering-oriented aims.
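    The sketch below is a deliberately simplified, fixed-alignment model of block-level synchronization with randomized (keyed) fingerprints; the block size, the use of blake2b as the keyed hash and all the names are illustrative assumptions. Real tools such as rsync also use rolling checksums so that matches survive insertions that shift block boundaries, a refinement left out here.

      import hashlib
      import os

      BLOCK = 4096  # block size (an arbitrary illustrative choice)


      def fingerprints(data, key):
          """Keyed (randomized) fingerprint of every fixed-size block of `data`.
          With a random key, collisions between distinct blocks are unlikely
          even for adversarially chosen files."""
          return [
              hashlib.blake2b(data[i:i + BLOCK], key=key, digest_size=16).digest()
              for i in range(0, len(data), BLOCK)
          ]


      def delta(old, new, key):
          """Blocks of `new` (indexed) that the holder of `old` does not already
          have at the same position; only these blocks need to be transmitted."""
          old_fp = fingerprints(old, key)
          new_fp = fingerprints(new, key)
          return {
              i: new[i * BLOCK:(i + 1) * BLOCK]
              for i, fp in enumerate(new_fp)
              if i >= len(old_fp) or fp != old_fp[i]
          }


      def reconstruct(old, patch, new_len):
          """Rebuild the new file from the old copy plus the transmitted blocks."""
          n_blocks = (new_len + BLOCK - 1) // BLOCK
          blocks = [old[i * BLOCK:(i + 1) * BLOCK] for i in range(n_blocks)]
          for i, data in patch.items():
              blocks[i] = data
          return b"".join(blocks)[:new_len]


      if __name__ == "__main__":
          key = os.urandom(16)                  # shared random key = randomized hashing
          old = os.urandom(20 * BLOCK)
          new = old[:5 * BLOCK] + b"edit" + old[5 * BLOCK + 4:]  # small in-place change
          patch = delta(old, new, key)
          assert reconstruct(old, patch, len(new)) == new
          print(f"transmitted {len(patch)} of {(len(new) + BLOCK - 1) // BLOCK} blocks")

    Comparing such keyed fingerprints with the ad hoc checksums used in existing tools is the kind of experiment suggested as a starting point above.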

Last update: October 8, 2020