This is a work in progress. All comments are appreciated!

Note: This page was updated on Oct. 2 1996 to remove the feature of assigning each instance its own seed. The previous version is still available.

# Random Uniform CSP Generators

Many CSP researchers around the world use random uniform instances to evaluate their Constraint Satisfaction Algorithms. Although it is generally agreed that the ultimate test of a CSP algorithm is its performance on "real world" (read, economically important) problems, there is also widespread consensus on the value of "laboratory" experimentation on random problems.

Random problems offer the following advantages for empirically evaluating the performance of CSP algorithms:

1. Large quantities can be generated, so that statistically significant means and variances can be reported.
2. It is easy to vary systematically the parameters of the generator and thus to observe how an algorithm's performance relates to, for example, the number of constraints.
3. It is easy to find parameters which generate problems of which 50% are soluble; on average such problems are particularly difficult and thus tend to highlight differences in algorithm performance.
4. A fourth benefit of random problems has not been much realized. For several reasons, using random problems should permit the easy interchange of problems among experimenters:
   1. These problems embody no trade secrets or sensitive corporate information.
   2. No specialized domain knowledge is required to understand them.
   3. They can be succinctly specified by an algorithm and a random number seed.

The goal of this Web page is to promote benefit #4 by providing CSP researchers with a simple, compact program which can be used to generate uniform, random, binary CSPs. If the parameters to the function (written in C) are specified in a paper, then other workers will be able to generate the same instances on which to test their own algorithms.

## The random problem model

Although an infinite number of random CSP instance generating models might be imagined, in practice most workers in the last few years have used a simple one, which takes four parameters:
• The number of variables in the problem.
• The number of values in the domain of each variable. Each variable has a domain of the same size.
• The number of constraints. All constraints are binary (between exactly two variables). The constraints are chosen at random from a uniform distribution. This number may be specified either as an integer or as a fraction between 0 and 1. For instance, if a problem has 20 variables, then the maximum number of constraints is 20*19/2 = 190. A particular problem could be specified with the number of constraints equal to 95 or 0.5. The C function takes this parameter as an integer, so that it doesn't have to do rounding or truncation.
• The tightness of each constraint. All constraints have the same tightness. Tightness refers to the number of value pairs which are disallowed by the constraint. The specific pairs are chosen at random from a uniform distribution. Tightness may be specified either as an integer or as a fraction between 0 and 1. For instance, if a problem has variables with domain size of 5, then the maximum number of value pairs disallowed by a constraint is 5*5 = 25. A particular problem could be specified with tightness of 5 or 0.2. The C function takes this parameter as an integer, so that it doesn't have to do rounding or truncation.
The C code below generates random instances based on this model.
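
Because the C function takes the number of constraints and the tightness as integers, a paper that reports them as fractions leaves the conversion to the reader. The following is a minimal sketch of that arithmetic, not part of urbcsp.c. The helper names are hypothetical, and rounding to the nearest integer is an assumption; a paper should state which rule it used.

```c
#include <assert.h>

/* Maximum number of binary constraints among n variables: n*(n-1)/2. */
int max_constraints(int n)
{
    assert(n >= 2);
    return n * (n - 1) / 2;
}

/* Maximum number of disallowed value pairs for domain size d: d*d. */
int max_nogoods(int d)
{
    assert(d >= 2);
    return d * d;
}

/* Convert a fraction in [0,1] of `total` into an integer count.
   Rounding to nearest is an assumption made for this sketch. */
int fraction_to_count(double fraction, int total)
{
    assert(fraction >= 0.0 && fraction <= 1.0);
    return (int)(fraction * total + 0.5);
}
```

With 20 variables and a constraint fraction of 0.5, fraction_to_count(0.5, max_constraints(20)) gives the 95 constraints used in the example above; with domain size 5 and tightness 0.2, fraction_to_count(0.2, max_nogoods(5)) gives 5.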

## The implementation

The C program was designed to be easy to use and to make problems easy to replicate. In particular, our goals were
• truly uniform random instances;
• a well-defined and high quality pseudo-random number generator;
• portable code that makes no assumptions about underlying data structures.
How each of these goals was achieved is discussed below.

### Truly Uniform

Each possible constraint (that is, each pair of distinct variables) has an equal chance of being chosen; likewise, within each constraint, each pair of values has an equal chance of being disallowed.
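
The selection technique the program uses for both constraints and value pairs is a partial shuffle: to pick m items out of n, swap a randomly chosen remaining item into each of the first m positions. The sketch below isolates that idea. The function names are hypothetical, and uniform01() is a small stand-in generator used only to keep the example self-contained; it is not ran2 and will not reproduce the urbcsp instances.

```c
#include <assert.h>

/* Stand-in uniform generator returning a value in [0,1).  This is a
   simple linear congruential generator, NOT ran2. */
static unsigned long lcg_state = 12345UL;

static double uniform01(void)
{
    lcg_state = (lcg_state * 1103515245UL + 12345UL) % 2147483648UL;
    return (double)lcg_state / 2147483648.0;
}

/* Select m of the n entries of t[] uniformly at random, without
   replacement.  On return, t[0..m-1] hold the selected entries. */
void select_m_of_n(int *t, int n, int m)
{
    int i, r, tmp;

    assert(m >= 0 && m <= n);
    for (i = 0; i < m; ++i) {
        /* r is an index in [i, n-1]. */
        r = i + (int)(uniform01() * (n - i));
        tmp = t[r];
        t[r] = t[i];
        t[i] = tmp;
    }
}
```

Since every swap only exchanges two entries, the array remains a permutation of its original contents, so no item can be selected twice.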

### Random Numbers

The C code includes an explicitly specified pseudo-random number generator. Relying on rand() or random() functions defined by an operating system or local library of course greatly reduces the likelihood that another researcher will be able to duplicate the instances. We use a routine, ran2, from the well known Numerical Recipes in C, by William H. Press et al.; the discussion there is well worth reading. This generator has a period of about 2.3 * 10^18. A brief quotation: "We think that, within the limits of its floating-point precision, ran2 provides perfect random numbers" (p. 281). This excellent book is available through the WWW at http://nr.harvard.edu/nr/bookc.html. The specific section concerning ran2, Section 7.1, is in http://cfatab.harvard.edu/nr/bookc/c7-1.ps (postscript).

### Portable Code

The function MakeURBCSP does not actually generate CSPs. Instead, it calls four other "hook" or "callback" functions which do the job of updating the appropriate data structures. In the sample code, these functions, StartCSP, AddConstraint, AddNogood, and EndCSP, just write to stdout. To make MakeURBCSP functional, modify these fabulous four to set the data structures your system uses.
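
As an illustration of what replacement hook functions might look like, here is a minimal sketch that stores each instance in memory instead of printing it. The struct layout, the fixed size limits, and the global variable csp are all assumptions made for the example; only the four function signatures come from urbcsp.c.

```c
#include <assert.h>

#define MAX_CONSTRAINTS 256   /* hypothetical limits chosen for the sketch */
#define MAX_NOGOODS     128

struct Constraint {
    int var1, var2;
    int num_nogoods;
    int nogood[MAX_NOGOODS][2];   /* disallowed (val1, val2) pairs */
};

struct CSP {
    int num_vars, domain_size, instance;
    int num_constraints;
    struct Constraint ct[MAX_CONSTRAINTS];
};

static struct CSP csp;   /* the instance currently being built */

void StartCSP(int N, int D, int instance)
{
    csp.num_vars = N;
    csp.domain_size = D;
    csp.instance = instance;
    csp.num_constraints = 0;
}

void AddConstraint(int var1, int var2)
{
    struct Constraint *c;

    assert(csp.num_constraints < MAX_CONSTRAINTS);
    c = &csp.ct[csp.num_constraints++];
    c->var1 = var1;
    c->var2 = var2;
    c->num_nogoods = 0;
}

/* A nogood always belongs to the most recently added constraint. */
void AddNogood(int val1, int val2)
{
    struct Constraint *c;

    assert(csp.num_constraints > 0);
    c = &csp.ct[csp.num_constraints - 1];
    assert(c->num_nogoods < MAX_NOGOODS);
    c->nogood[c->num_nogoods][0] = val1;
    c->nogood[c->num_nogoods][1] = val2;
    ++c->num_nogoods;
}

void EndCSP(void)
{
    /* Nothing to finalize here; a solver could be invoked at this point. */
}
```

The sketch relies on MakeURBCSP calling AddNogood only after the AddConstraint call for the constraint the nogood belongs to, which is the order the sample code uses.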

## Implementation Notes

The code supplied is not meant to be sacred; it is just a useful tool. Feel free to make any modifications necessary, as long as the changes do not alter the CSPs which are produced. In general, in writing the program, we tried first to adhere to the three goals described above. The second concern was clarity; we hope the program is easy to read and understand. Efficiency was a lesser concern, and you may well see several ways to make the program run faster (we certainly do!).

### primitive sizes

Several parts of the code rely on the length of a long being 4 bytes. If that's not what your compiler produces, you'll have to substitute something else for long. In contrast, the length of an int is not important.
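
One way to check this assumption on a given compiler is sketched below. It also shows the packing scheme the program uses to store a pair of variable numbers in a single long, with one variable in the high 16 bits and the other in the low 16 bits. The helper names are hypothetical and are not part of urbcsp.c.

```c
#include <assert.h>
#include <limits.h>

/* Number of bits in a long on this compiler. */
int long_bits(void)
{
    return (int)(sizeof(long) * CHAR_BIT);
}

/* Pack two variable numbers into one unsigned long: var1 in the high
   16 bits, var2 in the low 16 bits.  Both must be below 65536. */
unsigned long pack_pair(int var1, int var2)
{
    assert(var1 >= 0 && var1 < 65536);
    assert(var2 >= 0 && var2 < 65536);
    return ((unsigned long)var1 << 16) | (unsigned long)var2;
}

/* Recover the two variable numbers from a packed value. */
int unpack_first(unsigned long packed)
{
    return (int)(packed >> 16);
}

int unpack_second(unsigned long packed)
{
    return (int)(packed & 0xFFFFUL);
}
```

If long_bits() reports fewer than 32 bits, the packing does not fit and a wider type is needed. A long wider than 32 bits leaves the packed values unchanged, but it does mean each array element occupies more than 4 bytes.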

### zero-based numbering

As is typical in C programs, numbering starts from 0. Thus in a CSP with 100 variables, they are numbered from 0 to 99. If there are 8 values, they are numbered from 0 to 7. You can easily change the ranges by adding a constant in the calls to AddConstraint and AddNogood.

### malloc and free

Depending on how smart your compiler is, and on whether speed is an issue, you may not want to malloc and free the CTarray and NGarray arrays on every call.

### randomly selecting disallowed value pairs

The program uses the random number generator to select the disallowed or illegal value pairs in each constraint. This is an arbitrary decision; it would be equally possible to select the allowed or valid pairs. If you use a data structure that stores the valid pairs, you'll have to do some intermediate processing.
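
The intermediate processing can be quite small. The sketch below assumes a data structure that marks every value pair as allowed when a constraint starts and then clears the entry for each disallowed pair as it arrives. The names, the matrix representation, and the MAX_D bound are assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

#define MAX_D 16   /* hypothetical upper bound on domain size */

/* allowed[a][b] is 1 when the constraint permits the pair (a, b). */
static unsigned char allowed[MAX_D][MAX_D];

/* Start a constraint with every value pair allowed. */
void begin_constraint(int d)
{
    assert(d >= 2 && d <= MAX_D);
    memset(allowed, 1, sizeof allowed);
}

/* Record one disallowed pair, for example from inside AddNogood(). */
void forbid_pair(int val1, int val2)
{
    assert(val1 >= 0 && val1 < MAX_D);
    assert(val2 >= 0 && val2 < MAX_D);
    allowed[val1][val2] = 0;
}

/* Count the allowed pairs among the first d values of each variable. */
int count_allowed(int d)
{
    int a, b, n = 0;

    for (a = 0; a < d; ++a)
        for (b = 0; b < d; ++b)
            n += allowed[a][b];
    return n;
}
```

For a constraint with domain size 5 and 3 disallowed pairs, count_allowed(5) returns 22, which is 5*5 minus the tightness.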

## Verification

If you implement urbcsp on your computer, necessarily making minor modifications, how do you know that your program is generating the same series as everyone else's? A validation suite is required; for now, note that the command urbcsp 100 10 10 10 100 100 will generate 100 (very easy) CSPs and the last instance will have these constraints:
```
Instance 99
17  19: (6 6) (2 0) (3 8) (5 7) (9 6) (2 7) (5 6) (8 2) (9 9) (4 8)
57  94: (0 3) (0 1) (8 0) (2 2) (7 6) (9 1) (8 4) (3 0) (9 2) (8 8)
10  28: (5 0) (4 6) (9 2) (8 2) (1 2) (3 5) (4 8) (1 1) (3 3) (4 0)
1  90: (1 4) (4 5) (5 3) (7 8) (7 2) (7 1) (0 0) (0 4) (0 5) (1 9)
55  64: (2 0) (5 9) (0 8) (0 2) (9 0) (5 1) (5 4) (2 7) (1 6) (5 0)
9  32: (0 5) (1 1) (6 3) (1 8) (2 4) (5 6) (3 5) (2 8) (9 9) (5 3)
3  12: (0 7) (3 6) (8 8) (0 8) (6 1) (1 4) (2 0) (3 2) (4 1) (3 0)
52  69: (5 5) (7 8) (8 2) (1 8) (9 7) (9 2) (9 3) (3 1) (9 9) (4 8)
11  59: (9 0) (0 1) (8 7) (5 8) (7 4) (2 2) (2 1) (8 4) (9 8) (6 9)
14  44: (9 0) (2 4) (3 3) (5 0) (2 7) (1 4) (3 9) (9 6) (6 8) (7 0)
```
At a minimum, your program should duplicate this result.

## Other generators

We welcome other random instance generators. It would probably make sense for other problem generators to use the same pseudo-random number generator.

# The code

Here is the code for urbcsp.c (it is also available as a stand-alone file):
```
/* urbcsp.c -- generates uniform random binary constraint satisfaction problems
*/
#include <stdio.h>
#include <stdlib.h>   /* atoi, malloc, free */
#include <math.h>

/* function declarations */
int MakeURBCSP(int N, int D, int C, int T, long *Seed);
float ran2(long *idum);
void StartCSP(int N, int D, int instance);
void EndCSP(void);
void AddConstraint(int var1, int var2);
void AddNogood(int val1, int val2);

/*********************************************************************
This file has 5 parts:
0. This introduction.
1. A main() function, which can be used to demonstrate MakeURBCSP().
2. MakeURBCSP().
3. ran2(), a random number generator.
4. The four functions StartCSP(), AddConstraint(), AddNogood(), and
EndCSP(), which are called by MakeURBCSP().  The versions
of these functions given here print out each instance, listing
the incompatible value pairs of each constraint.  You will need
to replace these functions with versions that mesh with your
system and data structures.
*********************************************************************/

/*********************************************************************
1. A simple main() function which reads in command line parameters
and generates CSPs.
*********************************************************************/

int main(int argc, char* argv[])
{
    int N, D, C, T, I, i;
    long S;

    if (argc != 7)
    {
        printf("usage: urbcsp #vars #vals #constraints #nogoods seed "
               "instances\n");
        return 0;
    }

    N = atoi(argv[1]);
    D = atoi(argv[2]);
    C = atoi(argv[3]);
    T = atoi(argv[4]);
    S = atoi(argv[5]);
    I = atoi(argv[6]);

    /* Seed passed to ran2() must initially be negative. */
    if (S > 0)
        S = -S;

    for (i=0; i<I; ++i)
        if (!MakeURBCSP(N, D, C, T, &S))
            return 0;

    return 1;
}

/*********************************************************************
2. MakeURBCSP() creates a uniform binary constraint satisfaction
problem with a specified number of variables, domain size,
tightness, and number of constraints.  MakeURBCSP() calls
StartCSP(), AddConstraint(), AddNogood(), and EndCSP(), which
actually create the CSP (that is, build a data structure).  Feel
free to change the signatures of these functions.
Note that numbering starts from 0: the variables are numbered 0..N-1,
and the values are numbered 0..D-1.

INPUT PARAMETERS:
  N: number of variables
  D: size of each variable's domain
  C: number of constraints
  T: number of incompatible value pairs in each constraint
  Seed: a negative number means start a new sequence of
        pseudo-random numbers; a positive number means continue
        with the same sequence.  Seed is turned positive by ran2().
RETURN VALUE:
  Returns 0 if there is a problem; 1 for normal completion.
*********************************************************************/

int MakeURBCSP(int N, int D, int C, int T, long *Seed)
{
    int PossibleCTs, PossibleNGs;       /* CT means "constraint" */
    unsigned long *CTarray, *NGarray;   /* NG means "nogood pair" */
    unsigned long selectedCT, selectedNG;
    int i, c, r, t;
    int var1, var2;
    static int instance;

    /* Check for valid values of N, D, C, and T. */
    if (N < 2)
    {
        printf("MakeURBCSP: ***Illegal value for N: %d\n", N);
        return 0;
    }
    if (D < 2)
    {
        printf("MakeURBCSP: ***Illegal value for D: %d\n", D);
        return 0;
    }
    if (C < 0 || C > N * (N - 1) / 2)
    {
        printf("MakeURBCSP: ***Illegal value for C: %d\n", C);
        return 0;
    }
    if (T < 1 || T > ((D * D) - 1))
    {
        printf("MakeURBCSP: ***Illegal value for T: %d\n", T);
        return 0;
    }

    if (*Seed < 0)      /* starting a new sequence of random numbers */
        instance = 0;
    else
        ++instance;       /* increment static variable */

    StartCSP(N, D, instance);

    /* The program has to choose randomly and uniformly m values from
       n possibilities.  It uses the following logic for both constraints
       and nogood value pairs:
         1. Let t[] be an array of the n possibilities
         2. for i = 0 to m-1
         3.    r = random(i, n-1)    ; random() returns an int in [i,n-1]
         4.    swap t[i] and t[r]
         5. end-for
       At the end of the for loop, the elements from t[0] to t[m-1] are
       the m randomly selected elements.
    */

    /* Create an array for each possible binary constraint.  The element
       size is taken from sizeof rather than assumed to be 4 bytes. */
    PossibleCTs = N * (N - 1) / 2;
    CTarray = (unsigned long*) malloc(PossibleCTs * sizeof(unsigned long));

    /* Create an array for each possible value pair. */
    PossibleNGs = D * D;
    NGarray = (unsigned long*) malloc(PossibleNGs * sizeof(unsigned long));

    if (CTarray == NULL || NGarray == NULL)
    {
        printf("MakeURBCSP: ***Out of memory\n");
        free(CTarray);
        free(NGarray);
        return 0;
    }

    /* Initialize the CTarray.  Each entry has one var in the high two
       bytes, and the other in the low two bytes. */
    i=0;
    for (var1=0; var1<(N-1); ++var1)
        for (var2=var1+1; var2<N; ++var2)
            CTarray[i++] = ((unsigned long)var1 << 16) | (unsigned long)var2;

    /* Select C constraints. */
    for (c=0; c<C; ++c)
    {
        /* Choose a random number between c and PossibleCTs - 1, inclusive. */
        r =  c + (int) (ran2(Seed) * (PossibleCTs - c));

        /* Swap elements [c] and [r]. */
        selectedCT = CTarray[r];
        CTarray[r] = CTarray[c];
        CTarray[c] = selectedCT;

        /* Broadcast the constraint. */
        AddConstraint((int)(CTarray[c] >> 16), (int)(CTarray[c] & 0x0000FFFF));

        /* For each constraint, select T illegal value pairs. */

        /* Initialize the NGarray. */
        for (i=0; i<(D*D); ++i)
            NGarray[i] = i;

        /* Select T incompatible pairs. */
        for (t=0; t<T; ++t)
        {
            /* Choose a random number between t and PossibleNGs - 1, inclusive.*/
            r =  t + (int) (ran2(Seed) * (PossibleNGs - t));
            selectedNG = NGarray[r];
            NGarray[r] = NGarray[t];
            NGarray[t] = selectedNG;

            /* Broadcast the nogood value pair. */
            AddNogood((int)(NGarray[t] / D), (int)(NGarray[t] % D));
        }
    }

    EndCSP();
    free(CTarray);
    free(NGarray);
    return 1;
}

/*********************************************************************
3. This random number generator is from William H. Press, et al.,
_Numerical Recipes in C_, Second Ed. with corrections (1994),
p. 282.  This excellent book is available through the
WWW at http://nr.harvard.edu/nr/bookc.html.
The specific section concerning ran2, Section 7.1, is in
http://cfatab.harvard.edu/nr/bookc/c7-1.ps
*********************************************************************/

#define IM1   2147483563
#define IM2   2147483399
#define AM    (1.0/IM1)
#define IMM1  (IM1-1)
#define IA1   40014
#define IA2   40692
#define IQ1   53668
#define IQ2   52774
#define IR1   12211
#define IR2   3791
#define NTAB  32
#define NDIV  (1+IMM1/NTAB)
#define EPS   1.2e-7
#define RNMX  (1.0 - EPS)

/* ran2() - Return a random floating point value between 0.0 and
1.0 exclusive.  If idum is negative, a new series starts (and
idum is made positive so that subsequent calls using an unchanged
idum will continue in the same sequence). */

float ran2(long *idum)
{
    int j;
    long k;
    static long idum2 = 123456789;
    static long iy = 0;
    static long iv[NTAB];
    float temp;

    if (*idum <= 0) {                             /* initialize */
        if (-(*idum) < 1)                         /* prevent idum == 0 */
            *idum = 1;
        else
            *idum = -(*idum);                     /* make idum positive */
        idum2 = (*idum);
        for (j = NTAB + 7; j >= 0; j--) {         /* load the shuffle table */
            k = (*idum) / IQ1;
            *idum = IA1 * (*idum - k*IQ1) - k*IR1;
            if (*idum < 0)
                *idum += IM1;
            if (j < NTAB)
                iv[j] = *idum;
        }
        iy = iv[0];
    }

    k = (*idum) / IQ1;
    *idum = IA1 * (*idum - k*IQ1) - k*IR1;
    if (*idum < 0)
        *idum += IM1;
    k = idum2/IQ2;
    idum2 = IA2 * (idum2 - k*IQ2) - k*IR2;
    if (idum2 < 0)
        idum2 += IM2;
    j = iy / NDIV;
    iy = iv[j] - idum2;
    iv[j] = *idum;
    if (iy < 1)
        iy += IMM1;
    if ((temp = AM * iy) > RNMX)
        return RNMX;                              /* avoid endpoint */
    else
        return temp;
}

/*********************************************************************
4. An implementation of StartCSP, AddConstraint, AddNogood, and EndCSP
which prints out the CSP, just listing incompatible value pairs.
Each constraint starts on a new line, and the id-numbers of the
variables appear before the colon.  For instance, the output of
urbcsp 10 5 4 3 9999 10
begins
Instance 0
8   9: (1 1) (4 0) (0 4)
2   4: (0 3) (3 1) (4 0)
6   9: (4 1) (2 0) (0 3)
1   5: (0 3) (4 0) (0 0)
*********************************************************************/

void StartCSP(int N, int D, int instance)
{
    printf("\nInstance %d", instance);
}

void AddConstraint(int var1, int var2)
{
    printf("\n%3d %3d: ", var1, var2);
}

void AddNogood(int val1, int val2)
{
    printf("(%d %d) ", val1, val2);
}

void EndCSP(void)
{
    printf("\n");
}

```