Time safety and Rust

Recently I have had the great fortune to work on this ticket . This was an issue that stemmed from an attempt to make clock performance faster. Previously, a call to time or clock_gettime would involve a context switch an a system call (think solaris etc). On linux we have VDSO instead, so we can easily just swap to the use of raw time calls.

The problem

So what was the problem? And how did the engineers of the past try and solve it?

DS heavily relies on time. As a result, we call time() a lot in the codebase. But this would mean context switches.

So a wrapper was made called “current_time()”, which would cache a recent output of time(), and then provide that to the caller instead of making the costly context switch. So the code had the following:

static time_t   currenttime;
static int      currenttime_set = 0;

time_t
poll_current_time()
{
    if ( !currenttime_set ) {
        currenttime_set = 1;
    }

    time( &currenttime );
    return( currenttime );
}

time_t
current_time( void )
{
    if ( currenttime_set ) {
        return( currenttime );
    } else {
        return( time( (time_t *)0 ));
    }
}

In another thread, we would poll this every second to update the currenttime value:

void *
time_thread(void *nothing __attribute__((unused)))
{
    PRIntervalTime    interval;

    interval = PR_SecondsToInterval(1);

    while(!time_shutdown) {
        poll_current_time();
        csngen_update_time ();
        DS_Sleep(interval);
    }

    /*NOTREACHED*/
    return(NULL);
}

So what is the problem here

Besides the fact that we may not poll accurately (meaning we miss seconds but always advance), this is not thread safe. The reason is that CPU’s have register and buffers that may cache both stores and writes until a series of other operations (barriers + atomics) occur to flush back out to cache. This means the time polling thread could update the clock and unless the POLLING thread issues a lock or a barrier+atomic, there is no guarantee the new value of currenttime will be seen in any other thread. This means that the only way this worked was by luck, and no one noticing that time would jump about or often just be wrong.

Clearly this is a broken design, but this is C - we can do anything.

What if this was Rust?

Rust touts mulithread safety high on it’s list. So lets try and recreate this in rust.

First, the exact same way:

use std::time::{SystemTime, Duration};
use std::thread;


static mut currenttime: Option<SystemTime> = None;

fn read_thread() {
    let interval = Duration::from_secs(1);

    for x in 0..10 {
        thread::sleep(interval);
        let c_time = currenttime.unwrap();
        println!("reading time {:?}", c_time);
    }
}

fn poll_thread() {
    let interval = Duration::from_secs(1);

    for x in 0..10 {
        currenttime = Some(SystemTime::now());
        println!("polling time");
        thread::sleep(interval);
    }
}

fn main() {
    let poll = thread::spawn(poll_thread);
    let read = thread::spawn(read_thread);
    read.join().unwrap();
    poll.join().unwrap();
}

Rust will not compile this code.

> rustc clock.rs
error[E0133]: use of mutable static requires unsafe function or block
  --> clock.rs:13:22
   |
13 |         let c_time = currenttime.unwrap();
   |                      ^^^^^^^^^^^ use of mutable static

error[E0133]: use of mutable static requires unsafe function or block
  --> clock.rs:22:9
   |
22 |         currenttime = Some(SystemTime::now());
   |         ^^^^^^^^^^^ use of mutable static

error: aborting due to 2 previous errors

Rust has told us that this action is unsafe, and that we shouldn’t be modifying a global static like this.

This alone is a great reason and demonstration of why we need a language like Rust instead of C - the compiler can tell us when actions are dangerous at compile time, rather than being allowed to sit in production code for years.

For bonus marks, because Rust is stricter about types than C, we don’t have issues like:

int c_time = time();

Which is a 2038 problem in the making :)

indexed search performance for ds - the mystery of the and query

Directory Server is heavily based on set mathematics - one of the few topics I enjoyed during university. Our filters really boil down to set queries:

&((attr=val1)(attr=val2))

This filter describes the intersection of sets of objects containing “attr=val1” and “attr=val2”.

One of the properties of sets is that operations on them are commutative - the sets to a union or intersection may be supplied in any order with the same results. As a result, these are equivalent:

&(a)(b)(c)
&(b)(a)(c)
&(c)(b)(a)
&(c)(a)(b)
...

In the past I noticed an odd behaviour: that the order of filter terms in an ldapsearch query would drastically change the performance of the search. For example:

&(a)(b)(c)
&(c)(b)(a)

The later query may significantly outperform the former - but 10% or greater. I have never understood the reason why though. I toyed with ideas of re-arranging queries in the optimise step to put the terms in a better order, but I didn’t know what factors affected this behaviour.

Over time I realised that if you put the “more specific” filters first over the general filters, you would see a performance increase.

What was going on?

Recently I was asked to investigate a full table scan issue with range queries. This led me into an exploration of our search internals, and yielded the answer to the issue above.

Inside of directory server, our indexes are maintained as “pre-baked” searches. Rather than trying to search every object to see if a filter matches, our indexes contain a list of entries that match a term. For example:

uid=mark: 1, 2
uid=william: 3
uid=noriko: 4

From each indexed term we construct an IDList, which is the set of entries matching some term.

On a complex query we would need to intersect these. So the algorithm would iteratively apply this:

t1 = (a, b)
t2 = (c, t1)
t3 = (d, t2)
...

In addition, the intersection would allocate a new IDList to insert the results into.

What would happen is that if your first terms were large, we would allocate large IDLists, and do many copies into it. This would also affect later filters as we would need to check large ID spaces to perform the final intersection.

In the above example, consider a, b, c all have 10,000 candidates. This would mean t1, t2 is at least 10,000 IDs, and we need to do at least 20,000 comparisons. If d were only 3 candidates, this means that we then throw away the majority of work and allocations when we get to t3 = (d, t2).

What is the fix?

We now wrap each term in an idl_set processing api. When we get the IDList from each AVA, we insert it to the idl_set. This tracks the “minimum” IDList, and begins our intersection from the smallest matching IDList. This means that we have the quickest reduction in set size, and results in the smallest possible IDList allocation for the results. In my tests I have seen up to 10% improvement on complex queries.

For the example above, this means we procees d first, to reduce t1 to the smallest possible candidate set we can.

t1 = (d, a)
t2 = (b, t1)
t3 = (c, t2)
...

This means to create t2, t3, we will do an allocation that is bounded by the size of d (aka 3, rather than 10,000), we only need to perform fewer queries to reach this point.

A benefit of this strategy is that it means if on the first operation we find t1 is empty set, we can return immediately because no other intersection will have an impact on the operation.

What is next?

I still have not improved union performance - this is still somewhat affected by the ordering of terms in a filter. However, I have a number of ideas related to either bitmask indexes or disjoin set structures that can be used to improve this performance.

Stay tuned ....

TLS Authentication and FreeRADIUS

In a push to try and limit the amount of passwords sent on my network, I’m changing my wireless to use TLS certificates for authentication.

Read more...

Kerberos - why the world moved on

For a long time I have tried to integrate and improve authentication technologies in my own environments. I have advocated the use of GSSAPI, IPA, AD, and others. However, the more I have learnt, the further I have seen the world moving away. I want to explore some of my personal experiences and views as to why this occured, and what we can do.

Read more...

Custom OSTree images

Project Atomic is in my view, one of the most promising changes to come to linux distributions in a long time. It boasts the ability to atomicupgrade and alter your OS by maintaining A/B roots of the filesystem. It is currently focused on docker and k8s runtimes, but we can use atomic in other locations.

Read more...