Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
840 views
in Technique[技术] by (71.8m points)

rust - Is it safe to logically split a borrow to work around a limitation of the NLL-enabled borrow-checker?

The following code involves a very subtle bit of borrow checker dodging. The code itself describes the reasoning. The questions:

  • Is this actually safe?
  • Is this the recommended way to express the unsafe operations performed? Should I use pointers instead?
  • Will the new Polonius borrow checker be able to reason about patterns like this?
/// Insert a new data element at a given key.
pub fn insert<'a, K: Eq, V>(this: &'a mut Vec<(K, V)>, key: K, val: V) -> &'a mut V {
    // Safety: As indicated below, we would like to return val1 directly in the loop,
    // but rust will reject this, claiming a double borrow, and we instead use some
    // unsafe hacks to circumvent the borrow checker. To show this is safe, consider
    // two cases.
    // - If the return is exercised (we found an element and early out):
    //   - let 'b be 'a (the borrow of self),
    //   - and let 'c be empty
    // - Otherwise (we did not find an element, exit the loop and terminate normally):
    //   - let 'b be the duration of the loop,
    //   - and let 'c be from the end of the loop until the end of 'a
    // In either case, 'b and 'c are disjoint, so the "double borrow" is safe.
    // The borrow checker reasons that 'b has to be at least 'a because it is returned,
    // and therefore it overlaps with 'c, but these happen in mutually exclusive
    // situations.
    for (key1, val1) in & /* 'b */ mut *this {
        if key == *key1 {
            // return val1; // we would like to write this
            return unsafe { // safety, see above. We know we are in the first case, so 'b = 'a
                std::mem::transmute::<&/* 'b */ mut V, &/* 'a */ mut V>(val1)
            }
        }
    }
    let this = & /* 'c */ mut *this;
    this.push((key, val));
    &mut this.last_mut().unwrap().1
}

This is what I'd prefer to write:

/// Insert a new data element at a given key.
pub fn insert<'a, K: Eq, V>(this: &'a mut Vec<(K, V)>, key: K, val: V) -> &'a mut V {
    for (key1, val1) in &mut *this {
        if key == *key1 {
            return val1;
        }
    }
    let this = &mut *this;
    this.push((key, val));
    &mut this.last_mut().unwrap().1
}

but it fails:

error[E0499]: cannot borrow `*this` as mutable more than once at a time
 --> src/lib.rs:8:16
  |
2 | pub fn insert<'a, K: Eq, V>(this: &'a mut Vec<(K, V)>, key: K, val: V) -> &'a mut V {
  |               -- lifetime `'a` defined here
3 |     for (key1, val1) in &mut *this {
  |                         ---------- first mutable borrow occurs here
4 |         if key == *key1 {
5 |             return val1;
  |                    ---- returning this value requires that `*this` is borrowed for `'a`
...
8 |     let this = &mut *this;
  |                ^^^^^^^^^^ second mutable borrow occurs here
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

As stated in the related questions, the easiest thing to do is to use an index instead as it requires no unsafe code. I might write it like this:

pub fn insert<'a, K: Eq, V>(this: &'a mut Vec<(K, V)>, key: K, val: V) -> &'a mut V {
    let idx = this
        .iter()
        .enumerate()
        .find_map(|(i, (k, _))| if key == *k { Some(i) } else { None });

    let idx = idx.unwrap_or_else(|| {
        this.push((key, val));
        this.len() - 1
    });

    &mut this[idx].1
}

You should perform benchmarking to know if this is not fast enough for some reason. Only in that case should you opt in to unsafe code to get the last bit of speed. You should then benchmark again to see if the code is measurably faster.

For example, you might be able to get the speedup by using slice::get_unchecked_mut instead of &mut this[idx].1, which is a much easier bit of unsafe code to rationalize.

The nice thing about using indices in our safe code is that they directly translate into pointer offset logic. We can take this safe example and make minimal modifications to it to get a version using unsafe code:

pub fn insert<'a, K: Eq, V>(this: &'a mut Vec<(K, V)>, key: K, val: V) -> &'a mut V {
    // I copied this code from Stack Overflow without reading the surrounding
    // text which explained why this code is or is not safe.
    unsafe {
        let found = this
            .iter_mut()
            .find_map(|(k, v)| if key == *k { Some(v as *mut V) } else { None });

        match found {
            Some(v) => &mut *v,
            None => {
                this.push((key, val));
                &mut this.last_mut().unwrap().1
            }
        }
    }
}

The main points of safety revolve around the pointer to the value in found. It started as a mutable reference, so we know that it was valid when we were iterating. We know that find_map stops iterating on the first Some, and we know that iterating using iter_mut() shouldn't change our values anyway. Since this cannot change between the binding of found and the usage of it in the match, I believe that this piece of code is safe.

It's always valuable to exercise your code through Miri. You must actually exercise the code, as Miri only flags code that causes undefined behavior, ignoring any dormant code paths. This code is Miri-clean:

fn main() {
    let mut things = vec![(1, 2), (3, 4)];

    let v = insert(&mut things, 1, 2);
    println!("{} ({:p})", v, v);

    let v = insert(&mut things, 1, 2);
    println!("{} ({:p})", v, v);

    let v = insert(&mut things, 5, 6);
    println!("{} ({:p})", v, v);

    let v = insert(&mut things, 5, 6);
    println!("{} ({:p})", v, v);
}
2 (0x2829c)
2 (0x2829c)
6 (0x41054)
6 (0x41054)

Is [the original implementation] actually safe?

Miri reports no issues for the same test code I used above, and I don't see anything obviously wrong.

Is this the recommended way to express the unsafe operations performed? Should I use pointers instead?

When it's possible to avoid mem::transmute, it generally should be avoided. transmute is The Big Hammer and can do quite a lot of things that you might not intend (changing types is a key one). Using pointers feels much simpler in this case.

I agree with the usage of a comment to demonstrate why the unsafe code is safe. Even if it's wrong it still demonstrates the mindset of the original author. A future reviewer may be able to say "ah, they didn't think about checklist item #42, let me test that!".

Specifically for the reasoning in your comment, it's overly dense / academic to me. I don't see why there's talk about multiple lifetimes or double borrows.

Will the new Polonius borrow checker be able to reason about patterns like this?

Yes:

% cargo +nightly rustc --
   Compiling example v0.1.0 (/private/tmp/example)
error[E0499]: cannot borrow `*this` as mutable more than once at a time
 --> src/main.rs:8:16
  |
2 | pub fn insert<'a, K: Eq, V>(this: &'a mut Vec<(K, V)>, key: K, val: V) -> &'a mut V {
  |               -- lifetime `'a` defined here
3 |     for (key1, val1) in &mut *this {
  |                         ---------- first mutable borrow occurs here
4 |         if key == *key1 {
5 |             return val1;
  |                    ---- returning this value requires that `*this` is borrowed for `'a`
...
8 |     let this = &mut *this;
  |                ^^^^^^^^^^ second mutable borrow occurs here

% cargo +nightly rustc -- -Zpolonius
   Compiling example v0.1.0 (/private/tmp/example)
    Finished dev [unoptimized + debuginfo] target(s) in 0.86s

% ./target/debug/example
2 (0x7f97ea405b24)
2 (0x7f97ea405b24)
6 (0x7f97ea405ba4)
6 (0x7f97ea405ba4)

See also:


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...