c++ - std::atomic with multiple threads - can accesses happen simultaneously? How fine-grained is it? - Stack Overflow

admin2025-04-17  3

I can't understand from the documentation what happens if many threads access a std::atomic for read/read or write/read:

std::atomic<int> counter = 0;

void fn()
{
    while(true)
    {
        ++counter;

        if(counter > 1000)
            break;
    }
}

This fn() will be executed in two threads. Thread 1 executes ++counter; and thread 2, at the same time, wants to read counter here: if (counter > 1000).

Will thread 2 wait to read counter until thread 1 finishes incrementing?

What if both threads want to read counter inside if (counter > 1000)? Will they read at the same time, or will one thread lock the variable, read it, and release it before the other thread can read?

asked Jan 30 at 16:57 by vovavova, edited Jan 31 at 2:07 by Peter Cordes
  • 1 Atomic is atomic: a read-modify-write happens all at once. No wait or lock. – 3CxEZiVlQ Commented Jan 30 at 17:01
  • 1 This might help you out: amazon.com/dp/1933988770 – NathanOliver Commented Jan 30 at 17:04
  • @3CxEZiVlQ Waiting definitely can happen, especially if is_lock_free() == false. And even for a lock-free atomic it still takes some time to propagate a change from the caches of one processor core to the caches of others. – Yksisarvinen Commented Jan 30 at 17:05
  • @Yksisarvinen I doubt that this basic example targets some specific hardware where std::atomic<int>::is_always_lock_free is false. And the rest of what you describe is hardware detail. That could be said about any hardware instruction, so it isn't worth mentioning for atomics specifically. – 3CxEZiVlQ Commented Jan 30 at 17:12
  • If counter were not atomic, then counter++ would already be undefined behavior when called from 2 threads. Beyond that, std::atomic doesn't do any more magic. – 463035818_is_not_an_ai Commented Jan 30 at 17:28

2 Answers

Thread 1 executes ++counter; and thread 2, at the same time, wants to read counter here: if (counter > 1000). Will thread 2 wait to read counter until thread 1 has finished incrementing?

It sounds like you're asking whether ++counter and the subsequent if (counter > 1000) are atomic as a whole, and the answer is no. The only guarantee is that ++counter is thread-safe and reading counter later is thread-safe. This means that between thread 2 doing ++counter and thread 2 reading counter, thread 1 could modify counter. Neither thread waits for anything.

Also note that you can easily improve the code as follows:

if (++counter > 1000)

This atomically increments counter and obtains the updated value. See also std::atomic::operator++ on cppreference.
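As a rough sketch of how the combined form behaves, here is the loop from the question rewritten around if (++counter > limit). The helper name run_combined and its parameters are made up for illustration, and the counter is a local variable rather than a global so the function can be called more than once.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper (not from the question): runs the loop in n_threads
// threads and returns the final value of the counter.
int run_combined(int n_threads, int limit) {
    assert(n_threads > 0);
    std::atomic<int> counter{0};

    auto fn = [&] {
        while (true) {
            // One atomic RMW: the comparison uses the value produced by
            // this thread's own increment, not a later separate load.
            if (++counter > limit)
                break;
        }
    };

    std::vector<std::thread> threads;
    for (int i = 0; i < n_threads; ++i)
        threads.emplace_back(fn);
    for (auto& t : threads)
        t.join();

    return counter.load();
}
```

Because every increment returns a distinct value and each thread leaves the loop on the first value above limit that it receives, run_combined(2, 1000) ends at 1002. With the separate read from the question, the final value can be either 1001 or 1002, depending on timing.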

However, if you're asking whether thread 2 could read counter between thread 1 obtaining the value of counter during ++counter and writing the incremented value, then no. The increment is atomic, as the name states, and from other threads' perspective it happens instantly, as a whole. This atomicity could be ensured by using locks, or through an atomic instruction like lock add (see also How "lock add" is implemented on x86 processors).

What if both threads want to read counter inside if (counter > 1000)? Will they read at the same time, or will one thread lock the variable, read it, and release it before the other thread can read?

Atomics aren't guaranteed to use locks at all, and you can test for this with std::atomic::is_lock_free. If a lock is used, ++counter would lock once, and the subsequent read of counter would lock separately. Your code might just be a bit slower if a lock is involved, but that doesn't change anything about its behavior, and it's definitely thread-safe.

However, it is fair to say that if the atomic uses locks, then the threads definitely won't "read at the same time".
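For reference, a small sketch of how the lock-free check could be written. The helper name counter_is_lock_free and the constant int_always_lock_free are invented for this example, and is_always_lock_free assumes C++17 or later.

```cpp
#include <atomic>
#include <cassert>

// Compile-time answer (C++17): true only if std::atomic<int> is lock-free
// for every object on this target.
constexpr bool int_always_lock_free = std::atomic<int>::is_always_lock_free;

// Hypothetical helper: run-time answer for one particular object.
bool counter_is_lock_free() {
    std::atomic<int> counter{0};
    bool lock_free = counter.is_lock_free();
    // If the type is always lock-free, every object must report lock-free.
    assert(!int_always_lock_free || lock_free);
    return lock_free;
}
```

On mainstream desktop and server targets both are expected to be true for int, but that is a property of the platform, not something the standard guarantees.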

Assuming you compile this program for a normal mainstream CPU with a normal compiler, counter will be lock-free so the behaviour just depends on MESI hardware cache-coherency requests for ownership (E or M state) of the cache line containing it.

Inter-core latency depends on the hardware; often in the 30 to 100 nanosecond timescale. If the hardware lets one core keep ownership of the cache line for long enough to do multiple increments, they'll happen faster, like a few nanoseconds (e.g. about 20 clock cycles for back-to-back atomic RMWs without contention on a modern Intel CPU; https://uops.info/ - look for lock add or lock inc).

You load separately instead of using the ++ return value, so you're also effectively doing counter.load(memory_order_seq_cst) via the implicit conversion operator (operator int) of std::atomic<int>. Read access contends with writes, but multiple readers can have the cache line in Shared state simultaneously.

See also Is incrementing an int effectively atomic in specific cases? for more about atomic RMW on real CPUs, and about pure-loads and pure-stores.

Even if counter wasn't lock-free, each operation (++ and load) would be a separate lock/unlock. (Where is the lock for a std::atomic?).

Since you use the default memory_order_seq_cst (rather than, for example, counter.fetch_add(1, std::memory_order_relaxed)), and/or because there's only one shared variable, counter.is_always_lock_free == false wouldn't change much about the possible behaviours of your program. (Locking would be slower, though, especially with the unnecessary separate read of counter.)
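To illustrate the relaxed fetch_add mentioned above, here is a minimal sketch. The helper name relaxed_total and its parameter are assumptions for this example only. Relaxed ordering drops the ordering guarantees relative to other memory operations, but the increment itself is still an atomic read-modify-write, so no increments are lost.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: two threads each perform per_thread relaxed
// increments; returns the final total.
int relaxed_total(int per_thread) {
    assert(per_thread >= 0);
    std::atomic<int> counter{0};

    auto work = [&] {
        for (int i = 0; i < per_thread; ++i)
            counter.fetch_add(1, std::memory_order_relaxed); // still atomic
    };

    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();

    // join() synchronizes with the end of each thread, so a relaxed load
    // is enough to see every increment here.
    return counter.load(std::memory_order_relaxed);
}
```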

Separately loading an atomic variable after incrementing it is often a correctness problem, e.g. if you want each number between 0 and 1000 to be observed exactly once across the threads doing increments. In this case you aren't doing anything besides checking for an exit condition. (So taking a lock and letting one thread do all the increments while the other waits would be much faster than ping-ponging the cache line back and forth, because your example doesn't do anything with the counter value.)
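A sketch of the "each number observed exactly once" case, assuming the goal is for the threads to claim the values 0 to limit - 1 between them. The helper name each_value_claimed_once and the claims vector are invented for illustration. The point is that the value returned by fetch_add is unique to that call, whereas a separate load afterwards could return a value another thread also sees.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: returns true if every value in [0, limit) was
// claimed by exactly one fetch_add call across all threads.
bool each_value_claimed_once(int n_threads, int limit) {
    assert(n_threads > 0 && limit >= 0);
    std::atomic<int> counter{0};
    std::vector<int> claims(limit, 0); // claims[v] counts claims of value v

    auto fn = [&] {
        while (true) {
            int mine = counter.fetch_add(1); // old value, unique to this call
            if (mine >= limit)
                break;
            ++claims[mine]; // no data race: no other thread received `mine`
        }
    };

    std::vector<std::thread> threads;
    for (int i = 0; i < n_threads; ++i)
        threads.emplace_back(fn);
    for (auto& t : threads)
        t.join();

    for (int c : claims)
        if (c != 1)
            return false;
    return true;
}
```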
