CPU architecture - How does the load-store queue work in the presence of MSHRs?

Question

posted Oct 7, 2021 in Technique by 深蓝 (71.8m points)

I understand the basic operation of a load-store queue (LSQ):

When a load computes its address, it checks the store queue for prior stores to the same address. If there is one, the load gets its data from the most recent such store; otherwise it gets the data from the write buffer or the data cache.
When a store computes its address, it checks the load queue for load violations.
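
The load-side check described above can be sketched in Python. This is only a toy model under stated assumptions: `StoreEntry`, `lookup_load`, and the `seq` program-order number are hypothetical names, addresses must match exactly (no partial overlap), and a matching store is assumed to already have its data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoreEntry:
    seq: int                    # program order; smaller means older (assumption)
    addr: Optional[int]         # None until the store's address is computed
    data: Optional[int] = None


def lookup_load(store_queue, load_seq, load_addr):
    """Return (source, value, speculative) for a load that just got its address.

    source is "forward" when an older store to the same address exists,
    otherwise "cache". speculative is True when an unresolved older store
    could still turn out to alias with this load.
    """
    older = [s for s in store_queue if s.seq < load_seq]
    matches = [s for s in older if s.addr == load_addr]
    # The most recent prior store to the same address wins.
    fwd = max(matches, key=lambda s: s.seq) if matches else None
    floor = fwd.seq if fwd is not None else -1
    # Only unresolved stores younger than the forwarding store matter.
    speculative = any(s.addr is None and s.seq > floor for s in older)
    if fwd is not None:
        return ("forward", fwd.data, speculative)
    return ("cache", None, speculative)
```

With two resolved stores to `0x100` and one unresolved store after them, a younger load to `0x100` forwards from the second store but stays speculative until the third store's address is known.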

My doubts are about what happens in these two cases:

1. A load accesses the data cache while some older store addresses in the store queue are still unresolved, and the access misses in the L1 data cache. Before the data comes back, the store's address resolves, and the store checks the load queue for violations. The dependent load has already accessed the data cache but has not yet received its value because of the long-latency miss. Does the store flag a load violation, or does it do store-to-load forwarding and cancel the data coming back from the cache?
2. When a load misses in the L1 data cache, it is placed in an MSHR so that it does not block the execute stage. When the miss resolves, the MSHR entry for that load holds the destination register and the physical address, so the value can be written to the physical register. But how does the MSHR tell the load queue that the value is available, and at which pipeline stage does this happen? I have read somewhere that MSHRs store physical addresses while the load-store queue stores virtual addresses, so how does the MSHR communicate with the LSQ?

I haven't found any resources regarding these doubts.

1 Reply

深蓝 · Answer 1 · 2021-10-06T19:29:31+0000

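For the first case: this is speculative execution, where loads bypass older stores whose addresses are not yet known. When the older store's address resolves and matches, we can flag a load violation. As long as the probability of address aliasing is low, which is typically true for programs, speculating is profitable (higher throughput). On detecting a load violation, we can take the appropriate step: (a) store-to-load forward, or (b) roll back the pipeline to the resolved store. In the case you describe the load has not received its data yet, so no dependent instruction has consumed a wrong value, which is what makes (a) possible.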
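
To make the two options concrete, here is a rough Python sketch of what a resolving store might do when it scans the load queue. It is a simplified model, not a description of any particular core: `LoadEntry`, `resolve_store`, and the `ignore_fill` flag are hypothetical names, and addresses are compared for exact equality.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoadEntry:
    seq: int                     # program order; smaller means older (assumption)
    addr: int
    issued: bool = False         # already searched the store queue / cache
    completed: bool = False      # value written back; dependents may have used it
    source_seq: int = -1         # seq of the store it forwarded from, -1 = cache
    value: Optional[int] = None
    ignore_fill: bool = False    # drop the cache data when the miss returns


def resolve_store(load_queue, store_seq, store_addr, store_data):
    """Scan younger loads once a store's address and data are known.

    Returns the seq of the oldest load that must be squashed and replayed,
    or None when every conflict could be repaired by forwarding.
    """
    for ld in sorted(load_queue, key=lambda l: l.seq):
        if ld.seq < store_seq or not ld.issued or ld.addr != store_addr:
            continue                       # older load, not issued, or no alias
        if ld.source_seq > store_seq:
            continue                       # already fed by a younger store
        if ld.completed:
            return ld.seq                  # option (b): wrong value escaped
        # Option (a): the load is still waiting on its miss, so patch it
        # with the store data and ignore the line when it arrives.
        ld.value = store_data
        ld.source_seq = store_seq
        ld.completed = True
        ld.ignore_fill = True
    return None
```

In this sketch a load that is still waiting on its miss is repaired in place, while a load that already wrote back a stale value forces a replay starting from that load.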
For the second case: it works the same way as when loads are served by cache hits (an L1 hit can itself take 1-3 cycles). For example, in a design with reservation stations and a CDB (common data bus), the result is broadcast to all hardware structures that need it.
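
One way to picture this is the small Python model below. It assumes each MSHR target records the load-queue index and destination tag of the waiting load, so the fill needs no address comparison with the LSQ at all. `MSHRFile`, its methods, the dictionary-based load queue, and the byte-sized loads are all hypothetical simplifications.

```python
from collections import defaultdict


class MSHRFile:
    """Toy miss-status holding registers, keyed by cache-line address."""

    def __init__(self, line_bytes=64):
        self.line_bytes = line_bytes
        # line address -> list of (load-queue index, dest tag, byte offset)
        self.waiters = defaultdict(list)

    def allocate(self, paddr, lq_index, dest_tag):
        """Record a missing load. Returns True for a primary miss, i.e. the
        first miss to this line, which would send a request to the next level."""
        line = paddr // self.line_bytes
        primary = line not in self.waiters
        self.waiters[line].append((lq_index, dest_tag, paddr % self.line_bytes))
        return primary

    def fill(self, line, line_data, load_queue, regfile):
        """The line arrived: wake every waiting load by its recorded tag."""
        for lq_index, dest_tag, offset in self.waiters.pop(line, []):
            entry = load_queue[lq_index]
            if entry.get("ignore_fill"):
                continue                   # load was already fed by a store
            regfile[dest_tag] = line_data[offset]   # byte load, for brevity
            entry["done"] = True           # load queue now sees completion
```

Because the waiting load is identified by index and tag rather than by address, it does not matter in this model whether the MSHR holds physical addresses and the LSQ holds virtual ones.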