Questions from the Compute Express Link™ (CXL™): Supporting Persistent Memory Webinar
By Mahesh Natu and Thomas Won Ha Choi
The recent webinar on “Compute Express Link™ (CXL™): Supporting Persistent Memory” explored how the CXL specification has evolved to support persistent memory devices and the established software model. The webinar also covered how enhancements to the CXL protocol, error handling, and a standardized configuration interface enable innovative designs based on a variety of non-volatile media and form factors. If you missed the live event, the webinar recording is available on BrightTALK and YouTube. The presentation is also available for download here on the CXL Consortium’s website.
We received great questions during the Q&A portion of the webinar and have recapped the questions and answers below.
Q: Will this only work on a PCIe® interface? If so, how does a CXL device access persistent memory? As far as I understand, persistent memory uses a DDR interface, as with NVDIMM.
CXL uses the PCIe physical interface, connectors, and form factors, but it layers a different protocol on top. The CXL.mem protocol can be used to access persistent memory. Just as the DDR interface is used to access an NVDIMM today, CXL.mem allows access to persistent memory over the PCIe physical interface.
Q: Compared to the DDIMM solution, what are the pros and cons of the CXL-based memory solution, in terms of persistent memory?
DDIMM usually refers to a DRAM-based module. I believe that the CXL-based memory solution provides a lot more flexibility, and you will be able to reuse much of the existing PCIe infrastructure. On the downside, the latency optimizations available may vary from one situation to another.
Q: Persistent memory includes NVDIMMs as media, not just Intel’s Optane memory, correct?
Persistent memory can use different types of media and is not tied to specific media such as those used in NVDIMMs or Intel’s Optane. CXL persistent memory can combine different types of media and different controller designs, all attached to CXL, that ultimately provide persistent storage and the other characteristics mentioned in the presentation.
Q: Can you expand upon the architectural elements in the persistent memory configuration interface?
As mentioned during the presentation, a persistent memory device requires quite a bit of management and hand-holding from the system software. The interface defines a standard way for the OS connected to the device to manage its operation. The configuration interface is described in more detail in the CXL 2.0 specification.
Q: How does persistent memory compare to Single Logical Device (SLD)/Multiple Logical Devices (MLD)?
Single Logical Device and Multiple Logical Devices can have persistent memory. Therefore, a persistent memory device can be an SLD or a memory pool MLD that supports persistent memory. They can also work together to provide new system design options.
Q: What are typical latencies when persistent memory is used through a CXL switch when interleaving?
A lot of it depends on the individual switch design, the number of switch layers, and the latency each layer introduces. It also depends, to a great extent, on the persistent memory media, because the different media types used in persistent memory designs tend to have very different latency and bandwidth characteristics. It is hard to provide an exact number, but the CXL protocol is transactional in nature, so it can tolerate longer latency and more variability in latency. Generally, latency to persistent memory is higher than latency to volatile memory because the media latency is often high. However, moving persistent memory behind CXL, including the latency added by the switch, doesn’t seem to affect performance too much, and the result is comparable to today’s persistent memory designs.
Q: What is the standard register interface?
It is similar to the NVM Express register interface. The register interface that a CXL device implements helps software such as UEFI firmware and OS drivers manage the device using a standard mechanism. More details can be found in sections 8.2.8 and 8.2.9 of the CXL 2.0 specification.
Q: Is there open-source driver support (Linux inbox) for persistent memory? Is there a sample reference driver?
Yes, support for both the CXL bus interface and the memory interface is being added to Linux. There is a mailing list where you can track the progress and make contributions.
Q: What is the preferred form factor for CXL persistent memory?
CXL does not have a preferred form factor. A key benefit of CXL is that it defines the protocol and is form factor agnostic, which allows system vendors and device vendors to innovate and pick form factors that provide them with what they need.
Q: You mentioned Non-Fatal Errors but not Fatal Errors. Can you share what happens if a Fatal Error occurs?
If the device has Fatal Errors that are exposed to the system, the CXL specification defines a capability called “Viral” to handle them and provide error containment. A device that is experiencing a fatal error can communicate this to the rest of the system and, if it is able, the entire system can enter a mode in which it stops committing any further data to the persistent memory to prevent further corruption. For additional information, please download the “An Overview of Reliability, Availability, and Serviceability (RAS) in Compute Express Link™ 2.0” white paper.
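The containment idea described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class and method names (`ToySystem`, `report_fatal_error`, `commit`) are hypothetical and are not defined by the CXL specification, and real Viral handling happens in hardware and firmware rather than in application code.

```python
class ViralContainmentError(Exception):
    """Raised when a commit is attempted after a fatal error was reported."""


class ToySystem:
    """Hypothetical model: one flag gates all commits to persistent memory."""

    def __init__(self):
        self.viral = False      # set once a device reports a fatal error
        self.persistent = {}    # address -> data already committed

    def report_fatal_error(self):
        # A device signals the rest of the system that it hit a fatal error.
        self.viral = True

    def commit(self, address, data):
        # Once in viral mode, refuse further commits so that possibly
        # corrupted data never reaches the persistent media.
        if self.viral:
            raise ViralContainmentError("commits are blocked in viral mode")
        self.persistent[address] = data


system = ToySystem()
system.commit(0x1000, b"good data")
system.report_fatal_error()
try:
    system.commit(0x2000, b"suspect data")
except ViralContainmentError:
    blocked = True
else:
    blocked = False
```

In this sketch the data committed before the error stays intact, and nothing written after the error is persisted.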
Q: Is a Poison list similar to a bad block list that a host uses to create a list of unusable addresses?
Yes, that is correct. It is similar to a bad block list, except that the granularity is smaller than a block. In persistent memory, poison can be tracked per cache line.
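To make the granularity difference concrete, here is a minimal sketch of a host-side poison list keyed by cache line. The 64-byte cache line and 4096-byte block sizes are assumptions chosen for illustration, and `PoisonList` is a hypothetical helper, not an interface from the CXL specification or any driver.

```python
CACHE_LINE_BYTES = 64    # assumed cache-line size
BLOCK_BYTES = 4096       # assumed block size, for comparison only


class PoisonList:
    """Hypothetical record of poisoned locations, one entry per cache line."""

    def __init__(self):
        self._lines = set()  # cache-line indices that contain poison

    def add(self, address):
        # Any address inside a cache line marks the whole line as poisoned.
        self._lines.add(address // CACHE_LINE_BYTES)

    def is_poisoned(self, address):
        return address // CACHE_LINE_BYTES in self._lines

    def unusable_bytes(self):
        # Capacity lost when tracking at cache-line granularity.
        return len(self._lines) * CACHE_LINE_BYTES

    def unusable_bytes_if_block_granular(self):
        # Capacity that would be lost if whole blocks were retired instead.
        blocks = {line * CACHE_LINE_BYTES // BLOCK_BYTES for line in self._lines}
        return len(blocks) * BLOCK_BYTES


poison = PoisonList()
for addr in (0x1008, 0x1030, 0x2040):   # two of these share a cache line
    poison.add(addr)

print(poison.unusable_bytes())                    # 128
print(poison.unusable_bytes_if_block_granular())  # 8192
```

With these assumed sizes, three poisoned addresses cost two cache lines (128 bytes), whereas a block-granular bad block list would retire two whole blocks (8192 bytes).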
Q: Can you give an example of a poison that would affect persistent memory performance?
Usually, the poisoned location is unusable, and the host can decide how the poison will be handled. When poison is detected within the device, without the host being involved, some internal management may be needed to handle it. There may be a small performance impact, but we do not expect it to be significant. There may be more cases that have not yet been explored in implementations. This is a great question to revisit later, when different implementation examples can be shared.
Q: Can you explain what occurs during a dirty shutdown event?
A dirty shutdown can occur when the Global Persistent Flush (GPF) flow does not complete, for example because of a timeout or a power issue, so there may be multiple cases that lead to it.
Compute Express Link™ and CXL™ Consortium are trademarks of the Compute Express Link Consortium.