Cadence Supports NVMe
posted by Bryon Moyer
Last year, a new standard was layered on top of PCI Express (PCIe) to reset the way non-volatile memory (NVM) is accessed. Until now, solid-state disk (SSD) access methods have been modeled on the mechanisms and limitations of “spinning media” (hard drives). As solid-state memories proliferate in roles that used to be dominated by hard drives, those limitations and mechanisms change.
The new standard that accomplishes this is called NVM Express (NVMe), and it uses the basics of PCIe to handle moving the data around, since that’s often how these memory subsystems are connected to the CPU subsystem. But the higher layer adapts PCIe to a specific NVM context.
The standard sets up submission and completion queues – up to 64K of them, each of which can hold up to 64K 64-byte commands. Features include:
- End-to-end data protection
- No uncacheable memory-mapped I/O register reads in either the submission or completion path
- No more than one memory-mapped I/O write to submit a command
- Queue priority and arbitration
- Ability to do a 4K-byte read in a single 64-byte command
- A small basic command set (Read, Write, Write Uncorrectable, Flush, Compare, Dataset Management)
- Support for interrupt aggregation (including message-signaled interrupts)
- Multiple namespaces – a device can be decoupled from a “volume”
- Support for I/O virtualization (like SR-IOV)
- Error reporting and management
- Ability to support low-power modes
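To put the queue limits quoted above in perspective, here is a quick back-of-the-envelope calculation. It assumes “64K” means 65,536 and looks only at the 64-byte commands; the constant and function names are made up for illustration.

```python
# Limits as quoted in the post; "64K" is assumed to mean 65,536.
MAX_QUEUES = 64 * 1024             # maximum number of queues
MAX_ENTRIES_PER_QUEUE = 64 * 1024  # maximum commands held by one queue
COMMAND_BYTES = 64                 # size of one command


def queue_bytes(entries: int = MAX_ENTRIES_PER_QUEUE) -> int:
    """Memory needed to hold `entries` 64-byte commands in one queue."""
    return entries * COMMAND_BYTES


def payload_to_command_ratio(payload_bytes: int = 4096) -> float:
    """How many times larger a data payload is than the command requesting it."""
    return payload_bytes / COMMAND_BYTES


if __name__ == "__main__":
    print(queue_bytes())               # bytes for one maximally deep queue
    print(payload_to_command_ratio())  # a 4K-byte read vs. its single command
```

So a single maximally deep queue of commands occupies 4 MiB, and a 4K-byte read moves 64 times as much data as the one command that requests it.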
There are register sets for:
- Declaring what a particular controller supports
- Device failure status
- Configuring an admin queue for managing I/O queues
- Doorbell registers for submission and completion queues
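The way the queues and doorbells fit together can be sketched with a toy model. This is a simplification for illustration, not the register layout or protocol defined by the standard: the class and field names are hypothetical, the "doorbell" is just a Python attribute, and the completion queue is a plain list rather than a ring.

```python
class QueuePair:
    """Toy model of one submission/completion queue pair (illustrative only)."""

    def __init__(self, depth: int):
        self.depth = depth
        self.sq = [None] * depth   # submission queue ring, filled by the host
        self.cq = []               # completion entries, filled by the controller
        self.sq_tail = 0           # host-side producer index
        self.sq_head = 0           # controller-side consumer index
        self.sq_tail_doorbell = 0  # stands in for the doorbell register
        self.doorbell_writes = 0   # counts simulated register writes

    def submit(self, command: dict) -> None:
        """Host side: place a command in the ring, then ring the doorbell once."""
        next_tail = (self.sq_tail + 1) % self.depth
        if next_tail == self.sq_head:
            raise RuntimeError("submission queue full")
        self.sq[self.sq_tail] = command
        self.sq_tail = next_tail
        # The only "register access" on the submission path: one write.
        self.sq_tail_doorbell = self.sq_tail
        self.doorbell_writes += 1

    def controller_process(self) -> None:
        """Controller side: consume commands up to the doorbell value."""
        while self.sq_head != self.sq_tail_doorbell:
            cmd = self.sq[self.sq_head]
            self.cq.append({"cid": cmd["cid"], "status": 0})
            self.sq_head = (self.sq_head + 1) % self.depth

    def reap(self) -> list:
        """Host side: collect completions from memory, with no register reads."""
        done, self.cq = self.cq, []
        return done


if __name__ == "__main__":
    qp = QueuePair(depth=4)
    qp.submit({"cid": 1, "opcode": "read"})
    qp.submit({"cid": 2, "opcode": "write"})
    qp.controller_process()
    print(qp.reap(), qp.doorbell_writes)
```

The point of the sketch is the division of labor: the host writes commands into memory and touches a register only once per submission, while completions are picked up from memory without any register reads.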
Cadence just announced their NVMe IP offering, which is based on their existing PCIe IP; the NVMe layer is new, along with the firmware needed to support it. They’ve optimized the underlying PCIe implementation for this particular context, making the overall implementation smaller. They’ve merged the APIs up to the top level so that there is one interface regardless of which layer might be accessed by any given operation. They’ve also coordinated their DMAs for smoother operation and less contention.
They’ve hardware-accelerated the basic commands; the command set itself can be extended through the firmware.
The PCIe PHY is hard IP; the rest is RTL and firmware. They’ve got a tool for configuring the IP: an XML description of the desired configuration is fed to their implementation tools.
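To give a feel for what an XML-driven configuration flow looks like in general, here is a minimal sketch. The schema is entirely hypothetical: the element and attribute names (`nvme_config`, `pcie`, `queues`, `lanes`, and so on) are invented for illustration and are not Cadence’s format, which the announcement doesn’t spell out.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration file; every name in it is an assumption.
EXAMPLE_XML = """
<nvme_config>
  <pcie lanes="4" gen="3"/>
  <queues io_queue_pairs="8" queue_depth="1024"/>
</nvme_config>
"""


def load_config(xml_text: str) -> dict:
    """Parse the made-up schema above into a plain dictionary of integers."""
    root = ET.fromstring(xml_text.strip())
    pcie = root.find("pcie")
    queues = root.find("queues")
    return {
        "lanes": int(pcie.get("lanes")),
        "gen": int(pcie.get("gen")),
        "io_queue_pairs": int(queues.get("io_queue_pairs")),
        "queue_depth": int(queues.get("queue_depth")),
    }


if __name__ == "__main__":
    print(load_config(EXAMPLE_XML))
```

A downstream tool would then use values like these to decide what to generate; how Cadence’s tools actually do that isn’t covered here.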
You can find out more about Cadence’s NVMe IP in their announcement.
Graphene Quilts
posted by Bryon Moyer
A while back we looked at wide-bandgap materials like GaN for use in power devices. But along with power comes heat, and that heat has to be dissipated. GaN isn’t great for that; the old sapphire substrates were very bad, and newer (and more expensive) SiC substrates are better but still not sufficient, according to researchers at UC Riverside.
Metal is often used as a heat sink, but its effectiveness falls off in very thin films because the main heat carrier is the “mobile” electron. In graphene, however, the main heat carrier is the phonon (essentially, a vibration of the crystal lattice), and it remains an efficient carrier even in films just a few layers thick.
So they built transistors with “quilts” of graphene on the transistor drains themselves, connecting them to graphite heat sinks (although they say other heat sinks are possible too). Because the quilts are made of “few-layer graphene” (FLG), they can reach deep into the circuit without substantially disrupting the topology, which can make them more effective.
Of course, today, graphene is typically obtained by flaking graphite – something of an inexact science. They see graphene growth as a reality in the future, which would make the whole process a bit more deterministic.
You can find out more and get access to a paper published in Nature here…
Powering Up Power Analysis
posted by Bryon Moyer
Apache has just released their latest RedHawk version, RedHawk-3DX. In it they’ve focused on areas of growing importance for power: 3D ICs, working at the RTL level, and scaling up the size of sub-20-nm designs.
Power is of particular concern for 3D ICs because a “cube” of silicon is much harder to cool than a plane. And it’s not a monolithic cube; it’s a bunch of interconnected planes that can become detached if you’re not careful. Even the TSVs can be problematic.
They’ve allowed concurrent analysis of each die and the interconnects and TSVs, with the ability to view each piece separately to see where the hot spots and physical stresses are. And they’re not just calculating heat or power; they are determining physical stresses as well. They do this with models, not with full finite-element (FE) analysis, although the models themselves may be created through more accurate FE methods.
RTL-level analysis is important for debug reasons. Most analysis is now done at the gate level, but most designers have vectors only at the RTL level, not at the gate level. And if problems are found at the gate level, they’re hard to debug, since that’s not the level designers work at.
So they now have the logic propagation technology in place to support RTL-level analysis with vector inputs. Vectorless analysis is also possible at the RTL and global levels; this is where you specify approximate transition frequencies on pins, and then probabilities (instead of actual events) are propagated to perform the analysis.
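The idea of propagating probabilities instead of events can be illustrated with a textbook-style sketch. This is not Apache’s algorithm. It assumes all inputs are statistically independent, tracks only the probability that a signal is 1, and estimates switching activity as 2p(1-p), which is a common simplification. The function names are made up for the example.

```python
def p_not(p_a: float) -> float:
    """Probability that NOT a is 1."""
    return 1.0 - p_a


def p_and(p_a: float, p_b: float) -> float:
    """Probability that (a AND b) is 1, assuming a and b are independent."""
    return p_a * p_b


def p_or(p_a: float, p_b: float) -> float:
    """Probability that (a OR b) is 1, assuming a and b are independent."""
    return p_a + p_b - p_a * p_b


def switching_activity(p: float) -> float:
    """Estimated chance that a signal toggles between two consecutive cycles,
    assuming its value in each cycle is an independent draw with P(1) = p."""
    return 2.0 * p * (1.0 - p)


def analyze(p_a: float, p_b: float, p_c: float) -> dict:
    """Propagate probabilities through y = (a AND b) OR (NOT c)."""
    n1 = p_and(p_a, p_b)
    n2 = p_not(p_c)
    y = p_or(n1, n2)
    return {name: switching_activity(p) for name, p in
            {"n1": n1, "n2": n2, "y": y}.items()}


if __name__ == "__main__":
    # No vectors needed: just an estimate of how often each input is high.
    print(analyze(0.5, 0.5, 0.5))
```

Each net’s estimated activity could then be weighted by its capacitance to approximate dynamic power, without simulating a single vector.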
For scaling purposes, they have enabled hierarchical analysis, which allows different blocks to be analyzed independently. Each block becomes something akin to a bus-functional model, where the periphery of the block is accurate while the internals aren’t. That way you can plug the blocks together to see how they interact and still complete the analysis in a reasonable time. A full chip can thus be analyzed with blocks done with or without vectors, at the gate or RTL level; you can mix and match.
There are lots of other details that you can get to via their announcement.