Emulex Blog: The Implementer's Blog

Interop 2012 – Emulex New Product Demos at the Emulex Booth

Posted May 8th, 2012 by Mark Jones

If you’re planning on attending Interop 2012 at the Mandalay Bay in Las Vegas, be sure to stop by the Emulex booth to see demonstrations of some of our newly announced products. You can find us at booth #1117, and it will be hard to miss since we will be displaying a Ducati motorcycle doing a wheelie in our booth, and we are giving it away! Of particular excitement for our Implementer’s Lab team are the demonstrations we built to highlight our new OneConnect® OCe12000 10Gb Ethernet (10GbE) Network Xceleration™ line of products. These demos showcase the key performance benefits that each of the three new OneConnect Network Xceleration solutions has to offer.

FastStack DBL
This demo showcases the low latency benefits of our new OCe12000 adapter combined with FastStack™ DBL™ software, which should be of interest to High Frequency Trading environments or anyone looking for the lowest possible Ethernet network latency. In our demonstration, we will compare the UDP and TCP latency of our network adapter when using the host network stack versus FastStack DBL.

Fig 1. FastStack DBL Demo Screen

FastStack Sniffer10G
In this demonstration, we will show server-efficient 10Gb bandwidth and 100 percent lossless performance of the OCe12000 adapter with FastStack Sniffer10G software. This solution can provide network traffic capture, injection and analysis for performance-sensitive and mission-critical market segments, such as network surveillance, monitoring and analysis, deep packet inspection (DPI), test and measurement, and distributed denial-of-service (DDoS) defense appliances. Our demonstration highlights the performance aspect required of these missions by showing maximum 10GbE performance of more than 14 million packets per second, while only utilizing ~4.5% of the server CPU resources.
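For context on the 14 million figure, the theoretical packet-rate ceiling of a 10GbE link can be worked out from the frame size. The sketch below assumes minimum-size 64-byte Ethernet frames plus the standard 8-byte preamble and 12-byte inter-frame gap. The frame size used in the demo is not stated in this post, so treat this as a back-of-the-envelope check rather than a description of the demo setup.

```python
# Back-of-the-envelope packet rate for a 10GbE link.
# Assumption: minimum-size frames; the demo's actual frame size is not stated.
LINK_BPS = 10_000_000_000      # 10 Gb/s line rate
PREAMBLE_BYTES = 8             # preamble + start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12      # mandatory idle time between frames


def max_packets_per_second(link_bps, frame_bytes):
    """Return the line-rate packet ceiling for a given Ethernet frame size."""
    wire_bits = (frame_bytes + PREAMBLE_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bps / wire_bits


pps = max_packets_per_second(LINK_BPS, 64)
print(f"{pps / 1e6:.2f} million packets per second at 64-byte frames")
```

Under these assumptions the ceiling is about 14.88 million packets per second, which is consistent with "more than 14 million" being at or near line rate.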

Fig 2. Sniffer10G Demo Screen

FastStack VideoPump
The third demo is a beta showing of our new FastStack VideoPump™ software that will be available later this summer. As the name implies, this product is targeted toward video streaming servers and appliances that require very high numbers of individual streams per adapter, while assuring predictable QoS. Our demonstration will showcase FastStack VideoPump’s extreme scalability and performance while maintaining low server CPU utilization. The demo uses 8 Network Interface Card (NIC) ports in a single server, communicating over 17,000 individual 3.5Mbit/sec traffic streams for an aggregate bandwidth of over 60Gb/s, all the while only using 25% of the server CPU resources.
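The stream count and aggregate bandwidth can be related with simple arithmetic. This is a minimal sketch using only the figures quoted above; the even split across ports is an assumption for illustration, not something the demo description states.

```python
import math

STREAM_MBPS = 3.5   # per-stream rate quoted above
PORTS = 8           # NIC ports in the demo server


def aggregate_gbps(streams, stream_mbps=STREAM_MBPS):
    """Total bandwidth in Gb/s for a number of equal-rate streams."""
    return streams * stream_mbps / 1000


def streams_needed(target_gbps, stream_mbps=STREAM_MBPS):
    """Smallest number of streams that reaches the target bandwidth."""
    return math.ceil(target_gbps * 1000 / stream_mbps)


print(aggregate_gbps(17000))   # bandwidth of exactly 17,000 streams
print(streams_needed(60))      # streams required to reach 60 Gb/s
print(60 / PORTS)              # Gb/s per port, assuming an even split
```

Exactly 17,000 streams works out to 59.5Gb/s, so "over 60Gb/s" implies a little more than 17,100 streams, or roughly 7.5Gb/s on each port if the load is spread evenly.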

Figure 3. FastStack VideoPump demo screen

If you would like a personal walk-through of these demos, please stop by the booth and ask to speak with me or anyone else from the Implementer’s Lab team. Also be sure to visit the Ethernet Alliance booth #2360 and ask to see Alex Amaya who is representing us in an industry-wide demonstration of various Ethernet technologies including our new OneConnect Network Xceleration solutions.

Are we up, or are we down?

Posted April 5th, 2012 by Alex Amaya

During our testing with HP’s ProLiant DL380 G7 server and HP’s 82E 8Gb Fibre Channel (8GFC) adapter, we encountered some connectivity issues with our internal infrastructure. With daily changes to our test lab infrastructure to accommodate the different tests we perform, there is always the possibility of something getting damaged along the way.

Deploying HP 8GbFC adapters with VMware ESXi 5.0 is a straightforward install since our Emulex lpfc820 drivers are already inbox. However, we did experience intermittent problems with our LUNs disconnecting and then reconnecting. The Emulex OneCommand® Manager vCenter Server plug-in has an option to track link up and link down events. This feature is not on by default. With it enabled, the Tasks & Events tab in vCenter Server showed one of our ports disconnecting often. First, we tried replacing the SFP, but we still experienced the intermittent disconnect. Next, we replaced the fiber cable and the problem was solved. The description in the Tasks & Events tab provides the WWN of each Fibre Channel port along with its link down and up status. The image below illustrates the link up status after the cable was replaced.

For more information, check out the latest technical whitepaper from HP, which covers some of the features of ESXi 5.0. The deployment guide, entitled VMware vSphere 5.0: 8Gb/s Fibre Channel SANs with HP ProLiant DL380 G7 Servers and HP 3PAR Utility Storage, can be downloaded from the Implementer’s Lab.

Why do I need hardware offloads, I have CPUs to burn!

Posted March 7th, 2012 by Mark Jones

It wasn’t that long ago that enterprise x86 computing was performed on single processor cores running at just a few megahertz (MHz). Getting data in and out of the computer consumed an expensive share of the processing resources. If you were serious about I/O, it made perfect sense to consider buying one of those fancy Host Bus Adapters (HBAs) that offloaded the I/O protocol processing to specialized processors made just for that, saving the computer processor to perform other general compute functions. Since then, processor technology has marched forward at a tremendous pace. Processing speed has increased from a few MHz up to ~3GHz, which is now the practical limit due to power/thermal efficiency issues. Multithreading, multiple cores and larger processor caches have also been big news in computing, to the point where we can now have a tremendous amount of compute power in a very small space in the data center.
This week, Intel announced availability of its new Xeon E5-2600 processor family. On the platform codenamed “Romley,” server manufacturers will offer top models with 16 physical cores and a whole menu of other great technologies to improve performance and efficiency. So with all this new compute power, you may be thinking: “Why do I need hardware offloads? I have CPUs to burn!”

Wikipedia is the first place to look to put water on that fire. Moore’s Law (1) is famous for predicting the long-term growth of compute power, popularly summarized as a doubling of processor performance every 18 months. Related to this is Wirth’s law (2), which states that “software is getting slower more rapidly than hardware becomes faster,” or Gates’ law: “the speed of commercial software generally slows by 50% every 18 months.” So no matter how fast hardware gets, the data center will evolve to find a way to consume all its resources through software.

If you have worked in a data center during this technology march in recent years, you have noticed that this compute power is getting packed more densely, and it’s possible to get hundreds, if not a few thousand, cores into a single rack. This has shifted the data center problem from performance capacity to power and cooling capacity. It’s not about how many servers can fit in a room; it’s about the maximum power and cooling capacity of the room. You do not have to look too closely at Intel’s Xeon Processor E5-2600 product announcement before you notice that much of what they promote are features that deliver performance at efficient power levels and features that lower power consumption when performance is not needed. Turbo Boost Technology 2.0 raises CPU performance (and power draw) only when needed and reduces it when not needed. We have noticed in the lab that these power efficiency features have a significant effect on a server’s power consumption as measured at the AC power cord. For instance, in our Implementer’s Lab, we have measured a swing of roughly 110 watts at the power cord between a server at idle and one that is highly loaded (~80% CPU usage).

Emulex HBAs and converged network adapters (CNAs) offload I/O to low-power processors that are specifically designed to process I/O protocols far more efficiently than a general-purpose system processor, and they are complementary to the new power/performance efficiency features of the Xeon E5-2600 product family. By offloading the protocol processing from the server operating software stack, we lower the CPU load significantly, which lets the CPU apply power-saving strategies that result in far lower system power usage.

An example of this is a server running VMware ESXi 5 with realistic virtual machine (VM) I/O workloads to storage devices over a Fibre Channel over Ethernet (FCoE) network. You have a choice of using software FCoE over a 10Gb Ethernet (10GbE) Network Interface Card (NIC) or using an Emulex CNA, which will offload the FCoE protocol processing. Our test used four VMs with an equal load to storage of 35k I/O transactions per VM. We measured both the CPU used on the hypervisor and the AC input power usage of the server, and found that the server used 53% of its overall CPU resources while running the I/O using software FCoE and just 23% when using the offload CNA. Saving 30% of the server’s CPU resources is significant enough to trigger the server’s power-saving strategies, and this showed up in the server’s input power measurements. At idle with no I/O workload running, the server was drawing 110 watts. While running the I/O over software FCoE, the server was drawing 167 watts. When running over our CNAs with hardware FCoE, it measured 129 watts. The server used 38 fewer watts to deliver the same performance level, which is a significant power savings that can add up over time or when applied throughout the data center.
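To see how these measurements add up over time, the short sketch below works through the arithmetic. It uses only the three wattage readings reported above; the assumption that the workload runs continuously all year is ours and is there purely for illustration.

```python
# Power readings reported in the test above (watts at the AC input).
IDLE_W = 110
SOFTWARE_FCOE_W = 167
HARDWARE_FCOE_W = 129

# Assumption for illustration: the workload runs 24 hours a day, all year.
HOURS_PER_YEAR = 24 * 365


def kwh_per_year(watts):
    """Convert a constant power draw in watts to kilowatt-hours per year."""
    return watts * HOURS_PER_YEAR / 1000


saved_w = SOFTWARE_FCOE_W - HARDWARE_FCOE_W
software_overhead_w = SOFTWARE_FCOE_W - IDLE_W   # cost of the I/O load in software
hardware_overhead_w = HARDWARE_FCOE_W - IDLE_W   # cost of the same load with offload

print(f"Saved per server: {saved_w} W, {kwh_per_year(saved_w):.2f} kWh/year")
print(f"Power above idle: {software_overhead_w} W software vs "
      f"{hardware_overhead_w} W hardware offload")
```

Seen this way, the I/O workload costs 57 watts above idle with software FCoE and 19 watts with the offload CNA, so the offload cuts the incremental power for this workload to a third.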
So the next time you get a new super-fast server and you are tempted to burn some of its CPU cycles on running software FCoE or software iSCSI, remember…it takes energy to cook!


(1) Gordon E. Moore, 1965, periodically updated by Intel: http://en.wikipedia.org/wiki/Moore%27s_law
(2) Niklaus Wirth, 1995: http://en.wikipedia.org/wiki/Wirth%27s_law

How to Configure Universal Multi-Channel for Emulex OneConnect OCe11102 10 Gigabit Ethernet Adapters

Posted December 20th, 2011 by Emulex

By Roy Hughes, Emulex

A basic overview of what’s required to get the Universal Multi-Channel (UMC) feature running with Emulex OCe11102 10 Gigabit Ethernet Adapters.

The Emulex Universal Multi-Channel feature gives administrators the ability to partition an Emulex OneConnect OCe11102 Ethernet adapter into eight logical ports at varied bandwidths for storage (iSCSI or Fibre Channel over Ethernet [FCoE]) and Ethernet traffic types. Each logical port has its own MAC address, VLAN and bandwidth attributes, and the eight logical ports are mapped to eight PCIe functions (PF0 – PF7). Basically, operating systems and hypervisors see eight independent physical adapters. UMC does not require SR-IOV, meaning you can deploy UMC on most operating systems and hypervisors in use today. For hypervisors such as VMware, a UMC port can be provisioned for various connection types such as virtual machine (VM), vSphere vMotion, iSCSI, NFS and host management traffic.

To get Universal Multi-Channel working:

UMC only works on Emulex OCe11102 Ethernet adapters. Download and load all drivers and firmware from the Emulex site here. Prerequisites:

  • Install Emulex OCe11102 10 Gigabit Ethernet (10GbE) adapter in a PCIe x8 slot.
  • Load latest driver, firmware and OneCommand Manager software.
  • Confirm software drivers loaded correctly.
  • Reboot server to access Emulex PXESelect Utility

On server restart, invoke the Emulex PXESelect™ Utility by pressing . Enable Multi-Channel Support then press , then Save configuration. From the Port Selection Menu, select the Controller and Port number. Enable Administrative Logical Link, configure Bandwidth and assign a logical port VLAN ID (LPVID).

Windows Device Manager shows eight Emulex OneConnect OCe11102-F 10GbE adapters.

    For a complete guide on how to deploy Emulex Universal Multi-Channel feature, check out our complete guide here.

    How to Enable NPIV for Emulex OneConnect UCNA Adapters Configured for FCoE

    Posted November 28th, 2011 by Alex Amaya

    Recently, I was asked how to enable N_Port ID Virtualization (NPIV) for our high-performance Emulex OneConnect 10Gb Universal Converged Network Adapters (UCNAs) configured for Fibre Channel over Ethernet (FCoE). Searching through the Emulex documentation pages as the requester did, I was also unable to locate any information on this configuration. I didn’t think this could be any more difficult than configuring Fibre Channel, so I thought I’d take a stab at it. A Microsoft Windows Server 2008 host was used with an Emulex OneConnect OCe10102 adapter, with Emulex OneCommand Manager 5.2.12.1 and 5.2.12.2 for one FCoE port. Since our adapters have two ports, you would repeat the steps below for the second port.
    Here we go:

    1. Open OneCommand Manager and select “View” from the drop down menus and select “Group” by adapters
    2. Select the FCoE port
    3. Select the Driver Parameters tab
    4. Under Adapter Parameter, left-click once to select Enable NPIV
    5. Select “Enable” from the Modify Adapter Parameter. This will make the Adapter Parameter turn red, requiring a reboot. Because it will only enable one port, a reboot will also be required for the second port.
    6. Select “Apply” and reboot the server
    7. When the server comes back up, log in to your Windows server and open OneCommand Manager
    8. Select “View” then “Group Adapters by Virtual Port”
    9. Select the FCoE port and you should now be able to create your virtual ports
    10. Select “Create Virtual Port” and a new virtual port confirmation window will appear
    11. As shown in the image below, the new virtual port will appear just below the physical port

    I hope this helps. If you still have questions, please contact Emulex technical support.

    Installing or Updating Emulex Drivers on VMware ESXi 5.0

    Posted November 22nd, 2011 by Alex Amaya

    Most likely, you are not surprised to hear that VMware ESXi 5.0 users no longer have access to a Service Console. You may have also noticed that there are several new features and changes. One change is the procedure for manually installing or updating Emulex drivers. Most of the Emulex drivers are inbox drivers and will need to be updated whenever a new version is released. I’d like to share the process for updating your Emulex drivers in this blog post. Other options you may wish to consider are Auto Deploy, or using the vSphere Management Assistant (vMA) appliance.
    Here are the steps to updating your drivers:

    1. Login with your VMware vSphere Client to vCenter Server.
    2. Select the host you want to update or install new drivers.
    3. Go into Tech Support Mode to enable SSH. It is a simple task to perform: Highlight the host-> select Configuration Tab -> then select Security Profile from the software table of contents. Highlight TSM-SSH then Properties. Once you enable SSH, a warning symbol will appear to let you know your host is no longer secure. See VMware KB 1016205 & 2003637.
    4. From your Windows or Linux client, download the Emulex driver for the adapter and store it in a temporary location.
    5. From your Windows or Linux client, run a program such as WinSCP for Windows and move the driver you downloaded from VMware’s website to the ESXi host. I prefer to place the Emulex driver in the /var/log/vmware directory.
    6. Next, SSH into the ESXi 5 host by using a tool called putty.exe
    7. Once logged in, run the following commands to install the driver:
      1. # esxcli software vib install --no-sig-check --maintenance-mode -d
      2. Example: # esxcli software vib install --no-sig-check --maintenance-mode -d Emulex-FCoE-FC-lpfc829-8.2.3.108.36-offline-bundle.zip
    8. Reboot the host to activate the new or updated driver
    9. If for some reason you need to remove the driver, execute the following esxcli command: # esxcli software vib remove -n -f
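If you script this procedure, it helps to build the install command in one place so the flags are spelled with plain ASCII hyphens. The following is a minimal sketch that only assembles and prints the command from step 7; it does not run anything on the host. The helper name `build_vib_install_cmd` is hypothetical, and joining the bundle name to the /var/log/vmware directory from step 5 is our assumption about where the file was copied.

```python
import posixpath
import shlex


def build_vib_install_cmd(bundle, directory="/var/log/vmware"):
    """Assemble the esxcli install command from step 7 as an argument list.

    Hypothetical helper: the directory default mirrors step 5 and is an
    assumption about where the offline bundle was copied on the host.
    """
    depot = posixpath.join(directory, bundle)
    return [
        "esxcli", "software", "vib", "install",
        "--no-sig-check",        # flag from step 7
        "--maintenance-mode",    # flag from step 7
        "-d", depot,             # path to the offline bundle
    ]


cmd = build_vib_install_cmd(
    "Emulex-FCoE-FC-lpfc829-8.2.3.108.36-offline-bundle.zip"
)
# Print a shell-safe command line to paste into the SSH session.
print(shlex.join(cmd))
```

Keeping the command as a list and quoting it with `shlex.join` avoids the typographic dashes that word processors and web pages tend to substitute for `--`.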

    We hope this helped you learn how to install or update Emulex drivers on VMware ESXi 5.0. If you have any feedback or questions, comment here or contact Emulex Technical Support.

    How to optimize your SAN resource utilization and save $200K per year!

    Posted November 1st, 2011 by Erick Crowell

    Administrators are seeking creative ways to get more from their existing infrastructure. A recent survey of CIOs¹ reveals what many already know: IT budgets are stagnant or shrinking. At a time of explosive growth and increased demand for performance, organizations are being pushed to innovate to survive. Given the limited ability to grow, administrators look to optimize existing resources, squeezing out performance to help them meet demand.

    One strategy involves auditing to find unused or underutilized SAN-attached storage, which is something Emulex OneCommand Vision does (we announced version 2.0 today at SNW Europe; check out our announcement here). Inactivity on a LUN, for example, is an indication that an application’s demand for storage may be changing. Each SAN-attached LUN represents a ‘chunk’ of infrastructure dedicated to a particular compute resource, such as a server. As the demand for SAN-attached resources rises, so does the opportunity cost of letting underutilized resources remain in place.

    Auditing for underutilized resources at the current storage tier allows administrators to reprovision costly infrastructure, moving resources to alternative storage tiers or retiring them altogether. Repurposing allows organizations to avoid or defer the costs to acquire additional capacity. Reclaiming as little as two percent of the storage infrastructure can save nearly $200,000 per year for a mid-sized SAN deployment.

    Let’s consider the numbers. What does it cost to provision SAN-attached storage to your application or database server? To arrive at an answer we chose ‘street prices’ for equipment typically found in a ‘mid-sized’ SAN deployment (500 servers).

    The general costs are:

    • Plumbing the storage network between the server and storage array (network adapter, multi-tier network/fabric), about $8K for two redundant paths
    • Fault-tolerant SAN-attached 4 TB LUN, approx. $8K
    • Storage Management software, approx. $4K

    That’s about $20K to connect that super fast, highly available storage to each of your application or database servers.

    Multiply that across all 500 SAN-attached servers in this ‘mid-sized’ environment and the SAN infrastructure cost totals about $10 million. Organizations that can repurpose just 2% of the previously underutilized infrastructure can save about $200K a year in deferred infrastructure costs.
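The estimate is easy to rerun with your own numbers. Here is a minimal sketch of the calculation using the street prices and server count quoted above; the dictionary keys are just labels we chose for the three cost items.

```python
# Approximate per-server costs quoted above, in US dollars.
COST_PER_SERVER = {
    "redundant_network_paths": 8_000,
    "fault_tolerant_4tb_lun": 8_000,
    "storage_management_software": 4_000,
}
SERVERS = 500          # 'mid-sized' SAN deployment
RECLAIM_PERCENT = 2    # share of infrastructure repurposed


def total_infrastructure_cost(cost_per_server, servers):
    """Total SAN cost: the sum of per-server items times the server count."""
    return sum(cost_per_server.values()) * servers


def deferred_cost(total_cost, reclaim_percent):
    """Cost avoided by repurposing a percentage of the infrastructure."""
    return total_cost * reclaim_percent // 100


total = total_infrastructure_cost(COST_PER_SERVER, SERVERS)
print(f"Total infrastructure: ${total:,}")
print(f"Deferred by reclaiming {RECLAIM_PERCENT}%: "
      f"${deferred_cost(total, RECLAIM_PERCENT):,}")
```

Swapping in your own prices, server count and reclaim percentage gives a quick first estimate of what an audit might be worth in your environment.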

    These innovative organizations look for this kind of ‘low-hanging fruit’ as they struggle to maximize their limited budgets. What strategies has your organization tried? Please take a moment and share your experience in the comments below!


    ¹New C4 Agenda: Perspectives and Trends from State Government IT Leaders, sponsored by TechAmerica, the National Association of State Chief Information Officers (NASCIO), and Grant Thornton LLP.

    Error Message Resolution for Emulex OneCommand Manager VMware vCenter Plug-in v1.1

    Posted October 17th, 2011 by Alex Amaya

    Emulex recently released OneCommand Manager for VMware vCenter Server 1.1 to support the release of VMware ESXi 5.0 (download it here and try it out!). Our technical marketing team has created an application note to help those who run into a privileges error when trying to register the plug-in.

    Here’s what we found: After installing Emulex OneCommand Manager for VMware vCenter Server 1.1, you need to register the plug-in. Unfortunately, as you try to do so, this pop-up window appears as shown here:

    It’s easy to resolve, and only takes minutes. Click here to view the complete application note, Error Message Resolution for Emulex OneCommand Manager VMware vCenter Plug-in v1.1.

    16GFC: Much more than a speed bump

    Posted October 10th, 2011 by Mark Jones

    In preparation for the release of our latest Host Bus Adapter (HBA), the LightPulse® 16Gb Fibre Channel (16GFC) adapter, we took a look at its performance and were pretty amazed by the results compared to our previous-generation product. The LPe16002 is of course capable of running 16GFC, so you would expect performance to be about double that of 8GFC. Then again, when you dig into the details of the 16GFC spec, you may be disappointed to find out that your bits aren’t actually flying over the wire at double the speed of 8GFC. 16GFC runs at a line rate of 14.025Gb/s where 8GFC runs at 8.5Gb/s, so it’s less than twice as fast, right? Wrong! The designers of the specification did a clever thing when they came up with 16GFC. All previous speeds used an 8b/10b encoding scheme, meaning that for every 10 bits flying over the wire, 8 are data and 2 are encoding overhead, so only 80% of the bits are your data. For 16GFC, they changed the encoding to a much more efficient 64b/66b scheme, so far fewer bits are spent on overhead and a bigger share of them is your data. So the bottom line is that a 16GFC link delivers twice the data throughput of 8GFC.

    The expectation, then, is that the new LPe16002 HBA can deliver twice the performance of 8GFC HBAs, but there is much more to the story. Sure, as you can see in Figure 1, the 16GFC LPe16002 HBA is capable of 1576MB/s compared to 789MB/s for our 8GFC LPe12002 HBA, almost exactly double. But the LPe16002 is also the first HBA with an 8-core processor, and it can deliver performance that is actually 5x that of previous adapters.

    Figure 1.

    Max I/O per second (IOPS) for a single port of the LPe16002 is just over 1 million! This is one area where the 8-core processor shines: it can crank out millions of I/O processing instructions when the data size isn’t too burdensome. At the small data block size we measured, this kind of performance is all about the HBA processor and not the link rate. This HBA would be capable of 1 million IOPS even at 8GFC. That’s right, you can get this kind of IOPS without having to buy a new 16GFC switch.

    Figure 2.

    OK, I know what you’re thinking: “My application doesn’t use 512 byte data blocks.” Good point. The fact is that the highest-transaction applications mostly use 4k or 8k data blocks, and that area is another measure of the muscle that the LPe16002 has. Most HBAs cannot reach the link rate maximum until the data blocks get very large, usually around 16k, so I/O transactions for database applications can be limited by the HBA processor and can’t get close to the Fibre Channel link rate ceiling. This is not the case with the LPe16002, because its processing power allows for near link rate performance at 4k and 8k data blocks.
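The relationship between IOPS, block size and link bandwidth is worth making concrete. The sketch below takes the 789MB/s and 1576MB/s figures quoted earlier in this post as the practical link ceilings and assumes 1 MB means 10^6 bytes and that 4k and 8k mean 4096 and 8192 bytes; those unit conventions are our assumptions.

```python
# Throughput ceilings quoted earlier in the post (MB/s, one port).
LINK_8GFC_MB_S = 789
LINK_16GFC_MB_S = 1576


def bandwidth_mb_s(iops, block_bytes):
    """Bandwidth consumed by a given IOPS rate at a given block size."""
    return iops * block_bytes / 1e6


def link_limited_iops(link_mb_s, block_bytes):
    """Highest IOPS the link can carry before bandwidth becomes the limit."""
    return link_mb_s * 1e6 / block_bytes


# One million 512-byte I/Os per second needs far less than an 8GFC link.
print(bandwidth_mb_s(1_000_000, 512), "MB/s needed vs", LINK_8GFC_MB_S)

# At database-style block sizes the link, not the processor, sets the ceiling.
for block in (4096, 8192):
    print(block, "bytes:", int(link_limited_iops(LINK_16GFC_MB_S, block)), "IOPS")
```

One million 512-byte I/Os need only about 512MB/s, which is why that rate fits on an 8GFC link. At 4k blocks a 16GFC link tops out near 385,000 IOPS, so an HBA has to sustain hundreds of thousands of I/Os per second to stay near link rate at these sizes.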

    Like previous generations of Emulex HBAs, the LPe16000 is backward compatible and leverages a common driver stack. This means that you can buy an LPe16000 HBA today, plug it into your current server alongside our previous products and connect the card to your current 4 or 8GFC switch. You will not be able to utilize the full bandwidth of a 16GFC link until you upgrade your switch, but that doesn’t mean you won’t see other massive performance benefits like lower I/O response times and higher transaction rates. To see proof of this and a full performance demo, see the LPe16000 performance demonstration video.

    Figure 3.

    Sometimes you need to see the protocol conversation to understand and solve SAN performance issues

    Posted October 5th, 2011 by Erick Crowell

    This is the second installment in a series of blogs that will discuss SAN performance monitoring and troubleshooting.

    Consider this situation faced by one of our ‘large financial’ customers with a complex SAN environment running critical trading applications. This customer had been experiencing periodic performance problems with a Windows cluster running a critical business application. In a nutshell, application I/O was taking too long, internal timers would expire and the application would shut down.

    After each occurrence, administrators would quickly stabilize the application, balancing the need to collect information about the issue with the requirement to minimize the amount of money the company was losing. Many trouble tickets were opened and the symptoms were well understood, but a root cause could not be found using the tools available to the team.

    Eventually, vendors were asked to help prove that their equipment was not at fault. One by one, each vendor used their own management applications to demonstrate that no faults existed in their equipment. Finally, under the guidance of Emulex Technical Support, the customer enabled extended logging on the servers experiencing the slowdown. This extended logging allowed for the typically hidden Extended Link Service (ELS) and SCSI protocol events captured by the Emulex adapters to be collected in a LOG file. These low-level events are processed by the adapters and driver software but are not reported and are certainly not available through any native OS API or event interface.

    After a brief review of the protocol events collected, the teams identified an unexpected pattern: the storage target was repeatedly sending Port Logout (LOGO) commands to each server in the Windows cluster. Each LOGO command would cause all outstanding I/O operations to be cancelled, which would require each server to log in again and re-send the outstanding I/O. Although some I/O would complete, the net effect was to dramatically slow the overall performance, pushing the application past an internal limit.

    After some searching using similar methods, the customer was able to identify a separate Linux machine, connected to the same target array, that was periodically sending ‘Target Reset’ commands to the storage array. Each time the storage array received one of these ‘reset’ commands it would, in turn, force the ‘Logout’ of all attached initiators. In the end, the customer was able to isolate the Linux server, thereby removing the source of the problem and resolving the application performance issue.
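The kind of correlation described above can be sketched in a few lines once the protocol events from every initiator are in one place. Everything in this example is hypothetical: the record layout, the host names and the event labels are made up for illustration and do not reflect the format of the Emulex extended log or of any Emulex tool.

```python
from collections import Counter

# Hypothetical consolidated events: (seconds, sender, receiver, event).
events = [
    (10.0, "linux-host", "array", "TARGET_RESET"),
    (10.1, "array", "win-node-1", "LOGO"),
    (10.1, "array", "win-node-2", "LOGO"),
    (10.6, "win-node-1", "array", "PLOGI"),
    (10.7, "win-node-2", "array", "PLOGI"),
    (95.0, "linux-host", "array", "TARGET_RESET"),
    (95.2, "array", "win-node-1", "LOGO"),
    (95.2, "array", "win-node-2", "LOGO"),
]


def count_by_sender(events, event_name):
    """Count how often each sender issued the named event."""
    return Counter(src for _, src, _, ev in events if ev == event_name)


def logouts_after_resets(events, window_s=1.0):
    """Pair each LOGO with a TARGET_RESET seen shortly before it.

    Returns (reset_sender, logged_out_host) tuples for every logout that
    arrived within window_s seconds after a reset.
    """
    resets = [(t, src) for t, src, _, ev in events if ev == "TARGET_RESET"]
    pairs = []
    for t, _, dst, ev in events:
        if ev != "LOGO":
            continue
        for reset_time, reset_sender in resets:
            if 0 <= t - reset_time <= window_s:
                pairs.append((reset_sender, dst))
                break
    return pairs


print(count_by_sender(events, "LOGO"))
print(Counter(logouts_after_resets(events)))
```

In this made-up data every logout follows a reset from the same host within a second, which is the pattern that pointed the team at the Linux server. No single device's view contains both halves of that conversation, so it only shows up when events from all initiators are consolidated.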

    So why couldn’t other tools already in the environment detect this problem? To put it simply, from the perspective of device and SAN management tools, there was no problem. Storage links remained active, devices along the storage path were on-line with plenty of capacity, and I/O that made it to back-end storage devices was returned in a timely fashion. None of these tools monitor the end-to-end protocol conversation, which explains why these applications missed the problem in the first place.

    Tools designed to collect, consolidate and analyze the information from each of the initiators in your environment help reveal hidden performance and health problems that other applications miss. Emulex offers this power in a tool we call OneCommand Vision. I invite you to cruise over to our landing page to learn more.
