HECToR
(High-End Computing Terascale Resource)

2007–2014

HECToR was the UK's national supercomputing service. 

HECToR's job was to solve the previously unsolvable: its speed allowed it to model real-world problems ranging from complex financial markets to effective treatments for diabetes and Parkinson's disease.

Another key role for HECToR was training postgraduate students in code development and optimisation as part of the UK's efforts to become a world leader in the sector.

WORKING WITH HECToR

ANDY TURNER, EPCC

"HECToR spanned the multi-core revolution, moving from dual-core to multi-core. It also saw the rebirth of parallelism."

"During the life of the service, memory-per-core was reduced, but the cores-per-node increased. Users had to adapt to these changes and the dCSE programme was central to equipping codes for the changeover.

"HECToR's reliability was initially quite shaky. This was the first large Cray XT to be installed outside the USA, and it used an experimental architecture that had evolved from the Red Storm collaboration between Cray and the US government. Cray continued to develop this hardware, and the XE used in HECToR Phase 2B was much more reliable.

"HECToR paved the way for Cray's return to the UK and was followed by systems at the Met Office and ECMWF.

"The flexible contract with Cray allowed HECToR to be upgraded in phases, each guided by users' evolving needs, giving researchers the chance to explore the best new technologies. The XT/X2 phase was the only academic system in the UK or Europe to use the Black Widow vector processor. This hardware did not prove to be useful and was replaced."


RESEARCH: HIGH CONDUCTIVITY OF LEAD DIOXIDE

Lead-acid batteries are able to deliver the very large currents needed to start a car engine because of the exceptionally high electrical conductivity of the battery anode material, lead dioxide. However, even though this type of battery was invented in 1859, until recently the fundamental reason for the high conductivity of lead dioxide had eluded scientists. 

A team of researchers from Oxford University, the University of Bath, Trinity College Dublin, and the ISIS neutron spallation source provided that explanation for the first time.

"The unique ability of lead acid batteries to deliver surge currents in excess of 100 amps to turn over a starter motor in an automobile depends critically on the fact that the lead dioxide which stores the chemical energy in the battery anode has a very high electrical conductivity, thus allowing large current to be drawn on demand," said Professor Russ Egdell of Oxford University's Department of Chemistry. "However the origin of conductivity in lead oxide has remained a matter of controversy. Other oxides with the same structure, such as titanium dioxide, are electrical insulators." 

Through a combination of computational chemistry and neutron diffraction, the team demonstrated that lead dioxide is intrinsically an insulator with a small electronic band gap, but that it invariably becomes electron-rich through the loss of oxygen from the lattice, which transforms the material from an insulator into a metallic conductor.

The researchers believed these insights could open up new avenues for the selection of improved materials for modern battery technologies. Professor Egdell said: "The work demonstrates the power of combining predictive materials modelling with state-of-the-art experimental measurements."


RESEARCH: CASCADE (CLOUD SYSTEM RESOLVING MODELLING OF THE TROPICAL ATMOSPHERE) 

Strong surface heating by the sun in the tropics is communicated to the atmosphere through the process of convection; this is responsible for significant amounts of tropical rainfall and plays a pivotal role in the heat and moisture budgets of the tropical climate system.

Convection releases latent heat which drives the tropical circulation and controls tropical sea-surface temperatures and, through interaction with atmospheric wave activity, contributes to shaping the global climate. Convection manifests itself through scales from the size of an individual cloud through the mesoscale and up to the synoptic and planetary scales.

To represent these scale interactions in a model, the model must cover a sizeable fraction of the Earth in order to capture the planetary and synoptic scales, yet still be able to resolve individual kilometre-sized clouds. The CASCADE project investigated these phenomena by means of very large high-resolution simulations. CASCADE's main tool is the Met Office Unified Model (UM). This software system is used by academic researchers and meteorological services worldwide, in a multitude of configurations, to model a wide-ranging set of processes from Earth-system simulations to operational weather forecasts.

The National Centre for Atmospheric Science - Computational Modelling Services group (NCAS-CMS), based at the University of Reading, has ported the software to HECToR and provides support for several versions of it. Early CASCADE integrations at 40km, 12km, and 4km horizontal grid resolution have run successfully on HECToR using version 7.1 of the UM. The largest of these relatively low-resolution simulations covers the Indian Ocean and Maritime Continent, and extends into the Western Pacific. At 4km resolution, this domain comprises ~290 million grid points and generates ~4TB of diagnostic model output per model day. I/O represented a significant portion of the overall cost of these integrations, in which all output was directed through a single processor, and it soon became apparent that the desired 1.5km-resolution Indian Ocean simulation would not be feasible without a modified I/O strategy.

In collaboration with Cray, an asynchronous I/O model was implemented in version 6.1 of the UM and has now been ported to version 7.6, allowing the model to be run on the domain described above at 1.5km resolution. The model has 10222x2890x70 (~2 billion) grid points and generates ~12TB of output per model day. Eight I/O servers run on four under-populated XE6 nodes in order to satisfy their large memory requirements, and the model itself runs on up to 3072 processors, again under-populated to secure the required memory. The 5-day simulation ran for close to 400 hours of wallclock time.
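These figures are mutually consistent. The short Python check below is a sketch only: it assumes the 4km run covered the same domain, so that its horizontal point count scales with the inverse square of the grid spacing.

# Rough consistency check of the CASCADE grid sizes quoted above.
nx, ny, nz = 10222, 2890, 70                 # 1.5 km Indian Ocean domain (from the text)
points_1p5km = nx * ny * nz
print(f"1.5 km grid points: {points_1p5km:,}")            # ~2.07 billion

# Assumption: the 4 km run covered the same domain, so its horizontal point
# count scales with the inverse square of the grid spacing.
points_4km = points_1p5km * (1.5 / 4.0) ** 2
print(f"estimated 4 km grid points: {points_4km:,.0f}")   # ~291 million

# Daily diagnostic output per grid point, from the quoted ~12 TB per model day.
bytes_per_point = 12e12 / points_1p5km
print(f"output per grid point per model day: ~{bytes_per_point / 1e3:.1f} kB")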

Some initial data processing is performed at HECToR using the NERC lms (large-memory server) machine, and the processed data is transferred to Reading by GridFTP for analysis on local machines.
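The asynchronous I/O arrangement used for these runs, with a small pool of ranks dedicated to writing output while the rest integrate the model, can be sketched as an MPI communicator split. The sketch below illustrates the general technique only; it is not the UM's actual implementation, assumes mpi4py, and takes the eight-server figure from the description above.

from mpi4py import MPI

world = MPI.COMM_WORLD
rank = world.Get_rank()

N_IO_SERVERS = 8                    # dedicated output ranks, as in the 1.5 km runs
is_io_server = rank < N_IO_SERVERS  # e.g. reserve the first eight ranks for I/O

# Give the I/O servers and the compute ranks their own communicators so each
# group can coordinate internally without involving the other.
colour = 0 if is_io_server else 1
group_comm = world.Split(color=colour, key=rank)

if is_io_server:
    # An I/O server would loop here, receiving fields from compute ranks
    # (e.g. world.recv(source=MPI.ANY_SOURCE)) and writing them to disk.
    pass
else:
    # Compute ranks advance the model, hand each output field to one server,
    # e.g. world.send(field, dest=rank % N_IO_SERVERS), and keep computing
    # while the server does the writing.
    pass

Separating the two groups in this way lets the compute ranks hand off their output and return to work immediately, rather than all blocking while a single rank writes to the file system.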


RESEARCH: MODELLING OF LILLGRUND OFFSHORE WIND FARM USING COMPUTATIONAL FLUID DYNAMICS 

Vattenfall AB is one of Europe's largest generators of electricity. In 2012, the Institute of Energy Systems at the University of Edinburgh, EPCC, and Heriot-Watt University worked with Vattenfall to model its Lillgrund offshore wind farm.

Advanced computational fluid dynamics (CFD) techniques were used to dynamically model the air flow around the turbines, including their wakes, and the effect of the flow on the turbines' performance. The model domain had to be large enough to allow the flow features to develop fully, reaching almost 1 km in altitude and extending over 8 km both laterally and longitudinally.

Accurately modelling the turbines' power outputs and the unsteady, turbulent air flow is extremely demanding. Simulating the wind farm over a variety of wind speeds and directions, in enough detail to resolve individual turbine wakes while also capturing ensemble flow effects kilometres in extent, such as the wake of the wind farm as a whole, would have been impossible without the large-scale parallel computing platform offered by HECToR. The software was designed to exploit large numbers of processors through MPI parallelism, which was necessary to complete the required simulations in a timely fashion.
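As a rough illustration of the scale of parallelism involved, the sketch below carves a structured grid covering the stated 8 km x 8 km x 1 km domain into one block per MPI rank. The 10 m cell size and the 1024-rank count are hypothetical, chosen only to make the arithmetic concrete; they are not the settings used in the Lillgrund study.

import math

# Hypothetical structured grid over the stated domain: 8 km x 8 km x 1 km,
# with an assumed uniform 10 m cell size (not the resolution used in the study).
LX, LY, LZ = 8000.0, 8000.0, 1000.0
DX = 10.0
nx, ny, nz = int(LX / DX), int(LY / DX), int(LZ / DX)
print(f"grid: {nx} x {ny} x {nz} = {nx * ny * nz:,} cells")

# Split the horizontal plane into a near-square grid of blocks, one per MPI
# rank, with each rank keeping the full vertical column of its block.
n_ranks = 1024                      # hypothetical process count
px = int(math.sqrt(n_ranks))
while n_ranks % px:                 # step down to the nearest divisor
    px -= 1
py = n_ranks // px
bx, by = math.ceil(nx / px), math.ceil(ny / py)
print(f"{px} x {py} ranks, each owning ~{bx} x {by} x {nz} = {bx * by * nz:,} cells")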

The results of the simulations were validated, with considerable success, against measurements from the real Lillgrund wind farm. Having demonstrated its accuracy, the software can now be used to develop strategies for designing new wind farms, ensuring that turbines are placed in formations that extract the optimal power for the prevailing wind conditions.

HECToR was hosted by EPCC at the University of Edinburgh's recently upgraded Advanced Computing Facility, which was opened by HRH Prince Philip.

EVOLUTION OF THE SYSTEM

2007: HECToR Phase 1: a Cray XT4-based system. This £113m machine was contained in 60 cabinets and comprised 1416 compute blades, each with 4 dual-core processor sockets, for a total of 11,328 cores, each acting as a single CPU. The processor was a 2.8 GHz AMD Opteron. Each dual-core socket shared 6GB of memory, giving a total of 33.2 TB. The theoretical peak performance of the system was over 60 Tflops, with a LINPACK performance in excess of 52 Tflops.
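The headline Phase 1 numbers can be reproduced from the per-blade specification. In the sketch below, the factor of 2 double-precision floating-point operations per core per clock cycle is an assumption about the dual-core Opteron, not a figure from the text.

# Back-of-the-envelope check of the Phase 1 figures quoted above.
blades  = 1416
sockets = blades * 4          # 4 dual-core sockets per blade
cores   = sockets * 2         # 11,328 cores in total
mem_gb  = sockets * 6         # 6 GB shared per dual-core socket

clock_hz = 2.8e9
flops_per_core_per_cycle = 2  # assumed for the dual-core Opteron (not from the text)
peak_tflops = cores * clock_hz * flops_per_core_per_cycle / 1e12

print(cores)                              # 11328
print(f"{mem_gb / 1024:.1f} TB memory")   # 33.2 TB
print(f"{peak_tflops:.0f} Tflops peak")   # ~63 Tflops, i.e. "over 60 Tflops"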

2008: The HECToR vector X2 component was successfully integrated with the HECToR XT4. The resulting 'XT5h' hybrid supercomputer was the first X2/XT4 integration on this scale. Unlike the XT4, which was air-cooled, the X2 used chilled water for cooling. The Cray vector system, known as 'Black Widow', consisted of 28 vector compute nodes, each with 4 Cray vector processors, making 112 processors in all. Each processor was capable of 25.6 Gflops, giving a theoretical peak performance of 2.87 Tflops. Each 4-processor node shared 32GB of memory. The Black Widow interconnection network had a point-to-point bandwidth of 16 GB/s and a bisection bandwidth of 254 GB/s. The average ping-pong MPI latency was ~4.6 μs.

2009: Phase 2A upgrade: the HECToR service moved from dual-core to quad-core processors. All 5664 nodes were converted to quad-core processors, amounting to 22,656 cores, each of which acted as a single CPU. The processor used was a 2.3 GHz AMD Opteron. Users could now run simulations using up to 16,384 processors on HECToR. This upgrade increased the theoretical peak performance of the service from 59 Tflops to 208 Tflops.

2010: Phase 2B: a 20-cabinet Cray 'Baker' system became available. This was a major step on the multi-core ladder, as the Baker system used 12-core processors. The system consisted of 44,544 cores, delivering an estimated peak performance of over 300 Tflops. In addition to the 'Baker' system, approximately half of the existing XT4 system was retained. This was followed by an upgrade to the network, and in late 2010 HECToR moved to the Gemini interconnect.

2013: Phase 3: a Cray XE6 contained in 30 cabinets and comprising a total of 704 compute blades. Each blade contained four compute nodes, each with two 16-core 2.3 GHz AMD Opteron 'Interlagos' processors, amounting to a total of 90,112 cores. Each 16-core socket was coupled with a Cray Gemini routing and communications chip. Each 16-core processor shared 16GB of memory, giving a system total of 90.1 TB.

The theoretical peak performance of the Phase 3 system was over 800 Tflops. There were 16 service blades on Phase 3, each with two dual-core processor sockets; they acted as login nodes and as controllers for I/O and the network. There was one Gemini router chip for every two XE nodes. The Gemini chip had 10 network links, which were used to implement a 3D torus of processors. The GPGPU system consisted of four compute nodes connected by quad data rate (QDR) InfiniBand interconnects. All of the compute nodes had a single quad-core 2.4 GHz Intel Xeon CPU and 32 GB of main memory. Three of the compute nodes had four NVIDIA Fermi GPGPU cards installed, and the remaining compute node had two AMD FireStream GPGPU cards installed.
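The Phase 3 totals can be checked in the same way, this time assuming 4 double-precision floating-point operations per core per cycle for the Interlagos processor (again an assumption, not a figure from the text).

# The same check for Phase 3.
blades  = 704
nodes   = blades * 4          # 4 compute nodes per blade
sockets = nodes * 2           # 2 sixteen-core Interlagos sockets per node
cores   = sockets * 16        # 90,112 cores in total
mem_gb  = sockets * 16        # 16 GB per 16-core processor
geminis = nodes // 2          # one Gemini router per two XE nodes

clock_hz = 2.3e9
flops_per_core_per_cycle = 4  # assumed for Interlagos (not from the text)
peak_tflops = cores * clock_hz * flops_per_core_per_cycle / 1e12

print(cores, geminis)                     # 90112 cores, 1408 Gemini chips
print(f"{mem_gb / 1000:.1f} TB memory")   # 90.1 TB
print(f"{peak_tflops:.0f} Tflops peak")   # ~829 Tflops, i.e. "over 800 Tflops"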