Different instruction sets on different cores

discussion of forwardcom instruction set and corresponding hardware and software

Moderator: agner

JoeDuarte
Posts: 1
Joined: Tue Dec 19, 2017 6:51 pm

Different instruction sets on different cores

Post by JoeDuarte » Tue Dec 19, 2017 7:00 pm

Hi Agner – Do we need every core to support the same registers and instructions? There is some evidence that a logarithmic number system would be more efficient than floating point for many workloads (https://en.wikipedia.org/wiki/Logarithmic_number_system). It would be nice to have floating point on two cores, and logarithmic on two other cores, for example.

And if ForwardCom were to have AES and other crypto instructions, it seems like it would be fine to have them on just one core. There's no need to have that on every core – they won't be used.

Separately, how would ForwardCom fare with strings compared to SSE 4.2? I don't see any comparable instructions.

agner
Site Admin
Posts: 52
Joined: Sun Oct 15, 2017 8:07 am

Re: Different instruction sets on different cores

Post by agner » Wed Dec 20, 2017 5:55 pm

A logarithmic number system is efficient as long as you are using it for multiplication only, but difficult if you want to do addition. You need no extra hardware for multiplying logarithmic numbers: this is simply addition of integers. Another possibility is to use standard floating point numbers and add the exponents. ForwardCom has an instruction mul_2pow that adds an integer n to the exponent of a floating point number. This corresponds to multiplying by 2^n, or dividing if n is negative. This does floating point multiplication by a power of 2 at the speed of integer addition.
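To make the two ideas concrete, here is a minimal Python sketch. It is not ForwardCom code: the fixed-point format (FRAC_BITS), the helper names, and the error handling are assumptions made for illustration, and the mul_2pow function below only emulates "add n to the exponent field" for normal IEEE 754 doubles. How the real instruction treats zero, subnormals, overflow, infinity and NaN is not covered here.

```python
import math
import struct

FRAC_BITS = 16  # assumed fixed-point precision of the stored logarithm


def to_lns(x: float) -> int:
    """Encode a positive number as a fixed-point log2 value."""
    return round(math.log2(x) * (1 << FRAC_BITS))


def from_lns(v: int) -> float:
    """Decode a fixed-point log2 value back to an ordinary float."""
    return 2.0 ** (v / (1 << FRAC_BITS))


def lns_mul(a: int, b: int) -> int:
    # Multiplication of the encoded values is plain integer addition.
    return a + b


def lns_add(a: int, b: int) -> int:
    # Addition needs log2(1 + 2^(lo - hi)), which hardware would have to
    # approximate (for example with a table). This is the difficult part.
    hi, lo = max(a, b), min(a, b)
    d = (lo - hi) / (1 << FRAC_BITS)
    return hi + round(math.log2(1.0 + 2.0 ** d) * (1 << FRAC_BITS))


def mul_2pow(x: float, n: int) -> float:
    """Emulate adding n to the exponent field of an IEEE 754 double.

    Sketch only: rejects zero, subnormal, infinite and NaN inputs, and any
    result that would leave the normal range.
    """
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    exponent = (bits >> 52) & 0x7FF
    new_exponent = exponent + n
    if exponent in (0, 0x7FF) or not 0 < new_exponent < 0x7FF:
        raise ValueError("outside the range this sketch handles")
    bits = (bits & ~(0x7FF << 52)) | (new_exponent << 52)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]
```

With these definitions, mul_2pow(3.0, 4) touches only the exponent bits and leaves sign and mantissa unchanged, while lns_mul(to_lns(3.0), to_lns(5.0)) decodes to approximately 15 within the rounding error of the assumed fixed-point format.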

I have not implemented something like Intel's SSE4.2 instructions for the following reasons:
  • These instructions are used mainly for manipulating human-readable text. Such texts are usually so short that execution time is negligible. Only applications such as DNA analysis are performance-critical.
  • I don't want complicated instructions that need to be split up into micro-operations. This makes the whole pipeline more complicated and slower.
  • SSE4.2 is rarely used because it doesn't easily integrate into high level programming languages.
  • You can have an FPGA for application-specific instructions. This can be used for SSE4.2-like operations, cryptographic instructions, etc.

-.-
Posts: 2
Joined: Sun Dec 24, 2017 5:10 am

Re: Different instruction sets on different cores

Post by -.- » Sun Dec 24, 2017 5:28 am

JoeDuarte wrote:
Tue Dec 19, 2017 7:00 pm
And if ForwardCom were to have AES and other crypto instructions, it seems like it would be fine to have them on just one core. There's no need to have that on every core – they won't be used.
I would've thought that a very common application of crypto acceleration would be a multi-threaded HTTPS/VPN/etc server, where the acceleration units would need to be on each core to be used. You could just lock the server to one core, but then you'll be unable to use the other cores on the chip. Alternatively, you could have a process/thread running on the "crypto core" and pass data back and forth between the server's worker threads and the crypto thread, but that'd complicate the programming model a little (not too sure how much of a performance penalty this is) - still, it'd work I suppose.
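The hand-off between worker threads and a single crypto thread might look roughly like the Python sketch below. Everything in it is hypothetical: the class and function names are made up, the "encryption" is a placeholder XOR rather than a real cipher, and pinning the service thread to a particular core is left out because it is platform-specific. It only shows the extra queueing that this programming model adds.

```python
import queue
import threading


def placeholder_encrypt(data: bytes) -> bytes:
    # Stand-in for a hardware-accelerated cipher. NOT real encryption.
    return bytes(b ^ 0x5A for b in data)


class CryptoThread:
    """One service thread that would run on the hypothetical crypto core."""

    def __init__(self):
        self._requests = queue.Queue()
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def _serve(self):
        while True:
            item = self._requests.get()
            if item is None:  # shutdown signal
                return
            data, reply = item
            reply.put(placeholder_encrypt(data))

    def encrypt(self, data: bytes) -> bytes:
        # Called from worker threads: enqueue the request, block for the reply.
        reply = queue.Queue(maxsize=1)
        self._requests.put((data, reply))
        return reply.get()

    def close(self):
        self._requests.put(None)
        self._thread.join()


def run_workers(payloads):
    """Start one worker thread per payload; all share one crypto thread."""
    service = CryptoThread()
    results = [None] * len(payloads)

    def worker(i):
        results[i] = service.encrypt(payloads[i])

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(len(payloads))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    service.close()
    return results
```

Every request pays for two queue operations and a thread wake-up on each side, and all workers serialize on the one service thread, which is where any performance penalty would come from.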

It's interesting to note that Intel has announced the AVX512 VAES extension for upcoming Ice Lake processors, which can encrypt 4 streams in parallel. I don't know what purpose this is aimed at, but clearly they see a benefit for enabling more parallel encryption (or maybe it helps accelerate a single stream AES-CTR, though it being released along with VPCLMULQDQ seems to suggest 4 parallel AES-GCM streams being the aim).

I've never done any work with FPGAs so cannot comment how it'd compare with a "dedicated" crypto core.

Kulasko
Posts: 2
Joined: Tue Nov 14, 2017 9:41 pm
Location: Germany

Re: Different instruction sets on different cores

Post by Kulasko » Sun Jan 14, 2018 6:28 am

-.- wrote:
Sun Dec 24, 2017 5:28 am
It's interesting to note that Intel has announced the AVX512 VAES extension for upcoming Ice Lake processors, which can encrypt 4 streams in parallel. I don't know what purpose this is aimed at, but clearly they see a benefit for enabling more parallel encryption (or maybe it helps accelerate a single stream AES-CTR, though it being released along with VPCLMULQDQ seems to suggest 4 parallel AES-GCM streams being the aim).

I've never done any work with FPGAs so cannot comment how it'd compare with a "dedicated" crypto core.
FPGA implementations have a few drawbacks compared to ASIC implementations. The most notable is perhaps the attainable clock rate of a given block of logic (typically a few hundred MHz today), which means you will see higher latency. However, the ForwardCom ISA should cover the vast majority of latency-sensitive algorithms, as it describes a general-purpose processor. For throughput-sensitive algorithms, you can usually just increase parallelism. In theory, you can design a wider FPGA implementation with a higher total throughput than a narrower ASIC implementation.
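A back-of-the-envelope model of that trade-off, in Python. All numbers are made-up illustration values, not measurements of any real FPGA or ASIC, and the model assumes fully pipelined units that each accept one block per cycle.

```python
def throughput_blocks_per_s(clock_hz: float, lanes: int,
                            cycles_per_block: float = 1.0) -> float:
    """Blocks completed per second across all parallel lanes."""
    return clock_hz * lanes / cycles_per_block


def latency_s(clock_hz: float, pipeline_depth: int) -> float:
    """Time for a single block to pass through the pipeline."""
    return pipeline_depth / clock_hz


# Assumed example: a wide FPGA design versus a narrow ASIC unit.
FPGA = {"clock_hz": 300e6, "lanes": 32, "pipeline_depth": 10}
ASIC = {"clock_hz": 3e9, "lanes": 2, "pipeline_depth": 10}

fpga_throughput = throughput_blocks_per_s(FPGA["clock_hz"], FPGA["lanes"])
asic_throughput = throughput_blocks_per_s(ASIC["clock_hz"], ASIC["lanes"])
fpga_latency = latency_s(FPGA["clock_hz"], FPGA["pipeline_depth"])
asic_latency = latency_s(ASIC["clock_hz"], ASIC["pipeline_depth"])
```

With these assumed figures the FPGA design completes more blocks per second (32 lanes at 300 MHz against 2 lanes at 3 GHz), but each individual block takes ten times longer to come out of the pipeline.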

A current idea for ForwardCom is to integrate FPGAs into CPU cores; the current specification version has reserved instruction codes for this purpose. It should be possible to supply a library for the FPGA programming (by the operating system?) and then use these designs as one would use regular instruction extensions in other architectures. Of course, the supplied algorithm has to exploit enough parallelism, and the program has to tell the operating system what algorithm it wants to run.

-.-
Posts: 2
Joined: Sun Dec 24, 2017 5:10 am

Re: Different instruction sets on different cores

Post by -.- » Fri Jan 19, 2018 10:06 am

I'd imagine that mostly serial encryption, such as AES-CBC, would suffer, speed-wise, on an FPGA compared to a CPU with dedicated AES instructions, though mostly parallel methods like AES-CTR could be better (for large enough amounts of data).
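The difference in data dependencies can be sketched in Python as follows. The block function is a hash-based stand-in, not AES and not invertible, so this only illustrates which blocks depend on which; the function names and the 8-byte nonce plus 8-byte counter layout are assumptions. The thread pool is there to show that the counter blocks are independent, not to claim a real speed-up.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK = 16  # block size in bytes


def toy_block(key: bytes, block: bytes) -> bytes:
    # Stand-in for one AES block encryption. Not a real cipher.
    return hashlib.sha256(key + block).digest()[:BLOCK]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def split_blocks(data: bytes):
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def cbc_like(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    # Each block needs the previous ciphertext block, so the loop is serial.
    # Assumes len(plaintext) is a multiple of BLOCK.
    prev, out = iv, []
    for p in split_blocks(plaintext):
        prev = toy_block(key, xor(p, prev))
        out.append(prev)
    return b"".join(out)


def ctr_like(key: bytes, nonce: bytes, data: bytes, workers: int = 4) -> bytes:
    # Keystream block i depends only on (nonce, i), so all blocks can be
    # computed independently and in any order.
    blocks = split_blocks(data)
    counters = [nonce + i.to_bytes(8, "big") for i in range(len(blocks))]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        keystream = list(pool.map(lambda c: toy_block(key, c), counters))
    return b"".join(xor(b, k) for b, k in zip(blocks, keystream))
```

Changing the first plaintext byte changes every output block of cbc_like but only the first output block of ctr_like, which is why a wide, slower-clocked unit can keep up in the counter-mode case and not in the chained case.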

I haven't really looked at what ForwardCom provides though, so maybe it has other mitigations in place.
