HubertLamontagne wrote: ↑2018-02-02, 18:11:06
I've never seen half-float being used. Not in game code, and not in sound applications (where 32bit float is very much the sweet spot). There's very little x86 support - only AVX conversion instructions to and from 32bit float vectors (vcvtps2ph and vcvtph2ps). There is no standard C/C++ type name for it either (the only trace of half float on x86 is the AVX conversion intrinsics).
Hi Hubert, I've lost the plot here a bit. Why are you talking about half-float? Is this related to my enthusiasm for 40-bit registers and address spaces for client devices? How so?
In any case, half-float, by which I assume you mean 16-bit FP, is extremely relevant right now, much more so than it was even ten years ago. It features prominently in a lot of deep learning APIs and platforms, most recently in NVIDIA's new Volta "GPU" architecture with its plethora of dedicated tensor cores (I put "GPU" in quotes because this product is no more a GPU than my rooftop antenna; it's meant exclusively for data centers, particularly for deep learning applications. Perhaps one day the Volta architecture will be spun into a GPU, and one can even dream that cryptocurrency miners won't make it impossible to actually buy these "GPUs" for ≤ 120% of their MSRP.)
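To your point about the limited x86 support: those F16C conversion instructions are at least exposed through <immintrin.h>, so half floats are usable as a storage format today even without native arithmetic. A minimal sketch (assumes an F16C-capable CPU and compiling with something like gcc -mf16c; the sample values are arbitrary):

[code]
#include <immintrin.h>  /* F16C intrinsics (vcvtps2ph / vcvtph2ps) */
#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* Four 32-bit floats to be stored as half precision. */
    float in[4] = {1.0f, 0.5f, 3.14159f, 65504.0f};  /* 65504 is the largest finite half */

    /* Pack to 16-bit halves: vcvtps2ph, round to nearest even. */
    __m128i halves = _mm_cvtps_ph(_mm_loadu_ps(in), _MM_FROUND_TO_NEAREST_INT);
    uint16_t h[8];
    _mm_storeu_si128((__m128i *)h, halves);  /* only h[0..3] are meaningful */

    /* Unpack back to 32-bit floats: vcvtph2ps. */
    float out[4];
    _mm_storeu_ps(out, _mm_cvtph_ps(halves));

    for (int i = 0; i < 4; i++)
        printf("%f -> 0x%04x -> %f\n", in[i], h[i], out[i]);
    return 0;
}
[/code]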
Some interesting reading on the 16-bit FP renaissance:
https://devblogs.nvidia.com/mixed-preci ... ng-cuda-8/
Facebook's Caffe2 platform:
https://caffe2.ai/blog/2017/05/10/caffe ... pport.html
Deep dive into Volta:
https://devblogs.nvidia.com/inside-volta/
With my proposed 40-bit platform, I imagine specifying 20-, 40-, and 80-bit integers and FP. I think 20-bit integers and floats would be more useful in many cases than 16-bit. And the 80-bit floats line up with the 80-bit x87 Extended Precision format that IEEE 754 already recognizes as a double-extended format; the legacy x87 FPU actually computes in 80-bit precision internally by default, even when the operands are doubles (SSE2 double math does not). The 20-, 40-, and 80-bit floats would have to be very rigorously specified, much like the recent IEEE specs (but the spec should be free and open, not cost an arm and a leg like the IEEE standards or the C++ standard).
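As an aside, the 80-bit format is already reachable from C on some toolchains: on x86 Linux with GCC, long double maps to the x87 extended type (this is ABI-specific; MSVC, for instance, makes long double a plain 64-bit double). A quick probe:

[code]
#include <float.h>
#include <stdio.h>

int main(void) {
    /* On the x86-64 System V ABI, long double is the 80-bit x87 extended
       format, padded out to 16 bytes in memory. */
    printf("sizeof(long double) = %zu bytes\n", sizeof(long double));
    printf("LDBL_MANT_DIG       = %d significand bits\n", LDBL_MANT_DIG); /* 64 on x87 */
    printf("DBL_MANT_DIG        = %d significand bits\n", DBL_MANT_DIG);  /* 53 */
    printf("LDBL_DIG            = %d decimal digits\n", LDBL_DIG);        /* 18 */
    return 0;
}
[/code]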
There's also the ISO/IEC 10967 standard (Language Independent Arithmetic), which is much broader than floating point:
https://en.wikipedia.org/wiki/ISO/IEC_10967
And I'd want a logarithmic number system IF the requisite empirical research tells us that it would be a significant benefit for many programs. (And yes, we'd have to sort out what we mean by "significant" and "many" and so forth.)
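For anyone who hasn't run into an LNS: the attraction is that multiplication and division collapse into fixed-point addition and subtraction, while addition becomes the expensive operation. A toy sketch in plain C, storing each value as its log2 in an ordinary double purely to show the arithmetic (a real LNS would use a fixed-point log representation and table/interpolation hardware for the add):

[code]
#include <math.h>
#include <stdio.h>

/* Toy LNS: represent a positive real x by l = log2(x).
   Multiply/divide become add/subtract; add needs the "Gaussian logarithm"
   log2(1 + 2^(lb - la)), which real LNS hardware approximates with tables. */

static double lns_mul(double la, double lb) { return la + lb; }
static double lns_div(double la, double lb) { return la - lb; }

static double lns_add(double la, double lb) {
    if (la < lb) { double t = la; la = lb; lb = t; }   /* ensure la >= lb */
    return la + log2(1.0 + exp2(lb - la));             /* la + log2(1 + 2^(lb-la)) */
}

int main(void) {
    double a = 6.0, b = 1.5;
    double la = log2(a), lb = log2(b);
    printf("a*b = %f\n", exp2(lns_mul(la, lb)));  /* 9.0 */
    printf("a/b = %f\n", exp2(lns_div(la, lb)));  /* 4.0 */
    printf("a+b = %f\n", exp2(lns_add(la, lb)));  /* 7.5 */
    return 0;
}
[/code]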
I assume a 20/40/80-bit platform could easily support legacy 16/32/64-bit types by padding or other means.
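Roughly what I have in mind, as a minimal sketch that emulates a hypothetical 40-bit register in the low bits of a uint64_t: a legacy int32 is simply sign-extended into the wider register, and results are masked back down to 40 bits.

[code]
#include <stdint.h>
#include <stdio.h>

#define REG40_MASK ((1ULL << 40) - 1)   /* low 40 bits of a hypothetical register */

/* Sign-extend a legacy 32-bit value into a 40-bit register image. */
static uint64_t reg40_from_i32(int32_t x) {
    return (uint64_t)(int64_t)x & REG40_MASK;
}

/* Interpret the low 40 bits as a signed value (for inspection). */
static int64_t reg40_to_i64(uint64_t r) {
    return (int64_t)(r << 24) >> 24;    /* sign-extend bit 39 up to 64 bits */
}

int main(void) {
    uint64_t a = reg40_from_i32(-123456);
    uint64_t b = reg40_from_i32(1000000);
    uint64_t sum = (a + b) & REG40_MASK;   /* 40-bit wraparound, as narrow hardware would do */
    printf("sum = %lld\n", (long long)reg40_to_i64(sum));  /* 876544 */
    return 0;
}
[/code]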
I also like the idea of 320-bit vector registers: eight 40-bit values, ten 32-bit values, or four 80-bit values. From what I've read, I'm not sure that huge vectors of the sort Agner wants are efficient. Isn't AVX-512 underperforming right now?
Finally, I think core type bit lengths, register sizes, address space, vector length, etc. should all be chosen through rigorous empirical research into what is optimal for the kind of operating system we want (and we really should want new, clean-sheet OSes) and the applications we expect to run on them. My 20/40/80 idea is really just a hunch about what's near-optimal for client devices. The truly optimal values could be quite different, and innovations in semiconductor manufacturing and hardware design could enable a whole new set of optimal parameters.