forwardcom forum

It is possible to change the rounding mode globally using the Numeric control register, or for an individual instruction or even for a single vector element using a mask register.

80 bit float is not mentioned in the IEEE standard. It was introduced with the Intel 8087 coprocessor and AFAIK used only in Intel-compatible processors.
The IEEE standard mentions the following floating point types: binary16, binary32, binary64, binary128.

The intention is to support IEEE-754 in the form of the next revision forthcoming in 2019. Among the forthcoming changes to IEEE-754 is a cleanup of the max and min functions for NAN inputs. Subnormal numbers may be turned off by default for performance reasons. They are very costly to support. T...
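To illustrate the max/min question for NAN inputs: C's `fmax` returns the non-NAN operand when exactly one input is NAN, so the NAN is silently lost, whereas the `maximum` operation in IEEE 754-2019 returns NAN if either input is NAN. The sketch below is only a model of the propagating variant. The name `max_propagate` is made up for this example, and the ordering of -0 and +0 is not handled.

```c
#include <math.h>

/* Maximum that propagates NAN: if either input is NAN, the result is NAN.
   Hypothetical helper for illustration, not a ForwardCom or libc function.
   Signed zeros are not distinguished here. */
double max_propagate(double a, double b) {
    if (isnan(a) || isnan(b)) {
        return NAN;
    }
    return a > b ? a : b;
}
```

With this variant, an invalid value anywhere in a chain of max operations is still visible in the final result.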

There are no fixed-size pages. The OS would have to make an arbitrary-size entry in the memory map for a block of memory that has been swapped to disk.

There has been a lot of spam posting here lately. Apparently the security question was too easy to answer for spambots. Now I have changed the security question to make it more difficult. The answer is a single word.

mbitsnbites wrote: The most common problem is zero-terminated strings. With naive instructions this becomes a data dependent branched loop, where the content of every byte needs to be inspected, and it's impossible for the CPU to correctly predict the final branch. This problem has already been solv...

mbitsnbites wrote: in my design I need to know the vector operation length before accessing the vector register file. I have the vector register fetch in one pipeline stage, and I have the vector register element indexing logic in the pipeline stage before that. I have solved that problem by requiri...

Thanks for the link. The MRISC32 ISA has some of the same ideas as ForwardCom. Maybe there is a basis for collaboration. I have added my comment at http://www.bitsnbites.eu/the-mrisc32-a- ... ent-219160

LLVM would be my first choice for a compiler for ForwardCom. I haven't had the time to look into it, so I don't know if there are any obstacles. The variable-length vectors and vector loops might be a problem.

But could compilers be made to have such abilities? Forget the past -- what about now? Yes. For example the Intel compiler can put prefetching of data into a separate thread running in the same CPU core with simultaneous multithreading (= hyperthreading in Intel lingo). I generally don't like simul...

In my opinion, the patent system as it works today is rotten. The patent system is intended to stimulate invention and innovation, but today it is doing the opposite. Hi tech companies are filing scores of patents, not to protect their inventions but for having weapons to use in patent wars. For exa...

The tools that I have developed so far can emulate the ForwardCom processor but not simulate memory latencies. The instruction set is designed for making OoO execution efficient. The compiler does not need to do this. Only the old x87 instructions use 80-bit intermediaries. Most other instruction se...

Floating point calculations can generate infinity (INF) and not-a-number (NAN) in case of errors. These codes will propagate to the end result of a sequence of calculations in most cases. This is a convenient way of detecting floating point errors, and it is more efficient than using traps (software...
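A minimal sketch of the propagation idea in plain C, assuming IEEE 754 arithmetic with traps disabled (the default on common platforms). The function names are invented for this example. A 0/0 in the middle of the loop produces NAN, and a single check at the end catches it.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Sum of element-wise ratios. No error checking inside the loop:
   0.0/0.0 gives NAN and x/0.0 gives INF, and both carry through
   the additions to the final sum. */
double sum_of_ratios(const double *num, const double *den, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += num[i] / den[i];
    }
    return sum;
}

/* One check on the end result instead of a trap per operation. */
bool result_is_valid(double r) {
    return isfinite(r);   /* false for both NAN and INF */
}
```

The loop body has no branches for error handling, so it stays simple to vectorize. The cost is that the check only tells you that something went wrong, not where.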

ForwardCom could use any caching model. Experiments with alternative forms of caching are welcome. I don't think that 128 kB would be enough if you have large vector registers, but the cache might be subdivided into 'lanes' that align with the data lanes of the CPU. Quoting from the manual: The Forw...

The difference between a vector register and a scalar register is that you can handle the entire vector register with a single operation. If you want to add 1 to four 32-bit registers you need four instructions. If you want to add 1 to all four elements of a 128 bit vector register you only need one...
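The same contrast modelled in C, as a rough sketch. `vec128_t` is a made-up stand-in for a 128-bit vector register holding four 32-bit elements; on real hardware the loop in `add1_vector` would correspond to one vector instruction, while `add1_scalar` corresponds to four separate scalar additions.

```c
#include <stdint.h>

#define LANES 4

/* Hypothetical model of a 128-bit vector register: four 32-bit elements. */
typedef struct { uint32_t e[LANES]; } vec128_t;

/* Scalar style: four registers, four separate add operations. */
void add1_scalar(uint32_t *r0, uint32_t *r1, uint32_t *r2, uint32_t *r3) {
    *r0 += 1;
    *r1 += 1;
    *r2 += 1;
    *r3 += 1;
}

/* Vector style: one operation applied to every element of the register. */
vec128_t add1_vector(vec128_t v) {
    for (int i = 0; i < LANES; i++) {
        v.e[i] += 1;   /* unsigned arithmetic wraps around on overflow */
    }
    return v;
}
```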

csdt wrote: the multiple instruction scheme is also applicable to the division (just call div and rem) Yes, but not to extended division (divide a 64 bit integer by a 32 bit integer to get a 32 bit quotient and a 32 bit remainder). is there any performance critical codes that require to compute div ...
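For reference, this is what the extended division computes, written as a C sketch. The names `divrem32_t` and `div_ex_u64_u32` are invented here. The quotient only fits in 32 bits when the high half of the dividend is smaller than the divisor, so the helper reports failure otherwise.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint32_t quot; uint32_t rem; } divrem32_t;

/* Divide a 64-bit unsigned integer by a 32-bit one, giving a 32-bit
   quotient and a 32-bit remainder. Returns false if the divisor is zero
   or the quotient would not fit in 32 bits. */
bool div_ex_u64_u32(uint64_t dividend, uint32_t divisor, divrem32_t *out) {
    if (divisor == 0) {
        return false;
    }
    if ((uint32_t)(dividend >> 32) >= divisor) {
        return false;   /* quotient would need more than 32 bits */
    }
    out->quot = (uint32_t)(dividend / divisor);
    out->rem  = (uint32_t)(dividend % divisor);
    return true;
}
```

Both outputs come from the same division, which is why splitting it into separate div and rem instructions with 32-bit inputs cannot reproduce it.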

csdt wrote: Back on the add with carry, is it really necessary to have a single instruction to get both the result and the carry? It might be worth considering recalculating the sum to get the carry. An integer addition is pretty fast, so recalculating it would not incur too much overhead. That's po...
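A sketch of the recalculation idea from the quoted suggestion, in C. The type `u128_t` and the function names are made up for illustration. The carry is obtained by computing the sum a second time and comparing it with one operand: an unsigned sum is smaller than an operand exactly when it wrapped around.

```c
#include <stdint.h>

/* Carry out of a 64-bit unsigned addition, found by recomputing the sum. */
uint64_t carry_out(uint64_t a, uint64_t b) {
    return (uint64_t)(a + b) < a;   /* 1 if the addition wrapped, else 0 */
}

/* Hypothetical 128-bit integer built from two 64-bit halves. */
typedef struct { uint64_t lo; uint64_t hi; } u128_t;

/* 128-bit addition using one instruction for the sum and a second,
   separate computation for the carry. */
u128_t add128(u128_t a, u128_t b) {
    u128_t r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + carry_out(a.lo, b.lo);
    return r;
}
```

Note that this only covers the carry out of the lowest part. Chaining more than two parts needs a carry in as well as a carry out at each step, which takes an extra comparison per part.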

The ForwardCom instruction set has 'mul' instructions which give the low part of a product, 'mul_hi' gives the high part, and 'mul_ex' gives double-size products as a vector. Vector elements with even-numbered position (0, 2, 4, ..) contain the low parts while odd elements (1, 3, 5, ..) contain the ...
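A C model of the three variants for 32-bit unsigned elements, as a sketch only. The C function names are invented. Two details are assumptions on my part rather than something the text above confirms: that only the even-numbered input elements are multiplied in the extended form, and that the odd-numbered result elements hold the high parts.

```c
#include <stddef.h>
#include <stdint.h>

/* Low 32 bits of the 64-bit product. */
uint32_t mul_lo32(uint32_t a, uint32_t b) {
    return (uint32_t)((uint64_t)a * b);
}

/* High 32 bits of the 64-bit product. */
uint32_t mul_hi32(uint32_t a, uint32_t b) {
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* Extended multiply over a vector of n elements (assumed layout):
   element pair (i, i+1) of the result receives the double-size product
   of input elements a[i] and b[i], with i even. */
void mul_ex32(const uint32_t *a, const uint32_t *b, uint32_t *out, size_t n) {
    for (size_t i = 0; i + 1 < n; i += 2) {
        uint64_t p = (uint64_t)a[i] * b[i];
        out[i]     = (uint32_t)p;           /* even element: low part  */
        out[i + 1] = (uint32_t)(p >> 32);   /* odd element: high part (assumed) */
    }
}
```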

All the binary tools for ForwardCom are working now: assembler, disassembler, linker, library manager, emulator, debugger. These tools can run in Windows and Linux. I have also made function libraries: libc.li contains the most important standard C functions. A library of mathematical functions math...

Yes, but not for multiplication and division. I don't want to have ALU operations with different latencies combined with conditional jump because it will complicate the pipeline.

The C language is particularly bad for overflow checking. It's not safe to detect signed integer overflow after it has occurred because the compiler is allowed to optimize it away. I've seen a very nasty bug because I checked for overflow in this way. See https://codereview.stackexchange.com/questio...
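A small sketch of the difference, with a hypothetical helper name. A check of the form `if (a + b < a)` for signed integers looks at the result after the overflow has already happened. Signed overflow is undefined behavior in C, so the compiler may assume it never occurs and remove the check. The safe way in portable C is to test the operands before adding.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true if a + b would overflow the range of int.
   Only the operands are inspected, so no overflow ever occurs here. */
bool add_would_overflow(int a, int b) {
    if (b > 0) {
        return a > INT_MAX - b;
    }
    if (b < 0) {
        return a < INT_MIN - b;
    }
    return false;
}
```

GCC and Clang also provide `__builtin_add_overflow`, which does the addition and reports overflow in one step, but it is a compiler extension rather than part of the C standards discussed here.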

Regarding integer fault traps: yes, I would love to avoid fault trapping for both integer and floating point calculations altogether. In addition to the problems that Hubert points to, there is the problem that the behavior depends on vector length. A trap may happen at different times in a loop sequ...

Why would a financial application need decimal floating point? You can get exactness just by multiplying by 100 so that you are counting cents rather than $ or € or whatever. BTW, the x86 instruction set has instructions for decimal numbers but they were never used, so they have been removed in x86-...
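A minimal sketch of the counting-cents approach, with made-up names. Amounts are stored as a 64-bit integer number of cents, so additions are exact and there is no rounding error to accumulate. Overflow checking is left out of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Money stored as a whole number of cents (hypothetical type name). */
typedef int64_t cents_t;

/* Build an amount from whole currency units plus cents. */
cents_t make_amount(int64_t units, int64_t cents) {
    return units * 100 + cents;
}

/* Exact sum of a list of amounts. */
cents_t total_cents(const cents_t *items, size_t n) {
    cents_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += items[i];
    }
    return sum;
}
```

For example, three items at 19.99 add up to exactly 5997 cents. Rounding only becomes a question for operations such as interest or currency conversion, where the result has to be rounded to a whole cent by an explicit rule.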

I prefer computers to use binary numbers. This is more efficient. The standards for decimal floating point numbers are certainly not easier to deal with than binary.

Thank you Kulasko.

This is very similar to what I have in mind. The number of parallel units may vary of course. Memory write may be after the ALUs, but there are few, if any, instructions that use both ALU and memory write.


Search found 185 matches

Re: Range/Interval based floating point computation

Re: Computational standards compliance

Re: Computational standards compliance

Re: Handling paging without a page system

Protection against spambots

Re: Interesting new ISA: MRISC32

Re: Interesting new ISA: MRISC32

Re: Interesting new ISA: MRISC32

Re: Is ForwardCom LLVM-friendly

Re: Forwardcom and caching models

Re: Forwardcom and patents

Re: Forwardcom simulations

NAN propagation instead of fault trapping. Can we avoid speculative execution?

Re: Forwardcom and caching models

Re: One flexible register

Re: Emulating multiple output instructions with caching

Re: Emulating multiple output instructions with caching

Re: Emulating multiple output instructions with caching

All the tools are working now

Re: Forwardcom possible execution pipeline?

Re: Forwardcom possible execution pipeline?

Re: Forwardcom possible execution pipeline?

Re: Decimal floating point

Re: Decimal floating point

Re: Forwardcom possible execution pipeline?