Search found 178 matches

by agner
2022-01-20, 9:12:45
Forum: forwardcom forum
Topic: Nonlocal control flow
Replies: 10
Views: 30559

Re: Nonlocal control flow

Isn't this a case where ForwardCom would do much better than traditional CPUs? Because the call stack is expressed explicitly, it is the same as the return prediction stack. So after you install a new call stack, you may get a pipeline stall, but then start predicting returns correctly from it. Y...
by agner
2022-01-20, 7:37:28
Forum: forwardcom forum
Topic: Nonlocal control flow
Replies: 10
Views: 30559

Re: Nonlocal control flow

Thank you for trying to find weak points in my system. I think that nonlocal returns may be implemented more efficiently as a sequence of normal returns. 1. Why? Why is it more efficient to return multiple times than just once? Because it doesn't mess up the return prediction. A non-local return wil...
by agner
2022-01-19, 11:09:59
Forum: forwardcom forum
Topic: Nonlocal control flow
Replies: 10
Views: 30559

Re: Nonlocal control flow

I think that nonlocal returns may be implemented more efficiently as a sequence of normal returns. It is possible to have multiple data stacks on ForwardCom. The compiler could use a register or an extra stack or a particular stack space to keep track of the nesting level. Error handling can also be...
by agner
2022-01-19, 7:51:14
Forum: forwardcom forum
Topic: Nonlocal control flow
Replies: 10
Views: 30559

Re: Nonlocal control flow

You are right that stack unwinding requires privileged instructions. Stack unwinding takes place in the following cases: exception trapping (try/catch) in object-oriented languages; debugging; longjmp in C. Exception trapping and debugging require privileged access anyway. A program that has a large nu...
by agner
2021-12-13, 7:13:36
Forum: forwardcom forum
Topic: How to avoid memory fragmentation
Replies: 5
Views: 15608

Re: How to avoid memory fragmentation

The more fragmented the memory becomes, the more complicated the access becomes: In simple cases where there are few processes running and plenty of vacant RAM, you don't need any address translation. All you need is a small memory map with a few variable-size entries. This memory map can stay on-ch...
by agner
2021-12-11, 7:19:39
Forum: forwardcom forum
Topic: How to avoid memory fragmentation
Replies: 5
Views: 15608

Re: How to avoid memory fragmentation

Thank you for your interesting comments. kyle-forward wrote: "Many parsers and similar algorithms use recursion." True. This even includes the parser in the ForwardCom assembler. But the recursion level is not very deep. "How are things like ASLR going to be done?" The security problems that "Addres...
by agner
2021-11-15, 15:18:37
Forum: forwardcom forum
Topic: Proposal to drop tiny instructions
Replies: 11
Views: 29775

Re: Proposal to drop tiny instructions

Thanks for the link. If you want to mix half-size (16-bit) with word-size (32-bit) instructions then all addresses would need 16-bit granularity. Adding one bit at the bottom means removing one bit at the top. In other words, the length you can jump with an 8-bit address offset is halved. Furtherm...
by agner
2021-11-12, 10:45:57
Forum: forwardcom forum
Topic: Instruction boundaries
Replies: 1
Views: 12003

Re: Instruction boundaries

Thanks for the proposal. This has been proposed before. It would be difficult to find space for the extra bits which might be better used for other purposes, and it will make linkers, loaders, and other tools more complicated if they have to split 32-bit and 64-bit constants into non-contiguous fiel...
by agner
2021-09-11, 7:35:14
Forum: forwardcom forum
Topic: Branch instruction with a "one-hot" bitmask equality comparison
Replies: 1
Views: 12467

Re: Branch instruction with a "one-hot" bitmask equality comparison

Thanks for the idea. Your example can be implemented in C as
    if (1 << x & 0b1001001000){}
Both GCC and Clang compilers are actually using this optimization. This can be implemented in ForwardCom assembly:
    int r1 = 0b1001001000
    int r2 = [x]
    bit_test(r1,r2), jump_true Whatever
The bit_test will be...
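The C test quoted in this post can be sketched as a small standalone function. This is only an illustration: the names VALUE_SET, in_value_set and check_value_set are made up here, and the range check is an added assumption, needed because shifting a 32-bit value by 32 or more is undefined in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 0b1001001000 written in hex: bits 3, 6 and 9 are set. */
#define VALUE_SET UINT32_C(0x248)

/* Returns true if x is 3, 6 or 9 (hypothetical helper name). */
bool in_value_set(uint32_t x)
{
    /* Guard added for this sketch: a shift count >= 32 is undefined. */
    if (x >= 32) {
        return false;
    }
    /* One shift and one AND replace three equality comparisons. */
    return ((UINT32_C(1) << x) & VALUE_SET) != 0;
}

/* Usage: a few spot checks of members and non-members. */
void check_value_set(void)
{
    assert(in_value_set(3));
    assert(in_value_set(9));
    assert(!in_value_set(4));
}
```

The mask plays the role of r1 in the assembly above, and x plays the role of r2.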
by agner
2021-08-08, 11:31:07
Forum: forwardcom forum
Topic: New softcore
Replies: 0
Views: 21922

New softcore

I have been working hard on developing a softcore implementation of ForwardCom, and I am now proud to publish the first version on GitHub. Features: Implemented on a Nexys A7 FPGA board with a Xilinx Artix 7 100T FPGA. Coded in SystemVerilog with an open license. Everything is made from scratch. No borro...
by agner
2021-06-28, 11:34:58
Forum: forwardcom forum
Topic: Universal boolean instruction
Replies: 5
Views: 16837

Re: Universal boolean instruction

Thank you for your comment. I was not aware of MRISC32. It looks like we have got some of the same ideas. ForwardCom allows instructions to be up to three 32-bit words. This makes it possible to overcome the problem of cramming a lot of information into a single code word, that most RISC designs suf...
by agner
2021-06-28, 6:33:47
Forum: forwardcom forum
Topic: Universal boolean instruction
Replies: 5
Views: 16837

Re: Universal boolean instruction

Now I have tried to synthesize it in an FPGA. The solution with a complete truth table uses 18% more slices and 40% more LUTs than the bitwise_logic circuit I described above. The truth table implementation uses fewer resources than I expected because the FPGA has efficient ways of implementi...
by agner
2021-06-28, 6:16:26
Forum: forwardcom forum
Topic: input/output instructions
Replies: 8
Views: 23116

Re: input/output instructions

Not 100%. Hubert has fertilized it with valuable insight in hardware design :-)
by agner
2021-06-27, 7:52:00
Forum: forwardcom forum
Topic: Universal boolean instruction
Replies: 5
Views: 16837

Universal boolean instruction

I have an idea for a multi-purpose 3-input bitwise boolean instruction. This instruction can implement all functions of the type RESULT = (A AND/OR B) AND/OR/XOR C with optional inversion on all inputs and outputs. It can also do A XOR B XOR C and bit selection: (A AND B) OR (NOT A AND C). The latter c...
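Two of the functions named in this post, bit selection and three-way XOR, can be written out in C to show what the result looks like on 32-bit operands. This is a plain software sketch, not the proposed instruction or its encoding; the function names bit_select and xor3 are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bit selection: where a bit of a is 1 take the bit from b,
   otherwise take it from c. Computes (a AND b) OR (NOT a AND c). */
uint32_t bit_select(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & b) | (~a & c);
}

/* Three-input XOR: a XOR b XOR c. */
uint32_t xor3(uint32_t a, uint32_t b, uint32_t c)
{
    return a ^ b ^ c;
}

/* Usage: the mask 0xF0F0F0F0 picks the high nibble of each byte
   from b and the low nibble from c. */
void check_boolean_examples(void)
{
    assert(bit_select(UINT32_C(0xF0F0F0F0), UINT32_C(0x12345678),
                      UINT32_C(0x9ABCDEF0)) == UINT32_C(0x1A3C5E70));
    assert(xor3(UINT32_C(0xFF00FF00), UINT32_C(0x0F0F0F0F),
                UINT32_C(0x00FF00FF)) == UINT32_C(0xF0F0F0F0));
}
```

In software each of these takes two to four separate operations, which is the work a single 3-input instruction would replace.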
by agner
2021-06-25, 5:11:33
Forum: forwardcom forum
Topic: Macro-op fusion as an intentional instruction set design choice
Replies: 4
Views: 17545

Re: Macro-op fusion as an intentional instruction set design choice

Hubert, my main motivation for making load+alu instructions is to do more work per instruction = higher throughput. The x86 instruction set is quite efficient despite a terribly complicated decode process, exactly because it does more work per instruction. This is also reducing the register load. An...
by agner
2021-06-19, 5:32:29
Forum: forwardcom forum
Topic: Default integer size 32 or 64 bits?
Replies: 7
Views: 20440

Re: Default integer size 32 or 64 bits?

Power consumption is also an issue. 32-bit integers use less power. My soft core can run faster when 64-bit integers are not implemented. "While writing C++, I eventually switched to size_t" size_t is unsigned, so it would not fit the short version loop instructions. The corresponding signed type in C...
by agner
2021-06-18, 17:09:31
Forum: forwardcom forum
Topic: Default integer size 32 or 64 bits?
Replies: 7
Views: 20440

Re: Default integer size 32 or 64 bits?

Most integer instructions are available in 8, 16, 32, and 64-bit versions. The question is only which one to prioritize for short instructions (single code word). The forthcoming version (1.11) will have both 32-bit and 64-bit short versions of some instructions. Most branch and loop instructions ...
by agner
2021-06-17, 18:44:53
Forum: forwardcom forum
Topic: Load From Const Array Instruction
Replies: 3
Views: 7309

Re: Load From Const Array Instruction

The solution that Hubert proposes is already possible with the present design. ForwardCom supports a separate read-only data section addressed relative to IP. An addressing mode with [IP + offset + scaled index] is also supported. I don't remember if we have discussed this before, but it is certainl...
by agner
2021-06-10, 18:32:36
Forum: forwardcom forum
Topic: Load From Const Array Instruction
Replies: 3
Views: 7309

Re: Load From Const Array Instruction

Thank you for your suggestion. A problem with your proposal is that jumps are costly, especially if the pipeline is long, because they interrupt the prefetching and decoding of instructions. Another problem is that the table needs multiple copies if it is accessed from multiple points in the code. I...
by agner
2021-04-13, 5:28:33
Forum: forwardcom forum
Topic: Macro-op fusion as an intentional instruction set design choice
Replies: 4
Views: 17545

Re: Macro-op fusion as an intentional instruction set design choice

The x86 instruction set introduced prefixes long ago. Today, there are many different prefixes that are 1, 2, 3, and 4 bytes long. There is no limit to how many prefixes an x86 instruction can have as long as the complete instruction is no more than 15 bytes long. This is a nightmare to decode. T...
by agner
2021-03-22, 9:58:14
Forum: forwardcom forum
Topic: Default integer size 32 or 64 bits?
Replies: 7
Views: 20440

Default integer size 32 or 64 bits?

Some ForwardCom instructions are available in a short form using format template C. Template C has one register field, 16 bits of immediate data, and no operand size field. This will fit an instruction like, for example: int r1 += 1000. I am in doubt whether the integer size should be 32 bits or 64 bit...
by agner
2021-03-19, 16:44:14
Forum: forwardcom forum
Topic: Rollbackable L1 Data Cache Design?
Replies: 7
Views: 12808

Re: Rollbackable L1 Data Cache Design?

Functions that receive a pointer can check if it is aligned. Functions like memcpy do that. But it is unrealistic to require that all functions have multiple paths for aligned and unaligned pointers. In most situations you can require that pointers be aligned according to the data size. Alignment of...
by agner
2021-03-15, 7:15:08
Forum: forwardcom forum
Topic: Rollbackable L1 Data Cache Design?
Replies: 7
Views: 12808

Re: Rollbackable L1 Data Cache Design?

In most cases, the compiler will know whether memory is aligned or not. Standard functions like memcpy check whether the pointers are aligned before deciding which method is optimal. Shifting data to make it aligned can be done in software. This may be inconvenient for the programmer, but i...
by agner
2021-03-13, 7:17:19
Forum: forwardcom forum
Topic: Rollbackable L1 Data Cache Design?
Replies: 7
Views: 12808

Re: Rollbackable L1 Data Cache Design?

Thank you Hubert for the explanation. It is a relevant discussion how caching can be made simpler. The ForwardCom design may have restrictions on alignment. Unaligned memory accesses could be split into two, or simply not allowed. The memcpy library function would need to shift data if source and de...
by agner
2021-03-02, 8:01:23
Forum: forwardcom forum
Topic: Implications of ForwardCom memory management approach
Replies: 15
Views: 26964

Re: Implications of ForwardCom memory management approach

When I google for "shared virtual address model" I get something with CPU and GPU sharing the same virtual addresses, but still with fixed-size pages. I think there is little need for a GPU when the CPU has long vectors. But address translation after the cache may be a very good idea. I wo...