Search found 80 matches
- 2021-03-03, 23:32:13
- Forum: forwardcom forum
- Topic: Implications of ForwardCom memory management approach
- Replies: 15
- Views: 27161
Re: Implications of ForwardCom memory management approach
When I google for "shared virtual address model" I get something with CPU and GPU sharing the same virtual addresses, but still with fixed-size pages. I think there is little need for a GPU when the CPU has long vectors. Yeah I think "shared virtual address model" is used for tw...
- 2021-02-03, 22:02:46
- Forum: forwardcom forum
- Topic: Using CPU cores as GPU
- Replies: 3
- Views: 6366
Re: Using CPU cores as GPU
Using a specialized RISC core as a starting point for a GPU makes sense... For instance, ATI/AMD's GCN is basically a very specialized RISC (for reference, GCN1 came out in 2012, PS4 and Xbox One are GCN2, PS4 Pro and Xbox One X are GCN4). Here's the instruction set document for GCN1: http://developer...
- 2021-01-29, 19:47:15
- Forum: forwardcom forum
- Topic: Using CPU cores as GPU
- Replies: 3
- Views: 6366
Re: Using CPU cores as GPU
Seems that they are walking down the path of the Intel Larrabee eh? :) I imagine the texturing unit would be designed as 4 parallel data caches, and a bilinear-interpolated texture lookup would read the even-even, odd-even, even-odd, and odd-odd texels around the requested texture coordinates at t...
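A minimal sketch of the parity idea in the post above, not a description of any real texturing unit. It assumes coordinates in texel units and clamp addressing; `bank_of` and `bilinear_sample` are hypothetical names. The four texels around a sample point always have four different (x parity, y parity) combinations, so each read can go to a different cache bank.

```python
import math

def bank_of(x, y):
    """Hypothetical bank index (0..3) built from the parity of x and y."""
    return (x & 1) | ((y & 1) << 1)

def bilinear_sample(tex, u, v):
    """Bilinearly sample a list-of-rows texture at (u, v) in texel units.

    Returns (value, banks), where banks[i] is the bank hit by the i-th read.
    Banks are computed from the unclamped coordinates.
    """
    h, w = len(tex), len(tex[0])
    x0, y0 = math.floor(u), math.floor(v)
    fx, fy = u - x0, v - y0
    taps = [(x0, y0), (x0 + 1, y0), (x0, y0 + 1), (x0 + 1, y0 + 1)]
    banks = [bank_of(x, y) for x, y in taps]

    def fetch(x, y):
        # Clamp addressing at the texture edges (an assumption of this sketch).
        return tex[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    t00, t10, t01, t11 = (fetch(x, y) for x, y in taps)
    top = t00 * (1 - fx) + t10 * fx
    bottom = t01 * (1 - fx) + t11 * fx
    return top * (1 - fy) + bottom * fy, banks
```

Whatever the sample position, `sorted(banks)` comes out as `[0, 1, 2, 3]`, which is the property that lets the four reads proceed in parallel.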
- 2021-01-12, 0:17:06
- Forum: forwardcom forum
- Topic: Separate call stack and data stack
- Replies: 6
- Views: 12555
Re: Separate call stack and data stack
Still, I'm not convinced that this is a net decrease in complexity. This creates a small new memory area, with its own addressing scheme separate from the main RAM. Since it's not affected by RAM area mapping, it needs to be integrally copied to/from main RAM during context switches. That means you ...
- 2020-11-22, 16:34:08
- Forum: forwardcom forum
- Topic: input/output instructions
- Replies: 8
- Views: 23340
Re: input/output instructions
If you can do efficient bilinear texturing on a general purpose CPU, that would already be better than Intel (who had to add a texturing unit to the Larrabee) and Sony (who had to add a GPU to the PS3 when they realized that they couldn't efficiently texture polygons on the Cell). And it would prob...
- 2020-11-20, 20:54:23
- Forum: forwardcom forum
- Topic: input/output instructions
- Replies: 8
- Views: 23340
Re: input/output instructions
I imagine if it ever comes to that, with large 3D graphics adapters, by that point you'd probably memory-map the device space and use something like paging and chained DMA and bus mastering and even an IOMMU (which is now a thing on modern PCs). The culmination of this trend is that on the PS4, whe...
- 2020-06-24, 20:37:06
- Forum: forwardcom forum
- Topic: Implications of ForwardCom memory management approach
- Replies: 15
- Views: 27161
Re: Implications of ForwardCom memory management approach
I was thinking today that I don't know why you care about not having page tables and a TLB. What's the problem? It takes hardware resources? So what? Is it really that big a problem to have page tables and TLBs, or is this more of an esthetic preference? One other thing to keep in mind is that you don...
- 2020-06-10, 15:38:24
- Forum: forwardcom forum
- Topic: Proposal to drop tiny instructions
- Replies: 11
- Views: 30045
Re: Multi-register instructions
Is the logic for multi-register push and pop too specific to be shared? I thought that perhaps, the multi-register logic that already is needed because of this could be used as a more general multi-register instruction generator. That would enable multi-register instructions without much added comp...
- 2020-05-20, 19:48:25
- Forum: forwardcom forum
- Topic: Proposal to drop tiny instructions
- Replies: 11
- Views: 30045
Re: Proposal to drop tiny instructions
I guess this is about the limit of where I can help, because there are like half a dozen styles of pipelines (single-issue, Atom-style load-ALU single issue, dual-issue, VLIW, simple OOO where every load/store/ALU op is a full independent micro-op, OOO with micro-op fusion, OOO with clustering), an...
- 2020-05-19, 16:58:29
- Forum: forwardcom forum
- Topic: Separate call stack and data stack
- Replies: 6
- Views: 12555
Re: Separate call stack and data stack
I'm very much reminded of the register-window stack on SPARC which has similar semantics, and stores return addresses as part of its mechanism: http://icps.u-strasbg.fr/people/loechner/public_html/enseignement/SPARC/sparcstack.html Any local variable that is stored on the stack and that isn't access...
- 2020-05-15, 17:23:58
- Forum: forwardcom forum
- Topic: Proposal to drop tiny instructions
- Replies: 11
- Views: 30045
Re: Proposal to drop tiny instructions
If tiny instructions are rare and are mostly used by register saving/loading to stack, and load/store multiple or load/store pair (the ARM64 equivalent) is simpler to implement, then it makes total sense to go for the multi-register loads/stores instead yeah. I think the extra complexity in the regi...
- 2020-05-05, 16:26:39
- Forum: forwardcom forum
- Topic: Heterogenous cores / instruction sets
- Replies: 3
- Views: 7875
Re: Heterogenous cores / instruction sets
ARM is pretty good at this (obviously it has to be, since so much of their business is embedded cores), with the whole gamut: - Stripped-down 32-bit (modern small microcontrollers) - 32-bit (lots of microcontrollers and smaller cores, GBA and NDS) - 32-bit + FPU (used in some microcontrollers for DSP-heav...
- 2020-05-05, 15:06:48
- Forum: forwardcom forum
- Topic: Implications of ForwardCom memory management approach
- Replies: 15
- Views: 27161
Re: Implications of ForwardCom memory management approach
Without the ability to increase the number of mappings when needed, you'd definitely need more RAM, because you have far fewer techniques available when RAM gets tight: - Apps rarely malloc() all their RAM in just one initial go. The pattern is more like a dynamic mix of mallo...
- 2020-05-04, 22:42:53
- Forum: forwardcom forum
- Topic: Implications of ForwardCom memory management approach
- Replies: 15
- Views: 27161
Re: Implications of ForwardCom memory management approach
Presumably it would work roughly as follows: - All allocation happens in 4 KB blocks. - When a new program starts, its initial allocation is set some distance away from other previous allocations (maybe with a 16 MB offset?). - When your program first allocates more memory, the OS grows this initial al...
- 2020-05-04, 17:25:56
- Forum: forwardcom forum
- Topic: Using Forwardcom as a GPU?
- Replies: 11
- Views: 18028
Re: Using Forwardcom as a GPU?
[...] Look up nvpath. It's Nvidia's feature that accelerates vector graphics on their GPUs/3D accelerators. It's a very interesting extension, and Adobe has used it to good effect in their Creative Cloud applications, probably Photoshop and others. I've taken a look at it. It's kinda weird but it m...
- 2020-05-01, 22:58:36
- Forum: forwardcom forum
- Topic: Using Forwardcom as a GPU?
- Replies: 11
- Views: 18028
Re: Using Forwardcom as a GPU?
This reminds me that fast Bezier curve performance would be very useful, without having to worry about being a traditional GPU. Bezier curves are central to a lot of 2D rendering, including fonts and vector graphics like SVG. Some are quadratic and some are cubic. I'm looking into this and it seems...
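For reference, a minimal sketch of evaluating the quadratic and cubic curves mentioned above with de Casteljau's algorithm (repeated linear interpolation). This is a generic textbook method, not something the post proposes, and `de_casteljau` is a hypothetical helper name.

```python
def de_casteljau(points, t):
    """Evaluate a Bezier curve of any degree at parameter t in [0, 1].

    points: control points as (x, y) tuples; 3 points give a quadratic
    curve, 4 points give a cubic curve.
    """
    pts = [tuple(p) for p in points]
    # Each pass interpolates between neighbouring points, dropping one point,
    # until a single point on the curve remains.
    while len(pts) > 1:
        pts = [
            tuple((1 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(pts, pts[1:])
        ]
    return pts[0]
```

The inner step is the same multiply-add for every coordinate and every curve, which is the kind of work that maps naturally onto long vector registers.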
- 2020-04-06, 21:17:48
- Forum: forwardcom forum
- Topic: Possible difficulties for microcode-less implementations
- Replies: 8
- Views: 17118
Re: Possible difficulties for microcode-less implementations
Thank you for your replies. Having instructions that require multiple micro-ops doesn't necessarily mean you need microcode. You are right, I didn't think about that. I still have some difficulty dealing with this as it complicates the decoding stage. Presumably, decoders don't have a constant late...
- 2020-04-02, 17:22:53
- Forum: forwardcom forum
- Topic: Possible difficulties for microcode-less implementations
- Replies: 8
- Views: 17118
Re: Possible difficulties for microcode-less implementations
Having instructions that require multiple micro-ops doesn't necessarily mean you need microcode. ARM has many instructions that are multiple-uop (for instance, load+increment a pointer, multiple register store/load...). "Call" and "Ret" are inherently multiple-uop, since you load...
- 2020-03-31, 17:14:50
- Forum: forwardcom forum
- Topic: Using Forwardcom as a GPU?
- Replies: 11
- Views: 18028
Re: Using Forwardcom as a GPU?
Yeah. The idea of a Forwardcom Xeon Phi / Larrabee makes sense to me, especially since it plays to Forwardcom's strengths (lots and lots of vector instructions). Clearly you could go with the UltraSPARC way - make it in-order to make the cores small, use really aggressive hyper-threading with lots o...
- 2020-03-26, 12:27:20
- Forum: forwardcom forum
- Topic: Using Forwardcom as a GPU?
- Replies: 11
- Views: 18028
Re: Using Forwardcom as a GPU?
For sure. I have no illusions - such a project would be likely to turn out like the ill-fated Larrabee (the articles about its demise are confusing, but they seem to imply that it had something like half the perf of dedicated GPUs, with the drivers still in alpha stage as another generation of GPUs...
- 2020-03-21, 22:25:52
- Forum: forwardcom forum
- Topic: Using Forwardcom as a GPU?
- Replies: 11
- Views: 18028
Using Forwardcom as a GPU?
Considering how vector and throughput-oriented Forwardcom is, I've been wondering if it would make sense as a GPU. It should be pretty good at vector processing at least. It might make sense to use tiled rendering. For rasterization, you'd use the full vector register size all the time, with registe...
- 2020-03-17, 0:17:09
- Forum: forwardcom forum
- Topic: Possible difficulties for microcode-less implementations
- Replies: 8
- Views: 17118
Re: Possible difficulties for microcode-less implementations
I'd imagine this would require special handling in the load/store unit if the vector isn't constant sized. Something where you get a kind of double-slot read micro-op or 2 micro-ops (there needs to be 2 reads if the image straddles a cache line boundary anyways), and if there's a potential read faul...
- 2020-02-12, 0:20:34
- Forum: forwardcom forum
- Topic: Putting it on real hardware
- Replies: 7
- Views: 17152
Re: Putting it on real hardware
You'd start with an in-order implementation in Verilog (or VHDL), I'd think, using block RAM instead of DRAM for instruction memory and data memory at first... and probably no vector support initially, and not too much pipelining at first. Then, you'd build up from there. You'd presumably start with...
- 2018-12-18, 5:28:11
- Forum: forwardcom forum
- Topic: Handling paging without a page system
- Replies: 3
- Views: 11608
Re: Handling paging without a page system
Presumably, the OS would need to use something like Buddy Memory Allocation system-wide to keep allocations contiguous as much as possible and to limit the number of mappings (and to be able to do multiple hundred megabyte allocations at all). Excessively large mappings that get swapped to disk woul...
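A minimal sketch of the two core calculations in buddy allocation, assuming a 4096-byte minimum block and power-of-two block sizes. The names `MIN_BLOCK`, `block_order` and `buddy_of` are hypothetical, and this is not a description of how a ForwardCom OS would actually do it.

```python
MIN_BLOCK = 4096  # assumed minimum block size in bytes

def block_order(size, min_block=MIN_BLOCK):
    """Smallest order such that min_block << order holds `size` bytes."""
    blocks = -(-size // min_block)        # ceiling division
    return (blocks - 1).bit_length()      # round block count up to a power of two

def buddy_of(offset, order, min_block=MIN_BLOCK):
    """Offset of the buddy of the block at `offset` with the given order.

    Buddies differ only in the bit equal to the block size, so one XOR
    finds the neighbour to merge with when both blocks are free.
    """
    return offset ^ (min_block << order)
```

Because every block is naturally aligned and merges back with its buddy on free, large contiguous regions tend to reappear, which is what keeps the number of mappings low.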
- 2018-10-08, 1:38:57
- Forum: forwardcom forum
- Topic: Interesting new ISA: MRISC32
- Replies: 13
- Views: 28411
Re: Interesting new ISA: MRISC32
A string machine? How would you implement this string cache? Some kind of fast hardware hash function that processes 32 bytes at a time? Hardware accelerated UTF-8 character loading and capitalization changes? In particular, the hardware assisted string bank updating sounds really hard to build in ...