Search found 80 matches

by HubertLamontagne
2021-03-03, 23:32:13
Forum: forwardcom forum
Topic: Implications of ForwardCom memory management approach
Replies: 15
Views: 27088

Re: Implications of ForwardCom memory management approach

When I google for "shared virtual address model" I get something with CPU and GPU sharing the same virtual addresses, but still with fixed-size pages. I think there is little need for a GPU when the CPU has long vectors. Yeah I think "shared virtual address model" is used for tw...
by HubertLamontagne
2021-02-03, 22:02:46
Forum: forwardcom forum
Topic: Using CPU cores as GPU
Replies: 3
Views: 6345

Re: Using CPU cores as GPU

Using a specialized RISC core as a starting point for a GPU makes sense... For instance, ATI/AMD's GCN is basically a very specialized RISC (for reference, GCN1 came out in 2012, PS4 and XBoxOne are GCN2, PS4pro and XBoxOneX are GCN4). Here's the instruction set document for GCN1: http://developer...
by HubertLamontagne
2021-01-29, 19:47:15
Forum: forwardcom forum
Topic: Using CPU cores as GPU
Replies: 3
Views: 6345

Re: Using CPU cores as GPU

Seems that they are walking down the path of the Intel Larrabee eh? :) I imagine the texturing unit would be designed as 4 parallel data caches, and a bilinear-interpolated texture lookup would read the even-even, odd-even, even-odd, and odd-odd texels around the requested texture coordinate at t...
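
A minimal sketch of the four-bank idea, assuming a texture stored as a list of rows, wrap (repeat) addressing, and even texture dimensions. The names `bank_of` and `bilinear_sample` are hypothetical and not from the post. It only illustrates that the four texels around a sample point always have four distinct (x parity, y parity) pairs, so each one could in principle be served by a different cache bank at the same time:

```python
import math

def bank_of(x, y):
    """Hypothetical bank index 0..3 built from the parity of the texel coordinates."""
    return (x & 1) | ((y & 1) << 1)

def bilinear_sample(tex, u, v):
    """Bilinear lookup at texel-space coordinates (u, v) with wrap addressing.

    Returns (interpolated value, list of the four bank indices that were read).
    Assumes even width and height so that wrapping preserves alternating parity.
    """
    h, w = len(tex), len(tex[0])
    assert w % 2 == 0 and h % 2 == 0, "sketch assumes even texture dimensions"

    x0, y0 = math.floor(u), math.floor(v)
    fx, fy = u - x0, v - y0            # fractional position inside the texel quad
    x0, y0 = x0 % w, y0 % h            # wrap the base texel into the texture
    x1, y1 = (x0 + 1) % w, (y0 + 1) % h

    coords = [(x0, y0), (x1, y0), (x0, y1), (x1, y1)]
    banks = [bank_of(x, y) for x, y in coords]
    t00, t10, t01, t11 = (tex[y][x] for x, y in coords)

    top = t00 + (t10 - t00) * fx       # interpolate along x on the first row
    bottom = t01 + (t11 - t01) * fx    # interpolate along x on the second row
    return top + (bottom - top) * fy, banks
```

With clamped edges instead of wrapping, two of the four reads can land on the same texel at a border, so the one-read-per-bank property would need separate handling there.
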
by HubertLamontagne
2021-01-12, 0:17:06
Forum: forwardcom forum
Topic: Separate call stack and data stack
Replies: 6
Views: 12522

Re: Separate call stack and data stack

Still, I'm not convinced that this is a net decrease in complexity. This creates a small new memory area, with its own addressing scheme separate from the main RAM. Since it's not affected by RAM area mapping, it needs to be integrally copied to/from main RAM during context switches. That means you ...
by HubertLamontagne
2020-11-22, 16:34:08
Forum: forwardcom forum
Topic: input/output instructions
Replies: 8
Views: 23200

Re: input/output instructions

If you can do efficient bilinear texturing on a general purpose CPU, that would already be better than Intel (who had to add a texturing unit to the Larrabee) and Sony (who had to add a GPU to the PS3 when they realized that they couldn't efficiently texture polygons on the Cell). And it would prob...
by HubertLamontagne
2020-11-20, 20:54:23
Forum: forwardcom forum
Topic: input/output instructions
Replies: 8
Views: 23200

Re: input/output instructions

I imagine if it ever comes to that, with large 3d graphics adapters, by that point you'd probably memory-map the device space and use something like paging and chained DMA and bus mastering and even an IO-MMU (which is now a thing on modern PCs). The culmination of this trend is that on the PS4, whe...
by HubertLamontagne
2020-06-24, 20:37:06
Forum: forwardcom forum
Topic: Implications of ForwardCom memory management approach
Replies: 15
Views: 27088

Re: Implications of ForwardCom memory management approach

I was thinking today that I don't know why you care about not having page tables and a TLB. What's the problem? It takes hardware resources? So what? Is it really that big a problem to have page tables and TLBs, or is this more of an esthetic preference? One other thing to keep in mind is that you don...
by HubertLamontagne
2020-06-10, 15:38:24
Forum: forwardcom forum
Topic: Proposal to drop tiny instructions
Replies: 11
Views: 29892

Re: Multi-register instructions

Is the logic for multi-register push and pop too specific to be shared? I thought that perhaps, the multi-register logic that already is needed because of this could be used as a more general multi-register instruction generator. That would enable multi-register instructions without much added comp...
by HubertLamontagne
2020-05-20, 19:48:25
Forum: forwardcom forum
Topic: Proposal to drop tiny instructions
Replies: 11
Views: 29892

Re: Proposal to drop tiny instructions

I guess this is about the limit of where I can help, because there are like half a dozen styles of pipelines (single-issue, atom-style load-alu single issue, dual-issue, VLIW, simple OOO where every load/store/ALU op is a full independent micro-op, OOO with micro-op fusion, OOO with clustering), an...
by HubertLamontagne
2020-05-19, 16:58:29
Forum: forwardcom forum
Topic: Separate call stack and data stack
Replies: 6
Views: 12522

Re: Separate call stack and data stack

I'm very much reminded of the register-window stack on SPARC which has similar semantics, and stores return addresses as part of its mechanism: http://icps.u-strasbg.fr/people/loechner/public_html/enseignement/SPARC/sparcstack.html Any local variable that is stored on the stack and that isn't access...
by HubertLamontagne
2020-05-15, 17:23:58
Forum: forwardcom forum
Topic: Proposal to drop tiny instructions
Replies: 11
Views: 29892

Re: Proposal to drop tiny instructions

If tiny instructions are rare and are mostly used by register saving/loading to stack, and load/store multiple or load/store pair (the ARM64 equivalent) is simpler to implement, then it makes total sense to go for the multi-register loads/stores instead yeah. I think the extra complexity in the regi...
by HubertLamontagne
2020-05-05, 16:26:39
Forum: forwardcom forum
Topic: Heterogenous cores / instruction sets
Replies: 3
Views: 7854

Re: Heterogenous cores / instruction sets

ARM is pretty good at this (obviously it has to be, since so much of their business is embedded cores), with the whole gamut: - Stripped down 32bit (modern small microcontrollers) - 32bit (lots of microcontrollers and smaller cores, GBA and NDS) - 32bit + FPU (used in some microcontrollers for DSP-heav...
by HubertLamontagne
2020-05-05, 15:06:48
Forum: forwardcom forum
Topic: Implications of ForwardCom memory management approach
Replies: 15
Views: 27088

Re: Implications of ForwardCom memory management approach

Without the ability to increase the number of mappings when needed, you'd definitely need more RAM, because you have a lot fewer available techniques to use when RAM gets tight: - Apps rarely malloc() all their RAM in just one initial go. The pattern is more like a dynamic mix of mallo...
by HubertLamontagne
2020-05-04, 22:42:53
Forum: forwardcom forum
Topic: Implications of ForwardCom memory management approach
Replies: 15
Views: 27088

Re: Implications of ForwardCom memory management approach

Presumably it would work roughly as follows: - All allocation happens in 4k blocks. - When a new program starts, its initial allocation is set some distance away from other previous allocations (maybe with a 16mb offset?). - When your program first allocates more memory, the OS grows this initial al...
by HubertLamontagne
2020-05-04, 17:25:56
Forum: forwardcom forum
Topic: Using Forwardcom as a GPU?
Replies: 11
Views: 17960

Re: Using Forwardcom as a GPU?

[...] Look up nvpath. It's Nvidia's feature that accelerates vector graphics on their GPUs/3D accelerators. It's a very interesting extension, and Adobe has used it to good effect in their Creative Cloud applications, probably Photoshop and others. I've taken a look at it. It's kinda weird but it m...
by HubertLamontagne
2020-05-01, 22:58:36
Forum: forwardcom forum
Topic: Using Forwardcom as a GPU?
Replies: 11
Views: 17960

Re: Using Forwardcom as a GPU?

This reminds me that fast Bezier curve performance would be very useful, without having to worry about being a traditional GPU. Bezier curves are central to a lot of 2D rendering, including fonts and vector graphics like SVG. Some are quadratic and some are cubic. I'm looking into this and it seems...
by HubertLamontagne
2020-04-06, 21:17:48
Forum: forwardcom forum
Topic: Possible difficulties for microcode-less implementations
Replies: 8
Views: 17067

Re: Possible difficulties for microcode-less implementations

Thank you for your replies. Having instructions that require multiple micro-ops doesn't necessarily mean you need microcode. You are right, I didn't think about that. I still have some difficulty dealing with this as it complicates the decoding stage. Presumably, decoders don't have a constant late...
by HubertLamontagne
2020-04-02, 17:22:53
Forum: forwardcom forum
Topic: Possible difficulties for microcode-less implementations
Replies: 8
Views: 17067

Re: Possible difficulties for microcode-less implementations

Having instructions that require multiple micro-ops doesn't necessarily mean you need microcode. ARM has many instructions that are multiple-uop (for instance, load+increment a pointer, multiple register store/load...). "Call" and "Ret" are inherently multiple-uop, since you load...
by HubertLamontagne
2020-03-31, 17:14:50
Forum: forwardcom forum
Topic: Using Forwardcom as a GPU?
Replies: 11
Views: 17960

Re: Using Forwardcom as a GPU?

Yeah. The idea of a Forwardcom Xeon-Phi / Larrabee makes sense to me, especially since it plays to Forwardcom's strengths (lots and lots of vector instructions). Clearly you could go with the UltraSparc way - make it in-order to make the cores small, use really aggressive hyper-threading with lots o...
by HubertLamontagne
2020-03-26, 12:27:20
Forum: forwardcom forum
Topic: Using Forwardcom as a GPU?
Replies: 11
Views: 17960

Re: Using Forwardcom as a GPU?

For sure. I have no illusions - such a project would be likely to turn out like the ill-fated Larrabee (the articles about its demise are confusing, but they seem to imply that it had something like half the perf of dedicated GPUs, with the drivers still in alpha stage as another generation of GPUs...
by HubertLamontagne
2020-03-21, 22:25:52
Forum: forwardcom forum
Topic: Using Forwardcom as a GPU?
Replies: 11
Views: 17960

Using Forwardcom as a GPU?

Considering how vector and throughput-oriented Forwardcom is, I've been wondering if it would make sense as a GPU. It should be pretty good at vector processing at least. It might make sense to use tiled rendering. For rasterization, you'd use the full vector register size all the time, with registe...
by HubertLamontagne
2020-03-17, 0:17:09
Forum: forwardcom forum
Topic: Possible difficulties for microcode-less implementations
Replies: 8
Views: 17067

Re: Possible difficulties for microcode-less implementations

I'd imagine this would require special handling in the load/store unit if the vector isn't constant sized. Something where you get a kind of double-slot read micro-op or 2 micro-ops (there needs to be 2 reads if the image straddles a cache line boundary anyways), and if there's a potential read faul...
by HubertLamontagne
2020-02-12, 0:20:34
Forum: forwardcom forum
Topic: Putting it on real hardware
Replies: 7
Views: 17130

Re: Putting it on real hardware

You'd start with an in-order implementation in Verilog (or VHDL), I'd think, using block RAM instead of DRAM for instruction memory and data memory at first... and probably no vector support initially, and not too much pipelining at first. Then, you'd build up from there. You'd presumably start with...
by HubertLamontagne
2018-12-18, 5:28:11
Forum: forwardcom forum
Topic: Handling paging without a page system
Replies: 3
Views: 11591

Re: Handling paging without a page system

Presumably, the OS would need to use something like Buddy Memory Allocation system-wide to keep allocations contiguous as much as possible and to limit the number of mappings (and to be able to do multiple hundred megabyte allocations at all). Excessively large mappings that get swapped to disk woul...
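
A minimal sketch of buddy allocation, to make the "keep allocations contiguous" point concrete. This is a toy model, not ForwardCom's or any OS's actual allocator: the `BuddyAllocator` class, its method names, and the 4096-byte minimum block are all assumptions for illustration. Every request is rounded up to a power-of-two block, larger free blocks are split in half on demand, and a freed block is merged with its "buddy" (the neighbour at `offset ^ size`) whenever that buddy is also free:

```python
class BuddyAllocator:
    """Toy buddy allocator over a power-of-two address range (hypothetical sketch)."""

    def __init__(self, total_size, min_block=4096):
        # Both sizes must be powers of two for the XOR buddy trick to work.
        assert total_size & (total_size - 1) == 0
        assert min_block & (min_block - 1) == 0
        assert total_size >= min_block
        self.total = total_size
        self.min_block = min_block
        self.free = {total_size: {0}}   # block size -> set of free offsets
        self.used = {}                  # offset -> allocated block size

    def _round_up(self, size):
        block = self.min_block
        while block < size:
            block *= 2
        return block

    def alloc(self, size):
        """Return the offset of a block of at least `size` bytes, or None if full."""
        want = self._round_up(size)
        block = want
        # Find the smallest free block that is large enough.
        while block <= self.total and not self.free.get(block):
            block *= 2
        if block > self.total:
            return None
        offset = min(self.free[block])
        self.free[block].remove(offset)
        # Split in half until the block fits, freeing the upper half each time.
        while block > want:
            block //= 2
            self.free.setdefault(block, set()).add(offset + block)
        self.used[offset] = want
        return offset

    def free_block(self, offset):
        """Release a block and merge it with its buddy as long as the buddy is free."""
        block = self.used.pop(offset)
        while block < self.total:
            buddy = offset ^ block
            if buddy not in self.free.get(block, set()):
                break
            self.free[block].remove(buddy)
            offset = min(offset, buddy)
            block *= 2
        self.free.setdefault(block, set()).add(offset)
```

For example, with a 64 KiB range, `alloc(5000)` takes an 8 KiB block at offset 0, and once everything is freed the range coalesces back into a single 64 KiB block, which is what keeps the number of separate mappings low.
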
by HubertLamontagne
2018-10-08, 1:38:57
Forum: forwardcom forum
Topic: Interesting new ISA: MRISC32
Replies: 13
Views: 28341

Re: Interesting new ISA: MRISC32

A string machine? How would you implement this string cache? Some kind of fast hardware hash function that processes 32 bytes at a time? Hardware accelerated UTF8 character loading and capitalization changes? In particular, the hardware assisted string bank updating sounds really hard to build in ...