HN – TCNs as Alternative to Transformers?

Over the last few months I've experimented with many alternatives to transformers, including one I created a GitHub repo for:

https://github.com/bggb7781-collab/lrnnsmdds

Architectures I've experimented with, and my personal notes on their pros and cons:

1. RNNs (like Mamba, RWKV, and my attempt above): likely the best alternative to transformers, but hard to parallelize. Cons: a. I've personally run into weird logical "bugs": severe bias toward repeated text and toward the end of the text corpus (go figure...). b. Speed is similar to transformers.

Pros: a. Very low RAM usage and very good ability to generalize and learn; perplexity reaches extremely low levels (~1.05, versus 2+ for GPT in comparison). b. Relatively easy to understand, since it uses backprop, feed-forward layers, and matrices, much like transformers.
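For anyone wondering how to read those perplexity numbers, here is a minimal sketch in plain Python. It assumes perplexity is computed the usual way, as the exponential of the mean negative log-likelihood per token. The function name and the probability lists are made up for illustration; they are not taken from my repo or from any real run.

```python
import math

def perplexity(token_probs):
    """exp(mean negative log-likelihood) over the correct next tokens."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# Hypothetical probabilities a model assigned to the correct next token.
confident = [0.95, 0.96, 0.94, 0.95]   # model is almost always right
uncertain = [0.50, 0.40, 0.60, 0.50]   # model is roughly coin-flipping

print(perplexity(confident))
print(perplexity(uncertain))
```

Under that definition, a perplexity of 1.05 means the geometric-mean probability on the correct token is about 1 / 1.05 ≈ 0.95. The numbers are only comparable between models when the tokenizer and the evaluation text are the same.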

2. HDC (hyperdimensional computing): for the time being, mostly sci-fi...

3. SNNs (spiking neural networks): I ended up with several vibecoded implementations in C#, F#, and C. Ultimately, despite some novel ideas, not very successful. Maybe they could be successful, but at the moment it's mostly a failure... still, there's potential.

4. TCNs (temporal convolutional networks): the best case so far, it seems.

Pros: faster than RNNs/transformers, good generalization; could be the next great reduction of resources in generative AI!

Screenshot of my latest TCN attempts:

https://postimg.cc/R3r71PL2 - ultimately, there is potential!
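For readers who haven't met TCNs: the core building block is a causal, dilated 1D convolution, stacked with growing dilations so the receptive field grows quickly with depth. Below is a rough pure-Python sketch of that idea, not the code behind the screenshot. The function names, the kernel size of 2, the doubling dilations, and the all-ones weights are assumptions chosen only to make the behavior easy to see.

```python
def causal_dilated_conv1d(x, weights, dilation):
    """y[t] = sum_k weights[k] * x[t - k*dilation]; inputs before t=0 count as 0."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:          # never look at future (or pre-sequence) positions
                acc += w * x[idx]
        y.append(acc)
    return y

def receptive_field(kernel_size, dilations):
    """Number of input steps one output of the stack can see."""
    return 1 + (kernel_size - 1) * sum(dilations)

def tcn_stack(x, layer_weights, dilations):
    """Stack of causal dilated convolutions, each followed by ReLU."""
    h = x
    for w, d in zip(layer_weights, dilations):
        h = [max(0.0, v) for v in causal_dilated_conv1d(h, w, d)]
    return h

# Illustrative setup: 4 layers, kernel size 2, dilations doubling per layer.
dilations = [1, 2, 4, 8]
layer_weights = [[1.0, 1.0]] * len(dilations)

# Feed an impulse at t=0 and see how far forward in time it is still felt.
impulse = [1.0] + [0.0] * 19
out = tcn_stack(impulse, layer_weights, dilations)

print(receptive_field(2, dilations))       # 16
print(sum(1 for v in out if v != 0.0))     # 16 positions are influenced
```

Every output position within a layer is independent of the others, so a whole sequence can be computed in parallel during training, unlike a step-by-step recurrence. The price is a fixed receptive field: here 16 steps for 4 layers, and each further doubled-dilation layer roughly doubles it.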
