
State space models are already being combined with transformers into hybrid architectures. The state-space part is weaker at retrieving information from context (it can't find a needle in the haystack as the context gets longer; details effectively get compressed away because everything has to fit in a fixed-size state), but computationally it is strong: O(N) rather than O(N^2) in sequence length.
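A minimal sketch of the trade-off, not any particular model's implementation (the matrices A, B, C and the decay factor are made up for illustration): a state-space scan does O(1) work per token against a fixed-size state, so the whole past is compressed into that state, while self-attention materializes an N×N score matrix, paying O(N^2) but keeping every past token individually addressable.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Linear-time scan of a simple time-invariant state space model.

    h_t = A h_{t-1} + B x_t   (fixed-size hidden state)
    y_t = C h_t

    Cost is O(N) in sequence length N: one state update per step,
    and the entire past is squeezed into h (d_state numbers).
    """
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                 # N steps total
        h = A @ h + B * x_t       # O(1) per step w.r.t. N
        ys.append(C @ h)
    return np.array(ys)

def attention(x):
    """Single-head self-attention over scalar tokens: O(N^2).

    Every query scores against every key, so the score matrix is
    N x N -- but any past token can be retrieved exactly.
    """
    q = k = v = x.reshape(-1, 1)          # trivial projections
    scores = q @ k.T                      # N x N: the O(N^2) cost
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return (weights @ v).ravel()

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
A = 0.9 * np.eye(4)     # decaying state: old inputs fade away
B = np.ones(4)
C = np.ones(4) / 4
y_ssm = ssm_scan(x, A, B, C)
y_att = attention(x)
```

With A = 0.9·I the contribution of token x_0 to the state shrinks by 0.9 each step, which is the "compressed away" behavior; attention has no such decay, which is why hybrids bolt a few attention layers onto the cheap SSM backbone.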