Word representations have been learned with matrix factorisation methods, or with methods that optimise closely related objectives (Levy et al. 2015). These methods are limited to exploiting co-occurrence statistics or bag-of-words features. On the other hand, they are usually so computationally efficient that they can be trained on huge corpora containing billions of words.
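As a minimal sketch of the factorisation approach, the following toy example builds a co-occurrence matrix, applies the positive PMI weighting discussed by Levy et al. (2015), and factorises it with a truncated SVD to obtain word vectors. The tiny corpus, window size, and embedding dimension are illustrative assumptions, not anything prescribed by the text:

```python
import numpy as np

# Toy corpus standing in for a real one of billions of words (assumption).
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
]
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Symmetric co-occurrence counts within a +/-2 word window.
window = 2
C = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        lo, hi = max(0, i - window), min(len(sent), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                C[idx[w], idx[sent[j]]] += 1

# Positive pointwise mutual information (PPMI) weighting of the counts.
total = C.sum()
p_w = C.sum(axis=1, keepdims=True) / total
p_c = C.sum(axis=0, keepdims=True) / total
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log((C / total) / (p_w * p_c))
ppmi = np.maximum(pmi, 0)
ppmi[~np.isfinite(ppmi)] = 0.0  # zero out cells with no co-occurrence

# Truncated SVD factorises the PPMI matrix; scaled left singular
# vectors serve as dense word representations.
U, S, Vt = np.linalg.svd(ppmi)
dim = 3
vectors = U[:, :dim] * S[:dim]  # one row per vocabulary word
```

Each row of `vectors` is a dense representation of one word, derived purely from co-occurrence counts — which is exactly the limitation the paragraph above points out.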