1. Word2vec skip-gram with negative sampling implementation

Answer» NOTE: labels is two-dimensional, as required by tf.nn.sampled_softmax_loss, which is used here for negative sampling.

The embedding matrix has one row per word in the vocabulary and one column per unit in the hidden layer. So, if you have 10,000 words and 300 hidden units, the matrix will have size 10,000 × 300. Remember that our inputs are tokenized data, usually integers, where the number of distinct tokens equals the number of words in our vocabulary.

import tensorflow as tf

n_vocab = len(int_to_vocab)   # vocabulary size
n_embedding = 300             # embedding dimension (hidden units)
inputs = tf.placeholder(tf.int32, [None], name='inputs')  # integer word tokens
embedding = tf.Variable(tf.random_uniform((n_vocab, n_embedding), -1, 1))
embed = tf.nn.embedding_lookup(embedding, inputs)  # shape: (batch_size, n_embedding)
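
For completeness, here is a minimal sketch of how the negative-sampling loss might be wired up with tf.nn.sampled_softmax_loss, using the same TensorFlow 1.x API as the snippet above. The names labels, softmax_w, softmax_b, and n_sampled are illustrative assumptions rather than part of the original answer; note that the labels placeholder is two-dimensional, as the NOTE above requires.

# Hypothetical continuation: negative-sampling loss (assumed names)
labels = tf.placeholder(tf.int32, [None, 1], name='labels')  # 2-D, as sampled_softmax_loss requires

n_sampled = 100  # assumed number of negative samples per example
softmax_w = tf.Variable(tf.truncated_normal((n_vocab, n_embedding), stddev=0.1))
softmax_b = tf.Variable(tf.zeros(n_vocab))

# The loss is computed against the true class plus n_sampled randomly
# drawn negative classes, instead of a full softmax over the vocabulary.
loss = tf.nn.sampled_softmax_loss(weights=softmax_w, biases=softmax_b,
                                  labels=labels, inputs=embed,
                                  num_sampled=n_sampled, num_classes=n_vocab)
cost = tf.reduce_mean(loss)

During training, cost would then be minimized with any standard optimizer, e.g. tf.train.AdamOptimizer.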

