Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
255 views
in Technique[技术] by (71.8m points)

machine learning - What is the difference between an Embedding Layer and a Dense Layer?

The docs for an Embedding Layer in Keras say:

Turns positive integers (indexes) into dense vectors of fixed size. eg. [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]

I believe this could also be achieved by encoding the inputs as one-hot vectors of length vocabulary_size, and feeding them into a Dense Layer.

Is an Embedding Layer merely a convenience for this two-step process, or is something fancier going on under the hood?

question from:https://stackoverflow.com/questions/47868265/what-is-the-difference-between-an-embedding-layer-and-a-dense-layer

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

An embedding layer is faster, because it is essentially the equivalent of a dense layer that makes simplifying assumptions.

Imagine a word-to-embedding layer with these weights:

w = [[0.1, 0.2, 0.3, 0.4],
     [0.5, 0.6, 0.7, 0.8],
     [0.9, 0.0, 0.1, 0.2]]

A Dense layer will treat these like actual weights with which to perform matrix multiplication. An embedding layer will simply treat these weights as a list of vectors, each vector representing one word; the 0th word in the vocabulary is w[0], 1st is w[1], etc.


For an example, use the weights above and this sentence:

[0, 2, 1, 2]

A naive Dense-based net needs to convert that sentence to a 1-hot encoding

[[1, 0, 0],
 [0, 0, 1],
 [0, 1, 0],
 [0, 0, 1]]

then do a matrix multiplication

[[1 * 0.1 + 0 * 0.5 + 0 * 0.9, 1 * 0.2 + 0 * 0.6 + 0 * 0.0, 1 * 0.3 + 0 * 0.7 + 0 * 0.1, 1 * 0.4 + 0 * 0.8 + 0 * 0.2],
 [0 * 0.1 + 0 * 0.5 + 1 * 0.9, 0 * 0.2 + 0 * 0.6 + 1 * 0.0, 0 * 0.3 + 0 * 0.7 + 1 * 0.1, 0 * 0.4 + 0 * 0.8 + 1 * 0.2],
 [0 * 0.1 + 1 * 0.5 + 0 * 0.9, 0 * 0.2 + 1 * 0.6 + 0 * 0.0, 0 * 0.3 + 1 * 0.7 + 0 * 0.1, 0 * 0.4 + 1 * 0.8 + 0 * 0.2],
 [0 * 0.1 + 0 * 0.5 + 1 * 0.9, 0 * 0.2 + 0 * 0.6 + 1 * 0.0, 0 * 0.3 + 0 * 0.7 + 1 * 0.1, 0 * 0.4 + 0 * 0.8 + 1 * 0.2]]

=

[[0.1, 0.2, 0.3, 0.4],
 [0.9, 0.0, 0.1, 0.2],
 [0.5, 0.6, 0.7, 0.8],
 [0.9, 0.0, 0.1, 0.2]]

However, an Embedding layer simply looks at [0, 2, 1, 2] and takes the weights of the layer at indices zero, two, one, and two to immediately get

[w[0],
 w[2],
 w[1],
 w[2]]

=

[[0.1, 0.2, 0.3, 0.4],
 [0.9, 0.0, 0.1, 0.2],
 [0.5, 0.6, 0.7, 0.8],
 [0.9, 0.0, 0.1, 0.2]]

So it's the same result, just obtained in a hopefully faster way.


The Embedding layer does have limitations:

  • The input needs to be integers in [0, vocab_length).
  • No bias.
  • No activation.

However, none of those limitations should matter if you just want to convert an integer-encoded word into an embedding.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...