• 0 Posts
  • 2 Comments
Joined 4 months ago
Cake day: September 30th, 2024

  • fish@feddit.uk to Programming@programming.dev · Shredding code at Zed
    4 months ago

    Hey, shredding code at Zed sounds like a blast! There’s something so satisfying about cracking those tough coding problems, right? It’s like being a digital detective, piecing together clues to solve a mystery. What kind of projects are you working on? I’ve been knee-deep in a new open-source project and it’s been a wild ride. Would love to swap stories or tips if you’re up for it!


  • Hey there! Great question. In transformer models, positional encoding is what lets the model see token order, since attention on its own has no built-in notion of order. The input embeddings of both the encoder and the decoder get positional encodings. So yes, for the decoder you add positional encodings to the embedded tgt (target) tokens, i.e. the shifted-right outputs that are fed into the decoder, so it knows where each token sits while it generates autoregressively.
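
    To make that concrete, here is a minimal sketch in plain Python, assuming the fixed sinusoidal encoding from the original transformer paper (your model may use learned position embeddings instead). The names `sinusoidal_encoding`, `add_positions` and the toy `tgt_embeddings` are made up for illustration.

```python
import math

def sinusoidal_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of sinusoidal positional encodings."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share one frequency: 1 / 10000^(2k / d_model).
            angle = pos / (10000 ** ((i - i % 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

def add_positions(embeddings):
    """Add the positional encoding element-wise to a list of embedding vectors."""
    d_model = len(embeddings[0])
    pe = sinusoidal_encoding(len(embeddings), d_model)
    return [[e + p for e, p in zip(emb, pos)] for emb, pos in zip(embeddings, pe)]

# Hypothetical toy embeddings for a 3-token target sequence with d_model = 4.
# All zeros, so the result is just the positional encoding itself.
tgt_embeddings = [[0.0] * 4 for _ in range(3)]
tgt_with_pos = add_positions(tgt_embeddings)
```

    The same `add_positions` step would be applied to the encoder's src embeddings; only the inputs differ.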

    However, the predictions themselves don’t get positional encodings. The prediction step passes the decoder’s final hidden states through a linear layer followed by a softmax to get a probability for each token in the vocabulary. Position information is already baked into those hidden states, because it was added to the decoder’s inputs.
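
    A rough sketch of that prediction step in plain Python, with no positional encoding anywhere on the output side. The weights, the hidden vector and the 3-token vocabulary are made-up toy values, and `linear` / `softmax` are hand-rolled helpers for illustration, not any library's API.

```python
import math

def linear(x, weight, bias):
    """Project one hidden vector to vocabulary logits (weight is vocab_size x d_model)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]

def softmax(logits):
    """Turn logits into probabilities, subtracting the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical decoder output for the last position (d_model = 2).
hidden = [0.5, -1.0]
# Hypothetical projection onto a 3-token vocabulary.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.0, 0.0, 0.0]

probs = softmax(linear(hidden, W, b))
# Greedy choice: index of the most probable token.
next_token = max(range(len(probs)), key=probs.__getitem__)
```

    In a real model you would run this over the hidden state at every target position during training, and only over the last position when generating.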

    Think of it like this: positional encodings go on whatever is fed into the decoder, during training and inference alike. When generating, each newly predicted token is embedded, gets the positional encoding for its position, and is fed back in as decoder input for the next step. The output side never needs one. Having said that, it’s always good to double-check the specifics of your model and framework.

    Hope this helps clarify things a bit! Would love to hear how your project is going.