2 files changed: +4, -4 lines changed
@@ -24,6 +24,6 @@
 "<p>We subtract <span translate=no>_^_0_^_</span> and <span translate=no>_^_1_^_</span> before calculating the exponents to stabilize the softmax calculation.</p>\n<p>If <span translate=no>_^_2_^_</span> is large <span translate=no>_^_3_^_</span> becomes huge and the computation of <span translate=no>_^_4_^_</span>becomes unstable. Subtracting a constant before calculating the exponent from numerator and denominator will cancel out. and can help stabilize the computation. So we subtract <span translate=no>_^_5_^_</span> to stabilize the computation. </p>\n": "<p>\u6211\u4eec\u5728\u8ba1\u7b97\u6307\u6570<span translate=no>_^_1_^_</span>\u4e4b\u524d\u51cf\u53bb<span translate=no>_^_0_^_</span>\u548c\uff0c\u4ee5\u7a33\u5b9asoftmax\u7684\u8ba1\u7b97\u3002</p>\n<p><span translate=no>_^_2_^_</span>if \u5927<span translate=no>_^_3_^_</span>\u53d8\u5927\uff0c\u8ba1\u7b97<span translate=no>_^_4_^_</span>\u53d8\u5f97\u4e0d\u7a33\u5b9a\u3002\u5728\u8ba1\u7b97\u5206\u5b50\u548c\u5206\u6bcd\u7684\u6307\u6570\u4e4b\u524d\u51cf\u53bb\u4e00\u4e2a\u5e38\u6570\u5c06\u62b5\u6d88\u3002\u5e76\u4e14\u53ef\u4ee5\u5e2e\u52a9\u7a33\u5b9a\u8ba1\u7b97\u3002\u6240\u4ee5\u6211\u4eec\u51cf\u53bb<span translate=no>_^_5_^_</span>\u4ee5\u7a33\u5b9a\u8ba1\u7b97\u3002</p>\n",
 "<span translate=no>_^_0_^_</span><p>We compute <span translate=no>_^_1_^_</span>, <span translate=no>_^_2_^_</span> and <span translate=no>_^_3_^_</span> separately and do a matrix multiplication. We use einsum for clarity. </p>\n": "<span translate=no>_^_0_^_</span><p>\u6211\u4eec<span translate=no>_^_3_^_</span>\u5206\u522b\u8ba1\u7b97<span translate=no>_^_1_^_</span>\uff0c<span translate=no>_^_2_^_</span>\u7136\u540e\u8fdb\u884c\u77e9\u9635\u4e58\u6cd5\u3002\u4e3a\u4e86\u6e05\u695a\u8d77\u89c1\uff0c\u6211\u4eec\u4f7f\u7528 einsum\u3002</p>\n",
 "<ul><li><span translate=no>_^_0_^_</span> is the number of features in the <span translate=no>_^_1_^_</span>, <span translate=no>_^_2_^_</span> and <span translate=no>_^_3_^_</span> vectors. </li>\n<li><span translate=no>_^_4_^_</span> is <span translate=no>_^_5_^_</span> </li>\n<li><span translate=no>_^_6_^_</span> is the local window size <span translate=no>_^_7_^_</span> </li>\n<li><span translate=no>_^_8_^_</span> is whether to have a bias parameter for transformations for <span translate=no>_^_9_^_</span>, <span translate=no>_^_10_^_</span> and <span translate=no>_^_11_^_</span>.</li></ul>\n": "<ul><li><span translate=no>_^_0_^_</span>\u662f<span translate=no>_^_1_^_</span>\u3001<span translate=no>_^_2_^_</span>\u548c<span translate=no>_^_3_^_</span>\u5411\u91cf\u4e2d\u7684\u8981\u7d20\u6570\u3002</li>\n<li><span translate=no>_^_4_^_</span>\u662f<span translate=no>_^_5_^_</span></li>\n<li><span translate=no>_^_6_^_</span>\u662f\u672c\u5730\u7a97\u53e3\u5927\u5c0f<span translate=no>_^_7_^_</span></li>\n<li><span translate=no>_^_8_^_</span>\u662f\u662f\u5426\u4e3a<span translate=no>_^_10_^_</span>\u548c\u7684\u53d8\u6362<span translate=no>_^_9_^_</span>\u8bbe\u7f6e\u504f\u7f6e\u53c2\u6570<span translate=no>_^_11_^_</span>\u3002</li></ul>\n",
-    "An Attention Free Transformer": "\u514d\u6ce8\u610f\u7684\u53d8\u538b\u5668",
-    "This is an annotated implementation/tutorial of the AFT (Attention Free Transformer) in PyTorch.": "\u8fd9\u662f PyTorch \u4e2dAFT\uff08\u514d\u6ce8\u610f\u53d8\u538b\u5668\uff09\u7684\u5e26\u6ce8\u91ca\u7684\u5b9e\u73b0/\u6559\u7a0b\u3002"
+    "An Attention Free Transformer": "\u4e00\u79cd\u65e0\u6ce8\u610f\u529b\u7684 Transformer",
+    "This is an annotated implementation/tutorial of the AFT (Attention Free Transformer) in PyTorch.": "\u8fd9\u662f\u4e00\u4e2a PyTorch\u5b9e\u73b0\u7684 AFT \uff08\u65e0\u6ce8\u610f\u529b Transformer \uff09\u5e26\u6ce8\u91ca\u5b9e\u73b0/\u6559\u7a0b\u3002"
 }
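
The first context string above explains the softmax stabilization trick: subtracting a constant (the maximum) before exponentiation cancels between numerator and denominator, so the result is unchanged while exp() no longer overflows. A minimal PyTorch sketch of that idea (the stable_softmax helper is illustrative, not part of the diffed files):

```python
import torch

def stable_softmax(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # Subtracting the max before exp() leaves the ratio unchanged
    # (the constant cancels between numerator and denominator) but
    # keeps exp() from overflowing when entries of x are large.
    x_max = x.max(dim=dim, keepdim=True).values
    exps = torch.exp(x - x_max)
    return exps / exps.sum(dim=dim, keepdim=True)

# exp(1000.) alone overflows to inf; the shifted version is finite
# and matches PyTorch's built-in (which uses the same trick).
logits = torch.tensor([1000.0, 999.0, 998.0])
assert torch.allclose(stable_softmax(logits), torch.softmax(logits, dim=-1))
```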
@@ -1,4 +1,4 @@
 {
-    "<h1><a href=\"https://nn.labml.ai/transformers/aft/index.html\">An Attention Free Transformer</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://arxiv.org/abs/2105.14103\">An Attention Free Transformer</a>.</p>\n<p>This paper replaces the <a href=\"https://nn.labml.ai/transformers/mha.html\">self-attention layer</a> with a new efficient operation, that has memory complexity of O(Td), where T is the sequence length and <span translate=no>_^_0_^_</span> is the dimensionality of embeddings.</p>\n<p>The paper introduces AFT along with AFT-local and AFT-conv. Here we have implemented AFT-local which pays attention to closeby tokens in an autoregressive model. </p>\n": "<h1><a href=\"https://nn.labml.ai/transformers/aft/index.html\">\u4e00\u6b3e\u65e0\u6ce8\u610f\u7684\u53d8\u5f62\u91d1\u521a</a></h1>\n<p>\u8fd9\u662f <a href=\"https://pytorch.org\">PyTorch \u5bf9</a>\u300a<a href=\"https://arxiv.org/abs/2105.14103\">\u65e0\u6ce8\u610f\u529b\u7684\u53d8\u5f62\u91d1\u521a\u300b\u4e00\u6587\u7684</a>\u5b9e\u73b0\u3002</p>\n<p>\u672c\u6587\u7528\u4e00\u79cd\u65b0\u7684\u9ad8\u6548\u8fd0\u7b97\u53d6\u4ee3\u4e86<a href=\"https://nn.labml.ai/transformers/mha.html\">\u81ea\u6211\u6ce8\u610f\u529b\u5c42</a>\uff0c\u8be5\u8fd0\u7b97\u7684\u5b58\u50a8\u590d\u6742\u5ea6\u4e3aO\uff08Td\uff09\uff0c\u5176\u4e2d T \u662f\u5e8f\u5217\u957f\u5ea6\uff0c<span translate=no>_^_0_^_</span>\u662f\u5d4c\u5165\u7684\u7ef4\u5ea6\u3002</p>\n<p>\u672c\u6587\u4ecb\u7ecd\u4e86 AFT \u4ee5\u53ca AFT-Local \u548c AFT-conv\u3002\u8fd9\u91cc\u6211\u4eec\u5b9e\u73b0\u4e86 aft-Local\uff0c\u5b83\u5173\u6ce8\u81ea\u56de\u5f52\u6a21\u578b\u4e2d\u7684 cloby \u4ee3\u5e01\u3002</p>\n",
-    "An Attention Free Transformer": "\u514d\u6ce8\u610f\u7684\u53d8\u538b\u5668"
+    "<h1><a href=\"https://nn.labml.ai/transformers/aft/index.html\">An Attention Free Transformer</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://arxiv.org/abs/2105.14103\">An Attention Free Transformer</a>.</p>\n<p>This paper replaces the <a href=\"https://nn.labml.ai/transformers/mha.html\">self-attention layer</a> with a new efficient operation, that has memory complexity of O(Td), where T is the sequence length and <span translate=no>_^_0_^_</span> is the dimensionality of embeddings.</p>\n<p>The paper introduces AFT along with AFT-local and AFT-conv. Here we have implemented AFT-local which pays attention to closeby tokens in an autoregressive model. </p>\n": "<h1><a href=\"https://nn.labml.ai/transformers/aft/index.html\">\u4e00\u79cd\u65e0\u6ce8\u610f\u529b\u7684 Transformer </a></h1>\n<p>\u8fd9\u662f\u8bba\u6587 <a href=\"https://arxiv.org/abs/2105.14103\">\u300a\u4e00\u79cd\u65e0\u6ce8\u610f\u529b\u7684 Transformer \u300b</a>\u7684<a href=\"https://pytorch.org\">PyTorch </a>\u5b9e\u73b0\u3002</p>\n<p>\u8fd9\u7bc7\u8bba\u6587\u7528\u4e00\u79cd\u65b0\u7684\u9ad8\u6548\u64cd\u4f5c\u66ff\u4ee3\u4e86<a href=\"https://nn.labml.ai/transformers/mha.html\">\u81ea\u6ce8\u610f\u529b\u5c42</a>\uff0c\u8be5\u8fd0\u7b97\u7684\u5b58\u50a8\u590d\u6742\u5ea6\u4e3aO\uff08Td\uff09\uff0c\u5176\u4e2d T \u662f\u5e8f\u5217\u957f\u5ea6\uff0c<span translate=no>_^_0_^_</span>\u662f\u5d4c\u5165\u7684\u7ef4\u5ea6\u3002</p>\n<p>\u8be5\u8bba\u6587\u4ecb\u7ecd\u4e86 AFT \u4ee5\u53ca AFT-local \u548c AFT-conv \u3002\u8fd9\u91cc\u6211\u4eec\u5b9e\u73b0\u4e86 AFT-local \uff0c\u5b83\u4f1a\u5728\u81ea\u56de\u5f52\u6a21\u578b\u4e2d\u5173\u6ce8\u90bb\u8fd1\u7684 token \u3002</p>\n",
+    "An Attention Free Transformer": "\u4e00\u79cd\u65e0\u6ce8\u610f\u529b\u7684 Transformer"
 }
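
The strings changed above describe AFT-local: the self-attention layer is replaced by an operation with O(Td) memory complexity, where a learned position bias is kept only for nearby tokens and the model attends causally. A rough, unstabilized sketch of that operation, with names and shapes that are our own assumptions (a single head, q/k/v of shape [T, d], a full [T, T] bias w) rather than the labml implementation:

```python
import torch

def aft_local(q, k, v, w, s):
    # q, k, v: [T, d] projections; w: [T, T] learned pairwise position
    # biases; s: local window size.
    T = q.shape[0]
    pos = torch.arange(T)
    # AFT-local zeroes the learned bias outside the window |t - t'| < s.
    w = torch.where((pos[:, None] - pos[None, :]).abs() < s,
                    w, torch.zeros_like(w))
    # An autoregressive model additionally masks out future positions.
    causal = (pos[:, None] >= pos[None, :]).float()
    exp_w = torch.exp(w) * causal            # [T, T]
    exp_k = torch.exp(k)                     # [T, d]
    # einsum for clarity, as the diffed comment suggests:
    #   num[t] = sum_{t'} exp(w[t, t']) * exp(k[t']) * v[t']
    num = torch.einsum('tj,jd->td', exp_w, exp_k * v)
    den = torch.einsum('tj,jd->td', exp_w, exp_k)
    return torch.sigmoid(q) * num / den

y = aft_local(torch.randn(8, 16), torch.randn(8, 16),
              torch.randn(8, 16), torch.randn(8, 8), s=3)
print(y.shape)  # torch.Size([8, 16])
```

For clarity this sketch materializes the full pairwise bias and skips numerical stabilization; a real implementation would subtract the maxima before the exp() calls, exactly as the comment in the first file describes.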
