
Commit 5462733

Merge pull request #1840 from d2l-ai/master
Release v0.17.0
2 parents (b9483ae + 7684ce7) · commit 5462733

File tree

69 files changed: +18,898 −673 lines


README.md

Lines changed: 3 additions & 3 deletions
@@ -34,6 +34,8 @@ Our goal is to offer a resource that could
 
 ## Cool Papers Using D2L
 
+1. [**Descending through a Crowded Valley--Benchmarking Deep Learning Optimizers**](https://arxiv.org/pdf/2007.01547.pdf). R. Schmidt, F. Schneider, P. Hennig. *International Conference on Machine Learning, 2021*
+
 1. [**Universal Average-Case Optimality of Polyak Momentum**](https://arxiv.org/pdf/2002.04664.pdf). D. Scieur, F. Pedregosan. *International Conference on Machine Learning, 2020*
 
 1. [**2D Digital Image Correlation and Region-Based Convolutional Neural Network in Monitoring and Evaluation of Surface Cracks in Concrete Structural Elements**](https://www.mdpi.com/1996-1944/13/16/3527/pdf). M. Słoński, M. Tekieli. *Materials, 2020*

@@ -42,11 +44,9 @@ Our goal is to offer a resource that could
 
 1. [**Detecting Human Driver Inattentive and Aggressive Driving Behavior Using Deep Learning: Recent Advances, Requirements and Open Challenges**](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9107077). M. Alkinani, W. Khan, Q. Arshad. *IEEE Access, 2020*
 
-1. [**Diagnosing Parkinson by Using Deep Autoencoder Neural Network**](https://link.springer.com/chapter/10.1007/978-981-15-6325-6_5). U. Kose, O. Deperlioglu, J. Alzubi, B. Patrut. *Deep Learning for Medical Decision Support Systems, 2020*
-
 <details><summary>more</summary>
 
-1. [**Descending through a Crowded Valley--Benchmarking Deep Learning Optimizers**](https://arxiv.org/pdf/2007.01547.pdf). R. Schmidt, F. Schneider, P. Hennig.
+1. [**Diagnosing Parkinson by Using Deep Autoencoder Neural Network**](https://link.springer.com/chapter/10.1007/978-981-15-6325-6_5). U. Kose, O. Deperlioglu, J. Alzubi, B. Patrut. *Deep Learning for Medical Decision Support Systems, 2020*
 
 1. [**Deep Learning Architectures for Medical Diagnosis**](https://link.springer.com/chapter/10.1007/978-981-15-6325-6_2). U. Kose, O. Deperlioglu, J. Alzubi, B. Patrut. *Deep Learning for Medical Decision Support Systems, 2020*
 

chapter_attention-mechanisms/bahdanau-attention.md

Lines changed: 1 addition & 0 deletions
@@ -431,6 +431,7 @@ d2l.show_heatmaps(attention_weights[:, :, :, :len(engs[-1].split()) + 1],
 * When predicting a token, if not all the input tokens are relevant, the RNN encoder-decoder with Bahdanau attention selectively aggregates different parts of the input sequence. This is achieved by treating the context variable as an output of additive attention pooling.
 * In the RNN encoder-decoder, Bahdanau attention treats the decoder hidden state at the previous time step as the query, and the encoder hidden states at all the time steps as both the keys and values.
 
+
 ## Exercises
 
 1. Replace GRU with LSTM in the experiment.
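For reference, the additive attention pooling described in this summary can be sketched roughly as follows. This is a minimal illustration with batch-first shapes assumed, not the chapter's actual `d2l.AdditiveAttention` implementation:

```python
import torch
from torch import nn

class AdditiveAttentionSketch(nn.Module):
    """Rough sketch of additive (Bahdanau-style) attention pooling."""
    def __init__(self, key_size, query_size, num_hiddens):
        super().__init__()
        self.W_k = nn.Linear(key_size, num_hiddens, bias=False)
        self.W_q = nn.Linear(query_size, num_hiddens, bias=False)
        self.w_v = nn.Linear(num_hiddens, 1, bias=False)

    def forward(self, queries, keys, values):
        # queries: (batch, num_queries, query_size), e.g. the decoder hidden
        # state at the previous time step; keys/values: the encoder hidden
        # states at all time steps, shape (batch, num_steps, size).
        features = torch.tanh(self.W_q(queries).unsqueeze(2) +
                              self.W_k(keys).unsqueeze(1))
        scores = self.w_v(features).squeeze(-1)   # (batch, num_queries, num_steps)
        weights = torch.softmax(scores, dim=-1)   # attention weights
        return torch.bmm(weights, values)         # context variables
```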

chapter_computer-vision/transposed-conv.md

Lines changed: 1 addition & 1 deletion
@@ -308,7 +308,7 @@ Therefore,
 the transposed convolutional layer
 can just exchange the forward propagation function
 and the backpropagation function of the convolutional layer:
-its forward propagation
+its forward propagation
 and backpropagation functions
 multiply their input vector with
 $\mathbf{W}^\top$ and $\mathbf{W}$, respectively.
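The claim in this passage can be checked numerically with the matrix view of convolution used in this section. The sketch below follows a `kernel2matrix`-style construction for a 2x2 kernel over a 3x3 input (details are illustrative and may differ from the chapter's exact cell): convolution multiplies the flattened input by $\mathbf{W}$, and the transposed convolution multiplies by $\mathbf{W}^\top$.

```python
import torch

def kernel2matrix(K):
    # Unroll a 2x2 kernel into the 4x9 matrix W so that convolving a 3x3
    # input equals multiplying its flattened version by W.
    k, W = torch.zeros(5), torch.zeros((4, 9))
    k[:2], k[3:5] = K[0, :], K[1, :]
    W[0, :5], W[1, 1:6], W[2, 3:8], W[3, 4:] = k, k, k, k
    return W

X = torch.arange(9.0).reshape(3, 3)
K = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
W = kernel2matrix(K)

Y = (W @ X.reshape(-1)).reshape(2, 2)    # convolution: multiply by W
Z = (W.T @ Y.reshape(-1)).reshape(3, 3)  # transposed convolution: multiply by W^T
```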

chapter_convolutional-neural-networks/channels.md

Lines changed: 1 addition & 0 deletions
@@ -102,6 +102,7 @@ corr2d_multi_in(X, K)
 ```
 
 ## Multiple Output Channels
+:label:`subsec_multi-output-channels`
 
 Regardless of the number of input channels,
 so far we always ended up with one output channel.
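The multi-output-channel cross-correlation that this newly labeled subsection covers can be sketched as below. This mirrors the section's approach but is not guaranteed to match its exact code; `d2l.corr2d` is assumed to be the 2D cross-correlation defined earlier in the book.

```python
import torch
from d2l import torch as d2l

def corr2d_multi_in(X, K):
    # Sum the per-channel 2D cross-correlations over the input channels.
    return sum(d2l.corr2d(x, k) for x, k in zip(X, K))

def corr2d_multi_in_out(X, K):
    # K has shape (c_o, c_i, k_h, k_w): compute one multi-input-channel
    # result per output-channel kernel and stack them along a new axis.
    return torch.stack([corr2d_multi_in(X, k) for k in K], 0)

X = torch.stack((torch.arange(9.0).reshape(3, 3),
                 torch.arange(1.0, 10.0).reshape(3, 3)), 0)      # 2 input channels
K = torch.stack((torch.ones(2, 2, 2), torch.zeros(2, 2, 2)), 0)  # 2 output channels
corr2d_multi_in_out(X, K).shape  # torch.Size([2, 2, 2])
```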

chapter_deep-learning-computation/model-construction.md

Lines changed: 17 additions & 1 deletion
@@ -206,13 +206,29 @@ Before we implement our own custom block,
 we briefly summarize the basic functionality
 that each block must provide:
 
+:begin_tab:`mxnet, tensorflow`
+
+1. Ingest input data as arguments to its forward propagation function.
+1. Generate an output by having the forward propagation function return a value. Note that the output may have a different shape from the input. For example, the first fully-connected layer in our model above ingests an input of arbitrary dimension but returns an output of dimension 256.
+1. Calculate the gradient of its output with respect to its input, which can be accessed via its backpropagation function. Typically this happens automatically.
+1. Store and provide access to those parameters necessary
+to execute the forward propagation computation.
+1. Initialize model parameters as needed.
+
+:end_tab:
+
+:begin_tab:`pytorch`
+
 1. Ingest input data as arguments to its forward propagation function.
-1. Generate an output by having the forward propagation function return a value. Note that the output may have a different shape from the input. For example, the first fully-connected layer in our model above ingests an input of arbitrary dimension but returns an output of dimension 256.
+1. Generate an output by having the forward propagation function return a value. Note that the output may have a different shape from the input. For example, the first fully-connected layer in our model above ingests an input of dimension 20 but returns an output of dimension 256.
 1. Calculate the gradient of its output with respect to its input, which can be accessed via its backpropagation function. Typically this happens automatically.
 1. Store and provide access to those parameters necessary
 to execute the forward propagation computation.
 1. Initialize model parameters as needed.
 
+:end_tab:
+
+
 In the following snippet,
 we code up a block from scratch
 corresponding to an MLP
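A minimal PyTorch sketch of such a custom block, consistent with the dimensions 20 and 256 mentioned above, is shown below; the chapter's own class may differ in details.

```python
import torch
from torch import nn
from torch.nn import functional as F

class MLP(nn.Module):
    def __init__(self):
        # Initialize model parameters by constructing two fully-connected layers.
        super().__init__()
        self.hidden = nn.Linear(20, 256)  # ingests inputs of dimension 20
        self.out = nn.Linear(256, 10)

    def forward(self, X):
        # The output shape differs from the input shape; gradients of the
        # output with respect to inputs and parameters come from autograd.
        return self.out(F.relu(self.hidden(X)))

net = MLP()
net(torch.rand(2, 20)).shape  # torch.Size([2, 10])
```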

chapter_natural-language-processing-applications/index.md

Lines changed: 16 additions & 12 deletions
@@ -1,30 +1,34 @@
 # Natural Language Processing: Applications
 :label:`chap_nlp_app`
 
-We have seen how to represent text tokens and train their representations in :numref:`chap_nlp_pretrain`.
+We have seen how to represent tokens in text sequences and train their representations in :numref:`chap_nlp_pretrain`.
 Such pretrained text representations can be fed to various models for different downstream natural language processing tasks.
 
-This book does not intend to cover natural language processing applications in a comprehensive manner.
-Our focus is on *how to apply (deep) representation learning of languages to addressing natural language processing problems*.
-Nonetheless, we have already discussed several natural language processing applications without pretraining in earlier chapters,
+In fact,
+earlier chapters have already discussed some natural language processing applications
+*without pretraining*,
 just for explaining deep learning architectures.
 For instance, in :numref:`chap_rnn`,
 we have relied on RNNs to design language models to generate novella-like text.
 In :numref:`chap_modern_rnn` and :numref:`chap_attention`,
-we have also designed models based on RNNs and attention mechanisms
-for machine translation.
+we have also designed models based on RNNs and attention mechanisms for machine translation.
+
+However, this book does not intend to cover all such applications in a comprehensive manner.
+Instead,
+our focus is on *how to apply (deep) representation learning of languages to addressing natural language processing problems*.
 Given pretrained text representations,
-in this chapter, we will consider two more downstream natural language processing tasks:
-sentiment analysis and natural language inference.
-These are popular and representative natural language processing applications:
-the former analyzes single text and the latter analyzes relationships of text pairs.
+this chapter will explore two
+popular and representative
+downstream natural language processing tasks:
+sentiment analysis and natural language inference,
+which analyze single text and relationships of text pairs, respectively.
 
 ![Pretrained text representations can be fed to various deep learning architectures for different downstream natural language processing applications. This chapter focuses on how to design models for different downstream natural language processing applications.](../img/nlp-map-app.svg)
 :label:`fig_nlp-map-app`
 
 As depicted in :numref:`fig_nlp-map-app`,
 this chapter focuses on describing the basic ideas of designing natural language processing models using different types of deep learning architectures, such as MLPs, CNNs, RNNs, and attention.
-Though it is possible to combine any pretrained text representations with any architecture for either downstream natural language processing task in :numref:`fig_nlp-map-app`,
+Though it is possible to combine any pretrained text representations with any architecture for either application in :numref:`fig_nlp-map-app`,
 we select a few representative combinations.
 Specifically, we will explore popular architectures based on RNNs and CNNs for sentiment analysis.
 For natural language inference, we choose attention and MLPs to demonstrate how to analyze text pairs.

@@ -33,7 +37,7 @@ for a wide range of natural language processing applications,
 such as on a sequence level (single text classification and text pair classification)
 and a token level (text tagging and question answering).
 As a concrete empirical case,
-we will fine-tune BERT for natural language processing.
+we will fine-tune BERT for natural language inference.
 
 As we have introduced in :numref:`sec_bert`,
 BERT requires minimal architecture changes

chapter_natural-language-processing-applications/natural-language-inference-and-dataset.md

Lines changed: 6 additions & 6 deletions
@@ -48,7 +48,7 @@ To study this problem, we will begin by investigating a popular natural language
 
 ## The Stanford Natural Language Inference (SNLI) Dataset
 
-Stanford Natural Language Inference (SNLI) Corpus is a collection of over $500,000$ labeled English sentence pairs :cite:`Bowman.Angeli.Potts.ea.2015`.
+Stanford Natural Language Inference (SNLI) Corpus is a collection of over 500000 labeled English sentence pairs :cite:`Bowman.Angeli.Potts.ea.2015`.
 We download and store the extracted SNLI dataset in the path `../data/snli_1.0`.
 
 ```{.python .input}

@@ -110,7 +110,7 @@ def read_snli(data_dir, is_train):
     return premises, hypotheses, labels
 ```
 
-Now let us print the first $3$ pairs of premise and hypothesis, as well as their labels ("0", "1", and "2" correspond to "entailment", "contradiction", and "neutral", respectively ).
+Now let us print the first 3 pairs of premise and hypothesis, as well as their labels ("0", "1", and "2" correspond to "entailment", "contradiction", and "neutral", respectively ).
 
 ```{.python .input}
 #@tab all
@@ -121,8 +121,8 @@ for x0, x1, y in zip(train_data[0][:3], train_data[1][:3], train_data[2][:3]):
     print('label:', y)
 ```
 
-The training set has about $550,000$ pairs,
-and the testing set has about $10,000$ pairs.
+The training set has about 550000 pairs,
+and the testing set has about 10000 pairs.
 The following shows that
 the three labels "entailment", "contradiction", and "neutral" are balanced in
 both the training set and the testing set.
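One hypothetical way to verify the label balance mentioned in this hunk, assuming the `read_snli` function and `data_dir` from the surrounding cells; the chapter's own cell may count the labels differently:

```python
from collections import Counter

# Count how many examples carry each label: 0 ("entailment"),
# 1 ("contradiction"), and 2 ("neutral") in the training and test splits.
train_data = read_snli(data_dir, is_train=True)
test_data = read_snli(data_dir, is_train=False)
for split in [train_data, test_data]:
    print(Counter(split[2]))
```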
@@ -246,7 +246,7 @@ def load_data_snli(batch_size, num_steps=50):
     return train_iter, test_iter, train_set.vocab
 ```
 
-Here we set the batch size to $128$ and sequence length to $50$,
+Here we set the batch size to 128 and sequence length to 50,
 and invoke the `load_data_snli` function to get the data iterators and vocabulary.
 Then we print the vocabulary size.
 

@@ -258,7 +258,7 @@ len(vocab)
 
 Now we print the shape of the first minibatch.
 Contrary to sentiment analysis,
-we have $2$ inputs `X[0]` and `X[1]` representing pairs of premises and hypotheses.
+we have two inputs `X[0]` and `X[1]` representing pairs of premises and hypotheses.
 
 ```{.python .input}
 #@tab all
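A hypothetical usage sketch consistent with the text above (batch size 128, sequence length 50); the shapes in the comments follow from those settings, though the chapter's actual cell may differ:

```python
train_iter, test_iter, vocab = load_data_snli(128, 50)
print(len(vocab))  # vocabulary size

for X, Y in train_iter:
    print(X[0].shape)  # premises:   (128, 50)
    print(X[1].shape)  # hypotheses: (128, 50)
    print(Y.shape)     # labels:     (128,)
    break
```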
