google · Demfier · May 26, 2017
diff --git a/docs/concepts.md b/docs/concepts.md
@@ -27,10 +27,9 @@ An encoder reads in "source data", e.g. a sequence of words or an image, and pro
 
 ## Decoder
 
-A decoder is a generative model that is conditioned on the representation created by the encoder. For example, a Recurrent Neural Network decoder may learn generate the translation for an encoded sentence in another language. For a list of available decoder, see the [Decoder Reference](decoders/).
+A decoder is a generative model that is conditioned on the representation created by the encoder. For example, a Recurrent Neural Network decoder may learn to generate the translation for an encoded sentence in another language. For a list of available decoder, see the [Decoder Reference](decoders/).
 
 
 ## Model
 
-A model defines how to put together an encoder and decoder, and how to calculate and minize the loss functions. It also handles the necessary preprocessing of data read from an input pipeline. Under the hood, each model is implemented as a [model_fn passed to a tf.contrib.learn Estimator](https://www.tensorflow.org/api_docs/python/tf/contrib/learn/Estimator). For a list of available models, see the [Models Reference](models/).
-
+A model defines how to put together an encoder and decoder, and how to calculate and minimize the loss functions. It also handles the necessary preprocessing of data read from an input pipeline. Under the hood, each model is implemented as a [model_fn passed to a tf.contrib.learn Estimator](https://www.tensorflow.org/api_docs/python/tf/contrib/learn/Estimator). For a list of available models, see the [Models Reference](models/).
diff --git a/docs/encoders.md b/docs/encoders.md
@@ -39,7 +39,7 @@ An encoder that pools over embeddings, as described in [https://arxiv.org/abs/16
 | --- | --- | --- |
 | `pooling_fn` | `tensorflow.layers.average_pooling1d` | The 1-d pooling function to use, e.g. `tensorflow.layers.average_pooling1d`. |
 | `pool_size` | `5` | The pooling window, passed as `pool_size` to the pooling function. |
-| `strides` | `1` | The stride during pooling, passed as `strides` the pooling function. |
+| `strides` | `1` | The stride during pooling, passed as `strides` to the pooling function. |
 | `position_embeddings.enable` | `True` | If true, add position embeddings to the inputs before pooling. |
 | `position_embeddings.combiner_fn` | `tensorflow.add` | Function used to combine the position embeddings with the inputs. For example, `tensorflow.add`. |
 | `position_embeddings.num_positions` | `100` | Size of the position embedding matrix. This should be set to the maximum sequence length of the inputs. |
@@ -56,5 +56,3 @@ hidden layer before the logits as the feature representation.
 | --- | --- | --- |
 | `resize_height` | `299` | Resize the image to this height before feeding it into the convolutional network. |
 | `resize_width` | `299` | Resize the image to this width before feeding it into the convolutional network. |
-
-
diff --git a/docs/index.md b/docs/index.md
@@ -12,9 +12,9 @@ We built tf-seq2seq with the following goals in mind:
 
 - **Usability**: You can train a model with a single command. Several types of input data are supported, including standard raw text.
 
-- **Reproducibility**: Training pipelines and models are configured using YAML files. This allows other to run your exact same model configurations.
+- **Reproducibility**: Training pipelines and models are configured using YAML files. This allows others to run your exact same model configurations.
 
-- **Extensibility**: Code is structured in a modular way and that easy to build upon. For example, adding a new type of attention mechanism or encoder architecture requires only minimal code changes.
+- **Extensibility**: Code is structured in a modular way and that's easy to build upon. For example, adding a new type of attention mechanism or encoder architecture requires only minimal code changes.
 
 - **Documentation**: All code is documented using standard Python docstrings, and we have written guides to help you get started with common tasks.
 

diff --git a/docs/inference.md b/docs/inference.md
@@ -82,7 +82,7 @@ python -m bin.infer \
   ...
 ```
 
-By default, this script generates an `attention_score.npy` array file and one attention plot per example. The array file can be [loaded used numpy](https://docs.scipy.org/doc/numpy/reference/generated/numpy.load.html) and will contain a list of arrays with shape `[target_length, source_length]`. If you only want the raw attention score data without the plots you can enable the `dump_atention_no_plot` parameter.
+By default, this script generates an `attention_score.npy` array file and one attention plot per example. The array file can be [loaded used numpy](https://docs.scipy.org/doc/numpy/reference/generated/numpy.load.html) and will contain a list of arrays with shape `[target_length, source_length]`. If you only want the raw attention score data without the plots you can enable the `dump_attention_no_plot` parameter.
 
 
 

diff --git a/docs/nmt.md b/docs/nmt.md
@@ -91,7 +91,7 @@ export TRAIN_STEPS=1000000
 
 ## Alternative: Generate Toy Data
 
-Training on real-world translation data can take a very long time. If you do not have access to a machine with a GPU but would like to play around with a smaller dataset, we provide a way to generate toy data. The following command will generate a dataset where the target sequences are reversed source sequences. That is, the model needs to learn the reverse the inputs.  While this task is not very useful in practice, we can train such a model quickly and use it as as sanity-check to make sure that the end-to-end pipeline is working as intended.
+Training on real-world translation data can take a very long time. If you do not have access to a machine with a GPU but would like to play around with a smaller dataset, we provide a way to generate toy data. The following command will generate a dataset where the target sequences are reversed source sequences. That is, the model needs to learn the reverse of the inputs.  While this task is not very useful in practice, we can train such a model quickly and use it as as sanity-check to make sure that the end-to-end pipeline is working as intended.
 
 ```
 DATA_TYPE=reverse ./bin/data/toy.sh

diff --git a/docs/tools.md b/docs/tools.md
@@ -20,7 +20,7 @@ To run training on characters you must pass set `source_delimiter` and `target_d
 
 ## Visualizing Beam Search
 
-If you use the `DumpBeams` inference task (see [Inference](inference/) for more details) you can inspect the beam search data by loading the array using numpy, or generate beam search visualizations using the `generate_beam_viz.py` script. This required the `networkx` module to be installed.
+If you use the `DumpBeams` inference task (see [Inference](inference/) for more details) you can inspect the beam search data by loading the array using numpy, or generate beam search visualizations using the `generate_beam_viz.py` script. This requires the `networkx` module to be installed.
 
 ```
 python -m bin.tools.generate_beam_viz  \

diff --git a/seq2seq/test/hooks_test.py b/seq2seq/test/hooks_test.py
@@ -39,16 +39,16 @@ class TestPrintModelAnalysisHook(tf.test.TestCase):
   def test_begin(self):
     model_dir = tempfile.mkdtemp()
     outfile = tempfile.NamedTemporaryFile()
-    tf.get_variable("weigths", [128, 128])
+    tf.get_variable("weights", [128, 128])
     hook = hooks.PrintModelAnalysisHook(
         params={}, model_dir=model_dir, run_config=tf.contrib.learn.RunConfig())
     hook.begin()
 
     with gfile.GFile(os.path.join(model_dir, "model_analysis.txt")) as file:
-      file_contents = file.read().strip()
+      file_contents = tf.compat.as_text(file.read()).strip()
 
-    self.assertEqual(file_contents.decode(), "_TFProfRoot (--/16.38k params)\n"
-                     "  weigths (128x128, 16.38k/16.38k params)")
+    self.assertEqual(file_contents, "_TFProfRoot (--/16.38k params)\n"
+                     "  weights (128x128, 16.38k/16.38k params)")
     outfile.close()
 
 
@@ -94,7 +94,7 @@ def test_sampling(self):
       outfile = os.path.join(self.sample_dir, "samples_000000.txt")
       with open(outfile, "rb") as readfile:
         self.assertIn("Prediction followed by Target @ Step 0",
-                      readfile.read().decode("utf-8"))
+                      tf.compat.as_text(readfile.read()))
 
       # Should not trigger for step 9
       sess.run(tf.assign(global_step, 9))
@@ -108,7 +108,7 @@ def test_sampling(self):
       outfile = os.path.join(self.sample_dir, "samples_000010.txt")
       with open(outfile, "rb") as readfile:
         self.assertIn("Prediction followed by Target @ Step 10",
-                      readfile.read().decode("utf-8"))
+                      tf.compat.as_text(readfile.read()))
 
 
 class TestMetadataCaptureHook(tf.test.TestCase):
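
The test hunks above replace `.decode()` with `tf.compat.as_text`, presumably so the assertions pass whether `read()` returns bytes or text. The following is a minimal sketch of that behavior in plain Python 3, assuming UTF-8 input. `as_text_sketch` is a hypothetical stand-in written for illustration, not TensorFlow's implementation.

```python
def as_text_sketch(bytes_or_text, encoding="utf-8"):
    """Return a text string for bytes or text input (hypothetical helper)."""
    if isinstance(bytes_or_text, str):
        # Already text: return it unchanged. Calling .decode() on a str
        # would raise AttributeError on Python 3.
        return bytes_or_text
    if isinstance(bytes_or_text, bytes):
        # Raw bytes, e.g. from a file opened with mode "rb".
        return bytes_or_text.decode(encoding)
    raise TypeError("Expected bytes or text, got %r" % (bytes_or_text,))


# Both a binary read and a text read normalize to the same string,
# so a single assertion covers either case.
from_binary_file = as_text_sketch(b"weights (128x128)")
from_text_file = as_text_sketch("weights (128x128)")
```

With a helper of this shape, a test no longer needs to know which file mode or Python version produced the contents it compares.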