Update README.md

yooper · web-flow · commit bae31c6bd8e3 · 2017-10-31T20:41:35.000-04:00
diff --git a/README.md b/README.md
@@ -6,7 +6,6 @@ php-text-analysis
 
 [![Total Downloads](https://poser.pugx.org/yooper/php-text-analysis/downloads)](https://packagist.org/packages/yooper/php-text-analysis)
 
-[![Latest Unstable Version](https://poser.pugx.org/yooper/php-text-analysis/v/unstable)](https://packagist.org/packages/yooper/php-text-analysis)
 
 PHP Text Analysis is a library for performing Information Retrieval (IR) and Natural Language Processing (NLP) tasks using the PHP language. All the documentation for this project can be found in the wiki. 
 
@@ -21,66 +20,52 @@ Documentation for the library resides in the wiki.
 https://github.com/yooper/php-text-analysis/wiki
 
 
-
-
-Dictionary Installation
-=============
-
-Not required unless you use the dictionary stemmers
-
-*For Ubuntu < 16*
-```
-sudo apt-get install libpspell-dev 
-sudo apt-get install php5-pspell
-sudo apt-get install aspell-en
-sudo apt-get install php5-enchant
-```
-*For Ubuntu >= 16*
-```
-sudo apt-get install libpspell-dev php7.0-pspell aspell-en php7.0-enchant
+### Tokenization
+```php
+$tokens = tokenize($text);
 ```
 
-
-*For Centos* 
-```
-sudo yum install php5-pspell
-sudo yum install aspell-en
-sudo yum install php5-enchant
+To use a different tokenizer, pass the tokenizer's class name as the second argument:
+```php
+$tokens = tokenize($text, \TextAnalysis\Tokenizers\PennTreeBankTokenizer::class);
 ```
+The default tokenizer is **\TextAnalysis\Tokenizers\GeneralTokenizer::class**. Some tokenizers require parameters to be set upon instantiation.
 
-*PHP Pecl Stem* is not currently available in php 7.0. 
+### Normalization
+By default, **normalize_tokens** uses the function **strtolower** to lowercase all the tokens. To customize
+the normalization, pass either a function or the name of a function as a string; it is applied to each token by array_map.
 
+```php
+$normalizedTokens = normalize_tokens($tokens);
+```
 
-Tokenize
-=============
+```php
+$normalizedTokens = normalize_tokens($tokens, 'mb_strtolower');
 
-There are several tokenizers available 
+$normalizedTokens = normalize_tokens($tokens, function($token){ return mb_strtoupper($token); });
+```
 
- * FixedLengthTokenizer
- * GeneralTokenizer
- * LambdaTokenizer
- * PennTreeBankTokenizer
- * RegexTokenizer
- * SentenceTokenizer 
- * WhitespaceTokenizer
+### Frequency Distributions
 
-*Tokenizer Usage*
-```
-$tokenizer = new GeneralTokenizer()
-$tokens = $tokenizer->tokenize("Enter your text here");
+The call to **freq_dist** returns a [FreqDist](https://github.com/yooper/php-text-analysis/blob/master/src/Analysis/FreqDist.php) instance. 
+```php
+$freqDist = freq_dist(tokenize($text));
 ```
 
-Frequency Distribution
-=============
+### Ngram Generation
+By default, bigrams are generated.
+```php
+$bigrams = ngrams($tokens);
 ```
-$tokenizer = new \TextAnalysis\Tokenizers\GeneralTokenizer();
-$tokens = $tokenizer->tokenize("time flies like an arrow and an arrow flies like time");
-$freqDist = new \TextAnalysis\Analysis\FreqDist($tokens);
-$freqDist->getHapaxes(); //Get the Hapaxes
-$freqDist->getTotalTokens();
-$freqDist->getTotalUniqueTokens();
+To customize the ngrams, pass the ngram size and a delimiter:
+```php
+// create trigrams with a pipe delimiter in between each word
+$trigrams = ngrams($tokens, 3, '|');
 ```
-Check out the API for full documentation
-https://github.com/yooper/php-text-analysis/blob/master/src/Analysis/FreqDist.php
-
  
+Dictionary Installation
+=============
+
+To do
+
+