Machine-translated Telugu Wikipedia pages for plants
The aim of this Honors project is to increase the number of Telugu Wikipedia pages in a chosen domain. The domain I chose is plants. In total, I created 7416 Telugu Wikipedia pages about plants.
- Scraping the data for plants to create a plant database.
- Machine translating the data into Telugu.
- Creating a template for the Wikipedia articles.
- Creating an .xml file for the Wikipedia articles.
 
- Scraped data from plant-related websites.
- Machine translated the plants database.
- Prepared a plants database with 40 attributes and 7416 plant entries.
- Post-edited and formatted the machine-translated data.
- The final plants database is Here.
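The scraping step above can be sketched as follows. The HTML layout, CSS class names, and the `PLANT_HTML` sample are hypothetical stand-ins for the real source pages, and a real scraper would fetch pages over HTTP instead of parsing a literal string; this only illustrates pulling label/value attribute pairs out of a page into a dictionary.

```python
from html.parser import HTMLParser

# Hypothetical sample of a plant page's attribute table; real pages differ.
PLANT_HTML = """
<table>
  <tr><td class="label">Scientific name</td><td class="value">Mangifera indica</td></tr>
  <tr><td class="label">Family</td><td class="value">Anacardiaceae</td></tr>
</table>
"""

class AttributeTableParser(HTMLParser):
    """Collects label/value cell pairs from a simple attribute table."""

    def __init__(self):
        super().__init__()
        self._current = None   # class of the <td> we are inside, if any
        self._label = None     # last label text seen, awaiting its value
        self.attributes = {}   # label -> value

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._current = dict(attrs).get("class")

    def handle_endtag(self, tag):
        if tag == "td":
            self._current = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._current == "label":
            self._label = text
        elif self._current == "value" and self._label:
            self.attributes[self._label] = text
            self._label = None

parser = AttributeTableParser()
parser.feed(PLANT_HTML)
print(parser.attributes)
```

Each parsed page would contribute one row of attributes to the plants database.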
 
- First, collect the data to build the dataset.
- Use machine translation to translate the database into Telugu.
- Correct the Telugu database after machine translation.
- The .csv file for the plants dataset is here.
- Use Jinja2 to write a template for the Wikipedia page.
- Use macros when writing the template.
- Use Python to create an .xml file for uploading to the sandbox.
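A minimal sketch of the Jinja2 macro approach described above. The infobox name and attribute keys are illustrative, not the project's actual template; the point is that a macro emits one infobox line per attribute and skips empty values.

```python
from jinja2 import Template

# Illustrative template: a macro emits one infobox line per non-empty attribute.
TEMPLATE_SRC = (
    "{% macro field(label, value) %}"
    "{% if value %}| {{ label }} = {{ value }}\n{% endif %}"
    "{% endmacro %}"
    "{{ '{{' }}Infobox plant\n"
    "{{ field('name', plant['name']) }}"
    "{{ field('family', plant['family']) }}"
    "{{ '}}' }}"
)

template = Template(TEMPLATE_SRC)
page_text = template.render(
    plant={"name": "Mangifera indica", "family": "Anacardiaceae"}
)
print(page_text)
```

Because empty values are skipped inside the macro, the same template works for plants whose rows have missing attributes.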
 
- Collected data for the plants database.
- Translated the data into Telugu.
- Created the template for article generation.
- Created an .xml file for 7416 Wikipedia pages, one per plant.
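The .xml generation step can be sketched with the standard library. The element names below follow the general shape of a MediaWiki import file (`<mediawiki>` / `<page>` / `<revision>` / `<text>`), but the exact schema, namespaces, and required fields should be checked against the MediaWiki export format before any real upload.

```python
import xml.etree.ElementTree as ET

def build_dump(pages):
    """Build a minimal MediaWiki-style import tree from (title, wikitext) pairs."""
    root = ET.Element("mediawiki")
    for title, wikitext in pages:
        page = ET.SubElement(root, "page")
        ET.SubElement(page, "title").text = title
        revision = ET.SubElement(page, "revision")
        ET.SubElement(revision, "text").text = wikitext
    return ET.ElementTree(root)

# One (title, wikitext) pair per plant; the real run produces 7416 pages.
pages = [("Mangifera indica", "{{Infobox plant}}\nMango tree article text.")]
tree = build_dump(pages)
tree.write("plants.xml", encoding="utf-8", xml_declaration=True)
```

The same loop scales to the full database: each row rendered through the template becomes one `<page>` element.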
 
- Correct some mistakes in the template.
- Translate the sentences better by using Deeptrans.
- Collect image data from Wikimedia Commons.
- Restructure the data so that it fits into any template and can be translated into different languages.
- Add more detailed data to the plants Wikipedia pages.
- Upload the .xml files to Wikipedia.
 
a) The sentences were finalised with the help of Kasyap sir and Vibha ma'am.
b) An infobox was added to the template.
c) Categories were also included in the template.
d) The article is divided into easy-to-understand sections, making it easier for the user to read and search the Wikipedia article.
a) Tried using Deeptrans to translate the sentences, but was not satisfied with the results.
b) Then tried translating with Anuvaad, only to get unsatisfactory translations.
c) Then used bing_translate for the sentences and paragraphs; the translations were merely acceptable, and in some places quite poor.
d) Also tried all of these methods after splitting the paragraphs into sentences and translating them individually. This worked a little better than the earlier attempts, but some sentences were still translated incorrectly, as the translation tool's guesses about ambiguous words made a correct translation difficult.
e) Because of this, I gave up on translating whole paragraphs and decided to extract the important data from them into structured data, since structured data is easy to use and can be translated into different languages at any time.
f) So the data was completely converted into structured data and can be used to generate Wikipedia pages in different languages with little effort.
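The move from free paragraphs to structured data (point e above) can be sketched as a pattern-based extractor. The sample paragraph, field names, and regular expressions are all hypothetical and would need tuning against the real source text; the idea is that each field the translation step needs is pulled into a row, rather than translating the prose wholesale.

```python
import re

# Hypothetical English source paragraph; real descriptions vary widely.
PARAGRAPH = (
    "Mangifera indica is an evergreen tree in the family Anacardiaceae. "
    "It grows up to 30 m tall and is native to South Asia."
)

# Each field is matched by a hand-written pattern (illustrative only).
PATTERNS = {
    "scientific_name": r"^([A-Z][a-z]+ [a-z]+) is",
    "family": r"family ([A-Z][a-z]+)",
    "height": r"grows up to ([\d.]+ m)",
    "native_to": r"native to ([A-Z][A-Za-z ]+?)\.",
}

def extract_fields(paragraph):
    """Return a dict of the fields whose patterns matched the paragraph."""
    row = {}
    for field, pattern in PATTERNS.items():
        match = re.search(pattern, paragraph)
        if match:
            row[field] = match.group(1)
    return row

print(extract_fields(PARAGRAPH))
```

Rows produced this way can be translated attribute by attribute, which is far more reliable than machine-translating whole paragraphs.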
a) Completed collecting the images and image thumbnails from Wikimedia Commons.
4) Change the data so that it fits any template and can be translated into different languages:
a) Converted the data into structured data, so that with transliteration and translation of some attributes it can fit any template.
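One way to make a structured row fit any language's template, as described above, is to keep the data under language-neutral attribute keys and map them to localised labels at render time. The Telugu labels below are illustrative translations, not the project's actual mapping.

```python
# Language-neutral attribute keys mapped to per-language display labels.
# The Telugu labels here are illustrative, not the project's real mapping.
LABELS = {
    "te": {"family": "కుటుంబం", "height": "ఎత్తు"},
    "en": {"family": "Family", "height": "Height"},
}

def localise_row(row, lang):
    """Return the row with attribute keys replaced by labels for `lang`."""
    labels = LABELS[lang]
    return {labels.get(key, key): value for key, value in row.items()}

row = {"family": "Anacardiaceae", "height": "30 m"}
print(localise_row(row, "te"))
```

With this split, supporting a new language means adding one label dictionary; the data itself never changes.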
a) Completed creating the .xml dump; it needs to be sent to Ramu sir for checking and upload.
a) Transliterated many of the attributes.
b) Experimented with many translators, since translators for the Telugu language are relatively less accurate.
c) Got a grasp of extracting text, links, and data.
d) Added an infobox and categories to the template.
e) Extracted keywords from 4 categories.
The final structured dataset is here.
The code required to run this project can be viewed here.
The final .xml file can be downloaded here.
There are 7416 plants in my plants database. The database is completed and finalised, the template is complete, and the .xml files have been created and need to be reviewed.