-
Notifications
You must be signed in to change notification settings - Fork 78
Open
Description
To reproduce:
import mwparserfromhell
import requests
obama = requests.get("https://en.wikipedia.org/wiki/Barack_Obama?action=raw").text
parsed = mwparserfromhell.parse(obama)
sections = parsed.get_sections(levels=[2])
for section in sections:
print(section.filter_headings())
This results in:
['==Early life and career==', '===Education===', '===Family and personal life===', '===Religious views===']
['==Legal career==', '===Civil rights attorney===']
['==Legislative career==', '===Illinois Senate (1997–2004)===', '===2004 U.S. Senate campaign in Illinois===', '===U.S. Senate (2005–2008)===']
['==Presidential campaigns==', '===2008===', '===2012===']
['==Presidency (2009–2017)==', '===First 100 days===', '===Domestic policy===', '====Racial issues====', '====LGBT rights====', '===== Same-sex marriage =====', '====Economic policy====', '====Environmental policy====', '====Health care reform====', '===Foreign policy===', '====War in Iraq====', '====Afghanistan and Pakistan====', '=====Killing of Osama bin Laden=====', '====Relations with Cuba====', '====Israel====', '====Libya====', '====Syrian civil war====', '====Iran nuclear talks====', '====Russia====']
['==Cultural and political image==', '=== Job approval ===', '===Foreign perceptions===', '=== Thanks, Obama ===', '==Post-presidency (2017–present)==', '==Legacy and recognition ==', '===Presidential library===', '=== Awards and honors ===', '===Eponymy===', '==Bibliography==', '===Books===', '===Audiobooks===', '===Articles===', '==See also==', '===Politics===', '===Other===', '===Lists===', '==Notes==', '==References==', '===Bibliography===', '==Further reading==', '==External links==', '===Official===', '===Other===']
There are more level-2 headers in the article, but it stops after "Cultural and political image", lumping the rest of the article into that section.
Metadata
Metadata
Assignees
Labels
No labels