Has trouble telling sections apart on "Barack Obama" #332

@harej

To reproduce:

import mwparserfromhell
import requests

obama = requests.get("https://en.wikipedia.org/wiki/Barack_Obama?action=raw").text
parsed = mwparserfromhell.parse(obama)
sections = parsed.get_sections(levels=[2])

for section in sections:
    print(section.filter_headings())

This results in:

['==Early life and career==', '===Education===', '===Family and personal life===', '===Religious views===']
['==Legal career==', '===Civil rights attorney===']
['==Legislative career==', '===Illinois Senate (1997–2004)===', '===2004 U.S. Senate campaign in Illinois===', '===U.S. Senate (2005–2008)===']
['==Presidential campaigns==', '===2008===', '===2012===']
['==Presidency (2009–2017)==', '===First 100 days===', '===Domestic policy===', '====Racial issues====', '====LGBT rights====', '===== Same-sex marriage =====', '====Economic policy====', '====Environmental policy====', '====Health care reform====', '===Foreign policy===', '====War in Iraq====', '====Afghanistan and Pakistan====', '=====Killing of Osama bin Laden=====', '====Relations with Cuba====', '====Israel====', '====Libya====', '====Syrian civil war====', '====Iran nuclear talks====', '====Russia====']
['==Cultural and political image==', '=== Job approval ===', '===Foreign perceptions===', '=== Thanks, Obama ===', '==Post-presidency (2017–present)==', '==Legacy and recognition ==', '===Presidential library===', '=== Awards and honors ===', '===Eponymy===', '==Bibliography==', '===Books===', '===Audiobooks===', '===Articles===', '==See also==', '===Politics===', '===Other===', '===Lists===', '==Notes==', '==References==', '===Bibliography===', '==Further reading==', '==External links==', '===Official===', '===Other===']

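The article has more level-2 headings, but get_sections(levels=[2]) stops splitting after "Cultural and political image" and lumps the rest of the article (everything from "Post-presidency (2017–present)" through "External links") into that section.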
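
For comparison, here is a rough sketch of the split I would expect, done with a plain regex over the raw wikitext instead of the parser. It is not how mwparserfromhell works internally and is only meant as a sanity check or stopgap. The names `LEVEL2` and `split_level2` are made up for this example, and the pattern assumes every heading sits on its own line; it will also match heading-like lines inside comments, nowiki blocks, or templates.

```python
import re

# Hypothetical helper, not part of mwparserfromhell.
# Matches a line with exactly two '=' on each side (a level-2 heading).
LEVEL2 = re.compile(r"^==(?!=)[ \t]*(.+?)[ \t]*(?<!=)==[ \t]*$", re.MULTILINE)


def split_level2(wikitext):
    """Return (title, body) pairs, one per level-2 heading.

    Each body runs from its heading line up to the next level-2 heading
    (or the end of the text), so deeper subsections stay with their parent.
    """
    matches = list(LEVEL2.finditer(wikitext))
    sections = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(wikitext)
        sections.append((m.group(1), wikitext[m.start():end]))
    return sections


# Tiny stand-in for the article text, using headings from the output above.
sample = (
    "Lead text\n"
    "==Cultural and political image==\n"
    "=== Job approval ===\n"
    "text\n"
    "==Legacy and recognition ==\n"
    "===Eponymy===\n"
    "==See also==\n"
)

titles = [title for title, _ in split_level2(sample)]
print(titles)
```

On the real article this could be run as split_level2(obama) to see which level-2 titles the parser-based split is missing.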
