Remove text from a PDF #4221
              
                
                  
                  
Answered by JorjMcKie
samuelbradshaw asked this question in Looking for help
Hi! Is there an efficient way to remove/delete all text from a PDF with PyMuPDF?
Answered by JorjMcKie, Jan 13, 2025
The easiest way to remove all text is to use "redaction annotations" (on all or selected pages):

```python
import pymupdf

doc = pymupdf.open("input.pdf")
page = doc[0]  # 0 or any 0-based page number
page.add_redact_annot(page.rect)  # redaction annotation covering the full page
page.apply_redactions(
    images=pymupdf.PDF_REDACT_IMAGE_NONE,  # keep the images
    graphics=pymupdf.PDF_REDACT_LINE_ART_NONE,  # keep vector graphics
)
```

Specific text erasures work the same way, except that you have to determine the desired bounding box to use instead of page.rect:

```python
# extract text and full metadata exclusively (no images)
for block in page.get_text("dict", flags=pymupdf.TEXTFLAGS_TEXT)["blocks"]:
    for line in block["lines"]:
        for span in line["spans"]:
            if "unwanted font name" in span["font"]:
                page.add_redact_annot(span["bbox"])  # cover text span with redact annot
page.apply_redactions(
    images=pymupdf.PDF_REDACT_IMAGE_NONE,  # keep the images
    graphics=pymupdf.PDF_REDACT_LINE_ART_NONE,  # keep vector graphics
)
```

Important:
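The two snippets in the answer handle a single page and do not write the result back to disk. As a rough sketch of how they might be combined for a whole document, the helper below loops over every page and saves to a new file. The function names `matching_span_bboxes` and `strip_text`, the file names in the usage note, and the `garbage`/`deflate` save options are illustrative assumptions, not part of the original answer.

```python
from typing import Iterator, Optional


def matching_span_bboxes(text_dict: dict, font_substring: str) -> Iterator[tuple]:
    """Yield the bbox of every span whose font name contains font_substring.

    text_dict is expected to look like the result of page.get_text("dict").
    """
    for block in text_dict.get("blocks", []):
        for line in block.get("lines", []):  # blocks without "lines" are skipped
            for span in line.get("spans", []):
                if font_substring in span["font"]:
                    yield tuple(span["bbox"])


def strip_text(src: str, dst: str, font_substring: Optional[str] = None) -> None:
    """Hypothetical helper: remove text from every page of src and save to dst.

    With font_substring=None all text is removed; otherwise only spans whose
    font name contains font_substring are removed.
    """
    import pymupdf  # imported here so the pure helper above works without PyMuPDF

    doc = pymupdf.open(src)
    for page in doc:
        if font_substring is None:
            # one redaction annotation covering the full page
            page.add_redact_annot(page.rect)
        else:
            # text-only extraction, as in the answer (no image blocks)
            text_dict = page.get_text("dict", flags=pymupdf.TEXTFLAGS_TEXT)
            for bbox in matching_span_bboxes(text_dict, font_substring):
                page.add_redact_annot(bbox)
        page.apply_redactions(
            images=pymupdf.PDF_REDACT_IMAGE_NONE,  # keep the images
            graphics=pymupdf.PDF_REDACT_LINE_ART_NONE,  # keep vector graphics
        )
    # assumption: compact the output; these save options are not from the answer
    doc.save(dst, garbage=3, deflate=True)
    doc.close()
```

A call such as `strip_text("input.pdf", "no_text.pdf")` would remove all text, while `strip_text("input.pdf", "filtered.pdf", font_substring="unwanted font name")` would only target spans in matching fonts. Writing to a separate output file keeps the original document untouched in case the redaction removes more than intended.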
Answer selected by samuelbradshaw
  