What's a simple example of removing a node from an XML file using Saxon 12.9 and an XSLT 3.0 stylesheet?
I've got an XML export from Blogger, and I'm puzzled about how to remove just the COMMENT entries while retaining the POST entries.
Below is the input.xml file, which contains COMMENT entries I want to remove, but also POST entries that I want to retain in output.xml:
input.xml:
<?xml version='1.0' encoding='utf-8'?>
<feed xmlns='http://www.w3.org/2005/Atom' xmlns:blogger='http://schemas.google.com/blogger/2018'>
  <id>tag:blogger.com,1999:blog-17477</id>
  <title>Test Blog</title>
  <entry>
    <id>tag:blogger.com,1999:blog-17477.post-3947073770</id>
    <blogger:parent>tag:blogger.com,1999:blog-17477.post-23573</blogger:parent>
    <blogger:inReplyTo/>
    <blogger:type>COMMENT</blogger:type>
    <blogger:status>LIVE</blogger:status>
    <author>
      <name>Name Name</name>
      <blogger:type>ANONYMOUS</blogger:type>
    </author>
    <content type='html'>A comment.....</content>
    <blogger:created>2024-06-10T10:32:13.389Z</blogger:created>
    <published>2024-06-10T10:32:13.389Z</published>
    <updated>2024-06-10T10:32:13.389Z</updated>
    <blogger:trashed/>
  </entry>
  <entry>
    <id>tag:blogger.com,1999:blog-17477.post-670855911</id>
    <blogger:type>POST</blogger:type>
    <blogger:status>LIVE</blogger:status>
    <author>
      <name>Author</name>
      <uri></uri>
      <blogger:type>BLOGGER</blogger:type>
    </author>
    <title>Title Title</title>
    <content type='html'>Content Content Content Content Content</content>
    <blogger:metaDescription/>
    <blogger:created>2011-01-05T16:33:59.731Z</blogger:created>
    <published>2011-01-06T12:32:00.001Z</published>
    <updated>2011-01-06T12:32:00.138Z</updated>
    <blogger:location/>
    <category scheme='tag:blogger.com,1999:blog-17477683' term='News'/>
    <blogger:filename>/2011/01/post.html</blogger:filename>
    <link/>
    <enclosure/>
    <blogger:trashed/>
  </entry>
  <entry>
    <id>tag:blogger.com,1999:blog-17477.post-4539665487</id>
    <blogger:parent>tag:blogger.com,1999:blog-17477.post-8659501057</blogger:parent>
    <blogger:inReplyTo/>
    <blogger:type>COMMENT</blogger:type>
    <blogger:status>LIVE</blogger:status>
    <author>
      <name>Author 2</name>
      <blogger:type>BLOGGER</blogger:type>
    </author>
    <content type='html'>My comment</content>
    <blogger:created>2009-11-30T20:09:49.055Z</blogger:created>
    <published>2009-11-30T20:09:49.055Z</published>
    <updated>2009-11-30T20:09:49.055Z</updated>
    <blogger:trashed/>
  </entry>
</feed>

Desired output.xml:
<?xml version='1.0' encoding='utf-8'?>
<feed xmlns='http://www.w3.org/2005/Atom' xmlns:blogger='http://schemas.google.com/blogger/2018'>
  <id>tag:blogger.com,1999:blog-17477</id>
  <title>Test Blog</title>
  <entry>
    <id>tag:blogger.com,1999:blog-17477.post-670855911</id>
    <blogger:type>POST</blogger:type>
    <blogger:status>LIVE</blogger:status>
    <author>
      <name>Author</name>
      <uri></uri>
      <blogger:type>BLOGGER</blogger:type>
    </author>
    <title>Title Title</title>
    <content type='html'>Content Content Content Content Content</content>
    <blogger:metaDescription/>
    <blogger:created>2011-01-05T16:33:59.731Z</blogger:created>
    <published>2011-01-06T12:32:00.001Z</published>
    <updated>2011-01-06T12:32:00.138Z</updated>
    <blogger:location/>
    <category scheme='tag:blogger.com,1999:blog-17477683' term='News'/>
    <blogger:filename>/2011/01/post.html</blogger:filename>
    <link/>
    <enclosure/>
    <blogger:trashed/>
  </entry>
</feed>

Below is the skeleton of a stylesheet.xsl, borrowed from Martin Honnen's answer to an earlier question of mine, "Use xmlstarlet and XPath to find/replace HTML entities in an XML node".
sample stylesheet.xsl:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xpath-default-namespace="http://www.w3.org/2005/Atom"
    exclude-result-prefixes="#all"
    expand-text="yes">

  <!-- how to remove only COMMENT nodes and leave POST nodes? -->

</xsl:stylesheet>

How do I designate only the COMMENT nodes to be removed in stylesheet.xsl?
One approach is to write a template for feed that applies templates to (or copies) only the child nodes you want to keep.
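A minimal sketch of that approach, filling in the skeleton above. It assumes the entry type is always given by a blogger:type element that is a direct child of entry (as in the sample input), and it adds a declaration for the blogger namespace, which the skeleton does not have. The xsl:strip-space and xsl:output lines are optional additions of mine to avoid blank lines where the removed entries used to be.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:blogger="http://schemas.google.com/blogger/2018"
    xpath-default-namespace="http://www.w3.org/2005/Atom"
    exclude-result-prefixes="#all"
    expand-text="yes">

  <!-- Identity transform: nodes without a matching template are copied as-is -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- Optional: drop whitespace-only text nodes and re-indent the result -->
  <xsl:strip-space elements="*"/>
  <xsl:output method="xml" indent="yes"/>

  <!-- Copy feed, but process only the children that are not COMMENT entries.
       The predicate tests the blogger:type child of entry, not the one
       nested inside author. -->
  <xsl:template match="feed">
    <xsl:copy>
      <xsl:apply-templates
          select="@*, node()[not(self::entry[blogger:type = 'COMMENT'])]"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

With Saxon this could be run as something like `java -jar saxon-he-12.9.jar -s:input.xml -xsl:stylesheet.xsl -o:output.xml`, where the jar file name is an assumption about how the Saxon-HE download is named on your machine. An alternative that should give the same result is to keep the identity transform and replace the feed template with an empty one, `<xsl:template match="entry[blogger:type = 'COMMENT']"/>`, which matches the unwanted entries and outputs nothing for them.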