Using Word 2007 XML Format to Edit Files

Gerald McLeod
Few Microsoft Word users are aware that the default '.docx' files Word now saves are actually compressed (zipped) packages containing several different XML documents. Microsoft shifted to this format because XML is supposed to be an improvement on the binary format used by earlier versions of Word. According to Microsoft, the newer format produces saved files that are up to 75% smaller than the earlier binary '.doc' files. If you ever wondered why '.docx' files will not open in versions of Word earlier than Word 2007 unless Microsoft's compatibility pack is installed, this is why.
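Since a '.docx' file is an ordinary zip archive, its contents can also be listed without renaming anything. The following is a minimal sketch using Python's standard zipfile module; the function name list_docx_parts and the file name in the usage note are illustrative assumptions, not part of Word or of the method described here.

```python
import zipfile


def list_docx_parts(path):
    """Return the names of the parts stored inside a .docx package."""
    # A .docx file is a zip archive, so zipfile can open it directly.
    with zipfile.ZipFile(path) as package:
        return package.namelist()
```

Calling list_docx_parts("report.docx") on a hypothetical file of that name would return a list of part names, typically including entries such as word/document.xml.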

The XML format consists of individual components that together make up the entire document. Each component is a module whose contents can be manipulated individually, apart from the rest of the document. To view the XML components, the '.docx' file must be renamed with a '.zip' extension. In Windows Explorer, highlight the file you wish to rename, click File, then Rename. Change the '.docx' extension to '.zip' and press the Enter key on your keyboard. Be sure to make a backup copy of the file before renaming it, because any alterations you make to the zip file are permanent.
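The backup and the rename can be combined by copying the document to a '.zip' name instead of renaming it, so the original '.docx' is never touched. This is a rough sketch of that idea, not a required step; the function name make_zip_copy is an assumption made for illustration.

```python
import shutil
from pathlib import Path


def make_zip_copy(docx_path):
    """Copy a .docx to a sibling .zip file, leaving the original untouched."""
    source = Path(docx_path)
    if source.suffix.lower() != ".docx":
        raise ValueError(f"expected a .docx file, got {source.name}")
    target = source.with_suffix(".zip")
    if target.exists():
        # Refuse to overwrite an existing zip of the same name.
        raise FileExistsError(f"{target} already exists")
    # Copy rather than rename, so the original file serves as the backup.
    shutil.copy2(source, target)
    return target
```

The returned path points at the new '.zip' copy, which can then be opened and explored as described above.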

After the file has been successfully renamed, double-click it and the zipped package will open as a folder listing the components that make up the '.docx' file. Most of the XML components sit in a subfolder named 'word'. You can manipulate and delete the individual modules as you wish without having to open the file in Word. The main text of the document is contained in the document.xml module. The other modules are named for what they hold: the document's style definitions are housed in styles.xml, header definitions are in files such as header1.xml, theme settings (fonts and colors) are in theme1.xml inside the 'theme' subfolder, etc. The document.xml.rels file, in the '_rels' subfolder, records the relationships Word uses to reassemble the components into the completed document. This XML file should not be altered or deleted.
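To show what the document.xml module holds, here is a minimal sketch that pulls the paragraph text out of it with Python's standard library. It assumes the main part is stored at word/document.xml and collects only plain text runs, ignoring tabs, line breaks, and other markup; the function name read_docx_paragraphs is made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements such as w:p and w:t.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_docx_paragraphs(path):
    """Return the text of each paragraph found in word/document.xml."""
    with zipfile.ZipFile(path) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across one or more w:t elements.
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs
```

This works on the file under either extension, since only the contents of the archive matter.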

If any of the components or modules have been deleted while the file was in '.zip' form, then when the file is renamed back to '.docx', Word will automatically attempt to repair it, and placeholders will appear in the areas where the alterations occurred. Editing a Word '.docx' file using this method is far more confusing and complicated than using Word's graphical user interface. I uncovered this method while investigating compatibility problems between older '.doc' files and the newer versions of Microsoft Word. It may be helpful for some, but I have yet to find a use for it.
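For anyone who does want to experiment, the reassembly step can be sketched in code: copy every part of the package into a new '.docx', leaving out or swapping individual parts along the way. This is only an illustration under stated assumptions; the function name rebuild_docx and its parameters are hypothetical, and whether Word opens the result cleanly or attempts a repair depends on which parts were changed.

```python
import zipfile


def rebuild_docx(source_path, target_path, replacements=None, skip=()):
    """Write a new package from source_path, replacing or omitting parts.

    replacements maps a part name to its new bytes; skip lists part names
    to leave out. source_path and target_path must be different files.
    """
    replacements = dict(replacements or {})
    skipped = set(skip)
    with zipfile.ZipFile(source_path) as src, \
            zipfile.ZipFile(target_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            name = item.filename
            if name in skipped:
                continue  # omit this part from the new package
            if name in replacements:
                data = replacements[name]  # use the edited content
            else:
                data = src.read(name)  # copy the part unchanged
            dst.writestr(name, data)
```

Because the output is written to a separate file, the source package is left as it was, which keeps the backup advice above intact.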

Resource: Microsoft Word Step by Step by Microsoft Press



1 Comment

  • Ron, 12/18/2009

    Editing a DOCX in ZIP 'mode' is handy if you run into the old problem of DOCument corruption. Although, getting away from binary DOC format means there is much less chance of DOC corruption.

    In ZIP mode you can easily extract the text or images to put into a new document.

    You can also use ZIP mode as a learning tool about XML formatting.
