From Rick:

In Part-2 of this series, I promised a Part-3 to deal with how to create a clean source file. As often happens, other projects that were already running behind prevented me from tackling this third part of the series at the time. For those who missed the previous posts, here’s the link to Part-2. Near the start of that post is the link to Part-1.


In those posts I showed you what to search for and how to remove or fix much of the garbage. In this post, I’m going to give a review list of what you need to clean out, show you how to tag things you want to keep so you can restore them later, then show you how to eliminate the rest of the invisible garbage in the file that you may not know is there.

The list below comes from an article by J. W. Manus that is no longer available online (at least I couldn’t find it). I made reference to that article in Part-2. I have a PDF copy I downloaded, but without permission, I cannot repost it. The article on tagging and restoring italics is still out there, but I don’t want to risk that disappearing, so I will provide those instructions below.

For purposes of explanation, the “source file” is the clean, master file that you will use to produce your ebook or print book. The reason this file needs to be clean is not so much for the print book but for the ebook. MS Word puts a lot of invisible stuff in its documents. Some of this won’t hurt anything, but some can royally mess up your ebook.

Here is the list of everything you MUST do in order to have a clean source file:

List #1

Remove tabs
Remove headers and page numbering
Remove extra paragraph returns
Remove extra spaces between words and sentences
Remove spaces before and after paragraph returns
Remove page breaks (but you may want to tag chapter breaks)
Remove line breaks
Remove columns and text boxes (but see List #2)

List #2

Tag chapter, scene, and section breaks
Tag special formatting such as italics, bold, underlining
Tag special layouts such as lists, block quotes, centered text
Tag or mark anything that needs to be in the final manuscript and that’s not included in the list above

Now, before you work yourself into a fit and give yourself a heart attack, let me clarify some things about the items in List #1.

Extra returns can mess up an ebook file. Extra spaces and tabs in particular can result in weird spacing between words. Therefore, these all MUST BE REMOVED!
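
For anyone who would rather check the whitespace items in List #1 with a script than by hand, here is a minimal Python sketch. It assumes the manuscript has already been exported to plain text, and the function name `clean_whitespace` is my own invention for illustration. It is not part of Word or of the J. W. Manus method, and it only covers tabs, extra spaces, and extra returns.

```python
import re

def clean_whitespace(text: str) -> str:
    """Apply the whitespace cleanups from List #1 to plain text."""
    text = text.replace("\r\n", "\n")      # treat Windows returns as plain returns
    text = text.replace("\t", " ")         # tabs become spaces so words never fuse together
    text = re.sub(r" {2,}", " ", text)     # collapse runs of spaces into one space
    text = re.sub(r" *\n *", "\n", text)   # drop spaces before and after each return
    text = re.sub(r"\n{2,}", "\n", text)   # collapse extra paragraph returns
    return text.strip()

sample = "\tIt was late.  Very late. \n\n\n  She left.\t\n"
print(repr(clean_whitespace(sample)))
```

Note that this collapses every blank line, which is exactly why scene breaks need a physical mark rather than an empty line.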

Our ultimate goal is to remove anything, like invisible junk and special characters, that can interfere with a properly formatted end product. By the way, those superscript -st, -nd, -rd, -th letters with 1st, 2nd, 3rd, 4th that Word so nicely puts in are considered special characters and may not display properly in some ebooks. This is why I turn off all of those automatic replacements that Microsoft thoughtfully made as defaults in MS Word. If you don’t know how to disable automatic replacement and formatting, look them up online.

NOTE: The process I document below to get a clean source will remove any superscripts and subscripts. It will NOT remove small fractions (1/4, 1/2, 3/4), but those are not considered special characters and should display properly in ebooks. I will show you how to convert those to normal fractions if you wish to do so.

Once the file is clean, we can carefully reinsert permissible formatting. That’s what List #2 is about. You will need to mark (tag) anything that you want to put back because the process I’m going to show you will remove EVERY bit of formatting, including italics. The only formatting that will remain is standard punctuation. Smart quotes will be retained.

For page breaks, most writers do those at chapter breaks. If your chapters begin with the word “Chapter” then you can simply search those out and add page breaks there after the “cleaning.” If you don’t use the word “chapter” but simply use chapter numbers as numerals or written out, then you should pick some standard symbol to put next to the chapter number for easy searching later. Be sure it’s not a symbol you use elsewhere, such as for scene breaks. Do not use the caret symbol ^.
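
To illustrate the idea of searching out chapter starts after the cleaning, here is a small Python sketch that works on the plain-text version of a manuscript. The `@@` marker and the function name are hypothetical, picked only for the example; use whatever symbol you chose, as long as it appears nowhere else in your text.

```python
import re

CHAPTER_WORD = re.compile(r"^\s*chapter\b", re.IGNORECASE)
MARKER = "@@"  # hypothetical marker, chosen for illustration only

def chapter_line_numbers(text: str, marker: str = MARKER) -> list:
    """Return the 1-based line numbers where a chapter appears to start."""
    hits = []
    for number, line in enumerate(text.split("\n"), start=1):
        # A chapter line either starts with the word "Chapter" or with the marker.
        if CHAPTER_WORD.match(line) or line.lstrip().startswith(marker):
            hits.append(number)
    return hits

sample = "Chapter One\nIt began.\n@@ 2\nIt went on.\nThe last chapter of her life."
print(chapter_line_numbers(sample))  # prints [1, 3]
```

The last sample line shows why the search is anchored to the start of the line: the word “chapter” in the middle of a sentence is not reported.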

Speaking of scene breaks, be sure you ALWAYS use a physical mark to indicate those, never a blank line (because blank lines require extra returns and we’re eliminating all those). A common scene break marker is asterisks (***). I recommend these, unless you use multiple asterisks in the text, and you can always replace them with some other scene break designator later.

Continue this process for everything in List #2 EXCEPT for the italics, bold, and underlining. First, you should be aware that underlining is almost never used in most books today except for Internet links and perhaps in some textbooks. You should therefore NOT use underlining in your novels. As for bold text, this should rarely be used except for titles and subtitles, and those can be taken care of with Styles in Word instead of manually having to bold things. Nevertheless, if bold text is required, you will need to tag it. And here is the method J. W. Manus used, and which I have successfully used myself.

If you did use underlining instead of italics, you will need to look up how to do it. Underlining to show emphasis went away when word processors replaced typewriters, and if you still use underlining, you won’t have much credibility as a writer.

For MS Word files, J. W. Manus suggests the following tags, and I advise you to follow these. I did try something different at one point, and I had problems. So, use these because I know they work reliably.

-STARTI- for italics
-STARTB- for bold
-END- to close the tag

Before you use this method to tag, MAKE A COPY of your manuscript to practice on so you can be sure you did it right and don’t mess up your original. Here’s the procedure for italics:

(1) Go to the REPLACE option in Word.

(2) Be sure the FIND WHAT box is completely empty (no spaces in it). At the bottom of the Find and Replace window you should see a FORMAT box. (If not, click the MORE>> box to show more options.) In the Format dropdown, choose FONT. A new window opens. Under “Font Style” choose ITALIC, then click OK.

(3) In the REPLACE WITH box enter the following exactly as shown:

-STARTI-^&-END-

(4) Recheck it carefully to be sure you typed it exactly and with the hyphens.

(5) Click on REPLACE ALL.

By way of explanation, the ^& in the REPLACE WITH box tells Word to put back the text it found, so the -STARTI- lands before the italic passage and the -END- lands after it. If you scroll through your manuscript, you’ll still see the italics, but you’ll also see the tags surrounding them.
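
If it helps to see what that replacement does, here is a rough Python sketch of the same idea. It is only a model: it assumes the document is a list of (text, is_italic) pairs, which is my simplification and not how Word stores anything, and `tag_runs` is a made-up name. Unlike Word, it returns plain text, so the italics themselves are gone and only the tags remain.

```python
def tag_runs(runs, start="-STARTI-", end="-END-"):
    """Wrap every italic run in tags, the way the found text is wrapped in Word.

    runs: list of (text, is_italic) pairs, a simplified stand-in for a document.
    """
    parts = []
    for text, is_italic in runs:
        # Italic text is put back unchanged with a tag on each side of it.
        parts.append(f"{start}{text}{end}" if is_italic else text)
    return "".join(parts)

runs = [("She read ", False), ("Moby-Dick", True), (" twice.", False)]
print(tag_runs(runs))  # prints She read -STARTI-Moby-Dick-END- twice.
```

Passing `start="-STARTB-"` would model the bold tag in the same way.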

If you need to do the same with bold type, repeat steps 1-5 but select BOLD for the font style and in the REPLACE WITH box type the following:

-STARTB-^&-END-

After you have done any other cleanups in the file, you’re ready to perform the final cleaning step. Do a SAVE AS on the file, and in the “save as type” dropdown box, choose PLAIN TEXT (.txt) not Rich Text.

*** Be sure you give the file a new name so you don’t accidentally overwrite your existing file. ***

When the file conversion box comes up, just click OK.

By doing this “save as text” you will remove all the garbage, including special fonts, Styles, and all other formatting. It is possible to add tables and multi-column text to ebooks (I did one table in Scott’s and my punctuation book), but most of the time these would not be used in fiction. And keep in mind that things like tables and columns are not going to break at convenient points on the small screens of phones that some people read on.

Now you’re ready to format your file properly. I’m not going to tell you how to do all the steps because that’s way beyond the scope of this post. However, I will show you how to retrieve the saved file and restore your italics.

(6) Open the saved .txt file.

(7) Go to REPLACE and in the FIND WHAT box type the following:

-STARTI-*-END-

Be sure the asterisk is there!

(8) At the bottom of the Find and Replace window you will see a box that reads “No Formatting.” If it’s not grayed out, click it and be sure that beneath the FIND WHAT box you don’t see any wording (like “Font: Italic” or “Font: Bold”).

(9) Also in the window under Search Options you will see “Use wildcards.” Check that box.

(10) In the REPLACE WITH box, clear it out so it’s completely empty. If there is any wording beneath it, click “No Formatting.” At the bottom of the window, again click the Format dropdown and choose Font and under Font Style choose Italic.

(11) VERIFY that the FIND WHAT box has -STARTI-*-END- in it and beneath the box reads “Use wildcards.”

VERIFY that the REPLACE WITH box is completely empty (click in it to be sure the cursor is at the left and no spaces are there) and that underneath it reads “Font: Italic.”

Click “Replace All.”

(12) You now need to remove the tags. First, uncheck “Use Wildcards.”

(13) In the FIND WHAT box, clear the formatting (click “No Formatting” if not grayed out) and type in the box: -STARTI-

(14) In the REPLACE WITH box, clear the formatting and clear the box as well.

(15) Click “Replace All.”

(16) Repeat step 13 except that in the FIND WHAT box put: -END-

(17) Click “Replace All.”
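
The wildcard search in steps 7 through 17 can be pictured with a short Python sketch. This is only an illustration of the pattern matching on plain text, not a replacement for doing it in Word. The function name `restore_runs` and the (text, is_italic) pairs are assumptions of mine, and the regular expression uses a non-greedy match so that each tagged passage is handled on its own.

```python
import re

# Everything between -STARTI- and the nearest following -END- is one italic passage.
TAGGED = re.compile(r"-STARTI-(.*?)-END-", re.DOTALL)

def restore_runs(text):
    """Split tagged plain text into (text, is_italic) pairs with the tags removed."""
    runs = []
    position = 0
    for match in TAGGED.finditer(text):
        if match.start() > position:
            # Ordinary text sitting before this tagged passage.
            runs.append((text[position:match.start()], False))
        runs.append((match.group(1), True))  # the passage itself, minus its tags
        position = match.end()
    if position < len(text):
        runs.append((text[position:], False))  # whatever follows the last tag
    return runs

print(restore_runs("She read -STARTI-Moby-Dick-END- twice."))
```

The hyphen inside “Moby-Dick” does not confuse the search, because only the full -END- tag closes a passage.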

NOTE: This method appears to retain smart quotes if used, so you shouldn’t have to restore those.

In your manuscript you should find the italics restored and the tags gone and no other garbage in the file. The formatting will be completely absent. But there may still be a problem.

If tabs were used to indent any of the paragraphs originally and you didn’t remove them, they will still be there. If the paragraphs were originally indented with First Line Indent, then they will be indented with spaces corresponding to the amount of the indent.

I hear you sighing (or cursing) that you’ll now have to search and replace extra spaces and tabs. Fear not. I found a cool trick to remove tabs and spaces at the start of lines as well as any at the end of paragraphs.

WARNING: Do not use this trick on your formatted master file because it will undo formatting such as centered text. It will work fine if you have not yet completed your formatting.

Here’s the trick:

—Select all the text in the document by using CTRL+A (the CTRL and A keys together) and keep everything selected for the next 3 steps.

—Press CTRL+E to center align the text (and remove tabs on the left, but not anywhere else).

—Press CTRL+R to right align the text. This removes trailing spaces.

—Press CTRL+L to left align the text. This removes leading spaces.

You may have one final question: “Rick, if I’m careful with my manuscript and know I don’t have any garbage in it, do I need to use this drastic cleaning procedure of saving as a text file?”

No, you don’t. As long as you’re sure that you’ve cleaned up everything, you should be fine. However, if you have old manuscripts written pre-enlightenment, you should consider this process.

I have used the process when editing and compiling the last two Write Well Award anthologies. The stories submitted to us came from many magazines and not all of them arrived “clean.” I saved myself a lot of time with this process. I first copied all the stories into one giant manuscript, cleaned the file, then went through and formatted everything as it needed to be. It was much easier than doing each story individually, especially when you have two or three dozen to edit. And when one of my clients wants the book formatted along with editing, if the file is unusually messy, then I will use this process to clean up the mess.

Be aware that this process alone will not fix everything. As I pointed out in the two previous posts, you will have to be sure any smart quotes and apostrophes point the right way and that they’re all either straight or smart quotes and not a mix of both. Formatting is not a difficult task, but even with some of the shortcuts I’ve shown you, it can be a time-consuming process to get everything right. This is why good professional editors and formatters who do this for a living need to charge what they do. It’s only because this is not how I make my living that I can charge significantly less than what they do. And you can save yourself money by learning how to do it right.
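
For the quote check mentioned above, a quick count can tell you whether a plain-text manuscript mixes straight and smart quotes. The Python sketch below is an assumption-laden helper of my own (`quote_report` is a hypothetical name). It only counts characters, so it cannot tell you whether a smart quote points the wrong way.

```python
STRAIGHT = "\"'"                       # straight double and single quotes
SMART = "\u201c\u201d\u2018\u2019"     # curly double and single quotes

def quote_report(text):
    """Count straight and smart quote characters and flag a mix of both."""
    straight = sum(text.count(ch) for ch in STRAIGHT)
    smart = sum(text.count(ch) for ch in SMART)
    return {"straight": straight, "smart": smart, "mixed": straight > 0 and smart > 0}

print(quote_report("\u201cDon't go,\u201d she said."))
```

A result with "mixed" set to True means at least one of each kind is present, which is the cue to go hunting with Find and Replace.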

NOTE: I’ve noticed that some Kindle ebook chapters have no page breaks in them. If the next chapter doesn’t start on a new Kindle page, the author left out page breaks. This is considered sloppy formatting.

I know this was a long post, but I really hope it helps you to improve the look of your books. It doesn’t end with good writing and a good cover. If the formatting is sloppy, you risk losing sales and getting poor reviews.