I recently posted about Q10, a minimalistic text editor for writing. My only gripe with it is that, for all the benefits of plain text, the result needs cumbersome reformatting when pasted into Windows Live Writer.
So I decided to write an AutoIt script to remove the manual part and convert text to simple (X)HTML.
What is needed
Plain text operates with lines. A line of text continues until it ends with two special, invisible symbols: carriage return and line feed (at least on Windows; *nix uses a line feed alone).
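Since the script's expressions look for the Windows \r\n pair, text that came from a *nix system would need its line ends converted first. A minimal sketch of that step, assuming a helper named _NormalizeLineEnds (my own name, not part of the original script):

```autoit
; Hypothetical helper (not in the original script): turn any line ending
; (CRLF pair, lone CR, or lone LF) into the Windows CRLF pair.
Func _NormalizeLineEnds($sText)
    ; The CRLF pair is tried first so it is replaced as one unit, not split in two.
    Return StringRegExpReplace($sText, "\r\n|\r|\n", @CRLF)
EndFunc

; Usage: mixed line ends go in, only CRLF comes out.
ConsoleWrite(_NormalizeLineEnds("one" & @LF & "two" & @CR & "three") & @CRLF)
```

Running this on the text before any other conversion should let the \r\n patterns work regardless of where the file was written.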
When a line of text is simply dropped into an HTML editor, its line break gets converted into an ugly <br /> tag. It looks bad and, despite a similar function, has slightly different usage.
HTML operates with paragraphs: text enclosed in <p></p> tags.
So for conversion, every line must be surrounded by <p></p> tags to turn it into a paragraph. I also added cleanup of blank lines, tag markup for links and a few more things.
How script works
$clip = ClipGet()
If FileExists($clip) Then
$txt = FileRead($clip)
Else
$txt = $clip
EndIf
The script is designed to run on a hotkey, so for input it takes either a file copied to the clipboard or text in the clipboard.
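One thing to watch with the input check: FileExists() is also true for folders, and reading a folder as a file will not give anything useful. A hedged sketch of a stricter check, with _ResolveInput being a hypothetical helper name rather than something from the original script:

```autoit
; Hypothetical helper (not in the original script): returns the file's
; contents when $sClip names an existing file, otherwise $sClip unchanged.
Func _ResolveInput($sClip)
    ; FileExists() succeeds for folders too, so rule them out via the "D" attribute.
    If FileExists($sClip) And Not StringInStr(FileGetAttrib($sClip), "D") Then
        Return FileRead($sClip)
    EndIf
    Return $sClip
EndFunc

; Usage: works the same whether the clipboard holds a path or plain text.
Local $sTxt = _ResolveInput(ClipGet())
```

Taking the clipboard contents as a parameter also makes the check easy to try with made-up strings.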
$html = StringRegExpReplace($txt, "(\r\n){2,}", "\1")
$html = StringReplace($html, "&", "&amp;")
$html = StringReplace($html, "<", "&lt;")
$html = StringReplace($html, ">", "&gt;")
$html = StringRegExpReplace($html, "(http://|https://|ftp://)(\S+)", '<a href="\0">\2</a>')
$html = StringRegExpReplace($html, "(.+)(\r\n|\z)", "<p>\1</p>\2")
Then it does all the needed conversions, pastes the text by emulating a Ctrl+V keystroke and restores the clipboard to what it was.
ClipPut($html)
Send("^v")
ClipPut($clip)
Exit
The script makes use of regular expressions, which AutoIt supports quite nicely and which I know well enough for something on this scale (and I want to learn more when I finally get to that regexp tutorial).
Regexp explanations
The line-end symbols mentioned above are represented in regexp as \r\n.
- (\r\n){2,} searches for two or more line breaks in a row and replaces them with \1, a back reference to the first group, which is a single line break in this case;
- (http://|https://|ftp://)(\S+) searches for one of the common link protocols followed by any number of non-whitespace characters and replaces it with <a href="\0">\2</a>: link markup with the full match (\0, both groups) as the actual link and only the second group (without the protocol) as the link description;
- (.+)(\r\n|\z) searches for one or more of any symbol followed by a line break or the end of the string (\z, which matches the end of the file in this case) and replaces it with <p>\1</p>\2: the line surrounded by paragraph tags and finished with the original line break (not really needed, but it makes the result more readable).
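To see the expressions at work together, here is a small sketch that wraps the same replacements in a function and runs them on a sample string. The function name _TextToHtml and the sample text are my own illustration, not part of the original script:

```autoit
; Hypothetical wrapper around the replacements described above.
Func _TextToHtml($sTxt)
    ; Collapse runs of two or more line breaks into a single one.
    Local $sHtml = StringRegExpReplace($sTxt, "(\r\n){2,}", "\1")
    ; Escape HTML special characters; & must go first so the new entities stay intact.
    $sHtml = StringReplace($sHtml, "&", "&amp;")
    $sHtml = StringReplace($sHtml, "<", "&lt;")
    $sHtml = StringReplace($sHtml, ">", "&gt;")
    ; Turn bare URLs into links: \0 is the whole match, \2 the part after the protocol.
    $sHtml = StringRegExpReplace($sHtml, "(http://|https://|ftp://)(\S+)", '<a href="\0">\2</a>')
    ; Wrap every line in paragraph tags, keeping its line break (if any).
    $sHtml = StringRegExpReplace($sHtml, "(.+)(\r\n|\z)", "<p>\1</p>\2")
    Return $sHtml
EndFunc

; Sample input: two lines separated by a blank line, with an ampersand and a URL.
Local $sSample = "Tom & Jerry" & @CRLF & @CRLF & "See http://example.com/page"
ConsoleWrite(_TextToHtml($sSample) & @CRLF)
```

For this sample the blank line should disappear, the ampersand should become &amp;, and each remaining line should come out as its own paragraph, with the URL linked and described as example.com/page.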
Overall
No promises on how accurate the expressions are; it is easy to make a mistake in those. Tell me if it doesn't work for you.
Still, it totally beats manual conversion. I just might switch to Q10 for most of my post writing. :)
Script https://www.rarst.net/script/txt2html.au3
PS RegExp Quick Tester is an awesome (even if slightly outdated) AutoIt script for creating and testing regular expressions for use in AutoIt.