
[Kerensky] Transcript Annotations Cleaner V1.6.9.1 (07-05-2011)


Kerensky

Hi all!

The main purpose of this application is to automatically remove the text for the Hearing Impaired (HI) and to format the text (2 lines, balanced, etc.).

I've created a new application which deletes (when the corresponding option is checked):

  • The HI annotations [Delimiters can have any length]
  • The "NAME:" at the beginning
  • The Songs [Delimiters can have any length]
  • The srt tags for Italics, Bold and Underline
  • The 1st or 2nd/last line of text (for dual English/Chinese subs)

-------------------------------------------------------------------------------------------

Other options:
-------------------

  • Text format (split/join) for XX characters per line.
  • Add ... to the end of unfinished lines
    (lines which don't end in . , ; etc.)
  • Batch processing
  • Replace Window: 1) Normal Replace; 2) Fix Names so they have the 1st letter in Uppercase; 3) Fix the most common CC errors

It also cleans up the subtitle formatting a bit: it erases the empty lines (after erasing things they do appear xD), fixes things like a line containing only "- ", corrects a < i > or < / i > tag that is left alone, and much more!

Since the app accepts a *.ass input file, it can be used to convert from *.ass to *.srt.
It can now also read txt transcript files.
Also, TAC can auto-update itself.



Installation:

You only need the Microsoft .NET Framework 3.5 SP1 (included in Windows 7)

tacnewmainen.png

Download (RAR) || Direct Install (Internet Explorer)

Mac OS X / Linux version: Mono has to be installed




Comparison example: (Original Vs TAC Output with default options)

rrtenjpg.jpg



Additional languages supported:

Spanish (Español)

(Check the txt included in the rar if you want to help translate the app into your language)


For problems/suggestions: 1N3ubD1het.gif

I hope you guys like it! :angel:


ByeZ

If I'm not mistaken, it should leave that line as:

- I saw her.

I can check whether, after erasing, it's a single-line text like "- TEXT" and erase that "- " too,
so the final text will be "I saw her".
Perhaps in the next version.

EDIT:
OK, the app already leaves that line as: "I saw her"
(works even better than I imagined xDD)


A very small fix:

Now, when the line is like:

PATIENT: Hi!
DOCTOR: Die, sucker!

the output will be: [With the "Erase text like NAME: " option checked]

- Hi!
- Die, sucker!

Before, both "NAME:" labels were just erased.

Also:

PATIENT: Hi! ---> - Hi!
- Die, sucker!

or

- Hi!
DOCTOR: Die, sucker! ----> - Die, sucker!
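
The behavior described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not TAC's actual code: the `NAME_RE` pattern and the `replace_names` function are hypothetical names invented for this example.

```python
import re

# Assumed pattern: an uppercase speaker label followed by a colon at line start.
NAME_RE = re.compile(r"^[A-Z][A-Z .'\-]*:\s*")

def replace_names(lines):
    """Strip leading "NAME:" labels; use "- " instead when the cue is a dialogue."""
    cleaned = [NAME_RE.sub("", line, count=1) for line in lines]
    named = [c != l for c, l in zip(cleaned, lines)]
    # A dialogue here means: several lines, each one either labelled or already dashed.
    dialogue = len(lines) > 1 and any(named) and all(
        n or l.startswith("- ") for n, l in zip(named, lines)
    )
    if not dialogue:
        return cleaned
    return ["- " + c if n else c for c, n in zip(cleaned, named)]

print(replace_names(["PATIENT: Hi!", "DOCTOR: Die, sucker!"]))
print(replace_names(["PATIENT: Hi!", "- Die, sucker!"]))
```

A single labelled line is simply stripped of its label, since one line is not treated as a dialogue in this sketch.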


New version: v28-11-09 -> v02-12-09:

taccapture.jpg

Well, I noticed some errors the other day... so I fixed them:

1) The ¶ character was displayed incorrectly by DirectVobSub --> FIXED
(Encoding issue)

2) Lines with just "**" were erased (lines with 1, or more than 2, asterisks weren't) --> FIXED

3) Now it leaves just 1 blank space instead of 3, so there are no more errors
when uploading directly to Addic7ed ---> FIXED

4) Before, if a line contained a time at the beginning, like 6:33,
it was left as 33; now such lines are not touched ---> FIXED

more features soon...


New version: v09-12-09




taccapture.jpg


New Features:
---------------------------------


- More versatility in the input of annotation and song delimiters.
Now you can choose which delimiters you want to use,
and those delimiters can have any length.

-> Annotations delimiters format:
OpenDelimiter1,CloseDelimiter1;OpenDelimiter2,CloseDelimiter2; ...
For example: [[ , ]] or ([( , ])] are valid delimiters

-> Songs delimiters format:
Delimiter1 Delimiter2 Delimiter3 ... (1 white space between them)
For example: *P* is a valid delimiter
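
As a rough illustration of the annotations delimiter format above, here is a minimal Python sketch that parses the "Open,Close;Open,Close" string and strips the delimited text. It is not TAC's implementation; `parse_delimiters` and `remove_annotations` are hypothetical names, and the sketch assumes the delimiters themselves contain no "," or ";".

```python
import re

def parse_delimiters(spec):
    """Parse "Open1,Close1;Open2,Close2" into a list of (open, close) pairs."""
    pairs = []
    for chunk in spec.split(";"):
        if not chunk.strip():
            continue
        # Raises ValueError if a chunk does not hold exactly one comma.
        open_d, close_d = (part.strip() for part in chunk.split(","))
        pairs.append((open_d, close_d))
    return pairs

def remove_annotations(text, pairs):
    """Delete everything between each delimiter pair, delimiters included."""
    # Try longer opening delimiters first so "[[" is not mistaken for "[".
    for open_d, close_d in sorted(pairs, key=lambda p: len(p[0]), reverse=True):
        pattern = re.escape(open_d) + r".*?" + re.escape(close_d)
        text = re.sub(pattern, "", text, flags=re.DOTALL)
    # Collapse the double spaces left behind by the removals.
    return re.sub(r"[ \t]{2,}", " ", text).strip()

pairs = parse_delimiters("[,];(,)")
print(remove_annotations("[door slams] Hello (sighs) there", pairs))
```

Escaping the delimiters with `re.escape` is what lets them have any length and contain regex metacharacters such as brackets.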

- New option: "Erase 2nd/last line if exists" [Requested by chamallow]
As the name says, if there are 2 lines of text, it will erase the 2nd line,
and if there are more, it will erase the last.
This was added to erase the Chinese text in dual English/Chinese subs,
where the Chinese text is always in the 2nd line.

For example in:
Hello Mark!
#€@¬#€#~€[][@ (weird characters)
--> the second line will be erased.

- New option: "Smart Split/Join of text for XX characters per line"
Limits the length of each text line to a certain value (like on a home DVD).
It looks for a proper cut point at the common text delimiters,
such as . , ; ... etc. It also tries to balance the length of the text lines.
(Check the tooltips of the Advanced Options window)

For example:
Why are you standing there. Start the damn slush.
will be split into (if line length > chars per line):
Why are you standing there.
Start the damn slush.
(or joined in the other case, if line length < chars per line)
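
One way to picture the split/join idea is the Python sketch below. It is an assumption about how such a feature could work, not the app's real algorithm: the scoring rule, the `punct_bonus` parameter and the `smart_split` name are all invented for illustration.

```python
PUNCTUATION = ".,;:!?"  # assumed set of "common text delimiters"

def smart_split(text, max_chars, punct_bonus=10):
    """Join the text into one line, then split it in two if it is too long."""
    text = " ".join(text.split())  # join existing lines, normalize spaces
    if len(text) <= max_chars:
        return [text]
    best_score, best_cut = None, None
    for i, ch in enumerate(text):
        if ch != " ":
            continue
        left_len, right_len = i, len(text) - i - 1
        score = abs(left_len - right_len)      # prefer balanced lines
        if text[i - 1] in PUNCTUATION:
            score -= punct_bonus               # prefer cuts after punctuation
        if max(left_len, right_len) > max_chars:
            score += 1000                      # avoid lines over the limit
        if best_score is None or score < best_score:
            best_score, best_cut = score, i
    if best_cut is None:                       # no space to cut at
        return [text]
    return [text[:best_cut], text[best_cut + 1:]]

print(smart_split("Why are you standing there. Start the damn slush.", 40))
```

The bonus for punctuation is a tunable trade-off: a larger value favors cutting at delimiters even when the two lines end up less balanced.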

- New option: "Add ... at the end of unfinished lines"
Unfinished lines are lines that don't end in one of those common text delimiters.
(This is one of the Tusseries rules)

For example in:
1
Hi, i didn't know
2
you were here.
--> will add "..." after "know": Hi, i didn't know...
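
A minimal Python sketch of this rule, assuming the set of closing delimiters is `. , ; : ! ?` (the post only lists ". , ; etc", so the exact set and the `add_ellipsis` name are assumptions):

```python
TERMINATORS = ".,;:!?"  # assumed delimiter set

def add_ellipsis(cue_text):
    """Append "..." when the cue text does not end in a delimiter."""
    stripped = cue_text.rstrip()
    if not stripped or stripped[-1] in TERMINATORS:
        return stripped
    return stripped + "..."

print(add_ellipsis("Hi, i didn't know"))
print(add_ellipsis("you were here."))
```

This sketch does not look inside trailing tags such as `</i>`, so a real implementation would probably need to skip those before checking the last character.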

- Toolbar and status bar added.
The status bar will display any problems.

- New options in "Basic Options", located in the toolbar.
-> Erase only the "NAMES:" (text in uppercase)
-> Erase the "- " if there is only 1 line (it is not a dialogue) [Requested by chamallow]

- Now you can drag and drop the subtitle file onto the exe, and the app will start with that sub loaded.

- Batch processing of the selected subtitles.
Located in the toolbar. All subs will use the same selected options.

- Now, if an error is detected, the app will generate a txt file NameInput_ERROR.txt
which contains debug info. Perfect for sending to me ;) (the subtitle too eh :P)

- Txt file included with the exe in the rar, for translating the app into other languages.
(more info in the txt itself)




Also, I corrected some minor bugs...

I hope you guys like this new version :) As always, please send me all the errors
you notice in the output and the new features you want included.

<i>Here's what you missed last
week: Quinn's pregnant,</i>

goes

<i>Here's what you missed last
Quinn's pregnant,</i>

if I check "Erase NAME:'s, (only if they're in UPPERCASE)" option
it goes

<i>Here's what you missed last



Okay. So here's our assignment
for the week:

always turns

Okay. So here's our assignment

INPUT
(whispering):
I bet the duck's
in the hat.

SANTANA:
But Matt's out
sick today.

OUTPUT (Erase NAME:'s option off)
I bet the duck's
I bet the duck's
in the hat.

But Matt's out
But Matt's out
sick today.

OUTPUT (Erase NAME:'s option on)
But Matt's out
But Matt's out
sick today.

(yeah, first block is missing)

INPUT
Not very many.
RUSSELL:
Judy!

OUTPUT (option on or off)
RUSSELL:
Judy!
Judy!

INPUT
at 4:00 sharp this afternoon.

Well... I'll
see you at 4:00.

OUTPUT (on or off)
(disappears)

Well... I'll
(disappears)

Thank you very much for the info.
I'll fix it as soon as I can.

Edit: OK, all those errors are fixed now,
I updated the download link of the 1st post.
---> v10-12-09


Now, using those inputs, the outputs will be:



Okay. So here's our assignment
for the week:


[With the "NAMEs in uppercase only" option selected in Basic Options; otherwise, "week:" will be deleted]
Here's what you missed last
week: Quinn's pregnant,



I bet the duck's
in the hat.
But Matt's out
sick today.


Not very many.
Judy!


at 4:00 sharp this afternoon.
Well... I'll
see you at 4:00.



Please, feel free to post more errors like that,
they are very helpful to debug the app.

New version: v10-12-09

(updated the download link of the 1st post)





New Features:
---------------------------------

- Now, instead of the 2nd/last text line, it will delete the 1st text line.
In Basic Options, you can switch back to deleting the 2nd/last text line.
[Requested by honeybunny]

- Fixed some (big) errors, as you can see in the previous post.



- Now the app can read srt files with badly formatted text and timecodes:

1) Blank lines between lines of text
2) Blank lines before lines of text
3) No blank line after the text
4) Lines with empty text
5) Non-consecutive line numbers
6) It will be almost immune to errors in the timecodes, as seen below


INPUT:

1
00:00:00.334 --> 00:00:01.634
NARRATOR:

In November 2009,


3
00.00.03.367 --> 00,00.05.500

the first Thanksgiving
at their very own apartment.


4
0000,00.05,5 --> 00:0:8,367
And Marshall had found
the perfect turkey.


11
0:0:21,433 --> 0:0000:23,3
12
000.0:24,3 --> 00,00,26,934
So, when we showed up
for the big day,




OUTPUT: [without any option selected]

1
00:00:00,334 --> 00:00:01,634
NARRATOR:
In November 2009,


2
00:00:03,367 --> 00:00:05,500
the first Thanksgiving
at their very own apartment.


3
00:00:05,500 --> 00:00:08,367
And Marshall had found
the perfect turkey.


4
00:00:24,300 --> 00:00:26,934
So, when we showed up
for the big day,
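
The timecode part of that cleanup can be approximated with the Python sketch below. It only covers the timing lines (not the blank-line handling or renumbering), and it is an assumption about the approach rather than the app's code; `normalize_timecode` and `normalize_timing_line` are hypothetical names.

```python
import re

def normalize_timecode(raw):
    """Turn a sloppy timecode like "0000,00.05,5" into "00:00:05,500"."""
    fields = re.split(r"[.,:]", raw.strip())
    if len(fields) != 4 or not all(f.isdigit() for f in fields):
        raise ValueError(f"unrecognized timecode: {raw!r}")
    hours, minutes, seconds = (int(f) for f in fields[:3])
    # The last field is a fraction of a second, so pad it on the right.
    millis = int(fields[3].ljust(3, "0")[:3])
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"

def normalize_timing_line(line):
    """Normalize both sides of a "start --> end" timing line."""
    start, end = line.split("-->")
    return f"{normalize_timecode(start)} --> {normalize_timecode(end)}"

print(normalize_timing_line("0000,00.05,5 --> 00:0:8,367"))
```

Treating `.`, `,` and `:` as interchangeable separators is what makes the mixed-up inputs from the example parse the same way.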

New version: v16-12-09

(updated the download link of the 1st post)





New Features:
---------------------------------

- New Replace++ window:

1) Normal Replace: Change one sequence of characters for another. [requested by Alex]

replace1l.png


2) Fix Names so they have the 1st letter in Uppercase and the rest in lowercase. The Names can be added:

- Manually:

replace2.png

- From file: [requested by Verdikt]

replace3.png

[Names should be always separated by: spaces, commas, semicolon, or line breaks.]

3) Fix the most common errors in the CC [requested by honeybunny]
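
For the name fixing in 2), a simple Python sketch could look like this. It assumes names are matched as whole words regardless of case; `parse_names` and `fix_names` are made-up names and this is not the Replace++ code itself.

```python
import re

def parse_names(raw):
    """Split a names list on spaces, commas, semicolons or line breaks."""
    return [name for name in re.split(r"[\s,;]+", raw) if name]

def fix_names(text, names):
    """Rewrite each listed name as first letter uppercase, rest lowercase."""
    for name in names:
        fixed = name.capitalize()
        pattern = r"\b" + re.escape(name) + r"\b"
        # A function replacement avoids any backslash handling in `fixed`.
        text = re.sub(pattern, lambda _m, fixed=fixed: fixed, text,
                      flags=re.IGNORECASE)
    return text

names = parse_names("marshall, Lily;TED")
print(fix_names("MARSHALL and lily saw ted.", names))
```

The word boundaries keep a short name from being rewritten inside a longer word.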




- Basic Options: Add *** to the empty text lines in the input, so the timing of these lines won't be lost.




- Basic Options: Now you can select which srt tags will be deleted by the option:
"Erase all srt tags" [requested by txu]




- Some fixing / improving:

Now, the smart split/join text for XX chars per line works great.

And the BATCH mode is now more natural to use.




As always, I hope you guys like it! And please, send me all the errors or ideas you think of.

New version: v21-12-09

(updated the download link of the 1st post)




New Features:
---------------------------------

- Now you can select the charset for:
the Input file, the Output file and the Stored Names file

Choose one of: ANSI, UTF-8, UTF-32, Unicode
or AUTO (in which case the app will try to detect the charset automatically).
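
How AUTO detects the charset is not described here, but a common approach can be sketched in Python as follows: check for a byte order mark first, then try UTF-8, and fall back to an ANSI code page. The `detect_charset` name and the choice of cp1252 as the "ANSI" fallback are assumptions for this example.

```python
import codecs

def detect_charset(data: bytes) -> str:
    """Guess a Python codec name for raw subtitle bytes."""
    # UTF-32 must be tested before UTF-16: its LE BOM starts with the UTF-16 LE BOM.
    if data.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # "Unicode" in Windows terminology
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "cp1252"  # assumed ANSI code page

raw = "¶ Hello".encode("cp1252")
charset = detect_charset(raw)
print(charset, raw.decode(charset))
```

Text without a BOM that is not valid UTF-8 cannot be identified reliably, which is presumably why the manual charset choices exist alongside AUTO.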




- Context menu across the whole main window, with shortcuts to:
1) Input Charset
2) Output Charset
3) Replace window
4) Load Names from file tab




- New method to translate the app (I haven't seen it in any other software):

In Options -> Language, there are now: "Save current language" and "Load language"

"Save current language" will create an srt file with all the text of the app,
so you can use any software you like to translate it.

"Load language" will load a previously saved srt language file and will update
all the text of the app with it. (It will appear as "Unknown language")

NOTE: DON'T TOUCH THE TIMES OF THE LINES when you translate the text.

I've added a few srt files as language packs using Google Translate
(not a great result, but still...)





- Added under "File" in the toolbar:
--- New (it will restart the app)
--- Recent Subtitles cleaned (files will only appear if they still exist)




- Added: a few uncommon characters to the Replace window:
Break Line, ¶, µ, ?, ?, ?, ?, ?, ?, ?, ?
(some of them will only appear correctly if the output charset is not ANSI)




- New icons for the options (the reason for the big jump in the app size)




- Minor improvements / fixes
(More intuitive Replace window behavior, and fixes for a couple of bugs in reading the input srt and formatting the text, which only appeared when using a badly formatted srt subtitle as input)




btw, the smart join/split text option is far, far better than this similar option in any other software.


I'm running out of cool ideas to add to the app, so, please, all suggestions are welcome.

EDIT: New quick version v22-12-09, now the loaded Custom language will be stored, so no more loading it each time.

I like the name feature (cheers verdikt) :-)

How about an "auto time compensation" feature:

Have a look at the total time of a subtitle line, then look at how many letters you have removed and adjust the resulting start/end time accordingly. (This is based on the fact that every letter has to stay a certain amount of time on screen to be read properly, so without compensation the subs will be too early if there is some descriptive stuff before the dialog, and much too long if something follows.)

Example:

INPUT:

1
00:00:00,000 --> 00:00:04,300
[WOMAN laughing in the distance]
[HARRY] So wassup?

OUTPUT:

1
00:00:03,200 --> 00:00:04,300
So wassup?

The same could happen if the HI parts are at the end of the subtitle. It will never be exact but in my opinion it would be better than nothing. A slider to adapt this feature to the user's needs would be nice too.
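
To make the suggestion concrete, here is a rough Python sketch of a proportional rule: the start and end are moved by the share of characters removed before and after the kept text. Everything in it (the `compensate_timing` name, the linear rule, the `strength` parameter standing in for the slider) is an assumption, not an existing TAC feature.

```python
def compensate_timing(start_ms, end_ms, removed_before, kept, removed_after,
                      strength=1.0):
    """Shift cue times in proportion to the characters removed around the text.

    strength=0.0 leaves the times untouched; 1.0 applies the full shift.
    """
    total = removed_before + kept + removed_after
    if total == 0 or kept == 0:
        return start_ms, end_ms
    duration = end_ms - start_ms
    new_start = start_ms + round(strength * duration * removed_before / total)
    new_end = end_ms - round(strength * duration * removed_after / total)
    return new_start, new_end

# Example cue above: 40 characters removed before 10 kept ones (line break ignored).
print(compensate_timing(0, 4300, removed_before=40, kept=10, removed_after=0))
```

With this linear rule the example cue would start at 3,440 ms rather than the 3,200 ms shown above, so the exact formula would need tuning.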

That'd be awesome...

Other ideas:
Automatic addition of leading dashes if there is only one dash for the second speaker. I like it when both speakers in a subtitle have leading dashes.

so this:

Hello.
- Hi.

becomes:

- Hello.
- Hi.
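
In Python, the dash idea could be sketched like this (a hypothetical `add_missing_dash` helper that only handles two-line cues):

```python
def add_missing_dash(lines):
    """Add a leading "- " to the first line when only the second one has a dash."""
    if (len(lines) == 2
            and lines[1].startswith("-")
            and not lines[0].startswith("-")):
        return ["- " + lines[0], lines[1]]
    return lines

print(add_missing_dash(["Hello.", "- Hi."]))
```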

Oh, and you could add a feature to remove exclamations like Hahaha, Ouch, Oof, Ah, Erm etc.
(A list where new words can be added of course, just like the list of names.)

Keep it up. :-)
