[Kerensky] Transcript Annotations Cleaner V1.6.9.1 (07-05-2011)

81 posts in this topic

Posted

Thank you for the suggestions rogard!

For the remove-exclamations feature, I will add a couple of tabs in the replace window.

And for the Automatic addition of leading dashes if there is only one dash for the second speaker, I will add it to the basic options.


The "auto time compensation" feature will take a little longer, but I could definitely do something about it.
I guess as long as the deleted text is at the beginning or at the end, it shouldn't be a big problem.

The slider you refer to is for the time duration per letter erased, right?


Posted

That's what I meant. Some way to adjust how big the adjustment is.

I usually use Gaupol, which is awesome for spell-checking and correcting multiple(!) subtitles simultaneously, changing the encoding etc., and Subtitle Workshop, which is excellent at splitting 3-line subtitles and adjusting timings/FPS. I am dreaming of a combination of both together with the ability to auto-correct times and do a few other things as well...

Who knows, maybe your tool will become exactly what I am dreaming of...?

I'd like a feature to get rid of "forbidden" characters, either by pointing them out or by replacing them. Sometimes a subtitle is broken because of a single character ("wrong" apostrophes for example) that's not correctly encoded. (See the replace list below)

You could also check for overlapping subtitles, too short duration etc.
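
A check like this could be sketched roughly as follows. This is only an illustration in Python, not how any of the tools mentioned here work: the `Cue` class, the `find_timing_problems` name and the 700 ms threshold are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    # Hypothetical subtitle record; times are in milliseconds.
    start_ms: int
    end_ms: int
    text: str


def find_timing_problems(cues, min_duration_ms=700):
    """Return (index, problem) pairs for cues already sorted by start time.

    min_duration_ms is an arbitrary example threshold, not a standard.
    """
    problems = []
    for i, cue in enumerate(cues):
        if cue.end_ms - cue.start_ms < min_duration_ms:
            problems.append((i, "too short"))
        # A cue overlaps the next one if it ends after the next one starts.
        if i + 1 < len(cues) and cue.end_ms > cues[i + 1].start_ms:
            problems.append((i, "overlaps next"))
    return problems
```

The function only reports problems and changes nothing, which fits the idea of pointing things out before touching them.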

How about a list of words or characters that are searched and replaced automatically? I need that quite often. Over time, this can be a huge help with fixing badly spelled subtitles.

In general, automatic actions are nice, but sometimes it's much better if the software asks before it removes or changes anything. Example: remove speaker before colon, which would also remove parts of the dialog if a sentence contains a colon.

Gaupol shows a list of changes, that's great. It would be even better if I could see the pending changes for each feature, so that I can concentrate on the critical ones that often go wrong. (Dream)

Other common errors that your software could tackle:

- wrong handling of blank spaces for abbreviations with dots (12:00 p.m. often becomes 12:00 p. m. or C.I. A. etc.)
- "blabla." not "blabla. " or " blabla." (Tricky, this one...)
- Sophisticated correction of the i vs. L problem (Hard to beat SubRip and Gaupol there....but you can always try :-)
- correction of single characters that are not in italics, surrounded by other characters in italics. (and vice versa)
- a slash "/" often appears as < i>I< /i>
- < i>blabla...< /i> not < i>blabla< /i>...
- A way to choose between [ MAN ] and [MAN]
- correct music signs and blank spaces: # blabla # not #blabla#
- get rid of unnecessary blank spaces and add blank spaces when needed: begin/end of line, before/after punctuation, before/after tags.
- Capitalize words at beginning of a line/sentence, but only if the character before is a . ? or !
- add a feature to remove speakers before a colon in CAPS like MAN:
- add a feature to remove speakers before a colon like Man: (you need to be careful there, some sentences contain a colon, so you will remove half a sentence!) Maybe ask the user before you remove it?
- sometimes ain't becomes ain' t or Rock'n'Roll becomes Rock 'n 'Roll or hit 'em becomes hit'em.
- goin' on becomes goin'on etc.
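
A few of the spacing errors in this list can be expressed as search-and-replace rules. The Python sketch below is only an illustration under assumptions: the `RULES` list and `fix_common_errors` are hypothetical names, the three patterns cover just the "p. m.", "ain' t" and "hit'em" cases, and they can misfire on unusual sentences, which is why the function also returns a log of what it changed.

```python
import re

# Hypothetical rule list: (pattern, replacement) pairs applied in order.
RULES = [
    # "p. m." -> "p.m.", "C.I. A." -> "C.I.A."
    (re.compile(r"\b([A-Za-z])\. (?=[A-Za-z]\.)"), r"\1."),
    # "ain' t" -> "ain't"
    (re.compile(r"(\w)' t\b"), r"\1't"),
    # "hit'em" -> "hit 'em"
    (re.compile(r"(\w)'em\b"), r"\1 'em"),
]


def fix_common_errors(line):
    """Apply every rule and return (fixed_line, list of (before, after))."""
    changes = []
    for pattern, replacement in RULES:
        fixed = pattern.sub(replacement, line)
        if fixed != line:
            changes.append((line, fixed))
            line = fixed
    return line, changes
```

Returning the before/after pairs makes it possible to show the user a list of changes instead of applying them silently.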

These are all rough ideas. Maybe you can use some of them.
Thank you for working on your subtitle software, very much appreciated.

Merry Xmas everyone!


Posted

Thank you :) A nice gift for Christmas :)


Posted





Big list, I like it!

But some of these things are just not doable because this app runs as an automatic process.
The most I can do is present a log of changes when it's done.

Already done from your list (in the latest version):

- change encoding.

- remove speaker before colon, which would also remove parts of the dialog if a sentence contains a colon. <-- It won't do that.

- add a feature to remove speakers before a colon in CAPS like MAN:
- add a feature to remove speakers before a colon like Man: (you need to be careful there, some sentences contain a colon, so you will remove half a sentence!) Maybe ask the user before you remove it? <-- look in Main / Basic Options.


For the timing correction, I'm planning something you will like a lot ;)

I can also add a "Fix common text errors" function to do things like you describe with incorrect blank spaces, for a future version (the timing fix comes first).

Merry Xmas guys!
( Don't let a fat Santa steal your subs ;) )


Posted

New version: v25-12-09

(updated the download link in the 1st post)




New Features:
---------------------------------

A couple of new features under "Basic Options":

[Screenshot: Basic Options]

- Automatic addition of leading dashes if there is only one dash for the second speaker. [requested by rogard]

Hello.
- Hi.

< i >Eo
- No< / i >

becomes:

- Hello.
- Hi.

< i >- Eo
- No< / i >
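
The rule shown in this example could be sketched in Python roughly like this. It is not the TAC's actual code: `add_missing_dash` and `starts_with_dash` are hypothetical names, and the sketch assumes a two-line subtitle whose tags may or may not have spaces inside the angle brackets.

```python
import re

# Zero or more opening tags such as <i> or < i > at the start of a line.
LEADING_TAGS = re.compile(r"^(?:\s*<\s*\w+\s*>)*\s*")


def starts_with_dash(line):
    """True if the first character after any leading tags is a dash."""
    return line[LEADING_TAGS.match(line).end():].startswith("-")


def add_missing_dash(lines):
    """Add "- " to the first line when only the second line has a dash."""
    if (len(lines) == 2 and starts_with_dash(lines[1])
            and not starts_with_dash(lines[0])):
        pos = LEADING_TAGS.match(lines[0]).end()
        # Insert the dash after the leading tags, not before them.
        return [lines[0][:pos] + "- " + lines[0][pos:], lines[1]]
    return lines
```

Subtitles where both lines or neither line start with a dash are returned unchanged.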




- Sorts lines by their start times and also deletes all repeated lines
(repeated line: when two lines have the same times and text)
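
As an illustration only, the sort-and-remove-repeats step could look like the Python sketch below. The tuple layout `(start_ms, end_ms, text)` and the name `sort_and_dedupe` are assumptions for the example, not the app's internals.

```python
def sort_and_dedupe(cues):
    """Sort cues by start time and drop exact repeats.

    cues is a list of (start_ms, end_ms, text) tuples; two cues count as
    repeated only when both times and the text are identical.
    """
    seen = set()
    result = []
    for cue in sorted(cues, key=lambda c: c[0]):
        if cue not in seen:
            seen.add(cue)
            result.append(cue)
    return result
```

Two cues with the same times but different text are both kept, matching the definition of a repeated line given above.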



And a screencap of the context menu of the main window introduced in the last version:

[Screenshot: main window context menu]




This will be the last version of the TAC for a while (except for bug corrections).

But don't worry ;), it will return stronger than ever, and with a new name:
Automatic Subtitle Editor


Posted

Looks awesome. Thank you for your efforts, kerensky. I will test it in the next few days and tell you what I think.

I have another idea: how about different profiles/presets with different settings, e.g. one for removing HI parts, another one for general corrections, yet another for very special adjustments etc....
I think that would make it more efficient to use your software for different tasks.


Posted




Right now the app doesn't have enough options to need profiles, but I'll keep it in mind for future versions.


Posted

New version: v25-12-09.2

(updated the download link in the 1st post)




Just a bug fix.

It fixes the error which gave the following stack trace:

System.IndexOutOfRangeException: Index was outside the bounds of the array.
at LimpiaTranscript.LinSRT.EliminaRenglonesNoValidos(LinSRT Entrada)
at LimpiaTranscript.SRT.EliminaLineasVacias(SRT Entrada)
at LimpiaTranscript.LimpiaTranscript.Procesado_Click(Object sender, EventArgs e)

Big thanks to enigma92 for sending the debug info.


Posted

New version: v25-12-09.3

(updated the download link in the 1st post)




- Another bug fix.

It fixes the error which gave the following stack trace:

System.ArgumentOutOfRangeException: Count cannot be less than zero.
Parameter name: count
at System.String.Remove(Int32 startIndex, Int32 count)
at LimpiaTranscript.LinSRT.EliminaNombresLin(LinSRT Entrada)
at LimpiaTranscript.SRT.EliminaNombres(SRT Subt)
at LimpiaTranscript.LimpiaTranscript.Procesado_Click(Object sender, EventArgs e)

- The automatic correction of CC errors (like â?ª) now works properly.

Big thanks to elderman for sending the debug info.


Posted

New version: v16-02-2010.2

(updated the download link in the 1st post)




Fixed things:

  • Now, if an annotation has ":" at the end, like "(Howling voice):", it will be eliminated too.
  • Smart Join/Split text:
      • No more line breaks inside compound words like "Star-Gate"
      • No more line breaks inside srt tags (this only happened if the tag was in the middle of the text)


EDIT: A quick fix for the balance text option.

I just finished my exams, so I will soon start working on the next major release.

As always, feel free to post bugs/suggestions/comments/whatever


Posted

Great, keep up the good work :)


Posted

New version: v21-02-2010

(updated the download link in the 1st post)




New Features:

- It will fix all broken dialogues after deleting annotations with srt tags in any position.
Example:
< i >- ( tires screech )
- whoa whoa.< / i >
becomes:
< i >whoa whoa.< / i >

- Now the "Fix Names" option also checks the length of the name.
Example: if the name to fix is "Jo", it no longer converts "jones" into "Jones".
It will only fix it if the word is just "jo" ("jo" becomes "Jo").
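
One way to get the same effect is a whole-word match, sketched below in Python. This is an assumption about an equivalent approach, not the app's actual length check, and `fix_names` is a hypothetical name.

```python
import re


def fix_names(text, names):
    """Restore the capitalization of each name, matching whole words only."""
    for name in names:
        # \b on both sides keeps "Jo" from matching inside "jones".
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        # A function replacement avoids treating the name as a regex template.
        text = pattern.sub(lambda match, fixed=name: fixed, text)
    return text
```

With this, the word "jo" on its own is fixed, while longer words that merely start with the same letters are left alone.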

- A small bugfix in the "Erase - at the beginning" option.





Comparison example: (Original Vs TAC Output with default options)
[Screenshot: original vs. TAC output]



As always, feel free to post bugs/suggestions/comments/whatever


Posted

New version: v24-04-2010

(updated the download link in the 1st post)




New Features:

In the last version, I slightly changed the behavior of the Smart Join/Split Text.
Lines whose length was close to the maximum chars/line (more than 80% of it) were also split, but only when it was appropriate.

Example: (44 chars/ line max)
"He's supposed to co-lead a presentation" (39 chars)

was left as:

"He's supposed to
co-lead a presentation" (39 chars)


So that wasn't exactly an error, but I forgot to add a way to disable this behavior in the options.

To change this:
look under Basic Options and uncheck this feature ("Split line too if...")
to NEVER SPLIT LINES IF THEY DON'T REACH THE MAX CHARS/LINE.

( -> THX to Audrey for the report <- )

As always, feel free to post bugs/suggestions/comments/whatever.


Posted

Sometimes it splits like this

"It was 8:
00 and the power was still off."

Can't really say how long the 2 rows were, but I think the whole line was longer than the defined value. Any solution to this?


Posted




Wow, I thought I fixed that long ago (in v02-12-09, looking at the changelog). Send me the sub, I'll look into it.

EDIT: OK, what I fixed back then was making sure the app doesn't confuse a time at the beginning of a line with a name ( NAME: ).

I located and fixed the error, so new version :P


Posted

New version: v26-04-2010

(updated the download link in the 1st post)




New Features:

I fixed the following error:
If a ":" is chosen as the cut point in a line, the app now checks that it isn't part of a time.

Example:
Oh, yeah, play the 'bone and 8:00 at some point you lost.

was badly split as:
Oh, yeah, play the 'bone and 8:
00 at some point you lost.

Now, the result will be:
Oh, yeah, play the 'bone and
8:00 at some point you lost.
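
The check described here can be illustrated with a small Python sketch. It is a simplification under assumptions, not the TAC's real splitting logic: `is_time_colon` and `split_in_two` are hypothetical names, and the splitter simply prefers a non-time ":" and otherwise falls back to the space nearest the middle.

```python
def is_time_colon(text, i):
    """True if the ':' at index i sits between two digits, as in 8:00."""
    return (text[i] == ":" and 0 < i < len(text) - 1
            and text[i - 1].isdigit() and text[i + 1].isdigit())


def split_in_two(text):
    """Split a line in two, cutting after a ':' only if it is not a time."""
    middle = len(text) // 2
    colon_cuts = [i + 1 for i, ch in enumerate(text)
                  if ch == ":" and not is_time_colon(text, i)]
    space_cuts = [i for i, ch in enumerate(text) if ch == " "]
    cuts = colon_cuts or space_cuts
    if not cuts:
        return [text]
    # Pick the candidate cut closest to the middle of the line.
    cut = min(cuts, key=lambda c: abs(c - middle))
    return [text[:cut].strip(), text[cut:].strip()]
```

On the example line above, the ":" in 8:00 is rejected as a cut point, so the split falls on the space before "8:00".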

( --> THX to Bunny for the report <-- )


As always, feel free to post bugs/suggestions/comments/whatever.


Posted

New version: v27-04-2010

(updated the download link in the 1st post)




New Features:

I fixed some looong-standing errors with multiple srt tags in the same line.

1) The Smart Join/Split text split a word in half when there were multiple tags in the line. (not always, just in some cases)

Example:
< i >Our patient's unhappy< /i >
< i >because she's suffering< /i >

was left as:
< i >Our patient's unhappy< /i > < i >becau
se she's suffering< /i >

and now, the result will be: (no more cutting a word in half... never again, I hope)
< i >Our patient's unhappy< /i > < i >because
she's suffering< /i >


2) Also, previously, when the text between tags was removed and there was more text in that line, the empty tag pair wasn't removed.

Example:
< i >(Tom)< /i >
< i >It's Tom.< /i >

was left as:
< i >< /i >< i >It's Tom.< /i >

now the output will be:
< i >It's Tom.< /i >

3) Now, the TAC will also clean unnecessary srt tags (only if they are in the same text line)

Example:
< i >Thank you so much< /i > < i >for catching this.< /i >

Will be left as:
< i >Thank you so much for catching this.< /i >
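
Points 2) and 3) can be sketched with two regular expressions. This Python snippet is only an illustration, not the TAC's code: `clean_tags` is a hypothetical name, and it assumes tags written in the normal `<i>...</i>` form, without the spaces inside the brackets that appear in the examples above.

```python
import re

# An opening tag immediately closed again, e.g. <i></i>.
EMPTY_PAIR = re.compile(r"<(\w+)>\s*</\1>")
# A closing tag directly followed by the same opening tag, e.g. </i> <i>.
ADJACENT_PAIR = re.compile(r"</(\w+)>(\s*)<\1>")


def clean_tags(line):
    """Drop empty tag pairs, then merge neighbouring runs of the same tag."""
    line = EMPTY_PAIR.sub("", line)
    # Keep the whitespace that separated the two runs.
    line = ADJACENT_PAIR.sub(r"\2", line)
    return line.strip()
```

Empty pairs are removed first so that a leftover `<i></i>` is not mistaken for the boundary between two italic runs.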


( --> THX to elderman for testing this version <-- )


As always, feel free to post bugs/suggestions/comments/whatever.


Posted

I love the app! Thank you!

I've noticed 2 small problems when using the option to erase NAMES that are in uppercase. It doesn't delete names with spaces in them, like "COMMANDER ADAMA", or names that have letters that should always be in lowercase, like "McMANUS". Just thought I should mention it :)


Posted

Hi,

Well, the app deliberately doesn't erase composite names like that one (in uppercase or otherwise).

If it did, then in a sentence like this:
"Repeat this: I can fly!" it would erase "Repeat this:" when it shouldn't.

I did some tests on erasing composite names,
but it's really tricky, and I think it's better for the app to leave a composite name in place
than to risk erasing valid text.

The other problem, the Mc thing in uppercase...
Right now it just checks whether all letters are uppercase,
but in the next version I'll add an exception for when the NAME starts
with "Mc" (and has 3 or more letters), ok?
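
The planned exception could be sketched like this in Python. It is a guess at the logic described in this post, not the code that shipped, and `is_uppercase_name` is a hypothetical name.

```python
def is_uppercase_name(word):
    """True for single-word speaker names such as ADAMA or McMANUS."""
    # Exception: ignore a leading "Mc" when the word has 3 or more letters.
    if word.startswith("Mc") and len(word) >= 3:
        word = word[2:]
    # Names with spaces fail isalpha(), so composite names are left alone.
    return word.isalpha() and word.isupper()
```

A composite name like "COMMANDER ADAMA" still returns False here, in line with the decision not to erase composite names.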

Thx for the input :)


Posted

Hi, Kerensky.

I only got around to trying this yesterday - it's brilliant!

I had to clean up two seasons of 24 & 27 episodes, each, and even without using the batch mode it was much quicker than any other way of doing it - the batch mode made it a laugh!

Thank you.

s.
