Word detection + usability
chrissikek last edited by chrissikek
Hey, I am trying this program. I went through all the basic steps and started adding my known words to give it a try. However, the basic functionality seems quite unusable to me. Since other people seem to be happy, maybe the mistake is on my end.
Since I saw a sponsored video in which somebody claimed it works with YouTube, I gave that a go. However, I searched through a lot of videos, and the auto-generated subtitles were always far too poor to extract anything useful. A lot of videos have Japanese subtitles, but when they do, they are burnt into the video, so there are still no subtitles that can be read.
Since that did not really work out for me, I tried some news articles instead. But even there, it seemed relatively unusable to me. The detected words only seem to roughly correlate with the words in their base form. The problem with these mistakes is that the premise of automatically finding the one word you do not know in a sentence kind of breaks down, because there is almost always something detected incorrectly in a sentence.
I am not criticizing the implementation, by the way. I realize that it is really hard to do. I am just wondering if someone can give some advice on how to avoid these problems.
Thanks in advance!
SteviMigaku last edited by SteviMigaku
One thing to add about YouTube: the extension cannot (yet) do OCR on hard subs (subs that are part of the video image itself). On YouTube or Netflix, soft subs (the subs that you can turn on and off) are used to create cards. If you want to look for videos on YouTube that have soft subs you can use, you can do so with https://youglish.com/japanese.
You can just look for any word and then click through the videos and find one you want to create cards from.
Here is an example of a video that has soft subs you can use with the extension: https://www.youtube.com/watch?v=wZySfwnh5eA&list=PLcm5I05IhDpZ498c_knWUqznpVUUyvgu1&index=4&ab_channel=トップランキング
As for the second point, I would need to see some examples. As @Daichi said, the parsers for all languages need some work, but the situation is far from what you seem to describe. There are small errors here and there, but 90% of sentences are parsed fine. Do you maybe have an example with a screenshot?
Didn't see the image. Can you give the link to the article? This seems like a problem with the specific site you are using and not the parser per se.
I checked the article from the link I can see in your screenshot. Yes, that is a problem with that specific site. The built-in furigana on the site interferes with the extension. The same article on the normal NHK site parses 増えている fine (see screenshot). https://imgur.com/a/nUCxIXU
chrissikek last edited by
@daichi Thanks for the kind advice
Daichi last edited by
@chrissikek Yeah, parsing isn't perfect for any language. Parsing for all languages is planned to get an overhaul later this year, but we have to make do until then.
If the parsing for a sentence feels really off, you can use the "Search for words in the dictionary" feature, which ignores parsed words, via the "Cursor over Text + Shift + Z" hotkey (as seen in the toolbar help; just click the ? in the toolbar for other hotkeys). You can also paste the sentence into a better parser like ichi.moe and see if that clears up any confusion you might have had.