@anon-fn Another user suggests that Pleco uses a more accurate parser; is it possible to use the same one, or otherwise find a more accurate parser library?
Anon fn
Hi! I’m a software engineer (formerly at Facebook, then Google, now taking time off) living in Taiwan.
Latest posts made by Anon fn
-
RE: Chinese parsing issues
-
RE: Chinese parsing issues
I’ve seen a number of parsing issues too. I reported a few of them, but it isn’t very motivating: it’s fairly time-consuming, and I don’t get any feedback when I do. Is there a way to set up an automated feedback loop, where we correct errors from within Anki and the parser automatically learns to make fewer mistakes in the future?
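A rough sketch of the simplest version of such a loop: each correction made in Anki is stored as the exact character span plus the tokens the user chose, and those spans are applied before the normal parser runs. The class `CorrectionOverrides`, its methods, and the `base` segmenter argument are hypothetical names for illustration, not part of Migaku or Anki, and this only memorizes fixes rather than training a model.

```python
from typing import Callable, Dict, List

# A segmenter takes a string and returns its tokens.
Segmenter = Callable[[str], List[str]]


class CorrectionOverrides:
    """Hypothetical phrase-level override table built from user corrections.

    Each correction maps the exact character span the user fixed to the
    token list they chose. Spans are matched before the base segmenter runs,
    so a fix made once applies to every later sentence containing that span.
    """

    def __init__(self) -> None:
        self.table: Dict[str, List[str]] = {}

    def record(self, corrected_tokens: List[str]) -> None:
        """Remember the user's corrected segmentation of one span."""
        span = "".join(corrected_tokens)
        if span:  # ignore empty corrections (they would never advance the scan)
            self.table[span] = list(corrected_tokens)

    def segment(self, text: str, base: Segmenter) -> List[str]:
        """Apply overrides left to right; hand everything else to `base`."""
        spans = sorted(self.table, key=len, reverse=True)  # longest span first
        tokens: List[str] = []
        pending = ""  # characters not covered by any override yet
        i = 0
        while i < len(text):
            match = next((s for s in spans if text.startswith(s, i)), None)
            if match is None:
                pending += text[i]
                i += 1
                continue
            if pending:
                tokens.extend(base(pending))
                pending = ""
            tokens.extend(self.table[match])
            i += len(match)
        if pending:
            tokens.extend(base(pending))
        return tokens


# Example: the user fixes 外國人 / 士 to 外國 / 人士 once.
# `list` stands in for the real parser and just splits into characters.
overrides = CorrectionOverrides()
overrides.record(["外國", "人士"])
print(overrides.segment("對外國人士來說", base=list))
# ['對', '外國', '人士', '來', '說']
```

One tradeoff of this design: the base parser sees the text around an override as separate fragments, which can occasionally change how those fragments are split.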
I think it’s important to get this working well, because parsing errors contaminate the list of known words. When I go to make new cards, Migaku sometimes thinks I know a word I actually don’t (or vice versa), and sometimes it gives me an incorrect list of new words because one or more word boundaries were identified incorrectly.
-
RE: Mandarin example sentences sometimes do not actually include word
Hi, this is happening with a significant portion of my example sentences, roughly 50%. It is very inconvenient for Mandarin learners. Can you please escalate this? It seems like it shouldn’t be very hard to fix.
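As a rough illustration of why this seems fixable, a filter like the following would drop example sentences that don’t contain the target word, and optionally those where the parser doesn’t split it out as its own word. The function name `filter_examples` and the optional `segment` argument are hypothetical; how Migaku actually selects example sentences is unknown here.

```python
from typing import Callable, Iterable, List, Optional


def filter_examples(
    word: str,
    sentences: Iterable[str],
    segment: Optional[Callable[[str], List[str]]] = None,
) -> List[str]:
    """Keep only example sentences that really contain `word`.

    The substring test removes sentences that don't contain the word at all.
    If a segmenter is supplied, the word must also come out as a whole token,
    so a sentence where the parser splits it differently is dropped too.
    """
    kept = []
    for sentence in sentences:
        if word not in sentence:
            continue
        if segment is not None and word not in segment(sentence):
            continue
        kept.append(sentence)
    return kept


print(filter_examples("事情", ["做了什麼事情", "今天天氣很好"]))
# ['做了什麼事情']
```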
-
RE: Report parsing errors here!
In this sentence (Traditional Chinese):
你有AIT護照無法配送請跟我聯絡約配送時間
約 should be yue1, not yao1.
-
RE: Report parsing errors here!
In this sentence (Traditional Chinese):
做了什麼事情?
The 事 in 事情 (shìqíng, meaning “a matter; business; circumstances; an event; an affair; an incident; an occurrence”) is getting stuck onto the end of 什麼 (shéme, meaning “what”), leaving 情 (qíng, “feeling/emotion/passion/situation”) as a separate word.
-
RE: Report parsing errors here!
In this sentence (traditional Chinese):
對剛到台北的外國人士來說,坐捷運的時候,在英語播音之前,會先聽到三種發音不同,卻又好像有點相似的廣播,讓他們覺得奇怪,但也很有意思。
外國人士 is made up of two words: 外國 (“wàiguó,” foreign country) and 人士 (“rénshì,” persons sharing a specified characteristic). Together they mean “foreigners.” It’s getting parsed as 外國人 (“wàiguórén,” a more common word for “foreigners”) and 士 (“shì,” a surname, or “member of the senior ministerial class (old)/scholar”).
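For what it’s worth, this error (and several others reported here) matches what a greedy left-to-right dictionary match produces. I don’t know what parser Migaku actually uses, so this is only a toy illustration with a made-up lexicon: forward maximum matching takes the longest dictionary word starting at each position and yields 外國人 / 士, while backward maximum matching, which works from the right end, yields 外國 / 人士.

```python
from typing import List, Set

# Toy lexicon for illustration only; a real dictionary is far larger.
LEXICON = {"外國", "外國人", "人士", "坐台", "台北", "捷運"}


def forward_max_match(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Greedy left-to-right: take the longest dictionary word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:  # single characters always allowed
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Greedy right-to-left: take the longest dictionary word ending at each position."""
    tokens = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()
    return tokens


for sentence in ["外國人士", "坐台北捷運"]:
    print(forward_max_match(sentence, LEXICON), backward_max_match(sentence, LEXICON))
# ['外國人', '士'] ['外國', '人士']
# ['坐台', '北', '捷運'] ['坐', '台北', '捷運']
```

The same toy comparison reproduces the 坐台 / 北 / 捷運 error for the 坐台北捷運 sentence. Neither direction is correct in general, though; practical segmenters usually score whole-sentence segmentations statistically instead of matching greedily.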
-
RE: Report parsing errors here!
@anon-fn Also in this sentence, 上 is parsed as shǎng (which means “rising tone”), but it should be parsed as shàng, which is almost always correct for this character.
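One possible cause, which I haven’t confirmed: taking a polyphonic character’s first dictionary reading instead of its most common one would turn a lone 上 into shǎng (and could explain 約 coming out as yao1 in another report). The sketch below shows the alternative of preferring a word-level reading and otherwise falling back to each character’s most frequent reading. The tables `WORD_READINGS` and `CHAR_DEFAULTS` are tiny hypothetical stand-ins for a real dictionary and real frequency data.

```python
# Hypothetical data: a real system would load these from a dictionary
# and from corpus frequency counts.
WORD_READINGS = {
    "約會": "yuēhuì",
    "上面": "shàngmiàn",
}

# Most frequent reading of each polyphonic character, used only when the
# token has no word-level entry (e.g. a single-character token).
CHAR_DEFAULTS = {
    "上": "shàng",  # not shǎng
    "約": "yuē",    # not yāo
}


def reading_for(token: str) -> str:
    """Prefer a word-level reading; otherwise fall back character by character."""
    if token in WORD_READINGS:
        return WORD_READINGS[token]
    # "?" marks characters missing from the toy table.
    return " ".join(CHAR_DEFAULTS.get(ch, "?") for ch in token)


print(reading_for("上"), reading_for("約"), reading_for("上面"))
# shàng yuē shàngmiàn
```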
-
RE: Report parsing errors here!
@anon-fn This sentence has a second parsing error: 都會 should be parsed as “dōu huì” (two words meaning “all will be”) but is getting parsed as “dūhuì” (“city/metropolis”).
-
RE: Report parsing errors here!
This sentence (Traditional Chinese) was parsed incorrectly:
第一次坐台北捷運的人,差不多都會被捷運上的站名廣播所吸引。
坐台北捷運 (zuò Táiběi jiéyùn) means “to take the Taipei MRT.” My flash card ended up with the word 坐台 (zuòtái), which means “to work as a hostess in a bar or KTV.”
-
RE: Report parsing errors here!
I added a card for this sentence:
他在待命的時候接到電話,有一個小孩不小心用槍射到自己的眼睛
“小心” means “to be careful,” and “不小心” is its opposite (“careless,” or in this sentence “accidentally”). This got parsed as “不小[bu4 xiao3;形容词]心[xin1;名词]”. “不小” got identified as a new word, but no definition is shown for it (because it’s not actually a word, I believe).
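One cheap safeguard, sketched under the assumption that the parser’s word list and the definition dictionary are separate: when a token has no definition but merging it with the next token produces a word that does, merge them. Here that would turn 不小 / 心 into 不小心. The helper `repair_undefined` and the toy `definitions` set are hypothetical, and this only ever looks one token to the right.

```python
from typing import List, Set


def repair_undefined(tokens: List[str], definitions: Set[str]) -> List[str]:
    """Merge a token that has no dictionary entry with its right neighbour
    when the merged string does have an entry (e.g. 不小 + 心 -> 不小心)."""
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok not in definitions and i + 1 < len(tokens):
            merged = tok + tokens[i + 1]
            if merged in definitions:
                out.append(merged)
                i += 2  # both tokens consumed
                continue
        out.append(tok)
        i += 1
    return out


# Toy dictionary: note that it has no entry for 不小.
definitions = {"有", "一個", "小孩", "不小心", "小心", "心", "用", "槍"}
tokens = ["有", "一個", "小孩", "不小", "心", "用", "槍"]
print(repair_undefined(tokens, definitions))
# ['有', '一個', '小孩', '不小心', '用', '槍']
```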