    Migaku Community

    Chinese parsing issues

    Migaku Browser Extension
    • Istangel:

      Here's an example of the type of parsing issue I see regularly. I don't expect parsing to be perfect (of course it won't be, especially for a language like Chinese), but imo this type of parsing error is really annoying. The reason is that the first two characters are not part of the word formed by the two that follow. 不能 appears very often in Chinese before a verb, because it means "can't". I don't think there's a single situation where 不能 should be considered part of the word that comes directly after it.

      (screenshot: 8791fe20-ea2d-4947-9840-44a9819152fb-image.png)
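
      The merge described above is exactly what a dictionary-driven greedy matcher produces when its word list contains a bad entry. Here is a minimal, self-contained sketch of forward maximum matching (a common Chinese segmentation approach) in pure Python; the toy dictionaries are illustrative assumptions, not Migaku's actual parser or word list:

```python
# Toy forward-maximum-matching (FMM) segmenter. At each position it
# greedily takes the longest dictionary match, falling back to a single
# character. One bogus dictionary entry is enough to merge 不能 into
# the following word, which is the failure mode described above.

def fmm_segment(text, dictionary, max_len=4):
    """Segment `text` by greedy longest-match against `dictionary`."""
    tokens = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary or length == 1:
                tokens.append(candidate)
                i += length
                break
    return tokens

GOOD_DICT = {"不能", "接受"}
BAD_DICT = {"不能接", "接受"}  # contains one bogus "word"

print(fmm_segment("不能接受", GOOD_DICT))  # ['不能', '接受']
print(fmm_segment("不能接受", BAD_DICT))   # ['不能接', '受']
```

      With a clean dictionary the split is correct; with one bad entry the greedy matcher swallows 不能 plus the next character, so fixing the dictionary (or adding a rule that 不能 never starts a longer word) would fix this class of error.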

      • Anon fn:

        @anon-fn Another user suggests that Pleco uses a more accurate parser; is it possible to use the same one, or otherwise find a more accurate parser library?

        • Anon fn:

          I’ve seen a number of parsing issues too. I reported a few of them, but it’s not very motivating: reporting is fairly time-consuming, and I don’t get any feedback when I do it. Is there a way to set up an automated feedback loop where we can correct the errors from within Anki, so that the parser automatically learns to make fewer mistakes in the future?

          I think it’s important to get this working really well, since parsing errors contaminate the list of known words. Then, when I go to make new cards, Migaku sometimes thinks I know a word I actually don’t (or vice versa), and sometimes it gives me an incorrect list of new words because one or more word boundaries were identified incorrectly.
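
          One way the feedback loop suggested above could work is a correction layer that records user-reported fixes and replays them on top of the base parser, so a reported error stays fixed. A rough sketch, with every name here (including `base_segment`) hypothetical rather than Migaku's actual API:

```python
# Hypothetical user-correction layer over a base segmenter. Corrections
# collected from user reports are applied after the base parser runs,
# so known mistakes are repaired deterministically.

def base_segment(text):
    # Stand-in for the real statistical parser; here it deliberately
    # makes the worst-case error of not splitting at all.
    return [text]

def segment_with_overrides(text, corrections):
    """Replay user-reported corrections on top of the base parser."""
    out = []
    for token in base_segment(text):
        out.extend(corrections.get(token, [token]))
    return out

corrections = {"不能接受": ["不能", "接受"]}  # collected from reports
print(segment_with_overrides("不能接受", corrections))  # ['不能', '接受']
```

          A lookup table like this would not make the parser "learn" in a statistical sense, but aggregating the corrections across users would give the developers exactly the labeled data needed to retrain or patch the real model.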

          • Anon po:

            I'm also seeing the same issue, and it happens frequently. I think it's fixable; at least when I use Pleco (particularly the reader), it parses accurately.

            • asane:

              @istangel I think Migaku uses an external reference for parsing, so I'm not sure there is much that can be done. Even Morphman messes up a lot. At least you cited a real phrase; things like “真奇怪的風” parsed as "真奇 .. 怪 .. 的 .. 風", where 真奇 isn't even a real word (at least according to 8 dictionaries), are what's super annoying. I'm never sure whether to say I "know" it or not, and if I don't, then I'm missing out on identifying T1s (sentences with exactly one unknown word).

              Adding a feature that lets us manually correct parsing errors would be great. A less ideal but still helpful option would be an "Ignore" status (alongside learning/known/unknown), which we could also use on these non-words and on names; left as unknown, they hide opportunities for T1 sentence flagging. On the bright side, so far I have found Migaku's Chinese parsing at least on par with, if not better than, Language Reactor's.
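
              The "Ignore" status proposed above could be as simple as a third word set that is excluded from both the known-word and unknown-word counts. A sketch under that assumption (all names hypothetical; `T1` is read as "exactly one unknown word"):

```python
# Sketch of an "Ignore" status for word tracking: ignored tokens count
# as neither known nor unknown, so parser artifacts like 真奇 stop
# hiding T1 (exactly-one-unknown-word) sentences.

def unknown_tokens(tokens, known, ignored):
    """Tokens that are neither known nor ignored."""
    return [t for t in tokens if t not in known and t not in ignored]

def is_t1_sentence(tokens, known, ignored=frozenset()):
    """A T1 sentence contains exactly one unknown word."""
    return len(unknown_tokens(tokens, known, ignored)) == 1

tokens = ["真奇", "怪", "的", "狂風"]   # 真奇 is a parser artifact
known = {"怪", "的"}

print(is_t1_sentence(tokens, known))                    # False: 2 "unknowns"
print(is_t1_sentence(tokens, known, ignored={"真奇"}))  # True: only 狂風 left
```

              Without the ignore set, the non-word inflates the unknown count and the sentence is never flagged; with it, the sentence surfaces as a T1 candidate.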
