Features/HandwritingRecognizer

From FedoraProject

Revision as of 16:33, 24 May 2008 by Admin (Talk | contribs)

Handwriting Recognizer

Summary

Handwriting Recognizer (writrecogn) lets users input the Chinese characters used in Chinese, Japanese, and Korean with a mouse or tablet.

Owner

  • Name: DingyiChen

Current status

  • Targeted release: N/A
  • Last updated: 2008-03-25
  • Percentage of completion: 70%
  • 0.1.8 Released
  • [TODO] Need to connect to SCIM

Detailed Description

Handwriting Recognizer recognizes handwritten Chinese characters as used in Chinese, Japanese, and Korean (CJK) and will interface to input methods such as SCIM. Some implementations require building a huge set of character recognition rules. Instead, we recognize the radicals of a Chinese character, i.e. the word roots of the character, then use a character-structure-based input method to search for the character. This saves us from writing recognition rules for tens of thousands of CJK characters. It should also provide better recognition accuracy than current open source handwriting recognition libraries such as tomoe.
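The radical-then-lookup idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the writrecogn implementation: the decomposition table and the names DECOMPOSITIONS, build_index, and lookup are hypothetical, and the step that turns strokes into radicals is assumed to have already happened.

```python
from collections import defaultdict

# Hypothetical decomposition table: character -> radicals it is built from.
# Real data would come from the recognizer's own character database.
DECOMPOSITIONS = {
    "好": ("女", "子"),
    "明": ("日", "月"),
    "林": ("木", "木"),
    "森": ("木", "木", "木"),
    "李": ("木", "子"),
}


def build_index(decompositions):
    """Map each radical to the set of characters that contain it."""
    index = defaultdict(set)
    for char, radicals in decompositions.items():
        for radical in radicals:
            index[radical].add(char)
    return index


def lookup(recognized_radicals, decompositions, index):
    """Return characters built from exactly the recognized radicals."""
    if not recognized_radicals:
        return []
    # Narrow the candidates with the index, then compare full decompositions
    # so that repeated radicals (e.g. 林 vs. 森) are told apart.
    candidates = set.intersection(
        *(index.get(r, set()) for r in recognized_radicals)
    )
    wanted = sorted(recognized_radicals)
    return sorted(c for c in candidates if sorted(decompositions[c]) == wanted)


INDEX = build_index(DECOMPOSITIONS)
```

With this table, only the radicals need recognition rules; adding a new character means adding one row to the table rather than a new rule.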

The main program, WritRecogn, provides the GUI. A command-line program, WritRecogn-manager, maintains the character data.

Other features include:

  • Stroke editor: users can input new characters for the recognizer to learn.

Benefit to Fedora

  • Enable users who have little knowledge of CJK input methods to write Chinese characters.
  • Suitable for keyboardless handheld devices.
  • Technique can be extended to OCR.

Scope

Currently recognizes Chinese characters as used in Traditional and Simplified Chinese hanzi, Japanese kanji, and Korean hanja.

Test Plan

  • test that it is possible to input Chinese characters smoothly
  • make sure that window focus and input to other applications works correctly
  • profiling to check performance

User Experience

  • User can input Chinese characters by handwriting.

Dependencies

  • writrecogn package needs to be reviewed, accepted and built in Fedora.
  • Need integration with SCIM.

Contingency Plan

None needed: it is a new package. Tomoe and conventional input methods for Chinese characters will still be available.

Documentation

None yet.

Release Notes

writrecogn is a new handwritten input system for Chinese characters written by Ding Chen.

Development history

  • 2007-05-07: Development started
  • Some milestones:
  • GObjectize RawCharacter, RawStroke, CharacterMatcher, StrokeRecognizer, StrokeNoiseReducer
  • SQLite backend
  • Can import from the SCIM Tomoe XML database.
  • 2008-01: registered project on SourceForge as writrecogn
  • 2008-01-21: initial release of version 0.1 on SourceForge

Development Plan

  • Public release as a beta version if the recognizer recognizes level 0 radicals (strokes).
  • Gather and merge the community-contributed stroke data and recognition hypotheses.
  • Release the revised stroke data and recognition hypotheses, receive feedback and comments, and repeat the previous step if necessary.
  • Ver 0.2
  • SQLitize the Raw Character List (the list that holds all characters).
  • Implementation of Relative Radical Bounding box.
  • Ver 0.3
  • A brief document about Relative Radical Bounding box and "Radical Textbook".

A Radical Textbook is a collection of characters and their corresponding sub-radical combinations, each of which is represented as a set of relative radical bounding boxes.
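As a rough illustration of that definition, a Radical Textbook entry could be modelled as below. This is a sketch under assumptions, not the project's actual format: the class names, the field layout, the convention that coordinates are fractions of the whole character's box, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelativeBox:
    """Bounding box as fractions (0.0 to 1.0) of the whole character's box."""
    left: float
    top: float
    right: float
    bottom: float


@dataclass(frozen=True)
class RadicalPlacement:
    """One sub-radical and where it sits inside the character."""
    radical: str
    box: RelativeBox


# Hypothetical textbook: character -> placements of its sub-radicals.
TEXTBOOK = {
    "明": (
        RadicalPlacement("日", RelativeBox(0.0, 0.0, 0.5, 1.0)),  # left half
        RadicalPlacement("月", RelativeBox(0.5, 0.0, 1.0, 1.0)),  # right half
    ),
    "李": (
        RadicalPlacement("木", RelativeBox(0.0, 0.0, 1.0, 0.5)),  # top half
        RadicalPlacement("子", RelativeBox(0.0, 0.5, 1.0, 1.0)),  # bottom half
    ),
}


def to_relative(abs_box, char_box):
    """Convert an absolute (left, top, right, bottom) box to a RelativeBox."""
    cl, ct, cr, cb = char_box
    width, height = cr - cl, cb - ct
    left, top, right, bottom = abs_box
    return RelativeBox(
        (left - cl) / width,
        (top - ct) / height,
        (right - cl) / width,
        (bottom - ct) / height,
    )
```

Storing the boxes relative to the character makes an entry independent of how large or where on the canvas the user writes.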

  • Radical Textbook importer (TUI)
  • Radical Textbook editor (GUI)
  • Ver 0.4
  • Convert the stroke sequences to a Radical Textbook, so the user does not need to know the exact sequences.
  • Character Matcher that can handle the Relative Radical Bounding box.
  • Ver 0.5 (Alpha)
  • Link to SCIM
  • Fuzzy Character Matcher (error tolerance)
  • Pack as RPM
  • Ver 0.6
  • Incremental learning machine (apply incremental SVM or other algorithm)
  • English character recognition.
  • Number recognition.
  • Commonly used symbol recognition.
  • Ver 0.7 (Beta)
  • Make Traditional Chinese, Japanese, and Korean Radical Textbooks.
  • CJK synonyms (characters that share the same meaning).
  • CJK I/O switch. For example, a user might input Simplified Chinese but wish to output Traditional Chinese, and vice versa.
  • Ver 0.8
  • Evaluation framework
  • Plugin framework (such as Stroke Noise Reducer, Stroke Recognizer, Character Matcher)
  • Ver 0.9
  • Research paper about this project.
  • Help documentation.
  • Ver 1.0 (Official release)
  • Improve stroke editor/trainer interface.
  • Double writing canvas
  • Hot keys.
  • Ver 1.1
  • Transparent canvas
  • Frameless window
  • Other
  • UniHan support: Show the character information from UniHan
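The plan above lists a Character Matcher that handles relative radical bounding boxes and a fuzzy variant with error tolerance. One possible reading of that combination is sketched below. It is an assumption, not the project's design: the template table, the function names, the choice to tolerate only position errors (not misrecognized radicals), and the default tolerance of 0.2 are all hypothetical.

```python
# Hypothetical templates: character -> list of (radical, box), where box is
# (left, top, right, bottom) given as fractions of the whole character's box.
TEMPLATES = {
    "明": [("日", (0.0, 0.0, 0.5, 1.0)), ("月", (0.5, 0.0, 1.0, 1.0))],
    "李": [("木", (0.0, 0.0, 1.0, 0.5)), ("子", (0.0, 0.5, 1.0, 1.0))],
    "杏": [("木", (0.0, 0.0, 1.0, 0.5)), ("口", (0.0, 0.5, 1.0, 1.0))],
}


def box_error(a, b):
    """Largest absolute difference between corresponding box edges."""
    return max(abs(x - y) for x, y in zip(a, b))


def match_error(observed, template):
    """Worst per-radical box error, or None if the radicals differ."""
    if len(observed) != len(template):
        return None
    remaining = list(template)
    worst = 0.0
    for radical, box in observed:
        same = [t for t in remaining if t[0] == radical]
        if not same:
            return None
        # Greedily pair with the closest unused template radical of that name.
        best = min(same, key=lambda t: box_error(box, t[1]))
        remaining.remove(best)
        worst = max(worst, box_error(box, best[1]))
    return worst


def fuzzy_match(observed, templates, tolerance=0.2):
    """Return characters within the tolerance, best match first."""
    scored = []
    for char, template in templates.items():
        err = match_error(observed, template)
        if err is not None and err <= tolerance:
            scored.append((err, char))
    return [char for err, char in sorted(scored)]
```

For example, a slightly sloppy 明 such as fuzzy_match([("日", (0.05, 0.0, 0.45, 0.95)), ("月", (0.55, 0.05, 1.0, 1.0))], TEMPLATES) still finds the character, while swapping the two halves does not.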

Contributions

Help is always welcome. Several things need to be discussed and done:

  • Algorithm plugins (such as Stroke Noise Reducer, StrokeRecognizer, CharacterMatcher)
  • UI
  • Radical Textbooks
  • Help documentation (user and developer)
  • Feature ideas

Please join the writRecogn project on SourceForge; your efforts will be appreciated.

Feature Requests

Please put your feature request here.

Comments

  • MatthiasClasen: Sounds pretty interesting. What is the package that need to install to play with this ? Or, if it is not packaged yet, do you have some screenshots ?
  • I think the bullet points currently in "Contingency Plan" should be moved to a new section with more detailed plans covering the remainder of the project.
  • JeremyKatz: Could this be extended to also work for non-CJK input? On the English side, I've been playing with cellwriter on my tablet and it works reasonably well, but more options are better
  • PeterGordon: I would love to see this feature worked-on. As a beginning Japanese student, I think it would be incredibly awesome to be able to draw Kanji/Kana on my Wacom and have it automagically recognized and inserted. Tomoe is good at recognizing the characters, but its UI is far too cumbersome for me; and CellWriter seems great at "learning" how I write, but having to train every single Kana and Kanji (thousands) would be so extraordinarily tedious. :o (And about the cumbersome Tomoe UI, I just want an entry pad and a list of candidates that doesn't block the focused app.) What can we do to help? :)
  • BillNottingham: I'm leery of pushing this as a 'HandwritingRecognizer' feature when
  • Fedora already includes at least two others (tomoe, cellwriter)
  • this only appears to at this stage support Chinese
  • DingyiChen: The project preview is put on sourceforge as writRecogn.

http://sourceforge.net/projects/writrecogn

MatthiasClasen, the source code and screenshots are on the website.

JeremyKatz, alphanumeric support will be available in 0.6. Support for languages other than CJK and English depends on requests. It will likely be implemented after 1.1.

PeterGordon, I haven't thought much about UI yet, but come to think of it, how about transparent canvas (without frame, perhaps).

  • JasonTibbitts: Could you clarify whether this is intended to support kana and/or hangul, or just the ideograms? It's tough to say we have anything more than very limited Japanese or (especially) modern Korean handwriting recognition without them.
  • DingyiChen: Jason, it will support kana and hangul, if someone is willing to edit the "textbooks" of them.