Features/HandwritingRecognizer

From FedoraProject

Handwriting Recognizer

Summary

Handwriting Recognizer (writrecogn) lets users input the Chinese characters used in Chinese, Japanese, and Korean by writing them with a mouse or tablet.

Owner

  • Name: DingyiChen

Current status

  • Targeted release: N/A
  • Last updated: 2008-03-25
  • Percentage of completion: 70%
  • 0.1.8 Released
  • [TODO] Need to connect to SCIM

Detailed Description

Handwriting Recognizer recognizes handwritten Chinese characters as used in Chinese, Japanese, and Korean (CJK) and will interface with input methods such as SCIM. Unlike implementations that require building a huge set of character recognition rules, it recognizes the radicals of a Chinese character, i.e. the root components of the character, and then uses a character-structure-based input method to look up the character. This avoids writing recognition rules for tens of thousands of CJK characters. The approach is expected to provide better recognition accuracy than current open source handwriting recognition libraries such as tomoe.
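The two-stage idea (recognize radicals first, then look up the character by its structure) can be illustrated with a minimal Python sketch. This is not writrecogn's code: the table `STRUCTURE_TABLE` and the function `lookup_characters` are hypothetical names, the radical sequence is assumed to come from an earlier recognition stage, and only three well-known decompositions are included.

```python
from typing import Dict, List, Tuple

# Hypothetical structure table: each key is the sequence of radicals a
# recognizer might report, each value the characters composed of them.
STRUCTURE_TABLE: Dict[Tuple[str, ...], List[str]] = {
    ("女", "子"): ["好"],
    ("日", "月"): ["明"],
    ("木", "木"): ["林"],
}


def lookup_characters(radicals: List[str]) -> List[str]:
    """Return candidate characters for an already-recognized radical sequence.

    An unknown sequence yields an empty list instead of raising an error.
    """
    return list(STRUCTURE_TABLE.get(tuple(radicals), []))
```

With a table like this, recognition rules are needed only for the radicals; adding a character means adding one table entry rather than a new set of stroke rules.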

The main program, which provides the GUI, is WritRecogn. A command-line program, WritRecogn-manager, maintains the character data.

Other features include:

  • Stroke editor: users can input new characters for the recognizer to learn.

Benefit to Fedora

  • Enable users who have little knowledge of CJK input methods to write Chinese characters.
  • Suitable for keyboardless handheld devices.
  • Technique can be extended to OCR.

Scope

Currently recognizes Chinese characters as used in Traditional and Simplified Chinese hanzi, Japanese kanji, and Korean hanja.

Test Plan

  • Test that it is possible to input Chinese characters smoothly.
  • Make sure that window focus and input to other applications work correctly.
  • Profile the recognizer to check performance.

User Experience

  • Users can input Chinese characters by handwriting.

Dependencies

  • writrecogn package needs to be reviewed, accepted and built in Fedora.
  • Need integration with SCIM.

Contingency Plan

None needed: it is a new package. Tomoe and conventional input methods for Chinese characters will still be available.

Documentation

None yet.

Release Notes

writrecogn, written by Ding Chen, is a new handwriting input system for Chinese characters.

Development history

  • 2007-05-07: Development started
  • Some milestones:
      • GObjectize RawCharacter, RawStroke, CharacterMatcher, StrokeRecognizer, StrokeNoiseReducer
      • SQLite backend
      • Can import from the SCIM Tomoe XML database.
  • 2008-01: registered project on SourceForge as writrecogn
  • 2008-01-21: initial release of version 0.1 on SourceForge

Development Plan

  • Public release as a beta version if the recognizer recognizes level 0 radicals (strokes).
  • Gather and merge the community-contributed stroke data and recognition hypotheses.
  • Release the revised stroke data and recognition hypotheses, collect feedback and comments, and repeat the previous step if necessary.
  • Ver 0.2
  • SQLitize the Raw Character List (the list that holds all characters).
  • Implementation of Relative Radical Bounding box.
  • Ver 0.3
  • A brief document about Relative Radical Bounding box and "Radical Textbook".

A Radical Textbook is a collection of characters and their corresponding sub-radical combinations, each represented as a set of relative radical bounding boxes.
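One possible shape for such a textbook is sketched below in Python. It is an illustration only, not the project's data format: the class names, the `to_relative` helper, and the reading of "relative" as coordinates normalized to the whole character's bounding box are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RelativeBox:
    """Box edges as fractions (0.0 to 1.0) of the whole character's box."""
    left: float
    top: float
    right: float
    bottom: float


@dataclass(frozen=True)
class RadicalPlacement:
    """A radical together with where it sits inside the character."""
    radical: str
    box: RelativeBox


def to_relative(box: Tuple[float, float, float, float],
                char_box: Tuple[float, float, float, float]) -> RelativeBox:
    """Convert an absolute (left, top, right, bottom) box to a relative one."""
    c_left, c_top, c_right, c_bottom = char_box
    width = c_right - c_left
    height = c_bottom - c_top
    if width <= 0 or height <= 0:
        raise ValueError("character box must have positive width and height")
    left, top, right, bottom = box
    return RelativeBox(
        (left - c_left) / width,
        (top - c_top) / height,
        (right - c_left) / width,
        (bottom - c_top) / height,
    )


# Hypothetical textbook entry: 好 with 女 in the left half and 子 in the right.
RADICAL_TEXTBOOK: Dict[str, List[RadicalPlacement]] = {
    "好": [
        RadicalPlacement("女", RelativeBox(0.0, 0.0, 0.5, 1.0)),
        RadicalPlacement("子", RelativeBox(0.5, 0.0, 1.0, 1.0)),
    ],
}
```

Because the boxes are stored relative to the character, the same entry applies regardless of how large the character is drawn on the canvas.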

  • Radical Textbook importer (TUI)
  • Radical Textbook editor (GUI)
  • Ver 0.4
  • Convert the stroke sequence to a Radical Textbook, so users do not need to know the exact stroke sequences.
  • Character Matcher that can handle Relative Radical Bounding Boxes.
  • Ver 0.5 (Alpha)
  • Link to SCIM
  • Fuzzy Character Matcher (error tolerance)
  • Pack as RPM
  • Ver 0.6
  • Incremental learning machine (apply incremental SVM or other algorithm)
  • English character recognition.
  • Number recognition.
  • Commonly used symbol recognition.
  • Ver 0.7 (Beta)
  • Make Traditional Chinese, Japanese, and Korean Radical Textbooks.
  • CJK synonyms (characters that share the same meaning).
  • CJK I/O switch. For example, a user might input Simplified Chinese but wish to output Traditional Chinese, and vice versa.
  • Ver 0.8
  • Evaluation framework
  • Plugin framework (such as Stroke Noise Reducer, Stroke Recognizer, Character Matcher)
  • Ver 0.9
  • Research paper about this project.
  • Help documentation.
  • Ver 1.0 (Official release)
  • Improve stroke editor/trainer interface.
  • Double writing canvas
  • Hot keys.
  • Ver 1.1
  • Transparent canvas
  • Frameless window
  • Other
  • UniHan support: Show the character information from UniHan

Contributions

Help is always welcome. Several things need to be discussed and done:

  • Algorithm plugins (such as Stroke Noise Reducer, StrokeRecognizer, CharacterMatcher)
  • UI
  • Radical Textbooks
  • Help documentation (user and developer)
  • Feature ideas

Please join the writRecogn project on SourceForge; your efforts will be appreciated.

Feature Requests

Please put your feature request here.

Comments and Discussion

See Talk:Features/HandwritingRecognizer