I’m researching Japanese text analysis techniques. There appear to be two approaches: morphological/n-gram analysis, and dictionary-based analysis, which is more accurate but requires more up-front and ongoing work. Most of the dictionary-based systems are commercial, which is unsurprising considering the amount of effort that needs to be put into them. Some relevant links:
- Jim Breen’s Japanese Page – a massive gateway to all kinds of Japanese computing and parsing information.
- Freely available Japanese Morphological Analysers & Dictionaries – a good listing of the currently available software.
- A Stochastic Morphological Analysis for Japanese employing Character n-Gram and k-NN method – research paper
- Japanese Text Initiative – corpus of Japanese texts
- Bibliography of n-gram research
- Japanese Language and Linguistics-related links – mostly general linguistic information
- Japanese Natural Language Processing Websites – links to the NLP home pages of Japanese academic institutions.
- Namazu – a Japanese full-text search engine
- Natural Language Processing & Text Analysis at the Hitachi Advanced Research Laboratory – some interesting research papers.
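
To make the difference between the two approaches concrete, here is a minimal Python sketch. It is not taken from any of the tools linked above: `char_ngrams`, `longest_match`, and the tiny word set are hypothetical names and data chosen for illustration. The n-gram function needs no linguistic resources at all, while the dictionary-based function is only as good as its word list, which is where the up-front and ongoing work goes. Real analysers use far larger lexicons and statistical models rather than a greedy match.

```python
def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of text, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def longest_match(text, dictionary):
    """Greedy longest-match segmentation against a set of known words.

    Characters that start no dictionary word are emitted as
    single-character tokens.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy dictionary, for illustration only.
toy_dictionary = {"日本語", "日本", "解析", "の"}

bigrams = char_ngrams("日本語の解析")
words = longest_match("日本語の解析", toy_dictionary)
```

With this toy dictionary, the bigram index contains fragments such as 本語 and の解 that are not words, whereas the dictionary-based pass returns whole words but would fall back to single characters for anything missing from the word list.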