Japanese Text Analysis
I'm researching Japanese text analysis techniques. There appear to be two approaches: morphological/n-gram analysis, and dictionary-based analysis, which is more accurate but requires more up-front and ongoing work. Most of the dictionary-based systems are commercial, which is unsurprising considering the amount of effort that needs to be put into them. Some relevant links:
- Jim Breen's Japanese Page - a massive gateway to all kinds of Japanese computing and parsing information.
- Freely available Japanese Morphological Analysers & Dictionaries - a good listing of the currently available software.
- A Stochastic Morphological Analysis for Japanese employing Character n-Gram and k-NN method - research paper
- Japanese Text Initiative - corpus of Japanese texts
- Bibliography of n-gram research
- Japanese Language and Linguistics-related links - mostly general linguistic information
- Japanese Natural Language Processing Websites - links to the NLP home pages of Japanese academic institutions.
- Namazu - a Japanese full-text search engine
- Natural Language Processing & Text Analysis at the Hitachi Advanced Research Laboratory - some interesting research papers.
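
To make the contrast between the two approaches a little more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: `char_ngrams`, `longest_match_segment` and `TOY_DICTIONARY` are hypothetical names, the dictionary is a handful of made-up entries, and greedy longest-match is a deliberately simple stand-in for dictionary-based analysis, not how any of the systems linked above actually work.

```python
def char_ngrams(text, n=2):
    """Return overlapping character n-grams; no dictionary is needed."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def longest_match_segment(text, dictionary):
    """Greedy longest-match segmentation against a word list.

    Characters not covered by any dictionary entry are emitted one at a time.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, falling back to a single character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens


# Hypothetical toy dictionary, for illustration only.
TOY_DICTIONARY = {"日本語", "日本", "テキスト", "解析", "の"}

sample = "日本語のテキスト解析"
bigrams = char_ngrams(sample)
words = longest_match_segment(sample, TOY_DICTIONARY)
print(bigrams)
print(words)
```

The n-gram function works on any input with no preparation, but its output is overlapping fragments rather than words. The dictionary version produces word-like tokens only for the entries it already knows, which is where the up-front and ongoing maintenance work comes from.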