Ooh, look at me! My field is so ill-defined I can subscribe to any of dozens of contradictory models and still be taken seriously.


Kaggle competition and dataset on Farsi spell checking, try your algorithm!
https://www.kaggle.com/rtatman/faspell
It has been a while ...


So I got my comp ling degree and moved to Canada (maple syrup, moose, and cold, to name a few of the stereotypes, right?). I got a job, actually two, and right now I am coding NLP stuff in a cubicle, feeling blessed and exhausted. To be continued ...

CLARA summer school

I've been in Copenhagen, attending the CLARA summer school in semantic annotation. The course was great. The city was amazing.

Farsi morphological analyzer

http://pars-morph.appspot.com/

Here is the link to a morphological analyzer for Farsi. For now, it handles only inflection, not derivation.
The guy who developed it and deployed it on Google App Engine is Vahid Mavaji.









one percent more

"Many areas in NLP are like this. You can get 92% accuracy in a few hours of work, and then you can get 93% after a week of work, and then you can write a whole PhD thesis about how you got 94% accuracy."

Farsi POS tagger

Finally, I have a Farsi POS tagger. I trained unigram, bigram, and TnT taggers on 2 million tagged words of the BijanKhan corpus. Here's the funny part: I trained them on all my data, without splitting it into training and test sets, so now I must retrain them. I only glanced at the results, which looked promising. After retraining and evaluating, I will post the results. If my classmate Vahid helps, I will deploy it on a web server.
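For anyone about to make the same mistake, here is a minimal sketch of the held-out split I should have done first. It is not the NLTK unigram/bigram/TnT setup described above: it is a plain-Python most-frequent-tag baseline, and the toy English corpus, the function names, and the default tag "N" are all made up for illustration.

```python
import random
from collections import Counter, defaultdict


def split_sentences(tagged_sents, test_fraction=0.1, seed=0):
    """Shuffle sentences and hold out a fraction of them for testing."""
    sents = list(tagged_sents)
    random.Random(seed).shuffle(sents)
    cut = max(1, int(len(sents) * test_fraction))
    return sents[cut:], sents[:cut]  # (train, test)


def train_unigram(train_sents):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sent in train_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def accuracy(model, test_sents, default_tag="N"):
    """Token-level accuracy; unseen words get default_tag (an assumption)."""
    total = correct = 0
    for sent in test_sents:
        for word, tag in sent:
            total += 1
            correct += model.get(word, default_tag) == tag
    return correct / total if total else 0.0


# Hypothetical toy corpus standing in for the real tagged data.
toy_corpus = [
    [("the", "DET"), ("dog", "N"), ("runs", "V")],
    [("the", "DET"), ("cat", "N"), ("sleeps", "V")],
    [("a", "DET"), ("dog", "N"), ("sleeps", "V")],
    [("a", "DET"), ("cat", "N"), ("runs", "V")],
    [("the", "DET"), ("bird", "N"), ("sings", "V")],
]

train, test = split_sentences(toy_corpus, test_fraction=0.2)
model = train_unigram(train)
print("held-out accuracy:", accuracy(model, test))
```

The split is done on whole sentences, not on individual tokens, so that a bigram or TnT tagger trained on the same split never sees context from the test sentences.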

NLTK 3.0

The good news is that NLTK 3.0 should be available by mid-2011. No more Unicode problems: way to go for Farsi and the NLTK modules.

New Year

Happy blah blah. Does anyone have a clue about sense-tagging a corpus? Before the lung cancer(!) takes me over, I should know this. There's a kind of romance in the air that I can use. I can picture how my useless life would end: I'd like a Chekhovian end with long, dry coughs. Quitting cigars could be a New Year's resolution, but no way.

Zipf's law

I wrote a small Python program that investigates Zipf's law. From it, we see that most words in a text occur only a handful of times, so any frequency-based data about words is sparse.
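The original script is not posted here, so what follows is only a rough sketch of the idea: count word frequencies, rank them, and check how many word types occur exactly once. The regex tokenizer and the sample sentence are assumptions. In particular, `\w+` is a naive tokenizer and would split Farsi words at the zero-width non-joiner.

```python
import re
from collections import Counter


def word_counts(text):
    """Count lowercase word tokens using a naive regex tokenizer."""
    return Counter(re.findall(r"\w+", text.lower()))


def rank_frequency(text):
    """Return (rank, word, count) triples, most frequent word first."""
    ranked = word_counts(text).most_common()
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ranked, start=1)]


def hapax_ratio(text):
    """Fraction of distinct words that occur exactly once."""
    counts = word_counts(text)
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c == 1) / len(counts)


# Made-up sample text; a real run would read a whole corpus.
sample = "the cat sat on the mat and the dog sat on the log"
table = rank_frequency(sample)
for rank, word, count in table[:3]:
    # Under Zipf's law, rank * count stays roughly constant on large corpora.
    print(rank, word, count, rank * count)
print("share of words seen once:", hapax_ratio(sample))
```

Plotting rank against count on log-log axes for a large corpus should give a roughly straight line, and the long tail of that line is the sparsity problem.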



Zipf's law diagram for the Quran


Zipf's law diagram for the Bijankhan corpus








Tiger and hyponymy extraction


The curious case of my late studies could be a matter of laughter. At my age I should be lecturing this shi*. Anyway, I finally wrote my very first academic paper! Yeyy!!
"Tiger, tiger, burning bright, in the forests of the night"
Abstract
Pattern recognition is one of the methods for extracting knowledge and discovering relations between linguistic concepts. To extract conceptual knowledge from linguistic data, one therefore has to design and build semantic patterns. This paper reviews the existing pattern-based methods and introduces several lexico-syntactic patterns for detecting the hyponymy relation. The data for testing the patterns was taken from Persian Wikipedia. Wikipedia was chosen because, as a structured text, it is a good source for extracting semantic relations. The patterns introduced in this paper were tested on Wikipedia texts, and the precision of each pattern was evaluated.
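To give a feel for what a lexico-syntactic pattern for hyponymy looks like, here is a small sketch using the well-known English "X such as A, B and C" pattern. It is not one of the Persian patterns from the paper; the regex, the function names, and the example sentence are all assumptions for illustration, and the hypernym is limited to a single word.

```python
import re

# "X such as A, B and C"  ->  (A, X), (B, X), (C, X)
SUCH_AS = re.compile(
    r"(\w+) such as ((?:\w+, )*\w+(?:,? (?:and|or) \w+)?)"
)


def extract_hyponyms(sentence):
    """Return (hyponym, hypernym) pairs matched by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group(1)
        # Split the enumerated items on commas and on "and"/"or".
        items = re.split(r",? (?:and|or) |, ", match.group(2))
        pairs.extend((item, hypernym) for item in items if item)
    return pairs


def precision(extracted, gold):
    """Share of extracted pairs that also appear in a hand-checked gold set."""
    if not extracted:
        return 0.0
    return sum(1 for pair in extracted if pair in gold) / len(extracted)


sentence = "He studies cats such as tigers, lions and leopards."
pairs = extract_hyponyms(sentence)
print(pairs)

# Hypothetical gold judgments, only to show how precision is computed.
gold = {("tigers", "cats"), ("lions", "cats")}
print("precision:", precision(pairs, gold))
```

Each pattern gets its own precision score this way, which is how one decides which patterns are worth keeping.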