Wednesday, December 10, 2008

As Long As There Is Enough Data: The Blog Post Edition

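I have been looking for a chance to write down some recent thoughts on quantitative linguistics and text research. (These really belong on my research blog, but since this application has some entertainment value, perhaps I can use it to do a little publicity for computational linguistics ;-))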

Extremely Fatigue


It was on like Thursday or Wednesday I don't really remember. I had the thought, in the middle of my spiky moments on Tuesday and Wednesday, that I've probably been on a wrong track in my general efforts, perceptible through this journal, to keep undue hopefulness out of my daily round. I could probably make an educated guess about their ages. The first time, I knew the answer to their little trivia question and called at least 30 times. I immediately called the distributor and canceled it. He called 3 times, each time telling me that he really couldn't get the cab and he's gotten be real late. THE DAY the titles started showing up again, I called again. She stamped and wrote and stamped again in my passport, and then stamped some pieces of paper. I started thinking to myself that it was really more of a courtesy invite and Audoctor wouldn't go and Id be safe to enjoy my friends in a fun atmosphere. I just really don't like people that pretend their time is more valuable than yours. Is this really what I'm supposed to do?

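Hard to guess, isn't it, that this blog post was "written" entirely by a program. Given a topic, the program extracts language fragments from the vast amount of data on the web and stitches them together, and the result even reads like a stream-of-consciousness technique. This is the frightening thing about massive data! It brings some of the toy programs of traditional AI research back to life. Simple pattern matching with no capacity for language understanding, backed by a corpus that grows at an astonishing rate every day, has already begun to mislead people into thinking that machine intelligence has quietly been born. People now need to start thinking from the machine's point of view: how should understanding be defined? And what does communication mean?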
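The extract-and-stitch idea described above can be sketched in a few lines. This is only a minimal illustration, not the actual program behind the sample post: it assumes the "topic" is a small set of keywords and that "extraction" means keeping any sentence that contains one of them. The names `split_sentences` and `generate_post`, the naive sentence splitter, and the tiny in-memory corpus are all hypothetical.

```python
import random
import re


def split_sentences(text):
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def generate_post(corpus, topic_words, n_sentences=5, seed=None):
    """Stitch together sentences from `corpus` that mention any topic word.

    Pure pattern matching: no parsing, no understanding, no coherence check.
    """
    rng = random.Random(seed)
    topic = {w.lower() for w in topic_words}
    matches = []
    for doc in corpus:
        for sent in split_sentences(doc):
            tokens = set(re.findall(r"[a-z']+", sent.lower()))
            if tokens & topic:  # the sentence shares at least one word with the topic
                matches.append(sent)
    rng.shuffle(matches)  # random order is what gives the "stream of consciousness" feel
    return " ".join(matches[:n_sentences])


if __name__ == "__main__":
    # Hypothetical stand-in for text harvested from the web.
    corpus = [
        "I called the distributor. The weather was fine. He called three times!",
        "She stamped my passport. I really called again?",
    ]
    print(generate_post(corpus, ["called"], n_sentences=2, seed=0))
```

With a real web-scale corpus in place of the toy list, the same matching step would have far more fragments to choose from, which is the whole point of "as long as there is enough data".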

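I took a look, and the program is not hard to write. Adapt it into a Chinese version, run a few more experiments, add some discourse parameters, and fooling the human eye is just around the corner. What is the point of doing this, you ask? Well, first, it is fun. Second, run it once and I guarantee you will come away with a deeper appreciation of language understanding and communication between humans and machines. And third? At the very least, I think it could make a qualified computational linguistics paper :)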
