Word Segmentation with a Custom Dictionary in jieba

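Preprocessing Chinese text for text mining starts with word segmentation. The community offers some very good segmentation tools, such as jieba. But segmentation often runs into one problem: important terms in the text, such as character names, get split apart because they rarely appear elsewhere. To handle this, you need to give the segmentation package a custom dictionary so that these terms are kept whole.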
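
To make the idea concrete, here is a minimal sketch of why a custom dictionary matters. It is a toy forward maximum matching segmenter, not jieba's actual algorithm, and the names `segment`, `base_dict`, and `custom_dict` as well as the sample sentence are made up for illustration. With jieba itself, custom entries are usually supplied through `jieba.load_userdict` or `jieba.add_word`.

```python
def segment(text, dictionary):
    """Toy forward maximum matching (an illustrative assumption, not jieba's method).

    At each position, take the longest dictionary word that matches;
    if nothing matches, fall back to a single character.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical dictionaries: the base one lacks the character name.
base_dict = {"喜歡", "讀書"}
custom_dict = base_dict | {"哈利波特"}

sentence = "哈利波特喜歡讀書"
without_custom = segment(sentence, base_dict)
with_custom = segment(sentence, custom_dict)

print(without_custom)  # the name is broken into single characters
print(with_custom)     # the name survives as one token
```

Without the custom entry, the name falls apart into four single characters; once it is in the dictionary, it comes out as one token.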

Real-Time Feedback with Google Forms

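Google Forms makes collecting survey data far easier; on top of that, it automatically compiles responses into a spreadsheet, which makes the data very easy to analyze. However, Google Forms lacks one important feature: giving results back to respondents in real time.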

My R Learning Journey

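It has been about five months since I first picked up R. I went from knowing nothing about computers or programming to being able to debug efficiently, write concise and well-organized R code, and even build a website and blog with R and Markdown. By my count, I use R at least 3 days in a typical week, not because I am pushing myself to get familiar with it, but because it is just that appealing.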

MathJax Setup

MathJax is a JavaScript display engine for mathematics that works in all modern browsers. With MathJax support included on the website, LaTeX mathematical expressions are rendered as cleanly typeset equations.

Constructing Life Tables with R

I have been using the dplyr package to handle data for a while, and I thought I could use it with ease until I got stuck on my homework on constructing a life table. I found spreadsheets (either Excel or Google Spreadsheets) easy for handling this task, but had a hard time dealing with it in R. I think it was due to my unfamiliarity with the...
