- All files in this folder are released under the New BSD License
We describe how to install, in PostgreSQL 8.4.1, an array-based SQL implementation of the Viterbi algorithm, which computes the maximum-likelihood labeling of a piece of text using a Conditional Random Field (CRF).
- System requirement: PostgreSQL 8.4.1
- Add the UDF files topk_array.c and viterbi_top1.sql to the PostgreSQL 8.4.1 source tree
- Modify the paths in topk_array.c, and compile the C UDFs. For more information on compiling C UDFs in PostgreSQL, please refer to http://www.postgresql.org/docs/8.4/interactive/xfunc-c.html.
- Modify setup_viterbi.sql to point to the locations of topk_array.c and viterbi_top1.sql
- Modify all the path references in import_data.sql to point to the location of the data files enron.test.MR, enron.test.SegmentTbl, and enron.labels
- Import the Enron sample dataset (a small number of signatures from the Enron email corpus) by running import_data.sql: \i import_data.sql
- Set up the Viterbi UDFs by running setup_viterbi.sql, which includes the C and SQL UDFs: \i setup_viterbi.sql
- Try it out!
-- Compute the maximum-likelihood (ML) segmentation for the first document
select ie.viterbi_top1(1);
select * from display_u10;
-- Compute the ML segmentation for the first 10 documents
truncate table u10;
select ie.viterbi_top1(doc_id) from doc_id_tbl where doc_id<11;
select * from display_u10;
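
For readers unfamiliar with the computation behind the queries above, here is a minimal, generic sketch of top-1 Viterbi decoding in C. It is not the code in topk_array.c or viterbi_top1.sql. The function name viterbi_top1, the flat row-major score arrays emit and trans, and the use of additive log-space scores are all assumptions made for illustration only.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical sketch of top-1 Viterbi decoding (not the shipped UDF).
 *   emit[t * n_labels + y]  : score of label y at token t
 *   trans[p * n_labels + y] : score of moving from label p to label y
 * Scores are assumed to be additive (log-space). The best label
 * sequence is written to path[0..n_tokens-1]; its score is returned.
 */
double viterbi_top1(int n_tokens, int n_labels,
                    const double *emit, const double *trans, int *path)
{
    assert(n_tokens > 0 && n_labels > 0);

    double *score = malloc(sizeof(double) * n_tokens * n_labels);
    int *back = malloc(sizeof(int) * n_tokens * n_labels);
    assert(score != NULL && back != NULL);

    /* First token: only the emission score applies. */
    for (int y = 0; y < n_labels; y++) {
        score[y] = emit[y];
        back[y] = -1;
    }

    /* Forward pass: keep the best predecessor for every (token, label). */
    for (int t = 1; t < n_tokens; t++) {
        for (int y = 0; y < n_labels; y++) {
            int best_prev = 0;
            double best = score[(t - 1) * n_labels] + trans[y];
            for (int p = 1; p < n_labels; p++) {
                double cand = score[(t - 1) * n_labels + p]
                            + trans[p * n_labels + y];
                if (cand > best) {
                    best = cand;
                    best_prev = p;
                }
            }
            score[t * n_labels + y] = best + emit[t * n_labels + y];
            back[t * n_labels + y] = best_prev;
        }
    }

    /* Pick the best final label. */
    int last = 0;
    for (int y = 1; y < n_labels; y++) {
        if (score[(n_tokens - 1) * n_labels + y] >
            score[(n_tokens - 1) * n_labels + last]) {
            last = y;
        }
    }
    double best_score = score[(n_tokens - 1) * n_labels + last];

    /* Backward pass: follow the stored predecessors. */
    for (int t = n_tokens - 1; t >= 0; t--) {
        path[t] = last;
        last = back[t * n_labels + last];
    }

    free(score);
    free(back);
    return best_score;
}
```

The sketch keeps a full score table and a back-pointer table so the best sequence can be recovered in one backward pass. Ties are broken in favor of the lowest label index.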