Comp90049 knowledge technologies project lexical, Computer Engineering

Knowledge Technologies Project: Lexical Normalisation of Twitter Data

Overview

The goal of this project is to assess the performance of some spelling correction methods on the problem of tweet normalisation, and to express the knowledge that you have gained in a technical report. The project aims to reinforce concepts in approximate matching and evaluation, and to strengthen your skills in data analysis and problem solving.

Deliverables

1. One or more programs, implemented in one or more programming languages, which must:

Determine the best match(es) for a token, with respect to a reference collection (dictionary)
Process the data input file(s), to determine the best match for each token
Evaluate the matches, with respect to the truly intended words, using one or more evaluation metrics
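As an illustration of the first two requirements, here is a minimal sketch of one possible approach, assuming unit-cost Levenshtein (edit) distance as the approximate matching technique. The project does not prescribe this method; the function names, the toy dictionary, and the sample tokens are all invented for the example and are not taken from the project data.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def best_matches(token: str, dictionary: list[str]) -> list[str]:
    """Return every dictionary entry tied for the smallest distance to token."""
    if not dictionary:
        return []
    scored = [(edit_distance(token, word), word) for word in dictionary]
    best = min(distance for distance, _ in scored)
    return sorted(word for distance, word in scored if distance == best)


# Toy data, invented for illustration only (not the real dictionary or tweets).
toy_dictionary = ["a", "make", "making", "sense", "since", "twitter"]
toy_tokens = ["makn", "sens"]

# Process each token in turn and keep its best match(es).
results = {token: best_matches(token, toy_dictionary) for token in toy_tokens}
for token, matches in results.items():
    print(token, "->", matches)
```

On this toy input, "makn" is closest to "make" (distance 1) rather than "making" (distance 2), which shows how a plain edit distance can pick a plausible but unintended word. This linear scan compares each token against every dictionary entry, so a full-size dictionary may call for a faster strategy.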

2. A README that briefly details how your program(s) work(s). You may use any external resources for your program(s) that you wish: you must indicate these, and where you obtained them, in your README. The program(s) and README are required submission elements, but will not typically be directly assessed.

3. A technical report, of 1000-1600 words, which must:

Give a short description of the problem and data set
Briefly summarise some relevant literature
Briefly explain the approximate matching technique(s), and how it is (they are) used
Present the results, in terms of the evaluation metric(s) and illustrative examples
Contextualise the system's behaviour, based on the (admittedly incomplete) understanding from the subject materials
Clearly demonstrate some knowledge about the problem
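When presenting results in terms of evaluation metrics, one common choice is precision and recall, which differ once a system may return several tied matches per token. The sketch below is one possible formulation under that assumption, not a required one; the `evaluate` function and the sample predictions are hypothetical and are not drawn from the project data.

```python
def evaluate(predicted: list[list[str]], intended: list[str]) -> dict[str, float]:
    """Compute precision and recall for per-token candidate lists.

    predicted[i] holds the match(es) returned for token i;
    intended[i] is the truly intended word for that token.
    """
    if len(predicted) != len(intended):
        raise ValueError("predicted and intended must have the same length")
    returned = sum(len(candidates) for candidates in predicted)
    correct = sum(1 for candidates, gold in zip(predicted, intended)
                  if gold in candidates)
    # Precision: correct matches out of all matches returned.
    precision = correct / returned if returned else 0.0
    # Recall: tokens whose intended word was returned, out of all tokens.
    recall = correct / len(intended) if intended else 0.0
    return {"precision": precision, "recall": recall}


# Invented sample: three tokens, the last one with two tied candidates.
sample_predicted = [["make"], ["sense"], ["too", "two"]]
sample_intended = ["making", "sense", "too"]

scores = evaluate(sample_predicted, sample_intended)
print(f"precision={scores['precision']:.2f} recall={scores['recall']:.2f}")
```

Here two of the three intended words appear among four returned candidates, so precision is 2/4 and recall is 2/3. Pairing figures like these with a few illustrative tokens, such as the miss on the first token, helps explain why the system behaves as it does.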

By using this data, you are becoming part of the research community - consequently, as part of your commitment to Academic Honesty, you must cite the curators of the dataset in your report, as the following publication:

Bo Han and Timothy Baldwin (2011) Lexical normalisation of short text messages: Makn sens a #twitter. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, Portland, USA. pp. 368-378.

Reports that do not cite this work constitute plagiarism, and will be correspondingly assigned a mark of 0.

Please note that the dataset is a sub-sample of actual data posted to Twitter, with almost no filtering whatsoever. Unfortunately, the Internet is a place where freedom of speech is both empowering and harmful: consequently, some of the information expressed in the tweets is undoubtedly in poor taste. We would ask you to please look beyond this to the task at hand, as much as possible. (For example, it is generally not necessary to actually read the tweets themselves.)

The opinions expressed within the tweets in no way express the official views of the University of Melbourne or any of its employees; using the data in a teaching capacity does not constitute endorsement of the views expressed within. The University accepts no responsibility for offence caused by any content contained within this data.

Attachment: Assignment Files.rar
