JAVA Programming: Splitting annotated sentences into tag1 and tag2 files


There are a number of sentences in the attached file. The words in these sentences are annotated with a set of tags. I would like a script that splits the sentences into chunks based on the annotation tags (tag1 and tag2). The resulting chunks must be written into two files according to their tag: a tag1 file and a tag2 file. The sentence number, chunk number, and word index must be preserved.

To clarify, I will use the example from the attached file:

There are 4 sentences:

sentence1: He is a good person .
sentence2: Thank you so much !!
sentence3: John Adam likes to play with his friends :)
sentence4: Netflix has almost 75 million global subscribers

These sentences must be split into chunks and written into two files as follows:

For sentence 1, there are two chunks:

Chunk-1: He is >> written in tag1 file
Chunk-2: a good person . >> written in tag2 file

For sentence 2, there are two chunks:

Chunk-1: Thank you >> written in tag2 file
Chunk-2: so much !! >> written in tag1 file

For sentence 3, there are four chunks:

Chunk-1: John Adam likes >> written in tag1 file
Chunk-2: to play >> written in tag2 file
Chunk-3: with his >> written in tag1 file
Chunk-4: friends :) >> written in tag2 file

For sentence 4, there are four chunks:

Chunk-1: Netflix has >> written in tag1 file
Chunk-2: almost 75 million >> written in tag2 file
Chunk-3: global >> written in tag1 file
Chunk-4: subscribers >> written in tag2 file
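The examples suggest a chunking rule, though it is never stated outright, so treat the following as an assumption: a new chunk starts whenever the tag switches between tag1 and tag2. Words with any other tag (NE, number, punctuation, emoticon) seem to join the chunk before them ("a good person .", "almost 75 million", "friends :)"). If they open the sentence, they join the first chunk instead ("John Adam likes", "Netflix has"). A minimal Java sketch of that rule follows. The class `SentenceChunker`, its methods, and the tab-separated output line `sentence, chunk, wordIndex, word, originalTag` are all hypothetical choices, not part of the request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical chunker inferred from the four examples (not a confirmed rule):
 * a new chunk starts whenever the tag switches between tag1 and tag2; words with
 * any other tag (NE, number, punctuation, emoticon) join the chunk before them,
 * or the first chunk if they come before any tag1/tag2 word.
 */
class SentenceChunker {

    /** One annotated word: its index in the sentence, its text, and its original tag. */
    record Word(int index, String text, String tag) {}

    /** A chunk: its 1-based number in the sentence, its tag (tag1 or tag2), and its words. */
    record Chunk(int number, String tag, List<Word> words) {
        String text() {
            return words.stream().map(Word::text).collect(Collectors.joining(" "));
        }
    }

    static boolean isChunkTag(String tag) {
        return tag.equals("tag1") || tag.equals("tag2");
    }

    static List<Chunk> chunk(List<Word> sentence) {
        List<Chunk> chunks = new ArrayList<>();
        List<Word> leading = new ArrayList<>(); // words seen before the first tag1/tag2 word
        Chunk open = null;
        for (Word w : sentence) {
            boolean startsChunk = isChunkTag(w.tag())
                    && (open == null || !open.tag().equals(w.tag()));
            if (startsChunk) {
                // The new chunk absorbs any leading NE/number/punctuation words.
                open = new Chunk(chunks.size() + 1, w.tag(), new ArrayList<>(leading));
                leading.clear();
                chunks.add(open);
            }
            if (open == null) {
                leading.add(w); // no chunk yet: hold the word for the first chunk
            } else {
                open.words().add(w);
            }
        }
        if (open == null && !sentence.isEmpty()) {
            // The examples do not say which file such a sentence belongs in.
            throw new IllegalArgumentException("Sentence has no tag1/tag2 word");
        }
        return chunks;
    }

    /** Adds one tab-separated line per word to the tag1 or tag2 list, by chunk tag. */
    static void route(int sentenceNo, List<Chunk> chunks,
                      List<String> tag1Lines, List<String> tag2Lines) {
        for (Chunk c : chunks) {
            List<String> target = c.tag().equals("tag1") ? tag1Lines : tag2Lines;
            for (Word w : c.words()) {
                target.add(sentenceNo + "\t" + c.number() + "\t" + w.index()
                        + "\t" + w.text() + "\t" + w.tag());
            }
        }
    }
}
```

The original tag is written alongside each word because a word such as "Netflix" (NE) or "." (punctuation) lands in a tag1 or tag2 file, and its real tag could not be recovered otherwise. After `route` has been called for every sentence, the two lists could be saved with `Files.write(Path.of("tag1.txt"), tag1Lines)` and the same for tag2. The file names are placeholders.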

As mentioned above, the following information must be preserved: sentence number, chunk number, and word index. This information is needed to reconstruct the sentences, so the script should be able to use the two files (the tag1 and tag2 files) to rebuild the original file (the attached file).
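Here is one possible way to make the two files reversible, offered as an assumption since the request does not fix a file layout. Each tag file holds one tab-separated line per word: sentence number, chunk number, word index, word, and the word's original tag. Rebuilding the original then reduces to merging both files and sorting by sentence number and word index. The chunk number is kept as requested, but the sort does not need it. The original tag is stored because NE, number, punctuation, and emoticon words end up in a tag1 or tag2 file and would otherwise lose their tag. The class `ChunkFileJoiner` and its methods are hypothetical names for this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical line format for the tag1 and tag2 files (an assumption, not given
 * in the request): "sentence<TAB>chunk<TAB>wordIndex<TAB>word<TAB>originalTag".
 */
class ChunkFileJoiner {

    record Entry(int sentence, int chunk, int wordIndex, String word, String tag) {

        String toLine() {
            return sentence + "\t" + chunk + "\t" + wordIndex + "\t" + word + "\t" + tag;
        }

        static Entry fromLine(String line) {
            String[] f = line.split("\t");
            if (f.length != 5) {
                throw new IllegalArgumentException("Expected 5 tab-separated fields: " + line);
            }
            return new Entry(Integer.parseInt(f[0]), Integer.parseInt(f[1]),
                    Integer.parseInt(f[2]), f[3], f[4]);
        }
    }

    /** Merges both files and rebuilds the "Word-Index Word Tag" lines in original order. */
    static List<String> join(List<String> tag1Lines, List<String> tag2Lines) {
        List<Entry> all = new ArrayList<>();
        for (List<String> file : List.of(tag1Lines, tag2Lines)) {
            for (String line : file) {
                if (!line.isBlank()) {
                    all.add(Entry.fromLine(line));
                }
            }
        }
        // Sorting by sentence number, then word index, restores the original word order.
        all.sort(Comparator.comparingInt(Entry::sentence).thenComparingInt(Entry::wordIndex));
        List<String> out = new ArrayList<>();
        out.add("Word-Index Word Tag");
        for (Entry e : all) {
            out.add(e.wordIndex() + " " + e.word() + " " + e.tag());
        }
        return out;
    }
}
```

Reading the inputs could be as simple as `ChunkFileJoiner.join(Files.readAllLines(Path.of("tag1.txt")), Files.readAllLines(Path.of("tag2.txt")))`, with placeholder file names. A single global sort is fine for moderate inputs. For a very large file, a streaming two-way merge of the two already-ordered files would avoid holding everything in memory.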

I'm attaching just a sample of sentences; I will test the script on the original file, which contains a huge number of sentences.
You can write two scripts, one for splitting into the two files and the other for joining the two files back into the original file, or a single script that does both tasks.

Word-Index Word Tag
0 He tag1
1 is tag1
2 a tag2
3 good tag2
4 person tag2
5 . punctuation 
0 Thank tag2
1 you tag2
2 so tag1
3 much tag1
4 !! punctuation 
0 John NE
1 Adam NE
2 likes tag1
3 to tag2
4 play tag2
5 with tag1
6 his tag1
7 friends tag2
8 :) emoticon 
0 Netflix NE
1 has tag1
2 almost tag2
3 75 number
4 million number
5 global tag1
6 subscribers tag2
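The sample above does not mark sentence boundaries explicitly, but the Word-Index column restarts at 0 for every sentence. A small Java sketch can rely on that to group the lines into sentences. It assumes one token per line, three whitespace-separated columns, and a single header line beginning with "Word-Index". The class `TaggedTokenReader` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical reader for the "Word-Index Word Tag" layout. Assumes one token
 * per line, whitespace-separated, and that a new sentence starts at index 0.
 */
class TaggedTokenReader {

    /** One annotated word: its index within the sentence, the word, and its tag. */
    record Token(int index, String word, String tag) {}

    /** Groups token lines into sentences; the header line and blank lines are skipped. */
    static List<List<Token>> readSentences(List<String> lines) {
        List<List<Token>> sentences = new ArrayList<>();
        List<Token> current = null;
        for (String line : lines) {
            String trimmed = line.trim(); // the sample has trailing spaces on some lines
            if (trimmed.isEmpty() || trimmed.startsWith("Word-Index")) {
                continue;
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length != 3) {
                throw new IllegalArgumentException("Expected 'index word tag': " + line);
            }
            int index = Integer.parseInt(parts[0]);
            if (index == 0 || current == null) {
                current = new ArrayList<>(); // index reset marks a new sentence
                sentences.add(current);
            }
            current.add(new Token(index, parts[1], parts[2]));
        }
        return sentences;
    }
}
```

Sentence numbers then follow from list position (position + 1). The input could be loaded with `Files.readAllLines(Path.of("input.txt"))`, using a placeholder file name.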
