Identify the key properties of a web crawler describe in


Use Crawler Java Assignment

Review, fix and run the crawler.

Add code for the additional requirements.

Make sure your crawler does the following.

Test your crawler only on the data in:

https://lyle.smu.edu/~fmoore

Make sure that your crawler is not allowed to get out of this directory!!! Yes, there is a robots.txt file that must be used. Note that it is in a non-standard location.
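One way to keep the crawler inside the test directory is to check every URL before it is queued. The sketch below is only an illustration, not the code from the attached project: the class name `CrawlScope` and the method `isInScope` are hypothetical, and it assumes links have already been resolved to absolute URLs. It does not handle robots.txt, whose location is not given here.

```java
import java.net.URI;

public class CrawlScope {
    // Root taken from the assignment text; anything outside it is treated as off limits.
    private static final URI ROOT = URI.create("https://lyle.smu.edu/~fmoore");

    /** Returns true only when the URL is on the same host and under the /~fmoore path. */
    public static boolean isInScope(String url) {
        try {
            // normalize() collapses "." and ".." segments so "/~fmoore/../x" cannot slip through.
            URI u = URI.create(url).normalize();
            if (u.getHost() == null || !u.getHost().equalsIgnoreCase(ROOT.getHost())) {
                return false; // relative, opaque (e.g. mailto:), or foreign-host URL
            }
            String path = u.getPath() == null ? "" : u.getPath();
            String rootPath = ROOT.getPath(); // "/~fmoore"
            return path.equals(rootPath) || path.startsWith(rootPath + "/");
        } catch (IllegalArgumentException e) {
            return false; // malformed URL: treat as out of scope rather than crash
        }
    }
}
```

With this check, `isInScope("https://lyle.smu.edu/~fmoore/index.htm")` is true, while a URL on another host or one that climbs out with `..` is rejected.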

The required inputs to your program are N, the limit on the number of pages to retrieve, and a list of stop words (of your choosing) to exclude.

Perform case-insensitive matching.
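Stop-word exclusion and case-insensitive matching can be handled in one pass by lowercasing the text before splitting it. This is a minimal sketch under two assumptions that are not part of the assignment: a "word" is taken to be a run of ASCII letters and digits, and the stop-word set is supplied in lowercase. The names `WordFilter` and `tokens` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class WordFilter {
    /** Lowercases the text, splits on runs of non-alphanumeric characters, and drops stop words. */
    public static List<String> tokens(String text, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        // Locale.ROOT keeps lowercasing independent of the machine's default locale.
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty() && !stopWords.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }
}
```

For example, `WordFilter.tokens("The Crawler AND the page!", Set.of("the", "and"))` returns `[crawler, page]`.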

You can assume that there are no errors in the input. Your code should be robust under errors in the Web pages you're searching. If an error is encountered, feel free, if necessary, just to skip the page where it is encountered.
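The page limit N and the "skip the page on error" rule can both live in the main crawl loop. The sketch below assumes a hypothetical `PageFetcher` hook that returns the links found on a page and throws on any fetch or parse problem; it is not the interface used in the attached project. It also assumes that a page that fails does not count toward N, which the assignment does not specify.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BoundedCrawl {
    /** Hypothetical hook: returns the links on a page, or throws on any fetch/parse error. */
    public interface PageFetcher {
        List<String> fetchLinks(String url) throws Exception;
    }

    /** Breadth-first crawl that retrieves at most n pages and skips pages that fail. */
    public static List<String> crawl(String seed, int n, PageFetcher fetcher) {
        Deque<String> frontier = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();        // URLs already queued, to avoid revisits
        List<String> retrieved = new ArrayList<>(); // pages fetched successfully, in order
        frontier.add(seed);
        seen.add(seed);
        while (!frontier.isEmpty() && retrieved.size() < n) {
            String url = frontier.poll();
            try {
                for (String link : fetcher.fetchLinks(url)) {
                    if (seen.add(link)) {
                        frontier.add(link);
                    }
                }
                retrieved.add(url);
            } catch (Exception e) {
                // Skip the page that caused the error and continue with the rest of the frontier.
            }
        }
        return retrieved;
    }
}
```

Passing the fetcher in as a parameter keeps the loop testable without network access; a real fetcher would also record the failing URL so it can be reported as a broken link.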

1. Identify the key properties of a web crawler. Describe in detail how each of these properties is implemented in your code.

2. Use your crawler to list the URLs of all pages in the test data and report all outgoing links of the test data. [10 points] display the contents of the tag

3. Implement duplicate detection, and report if any URLs refer to already seen content.

4. Use your crawler to list all broken links within the test data.

5. How many graphic files are included in the test data?

6. Have your crawler save the words from each page of type (.txt, .htm, .html). Make sure that you do not save HTML markup. Explain your definition of "word". In this process, give each page a unique document ID.

Implement stemming.

7. Report the 20 most common words with their document frequencies. Words or stemmed words?

Attachment: crawler_project.zip (https://secure.tutorsglobe.com/Atten_files/409_crawler_project.zip)
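Duplicate detection, unique document IDs, and document frequency can share one small bookkeeping class. The sketch below is one possible approach, not the method used in the attached project: it detects only exact duplicates by hashing the extracted page text with SHA-256, assigns IDs sequentially from 1, and counts each word once per document. The class `PageIndex` and its method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

public class PageIndex {
    private final Map<String, Integer> docIdByHash = new HashMap<>(); // content hash -> first doc ID
    private final Map<String, Integer> docFreq = new HashMap<>();     // word -> number of docs containing it
    private int nextDocId = 1;

    /** Returns the doc ID of an earlier page with identical text, or -1 if the text is new. */
    public int duplicateOf(String text) {
        return docIdByHash.getOrDefault(hash(text), -1);
    }

    /** Registers a page, updates document frequencies, and returns its new doc ID. */
    public int add(String text, List<String> words) {
        int id = nextDocId++;
        docIdByHash.putIfAbsent(hash(text), id);
        // A set ensures a word repeated within one page is counted once for that page.
        for (String w : new HashSet<>(words)) {
            docFreq.merge(w, 1, Integer::sum);
        }
        return id;
    }

    /** Returns up to k words with the highest document frequency; ties are broken alphabetically. */
    public List<Map.Entry<String, Integer>> topWords(int k) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(docFreq.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return entries.subList(0, Math.min(k, entries.size()));
    }

    private static String hash(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return Base64.getEncoder().encodeToString(md.digest(text.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is required on every Java platform
        }
    }
}
```

A crawler would call `duplicateOf` before `add` and report the URL as a duplicate when the result is not -1. If stemming is applied, passing stemmed words to `add` makes `topWords(20)` report stemmed-word frequencies instead.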