Websites are constantly being scraped by scripts, bots, and other little automated workers. Sometimes this is good and sometimes it's bad, but usually it's somewhere in the middle.
For the unfamiliar, scraping is the practice of copying or otherwise extracting information from a website. Applications that provide comparison shopping or data monitoring rely on some form of scraping in order to function. These are generally considered legitimate uses of scraping and often follow the terms of service put forth by the content provider. The more nefarious type of scraping occurs when content from one website is published on another without permission and with no regard for copyright or terms of service. Legal cases like Ticketmaster.com v. Tickets.com and American Airlines v. FareChase show what happens when scraping doesn't fit neatly into either category.
So what can you do if you'd like to manage the scraping of your content? Well, as it turns out, the Barracuda Web Application Firewall is made for exactly that kind of thing.
The Barracuda Web Application Firewall protects servers, applications, and data from web-based attacks and other unwanted activity. When it comes to scraping, several features are particularly important:
- Heuristic fingerprinting and IP reputation: IP addresses can be restricted based on GeoIP or the Barracuda Reputational Database. Admins can then block, throttle, or issue a CAPTCHA challenge to suspicious traffic identified by these methods.
- Deny access to a specific user agent: Once a scraper and its user-agent string have been identified, the WAF can be configured to deny access to that traffic.
- Prevent forceful browsing: Forceful browsing is the attempt to access unlinked content on a website. For example, you may have a web page for employee access that is reachable only if the URL is entered directly into the browser. A scraper using forceful browsing finds these pages and searches them for valuable content or other data.
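To make the user-agent and forceful-browsing rules a little more concrete, here is a minimal Python sketch of the kind of decision such rules make for each request. It is not how the Barracuda WAF is implemented or configured (the product is configured through its own interface); the names `DENIED_UA_TOKENS`, `LINKED_PATHS`, and `check_request`, and the example paths, are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical policy data; a real WAF manages these lists in its own configuration.
DENIED_UA_TOKENS = {"badscraperbot"}          # user-agent tokens already identified as scrapers
LINKED_PATHS = {"/", "/products", "/contact"}  # paths reachable by following site links


def check_request(user_agent: str, url: str) -> str:
    """Return 'allow' or 'deny' for one request (illustrative sketch only)."""
    ua = user_agent.lower()
    # Rule 1: deny any request whose user-agent string contains a known scraper token.
    if any(token in ua for token in DENIED_UA_TOKENS):
        return "deny"
    # Rule 2: deny requests for paths that are not linked anywhere (forceful browsing).
    path = urlparse(url).path or "/"
    if path not in LINKED_PATHS:
        return "deny"
    return "allow"


if __name__ == "__main__":
    print(check_request("Mozilla/5.0", "https://example.com/products"))
    print(check_request("BadScraperBot/1.0", "https://example.com/"))
    print(check_request("Mozilla/5.0", "https://example.com/employees"))
```

In this sketch, an ordinary browser requesting a linked page is allowed, a known scraper is denied regardless of the path, and a direct request for the unlinked `/employees` page is treated as forceful browsing and denied.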
For a complete list of features and capabilities, visit the features section of the Barracuda Web Application Firewall corporate page.
With these capabilities, you can deploy a variety of strategies to protect your content. Want to prohibit scraping entirely and stop any that gets through? The Barracuda WAF can do that. Want to prohibit only unauthorized scraping? The Barracuda WAF can do that. Want to allow anyone to scrape, but only linked resources? The Barracuda WAF can do that.
For a 30-day risk-free evaluation of the Barracuda Web Application Firewall, visit this page. For more information on the product, take a look here:
- Barracuda Web Application Firewall product page
- Barracuda Web Application Firewall Vx product page
- Technical documentation
- Live demo (User: guest Pwd: [blank])
- Risk-free 30-day demo unit
Christine Barry is Senior Chief Blogger and Social Media Manager at Barracuda. In this role, she helps bring Barracuda stories to life and facilitates communication between the public and Barracuda internal teams. Prior to joining Barracuda, Christine was a field engineer and project manager for K12 and SMB clients for over 15 years. She holds several technology credentials, a Bachelor of Arts, and a Master of Business Administration. She is a graduate of the University of Michigan.