Our proprietary screen scraping technology can be configured to a customer's exact specification to retrieve information from any web site and, more importantly, to keep that information up to date. This is ideal for competitor product and price analysis, legacy data migrations, building up the data needed for price comparison web sites, website change detection and web research.
Once retrieved, this data can be provided in numerous formats, including Excel, CSV and XML.
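As a rough illustration of what delivering the same records in more than one format can look like, here is a minimal sketch using only the Python standard library. It is not our production exporter: the function names `to_csv` and `to_xml`, the field names and the sample record are all assumptions made up for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET


def to_csv(records, fields):
    """Serialise a list of dicts to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_xml(records, fields, root_tag="products", item_tag="product"):
    """Serialise the same records to XML, one child element per field."""
    root = ET.Element(root_tag)
    for record in records:
        item = ET.SubElement(root, item_tag)
        for field in fields:
            # Missing fields become empty elements rather than errors.
            ET.SubElement(item, field).text = str(record.get(field, ""))
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample data, for illustration only.
fields = ["name", "brand", "price"]
records = [{"name": "Example TV", "brand": "Acme", "price": "299.99"}]

print(to_csv(records, fields))
print(to_xml(records, fields))
```

Keeping the extracted records as plain dictionaries until the last step means each output format is just another small serialiser over the same data.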
For years, part of our work has involved selectively crawling web sites to retrieve pages for indexing by the search platforms we specialise in, namely Lucene and Microsoft FAST ESP. However, we have often thought that the web spider technology used to crawl these sites for search engines could also intelligently identify and extract useful information from the pages it encounters for other purposes.
This includes information such as product names, brands, prices, product images and delivery information; product attributes such as screen size, colour and weight; or non-product information such as key contacts within companies, email and postal addresses, and telephone numbers. The list goes on.
So we turned our attention to packaging our web spider software as a self-contained, configurable application that runs on our servers and can be tuned so that it "understands" the structure of key web sites and knows where to look for this information.
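To give a flavour of what tuning a crawler to a site's structure can mean in practice, the sketch below maps field names to the CSS classes that hold them on a given site and pulls out the matching text. It is a simplified illustration built on Python's standard `html.parser`, not the actual implementation of our software; `SITE_CONFIG`, `FieldExtractor`, `extract` and the class names are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical per-site configuration: field name -> CSS class that holds it.
SITE_CONFIG = {
    "example-shop": {"name": "product-title", "price": "product-price"},
}

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class FieldExtractor(HTMLParser):
    """Collects the text inside the first element carrying each configured class."""

    def __init__(self, class_to_field):
        super().__init__()
        self.class_to_field = class_to_field
        self.current = None  # field currently being captured, if any
        self.depth = 0       # nesting depth inside the captured element
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.current is not None:
            self.depth += 1
            return
        for cls in (dict(attrs).get("class") or "").split():
            field = self.class_to_field.get(cls)
            if field is not None and field not in self.fields:
                self.current, self.depth = field, 1
                self.fields[field] = ""
                break

    def handle_endtag(self, tag):
        if self.current is None or tag in VOID_TAGS:
            return
        self.depth -= 1
        if self.depth == 0:
            self.fields[self.current] = self.fields[self.current].strip()
            self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.fields[self.current] += data


def extract(site, html):
    """Apply the named site's configuration to one page of HTML."""
    config = SITE_CONFIG[site]
    parser = FieldExtractor({cls: field for field, cls in config.items()})
    parser.feed(html)
    parser.close()
    return parser.fields


page = ('<div><h1 class="product-title">Example <b>TV</b></h1>'
        '<span class="product-price big"> 299.99 </span></div>')
print(extract("example-shop", page))
```

In this sketch, supporting a new site amounts to adding one more entry to the configuration, which is the general idea behind a configurable crawler, though real sites usually need richer rules than a single class name per field.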
We are now so happy with the result that configuring the crawler for a new site takes only 30 minutes, and we are therefore able to offer data extraction as a service to MarabouStork customers.
Further details can be found on the DataVault™ product pages.


