Scraping with Geb

GR8Conf Eu 2016

Sergio del Amo

June 02, 2016

Transcript

  1. WHAT CAN YOU DO?
     ▸ PRICE-MONITORING WEBBOTS
     ▸ IMAGE-CAPTURING WEBBOTS
     ▸ LINK VERIFICATION WEBBOTS
     ▸ WEBBOTS THAT SEND EMAIL
     ▸ WEBBOTS THAT CONVERT A WEBSITE INTO AN API
     ▸ SNIPERS
  2. GEB PAGES
     Define the interesting parts of your pages in a concise, maintainable and extensible manner.
  3. GEB EXAMPLE GRADLE
     https://github.com/geb/geb-example-gradle
     The following commands will launch the tests with the individual browsers:
         ./gradlew chromeTest
         ./gradlew firefoxTest
         ./gradlew phantomJsTest
     To run with all, you can run:
         ./gradlew test
     MARCIN ERDMANN
  4. SPLIT LOAD BETWEEN WEBBOTS
     [Diagram: the ids 1 to 50 laid out in blocks of five]
         def ids = 1..50
         def webbotIndex = 3          // zero-based index of this webbot
         def webbotsInParallel = 6
         int total = ids.size()
         // Round the sublist size up: rounding 50 / 6 down to 8 would yield a
         // seventh sublist (49, 50) that none of the six webbots picks up.
         def sublistsSize = (total + webbotsInParallel - 1).intdiv(webbotsInParallel)
         def s = ids.collate(sublistsSize)[webbotIndex]
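The partitioning idea on this slide can be wrapped in a small function so that each webbot asks only for its own slice. This is a minimal sketch, not the speaker's code: the name `sliceFor` is hypothetical, and it assumes the chunk size is rounded up so that every id belongs to exactly one webbot.

```groovy
// Hypothetical helper: returns the ids that the webbot at webbotIndex should process.
List sliceFor(List ids, int webbotIndex, int webbotsInParallel) {
    if (!ids) {
        return []
    }
    // Round the chunk size up so the remainder is not dropped.
    int chunkSize = (ids.size() + webbotsInParallel - 1).intdiv(webbotsInParallel)
    List chunks = ids.collate(chunkSize)
    // There can be fewer chunks than webbots; such a webbot simply gets no work.
    webbotIndex < chunks.size() ? chunks[webbotIndex] : []
}

def ids = (1..50).toList()
def slices = (0..<6).collect { sliceFor(ids, it, 6) }

// Every id is covered exactly once, in order.
assert slices.flatten() == ids

println slices[3]   // prints [28, 29, 30, 31, 32, 33, 34, 35, 36]
```

With 50 ids and 6 webbots the chunk size is 9, so the first five webbots take nine ids each and the last one takes the remaining five.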
  5. STEALTH MEANS SIMULATING HUMAN PATTERNS
     ▸ BE KIND TO YOUR RESOURCES
     ▸ RUN YOUR WEBBOTS DURING BUSY HOURS
     ▸ DON’T RUN YOUR WEBBOTS AT THE SAME TIME EACH DAY
     ▸ DON’T RUN YOUR WEBBOT ON HOLIDAYS AND WEEKENDS
     ▸ USE RANDOM, INTRA-FETCH DELAYS
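The last bullet, random intra-fetch delays, can be sketched as follows. The helper name `randomDelayMillis`, the URLs, and the delay bounds are all assumptions for illustration; the fetch itself is a placeholder `println`, and the bounds are kept short so the sketch finishes quickly (a real webbot would likely pause for seconds, not milliseconds).

```groovy
import java.util.concurrent.ThreadLocalRandom

// Hypothetical helper: picks a pause between minMillis and maxMillis, both inclusive.
long randomDelayMillis(long minMillis, long maxMillis) {
    ThreadLocalRandom.current().nextLong(minMillis, maxMillis + 1)
}

// Placeholder URLs; replace with the pages the webbot actually visits.
def urls = ['https://example.com/a', 'https://example.com/b', 'https://example.com/c']

urls.each { url ->
    println "fetching ${url}"        // stand-in for the real page visit
    long pause = randomDelayMillis(50, 150)
    sleep(pause)                     // a different pause after every fetch
}
```

Drawing a fresh delay for each request avoids the fixed rhythm that makes automated traffic easy to spot in server logs.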
  6. ?