Thursday, March 24, 2011

Screen Scraping

Hi friends, I am learning cURL now, and I ran into one difficulty: logging in to a page directly with a username and password.

From Stack Overflow:
  • For standard HTTP authentication, you could try:

    curl http://username:password@url
    

    It should work!

  • The method you need to use will depend on exactly how the web page's username/password checking is implemented, but this might help you:
    http://curl.haxx.se/mail/archive-2008-05/0113.html

  • I assume you want to fetch pages hidden behind a login page, and this page is not CAPTCHA-protected. To do it, you have to

    1. send a POST request with the login form data to the submit URL of the login form (the form's action attribute in the HTML source)
    2. save cookies
    3. send these cookies with all subsequent requests (update if necessary)

    I do it with wget. curl should be similar (see its manual).

    1, 2:

    wget --keep-session-cookies --save-cookies "mycookies" \
         --post-data "login=mylogin&password=mypass" submit_URL
    

    3:

    wget --load-cookies "mycookies" --keep-session-cookies --save-cookies "mycookies" \
         another_URL_behind_login_form
    

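    From what I see in man curl, 1–2 should be something like this (not tested). Note that -d, not -F, is the counterpart of wget's --post-data: -d sends a regular URL-encoded form POST with fields joined by "&", while -F sends multipart/form-data and takes one field per option:

    curl -d "login=mylogin&password=mypass" -c "mycookies" submit_URL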
    

    and 3:

    curl -b "mycookies" -c "mycookies" another_URL
    

    But I didn't try it with curl.
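
The three steps above (POST the login form, save the cookies, send them with later requests) could be wrapped in a small shell script along these lines. This is only a sketch under assumptions: the function names login and fetch_behind_login and the COOKIE_JAR variable are made up for illustration, and the field names login and password are taken from the wget example, so they have to be changed to match the real login form.

```shell
#!/bin/sh
# Sketch of the login flow with curl. All names here are hypothetical;
# the form field names (login, password) must match the real form's HTML.

# File where curl stores cookies between requests.
COOKIE_JAR=${COOKIE_JAR:-mycookies}

# Steps 1 and 2: POST the form data and save the received cookies.
# --data-urlencode encodes each value, so characters such as & or =
# in a password do not break the form body.
# --location follows the redirect that many login forms send on success.
login() {
    curl --silent --show-error --location \
         --cookie-jar "$COOKIE_JAR" \
         --data-urlencode "login=$1" \
         --data-urlencode "password=$2" \
         "$3"
}

# Step 3: send the saved cookies and write back any updated ones.
fetch_behind_login() {
    curl --silent --show-error --location \
         --cookie "$COOKIE_JAR" --cookie-jar "$COOKIE_JAR" \
         "$1"
}
```

After sourcing the script, usage would look like `login mylogin mypass submit_URL` followed by `fetch_behind_login another_URL`. Whether this is enough depends on the site: some login forms also expect hidden fields (for example a CSRF token) that have to be read from the login page first.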
