{"id":677,"date":"2016-11-14T18:40:25","date_gmt":"2016-11-14T23:40:25","guid":{"rendered":"http:\/\/pmcgovern.ca\/wp\/?p=677"},"modified":"2021-12-12T12:30:07","modified_gmt":"2021-12-12T17:30:07","slug":"downloading-files-with-selenium","status":"publish","type":"post","link":"https:\/\/pmcgovern.ca\/wp\/?p=677","title":{"rendered":"Downloading Files With Selenium"},"content":{"rendered":"<p>Selenium is an indispensable tool, but it cannot download files. It gives you control of the DOM but cannot interact with the native dialogs the browser coughs up when prompting for a file download.<\/p>\n<p>The work-around is to download the target file with Apache HTTP Client after loading it up with cookies from Selenium. First, navigate to the SIT and do whatever is necessary to establish a session. Then grab the cookies from Selenium, push them into HTTP Client&#8217;s cookie store, and perform an HTTP GET to fetch the file.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"java\">\/\/ Timeout in milliseconds\npublic static final int TIMEOUT = 10000;\n\nWebDriver driver;\n\n\/\/ Do Selenium stuff...\n\nProductsPage products = this.home.navigateProducts();\n\n\/\/ Set up the HTTP client to emulate a browser.\n\/\/ Use cookies and follow 302 redirects\n\/\/ Set a reasonable timeout\nRequestConfig requestCfg = RequestConfig.custom()\n    .setConnectTimeout( TIMEOUT )\n    .setSocketTimeout( TIMEOUT )\n    .setConnectionRequestTimeout( TIMEOUT )\n    .build();\n\nBasicCookieStore cookieStore = new BasicCookieStore();\n\nCloseableHttpClient httpclient = HttpClients.custom()\n    .setDefaultCookieStore( cookieStore )\n    .setRedirectStrategy( new LaxRedirectStrategy() )\n    .setDefaultRequestConfig( requestCfg )\n    .build();\n\n\/\/ Save state\nHttpClientContext context = HttpClientContext.create();\n\n\/\/ Duplicate cookies from Selenium in HTTP Client\n\/\/ (Cookie here is org.openqa.selenium.Cookie)\nSet&lt;Cookie&gt; seCookies = this.driver.manage().getCookies();\n\nfor( Cookie seCookie : seCookies ) {\n\n    System.out.println( \"Converting cookie \" + 
seCookie.getName() );\n\n    BasicClientCookie dupCookie =\n        new BasicClientCookie( seCookie.getName(), seCookie.getValue() );\n\n    dupCookie.setDomain( seCookie.getDomain() );\n    dupCookie.setPath( seCookie.getPath() );\n    dupCookie.setSecure( seCookie.isSecure() );\n    dupCookie.setExpiryDate( seCookie.getExpiry() );\n\n    cookieStore.addCookie( dupCookie );\n}\n\n\/\/ Download the file\nFileResponseHandler fileHandler = new FileResponseHandler( \"out.csv\" );\n\nHttpGet httpget = new HttpGet( \"https:\/\/sit\/foobar\/out.csv\" );\n\nFile csv = httpclient.execute( httpget, fileHandler, context );\n\nSystem.out.println( \"Downloaded to \" + csv.getAbsolutePath() );\n<\/pre>\n<p>The file response handler, which writes the HTTP response body to a file, looks like this:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"java\">import java.io.BufferedInputStream;\nimport java.io.File;\nimport java.io.FileOutputStream;\nimport java.io.IOException;\nimport java.io.InputStream;\n\nimport org.apache.http.HttpEntity;\nimport org.apache.http.HttpResponse;\nimport org.apache.http.client.ResponseHandler;\n\npublic class FileResponseHandler implements ResponseHandler&lt;File&gt; {\n\n    protected String filename;\n\n    public FileResponseHandler( String filename ) {\n\n        if( filename == null || filename.isEmpty() ) {\n            throw new IllegalArgumentException( \"Filename is null or empty.\" );\n        }\n\n        this.filename = filename;\n    }\n\n    @Override\n    public File handleResponse( HttpResponse response ) throws IOException {\n\n        int status = response.getStatusLine().getStatusCode();\n        System.out.println( \"File download response: \" + status );\n\n        if( 200 != status ) {\n            throw new IllegalStateException( \"Bad response code for file fetch \" + status );\n        }\n\n        File outFile = new File( this.filename );\n        HttpEntity entity = response.getEntity();\n\n        byte[] buff = new byte[ 4096 ];\n        int read = 0;\n        long total = 0;\n        long oldTotal = 0;\n\n        \/\/ try-with-resources closes both streams even if the copy fails\n        try( InputStream in = new BufferedInputStream( entity.getContent() );\n             FileOutputStream out = new FileOutputStream( outFile ) ) {\n\n            while( (read = in.read( buff )) != -1 ) {\n                out.write( buff, 0, read );\n                total += read;\n\n                \/\/ Print a dot for roughly every 1 MB\n                if( total - oldTotal &gt; 1024000 ) {\n                    System.out.print( \".\" );\n                    oldTotal = 
total;\n                }\n            }\n        }\n        System.out.println();\n\n        System.out.println( \"Wrote \" + total + \" bytes to \" + this.filename );\n        return outFile;\n    }\n}\n<\/pre>\n<p>A similar approach can be found <a href=\"http:\/\/ardesco.lazerycode.com\/index.php\/2012\/07\/how-to-download-files-with-selenium-and-why-you-shouldnt\/\">here<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Selenium is an indispensable tool, but it cannot download files. It gives you control of the DOM but cannot interact with the native dialogs the browser coughs up when prompting for a file download. The work-around is to use Apache&#8230;<\/p>\n","protected":false},"author":1,"featured_media":678,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11],"tags":[],"class_list":["post-677","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-programming"],"_links":{"self":[{"href":"https:\/\/pmcgovern.ca\/wp\/index.php?rest_route=\/wp\/v2\/posts\/677","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pmcgovern.ca\/wp\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/pmcgovern.ca\/wp\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/pmcgovern.ca\/wp\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/pmcgovern.ca\/wp\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=677"}],"version-history":[{"count":16,"href":"https:\/\/pmcgovern.ca\/wp\/index.php?rest_route=\/wp\/v2\/posts\/677\/revisions"}],"predecessor-version":[{"id":875,"href":"https:\/\/pmcgovern.ca\/wp\/index.php?rest_route=\/wp\/v2\/posts\/677\/revisions\/875"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/pmcgovern.ca\/wp\/index.php?rest_route=\/wp\/v2\/media\/678"}],"wp:attachment":[{"href":"https:\/\/pmcgovern.ca\/wp\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=677"}],
"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/pmcgovern.ca\/wp\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=677"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/pmcgovern.ca\/wp\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=677"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}