By Camuel Gilyadov, on March 1st, 2012

Apache Hadoop over OpenStack Swift

This is a post by Constantine Peresypkin and David Gruzman.
Lately we have been working on integrating Hadoop with OpenStack Swift. Hadoop doesn't need an introduction, and neither does OpenStack. Swift is an object-storage system and the technology behind RackSpace CloudFiles (and quite a few other offerings, such as Korea Telecom's object storage, Internap's, etc.).
Before we go into the details of Hadoop-Swift integration, let's get some relevant background:
  1. Hadoop already has integration with Amazon S3 and is widely used to crunch S3-stored data: http://wiki.apache.org/hadoop/AmazonS3
  2. The NameNode is a known SPOF in Hadoop; if it can be avoided, so much the better.
  3. The current S3 integration stages all data as temporary files on local disk before uploading it to S3. That's because S3 needs to know the content length in advance; it is one of the required headers.
  4. The current S3 integration also suffers from a 5GB maximum file size limit, which is slightly annoying.
  5. Hadoop requires seek support, which means HTTP range support is required when running over an object store (see the sketch after this list). S3 supports it.
  6. Append support is optional for Hadoop, but it is required for HBase. S3 doesn't have any append support, so the native integration cannot run HBase over S3.
  7. While OpenStack Swift is compatible with S3, RackSpace CloudFiles is not, because RackSpace CloudFiles disables the S3-compatibility layer in Swift. This prevents existing Swift users from integrating with Hadoop.
  8. The only information available on the Internet about Hadoop-Swift integration is that it should work using Apache Whirr. But to the best of our knowledge that applies only to rolling out a Block FileSystem on top of Swift, not a Native FileSystem. In other words, we haven't found any solution for processing data that is already stored in RackSpace CloudFiles without costly re-importing.
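
To make point 5 concrete, here is a minimal sketch of how a client can honor Hadoop's seek() over plain HTTP: the object is re-opened with a Range header so reading resumes at the requested offset. The class and URL below are hypothetical illustrations, not the actual patched CloudFiles SDK code.

    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class RangedRead {
        // Re-open an object so that reading resumes at `offset`, as seek() requires.
        static InputStream openAt(String objectUrl, long offset) throws Exception {
            HttpURLConnection conn = (HttpURLConnection) new URL(objectUrl).openConnection();
            // Standard HTTP Range header: request bytes from `offset` to the end of the object.
            conn.setRequestProperty("Range", "bytes=" + offset + "-");
            if (conn.getResponseCode() != 206) { // 206 Partial Content: the range was honored
                throw new IllegalStateException("server ignored the Range header");
            }
            return conn.getInputStream();
        }
    }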
So, armed with the above background, let's examine what we've got here:
  1. In general, we instrumented Hadoop to run over Swift natively, without resorting to the S3 compatibility layer. This means it works with CloudFiles, which lacks the S3-compatibility layer. (A usage sketch follows this list.)
  2. The CloudFiles client SDK doesn't support HTTP range functionality, so we hacked it to allow HTTP range requests; this is a must for Hadoop to work.
  3. We removed the need for a NameNode, in the same way it is removed in the S3 integration for Amazon.
  4. As opposed to the S3 implementation, we avoided staging files on local disk on their way to and from CloudFiles/Swift. In other words, data is streamed directly between compute-node RAM and CloudFiles/Swift.
  5. The data is still processed remotely, though: extensive data shipping takes place between the compute nodes and CloudFiles/Swift. As frequent readers of this blog know, we are working on technology that will allow running code snippets directly inside Swift; look here for more details: http://www.zerovm.com. As a next step we plan to implement predicate-pushdown optimization so that most of the data is processed completely locally inside a ZeroVM-enabled object-storage system.
  6. Support for native Swift large objects is also planned (something that's absent in Amazon S3).
  7. We are also working on append support for Swift (this could easily be done through Swift's large object support, which uses versioning), so even HBase will work on top of Swift; this is not the case with S3 today.
  8. As with Hadoop over S3, storing Big Data in its native format on Swift opens up options for multi-site replication and CDN.
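
To give a feel for the native integration, here is a short usage sketch: once the Swift filesystem is on Hadoop's classpath, code addresses objects through swift:// URIs just as it would hdfs:// paths, with no NameNode involved. The fs.swift.* property names also appear in the comment thread below; the fs.swift.impl mapping is our assumption based on the filesystem class name, and all hosts, credentials and paths are placeholders.

    import java.io.InputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SwiftReadExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Assumed mapping of the swift:// scheme to the filesystem class from this branch.
            conf.set("fs.swift.impl", "org.apache.hadoop.fs.swift.SwiftFileSystem");
            // Swift auth endpoint and credentials (placeholder values).
            conf.set("fs.swift.userName", "root");
            conf.set("fs.swift.userPassword", "testpass");
            conf.set("fs.swift.authUrl", "https://10.0.0.1:8080/auth/v1.0");
            conf.set("fs.swift.accountName", "AUTH_system");

            // Read an object directly from Swift: no HDFS, no NameNode.
            Path path = new Path("swift://10.0.0.1:8080/container/input.txt");
            FileSystem fs = FileSystem.get(path.toUri(), conf);
            InputStream in = fs.open(path);
            byte[] buf = new byte[4096];
            for (int n; (n = in.read(buf)) > 0; ) {
                System.out.write(buf, 0, n);
            }
            in.close();
        }
    }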

39 comments to Apache Hadoop over OpenStack Swift

  • sashank

    Hi,

    This is awesome. Can I see it in action somewhere?
    Can I download it and install it locally?

    Regards
    Sashank

  • David Gruzman

    Hi,
    All of it is available as open source on GitHub.
    Here is a branch of hadoop-common:
    https://github.com/Dazo-org/hadoop-common
    You will also need to recompile java-cloudfiles (the Java Swift client):
    https://github.com/Dazo-org/java-cloudfiles
    Please drop a few lines about your deployment scenario and we will try to suggest the easiest way to do it.
    With best regards,
    David Gruzman

  • sashank

    David,

    Basically I am trying to prototype the processing of weblogs (Apache logs) with Hadoop MapReduce jobs, the output of which I want to store in OpenStack Swift, and I am new to both.

    Scenario: the Hadoop MapReduce jobs would run on internal private servers, or could be outsourced to AWS Elastic MapReduce, but the output is to be stored in OpenStack Swift on internal servers, not in AWS's S3 storage.

    A question, maybe a dumb one: the OpenStack Compute overview datasheet says its ideal use case is running Hadoop jobs on OpenStack Compute. How do I run it? As I understand it, OpenStack Compute is for running and managing VMs; do I need to install Hadoop on one of those VMs?

    Thanks for your time.

    Regards,
    Sashank

  • David Gruzman

    Hi Sashank,

    Your questions are welcome.
    Hadoop can be deployed on any set of interconnected VMs, and if you already have OpenStack compute infrastructure it makes a lot of sense to run Hadoop there and connect it to your Swift.
    I do not think it is a good idea to process the data in AWS if the original logs are available internally and the processed results should be stored in the internal Swift.
    The two options I would consider are an internal cluster on dedicated machines or, indeed, Hadoop inside OpenStack compute.
    Usually a dedicated cluster is good for a lot of heavy processing, while virtualized clusters are better at the prototyping stage, when it is not clear how many resources are needed, how many machines will be involved, etc.
    Taking into account also that the Hadoop cluster will be mostly "stateless", with all data living in Swift, I think a transient Hadoop cluster inside OpenStack compute is the best option in your case.

    Please correct me if I misunderstood the requirements.

    Regards,
    David

  • Ramya

    Hi, I want to run Hadoop to directly process data stored in Swift (OpenStack). Can you please help me with guidelines for implementing it? Thanks in advance.

  • Ramya

    Hi,
    Regarding the scenario of Hadoop jobs processing data in Swift, can you please explain the configuration to be set in core-site.xml and hdfs-site.xml? Please do reply. Thanks in advance.

  • Ramya

    Hi David,
    I started installing hadoop-common after cloning the code from https://github.com/Dazo-org/hadoop-common, but it appears that some jars or some parts of the code (some classes) imported by many other classes are missing.
    The missing package is org.apache.hadoop.ipc.protobuf.

  • Constantine

    Hi, Ramya
    What branch did you check out?
    All our code is in branch-0.20-security-205.swift

  • Ramya

    Hi Constantine, I have a few doubts regarding the Hadoop-over-Swift scenario:
    1) In the Hadoop-over-Swift scenario, do the Hadoop nodes get the data from the Swift container directly for processing, or do they copy the data from the Swift container into HDFS?
    2) I guess there is no Hadoop-supported filesystem for Swift; can you please share how it is possible for Hadoop nodes to get and process input data from Swift without a Hadoop-supported filesystem?

    Please do help us understand the above points. Thanks in advance.

    • Constantine

      Hi Ramya,
      1) Hadoop will get the files directly from Swift, without HDFS. HDFS is not needed in this scenario.
      2) This is the whole point. We wrote a Hadoop filesystem implementation for Swift. Hadoop can use any filesystem if the proper interfaces are implemented, and we implemented those interfaces for Swift.
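
      In a nutshell, that means subclassing org.apache.hadoop.fs.FileSystem and filling in its abstract methods with Swift REST calls. The bare skeleton below only illustrates the interface contract (0.20/1.x era signatures); it is not the actual SwiftFileSystem source.

          import java.io.IOException;
          import java.net.URI;
          import org.apache.hadoop.fs.FSDataInputStream;
          import org.apache.hadoop.fs.FSDataOutputStream;
          import org.apache.hadoop.fs.FileStatus;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;
          import org.apache.hadoop.fs.permission.FsPermission;
          import org.apache.hadoop.util.Progressable;

          // Bare-bones sketch of the Hadoop FileSystem contract; every method
          // would be backed by Swift REST calls (GET/PUT/HEAD on objects).
          public class SketchSwiftFileSystem extends FileSystem {
              @Override public URI getUri() { return URI.create("swift://example:8080/"); }
              @Override public FSDataInputStream open(Path f, int bufferSize) throws IOException {
                  throw new IOException("sketch: issue a (ranged) GET on the object here");
              }
              @Override public FSDataOutputStream create(Path f, FsPermission perm, boolean overwrite,
                      int bufferSize, short replication, long blockSize, Progressable progress)
                      throws IOException {
                  throw new IOException("sketch: stream a PUT of the new object here");
              }
              @Override public FSDataOutputStream append(Path f, int bufferSize, Progressable progress)
                      throws IOException {
                  throw new IOException("sketch: optional; HBase needs it, plain MapReduce does not");
              }
              @Override public boolean rename(Path src, Path dst) throws IOException { return false; }
              @Override public boolean delete(Path f) throws IOException { return delete(f, false); }
              @Override public boolean delete(Path f, boolean recursive) throws IOException { return false; }
              @Override public FileStatus[] listStatus(Path f) throws IOException {
                  return new FileStatus[0]; // sketch: list container objects by prefix here
              }
              @Override public void setWorkingDirectory(Path dir) { /* sketch: remember dir */ }
              @Override public Path getWorkingDirectory() { return new Path("/"); }
              @Override public boolean mkdirs(Path f, FsPermission permission) throws IOException { return true; }
              @Override public FileStatus getFileStatus(Path f) throws IOException {
                  throw new IOException("sketch: HEAD the object for length and mtime here");
              }
          }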

  • Ramya

    Hi Constantine,
    Thank you so much for the reply; I'm really interested in the Hadoop+Swift scenario.
    I have cloned the branch-0.20-security-205.swift code. Being new to Hadoop, I request you to guide me in installing Hadoop from the branch you mentioned. Thanks in advance.

  • Ramya

    Hi Constantine, I installed Hadoop from the source code of the branch branch-0.20-security-205.swift successfully, and I'm able to work well with the HDFS filesystem. Now I want to configure Hadoop to point to a container of the Swift running on another node on port 8080. Please do help me with the configuration to put in the Hadoop conf files. Thanks in advance.

  • Ramya

    Hi Constantine,
    I installed Hadoop from the branch branch-0.20-security-205.swift successfully and I'm able to work well with the HDFS filesystem. Now I want to make my Hadoop node use the Swift container for its input; please help me configure the Hadoop conf files to make this happen.

  • Ramya

    Hi Constantine,
    I configured core-site.xml as below to make my Hadoop use Swift:

    <property>
      <name>fs.swift.userName</name>
      <value>root</value>
    </property>
    <property>
      <name>fs.swift.userPassword</name>
      <value>testpass</value>
    </property>
    <property>
      <name>fs.swift.authUrl</name>
      <value>https://10.200.50.206:8080/auth/v1.0</value>
    </property>
    <property>
      <name>fs.swift.accountName</name>
      <value>AUTH_system</value>
    </property>
    <property>
      <name>fs.swift.connectionTimeout</name>
      <value>5000</value>
    </property>

    and executed the command as below:
    bin/hadoop fs -fs swift://10.200.50.206:8080/auth/v1.0 -put /usr/input/input.txt input

    Here my Swift is running on 10.200.50.206:8080,
    but I'm facing a few exceptions, such as:
    com.rackspacecloud.client.cloudfiles.FilesAuthorizationException: You must be logged in
    and
    javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated

    Please do help me if I missed anything in the configuration in the Hadoop conf files, or if I'm wrong with the command. Thanks in advance.

  • Ankit

    I have doubts similar to those Ramya raised.
    Can someone provide me with some pointers on how to take the input from a Swift container?
    Any help regarding this will be appreciated.
    Thanks.

  • rohit

    Does the Hadoop filesystem implementation for Swift guarantee data locality, especially for MapReduce jobs?

    • Camuel Gilyadov

      Swift does not implement HDFS; Swift replaces HDFS. Locality is not achieved with the simple Hadoop-Swift integration, but we are working hard on a version that guarantees locality by spawning a ZeroVM-based in-situ compute cloud inside Swift.

  • Ramya

    Hi Camuel, I installed Hadoop using the branch suggested by Constantine, but I have many doubts regarding the configuration to be set in the conf files with respect to Swift, so I posted the configuration I had set in my previous comments. Please do help me in making Hadoop + Swift work successfully. Thanks in advance.

    Regards,
    Ramya

  • rohit

    Pardon my persistence, but if there is no data locality, then is there any advantage in running MR jobs on data stored in Swift? I would rather use a multithreaded program and get the job done.

    • Camuel Gilyadov

      I am not sure I understand the link between Hadoop and multithreaded programs, even without bringing Swift/HDFS into the picture. From my experience, if you can get the job done with a single multithreaded program, don't use Hadoop :)

      In any case, just Google the situation with Hadoop on AWS (Amazon): what we have done with the Hadoop-Swift integration is the same deal as the Hadoop-S3 integration you get on AWS.

      We are now working on a much tighter integration, with predicate pushdown into storage, which to the best of our knowledge is not offered even by Amazon at this time.

  • Hi, I have created slides about how to use your Swift fs plugin with Hadoop 1.0.3 and the newest java-cloudfiles, because I think it may help somebody experiment.
    I really appreciate you for the amazing idea and implementation.

    http://www.slideshare.net/xz911jp/2012-0908josugjeff

    If you have any comments or anything else to discuss, please feel free to contact me.

    • Raju Konduru

      Hi,

      I followed your tutorial on SlideShare. I downloaded:
      swift v1.6
      java-cloudfiles-master.zip (master branch)
      hadoop-1.1.1 (latest stable version)
      hadoop-fs-swift-master (https://github.com/Dazo-org/hadoop-common page not found)

      then I followed the same configuration, but I am getting the following error:
      hadoop@raju:/home/hadoop/hadoop-1.1.1$ bin/hadoop fs -ls /
      Exception in thread “main” java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.hadoop.fs.swift.SwiftFileSystem
      at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:859)
      at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1405)
      at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
      at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1429)
      at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
      at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)
      at org.apache.hadoop.fs.FsShell.init(FsShell.java:82)
      at org.apache.hadoop.fs.FsShell.run(FsShell.java:1745)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
      at org.apache.hadoop.fs.FsShell.main(FsShell.java:1895)
      Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.swift.SwiftFileSystem
      at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
      at java.security.AccessController.doPrivileged(Native Method)
      at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
      at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
      at java.lang.Class.forName0(Native Method)
      at java.lang.Class.forName(Class.java:247)
      at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:812)
      at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:857)
      … 10 more
      Can you please help me?

  • Ramya

    Hi
    I installed Hadoop from the branch branch-0.20-security-205.swift and am trying to interact with the Swift running on node 10.211.53.232 on port 8080, so I made changes to the Hadoop core-site.xml to point to Swift:

    <property>
      <name>fs.swift.userName</name>
      <value>root</value>
    </property>
    <property>
      <name>fs.swift.userPassword</name>
      <value>testpass</value>
    </property>
    <property>
      <name>fs.swift.authUrl</name>
      <value>https://10.211.53.232:8080/auth/v1.0</value>
    </property>
    <property>
      <name>fs.swift.accountName</name>
      <value>AUTH_system</value>
    </property>
    <property>
      <name>fs.swift.connectionTimeout</name>
      <value>5000</value>
    </property>

    and tried the command:
    bin/hadoop fs -fs swift://10.211.53.232:8080/auth/v1.0 -put /usr/input/core-site.xml input

    but it returned exceptions; please do help me if I'm wrong in any part of the configuration or the command.
    The exception is as below:
    fullPath: user/hadoop/input
    com.rackspacecloud.client.cloudfiles.FilesAuthorizationException: You must be logged in
    at com.rackspacecloud.client.cloudfiles.FilesClient.getObjectMetaData(FilesClient.java:2693)
    at org.apache.hadoop.fs.swift.FilesClientWrapper.getObjectMetaData(FilesClientWrapper.java:79)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy1.getObjectMetaData(Unknown Source)
    at org.apache.hadoop.fs.swift.SwiftFileSystem.getFileStatus(SwiftFileSystem.java:472)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:768)
    at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:355)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:211)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:176)
    at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1183)
    at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:130)
    at org.apache.hadoop.fs.FsShell.run(FsShell.java:1762)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.fs.FsShell.main(FsShell.java:1895)
    com.rackspacecloud.client.cloudfiles.FilesAuthorizationException: You must be logged in
    at com.rackspacecloud.client.cloudfiles.FilesClient.listObjectsStartingWith(FilesClient.java:610)
    at org.apache.hadoop.fs.swift.FilesClientWrapper.listObjectsStartingWith(FilesClientWrapper.java:110)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy1.listObjectsStartingWith(Unknown Source)
    at org.apache.hadoop.fs.swift.SwiftFileSystem.getFileStatus(SwiftFileSystem.java:478)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:768)
    at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:355)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:211)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:176)
    at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1183)
    at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:130)
    at org.apache.hadoop.fs.FsShell.run(FsShell.java:1762)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.fs.FsShell.main(FsShell.java:1895)
    fullPath: user/hadoop/input
    fullPath: user/hadoop/input
    com.rackspacecloud.client.cloudfiles.FilesAuthorizationException: You must be logged in
    at com.rackspacecloud.client.cloudfiles.FilesClient.getObjectMetaData(FilesClient.java:2693)
    at org.apache.hadoop.fs.swift.FilesClientWrapper.getObjectMetaData(FilesClientWrapper.java:79)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy1.getObjectMetaData(Unknown Source)
    at org.apache.hadoop.fs.swift.SwiftFileSystem.getFileStatus(SwiftFileSystem.java:472)
    at org.apache.hadoop.fs.swift.SwiftFileSystem.create(SwiftFileSystem.java:389)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:555)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:536)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:443)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:229)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:176)
    at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1183)
    at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:130)
    at org.apache.hadoop.fs.FsShell.run(FsShell.java:1762)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.fs.FsShell.main(FsShell.java:1895)
    com.rackspacecloud.client.cloudfiles.FilesAuthorizationException: You must be logged in
    at com.rackspacecloud.client.cloudfiles.FilesClient.listObjectsStartingWith(FilesClient.java:610)
    at org.apache.hadoop.fs.swift.FilesClientWrapper.listObjectsStartingWith(FilesClientWrapper.java:110)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy1.listObjectsStartingWith(Unknown Source)
    at org.apache.hadoop.fs.swift.SwiftFileSystem.getFileStatus(SwiftFileSystem.java:478)
    at org.apache.hadoop.fs.swift.SwiftFileSystem.create(SwiftFileSystem.java:389)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:555)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:536)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:443)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:229)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:176)
    at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1183)
    at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:130)
    at org.apache.hadoop.fs.FsShell.run(FsShell.java:1762)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.fs.FsShell.main(FsShell.java:1895)
    create: swift://10.233.53.206:8080/user/hadoop/input
    com.rackspacecloud.client.cloudfiles.FilesAuthorizationException: You must be logged in
    at com.rackspacecloud.client.cloudfiles.FilesClient.storeStreamedObject(FilesClient.java:2373)
    at org.apache.hadoop.fs.swift.FilesClientWrapper.storeStreamedObject(FilesClientWrapper.java:48)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy1.storeStreamedObject(Unknown Source)
    at org.apache.hadoop.fs.swift.SwiftFileSystem$SwiftFsOutputStream$1.run(SwiftFileSystem.java:157)
    com.rackspacecloud.client.cloudfiles.FilesAuthorizationException: You must be logged in
    at com.rackspacecloud.client.cloudfiles.FilesClient.createManifestObject(FilesClient.java:2024)
    at org.apache.hadoop.fs.swift.FilesClientWrapper.createManifestObject(FilesClientWrapper.java:186)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy1.createManifestObject(Unknown Source)
    at org.apache.hadoop.fs.swift.SwiftFileSystem$SwiftFsOutputStream.close(SwiftFileSystem.java:188)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
    at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:50)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:100)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:230)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:176)
    at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1183)
    at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:130)
    at org.apache.hadoop.fs.FsShell.run(FsShell.java:1762)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.fs.FsShell.main(FsShell.java:1895)
    put: null

    Thanks,
    Ramya

  • Hi, Ramya
    Your command to put the file is:
    bin/hadoop fs -fs swift://10.211.53.232:8080/auth/v1.0 -put /usr/input/core-site.xml input

    As 10.211.53.232:8080/auth/v1.0 is the URL for authentication, I think this is the problem.
    How about trying swift://10.211.53.232:8080/v1/AUTH_system (which is the path for the Swift account) in your case?
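
    Concretely, the corrected put command would then presumably read (untested):

    bin/hadoop fs -fs swift://10.211.53.232:8080/v1/AUTH_system -put /usr/input/core-site.xml input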

  • Ramya

    Hi Zheng,
    Thank you so much for the reply. I tried the path you suggested,
    swift://10.211.53.232:8080/v1/AUTH_system, but now I'm facing an issue starting the NameNode; the logs state: java.net.BindException: Problem binding to slave/10.211.53.232:8080 : Address already in use.
    My Swift node is also bound to the same node and port. Please help me resolve this issue.
    Thanks in advance.

    Regards,
    Ramya

  • Sriji

    Hi
    I have successfully configured Hadoop and Swift. I was able to list the objects of the Swift container using the command:
    bin/hadoop fs -ls /
    and was able to run copyToLocal, copyFromLocal, and many other fs commands successfully.

    But I'm facing a problem running a job on the data in the Swift container's object. I'm using the command:
    bin/hadoop jar wordcount.jar org.myorg.WordCount /hadoop/core-site.xml output2.txt
    The JobTracker logs state:
    2012-09-13 16:55:54,392 INFO org.apache.hadoop.mapred.JobTracker: problem cleaning system directory: swift://10.233.53.206:8081/tmp/hadoop-hduser/mapred/system
    java.io.FileNotFoundException: stat: swift://10.233.53.206:8081/tmp/hadoop-hduser/mapred/system: No such file or directory

    Please do help me to resolve this issue.

    Thanks and Regards
    Sriji

  • Sriji

    Hi
    I'm using the java-cloudfiles implementation library (https://github.com/Dazo-org/java-cloudfiles) itself, and I also manually tried to create that system directory, but it still returns the same issue when I try to run the MapReduce job.
    Please do verify the command I'm using and correct me if I'm wrong. The command is:

    bin/hadoop jar wordcount.jar org.myorg.WordCount /hadoop/core-site.xml /hadoop/output2.txt

    and when I try the command
    bin/hadoop fs -ls /hadoop, it successfully lists the objects of the hadoop container on the Swift node.

    Please do help me to solve this issue.

    Thanks in advance,
    Sriji

  • David Gruzman

    To Sriji: I do not think that core-site.xml should be passed as a regular parameter. Usually it is found in the standard location /etc/hadoop/conf or looked up on the classpath.
    If that does not help, please post the error you got.

  • Sriji

    Hi David,
    I tried the following command:

    bin/hadoop jar hadoop-examples-1.0.3.jar wordcount /swift_container/input.txt /swift_container/output.txt

    where
    hadoop-examples-1.0.3.jar is the jar
    wordcount is the class
    swift_container is my swift container
    input.txt is the input file(swift object)
    output.txt is the output file(swift object)

    Now my intention is to run wordcount on the data in input.txt and store the result in output.txt.

    But my JobTracker logs are showing:
    —–
    2012-09-19 01:00:24,996 INFO org.apache.hadoop.mapred.JobTracker: problem cleaning system directory: swift://10.233.53.206:8081/tmp/hadoop-hadoop/mapred/system
    java.io.FileNotFoundException: stat: swift://10.233.53.206:8081/tmp/hadoop-hadoop/mapred/system: No such file or directory
    at org.apache.hadoop.fs.swift.SwiftFileSystem.getFileStatus(SwiftFileSystem.java:487)
    at org.apache.hadoop.fs.swift.SwiftFileSystem.listStatus(SwiftFileSystem.java:516)
    at org.apache.hadoop.mapred.JobTracker.(JobTracker.java:2389)
    at org.apache.hadoop.mapred.JobTracker.(JobTracker.java:2192)
    at org.apache.hadoop.mapred.JobTracker.(JobTracker.java:2186)
    at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:300)
    at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:291)
    at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4978)
    —–
    and
    my swift node is on 10.233.53.206:8081

    Thanks in advance.

    Regards,
    Sriji

  • Issac Kelly

    In general we instrumented Hadoop to run over Swift *naively*, without resorting to S3 compatibility layer. This means it works with CloudFiles which misses the S3-compatibility layer.

    Should be *natively*

  • Issac Kelly

    Just as a heads up: your GitHub links need to be re-pointed at zerovm.

  • rk

    Hey guys, I have installed OpenStack Swift SAIO. I am unable to upload or download any objects on the OpenStack Swift server. Can anyone please help me build a web interface in order to download objects?


  • [...] on OpenStack Swift: experiments Some time has passed since our initial post on Hadoop over OpenStack Swift implementation. A couple of things have changed (Rackspace finally [...]


