17:33:05  * isaacstopic: Manta: Big Data Unix | http://apidocs.joyent.com/manta/ | http://logs.libuv.org/manta/latest
17:35:18  * nfitchquit (Quit: Leaving.)
17:36:05  * nfitchjoined
17:42:24  * bixuquit (Remote host closed the connection)
17:42:51  * bixujoined
17:51:47  * chorrelljoined
17:54:19  * chorrellquit (Read error: Connection reset by peer)
17:57:55  * yunongquit (Quit: Leaving.)
18:03:27  * chorrelljoined
18:06:01  * chorrellquit (Read error: Connection reset by peer)
18:06:28  * yunongjoined
18:09:42  * chorrelljoined
18:10:58  * fredk1quit (Quit: Leaving.)
18:10:58  * chorrellquit (Read error: Connection reset by peer)
18:10:58  * yunongquit (Client Quit)
18:11:36  * nfitchquit (Quit: Leaving.)
18:16:03  * chorrelljoined
18:17:34  * chorrellquit (Read error: Connection reset by peer)
18:21:58  * chorrelljoined
18:24:13  * chorrellquit (Read error: Connection reset by peer)
18:25:45  * yunongjoined
18:25:48  * yunongquit (Client Quit)
18:27:59  * chorrelljoined
18:29:14  * chorrellquit (Read error: Connection reset by peer)
18:30:45  * yunongjoined
18:30:45  * yunongquit (Client Quit)
18:36:03  * chorrelljoined
18:37:00  * yunongjoined
18:37:32  * yunongquit (Client Quit)
18:39:43  * chorrellquit (Read error: Connection reset by peer)
18:41:30  * fredkjoined
18:42:56  * AvianFlujoined
18:43:16  * mamashpart
18:44:29  * mamashjoined
18:44:47  * chorrelljoined
18:46:58  * chorrellquit (Read error: Connection reset by peer)
18:49:25  * fredkquit (Ping timeout: 268 seconds)
18:51:11  * chorrelljoined
18:52:19  * chorrellquit (Read error: Connection reset by peer)
18:57:14  * chorrelljoined
18:58:58  * nfitchjoined
18:58:58  * chorrellquit (Read error: Connection reset by peer)
19:03:42  * chorrelljoined
19:04:56  * chorrellquit (Read error: Connection reset by peer)
19:08:42  <mcavage>to anybody lurking, here's the next up blog post on "using node_modules" in manta: http://mcavage.me/blog/2013/07/19/using-node-modules-in-manta/
19:09:56  * chorrelljoined
19:10:22  <nahamu>nice
19:12:19  * chorrellquit (Read error: Connection reset by peer)
19:13:07  <nahamu>implementation detail question: do you guys use ZFS compression on the pools storing the objects?
19:14:15  <nahamu>I'd expect those 1GB files of tweets to compress pretty well...
19:14:58  <mcavage>yes, we do.
19:15:45  <mcavage>that's why I don't compress anything, pretty much ever -- it's performance-wise faster to just let ZFS compress than to muck with gzip/bzip2.
19:16:08  * fredkjoined
19:16:23  * chorrelljoined
19:16:43  <nahamu>is the billing still based on the uncompressed sizes?
19:17:40  <mcavage>yeah, it is. this is one of those "we're going to figure out what we want to do" after the data is in for a while. I.e., either we make it explicit and cheaper for *you* or just amortize it and make it cheaper for *everybody* depending on what "real world" usage comes in at after a while.
19:17:54  <nahamu>(if someone wanted to store 1PB of highly compressible data, would they save money gzip'ing it?)
19:18:12  <nahamu>that's an interesting point.
19:18:13  <mcavage>which is why we haven't explicitly said one way or the other anything about this (besides, only people who know how ZFS works know to ask anyway ;) )
19:18:54  <mcavage>but yes right now you would pay less if you pre-zipped.
19:18:56  <nahamu>charging everyone less per byte because your costs are lower thanks to compression does make sense.
19:19:23  <mcavage>but, you'll pay more for compute time on it, since you've got to now uncompress it on a "premium" cpu.
19:19:44  <nahamu>right. space-time tradeoff made quite clear. :)
19:19:59  <mcavage>;)
19:21:25  * chorrellquit (Read error: Connection reset by peer)
19:21:49  <nahamu>so for people doing bulk storage and rare access (and perhaps no compute jobs) compressing saves money, but if you're "querying" it a lot, letting ZFS handle it saves CPU time and could conceivably save money.
19:22:54  <nahamu>Might be interesting to run the numbers, but I'm not at that scale, so I'll leave it as an exercise for someone for whom it's a real question. :)
19:23:20  <mcavage>yeah, if it's "cold data", by all means, compress it ;)
19:23:38  <mcavage>i think the answer is really case by case, since there's obviously a breaking point.
19:23:48  <mcavage>or tipping point, however you want to say it.
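A rough way to "run the numbers" on the tipping point discussed above, as a minimal sketch only. It assumes storage is billed on the size as uploaded (per the exchange above) and that a pre-compressed object costs extra compute time to decompress on every job. All function names, parameters, and prices here are hypothetical and are not Manta's actual pricing.

```python
def monthly_cost(stored_gb, price_per_gb_month, compute_seconds, price_per_compute_second):
    """Storage plus compute for one month, under the hypothetical prices passed in."""
    return stored_gb * price_per_gb_month + compute_seconds * price_per_compute_second


def cheaper_to_precompress(raw_gb, ratio, price_per_gb_month,
                           jobs_per_month, decompress_seconds_per_job,
                           price_per_compute_second):
    """Return True if gzip'ing before upload is cheaper than uploading raw.

    ratio is compressed size / raw size (e.g. 0.25 for 4:1 compression).
    Uploading raw means ZFS compresses transparently, but the bill is still
    for the raw size and jobs pay nothing extra to decompress.
    """
    raw_cost = monthly_cost(raw_gb, price_per_gb_month, 0, price_per_compute_second)
    packed_cost = monthly_cost(
        raw_gb * ratio,
        price_per_gb_month,
        jobs_per_month * decompress_seconds_per_job,  # extra CPU time per month
        price_per_compute_second,
    )
    return packed_cost < raw_cost
```

With made-up prices, cold data (zero jobs per month) always favors pre-compressing, while data that is queried often enough eventually flips the answer, which is the case-by-case tipping point mentioned above.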
19:24:53  * fredkquit (Ping timeout: 248 seconds)
19:26:28  * chorrelljoined
19:28:02  <nahamu>when you create a reduce job with no map step, does Manta have to mount the objects into zones to stream them off, or can it just request them from the object store system and stream them into the reduce job?
19:28:02  * chorrellquit (Read error: Connection reset by peer)
19:28:29  <nahamu>nevermind, your job does have a map step
19:28:34  <nahamu>(the example in the blog post)
19:29:25  <nahamu>is there some trickery somewhere to get a zone with both the twitter 1GB file and your tarball?
19:32:42  * chorrelljoined
19:34:46  * chorrellquit (Read error: Connection reset by peer)
19:38:53  * chorrelljoined
19:42:24  * chorrellquit (Read error: Connection reset by peer)
19:47:21  * chorrelljoined
19:51:15  * fredkjoined
19:54:02  * chorrellquit (Read error: Connection reset by peer)
19:59:15  <nahamu>'Assets will be downloaded into the compute instance before any "init" script is run (and so before any "exec" script is run as well).' (http://apidocs.joyent.com/manta/jobs-reference.html)
19:59:33  * fredkquit (Ping timeout: 264 seconds)
20:04:21  * yunongjoined
20:06:38  * fredkjoined
20:08:11  * yunongquit (Client Quit)
20:09:26  * fredkquit (Client Quit)
20:10:14  * AvianFlu_joined
20:10:47  * AvianFluquit (Remote host closed the connection)
20:10:47  * AvianFlu_quit (Remote host closed the connection)
20:12:27  * nfitchquit (Quit: Leaving.)
20:12:41  * AvianFlujoined
20:14:09  * yunongjoined
20:14:30  * yunongquit (Client Quit)
20:22:35  <mcavage>nahamu: what do you mean?
20:22:41  <mcavage>"is there some trickery somewhere to get a zone with both the twitter 1GB file and your tarball?"
20:23:46  * AvianFluquit (Remote host closed the connection)
20:27:28  <nahamu>the 1GB file, having been passed in on stdin, is the file that's at rest on the server where the zone gets launched, and it's the file that gets hyperlofs mounted in.
20:28:02  <nahamu>the other asset has to be downloaded into the zone from wherever it lives (likely "elsewhere").
20:28:02  <mcavage>correct - i'm just confused what you're asking about :)
20:28:06  <mcavage>yes.
20:28:17  <mcavage>assume assets always have to get "brought in"
20:28:23  <mcavage>so you want them to be "smallish"
20:28:28  <nahamu>right.
20:29:01  <nahamu>I think I understand now.
20:29:29  <nahamu>until I found the relevant line in the docs I was confused how you'd hyperlofs in two files that could be on different servers.
20:29:35  <mcavage>ahh
20:29:40  <mcavage>yeah assets aren't hyperlofs'd
20:29:44  <mcavage>assets are just pulled over.
20:30:01  <mcavage>fairly low tech, but effective :)
20:30:03  <nahamu>before or after the clock starts ticking on the billing for the zone time?
20:30:25  <mcavage>I believe before, but I don't remember 100%: dap?
20:31:31  <nahamu>not important
20:32:09  * CarlosCquit (Read error: Connection reset by peer)
20:32:11  <nahamu>I did have one other question though, before I noticed the map step, it occurred to me to wonder what happens if you only added a reduce step.
20:32:21  * CarlosCjoined
20:32:37  <nahamu>would it effectively just mget all the objects and stream them into the reducer?
20:33:02  <mcavage>so mfind /... | mjob create -r ... ?
20:33:08  <nahamu>yeah
20:33:27  <mcavage>yes that would just stream them all in onto stdin
20:33:43  <mcavage>in a non-deterministic order ;)
20:33:49  <nahamu>of course.
20:36:10  <dap>The assets are brought in after the clock starts ticking, I believe.
20:36:29  <mcavage>k, so i was wrong.
20:38:08  <nahamu>all the more reason to keep them small.
20:38:15  <dap>Actually, I'd like to move the asset-downloading code into the lackey. It's really intended to be sugar.
20:38:54  <dap>(The lackey is the code that executes your script. It's part of the system, but running more explicitly on your behalf.)
20:40:46  * yunongjoined
20:41:32  * yunongquit (Client Quit)
21:09:24  * saxbyjoined
21:09:43  <bixu>Just saw a weird thing from 'mlogin' - session cleaned itself up and exited (I didn't give 'mlogin' any arguments) - is that normal?
21:10:11  <dap>bixu: That usually means there was an error. It spits out a jobid — try "mjob errors $jobid"
21:12:06  <bixu>dap: I see this from the error output object: server did not Upgrade
21:12:21  <dap>Huh. LeftWing: ^
21:12:38  <bixu>LeftWing: You're welcome.
21:13:27  <bixu>Job UUID was 0caeb2a7-7483-42c6-a386-a85d64a09a29
21:23:54  * konobipart
21:26:36  <LeftWing>wuh oh
21:27:17  * AvianFlujoined
21:43:56  * yunong1joined
21:44:24  * yunong1quit (Client Quit)
21:52:38  * papertigersquit (Quit: papertigers)
21:52:45  * yunongjoined
21:53:03  * yunongquit (Client Quit)
21:55:31  * mamashpart
22:00:56  * mamashjoined
22:01:00  * papertigersjoined
22:04:22  * chorrelljoined
22:05:33  * cburroughsquit (Ping timeout: 264 seconds)
22:15:29  * ryancnelsonjoined
22:16:22  * yunong1joined
22:16:31  * yunong1quit (Client Quit)
22:16:45  * papertigersquit (Quit: papertigers)
22:17:09  * ghostbarquit (Remote host closed the connection)
22:24:26  * papertigersjoined
22:34:17  * papertigers_joined
22:35:05  * papertigersquit (Ping timeout: 246 seconds)
22:35:05  * papertigers_changed nick to papertigers
22:46:45  * fredkjoined
22:49:21  * yunongjoined
22:49:40  * yunongquit (Client Quit)
22:50:19  * fredkquit (Client Quit)
22:59:48  * papertigersquit (Read error: Operation timed out)
23:13:28  * ghostbarjoined
23:15:05  * yunongjoined
23:15:18  * yunongquit (Client Quit)
23:20:48  * fredkjoined
23:25:07  * yunongjoined
23:25:53  * yunongquit (Client Quit)
23:26:15  <bixu>About using md5 checksums with mput...
23:26:22  <bixu>Can I expect this to work? mput -H 'content-md5: $(md5 ./ten.file)' /$MANTA_USER/stor/backups/postgres/manta.test/ten.file
23:27:03  <rmustacc>IIRC, you should.
23:27:22  <bixu>rmustacc: I was seeing the command appear to hang.
23:27:23  <rmustacc>Adding a -f for the file.
23:27:38  <rmustacc>Probably expecting to read from stdin per no file specified.
23:27:45  <bixu>Haha - oops.
23:28:11  * CarlosCquit (Quit: Leaving.)
23:28:36  * ghostbarquit (Remote host closed the connection)
23:28:53  * ghostbarjoined
23:29:11  <ryancnelson>oh, you're adding that header so we'll serve it up later? your application wants a content-md5: header?
23:29:32  <bixu>ryancnelson: Maybe I'm misunderstanding what I read.
23:29:33  <mcavage>also - that md5 won't match.
23:29:43  <mcavage>the md5 in HTTP is base64
23:29:46  <mcavage>the openssl command is hex
23:29:47  <mcavage>1s
23:29:49  <bixu>mcavage: Ah.
23:29:53  <mcavage>but yes manta will validate for you.
23:30:07  <bixu>mcavage: Yes, that's what I want to do - ask for validation.
23:30:15  <mcavage>yeah 1s
23:30:16  <LeftWing>Do we have a flag to do the MD5 for -f?
23:30:22  * fredkquit (Ping timeout: 256 seconds)
23:32:12  <mcavage>bixu:
23:32:15  <mcavage>Marks-MacBook-Pro:node-manta mcavage$ mput -f README.md -H content-md5:$(cat README.md | openssl md5 -binary | openssl enc -base64) /$MANTA_USER/stor
23:32:15  <mcavage>/mark.cavage/stor/README.md [=====================================================================>] 100% 2.73KB
23:32:15  <mcavage>Marks-MacBook-Pro:node-manta mcavage$
23:32:31  <bixu>mcavage: Perfect.
23:32:37  <mcavage>i mean, kind of ;)
23:32:47  <mcavage>it's fugly. but everything CLI wants hex, and HTTP wants base64
23:33:05  <mcavage>Marks-MacBook-Pro:node-manta mcavage$ mput -f README.md -H content-md5:$(cat README.md | openssl md5) /$MANTA_USER/stor
23:33:05  <mcavage>/mark.cavage/stor/README.md [=====================================================================>] 100% 2.73KB
23:33:05  <mcavage>/mark.cavage/stor/README.md [=====================================================================>] 100% 2.73KB
23:33:05  <mcavage>/mark.cavage/stor/README.md [=====================================================================>] 100% 2.73KB
23:33:07  <mcavage>/mark.cavage/stor/README.md [=====================================================================>] 100% 2.73KB
23:33:09  <mcavage>mput: ContentMD5MismatchError: Content-MD5 expected d8e9a91f26bd379fc219d9073ac8965d, but was 2OmpHya9N5/CGdkHOsiWXQ==
23:33:11  <mcavage>Marks-MacBook-Pro:node-manta mcavage$
23:33:13  <mcavage>if you make it do the "wrong thing":
23:33:14  <bixu>Yes - I noticed that.
23:33:15  <mcavage>^^
23:33:23  <mcavage>so it will retry 3x b/c md5 failure is a valid reason to retry.
23:33:25  <mcavage>ok, great.
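The two values in the ContentMD5MismatchError above appear to be the same 16-byte digest in two encodings: hex, as printed by command-line tools, and base64, as used in the HTTP header. A minimal Python sketch of the conversion follows; the helper names are illustrative and are not part of node-manta.

```python
import base64
import binascii
import hashlib


def hex_to_content_md5(hex_digest: str) -> str:
    """Convert a hex MD5 digest into the base64 form used by Content-MD5."""
    return base64.b64encode(binascii.unhexlify(hex_digest)).decode("ascii")


def content_md5(data: bytes) -> str:
    """Compute the base64 Content-MD5 value for a byte string directly."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


# Values copied from the error message in the log above.
HEX_FROM_LOG = "d8e9a91f26bd379fc219d9073ac8965d"
BASE64_FROM_LOG = "2OmpHya9N5/CGdkHOsiWXQ=="

# The hex string and the base64 string decode to the same raw digest.
assert hex_to_content_md5(HEX_FROM_LOG) == BASE64_FROM_LOG
```

This is the same transformation as the `openssl md5 -binary | openssl enc -base64` pipeline shown earlier: take the raw digest bytes, then base64-encode them.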
23:35:13  <bixu>Thanks. Will incorporate this into https://github.com/wanelo/manta-backup
23:36:30  <mcavage>sweet -- btw, is the intent to check for "manta bitrot" or just that "it changed via something else"?
23:36:40  <mcavage>if the former, do the job. if the latter - there's mmd5
23:36:45  <mcavage>(which does output hex)
23:37:02  <mcavage>you can always HEAD /$you/stor/$object and get the MD5 that corresponds.
23:37:06  <mcavage>bixu: ^^
23:37:09  <bixu>mcavage: I'm using something like mmd5 to see if I have already uploaded the file.
23:37:24  <mcavage>oh ok
23:37:28  <LeftWing>You should use preconditions!
23:37:37  <bixu>LeftWing: Fork me.
23:37:44  <mcavage>I didn't look at the code, I just read the readme ;)
23:37:53  <bixu>D:
23:38:33  <bixu>This is just a by-product of me finding shell script rat kings in the basement of our virtual infrastructure.
23:38:55  <mcavage>yeah you could replace this ->
23:39:00  <mcavage>if [ $(md5sum $2 | awk '{print $1}') == $(echo -n "$targetdir/$file" | mjob create -q -o md5sum 2> /dev/null | awk '{print $1}') ]; then
23:39:00  <mcavage> echo "[$log_date] $targetdir/$file already exists and has the same checksum as $2. Skipping..."
23:39:00  <mcavage> exit 0
23:39:01  <mcavage>fi
23:39:07  <mcavage>with mmd5 $targetdir/$file - fwiw.
23:39:15  <bixu>Yup. I'm doing that now.
23:39:17  <mcavage>unless you don't trust manta, in which case, go ahead and do that.
23:39:39  <bixu>I'm not trusting Manta when I check to see if I should upload the file or not.
23:39:43  <bixu>But that's it.
23:39:55  <mcavage>yeah, mmd5 will be *a lot* faster.
23:40:00  <bixu>Will do that.
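The "skip the upload if the checksum already matches" check discussed above could be sketched in Python roughly as follows. This assumes the caller has already obtained the remote object's MD5 as a hex string (for example from mmd5, which is said above to output hex); how that string is fetched and parsed is left out, and both function names are hypothetical.

```python
import hashlib


def file_md5_hex(path, chunk_size=1 << 20):
    """Hex MD5 of a local file, read in chunks so large backups fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def already_uploaded(local_path, remote_md5_hex):
    """True if the remote hex digest matches the local file's digest.

    An empty or missing remote digest is treated as "not uploaded yet".
    """
    if not remote_md5_hex:
        return False
    return file_md5_hex(local_path) == remote_md5_hex.strip().lower()
```

Note that this compares hex to hex. If the remote value comes from an HTTP Content-MD5 header instead, it would be base64 and would need converting first.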
23:40:10  * mamashpart
23:41:07  <mcavage>k, i have to cut out, hope that helped.
23:41:43  <bixu>Very much. Thanks!
23:42:04  <bixu>Another building spared my fiery wrath.
23:45:43  <mcavage>haha
23:46:29  * mcavagequit (Remote host closed the connection)
23:55:22  * trentmquit (Quit: Leaving.)
23:56:42  * fredkjoined
23:57:52  * chorrellquit (Quit: My MacBook Pro has gone to sleep. ZZZzzz…)
23:59:12  * AvianFlu_joined
23:59:35  * AvianFluquit (Remote host closed the connection)
23:59:36  * AvianFlu_quit (Remote host closed the connection)