00:14:39  * seldo_ joined
00:19:04  * seldo_ quit (Ping timeout: 240 seconds)
01:20:00  * ed209 quit (Remote host closed the connection)
01:20:08  * ed209 joined
01:26:10  * ryancnelson joined
01:32:48  * utlemming quit (Remote host closed the connection)
02:12:58  * utlemming joined
03:54:02  * yruss972 joined
04:18:11  * ringzero joined
05:17:57  * seldo_ joined
05:22:33  * seldo_ quit (Ping timeout: 240 seconds)
05:28:39  * D4nthr4x joined
05:31:50  * D4nthr4x_ quit (Ping timeout: 244 seconds)
05:33:48  * marsell quit (Quit: marsell)
05:48:59  * ryancnelson quit (Quit: Leaving.)
06:27:38  * D4nthr4x quit (Ping timeout: 244 seconds)
07:03:22  * yruss972_ joined
07:06:05  * yruss972 quit (Ping timeout: 240 seconds)
07:08:13  * yruss972_ quit (Ping timeout: 272 seconds)
08:02:38  * bixu joined
08:14:31  * bixu quit (Remote host closed the connection)
08:31:09  * yruss972 joined
08:43:48  * bixu joined
09:34:09  * ringzero quit
09:46:12  * marsell joined
10:09:15  * bixu quit (Remote host closed the connection)
10:09:44  * bixu joined
10:13:57  * bixu quit (Ping timeout: 245 seconds)
10:20:00  * ed209 quit (Remote host closed the connection)
10:20:08  * ed209 joined
10:22:05  * seldo_ joined
10:26:24  * seldo_ quit (Ping timeout: 255 seconds)
11:48:53  * bixu joined
14:30:45  <nahamu>another idea: manta cron
14:31:19  <nahamu>method for running jobs in Manta on a schedule without needing a client system to be up to request the jobs
14:31:41  <nahamu>(bonus: regularly scheduled revenue... ;)
15:06:01  <bixu>mcron even
15:06:08  <bixu>i would kill for this
15:06:42  <nahamu>well, you can certainly implement a version of it running on a regular server / zone.
15:06:59  <nahamu>but it would be cool for it to just live directly in the JPC.
15:07:07  <nahamu>*in Manta
15:07:37  <nahamu>bixu: what would you use it for?
15:08:09  <bixu>reporting mostly
15:09:17  <tjfontaine>which is more compelling cron or triggers?
15:09:44  <bixu>probably cron, because i can make my own triggers in the cronjob
15:09:58  <bixu>although i suppose i could trigger something based on the current time - haha
15:10:30  <tjfontaine>well, I mean cron is usually 'hey poll to see if there's new work for me to do' - wouldn't it be easier to just trigger when you know there is new work?
15:10:52  <bixu>possibly
15:11:16  <bixu>the cron metaphor is nice in the sense that it feels like *nix
15:11:41  <bixu>(even though under the hood it would not be like traditional cron, i imagine)
15:11:50  <rmustacc>The nice thing is that by solving triggers, you end up solving a lot of the issues with cron (assuming a timesource is a trigger)
15:12:09  <rmustacc>The challenge with both comes around what are the failure semantics, retry semantics, etc.
15:13:00  <bixu>rmustacc: I would be perfectly happy with triggers assuming that I could trigger by time.
15:13:44  <bixu>and i think i see your point about issues with cron (as it's often used as a polling mechanism when we really want triggered behavior based on some contextual change)
15:15:11  <rmustacc>I'd say sometimes time is the appropriate interval. eg. nightly reports. So that's where, to me, cron makes more sense.
15:17:42  <nahamu>What does Joyent use to run reporting?
15:18:07  <nahamu>Is it built into some of the service daemons, kicked off by cron somewhere?
15:19:23  <bixu>i suppose you could do both...
15:19:43  <bixu>i don't know what you'd need to do to implement, so i'm not sure which path makes most sense
15:20:32  <nahamu>I'd imagined a "trigger" as being fired when a new object gets created inside a given directory
15:21:24  <nahamu>because that would correspond to the user hitting the API
15:22:20  <nahamu>But having something that on a regular schedule checks for job definitions in a certain spot could be cool.
15:25:37  <nahamu>Oh!
15:26:20  <nahamu>Okay, if mjob could schedule a job for a specific time rather than for immediate execution
15:27:14  <nahamu>And if there was some mechanism for a final phase of the job that always runs even in the presence of failures elsewhere
15:27:25  <nahamu>You could implement the equivalent of cron on top of that
15:28:30  <nahamu>In my head it's the python "finally" from "try: thing; except: handle errors; finally: no matter whether things work or not, always do this"
15:28:34  * fredk joined
15:30:28  <nahamu>so "mjob create -m <map1> -m <map2> -r <reduce> --finally <final phases> --run-at=<timespec>" where that finally would contain an invocation of mjob create
15:30:36  <nahamu>(I sense a quine)
15:31:46  <nahamu>a bit of syntactic sugar around that could create a "run this at time t and at the end schedule another run to be run at t+n"
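Editor's note: the "run at time t, then always schedule the next run at t+n" idea can be sketched in Python, the language nahamu borrows "finally" from. This is only a simulation under stated assumptions: `--finally` and `--run-at` are flags proposed in the discussion, not existing mjob options, and `run_recurring`, `flaky_report`, and the integer timestamps are hypothetical names invented for illustration. No real waiting or Manta call takes place.

```python
import heapq


def run_recurring(job, start, interval, until):
    """Simulate 'run at t, and in a finally phase schedule the run at t+n'."""
    queue = [start]  # pending run times, a stand-in for the proposed --run-at
    log = []
    while queue:
        t = heapq.heappop(queue)
        try:
            job(t)
            log.append((t, "ok"))
        except Exception as exc:
            log.append((t, "failed: %s" % exc))
        finally:
            # The "finally" phase: reschedule whether or not the job succeeded.
            if t + interval <= until:
                heapq.heappush(queue, t + interval)
    return log


def flaky_report(t):
    """Hypothetical job that fails on one particular run."""
    if t == 60:
        raise RuntimeError("boom")


history = run_recurring(flaky_report, start=0, interval=60, until=180)
print(history)
```

The failed run at t=60 does not break the chain, which is the property the "finally" phase is meant to provide.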
15:33:40  <nahamu>Separately, a ~~/stor/foo/bar/.trigger containing the specification for a map job to be run whenever new objects are created in ~~/stor/foo/bar could be easily looked up by the system whenever new objects are uploaded.
15:34:38  <nahamu>and of course, I'm not the first to think of that latter one.
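Editor's note: the `.trigger` lookup described above might look roughly like the following. It is a sketch of the idea only; Manta has no such feature in this log, and `find_trigger` plus the in-memory `stored_objects` dictionary are hypothetical stand-ins for whatever the storage layer would actually consult on upload.

```python
import posixpath


def find_trigger(object_path, stored_objects):
    """Return the path of the .trigger spec governing object_path, or None."""
    if posixpath.basename(object_path) == ".trigger":
        return None  # uploading the spec itself should not fire it
    directory = posixpath.dirname(object_path)
    candidate = posixpath.join(directory, ".trigger")
    return candidate if candidate in stored_objects else None


# Hypothetical store: one directory has a trigger spec, the other does not.
stored_objects = {"~~/stor/foo/bar/.trigger": "wc -l"}

hit = find_trigger("~~/stor/foo/bar/new.log", stored_objects)
miss = find_trigger("~~/stor/foo/other/new.log", stored_objects)
print(hit, miss)
```

Because the spec lives at a fixed name inside the directory, the lookup is a single key check per upload rather than a scan.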
15:35:32  <nahamu>separately, that "finally" phase might be really interesting for error reporting.
15:59:20  * seldo_ joined
16:00:13  * dap_ joined
16:10:43  * ryancnelson joined
16:16:34  * bixu quit (Remote host closed the connection)
16:18:29  * ringzero joined
16:19:27  * bixu joined
16:20:30  * trentm joined
16:21:58  * dap_ quit (Quit: Leaving.)
16:26:12  * dap_ joined
16:27:29  * seldo_ quit (Remote host closed the connection)
16:27:36  * trentm quit (Ping timeout: 240 seconds)
16:30:29  <bixu>Maybe this is documented somewhere, but if I store a gzipped and uncompressed file in manta do i get billed differently?
16:31:25  <nahamu>you get billed based on the size of the file
16:31:56  <nahamu>if Joyent happen to be using compression behind the scenes you get the benefit in performance improvements, not it saving money on file size.
16:32:03  <nahamu>*in
16:32:03  * seldo_ joined
16:32:15  <nahamu>so if you're storing lots of data, compressing it yourself will save you money
16:32:22  <nahamu>(at least, as far as I understand)
16:32:25  <bixu>k
16:32:49  * trentm joined
16:34:27  <nahamu>hopefully someone from Joyent will chime in if I'm wrong
16:35:32  <nahamu>I don't know that ZFS easily exposes how many bytes on disk you're using after compression.
16:36:53  <bixu>You wouldn't be able to see that from within Manta anyway I think.
16:37:03  <nahamu>I'm of course assuming that you meant if I have file A of size 2MB and I could either store it raw, or store a 1MB gzipped version, will I save money by uploading the gzipped version rather than the raw version
16:37:36  <nahamu>if you have a 1MB raw file A and a 1MB gzipped file B, I'm pretty sure you pay the same for both ;)
16:41:07  <ryancnelson>as i understand it, that's true.
16:42:28  <dap_>Yeah, billing is based on object size. I'm not sure where the docs are that talk about it. Of course, if you compute on that data, you could wind up spending more in compute time to decompress it than you save by not storing it compressed. Depends on the situation.
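Editor's note: dap_'s trade-off (storage saved by compressing versus compute spent decompressing) is simple arithmetic. The sketch below uses made-up round numbers purely to show the shape of the calculation; none of the prices, ratios, or timings come from Joyent, and `monthly_saving` is a hypothetical helper.

```python
def monthly_saving(raw_gb, compressed_ratio, storage_price_per_gb,
                   decompress_seconds, compute_price_per_second,
                   runs_per_month):
    """Net monthly saving from storing data compressed (negative means a loss)."""
    stored_gb = raw_gb * compressed_ratio
    storage_saved = (raw_gb - stored_gb) * storage_price_per_gb
    compute_spent = decompress_seconds * compute_price_per_second * runs_per_month
    return storage_saved - compute_spent


# Illustrative placeholder values only, in arbitrary currency units.
rarely_read = monthly_saving(100, 0.5, 1.0, 10, 0.5, 4)
often_read = monthly_saving(100, 0.5, 1.0, 10, 0.5, 20)
print(rarely_read, often_read)
```

With these placeholder inputs, data read a few times a month comes out ahead when compressed, while data decompressed on every frequent job comes out behind, which is the "depends on the situation" point.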
16:45:09  * trentm quit (Quit: Leaving.)
16:45:40  <dap_>ZFS can show you bytes used on disk after compression, but there's nothing in the API to show that today.
16:46:11  <rmustacc>It's also worth noting that the data isn't synchronous with respect to write(2).
16:46:13  <dap_>I believe you can get it from stat(2), but only after the file is at least written.
16:54:46  * trentm joined
16:55:13  <nahamu>dap_: on a per file basis?
16:55:23  <nahamu>ah, stat
16:55:37  <nahamu>cool
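Editor's note: on a local Unix filesystem, the per-file figure dap_ mentions is visible through the `st_blocks` field of stat(2). The sketch below compares it with the logical size; `logical_and_allocated` is a name made up here, and none of this is exposed through the Manta API, as stated above. As rmustacc points out, the allocated figure can lag behind the write, so the value printed right after writing may not be final.

```python
import os
import tempfile


def logical_and_allocated(path):
    """Return (logical size in bytes, bytes allocated on disk) for path."""
    st = os.stat(path)
    # st_blocks counts 512-byte units on Unix; it is not available on Windows.
    return st.st_size, st.st_blocks * 512


# Usage: write a highly compressible file, flush it, then compare the sizes.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"a" * 4096)
    handle.flush()
    os.fsync(handle.fileno())
    sample_path = handle.name

logical, allocated = logical_and_allocated(sample_path)
os.remove(sample_path)
print(logical, allocated)
```

On a filesystem with transparent compression the second number can be smaller than the first; on others it is usually the logical size rounded up to whole blocks.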
17:05:18  * dap_ quit (Quit: Leaving.)
17:09:14  * yruss972 quit
17:15:15  * ryancnelson1 joined
17:16:21  * ryancnelson quit (Ping timeout: 255 seconds)
17:39:14  * dap_ joined
17:44:42  * seldo_ quit (Remote host closed the connection)
17:56:59  * seldo_ joined
18:03:55  * arash joined
18:04:54  <arash>hi, can anybody please help me on this command? mfind -t o /user/stor/BLUED/ | grep '_001.txt.bz2$' | mjob create -w -m 'bzcat $MANTA_INPUT_FILE | head -n 24 | tail -n 1 | cut -d"," -f1'
18:06:20  <bixu>i'm actually working with arash on this - we aren't seeing any output - i think i'm not understanding something about the reduce phase here
18:13:53  <nahamu>bixu: what reduce phase?
18:14:01  <trentm>arash: I tried something similar as follows (my /trent.mick/public/tmp/foo.txt.bz2 is a bzip'd file with the lowercase letters of the alphabet, one per line):
18:14:01  <trentm>$ echo /trent.mick/public/tmp/foo.txt.bz2 | mjob create -w -m 'bzcat $MANTA_INPUT_FILE | head -n 24 | tail -n 1'
18:14:01  <trentm>229b4d9f-fbc5-c0fa-f57b-b4807d876bfd
18:14:01  <trentm>added 1 input to 229b4d9f-fbc5-c0fa-f57b-b4807d876bfd
18:14:01  <trentm>$ mjob outputs 229b4d9f-fbc5-c0fa-f57b-b4807d876bfd
18:14:02  <trentm>/trent.mick/jobs/229b4d9f-fbc5-c0fa-f57b-b4807d876bfd/stor/trent.mick/public/tmp/foo.txt.bz2.0.a997efe3-1031-487b-a007-f43877608c05
18:14:03  <trentm>$ mget /trent.mick/jobs/229b4d9f-fbc5-c0fa-f57b-b4807d876bfd/stor/trent.mick/public/tmp/foo.txt.bz2.0.a997efe3-1031-487b-a007-f43877608c05
18:14:03  <trentm>x
18:14:04  <trentm>Which is as expected.
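Editor's note: for readers unsure what the map phase computes, here is a rough Python equivalent of `bzcat | head -n 24 | tail -n 1 | cut -d"," -f1`, run against the same kind of input trentm describes (lowercase letters, one per line). The function name `line24_first_field` is hypothetical, and this is a local illustration, not how Manta executes the phase.

```python
import bz2
import string


def line24_first_field(compressed):
    """Mimic: bzcat | head -n 24 | tail -n 1 | cut -d"," -f1."""
    text = bz2.decompress(compressed).decode()
    lines = text.splitlines()[:24]  # head -n 24
    if not lines:
        return ""
    last = lines[-1]                # tail -n 1
    return last.split(",")[0]       # cut -d"," -f1


# Same shape as the test file in the log: the alphabet, one letter per line.
alphabet = bz2.compress(("\n".join(string.ascii_lowercase) + "\n").encode())
result = line24_first_field(alphabet)
print(result)
```

The 24th letter is "x", matching the output trentm got from `mget`.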
18:15:14  <trentm>also, what nahamu said.
18:15:14  <trentm>Perhaps you want to also have: `mjob create ... -r cat` ?
18:15:23  <bixu>trentm: we just tried that
18:15:26  <trentm>also perhaps `mjob create ... -o` to have it dump the job outputs?
18:16:17  <nahamu>you need both the "-r cat" and the "-o" if you expect to see the results on the stdout of the mjob invocation.
18:16:23  <bixu>makes sense
18:16:37  <nahamu>otherwise just go look at the outputs of the previously run jobs.
18:17:15  <arash>this is the last job: mfind -t o /smartb/stor/BLUED/ | grep '_001.txt.bz2$' | mjob create -w -m -o 'bzcat $MANTA_INPUT_FILE | head -n 24 | tail -n 1 | cut -d"," -f1' -r cat
18:18:02  <trentm>the order of args isn't correct there
18:18:16  <trentm>the '-m' and the 'bzcat ...' need to be together
18:18:33  <bixu>makes sense
18:18:39  <bixu>-m == 'map phase'
18:18:53  <bixu>bingo
18:18:56  <arash>great, worked
18:20:26  <trentm>cool
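Editor's note: the argument-order problem can be reproduced with Python's standard `getopt` module. This assumes mjob parses its options in the usual POSIX getopt style, which the log does not confirm; the option string `"wom:r:"` is a hypothetical reduction to the four flags used here, and the "fixed" ordering is one plausible correction, since the exact command that worked is not shown.

```python
import getopt

# Hypothetical option string: -w and -o are plain flags,
# -m and -r each take one argument (the phase command).
SHORTOPTS = "wom:r:"

MAP_CMD = 'bzcat $MANTA_INPUT_FILE | head -n 24 | tail -n 1 | cut -d"," -f1'

# As typed at 18:17:15: -m is immediately followed by -o.
broken = ["-w", "-m", "-o", MAP_CMD, "-r", "cat"]
# One plausible fix: keep -m next to its command string.
fixed = ["-w", "-o", "-m", MAP_CMD, "-r", "cat"]

broken_opts, broken_rest = getopt.getopt(broken, SHORTOPTS)
fixed_opts, fixed_rest = getopt.getopt(fixed, SHORTOPTS)

print(broken_opts, broken_rest)
print(fixed_opts, fixed_rest)
```

In the broken ordering the parser takes "-o" as the map command and stops at the real pipeline, so "-r cat" is never seen as an option. In the fixed ordering every flag is recognized and nothing is left over.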
18:29:05  * arash quit (Remote host closed the connection)
18:29:41  * arash joined
18:29:44  <bixu>thx :D
18:29:53  * bixu quit (Remote host closed the connection)
18:30:23  * bixu joined
18:34:01  * arash quit (Ping timeout: 264 seconds)
18:34:36  * bixu quit (Ping timeout: 240 seconds)
18:49:22  * seldo_ quit (Remote host closed the connection)
18:50:58  * seldo_ joined
18:55:12  * seldo_ quit (Ping timeout: 245 seconds)
18:56:35  * seldo_ joined
19:15:12  * chorrell joined
20:06:04  * chorrell quit (Quit: Textual IRC Client: www.textualapp.com)
20:15:50  * _AvianFlu_ joined
20:20:00  * ed209 quit (Remote host closed the connection)
20:20:08  * ed209 joined
20:34:46  * _AvianFlu_ quit (Remote host closed the connection)
22:34:12  * seldo_ quit (Remote host closed the connection)
22:34:42  * seldo_ joined
22:41:28  * fredk quit (Read error: No route to host)
22:41:30  * fredk1 joined
22:44:30  * seldo_ quit (Remote host closed the connection)
22:47:34  * seldo_ joined
22:49:13  * fredk joined
22:49:13  * fredk1 quit (Read error: Connection reset by peer)
23:17:04  * seldo_ quit (Remote host closed the connection)
23:17:23  * seldo_ joined
23:44:59  * seldo_ quit (Remote host closed the connection)
23:53:56  * seldo_ joined