004 – UC Berkeley Cloud Computing Meetup 004 (June 25, 2019)



Welcome! Sometimes we take a pause so people can introduce themselves to their neighbor, but we've been talking for about half an hour, so I won't do that. I did want to do a quick poll — we do the poll every time, so this is a data science project (small data). How many of you are here on the academic side of the university, or are academics? How many are in IT? All right, and then how many of you are visiting from outside the university proper, or are local? Many of you have not actually ever been to SkyDeck before, so I want to give Gordon, who is a representative from SkyDeck and a program manager here, a little bit of time to talk about Berkeley's accelerator and incubator. Thank you, Bill. Hi everyone, my name is Gordon Peng,
so what we are at SkyDeck is a university startup accelerator. We bring in about 20 core teams to accelerate through a process, a cyclical process of investment from a public-private partnership fund, which lets this public institution bring Berkeley's talent and network into a monetization strategy. So we have invested in about 20 teams, and we have another 100 or so hot-desk teams as well — those are secondary accelerated companies that also come through our process, come through our organization, and are able to take advantage of what we have to offer, ranging from meetings with advisors like Bill here, and some of the bootcamp-style classes that we have to de-risk their startups, to fireside chats and general community events. Does anybody have any questions? What kind of projects do you sponsor? For the startups, we are actually industry agnostic, so we have startups that are biopharma, hardware, software — we run everything, essentially. Well, if you know of any world-changing startups that have the potential to contribute back to Berkeley, you're very, very welcome to come talk to us. Great, so thank you, Gordon. So, without further ado, Owen, I will turn it over to you, and then… Hi, my name is Owen McGrath, I'm from Educational Technology Services on
campus, and Bill put it pretty well: I'm going to tell a story about our dabbling in virtual computing. It did involve the big cloud — Amazon, as you'll see in a minute — but we eventually settled on the local private cloud, and that happens to be AEoD. I refer to Jason as the architect rock star of AEoD, so I'm the warm-up band (laughter), and later you'll get to hear from Jason. (pause) So instructional computing is actually one of the oldest IT services on campus — it goes way back, there's lots of history there — and you can imagine that computing in classes has been a big deal. The ongoing services and facilities for that sort of had their heyday probably twenty years ago, and there are just a few of them left; they're very specialized nowadays. If you're still going to run an in-person physical facility, it's for some specific reason — like we have one geared towards the music department, where they do digital music and teach people how to do synthetic music composition in this beautiful facility that has all kinds of music-related equipment; we have another one that's now geared towards 3-D design, maker, AR/VR design, that sort of thing; and then one towards math and
science. But one of our biggest facilities was at Tolman Hall, there was
actually kind of three labs in one and it had a particular attraction to
graduate programs in public health, psychology and the School of Education
which is housed there. And so around 2016 those tenants of Tolman Hall knew that their new building, which is now completed, Berkeley Way West, wasn't going to have any computer lab at all. And so that was the motivation to start asking what we were going to do. For some classes it wasn't a big deal, but there was a core set of faculty who were very concerned about this, and so we were in dialogue with them.
We've been tracking virtual computing solutions forever, and you know there's sort of a sine wave to how that goes — some were promising. But now we had a reason to go back and look at the marketplace, and it turned out that, as I'll tell you in a minute, there were some good products to look at. The short version of this story is that they were very motivated, and that's a great position to be in when you're trying to do a service transition — we weren't going to them saying we need to move you; they knew that. And this is Tolman Hall today (laughter). So that's one story I want to talk about. But
another one comes from some work that I do with my colleague Anne Marie. This is actually a campus-wide initiative funded by the students through the Student Technology Fund across many years. This is where the students understood the many IT gaps that they faced — craters, really — like there was no student help desk for computing, that kind of stuff. And so they funded us to look at those and address those kinds of issues across the past several years, with a wide-ranging focus: the help desk, which was an early success and is in the Moffitt undergraduate library; a technical component, where we're looking at computing needs in general and how to match them up, not just for day-to-day computing but for the 21st-century curriculum, data science for instance; and then outreach — anyone who's familiar with Berkeley knows that sort of the tragedy of this place is that there are all these resources and you don't find out about them until your graduating semester. And I want to credit Bill for this, because early on in this
Student Computing at Cal initiative, Bill gave us some ideas about how to think about it. We were wrestling, I think it's fair to say — early on we had a very traditional approach to desktop imaging, like could we just get everyone, the library and ETS, working on that — and Bill kind of broke that apart for us. This is just one new way of thinking about it, but it's the Strategyzer role, from Value Proposition Design. And I like that, because I think it really characterizes what we've done:
we've been very iterative, very ethnographic — we go out and observe and talk to students constantly, and it allows us to pivot a lot; we pivot all the time in this project — but I think we're spending their money well and we are delivering outcomes. And one of those, a gap that emerged really early on, is that even if students get the right computing, the right printing, and all the knowledge they need, there are still these really intriguing software gaps, where they're going into a course and there's some very exotic, expensive piece of software they need for that semester, or maybe even just for a few weeks. Surprisingly, there are some gaps there that we're also looking at, so that's the other story. And so you look across a whole range of things, but virtual environments keep coming up. So the first part of this, a real first pilot, was with those classes that were
going to lose Tolman Hall, and this was back in fall 2017. What was nice is we still had the existing facility, so there was a safety factor for them while they tried out some virtual environments. We approached Amazon. They had a new round of their online services, a new version of Workspaces, which looked pretty good, and best of all they agreed to subsidize our pilot for that semester. If you ever look at Amazon Workspaces or AppStream you'll find out that Cornell is their showcase campus, and so we had heard a lot about that — I'd seen some presentations by them at conferences, and they definitely jumped out ahead. They also started with Workspaces but then moved to AppStream, which I'm not going to talk about so much today, but the story kind of resonates. This is a
very well-resourced school, by the way — they don't necessarily have the same… and they have a lot of… it's a residential college, so they probably have labs — but they were finding, especially in engineering courses, that there was a disparity around high-end engineering simulation software. Part of it was just making sure everyone had access to it — platform issues — but there was also a need to democratize, to make sure that no one had an edge over anyone else, and so they partnered with Amazon early on and did some amazing stuff there. For our pilot, though, we weren't necessarily looking at engineering-college scale, hundreds or thousands of students; again we were
focusing on some of these graduate-level courses. As an example, in the School of Education these are the core courses — I'd say they're above intro — and then higher-level courses for graduate students. These are students who need these; they're required courses in some cases, and this is kind of a basic toolset of statistical analysis and inference they'll need in their graduate study. For some of them these are going to be the tools they'll be using later on in projects. We were particularly working with Sophia — this is Berkeley, right, so she didn't just write the book, she wrote several of the books; she's a real international leader in this field — which kind of raised the stakes for us: this had better succeed, because there was no bluffing her. And so
we had that fall to use Amazon Workspaces, all subsidized, as I mentioned. Quick takeaways: it worked incredibly well, and again we were dialing it up. They have this sort of always-on model where a student goes into a browser and it's literally like having a laptop in the sky, and it would start up almost as quickly as a real-life laptop. Their environment is their personal desktop, with their files — the state is saved, so it's just as if you woke up your laptop. Very impressive. We didn't have time to integrate it with campus systems, but at this small scale that didn't matter so much. It was clear to us that this would meet the range of needs around the statistical software. Clearly there are some things you wouldn't want to do on there, like the AR/VR stuff and the heavy design things that might require some sort of physical output, but it was very successful. And then at the end we started saying, wait, what would that have cost us? I want to be neutral on this, but I think anybody who has Amazon experience knows that, at least when I looked at this a year ago, the billing is inscrutable,
so we never did get a good answer on that, and I'll just leave it at that. At the same time, though, it just so happened that my organization and Research IT had joined together. I had heard about this thing called AEoD and it sounded really cool. I met Jason right around the time that we merged; Jason shared a paper about it, we got to looking at it more closely, and suddenly realized, wow, this has a lot of features we're looking for. I think the first pilot helped us understand better what we would ask for, and sure enough Jason was game to try this out. So, in kind of a beautiful experimental fashion, we had the same courses coming back in the spring — different students but the same instructors, just as motivated, the clock still ticking — and we said, hey, try this other thing too. And so we worked with Jason, and this meant something very different: this is not a subscription model where you're just turning it on. Now we were doing a little more work — we had to understand AEoD, we put a support model behind it, we even had to start talking about renting some hardware in the data center — but at the same time there was this amazing community that we could tap into, and I think he's going to talk about that later. This had integration, which was really key
for us — now we weren't having to create weird little accounts; students were able to use their CalNet IDs. Moving data around is a big deal, especially at the graduate level and higher, where the data sets get bigger and the data itself is actually a focus. You might carry some data across courses and into a project at the master's or doctorate level. So it was really cool to be able to tap Google Drive, which they're familiar with, and have that sort of show up on their desktop; that was a way for them to get their data in and out of the environments. Again, it's just like having a laptop in the sky — really amazing performance once you're logged in. So, some of the lessons learned. Again,
it actually worked better, according to the students. I don't have any scientific metrics, but the instructors and the TAs who came across both semesters said the performance was actually better, which is fascinating. Having that integration was key for us. Cost: I don't have a cost comparison, but I would say, as someone who runs an organization, it sure is nice to have some control and predictability around cost, which I didn't feel like I had at all with Amazon. By that I mean, for the people who bring me through the budget year and support this, you know what it costs to rent some blades in the data center, and you can predict what would happen if a whole bunch of people suddenly got interested in this — how you would scale that up. At the same time, we still continue
to track the commercial services, but here in 2019 we're going into our third year of using AEoD. There are new players all the time — this company called Apporto is specifically going after the academic computing market, and we have some colleagues at UC Irvine moving in a big way towards Apporto, so we're kind of watching how they figure all that out — but I have to say it would take a lot to win us away. The other thing about AEoD, which Jason will talk about, is that it comes from the research computing realm: this is a toolset and environment that researchers were already using, and it's fascinating to see the different populations. There are folks in Environmental Design in there doing their thing, the business school is doing amazing things which I'm sure they'll talk about, and so particularly for our customers — you think about graduate students as they move on in their careers — we love the idea that this could be the environment they take into their dissertation project and beyond, and then something even the faculty find interesting. Yeah, just back to that Strategyzer role in Student Computing at Cal:
I think we're turning back to look at the needs. We have some big proposals in about addressing some of the computing needs for students at Berkeley, and we're certainly looking at how to provide just-in-time access to certain applications right when students need them, because — I could go on — we have food scarcity at Berkeley, and we have computing scarcity. We have second-year students taking computer science courses who don't own a computer. Students suddenly find out they need to pay $100 to get ArcGIS for two weeks, so they skip it, they don't get the software. We want to look at those things and study carefully what students are doing. I do have to run, but I can answer some questions, and I'm sure Jason can probably answer any of my questions too. Any questions? Who and how many people do you have doing this ethnography thing — you know,
talking to students, following them, understanding what they
need? You and what army? Yeah, a couple of generals, but we do have some
staff funded through Student Computing at Cal, and I think we built up that methodology over time. We do surveys, of course, but what's beautiful is we've been doing these pilots every semester and we tend to tap students we have direct access to. Both Anne Marie and I have large student workforces, and there's nothing like being able to get direct access to somebody while they're on shift and start talking to them about why they do what they do. Yeah, that's the best part. We also built some partnerships with different groups on campus so that we heard from very targeted groups: we were meeting with the EOP office and learning more about students in the Educational Opportunity Program, and we met with the transfer student and student parent programs so we could hear what's different for transfer students who are only here for two years — a couple of different approaches like that where we just kind of sent our students out there to build partnerships… And student government, actually — they were really big allies this year; there were a couple of student senators in various positions who cared about student computing and cared about this kind of technology equity issue, and they did a lot of our work for us, which is terrific — working with us on our surveys, willing to include our questions on their surveys — so it's very crowd-sourced. So how many workstations did you spin up, say, last semester? It's around 100; it's still the core courses. Public health comes in and out of this as well, because they also have
that same loss of the physical site. We've actually not advertised this much yet — I need to make sure we don't outstrip resources, and some of the transitions around our services factor in there too — so we haven't gone big on it. So 100 workstations — how does that look from a user point of view, how many students is that serving? It's one-to-one. Okay, so it's serving 100 students. And when you said that the AWS billing was inscrutable, did you get anything that looked like a dollar amount? Kevin Chan worked on that, and he tried for months to come up with that cost. We even raised this with them — and frankly they were open to it; it was not the first time they had heard about their billing issues — but we never did actually get it down to a per-hour or per-student cost, so that's why I'm obviously… So it was lumped together for a bunch of workstations? Yeah. So there's like a monthly part and then a utilization part? Or do you really just not know? They were not able to provide much on utilization, so it was more lump sums. Amazon probably watches these a… Anything else? Owen, what is the cost model for AEoD? Yeah, that's the thing — a lot of those costs are sunk or hidden, right.
Jason's already…. So that's a good point, and that's a scale issue too. I think we're catching a time when our leadership is trying to see what services make sense, and they're willing to float these for a while — could this be a common-good service? But the AEoD community has a model that does require not just our financial contribution but brain share and expertise, which is really fascinating as well, and I'm sure they'll talk about that. So, hi. My name is Maryana. I'm currently a postdoc at UCSF and I work in neuroscience, in a neuropathology lab. My background is in computer science and engineering, and my work is actually computer vision and image processing. And I'm here because I'm very familiar with cloud computing. We do have some big data sets, and I'm going to talk about our pipeline and the problems
that we had. So it's basically about using computer vision at large scale for a neurodegenerative problem, Alzheimer's disease. What are we trying to do? We want to map a protein called Tau and how it spreads in the brain. Basically we have real human brains, we have a computer vision pipeline, and we try to create maps, quantify those maps, and create 3-D visualizations. The way we work is we get patients' donated brains — we actually have their in vivo MRI and CAT scans — and the brains get processed: they go through some chemical processing, and when they're ready they go through slicing and sectioning and are finally mounted on glass slides. These are huge slides, like five to six inches. What we then have to do is image them on our imaging platform — we have an in-house-built whole-slide scanner and the software that controls it — and when you scan one of those slides you get somewhere around 10 to 20 gigabytes of raw data, and one single brain can have around 800 slides, so it's terabytes of raw data per brain. The pipeline has to process all the slides, and those images are big: when they go through stitching you get roughly a 10-gigabyte image, about eight thousand by fifty thousand pixels,
so you need special computer vision approaches just to handle images like that. That's how the images look, and if you can see here, these tiny black dots are the Tau inclusions — those are the things we're trying to segment out of the image. Currently we run most of our pipeline partly on Windows workstations and partly on the cluster at UCSF, but we don't have GPUs on the cluster right now — no public access to GPUs — so we end up doing a lot of things on our workstations. In the image processing pipeline, the stitching runs on the cluster; I end up with a low-resolution version of each image and a very high-resolution version, which is about 10 gigabytes. I do some preprocessing, like automated background removal, and then comes the registration. What we want to do there is take the slices and bring them back to the 3-D space where they used to belong, because I want to be able to compare the histology data to the MRI or CAT scan, which were the original modalities. For that we have something called blockface imaging — that's this image here — pictures of the cutting block taken during slicing, which get put together into a 3-D volume of the brain. This helps because, when we do all the cutting and staining and mounting, you get a lot of artifacts and lose a lot of the original shape information, so the blockface volume is a 3-D template of the original brain shape that preserves the geometry and gives useful reference images. Once you've done the low-resolution processing, you move on to the high-resolution image, and
that's where the core of the pipeline is: we're trying to find these Tau proteins, which means segmentation, and that's the part that really wants a GPU — which is what we don't have. This is our training and labeling procedure: we have to make manual labels, so we use a semi-automated tool from Fiji, the imaging software, where you get an initial segmentation, the user corrects a selection of regions, you retrain, and you iteratively accumulate corrected labels that can then be used for training the networks. And that's the kind of result you get from the segmentation. Our networks are based on standard segmentation architectures, and this is the result of the segmentation inference here. From these masks we can actually compute heat maps — essentially spatial density maps of Tau — and that's the kind of result you see here: the brighter the color, the more Tau you have in the tissue. And once you get these heat maps ready,
you want to align them to the MRI or PET scan, so you do the registration between the MRI and the blockface volume, which is this image here. Then you take your heat map, overlay it, and you can do a direct comparison between the MRI and the heat map. This is the result you get from the heat map processing: this is the original histology and this is the heat map — if you can't see it, there's really high density here, in the brighter colors. We have two whole brains like this, and that's the kind of result you get; you can literally lay the result next to the patient's prior PET scan.
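A minimal sketch of how a density heat map like this could be computed from a binary inclusion mask is shown below; the block size and normalization are assumptions for illustration, not the lab's actual parameters.

```python
# Illustration only: turn a binary Tau-inclusion mask into a coarse density
# heat map by measuring the fraction of positive pixels in each block.
import numpy as np

def density_heatmap(mask, block=256):
    """Fraction of Tau-positive pixels in each block x block region."""
    h, w = mask.shape
    # trim so the mask divides evenly into blocks
    mask = mask[:h - h % block, :w - w % block]
    blocks = mask.reshape(mask.shape[0] // block, block,
                          mask.shape[1] // block, block)
    return blocks.mean(axis=(1, 3))   # brighter value = more Tau in that region

# toy example: a random mask standing in for a segmented histology slide
mask = (np.random.rand(4096, 3072) > 0.995).astype(np.float32)
heat = density_heatmap(mask)
print(heat.shape, heat.max())
```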
The problem is that this pipeline has a lot of manual steps, but those steps are actually not that time-consuming; ninety percent of the project time is the running of a few core parts of the pipeline. One is the stitching, where I have to stitch all those roughly 20 gigabytes of tiles into a single big image. The other is tiling, because I have these huge images, so the pipeline breaks them into smaller and smaller images for the segmentation. And since we don't have the GPUs on the cluster right now, I download everything that's ready back to my workstation, run the segmentation on the workstation GPUs, send the results back to the cluster, and then finish the pipeline: run the other modules that compute the quantifications and create the visualizations, and bring those back to the workstation, where the registration and presentation steps happen. So yeah, 90% of our time is spent dispatching like that: sending files to the cluster, stitching, getting files back, doing something, sending them back to the cluster, running the pipeline, bringing them back for segmentation, sending the results to the cluster again — and I think we could improve that.
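A hypothetical sketch of that tile-and-segment step — cutting a whole-slide image into overlapping tiles, segmenting each tile, and writing the results back into a full-size mask, with mirror padding to soften the edge effects mentioned in the Q&A below. The tile size, overlap, and stand-in segmentation function are assumptions, not the lab's actual code.

```python
# Illustration only: tile a huge image so each piece fits on a GPU, segment
# each tile, and reassemble the full-resolution mask.
import numpy as np

TILE = 1024      # tile size fed to the model (assumed)
OVERLAP = 64     # halo kept around each tile to reduce edge effects (assumed)

def segment_tile(tile):
    # Stand-in for the neural-network inference call; a simple threshold
    # pretends to find dark Tau inclusions in a grayscale tile.
    return (tile < 0.2).astype(np.uint8)

def segment_whole_slide(image):
    h, w = image.shape
    padded = np.pad(image, OVERLAP, mode="reflect")   # mirror the borders
    mask = np.zeros((h, w), dtype=np.uint8)
    for y in range(0, h, TILE):
        for x in range(0, w, TILE):
            th = min(TILE, h - y)
            tw = min(TILE, w - x)
            tile = padded[y:y + th + 2 * OVERLAP, x:x + tw + 2 * OVERLAP]
            tile_mask = segment_tile(tile)
            # keep only the central region, dropping the halo
            mask[y:y + th, x:x + tw] = tile_mask[OVERLAP:OVERLAP + th,
                                                 OVERLAP:OVERLAP + tw]
    return mask

# toy example: a small random "slide" instead of a tens-of-gigabytes image
slide = np.random.rand(3000, 2000)
print(segment_whole_slide(slide).sum(), "pixels flagged as Tau")
```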
One option I've actually considered is maybe using AWS or something, but we didn't go that way, mostly because of cost, and also because the cluster is available to everybody
at no extra cost to us. To give you the numbers here: for one single slice I'm going to have around 20 gigabytes of raw data, which is 100 or 200 files. Once we have the stitching, we're going to have the extra stitched file, which I keep because if I want to rerun the pipeline I don't have to run everything again. Then I'm going to have around two more gigabytes, 12 gigabytes for other derived files, and 12 gigabytes for masks for background and gray matter — the segmentation only needs to run on what isn't background, so we detect the background and just don't work on it. So it's more or less 100 gigabytes per slide across the full set of resolutions. And even though each slide can be stained for something different, we currently have precisely 28,524 slides, which is about 52 terabytes of data, and we only purchased a hundred terabytes of storage, so we're freeing up space all the time. We had our shared storage just running out of space, and we had to go into the workstations, free up some space, bring back the data we needed, work, and then send all the data back — it was very complicated. With this volume of storage you also have to factor in the time it takes to move these files; it's a lot of effort — sometimes it takes one day just to transfer your files, and you can't really work because you're transferring — and sometimes something happens, like you lose your connection. One big problem we're seeing is data management in general, for instance versioning of datasets:
sometimes we scan something, do the processing, and then it has to be rescanned, so I rescan and then it's, where's the older dataset, which one is the new one and which is the old one? Right now everything's kept in simple shared folders, so we don't have any really good tools for data management. We also don't have a really good tool for data transfer — I think at LBL they use Globus, and when I asked them about it I was like, I wish we had something similar to that. I'm just using my proxy right now. And… are there any questions? So how long does the data transfer take? Is it like weeks?
It's actually not that long — say it'd be two or three days for the whole thing. Everything runs in parallel, so you just grab everything, put like a hundred jobs on the list, and let it run; the bottleneck to me comes from the transfer part. Do you have ways of prioritizing jobs? I worked at a place before where, if someone had a paper due, they kind of got priority. Oh yeah — I don't even have those mechanisms right now. What it would look like is they have paying users, somebody who has bought into the cluster or something, where they have some high priority, or a bigger number of slots. We are not paying users right now… Do you have access to something called Globus? No, that is something they have at LBL; we don't have it. Like a UC license? They have some licensing there. I asked IT and they had some licensing issues with Globus. I would give it a second try! So Berkeley has Globus, but it sounds like UCSF doesn't? Or maybe it's many different realities across Berkeley? Yeah — UCSF doesn't; at least, my department doesn't. …One more question: you are transferring the files just to do the segmentation on the workstation? Yeah. So there is no way of doing segmentation on the cluster? Well, only using CPUs instead of the GPUs, but that's going to be so slow that I might as well just transfer the files. Where is this cluster? At UCSF. There are some initiatives for having GPUs; the thing is, to get a GPU on the cluster right now you have to buy your own server and GPUs and send them in, and that's kind of out of our budget, so we're just using the ones we have locally. So, Globus comes with a
lot of features, and the features that you want for her data transfers are free for anybody in the world. You don't need to have a license just for transferring files from one location to another using Globus. Beyond data transfer, Globus provides other features like sharing capabilities and data repository capabilities, and for those you need to have a license. But anywhere in the world you can just go to globus.org, make an account, and they have instructions on how to bring your laptop into Globus, or your cluster into Globus. Once you do that, then… Um, for some reason I thought your institution has to have some relationship with Globus, right? So if you want those additional licensed features, like the sharing capability and the repository capabilities, then yes, your institution needs to have it; but just for data transfer capabilities you don't need any institutional license. Anybody can just log on to globus.org and make an account just for transferring data. I'm going to check that! Is this something that a user can do, or do you need help from IT? You don't need support from IT; it's all designed for users to do on their own. Krishna — when you log in, it will let you log in through your federated institution login, but if UCSF isn't one of those listed, she can still
create a personal account. Yes — so don't get thrown off if UCSF doesn't show up in the list of login organizations. Use your gmail.com or yahoo.com email, or your ucsf.edu email address — use that particular email address as the login — and you can create an account on globus.org and start using their free services.
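For scripted transfers, a minimal sketch using the globus-sdk Python package and its documented native-app login flow might look like the following; the client ID, endpoint IDs, and paths below are placeholders, not real values.

```python
# Illustration only: submit an asynchronous, recursive Globus folder transfer.
import globus_sdk

CLIENT_ID = "YOUR-NATIVE-APP-CLIENT-ID"    # registered at developers.globus.org
SRC_ENDPOINT = "UUID-OF-LAB-WORKSTATION"   # e.g. a Globus Connect Personal endpoint
DST_ENDPOINT = "UUID-OF-CLUSTER-ENDPOINT"

# Log in interactively and obtain a transfer token
client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
client.oauth2_start_flow()
print("Log in at:", client.oauth2_get_authorize_url())
auth_code = input("Paste the authorization code here: ").strip()
tokens = client.oauth2_exchange_code_for_tokens(auth_code)
transfer_token = tokens.by_resource_server["transfer.api.globus.org"]["access_token"]

# Submit the transfer; Globus handles retries and checksums in the background
tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(transfer_token))
tdata = globus_sdk.TransferData(tc, SRC_ENDPOINT, DST_ENDPOINT,
                                label="histology slides to cluster")
tdata.add_item("/data/brain01/slides/", "/scratch/brain01/slides/", recursive=True)
task = tc.submit_transfer(tdata)
print("Submitted transfer, task id:", task["task_id"])
```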
I think you mentioned you had processed two brains — how many brains would you like to process if your pipeline were faster? We'd like to have at least six, because there's a staging system for this pathology — it's called Braak staging. Braak zero, for instance, is our control, a completely normal person, and Braak 1 is somebody who already has Alzheimer's changes; I think by the time things show up you're already running into Braak 2 or Braak 3. And that's the problem: by the time you get diagnosed, your brain is already really damaged, so we'd like to see how it progresses — you want to be able to predict. You mentioned that you have the data to … [unintelligible] Yeah, I know you might have issues. Why? Not so much; I mainly have issues when you run the neural network at the edges of a tile, so I do a little padding — replicate or mirror the image, run the segmentation, and just crop it back to what it was… Do you collaborate with any faculty at UC Berkeley for this work? No, for this work, no. If you are collaborating with any UC Berkeley faculty you can get a computing allotment on the Savio cluster, which has GPUs. This is the kind of work we want to do on Savio. Yeah. [Jason] Well, there's no follow-up to that kind
of incredible presentation, but I'll give it a shot. So, Owen talked about his investigation into the need for virtual machines, mostly looking at the instruction side. What I'm going to talk about a little bit is a computational service for researchers — at least that was the original ambition; it has grown into serving researchers and instruction. I'll give you a little bit of background on research computing at Berkeley and where our shared services for research computing came from, really with the intent to provoke ideas and illustrate where we're headed with it. So around five years ago a number of
faculty got together on campus and as the story was told to me went forward to
the Vice Chancellor of Research and said we need support for
computation intensive work on campus. Some of our colleagues, faculty
colleagues get that support from Lawrence Berkeley Lab but others don’t
have that kind of support we’re working in social sciences, digital humanities
and so forth and so we’re actually losing grant funding because we can’t
demonstrate that we have institutional support for computing. Curiously I had
been at Berkeley for 10 years before that — at Public Policy, where I worked on some grant-funded research — and decided to leave in part because it didn't seem like there was a lot of support in the central IT group for research computing, which I had moved more into. So, ironically, right after I left, these faculty submitted the letter, and the Vice Chancellor for Research, the CIO Larry Conrad, and the Chancellor agreed to put funding in to create Research IT and Berkeley Research Computing. Immediately one of the first steps was to reach out to Lawrence
Berkeley Lab and the partners there to establish a high performance computing service. But because of the nature, I think, of the original request, it was always clear that we were going to need to provide some kind of
computational platforms for folks who will not be working in traditional
research computing or high performance computing. And it was also going to be
really important to provide a program of consultation and research facilitation
to reach out to different groups on campus and help researchers figure out what's the right fit for their compute needs — the right compute resources, storage, and networking.
And so thus Berkeley Research Computing was born, with the HPC condo cluster being established. For cloud — consulting around cloud — we knew we needed to provide something in the cloud arena at the time, but we weren't really sure exactly what that should be, and ultimately we decided not to try to run any sort of private local cloud for research computation; instead we would try to develop expertise in advising about the cloud, so that researchers could at least ask the right questions before they jumped in. That has kind of become merged with our consulting practice, which we regard as the center of what we do. And then the final part, and the best part, is Analytics Environments on Demand. That was always envisioned, and the purpose of my hire was to develop a computational service for folks who are not in the sciences, who need some other kind of compute modality, likely one with which they are already familiar — because it's a really hard thing to tell someone, forget what you're doing now with your technology, we're going to move you over here; I'm going to tell a funny story about that in a minute. And so we had to get that going, and
it was sort of the last service to the group, and we said we had to do it fast. I was charged with figuring out what that was and getting it available as soon as possible, and Analytics Environments on Demand is the idea of what we would do. I took a look around: there were tons of resources available for working with Linux in different capacities, but no resources available on campus for doing anything computational with Windows. And the reality is that a lot of digital humanists and social scientists work with Windows, and as their fields become more computationally intense they have wanted to ramp up their Windows experience. So what happens is you start running a data analytics job on your laptop and, I don't know what's happening, it's taking three days to finish — is there a better way? And so thus Analytics Environments on Demand was born. Here's what it is — I keep this kind of
slide up to date — Analytics Environments on Demand in a nutshell. It's a virtual Windows-based research desktop in support of computationally intensive research; that's one definition of it. Another definition is: an IT service supported by campus partners, the consulting practice, and a working group. And the third way to think of it is: it's an allocation program that provides compute resources to a department or project in exchange for time from a technical partner in that place — some of that person's time — because it is really clear that even if we could provide a service for these folks, we don't have the staffing and scalability to support a thousand or two thousand users across campus. So we had to come up with a service model that would create staffing out of what we did have, and thus came this idea of an allocation program where I give a department a no-cost allocation of compute resources and in exchange they contribute two to four hours of an IT person's time, and that person in the department supports their users locally. And you know — well, what did you have to go through to think of that? — it was really the only
thing I could think of having worked for ten years at the School of Public Policy
and having had the experience that when you are the IT group there, no matter what the question is, you are the person they go to — anything that electricity runs through is yours [laughter], so email was mine; Google? No, it was my email. And so I had to get these people who are supporting the faculty in research engaged, and as it turned out it was really easy, because you can go to them and say, look, in exchange I'm going to give you these resources, and you're going to have to support these virtual machines, and you're going to have to take time away from supporting your Windows desktops to learn how to spin up virtual machines, add storage, and think about it — and of course: yeah, I want to do that, that'll be really fun to do. So they really engaged with it, and that's the only way that the service has worked, because the service model involves bringing in staff from the different departments. So
the design principles of it: it had to be turnkey, it had to be really direct and easy to use and fit the model of computing that people were familiar with; it had to be scalable and interactive — you can do visualization in HPC if you have a visualization node, but it's primarily a command-line experience, and AEoD had to be very interactive, like, okay, my Stata job failed, I'm going to tweak this and run it again; it had to be very quick like that, easily shared among colleagues, and then reliable and secure. One of the things about the easily-shared part is that for accounts I use the CalNet system; the HPC side uses their own accounts, so they have to deal with account management and I don't, because of using CalNet — great, it makes it awesome and easy to provide. What are
they use it for? For what? For research — run your research computing on your virtual machine, with more power than your laptop. For each department that's a partner, we give them 136 gigs of RAM; if they're good partners I can get them twice that much for a limited time, so if somebody needs to run a Windows box with 256 gigs of RAM, we can do that. We have a number of partners on campus: currently Haas, the Goldman School of Public Policy, the Archaeological Research Facility, ETS — which is now Research, Teaching, and Learning, that's where Owen is, so they were a partner and then we sort of got merged with them — and most recently the Law School. So we've been running for two and a half years, adding a partner roughly every semester. As I explained, you join the service, you get technical support from people in the field, and they agree to be on an AEoD working group; we meet every two weeks and discuss operational things. Internally, we provide a very lean Windows-based image that I can give out to the different departments and say, build your virtual machines on this base image — and it's a secure base image. When we started out it was like 54 gigs; now we've got it down to 19 gigs, so we're pretty excited about
that. Here's an example of why this is great, and this is very heartwarming to me. Sergei Shevchenko is my AEoD partner at the Goldman School, and this is the kind of interaction he's having with researchers, trying to get them the right fit. They had started using AEoD and were running into problems — it was slow — and then he digs into the problem: why is it running slow on this virtual machine? Not really sure. And then these are the kinds of responses: okay, here's when you're not going to get a speed-up on another platform because of what you're trying to do. The really fun story that I promised to tell is that there's a school on campus where a researcher's laptop was not working out, so they bought her a twelve-thousand-dollar machine and put Stata on it — the most expensive Stata version, the 8-processor version, I think that's the most expensive, that's what they put on, so they probably spent like sixteen thousand. They wrapped it all up in a package, they gave it to the professor, she started running her computing job, and it ran
slower than her laptop. That is a go-ballistic moment. [laughter] And it's because they didn't know at the time that if you do regression analysis, yes, it will use all the cores if you get the 8-core version, but if you're doing data filtering or other analyses it will only ever use one core, and that's what she was doing: she had 24 cores, but every one of those cores is slower than one of the cores on her laptop, and so it ran slower. This is similar to what Sergei is explaining here. And I can't buy that — that's gold right there; I don't have 1.25 staff, there's no way we could do that across campus, so that right there is why it works, because you get that kind of expertise. For Berkeley grads, which I am not: Sergei is a graduate of the CS department at Berkeley and a graduate of the Goldman School — but so are all our partners.
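To make the single-core point concrete, here is a toy sketch — an illustration under stated assumptions, not Sergei's analysis and not Stata's internals — contrasting a step that parallelizes across cores with one that is inherently single-threaded.

```python
# Illustration only: work that runs on all cores benefits from having many of
# them; a single-threaded step runs at the speed of one core, so a 24-core
# server with slow cores can lose to a laptop with fewer, faster cores.
import time
from multiprocessing import Pool

def chunk_of_work(n):
    # CPU-bound stand-in for one slice of a statistics job
    total = 0
    for i in range(n):
        total += i * i
    return total

CHUNK, CHUNKS = 2_000_000, 24

if __name__ == "__main__":
    t0 = time.time()
    for _ in range(CHUNKS):              # like data filtering: one core, sequential
        chunk_of_work(CHUNK)
    print(f"single-threaded: {time.time() - t0:.1f}s")

    t0 = time.time()
    with Pool() as pool:                 # like multi-core regression: chunks in parallel
        pool.map(chunk_of_work, [CHUNK] * CHUNKS)
    print(f"all cores:       {time.time() - t0:.1f}s")
```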
So where is AEoD headed? One of the places it's headed is that it's getting taken up in the tsunami wave of secure computing. As the social scientists come on board and do research computing, they're not working with images of the night sky, they're working with data about people, and that's very sensitive — typically it isn't this high volume, but the sensitivity of it is great. So what we're doing is taking our AEoD infrastructure — which is built, by the way, on the infrastructure that one of Bill's teams manages, the VMware/Citrix infrastructure built for enterprise computing — and we've reconfigured it and hardened it into a more cyber-secure environment, so that researchers working with protection level one data can compute using AEoD. I'd like to kind of wrap it up here.
What AEoD does, among other things, is allow departments on campus to start thinking about cloud. Start thinking about, well — this will sound antiquated to a lot of people in this room, but still — maybe I won't buy that bare-metal thing and put it in the data center; maybe I think about, what if I needed it only for six months in a year, or only for two months, and what if I used those resources for different folks at different times? Maybe sometimes the need is for a regular 16-gig machine, and maybe other times I want three giant machines instead of eight smaller machines. Thinking that way gets you into the right mindset to start thinking about what the clouds offer, what's available in AWS and Google and Azure. AEoD is moving towards the secure space, and we're also doing a sort of service development project now to consider what it would be like if we moved some of our servers to AWS and it was hosted there. We built this and set it up like two and a half years ago, and things are changing; there are pros and cons to doing it locally, all in the campus data center. Azure, for example — Microsoft Azure has come out with Data Science Virtual Machines, in both Windows and Linux, in about the last two and a half years — so we're doing an analysis now to look at what those Data Science Virtual Machines are. Again, they're in Windows, so that's really compelling potential for us. So it's established, but it's a
work in progress along the way. You know, I love working in research computing, so it's incredible to hear stories like yours about what you're working on. From what you describe — I think Bill and I were thinking, how could AEoD help you — I think you're probably better served working with some of the resources at Lawrence Berkeley Lab, but it's a really fun conversation. I can tell you that in the different groups I have conversations with, everyone is asking the question: when and what do I move to the cloud, for enterprise and administrative computing and for research computing? How do I have mobility in between? How do I not get locked into a particular vendor, how do I stay neutral? And some people say, well, it's so much more expensive in the cloud versus building locally. I can tell you that last week I was on a panel
with a professor at UC Davis who works with secure data about water in California, and he said something really interesting, which is sort of a non-service-provider mentality. The service provider mentality is: well, I have to keep the cost down, and I have to be able to scale — again, to keep the cost down. Someone's first question from the room was: well, isn't the cloud more expensive? The data ingress and egress and storage are going to be more expensive. And we're all like, yes, it's more expensive, though someone was saying that in some cases it can work out better — maybe you don't have a data center at your institution, maybe you would have to build a new one, and there are other reasons you could consider. I chimed in and said, perhaps if you need a very secure environment and you want to sort of test the waters first, you can consider contracting with someone like Sherlock at UC San Diego, who have a secure environment that has already got an authority to operate and is compliant with HIPAA and, you know, higher levels of security; then cloud becomes an option, but it still has
some cost. The professor said — very interesting — no, I disagree with all of that: I do everything in the cloud. Yes, it costs me twenty-seven thousand a year; keep in mind that's less than five percent of my grant budget. I don't care — maybe you tell me it's twice as expensive, I don't care, it's a fraction of my research budget and it works, so, end of story. So on the service provider side, we don't have much money, so what can we do — we have to constantly keep costs down — but we also need to keep in mind that this matters for a lot of researchers. And actually I've heard this from another researcher on campus too, where the question came up: why aren't you using Sherlock, why are you building your own? Well, Sherlock can be many times more expensive depending on the complexity of your architecture and the storage you need. And he comes back with: I want to get this done, I don't want any impediments; you're telling me this is already ready, and you're telling me it costs twice as much, but it's still fractional for my grant — I want it done immediately. So we're trying to absorb that and think about those ways of doing good computing, and that's the most important part. I'll end on this: whether it's Analytics
Environments on Demand or HPC or cloud computing, the most important part of research computing, for my group, across campus, is knowing what's out there, being able to convey that to researchers, and making sure they ask some of the right questions before they jump into an environment. That right fit, that consulting, is essential. Krishna, who was here earlier, and Tin and other folks from LBL, and others in the room — Rick and Ro — all work on helping researchers figure out what's out there. AEoD is built on that system — you know, something in the data center is my Windows file server, so that's built on that — so that is happening, but I don't really see into that
world too much anymore. What is happening is that more and more people are saying, we're getting rid of this physical lab, what could we do here, what could we do? And then of course what's happening is that commodity offerings are coming around too. This company called Apporto, which is very active at UC Irvine and NYU — they use all the cloud providers and they provide Windows machines; it's AEoD as a commodity, right? And if that keeps coming on and stabilizes, then like many other things it starts out as an innovation and it becomes: why are we doing this locally when it can be done ten times cheaper somewhere else? And
research is not administrative computing, right — it's messy, it breaks, and it always has to be the next thing. So was this originally funded through research grant money that was allocated to high performance computing? So that's a great question, because I get to say that AEoD is a fraction of that, not even a tenth — I spend less than $50,000 a year to provide no-cost compute resources to partners across campus. They get that at no cost, and then they can say, well, this is great — like Haas and ETS said, this is great, but I need more — and I say, okay, that's great, you can get more, I will facilitate that: you can buy in and add more computing resources to your allocation. Then the researchers or the approvers are paying for that, and that's an IST — central IT — cost; it's just a pass-through. So there is a way the institution can scale this up. Haas bought in big time, but they use it for their Masters of Financial Engineering program, and when that ends they've got an awful lot. For people who don't know how the condo
model works: sure. The condo model in HPC is a really interesting and fun way — and Tin, jump in as I move through this — to scale your existing compute cluster in a way that serves everybody who uses it at the same time. So for example, suppose I have 10 compute nodes in my cluster and I'm the service provider. I fund 10 and I say to the community: here, we have this cluster and you are welcome to use it; we only have 10 nodes, and if you would like to buy in and add to this condo, you can buy your own hardware with your grant funding — which will not be taxed, because it's capital equipment — and we will add it to this existing cluster, and you will get priority, but not exclusive, access to those additional nodes that you bought. And suddenly, say you're able to double the number — say other people buy in and you get up to 20 nodes — now you've got more users, and when a user submits a job and the nodes become available, they can use all twenty, and thus the compute cluster grows as people decide: I actually want to buy in, because I want preferential access to some of the compute. They might say, for example, I want to add some GPUs to this cluster, so they buy GPU nodes and add them; but then, when those GPU nodes are not being used by that project — it's a little more complicated than this — they're available to everybody on campus. And so you get this great effect where, as it grows, everybody gains from it. It's a really fun, really fun system to grow. In AEoD it works a little bit differently,
but not too much: people can't, right now anyway, buy hardware to add, but they can sort of rent it — they can rent it from IST and pay a monthly cost, and thus expand their resources. It's kind of like a condo, but when they do that they have exclusive access to what they have added. These models work really well, and there are variations of them.
Stanford has a condo model for their HPC, but when a researcher buys, say, four nodes, then no one else can use those nodes — only the researchers who bought those nodes can use them. They do it that way, and there are actually competing reasons why that can be a good idea. We struggled a little bit — this is Berkeley — where some researcher group will buy, say, some nodes with GPUs, and then, as we were working out how that would work, at one point we got an email: hey, I'm trying to use my new GPUs and I haven't been able to get on them for two days because other people are running jobs that are using them — what's going on, I just bought these? And so we've had to change the usage policies so that, for these special sorts of nodes that exist in very small numbers, the people who bought them can get to them more efficiently and directly. So you get into those fun things where you're tweaking policies and trying to make the nodes available both to the people who bought them and also shared. So there's Tin's group at LBL — oh hey, I hadn't seen you —
the positive thing about the condo — what's the benefit to the researcher who's contributing? They actually get system administration support and power; in other words, part of the carrot of the condo model is not only that you're doing a service to the greater good by sharing your resources, but that you're actually getting a lot of service and support gratis, because you're willing to provide resources for the greater good. I almost think of it like a timeshare — you were going to say you've got a rental model, and I'm like, it's kind of like a timeshare, almost, where you get priority because you've bought into the timeshare, but then other people can, you know, Airbnb it or rent it when it's not being used. So anyway. So if you want a free one, you just need to listen to a pitch… All right, well, thank
you, everybody. So the next UC Berkeley Cloud Computing Meetup is July 30th. We will not be in this conference room — we will be upstairs in the penthouse.