From a rather cold Malmö, I was hoping for a warm Lisbon when I left home on November 14th. Wrong! It was a cold and windy Portugal that welcomed me for my sixth Codebits. Late morning, I decided to go to the venue to organize the O’Reilly table of books and sort out the pre-orders.
It is always fun to get into the Tejo Hall at Lisbon’s Parque das Nações as the main entrance is locked before the opening of the show. I therefore needed to find a side entrance, but which one? Most of them were locked. At long last I found an open door and soon met with Jose Alves de Castro, the main organizer. After the greetings he showed me the layout of the place. Once again big changes – or improvements – were made in the main room: more colours, and three new bubble-shaped igloos. But what were those bubbles? It seems that last year, there were 3 tracks of talks given in the main area – unfortunately the noise level was too high and some of the talks were badly interrupted. This year the talks were still given in this area, but they were held inside the bubbles. As you can see in this video and the photos, the room looks awesome and of course, Sapo’s logo, the little frog, is still the star of the event.
Unfortunately, I was unable to visit the Hackspace since it was in a different part of the building. Some of the big names who gave presentations and workshops included:
- Mitch Altman – a San Francisco-based hacker and inventor, best known for inventing the TV-B-Gone remote control, a keychain that turns off TVs in public places.
- Rob Bishop – a Raspberry Pi Foundation member who has been involved with development of the Raspberry Pi since the first Broadcom SoC-based prototypes (when the Raspberry Pi looked like this) and was responsible for the first Quake port.
- Erik de Bruijn – currently involved in a research project with Eric von Hippel (MIT) and Jeroen de Jong (E.I.M.) to assess knowledge transfers between actors in the additive manufacturing industry. In 2011 he co-founded Ultimaking LTD, a company that develops and sells a fast, large build volume, open-source 3D printer: the Ultimaker.
Unfortunately, I could not follow the talks in the main area but most talks were well attended.
Can you imagine three days and two nights of talks and programming contests? Very tough, so there was a need for some light entertainment:
- Presentation karaoke – when the presenter does not have a clue about the content of the presented slides.
- Quiz – random questions
- Flashmob Gangnam Style
- Nuclear tacos eating contest
- Badge collections
and much more.
Codebits ended on Saturday evening with the presentation of the outcomes of the programming contests, and the prize giving ceremony as well as a few speeches from the VIPs of SAPO – unfortunately these speeches were given in Portuguese and I cannot therefore tell you what was said.
Codebits is well established, with attendance growing from 400 in 2007 to approximately 1 000 this year. I do not know if it can get bigger but in 2013 Codebits Lisbon will run parallel with Codebits Brazil.
After this rather demanding event, I took a few days off in Portugal with my daughter. There is so much to do there that three and a half days were definitely not enough.
Day 1 was dedicated to the Lisbon aquarium – I was told it is one of the biggest and most populated aquariums in the world. There you can meet the usual fish as well as sunfish, sea otters, sharks and penguins. The day finished with a Fado dinner.
Day 2 was a nice visit to Sintra, the summer retreat of the Portuguese kings. There are several castles in Sintra. For me, the most beautiful was the Castelo dos Mouros or Castle of the Moors from the eighth century. Beautiful old stones among huge boulders – you feel like a mountain goat climbing up the very narrow paths. Other castles included the Palacio da Pena and the Palacio Nacional. Unfortunately we did not have time to visit the Park of Pena – I suppose one has to keep something for the future.
Day 3 – Woke up with a reminder of the muscles I did not know I had. Mountain climbing is not for the unfit. With the aid of some painkillers we climbed yet another hill to visit the Castelo de S. Jorge in the centre of Lisbon – lots of stairs! This castle is yet another ruin dating from the Moorish era; it was conquered by the first king of Portugal, Dom Afonso Henriques, in 1147. More climbing on the ramparts – I must admit that the coffee shop with its peacocks and cats was very welcome. The afternoon was filled with a visit to the Calouste Gulbenkian Museum with its treasures collected from all over the world. I loved the “Portrait of an Old Man” by Rembrandt but was not impressed by the French furniture or silver.
On the last day, we did a little shopping and walked around the centre of town and along the Tejo.
One thing I was not prepared for was an invasion of green and white t-shirts – Celtic played Benfica on Tuesday. No skirmishes, the ambience was great, but why did the Celtic fans have to come back on the same plane as us? There was not one inch to spare.
This is a summary report of what happened there. O’Reilly invited Data Science London to host its community meetup at the end of Day 2 of Strata Conference. The meetup took place in the grandiose Buckingham Room in the Hilton Metropole. Funded by contributions from community members, lots of delicious sandwiches and beers were provided to all meetup attendees for free. Everyone seemed to be enjoying the conversations and meeting new, interesting people. The room was packed with data scientists and data geeks, and at the last count we tallied more than 275 people. Check out the photos.
The meetup was a special session dedicated to Recommender Systems. We had 4 speakers. Dr. Neal Lathia of the Cambridge University Computer Lab kicked off the talks and presented some of the highlights of the 6th ACM Conference on Recommender Systems, Dublin. Neal did a great job, given the short time slot, providing an overview of key aspects of the 6th ACM RecSys and also, importantly, summarizing 5 open problems in recommender systems. You can read a more detailed post on these five issues here but here is a summary of what Neal presented:
- Why do we need recommender systems? We now implement recommender systems to foster engagement and community, and the web has become an ecosystem of personalisation.
- Problem 1: Predictions. The research community has become very aware of the fact that there is more to recommendation than predicting ratings. How can you make recommendations novel, diverse and serendipitous? How do you deal with conflicting objectives?
- Problem 2: Algorithms. We need to find a balance between the effort required of users to rate things in order to improve recommendations vs. improving algorithms that can deal with few ratings and make better rankings.
- Problem 3: Users and Ratings. The traditional mode of thinking about recommender systems has been “users” and “items,” who are linked by “ratings.” This paradigm is slowly being shown to be incomplete.
- Problem 4: Items. The idea of having tangible “things” that you recommend is also slowly shifting.
- Problem 5: Measurement. Understanding how to measure progress in recommender systems, and ensuring that algorithm-people, usability-people, and academic researchers work closely together, are two main issues not solved yet.
Second in line was Tamas Jambor, a PhD student at University College London. The title of Tamas’ talk was Beyond Accuracy: Goal-Driven Recommender System Design. In his talk Tamas explained the differences between goal-driven and metric-driven recommender systems, and also provided a step-by-step, structured approach to goal-driven recommender system design. You can read the slides from his presentation here.
The third speaker of the evening was Dinesh Vadhia, CEO at Xyggy, a startup that is building a serendipitous discovery system based on the concept of autonomous computing or anticipatory computing. The title of his talk was Autonomous Computing: The New Interface? (slides here). Although Dinesh did not provide a lot of in-depth technical detail due to IP protection issues, he provided a high-level overview of concepts like new Bayesian machine learning algorithms, the digital doppelgänger, dynamic predictions, and autonomous discovery.
The fourth and final speaker was Sean Owen, founder at Myrrix, a startup that is building a complete, real-time, scalable recommender system, built on Apache Mahout. Sean needs no introduction as he is well known across the recommender systems community. He is also one of the main committers of the Apache Mahout Project. Sean’s talk (Big Practical Recommendations with ALS) provided an in-depth overview of how to use the alternating least squares algorithm and matrix factorization to compute practical recommendations at scale.
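For readers who have not met alternating least squares before, here is a minimal sketch of the idea. It is not code from Sean’s talk, Myrrix or Apache Mahout. It assumes a rank-1 factorization (one number per user, one per item) so that each least-squares step has a closed form in plain Python; real systems use more factors and much larger, sparser data. The function names `regularized_loss` and `als_rank1` and the toy ratings are hypothetical.

```python
def regularized_loss(ratings, x, y, reg):
    """Squared error on the observed ratings plus an L2 penalty on the factors."""
    error = sum((r - x[u] * y[i]) ** 2 for (u, i), r in ratings.items())
    penalty = reg * (sum(v * v for v in x) + sum(v * v for v in y))
    return error + penalty


def als_rank1(ratings, n_users, n_items, reg=0.1, sweeps=10):
    """Rank-1 ALS on a dict {(user, item): rating} of observed entries."""
    x = [1.0] * n_users  # one factor per user (deterministic start)
    y = [1.0] * n_items  # one factor per item
    history = []
    for _sweep in range(sweeps):
        # Hold the item factors fixed: each user factor is a 1-D ridge regression.
        for u in range(n_users):
            seen = [(i, r) for (uu, i), r in ratings.items() if uu == u]
            x[u] = sum(r * y[i] for i, r in seen) / (
                reg + sum(y[i] ** 2 for i, _r in seen))
        # Hold the user factors fixed: solve for each item factor the same way.
        for i in range(n_items):
            seen = [(u, r) for (u, ii), r in ratings.items() if ii == i]
            y[i] = sum(r * x[u] for u, r in seen) / (
                reg + sum(x[u] ** 2 for u, _r in seen))
        history.append(regularized_loss(ratings, x, y, reg))
    return x, y, history


# Toy data: 3 users, 3 items, 6 observed ratings out of 9 possible.
ratings = {(0, 0): 5.0, (0, 1): 4.0, (1, 0): 4.0,
           (1, 2): 2.0, (2, 1): 2.0, (2, 2): 1.0}
start = regularized_loss(ratings, [1.0] * 3, [1.0] * 3, 0.1)
x, y, history = als_rank1(ratings, 3, 3)
print("loss before:", start, "loss after:", history[-1])
print("predicted rating for user 2 on item 0:", x[2] * y[0])
```

Because each half-step exactly minimizes the regularized loss over one set of factors while the other is held fixed, the loss never increases from sweep to sweep, which is the property that makes the alternation attractive.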
At the end of the session talks, the 4 speakers engaged in several informal Q&As with many of the attendees, and several break-out groups formed with lively discussions and intense exchange of ideas. Beer and sandwiches were provided while they lasted. Networking and socializing continued well beyond the end of the meetup, perhaps until 11pm. Everyone, speakers and attendees alike, had a great time and, although exhausted, the Data Science London team was extremely pleased with the results of the meetup.
Thanks to Gina Blaber, VP Conferences, O’Reilly and her team for inviting us to host our meetup at Strata Conference London. We had a lot of fun, the meetup was a success, and we really look forward to doing more things with O’Reilly. See you next time!
Carlos Somohano is the Founder of Data Science London. He holds a BSc Hons. in Business Administration, with a major in Information Systems & Operations Research (ISOM). Carlos is an enterprise consultant with more than 12 years of experience in SAP, ERP, BI and data projects.
On Thursday November 1st, I left home on a very autumnal morning. It rained all night so the roads were at their worst. Got to Heathrow Airport for the BA flight to Sofia – no problem but I am still wondering why we had to sit over half an hour in the plane before taking off. Made it to Sofia without any other problem and left my luggage in my hotel – sorry, I meant in the flat of the organizer of OpenFest, Marian Marinov and his wife Toni Marinova. Once again I stayed with this kind couple.
After setting up the O’Reilly table with the many books, I was invited to listen to a talk on GitHub by Brian Doll at the University of Sofia. It all started very well – I was learning fast and then catastrophe! I lost it and once again I realised that technology is not for me. I did discover how GitHub makes money but I am sure you are all aware of that so I will not bore you with the information. From there we went to one of the oldest restaurants in Sofia: wonderful local food in a very special setting.
The making of TUX
OpenFest has been going on for years – I believe since 2003 and has been growing ever since. This year there were more attendees than ever and it offered 6 or 8 talks in English – so OpenFest is becoming an international event with speakers from:
- USA – Brian Doll and Nick Hengeveld both from GitHub
- Germany – Harald Welte, famous for his work on the Linux kernel, GPL enforcement (see GPL-violations.org) and Openmoko
- Croatia – Tonimir Kišasondi, Junior researcher at the Faculty of Organization and Informatics in Varazdin, Croatia, and Vlatko Košturjak
I am always surprised to see how well our books are doing in Bulgaria when the average salaries are so low. I also feel sad that the books are not readily available and that the book buyers have to pay not only a lot of tax but also the heavy cost of shipping.
GitHub invited everybody to a “Rock Party” on Sunday night – probably not the right music for me but I thoroughly enjoyed the people and their enthusiasm for music, beer and peanuts. It was also nice to hear some typical Bulgarian songs among the heavy rock stuff.
This was my 6th visit to Bulgaria, and I am very impressed to see how Sofia has changed during the last 6 years. Of course the communist blocks of flats are still there and will be there for a long time, but I found the city a lot cleaner, the grass was cut and there are far fewer potholes on the roads.
Marian taught me how to take night pictures so please enjoy my favourite church – the Russian Church near the Aleksandar Nevski Cathedral – two beautiful buildings.
I still need to work on my settings…..
Monday 5th, I was back on the road or should I say back in the air – travelling to Malmö for Øredev 2012. There was no direct flight so I had a stopover in Vienna, just enough time to eat a Goulash Suppe – that brought back 40-year-old memories.
Last week I attended two conferences organised by O’Reilly: Strata (themed around Big Data) and Velocity (performance and administration of web applications).
Recently I have been exploring various NoSQL databases, so when I heard the Strata conference was coming to London for the first time, I decided to attend – after all, many of the NoSQL products are very closely associated with the world of Big Data. Seduced by the discount for booking both, on a whim I decided to attend the Velocity conference as well. Two days for each, so that would be 4 days of presentations. I made sure I had plenty of sleep in advance…
Strata came first. My goal here was to get a wider understanding of the whole Big Data scene – hot technologies, interesting problems that the community faces, and so on.
My first impression was that this is a field still being explored – even the problems are not yet well-defined. Several of the speakers offered their own definitions of “Big Data”. I think it was George Dyson who suggested that the Big Data era began when it became cheaper to keep all your data than to spend human effort to delete it. A more subjective definition was that you know you have Big Data when you have to start thinking about the size of it – which suggests that the threshold will rise as the state of the art moves forward.
I’ve not seen Hadoop in real deployments, so it was interesting to hear the war stories about that, but there were plenty more technologies under discussion. I heard about RDF, Clojure, Cascalog, techniques for visualizing and exploring data, and much more.
Funnily though, one of the talks that had the most impact on me was not a “deep techie” thing at all: in the last session on Tuesday, Felienne Hermans of Delft University spoke about PhD research she’d done into corporate use of spreadsheets. We all are vaguely aware that Excel gets (ab)used for all sorts of things – largely because it is a quasi-programming environment that is used by non-programmers – but do we really know the true extent? A spreadsheet can combine data, logic and presentation with a complete failure of “separation of concerns”. Felienne had worked with an investment bank where the management initially estimated there might be 10 thousand spreadsheets; the correct figure was more like 3 million. A timely reminder that while we worry about the challenge of slightly rough data in our databases, there’s a whole lot of business-critical stuff out there in users’ home directories…
Velocity followed on Wednesday and Thursday, and here my objective was to catch up with a field where I was a bit stale – my real web experience dates from 5 years ago, and of course things have moved on. There was a lot of talk about DevOps, but this isn’t so new to me; instead I tried to cast my net wide, and went to talks about queueing, monitoring, stories of real-life experience, and various new technologies.
The Velocity conference seemed slightly more “corporate” than Strata, perhaps because it seemed mostly to be about better ways of tackling well-known problems, rather than working out what the heck the problem actually is. Strata was asking, “What do I do with all this data? Is there a business model hidden in there, or knowledge that I can extract? How do I do any of that?”. In contrast, Velocity mostly concentrated on more specific questions for a more mature field: “How can I monitor the performance of my app around the globe? What metrics should I track? Can I use DevOps-style agility to improve stability and deploy releases more quickly?”
At both conferences there was a good selection of exhibitors, particularly at Velocity where the more mature problem space means there are more players with competing offerings to sell. As a fan of open-source, I find it encouraging how many of the free products now have companies to back them and sell extra support (and conversely, how many companies choose to open their core products). Most of the stands were definitely geared to the technical nature of the conferences and were able to deal with proper in-depth questioning.
The least satisfactory aspect of the whole thing was the hotel conference rooms. All of them had the same narrow chairs bolted together in rows. I’m certainly no “big guy” but I was at least an inch or two wider than the chairs, so in a well-attended talk everyone ended up very tightly wedged, or taking it in turns to lean forward. In most of the rooms the projection screens were low and you couldn’t see the bottom half of slides from the back. A plus for the hotel was the good-quality food; though this may not have helped with the narrow seating!
As you’d expect, this event has sparked a whole lot of questions and further research to do. I’ll certainly be looking into a bunch of new technologies – Hadoop: The Definitive Guide is first up on my reading list – but it seems that statistics is going to become a surprisingly in-demand skill as businesses try to extract the patterns from their data. Statistics in a Nutshell next, perhaps…
Overall, my first experience of these conferences was very positive – in both cases, it was a great way to get a survey of the scene and drill down to a few more in-depth areas too. Of course I can’t speak for anyone who had more specific objectives, but it seemed that the corridor conversations around the formal talks offered plenty of opportunities to make contacts, and get into more detailed discussions. I suspect I’ll return in the future, hopefully with some stories of my own to tell!
Gordon Banner is a sysadmin and infrastructure consultant who is interested in almost anything technological, but when forced to specialise will concentrate on supporting developers and maintaining applications at enterprise scale.
It’s that time of year again: Devopsdays Europe is on, four years after the original Devopsdays conference in Gent that got the movement started. The second event was in Hamburg, last year’s conference was in Gothenburg, and this year’s location is Rome.
With talks covering the whole devops spectrum, including culture, automation and measurement, lots of people will be sharing their experiences. Speakers for this year’s European edition include Damon Edwards, Jason Dixon, Bryan Berry and Mark Burgess.
Let me introduce you to Alessandro Franceschi, one of the Italian devops evangelists. Al has been active in the Puppet and devops communities for a couple of years now and is one of the people organising the Rome event, so we asked Alessandro a couple of questions.
Q: After Gent, Hamburg and Gothenburg, the European Edition of Devopsdays is now coming to Rome. What makes Rome the perfect place to discuss Devops?
A: In conferences like DevOps Days you go to share ideas, learn new things and meet friends. Rome is a great place for all of these things… and it has many benefits like hopefully nice weather, remarkable sightseeing and good Italian food. This choice happened almost by chance – from just a few discussions to some tweets the location was decided, but to be honest, I think that Patrick, Kris and the other main organisers just wanted to have an excuse to visit Rome (again!)…
We will talk about and discuss tools, processes, people and culture; we will learn how others face the same problems we do, or hear about problems we never thought we might have. But there will be a large focus on how the “DevOps way” involves and drives a cultural shift in how people collaborate within a company. “It’s all about Culture” is this edition’s headline, and Rome is a really appropriate location to talk about this.
Q: Alessandro, you’ve been pretty active in the devops / puppet community for a couple of years now. Can you tell us a bit about the local Italian scene?
A: Italy is a haven for talented IT geeks. Many are curious and explore trends and technologies way before the companies where they work. Communication friction, work organised in silos and established rigidities are still dominant.
At the same time a vital and active startup community is growing larger and larger: it sparkles with vitality and hope amid the depressing fogs of the crisis. Here DevOps adoption is easier, if not plain necessary. There’s interest both from the dev side (where Agile adoption is well established) and from the Ops side (mostly driven by automation needs). Puppet is growing fast and there are various Chef users as well: numbers are still relatively low but the trend is evident.
Q: The schedule is almost finished, what are you most looking forward to?
A: Many of the presentations seem quite promising. The quantity and quality of the proposals were good.
I’m personally looking forward to learning about Damon Edwards’s “levers for change” for DevOps Culture adoption and quite interested in Chris Hilton’s talk “Beyond Continuous Delivery”. I’m also curious about Bryan Berry’s “Monitoring data.fao.org” and Jason Dixon’s “The State of Open Source Monitoring: The good, the bad, the fucking terrible, and a glimpse into our future”. In any case I’m quite sure that I’ll also find enlightening or interesting ideas in more unexpected places. There will be a good number of Ignites, a format that I personally love. But it’s the people who make the event, so I expect vibrant open space discussions and long evening followups accompanied by great beer.
Besides the actual contents of the conference we hope to provide a really nice experience to the participants. We expect an excellent catering service and the location, kindly sponsored by IBM Italy (I want to mention them without reservation because they have been really helpful in making this edition possible), looks like a very interesting and unusual place.
Q: What else besides devopsdays should people visit Rome/Italy for?
A: Rome is an incredible city. If you have been there, you know what I mean. You walk over history with every step you take, you turn around a corner and see something new: a church, some ruins or monuments that might be 2000, 200, 50 years old. It reflects the past virtues and current vices of Italy, with all their contradictions. We plan to propose an informal city tour, the day after the event, to the DevOps who want to chill out together by visiting the town after 2 days of brainstorms.
With DevOps from all over Europe, Rome’s charm, Italy’s way (and no more political jokes that are a reality), this is Devops Days.
It’s all about Culture!
Kris Buytaert is a long time Linux and Open Source Consultant. He’s one of the instigators of the devops movement, currently working for Inuits.
Kris is the Co-Author of Virtualization with Xen, used to be the maintainer of the openMosix HOWTO and is the author of different technical publications. He is frequently speaking at, or organizing, different international conferences.
He spends most of his time working on Linux Clustering (High Availability, Scalability and HPC), Virtualisation and Large Infrastructure Management projects, hence trying to build infrastructures that can survive the 10th floor test, better known today as the cloud, while actively promoting the devops idea!
His blog titled “Everything is a Freaking DNS Problem” can be found at http://www.krisbuytaert.be/blog/.
I own many sysadmin books but my favourite has to be Linux System Programming by Robert Love. It’s not always been my favourite of course, there will always be a special place in my heart for Unix Power Tools, if only for getting me past that initial “Write us a Unix shell script”, “no problem, what’s Unix?” moment in my first job, but that’s been pushed into second place by Robert Love’s book.
It’s not because I have any great expertise in C, or any great desire to write low level system code – I’m a mediocre programmer at best. It’s because it is the best book I know for those who want to actually understand how Linux works.
There are many books on administering Linux boxes, and most of them start with a “how to install” chapter. It’s been a long time since I needed one of those. Many of them continue with a tour of some of the GUI-based admin applications that exist. I don’t install X-windows on servers. Then they continue with an introduction to the command line and a tour of some useful commands. I know all the grep options I’ve ever required, and I can use a manual page as well as any other 15-year veteran CLI [command line interface] user. Basically these books have no appeal to me anymore, and I’m no longer their target market.
But once you move beyond the introductory books you hit a problem. The only other type of books that exist for the sysadmin are the application specific ones covering Bind or Tomcat or Apache or one of the many other applications we admin. These are of course fantastically useful if you want to admin Bind or Tomcat or Apache, but they’re pretty much useless if you don’t. If you want to get all business speak about it they address a vertical market, not a horizontal one.
What makes Linux System Programming so useful to me is the sheer amount of background information he has put into it. Each chapter contains big chunks of general information related to the topic of the chapter before he wanders off into the C functions that address it. So, for example, the chapter on “Advanced File I/O” contains descriptions of the various I/O schedulers available and how and why you might want to use them. The “Advanced Process Management” section contains a good introduction to real-time, and “File and Directory Management” talks about extended attributes (heavily used by SELinux and Gluster), the way ext3 and xfs implement them and the restrictions they impose.
What this book gives me, and what no other book I’ve been able to find does (with the possible exception of Unix Power Tools) is the background information I need to make me a better Linux admin: not an Apache admin, or a puppet admin, but a Linux admin.
It’s not without its flaws of course: it’s at heart a book about C programming, and its value to a sysadmin is serendipitous rather than intentional – so many things you’d like to know aren’t covered (extended attributes are explained but not how to manipulate them from the command line for example), and there are large sections that are of little interest to the non-programmer. But when it’s good, it’s by far the best book out there.
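As a small illustration of the kind of thing I mean, here is a minimal sketch of setting, reading and removing an extended attribute without writing any C. It is not taken from the book. It assumes Linux, Python 3, and a filesystem that allows `user.` extended attributes on the temporary file; where any of those assumptions fails it simply returns `None`. The helper name `xattr_round_trip` and the attribute name `user.comment` are hypothetical choices for the example.

```python
import os
import tempfile


def xattr_round_trip(path, name="user.comment", value=b"reviewed"):
    """Set, read back and remove one extended attribute.

    Returns the stored bytes, or None if extended attributes are not
    available here (non-Linux platform or unsupported filesystem).
    """
    if not hasattr(os, "setxattr"):
        return None  # os.setxattr and friends only exist on Linux
    try:
        os.setxattr(path, name, value)    # attach the attribute to the file
        stored = os.getxattr(path, name)  # read it back as bytes
        os.removexattr(path, name)        # tidy up again
        return stored
    except OSError:
        return None  # e.g. the filesystem does not support user attributes


with tempfile.NamedTemporaryFile() as handle:
    result = xattr_round_trip(handle.name)
    print(result)
```

On a filesystem mounted with user extended attribute support this should print the bytes that were stored; elsewhere it prints `None`.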
It seems very strange to me that to get this kind of detailed knowledge I have to buy programming books, but when you think about it, there really aren’t any advanced sysadmin books on the market. Contrast this with the programming book market, taking Perl as our example. You can start with Learning Perl, move on to Intermediate Perl, study Advanced Perl, and somewhere along the way read Programming Perl and a dozen other books about different aspects of the language, and if you want to take it even further you can read many books about code style, algorithms or even the lofty heights of The Art of Computer Programming.
But where are the intermediate or advanced level sysadmin books? Where are the books that talk about performance tuning the OS, laying out RAID, the pros and cons of different file systems for your workload (or even just figuring out what your “workload” actually means), the what, when and why (and why not) of configuring HugePages and the thousand and one other topics that deal with the fundamental ways Linux operates that we deal with every day as part of our jobs, but about which we never gain the deep understanding that perhaps we should?
This sort of information is out there: if you hunt around on the web you can find dozens of blog posts and howtos (many contradictory and outdated of course, this is the web after all!) on any of these subjects, but if you’re looking for a single cohesive description of the OS we make our livings at, sorry, you’re out of luck. This is a shame because the more you understand your job the better you’re going to be, and the easier you’re going to find fixing the next weird and unusual problem (in fact the less likely you are to have a weird and unusual problem because you’ve anticipated it and fixed it before it had a chance to set off the pager).
Maybe it’s just me but these sorts of details are just plain fun, both as an end in themselves and as a way of showing off just how awesome you are to your comrades. We all know that sysadmins are the true elite of the computer world, so the better a sysadmin you can be, the more you truly understand Linux, then obviously the better the person you are!
So go and have a look at Robert’s book, it may make you a better sysadmin and a better person. I hope so because it’s the only choice you have.
Chris studied astrophysics at university but drifted into Unix when he finally accepted he was never going to get a job as a spaceman. After a brief stint working with Digital Unix he installed Red Hat 5.0 from a magazine cover disk and has never looked back. In the 15 years since then he’s alternated between sysadmin and consultant roles and is currently a consultant with Red Hat.