Kevin Burton’s New Feedblog

You may say I’m a dreamer, but I’m not the only one.

Thoughts on Munin, performance monitoring, and SVG
Nov 8, 2007 in clustering, linux, open source with 0 comments

I spent some time the other night hacking on Munin to get it to produce SVG output and wanted to serialize my thoughts.

The basic premise that I’m working on is that SVGs should be much faster to generate on the server, send over the wire, and manipulate on the client (zoom, interact with, etc).

The results are somewhat disappointing.

First, what’s good.

* It’s somewhat easy to have Munin generate SVG output. You just have to hack a couple of scripts.
* The SVG output is certainly smaller. It’s about half the size of the PNG version.

Now the bad.

* The SVG rendering on the server is not any faster (at least in my unscientific benchmarks). This might be isolated in rrdtool or in Munin itself.
* Munin will internally need to be reworked to use object tags instead of img tags, since SVG in img tags doesn’t seem to be supported under Firefox or Safari.
* SVGs were overall much slower since for *some* reason Munin didn’t build them lazily and each iteration a new graph was created.
* File size wasn’t my initial concern as much as time. The time to render each graph is 2-4 seconds, which is unacceptable.
* The default font size in 2.15 is way too small.

Perhaps SVG is the wrong choice. Maybe Apple canvas.

I was also thinking that sending over an SVG generated on the server and into the browser might be less than ideal. Why not send over JSON output? The client would then load a Google Maps-style UI from this data and render the graph. Since it’s data there are more client manipulations that are possible on the client such as …

Technorati’s dropped index
Nov 6, 2007 in google, spinn3r, tailrank, technorati with 0 comments

Looks like Technorati is trimming their index back a bit, down to content less than six months old.
We’re in the midst of some economization, performance fixes and retooling that have required taking some data offline. The data is not lost but our priorities are to prefer keeping recent data online. Most people don’t notice. We’ll probably be bringing that data back online but I don’t have an ETA yet.

I can sympathize a bit on the hardware requirements here - especially now that Technorati is having to cut back on their infrastructure requirements.

The major obstacle I think their new CEO faces is going to be their burn rate. They can’t have more than a year of cash in the bank so he’s going to have to slash and burn to extend that. He’s then going to have to pitch for potentially a series D.

The emphasis in the quote is mine but it’s the key line: “Most people don’t [sic] notice.” They didn’t notice because they’ve long since switched to using Google Blogsearch or the main index of Google itself. The declining number of people who do regularly use Technorati for search will soon be jumping across to Google as they discover that Technorati is a shallow pool when searching blogs.

That’s a bit harsh and certainly not what Ian meant. The vast majority of people search for recent content.

That said, recall is an important characteristic for a search engine - then again Google Blogsearch is no panacea.

Ask’s custom hardware via Dell
Nov 4, 2007 in ask, clustering, google, open source with 0 comments

Apparently, Dell is selling custom hardware to Ask tuned to run search applications:

“The box comes in at a much lower price because it only has the components that are required to support each application,” said Mark Stockford, the senior vice president of operations at Ask.com. This customization capability on its industry standard servers has cut server power usage by 30%, he said.

I’ve been thinking about it over the last few months - I really want to buy machines at the rack level and not the machine level. Possibly even by the quarter rack.
The biggest issue is the OS load, I think. Dell and HP will come out and physically swap out your hardware but if the drives fail you basically need to start over. PXE can solve this of course but that requires more setup time.

MySQL and disk transfers per second
Nov 4, 2007 in clustering, linux, mysql, open source with 0 comments

The Unix iostat command (along with vmstat) is a great tool for finding bottlenecks in your disk subsystem.

The ‘tps’ field is especially useful since you can see the unique reads/writes to your disk, which usually map to disk seeks (which are a performance killer).

The problem is that there are no simple tools for monitoring the iostat tps across all of your servers.

So late last week I threw one together by grokking the source of iostat. Apparently, it just reads the value of /proc/diskstats and compares the new values with the previous sample.

It took me about 10 minutes to throw together a Munin plugin to measure the performance across all of our servers.

Since our disks can handle about 300 tps without being overwhelmed we’re well within operating limits.

However, this is interesting: this is the tps on a server running MyISAM and long-standing queries. As the table fills up the performance starts to fail and significantly fall over.

It’s going to be interesting to see these numbers increase over the next week.

Spinn3r reference client moved to Google Code
Nov 2, 2007 in google, spinn3r with 0 comments

We’ve moved the Spinn3r reference client into Google Code, which should allow for a lot more collaboration with the open source community.

We’ve already received some great feedback from our client base about features to implement and misc small bug fixes.

I think that moving forward we’re going to be using Google Code for all the open source projects that we sponsor.

Twitter on earthquakes
Oct 30, 2007 in san francisco with 0 comments

Well that was a fun one.
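An aside on the “MySQL and disk transfers per second” post above: a rough sketch of the tps calculation it describes, not the actual plugin (which isn’t shown here). It assumes the standard Linux /proc/diskstats layout, where the fourth and eighth whitespace-separated fields are reads completed and writes completed. The function names and the sample numbers are made up for illustration.

```python
import time


def parse_diskstats(text):
    """Return {device: reads_completed + writes_completed} from diskstats text."""
    totals = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or short lines
        name = fields[2]
        reads_completed = int(fields[3])
        writes_completed = int(fields[7])
        totals[name] = reads_completed + writes_completed
    return totals


def transfers_per_second(prev, curr, seconds):
    """Compare two samples and return {device: transfers per second}."""
    return {
        name: (curr[name] - prev[name]) / seconds
        for name in curr
        if name in prev
    }


def format_munin_values(rates):
    """Render rates as 'field.value N' lines, the shape Munin plugins print."""
    return "\n".join(f"{name}.value {rates[name]:.2f}" for name in sorted(rates))


def sample_tps(path="/proc/diskstats", interval=1.0):
    """Take two live samples `interval` seconds apart (Linux only)."""
    with open(path) as f:
        prev = parse_diskstats(f.read())
    time.sleep(interval)
    with open(path) as f:
        curr = parse_diskstats(f.read())
    return transfers_per_second(prev, curr, interval)


# Invented sample data standing in for two reads of /proc/diskstats, 5 s apart.
SAMPLE_T0 = """\
   8       0 sda 1000 10 80000 500 2000 20 160000 900 0 700 1400
   8       1 sda1 900 5 70000 450 1800 15 150000 800 0 650 1250
"""
SAMPLE_T1 = """\
   8       0 sda 1150 12 92000 560 2300 25 184000 990 0 790 1550
   8       1 sda1 1000 6 78000 500 1900 18 158000 850 0 700 1350
"""

rates = transfers_per_second(
    parse_diskstats(SAMPLE_T0), parse_diskstats(SAMPLE_T1), 5.0
)
print(format_munin_values(rates))
```

A real Munin plugin would also have to answer the `config` argument with graph metadata; that part is left out here, and `sample_tps` is only defined, not called, so the sketch runs anywhere.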
Quick confirmation:

AT&T invents programming language to spy on Americans
Oct 29, 2007 in open source, security with 0 comments

What do you get when you combine a cool distributed system, new programming language, and illegal government spying? Apparently, a programming language named Hancock:

An AT&T research paper published in 2001 and unearthed today by Andrew Appel at Freedom to Tinker shows how the phone company uses Hancock-coded software to crunch through tens of millions of long distance phone records a night to draw up what AT&T calls “communities of interest” - i.e., calling circles that show who is talking to whom.

I’m going to add this to my reading list as it also adds graph theory, clustering, and programming languages into the mix.

I think this post by Threat Level might be a bit alarmist though. While this tool can be used for evil purposes it can just as easily be used for good.

For example, one could use Hancock to determine clusters and then keep calling records together in a database. They can also use it for legitimate FISA warrant data collection purposes.

As long as the government has a warrant AT&T should comply. If not they should prepare to be sued.

Initial release of mysqlslavesync
Oct 29, 2007 in linux, mysql with 2 comments

I’d like to announce the first release of mysqlslavesync.

This is a script to perform unattended cloning of MySQL slave servers (or masters) to put a new slave online with minimal interaction.

It connects to the source node, performs a mysqldump or mysqlhotcopy, transfers the data, restores the data, and then sets up all replication parameters on the target machine.

It then starts the slave which then begins to catch up to the master.

This little script has saved me a massive amount of time and eliminated a lot of stress and hassle when setting up new slaves. Now I just launch screen, run mysqlslavesync, and forget about it. An hour or so later I have a new slave up and running.

A few caveats.
This is the first release. The code needs to be cleaned up a bit. Your mileage may vary, and feedback is appreciated.

log5j
Oct 29, 2007 in open source with 0 comments

OK. I’m on a roll tonight. I just released another small library named log5j, which is a facade around log4j:

Logger facade that supports printf style message format for both performance and ease of use.

The log5j package supports a ‘modernized’ interface on top of the classic log4j API usage. It provides a few syntactic extensions thanks to JDK 1.5 (hence the name log5j).

First, it is no longer required to give log4j the category when creating a new class level logger. log5j just figures it out from the call stack.

Google Code rocks! I think I’m going to move all the projects hosted on code.tailrank.com over to Google Code.

Prediction: Ron Paul’s 10M fund raising hack will backfire
Oct 29, 2007 in politics with 1 comment

The Ron Paul campaign is trying to take in $10M in one single day, which is an impressive goal, and I think they have a good chance of pulling it off.

If they succeed it will be the single biggest day of campaign contributions in US history.

The problem? Their huge political win is going to be marred by neocon criticism that Ron Paul is endorsing terrorism against the US government.

Why? November 5th is the day of the Gunpowder Plot:

The Gunpowder Plot of 1605 was a failed attempt by a group of provincial English Catholics to kill King James I of England, his family, and most of the Protestant aristocracy in a single attack by blowing up the Houses of Parliament during the State Opening. The conspirators had then planned to abduct the royal children (who were Protestant) not present in Parliament, and incite a revolt in the Midlands.

The recent film V for Vendetta made numerous references to Guy Fawkes and the Gunpowder Plot.

They’re not ignorant of this fact.
On their site they use a poorly worded poem on the subject:

Remember, remember, the 5th of November, the patriot money bomb plot …

The neocons are anything but stupid. They’re going to make this connection and go on the offensive.

Mark my words. You heard it here first!

Thoughts on InnoDB internals (re Heikki Tuuri)
Oct 29, 2007 in linux, mysql, open source with 1 comment

Heikki Tuuri has responded to the InnoDB questions the community asked him earlier this month.

There’s a lot of information here so instead of just responding in comment form a dozen times I decided to make this just one coherent post.

Q7: Does InnoDB has any protection from pages being overwritten in buffer pool by large full table scan
HT: No
PZ: Another possible area of optimization. I frequently see batch jobs killing server performance overtaking buffer pool. Though full table scan is only one of replacement policy optimizations possible.

Note that most database systems like MyISAM are very vulnerable to this problem. One solution is to have dedicated reporting machines or to compute stats in some other manner.

This is one advantage of having the cache in user space and not having to rely on the kernel, as you can give it ‘hints’ about how to perform.

Q15: How frequently does InnoDB fuzzy checkpointing is activated
HT: InnoDB flushes about 128 dirty pages per flush. That means that under a heavy write load, a new flush and a checkpoint happens more than once per second.

Why not make it exactly 128 dirty pages per flush and ditch the “fuzzy” part?

I’d like to boost this radically so that I can sustain higher IO on databases that are 100% in memory. Ideally I’d be able to just do sequential writes to the disk.

This is going to become more of an issue as disk subsystems get faster and faster. I imagine for RAID systems with lots of disks that this is a big bottleneck.
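To make the Q7 point concrete, here is a toy model, assuming a plain least-recently-used replacement policy. This is not InnoDB’s actual buffer pool code, and the class name, page ids and sizes are invented for the example. It shows how a single large scan can push every hot page out of a cache.

```python
from collections import OrderedDict


class ToyLRUBufferPool:
    """Toy page cache with plain LRU eviction (hypothetical, for illustration)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page_id -> True, oldest first

    def read(self, page_id):
        if page_id in self.pages:
            # Cache hit: mark the page as most recently used.
            self.pages.move_to_end(page_id)
            return True
        # Cache miss: load the page and evict the oldest one if over capacity.
        self.pages[page_id] = True
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)
        return False


def cached_count(pool, page_ids):
    """How many of the given pages are currently in the pool."""
    return sum(1 for page_id in page_ids if page_id in pool.pages)


pool = ToyLRUBufferPool(capacity=100)
hot_pages = range(50)

# Normal workload: the same 50 pages are read over and over.
for _ in range(10):
    for page_id in hot_pages:
        pool.read(page_id)
hot_before_scan = cached_count(pool, hot_pages)

# A batch job scans 200 pages that are each touched exactly once.
for page_id in range(1000, 1200):
    pool.read(page_id)
hot_after_scan = cached_count(pool, hot_pages)

print(hot_before_scan, hot_after_scan)
```

The scan touches each of its pages once, yet under plain LRU that is enough to displace the working set, which is the batch-job effect described in the PZ comment.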
…

Heikki also goes on to talk about blob storage:

The ‘zip’ source code tree by Marko has removed most of the 768 byte local storage in the record. In that source code tree, InnoDB only needs to store locally a prefix of an indexed column.

PZ: I think it is also very interesting question what happens for blobs larger than 16K - is exact size allocated or also segment based allocation is used.

I was curious about this zip source code tree so I went to dig into this a bit more and found his talk from MySQL ComCon Europe, Frankfurt, Nov 10th, 2004:

We will also implement a transparent, on-the-fly zip-like compression that will reduce space usage a further 50 %. This will appear in MySQL-5.1

Interesting. Was this implemented? Did it ever make it into 5.1?

I then asked:

Any plans to enable tuning of the checkpointing rate? Postgres exposes this data and allows the user to tune the checkpointing values.

HT: Hmm… We could tune the way InnoDB does the buffer pool flush. I think Yasufumi Kinoshita talked at Users’ Conference 2007 about his patch that makes InnoDB’s flushes smoother and increase performance substantially. I assume there is lots of room to tune the flushes, since I never optimized the algorithm under a realistic workload.

Making the doublewrite buffer bigger than 128 pages would require a bit more work. Now it is allocated permanently in the system tablespace when an InnoDB instance is created.

In theory, one could just recompile with a larger doublewrite buffer (or disable it) and increase the fuzzy checkpointing rate beyond 128, which should improve performance.

Is there a URL for Yasufumi Kinoshita’s patches?

MySQL IPO - are we friends and family?
Oct 29, 2007 in linux, mysql, open source with 0 comments

Rumor has it that MySQL is filing for an IPO:

One of the most-anticipated tech IPOs of the year has been that of open-source database company MySQL. It seemed like they were ready to go public back in the beginning of the year.
Now I am hearing chatter from hedge fund circles that the filing may be imminent. Last I checked, nothing has been filed with the SEC yet. Investors, including Benchmark, Index, IVP, Intel, and SAP, have put in more than $39 million to date.

Do you think MySQL might be nice enough to allocate a portion of the IPO as friends and family to the open source community? (hint hint).

Both Barack Obama and Chris Dodd will filibuster immunity legislation
Oct 24, 2007 in politics with 0 comments

Well this makes my day:

It’s official: Obama will back a filibuster of any Senate FISA legislation containing telecom immunity, his campaign has just told Election Central. The Obama campaign has just sent over the following statement from spokesman Bill Burton:

“To be clear: Barack will support a filibuster of any bill that includes retroactive immunity for telecommunications companies.”

It would be really awesome to see all the Democratic leadership show some courage on this issue.

This really should be a bipartisan issue. The Fourth Amendment is one of the crucial freedoms that separates America from the rest of the world.

The first rule of lobbyconn
Oct 24, 2007 in san francisco, startups with 0 comments

is that you do not talk about lobbyconn!:

“The sessions at technology conferences are often like plots in porn films,” said Ben Metcalfe, a technology consultant from San Francisco who said he lobbycons about four conferences annually. “It’s required for the context, but it’s not really what you paid for.”

…

In any case, Rafer will be conducting his business outside the Web 2.0 halls this week. “I already have too many good meetings scheduled to bother going inside,” he said.

You guys are out of the club! I was asked to participate for this story - but I follow the rules!

Nice catch
Oct 22, 2007 in uncategorized with 0 comments

I really like this image on Pizdaus by Miguel Lasa.

This looks like an osprey with about a 5 lb. largemouth bass.
Beta announcement of Spinn3r client libraries
Oct 22, 2007 in spinn3r with 0 comments

One of the things we’ve started to notice in the last few weeks is that as the Spinn3r API becomes more sophisticated it’s becoming difficult for clients to implement.

Moving forward, we’re going to release and support client drivers for multiple languages including Java, Python, Perl, and Ruby.

Today we’re announcing a Java reference implementation (and javadoc) of the Spinn3r API.

All of our drivers will be released under the Apache 2.0 license. The APL is a very liberal license and basically allows customers and researchers using the Spinn3r API to build whatever type of application they want on top of our platform without having to worry about legal and licensing implications.

Another interesting property of this implementation is that it’s very small - 1500 lines of code. This means ports to other languages should be very easy.

This API will be included in Spinn3r 2.1 as a final release as there are only a few small features left to implement.

Update: Niall suggests hosting this on Google Code so there can be a public SVN repository. This makes a lot of sense. We started hosting code.tailrank.com before Google Code was released. This would be one less thing for Tailrank to admin.

Thanks also go out to adactio on Flickr for providing a cool photo of a latte under the Creative Commons Attribution license.

Bush may veto FISA “compromise” because he can’t spy on Americans on foreign soil
Oct 20, 2007 in politics with 0 comments

Jon Stewart was right. Bush had turned from villain to cartoon super villain.

Apparently, there’s a poison pill in the FISA legislation in front of Congress which adds:

“The individual freedom of an American shouldn’t depend on their physical geography.”

The NYTimes has more:

But passage in the committee came with one unexpected hitch. In an interview after the closed session, Mr.
Wyden said he had succeeded, by a vote of 9 to 6, in adding an amendment that would offer additional protections by requiring that the government get a warrant whenever it wanted to wiretap an American outside the country, like an American soldier based overseas or a business person.

Security at LAX misses 75% of fake bombs
Oct 19, 2007 in politics with 0 comments

Security in LAX is seemingly pathetic:

Transportation Security Administration screeners at LAX failed to notice 75 percent of fake bombs and explosives that passed through the airport during unannounced drills. By comparison, screeners at San Francisco International missed 20 percent of the would-be bombs and, at Chicago’s O’Hare International, 60 percent of the fake explosives were unnoticed, according to USA Today, citing a classified memo.

SFO is not much better with their miss rate of 20%. Well I’m going to sleep well tonight!

Bigtable and C
Oct 19, 2007 in google, search with 0 comments

There’s been a lot of activity in the distributed database space in the last few weeks. First was KFS (Kosmos FS) and now Powerset brings us Hadoop.

I’ve been thinking about this a lot recently, and I think Java is the wrong language in which to design distributed databases (or any database in general).

I’m specifically talking about the on-disk persistence engine.

The main problems are implementations of sendfile, async and event IO, memory management, and implementation details such as access to mlock.

Java’s VM is one problematic area. Once the VM allocates memory it doesn’t want to let it go.

Then there’s the problem that there’s no implementation of mlockall for Java. One could write an implementation in JNI but then you run into other problems with lack of access to other JNI libraries.

C just isn’t that hard. For a small and tight database implementation like GFS or Bigtable it seems to just make more sense to implement it in C.

memcached and lighttpd are great examples of what I’m talking about.
They’re small, thin, and get the job done.

OSI approves Microsoft Public License
Oct 19, 2007 in open source with 0 comments

Don’t say this out loud “OSI approves Microsoft Public License” as your brain might explode. Maybe that’s their whole goal?

There’s evil here somewhere - I just haven’t found it yet.

Acting on the advice of the License Approval Chair, the OSI Board today approved the Microsoft Public License (Ms-PL) and the Microsoft Reciprocal License (Ms-RL). The decision to approve was informed by the overwhelming (though not unanimous) consensus from the open source community that these licenses satisfied the 10 criteria of the Open Source Definition, and should therefore be approved.

The formal evaluation of these licenses began in August and the discussion of these licenses was vigorous and thorough. The community raised questions that Microsoft (and others) answered; they raised issues that, when germane to the licenses in question, Microsoft addressed. Microsoft came to the OSI and submitted their licenses according to the published policies and procedures that dozens of other parties have followed over the years. Microsoft didn’t ask for special treatment, and didn’t receive any. In spite of recent negative interactions between Microsoft and the open source community, the spirit of the dialog was constructive and we hope that carries forward to a constructive outcome as well.

Hugs Microsoft! Welcome to the open source community. Your official membership card is in the mail.
About: a blog by Kevin Burton - founder/CEO of web crawler Spinn3r and the Tailrank memetracker