
Tuesday, June 3, 2008

Writing Workflow for Scientific Articles

I'm a researcher. An important part of my work involves writing peer-reviewed scientific papers, in order to present my work in different scientific venues, such as symposiums, conferences, journals, and books.

Having excellent research work, with excellent results and findings, is insufficient to get a paper accepted. An important part of the process lies in presenting your ideas and your results. And writing papers is a really hard task. It's a mix of sweating to find the proper words and to put them in the proper places, with a fluid sequence of ideas and explanations. It's almost an art form, despite some fairly dogmatic (common-sense?) items that must be present, such as a state-of-the-art review, an introduction, conclusions, etc.

To achieve an acceptable quality in the writing process (assuming that the actual content is scientifically relevant, of course), I typically perform a well-defined set of tasks: research existing (and relevant) state-of-the-art work (hand-in-hand with the development of the research work and results gathering/analysis), organise high-level ideas into concepts, drill down, cite work, read, annotate, and iterate until reaching the desired result (or, more often than not, the deadline).

This fairly complex and exhausting process can be eased a bit by using the right tools at the right time, in order to shift my focus towards Getting Things Done. That is, not worrying about crashing document editors, text formatting, citation formats, or the print+comment+rectify/improve cycle. Just focusing on structuring my ideas and writing them down in a coherent way.

Furthermore, the sheer amount of research work that is published every year in related venues makes it increasingly difficult to find needles in haystacks. That is, to find that one research article in the piles of paper sitting on the desk, unorganised or, at best, stored on shelves. Obviously, this process doesn't scale. It's an evident role for digital technologies, especially for bibliography and citation management tasks.

I think that many researchers can relate to these scenarios. Hence, all of this blabber leads to my suggestion of a workflow optimised for writing scientific articles, tailored to the best software I could find. On OS X. I'm not sure whether some of the software I'll be talking about in the rest of this post has counterparts on other platforms. If so, please feel free to comment and contribute some thoughts and links.

LaTeX



No researcher in her/his right mind writes scientific articles with other software (unless it's specifically prohibited). LaTeX is a set of extensions to the TeX typesetting system, with which one focuses just on the document's structure (i.e., abstract, sections, etc.) and on the content itself. LaTeX files are plain text files. They are parsed and processed by LaTeX software, through one of the several flavours widely available on the Web, resulting in either a PostScript document (.PS) or a universally accepted PDF.
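
To give a flavour of this structure-over-appearance approach, here is a minimal, purely illustrative LaTeX document; the title, author, and section contents are placeholders, and real venues usually supply their own document class:

```latex
% A minimal article skeleton: markup describes structure, not appearance.
\documentclass{article}  % conference templates typically replace this class

\title{A Hypothetical Paper Title}
\author{Some Author}

\begin{document}
\maketitle

\begin{abstract}
One writes the content; LaTeX takes care of the typesetting.
\end{abstract}

\section{Introduction}
Plain text, with commands only for structural elements such as sections,
citations, and figures.

\end{document}
```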

Despite some of LaTeX's shortcomings, such as the (frequent) lack of WYSIWYG editors (due to its compiler-like typesetting nature), the results are of high quality and WYSIWYP (What You See Is What You Print). Add to that the almost ubiquitous availability of LaTeX templates on conference/journal websites, a really good automatic bibliography formatter (BibTeX), and an almost dauntingly comprehensive number of utilities, and you've got yourself a must-have typesetting system for scientific papers.
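
As a sketch of how BibTeX plugs into a document: you cite entries by key and let BibTeX format the bibliography. The database name and citation key below are placeholders:

```latex
% In the paper's source: cite by key; BibTeX formats the reference list.
As shown in previous work~\cite{doe2007example}, ...

% At the end of the document:
\bibliographystyle{plain}  % many venues ship their own .bst style instead
\bibliography{refs}        % reads entries from refs.bib
```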

There are several LaTeX distributions at one's disposal, for every platform. My preferred choice on OS X is MacTeX, since its supporting tools are geared toward OS X's look & feel, and it integrates correctly with the OS (read: it just works out-of-the-box).

So, LaTeX will be the centre around which the rest of my software choices gravitate.

Papers



As explained earlier, managing state-of-the-art and other relevant sources of information can be daunting. Whether at a physical level (stacks of real printed paper) or at a digital one (folders), managing and searching through all papers to find that particular one you're looking for (with a paper submission deadline lurking around the corner) is just cumbersome.

Papers will help you with this (too obvious a name for a piece of software!). It's a really good application to manage, organise, and usefully leverage your entire collection of PDFs lying around on the hard drive. It integrates with well-known scientific digital libraries, including the ACM Digital Library, IEEE Xplore, and arXiv, among many many others (and repository integration is plugin-based).

Despite the fact that one has to pay for a license (€29, not that expensive), trust me on this one, it's worth the money. With Papers I can tag (i.e., multi-categorise), annotate, and search through my own repository within the program, as well as through Spotlight.

One more thing. It can export papers' metadata in the BibTeX format. This way, I can manage everything related to what I have to cite in a single program. It's the right hammer for the right nail.
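
An exported entry is just plain text. A hypothetical example of what such a .bib record looks like (authors, title, and venue are made up for illustration):

```bibtex
@inproceedings{doe2007example,
  author    = {Jane Doe and John Smith},
  title     = {A Hypothetical Paper on Web Interaction},
  booktitle = {Proceedings of Some Conference},
  year      = {2007},
  pages     = {1--10}
}
```

The key (`doe2007example`) is what you pass to `\cite` in the LaTeX source.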

Scrivener



At some point it's time to put thoughts, ideas, and results into words. As I previously said, it's not easy. Almost no one can write a paper top to bottom, from the first word to the last. It's an iterative process that invariably starts with organising ideas into a coherent line of thought. That's where Scrivener comes to help.

Scrivener is a tool targeted at writers that supports the typical workflow of collecting drafts and loose notes and combining them into a consistent piece. That's fairly similar to scientific writing, minus some issues that I'll describe later on. One of its killer features is the full-screen editing mode. I've written an essay before about the benefits of full-screen applications, and Scrivener fits perfectly into this line of thought. It hides all other apps, animations, popups, and everything else that might stand in the way of the writing process. This way, one focuses just on what's supposed to be done: writing that paper.

This software also supports research tasks (lato sensu), including searching the Web, bookmarking Web pages, and annotating text drafts. I do not advise performing all of these tasks within Scrivener. To put it simply: use it just to organise your ideas and structure your text in different drafts, and that's it. Papers and the other software listed in this essay will streamline research and annotation tasks in a better way.

Oh, and did I mention that Scrivener exports into the LaTeX file format?


TextMate, Skim, and pdfsync



After having the core texts for the paper converted to LaTeX, one has to delve into the details and typeset it. Editing it in a generic or non-specialised text editor is something from the last century. In these days of syntax highlighting and IDEs, a lot of choices are available.

Furthermore, LaTeX is a command-line-oriented software package. And that's how it should (continue to) be. However, one typically wastes too much time opening a shell and running a set of commands to typeset LaTeX documents.

To complete the workflow I've described earlier, this detailing and improving process includes annotating the paper with comments, highlights, strikethroughs, underlines, etc. Since I'm talking about an all-digital workflow, annotating and editing must be as simple as possible, mimicking the traditional print-annotate-edit process.

All of this wasted time can be easily avoided with a tailored LaTeX text editor plus some useful tools.

While other choices are available, my personal belief is that the workflow is better supported and streamlined with TextMate.

TextMate is an all-purpose text editor mostly targeted at programming tasks. It was popularised by the Ruby on Rails guys as a simple, lightweight, and GTD-friendly text editor (I strongly agree with this opinion). It also provides comprehensive support for different programming languages and, as you surely have figured out, supports LaTeX out-of-the-box.

With a few keystrokes, typesetting tasks are instantly launched, and a user-friendly window presents any errors and warnings that might occur. Citations are easily managed, and syntax highlighting provides visual cues for LaTeX keywords.

Within the iterative process of improving the paper one is writing, the back-and-forth reading, annotating, and editing can be really tiresome. To mitigate this, two other tools help with getting back on track on the main task: finishing the paper. These tools are pdfsync (which is already bundled with LaTeX distributions) and Skim.

pdfsync provides the core support for jumping between the typeset PDF and the LaTeX source (with fairly good granularity). Setting it up just requires adding \usepackage{pdfsync} to your LaTeX preamble. After typesetting, a marker will appear on the PDF, representing the position where your cursor is located within the LaTeX source.

Skim supports the other direction (PDF towards LaTeX), since OS X's default PDF reader does not afford this functionality. Skim is supported by TextMate, and can be set up with just two mouse clicks.

As a bonus, Skim has built-in PDF annotation tools just like Adobe's products, with the added bonus of being free (as in beer) and really lightweight.

OmniGraffle



The last thing I'll be talking about in this essay concerns creating vector-based figures. One of the beauties of PDF (and PS, for that matter) is that it's a vector-based file format, which means it's resolution-independent. Consequently, it is desirable that, whenever possible, all figures embedded in the paper are vector-based as well.

My preference for creating figures is OmniGraffle. It's a lightweight and easy-to-use piece of software that provides intelligent guides for creating vector-based figures that are coherently aligned, properly dimensioned, and eye-candy. Remember that a good figure can be worth a thousand words. A poor-quality figure (e.g., misaligned shapes) conveys an amateurish approach to the work, which can reflect negatively in the peer-reviewing process. High-quality graphics do help improve the paper's overall quality. After using it, you'll be constantly reminded that it's an excellent piece of software whenever you have to use Microsoft Visio or any other diagram software of lesser quality.

Add to that the fact that it supports 100% vectorised PDF exporting - which can be directly embedded into LaTeX files - and you've got a high-quality research paper ready to be submitted, peer-reviewed and, hopefully, accepted!
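
Embedding such a PDF figure is then straightforward with the standard graphicx package; a sketch (the figure's file name is a placeholder):

```latex
% In the preamble:
\usepackage{graphicx}

% In the document body:
\begin{figure}
  \centering
  \includegraphics[width=\columnwidth]{architecture.pdf}  % placeholder file
  \caption{A vector-based figure exported from OmniGraffle.}
  \label{fig:architecture}
\end{figure}
```

Since the embedded PDF is vector-based, it scales with the column width without any loss of quality.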

Ending remarks


I hope this info will help you lower the burden of the logistics of writing scientific papers. While I'm not an expert on all of these topics, all of this comes from my 4 to 5 years of experience working as a researcher. Once again, this is not an exhaustive list of software and workflows. It's just a description of my own experience.

I'm sure there are lots of things that I may have missed, and better software out there. I'm still missing two significant pieces of software that could integrate seamlessly into my workflow: WYSIWYG table and equation editors, integrated into TextMate. It would be great to select a table or an equation and edit it without having to know a bunch of macros.

Therefore, please feel free to comment, and make corrections and suggestions. I believe it's important that researchers spend their time on researching, not wasting it on avoidable pitfalls in the writing process.

And now, back to that pesky paper I'm writing...

Monday, November 12, 2007

slowing down

Unfortunately, I've had little to no time to do proper blogging over the past month. This slowdown has been caused by:


  • Finishing a paper for ACM's WWW 2008;

  • Improving my methodology for modeling Web Interaction Environments (more on this on a later post);

  • Reviewing students' assignments;

  • Traveling;

  • Helping out on organizing ACM's CIKM 2007;

  • Reviewing some stuff for XProc.


Apart from these, two or three (as a matter of fact, three) startup projects are getting started. They're at different inception stages: some just in the back of my head plus some chatting with Daniel, others with Tiago and Bruno.

Too much stuff, too little time on my hands... *sigh*

Wednesday, May 16, 2007

Afterthoughts

The last two weeks were really, really interesting. So interesting that my initial plans for blogging while at the conference went straight to the garbage bin, especially coupled with the average of 5 hours' sleep I was able to get there.

The first four days at Banff were simply beautiful. Surrounded by the Rocky Mountains at the Douglas Fir, I managed to wake up every morning and go skiing with a bunch of good friends from all around the world at the Sunshine Village ski resort. In one word: beautiful. I even managed to do some black diamond runs; no way I could've imagined myself doing that... But as a matter of fact, I did :) Awesome!

Returning to reality, being a volunteer for WWW and a presenter at W4A, some things had to be done. Moving to the jaw-dropping Fairmont Banff Springs Hotel, with an insane window view of mountains, river, golf course, snow, forest... simply beautiful. It was a good omen, I thought. And I was right. It was.

W4A started. There were several interesting presentations on Web 2.0 technologies and how to leverage them for accessibility. Some were more technical, others geared towards research, but interesting nevertheless. My presentation went fine, despite some anxiety (oh, so typical of me...), and I got a bunch of interesting questions.

On the second day, as an assigned volunteer for W4A's room (thanks John!), I got to see the rest of the conference, and think about my own research goals. That's what conferences are for.

Heading to WWW itself, I was pleased to have the chance to hear and see lots of great presentations on Browsers and User Interfaces, advances on standards from W3C's Technical Track, some cute demos in the Developers' Track, and loads and loads of Information Retrieval and Semantics. As for the plenary keynotes, they were simply great. Hearing Tim talk about WSRI - the Web Science Research Initiative - and seeing those simple state charts summarising the whole research process behind the emergence of the Web Science field (and other Internet gimmicks such as e-mail) was definitely insightful. Between viewing the Web as a set of linked data (as Brian reported) and the chats I managed to have with Brian Kelly, Peter Brusilovsky, people from the DAISY Consortium, and others, I developed several thoughts and research directions for my PhD work.

Summing it all up: it's a must-go-to conference every year, and a must-go-to place for vacations whenever possible!