VOT = 0 - The vocal folds start vibrating at the moment the consonant is released. Typical for fortis consonants in Czech ("tak", "pak", "kat"), Spanish and many other languages. In English, this occurs when p, t, k follow "s" ("step", "speak", "sky").
Positive VOT - The vocal folds start vibrating more than 15 milliseconds after the release of the consonant. The resulting sound is aspiration. It normally doesn't occur in Czech, but is typical for English ("ten", "pay", "cup").
Negative VOT - The vocal folds start vibrating before the release of the consonant (i.e. during the closure, or at its very beginning). The resulting sound is a partially or fully voiced consonant.
a) Vocal folds start vibrating at the consonant's onset: a fully voiced consonant. Typical for Czech ("den", "být", "guma" etc.), Spanish, and other languages (usually those which don't have aspiration). In English, fully voiced consonants occur only between two vowels.
b) Vocal folds start vibrating sometime during the closure: a partially voiced consonant. Typical for languages which do feature aspiration (English: "day", "bay", "gay", "they", "jam" etc.; also German and others).
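To make the categories above concrete, here is a minimal Python sketch that sorts a single measurement into them. It is only an illustration, not a standard tool: the function names and the three time stamps (closure onset, release, voicing onset, all in milliseconds) are my own assumptions. I also assume that any lag from 0 up to 15 ms counts as "VOT = 0", and that "fully voiced" means voicing starts no later than the closure does.

```python
# Threshold taken from the text: aspiration means voicing starts
# more than 15 ms after the release.
ASPIRATION_THRESHOLD_MS = 15.0


def vot_ms(release_ms, voicing_onset_ms):
    """VOT is the voicing onset time measured relative to the release."""
    return voicing_onset_ms - release_ms


def classify_vot(closure_onset_ms, release_ms, voicing_onset_ms):
    """Sort one plosive into the VOT categories described above.

    All three arguments are hypothetical time stamps in milliseconds,
    e.g. read off a spectrogram by hand.
    """
    vot = vot_ms(release_ms, voicing_onset_ms)
    if vot > ASPIRATION_THRESHOLD_MS:
        return "positive VOT (aspirated)"
    if vot >= 0:
        # Assumption: short lags of 0 to 15 ms are treated as "VOT = 0".
        return "zero VOT (unaspirated)"
    # Negative VOT: voicing starts before the release.
    if voicing_onset_ms <= closure_onset_ms:
        # Assumption: voicing from the very start of the closure.
        return "negative VOT (fully voiced)"
    return "negative VOT (partially voiced)"


# Closure starts at 0 ms, release at 100 ms, voicing at 160 ms.
print(classify_vot(0, 100, 160))  # positive VOT (aspirated)
```

With the closure at 0 ms and the release at 100 ms, a voicing onset at 50 ms would come out as partially voiced, and one at 0 ms as fully voiced.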
In the picture below, you can see all types of VOT, from a fully voiced to an aspirated consonant.
As it seems impossible to publish audio here, I made videos. The voice is mine :-) It was recorded in quite awful conditions, which is why you'll hear some noise (unfortunately, especially during the final aspiration), but the voice onset can still be heard. Notice how unnatural it is for me, as a Czech speaker, to pronounce a devoiced lenis consonant; I pronounced it fully devoiced (which does NOT make it fortis!):
[bæ] (fully voiced)
[b̥æ] (partially voiced / devoiced)
These are the spectrograms of the recordings above.
Important note: When an aspirated plosive (we're talking about English) is followed by something other than a vowel (in English, an approximant), the aspiration is not marked with the small "h", but as devoicing of that following consonant.
Why? Because the effect is the same. VOT is counted from the release of the plosive, so the following sound is only partially voiced (the vocal folds start vibrating somewhere towards its middle, be it a vowel or another consonant). When that sound is a consonant, it is therefore marked as devoiced, with a little circle under (or above) it. Examples: "track", "play", "clay", "cute".
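The marking rule can also be written down as a small Python sketch. Everything in it is an assumption made for illustration: the function name, the tiny symbol sets (far from a full phoneme inventory), and the choice to always put the ring below the letter. It further assumes the plosive stands in an aspirating position, so it does not handle the "s" + plosive case.

```python
# Hypothetical, deliberately small symbol sets for illustration only.
VOICELESS_PLOSIVES = set("ptk")
APPROXIMANTS = set("lrjwɹ")
VOWELS = set("aeiouæɑɒɔəɛɪʊʌ")

ASPIRATION = "\u02b0"  # modifier letter small h
DEVOICED = "\u0325"    # combining ring below


def mark_aspiration(plosive, following):
    """Transcribe an aspirated plosive together with the sound after it."""
    if plosive not in VOICELESS_PLOSIVES:
        raise ValueError("expected one of p, t, k")
    if following in VOWELS:
        # Before a vowel: the usual small "h" after the plosive.
        return plosive + ASPIRATION + following
    if following in APPROXIMANTS:
        # Before an approximant: mark the approximant as devoiced instead.
        return plosive + following + DEVOICED
    # Anything else is left unmarked in this sketch.
    return plosive + following


print(mark_aspiration("t", "ɛ"))  # start of "ten"
print(mark_aspiration("p", "l"))  # start of "play"
```

The first call attaches the small "h" to the plosive, while the second puts the ring on the "l" and leaves the "p" bare, which is exactly the difference between "ten" and "play" described above.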