Friday, November 13, 2009

linux kernel jbd2 notes

Sharing some notes I took earlier while reading the fs/jbd2 code.

Journal commit
==============

1) We start with the running transaction. (journal->j_running_transaction)

2) We change the transaction state to T_LOCKED so that nobody can get a new
   handle on this transaction while we are committing it.

   jbd2_journal_start looks at the transaction's t_state and, if it finds
   T_LOCKED, waits on the j_wait_transaction_locked wait queue.

3) If somebody already holds a handle on this transaction, wait for them to
   do a journal_stop (j_wait_updates wait queue).

   jbd2_journal_stop does a wakeup on j_wait_updates.

4) Discard the contents of t_reserved_list. FIXME: not sure how we can have
   something on the reserved list at this point. We generally add items to the
   reserved list when we get write access to meta data buffers, i.e. via
   jbd2_journal_get_write_access. But since we prevent a journal_start above,
   I am not sure how anything can end up on the reserved list. The code comment
   mentions truncate.

5) Now switch the revoke table. That means any new revoke action will be added to the
   new table. The old table will be written to the journal, and we want to make
   sure nobody updates it.

6) Change the transaction state to T_FLUSH.

7) Mark the transaction as the committing transaction:
    journal->j_committing_transaction = current_transaction

8) Set the running transaction to NULL:
    journal->j_running_transaction = NULL;
   The next journal_start will then create a new transaction. So at any
   time we can have one running transaction and one committing transaction.

9) Wake up anybody waiting on the j_wait_transaction_locked wait queue. (See step 2).

10) Write the dependent data blocks of this transaction.
    Dependent data blocks are added during file system writes. For example, ext4_ordered_write_end
    calls ext4_jbd2_file_inode, which files the inode in the transaction's inode list. This implies
    that in order to commit this transaction we must first write all dirty pages of the inode
    to disk. That ensures we never have meta data on disk pointing to blocks
    carrying old data.

11) Write revoke records. This allocates a journal block of type JBD2_REVOKE_BLOCK and adds the
    revoke records to it. After writing the revoke block we add it
    to the BJ_LogCtl list. Revoke records are used during replay so that we don't replay recorded
    journal changes for specific blocks. These records are created when the file system frees a block.
    For example, ext4 calls ext4_forget on the freed meta data blocks, and that calls jbd2_journal_revoke.

12) Change the transaction state to T_COMMIT.

13) Loop through the meta data blocks added to the journal. We look at transaction->t_buffers, that is,
    the BJ_Metadata list. Meta data blocks get added to this list after the file system has modified
    them. For example, ext4 calls ext4_handle_dirty_metadata after updating meta data values.
    That calls jbd2_journal_dirty_metadata, which sets journal_head->b_modified = 1 and then adds the buffer
    to the current transaction's BJ_Metadata list, thereby removing it from the BJ_Reserved list.

14) Allocate a descriptor block (block type JBD2_DESCRIPTOR_BLOCK) if we don't have one. A descriptor block
    contains a journal_header_t followed by journal block tags, as many as
    fit in one descriptor block. Each journal block tag carries the file system block number of a
    block that is being put into the journal. So what we have is something like this:

    | journal_header_t | journal_block_tag_t 1 | journal_block_tag_t 2 | ...... | meta data block 1 | meta data block 2 |

    journal_block_tag_t 1 records which file system block number maps to meta data block 1.

    All descriptor blocks are added to the BJ_LogCtl list of the transaction.

15) Get a journal block for writing meta data. (See step 14.)

16) Copy the meta data content into the journal block. We use the frozen data if present.
    Frozen data is used when we try to modify meta data that is already part of a committing transaction.
    For example, in ext4_journal_get_write_access we check whether the buffer_head is already part of
    the committing transaction. In detail, during do_get_write_access, if we find the journal_head
    for the buffer_head on the BJ_Shadow list (we put buffer_heads on the BJ_Shadow list once we have
    started the I/O to disk), we wait for the I/O to finish. We do that by waiting on the BH_Unshadow
    bit of the buffer_head. If we are neither on the BJ_Shadow list nor on the BJ_Forget list (we are
    on the BJ_Forget list when we have finished I/O and are going to be added to the checkpoint list),
    we copy the existing buffer_head data to the frozen data. This makes sure that what we write to
    disk doesn't get modified beneath us. We also set the journal_head's next transaction to the
    transaction in which we are making the update. This makes sure that once we are done with the I/O,
    instead of removing the meta data block from the journal, we add it back to the next transaction.

       We also need to make sure the meta data content written to the journal never starts
       with JBD2_MAGIC_NUMBER, which would make it look like a journal internal block. So
       we modify the data as needed and record that fact in the journal block tag.

17) Add the meta data buffer to BJ_Shadow list

18) Add the journal block to the BJ_IO list 

19) Start writing the journal block 

20) Wait for the dependent data block to finish writing. (See step 10)

21) Wait for the t_io_buf_list I/O to finish. These are the buffers on the BJ_IO list;
    they were added in step 18.

22) Remove the journal blocks from the BJ_IO list

23) Remove the corresponding meta data blocks from the BJ_Shadow list (t_shadow_list)

24) Add the metadata to BJ_Forget list for tracking checkpoint

25) Wake up anybody waiting on the shadow list (see step 16) 
      wake_up_bit(&bh->b_state, BH_Unshadow);

26) Wait for t_log_list (BJ_LogCtl) to finish I/O (see steps 11 and 14)

27) Now that we know everything has been written to the journal, write the commit record.
    The commit record has block type JBD2_COMMIT_BLOCK.

28) Wait on commit record to finish IO.

29) Loop through the t_forget_list (BJ_Forget) and, if the journal head is found dirty, add it to
    the checkpoint list of the transaction.

    The BJ_Forget list holds meta data blocks whose corresponding journal blocks have
    already been written to the journal.

30) If the meta data buffer is part of the next transaction (see step 16), refile the buffer on
    either BJ_Reserved or BJ_Metadata depending on whether the data is already modified or not
    (step 13).

31) If not, mark the meta data block as dirty, thereby exposing the data to the VM (__jbd2_journal_unfile_buffer).

32) If the transaction has a checkpoint list, add it to the journal's checkpoint transaction list.

33) Later, when we request space from the journal and find none, we write out the blocks on the
    checkpoint list, thereby freeing journal space for reuse.
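
To make the handoff in steps 2, 3 and 6 to 8 easier to picture, here is a small user-space model in C. It is only a sketch: the struct layouts, the model_* function names and the updates counter are made up for illustration, T_RUNNING is assumed as the state before T_LOCKED, and the real code sleeps on wait queues where this model just returns a flag.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the transaction states named in the notes. */
enum t_state { T_RUNNING, T_LOCKED, T_FLUSH, T_COMMIT };

struct transaction {
    enum t_state state;
    int updates;                     /* handles still open (hypothetical field) */
};

struct journal {
    struct transaction *running;     /* models j_running_transaction */
    struct transaction *committing;  /* models j_committing_transaction */
};

/* journal_start side of step 2: a handle may join only an unlocked
 * transaction; with no running transaction a new one would be created. */
int model_can_start_handle(const struct journal *j)
{
    return j->running == NULL || j->running->state == T_RUNNING;
}

/* Steps 2-3: lock the running transaction. Returns 1 when no handles
 * remain open, 0 when the committer would have to wait on j_wait_updates. */
int model_lock_running(struct journal *j)
{
    assert(j->running != NULL);
    j->running->state = T_LOCKED;
    return j->running->updates == 0;
}

/* Steps 6-8: move the locked transaction to the committing slot and
 * leave the running slot empty for the next journal_start. */
void model_switch_to_commit(struct journal *j)
{
    assert(j->running != NULL && j->committing == NULL);
    assert(j->running->updates == 0);
    j->running->state = T_FLUSH;
    j->committing = j->running;
    j->running = NULL;
}
```

Once model_switch_to_commit has run, model_can_start_handle is true again, which corresponds to the wakeup in step 9.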

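The magic number escaping at the end of step 16 can be sketched the same way. In this model the first four bytes of the journal copy are zeroed when they match the big-endian magic number, and an escape flag is returned for the block tag; replay puts the magic number back. The MODEL_* names and both functions are hypothetical; the numeric values are meant to mirror JBD2_MAGIC_NUMBER and JBD2_FLAG_ESCAPE.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MAGIC       0xc03b3998U  /* mirrors JBD2_MAGIC_NUMBER */
#define MODEL_FLAG_ESCAPE 1            /* mirrors JBD2_FLAG_ESCAPE */

/* Read the first four bytes of a block as a big-endian 32-bit value. */
static uint32_t first_word_be(const unsigned char *block)
{
    return ((uint32_t)block[0] << 24) | ((uint32_t)block[1] << 16) |
           ((uint32_t)block[2] << 8) | (uint32_t)block[3];
}

/* Commit side: copy a meta data block into its journal copy. If the block
 * starts with the magic number, zero those bytes in the copy and return
 * the escape flag to be stored in the block tag. */
int model_escape(const unsigned char *meta, unsigned char *log, size_t len)
{
    assert(len >= 4);
    memcpy(log, meta, len);
    if (first_word_be(meta) == MODEL_MAGIC) {
        memset(log, 0, 4);
        return MODEL_FLAG_ESCAPE;
    }
    return 0;
}

/* Replay side: restore the magic number when the tag carries the flag. */
void model_unescape(unsigned char *block, size_t len, int flags)
{
    assert(len >= 4);
    if (flags & MODEL_FLAG_ESCAPE) {
        block[0] = 0xc0;
        block[1] = 0x3b;
        block[2] = 0x39;
        block[3] = 0x98;
    }
}
```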

Journal Recovery
================

1) Start with the commit ID in the journal superblock (s_sequence) and the start block in the
   journal superblock (s_start).

2) Read the start block; s_start is a logical block number within the journal inode.

3) Look at the header and find out the block type.

4) If the block is a descriptor block, look at the journal block tags to find the file system
   block numbers. (See step 14 of the commit.)

5) Replay the journal blocks that immediately follow the descriptor block. Consult the
   revoke records to check whether each block should be replayed or not.

6) Do this until the commit record is hit.

   The layout is:
[revoke record block][ descriptor block ][data][data][descriptor block][data]..[commit]
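
Here is a toy replay loop over an in-memory log, as a way to see how revoke and commit records interact. It is a sketch under several assumptions: the record struct and all names are invented, a REC_DATA entry stands for one descriptor tag plus the journal block behind it, only records covered by a commit record are replayed, and a block is treated as revoked when a revoke for it was logged in the same or a later transaction.

```c
#include <assert.h>
#include <stddef.h>

enum rec_type { REC_REVOKE, REC_DATA, REC_COMMIT };

/* One simplified log record (hypothetical layout). */
struct log_rec {
    enum rec_type type;
    unsigned tid;        /* transaction the record belongs to */
    unsigned fs_block;   /* file system block number (revoke/data only) */
    int value;           /* stands in for the block contents */
};

/* Is there a revoke for fs_block in transaction tid or a later one? */
static int is_revoked(const struct log_rec *log, size_t n,
                      unsigned fs_block, unsigned tid)
{
    for (size_t i = 0; i < n; i++)
        if (log[i].type == REC_REVOKE && log[i].fs_block == fs_block &&
            log[i].tid >= tid)
            return 1;
    return 0;
}

/* Number of leading records that are covered by a commit record. */
static size_t committed_len(const struct log_rec *log, size_t n)
{
    size_t end = 0;
    for (size_t i = 0; i < n; i++)
        if (log[i].type == REC_COMMIT)
            end = i + 1;
    return end;
}

/* Replay committed, non-revoked blocks into disk[]; returns how many
 * blocks were written. */
size_t model_replay(const struct log_rec *log, size_t n,
                    int *disk, size_t disk_blocks)
{
    size_t end = committed_len(log, n);
    size_t written = 0;

    for (size_t i = 0; i < end; i++) {
        if (log[i].type != REC_DATA)
            continue;
        assert(log[i].fs_block < disk_blocks);
        if (is_revoked(log, end, log[i].fs_block, log[i].tid))
            continue;
        disk[log[i].fs_block] = log[i].value;
        written++;
    }
    return written;
}
```

The model collects revokes by scanning the whole committed part of the log for every block, which is fine for a toy but not how one would do it for a real journal.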

Wednesday, September 2, 2009

Second attempt at POSIX NFSv4 acl

Posted the RFC patch that adds client-side POSIX ACL support for NFSv4 using a sideband protocol:
http://linux-nfs.org/pipermail/nfsv4/2009-September/011142.html
The previous attempt, which mapped the v4 ACL to a POSIX ACL on the client side, can be found at
http://linux-nfs.org/pipermail/nfsv4/2009-July/010768.html

onashamsakal



Happy onam

Tuesday, July 28, 2009

emacs auto-complete and ispell integration

(require 'auto-complete)
(require 'ispell)

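;; Major modes in which the ispell source offers candidates.
(defvar ac-ispell-modes
  '(text-mode))

(defun ac-ispell-candidate ()
  "Return dictionary words starting with the word at point, or nil."
  (if (memq major-mode ac-ispell-modes)
      (let ((word (ispell-get-word nil "\\*")))
        ;; `ispell-get-word' returns (WORD START END); keep only WORD.
        (setq word (car word))
        (lookup-words (concat word "*") ispell-complete-word-dict))))

;; Only offer completions once at least 3 characters have been typed.
(defvar ac-source-ispell
  '((candidates . ac-ispell-candidate)
    (requires . 3))
  "Source for ispell.")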

(provide 'auto-complete-ispell)

Tuesday, June 30, 2009

emacs frame made easy in no-window mode

Now that we have a separate tag stack for each frame, I wanted something similar to vim's tab support. The Emacs wiki is an excellent resource for finding new emacs extensions. For tab support I found

o tabbar mode
o elscreen

Of these, what I wanted was elscreen. I liked elscreen because of its similarity to screen. But getting a separate tag stack for each elscreen window turned out to be complex.

The solution was to see how easily emacs frame support could be made to do what I wanted. Emacs in no-window mode already allows creating multiple frames. So what remained was a sane key binding for creating new frames and moving between them easily. Here is what I ended up writing.


(defvar frame-browse-mode-map (make-sparse-keymap)
  "Keymap used in frame-browse mode.")

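(defun select-frame-1 ()
  "Select the frame named F1."
  (interactive)
  (select-frame-by-name "F1"))

(defun select-frame-2 ()
  "Select the frame named F2."
  (interactive)
  (select-frame-by-name "F2"))

(defun select-frame-3 ()
  "Select the frame named F3."
  (interactive)
  (select-frame-by-name "F3"))

(defun select-frame-4 ()
  "Select the frame named F4."
  (interactive)
  (select-frame-by-name "F4"))

(defun select-next-frame ()
  "Move to the next frame by calling `other-frame' with -1."
  (interactive)
  (other-frame -1))

(defun select-previous-frame ()
  "Move to the previous frame by calling `other-frame' with 1."
  (interactive)
  (other-frame 1))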


(define-key frame-browse-mode-map "c" 'make-frame-command)
(define-key frame-browse-mode-map "1" 'select-frame-1)
(define-key frame-browse-mode-map "2" 'select-frame-2)
(define-key frame-browse-mode-map "3" 'select-frame-3)
(define-key frame-browse-mode-map "4" 'select-frame-4)
(define-key frame-browse-mode-map "n" 'select-next-frame)
(define-key frame-browse-mode-map "p" 'select-previous-frame)

(global-set-key "\C-t" frame-browse-mode-map)

Tuesday, June 23, 2009

Linux kernel and gtags changes

The Linux kernel defines its system calls through macro indirection such as
SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, int, mode) 
SYSCALL_DEFINE(fallocate)(int fd, int mode, loff_t offset, loff_t len)
Getting a source code tag system to understand these is difficult. ctags supports regex-based substitution, which helps translate the above definitions to sys_open and sys_fallocate. To achieve the same in global I had to patch global.

diff --git a/gtags-parser/C.c b/gtags-parser/C.c
index c5653aa..635e8bd 100644
--- a/gtags-parser/C.c
+++ b/gtags-parser/C.c
@@ -143,6 +143,7 @@ C_family(const char *file, int type)
     } else if (level == 0 && !startmacro && !startsharp) {
      char arg1[MAXTOKEN], savetok[MAXTOKEN], *saveline;
      int savelineno = lineno;
+     char *linux_kernel;
 
      strlimcpy(savetok, token, sizeof(savetok));
      strbuf_reset(sb);
@@ -166,6 +167,12 @@ C_family(const char *file, int type)
      if (function_definition(target, arg1)) {
       if (!strcmp(savetok, "SCM_DEFINE") && *arg1)
        strlimcpy(savetok, arg1, sizeof(savetok));
+      linux_kernel = getenv("LINUX_KERNEL_SOURCE");
+      if (linux_kernel && !strcmp(linux_kernel, "yes") &&
+       !strncmp(savetok, "SYSCALL_DEFINE", 14 ) && *arg1) {
+       strcpy(savetok, "sys_");
+       strncat(savetok, arg1, sizeof(savetok) - strlen(savetok) - 1);
+      }
       if (target == DEF)
        PUT(savetok, savelineno, saveline);
      } else {
@@ -611,7 +618,7 @@ function_definition(int target, char arg1[MAXTOKEN])
  if (c == EOF)
   return 0;
  brace_level = 0;
- while ((c = nexttoken(",;[](){}=", c_reserved_word)) != EOF) {
+ while ((c = nexttoken(";[](){}=", c_reserved_word)) != EOF) {
   switch (c) {
   case SHARP_IFDEF:
   case SHARP_IFNDEF:
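
The renaming done by the patch can also be tried outside of global. The helper below is a standalone sketch, not code from global: syscall_tag_name is a made-up name, and it uses snprintf so that the result is always bounded by the size of the output buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the tag name for a SYSCALL_DEFINE* macro from its first argument.
 * Returns 1 and fills out on success; returns 0 if the macro is not a
 * SYSCALL_DEFINE variant, arg1 is empty, or the name does not fit. */
int syscall_tag_name(const char *macro, const char *arg1,
                     char *out, size_t outsz)
{
    static const char prefix[] = "SYSCALL_DEFINE";
    int n;

    assert(macro != NULL && arg1 != NULL && out != NULL);
    if (strncmp(macro, prefix, sizeof(prefix) - 1) != 0 || *arg1 == '\0')
        return 0;
    n = snprintf(out, outsz, "sys_%s", arg1);
    return n > 0 && (size_t)n < outsz;
}
```

For example, passing "SYSCALL_DEFINE3" and "open" fills the buffer with sys_open, and "SYSCALL_DEFINE" with "fallocate" gives sys_fallocate.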

global/gtags support for per frame tag stack

The patch below should give you a per-frame tag stack.
diff --git a/gtags.el b/gtags.el
index bbe4e8a..1eb3f23 100644
--- a/gtags.el
+++ b/gtags.el
@@ -69,12 +69,6 @@
   :group 'gtags)
 
 ;; Variables
-(defvar gtags-current-buffer nil
-  "Current buffer.")
-(defvar gtags-buffer-stack nil
-  "Stack for tag browsing.")
-(defvar gtags-point-stack nil
-  "Stack for tag browsing.")
 (defvar gtags-history-list nil
   "Gtags history list.")
 (defconst gtags-symbol-regexp "[A-Za-z_][A-Za-z_0-9]*"
@@ -165,22 +159,34 @@
 
 ;; push current context to stack
 (defun gtags-push-context ()
-  (setq gtags-buffer-stack (cons (current-buffer) gtags-buffer-stack))
-  (setq gtags-point-stack (cons (point) gtags-point-stack)))
+  (let ((cur-frame (selected-frame)))
+  (setq gtags-buffer-stack (cons (current-buffer) (frame-parameter cur-frame 'buffer-stack)))
+  (modify-frame-parameters cur-frame `((buffer-stack . ,gtags-buffer-stack)))
+  (setq gtags-point-stack (cons (point) (frame-parameter cur-frame 'point-stack)))
+  (modify-frame-parameters cur-frame `((point-stack . ,gtags-point-stack)))))
 
 ;; pop context from stack
 (defun gtags-pop-context ()
-  (if (not gtags-buffer-stack) nil
-    (let (buffer point)
-      (setq buffer (car gtags-buffer-stack))
-      (setq gtags-buffer-stack (cdr gtags-buffer-stack))
-      (setq point (car gtags-point-stack))
-      (setq gtags-point-stack (cdr gtags-point-stack))
+  (if (not (frame-parameter (selected-frame) 'buffer-stack)) nil
+    (let (buffer point (cur-frame (selected-frame)))
+      (setq buffer (car (frame-parameter cur-frame 'buffer-stack)))
+      (setq gtags-buffer-stack (cdr (frame-parameter cur-frame 'buffer-stack)))
+      (modify-frame-parameters cur-frame `((buffer-stack . ,gtags-buffer-stack)))
+      (setq point (car (frame-parameter cur-frame 'point-stack)))
+      (setq gtags-point-stack (cdr (frame-parameter cur-frame 'point-stack)))
+      (modify-frame-parameters cur-frame `((point-stack . ,gtags-point-stack)))
       (list buffer point))))
 
+(defun gtags-get-current-buffer ()
+  (frame-parameter (selected-frame) 'g-current-buffer))
+
+(defun gtags-set-current-buffer (buffer)
+  (modify-frame-parameters (selected-frame) `((g-current-buffer . ,buffer))))
+
+
 ;; if the buffer exist in the stack
 (defun gtags-exist-in-stack (buffer)
-  (memq buffer gtags-buffer-stack))
+  (memq buffer (frame-parameter (selected-frame) 'buffer-stack)))
 
 ;; get current line number
 (defun gtags-current-lineno ()
@@ -396,9 +402,9 @@
   "Move to previous point on the stack."
   (interactive)
   (let (delete context buffer)
-    (if (and (not (equal gtags-current-buffer nil))
-             (not (equal gtags-current-buffer (current-buffer))))
-         (switch-to-buffer gtags-current-buffer)
+    (if (and (not (equal (gtags-get-current-buffer) nil))
+             (not (equal (gtags-get-current-buffer) (current-buffer))))
+         (switch-to-buffer (gtags-get-current-buffer))
       (if (not (gtags-exist-in-stack (current-buffer)))
    (setq delete t))
       (setq context (gtags-pop-context))
@@ -407,7 +413,7 @@
         (if delete
      (kill-buffer (current-buffer)))
         (switch-to-buffer (nth 0 context))
-        (setq gtags-current-buffer (current-buffer))
+        (gtags-set-current-buffer (current-buffer))
         (goto-char (nth 1 context))))))
 
 ;;
@@ -520,7 +526,7 @@
         ;; move to the context
         (if gtags-read-only (find-file-read-only file) (find-file file))
         (if delete (kill-buffer prev-buffer)))
-      (setq gtags-current-buffer (current-buffer))
+      (gtags-set-current-buffer (current-buffer))
       (goto-line line)
       (gtags-mode 1))))
 
@@ -592,7 +598,7 @@ Turning on Gtags-Select mode calls the value of the variable
  truncate-lines t
         major-mode 'gtags-select-mode
         mode-name "Gtags-Select")
-  (setq gtags-current-buffer (current-buffer))
+  (gtags-set-current-buffer (current-buffer))
   (goto-char (point-min))
   (message "[GTAGS SELECT MODE] %d lines" (count-lines (point-min) (point-max)))
   (run-hooks 'gtags-select-mode-hook))