
Overview

cxml (C XML Minimalistic Library) is a powerful and flexible XML library for C with a focus on simplicity and ease of use, coupled with features that enable quick processing of XML documents.

cxml provides DOM and streaming interfaces for interacting with XML documents. These include XPATH (1.0) support for simple and complex operations on the DOM; a built-in, simple and intuitive query language and an API for selection/creation/deletion/update operations (which may be used as an alternative to the XPATH API or in tandem with it); and a SAX-like interface for streaming large XML documents with no callback requirement. cxml works with any XML file encoded in an ASCII-compatible encoding (UTF-8, for example).

You should be able to use the library to process or extract data from an XML document almost effortlessly.

Note: cxml is a non-validating XML parser library. This means that DTD structures aren't used for validating the XML document. However, cxml enforces correct use of namespaces, and general XML well-formedness.
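
To make the distinction concrete, here is a minimal, standalone sketch of one aspect of well-formedness: every start tag must be closed by a matching end tag, in the right order. It is not cxml's implementation and does not use the cxml API. The function name `tags_balanced` and the size limits are hypothetical, and the sketch assumes the input has no comments, CDATA sections, processing instructions, or `>` characters inside attribute values.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_DEPTH 64   /* hypothetical limit on nesting depth */
#define MAX_NAME  64   /* hypothetical limit on tag-name length */

/* Hypothetical helper (not part of cxml): returns true when every start tag
 * in `xml` is closed by a matching end tag in the correct order. */
bool tags_balanced(const char *xml)
{
    assert(xml != NULL);

    char stack[MAX_DEPTH][MAX_NAME];   /* names of the currently open elements */
    int depth = 0;

    for (const char *p = xml; *p != '\0'; p++) {
        if (*p != '<') continue;                 /* skip character data */

        bool closing = (p[1] == '/');
        const char *name = p + (closing ? 2 : 1);
        const char *end = strchr(name, '>');
        if (end == NULL) return false;           /* tag is never terminated */

        /* the name runs until whitespace, '/', or '>' */
        size_t len = strcspn(name, " \t\r\n/>");
        if (len == 0 || len >= MAX_NAME) return false;

        bool self_closing = (!closing && end[-1] == '/');

        if (closing) {
            /* an end tag must match the most recently opened element */
            if (depth == 0) return false;
            depth--;
            if (strlen(stack[depth]) != len ||
                strncmp(stack[depth], name, len) != 0) return false;
        } else if (!self_closing) {
            if (depth == MAX_DEPTH) return false;
            memcpy(stack[depth], name, len);
            stack[depth][len] = '\0';
            depth++;
        }
        p = end;                                 /* resume after this tag */
    }
    return depth == 0;                           /* nothing left open */
}
```

A document such as `<bar><foo></bar></foo>` fails this check because of mismatched nesting, without any DTD being involved. Validation against a DTD is a separate, additional step that a non-validating parser skips.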

Quick Start

Say we have an XML file named "foo.xml", containing some tags/elements:

<bar>
    <bar>It's foo-bar!</bar>
    <bar/>
    <foo>This is a foo element</foo>
    <bar>Such a simple foo-bar document</bar>
    <foo/>
    <bar>So many bars here</bar>
    <bar>Bye for now</bar> 
</bar>

foo.xml



Using XPATH


As an example, we can perform a simple XPATH operation that selects all bar elements that have a text child node and are also the first bar child of their parent.

#include <stdio.h>   // printf
#include <stdlib.h>  // free
#include <cxml/cxml.h>

int main(){
    // load/parse xml file (`false` ensures the file isn't loaded 'lazily')
    cxml_root_node *root = cxml_load_file("foo.xml", false);

    // using the xpath interface, select the matching bar elements.
    cxml_set *node_set = cxml_xpath(root, "//bar[text() and position()=1]");
    char *item;

    // display all selected "bar" elements
    cxml_for_each(node, &node_set->items){
        // get the string representation of the element found
        item = cxml_element_to_rstring(node);
        printf("%s\n", item);
        // we own this string, so we must free it.
        free(item);
    }
    // free root node
    cxml_destroy(root);
    // cleanup the set
    cxml_set_free(node_set);
    // it's allocated, so it has to be freed.
    free(node_set);

    return 0;
}

A large subset of XPATH 1.0 is supported. Check out this page for unsupported XPATH features.



Using CXQL


Suppose we only need the first "bar" element. We could still use the XPATH interface and take the first element in the returned node set. However, cxml ships with a built-in query language that makes this easier.

Using the query language:

#include <stdio.h>   // printf
#include <stdlib.h>  // free
#include <cxml/cxml.h>

int main(){
    // load/parse xml file
    cxml_root_node *root = cxml_load_file("foo.xml", false);

    // find 'bar' element
    cxml_element_node *elem = cxml_find(root, "<bar>/");

    // get the string representation of the element found
    char *str = cxml_element_to_rstring(elem);
    printf("%s\n", str);

    // we own this string, so we must free.
    free(str);

    // We destroy the entire root, which frees `elem` automatically
    cxml_destroy(root);

    return 0;
}


An example to find the first bar element containing text "simple":

#include <stdio.h>   // printf
#include <stdlib.h>  // free
#include <cxml/cxml.h>

int main(){
    // load/parse xml file
    cxml_root_node *root = cxml_load_file("foo.xml", false);

    cxml_element_node *elem = cxml_find(root, "<bar>/$text|='simple'/");

    char *str = cxml_element_to_rstring(elem);
    printf("%s\n", str);

    free(str);

    // We destroy the entire root, which frees `elem` automatically
    cxml_destroy(root);

    return 0;
}

More precisely, this selects the first bar element that has a text (child) node whose string-value contains "simple". The query language isn't limited to finding only "first" elements. Check out the documentation for more details on this.


Here's a quick example that pretty-prints an XML document:

#include <stdio.h>   // printf
#include <stdlib.h>  // free
#include <cxml/cxml.h>

int main(){
    // load/parse xml file
    cxml_root_node *root = cxml_load_file("foo.xml", false);

    // get the "prettified" string
    char *pretty = cxml_prettify(root);
    printf("%s\n", pretty);

    // we own this string.
    free(pretty);

    // destroy root
    cxml_destroy(root);

    return 0;
}



Using SAX


The SAX API may be the least convenient, but can be rewarding for very large files.

Here's an example that uses the API to print every text node and the name of every element found in the XML document:

#include <stdio.h>   // printf
#include <cxml/cxml.h>

int main(){
    // create an event reader object ('true' allows the reader to close itself once all events are exhausted)
    cxml_sax_event_reader reader = cxml_stream_file("foo.xml", true);

    // event object for storing the current event
    cxml_sax_event_t event;

    // cxml string objects to store name and text
    cxml_string name = new_cxml_string();
    cxml_string text = new_cxml_string();

    while (cxml_sax_has_event(&reader)){ // while there are events available to be processed.
        // get us the current event
        event = cxml_sax_get_event(&reader);

        // check if the event type is the beginning of an element
        if (event == CXML_SAX_BEGIN_ELEMENT_EVENT)
        {
            // consume the current event by collecting the element's name
            cxml_sax_get_element_name(&reader, &name);
            printf("Element: `%s`\n", cxml_string_as_raw(&name));
            cxml_string_free(&name);
        }
        // or a text event
        else if (event == CXML_SAX_TEXT_EVENT)
        {
            // consume the current event by collecting the text data
            cxml_sax_get_text_data(&reader, &text);
            printf("Text: `%s`\n", cxml_string_as_raw(&text));
            cxml_string_free(&text);
        }
    }

    return 0;
}
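
The loop above pulls events from the reader instead of registering callbacks. As a rough illustration of that pull style only, here is a standalone toy reader over an in-memory string. It is not cxml's implementation and does not use the cxml API: every name in it (`toy_reader`, `toy_next`, the `EV_*` constants) is hypothetical. It also assumes a simplified input with no comments, CDATA, or `>` inside attribute values, and it reports a self-closing tag as a begin event only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical event kinds for this sketch; cxml's own constants differ. */
typedef enum { EV_BEGIN_ELEMENT, EV_END_ELEMENT, EV_TEXT, EV_DONE } toy_event;

/* Hypothetical reader state: a cursor into an in-memory document. */
typedef struct {
    const char *cur;    /* next unread character */
    const char *data;   /* start of the current event's name or text */
    size_t len;         /* length of that name or text */
} toy_reader;

toy_reader toy_reader_open(const char *xml)
{
    assert(xml != NULL);
    toy_reader r = { xml, xml, 0 };
    return r;
}

/* Pull the next event. The caller drives the loop; no callbacks are involved. */
toy_event toy_next(toy_reader *r)
{
    if (*r->cur == '\0') return EV_DONE;

    if (*r->cur != '<') {
        /* character data runs up to the next '<' (or the end of input) */
        r->data = r->cur;
        r->len = strcspn(r->cur, "<");
        r->cur += r->len;
        return EV_TEXT;
    }

    int closing = (r->cur[1] == '/');
    r->data = r->cur + (closing ? 2 : 1);
    r->len = strcspn(r->data, " \t\r\n/>");      /* tag name only */

    /* advance past the closing '>' of this tag, if there is one */
    r->cur = r->data + strcspn(r->data, ">");
    if (*r->cur == '>') r->cur++;

    return closing ? EV_END_ELEMENT : EV_BEGIN_ELEMENT;
}
```

A caller would loop on `toy_next(&r)` until it returns `EV_DONE`, reading `r.data` and `r.len` for each event of interest. Because only the cursor is kept between calls, memory use does not grow with document size, which is the property that makes streaming attractive for very large files.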


Quick Questions

If you have small questions that you feel aren't worth opening an issue for, use cxml's discussions.

Tests and Examples

The tests folder contains the tests. See the examples folder for more examples and use cases.

Documentation

This is still a work in progress. See the examples folder for now.

Installation

Check out the installation guide for information on how to install, build or use the library in your project.

Dependencies

cxml depends only on the C standard library. All that is needed to build the library from source is a C11-compliant compiler.

Contributing

Your contributions are absolutely welcome! See the contribution guidelines to learn more. You can also check out the project architecture for a high-level description of the entire project. Thanks!

Reporting Bugs/Requesting Features

cxml is in its early stages but under active development. Any bugs found can be reported by opening an issue (check out the issue template). Please be nice. Details that make the bug reproducible help greatly in implementing a fix; better still, if you have a fix, consider contributing. You can also open an issue if you have a feature request that could improve the library.

Project Non-goals

cxml started out as a little personal experiment but, along the way, has acquired many more features than I had initially envisioned. However, some things are not, and will not be, in scope for this project. Here are some of the non-goals:

  • Contain every possible feature (DTD validation, namespace well-formedness validation, etc.)
  • Be the most powerful/sophisticated XML library.
  • Be the "best" XML library.

However, to take full advantage of this library, you should have a good understanding of XML, including its dos and don'ts.

License

cxml is distributed under the MIT License.