dcc - Direct C Compiler
A C99 compliant C compiler with additions implementing many extensions and features, as well as arbirary-precision integer arithmetic.
The main feature that differentiates this compiler from others, is its ability to directly read, preprocess, tokenize, parse, assemble and link c source code, all at the same time, in a way allowing you to execute C code in an environment similar to that of an interactive commandline. If you are interested in how this is achieved, take a look at /include/drt/drt.h
Currently only able to target I386 and above, support for x86-64 is planned and already partially implemented.
Supported output formats are ELF, windows PE, as well as direct execution of generated code.
DCC supports AT&T inline assembly syntax, emulating gcc's __asm__ statement and the GNU assembler as well as direct parsing of assembly sources.
Using TPP as preprocessor to implement a fully featured perprocessor, DCC implements many GCC extensions such as __asm__, __builtin_constant_p, many __attribute__-s, __typeof__, __auto_type, and many more, including my own twist on awesome C extensions.
Development on DCC started on 17.04.2017, eversince then being the usual one-person project.
Current state:
Note that DCC is still fairly early in its development, meaning that anything can still change and that more features will be added eventually.
- Link against windows PE binaries/libraries (*.dll).
- Statically link against PE binaries (as in: clone everything from a *.dll)
- Dynamically/Statically link against ELF binaries/libraries/object files (*, *.so, *.o)
- Output windows PE binary/library (*.exe, *.dll).
- Output linux ELF binary/library (*, *.so).
- Output ELF relocatable object files (*.o)
- Process and merge (link) multiple source-/object files/static libraries.
- Compiling DCC is mainly tested and working on windows using Visual C or DCC itself. GCC and linux support is present, but may occasionally be broken.
- Full STD-C compliance up to C99.
- Full AT&T assembly support with many GNU assembler extensions (see below).
- Full ELF binary target support.
- Fully working live execution of C source code.
- DCC can fully compile itself (And the result can compile itself again!)
Planned features:
- Support for X86-64/AMD64 CPU architectures.
- Compiling DCC on linux (most of the work's already there, but nothing's tested yet).
- Compiling DCC with DCC (because every C compiler must be able to do that!).
- Generation of debug information (recognizeable by gdb).
- Finish many partially implemented features (see below).
- Support for true thread-local storage (aka. segment-based)
Features (Compiler):
- DCC as host compiler can easily be detected with defined(__DCC_VERSION__).
- Using TPP as preprocessor, every existing preprocessor extension is supported, as well as all that are exclusive to mine.
- Live-compilation-mode directly generates assembly.
- C-conforming symbol forward/backward declaration.
- K&R-C compatible
- Full STD-C89/90 compliance
- Full STD-C95 compliance
- Full STD-C99 compliance
- Supports all C standard types.
- Supports 64-bit long long integrals (using double-register storage).
- Supports all C control statements.
- Supports C11 _Generic.
- Supports C11 _Atomic (Not fully implemented).
- Supports C99 _Bool.
- Supports C99 __func__ builtin identifier.
- Supports Variable declaration in if-expressions and for-initializers.
- Supports nested function declaration, as well as access to variables from surrounding scopes.
- Supports C++ lvalue types (int y = 10; int &x = y;).
- Supports C structure bitfields
- Support for GCC statement-expressions: int x = ({ int z = 10; z+20; }); // x == 30.
- Support for __FUNCTION__ and __PRETTY_FUNCTION__, including use by concat with other strings: char *s = "Function " __FUNCTION__ " was called"; printf("%s\n",s);.
- Support for GCC __sync_* builtin functions (__sync_val_compare_and_swap(&x,10,20)).
- Supports all compiler-slangs for alignof: _Alignof, __alignof, __alignof__ and __builtin_alignof.
- Support for compile-time type deduction from expressions: typeof, __typeof, __typeof__.
- Support for GCC scoped labels: __label__.
- Support for GCC-style inline assembly: __asm__("ret").
- Support for MSVC fixed-length integer types: __int(8|16|32|64).
- Support for GCC __auto_type (as well as special interpretation of auto when not used as storage class. - auto int x = 42 auto is storage class; auto y = 10; auto denotes automatic type deduction).
- Support for C99 variable-length arrays: int x = 10; int y[x*2]; assert(sizeof(y) == 80);.
- Support for old (pre-STDC: K&R-C) function declarations/implementations.
- Support for new (post-STDC: C90+) function declarations/implementations.
- Support for floating-point types (Assembly generator is not implemented yet).
- Support for GCC x86 segment address space (__seg_fs/__seg_gs)
- Debugging aids for pre-initializing local variables with 0xCC bytes and memory allocated using alloca with 0xAC.
- Inherited from assembly: Named register identifiers.
-
int x = %eax; (CPU-specific, on i386 compiles to mov %eax, x).
-
int x = *(int *)%fs:0x18; (Can also be used to access segment register, on i386 compiles to movl %fs:(0x18), x).
- Inherited from assembly: Get current text address.
-
void *p = .; (Evaluates to the current text address with void * typing).
- Use label names in expressions:
-
void *p = &&my_label; my_label: printf("p = %p\n",p);
- Support for new & old GCC structure/array initializer:
- dot-field: struct { int x,y; } p = { .x = 10, .y = 20 };
- field-collon: struct point { int x,y; } p = { x: 10, y: 20 };
- array-subscript: int alpha[256] = { ['a' ... 'z'] = 1, ['A' ... 'Z'] = 1, ['_'] = 1 };
- Support for runtime brace-initializers: struct point p = { .x = get_x(), .y = get_y() };
- Split between struct/union/enum, declaration and label namespaces:foo: struct foo foo; // Valid code and 3 different 'foo'
- Support for unnamed struct/union inlining:
-
union foo { __int32 x; struct { __int16 a,b; }; };
-
offsetof(union foo,x) == 0, offsetof(union foo,a) == 0, offsetof(union foo,b) == 2
- Support for builtin functions offering special compile-time optimizations, or functionality (Every builtin can be queried with __has_builtin(...)):
-
char const (&__builtin_typestr(type_or_expr t))[];
- Accepting arguments just like 'sizeof', return a human-readable representation of the [expression's] type as a compile-time array of characters allocated in the '.string' section.
-
_Bool __builtin_constant_p(expr x);
-
expr __builtin_choose_expr(constexpr _Bool c, expr tt, expr ff);
-
_Bool __builtin_types_compatible_p(type t1, type t2);
-
void __builtin_unreachable(void) __attribute__((noreturn));
-
void __builtin_trap(void) __attribute__((noreturn));
-
void __builtin_breakpoint(void);
- Emit a CPU-specific instruction to break into a debugging environment, or do nothing if the target CPU doesn't allow for such an instruction
-
void *__builtin_alloca(size_t s);
-
void *__builtin_alloca_with_align(size_t s, size_t a);
-
void __builtin_assume(expr x),__assume(expr x);
-
long __builtin_expect(long x, long e);
-
const char (&__builtin_FILE(void))[];
-
int __builtin_LINE(void);
-
const char (&__builtin_FUNCTION(void))[];
-
void *__builtin_assume_aligned(void *p, size_t align, ...);
-
size_t __builtin_offsetof(typename T, members...);
-
T (__builtin_bitfield(T expr, constexpr int const_index, constexpr int const_size)) : const_size;
- Access a given sub-range of bits of any integral expression, the same way access is performed for structure bit-fields.
-
typedef ... __builtin_va_list;
-
void __builtin_va_start(__builtin_va_list &ap, T &start);
-
void __builtin_va_end(__builtin_va_list &ap);
-
void __builtin_va_copy(__builtin_va_list &dstap, __builtin_va_list &srcap);
-
T __builtin_va_arg(__builtin_va_list &ap, typename T);
- Compiler-provided var-args helpers for generating smallest-possible code
-
int __builtin_setjmp(T &buf);
-
void __builtin_longjmp(T &buf, int sig) __attribute__((noreturn));
- Requires: sizeof(T) == __SIZEOF_JMP_BUF__
- Compile-time best-result code generation for register save to 'buf'
- Optimizations for 'sig' known to never be '0'
-
void *__builtin_malloc(size_t s);
-
void *__builtin_calloc(size_t c, size_t s);
-
void *__builtin_realloc(void *p, size_t c, size_t s);
-
void __builtin_free(void *p);
-
void __builtin_cfree(void *p);
-
void *__builtin_return_address(unsigned int level);
-
void *__builtin_frame_address(unsigned int level);
-
void *__builtin_extract_return_addr(void *p);
-
void *__builtin_frob_return_address(void *p);
-
void *__builtin_isxxx(void *p);
- ctype-style builtin functions
-
void *__builtin_memchr(void *p, int c, size_t s);
-
void *__builtin_memrchr(void *p, int c, size_t s);
- Additional functions are available for mem(r)len/mem(r)end/rawmem(r)chr/rawmem(r)len
-
T __builtin_min(T args...);
-
T __builtin_max(T args...);
-
void __builtin_cpu_init(void);
-
int __builtin_cpu_is(char const *cpuname);
-
int __builtin_cpu_supports(char const *feature);
-
char (&__builtin_cpu_vendor(char *buf = __builtin_alloca(sizeof(__builtin_cpu_vendor()))))[?];
-
char (&__builtin_cpu_brand(char *buf = __builtin_alloca(sizeof(__builtin_cpu_brand()))))[?];
- Returns a target-specific '\0'-terminated string describing the brand/vendor name of the host CPU. The length of the returned string is always constant and known at compile-time.
-
__builtin_cpu_init is required to be called first, and if the string cannot be determined at runtime, the returned string is filled with all '\0'-characters.
-
uint16_t __builtin_bswap16(uint16_t x);
-
uint32_t __builtin_bswap32(uint32_t x);
-
uint64_t __builtin_bswap64(uint64_t x);
-
int __builtin_ffs(int x);
-
int __builtin_ffsl(long x);
-
int __builtin_ffsll(long long x);
-
int __builtin_clz(int x);
-
int __builtin_clzl(long x);
-
int __builtin_clzll(long long x);
- Generate inline code with per-case optimizations for best results
-
T __builtin_bswapcc(T x, size_t s = sizeof(T));
-
int __builtin_ffscc(T x, size_t s = sizeof(T));
-
int __builtin_clzcc(T x, size_t s = sizeof(T));
- General purpose functions that works for any size
-
void *__builtin_memcpy(void *dst, void const *src, size_t s);
- Replace with inlined code for sizes known at compile-time
- Warn about dst/src known to overlap
-
void *__builtin_memmove(void *dst, void const *src, size_t s);
- Optimize away dst == src cases
- Hint about dst/src never overlapping
-
void *__builtin_memset(void *dst, int byte, size_t s);
- Replace with inlined code for sizes known at compile-time
-
int __builtin_memcmp(void const *a, void const *b, size_t s);
- Replace with compile-time constant for constant
- Replace with inline code for sizes known at compile-time
-
size_t __builtin_strlen(char const *s);
- Resolve length of static strings at compile-time
- Split between declaration and assembly name (aka. __asm__("foo") suffix in declarations)
- Arbitrary size arithmetic operations (The sky's the limit; as well as your binary size bloated with hundreds of add-instructions for one line of source code).
- Support for deemon's 'pack' keyword (now called __pack):
- Can be used to emit parenthesis almost everywhere (except in the preprocessor, or when calling macros)
- Explicit alignment of code, data, or entire sections in-source
- Support for #pragma comment(lib,"foo") to link against a given library "foo"
- Support for #pragma pack(...)
- Supports GCC builtin macros for fixed-length integral constants (__(U)INT(8|16|32|64|MAX)_C(...)).
- GCC-compatible predefined CPU macros, such as __i386__ or __LP64__.
- Support for GCC builtin macros, such as __SIZEOF_POINTER__, __SIZE_TYPE__, etc.
Features (Attributes):
- Ever attribute can be written in one of three ways:
- GCC attribte syntax (e.g.: __attribute__((noreturn)))
- cxx-11 attributes syntax (e.g.: [[noreturn]])
- MSVC declspec syntax (e.g.: __declspec(noreturn))
- The name of an attribute (in the above examples noreturn) can be written with any number of leading, or terminating underscores to prevent ambiguity with user-defined macros:
-
__attribute__((____noreturn_)) is the same as __attribute__((noreturn))
- The following attributes (as supported by other compiler) are recognized:
-
__attribute__((noreturn*))
-
__attribute__((warn_unused_result*))
-
__attribute__((weak*))
-
__attribute__((dllexport*))
-
__attribute__((dllimport*))
-
__attribute__((visibility("default")))
-
__attribute__((alias("my_alias")))
-
__attribute__((weakref("my_alias")))
-
__attribute__((used*))
-
__attribute__((unused*))
-
__attribute__((cdecl*))
-
__attribute__((stdcall*))
-
__attribute__((thiscall*))
-
__attribute__((fastcall*))
-
__attribute__((section(".text")))
-
__attribute__((regparm(x)))
-
__attribute__((naked*))
-
__attribute__((deprecated))
-
__attribute__((deprecated(msg)))
-
__attribute__((aligned(x)))
-
__attribute__((packed*))
-
__attribute__((transparent_union*))
-
__attribute__((mode(x))) (Underscores surrounding x are ignored)
- All attribute names marked with '*' accept an optional suffix that adds an enabled-dependency on a compiler-time expression. (e.g.: __attribute__((noreturn(sizeof(int) == 4))) - Mark as noreturn, if int is 4 bytes wide)
- Attributes not currently implemented (But planned to be):
-
__attribute__((constructor))
-
__attribute__((constructor(priority)))
-
__attribute__((destructor))
-
__attribute__((destructor(priority)))
-
__attribute__((ms_struct))
-
__attribute__((gcc_struct))
- Attributes ignored without warning:
-
__attribute__((noinline...))
-
__attribute__((returns_twice...))
-
__attribute__((force_align_arg_pointer...))
-
__attribute__((cold...))
-
__attribute__((hot...))
-
__attribute__((pure...))
-
__attribute__((nothrow...))
-
__attribute__((noclone...))
-
__attribute__((nonnull...))
-
__attribute__((malloc...))
-
__attribute__((leaf...))
-
__attribute__((format_arg...))
-
__attribute__((format...))
-
__attribute__((externally_visible...))
-
__attribute__((alloc_size...))
-
__attribute__((always_inline...))
-
__attribute__((gnu_inline...))
-
__attribute__((artificial...))
- New attributes added by DCC:
-
__attribute__((lib("foo")))
- Most effective for PE targets: 'foo' is the name of the DLL file that the associated declaration should be linked against.
- Using this attribute, one can link against DLL files that don't exist at compile-time, or create artificial dependencies on ELF targets.
-
__attribute__((arithmetic*))
- Used on struct types of arbirary size to enable arithmetic operations with said structure. Using this attribute you could easily create e.g.: a 512-bit integer type.
- Most operators are implemented through inline-code, but some (mul,div,mod,shl,shr,sar) generate calls to external symbols.
- When this attribute is present, the associated structure type can be modified with 'signed'/'unsigned' to control the sign-behavior.
- In addition, the following keywords can be used anywhere attributes are allowed.
-
{_}_cdecl: Same as __attribute__((cdecl))
-
{_}_stdcall: Same as __attribute__((stdcall))
-
{_}_fastcall: Same as __attribute__((fastcall))
-
__thiscall: Same as __attribute__((thiscall))
Features (Warnings):
- DCC features an enourmous amount of warnings covering everything from code quality, to value truncation, to syntax errors, to unresolved references during linkage, etc...
- Any warning can be configured as
-
Disabled: (Compilation is continued, but based on severity, generated assembly/binary may be wrong)
-
Enabled: Emit a warning, but continue compilation as if it was disabled
-
Error: Emit an error message and halt compilation at the next convenient location
-
Supress: Works recursively: Handle the warning as Disabled for every time it is suppressed before reverting its state to before it was.
- Warnings are sorted into named groups that can be disabled as a whole. The main group of a warning is always displayed when it is emit. (e.g.: W1401("-WSyntax"): Expected ']', but got ...)
- The global warning state can be pushed/popped from usercode:
-
Push:
-
#pragma warning(push)
-
#pragma GCC diagnostic push
-
Pop:
-
#pragma warning(pop)
-
#pragma GCC diagnostic pop
- Individual warnings/warning group states can be explicitly defined from usercode:
-
Disabled:
-
#pragma warning("[-][W]no-<name>")
-
#pragma warning(disable: <IDS>)
-
#pragma warning(disable: "[-][W]<name>")
-
#pragma GCC diagnostic ignored "[-][W]<name>"
-
Enabled:
-
#pragma warning(enable: <IDS>)
-
#pragma warning(enable: "[-][W]<name>")
-
#pragma GCC diagnostic warning "[-][W]<name>"
-
Error:
-
#pragma warning(error: <IDS>)
-
#pragma warning(error: "[-][W]<name>")
-
#pragma GCC diagnostic error "[-][W]<name>"
-
Suppress (once for every time a warning/group is listed):
-
#pragma warning(suppress: <IDS>)
-
#pragma warning(suppress: "[-][W]<name>")
-
#pragma warning("[-][W]sup-<name>")
-
#pragma warning("[-][W]suppress-<name>")
- Revert to default state:
-
#pragma warning(default: <IDS>)
-
#pragma warning(default: "[-][W]<name>")
-
#pragma warning("[-][W]def-<name>")
-
IDS is a space-separated list of individual warning IDS as integral constants
- Besides belonging to any number of groups, each warning also has an ID
- Use of these IDS should be refrained from, as they might change randomly
- Similar to the extension-pragma, #pragma warning(...) accepts a comma-seperated list of commands.
-
#pragma warning(push,disable: "-Wsyntax")
- All warnings can be enabled/disabled on-the-fly using pragmas:
-
#pragma warning(push|pop) Push/pop currently enabled extensions
-
#pragma warning("-W<name>") Enable warning 'name'
-
#pragma warning("-Wno-<name>") Disable warning 'name'
-
#pragma GCC system_header treats the current input file as though all warnings disabled
- Mainly meant for headers in /fixinclude which may re-define type declarations, but are not meant to cause any problems
Features (Extensions):
- Extensions are implemented in two different ways:
- Extensions that are always enabled, but emit a warning when used.
- The warning can either be disabled individually (e.g.: #pragma warning("-Wno-declaration-in-if")).
- Or all extension warnings can be disabled using #pragma warning("-Wno-extensions").
- Don't let yourself be fooled. Writing "-Wno-extensions" disables warnings about extensions, not extensions themself!
- Some warnings are also emit for deprecated or newer language features.
-
"constant-case-expressions": Emit for old-style function declarations.
-
"old-function-decl": Emit for old-style function declarations.
- Extensions that may change semantics and can therefor be disabled.
- All of these extensions can be enabled/disabled on-the-fly using pragmas:
- As comma-seperated list in #pragma extension(...)
-
push: Push currently enabled extensions (e.g.: #pragma extension(push))
-
pop: Pop previously enabled extensions (e.g.: #pragma extension(pop))
-
"[-][f]<name>": Enable extension name (e.g.: #pragma extension("-fmacro-recursion"))
-
"[-][f]no-<name>": Disable extension name (e.g.: #pragma extension("-fno-macro-recursion"))
-
"expression-statements": Recognize GCC statement-expressions.
-
"label-expressions": Allow use of labels in expression (prefixed by &&).
-
"local-labels": Allow labels to be scoped (using GCC's __label__ syntax).
-
"gcc-attributes": Recognize GCC __attribute__((...)) syntax.
-
"msvc-attributes": Recognize MSVC __declspec(...) syntax.
-
"cxx-11-attributes": Recognize c++11 [[...]] syntax.
-
"attribute-conditions": Allow optional conditional expression to follow a switch-attribute.
-
"calling-convention-attributes": Recognize MSVC stand-alone calling convention attributes (e.g.: __cdecl).
-
"fixed-length-integer-types": Recognize fixed-length integer types (__int(8|16|32|64)).
-
"asm-registers-in-expressions": Allow assembly registers to be used in expressions (e.g.: int x = %eax;).
-
"asm-address-in-expressions": Allow assembly registers to be used in expressions (e.g.: int x = %eax;).
-
"void-arithmetic": sizeof(void) == __has_extension("void-arithmetic") ? 1 : 0.
-
"struct-compatible": When enabled, same-layout structures are compatible, when disabled, only same-declaration structs are.
-
"auto-in-type-expressions": Allow auto be be used either as storage class, or as alias for __auto_type.
-
"variable-length-arrays": Allow declaration of C99 VLA variables.
-
"function-string-literals": Treat __FUNCTION__ and __PRETTY_FUNCTION__ as language-level string literals.
-
"if-else-optional-true": Recognize GCC if-else syntax int x = (p ?: other_p)->x; // Same as '(p ? p : other_p)->x'.
-
"fixed-length-integrals": Recognize MSVC fixed-length integer suffix: __int32 x = 42i32;.
-
"macro-recursion": Enable/Disable TCC recursive macro declaration.
- Many more extensions are provided by TPP to control preprocessor syntax, such as #include_next directives. Their list is too long to be documented here.
Features (Optimization):
- Dead code elimination
- Correct deduction on merging branches, such as if-statement with two dead branches
- Re-enable control flow when encountering a label
- Correctly interpretation of __builtin_unreachable()
- Correctly interpretation of __{builtin_}assume(0)
- Automatic constant propagation
- Even capable of handling generic offsetof: (size_t)&((struct foo *)0)->bar
- Automatic removal of unused symbols/data
- Recursively delete unused functions/data symbols from generated binary
- Can be suppressed for any symbol using __attribute__((used))
- Automatic merging of data in sections marked with M (merge) (Not fully implemented, because of missing re-use counter; the rest already works)
- Using the same string (or sub-string) more than once will only allocate a single data segment:
-
printf("foobar\n"); printf("bar\n"); Re-use "bar\n\0" as a sub-string of "foobar\n\0"
Features (Assembler):
- Full AT&T Assembly support
- Extension for fixed-length
- Supported assembly directives are:
-
.align <N> [, <FILL>]
-
.skip <N> [, <FILL>]
-
.space <N> [, <FILL>]
-
.quad <I>
-
.short <I>
-
.byte <I>
-
.word <I>
-
.hword <I>
-
.octa <I>
-
.long <I>
-
.int <I>
-
.fill <REPEAT> [, <SIZE> [, <FILL>]]
-
. = <ORG>
-
.org <ORG>
-
.extern <SYM>
-
.global <SYM>
-
.globl <SYM>
-
.protected <SYM>
-
.hidden <SYM>
-
.internal <SYM>
-
.weak <SYM>
-
.local <SYM>
-
.used <SYM>
-
.unused <SYM>
-
.size <SYM>, <\SIZE>
-
.string <STR>
-
.ascii <STR>
-
.asciz <STR>
-
.text
-
.data
-
.bss
-
.section
-
.previous
-
.set <SYM>, <VAL>
-
.include <NAME>
-
.incbin <NAME> [, <SKIP> [, <MAX>]]
- CPU-specific, recognized directives:
- Directives ignored without warning:
-
.file ...
-
.ident ...
-
.type ...
-
.lflags ...
-
.line ...
-
.ln ...
Features (Linker):
- Integrated linker allows for direct (and very fast) creation of executables
- Merge multiple source files into a single compilation unit
- ELF-style visibility control/attributes (__attribute__((visibility(...))))
- Directly link against already-generated PE binaries
- Add new library dependencies from source code (#pragma comment(lib,...))
- Output to PE binary (*.exe/*.dll)