Running into a bug in valgrind itself

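Copyright notice: this is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.
Article link: https://latelee.blog.csdn.net/article/details/82226124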

Background

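Our company's C++ project uses cppcheck for static code analysis and valgrind to check for memory leaks. I have repeatedly stressed that we should get to zero warnings. Even though a Jenkins CI/CD job runs the checks automatically and emails the results to the project members, some people still do not fix their warnings; since that code is outside my area of responsibility, there is not much I can say.
Recently, while testing with valgrind, I hit an unrecognised-instruction problem (the program under test was reported as executing an illegal instruction). It turned out to be caused by a valgrind version that was too old.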

The problem

The command was:

valgrind  --leak-check=full --show-leak-kinds=all  ./a.out

It produced the following errors:

vex amd64->IR: unhandled instruction bytes: 0xF 0xC7 0xF0 0x89 0x6 0xF 0x42 0xC1
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=0F
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==3562== valgrind: Unrecognised instruction at address 0x4ef1b15.
==3562==    at 0x4EF1B15: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
==3562==    by 0x4EF1CB1: std::random_device::_M_getval() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
==3562==    by 0x400D8B: std::random_device::operator()() (in /home/latelee/test/mytest/warningtest/a.out)
==3562==    by 0x400FA1: Init() (in /home/latelee/test/mytest/warningtest/a.out)
==3562==    by 0x400DD7: GetRandomC11() (in /home/latelee/test/mytest/warningtest/a.out)
==3562==    by 0x400B55: GetRandomNum() (in /home/latelee/test/mytest/warningtest/a.out)
==3562==    by 0x400D0A: main (in /home/latelee/test/mytest/warningtest/a.out)

==26759== Process terminating with default action of signal 4 (SIGILL)
==26759==  Illegal opcode at address 0x4EF1B15
==26759==    at 0x4EF1B15: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
==26759==    by 0x4EF1CB1: std::random_device::_M_getval() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
==26759==    by 0x400D8B: std::random_device::operator()() (in /home/latelee/test/mytest/warningtest/a.out)
==26759==    by 0x400FA1: Init() (in /home/latelee/test/mytest/warningtest/a.out)
==26759==    by 0x400DD7: GetRandomC11() (in /home/latelee/test/mytest/warningtest/a.out)
==26759==    by 0x400B55: GetRandomNum() (in /home/latelee/test/mytest/warningtest/a.out)
==26759==    by 0x400D0A: main (in /home/latelee/test/mytest/warningtest/a.out)

valgrind also printed this notice:

==2636== Your program just tried to execute an instruction that Valgrind
==2636== did not recognise.  There are two possible reasons for this.
==2636== 1. Your program has a bug and erroneously jumped to a non-code
==2636==    location.  If you are running Memcheck and you just saw a
==2636==    warning about a bad jump, it's probably your program's fault.
==2636== 2. The instruction is legitimate but Valgrind doesn't handle it,
==2636==    i.e. it's Valgrind's fault.  If you think this is the case or
==2636==    you are not sure, please let us know and we'll try to fix it.
==2636== Either way, Valgrind will now raise a SIGILL signal which will
==2636== probably kill your program.

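In short: either the program really has a bug, or valgrind itself has one (in which case, report it to the authors). I reread the code several times and counted the occurrences of new and delete; nothing was wrong.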
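For context, the call chain in the trace (GetRandomNum(), GetRandomC11(), Init(), std::random_device::operator()()) suggests code along the lines of the sketch below. The original source is not shown in this article, so the function name get_random_in_range and its parameters are hypothetical. The point is only that a program with no new or delete of its own can still end up inside std::random_device::_M_getval() in libstdc++.

```cpp
#include <cassert>
#include <random>

// Hypothetical stand-in for the article's GetRandomC11(); the real source is
// not shown, so the name and the range parameters are assumptions.
int get_random_in_range(int low, int high) {
    assert(low <= high);  // uniform_int_distribution requires low <= high

    // Calling the random_device goes through std::random_device::_M_getval(),
    // the libstdc++ frame that appears in the valgrind trace.
    std::random_device device;
    std::mt19937 engine(device());  // seed a Mersenne Twister from the device

    std::uniform_int_distribution<int> dist(low, high);  // inclusive range
    return dist(engine);
}
```

Calling get_random_in_range(1, 6) from main and running the binary under the old valgrind is one way to check whether a given machine hits the same message; whether it does depends on the CPU and on how libstdc++ was built.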
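Later I found a discussion at https://stackoverflow.com/questions/37032339/valgrind-unrecognised-instruction. The gist is that the old valgrind does not support the instruction executed inside std::random_device::_M_getval(): the leading bytes 0x0F 0xC7 0xF0 in the vex output are the encoding of RDRAND on the eax register. The thread also includes a patch and recommends using a newer version. This article tests with a newer valgrind.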
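The instruction bytes in the vex output can be checked by hand. The sketch below is a small illustration and not part of valgrind: the helper name starts_with_rdrand is made up here, and it only recognises the register form of RDRAND (opcode 0x0F 0xC7 followed by a ModRM byte in the range 0xF0 to 0xF7). Prefix bytes are ignored, which is acceptable for this log because it reports REX=0 and PFX.66=0.

```cpp
#include <cstdint>
#include <vector>

// Illustrative helper (hypothetical name): returns true when the byte
// sequence starts with the register form of RDRAND.
//   0x0F 0xC7   two-byte opcode
//   ModRM       mod = 11 (register operand), reg = 6 selects RDRAND,
//               so the byte lies in 0xF0..0xF7 (the low 3 bits pick the register)
bool starts_with_rdrand(const std::vector<std::uint8_t>& bytes) {
    if (bytes.size() < 3) {
        return false;  // not enough bytes for opcode plus ModRM
    }
    const bool opcode_matches = bytes[0] == 0x0F && bytes[1] == 0xC7;
    const bool modrm_matches = (bytes[2] & 0xF8) == 0xF0;
    return opcode_matches && modrm_matches;
}
```

Passing the eight bytes from the log, {0x0F, 0xC7, 0xF0, 0x89, 0x06, 0x0F, 0x42, 0xC1}, returns true, which fits the std::random_device frame in the trace: the old valgrind stopped on a hardware random number instruction.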

Building and testing the new version

First, check the currently installed version:

$ valgrind --version
valgrind-3.11.0

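The official website lists 3.13 as the latest version. Download the source tarball from there, then unpack, build, and install it: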

tar jxf valgrind-3.13.0.tar.bz2
cd valgrind-3.13.0

./configure --prefix=/home/latelee/bin/valgrind
make -j
make install

Test with the new version using this command:

$ /home/latelee/bin/valgrind/bin/valgrind --leak-check=full --show-leak-kinds=all ./a.out

This time the output is:

==30979== Memcheck, a memory error detector
==30979== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==30979== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==30979== Command: ./a.out
==30979== 
==30979== Conditional jump or move depends on uninitialised value(s)
==30979==    at 0x400E36: Uninit() (in /home/latelee/test/mytest/warningtest/a.out)
==30979==    by 0x400B65: GetRandomNum() (in /home/latelee/test/mytest/warningtest/a.out)
==30979==    by 0x400D0A: main (in /home/latelee/test/mytest/warningtest/a.out)
==30979== 
-316985884
==30979== 
==30979== HEAP SUMMARY:
==30979==     in use at exit: 72,704 bytes in 1 blocks
==30979==   total heap usage: 7 allocs, 6 frees, 84,856 bytes allocated
==30979== 
==30979== LEAK SUMMARY:
==30979==    definitely lost: 0 bytes in 0 blocks
==30979==    indirectly lost: 0 bytes in 0 blocks
==30979==      possibly lost: 0 bytes in 0 blocks
==30979==    still reachable: 72,704 bytes in 1 blocks
==30979==         suppressed: 0 bytes in 0 blocks
==30979== Rerun with --leak-check=full to see details of leaked memory
==30979== 
==30979== For counts of detected and suppressed errors, rerun with: -v
==30979== Use --track-origins=yes to see where uninitialised values come from
==30979== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

As you can see, an error is still reported, but the "valgrind: Unrecognised instruction at address" message is gone.
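The one remaining report, "Conditional jump or move depends on uninitialised value(s)" in Uninit(), is the kind of finding Memcheck produces when a branch depends on a variable that was never assigned. The source of Uninit() is not shown, so has_negative below is a hypothetical example of the pattern and its fix, not the article's actual code.

```cpp
#include <vector>

// Hypothetical example: reports whether any element of `values` is negative.
bool has_negative(const std::vector<int>& values) {
    // Writing "bool found;" without an initialiser would leave the value
    // indeterminate whenever no element is negative. A later branch on that
    // value is what Memcheck flags as depending on uninitialised data.
    bool found = false;
    for (int v : values) {
        if (v < 0) {
            found = true;
        }
    }
    return found;
}
```

As the log itself suggests, rerunning with --track-origins=yes shows where the uninitialised value was created.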

Summary

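I recommend building and installing valgrind from source so that you run a current release, which reduces false alarms caused by bugs in valgrind itself.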

李迟, Thursday night, 2018.8.30
